Mastodon Federation Queue Stuck on Sidekiq: Recovery Steps

When the federation queue in Mastodon gets stuck, new posts from other instances stop appearing in your federated timeline, and outgoing messages to remote servers are not delivered. The usual cause is one or more Sidekiq jobs that never finish and keep Sidekiq's worker threads busy. This article explains why the queue freezes and gives the steps to clear the stuck job and restore normal federation.

Key Takeaways: Recovering From a Stuck Mastodon Federation Queue

Sidekiq Web UI > Busy tab: Identify the specific job that has been running longer than 10 minutes and is blocking the queue.
systemctl restart mastodon-sidekiq: Restarts Sidekiq after you interrupt the stuck job so the queues start processing again.
RAILS_ENV=production bin/tootctl accounts unfollow: Makes every local account unfollow a problematic remote account that may be causing repeated stuck jobs.

Why the Federation Queue Stops in Sidekiq

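Mastodon uses Sidekiq to process background jobs, including all federation tasks. Each job represents an action such as delivering a status to a remote server or processing an incoming ActivityPub message. There is no single queue literally named "federation": outgoing deliveries run on the push queue, fetches of remote content on the pull queue, and, in recent versions, incoming ActivityPub messages on the ingress queue. On a standard installation, one Sidekiq process (the mastodon-sidekiq systemd service) serves all queues with a pool of worker threads; larger instances sometimes run extra Sidekiq processes dedicated to specific queues. In this article, "federation queue" refers to these federation-related queues.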
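
Because Sidekiq keeps its queues in Redis, you can read their lengths directly to see which queue is backing up. In current Mastodon versions, federation work runs on queues such as push, pull, and ingress. The sketch below relies on Sidekiq's key layout (a Redis set named queues that lists every queue, and one list per queue named queue:NAME) and assumes that plain redis-cli reaches the Redis instance Sidekiq uses, with no REDIS_NAMESPACE prefix. The function name queue_sizes is hypothetical. If Sidekiq uses another host, port, or database, put those options in REDIS_CLI, for example REDIS_CLI="redis-cli -h 127.0.0.1 -p 6379 -n 0".

```shell
# Hypothetical helper: print each Sidekiq queue and its current length, sorted by name.
# Assumption: redis-cli can reach the Redis instance Sidekiq uses, with no key namespace.
: "${REDIS_CLI:=redis-cli}"   # expanded unquoted below so it may include options such as -h/-p

queue_sizes() {
  local q
  # Sidekiq records every queue name in the "queues" set and stores jobs in the list "queue:<name>".
  for q in $($REDIS_CLI SMEMBERS queues); do
    printf '%s\t%s\n' "$q" "$($REDIS_CLI LLEN "queue:$q")"
  done | sort
}
```

Run queue_sizes twice, a minute apart. A push or ingress count that only grows matches the stuck-federation symptoms this article describes.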

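When a federation job hits a problem it cannot handle, it either fails with an error or hangs in a running state. A common trigger is a malformed ActivityPub payload from a misconfigured remote instance: the job tries to parse the payload, raises an unhandled exception, and moves to the retry set, where Sidekiq tries it again with increasing delays until it reaches its retry limit. A failing job like this fills up the Retries tab, but it does not occupy a worker thread between attempts. A hanging job is worse. It keeps its thread busy, and because Sidekiq runs jobs in parallel across a fixed number of threads, once hung jobs occupy every thread, no other job can start and the federation queue appears frozen.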

Another cause is a remote server that stops responding. If the server is down but does not close the TCP connection, the worker waits for the request to time out. The default timeout in Mastodon is 10 seconds per HTTP request, so one slow server normally delays a job only briefly. The queue freezes when every worker thread is occupied, either because many deliveries are waiting on unresponsive servers or because some are hung in a way the timeout does not catch. Until you intervene, no other federation job can start.
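
You can confirm that every thread is taken by reading Sidekiq's process title. Sidekiq rewrites it to end with a busy count such as [3 of 25 busy]. The sketch below is a hypothetical helper (saturated_sidekiq is not part of Mastodon or Sidekiq). It reads ps aux output and assumes, as ps aux does, that the PID is in the second column.

```shell
# Hypothetical helper: read `ps aux` output on stdin and report, per Sidekiq process,
# how many threads are busy. Sidekiq's process title ends in "[BUSY of CONCURRENCY busy]".
saturated_sidekiq() {
  awk '/sidekiq/ && match($0, /\[[0-9]+ of [0-9]+ busy]/) {
    # Strip the surrounding brackets, then split "BUSY of CONCURRENCY busy" into fields.
    split(substr($0, RSTART + 1, RLENGTH - 2), f, " ")
    status = (f[1] + 0 >= f[3] + 0) ? "SATURATED" : "ok"
    printf "%s pid=%s %s/%s busy\n", status, $2, f[1], f[3]
  }'
}
```

Run it as ps aux | saturated_sidekiq. A process that stays SATURATED for minutes while the Busy tab shows the same long-running jobs fits the thread-exhaustion pattern described above.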

Steps to Clear a Stuck Federation Queue

Perform these steps on the server running Mastodon. You need SSH access and sudo privileges.

Access the Sidekiq Web UI
Open https://your-instance.example.com/sidekiq in a web browser. Mastodon mounts the Sidekiq Web UI at this path out of the box, so you do not need to add a gem or edit config/routes.rb. There are no separate credentials. Sign in to Mastodon first with an administrator account (in Mastodon 4.x, any account whose role has the View Devops permission can open the page).
Identify the stuck job on the Busy tab
Click the Busy tab. Look for a job that has been running for more than 10 minutes. A normal federation job completes in under 5 seconds, so a stuck job stands out with a long elapsed time, often more than 30 minutes. Note the queue it belongs to and write down its job ID (JID), a 24-character hexadecimal string.
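
To apply these thresholds the same way every time, you can compute a job's age from its start time. This minimal sketch uses a hypothetical function name. It takes Unix timestamps in seconds instead of parsing the Busy tab, and it hard-codes the article's rules of thumb: 10 minutes or more is likely stuck, and 5 seconds or less is normal.

```shell
# Hypothetical helper: classify a job by how long it has been running.
# Arguments: start time and current time in Unix epoch seconds, optional threshold in seconds.
job_age_report() {
  local started="$1" now="$2" threshold="${3:-600}"   # 600 s = the 10-minute rule of thumb
  local elapsed=$(( now - started ))
  printf '%dm%02ds elapsed: ' $(( elapsed / 60 )) $(( elapsed % 60 ))
  if [ "$elapsed" -ge "$threshold" ]; then
    echo "likely stuck"
  elif [ "$elapsed" -le 5 ]; then
    echo "normal"
  else
    echo "slow, keep watching"
  fi
}
```

With GNU date you can convert a copied start time first: job_age_report "$(date -d "$STARTED" +%s)" "$(date +%s)", where STARTED is a placeholder for the time shown on the Busy tab.
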
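Interrupt the stuck job from the command line
SSH into your Mastodon server. Sidekiq cannot cancel a single running job, so you interrupt it by signalling the Sidekiq service that runs it:
sudo systemctl kill -s SIGTERM mastodon-sidekiq
This sends a graceful termination signal to every Sidekiq worker in the service, not only the stuck one. Sidekiq stops fetching new jobs, gives running jobs up to 25 seconds (its default shutdown timeout) to finish, and pushes any job that is still unfinished back onto its queue. If the process does not exit, use sudo systemctl kill -s SIGKILL mastodon-sidekiq to force it to stop; jobs that were running at that moment, including the stuck one, are lost instead of requeued. Then restart Sidekiq with sudo systemctl restart mastodon-sidekiq. If your unit file sets Restart=always, systemd may restart Sidekiq on its own before you delete the job, and the requeued job can start again. In that case, run sudo systemctl stop mastodon-sidekiq, delete the job as described in the next step, and then run sudo systemctl start mastodon-sidekiq.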
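
You can script the SIGTERM, wait, SIGKILL, restart sequence so that the escalation happens only when it is needed. This is a sketch, not a Mastodon or Sidekiq tool. The names recover_sidekiq and sidekiq_pid_alive are hypothetical, the unit is assumed to be mastodon-sidekiq, and /proc (Linux) is used to check whether a process still exists. The script records the PID that was running before the signal, so a fresh Sidekiq that systemd starts on its own is not mistaken for the stuck one.

```shell
# Hypothetical helper: stop a stuck Sidekiq gracefully, escalate to SIGKILL if needed, then restart.
# Assumptions: systemd unit "mastodon-sidekiq", Linux /proc, sudo rights for systemctl.
: "${SYSTEMCTL:=sudo systemctl}"   # expanded unquoted on purpose so it can hold "sudo systemctl"

sidekiq_pid_alive() {
  # /proc/PID exists even when we lack permission to signal the process; kill -0 is a fallback.
  [ -e "/proc/$1" ] || kill -0 "$1" 2>/dev/null
}

recover_sidekiq() {
  local unit="${1:-mastodon-sidekiq}"
  local grace="${2:-30}"   # seconds to wait; Sidekiq's own shutdown timeout defaults to 25 s
  local pid waited=0

  # Remember which process is running now. MainPID is 0 when the unit is not running;
  # anything non-numeric is treated the same way.
  pid=$($SYSTEMCTL show -p MainPID --value "$unit")
  case "$pid" in ''|*[!0-9]*) pid=0 ;; esac

  $SYSTEMCTL kill -s SIGTERM "$unit"

  # Poll once per second until that process exits or the grace period runs out.
  while [ "$pid" -gt 0 ] && sidekiq_pid_alive "$pid" && [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  if [ "$pid" -gt 0 ] && sidekiq_pid_alive "$pid"; then
    echo "PID $pid ignored SIGTERM for ${grace}s; sending SIGKILL" >&2
    $SYSTEMCTL kill -s SIGKILL "$unit"
  fi

  $SYSTEMCTL restart "$unit"
}
```

Run recover_sidekiq, or run recover_sidekiq mastodon-sidekiq 60 to wait longer. If you set SYSTEMCTL=echo beforehand, the function prints the SIGTERM and restart commands instead of running them.
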
Delete the stuck job so it does not run again
Open the Sidekiq Web UI again. A job that failed with an error is listed on the Retries tab. A job that Sidekiq pushed back during shutdown is listed on its queue's page under the Queues tab. Find the job by the JID you wrote down and click Delete to remove it permanently. Do not retry the job because it will likely fail again with the same error.
Check the logs to find the root cause
Run sudo journalctl -u mastodon-sidekiq -n 100 to view the last 100 log lines. Look for error messages related to the stuck job. Common errors include ActivityPub::DeliveryWorker failures with HTTP timeout or ActivityPub::ProcessingWorker failures with JSON parsing errors. Note the remote server domain mentioned in the error log.
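
To see which remote server is behind most failures, you can count the hosts that appear in error lines. This sketch rests on loose assumptions. It expects the failing remote URL to appear somewhere on the log line, which depends on your Mastodon version and log settings, and it treats any line containing "error" or "fail" (in any case) as an error line. The function name is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and rank the remote hosts named in error lines.
# Assumption: the remote URL (https://host/...) appears somewhere on each relevant line.
top_failing_domains() {
  grep -iE 'error|fail' \
    | grep -oE 'https?://[A-Za-z0-9.-]+' \
    | sed -E 's#^https?://##' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn
}
```

For example: sudo journalctl -u mastodon-sidekiq -n 1000 --no-pager | top_failing_domains | head. Each output line is a count followed by a host. Your own instance's domain may also appear; ignore it.
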
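Unfollow the problematic remote account
If the errors repeatedly involve a specific remote account, make every local account unfollow it. Run the following command as the mastodon user from your Mastodon directory, giving the account in username@domain form after unfollow:
RAILS_ENV=production bin/tootctl accounts unfollow @
This removes every local follow of that account, so your instance stops receiving its posts through those follows. It does not block the account. To stop federating with it entirely, block its domain as described in the next section.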

If the Federation Queue Still Stops After Recovery

Stuck job reappears after restart

If jobs from the same remote instance keep reappearing after you delete them, that instance is repeatedly sending malformed data. Block the entire remote instance: in Mastodon's web interface, go to Moderation > Federation, add a domain block for that domain, and choose the Suspend severity. This stops federation with that domain. After the block, restart Sidekiq again with sudo systemctl restart mastodon-sidekiq.

Federation queue grows but never shrinks

If the queue size keeps increasing but jobs are not completing, Sidekiq may be short on memory. Run sudo systemctl status mastodon-sidekiq and read the Memory: line. If a Sidekiq process uses more than about 500 MB, reduce its concurrency (the number of threads it runs) rather than adding more. On a standard installation, concurrency is set by the -c option on the ExecStart line of the mastodon-sidekiq systemd unit, and the DB_POOL value in the same unit should be at least as large. Lower both, for example to 5, then run sudo systemctl daemon-reload and restart Sidekiq.
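
If you want a check that you can script or run from cron, systemd reports the unit's current memory in bytes through the MemoryCurrent property (systemctl show -p MemoryCurrent --value mastodon-sidekiq). The sketch below uses a hypothetical function name. It converts that figure to megabytes and compares it with the article's 500 MB rule of thumb. It assumes memory accounting is enabled for the unit; otherwise systemd prints a placeholder such as [not set] instead of a usable number.

```shell
# Hypothetical helper: compare a byte count (for example MemoryCurrent from systemctl show)
# with a limit in MB. Prints a verdict; returns 0 if OK, 1 if over, 2 if no usable number.
check_sidekiq_memory() {
  local bytes="$1" limit_mb="${2:-500}"   # 500 MB is the rule of thumb used in this article
  # Reject "[not set]" and other non-numeric placeholders.
  case "$bytes" in
    ''|*[!0-9]*) echo "no memory figure: ${bytes:-empty}" >&2; return 2 ;;
  esac
  # Some systemd versions print an out-of-range number when accounting is off.
  if [ "${#bytes}" -gt 15 ]; then
    echo "no memory figure: $bytes" >&2
    return 2
  fi
  local mb=$(( bytes / 1024 / 1024 ))
  if [ "$mb" -gt "$limit_mb" ]; then
    echo "HIGH ${mb} MB (limit ${limit_mb} MB)"
    return 1
  fi
  echo "OK ${mb} MB"
}
```

For example: check_sidekiq_memory "$(systemctl show -p MemoryCurrent --value mastodon-sidekiq)". The figure covers the whole unit, which on a standard installation is the single Sidekiq process and all of its threads.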

Sidekiq Web UI shows no busy workers but the queue is stuck

If the Busy tab lists no Sidekiq processes but the federation queue is not moving, Sidekiq may have crashed silently. Check with ps aux | grep '[s]idekiq'. The brackets stop grep from matching its own command line, which would otherwise make it look as if Sidekiq were running. If no Sidekiq processes are listed, start the service with sudo systemctl start mastodon-sidekiq, then watch the logs to confirm that the queue starts processing.
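
The same check can be scripted. The sketch below uses a hypothetical function name and counts only processes whose title looks like a running Sidekiq, which Sidekiq sets to "sidekiq" followed by its version number. A process that is still booting may not have that title yet, so give a fresh start a few seconds before trusting a count of zero.

```shell
# Hypothetical helper: count running Sidekiq processes in `ps aux` output read from stdin.
# "[s]idekiq" matches the text "sidekiq" but not the pattern itself, so grep's own line is skipped;
# requiring a digit after the name keeps out lines such as "journalctl -u mastodon-sidekiq".
count_sidekiq_processes() {
  grep -c '[s]idekiq [0-9]' || true   # grep -c still prints 0 when nothing matches
}
```

For example: if [ "$(ps aux | count_sidekiq_processes)" -eq 0 ]; then sudo systemctl start mastodon-sidekiq; fi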

Sidekiq Federation Queue vs Sidekiq Default Queue

Item	Federation Queue	Default Queue
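Purpose	Deliver and receive ActivityPub messages to and from remote instances (Sidekiq queues push, pull, and ingress)	Process local tasks such as generating thumbnails and updating timelines (Sidekiq queue default)
Worker process name	`mastodon-sidekiq` on a standard install; one process serves all queues unless you split them	`mastodon-sidekiq` on a standard install
Typical job duration	Under 5 seconds	Under 2 seconds
Blocked by a single stuck job	No; the queue stalls only when hung jobs occupy every worker thread, most often when many deliveries wait on unresponsive servers	No; the same rule applies, but local jobs rarely wait on remote servers
Recovery action	Interrupt the stuck job and delete it from its queue or the Retries tab	Restart the Sidekiq process or increase its concurrency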

You can now identify and clear a stuck federation queue in Mastodon by using the Sidekiq Web UI and the command-line tools. After you remove the offending job and, if needed, block the problematic remote instance, the federation queue should resume normal processing. As an advanced tip, consider setting up a monitoring alert that notifies you if any Sidekiq job runs longer than 60 seconds. For example, you can use Prometheus with a Sidekiq exporter that publishes a job-duration metric such as sidekiq_job_duration_seconds (the exact metric name depends on the exporter).
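
Before writing a Prometheus alert rule, you can test a threshold by filtering the exporter's text output directly. The sketch below is hypothetical. The metric name is the one mentioned above, while the label names and the metric type depend on your exporter, so it looks only at the sample value. It inspects samples published under the bare metric name (for example, summary quantiles), not histogram buckets. Duration metrics are usually recorded when a job finishes, so a job that never finishes may never appear here.

```shell
# Hypothetical helper: print samples of one metric from Prometheus text-format input (stdin)
# whose value exceeds a threshold in seconds. Labels are ignored; only the value is compared.
slow_job_samples() {
  local metric="${1:-sidekiq_job_duration_seconds}" threshold="${2:-60}"
  awk -v m="$metric" -v t="$threshold" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    index($0, m) != 1 { next }          # the line must start with the metric name
    {
      rest = substr($0, length(m) + 1)
      c = substr(rest, 1, 1)
      if (c == "{") {                   # drop the {label="..."} block
        p = index(rest, "} ")
        if (p == 0) next
        rest = substr(rest, p + 1)
      } else if (c != " ") {
        next                            # a longer name such as the _count or _sum series
      }
      split(rest, f, " ")               # f[1] is the value, f[2] an optional timestamp
      if (f[1] + 0 > t + 0) print
    }'
}
```

Pipe your exporter's /metrics output into it (the URL depends on the exporter you run), for example curl -s "$METRICS_URL" | slow_job_samples, where METRICS_URL is a placeholder.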
