Mastodon ‘Sidekiq Queue Backlog’ Causing Delays: Recovery

When your Mastodon instance suddenly feels sluggish, notifications arrive hours late, or posts fail to federate, a Sidekiq queue backlog is often the culprit. This backlog occurs when background jobs pile up faster than your server can process them. In this article, you will learn why this backlog happens, how to clear it, and how to prevent it from recurring.

Key Takeaways: Recovering from a Sidekiq Queue Backlog on Mastodon

  • Sidekiq dashboard at /sidekiq: Provides real-time queue depths and retry counts for diagnosing the backlog.
  • systemctl restart mastodon-sidekiq: Restarts all Sidekiq processes to clear stuck jobs and re-queue them.
  • RAILS_ENV=production bundle exec sidekiq -q default -q push -q pull -c 25: Manually runs Sidekiq with higher concurrency to drain a backlog faster.

Why a Sidekiq Queue Backlog Forms on Your Mastodon Instance

Sidekiq is the background job processor that handles nearly every asynchronous task in Mastodon. It sends notifications, processes incoming and outgoing federation data, generates media thumbnails, and updates timelines. Each job type belongs to a queue: default, push, pull, mailers, and scheduler.

A backlog forms when the rate of incoming jobs exceeds the processing capacity of your Sidekiq workers. Common triggers include:

  • A sudden spike in activity, such as a viral post or a new user importing a large list of followed accounts.
  • Insufficient Sidekiq concurrency settings relative to your server CPU and memory.
  • A slow database or Redis connection that causes jobs to time out and retry, adding back to the queue.
  • Stuck jobs that hang indefinitely; each one ties up a worker thread, and enough hung jobs can occupy every thread so the queue stops draining.

When the backlog grows, new jobs wait longer to be processed. Users experience delays in notifications, federation failures, and slow timeline updates. If left unchecked, the backlog can cause memory exhaustion and crash the Sidekiq process entirely.

Monitoring the Sidekiq Dashboard for Backlog Signs

The fastest way to confirm a backlog is to visit the Sidekiq dashboard at https://your-instance.com/sidekiq. If you have not enabled the dashboard, you can check the queue depth via the command line:

redis-cli LLEN queue:default
redis-cli LLEN queue:push
redis-cli LLEN queue:pull

A queue length above 1000 jobs per queue typically indicates a significant backlog that requires intervention.
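
To watch the backlog drain in real time, you can poll these counters in a loop. A minimal sketch, assuming Redis on localhost, the default database, and no REDIS_NAMESPACE prefix (a namespaced setup prepends that prefix to each key):

# Print the depth of each queue every 10 seconds
watch -n 10 'for q in default push pull; do printf "%s: " "$q"; redis-cli LLEN "queue:$q"; done'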

Step-by-Step Recovery From a Sidekiq Queue Backlog

The recovery process involves three phases: clearing stuck jobs, draining the backlog, and adjusting settings to prevent recurrence. Follow these steps in order.

  1. Restart Sidekiq to Clear Stuck Jobs
    SSH into your Mastodon server. Run the following command to restart all Sidekiq processes:
    systemctl restart mastodon-sidekiq
    This action kills all current Sidekiq workers. Stuck jobs that were in progress will be re-queued. Wait 30 seconds, then check the Sidekiq dashboard to see if the queue depth decreases.
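    If the service fails to come back up, inspect its status and recent log lines before moving on; this assumes the stock mastodon-sidekiq unit name:
    systemctl status mastodon-sidekiq
    journalctl -u mastodon-sidekiq -n 50 --no-pager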
  2. Manually Run Sidekiq With Higher Concurrency
    If the backlog remains above 500 jobs per queue, increase the concurrency temporarily. Navigate to your Mastodon installation directory (usually /home/mastodon/live) and run:
    RAILS_ENV=production bundle exec sidekiq -q default -q push -q pull -c 25
    This command starts a single Sidekiq process with 25 threads instead of the default 10. Adjust the -c value based on your server CPU cores. For a 4-core server, use 20 to 30. Monitor the queue depth every 60 seconds. Stop this process with Ctrl+C once the backlog drops below 100 jobs per queue.
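    If you are logged in as root, run the drain under the mastodon account instead, since the app's Ruby environment belongs to that user; this sketch assumes the standard /home/mastodon/live install path:
    sudo -u mastodon bash -lc 'cd /home/mastodon/live && RAILS_ENV=production bundle exec sidekiq -q default -q push -q pull -c 25'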
  3. Clear Retries and Dead Jobs From the Dashboard
    Open the Sidekiq dashboard in your browser. Click the Retries tab. If you see hundreds of retried jobs, click the Clear All button. Then click the Dead tab and click Clear All. This removes jobs that have failed multiple times and would otherwise continue retrying, wasting processing capacity.
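    If you prefer the command line, Sidekiq's Ruby API exposes the same cleanup; a hedged sketch, run as the mastodon user from /home/mastodon/live:
    RAILS_ENV=production bundle exec rails runner 'Sidekiq::RetrySet.new.clear; Sidekiq::DeadSet.new.clear'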
  4. Verify Federation and Notification Delivery
    After the backlog clears, test federation by posting a public toot and checking if it appears on another instance. Send a direct message to a local user and confirm they receive the notification. If delays persist, check the Sidekiq dashboard again for new backlogs in the pull queue, which handles incoming federation data.
  5. Adjust Sidekiq Concurrency in the Production Configuration
    Edit the Sidekiq configuration file at /home/mastodon/live/config/sidekiq.yml. Change the concurrency value to match your server resources; for a server with 4 GB of RAM and 4 CPU cores, a value between 15 and 20 is reasonable. Save the file and restart Sidekiq:
    systemctl restart mastodon-sidekiq
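    For reference, here is a minimal sketch of the file's shape; the queue names and weights below are illustrative, so keep whatever your installation already defines and change only the concurrency line:
    :concurrency: 15
    :queues:
      - [default, 8]
      - [push, 6]
      - [pull, 4]
      - [mailers, 2]
      - [scheduler, 1]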
  6. Scale Sidekiq Processes (Optional for Large Instances)
    If your instance has more than 10,000 active users, consider running multiple Sidekiq processes. Create a systemd service file for a second process, for example /etc/systemd/system/mastodon-sidekiq-2.service, with a different -q argument set; a sketch of such a unit follows this list. Reload systemd so it picks up the new file, then enable and start the service:
    systemctl daemon-reload && systemctl enable --now mastodon-sidekiq-2
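
As a starting point, here is a hypothetical unit file for that second process, adapted from the stock mastodon-sidekiq unit. The bundle path assumes an rbenv-based install under /home/mastodon; copy your existing unit and change only the description and queue list if your paths differ:

[Unit]
Description=mastodon-sidekiq-2
After=network.target

[Service]
Type=simple
User=mastodon
WorkingDirectory=/home/mastodon/live
Environment="RAILS_ENV=production"
# DB_POOL should be at least as large as the -c concurrency value
Environment="DB_POOL=15"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 15 -q pull -q mailers
Restart=always

[Install]
WantedBy=multi-user.target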

If the Backlog Returns or Other Issues Appear

Sidekiq Shows High Memory Usage After Restart

A single Sidekiq process with high concurrency can consume more than 1 GB of RAM. If your server has limited memory, reduce the concurrency value in sidekiq.yml to 10 and add a second Sidekiq process instead. This distributes the memory load across two processes.
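
To see how much memory each Sidekiq process actually holds, list their resident set sizes; the bracketed pattern keeps grep from matching its own process:

# PID, resident memory in KB, and full command line of each Sidekiq process
ps -eo pid,rss,args | grep '[s]idekiq'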

Federation Still Fails After Backlog Clears

If outgoing federation remains broken, the push queue may contain jobs that reference deleted accounts or invalid URLs. Use the Sidekiq dashboard to view the Dead queue. Delete all dead jobs, then restart Sidekiq. If the issue persists, check your instance firewall rules for outbound connections on port 443.
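
To rule out a blocked outbound port 443, request any well-known instance and check the status code; mastodon.social here is just an arbitrary reachable target:

# Prints an HTTP status code (e.g. 200) if outbound HTTPS works, or an error if it is blocked
curl -sI -o /dev/null -w '%{http_code}\n' https://mastodon.social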

Redis Memory Exhaustion Causes Sidekiq to Crash

A backlog can fill Redis memory with job data. Run redis-cli INFO memory to check used_memory_human. If it exceeds 80% of your Redis maxmemory setting, you can clear the queues with redis-cli FLUSHDB, but treat this strictly as a last resort: it wipes every key in that Redis database, including Mastodon's caches and home feeds if they share it, not just Sidekiq jobs. Also verify that maxmemory-policy is set to noeviction in /etc/redis/redis.conf, as Sidekiq's own documentation recommends; an eviction policy such as allkeys-lru lets Redis silently delete queued jobs under memory pressure. If Redis keeps filling up, raise maxmemory or move it to a host with more RAM.
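
The checks and the last-resort wipe look like this; note that they apply to whichever Redis database Mastodon is configured to use:

redis-cli INFO memory | grep used_memory_human
redis-cli CONFIG GET maxmemory
# Last resort only: deletes every key in the current database, jobs and caches alike
redis-cli FLUSHDB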

Item             | Restart Sidekiq                     | Manual High-Concurrency Run
Purpose          | Clear stuck jobs and reset workers  | Drain a large backlog faster than default settings allow
Effect on queues | Re-queues all in-progress jobs      | Processes jobs at up to 2.5x the normal rate
Risk             | Minimal; jobs are not lost          | High memory usage if concurrency is set too high
Best for         | Backlogs under 1000 jobs per queue  | Backlogs over 5000 jobs per queue

You can now identify a Sidekiq queue backlog, clear it using the Sidekiq dashboard or command line, and adjust concurrency settings to prevent future delays. For persistent backlogs, consider splitting Sidekiq into multiple processes or upgrading your server resources. As an advanced tip, set up a monitoring alert with Prometheus and the sidekiq_exporter to notify you when any queue exceeds 500 jobs.
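
As a sketch of such an alert, assuming an exporter that publishes a per-queue gauge named sidekiq_queue_size with a name label (metric and label names differ between exporters, so check what yours actually exports):

groups:
  - name: sidekiq
    rules:
      - alert: SidekiqQueueBacklog
        expr: sidekiq_queue_size > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sidekiq queue {{ $labels.name }} has exceeded 500 waiting jobs"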