4 tips to get more out of Sidekiq

Without a doubt, Sidekiq is one of the most popular Ruby gems. Like Shoryuken and Resque, its main purpose is processing background jobs. Almost every Ruby application I have worked with had Sidekiq listed in its Gemfile. In this article, I would like to share some insights I have picked up over time that may make your Sidekiq setup more efficient.

ℹ️ This article covers the free Sidekiq version.

1. Instant notifications about jobs in the dead set

There is a high chance that you already know that Sidekiq provides an admin user interface out of the box. It gives a good overview of current Sidekiq utilization (e.g. how many threads are busy at the moment), some historical data (e.g. how many jobs have been processed so far), and insights into the special Sidekiq sets: ScheduledSet, RetrySet and DeadSet.

Sidekiq Admin Web UI
Source: https://github.com/mperham/sidekiq
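
If the admin UI is not enabled in your application yet, it ships with the gem and only needs to be mounted. A minimal sketch, assuming a Rails app (in production you will want to put the route behind authentication):

# config/routes.rb
require 'sidekiq/web'

Rails.application.routes.draw do
  # The dashboard becomes available at /sidekiq.
  mount Sidekiq::Web => '/sidekiq'
end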

DeadSet is a special place in the Sidekiq world where jobs end their lifecycle after being retried unsuccessfully X times, where X depends on your configuration. I have observed a similar pattern in almost every project I have been involved in: a big number of jobs sitting in DeadSet.

A few weeks ago at DataCamp, we started an initiative to make some of the most important processes more reliable. One of them relied on HTTP communication between two services, and requests had been failing for various reasons. The fallacies of distributed computing were defined for a reason 🙂. We decided, among other improvements, to replace the synchronous communication with asynchronous event processing.

During event consumption, a Sidekiq job is enqueued. Such a job can fail out of the blue. In the worst case, it may end up in DeadSet, increasing its size by one. Even though the admin user interface provides very detailed information about dead jobs, developers (including myself 🤷) usually don’t visit the admin panel every day to review all dead jobs and try to fix them. As a result, the number of dead jobs grows over time.

This time we wanted to avoid that. The job takes care of one of the most important processes from a user’s point of view, so we decided we would like to be notified immediately about failed executions in order to act accordingly. It’s about a positive user experience in the end. Sidekiq to the rescue! It was as simple as extending the config file (config/initializers/sidekiq.rb in our case) with a few lines:

Sidekiq.configure_server do |config|
  # A global callback when a job dies
  config.death_handlers << ->(job, ex) do
    Rails.logger.error("[SidekiqDiedJobs] #{job['wrapped']} just died with error: #{ex.message}.")
  end
end

Thanks to the above lines, a log entry is created every time any job dies. Based on it, we were able to create a custom DataDog monitor that notifies us as soon as a job we are interested in (matched by its name) dies. No job has died so far 🤓.

2. Better control over the number of job retries & their lifespan

Not all background jobs are equally important. By default, Sidekiq retries every failing job up to 25 times over roughly 21 days before moving it to DeadSet. I am pretty sure that’s too many for the majority of jobs defined in the application(s) you develop.

We spent one of our recurring internal architecture meetings reviewing the jobs in DeadSet and identified the not-so-crucial ones. For those, we reduced the number of retries and even disabled sending some of them to the dead set. It’s nice when they succeed, but when they fail we are totally fine with that. As a result, DeadSet is not polluted with unimportant jobs.

class NotSoCrucialJob
  include Sidekiq::Worker
  # Retry up to 3 times, then drop the job instead of sending it to DeadSet.
  sidekiq_options retry: 3, dead: false

  def perform(*args)
    # ...
  end
end
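
If most of your jobs turn out to be not so crucial, you can also lower the default globally instead of repeating the option in every class; per-class sidekiq_options still take precedence. A minimal sketch, assuming Sidekiq 6.x (newer versions renamed the setting to default_job_options):

# config/initializers/sidekiq.rb
# Jobs that don't set their own retry option now get at most 5 retries.
Sidekiq.default_worker_options = { 'retry' => 5 }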

3. Idempotency and transactionality

An obvious one, but worth repeating. Did you know that Sidekiq (version 6+) workers have 25 seconds to shut down by default? That’s 5 seconds less than Heroku and AWS ECS give their processes. After 25 seconds, Sidekiq pushes all in-progress jobs back to Redis so they can be invoked again once the workers are back up. Such jobs end up being invoked at least twice. The same is true when jobs fail and are retried. That’s why it’s important to make jobs idempotent and transactional; forgetting about it may lead to unexpected outcomes like sending the same transactional email to the same user multiple times.
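
As an illustration, here is a minimal sketch of the email case, assuming a plain Sidekiq worker; WelcomeEmailJob, UserMailer and the welcome_email_sent_at column are hypothetical. The point is the guard clause that turns a repeated invocation into a no-op (a stricter version would also claim the work atomically to cover concurrent runs):

class WelcomeEmailJob
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)

    # Guard clause: if a previous run already sent the email,
    # re-invoking the job becomes a no-op instead of emailing the user again.
    return if user.welcome_email_sent_at.present?

    UserMailer.welcome(user).deliver_now
    user.update!(welcome_email_sent_at: Time.current)
  end
end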

It’s usually a good idea to verify idempotency by writing a test that invokes the same job multiple times and checks the expected results.
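
A minimal RSpec sketch of such a test, assuming the hypothetical WelcomeEmailJob from above, FactoryBot and ActionMailer’s :test delivery method:

RSpec.describe WelcomeEmailJob do
  it 'sends the welcome email only once, no matter how often it runs' do
    user = create(:user, welcome_email_sent_at: nil)

    expect {
      3.times { described_class.new.perform(user.id) }
    }.to change { ActionMailer::Base.deliveries.count }.by(1)
  end
end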

4. Separate Sidekiq Redis entries from cache entries

One of the most common ways I have seen of separating cache keys from Sidekiq entries inside a single Redis instance is the redis-namespace gem.

This is the most naive option. With namespaces, the Redis client prefixes every key with the namespace, e.g. “cache”, so the keys for “cache” don’t conflict with the keys for “transactional”. Namespacing increases the size of every key by the size of the prefix. You don’t get to tune Redis for the individual needs of “cache” and “transactional”.1

At the same time, not everybody is aware that a single Redis instance comes with 1️⃣6️⃣ databases by default, numbered from 0 to 15. Switching between them is as easy as appending the database number to the Redis instance URL. If you only have one instance available, using separate databases instead of namespacing keys is the way to go:

redis = { url: 'redis://localhost:6379/15' } # use database no. 15 for Sidekiq keys

Sidekiq.configure_client do |config|
  config.redis = redis
end

Sidekiq.configure_server do |config|
  config.redis = redis
end

 Using separate databases is an easy way to put a “firewall” between datasets without any additional administrative overhead. Now you can FLUSHDB one dataset without affecting another dataset.1

Ideally, you should consider spinning up two distinct Redis instances: one dedicated to high-traffic cache data and another one for Sidekiq data, where high availability matters more because you don’t want to lose jobs. Such instances can be tuned separately based on the purpose they serve, and an outage of one wouldn’t affect the other. Remember that Redis does not guarantee high availability out of the box.
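
A minimal sketch of that split, assuming a Rails app and two hypothetical environment variables pointing at the separate instances:

# config/environments/production.rb
Rails.application.configure do
  # High-traffic, evictable cache entries live on their own instance.
  config.cache_store = :redis_cache_store, { url: ENV['CACHE_REDIS_URL'] }
end

# config/initializers/sidekiq.rb
# Job data goes to a separate instance tuned for durability and availability.
sidekiq_redis = { url: ENV.fetch('SIDEKIQ_REDIS_URL') }

Sidekiq.configure_client { |config| config.redis = sidekiq_redis }
Sidekiq.configure_server { |config| config.redis = sidekiq_redis }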

👋 I hope you learnt something from this article! If you are interested in working with me at DataCamp, please take a look at our open Ruby position.

  1. https://www.mikeperham.com/2015/09/24/storing-data-with-redis/
 
