noizZze

Using Sidekiq With Elastic Beanstalk

I’m a big fan of Sidekiq, but recently one of my clients asked me to host their new application on Amazon’s Elastic Beanstalk, which imposed certain limitations that I had to overcome to get my normal setup working. In this post I want to describe some of them.

I usually either run Sidekiq on the same node as the web app, or place it on a dedicated one to keep demanding workers away from the main web app. It requires Redis to maintain the queue of tasks, and a copy of the application to be able to execute them.

Amazon’s Elastic Beanstalk is a framework with load balancing and autoscaling that lets you focus on your application, not infrastructure. A copy of your application (Rails, in this case) is unzipped and configured on each instance with Nginx and Passenger. If you need a database, they provide RDS, which runs MySQL (in this case).

It appears that a setup with one dedicated node for Sidekiq and another for Redis isn’t quite possible with the current state of Beanstalk. There’s no distinction between nodes: every one of them is just like its siblings. There is a notion of a “leader”, the first node handled during a deployment, but we can’t use it to identify a particular node reliably afterwards. It can be used only for migrations or other one-off tasks during the deployment, and it can change from deployment to deployment. On top of this, the auto-scaler is free to terminate any instance when scaling down, and from what I’ve read it tends to pick older instances more often than not.

Here’s what I finally came up with:

  • use RedisToGo as an external Redis database. They host it on EC2 in the same zone, so the latency is very small.
  • deploy workers on each web node. Even though it’s an unusual setup, it makes sense: as the load grows, the number of tasks to run is likely to grow as well, so the workers should scale along with the web nodes.
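
The post doesn’t show how Sidekiq was pointed at RedisToGo, so the following is only a sketch under assumptions. Sidekiq documents a REDIS_PROVIDER convention: that variable holds the name of another environment variable containing the Redis URL, with REDIS_URL as the fallback. The helper name resolve_redis_url and the variable name REDISTOGO_URL are illustrative, not taken from the original setup. A check like this could run before the workers start, so a node with a missing Redis setting fails loudly:

```shell
# Hypothetical helper: look up the Redis URL the same way Sidekiq's
# REDIS_PROVIDER convention does, and fail if nothing is configured.
# Only exported variables are visible, just as they would be to Sidekiq.
resolve_redis_url() {
  var_name="${REDIS_PROVIDER:-REDIS_URL}"    # name of the variable holding the URL
  url="$(printenv "$var_name" || true)"      # empty when that variable is unset
  if [ -z "$url" ]; then
    echo "no Redis URL found in \$$var_name" >&2
    return 1
  fi
  printf '%s\n' "$url"
}
```

With REDIS_PROVIDER=REDISTOGO_URL exported, calling resolve_redis_url prints whatever REDISTOGO_URL contains; with neither variable set it returns a non-zero status.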

To (re)start workers on the nodes, I placed this in .ebextensions/sidekiq.config (Beanstalk only processes files in .ebextensions that have the .config extension):

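files:
  # Runs after the new version is deployed: stop the old Sidekiq, start a new one.
  "/opt/elasticbeanstalk/hooks/appdeploy/post/50_restart_sidekiq":
    mode: "000755"
    content: |
      #!/usr/bin/env bash
      cd /var/app/current

      # Stop the running Sidekiq, if any, and drop its pidfile.
      if [ -f /var/app/support/pids/sidekiq.pid ]
      then
        kill -TERM "$(cat /var/app/support/pids/sidekiq.pid)"
        rm -f /var/app/support/pids/sidekiq.pid
      fi

      # Load the environment variables Beanstalk configured for the app.
      . /opt/elasticbeanstalk/support/envvars.d/sysenv

      # Give the old process time to terminate.
      sleep 10

      bundle exec sidekiq \
        -e production \
        -P /var/app/support/pids/sidekiq.pid \
        -C /var/app/current/config/sidekiq.yml \
        -L /var/app/support/logs/sidekiq.log \
        -d

  # Runs before the new version is deployed: USR1 tells Sidekiq
  # to stop accepting new jobs.
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/03_mute_sidekiq":
    mode: "000755"
    content: |
      #!/usr/bin/env bash
      if [ -f /var/app/support/pids/sidekiq.pid ]
      then
        kill -USR1 "$(cat /var/app/support/pids/sidekiq.pid)"
      fi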

Basically, after unzipping (pre-deploy task 1) and setting up the environment (pre-deploy task 2), we tell Sidekiq not to accept any more jobs (pre-deploy task 3). Then the new app version is deployed, and finally (post-deploy task 50) we restart the workers by first stopping them, giving them time to terminate, and then starting them again.
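
The fixed pause between stopping and starting is the fragile part: a long job may need more than ten seconds, and a quiet worker needs far less. As a sketch of an alternative (not what I deployed), a small helper can poll until the old process is really gone, up to a timeout. The function name stop_and_wait and its arguments are made up for illustration, and its variables are plain shell globals to keep it portable:

```shell
# Hypothetical helper: send TERM to the process named in a pidfile and
# wait until it exits. Returns 0 once the process is gone (or there was
# nothing to stop), 1 if it is still alive after the timeout in seconds.
stop_and_wait() {
  pidfile="$1"
  timeout="${2:-10}"
  [ -f "$pidfile" ] || return 0            # no pidfile, nothing to stop
  pid="$(cat "$pidfile")"
  kill -TERM "$pid" 2>/dev/null || true    # the process may already be gone
  waited=0
  while kill -0 "$pid" 2>/dev/null; do     # kill -0 only checks for existence
    if [ "$waited" -ge "$timeout" ]; then
      return 1                             # still running, leave the pidfile
    fi
    sleep 1
    waited=$((waited + 1))
  done
  rm -f "$pidfile"
  return 0
}
```

In the restart hook this would stand in for the kill, rm and sleep steps, for example stop_and_wait /var/app/support/pids/sidekiq.pid 30, with the caller deciding what to do if it returns 1.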

Figuring this out took about two days of trial and error, so hopefully this will be of some help to others.

Update - Feb 17, 2014

I was contacted by the guys from Redis Cloud, who generously set me up with one of their premium plans so I could test their advanced features. What they offer is something that should have been available long ago, most notably:

  • Amazon AWS nodes – gents keep their nodes on Amazon EC2 infrastructure. You can choose the zone when creating subscriptions to keep latency with your other EC2 nodes close to nil.

  • Amazon Security Groups auth – loving this. If you are on Amazon AWS for the rest of your projects – there’s no need to use passwords for auth. Just use Security Groups to let your work nodes talk to Redis.

  • Seamless failovers – you keep your endpoint, and they almost instantly set you up with another database in case of a crash or node outage. It’s hard to test this kind of feature, but I can say that during a week of testing on a real system with several connections, there were no outages reported.

  • Instant Backups – they do that, and that’s useful.

  • Increased number of connections

  • Low pricing – that’s something I loved. For more resources we now (yes, I’m using them for a new project) pay much less.

The features I’m not using yet are auto-scaling and replication. So far I’m enjoying the service and hope it keeps going this smoothly. Give it a try.

Update - May 7, 2014

Guillermo Carrion did a titanic job making Sidekiq run on the new Elastic Beanstalk Ruby 2.0/Puma environment. You can find the full set of his hooks in his Gist.

Thanks, Guillermo!
