Distributed Background Processing on Rails
Posted on June 20th, 2008 in Scalability
There are numerous options for performing background processing in Rails.
Here at Howcast, our method of choice is Backgroundjob (Bj).
“Backgroundjob (Bj) is a brain dead simple zero admin background priority queue for Rails. Bj is robust, platform independent (including windows), and supports internal or external management of the background runner process.”
Installing Bj
./script/plugin install http://codeforpeople.rubyforge.org/svn/rails/plugins/bj
./script/bj setup
This will create all the migrations you need to generate the job tables (note that there are also archive and configuration tables).
Distributing Bj
Bj keeps a persistent job queue in the database, and workers query it to pick up pending jobs. Because the queue lives in the database rather than in any one process, Bj is easy to distribute: workers can run on any number of servers.
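To make the idea concrete, here is a minimal sketch of several workers draining one shared queue. It is not Bj's implementation: `QueuedJob`, `JobTable` and the worker names are hypothetical, the array stands in for the jobs table, and the `Mutex` stands in for whatever locking the database provides.

```ruby
# Hypothetical stand-in for a row in the jobs table.
QueuedJob = Struct.new(:id, :command, :state, :worker)

class JobTable
  attr_reader :jobs

  def initialize(commands)
    @lock = Mutex.new # plays the role of the database's locking
    @jobs = commands.each_with_index.map do |command, i|
      QueuedJob.new(i + 1, command, "pending", nil)
    end
  end

  # Atomically claim the oldest pending job for a worker, or return nil
  # when nothing is left to do.
  def claim(worker_name)
    @lock.synchronize do
      job = @jobs.find { |j| j.state == "pending" }
      if job
        job.state = "running"
        job.worker = worker_name
      end
      job
    end
  end
end

table = JobTable.new(["encode video 1", "encode video 2", "send email", "resize thumbnail"])

# Two "servers", each polling the same table until it is empty.
workers = %w[server-a server-b].map do |name|
  Thread.new do
    while (job = table.claim(name))
      job.state = "finished" # a real worker would run job.command here
    end
  end
end
workers.each(&:join)
```

Because a job is claimed under the lock, each one is picked up by exactly one worker no matter how many workers are polling.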
In your environment.rb simply add:
With this setting, the default Bj worker will not run on the web server, which leaves you free to run workers on other servers/slices. To do so, add a Bj worker entry to the crontab of each server you want to distribute to.
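As a rough sketch of what this setup could look like: the snippet below assumes the setting in question is the `no_tickle` option described in the Bj README. The crontab line, paths and schedule in the comments are placeholders, not Howcast's actual configuration.

```ruby
# config/environment.rb, after the Rails::Initializer block.
# Assumption: "no_tickle" is the Bj setting that stops the web process from
# starting its own background runner in the given environment.
if defined?(Bj)
  Bj.config["production.no_tickle"] = true
end

# On each worker server, a crontab entry along these lines would keep a
# runner alive (hypothetical paths; check the Bj README for the options
# supported by your version):
#
#   */15 * * * * /path/to/app/script/bj run --forever --rails_env=production --rails_root=/path/to/app
```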
Extending Bj
Although Bj is a full-fledged solution for managing and running background processes, we needed some additional functionality at Howcast: dependency jobs and specialized workers. Some of our jobs have to run serially, one after another, so we needed a way to give a job a dependency job id, identifying the job that must complete before the submitted job is started. That requirement brought another with it: constraining specific workers to run specific types of jobs. To accomplish this we added a dependency_id column to the bj_jobs table and an option for running job workers with specific tags (the --only-tag parameter). This allows you to run a worker with the following command:
The worker then runs only jobs that were submitted with the ‘specialized’ tag:
There is also an --exclude-tag option, which does the reverse: the worker runs every job except those with the given tag.
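A hedged illustration of the tag workflow follows. The Ruby call uses Bj's `submit` with a `:tag` option. The exact command-line spelling in the comments (`--only-tag=specialized`) and the job script name are assumptions, not taken from the Howcast branch.

```ruby
# Submit a job carrying the 'specialized' tag.
# The script path is a hypothetical placeholder.
if defined?(Bj)
  Bj.submit "./script/runner ./jobs/encode_video.rb", :tag => "specialized"
end

# A worker restricted to that tag might then be started as (assumed syntax):
#   ./script/bj run --forever --only-tag=specialized
# and a general-purpose worker that skips those jobs as:
#   ./script/bj run --forever --exclude-tag=specialized
```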
Combined, these two features let you build fairly complex job flows. You can split a large job into smaller specialized jobs that specific workers run in parallel, or serially when dependency_id is set.
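The selection rule a worker follows can be sketched in plain Ruby. This is an in-memory model under stated assumptions, not the code from our branch: `Job`, `next_job` and the `only_tag` / `exclude_tag` keyword arguments are hypothetical names that mirror the dependency_id column and the two command-line options.

```ruby
# Hypothetical in-memory model of a job row; dependency_id and tag mirror
# the columns described above, everything else is illustrative.
Job = Struct.new(:id, :tag, :dependency_id, :state)

# Return the first pending job this worker may run, or nil.
def next_job(jobs, only_tag: nil, exclude_tag: nil)
  by_id = jobs.each_with_object({}) { |job, index| index[job.id] = job }
  jobs.find do |job|
    next false unless job.state == :pending
    next false if only_tag && job.tag != only_tag
    next false if exclude_tag && job.tag == exclude_tag
    # A job with a dependency waits until that job has finished.
    # (In this sketch a dependency_id pointing at no known job does not block.)
    dependency = job.dependency_id && by_id[job.dependency_id]
    dependency.nil? || dependency.state == :finished
  end
end

jobs = [
  Job.new(1, "specialized", nil, :pending), # e.g. encode a video
  Job.new(2, nil, 1, :pending),             # e.g. notify the user, after job 1
  Job.new(3, nil, nil, :pending)            # unrelated general job
]

# A specialized worker only sees job 1.
specialized = next_job(jobs, only_tag: "specialized")

# A general worker skips job 1 (excluded tag) and job 2 (dependency pending).
general = next_job(jobs, exclude_tag: "specialized")

# Once job 1 finishes, job 2 becomes runnable for the general worker.
jobs[0].state = :finished
unblocked = next_job(jobs, exclude_tag: "specialized")
```

Keeping the dependency check separate from the tag filters means serial chains and specialized workers can be mixed freely: a chain can hop between worker types simply by giving each step a different tag.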
Installing the enhancements
Both these enhancements have been open sourced and are available here: http://github.com/howcast/backgroundjob/commits/dependencies_and_tags
We’ve found this to be an easy way to distribute background processing tasks and think some of you might find it to be useful as well.



