Sunday, November 23, 2008

Some Tricks for Background Processing in Rails

This post describes a bunch of tricks that I'm using to do background processing in my Rails applications, Movie Nights and MIT's 6.006 Course Site.

If you want the whole picture, read ahead. If you just want to see the tricks, skip the next section. There's also code at the end, for your copy-pasting pleasure. And it all works with the newly released Rails 2.2.

Big Picture
I'm doing my background processing in one or more long-lived processes. I keep their code in script/background, and use simple-daemon to make them... daemons. I use Starling to pass messages between my Rails front-end processes and the background processors. I don't like starting anything by hand, so I use daemonz to start up Starling and my daemons.

Trick 1: ActiveRecord + Long-Running Process
When using ActiveRecord in long-lived processes, you'll see their connections drop. You know you're experiencing this if you see the following in your logs
Error processing task - ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away:

I fixed this problem by having ActiveRecord re-check its connections every time a Starling request is processed:

ActiveRecord::Base.verify_active_connections!


If your background tasks are really long, and you're working with your database (for instance, updating some status when a task completes) you might need to run the line above several times during a task.

Trick 2: ActiveRecord + Fork
I know I'm not supposed to fork. But I need to. My course website runs student-submitted programs, and then processes their stdouts. So my background processor daemon needs to fork/exec to be able to run those programs.

The problem is, forking copies the parent's memory. So ActiveRecord's connection pools will get copied. And when the child exits, ActiveRecord will close the database connections, and screw over the parent process. spawn used to handle this for me, then it stopped working in Rails 2.2. So, after some experimentation, the following seems to be the most concise fix:

ActiveRecord::Base.connection_handler.instance_variable_set :@connection_pools, {}


Keep in mind that my forked children run non-rails code, or exec something else right away. I haven't tried using ActiveRecord in them, and I suspect it would break.

Code
My daemonz configuration (daemonz.yml) for Starling and the background processor is below. I have 4 task processors, because the production server has 4 cores and therefore can run 4 student submissions in parallel.



My background processor boilerplate code is below. It uses two queues, pulls and pushes, and tasks in pulls have priority over tasks in pushes. It also works with daemonz.yml to get multiple instances of the same daemon, which is non-trivial when using simple-daemon.



The code above calls into OfflineTasks, which I define in lib/offline_tasks.rb. The code for my course site is below. Each background task has a method that pushes it into Starling, and a case branch that executes it when it's popped from the Starling queue.



And finally, a snippet of code that I use to fork:



Thanks for reading! I hope you found this useful.

2 comments:

  1. There is an easier way of handling mySQL disconnects, as of Rails 2.3:

    http://guides.rubyonrails.org/2_3_release_notes.html#_reconnecting_mysql_connections

    ReplyDelete