Victor Costan: November 2008

Sunday, November 23, 2008

Some Tricks for Background Processing in Rails

This post describes a bunch of tricks that I'm using to do background processing in my Rails applications, Movie Nights and MIT's 6.006 Course Site.

If you want the whole picture, read ahead. If you just want to see the tricks, skip the next section. There's also code at the end, for your copy-pasting pleasure. And it all works with the newly released Rails 2.2.

Big Picture
I'm doing my background processing in one or more long-lived processes. I keep their code in script/background, and use simple-daemon to make them... daemons. I use Starling to pass messages between my Rails front-end processes and the background processors. I don't like starting anything by hand, so I use daemonz to start up Starling and my daemons.

Trick 1: ActiveRecord + Long-Running Process
When using ActiveRecord in long-lived processes, you'll see their connections drop. You know you're experiencing this if you see the following in your logs
Error processing task - ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away:

I fixed this problem by having ActiveRecord re-check its connections every time a Starling request is processed:

ActiveRecord::Base.verify_active_connections!

If your background tasks are really long, and you're working with your database (for instance, updating some status when a task completes) you might need to run the line above several times during a task.

Trick 2: ActiveRecord + Fork
I know I'm not supposed to fork. But I need to. My course website runs student-submitted programs, and then processes their stdouts. So my background processor daemon needs to fork/exec to be able to run those programs.

The problem is, forking copies the parent's memory. So ActiveRecord's connection pools will get copied. And when the child exits, ActiveRecord will close the database connections, and screw over the parent process. spawn used to handle this for me, then it stopped working in Rails 2.2. So, after some experimentation, the following seems to be the most concise fix:


ActiveRecord::Base.connection_handler.instance_variable_set :@connection_pools, {}

Keep in mind that my forked children run non-rails code, or exec something else right away. I haven't tried using ActiveRecord in them, and I suspect it would break.

Code
My daemonz configuration (daemonz.yml) for Starling and the background processor is below. I have 4 task processors, because the production server has 4 cores and therefore can run 4 student submissions in parallel.

My background processor boilerplate code is below. It uses two queues, pulls and pushes, and tasks in pulls have priority over tasks in pushes. It also works with daemonz.yml to get multiple instances of the same daemon, which is non-trivial when using simple-daemon.

The code above calls into OfflineTasks, which I define in lib/offline_tasks.rb. The code for my course site is below. Each background task has a method that pushes it into Starling, and a case branch that executes it when it's popped from the Starling queue.

And finally, a snippet of code that I use to fork:

Thanks for reading! I hope you found this useful.

Wednesday, November 19, 2008

Post-install / post-update scripts for ruby gems

This post outlines a hack that allows a gem to run Ruby code when the gem is installed or updated, which in effect gives post-install / post-update hooks. The method described here works with any reasonable version of ruby and rubygems (I only tested as far back as ruby 1.8.4 and rubygems 0.9.4, and everything looked good.)

Summary
Add a fake extension, use extconf.rb to run your code and then simulate successful compilation. Use the links at the end for example code.

Detailed Description
RubyGems isn't supposed to run arbitrary code during gem installations, but it supports building extensions for gems. This is a really sweet feature, and makes rubygems a nice tool for cross-platform package management. That's good and all, but the part that we care about is that the process of building an extension starts by running extconf.rb in the extension's directory, which is responsible for producing a Make file that will orchestrate the building process.

Knowing this, the first thing that comes to one's mind is - let's add an extension to the gem, and put the hook code in extconf.rb. However, there's one more issue left. If rubygems believes that your gem's extension hasn't been built properly, it will not finish installing the gem, and it will spit out a nasty error message.

In order to work around that, we need to trick rubygems' build process, so we need to bypass 3 checks:

a Make file is generated - create an empty Makefile
make all and make install run successfully - generate a make file that contains empty all and install targets; create fake make binaries to cover the case when the user doesn't have a build environment (Linux/Mac: an executable make with a /bin/sh shebang should work; Windows: an empty nmake.bat should trick the Windows port of rubygems)
an extension binary is generated - create empty files your_extension_name.so and your_extension_name.dll

You can implement this yourself, or you can depend on my zerg_support gem, and use the method there, like the example code below does. I promise I won't mind if you copy-paste the code, so you don't have to take an extra dependency :)

I haven't tested out the Windows plan yet, but I have good reasons to believe it should work (I've built gems with real extensions on Windows some time ago).

Code Map
Adding an extension to your gemspec (assumes you're using hoe or echoe):
http://rails-pwnage.rubyforge.org/svn/trunk/zerg/Rakefile

Placing your hook in extconf.rb:
http://rails-pwnage.rubyforge.org/svn/trunk/zerg/ext/zerg_setup_hook/extconf.rb

Tricking rubygems into thinking an extension was built (see emulate_extension_install):
http://rails-pwnage.rubyforge.org/svn/trunk/zerg_support/lib/zerg_support/gems.rb