Victor Costan: 2008

Thursday, December 11, 2008

Tips on building a CS paper or thesis

I'll describe some techniques that I found useful when writing my own M.Eng. thesis and papers. My focus is on improving the process of building the paper, not writing it. So I'll write about some things you can do to make the process easier. This is not yet another compilation of writing tips.

Latex
Get a complete distribution, and install all of it. Don't try to be smart about packages. 1 GB of disk space is worth less than the time you'll spend debugging missing dependencies.

I use texlive, because it's available on all platforms:

MacOS: http://www.tug.org/mactex/
Ubuntu: apt-get install texlive-full
Windows: http://www.tug.org/texlive/ (you'll get the same distribution as above, but you have to install it yourself)

Tools
I'm a developer. When I write something that's not an e-mail, I'm in Eclipse. So I got myself an Eclipse plug-in for Latex at http://texlipse.sourceforge.net/. You can paste this link directly into the Update Manager to get the plug-in.

I recommend configuring your texlipse project to use pdflatex. This gets you the PDF that you need, and lets you use .pdf files as figures. Most software used for figures has pdf output.

Version Control
I feel uneasy if my work is not under version control... it's a disaster waiting to happen. Latex is all text, so you can even work on multiple machines, and merge the changes intelligently.

I recommend subversion or git. They both have Eclipse plugins:

subversion: http://subclipse.tigris.org/
git: http://www.jgit.org/

At the time of this writing, the git plug-in isn't self-sufficient... you'll need to know git's command-line to get stuff done. On the other hand, git's cheap branches might be worth the hassle.

Bibliography
There's no way in hell I'm typing in the data for my 100-200 bibliography references. Here's how I get my .bib entries:

Go to Google Scholar Preferences
Under Bibliography manager, select Show link to import citations into
Make sure you have Bibtex selected as the format, then click OK.
Find the cited paper / book on Google Scholar: http://scholar.google.com/
Click on the Import into Bibtex link. The entry is revealed for your copy-pasting pleasure.

I know this might seem obvious, but it took me a while to try it out. Scholar's advanced search is useful when you're looking for a certain author's work, and that work is referenced a lot by newer papers. If you don't believe me, try finding the paper on Dijkstra's algorithm without advanced search.

Act Like You're Building Software
Writing in latex is more like building software than writing a humanities paper. Once I embraced that, I became less miserable.

Unlike MS Word, latex lets you distribute your work across many files. A good directory / file organization can make re-organizing sections really easy.

Writing in latex is also like building software in that you can use libraries. They call them packages. Here's what I used in my thesis:

listings - code listings
url - lets you add URLs to bibliography entries (why is that not in base, again?)
graphicx - figures; it provides \includegraphics, which is what you use to import PDF files
amssymb, latexsym, amsmath - useful stuff for mathy formulas (I include them everywhere, it makes my life easier)
boxedminipage - nice borders around code listings
times - switches the document's default fonts to Times, Helvetica and Courier
clrscode - format your pseudocode CLRS style (I like it)

When I need something that seems general, I try to find a package that does it for me. People who write packages tend to write nice documentation on them, and I'd rather read that than fight with latex to figure out how to do something on my own.

I'd like to push the building software similarity even further, but I haven't figured out how to set up a continuous build yet :)

Conclusion
Writing in latex isn't as intuitive as using MS Word. On the other hand, there are some advantages to describing the contents of your paper in code. Knowing and taking advantage of them has made my life easier, and saved me from potential disasters.

I hope you found this useful. If you have more tips, please leave a comment!

Sunday, November 23, 2008

Some Tricks for Background Processing in Rails

This post describes a bunch of tricks that I'm using to do background processing in my Rails applications, Movie Nights and MIT's 6.006 Course Site.

If you want the whole picture, read ahead. If you just want to see the tricks, skip the next section. There's also code at the end, for your copy-pasting pleasure. And it all works with the newly released Rails 2.2.

Big Picture
I'm doing my background processing in one or more long-lived processes. I keep their code in script/background, and use simple-daemon to make them... daemons. I use Starling to pass messages between my Rails front-end processes and the background processors. I don't like starting anything by hand, so I use daemonz to start up Starling and my daemons.

Trick 1: ActiveRecord + Long-Running Process
When using ActiveRecord in long-lived processes, you'll see their database connections drop. You know you're experiencing this if you see the following in your logs:
Error processing task - ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away:

I fixed this problem by having ActiveRecord re-check its connections every time a Starling request is processed:

ActiveRecord::Base.verify_active_connections!

If your background tasks are really long, and you're working with your database (for instance, updating some status when a task completes) you might need to run the line above several times during a task.
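
Here is a minimal sketch of where that call might sit in a worker loop. The names process_tasks and verify_database_connections! are made up for illustration, and the defined? guard is only there so the snippet also runs outside Rails; the ActiveRecord call itself is the one from this post.

```ruby
# Hypothetical helper: re-checks the database connections, but only
# when ActiveRecord is actually loaded (it is a no-op otherwise).
def verify_database_connections!
  ActiveRecord::Base.verify_active_connections! if defined?(ActiveRecord::Base)
end

# Hypothetical worker loop: verifies connections before every task,
# then hands the task to the caller's block and collects the results.
def process_tasks(tasks)
  tasks.map do |task|
    verify_database_connections!
    yield task
  end
end
```

For a really long task, the block itself can call verify_database_connections! again right before it writes its status to the database.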

Trick 2: ActiveRecord + Fork
I know I'm not supposed to fork. But I need to. My course website runs student-submitted programs, and then processes their stdouts. So my background processor daemon needs to fork/exec to be able to run those programs.

The problem is, forking copies the parent's memory. So ActiveRecord's connection pools will get copied. And when the child exits, ActiveRecord will close the database connections, and screw over the parent process. spawn used to handle this for me, then it stopped working in Rails 2.2. So, after some experimentation, the following seems to be the most concise fix:


ActiveRecord::Base.connection_handler.instance_variable_set :@connection_pools, {}

Keep in mind that my forked children run non-rails code, or exec something else right away. I haven't tried using ActiveRecord in them, and I suspect it would break.
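
For reference, the fork/exec part on its own can be sketched like this. This is not the code from my course site: run_and_capture is a hypothetical helper, it assumes a Unix-like system, and it leaves ActiveRecord out entirely. It only shows how a child's stdout can be collected through a pipe.

```ruby
# Runs a command in a forked child and returns [stdout_text, exit_code].
def run_and_capture(*command)
  reader, writer = IO.pipe
  pid = fork do
    # Child: point stdout at the pipe, then replace this process.
    reader.close
    $stdout.reopen(writer)
    writer.close
    exec(*command)
  end
  # Parent: close the write end so read sees EOF once the child is done.
  writer.close
  output = reader.read
  reader.close
  _, status = Process.wait2(pid)
  [output, status.exitstatus]
end
```

Calling run_and_capture("echo", "hello") hands back the child's output and its exit code, so the parent can process the output afterwards.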

Code
My daemonz configuration (daemonz.yml) for Starling and the background processor is below. I have 4 task processors, because the production server has 4 cores and therefore can run 4 student submissions in parallel.

My background processor boilerplate code is below. It uses two queues, pulls and pushes, and tasks in pulls have priority over tasks in pushes. It also works with daemonz.yml to get multiple instances of the same daemon, which is non-trivial when using simple-daemon.
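
The two-queue priority rule can be sketched without Starling at all. In the snippet below plain arrays stand in for the queues, and next_task is a hypothetical name; the real boilerplate talks to Starling instead.

```ruby
# Returns the next task, always draining pulls before touching pushes.
# Returns nil when both queues are empty.
def next_task(pulls, pushes)
  return pulls.shift unless pulls.empty?
  pushes.shift
end

pulls = ["pull-1"]
pushes = ["push-1", "push-2"]
order = []
while (task = next_task(pulls, pushes))
  order << task
  # A pull that arrives while pushes are waiting still jumps the line.
  pulls << "pull-2" if task == "push-1"
end
# order is now ["pull-1", "push-1", "pull-2", "push-2"]
```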

The code above calls into OfflineTasks, which I define in lib/offline_tasks.rb. The code for my course site is below. Each background task has a method that pushes it into Starling, and a case branch that executes it when it's popped from the Starling queue.
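
To illustrate the shape of that pattern (one method that pushes, one case branch that executes), here is a rough sketch. Everything in it is hypothetical: the module name, the :grade task, and the in-memory array that stands in for the Starling queue.

```ruby
module OfflineTasksSketch
  QUEUE = [] # stand-in for the Starling queue

  # The "push" side: the Rails front-end calls this to schedule work.
  def self.enqueue_grade(submission_id)
    QUEUE << { :type => :grade, :submission_id => submission_id }
  end

  # The "pop" side: the background processor calls this for each task.
  def self.execute(task)
    case task[:type]
    when :grade
      "graded submission #{task[:submission_id]}"
    else
      raise ArgumentError, "unknown task type: #{task[:type].inspect}"
    end
  end
end
```

Adding a new kind of background task then means adding one enqueue method and one more when branch.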

And finally, a snippet of code that I use to fork:

Thanks for reading! I hope you found this useful.

Wednesday, November 19, 2008

Post-install / post-update scripts for ruby gems

This post outlines a hack that allows a gem to run Ruby code when the gem is installed or updated, which in effect gives post-install / post-update hooks. The method described here works with any reasonable version of ruby and rubygems (I only tested as far back as ruby 1.8.4 and rubygems 0.9.4, and everything looked good.)

Summary
Add a fake extension, use extconf.rb to run your code and then simulate successful compilation. Use the links at the end for example code.

Detailed Description
RubyGems isn't supposed to run arbitrary code during gem installations, but it supports building extensions for gems. This is a really sweet feature, and makes rubygems a nice tool for cross-platform package management. That's good and all, but the part that we care about is that the process of building an extension starts by running extconf.rb in the extension's directory, which is responsible for producing a Makefile that will orchestrate the building process.

Knowing this, the first thing that comes to one's mind is - let's add an extension to the gem, and put the hook code in extconf.rb. However, there's one more issue left. If rubygems believes that your gem's extension hasn't been built properly, it will not finish installing the gem, and it will spit out a nasty error message.

In order to work around that, we need to trick rubygems' build process, so we need to bypass 3 checks:

a Makefile is generated - create an empty Makefile
make all and make install run successfully - generate a Makefile that contains empty all and install targets; create fake make binaries to cover the case when the user doesn't have a build environment (Linux/Mac: an executable make with a /bin/sh shebang should work; Windows: an empty nmake.bat should trick the Windows port of rubygems)
an extension binary is generated - create empty files your_extension_name.so and your_extension_name.dll
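
As a rough sketch, the first and third checks could be satisfied by something like the code below. This is not the zerg_support implementation: emulate_extension_build is a hypothetical helper, and it skips the fake make binaries, so it assumes the user has a working make.

```ruby
require "fileutils"

# Writes the files rubygems looks for after "building" an extension.
def emulate_extension_build(dir, ext_name)
  FileUtils.mkdir_p(dir)
  # A Makefile whose all and install targets do nothing.
  File.open(File.join(dir, "Makefile"), "w") do |f|
    f.write("all:\n\ninstall:\n\n")
  end
  # Empty stand-ins for the compiled extension binary.
  %w[so dll].each do |suffix|
    FileUtils.touch(File.join(dir, "#{ext_name}.#{suffix}"))
  end
end
```

In extconf.rb, the hook code would run first, followed by a call such as emulate_extension_build(Dir.pwd, "your_extension_name").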

You can implement this yourself, or you can depend on my zerg_support gem, and use the method there, like the example code below does. I promise I won't mind if you copy-paste the code, so you don't have to take an extra dependency :)

I haven't tested out the Windows plan yet, but I have good reasons to believe it should work (I've built gems with real extensions on Windows some time ago).

Code Map
Adding an extension to your gemspec (assumes you're using hoe or echoe):
http://rails-pwnage.rubyforge.org/svn/trunk/zerg/Rakefile

Placing your hook in extconf.rb:
http://rails-pwnage.rubyforge.org/svn/trunk/zerg/ext/zerg_setup_hook/extconf.rb

Tricking rubygems into thinking an extension was built (see emulate_extension_install):
http://rails-pwnage.rubyforge.org/svn/trunk/zerg_support/lib/zerg_support/gems.rb

Monday, September 22, 2008

Daemons in Ruby on Rails applications

Summary
I have created a Rails plug-in, daemonz, that automatically starts and stops the daemons associated with an application, without too much configuration. It works in both production and developer mode. Read below if you care about the motivation behind not using an existing tool.

The Story
In the life of every Ruby on Rails app, there comes a time when the boundaries of a single process are simply too narrow. It might be the need for speed-ups (memcached), full-text indexing (ferret), or background work (simple-daemon), or something entirely different. One thing is certain - daemons will start accompanying your applications.

My application, Movie Nights, desperately needed daemons for all the above reasons. I had been putting that off for as long as possible, because I didn't want to make my developers learn how to start all these processes, and I didn't want the hassle of having to remember all that. When I really had to do this, I knew there was only one right answer: automate.

So I looked into monit and god. They're awesome for the production environment, which didn't really help - I already have a deployment tool, which can handle daemons. I wanted something that would work under the development environment, and would automatically start/stop everything, and I accepted that I'd have to build my own.

Sadly, I don't have time to do this the right way (an integrated system that manages all the dependencies of a Rails app), so I settled for the quick way. Enter daemonz, a bandaid (not really a full solution) for the problem I've described above. Once it is configured, it knows how to start and stop the daemons in most non-trivial circumstances, e.g. having a cluster of app servers, or having the Rails framework brought up by tools. It's not a nice and elegant solution, but it works, and it took a few hours to code.

Enjoy!
(If you're not a Rails programmer, hopefully you can at least enjoy the speed that Movie Nights has gained by performing all its Facebook API calls in a background process.)

Monday, September 1, 2008

Never Gone

I've been at Google for the past couple of months. But I haven't been (completely) idle. I dusted off my IAP creation (Movie Rails), and adapted (read: pretty much re-wrote) it to work with the new Facebook. I give you Movie Nights.

Why did I do this? Mainly to satiate my passion for Ruby and Rails (no, I didn't want to say Ruby on Rails). And there's also fbFund, which has been accepting submissions until this past Friday. If they happen to like the entry that some of my bold friends and I have put together, we'll have some nice cash to turn this idea into the best movie application ever. And my mom will finally use something I've coded.

Saturday, April 26, 2008

Ubuntu 8.04 on Mac Mini

Update: please read my new post if you want to install Ubuntu 8.10 or 9.04.

So, Ubuntu 8.04 came out yesterday. I promptly got my Mini to not boot by trying to install it using the same procedure I used for 7.10. One day later, I figured out a (small) sequence of steps that yields a working dual-boot of Leopard and Ubuntu 8.04. Here's what I did:

Use the Leopard (Desktop or Server) install disc to re-partition the disk to 1 partition, then install Leopard.
Install Software Updates. Needs to be done twice.
Start up Boot Camp, and resize your OSX partition. Quit Boot Camp when it offers to start the Windows installation.
Download and install rEFIt: http://refit.sourceforge.net/
Make rEFIt take over the boot process (it should do that by itself but that doesn't work for me):
- Open Terminal
- cd /efi/refit
- ./enable-always.sh
Optionally, switch the boot default to Linux: open /efi/refit/refit.conf in TextEdit, and uncomment the line saying #legacyfirst (at the very bottom)
Reboot and insert your Ubuntu 8.04 disc. The rEFIt screen should pop up. Don't worry if you've never seen it before. You'll notice the difference.
Start installing Ubuntu. Choose manual partitioning, delete the big FAT32 partition that Boot Camp created (leave the EFI partition alone though), and create the root and swap partitions in the free space.
On the last install screen, click Advanced, and replace (hd0) with (hd0,2). This is necessary so that Grub installs in the right place.
Upon rebooting, go to Partitioning Tool (second icon in the bottom row) in rEFIt. It will offer to update the MBR to reflect the EFI partition table. Accept. Then the Mini will reboot again.
Power off the Mini. Then power it back on. Now you can boot Ubuntu or Leopard.

I know that the last steps look like black magic (pulled out of my behind). I didn't get it to work otherwise. These steps work. Please comment if you find a shorter sequence.

Aside from getting Ubuntu to boot, rEFIt is handy because it rescues you in case something blows up. When I bricked the Mini, I was running on Apple's boot loader. rEFIt was still able to boot OSX, even when I messed up the grub install options.

Enjoy your Mini servers :)

Tuesday, February 19, 2008

Interesting eBay affiliate strategy

Call me dumb, but I didn't see this before.

This company registered as an eBay affiliate, and its 'value add' is that it gives people money back on their eBay purchases. I feel dumb for not having done that myself (ok, maybe I don't feel so bad... I'm not quite yet allowed to make money in the US anyway).

This is a refreshing turn from the usual deal, which is more like "give us $10 a month and you might qualify for some of our awards if you go to really expensive places and buy stuff at list price".

If you sign up for kickitback, please mention victorcostan as the referrer :)