Victor Costan: Managing Software Dependencies

Some software development decisions are more important than others. This post argues that decisions involving dependencies are among the most important ones, and describes my approach to managing dependencies.

What Are Dependencies
For the purpose of this post, dependencies are pieces of software outside the project or component that you are considering. Software development does entail other dependencies, like the value of a local currency, but those are outside the scope of my write-up.

Why Worry About Dependencies
Choosing which dependencies to take is among the most important decisions we make in software development, because dependencies come with costs and constraints.

Maintenance costs are the ongoing cost associated with keeping the dependency. This cost does include traditional maintenance, such as staying informed about new versions, and applying security updates, but it can go much further. For example, taking a dependency on a Windows-only API in a Web server imposes the cost of a Windows license on every machine running the server. Furthermore, maintenance costs aren't always easy to estimate. For example, the biggest cost in using a library developed by a small group of people is not licensing or integration, but rather the potential cost of having to take on the development of that library, if the initial developers cease working on the library.

Replacement costs are more straightforward -- they are the price paid to completely remove the dependency on a piece of software. Their importance lies in the fact that the replacement cost is the maximum "premium" that you will pay in maintenance cost for a dependency, over the optimum cost. The explanation is this: if the maintenance cost for using Windows becomes so large that it's cheaper to pay the replacement cost of moving to Linux plus the maintenance cost for Linux, then you will switch to Linux. So the biggest premium that you will pay to stick with Windows is how much it would take to replace it.
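The switching argument can be written out as a small calculation. The sketch below uses made-up numbers; the function names and cost figures are hypothetical and only illustrate why the premium is bounded by the replacement cost.

```python
def should_switch(current_maintenance, alternative_maintenance, replacement_cost):
    """Return True when paying to replace the dependency is the cheaper option.

    All three arguments are costs over the same planning horizon,
    expressed in the same (hypothetical) unit.
    """
    return current_maintenance > replacement_cost + alternative_maintenance


def premium(current_maintenance, alternative_maintenance):
    """Extra maintenance cost paid for staying with the current dependency."""
    return current_maintenance - alternative_maintenance


# Hypothetical figures: the alternative costs 40 to maintain, switching costs 25.
alternative, replacement = 40, 25

for current in (50, 65, 66, 90):
    # Print the current cost, the premium being paid, and the decision.
    print(current, premium(current, alternative),
          should_switch(current, alternative, replacement))
```

With these figures, staying put only makes sense while the premium is at most 25, which is the replacement cost. Once the premium exceeds that, switching is cheaper.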

Incompatibility constraints come with every dependency taken. Technical incompatibilities tend to be obvious: for example, DirectX requires Windows and Cocoa requires Mac OS X, so there is no straightforward way to write a Cocoa application using DirectX. Other incompatibilities are more subtle, like licensing. The GPL is the best-known pain, because GPL code cannot be linked together with code released under some other free licenses. Last but not least, there are "versioning hell" incompatibilities, where library A requires library B, at most version 1.0, and library C requires library B, version 1.1 or above, and for this reason, A and C cannot be used together.
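The "versioning hell" case can be checked mechanically. Here is a minimal sketch, assuming plain dotted numeric version strings and inclusive bounds; the helper names and the list of available releases of B are hypothetical, and real package managers use far richer version rules.

```python
def parse(version):
    """Turn a dotted version string such as "1.10" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, minimum=None, maximum=None):
    """Check a version against optional inclusive lower and upper bounds."""
    v = parse(version)
    if minimum is not None and v < parse(minimum):
        return False
    if maximum is not None and v > parse(maximum):
        return False
    return True


# Hypothetical constraints on library B, mirroring the example in the text.
constraints = {
    "A": {"maximum": "1.0"},  # A requires B at most 1.0
    "C": {"minimum": "1.1"},  # C requires B at least 1.1
}

# Hypothetical list of releases of B.
available_b_versions = ["0.9", "1.0", "1.1", "1.2"]

usable = [
    v for v in available_b_versions
    if all(allowed(v, **bounds) for bounds in constraints.values())
]
print(usable)  # empty list: no release of B satisfies both A and C
```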

These costs and constraints are the first factors I weigh when deciding whether to take a new dependency. My approach is described below.

Managing Dependencies
In a nutshell, my strategy around dependencies is as follows. Avoid unnecessary dependencies, and take cheap dependencies. Failing that, make the expensive dependencies easy to replace.

Unnecessary Dependencies
To me, the most important aspect of managing dependencies is being aware when I'm taking them. For example, Linux or OS X developers can habitually use fork or POSIX filesystem permissions. This habit becomes a problem when developing multi-platform code, because these features are not present on Windows. Higher-level languages are not immune to platform dependencies either. In SQL, it's all too easy to use a database-specific extension, and popular scripting languages (Ruby, Python) have extensions which may not be available on Windows, or may crash on OS X. Versioning hell dependencies are also a pain, and keeping track of them requires a perspective that is more commonly possessed by accountants than by coders.

Fortunately, continuous builds can be used to delegate the tedious bookkeeping to computers. Continuous builds set up to run on Windows and Mac OS X protect against taking an unwanted dependency on Linux. A continuous build running tests against SQLite and PostgreSQL database backends protects against dependencies on MySQL. Continuous builds warn about troublesome code early on, when programmers will still be inclined to fix it. For example, it's easier to replace the fork / exec pair with a call to system before it becomes a pattern sprinkled around the entire codebase.
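To make the fork example concrete, here is a minimal sketch in Python of launching a child process without calling fork directly. It relies on the standard subprocess module, which works on Windows as well as on POSIX systems, whereas os.fork is not available on Windows. The helper name run_child is hypothetical.

```python
import subprocess
import sys


def run_child(args):
    """Run a child process and return its standard output as text.

    subprocess.run works on Windows and on POSIX systems, so callers
    never need to use the POSIX-only os.fork themselves.
    """
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Use the current interpreter as the child so the example does not
# assume that any particular external program is installed.
output = run_child([sys.executable, "-c", "print('hello from the child')"])
print(output.strip())
```

Funnelling every process launch through one helper like this also means that a continuous build failing on one platform points at a single place to fix.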

Awareness is only the first step. Most of the time, a dependency has to be taken in return for extra functionality, and I have to decide what dependency I'm taking, and write the integration code. In this case, I consider the issues I presented in the previous section.

Cheap Dependencies
If the maintenance cost will clearly be low, I don't worry too much about the dependency. For example, if I'm using Ruby, I assume the RubyGems library is installed or easily available, so I don't think twice before using its functionality. When figuring out maintenance cost, I pay most attention to incompatibility constraints. The following findings ring alarm bells in my head:

platform dependencies; Example: if it doesn't work on Windows, I can't use it in a desktop application.
restrictive licenses; Examples: the GPL, licenses forbidding the use of code in a commercial setting.
patents; A subtle example is that Adobe's supposedly open Flex platform uses the Flash file format, which is patented by Adobe. Though Adobe published a specification of the Flash format, it prohibits the use of the specification to build competing Flash players.
niche open-source; Ohloh tracks some statistics that can indicate a potentially troublesome open-source project, like a short revision history, a single committer, and uncommented code.

Expensive Dependencies
When the maintenance cost of a dependency will be high, I take extra precautions to lower the replacement cost. I try to learn about at least one alternative, and write the integration code in such a way that it would be easy to swap that alternative in. The goal behind this is to develop a good abstraction layer that insulates the rest of my application from the dependency, and keeps the replacement cost low. Two common examples of this practice are JavaScript frameworks, which insulate application code from browser quirks, and ORM layers such as ActiveRecord that put a lot of work into database independence.
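An abstraction layer of this kind can be quite small. The following is a minimal sketch, not a recommendation of a particular design: KeyValueStore, MemoryStore, SqliteStore and remember_user are hypothetical names, and SQLite (via Python's standard sqlite3 module) merely stands in for whatever expensive dependency is being insulated.

```python
import sqlite3
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical interface that the rest of the application depends on."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key, default=None): ...


class MemoryStore(KeyValueStore):
    """The known alternative: a plain in-process dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class SqliteStore(KeyValueStore):
    """All SQLite-specific code lives in this one class."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, key, value):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key, default=None):
        row = self._db.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default


def remember_user(store, name):
    """Application code: it knows only the KeyValueStore interface."""
    store.put("last_user", name)
    return store.get("last_user")


for store in (MemoryStore(), SqliteStore()):
    print(type(store).__name__, remember_user(store, "ada"))
```

Because remember_user never mentions SQLite, replacing the backend means writing one new class rather than touching every caller.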

Having good automated tests provides many advantages that prolong the life of a codebase. One of them is reducing the replacement costs for all the dependencies. Uprooting a dependency is a nightmare when developers have to sift through piles of code by hand. The same task becomes routine when the computer can point at the code that needs to be changed. Without a good automated test suite, dependencies can become really rigid ("this application only works with Rails 2.2, it'd take forever to port to Rails 2.3" versus "we spend a few hours to update the application when a new version of Rails comes out").
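One way to let the computer point at what breaks is to run the same tests against both the current dependency and its candidate replacement. The sketch below uses Python's standard unittest module; DictCache and ListCache are hypothetical stand-ins for the old and new dependency, and run_contract is a hypothetical helper.

```python
import unittest


class DictCache:
    """Stand-in for the dependency currently in use (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ListCache:
    """Stand-in for a candidate replacement (hypothetical)."""

    def __init__(self):
        self._items = []

    def set(self, key, value):
        # Drop any earlier entry for the key, then record the new value.
        self._items = [(k, v) for k, v in self._items if k != key]
        self._items.append((key, value))

    def get(self, key):
        for k, v in self._items:
            if k == key:
                return v
        return None


class CacheContract:
    """Tests every backend must pass; subclasses choose the backend."""

    make_cache = None

    def test_missing_key_returns_none(self):
        self.assertIsNone(self.make_cache().get("absent"))

    def test_last_write_wins(self):
        cache = self.make_cache()
        cache.set("k", 1)
        cache.set("k", 2)
        self.assertEqual(cache.get("k"), 2)


class DictCacheTest(CacheContract, unittest.TestCase):
    make_cache = DictCache


class ListCacheTest(CacheContract, unittest.TestCase):
    make_cache = ListCache


def run_contract(*cases):
    """Run the given test cases and return the unittest result object."""
    suite = unittest.TestSuite(
        unittest.defaultTestLoader.loadTestsFromTestCase(case) for case in cases
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_contract(DictCacheTest, ListCacheTest)
print(result.wasSuccessful())
```

If the replacement behaved differently from the original, the failing test names would identify exactly which expectation the integration code relies on.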

The effort that goes into keeping replacement costs low is typically repaid many times over by the benefits of being able to replace old or troublesome dependencies. Of course, this only holds for long-lived projects, and I wouldn't pay as much attention to how I integrate my dependencies when I'm exploring or building a throw-away prototype.

Conclusion
Many good software projects fail to shine because of their dependencies (example: Cocoa, because it only runs on Mac OS X). The total cost of long-lived projects is largely influenced by the cost of living with their dependencies. Therefore, it makes sense to invest effort into steering away from dependencies that may bring trouble or even doom the project down the line. Hopefully, this post has presented a few considerations that will help you spot these troublesome dependencies, and either avoid them or at least insulate your codebase from them.

One More Thing
I promise I won't make this a habit, but I want to end this post with something for you to think about. As a programmer, choosing which skill to learn next is closely related to the dependencies problem explored above. We learn new technologies to use them in our projects, which means the projects will take dependencies on those technologies. So, we might not want to learn technologies which translate into troublesome dependencies.

Next week, I will write more about looking at dependencies from this different angle.

Monday, March 30, 2009
