This post describes my approach to replicating GitHub's awesome setup for serving Git repositories over SSH. GitHub's engineers already outlined the main ideas on their blog, so this post fills in the details: it goes over the little roadblocks I stumbled upon and explains the solution I adopted for each of them.
Assumptions
GitHub allows you to create a git repository via a Web interface, then interact with the repository using a few protocols. I focused on git+ssh, because it's the only protocol that allows pushing to repositories.
I assume that the Git server is a Linux box; I tested my work on Ubuntu. The setup will most likely need changes to work on Mac OS. I rely on already-available infrastructure, namely git and ssh. On Debian-based systems, you should have the git-core and openssh-server packages installed.
I assume you're not willing to change the OpenSSH server configuration, as you want to stick to the secure defaults on your production infrastructure.
My Web application is written in Ruby on Rails, and it uses the grit gem, also written by the GitHub engineers. While the code is specific to this technology, most of the article is relevant outside the Rails world.
To avoid edge cases when assembling shell commands, I assume that repositories and Web users have very sane names (only letters, digits, and underscores).
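As a minimal sketch of that assumption, the check below accepts only ASCII letters, digits, and underscores. The constant and method names are hypothetical, not taken from the author's code. Note the `\A` and `\z` anchors: Ruby's `^` and `$` match at line boundaries, so they would let a name with an embedded newline slip through.

```ruby
# Hypothetical validator for the "sane names" assumption. Names that pass
# can be embedded in paths and shell commands without quoting surprises.
SANE_NAME = /\A[A-Za-z0-9_]+\z/

def sane_name?(name)
  # Reject non-strings outright; \A and \z anchor to the whole string.
  name.is_a?(String) && SANE_NAME.match?(name)
end
```

In a Rails model, the same pattern could be enforced with `validates :name, format: { with: SANE_NAME }`.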
The code for this article is available here (GitHub). All the code links in this article reference a particular tree, so that the code matches the writing. Future revisions are available in the same repository, but they may be more optimized and harder to follow.
Big Picture
A GitHub blog post describes their SSH serving architecture (section Tracing a SSH Request). Scaling aside, these are the main components:
- the server has a git user; git pushes and pulls get processed under that user account
- authorized_keys (sshd's per-user list of accepted public keys) for the git user contains all the public SSH keys registered in the Web application (GitHub modified OpenSSH to query a database instead, since their authorized_keys would have been huge)
- authorized_keys also sets up sshd to run a restricted shell instead of the user's normal login shell (git provides git-shell for this purpose), so users can't use the git account to get shell access on the server
- sshd is not pointed to git-shell directly; instead, a proprietary wrapper checks that the SSH key's owner is allowed to access the repository and, if the operation succeeds, flushes or updates any Web application caches associated with the modified repository
The Git User
I use this script to set up the git user. The script accepts any user name, but to keep things simple, I'll use git in this article. The setup is undone by this script, which I won't cover here.
I don't expect the Web application to run under the git account, so instead I set the git user's primary group to a group that the Web application's account belongs to. I assume that the Web application runs under its own user, and that nothing else uses that user and group. It's also possible to create a git group and add the Web application's user to it. What matters is that the Web application shares a group with the git user, so that group permissions can give it write access to the repositories stored under git's home directory.
The git repositories are stored in the repos directory, under git's home directory. The repos directory must be writable by the Web application, so its permission bits are 770 (rwxrwx---).
Recent versions of sshd are very strict about the permissions on authorized_keys. I appease them by setting the git home directory's bits to 750 (rwxr-x---), .ssh's bits to 700 (rwx------), and .ssh/authorized_keys's bits to 600 (rw-------). As a consequence, the Web application cannot change authorized_keys directly, which is a problem, since I'd like to manage authorized SSH keys in the Web application.
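The permission layout described above can be sketched in Ruby as follows. This is a hypothetical helper, not the author's setup script: it only sets modes, and it assumes that ownership (the git user and the shared group) is assigned separately by root, for example with `FileUtils.chown_R`.

```ruby
require 'fileutils'

# Hypothetical sketch: create git's home layout and apply the modes from the
# text. Ownership is deliberately left out; it needs root privileges.
def lay_out_git_home(home)
  repos   = File.join(home, 'repos')
  ssh_dir = File.join(home, '.ssh')
  keys    = File.join(ssh_dir, 'authorized_keys')

  FileUtils.mkdir_p([repos, ssh_dir]) # also creates home itself
  FileUtils.touch(keys)

  # Explicit chmod calls, so the result does not depend on the umask.
  File.chmod(0o750, home)    # rwxr-x---: group may traverse, not write
  File.chmod(0o770, repos)   # rwxrwx---: the Web application's group may write
  File.chmod(0o700, ssh_dir) # rwx------: nothing group- or world-writable
  File.chmod(0o600, keys)    # rw-------: only the git user may edit the keys

  { home: home, repos: repos, ssh_dir: ssh_dir, keys: keys }
end
```

Because home is 750, the shared group can reach repos/ but cannot create or replace anything in the home directory itself, which is what keeps .ssh out of the Web application's reach.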
To work around this issue, I create an install_keys setuid executable in git's home directory that overwrites authorized_keys with the contents of repos/.ssh_keys. Given the permissions above, only the Web application should be able to write to this file. Furthermore, install_keys's permission bits are 750 (rwxr-x---) plus the setuid bit (4750 in octal, shown by ls as rwsr-x---), so it can only be run by the Web application.
I ran into two more minor but annoying issues. First, install_keys cannot be a script, because the setuid flag is ignored for scripts with shebang lines; my setup script writes out a C program and runs gcc to obtain a binary. Second, the setuid bit is cleared when a file is chowned, so chmod must be called after chown.
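A minimal sketch of the last step of such a setup script, assuming the C source has already been compiled (for instance with `gcc -o install_keys install_keys.c`; the file names are hypothetical). The point it illustrates is the ordering: ownership first, setuid permissions second. Setting the owner to the git user requires running as root.

```ruby
require 'fileutils'

# Hypothetical helper: give the compiled install_keys binary its owner and
# group, then its setuid mode. On Linux, chown clears the setuid bit, so
# chmod must run after chown.
def install_setuid_binary(path, owner: nil, group: nil)
  FileUtils.chown(owner, group, path) # nil leaves owner/group unchanged
  File.chmod(0o4750, path)            # rwsr-x---: setuid, owner+group execute
  File.stat(path).setuid?
end
```

Doing it the other way around produces a binary with the right owner but without the setuid bit, so install_keys would silently run as the Web application's user and fail to write authorized_keys.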
SSH Configuration and Integration
authorized_keys has one line for each key. The code that generates the lines is here. I used all the options I could find to lock down git's shell. The command option points to a shell script in my Rails application and passes the key's ID as an argument, so the script can identify the Web application user.
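The sketch below shows what such a line could look like. The options `command=`, `no-port-forwarding`, `no-X11-forwarding`, `no-agent-forwarding`, and `no-pty` are standard OpenSSH authorized_keys options; the method names, the script path in the usage example, and the exact option set are assumptions, not necessarily what the author's code emits.

```ruby
# Hypothetical generator for authorized_keys lines. The forced command runs
# the restricted-shell script with the key's database ID; the remaining
# options disable forwarding and terminal allocation.
RESTRICTIONS = %w[no-port-forwarding no-X11-forwarding no-agent-forwarding no-pty].freeze

def authorized_keys_line(shell_script, key_id, public_key)
  # An integer ID and a quote-free script path cannot break out of command="...".
  raise ArgumentError, 'key_id must be an Integer' unless key_id.is_a?(Integer)
  raise ArgumentError, 'script path must not contain quotes' if shell_script.include?('"')

  command = %(command="#{shell_script} #{key_id}")
  "#{[command, *RESTRICTIONS].join(',')} #{public_key.strip}"
end

# keys: a Hash of key ID => public key text, e.g. loaded from the database.
def authorized_keys_file(shell_script, keys)
  keys.map { |id, public_key| authorized_keys_line(shell_script, id, public_key) + "\n" }.join
end
```

For example, `authorized_keys_line('/srv/app/script/git_shell', 42, 'ssh-ed25519 AAAAC3Nza alice@laptop')` yields a line starting with `command="/srv/app/script/git_shell 42",no-port-forwarding`. Newer OpenSSH releases (7.2 and later) also offer a single `restrict` option that turns on all such restrictions, including any added in future versions.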
My keys are standard Rails models, and I use the after_save and after_destroy ActiveRecord callbacks to keep authorized_keys in sync with the database. More specifically, whenever a key is added, modified, or removed, I regenerate the contents of authorized_keys, write them to /home/git/repos/.ssh_keys, and then run install_keys to update the actual file. If response time becomes an issue, I can move this process to a background work queue.
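Here is a minimal sketch of that synchronization step, assuming the regenerated file contents are already available as a string. The method name is hypothetical; in the setup described above, `keys_path` would be /home/git/repos/.ssh_keys and `installer` would be the path to git's install_keys binary. Writing to a temporary file and renaming it keeps install_keys from ever reading a half-written file.

```ruby
# Hypothetical sketch of the work done by the after_save / after_destroy
# callbacks: publish the new contents, then let the setuid helper copy them
# into .ssh/authorized_keys.
def sync_authorized_keys(contents, keys_path:, installer:)
  tmp_path = "#{keys_path}.tmp"
  File.write(tmp_path, contents)
  File.rename(tmp_path, keys_path) # atomic replace on the same filesystem
  system(installer) or raise "#{installer} failed (#{$?.inspect})"
end
```

A model could then register a method that calls this helper with both `after_save` and `after_destroy`, so additions, edits, and deletions all go through the same path.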
git's restricted shell script (stub here, real code here, test here) is stored in my Rails application, for development convenience. Once the implementation has settled, it could be moved into git's home directory, so that the git user doesn't need read access to the Web application's directory. The script performs the following steps:
- parses the command line to extract the git command to be executed, the repository path, and the key ID
- checks that the git command line matches the list of allowed commands, and determines whether the command does a pull or a push
- issues an HTTP request against the Web application to check whether the SSH key's owner has permission to pull or push to the repository
- runs the git command, prefixing the repository path with repos/
- if the command succeeds, pings the Web application so it can update its caches
- relays the git command's exit code
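The core of such a script could look like the Ruby sketch below. It is hypothetical and not the author's code: the HTTP permission check and the cache ping are passed in as callables (`allowed:` and `on_success:`), because the Web application's URLs are not described here. It relies on documented sshd behavior: when authorized_keys forces a command, sshd runs that command instead of the one the client requested and exposes the client's request in the `SSH_ORIGINAL_COMMAND` environment variable. The key ID arrives as an ordinary command-line argument.

```ruby
# Hypothetical sketch of the restricted shell's core logic.
ALLOWED_COMMANDS = {
  'git-upload-pack'    => :pull, # clone, fetch, pull
  'git-upload-archive' => :pull, # git archive --remote
  'git-receive-pack'   => :push  # push
}.freeze

# Accepts e.g. "git-upload-pack 'alice/project.git'" (a leading slash is
# tolerated). Returns [command, repo_path, access], or nil if not allowed.
def parse_git_command(original_command)
  match = %r{\A(git-[a-z-]+) '/?([A-Za-z0-9_]+/[A-Za-z0-9_]+(?:\.git)?)'\z}
          .match(original_command.to_s)
  return nil unless match

  access = ALLOWED_COMMANDS[match[1]]
  access && [match[1], match[2], access]
end

# Returns the exit code to relay to the client. repos_root is relative
# because sshd starts forced commands in the git user's home directory.
def run_restricted_shell(original_command, key_id, allowed:, on_success:, repos_root: 'repos')
  parsed = parse_git_command(original_command)
  unless parsed
    warn 'Command not allowed'
    return 1
  end

  command, repo_path, access = parsed
  unless allowed.call(key_id, repo_path, access)
    warn 'Access denied'
    return 1
  end

  # Separate arguments mean no shell re-parses the repository path.
  system(command, File.join(repos_root, repo_path))
  status = $?.exitstatus || 1 # nil if the command died from a signal
  on_success.call(repo_path, access) if status.zero?
  status
end
```

An entry point would call `exit run_restricted_shell(ENV['SSH_ORIGINAL_COMMAND'], Integer(ARGV.fetch(0)), allowed: checker, on_success: notifier)`, with `checker` and `notifier` wrapping the two HTTP requests. Whitelisting the three pack commands and the exact path shape is what keeps the git account from being used for anything else.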
Web Application and Testing Considerations
An application user is modeled by a Profile, and a git repo is represented by a Repository. Both models use ActiveRecord callbacks to synchronize on-disk state: profiles correspond to directories under repos/ (sync code here), and repositories correspond to bare Git repositories created by grit (sync code here). These callbacks are slightly more complicated than the SSH key ones, because when a profile or repository is renamed, I need its old name in order to rename the corresponding directory.
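A minimal sketch of the rename-aware part of such a callback, written as plain Ruby so it runs without Rails. The helper name is hypothetical. In an ActiveRecord model, the old name could come from ActiveModel::Dirty (for example, `name_was` while the record still has unsaved changes). Creating the bare repository itself would go through grit, so this helper only covers renaming, plus creating a plain directory for a new profile.

```ruby
require 'fileutils'

# Hypothetical helper: make the directory under root match the record's
# name, renaming the old directory when the name changed.
def sync_directory(root, old_name, new_name)
  new_path = File.join(root, new_name)
  old_path = old_name && File.join(root, old_name)

  if old_path && old_name != new_name && File.exist?(old_path)
    File.rename(old_path, new_path) # keeps the contents, e.g. repositories
  else
    FileUtils.mkdir_p(new_path)     # new record, or name unchanged
  end
  new_path
end
```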
Unit tests stub out the methods that compute the path to repos/ (code here and here), so they can run without a full setup. This matters because I'd like to get CI running later. I also have a giant integration test (code) that exercises all the code described in this article. Since it creates a new user, it requires sudo access, which makes it unlikely that I'll ever be able to run it in CI.
A particularly challenging part of the integration test was pointing git to the SSH key to be used when pushing and pulling. Git doesn't take a command-line argument for this, and the easiest solution I found was to override the ssh command used by git with a custom wrapper (wrapper-generating code) that passes the options needed to point to a custom key. The override is achieved by setting the GIT_SSH environment variable (code). Also, the permission bits on the key file must be set to 600 (rw-------), otherwise ssh will ignore the key.
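The wrapper approach can be sketched as follows; the method and file names are hypothetical. `GIT_SSH` must name an executable that git runs in place of ssh, with the usual ssh arguments, so the wrapper forwards them with `"$@"`. The `StrictHostKeyChecking=no` and `UserKnownHostsFile=/dev/null` options are there only so that a throwaway test server's host key is accepted; they are not appropriate outside tests.

```ruby
require 'shellwords'

# Hypothetical wrapper generator for integration tests: writes a tiny shell
# script that runs ssh with a specific private key.
def write_git_ssh_wrapper(wrapper_path, key_path)
  File.chmod(0o600, key_path) # ssh ignores private keys others can read

  script = <<~SH
    #!/bin/sh
    exec ssh -i #{Shellwords.escape(key_path)} -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "$@"
  SH
  File.write(wrapper_path, script)
  File.chmod(0o700, wrapper_path) # must be executable for git to run it
  wrapper_path
end
```

A test could then run `system({ 'GIT_SSH' => wrapper }, 'git', 'push', 'origin', 'master')`, which sets the variable only for that child process. Git 2.3 and later also accept `GIT_SSH_COMMAND`, which takes a full command line and removes the need for a wrapper file.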
Motivation
I find GitHub's setup awesome, and I've always wanted my own server implementation to play with. Ever since I read the team's blog post on how they did it, I wanted to give it a try. When I finally found the time, the experience was rewarding, but also frustrating. POSIX permission bits are limiting, and working around them is non-trivial, at least for me.
Conclusion
I hope you found this post useful, or at least interesting. I look forward to your comments and suggestions for simplifying the setup, or improving its security!