As a scientist, I do most of my work in text files: source code, data, and papers. Keeping track of changes to these files with version control helps me avoid losing work, track down bugs, reuse code, and keep records of my research. Instead of a pile of renamed files (code.py.oldversion, code.py.aug01, code.py.brokendontuse), version control gives me a single, current source file as well as a database of changes that I can use to restore any previous version at will.
Older version control programs (e.g., CVS, SVN) were designed for large teams collaborating on a single code base, so they store changes in a single, centralized repository. For the solitary scientist, newer “decentralized” version control systems like Git and Mercurial work better: they are lightweight and easy to set up, and since the entire change history is carried with the code, it’s easy to keep your work synced across multiple computers.
Among software engineers, Git is far more popular than Mercurial, thanks largely to the online repository host GitHub. Git is unnecessarily complicated for most of the work I do, so I prefer Mercurial for day-to-day use. With either system, though, a single scientist really needs only a few simple commands to get most of the benefits of version control. The Win-Vector blog published a guide for Git; here is mine for Mercurial.
hg init creates a new repository in the current directory. You only need to run it once, when you first start tracking files for a project. It creates a hidden directory, .hg, in the current directory to store changes in.
hg add <filename or pattern> tells Mercurial to track changes for the specified files. Again, you only need to run it once for a given file.
hg commit records any changes to all tracked files and allows you to make notes on the changes you’ve made. You should run this frequently, particularly after any significant change in the state of the code. I often specify the message at the command line with the -m flag:
hg commit -m "This is my commit message."
That’s it! So, a typical session might look like this:
make changes to code.py
hg add code.py
hg commit -m "Initial commit of code.py"
hg add README.txt
hg commit -m "added README.txt; added functions to code.py"
These commands are enough to ensure you’ve got all your changes backed up, and for some projects that’s all you’ll ever need. Eventually, you’ll probably want to list the revision history of a file (hg log), check which files have uncommitted changes (hg status), compare versions of the code (hg diff or hg vimdiff), and sync with another machine (hg clone/push/pull/update/merge). To learn how, you can use the command-line help system (hg help), the official tutorials and Definitive Guide, and Joel Spolsky’s tutorial.
Echoing the Win-Vector blog, though, you can learn all that when you need it. Starting today, you can get the benefits of version control with just three simple commands: hg init, hg add, hg commit.
- It is possible to use Mercurial with GitHub through the hg-git plugin.
- MacPorts users can use port install mercurial, while if you have Python’s pip, it’s pip install mercurial.
- While the name of the program is Mercurial, the program is invoked with “hg,” after Hg, the chemical symbol for the element mercury.
- In contrast, Git requires you to specify the files to commit explicitly each time.
- One unsung benefit of version control is that it allows you to delete all the commented-out code you might be leaving in “just in case.”