Virtual Python environments

May 24, 2009

Judging from some of the questions about Dumbo development that keep popping up, virtual Python environments are apparently not that widely known and used yet. Therefore, I thought it made sense to write a quick post about them.

The virtualenv tool can be installed as follows:

$ wget
$ python virtualenv

While you’re at it, you might also want to install nose by doing

$ python nose

since running unit tests sometimes doesn’t work if you don’t install this module manually. Once you got virtualenv installed, you can create and activate a virtual Python environment as follows:

$ mkdir ~/envs
$ virtualenv ~/envs/dumbo
$ source ~/envs/dumbo/bin/activate

You then get a slightly different prompt to remind you that you’re using the isolated virtual environment:

(dumbo)$ which python
(dumbo)$ deactivate
$ which python

Such a virtual environment can be very convenient for developing and debugging Dumbo:

$ source ~/envs/dumbo/bin/activate
(dumbo)$ git clone git://
(dumbo)$ cd dumbo
(dumbo)$ python test  # run unit tests
(dumbo)$ python develop  # install symlinks
(dumbo)$ which dumbo
(dumbo)$ cd examples
(dumbo)$ dumbo start -input brian.txt -output out.code
(dumbo)$ dumbo cat out.code | head -n 2
A       2
And     4

Anything you change to the source code will then immediately affect the behavior of dumbo in the virtual environment, and none of this interferes with your global Python installation in any way. Soon enough, you’ll start wondering how you ever managed to live without virtual Python environments.

Note that running on Hadoop won’t work when you installed Dumbo via python develop. The develop command installs symlinks to the source files (such that you don’t have to run it after each change when you’re developing), but in order to be able to run on Hadoop an egg needs to be generated and installed, which is precisely what python install does.