Why virtualenvwrapper is (Mostly) Not Written In Python¶
If you look at the source code for virtualenvwrapper you will see that
most of the interesting parts are implemented as shell functions in
virtualenvwrapper.sh
. The hook loader is a Python app, but doesn’t
do much to manage the virtualenvs. Some of the most frequently asked
questions about virtualenvwrapper are “Why didn’t you write this as a
set of Python programs?” or “Have you thought about rewriting it in
Python?” For a long time these questions baffled me, because it was
always obvious to me that it had to be implemented as it is. But they
come up frequently enough that I feel the need to explain.
tl;dr: POSIX Made Me Do It¶
The choice of implementation language for virtualenvwrapper was made for pragmatic, rather than philosophical, reasons. The wrapper commands need to modify the state and environment of the user’s current shell process, and the only way to do that is to have the commands run inside that shell. That resulted in me writing virtualenvwrapper as a set of shell functions, rather than separate shell scripts or even Python programs.
Where Do POSIX Processes Come From?¶
New POSIX processes are created when an existing process invokes the
fork()
system call. The invoking process becomes the “parent” of
the new “child” process, and the child is a full clone of the
parent. The semantic result of fork()
is that an entire new copy
of the parent process is created. In practice, optimizations are
normally made to avoid copying more memory than is absolutely
necessary (frequently via a copy-on-write system). But for the
purposes of this explanation it is sufficient to think of the child as
a full replica of the parent.
The important parts of the parent process that are copied include
dynamic memory (the stack and heap), static stuff (the program code),
resources like open file descriptors, and the environment variables
exported from the parent process. Inheriting environment variables is
a fundamental aspect of the way POSIX programs pass state and
configuration information to one another. A parent can establish a
series of name=value
pairs, which are then given to the child
process. The child can access them through functions like
getenv()
, setenv()
(and in Python through os.environ
).
The choice of the term inherit to describe the way the variables and their contents are passed from parent to child is significant. Although a child can change its own environment, it cannot directly change the environment settings of its parent because there is no system call to modify the parental environment settings.
How the Shell Runs a Program¶
When a shell receives a command to be executed, either interactively
or by parsing a script file, and determines that the command is
implemented in a separate program file, it uses fork()
to create a
new process and then inside that process it uses one of the exec
functions to start the specified program. The language that program is
written in doesn’t make any difference in the decision about whether
or not to fork()
, so even if the “program” is a shell script
written in the language understood by the current shell, a new process
is created.
On the other hand, if the shell decides that the command is a function, then it looks at the definition and invokes it directly. Shell functions are made up of other commands, some of which may result in child processes being created, but the function itself runs in the original shell process and can therefore modify its state, for example by changing the working directory or the values of variables.
It is possible to force the shell to run a script directly, and not in
a child process, by sourcing it. The source
command causes the
shell to read the file and interpret it in the current process. Again,
as with functions, the contents of the file may cause child processes
to be spawned, but there is not a second shell process interpreting
the series of commands.
What Does This Mean for virtualenvwrapper?¶
The original and most important features of virtualenvwrapper are
automatically activating a virtualenv when it is created by
mkvirtualenv
and using workon
to deactivate one environment
and activate another. Making these features work drove the
implementation decisions for the other parts of virtualenvwrapper,
too.
Environments are activated interactively by sourcing bin/activate
inside the virtualenv. The activate
script does a few things, but
the important parts are setting the VIRTUAL_ENV
variable and
modifying the shell’s search path through the PATH
variable to put
the bin
directory for the environment on the front of the
path. Changing the path means that the programs installed in the
environment, especially the python interpreter there, are found before
other programs with the same name.
Simply running bin/activate
, without using source
doesn’t work
because it sets up the environment of the child process, without
affecting the parent. In order to source the activate script in the
interactive shell, both mkvirtualenv
and workon
also need to
be run in that shell process.
Why Choose One When You Can Have Both?¶
The hook loader is one part of virtualenvwrapper that is written in Python. Why? Again, because it was easier. Hooks are discovered using setuptools entry points, because after an entry point is installed the user doesn’t have to take any other action to allow the loader to discover and use it. It’s easy to imagine writing a hook to create new files on the filesystem (by installing a package, instantiating a template, etc.).
How, then, do hooks running in a separate process (the Python interpreter) modify the shell environment to set variables or change the working directory? They cheat, of course.
Each hook point defined by virtualenvwrapper actually represents two hooks. First, the hooks meant to be run in Python are executed. Then the “source” hooks are run, and they print out a series of shell commands. All of those commands are collected, saved to a temporary file, and then the shell is told to source the file.
Starting up the hook loader turns out to be way more expensive than most of the other actions virtualenvwrapper takes, though, so I am considering making its use optional. Most users customize the hooks by using shell scripts (either globally or in the virtualenv). Finding and running those can be handled by the shell quite easily.
Implications for Cross-Shell Compatibility¶
Other than requests for a full-Python implementation, the other most common request is to support additional shells. fish comes up a lot, as do various Windows-only shells. The officially Supported Shells all have a common enough syntax that the same implementation works for each. Supporting other shells would require rewriting much, if not all, of the logic using an alternate syntax – those other shells are basically different programming languages. So far I have dealt with the ports by encouraging other developers to handle them, and then trying to link to and otherwise promote the results.
Not As Bad As It Seems¶
Although there are some special challenges created by the the
requirement that the commands run in a user’s interactive shell (see
the many bugs reported by users who alias common commands like rm
and cd
), using the shell as a programming language holds up quite
well. The shells are designed to make finding and executing other
programs easy, and especially to make it easy to combine a series of
smaller programs to perform more complicated operations. As that’s
what virtualenvwrapper is doing, it’s a natural fit.
See also
Advanced Programming in the UNIX Environment by W. Richard Stevens & Stephen A. Rago
Fork (operating system) on Wikipedia
Environment variable on Wikipedia