How I Learned to Start Worrying and Love Virtual Environments
I'm a data analyst so it probably won't surprise anyone "in the know" to learn I have written a fair amount of code that would distress software developers. I will admit to you now that I have practically eschewed virtual environments for far too long. Beyond some vague understanding that I should be using them and that it is considered polite to have a requirements.txt file available if you expect anyone else to run your code, I can say I didn't know or care much about them.
In moments where I felt less cavalier, I would dutifully type python -m venv project-venv into the cmd and grumble through package installation while anxiously awaiting the fun part. More often, I'd just install everything directly into my base Python installation and simply hope it didn't turn into a whole thing later.
... Yeah, it's not a great look for me but let's just chalk it up to feeling dangerous.
I've finally taken the time to look into this though and I feel wiser now. It turns out my renegade attitude around virtual environments stemmed from ignorance about what is happening when you create a virtual environment and why anyone bothers to do it. I share what I've learned here for my own benefit and if it encourages or informs anyone else, I won't be mad about it.
I offer my usual caveat when explaining anything I learn to anyone: I hesitate to claim expertise in anything because I believe too much certainty is a dangerous thing. I would urge you, as should always be the case with random people telling you things on the internet, to only accept that which agrees with your own reason.
Onward and upward!
Installing things
Let's remember that when you install Python on your computer, nothing too crazy is happening. You're essentially just copying a bunch of files into a directory on your computer, one of those files being the Python executable.
You also may have noticed that you can have multiple Python versions installed on your computer at the same time. That is, if you have both Python 3.8 and Python 3.11 installed, each version lives in its own folder with its own Python executable, and both are available to you. Only one of these versions will be your "default" version.
When you want to launch the Python shell or run a Python script, you don't need to type the path to your python.exe in order to do so. Just python will do (which I am grateful for because almost always it is hidden away in a labyrinth of a path that was assigned at time of installation a year ago that I will probably never remember). This default version of Python is typically determined by your PATH variable - a useful environment variable listing the folders your system searches for executables including, yes, the one that holds python.exe. When more than one of those folders contains a python.exe, the first one listed wins.
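If you're curious which Python is your default, here is a minimal sketch that asks Python itself. The function name describe_default_python is my own invention for illustration; sys.executable and shutil.which come from the standard library.

```python
import shutil
import sys


def describe_default_python():
    """Return the interpreter running this code and what a bare `python` resolves to."""
    return {
        # Full path of the interpreter executing this script.
        "running": sys.executable,
        # First `python` found by searching PATH, or None if there isn't one.
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for label, location in describe_default_python().items():
        print(f"{label}: {location}")
```

The two values can differ, for example if you launched the script by typing the full path to a non-default Python.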
When you later go to install a library with pip install, you are adding files to a folder (typically called site-packages) inside the directory that houses your active Python version. (This will be the default Python version if you are not intentionally calling another version of Python.) You are often installing a wealth of dependencies too. (That is, the functions that someone else wrote for you that you want to import? These functions themselves may rely on imported functions from yet another library and, yes, even those may rely on imported functions from yet another library and... so on and so forth.)
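To see this for yourself, here is a rough sketch that reports where packages land for whichever Python runs it, and which dependencies an installed package declares. The helper names are mine, not anything official, and I'm assuming a plain install (options like pip install --user put things elsewhere).

```python
import sysconfig
from importlib import metadata


def site_packages_dir():
    """Folder where pure-Python packages are installed for the active interpreter."""
    return sysconfig.get_paths()["purelib"]


def declared_dependencies(package_name):
    """Requirement strings a package declares, or None if it isn't installed."""
    try:
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Packages are installed to:", site_packages_dir())
    # List a handful of installed packages alongside their declared dependencies.
    for dist in list(metadata.distributions())[:5]:
        name = dist.metadata["Name"]
        print(name, "->", declared_dependencies(name))
```

Each of those dependencies can declare dependencies of its own, which is how a single pip install ends up pulling in far more than you asked for.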
Version conflicts
Suppose that you write up a little application in Python. You install and import your favorite superawesome library, version 3.0, and use its super awesome tools that you didn't have to write because the open source community just made your job so much easier yet again. This little program is your greatest work so far; it's going to save everyone so much time. You happily move on with your life feeling great.
Later, you write up another separate application in Python and you want to use some of the new features added to superawesome library that are not available in version 3.0. You upgrade the library to version 3.6, you put on your favorite jams and get to work.
You still have use for your first program. It's your greatest work after all. When you go to run it, you find out that your application doesn't run anymore! The upgrading of superawesome to version 3.6 broke your greatest work. :(
Your old application needs version 3.0 of the superawesome library and your new application needs version 3.6 of the library but you can't have multiple versions of the superawesome library associated with the same Python installation so, what can we do here?
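The bind can be boiled down to a few lines of Python. This is only a toy model, assuming each application needs exactly the version it was written against; superawesome and both application names are made up, and real projects would lean on something sturdier than my little parse_version helper (the packaging library, for instance).

```python
def parse_version(text):
    """Turn '3.6' into (3, 6). Handles simple dotted numbers only."""
    return tuple(int(part) for part in text.split("."))


def apps_broken_by(installed, pinned_by_app):
    """Names of the apps whose required version differs from the installed one."""
    return sorted(
        app
        for app, pin in pinned_by_app.items()
        if parse_version(pin) != parse_version(installed)
    )


# Hypothetical requirements for the two applications in the story.
pins = {"my-greatest-work": "3.0", "my-new-application": "3.6"}

print(apps_broken_by("3.6", pins))  # ['my-greatest-work']
print(apps_broken_by("3.0", pins))  # ['my-new-application']
```

Whichever single version is installed, one of the two applications loses.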
Resolving these conflicts
It's worth noting now that you can actually manipulate your PATH variable, adding and removing paths. If we think of a Python installation as just some bundle of files unpacked into a directory somewhere, we could propose the following solution.
Create two copies of our Python installation. They're just folders, right? Just... have two installations of Python at the ready: one in a "my-greatest-work" folder with version 3.0 of the superawesome library installed to it, and the other in a "my-new-application" folder with version 3.6 of the superawesome library installed to it.
Then, when we want to run our first application, we can add the Python from "my-greatest-work" to our PATH variable before doing so, so we're using the version of Python with superawesome v3.0.
When we want to run our second application, we can remove the "my-greatest-work" Python path from our PATH variable and add "my-new-application" to our PATH variable so we're using Python with superawesome v3.6.
And then, if we want to run our first application again, we can --
Ugh, I don't know about you but I feel a little annoyed just considering doing that manually more than a handful of times. Thankfully, that is more or less what Python's venv module does for you. Yep, that's my big reveal for you.
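For the record, the manual juggling I just talked myself out of would look something like this. It is a sketch only: switch_python is a name I made up, the folder paths are hypothetical, and it works on a plain string rather than touching your real PATH.

```python
import os


def switch_python(path_value, remove_dir, add_dir, sep=os.pathsep):
    """Return a PATH string with remove_dir dropped and add_dir placed first."""
    kept = [
        entry
        for entry in path_value.split(sep)
        if entry and entry not in (remove_dir, add_dir)
    ]
    return sep.join([add_dir] + kept)


# Hypothetical Windows-style PATH value, so the separator is given explicitly.
before = r"C:\my-greatest-work;C:\Windows\System32"
after = switch_python(
    before, r"C:\my-greatest-work", r"C:\my-new-application", sep=";"
)
print(after)  # C:\my-new-application;C:\Windows\System32
```

Putting the new folder first matters because the first matching python.exe on the PATH is the one that runs.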
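When you create a virtual environment with python -m venv my-greatest-work-venv in Windows, the "my-greatest-work-venv" folder is created. Inside it you get a Scripts folder holding a python.exe and the activation scripts, a Lib\site-packages folder of its own for your packages, and a small pyvenv.cfg file pointing back to the Python installation you created it from. It isn't a full copy of that installation (the standard library stays where it was) but, for our purposes, it behaves like one.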
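If you'd rather peek inside that folder without committing to anything, the venv module can also be called from Python. This is a small sketch, assuming a throwaway temporary directory and skipping pip to keep it quick; create_bare_env is my own helper name.

```python
import tempfile
import venv
from pathlib import Path


def create_bare_env(env_dir):
    """Create a virtual environment without pip and return its top-level entries."""
    venv.create(env_dir, with_pip=False)
    return sorted(entry.name for entry in Path(env_dir).iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        env_dir = Path(scratch) / "my-greatest-work-venv"
        # Folder names differ by platform (Scripts on Windows, bin elsewhere).
        print(create_bare_env(env_dir))
        # pyvenv.cfg records which Python installation the environment came from.
        print((env_dir / "pyvenv.cfg").read_text())
```

The home line in pyvenv.cfg is the link back to the Python installation the environment was built from.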
When you activate that environment (using my-greatest-work-venv\Scripts\activate in Windows), the activation script puts the virtual environment's Scripts folder at the front of your PATH, so that python now resolves to the one in the virtual environment folder. When you deactivate the environment, that folder is removed from the PATH variable again. You can inspect your PATH before, during and after activation by simply typing PATH in the cmd at each step.
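You can run the same check from inside Python. Here is a minimal sketch (the function names are mine) that reports whether the running interpreter belongs to a virtual environment and what sits at the front of the PATH.

```python
import os
import sys


def in_virtual_environment():
    """True when the running interpreter lives in a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


def first_path_entries(count=3):
    """The first few folders on PATH, in the order they are searched."""
    return os.environ.get("PATH", "").split(os.pathsep)[:count]


if __name__ == "__main__":
    print("In a virtual environment:", in_virtual_environment())
    for entry in first_path_entries():
        print(entry)
```

Run it once with the environment active and once after deactivating to watch both answers change.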
When you use pip install with this virtual environment active (either by leveraging a requirements.txt file or just manually installing your needed packages one by one), the packages you install will be associated with the Python installation inside your virtual environment folder.
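As for what goes into a requirements.txt, the usual route is to let pip write it for you with pip freeze. Purely as an illustration of what such a file holds, this sketch builds similar name==version lines from whatever is installed for the Python running it; pinned_requirements is a made-up helper, not part of pip.

```python
from importlib import metadata


def pinned_requirements():
    """Sorted `name==version` lines for every package the interpreter can see."""
    lines = {
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    }
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in pinned_requirements():
        print(line)
```

Run inside a freshly activated environment, the list should be short, which is rather the point.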
Anytime you activate that environment, you get access to those packages. When you deactivate the environment, you'll be back to your default version of Python.
And?
For data analysts, I get the sense it's fairly common to not even think about the possibility of future conflicts. You install what you need as needed and play in Jupyter and if your code fails to run when you upgrade some library or another, that's a problem for future you. You could be dead by then.
To an extent, I think it is fair. I have done a lot of one-off narrative reporting in my time as a data analyst. We aren't always confronted with our crimes the way software engineers are; I have a dusty archive of forgotten .ipynb files I can gleefully say will never be looked at again. I think it is actually possible to throw every package you could possibly ever need into your source Python installation and have it never catch up with you. If you're not worried, go ahead and buy yourself a cool pair of sunglasses to don and flip off all the nerds who try to tell you what to do while you ride off on your motorcycle into the sunset.
But for those of us who do have code we aim to repeatedly use and code we'd like to share with our colleagues or stand up as portfolio projects, it feels a bit cheeky now to not be more thoughtful on this topic, doesn't it? It turns out that these software people know what they're talking about. Who knew? It's almost as if it is their job to think about these types of things.
Anyway, just taking the time to understand what venv is intended to solve makes it a lot more clear to me why you might want to leave your base Python installation clean and leverage virtual environments for each project instead.
What is also clear to me is how much time I have wasted fiddling with installation problems that could have been avoided by just taking a moment to keep track of my requirements but uh, that's neither here nor there; I fear I have revealed too many sins for one day.