Isn’t collaboration great? Whether you are part of a community-driven open-source project, or an over-worked design team whose souls have long since been crushed, you can be proud of being part of something bigger!
While development work often falls between these two extremes, package and third-party library management is a constant you always need to be mindful of. Although personal projects grant free rein to enact every bad programming practice your fingers can muster, subjecting contributors and co-workers to your wrath is not without consequence.
Managing packages and keeping track of dependencies becomes increasingly difficult as a project grows in size. Introductory programming resources and texts often do a great job of covering programming concepts, but topics such as project organization, and working in a development environment as a whole, often fall outside the scope of what is being taught. Some aspects of these topics really can’t be taught; they can only be learned through experience. There’s nothing mystical about this subject; it’s just made up of concepts that are hard to present in a clearly structured way.
This series on virtual environments aims to introduce some of the more technical concepts related to project management. How important this topic is depends entirely on the needs of the project you are part of, and my intent is to convey the importance of these concepts and utilities by showing how useful they are. This topic, however, is only a small part of the whole: effective project management goes well beyond package and dependency management, but dependency management is part of the process and, incidentally, the topic of this post.
Virtualization, at its core, is the process of emulating something physical in a non-physical way. Virtual machines, for example, create an instance of an operating system that runs on virtualized hardware. While this virtual hardware is derived from the physical components that make up your computer, its implementation is achieved through a virtualization process. In that respect, virtualization can be thought of as the process that makes something virtual.
Unfortunately, this explanation isn’t very clear, because the term “virtual” has become almost synonymous with “computer”. Personally, I like to think of a virtual environment as a container that holds a workspace of sorts. While the virtualization process itself varies depending on the context of the virtual environment, the purpose of a virtual environment doesn’t really change. Just as virtual machines operate alongside a pre-established OS, a programming virtual environment can operate alongside a pre-established development environment.
This series is split into three sections:
- Part 1 provides an overview of virtual environments, along with their practical applications.
- Part 2A covers virtualenv, a Python-specific tool used to implement virtual environments.
- Part 2B covers Conda, a language-independent tool that is both a package manager and a virtual environment manager.
All three parts were originally a single article, but its length became problematic. The problem wasn’t the information itself; the issue was putting it all in one place. Parts 1 and 2A are being released together, and Part 2B will be released as soon as it’s finished.
Collaboration and Project Management
Python’s already extensive standard library can be expanded through thousands of user-made libraries and packages. Many projects incorporate at least one of these packages, and each one becomes a dependency of the project. Keeping track of every dependency specific to a project is a problem, but it’s not the problem.
Every package has a version number, and most packages are continually updated. While updates are usually beneficial, changes to a package can create conflicts with other packages and/or with Python itself. Git addresses file consistency across multiple contributors, but how do you ensure consistency across multiple development environments as well? For most projects with more than one contributor, the following are important considerations:
- Everyone is developing and testing code using the same version of Python
- Everyone has all external package dependencies installed
- Those external packages are the correct version
Keeping track of all that manually is not only error-prone, but can become unmaintainable as the number of required external libraries grows. An even bigger concern is accounting for completely unrelated libraries already installed on a contributor’s system, which might also introduce compatibility issues. There is an additional consideration, for open-source projects at least: contributors can’t be expected to dedicate their only development environment to a single project.
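To make the three considerations above concrete, here is a minimal sketch of the kind of check a project would otherwise have to do by hand. The pins passed in are hypothetical. The check compares only the major.minor Python version and exact package version strings, and it uses the standard library's `importlib.metadata` to look up installed versions.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(required_python, required_packages):
    """Compare the running interpreter against a project's (hypothetical) pins.

    required_python:   (major, minor) tuple everyone should be using
    required_packages: mapping of distribution name -> exact version string
    Returns a list of problems; an empty list means everything matches.
    """
    problems = []

    # 1. Same version of Python (compared on major.minor only).
    current = tuple(sys.version_info[:2])
    if current != tuple(required_python):
        problems.append(f"Python {current[0]}.{current[1]} found, "
                        f"{required_python[0]}.{required_python[1]} required")

    for name, wanted in required_packages.items():
        # 2. Is the package installed at all?
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # 3. Is it the pinned version?
        if installed != wanted:
            problems.append(f"{name} {installed} installed, {wanted} required")

    return problems


if __name__ == "__main__":
    # Example pins, for illustration only.
    for problem in check_environment((3, 11), {"requests": "2.31.0"}):
        print(problem)
```

Every contributor would need to run something like this and then fix each mismatch by hand. That is exactly the chore virtual environments are meant to remove.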
Under the Hood
A Python virtual environment, at the most basic level, will typically include some version of Python, pip, and setuptools. These environments function alongside a user’s main Python environment, allowing different Python versions and packages to be installed side by side. With a tool like Conda, which installs its own interpreter into each environment, a Python 2 environment will even work on a system that otherwise only has Python 3 installed. Virtual environments must be explicitly activated, so they only “take over” when instructed to do so.
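Because an environment sits alongside the main installation, it can be useful for a script to check which one it is running in. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes: inside an environment created by `venv` (or virtualenv 20 and later), these point to different folders. The `real_prefix` fallback covers older virtualenv releases.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv (and virtualenv 20+) set sys.prefix to the environment folder, while
    # sys.base_prefix keeps pointing at the installation it was created from.
    if sys.prefix != sys.base_prefix:
        return True
    # Older virtualenv releases (before 20.0) stored the base path in sys.real_prefix.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("virtual environment" if in_virtual_environment() else "main environment")
    print("interpreter:", sys.executable)
```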
Activating a virtual environment makes a configuration change (of sorts) that temporarily alters the path variables related to the Python interpreter. When I activate a virtual environment, I’m not so much creating a new process as pointing my system to an isolated version of Python. While different tools create virtual environments in different ways, the result is a folder containing copies of, or links to, the required version of Python, along with any external dependencies.
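The “pointing my system to an isolated version of Python” part can be modelled in a few lines. This is a simplified sketch of what a typical POSIX `activate` script does to the variables: it sets `VIRTUAL_ENV` and puts the environment's executable folder first on `PATH`. It returns a modified copy instead of touching your shell. Real activate scripts also adjust the prompt, unset `PYTHONHOME`, and define a `deactivate` command, and all of that is left out here.

```python
import os
import sys


def activated_environment(env_dir, base_env=None):
    """Return a copy of the environment variables roughly as `activate` would leave them."""
    env = dict(os.environ if base_env is None else base_env)
    # Executables live in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = os.path.join(env_dir, "Scripts" if sys.platform == "win32" else "bin")
    env["VIRTUAL_ENV"] = env_dir
    # Prepending means the environment's python and pip are found before the system ones.
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env
```

Passing the result as `env=` to `subprocess.run` launches a child process that “sees” the environment first. That is also why deactivating is cheap: the original `PATH` is simply restored.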
When a virtual environment is activated, any packages you install go into that environment only. Because each environment is effectively isolated from all other environments, they can be quickly created, deleted, or changed to suit a project’s needs. The utilities used to manage virtual environments allow users to define working environments for specific projects. This allows contributors to easily create isolated environments that match the working environments established by project authors.
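The isolation is easy to see by asking two interpreters where they would install packages. This sketch uses Python's built-in `venv` module rather than virtualenv (the subject of Part 2A); virtualenv 20 and later produce a very similar folder layout. It creates a throwaway environment without pip so the demo stays offline and fast.

```python
import os
import subprocess
import sys
import tempfile
import venv


def env_python(env_dir):
    """Path to the interpreter inside an environment created by venv."""
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def site_packages_of(python_exe):
    """Ask an interpreter where pure-Python packages get installed."""
    out = subprocess.run(
        [python_exe, "-c", "import sysconfig; print(sysconfig.get_paths()['purelib'])"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "demo-env")
        # Symlink the interpreter on POSIX, copy it on Windows (the venv CLI default).
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        print("main environment installs to:", site_packages_of(sys.executable))
        print("demo-env installs to:        ", site_packages_of(env_python(env_dir)))
```

The second path lives inside `demo-env`, so deleting that folder removes the environment and everything installed into it.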
While being able to share a working development environment is extremely beneficial, the use cases extend to other areas as well. For example, virtual environments could be created to test the following:
- Addition or removal of packages and libraries
- Version compatibility between packages
- Checks against older/newer versions of Python
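For the last item, a first step is finding out which interpreters are available to build environments from. This is a minimal sketch. The command names passed in are assumptions about how interpreters are named on your machine (for example `python3.8`), and each command found is asked for its version directly.

```python
import shutil
import subprocess
import sys


def available_interpreters(candidates):
    """Map each candidate command found on PATH to the major.minor version it reports."""
    found = {}
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # that version simply isn't installed on this machine
        out = subprocess.run(
            [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
            capture_output=True, text=True,
        )
        if out.returncode == 0:
            found[name] = out.stdout.strip()
    return found


if __name__ == "__main__":
    # Hypothetical command names; adjust to your system.
    print(available_interpreters(["python3.8", "python3.12", sys.executable]))
```

Each Python 3 interpreter found can then create its own environment with `<interpreter> -m venv <folder>`, so the same test suite can be run once per version. Python 2 has no `venv` module, which is one reason tools like virtualenv and Conda exist.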
The process of actually defining a development environment for all project contributors is accomplished by exporting the names of all installed packages (and their version numbers) to a file: a .yaml file in Conda’s case, or a requirements.txt file with pip. Contributors can download this file and use a local package manager to read its contents and automatically install all dependencies into a local environment.
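The export half of that process boils down to listing every installed distribution with its version. The sketch below approximates the plain-text output of `pip freeze` using `importlib.metadata`. It ignores editable installs, VCS URLs, and environment markers, which real tools handle. Conda's `conda env export` writes the YAML equivalent.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' lines for every distribution this interpreter can see."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to share it with other contributors.
    print("\n".join(freeze()))
```

Running this inside an activated environment lists only that environment's packages. That is what makes the exported file describe the project rather than the author's whole machine.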
Creating a virtual environment is handled by an environment manager, while importing and exporting environments are handled by a package manager. Parts 2A and 2B discuss the utilities used to create, import, and export virtual environments. A Python-specific solution can be put together from Python’s package manager, pip, along with the virtualenv package. A language-independent approach can be achieved through Conda, which manages both packages and virtual environments. Parts 2A and 2B cover using virtualenv and Conda with respect to the following:
- Creating environments
- Using and managing environments
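The division of labour described above, with the environment manager creating the environment and the package manager filling it, can be summarised as two commands. This sketch only builds those commands. It uses the standard `venv` module and pip as stand-ins, and `.venv` and `requirements.txt` are conventional names rather than requirements. With Conda, the single command `conda env create -f environment.yml` covers both steps.

```python
import os
import sys


def setup_commands(env_dir, requirements_file):
    """Return the commands that recreate a shared environment locally.

    Step 1 (environment manager): create an isolated environment.
    Step 2 (package manager): install the pinned packages into it.
    """
    if sys.platform == "win32":
        env_python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        env_python = os.path.join(env_dir, "bin", "python")
    return [
        [sys.executable, "-m", "venv", env_dir],
        [env_python, "-m", "pip", "install", "-r", requirements_file],
    ]


if __name__ == "__main__":
    for cmd in setup_commands(".venv", "requirements.txt"):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to run them
```

Calling pip through the environment's own interpreter (`<env python> -m pip`) guarantees the packages land in that environment, even if it was never activated.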
Links to those articles will be added here as they become available:
- Part 2A – Virtualenv
- Part 2B – Conda