Conda Virtual Environments

A virtual environment is an isolated set of packages, containing binaries, libraries, configuration, and data, that are linked together.

Creating a virtual environment lets you assemble a custom set of packages chosen specifically to meet your project's needs.

What's the point?

Suppose you have two separate programs that each use a different version of Python and/or other libraries. Rather than repeatedly modifying each program to satisfy the other's version requirements, you can create multiple virtual environments (one for each variant of the code or program), which keeps things clean and efficient.

Creating a Virtual Environment

To get started, you first need to load the conda module. Conda is a tool that aims to simplify package management and deployment of data science and machine learning tools.

1. Log in to Discovery and run

module load conda

2. Check out existing virtual environments

Before you create your own conda virtual environments on Discovery, note that several ready-made virtual environments, tailored to specific project needs, are available out of the box. These include TensorFlow (with GPU support), PyTorch, and QIIME. If none of them meets your needs, you can go ahead and create your own.

Show the list of all existing environments

conda env list

Output

# conda environments:
#
base                  *  /software/anaconda/anaconda3
alfalfa_gbs              /software/anaconda/anaconda3/envs/alfalfa_gbs
amptk-1.4.2              /software/anaconda/anaconda3/envs/amptk-1.4.2
hsc_prediction           /software/anaconda/anaconda3/envs/hsc_prediction
pytorch                  /software/anaconda/anaconda3/envs/pytorch
qiime2-2019.10           /software/anaconda/anaconda3/envs/qiime2-2019.10
redbiom                  /software/anaconda/anaconda3/envs/redbiom
soilsystems              /software/anaconda/anaconda3/envs/soilsystems
tensorflow-1.15.0        /software/anaconda/anaconda3/envs/tensorflow-1.15.0
tensorflow-2.0.0         /software/anaconda/anaconda3/envs/tensorflow-2.0.0
tensorflow-gpu-1.15.0    /software/anaconda/anaconda3/envs/tensorflow-gpu-1.15.0
tensorflow-gpu-2.0.0     /software/anaconda/anaconda3/envs/tensorflow-gpu-2.0.0

The output above lists the virtual environments on Discovery and their respective locations. The asterisk * next to the base environment indicates that base is the currently active virtual environment; it is also the default.
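If you only need the name of the active environment, for example inside a script, the listing can be filtered. The sketch below is one way to do that, assuming the output format shown above (name, optional asterisk, path). The function name active_env and the sample text are illustrative and not part of conda.

```shell
# Print the name of the environment marked with an asterisk.
# Reads `conda env list`-style text on standard input.
active_env() {
    awk '!/^#/ && $2 == "*" { print $1 }'
}

# Sample text copied from the listing above, so the filter can be tried anywhere.
sample='# conda environments:
#
base                  *  /software/anaconda/anaconda3
pytorch                  /software/anaconda/anaconda3/envs/pytorch'

printf '%s\n' "$sample" | active_env   # prints: base

# On Discovery, the same filter would be applied to the real listing:
#   conda env list | active_env
```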

3. Create the environment

syntax: conda create -n <your_environment> --no-channel-priority -c <channel_name> <package_name>

The command below only creates the environment, without installing any packages.

$ conda create -n my_env

OR

Creating a Conda Environment Using a YAML File.

You can specify all the packages and their versions in a YAML file. This is particularly useful for sharing environments or ensuring reproducibility.

Here’s an example YAML:

name: myenv
channels:
  - defaults
dependencies:
  - numpy
  - pandas
  - matplotlib

Once you have a YAML file (here using myenv.yml as example), you can run

$ conda env create -f myenv.yml

After that, the environment myenv is ready to be activated.
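Versions can also be pinned in the YAML file, which helps with reproducibility. The following is a minimal sketch, assuming you want a specific Python release. The file name pinned_env.yml, the environment name myenv_pinned, and the version 3.10 are placeholders chosen for illustration.

```shell
# Write an environment file that pins the Python version (placeholder values).
cat > pinned_env.yml <<'EOF'
name: myenv_pinned
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pandas
  - matplotlib
EOF

# Show the file that was written.
cat pinned_env.yml

# After `module load conda`, the environment would be created with:
#   conda env create -f pinned_env.yml
```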

OR

  • R

This command creates the environment and also installs some essential packages for working with R, all on one line.

$ conda create -n my_env -c r r-essentials

Here, my_env is the name of the virtual environment and r is the channel from which the r-essentials package is installed.

  • Python

This command creates the environment and also installs Python, all on one line. You can also specify the version of Python you would like to install.

$ conda create -n my_env python=3.6

Here, my_env is the name of the virtual environment and python=3.6 is the package and version you want to install in this new environment.

Flags Explained

Flag               Description
-n or --name       The name of the virtual environment.
-c or --channel    The channel to download packages from. Conda packages are downloaded from remote channels, which are URLs to directories containing the conda packages.

Conda will now take a little while to find the package(s) you specified, then download and install them to /home/yourusername/.conda/envs/my_env in your home directory. When you see the prompt Proceed ([y]/n)?, press y and then Enter to continue with the installation.

Once this phase is done, the end of the output printed on your console should look like the one below.

...

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate my_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate

4. Activate your newly created environment

To start using the packages installed in your environment, you have to activate the environment you just created using the command below.

syntax: conda activate <your_environment>

$ conda activate my_env

After the environment has been activated, you will notice that your shell prompt changes from:

[yourusername@discovery-l1 ~]$

to

(my_env) [yourusername@discovery-l1 ~]$

This shows that you are currently in the my_env environment. Also, when you run the command conda env list, you should see the asterisk * symbol on the my_env line.
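In scripts, where there is no prompt to look at, the active environment can be checked through an environment variable. The sketch below assumes that conda sets CONDA_DEFAULT_ENV when an environment is activated. The helper name in_env is hypothetical.

```shell
# Succeeds when the named environment is the active one.
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_env my_env; then
    echo "my_env is active"
else
    echo "my_env is not active; run: conda activate my_env"
fi
```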

5. Show the list of installed packages

Use the conda list command to show the list of packages installed.

(my_env) [yourusername@discovery-l1 ~]$ conda list

You should get an output like the one below.

  • R

...

r-xml2                    1.2.0             r36h29659fb_0
r-xtable                  1.8_4             r36h6115d3f_0
r-xts                     0.11_2            r36h96ca727_0
r-yaml                    2.2.0             r36h96ca727_0
r-zoo                     1.8_6             r36h96ca727_0

...

  • Python

...

pip                       20.1.1                   py36_1
python                    3.6.10               h7579374_2
readline                  8.0                  h7b6447c_0
setuptools                49.2.0                   py36_0
sqlite                    3.32.3               h62c20be_0
tk                        8.6.10               hbc83047_0

...

Instead of showing the entire list, you can use the grep command to quickly search and verify if a given package is installed.

(my_env) [yourusername@discovery-l1 ~]$ conda list | grep -i wheel
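Note that grep -i wheel also matches any package whose name merely contains the word wheel. When an exact match on the package name is needed, the first column of the listing can be compared directly. This is a sketch under the assumption that the first column of conda list output is the package name, as in the listings above. The helper name has_pkg is hypothetical.

```shell
# Succeeds when a package with exactly this name appears in
# `conda list`-style text on standard input.
has_pkg() {
    awk -v pkg="$1" '!/^#/ && $1 == pkg { found = 1 } END { exit !found }'
}

# Sample lines taken from the Python listing above.
sample='pip                       20.1.1                   py36_1
python                    3.6.10               h7579374_2
setuptools                49.2.0                   py36_0'

if printf '%s\n' "$sample" | has_pkg python; then
    echo "python is installed"
fi

# On Discovery, the real listing would be piped in instead:
#   conda list | has_pkg wheel && echo "wheel is installed"
```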

If, after going through the installed packages, you discover that the package you want is not among them, you can search for it on the conda channels and then install it.

6. Installing additional packages to your environment

For the R programming example, search for and install the package R devtools, which makes life easier for package developers by providing R functions that simplify many common tasks.

For the Python programming example, search for and install the package SciPy, a free and open-source Python library used for scientific and technical computing.

  • Search for the package(s):

syntax: conda search <package_name(s)>

  • R

(my_env) [yourusername@discovery-l1 ~]$ conda search r-devtools

  • Python

(my_env) [yourusername@discovery-l1 ~]$ conda search scipy

In the R example, the output should be a list of r-devtools versions alongside their channels. In the Python example, it should be a list of scipy versions and their channels. Whether or not you require the latest version of the package, choose the version that suits your purpose and also specify its channel.

  • Search for the package(s) by channel:

syntax: conda search -c <channel> <package_name(s)>

  • R

(my_env) [yourusername@discovery-l1 ~]$ conda search -c r -c conda-forge r-devtools

  • Python

(my_env) [yourusername@discovery-l1 ~]$ conda search -c conda-forge scipy

In the R example, notice that two channels, r and conda-forge, are specified, so conda looks for the devtools package on both. This is because either channel could have the most recent version of the package you are looking for.

The first -c argument is of higher priority than the second, therefore priority decreases from left to right. conda-forge is a community channel made up of thousands of contributors.

Although specifying the channel is optional, it is good practice to pass the --channel or -c flag, because the search then also lists the versions of the package available on that channel.

  • Install the package(s):

syntax: conda install <package(s)>

  • R

(my_env) [yourusername@discovery-l1 ~]$ conda install r-devtools

  • Python

(my_env) [yourusername@discovery-l1 ~]$ conda install scipy

Note that when installing a package, conda also installs all the dependencies that package requires.
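To install a particular version from a particular channel, the package name is followed by = and the version, and the channel is passed with -c. The sketch below only prints the resulting command. The version 1.5 is a placeholder rather than a recommendation, and --dry-run is included so that conda would report what it intends to do without changing the environment.

```shell
channel="conda-forge"   # channel chosen from the search results
spec="scipy=1.5"        # package=version; 1.5 is a placeholder

# Print the command first; remove the leading echo to run it.
echo conda install --dry-run -c "$channel" "$spec"
```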

7. Run the installed program

  • For R, launch the R CLI by typing R on your console. Note that the letter is uppercase.

    (my_env) [yourusername@discovery-l1 ~]$ R

    Now you should get an output like the one below.

    R version 3.6.1 (2019-07-05) -- "Action of the Toes"
    Copyright (C) 2019 The R Foundation for Statistical Computing
    Platform: x86_64-conda_cos6-linux-gnu (64-bit)
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.
    
    Natural language support but running in an English locale
    
    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.
    
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
    
    >
  • Next, update to the latest version of devtools.

    > devtools::install_github("hadley/devtools")

    If everything works correctly, you are presented with a list of packages that have more recent versions available. Enter one or more numbers to select the packages you intend to update, or simply enter 1 to update all packages.

  • For Python, launch the Python CLI by typing python on your console.

    (my_env) [yourusername@discovery-l1 ~]$ python

    Now you should get an output like the one below.

    Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

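    Notice that the banner reports the version of Python installed in the environment, 3.10.5 in this example. If you created the environment with python=3.6 as shown earlier, you would see a 3.6 release instead.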

  • Next, import and use the scipy module.

    >>> from scipy.special import cbrt
    >>> cb = cbrt([27, 64])
    >>> print(cb)
    
    [3. 4.]

    The output is [3. 4.].

Exiting a conda environment

$ conda deactivate

Searching for a conda package

Syntax: conda search <package_name>

Example command to search for r-devtools

$ conda search r-devtools

Cloning an existing conda virtual environment

There are occasions when you want to install extra packages into an existing virtual environment but do not have the necessary permissions to do so. In that case, your best bet is to create a new environment.

Rather than recreating the entire virtual environment from scratch, which involves re-downloading the required dependencies, you can simply clone the existing virtual environment and activate the clone. You can then install your own packages into it, which saves time and extra work.

Syntax:

conda create --name <your_custom_environment_name> --clone <existing_environment_name>

$ conda create --name myproject --clone my_env

Here, my_env is the name of the existing environment you intend to clone, and myproject is the name of the new environment you are cloning into.

Exporting Your Environment as YAML File

You can export the current state of the environment to a YAML file using:

$ conda env export > current_env.yml

You will then find the file current_env.yml in your current working directory.
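An exported file normally ends with a prefix: line holding the absolute path of the environment on the machine where it was exported. That path is rarely meaningful elsewhere, so it is often dropped before the file is shared. The sketch below works on a shortened stand-in file. The file names sample_env.yml and shared_env.yml are hypothetical.

```shell
# A shortened stand-in for the file produced by `conda env export`.
cat > sample_env.yml <<'EOF'
name: my_env
channels:
  - defaults
dependencies:
  - python=3.6.10=h7579374_2
  - pip=20.1.1=py36_1
prefix: /home/yourusername/.conda/envs/my_env
EOF

# Drop the machine-specific prefix line before sharing the file.
grep -v '^prefix:' sample_env.yml > shared_env.yml
cat shared_env.yml
```

If the build strings (the part after the second =) are also too specific for the target machine, conda env export accepts a --no-builds option that leaves them out.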

Removing a Package From a conda Environment

Syntax:

conda remove --name <your_custom_environment_name> <package_name>

$ conda remove --name my_env scipy

Here, my_env is the name of the existing environment and scipy is the name of the package to remove from it. This uninstalls the package along with any packages that depend on it.

Delete an environment and everything in it

If you want to destroy a given virtual environment that you created, you can use the command below which removes the environment and all the packages in it.

Syntax: conda env remove --name <your_custom_environment_name>

$ conda env remove --name my_env

Installing local packages in R

Local packages are packages that are not included in the conda package repository but can be downloaded from elsewhere and installed manually.

In the example below, an R package called GWASpoly, which is used for genome-wide association studies in autopolyploids (and diploids), is downloaded and installed.

  1. Log in to Discovery.

  2. Load the conda module

    module load conda
  3. Download the package to your home directory (for example, /home/your-username)

    $ wget https://potatobreeding.webhosting.cals.wisc.edu/wp-content/uploads/sites/161/2016/08/GWASpoly_download.zip
  4. Unzip the downloaded file and copy the file with .tar.gz extension to your home directory

    Notice the trailing period at the end of the second command.

    $  unzip GWASpoly_download.zip
    $  cp GWASpoly_download/GWASpoly_1.3.tar.gz .
  5. Install the package

    R --slave -e "install.packages('/home/yourusername/GWASpoly_1.3.tar.gz', dependencies = TRUE, repos=NULL, method='libcurl')"

    If you get a warning message about a missing dependency, search for and install that dependency, then run step 5 again.

Managing Python Packages with pip

In the example below, the Python camelcase package, which capitalizes the first letter of each word, is installed using the pip command.

  1. Log in to Discovery.

  2. Load the conda module

    module load conda
  3. Activate the virtual environment

    conda activate my_env
  4. Use the pip command to install the camelcase package

    pip install camelcase
  5. After the package has installed successfully, launch the Python CLI and enter the code below line by line.

    >>> import camelcase
    >>> c = camelcase.CamelCase()
    >>> txt = "hello world"
    >>> print(c.hump(txt))
    
    Hello World

    Thus, the text hello world is converted to a Camel case format.

Using Conda virtual environments in your Slurm script

The example in this section assumes that you have carried out the steps in Managing Python Packages with pip.

After creating your virtual environment, you can use it in your Slurm script so that your program has access to the packages the environment contains. In your Slurm script, add two lines right after the Slurm directives: module load conda and conda activate my_env. Here, my_env is the name of the virtual environment created earlier.

  1. Log in to Discovery.

  2. Create a file called script.sh and then copy and paste the code below and save afterward.

    #!/bin/bash
    
    #SBATCH --job-name=CamelCase       ## Name of the job
    #SBATCH --output=CamelCase.out     ## Output file
    #SBATCH --time=10:00               ## Job Duration
    #SBATCH --ntasks=1                 ## Number of tasks (analyses) to run
    #SBATCH --cpus-per-task=1          ## The number of threads the code will use
    #SBATCH --mem-per-cpu=100          ## Real memory(MB) per CPU required by the job.
    
    ## Load the python interpreter
    module load conda
    conda activate my_env
    
    ## Execute the python script
    srun -n 1 python program.py

    On line 11, the conda module is loaded. On line 12, the custom conda environment my_env, which contains the packages and dependencies that the project requires to run, is activated.

  3. Create a file called program.py and then copy and paste the code below and save afterward.

    import camelcase
    
    c = camelcase.CamelCase()
    
    txt = "hello world"
    
    print(c.hump(txt))
  4. Make the batch script executable

    chmod +x script.sh
  5. Submit the batch script

    sbatch script.sh

    Output: if you display the contents of the output file CamelCase.out, you should see a result like the one below on your console.

    Hello World

    Thus, the text hello world is converted to a Camel Case format.
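A batch job that fails to activate its environment can otherwise run on with the wrong Python. The sketch below writes a variant of the batch script that stops early if my_env cannot be activated or if camelcase cannot be imported, then checks the script for syntax errors without running it. The file name script_checked.sh and the error messages are hypothetical, and the Slurm directives are copied from the script above.

```shell
# Write a variant of the batch script with early checks.
cat > script_checked.sh <<'EOF'
#!/bin/bash

#SBATCH --job-name=CamelCase
#SBATCH --output=CamelCase.out
#SBATCH --time=10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100

module load conda
conda activate my_env || { echo "could not activate my_env" >&2; exit 1; }

# Check the import before starting the real work.
python -c "import camelcase" || { echo "camelcase is missing from my_env" >&2; exit 1; }

srun -n 1 python program.py
EOF

# Parse the script without executing it.
sh -n script_checked.sh && echo "syntax OK"
```

On Discovery, the script would then be submitted with sbatch script_checked.sh, in the same way as script.sh.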