Python Example

This tutorial shows you how to create a simple Python project with Pixi. Pixi supports two manifest formats: pixi.toml and pyproject.toml. This tutorial uses pixi.toml, the Pixi project configuration file, also known as the project manifest. The project is a deep learning image classifier: a Convolutional Neural Network (CNN) built with PyTorch that classifies images from the CIFAR-10 dataset, which consists of 60,000 32x32 color images across 10 classes (planes, cars, birds, etc.).

Create pixi.toml File

To use Pixi on Discovery, you need to load its module.

module load pixi

Then, create a pixi.toml file by running the following command:

$ pixi init pixi_pytorch
✔ Created /fs1/project/hpcteam/tahat/pixi/pixi_pytorch/pixi.toml

This will create a folder named pixi_pytorch and a pixi.toml file that contains the following:

[project]
authors = ["Mohammad <tahat@nmsu.edu>"]
channels = ["conda-forge"]
description = "Add a short description here"
name = "pixi_pytorch"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[dependencies]

You can update the options in the [project] table. For example, replace the placeholder description with an appropriate one.

The channels and platforms keys are set in the [project] table. The channels key lists where packages are fetched from. Conda-forge is a community package channel that hosts packages much like PyPI does, but it is not limited to Python and offers packages for many programming languages. The platforms key lists the platforms the project supports.

Then, change to the project directory pixi_pytorch.

cd pixi_pytorch

Add Project Dependency

This project depends on three software packages: cuda, pytorch, and torchvision. To add these dependencies, run the following command:

$ pixi add cuda pytorch torchvision
✔ Added cuda >=12.6.1,<13
✔ Added pytorch >=2.4.0,<3
✔ Added torchvision >=0.19.0,<0.20
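Each line of the output above records a version constraint such as >=2.4.0,<3, meaning "at least 2.4.0 but below 3". The following sketch illustrates how such a range can be read. It is a simplified illustration that handles only the >= and < operators on purely numeric versions; it is not how Pixi resolves versions, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn a numeric version string such as '2.4.0' into a tuple (2, 4, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(version, length):
    """Right-pad a version tuple with zeros so tuples of unequal length compare fairly."""
    return version + (0,) * (length - len(version))


def satisfies(version, constraint):
    """Check a version against a comma-separated constraint such as '>=2.4.0,<3'.

    Simplified: only the '>=' and '<' operators are supported.
    """
    have = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            bound = parse_version(clause[2:])
            width = max(len(have), len(bound))
            if not pad(have, width) >= pad(bound, width):
                return False
        elif clause.startswith("<"):
            bound = parse_version(clause[1:])
            width = max(len(have), len(bound))
            if not pad(have, width) < pad(bound, width):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Constraints taken from the pixi add output above
print(satisfies("2.4.0", ">=2.4.0,<3"))       # pytorch 2.4.0 is inside the range
print(satisfies("3.0.0", ">=2.4.0,<3"))       # a 3.x release is outside the range
print(satisfies("0.19.0", ">=0.19.0,<0.20"))  # torchvision 0.19.0 is inside the range
```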

pixi add adds dependencies to the pixi.toml. It only adds a dependency if the package, with its version constraint, is compatible with the rest of the dependencies in the project.

Install Dependencies

Install the dependencies you added in the previous section by running the following command:

$ pixi install
✔ The default environment has been installed.

Using pixi list, you can see what’s in the environment.

$ pixi list
...
c-compiler                            1.7.0         hd590300_1                   6.2 KiB    conda  c-compiler-1.7.0-hd590300_1.conda
ca-certificates                       2024.8.30     hbcca054_0                   155.3 KiB  conda  ca-certificates-2024.8.30-hbcca054_0.conda
cuda                                  12.6.1        ha804496_0                   26.1 KiB   conda  cuda-12.6.1-ha804496_0.conda
cuda-cccl_linux-64                    12.6.37       ha770c72_0                   1 MiB      conda  cuda-cccl_linux-64-12.6.37-ha770c72_0.conda
...
python_abi                            3.12          5_cp312                      6.1 KiB    conda  python_abi-3.12-5_cp312.conda
pytorch                               2.4.0         cpu_generic_py312h1576ffb_1  24.6 MiB   conda  pytorch-2.4.0-cpu_generic_py312h1576ffb_1.conda
readline                              8.2           h8228510_1                   274.9 KiB  conda  readline-8.2-h8228510_1.conda
sleef                                 3.6.1         h1b44611_3                   1.8 MiB    conda  sleef-3.6.1-h1b44611_3.conda
sympy                                 1.13.2        pypyh2585a3b_103             4.4 MiB    conda  sympy-1.13.2-pypyh2585a3b_103.conda
sysroot_linux-64                      2.17          h4a8ded7_16                  14.8 MiB   conda  sysroot_linux-64-2.17-h4a8ded7_16.conda
tk                                    8.6.13        noxft_h4845f30_101           3.2 MiB    conda  tk-8.6.13-noxft_h4845f30_101.conda
torchvision                           0.19.0        cpu_py312hdb59fe3_0          10 MiB     conda  torchvision-0.19.0-cpu_py312hdb59fe3_0.conda
typing_extensions                     4.12.2        pyha770c72_0                 39 KiB     conda  typing_extensions-4.12.2-pyha770c72_0.conda
...

When installing environments on old versions of Linux, you may encounter the following error:

× The current system has a mismatching virtual package. The project requires '__linux' to be at least version '5.10' but the system has version '4.18.0'

To fix this, edit the pixi.toml file and add the following to lower the system requirements for the project:

[system-requirements]
linux = "4.12.14"
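The error above is a version comparison between the kernel the project requires and the kernel the system reports. A minimal sketch of that comparison, assuming the kernel release string starts with a dotted numeric version (the helper names are hypothetical and this is not Pixi's own check):

```python
import platform
import re


def kernel_version(release):
    """Extract the leading numeric part of a kernel release string,
    e.g. '4.18.0-553.el8.x86_64' becomes (4, 18, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"no numeric version in {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_requirement(release, minimum):
    """Return True if the release is at least the minimum version."""
    have = kernel_version(release)
    need = kernel_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '5.10' compares as (5, 10, 0)
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Versions taken from the error message and the [system-requirements] table
print(meets_requirement("4.18.0", "5.10"))     # default requirement is not met
print(meets_requirement("4.18.0", "4.12.14"))  # lowered requirement is met

# Release string reported by the machine running this sketch
print(platform.release())
```

On the system from the error message, kernel 4.18.0 fails the default requirement of 5.10 but passes the lowered requirement of 4.12.14, which is why the installation succeeds after the change.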

Create Python Script

Create a Python script and name it image_classifier.py in your desired folder, and add the following code to it:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
# Print the CUDA version PyTorch was built with (None for CPU-only builds)
print(torch.version.cuda)
# Set the device to GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
print(torch.cuda.is_available())
# Define the transformation to normalize the data
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the image to a PyTorch tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize the images
])

# Load the CIFAR-10 training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100,
                                          shuffle=True, num_workers=2)

# Load the CIFAR-10 test dataset
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100,
                                         shuffle=False, num_workers=2)

# Define the classes in CIFAR-10
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
# Define a small CNN: two convolutional layers followed by two fully connected layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # First convolutional layer: input channels = 3, output channels = 32, kernel size = 3
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        # Second convolutional layer: input channels = 32, output channels = 64, kernel size = 3
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        # Max pooling layer with a 2x2 window
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer: input features = 64*8*8, output features = 512
        self.fc1 = nn.Linear(64 * 8 * 8, 512)
        # Fully connected layer: input features = 512, output features = 10 (for 10 classes)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        # Apply conv1, followed by ReLU activation, then max pooling
        x = self.pool(torch.relu(self.conv1(x)))
        # Apply conv2, followed by ReLU activation, then max pooling
        x = self.pool(torch.relu(self.conv2(x)))
        # Flatten the feature map for the fully connected layer
        x = x.view(-1, 64 * 8 * 8)
        # Apply fc1 followed by ReLU activation
        x = torch.relu(self.fc1(x))
        # Apply the output layer (fc2)
        x = self.fc2(x)
        return x

# Instantiate the network and move it to the selected device (GPU if available)
net = Net().to(device)
# Use CrossEntropyLoss which combines Softmax and Negative Log-Likelihood Loss
criterion = nn.CrossEntropyLoss()
# Use Stochastic Gradient Descent (SGD) with a learning rate of 0.001 and momentum of 0.9
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# Loop over the dataset multiple times (epochs)
for epoch in range(5):  # Train for 5 epochs
    running_loss = 0.0
    # Iterate over data in batches
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data  # Get the inputs and labels
        inputs, labels = inputs.to(device), labels.to(device)  # Move to the selected device

        optimizer.zero_grad()  # Zero the parameter gradients
        outputs = net(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Optimization step

        running_loss += loss.item()
        if i % 100 == 99:  # Print every 100 mini-batches
            print(f'Epoch {epoch + 1}, Batch {i + 1}: loss {running_loss / 100:.3f}')
            running_loss = 0.0

print('Finished Training')
# Evaluate the trained network on the test set
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)  # Move to the selected device
        outputs = net(images)  # Forward pass
        _, predicted = torch.max(outputs, 1)  # Index of the highest score is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total} %')
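The fc1 layer in the script expects 64 * 8 * 8 input features. The following sketch shows where that number comes from by tracing the image size through the layers with the standard output-size formula for convolution and pooling. It uses plain arithmetic only, so it runs without PyTorch; the helper names are illustrative.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)  # conv1 keeps the size at 32
size = pool_out(size, kernel=2, stride=2)   # first pooling halves it to 16
size = conv_out(size, kernel=3, padding=1)  # conv2 keeps the size at 16
size = pool_out(size, kernel=2, stride=2)   # second pooling halves it to 8

channels = 64                               # output channels of conv2
flat_features = channels * size * size
print(size, flat_features)

# With 50,000 training images and batch_size=100 there are 500 batches per epoch,
# so the "every 100 mini-batches" message prints 5 times per epoch.
batches_per_epoch = 50_000 // 100
print(batches_per_epoch, batches_per_epoch // 100)
```

If you change the kernel size, padding, or number of pooling steps in Net, the input size of fc1 and the view call in forward must change to match.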

Create and Submit a Submission Script

To run the Python script on Discovery, create a Slurm submission script (sub_pixi.py) in the same directory as the Python script, and add the following lines:

#!/bin/bash
#SBATCH --job-name=pytorch
#SBATCH --output=model-%j.out
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=1
##SBATCH --ntasks-per-node=1
#SBATCH --mem-per-gpu=5G
#SBATCH -p normal
#SBATCH --time 00:30:00
#SBATCH --constraint=v100-32g
module purge
module load pixi/2024a

cd /fs1/project/hpcteam/tahat/pixi/pixi_pytorch
pixi run python image_classifier.py

You can also define a task in the [tasks] table of pixi.toml and run that task with pixi run in the submission script.

Then, submit the script using the following command:

sbatch sub_pixi.py

To run your GPU experiments on discovery-g1, make sure that the CUDA version is 11.

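Once the job finishes, check the job's output file for the results. The file name is set by the --output option in the submission script, where %j is replaced by the job ID.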
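To pull the loss values and the final accuracy out of the output file, a small parser along the following lines may help. It is a sketch that assumes the output contains lines in the format printed by image_classifier.py. The sample text and its numbers are made up for illustration and are not real results.

```python
import re

# Patterns matching the print statements in image_classifier.py
LOSS_LINE = re.compile(r"Epoch (\d+), Batch (\d+): loss ([\d.]+)")
ACCURACY_LINE = re.compile(
    r"Accuracy of the network on the 10000 test images: ([\d.]+) %"
)


def summarize(text):
    """Return ([(epoch, batch, loss), ...], accuracy or None) from job output text."""
    losses = [
        (int(epoch), int(batch), float(loss))
        for epoch, batch, loss in LOSS_LINE.findall(text)
    ]
    match = ACCURACY_LINE.search(text)
    accuracy = float(match.group(1)) if match else None
    return losses, accuracy


# Made-up sample in the script's output format (illustrative values only)
SAMPLE = """\
Using device: cuda
Epoch 1, Batch 100: loss 2.300
Epoch 1, Batch 200: loss 2.290
Finished Training
Accuracy of the network on the 10000 test images: 50.0 %
"""

losses, accuracy = summarize(SAMPLE)
print(losses)
print(accuracy)
```

To use it on a real job, read the output file into a string and pass that string to summarize in place of SAMPLE.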