Using TensorFlow

The TensorFlow platform allows you to implement best practices for data automation, model tracking, performance monitoring, and model retraining. Using production-level tools to automate and track model training over the lifetime of a product, service, or business process is critical to success.

Running Programs Interactively

To run your program interactively using the TensorFlow image, follow these steps:

  1. Launch an interactive shell on a compute node using the command line or the OnDemand Interactive Shell (Tmux). Request GPU resources if you are going to run your program on GPUs.

  2. Create your program.

  3. Run your program using the built TensorFlow image.

Examples

This section shows examples of running programs with the TensorFlow SIF image on CPUs and on GPUs.

Example: Run a program on CPU

The first step is to build the appropriate image.

apptainer build tensorflow.sif docker://tensorflow/tensorflow

Then, create your Python program with the following content:

 import tensorflow as tf

 # Load and prepare the MNIST dataset, scaling pixel values to [0, 1].
 mnist = tf.keras.datasets.mnist
 (x_train, y_train), (x_test, y_test) = mnist.load_data()
 x_train, x_test = x_train / 255.0, x_test / 255.0

 # Build a simple feed-forward classifier that outputs raw logits.
 model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(128, activation='relu'),
   tf.keras.layers.Dropout(0.2),
   tf.keras.layers.Dense(10)
 ])
 predictions = model(x_train[:1]).numpy()
 loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
 loss_fn(y_train[:1], predictions).numpy()

 # Compile, train, and evaluate the model.
 model.compile(optimizer='adam',
               loss=loss_fn,
               metrics=['accuracy'])
 model.fit(x_train, y_train, epochs=5)
 model.evaluate(x_test, y_test, verbose=2)

The above program loads and prepares the MNIST dataset, converting the sample data from integers to floating-point numbers. It then builds a machine learning model, trains it, and evaluates it.
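The loss used above is sparse categorical cross-entropy computed from raw logits (`from_logits=True`). As an illustration of what that means, the same quantity can be computed by hand; the sketch below is plain Python, independent of TensorFlow:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_label, logits):
    """Negative log-probability of the true class, computed from logits."""
    probs = softmax(logits)
    return -math.log(probs[true_label])

# An untrained model produces logits near zero, so each of the 10 MNIST
# classes gets probability ~1/10 and the loss is about -log(1/10) ≈ 2.3.
logits = [0.0] * 10
print(round(sparse_categorical_crossentropy(3, logits), 4))  # → 2.3026
```

This is why the initial loss printed by the training run above is close to 2.3 before the first epoch finishes.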

To run your program interactively, launch an interactive shell on a compute node using the command line:

srun -n 1 -c 2 --mem-per-cpu 5g  -p normal -t 01:00:00 --pty /bin/bash

where -n is the number of tasks and -c is the number of CPUs per task. This command allocates only CPUs, which means your program will run on CPUs.
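Once inside the interactive session, you can confirm the resources the job actually received; a minimal sketch (run with the Python inside the image, or any Python on the node):

```python
import os

# Number of CPUs visible to this process; under Slurm cgroup limits this
# should match the -c value requested above.
print("Visible CPUs:", os.cpu_count())

# Slurm also exports the allocation in environment variables
# (only set when running inside a Slurm job).
print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK", "not set"))
```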

The last step is to run your program using the SIF image built earlier.

apptainer exec tensorflow.sif python tensorExample.py

Example: Run a program on GPU

The first step is to build the appropriate image.

apptainer build tensorflowGPU.sif docker://tensorflow/tensorflow:latest-gpu

To run your program interactively, launch an interactive shell on a compute node using the command line:

srun -n 1 --gpus-per-task 2 --mem-per-gpu 5g  -p interactive -t 04:00:00 --pty /bin/bash

This command allocates GPUs, which means your program will run on GPUs.

The last step is to run your program using the SIF image built earlier.

apptainer exec --nv tensorflowGPU.sif python tensorExample.py

--nv is an Apptainer flag required to enable NVIDIA GPU support inside the container.
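Before launching a long training run, it is worth checking that TensorFlow can actually see the allocated GPUs. A minimal sketch, intended to be run inside the container with `apptainer exec --nv ...` (the try/except guard is only so the script also runs where TensorFlow is not installed):

```python
# Report the GPUs visible to TensorFlow. If --nv was omitted, or no GPUs
# were allocated to the job, this list will be empty.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices('GPU')
    print("GPUs visible to TensorFlow:", len(gpus))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

If the count is 0 inside a GPU allocation, the most common cause is a missing `--nv` flag on the `apptainer exec` command.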

Submission Script

You can also integrate the TensorFlow container into a SLURM submission script. For example, to run the above Python script in a submission script, create a submission script with the following content:

#!/bin/bash

## Slurm Directives
#SBATCH --job-name=tensorflow
#SBATCH --output=model-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
##SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=5G
#SBATCH -p normal
#SBATCH --time 00:10:00

## Load modules

## Run the program using 'srun'.
srun apptainer exec /path/to/tensorflow.sif python /path/to/tensorExample.py

Then, submit your script and check the output file. For more information about SLURM commands and submission scripts, see Slurm Commands.

To run your program on GPUs, use the following script:

#!/bin/bash

## Slurm Directives
#SBATCH --job-name=tensorflow
#SBATCH --output=model-%j.out
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=1
##SBATCH --ntasks-per-node=1
#SBATCH --mem-per-gpu=5G
#SBATCH -p normal
#SBATCH --time 00:10:00

## Load modules

## Run the program using 'srun'.
srun apptainer exec --nv /path/to/tensorflowGPU.sif python /path/to/tensorExample.py

--gpus-per-task assigns GPUs to each task, and --mem-per-gpu sets the amount of memory per GPU.