Using TensorFlow
The TensorFlow platform allows you to implement best practices for data automation, model tracking, performance monitoring, and model retraining. Using production-level tools to automate and track model training over the lifetime of a product, service, or business process is critical to success.
Running Programs Interactively
To run your program interactively using the TensorFlow image, follow these steps:
- Launch an interactive shell on a compute node using the command line or the OnDemand Interactive Shell (Tmux). Request GPU resources if you are going to run your program on GPUs.
- Create your program.
- Run your program using the built TensorFlow image.
Examples
This section shows examples of running programs with the TensorFlow SIF image on CPUs and GPUs.
Example: Run a program on CPU
The first step is to build the appropriate image.
apptainer build tensorflow.sif docker://tensorflow/tensorflow
Then, create your Python program (saved here as tenorExample.py) with the following content:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
predictions = model(x_train[:1]).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam',
loss=loss_fn,
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
The above program loads and prepares the MNIST dataset, converting the sample data from integers to floating-point numbers. It then builds a machine learning model, trains it, and evaluates it.
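Note that the final Dense layer returns raw logits rather than probabilities, which is why the loss is constructed with from_logits=True. As a minimal sketch of the conversion the loss applies internally (using NumPy instead of TensorFlow, with a made-up logits vector for one 10-class sample):

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical logits for one MNIST sample (10 classes).
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, -0.5, 1.5, 0.2, -2.0])
probs = softmax(logits)

print(probs.sum())          # the probabilities sum to 1
print(int(probs.argmax()))  # index of the most likely class
```

The same conversion is available in TensorFlow as tf.nn.softmax; the quickstart keeps logits in the model output because computing the loss from logits is more numerically stable.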
To run your program interactively, launch an interactive shell on a compute node using the command line:
srun -n 1 -c 2 --mem-per-cpu 5g -p normal -t 01:00:00 --pty /bin/bash
where -n is the number of tasks and -c is the number of CPUs per task. This command allocates only CPUs, which means your program will run on CPUs.
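Once the interactive shell starts, you can optionally sanity-check what Slurm allocated before running anything (these are standard Slurm environment variables, shown here as an illustrative check):

```shell
echo "$SLURM_CPUS_PER_TASK"   # should print 2 for the srun command above
nproc                         # number of CPUs visible to the shell
```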
The last step is to run your program using the SIF image built earlier:
apptainer exec tensorflow.sif python tenorExample.py
Example: Run a program on GPU
The first step is to build the appropriate image.
apptainer build tensorflowGPU.sif docker://tensorflow/tensorflow:latest-gpu
To run your program interactively, launch an interactive shell on a compute node using the command line:
srun -n 1 --gpus-per-task 2 --mem-per-gpu 5g -p interactive -t 04:00:00 --pty /bin/bash
This command allocates only GPUs, which means your program will run on GPUs.
The last step is to run your program using the SIF image built earlier:
apptainer exec --nv tensorflowGPU.sif python tenorExample.py
--nv is an Apptainer flag required to enable NVIDIA GPU support inside the container.
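Before launching a long training run, it can be worth confirming that TensorFlow inside the container actually sees the allocated GPUs. A quick check, using the image built above:

```shell
apptainer exec --nv tensorflowGPU.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

If this prints an empty list, TensorFlow will silently fall back to the CPU; in that case, verify that you requested GPUs in your allocation and passed the --nv flag.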
Submission Script
You can also integrate the TensorFlow container into a SLURM submission script. For example, to run the above Python script in a submission script, create a submission script with the following content:
#!/bin/bash
## Slurm Directives
#SBATCH --job-name=tensorflow
#SBATCH --output=model-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
##SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=5G
#SBATCH -p normal
#SBATCH --time 00:10:00
## Load modules
## Run the program using 'srun'.
srun apptainer exec /path/to/tensorflow.sif python /path/to/tenorExample.py
Then, submit your script and check the output file. For more information about SLURM commands and submission scripts, see Slurm Commands.
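Assuming the script above is saved as tensorflow_job.sh (a filename chosen here for illustration), submission and monitoring look like this:

```shell
sbatch tensorflow_job.sh   # prints the assigned job ID
squeue --me                # check the state of your queued/running jobs
```

Once the job finishes, inspect the file named by the --output directive (%j is replaced by the job ID).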
To run your program on GPUs, use the following script:
#!/bin/bash
## Slurm Directives
#SBATCH --job-name=tensorflow
#SBATCH --output=model-%j.out
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=1
##SBATCH --ntasks-per-node=1
#SBATCH --mem-per-gpu=5G
#SBATCH -p normal
#SBATCH --time 00:10:00
## Load modules
## Run the program using 'srun'.
srun apptainer exec --nv /path/to/tensorflowGPU.sif python /path/to/tenorExample.py
--gpus-per-task assigns GPUs to each task, and --mem-per-gpu sets the amount of memory per GPU.