Hydra Cluster: TensorFlow

Introduction

This document walks through a basic example of using TensorFlow on the Hydra cluster. If you have not already done so, read the main documentation first. The example is based on the Virtualenv installation instructions given in the TensorFlow documentation.

Installing TensorFlow

Starting in your cluster home directory on myrtle or raptor, create a new virtual environment. In this case, we'll call the directory tensorflow and use Python 3.

tdb@myrtle:/cluster/home/cur/tdb$ python3 -m venv --system-site-packages tensorflow
tdb@myrtle:/cluster/home/cur/tdb$

Now activate the environment. Notice the prompt changes to indicate you're inside the new virtual environment.

tdb@myrtle:/cluster/home/cur/tdb$ source tensorflow/bin/activate
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$
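
If you'd rather not rely on the prompt alone, the short Python check below is one way to confirm the interpreter really belongs to the virtual environment. It is a minimal sketch: the function name in_virtualenv is our own choice for illustration and is not part of the cluster tooling or TensorFlow.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with the python command after activating the environment; the reported prefix should be the tensorflow directory you just created.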

Now upgrade pip and then install TensorFlow. Output is trimmed here for brevity. Please be patient - this step can take a while.

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ pip install --upgrade pip
Collecting pip
  Using cached pip-21.2.4-py3-none-any.whl (1.6 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
Successfully installed pip-21.2.4
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ pip install --ignore-installed --upgrade tensorflow
Collecting tensorflow
  Using cached tensorflow-2.6.0-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
...
Successfully installed absl-py-0.14.1 astunparse-1.6.3 cachetools-4.2.4 certifi-2021.10.8 charset-normalizer-2.0.6 clang-5.0 flatbuffers-1.12 gast-0.4.0 google-auth-1.35.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.41.0 h5py-3.1.0 idna-3.2 keras-2.6.0 keras-preprocessing-1.1.2 markdown-3.3.4 numpy-1.19.5 oauthlib-3.1.1 opt-einsum-3.3.0 protobuf-3.18.1 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-2.26.0 requests-oauthlib-1.3.0 rsa-4.7.2 setuptools-58.2.0 six-1.15.0 tensorboard-2.6.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.6.0 tensorflow-estimator-2.6.0 termcolor-1.1.0 typing-extensions-3.7.4.3 urllib3-1.26.7 werkzeug-2.0.2 wheel-0.37.0 wrapt-1.12.1
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$

If you're using this example as a starting point for your own code you can install additional Python packages within this virtual environment as required.
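
To see what ended up in the environment, you could query the installed package metadata from Python. The sketch below assumes Python 3.8 or later (for importlib.metadata); the helper name installed_version and the list of packages checked are illustrative choices, not something the installation requires.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report a few of the packages involved in the install step above.
for name in ("pip", "numpy", "tensorflow"):
    print(name, installed_version(name) or "not installed")
```

The same helper works for any additional packages you install later.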

Testing TensorFlow

To test TensorFlow we'll create a short program taken from the TensorFlow documentation. We'll also create a shell script to configure the environment and run it. Use a text editor to create the files shown below, substituting your own home directory in the second file.

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ cat tftest.py
import tensorflow as tf
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ cat tftest.sh
#!/bin/sh

. /cluster/home/cur/tdb/tensorflow/bin/activate
python /cluster/home/cur/tdb/tftest.py

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$

Now we can try running it on the Hydra cluster.

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ srun -p gpu --gres gpu:1 ./tftest.sh
2021-10-09 11:51:55.982457: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-09 11:52:02.023876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15405 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 6.0
tf.Tensor(-872.1127, shape=(), dtype=float32)
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$

Or we can submit it as a batch job.

(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ sbatch -p gpu --gres gpu:1 ./tftest.sh
Submitted batch job 141658
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$ cat slurm-141658.out
2021-10-09 11:52:56.659022: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-09 11:52:57.191003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15405 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 6.0
tf.Tensor(-760.46423, shape=(), dtype=float32)
(tensorflow) tdb@myrtle:/cluster/home/cur/tdb$
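
The log lines above show that TensorFlow created a GPU device. If you want your own code to check this explicitly, something along the following lines may help. It is a sketch rather than a definitive recipe: visible_gpus is a hypothetical helper name, and it relies on tf.config.list_physical_devices, which is available in TensorFlow 2.1 and later.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice; its name looks like "/physical_device:GPU:0".
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", ", ".join(gpus))
else:
    print("No GPU visible; TensorFlow will run on the CPU")
```

Run it through srun or sbatch in the same way as tftest.py; on a login node without a GPU you should expect the last message.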

Take note when running on CPUs rather than GPUs

We've had issues reported when running TensorFlow on older CPUs without the AVX instruction set. If you're using the gpu partition then you're fine, but if you are using the cpu partition you should pass the -C avx flag to srun or sbatch so that your job runs only on machines with the newer CPUs.
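
If you're unsure whether the machine your job landed on supports AVX, you could inspect the CPU flags from Python before importing TensorFlow. This is a minimal sketch that assumes a Linux x86 node exposing /proc/cpuinfo with a "flags" line; has_cpu_flag is a hypothetical helper name of our own.

```python
from pathlib import Path


def has_cpu_flag(cpuinfo_text: str, flag: str = "avx") -> bool:
    """Return True if the first 'flags' line in cpuinfo text lists the given flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Compare whole tokens so that "avx2" is not mistaken for "avx".
            return flag in value.split()
    return False


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print("AVX supported:", has_cpu_flag(cpuinfo.read_text()))
else:
    print("/proc/cpuinfo not found; this check only works on Linux")
```

This only reports what the current machine supports. The -C avx flag remains the way to make sure the job is given a suitable machine in the first place.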

As usual, please contact us with any queries.
