
Hydra Cluster: Overview


Hydra is the School of Computing's High Performance Computing cluster. It consists of a number of CPU resources and a small number of GPU resources. We use the Slurm Workload Manager to manage jobs within the cluster.

The resources are split into two partitions (think of these as groups of machines): one for CPU-only jobs, and one for jobs that require a GPU. The GPU partition can also be used for CPU jobs with permission.

The GPU partition contains five servers with a total of 10 GPUs across a mixture of architectures:

  • One server with 4x NVIDIA A100 80GB GPUs. It also has 2x Intel Gold 5317 CPUs running at 3.00GHz, giving a total of 24 cores (48 threads), and 384GB of RAM.
  • One server with 1x NVIDIA TITAN V GPU. It has 2x Intel Xeon E5-2620 CPUs and 144GB of RAM.
  • Two servers with 2x NVIDIA Tesla P100 GPUs in each. They also each have 2x Intel Gold 6136 CPUs running at 3.00GHz, giving a total of 24 cores (48 threads) per server, and 256GB of RAM.
  • One older server with 1x NVIDIA Tesla K40 GPU. This is slower but potentially useful for testing.

The CPU partition contains 14 servers each with 2x Intel Xeon E5520 CPUs running at 2.27GHz. Each server has 8 cores (16 threads), and between 12GB and 24GB of RAM. These servers are much older but can still provide a decent amount of processing power by virtue of having many cores in total. The partition also contains a few servers running within our cloud; we may increase or decrease the number of these during the year to make better use of spare resources within the cloud.

Requesting access

Access is currently open to staff and research postgraduate students within the School. You will need to contact us to get access and to discuss your requirements.

Getting started

The Slurm Quick Start User Guide provides a good tutorial for getting started with Slurm. We won't replicate its contents here, but rather we'll detail the specifics of our setup. It's well worth reading through it, and possibly the rest of the Slurm user documentation, along with the notes on this page.

To perform these steps you'll need to be able to log in to a server named hydra using SSH. You can also use myrtle or raptor if you prefer. If you've not logged in to any of these machines before then you'll first need to set a password. Then you can log in using an SSH client, for example PuTTY on Windows. If you're using a Mac, Linux, or another Unix-like system with a command-line ssh client, you can simply type the following, replacing login with your own username.
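ssh login@hydra

(If the short name hydra doesn't resolve from your machine, you may need to use the server's fully-qualified domain name instead.)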


Submitting jobs

Jobs can be submitted directly from hydra, or from myrtle or raptor. You don't need to log into any of the servers running the jobs. As detailed in the quick start guide above you can use srun to submit a job and receive immediate output, or you can use sbatch to queue a job and have the output stored in a file. For example, using an illustrative batch script named job.sh that emails you when the job ends:

tdb@hydra:~$ srun hostname

tdb@hydra:~$ cat job.sh
#!/bin/bash
#SBATCH --mail-type=END
hostname
tdb@hydra:~$ sbatch job.sh
Submitted batch job 160
tdb@hydra:~$ # wait for the job to complete or email to arrive
tdb@hydra:~$ cat slurm-160.out
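A batch script can also set job options with #SBATCH directives instead of command-line flags. A sketch (the script and job names are illustrative; the directives shown are standard Slurm options, and %j in the output filename is replaced with the job ID):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example-%j.out
#SBATCH --time=00:10:00
#SBATCH --mem=1G
#SBATCH --mail-type=END
hostname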

Partitions and requesting resources

The default partition is named test. It has a maximum run time of 1 hour and is intended for testing out commands to make sure they behave as expected. This partition should always have capacity available, and you shouldn't have to wait for long-running jobs to complete. When you run a command like srun without any arguments, as above, it'll run in this partition. To request an alternative partition use the -p flag:

tdb@hydra:~$ srun -p cpu hostname
tdb@hydra:~$ srun -p gpu hostname

By default your job will have 1GB of memory allocated. If you attempt to use more your job will be killed. To request more use the --mem flag to specify a larger amount. For example --mem=2G.

If your job requires a GPU then, in addition to specifying the gpu partition, you will also need to request GPU resources. You can request either 1 or 2 GPUs, and optionally specify whether you want a Volta generation (TITAN V), Pascal generation (P100) or Kepler generation (K40) card. For example:

tdb@hydra:~$ srun -p gpu --gres gpu:1 hostname
tdb@hydra:~$ srun -p gpu --gres gpu:volta:1 hostname

Please note that the GPUs will only be available if you request them with --gres. It is not enough to just select the GPU partition.
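The same requests can be made from a batch script using #SBATCH directives. A sketch (the job itself is illustrative; nvidia-smi simply reports the GPUs allocated to the job):

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:volta:1
nvidia-smi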


Home directories

Shared home directories are available on all the nodes in the cluster, and on myrtle and raptor. On myrtle and raptor your shared home directory is different to your normal home directory, so to access it you need to do the following:

tdb@myrtle:~$ clusterdir
Your cluster home directory is:
/cluster/home/cur/tdb
tdb@myrtle:~$ cd /cluster/home/cur/tdb

On the hydra server, and on all the cluster nodes, this is your default home directory. You may find it more straightforward to use the hydra server to avoid needing to change directories.

Your directory is also available as \\\exports\cluster.

Any jobs you submit will run with the same working directory and environment as you have when you launch them. So if you change to your cluster home directory, or a sub-directory, before running jobs you will get more predictable results.

Each machine also has a temporary local area mounted on /scratch. You're free to use this space if you need to create temporary files or extract data sets as your job starts running. This storage will be faster than the shared home directories. Please make sure to clean up your files when your job completes. If you require a larger amount of storage in /scratch, make sure to add the --tmp flag, e.g. --tmp=1G, to request a machine with enough space.
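Cleanup can be automated inside the job script by creating a per-job directory under /scratch and removing it when the script exits. A sketch ($SLURM_JOB_ID is set by Slurm for each job; note the trap won't fire if the job is killed, so it's still worth checking /scratch occasionally):

#!/bin/bash
#SBATCH --tmp=1G
WORKDIR=/scratch/$USER-$SLURM_JOB_ID
mkdir -p "$WORKDIR"
trap 'rm -rf "$WORKDIR"' EXIT
cd "$WORKDIR"
# ... extract data sets and run your job here ...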

Large storage

We additionally have large shared storage mounted at /data on all nodes. If you need to store large amounts of data then this might be more appropriate to use. It may also have better performance for I/O heavy tasks. Please get in touch with us if you'd like to use it.

Checking the queue

You can check the queue by using the squeue command. For example:

tdb@hydra:~$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               166      test hostname      tdb  R       0:04      1 cloud01
               167       cpu hostname      tdb  R       0:01      1 cloud02
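Two related commands you may find useful: squeue -u shows only the named user's jobs, and scancel cancels a queued or running job by its job ID. For example:

tdb@hydra:~$ squeue -u tdb
tdb@hydra:~$ scancel 167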

Available software

The Hydra nodes contain a number of software packages including Java, Julia, R, Python, OCaml, and the usual Linux C compilers. We can likely install anything that's available in the Ubuntu 20.04 repositories.

The GPU machines additionally contain the NVIDIA CUDA tools.

As a general rule, to keep things consistent, we will try to maintain the same versions on these machines as on myrtle and raptor. However, during upgrades they may diverge, in which case we'd recommend using the hydra server instead.

TensorFlow example

TensorFlow is a popular framework so we've put together a short example of how to use it on the Hydra cluster. It may also serve as a useful starting point for your own programs. The example can be found here.


Recommendations

Here are some general recommendations to consider when creating jobs:

  • Break jobs down into smaller units when possible. This allows them to be spread over more nodes and do more in parallel.
  • Try not to request more resources than you need as it will limit the number of jobs that can run on a node.
  • Think about the amount of available resources before launching a huge number of jobs. Maybe run a batch, wait for results, then run another.
  • Keep an eye on the queue and what colleagues are doing. Communicate with us and your colleagues if your jobs are impacting each other.
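One way to break work into smaller units is a Slurm job array, which queues many near-identical tasks from a single script; each task gets its own $SLURM_ARRAY_TASK_ID to select its share of the work. A sketch (the process-chunk program is illustrative):

#!/bin/bash
#SBATCH --array=0-9
./process-chunk "$SLURM_ARRAY_TASK_ID"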


Feedback

Slurm has an extensive set of options and features and we're still learning how best to make use of them. If you have questions or ideas on how things could be done better please let us know. Also please get in touch if you think we could have mentioned something else in this guide, or if anything could have been clearer.

As usual, please contact us with any queries.
