Linux Research Cluster

Overview

The Linux cluster is a dedicated research cluster intended for running computationally intensive batch and parallel jobs. The cluster consists of a master node and 10 client nodes which have the specification below:

Client Nodes
  Number            10
  CPUs              Dual Quad Core Xeon E5345, 2.33 GHz, 8 MB L2 cache
  Memory            8 GB
  Disks             2 x 80 GB SATA (mirrored)
  Operating System  CentOS 5

Although 80 processing cores are available in total, only 79 are assigned to cluster jobs. Because the master node must handle logins, one core is left unallocated so that it can respond promptly to login and scheduling operations.
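The core count can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 80 cores are those of the ten client nodes listed in the specification, and the variable names are illustrative.

```python
# Figures taken from the client node specification.
CLIENT_NODES = 10
CPUS_PER_NODE = 2    # "Dual"
CORES_PER_CPU = 4    # "Quad Core"
RESERVED_CORES = 1   # left unallocated for login and scheduling work

total_cores = CLIENT_NODES * CPUS_PER_NODE * CORES_PER_CPU
schedulable_cores = total_cores - RESERVED_CORES

print(total_cores, schedulable_cores)
```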

Only the master node is directly accessible; jobs must be submitted through the Grid Engine scheduling software, which distributes them to the worker nodes.

The nodes are functionally equivalent to the CentOS installation in use in the School and have access to user home directories and other NFS mounted partitions (e.g. /home/research, /home/scratch-staff).

If your program is likely to generate large amounts of temporary data or to perform a large number of read/write operations, you should consider using the locally available temporary storage at /data/private/staff or /data/private/pg. Make sure that your job submission script ends with code that copies any results you need to an easily accessible location: your home directory, scratch, or the research directory. Note that the local storage is not backed up.
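The pattern described above (work in local storage, then copy the results somewhere accessible) might look like the following sketch. It assumes a Python 3.8 or later interpreter is available to the job, which this page does not state. The function name run_in_local_scratch and its arguments are hypothetical; scratch_root would be one of the local directories named above and dest_dir a directory under your home, scratch or research area.

```python
import os
import shutil
import tempfile


def run_in_local_scratch(scratch_root, dest_dir, work):
    """Run work(job_dir) in a private directory under scratch_root, copy
    everything it produced into dest_dir, then delete the local copy.

    If work() raises, nothing is copied and the local directory is still removed.
    """
    job_dir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        work(job_dir)
        os.makedirs(dest_dir, exist_ok=True)
        copied = []
        for name in sorted(os.listdir(job_dir)):
            src = os.path.join(job_dir, name)
            dst = os.path.join(dest_dir, name)
            if os.path.isdir(src):
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                shutil.copy2(src, dst)
            copied.append(dst)
        return copied
    finally:
        # Local storage is not backed up, so leave nothing behind.
        shutil.rmtree(job_dir, ignore_errors=True)
```

A job would pass its own function as work, writing all intermediate files inside the directory it is given.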

The cluster should be viewed as a "black box" where users only connect to the master node. The scheduling system, Grid Engine, has been installed to handle all the details of scheduling jobs to run on the clients. Users are not permitted to log in directly to the client nodes to execute jobs.

Access to the Cluster

The research cluster is primarily intended for use by staff and research students. However, final year/MSc project students may use the cluster with the permission of their project supervisor. Irrespective of whether a user is a member of staff or a student, all users must register in order to be granted access to the cluster. To register, staff and research students should contact the IT Service Desk while final year/MSc project students should apply through their project supervisor.

Master Node

The master node, named cluster1, is directly accessible from the School's network using ssh and behaves in much the same way as the standard Linux configuration.

Running Jobs

Grid Engine Clustering Software has been installed on the cluster for submitting and managing jobs. Users may only submit jobs using these tools.
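As an illustration, a basic batch submission could be assembled as in the sketch below. The options -N, -cwd, -o and -e are standard qsub options, but the script name, job name and log directory are hypothetical, and the local Grid Engine configuration may require further options. The helper build_qsub_command is not part of Grid Engine. Python 3.8 or later is assumed.

```python
import shlex


def build_qsub_command(script, job_name, log_dir):
    """Return the argument list for a basic Grid Engine batch submission."""
    return [
        "qsub",
        "-N", job_name,                      # job name shown by qstat
        "-cwd",                              # run from the submission directory
        "-o", f"{log_dir}/{job_name}.out",   # file for standard output
        "-e", f"{log_dir}/{job_name}.err",   # file for standard error
        script,
    ]


# Hypothetical script, job name and log directory, for illustration only.
cmd = build_qsub_command("run_simulation.sh", "sim01", "logs")
print(shlex.join(cmd))
```

On the master node the resulting list could be passed to subprocess.run(cmd, check=True); the log directory must already exist.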

More information about using Grid Engine can be found by following the link below:

Users whose jobs generate large data files are advised to write this data locally to disk (to /data/private/pg or /data/private/staff). Writing large amounts of data to the file server can flood the network with traffic and can result in impaired performance. Once the jobs have completed, the local data can be processed, compressed and copied back to the file server.
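The final step, compressing local output and copying it back, might be sketched as follows, so that a single compressed file crosses the network rather than many large ones. The function name archive_and_copy and its arguments are hypothetical, and a Python 3 interpreter is assumed to be available.

```python
import os
import shutil
import tarfile


def archive_and_copy(local_dir, dest_dir):
    """Pack local_dir into a gzip-compressed tar file, copy the archive to
    dest_dir, and return the path of the copied archive."""
    local_dir = os.path.normpath(local_dir)
    base = os.path.basename(local_dir)
    archive_path = local_dir + ".tar.gz"

    # Build the archive on local disk first.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(local_dir, arcname=base)

    os.makedirs(dest_dir, exist_ok=True)
    final_path = shutil.copy2(archive_path, dest_dir)
    os.remove(archive_path)  # remove the local archive once it has been copied
    return final_path
```

The original local directory is left in place here; whether to delete it after a successful copy is a decision for the job script.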

Parallel Jobs

The cluster is an ideal environment to run parallel jobs. Both MPICH and LAM/MPI have been installed on the cluster and integrated with Grid Engine. See the Grid Engine documentation for more information.
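For a parallel job, Grid Engine records the hosts and slots it has granted in a file whose path is given by the PE_HOSTFILE environment variable. Each line starts with a host name followed by a slot count. The sketch below parses that format. The sample contents, including the node and queue names, are invented for illustration, and the name of the parallel environment to request at submission time (qsub -pe) is site specific and not given on this page.

```python
def parse_pe_hostfile(text):
    """Return {hostname: slots} from the contents of a Grid Engine PE host file."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        slots[host] = slots.get(host, 0) + count
    return slots


# Hypothetical host file contents; node and queue names are made up.
sample = """\
node01 8 all.q@node01 UNDEFINED
node02 4 all.q@node02 UNDEFINED
"""
allocation = parse_pe_hostfile(sample)
print(allocation, sum(allocation.values()))
```

Inside a running job the text would instead be read from the file named by os.environ["PE_HOSTFILE"], and the total should match the NSLOTS variable that Grid Engine also sets.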