Linux Research Cluster
Overview
The Linux cluster is a dedicated research cluster intended for running computationally intensive batch and parallel jobs. The cluster consists of a master node and 10 client nodes which have the specification below:
| Client Nodes | |
| Number | 10 |
| CPUs | Dual quad-core Xeon E5345, 2.33 GHz, 8 MB L2 cache |
| Memory | 8 GB |
| Disks | 2 x 80 GB SATA disks (mirrored) |
| Operating System | CentOS 5 |
Although there are 80 processing cores in total, only 79 of these are assigned to cluster jobs. One core is left unallocated so that the master node, which handles logins, can respond promptly to login and scheduling operations.
Only the master node is directly accessible - jobs must be submitted using the GridEngine scheduling software which handles distribution of the job to the worker nodes.
The nodes are functionally equivalent to the CentOS installation in use in the School and have access to user home directories and other NFS mounted partitions (e.g. /home/research, /home/scratch-staff).
If your program is likely to generate large amounts of temporary data or to perform a large number of read/write operations, you should consider using the locally available temporary storage. This is available at /data/private/staff or /data/private/pg. You should ensure that your job submission script ends with suitable code to copy any required results to an easily accessible location, such as your home directory, the scratch area or the research directory. Note that the local storage is not backed up.
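The pattern described above can be sketched as a job script along the following lines. This is only an illustration under stated assumptions: the per-user subdirectory beneath the local storage area, the results directory $HOME/cluster_results and the program name ./my_program are all hypothetical, and the function name run_with_local_scratch is invented for this example. JOB_ID is the job number that Grid Engine sets in the job's environment.

```shell
#!/bin/sh
# Grid Engine directives: job name, run from the submission directory,
# merge stderr into stdout, interpret the script with /bin/sh.
#$ -N scratch_demo
#$ -cwd
#$ -j y
#$ -S /bin/sh

# Usage: run_with_local_scratch SCRATCH_BASE RESULT_DIR COMMAND [ARGS...]
# Runs COMMAND inside a private directory on local disk, then archives
# everything it produced into RESULT_DIR and removes the local copy.
run_with_local_scratch() {
    scratch_base=$1
    result_dir=$2
    shift 2

    # JOB_ID is set by Grid Engine; fall back to "manual" outside a job.
    job_tag=${JOB_ID:-manual}

    # Assumption: users may create their own subdirectory here.
    work_dir="$scratch_base/$(id -un)/job_${job_tag}_$$"
    mkdir -p "$work_dir" "$result_dir" || return 1

    # Run the program with the local directory as its working directory.
    ( cd "$work_dir" && "$@" )
    status=$?

    # Compress the output and copy it back in one step.
    tar -czf "$result_dir/results_${job_tag}.tar.gz" -C "$work_dir" . || status=1

    # Local storage is not backed up, so do not leave data behind.
    rm -rf "$work_dir"
    return "$status"
}

# Only run when started by Grid Engine. Paths and program are hypothetical.
if [ -n "${JOB_ID:-}" ]; then
    run_with_local_scratch /data/private/staff "$HOME/cluster_results" ./my_program
fi
```

Research students would substitute /data/private/pg for /data/private/staff. Archiving before copying means a single large file crosses the network rather than many small ones.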
The cluster should be viewed as a "black box" where users only connect to the master node. The scheduling system, Grid Engine, has been installed to handle all the details of scheduling jobs to run on the clients. Users are not permitted to log in directly to the client nodes to execute jobs.
Access to the Cluster
The research cluster is primarily intended for use by staff and research students. However, final year/MSc project students may use the cluster with the permission of their project supervisor. Irrespective of whether a user is a member of staff or a student, all users must register in order to be granted access to the cluster. To register, staff and research students should contact the IT Service Desk while final year/MSc project students should apply through their project supervisor.
Master Node
The master node, named cluster1, is directly accessible from the School's network using ssh and behaves in a very similar way to the standard Linux configuration.
Running Jobs
Grid Engine Clustering Software has been installed on the cluster for submitting and managing jobs. Users may only submit jobs using these tools.
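As a rough illustration of the submission workflow, the sketch below writes a minimal job script and hands it to qsub, the Grid Engine submission command. The file name hello_job.sh, the job name and the choice of options are assumptions made for this example rather than local policy. The submission step is skipped when qsub is not on the PATH, so the snippet is harmless when run away from the master node.

```shell
#!/bin/sh
# Write a minimal Grid Engine job script (hello_job.sh is a hypothetical name).
# Lines beginning with "#$" are read by qsub as command-line options.
cat > hello_job.sh <<'EOF'
#!/bin/sh
# Job name, as shown by qstat.
#$ -N hello_job
# Run the job from the directory it was submitted from.
#$ -cwd
# Merge standard error into standard output.
#$ -j y
# Interpret this script with /bin/sh.
#$ -S /bin/sh
echo "Job ${JOB_ID:-unknown} running on $(hostname)"
EOF

# Submit the script and show the queue, but only where Grid Engine exists.
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh    # reports the job ID assigned by the scheduler
    qstat                # shows the state of queued and running jobs
else
    echo "qsub not found: run this on the master node (cluster1)"
fi
```

With these options the job's output is written to a file named hello_job.o followed by the job ID, in the directory the job was submitted from. A job that is no longer wanted can be removed with qdel followed by its job ID.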
More information about using Grid Engine can be found in the Grid Engine documentation.
Users whose jobs generate large data files are advised to write this data locally to disk (to /data/private/pg or /data/private/staff). Writing large amounts of data to the file server can flood the network with traffic and can result in impaired performance. Once the jobs have completed, the local data can be processed, compressed and copied back to the file server.
Parallel Jobs
The cluster is an ideal environment to run parallel jobs. Both MPICH and LAM/MPI have been installed on the cluster and integrated with Grid Engine. See the Grid Engine documentation for more information.
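A parallel job is normally requested through a Grid Engine parallel environment. The sketch below writes an illustrative MPICH job script. The parallel environment name mpich, the slot count of 8, the program name ./my_mpi_program and the machine file location are all assumptions that should be checked against the local configuration; qconf -spl lists the parallel environments that are actually defined. LAM/MPI jobs are launched differently and are not covered by this sketch.

```shell
#!/bin/sh
# Write an illustrative MPI job script (mpi_job.sh is a hypothetical name).
cat > mpi_job.sh <<'EOF'
#!/bin/sh
#$ -N mpi_job
#$ -cwd
#$ -j y
#$ -S /bin/sh
# Request 8 slots from a parallel environment assumed to be called "mpich".
#$ -pe mpich 8
# NSLOTS is set by Grid Engine to the number of slots actually granted.
# $TMPDIR/machines is where the stock MPICH integration scripts write the
# host list; treat this path as an assumption about the local setup.
mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" ./my_mpi_program
EOF

# Submit only where Grid Engine is available.
if command -v qsub >/dev/null 2>&1; then
    qsub mpi_job.sh
else
    echo "qsub not found: run this on the master node (cluster1)"
fi
```

Using NSLOTS rather than a hard-coded process count keeps the mpirun line consistent with whatever number of slots the -pe request specifies.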