Thomas

Contact & Support
For support for any of our services or for general advice and consultancy, email:
rc-support@ucl.ac.uk


Thomas is the UK National Tier 2 High Performance Computing Hub in Materials and Molecular Modelling.

Applying for an account

Thomas accounts belong to you as an individual and are applied for through your own institution's Point of Contact. You will need to supply an SSH public key, because key-based SSH is the only way to log in.

Creating an ssh key pair

An ssh key pair consists of a private part and a public part. For an RSA key these are named id_rsa (private) and id_rsa.pub (public) by default. We only need the public part. You must not share your private key with anyone else. You can copy your private key onto multiple machines belonging to you so you can log in from all of them, or you can have a separate key pair for each machine.

Creating an ssh key in Linux/Unix/Mac OS X

ssh-keygen -t rsa

The defaults should give you a reasonable key. If you prefer, you can choose another key type such as DSA, ECDSA or ED25519 with -t, or a longer key with -b. You can also give the key a different filename with -f so that it does not overwrite any existing key.

You will be asked to add a passphrase for your key. A blank passphrase is not recommended; if you do leave it blank, make sure that no one else ever has access to your local computer account. How often you are asked for the passphrase depends on how long your local ssh agent keeps it.

You may need to run ssh-add to add the key to your agent so you can use it.
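As a sketch of the options described above, the following creates an ED25519 key under its own filename, loads it into your agent and prints the public part. The filename id_ed25519_thomas is only an illustrative choice, not a name Thomas requires.

```shell
# Create an ED25519 key pair with its own filename so that an existing
# id_rsa key is not overwritten. ssh-keygen prompts for a passphrase.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_thomas

# Add the private key to your running ssh agent.
ssh-add ~/.ssh/id_ed25519_thomas

# Print the public key: this is the only part you supply for your account.
cat ~/.ssh/id_ed25519_thomas.pub
```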

Creating an ssh key in Windows

Have a look at Key-Based SSH Logins With PuTTY, which has step-by-step instructions. You can choose whether or not to use Pageant to manage your key. Again, you can pick RSA, DSA, ECDSA, etc., but do not pick SSH-1, which is a very old and insecure key type.

Information for Points of Contact

Points of Contact have some tools they can use to manage users and allocations, documented at Points of Contact.

Logging in

You will be assigned a personal username and your SSH key pair will be used to log in. External users will have a username in the form mmmxxxx and UCL users will use their central username.

SSH timeouts

Idle ssh sessions will be disconnected after 7 days.

Using the system

Thomas is a batch system. The login nodes allow you to manage your files, compile code and submit jobs. Very short (under 15 minutes), non-resource-intensive software tests can be run on the login nodes, but anything more should be submitted as a job.

Full user guide

Thomas has the same user environment as RC Support's other clusters, so the User guide applies here and is a good starting point for further information about how the environment works. Any variations specific to Thomas should be listed on this page.

Submitting a job

Create a jobscript for non-interactive use and submit it using qsub. Jobscripts must begin with #!/bin/bash -l so that they run as a login shell and pick up your login environment and modules.
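A minimal jobscript might look like the sketch below. The resource directives (-l h_rt, -l mem, -pe mpi) and the gerun MPI launcher are assumed to follow the conventions in the User guide for RC Support's other clusters, so check them there; my_mpi_job and my_program are hypothetical placeholders.

```shell
#!/bin/bash -l
# Minimal MPI jobscript sketch. The directive names below (h_rt, mem, mpi)
# are assumed to follow the User guide for RC Support's other clusters.

# Wallclock limit as hours:minutes:seconds (the Thomas maximum is 48 hours).
#$ -l h_rt=2:00:00

# Memory per core, not per job: 48 cores x 4G = 192G in total.
#$ -l mem=4G

# 48 cores, i.e. two full 24-core nodes.
#$ -pe mpi 48

# Job name (hypothetical), and run in the directory you submitted from.
#$ -N my_mpi_job
#$ -cwd

# Load whatever modules your program needs, for example:
# module load <module_name>

# Launch the program with the gerun MPI wrapper (assumed; my_program is hypothetical).
gerun ./my_program
```

Save it as, for example, myjob.sh and submit it with qsub myjob.sh.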

Memory requests

Note: the memory you request is always per core, not the total amount. If you ask for 128G RAM and 24 cores, you are requesting 128G for each core (3072G in total), so the job will run on 24 nodes using only one core per node. This allows sparse process placement when you really do need that much RAM per process.
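The per-core rule above can be turned into a rough estimate of how many nodes a request will occupy. This sketch assumes a simplified model in which each node offers exactly 24 cores and 128 GB, ignores any memory reserved for the operating system, and accepts whole gigabytes only; nodes_for_request is a hypothetical helper, not an installed tool.

```shell
# Simplified node model (assumption): 24 cores and 128 GB per node.
CORES_PER_NODE=24
NODE_MEM_GB=128

# Usage: nodes_for_request <cores> <mem_per_core_gb>
# Prints the number of nodes the request would occupy.
nodes_for_request() {
    local cores=$1 mem_per_core=$2
    if (( mem_per_core < 1 || mem_per_core > NODE_MEM_GB )); then
        echo "per-core memory must be between 1 and ${NODE_MEM_GB} GB" >&2
        return 1
    fi
    # How many cores' worth of memory fit on one node, capped at 24 cores.
    local fit=$(( NODE_MEM_GB / mem_per_core ))
    if (( fit > CORES_PER_NODE )); then
        fit=$CORES_PER_NODE
    fi
    # Round up: a partly used node is still a whole node.
    echo $(( (cores + fit - 1) / fit ))
}
```

For example, nodes_for_request 24 128 prints 24, matching the case described above, while nodes_for_request 48 4 prints 2.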

Monitoring a job

In addition to qstat, nodesforjob $JOB_ID can be useful to see what proportion of cpu/memory/swap is being used on the nodes a certain job is running on.

qexplain $JOB_ID will show you the full error for a job that is in Eqw status.
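If several jobs are stuck, you can pick the Eqw ones out of qstat's output and pass each one to qexplain. This is a sketch that assumes qstat's default Grid Engine layout (two header lines, then the job ID in the first column and the state in the fifth); eqw_jobs is a hypothetical helper name.

```shell
# Print the IDs of jobs in Eqw state, reading qstat output on stdin.
# Assumes the default layout: two header lines, job ID in column 1,
# state in column 5.
eqw_jobs() {
    awk 'NR > 2 && $5 == "Eqw" { print $1 }'
}

# On Thomas you could then explain each one:
#   qstat | eqw_jobs | while read -r id; do qexplain "$id"; done
```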

Useful utilities

As well as nodesforjob, there are the following utilities which can help you find information about your jobs after they have run.

  • jobhist - shows your job history for the last 24 hours by default, including start and end times and the head node each job ran on. You can view a longer history by specifying, for example, --hours=100.
  • scriptfor $JOB_ID - shows the script that was submitted for the given job.

These utilities are available on GitHub at https://github.com/UCL-RITS/go-clustertools and https://github.com/UCL-RITS/rcps-cluster-scripts

Queue names

On Thomas, users do not submit directly to queues: the scheduler assigns your job to one based on the resources it requested. The queues have somewhat unorthodox names because they are only used internally. This is what they mean:

  • Jerry: single-node job
  • Tom: multi-node job
  • Spike: cross-CU job, using the superqueue (any multi-node job may end up using this)


Software

Thomas mounts the RC Systems software stack.

Have a look at Applications for specific information on running some applications, including example scripts. The list there is not exhaustive.

Access to software is managed through the use of modules.

  • module avail shows all modules available.
  • module list shows modules currently loaded.

Access to licensed software may vary based on your host institution and project.

Requesting software installs

To request software installs, email us at the support address below or open an issue on our GitHub. You can see what software has already been requested in the GitHub issues, and can add a comment if you are also interested in something already requested.

Installing your own software

You may install software in your own space. Please look at Compiling for tips.

Maintaining a piece of software for a group

People can be given central areas in which to install software that they wish to make available to everyone or to a select group, generally because they are its developers or because they want to use multiple versions or developer versions. The people given install access are then responsible for managing and maintaining these installs.

Licensed software

Reserved application groups exist for software that requires them. The group name will begin with leg or lg. After we add you to one of these groups, the change to your central group membership takes effect overnight. You can check your groups with the groups command.

  • CASTEP: you or your group leader need to have signed up for a CASTEP license. Send us an acceptance email, or we can ask them to verify that you have a license. You will then be added to the reserved application group legcastep.
  • DL_POLY: has individual licenses for specific versions. Sign up at DL_POLY's website and send us the acceptance email they give you. We will add you to the appropriate version's reserved application group, e.g. lgdlp408.
  • Gaussian: not currently accessible for non-UCL institutions. UCL having a site license and another institution having a site license does not allow users from the other institution to run Gaussian on UCL-owned hardware.
  • VASP: we are not managing licenses for non-UCL institutions. You may install your own copy in your home directory, and we provide a simple build script on GitHub. Download the VASP source code and then run the script following the instructions at the top.
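To check whether a reserved application group has taken effect, you can filter your group list. This sketch uses id -nG, which prints the same group names as groups; reserved_app_groups and in_group are hypothetical helper names.

```shell
# Print any reserved application groups (names starting with "leg" or "lg")
# that your account currently belongs to, one per line.
reserved_app_groups() {
    id -nG | tr ' ' '\n' | grep -E '^(leg|lg)' || true
}

# Succeed only if you are a member of the named group, e.g. legcastep.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example on Thomas:
#   in_group legcastep && echo "CASTEP group membership is active"
```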



Suggested job sizes

The target job sizes for Thomas are 48-120 cores (2-5 nodes). Larger jobs may have a longer queue time and are better suited to ARCHER, while single-node jobs may be better suited to your local facilities.


Maximum job resources

Cores    Max wallclock
864      48 hours

On Thomas, interactive sessions using qrsh have the same wallclock limit as other jobs.
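For example, an interactive session on one 24-core node might be requested as below. The resource names are assumed to match those used in jobscripts on RC Support's clusters, and the memory and time values are arbitrary; -now no asks qrsh to wait in the queue rather than fail if no node is free immediately.

```shell
# Interactive session: 24 cores, 4G per core, 2 hours (values are examples).
qrsh -pe mpi 24 -l mem=4G,h_rt=2:00:00 -now no
```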

Each node in Thomas has 24 cores and 128G RAM. The default maximum job size is 864 cores (36 nodes), to remain within the 36-node 1:1 nonblocking interconnect zones.

Jobs on Thomas do not share nodes. This means that if you request fewer than 24 cores, your job still takes up an entire node and no other jobs can run on it, but some of the cores are idle. Whenever possible, request a multiple of 24 cores so that your nodes are fully used.
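The arithmetic behind this advice is easy to check before submitting. This sketch assumes 24-core nodes that are never shared and the default 864-core limit; node_usage is a hypothetical helper.

```shell
# Usage: node_usage <cores>
# Prints "<nodes> <idle_cores>" for a request on 24-core, unshared nodes,
# and warns on stderr if the request exceeds the default 864-core maximum.
node_usage() {
    local cores=$1 per_node=24 max_cores=864
    local nodes=$(( (cores + per_node - 1) / per_node ))
    local idle=$(( nodes * per_node - cores ))
    echo "$nodes $idle"
    if (( cores > max_cores )); then
        echo "warning: $cores cores exceeds the default maximum of $max_cores" >&2
    fi
}
```

For example, node_usage 30 prints 2 18: the job occupies two nodes and leaves 18 cores idle.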

There is a superqueue for use in exceptional circumstances that will allow access to a larger number of cores outside the nonblocking interconnect zones, going across the 3:1 interconnect between blocks. A third of each CU is accessible this way, roughly approximating a 1:1 connection. Access to the superqueue for larger jobs must be applied for: contact the support address below for details.

Some normal multi-node jobs will also use the superqueue. This makes it easier for larger jobs to be scheduled, since otherwise they can face very long waits whenever every CU is half full.


Budgets and allocations

We have enabled Gold for allocation management. Jobs that are run under a project budget have higher priority than free non-budgeted jobs.

To see the name of your project(s) and how much allocation that budget has, run the command budgets.

budgets
Project  Machines Balance  
-------- -------- -------- 
UCL_Test ANY      22781.89

Pilot users were added to a project for their institution, e.g. Imperial_pilot.
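If you want the balance of a single project, for example inside a script, you can pick it out of the budgets output. This sketch assumes the three-column layout shown in the budgets example above (two header lines, then Project, Machines and Balance); balance_for is a hypothetical helper.

```shell
# Print the balance for one project, reading budgets output on stdin.
# Assumes two header lines, then Project / Machines / Balance columns.
balance_for() {
    awk -v project="$1" 'NR > 2 && $1 == project { print $3 }'
}

# Example on Thomas:
#   budgets | balance_for UCL_Test
```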

Submitting a job under a project budget

To submit a job under a budget, add this to your jobscript:

#$ -P Gold
#$ -A MyProject 

(Gold will become the default later, at which point you would use -P AllUsers to submit a free, low-priority job.)


Support

Email rc-support@ucl.ac.uk with any support queries. It will be helpful to include Thomas in the subject along with some descriptive text about the type of problem, and you should mention your username in the body.


Acknowledging the use of Thomas in publications

All work arising from this facility should be properly acknowledged in presentations and papers with the following text:

"We are grateful to the UK Materials and Molecular Modelling Hub for computational resources, which is partially funded by EPSRC (EP/P020194/1)"
