FAQ

FAQ

Quick Links

Contact & Support
For support for any of our services or for general advice and consultancy, email:
rc-support@ucl.ac.uk


This page attempts to address some of the topics we most frequently receive questions about, or to which the answers are most complex.

Why is my job in Eqw status?

If your job goes straight into Eqw state, there was an error in your jobscript that meant your job couldn't be started. You can see a truncated version of the error with

qstat -j <job_ID>

To see the full error (in most cases you won't need to load userscripts as it is a default module now):

module load userscripts
qexplain <job_ID>

This often happens because a file or directory you are trying to use doesn't exist.

"Unable to determine job requirements" error

Unable to run job: Rejected by ucl_jsv4h Reason:Unable to determine job requirements. Exiting.

The #$ directives are missing from your script, or have extra white space before them. This means SGE isn't picking them up and doesn't know what resources you are requesting. Add them or remove the spaces and it will work.

"/bin/bash: invalid option" error

This is a sign that your jobscript is a DOS-formatted text file and not a Unix one - the line break characters are different. Type dos2unix <yourscriptname> in your terminal to convert it.

Sometimes the offending characters will be visible in the error. You can see here it's trying to parse ^M as an option.


/bin/bash: -^M: invalid option
Usage:  /bin/bash [GNU long option] [option] ...
       /bin/bash [GNU long option] [option] script-file ...



Your Scratch space goes missing

You may have accidentally deleted or replaced the link to your Scratch space. Do an ls -al in your home - if set up correctly, it should look like this:

lrwxrwxrwx   1 username private       24 Apr 14  2014 Scratch -> /scratch/scratch/username

where username is your UCL user ID. You can recreate the symlink with

ln -s /scratch/scratch/username Scratch

Which MKL library files should I use to build my application?

Depending on which whether you wish to use BLAS/LAPACK/ScaLAPACK/etc... there is a specific set of libraries that you need to pass to your compilation command line. Fortunately, Intel have released a tool that allows you to determine which libraries to link and in which order for a number of compilers and operating systems (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/). See also MKL_on_Legion

SSH known_hosts

1. If you get a warning when sshing in, check Summary of Legion changes. You probably need to delete old host keys from your ~/.ssh/known_hosts.

2. If you look in the error file for your job, you may find a number of errors like the one below. Please ignore these as they are the result of compute nodes being unable to write to your home directory and do not indicate a problem.


Failed to add the RSA host key for IP address '10.143.9.1' to the list of known hosts (/home/uccaoke/.ssh/known_hosts)



"ssh: Unsupported option - -x" errors

These errors indicate that you are attempting to use the QLogic version of mpirun in the OpenMPI parallel environment. It is likely you are doing this by accident and probably intend to use the OpenMPI mpirun but do not have your modules configured correctly.

Please add the lines below, either after default modules (defmods) are loaded in your .bashrc, or else in your job script before mpirun:


module remove mpi/qlogic/1.2.7/gnu
module add mpi/openmpi/1.4.1/intel



Note: the above assumes you are using the Intel compilers.

You get "Program not started through mpirun. Exiting..." but are using mpirun!

This is most often caused by launching a program built with QLogic MPI with the mpirun from another MPI implementation (e.g. OpenMPI). You can determine which version of MPI your program was built with by running ldd on the application binary.

You want to know where the libraries loaded via modules system are on disk

Look at the contents of the default modules to find the path to those libraries on the current system. Look at the following command listing:


[user@login07 ~]$ module avail hdf
---------------------------------- /shared/ucl/apps/modulefiles/libraries ----------------------------------
hdf/5-1.8.15/gnu-4.9.2                  hdf/5-1.8.15-p1-impi/intel-2015-update2
hdf/5-1.8.15/intel-2015-update2         hdf/5-1.8.15-p1-ompi/gnu-4.9.2
[user@login07 ~]$ module show hdf/5-1.8.15/gnu-4.9.2
-------------------------------------------------------------------
/shared/ucl/apps/modulefiles/libraries/hdf/5-1.8.15/gnu-4.9.2:
module-whatis    adds HDF5 1.8.15 (Serial) for GCC 4.9.2 to your environment. 
conflict         hdf 
prereq   gcc-libs/4.9.2 
prepend-path     PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/bin 
prepend-path     LD_RUN_PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/lib 
prepend-path     LD_LIBRARY_PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/lib 
prepend-path     LIBRARY_PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/lib 
prepend-path     CPATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/include 
prepend-path     INCLUDE_PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2/include 
prepend-path     CMAKE_PREFIX_PATH /shared/ucl/apps/HDF/5-1.8.15-gcc.4.9.2 
-------------------------------------------------------------------



As you can see, the modules system sets the paths to libraries in environment variables which the system uses to locate files.

Unable to run job: JSV stderr: perl: warning: Setting locale failed.

This error is generally because your SSH client is passing LANG through as part of the SSH command, and is passing something that conflicts with what Legion has it set to. You may be more likely to come across this with newer versions of Mac OS X - if your client is different, have a look for an equivalent option.

In Mac OS X Terminal, click Settings and under International untick the box that says "Set locale environment variables on startup".

Per session, you can try LANG=C ssh userid@legion.rc.ucl.ac.uk

Why can't I find out when my job will run?

An informative discussion on this matter can be found in the Legion Scheduler section of the User Guide.

What can I do to minimise the time I need to wait for my job(s) to run?

  1. Minimise the amount of wall clock time you request.
  2. Use job arrays instead of submitting large numbers of jobs (see our job script examples).
  3. Plan your work so that you can do other things while your jobs are being scheduled - the rule of thumb is that you will have to wait about twice the requested wall clock time (on average).

What is my project code (short string) / project ID?

Prior to July 2014, every user had a project code. Now all users belong to the default project "AllUsers" and no longer have to specify this. If you see older job script examples mentioning a project ID, you can delete that section. Only projects with access to paid or specialised resources need to give a project code in order to use those resources. If you do not know yours, contact rc-support.

}}