Access to cluster resources is managed using a combination of the Torque resource manager and the MOAB scheduler.
Submitting Jobs
The easiest way to submit jobs to the cluster is by using two commands:
- Iqsub for submitting interactive jobs
- Bqsub for submitting batch jobs
You can run either of these from the command line without any arguments to see syntax information, including examples. Extra batch interpreters can be added to Bqsub at users' request.
To check the state of your jobs, you can use either the qstat or the showq command.
For more sophisticated submissions, see man qsub and use the qsub command directly.
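As a sketch of a direct qsub submission, a minimal Torque batch script might look like the following; the job name, resource values, and program are placeholders for illustration, not site defaults:

```shell
#!/bin/bash
#PBS -N my_job             # job name (placeholder)
#PBS -l nodes=1:ppn=4      # 1 node, 4 cores per node; adjust to your job's needs
#PBS -l walltime=02:00:00  # requested wall-time (hh:mm:ss)
#PBS -j oe                 # merge stdout and stderr into a single output file

cd "$PBS_O_WORKDIR"        # start in the directory the job was submitted from
./my_program               # placeholder for the actual work
```

Save this as, e.g., job.pbs, submit it with qsub job.pbs, and then monitor it with qstat or showq.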
Deleting Jobs
To delete a queued or running job, use the qdel command with the job number as reported by qstat or showq. E.g., to delete job 99999 (99999.admin) one would run the following:
qdel 99999.admin
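A typical sequence, sketched under the assumption that your username is jdoe (a placeholder), is to list your own jobs first and then delete by the reported ID:

```shell
qstat -u jdoe      # list only jdoe's jobs together with their job IDs
qdel 99999.admin   # delete the job using the full ID reported above
```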
Note! Deleting a running MPI or other parallel job might leave some of the workers in the parallel pool behind in a so-called zombie state. If in doubt, contact the system administrator.
Resource Limits
First of all, all cluster users and groups are equally privileged to run cluster jobs within the resource limits; the fair-share policies do not make any distinction among them. If temporary extra privileges were to be granted to any user or group, the request would first be discussed with the cluster's principal users (such as faculty or group leaders).
The following limits are imposed on all users to prevent any one of them from monopolizing cluster resources. They are subject to change depending on resource utilization and cluster load, but only after consultation with the cluster's principal users.
Resource limits per user:
- 14 nodes on an under-utilized cluster, down to 9 under 100% cluster load
- 280 cores on an under-utilized cluster, down to 180 under 100% cluster load
- 14 scheduled jobs on an under-utilized cluster, down to 9 under 100% cluster load
- 20 days maximum wall-time in processing queues
Note! Because of scheduler properties, 100% utilization is not necessarily reflected in the instantaneous output of the showq or qstat commands, since the scheduler also takes into account the future needs of queued jobs.
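For illustration only: the per-user limits above (280 cores across 14 nodes, which suggests 20 cores per node, an assumption not stated in this document) could be translated into a maximal single-job request roughly like this; job.pbs is a placeholder script:

```shell
# Assuming 20-core nodes; stays within the 14-node / 280-core / 20-day per-user limits.
qsub -l nodes=14:ppn=20 -l walltime=480:00:00 job.pbs   # 480 h = 20 days
```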
Sharing the Cluster
Understand that cluster resources are limited; please make every effort not to abuse them or to prevent other users from using them. The bottom line is that allocated cluster resources should not sit idle.
Please:
- do not leave interactive jobs running while you are not using the allocated resources,
- allocate only as many resources as necessary to do the job,
  - e.g. do not request multiple cores/nodes for inherently serial jobs,
- when requesting wall-time for a job, make an educated guess at how long the job will run and add no more than 50% to your estimate to account for possible error; this allows both the scheduler and other users to plan their work better.
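The 50% rule above can be sketched numerically. For instance, with an estimated runtime of 10 hours (a made-up figure), one would request at most 15 hours:

```shell
EST_HOURS=10                                 # your runtime estimate (hypothetical)
REQ_HOURS=$(( EST_HOURS + EST_HOURS / 2 ))   # add a 50% safety margin
echo "walltime=${REQ_HOURS}:00:00"           # prints walltime=15:00:00; pass via qsub -l walltime=...
```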