Access to cluster resources is managed using a combination of the Torque resource manager and the MOAB scheduler.

Submitting Jobs

The easiest way to submit jobs to the cluster is by using two commands:

  • Iqsub for submitting interactive jobs
  • Bqsub for submitting batch jobs

You can run either of them from the command line without any arguments to see syntax information, including examples. Additional batch interpreters can be added to Bqsub at users' request.

To check the state of your jobs, you can use either the qstat or the showq command.
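
For example, both commands accept a standard user filter to limit the output to your own jobs (a minimal sketch; replace $USER with your username if the variable is not set):

    qstat -u $USER     # Torque's view of your jobs
    showq -u $USER     # MOAB's view, with its active/eligible/blocked grouping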

For more sophisticated submissions, see man qsub and use the qsub command directly.
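
As a minimal sketch of a direct submission, the batch script below uses standard Torque directives; the job name, resource amounts, and application name are placeholders, not site defaults:

    #!/bin/bash
    #PBS -N example_job            # job name shown in qstat/showq
    #PBS -l nodes=2:ppn=8          # 2 nodes with 8 cores each (placeholder amounts)
    #PBS -l walltime=02:00:00      # 2-hour wall-time limit
    #PBS -j oe                     # merge stderr into stdout

    cd $PBS_O_WORKDIR              # directory from which qsub was invoked
    ./my_app                       # placeholder for the actual program

Saved as, say, example.pbs, the script would then be submitted with qsub example.pbs.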

Deleting Jobs

To delete a queued or running job, use the qdel command with the job number as reported by qstat or showq. For example, to delete job 99999 (99999.admin), one would run the following:

    qdel 99999.admin

Note! Deleting a running MPI or other parallel job might leave some of the workers in the parallel pool behind in a so-called zombie state. If in doubt, contact the system administrator.

Resource Limits

First of all, all cluster users and groups are equally privileged to run cluster jobs within the resource limits; the fair-share policies make no distinction among them. If temporary extra privileges were to be granted to any user or group, they would first be discussed with the cluster's principal users (such as faculty or group leaders).

The following limits are imposed on all users to prevent any one of them from monopolizing cluster resources. They are subject to change depending on resource utilization and cluster load, but only after consultation with the cluster's principal users.

Resource limits per user:

  • 14 nodes on an under-utilized cluster, down to 9 under 100% cluster load
  • 280 cores on an under-utilized cluster, down to 180 under 100% cluster load
  • 14 scheduled jobs on an under-utilized cluster, down to 9 under 100% cluster load
  • 20 days maximum wall-time in the processing queues
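
For illustration, a single job at the per-user ceiling on an under-utilized cluster might be requested as follows. This is a sketch, not a recommended request, and it assumes 20-core nodes (280 cores spread over 14 nodes):

    # 14 nodes x 20 cores with the 20-day (480-hour) wall-time maximum
    qsub -l nodes=14:ppn=20,walltime=480:00:00 job.pbs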

Note! Because of how the scheduler works, 100% utilization is not necessarily reflected in the instantaneous output of the showq or qstat commands, since the scheduler also takes into account the future needs of queued jobs.
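
If the full MOAB client tool set is installed alongside showq (an assumption; check with the system administrator), the showstart command reports the scheduler's current start-time estimate for a queued job:

    showstart 99999    # estimated start time for queued job 99999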

Sharing the Cluster

Cluster resources are limited, so please make every effort not to abuse them or to prevent other users from using them. The bottom line is that allocated cluster resources should not sit idle.

Please:

  • do not leave interactive jobs running while you are not using the allocated resources,
  • allocate only as many resources as are necessary to do the job,
    • e.g. do not request multiple cores/nodes for inherently serial jobs.
  • when requesting wall-time for a job, make an educated guess at how long the job will run and add no more than 50% to that estimate to account for possible error (see the example below); this allows both the scheduler and other users to plan their work better.
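
As a concrete sketch of that rule of thumb: for a job you estimate at 24 hours, request no more than 36 hours, e.g. via the standard Torque wall-time directive in your batch script:

    #PBS -l walltime=36:00:00    # 24-hour estimate + 50% safety margin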