Running Jobs

All of our supercomputers use a system called PBS to make sure that everyone’s programs have the resources they need (mostly CPU cores and memory) and that no one takes more than their fair share.

Anything that runs for more than a few seconds should be run inside a PBS job. This can be done by using a PBS script or an interactive PBS session.

Example PBS jobs for many of our programs can be found in /usr/local/apps/example_jobs. PBS scripts are submitted by using the qsub command. The status of PBS jobs can then be checked by using the qstat command.
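As a rough illustration, a minimal PBS script might look like the sketch below. The job name, the resource values, and the program mentioned in the final comment are placeholders chosen for this example, not site defaults; the scripts in /usr/local/apps/example_jobs show the settings that actually apply to each program.

```shell
#!/bin/bash
# Lines beginning with "#PBS" are read by qsub as job options.
# The values below are illustrative assumptions, not site defaults.
#PBS -N example_job
#PBS -l ncpus=1
#PBS -l mem=1gb
# Merge standard error into the standard output file.
#PBS -j oe

# PBS starts a job in your home directory; PBS_O_WORKDIR holds the
# directory qsub was run from. Fall back to "." when run outside PBS.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS_JOBID is set by PBS inside a job; "none" is used outside PBS.
job_id="${PBS_JOBID:-none}"
job_host="$(uname -n)"
echo "Job ${job_id} running on ${job_host}"

# Replace this comment with the real work, e.g. ./my_program (hypothetical).
```

If this were saved as example_job.pbs (a hypothetical filename), it would be submitted with "qsub example_job.pbs", and "qstat -u $USER" would then list your own jobs and their states.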

When running a one-off job or developing a PBS script, it is convenient to be able to run commands directly inside the PBS environment. Interactive PBS sessions can be started using qsub with the -I option. By default, interactive sessions are allocated a single CPU and an amount of RAM that varies by system (900MB on sequoia and 1GB on catalpa). If you need more resources than the default, you can request them. For instance, to request four CPUs and five GB of RAM, you would run:
qsub -I -lncpus=4 -lmem=5gb

When a job is submitted, PBS checks to see if the resources required by the job are available. If so, the job is allowed to run. If not, the job is queued until the resources are available. If your job requires a lot of resources, it’s possible that jobs submitted after yours will run first because they require fewer resources.

It’s important to give some thought to how you break up your jobs and how many resources your job really needs. Several small jobs will usually start running sooner and finish faster than a single large job. A job might run for less time with 32 CPU cores, but with only 16 it could start sooner and, once time spent waiting in the queue is counted, possibly finish sooner as well. It’s often worthwhile to see what resources are available and how many other jobs are queued before submitting your job.
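One way to gauge how busy a system is before submitting is to count the jobs that are currently waiting. The sketch below assumes the default qstat output layout, with two header lines and the job state in the fifth column; that layout can differ between PBS versions, so check it against what qstat prints on the system you are using. The function name count_queued is made up for this example.

```shell
#!/bin/bash
# Count jobs in the queued (Q) state, reading default-format qstat
# output on standard input. Assumes two header lines and the job
# state in column 5; adjust if your qstat output is laid out differently.
count_queued() {
    awk 'NR > 2 && $5 == "Q" { n++ } END { print n + 0 }'
}

# Only call qstat where it is installed, so the sketch is harmless elsewhere.
if command -v qstat >/dev/null 2>&1; then
    echo "Queued jobs: $(qstat | count_queued)"
fi
```

For a broader picture, "qstat -Q" summarizes each queue and "pbsnodes -a" lists the nodes along with the resources assigned on each.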

See the Maple page for information on allocating GPUs.