Interactive vs batch jobs
There are two basic ways to run jobs on the cluster -- interactively or via a batch script.
What is an interactive job?
Interactive jobs are the simplest way to use the cluster. You log in, run commands which execute immediately, and log off when you’re finished. You can use either the command line or a graphical environment when running an interactive job.
When should you use an interactive job?
- Short tasks
- Tasks that require frequent user interaction
- Graphically intensive tasks
Submitting an interactive job
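To submit an interactive job, ssh to compute.cla.umn.edu and enter “qsub -IX” on the command line. The -I flag requests an interactive job and the -X flag enables X11 forwarding for graphical programs. The scheduler then places you on a compute node where you can run your commands.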
$ ssh -X compute.cla.umn.edu
$ qsub -IX
What is a batch job?
A batch job involves writing a script that specifies the tasks that you want to run. The script is then submitted to the job scheduler and runs without user interaction. This is an efficient way to use the cluster: once your batch job has been submitted, you can log off and wait for the job to complete.
When should you use a batch job?
- Longer running processes
- Parallel processes
- Running large numbers of short jobs simultaneously
- Tasks that can be left running for a significant amount of time without any interaction
Submitting a batch job
The easiest way to submit a batch job is to first create a Portable Batch System (PBS) script which defines the commands and cluster resources that will be needed to run the job. This script is then submitted to PBS using the qsub command.
Creating a PBS Script
A PBS script is a file that contains the PBS directives that set the parameters for your job, followed by the commands to be executed. You can find a list of some of the more commonly used PBS directives in the PBS Parameters section on this page.
Here is a sample PBS file, named myjob.pbs, followed by an explanation of each line of the file:
#PBS -S /bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00
#PBS -l mem=500mb
cd $PBS_O_WORKDIR
module load stata/15
stata-se -b do test.do
- The first line in the file identifies which shell will be used for the job. In this example, bash is used but tcsh or other valid shells would also work.
- The second line specifies which queue to use. In this case, we are submitting the job to the “batch” queue.
- The third line specifies the number of nodes and processors desired for this job. In this example, one node with two processors is being requested. Note: The “-l” flag is an “el” (for “resource_list”), not a “one”. There are no spaces around the “=” and “:” signs.
- The fourth line specifies how much wall-clock time is being requested. The format for the walltime option is "hh:mm:ss" so the 01:00:00 in the above example denotes that one hour of walltime is being requested.
- The fifth line requests a maximum of 500 MB of physical memory. Note that on Linux, the “-l mem” directive is ignored if the number of nodes requested is not 1.
- The sixth line tells the cluster to cd to the directory from which the batch job was submitted. By default, a new job starts in your home directory. Including this line in your script makes it convenient to edit a script and then submit the job from the same directory.
- The seventh line loads the Stata 15 environment module in preparation for running a Stata script. More information on environment modules can be found on our Using Modules page.
- The last line tells the cluster to run the program. In this example, it runs Stata in batch mode (the -b option) on test.do, which is expected to be in the directory from which the job was submitted.
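The hh:mm:ss walltime format described in the list above is easy to get wrong when a run time is estimated in minutes. The following is a minimal sketch of a helper that converts a whole number of minutes into that format. The function name to_walltime is hypothetical and is not part of PBS.

```shell
#!/bin/bash
# to_walltime: hypothetical helper (not part of PBS) that converts a whole
# number of minutes into the hh:mm:ss string expected by "-l walltime=".
to_walltime() {
    local minutes=$1
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

to_walltime 60    # prints 01:00:00, the value used in myjob.pbs
to_walltime 150   # prints 02:30:00
```

The result can then be copied into the walltime line of the script, or passed on the command line with something like qsub -l walltime=$(to_walltime 90) myjob.pbs.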
Submitting the job
Once you have your job script ready to go, you will need to submit it to the cluster. To run the job, enter the following command on compute.cla.umn.edu:
$ qsub myjob.pbs
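qsub prints the identifier of the newly created job on standard output, so a small wrapper can capture it and pass it to qstat to check the job's status. This is a minimal sketch, assuming qsub and qstat are on your PATH (as they would be on compute.cla.umn.edu). The function name submit_and_show is hypothetical.

```shell
#!/bin/bash
# submit_and_show: hypothetical wrapper around qsub and qstat.
# Submits the given script, prints the job ID, and shows the job's status.
submit_and_show() {
    local script=$1
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on the cluster" >&2
        return 1
    fi
    local job_id
    job_id=$(qsub "$script") || return 1   # qsub writes the job ID to stdout
    echo "Submitted job: $job_id"
    qstat "$job_id"
}
```

After defining the function, a call such as submit_and_show myjob.pbs submits the script and immediately lists the job in the queue.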
Job Arrays
Job arrays can be used to simplify the task of submitting multiple similar jobs. For example, let’s say you want to run 10 jobs using the same script to analyze 10 different data files. Rather than submitting 10 jobs individually, you can use a job array to submit a single job. PBS will then create the 10 individual jobs using the requested script and file parameters. Please note that each job in the array is itself serial unless the code it runs is parallelized.
Submitting a Job That Uses an Array
An easy way to prepare your data files for a job array is to rename them by appending sequential numbers to each file name, such as data-1, data-2, etc. Here is an example of a PBS script that uses a job array to run a script 10 times, each time with a different input file, starting with data-1 and ending with data-10:
#PBS -S /bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00
#PBS -t 1-10
The -t parameter sets the range of the PBS_ARRAYID variable. In the above example, the range 1-10 causes qsub to create 10 jobs from the script, with PBS_ARRAYID set to a different value from 1 to 10 in each one. This results in 10 jobs in the job array, each using the same script with a different input data file from data-1 to data-10. The argument to the -t parameter can be a single integer id or a range of integers. Multiple ids or id ranges can be combined in a comma-delimited list (e.g., -t 1,10,50-100).
You can also limit the number of jobs that will run simultaneously by specifying the number of job slots that you want. For example, if you change the -t parameter in the above example to:
#PBS -t 1-10%5
you are specifying an array with 10 elements as before but the “%5” at the end tells the system that only 5 should be running at any one time. Limiting the number of simultaneous jobs in this manner can be useful when you are sharing limited cluster resources with others.
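Inside each job of the array, the script can read PBS_ARRAYID to select its input file. The following is a minimal sketch of a complete array script, assuming the -t option and PBS_ARRAYID variable described in the text. The program name analyze is hypothetical, and the fallback values exist only so the script can be tried by hand outside the scheduler.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00
#PBS -t 1-10%5

# Start in the submission directory; fall back to the current directory
# when the script is run by hand outside the scheduler.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# The scheduler sets PBS_ARRAYID for each job in the array (1 to 10 here).
# The default of 1 is only for trying the script outside the scheduler.
array_id=${PBS_ARRAYID:-1}
input_file="data-${array_id}"

echo "Array job ${array_id} will process ${input_file}"
# ./analyze "$input_file"   # hypothetical program; replace with your own command
```

Run by hand, the script reports data-1 as its input. Submitted with qsub, each of the 10 jobs would report its own file, with at most 5 running at once.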
PBS Parameters
This is a partial list of some of the more commonly used PBS parameters. A complete list can be found in the qsub man page or the Torque Administrator Guide.
|#PBS -N stata-test|
Sets the name of the job that will be seen in the qstat output. If not set, the name defaults to the name of the script.
|#PBS -o myprog.out|
Where to write stdout. Defaults to $PBS_JOBNAME.o$PBS_JOBID in the job submission directory.
|#PBS -e myprog.err|
Where to write stderr. Defaults to $PBS_JOBNAME.e$PBS_JOBID in the job submission directory.
|#PBS -o mylogs/|
Write stdout logs to the mylogs subdirectory of the job submission directory. You can also specify the filename, such as mylogs/myprog.out. Note that when specifying just a directory, you need to include the trailing slash.
|#PBS -e mylogs/|
Write stderr logs to the mylogs subdirectory of the job submission directory. As above, you can also specify the filename but will need to include the trailing slash when specifying just a directory.
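|#PBS -j $arg|
The argument to this directive determines how the standard error and standard output streams will be joined. The $arg argument can be one of the following:
- oe: both streams are merged into standard output
- eo: both streams are merged into standard error
- n: the streams are not joined (the default)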
|#PBS -M [emailprotected]|
Specifies the email address where PBS will send messages about the job. If unset, it defaults to [emailprotected], which is probably not what you want.
|#PBS -m $mail_options|
Defines the set of conditions under which the execution server will send a mail message about the job. The mail_options argument is a string which consists of either the single character "n" or one or more of the characters "a", "b", and "e".
#PBS -m abe # Send mail when the job aborts, begins, or ends.
|#PBS -S /bin/$shell|
Sets the shell to be used in executing your script. If left out, it defaults to your normal login shell. Typical values for the $shell argument are bash, tcsh, csh, or sh.
|#PBS -V|
Export all environment variables in the qsub command environment to the batch job environment.
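Several of these directives are commonly combined at the top of a script. The following sketch is illustrative rather than a recommended configuration: the job name and the mylogs/ directory are placeholder values, and mylogs/ is assumed to exist in the submission directory before the job is submitted. The fallback values in the last lines apply only when the script is run by hand.

```shell
#!/bin/bash
# Job name shown in qstat output.
#PBS -N stata-test
# Merge stderr into the stdout stream, and write the merged log to mylogs/.
#PBS -j oe
#PBS -o mylogs/
# Send mail when the job aborts, begins, or ends.
#PBS -m abe
# Export the environment of the qsub command to the job.
#PBS -V

# PBS_JOBNAME and PBS_JOBID are set by the scheduler; the fallbacks only
# apply when the script is run by hand outside a job.
job_label="${PBS_JOBNAME:-stata-test}.${PBS_JOBID:-local}"
echo "Starting $job_label"
```

Because -j oe merges the two streams, a separate -e line is not needed in this arrangement.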
Queues and Resource Limits
|Routing Queue||Queue||Access Control List||Direct Submission Allowed?||Interactive?|
*The batch and gpu queues are routing queues. Jobs submitted to these queues will be routed to the appropriate sub-queue based on the requested walltime, Procs, and RAM.
For access to highmem or multinode please email us at [emailprotected] with your needs.
The table below shows the default and maximum resources for each queue.
|Queue||Default Walltime||Default RAM (GB)||Default Procs (Cores)||Maximum Walltime||Maximum RAM (GB)||Maximum Procs (Cores)|