Why use the scheduler (PBS script)?
The Linux compute cluster is organized as one or a few public-facing login nodes (blackjack, etc.) and many identical worker nodes behind a private network switch. In our case we have about 60 worker nodes, organized into different groups; the largest (and default) group is the batch2 queue, with about 45 nodes.
The scheduler is the route to using the worker nodes. Mastering the syntax of the scheduler/PBS scripts can be demanding, but it is crucial to making use of the cluster as a clustered system. If you find that your needs are satisfied by running jobs only interactively on the head node, you should try to migrate to one of our shared-use non-clustered Linux systems (e.g. silvertip); another alternative is to use flapjack, which is a public-facing cluster node.
This document aims to help users run jobs on the SDSU Linux cluster using the Moab/PBS scheduler. There are two key elements. First, the user must know the correct syntax to run the program from the command line; if necessary, interactive (non-scheduled) testing can be done with small jobs to discover the syntax. Second, the command (or commands) is copied into a PBS scheduler script (usually near the end of the script) and submitted to the queue. If necessary, directives are added (near the top of the script) to request resources (nodes, processors, walltime, etc.) beyond or different from the default values.
Short tutorial example: compiling and running a single-thread C program
I have provided a very simple single-threaded example program on the cluster called prime_ex that does a prime number search. The files should be located in my home folder, underneath the examples subfolder.
First copy the C source code into your home folder or into a subfolder you create there (for example, with mkdir and cp).
For this simple C program, use the gcc compiler:
This will create an executable file named a.out. You can rename a.out to whatever you like (using mv), or you can add the -o option to the compile command to explicitly name the executable it creates.
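The compile step could look like the sketch below. So that it runs anywhere, the sketch first writes a small stand-in prime_ex.c into a scratch folder; that source is a hypothetical minimal prime counter and not the file from the examples folder, and its prompt and output wording, as well as the folder name compile_demo, are assumptions. The three gcc/mv lines are the part that corresponds to the text above.

```shell
# Scratch folder (hypothetical name), so no real files are overwritten.
mkdir -p compile_demo && cd compile_demo

# Hypothetical stand-in source; NOT the prime_ex code from the examples folder.
cat > prime_ex.c <<'EOF'
#include <stdio.h>

/* Count the primes <= n by trial division. */
int main(void)
{
    long n, i, d, count = 0;

    printf("Enter an integer: ");
    fflush(stdout);
    if (scanf("%ld", &n) != 1)
        return 1;
    for (i = 2; i <= n; i++) {
        int is_prime = 1;
        for (d = 2; d * d <= i; d++) {
            if (i % d == 0) {
                is_prime = 0;
                break;
            }
        }
        count += is_prime;
    }
    printf("Number of primes <= %ld: %ld\n", n, count);
    return 0;
}
EOF

gcc prime_ex.c               # default executable name: a.out
mv a.out prime_ex            # rename it afterwards ...
gcc -o prime_ex prime_ex.c   # ... or name it directly with -o

echo 100 | ./prime_ex        # quick check with a small input
```

Both routes leave an executable called prime_ex in the folder; the -o form simply saves the rename.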
Testing: running the program on the command line interactively
The program computes all the prime numbers less than or equal to a given input number and writes back one line with the number of primes found. It takes a single integer as input, read interactively. To try it with a small argument, run it from the command line and type the number when prompted.
The program is written to write the prompt and wait for input. Then when it receives the number and the user hits return, the program runs and the output is written to the screen.
(Btw, to check if the program is correct, see for example
This program runs instantaneously when the input is 100; we can make that number bigger and bigger and eventually it will take a while to compute. So this is a small example of a program that would be useful to run in background (and, eventually, on a worker node, not blackjack itself).
Our very simple prime_ex program does not take much input, just one integer, but some programs take very large input sets, so that it is inconvenient to type them all on the command line. You can put your inputs in a file and then redirect the file to the program as input.
To create an input file you can use an editor (see the section on Linux/UNIX editors), or you can simply use echo with an output redirect; this creates a text file called infile with just the echoed characters in it. With the input value used here (10000000), the program should take about 2 minutes to finish; if you want it to run faster, use a smaller number.
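A minimal sketch of the echo approach, using the file name and value mentioned in the text:

```shell
# Write the input value into a text file named infile
# (the > redirect creates the file, or overwrites it if it exists).
echo 10000000 > infile

# Confirm what the file contains.
cat infile
```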
Then you can redirect this file to the program as its input, using the < operator.
You will notice that when we run with input redirect, it still writes the output to the screen. We can make that output go to a file instead with an output redirect.
This will run, but you will notice it ties up the terminal while the job is executing; you can't type commands. If you want the job to run, and release the terminal, also put the job in background.
Putting the job in background
Appending an ampersand (&) to the end of the command line puts the job in background and returns the prompt right away. Then you can use the more command to check the contents of the output file.
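The three forms discussed above (input redirect, input plus output redirect, and the same in background) can be sketched as follows. To keep the sketch runnable without the real executable, it creates a hypothetical stand-in script named prime_ex in a scratch folder; the stand-in, its prompt and output wording, the folder name redirect_demo, and the file name outfile are all assumptions. A small input (1000) is used so it finishes quickly.

```shell
# Scratch folder (hypothetical name), so no real files are overwritten.
mkdir -p redirect_demo && cd redirect_demo

# Hypothetical stand-in for the compiled prime_ex: reads one integer from
# standard input and prints how many primes are <= that integer.
cat > prime_ex <<'EOF'
#!/bin/sh
printf 'Enter an integer: '
read -r n
count=0
i=2
while [ "$i" -le "$n" ]; do
    is_prime=1
    d=2
    while [ $((d * d)) -le "$i" ]; do
        if [ $((i % d)) -eq 0 ]; then
            is_prime=0
            break
        fi
        d=$((d + 1))
    done
    count=$((count + is_prime))
    i=$((i + 1))
done
echo "Number of primes found: $count"
EOF
chmod +x prime_ex

echo 1000 > infile                # small input so the demo finishes quickly

./prime_ex < infile               # input from infile, output to the screen
./prime_ex < infile > outfile     # input from infile, output to outfile
./prime_ex < infile > outfile &   # same, in background: the prompt returns at once
wait                              # block until the background job has finished
cat outfile                       # interactively you would use: more outfile
```

Note that the prompt text also lands in outfile, since the program still writes it to standard output even though nobody is typing.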
Following the same example, we had the program in a folder beneath the home directory called testfolder. Copy a simple PBS script into that folder:
Now, open the file simple.pbs in a text editor. We chose the nano editor because it is simple, but you can use vi or emacs if you are familiar with those.
At this point you can move the cursor around with the arrow keys and type text or delete characters, etc. Use Ctrl-G for help and Ctrl-X to exit; it will prompt you to save if you have any changes.
Why did we not redirect the output to a file? The scheduler will automatically produce an output file, so we don’t need to re-direct the output. We could, but for such a small job, here there is no need.
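For orientation, a PBS script for this job might look roughly like the sketch below. It is not the contents of simple.pbs: the file name sketch.pbs and the specific resource values are assumptions, the queue name debug is taken from the discussion of the running job later in this document, and, unlike the tutorial's script, this sketch does request a walltime to show where such a directive goes. The sketch writes the script to a file and checks its shell syntax.

```shell
# Write the sketch to sketch.pbs (a hypothetical name, so simple.pbs is untouched).
cat > sketch.pbs <<'EOF'
#!/bin/bash
# Directives go near the top; each one starts with #PBS.
#PBS -q debug
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Change to the folder from which qsub was run
# (the scheduler sets PBS_O_WORKDIR to that folder).
cd "$PBS_O_WORKDIR"

# The command goes near the end, exactly as tested on the command line.
# Standard output is captured by the scheduler, so only input is redirected.
./prime_ex < infile
EOF

# Check the shell syntax without running the script.
bash -n sketch.pbs && echo "sketch.pbs: syntax OK"
```

To the shell the #PBS lines are ordinary comments; only the scheduler reads them as resource requests.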
Submitting the PBS script
Now, if we are in the folder that has the submit script, the program, and the input file, we use the qsub command to submit the job.
It responds with a number, which is our job number.
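A sketch of the submission, assuming the script is named simple.pbs as above. The check for qsub is only there so the snippet is harmless when tried on a machine without the scheduler; the variable name job_id is hypothetical.

```shell
if command -v qsub >/dev/null 2>&1; then
    # qsub prints the job identifier; keep it for later status checks.
    job_id=$(qsub simple.pbs)
    echo "Submitted job: $job_id"
else
    echo "qsub not found: run this on the cluster login node"
fi
```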
Tracking the running job
Assuming the job does not finish immediately, we can use some helper commands to check on the status. The qstat command will show our currently running (or recently ended) jobs.
This does not seem to tell us much, but the most important thing to note is the "R" underneath the S (for status), which means the job is running. Another commonly seen code is E, which in PBS means the job is exiting after having run, not that an error occurred.
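As a memory aid, the state letters can be spelled out with a small helper like the one below. The function name describe_state is hypothetical, and the table assumes Torque-style state letters (other PBS variants differ slightly, for example in how finished jobs are marked).

```shell
# Translate a qstat state letter into words (Torque-style letters assumed).
describe_state() {
    case "$1" in
        Q) echo "queued" ;;
        R) echo "running" ;;
        E) echo "exiting" ;;
        C) echo "completed" ;;
        H) echo "held" ;;
        *) echo "unknown" ;;
    esac
}

describe_state R
describe_state E
```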
To see all the jobs running in the scheduler you can use
Here we see our job running, having been listed as asking for one
process (the default). Note the time remaining, just less than 30
minutes; that is because the max walltime allowable in the debug queue
is 30 minutes. Since we did not indicate a walltime request in the PBS
script, it used the default, one hour, and that was then modified to the
maximum for this queue.
After the job is finished
Now, if all goes well and the job finishes, you should have two new
files in your folder:
The two new files have the name of the script, followed by the character o (for output) or e (for error) and then the job number. Use more to look inside the output file.
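The naming rule can be sketched as follows. The job number 12345 is a made-up placeholder; substitute the number that qsub reported.

```shell
script="simple.pbs"
job_number="12345"     # hypothetical; use the number reported by qsub

# Script name, then o or e, then the job number.
out_file="${script}.o${job_number}"
err_file="${script}.e${job_number}"
echo "output file: $out_file"
echo "error file:  $err_file"

# Show the output file if it exists in the current folder.
if [ -f "$out_file" ]; then
    cat "$out_file"    # interactively you would use: more "$out_file"
fi
```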