3.2. Submitting Batch Jobs to SGE

Batch jobs are submitted to SGE as scripts. Here is an example of a serial job script, sleep.sh, which prints the date, sleeps for 10 seconds, and prints the date again.

[sysadm1@frontend-0 sysadm1]$ cat sleep.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
date
sleep 10
date

Note

Lines that start with #$ are treated as SGE options.

  • -cwd runs the job from the current working directory (the directory where qsub was invoked), and the job's output files are written there as well.

  • -j y merges the standard error stream into the standard output stream, so the job produces a single output file instead of separate output and error files.

  • -S /bin/bash makes Bash the interpreting shell for this job.
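
To check which lines of a script will be picked up as options, the #$ lines can be listed with a one-line sed filter. This is a rough sketch for inspecting scripts, not SGE's own parser: it assumes the default #$ prefix, and the function name sge_directives is hypothetical.

```shell
# List the SGE options embedded in a job script: every line that begins with
# the default "#$" prefix, printed with the prefix and following blanks removed.
# Plain comment lines ("#") and ordinary shell commands are skipped.
sge_directives() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}
```

Running sge_directives sleep.sh on the script above prints -cwd, -j y and -S /bin/bash, one per line.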

To submit this serial job script, use the qsub command:

[sysadm1@frontend-0 sysadm1]$ qsub sleep.sh
your job 16 ("sleep.sh") has been submitted
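
When submitting from another script, it can be handy to capture the job ID that qsub reports, for example to pass it to qdel later. A minimal sketch, assuming the confirmation message has the form shown above (the wording may differ between SGE releases); job_id_from_qsub is a hypothetical helper name.

```shell
# Extract the numeric job ID from qsub's confirmation message, e.g.
#   your job 16 ("sleep.sh") has been submitted
# Assumes the ID is the number that follows the word "job" (either case).
job_id_from_qsub() {
    printf '%s\n' "$1" | sed -n 's/.*[Jj]ob \([0-9][0-9]*\).*/\1/p'
}
```

Usage: id=$(job_id_from_qsub "$(qsub sleep.sh)") stores 16 for the submission shown above.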

Next, we'll submit a parallel job. First, let's copy and compile a test MPI program. As a non-root user, execute:

$ cd $HOME
$ mkdir test
$ cd test
$ cp /opt/mpi-tests/src/*.c .
$ cp /opt/mpi-tests/src/Makefile .
$ make

Now we'll create an SGE submission script for mpi-ring. The program mpi-ring sends a 1 MB message in a ring between all the processes of an MPI job: process 0 sends a 1 MB message to process 1, then process 1 sends a 1 MB message to process 2, and so on. Create a file named $HOME/test/mpi-ring.qsub and put the following in it:

#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#

/opt/openmpi/bin/mpirun -np $NSLOTS $HOME/test/mpi-ring
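
The ring pattern that mpi-ring follows can be illustrated with a short loop that prints which process sends to which for a given process count. This only prints the schedule and does not use MPI; it assumes the last process sends back to process 0 to close the ring, and ring_order is a hypothetical name.

```shell
# Print the send schedule of a ring over N processes: process i sends to
# process i+1, and (assumption) the last process sends back to process 0.
# This only prints the pattern; it does not run any MPI code.
ring_order() {
    local n=$1 i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i -> $(( (i + 1) % n ))"
        i=$((i + 1))
    done
}
```

With two slots ($NSLOTS set to 2), ring_order 2 prints 0 -> 1 and 1 -> 0.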

The command to submit an MPI parallel job script is similar to the one for a serial job script, but you need to add the option -pe orte N, where N is the number of processes that you want to allocate to the MPI program. SGE sets $NSLOTS in the job's environment to the number of slots granted, which is why the script passes it to mpirun -np. Here is an example of submitting a job that will use 2 processors:

$ qsub -pe orte 2 mpi-ring.qsub

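When the job completes, the job's output will be in the file mpi-ring.qsub.o*, where * is the job ID. Because the script uses -j y, error messages from the program itself are written to that same file. Messages from the parallel environment's startup and shutdown go to mpi-ring.qsub.po*.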
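
Once the job ID is known, the output file can be located directly. A small sketch, assuming the default naming of <job_name>.o<job_id>, where the job name defaults to the script's file name; show_job_output is a hypothetical helper.

```shell
# Print the merged output of a finished job. SGE names the file
# <job_name>.o<job_id>; the job name defaults to the script's file name and,
# because the scripts use -cwd, the file lands in the submission directory.
show_job_output() {
    local out="$1.o$2"
    if [ -f "$out" ]; then
        cat "$out"
    else
        echo "no output file $out yet (job still queued or running?)" >&2
        return 1
    fi
}
```

For example, show_job_output mpi-ring.qsub 42 would print mpi-ring.qsub.o42 if a job with ID 42 had finished.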

To run the job on more processors, just change the number supplied to the -pe orte flag. Here's how to run the job on 16 processors:

$ qsub -pe orte 16 mpi-ring.qsub
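
To compare several process counts, the same script can be submitted once per count in a loop. A sketch under stated assumptions: QSUB is a hypothetical variable added here so the commands can be printed as a dry run instead of being submitted, and submit_sweep is a hypothetical name.

```shell
# Submit mpi-ring.qsub once per process count given as arguments.
# QSUB is a hypothetical override for dry runs (e.g. QSUB='echo qsub');
# it is deliberately left unquoted so a command with arguments word-splits.
submit_sweep() {
    local n
    for n in "$@"; do
        ${QSUB:-qsub} -pe orte "$n" mpi-ring.qsub
    done
}
```

QSUB='echo qsub' submit_sweep 2 4 8 16 prints the four qsub commands; without QSUB set, the same call submits them.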

If you need to delete a job that has already been submitted, use qdel with the job's ID. Here is an example of deleting a Fluent job under SGE:

[sysadm1@frontend-0 sysadm1]$ qsub fluent.sh
your job 31 ("fluent.sh") has been submitted
$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     31     0 fluent.sh  sysadm1      t     12/24/2003 01:10:28 comp-pvfs- MASTER
$ qdel 31
sysadm1 has registered the job 31 for deletion
$ qstat
$
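
When several jobs are queued, their IDs can be picked out of the qstat listing before passing them to qdel. A minimal sketch, assuming the column layout shown above (job ID in column 1, owner in column 4, two header lines); jobs_for_user is a hypothetical helper, and other SGE versions may format qstat differently.

```shell
# Print the job IDs owned by one user, reading qstat's default listing on
# stdin. Skips the header and dashed separator line (the first two lines).
jobs_for_user() {
    awk -v u="$1" 'NR > 2 && $4 == u { print $1 }'
}
```

For the listing above, qstat | jobs_for_user sysadm1 prints 31; the resulting IDs can then be given to qdel.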

Although the example job scripts are Bash scripts, SGE can also accept scripts for other shells (the interpreter is selected with -S). Wrapping a serial program in an SGE job script is straightforward. Similarly, for MPI parallel jobs, you only need to call the correct mpirun launcher and pass it the SGE variable $NSLOTS within the job script. Parallel jobs other than MPI require a Parallel Environment (PE) to be defined. This is covered in the SGE documentation on Sun's web site.
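
As an illustration of how little a serial wrapper needs, the sketch below prints a job script with the same options as sleep.sh around an arbitrary command. make_job_script and the program name ./myprog used below are hypothetical.

```shell
# Print a minimal SGE job script, with the same options as sleep.sh, that
# runs the serial command given as arguments. Arguments are joined with single
# spaces, so ones containing spaces or shell metacharacters need extra quoting.
make_job_script() {
    printf '%s\n' '#!/bin/bash' '#' '#$ -cwd' '#$ -j y' '#$ -S /bin/bash' '#'
    printf '%s\n' "$*"
}
```

Usage: make_job_script ./myprog input.dat > myprog.sh, then qsub myprog.sh.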