Batch jobs are submitted to SGE via scripts. Here is an example of a serial job script, sleep.sh, which simply runs the sleep command between two calls to date.
    [sysadm1@frontend-0 sysadm1]$ cat sleep.sh
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #
    date
    sleep 10
    date
Lines that begin with #$ are treated as SGE options.
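As a sketch of how further options can be embedded in the same way, the variant below adds -N (job name) and -o (output file), both standard qsub options. The name sleep_demo, the file sleep_demo.log, and the helper job_banner are illustrative choices and not part of the original example. Because #$ lines are ordinary comments to bash, the script can also be run directly as a quick check; in that case the SGE variables JOB_ID and JOB_NAME are unset and the fallback text is printed.

```shell
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -N sleep_demo
#$ -o sleep_demo.log
#
# JOB_ID and JOB_NAME are set by SGE for a running job.
# Outside SGE, fall back to placeholder text (illustrative only).
job_banner() {
    echo "job ${JOB_ID:-unset} (${JOB_NAME:-no-name})"
}

job_banner
date
sleep 1
date
```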
To submit this serial job script, you should use the qsub command.
    [sysadm1@frontend-0 sysadm1]$ qsub sleep.sh
    your job 16 ("sleep.sh") has been submitted
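Later commands such as qdel need the job ID, so it can be handy to pull it out of the confirmation message. The following is a minimal sketch that assumes the message has the form shown above; parse_job_id is a hypothetical helper name, and the exact wording of the message may differ between SGE versions.

```shell
#!/bin/bash
# Extract the numeric job ID from a qsub confirmation line read on stdin.
# Assumes the line looks like: your job 16 ("sleep.sh") has been submitted
parse_job_id() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration with the message from the example (no SGE needed); prints 16.
echo 'your job 16 ("sleep.sh") has been submitted' | parse_job_id

# With SGE available, the same helper can capture the ID of a real submission:
#   job_id=$(qsub sleep.sh | parse_job_id)
```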
Next, we'll submit a parallel job. First, let's get and compile a test MPI program. As a non-root user, execute:
    $ cd $HOME
    $ mkdir test
    $ cd test
    $ cp /opt/mpi-tests/src/*.c .
    $ cp /opt/mpi-tests/src/Makefile .
    $ make
Now we'll create an SGE submission script for mpi-ring. The program mpi-ring sends a 1 MB message in a ring between all the processes of an MPI job. Process 0 sends a 1 MB message to process 1, then process 1 sends a 1 MB message to process 2, and so on. Create a file named $HOME/test/mpi-ring.qsub and put the following in it:
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #
    /opt/openmpi/bin/mpirun -np $NSLOTS $HOME/test/mpi-ring
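To make the ring pattern that mpi-ring is described as using more concrete, here is a small shell illustration of the send order. It is not the mpi-ring source and sends no messages; ring_order is a hypothetical helper, and it assumes that the last process closes the ring by sending back to process 0.

```shell
#!/bin/bash
# Print the send order for a ring of N processes (ranks 0..N-1).
# Assumption: the last rank closes the ring by sending to rank 0.
ring_order() {
    local n=$1 rank
    for (( rank = 0; rank < n; rank++ )); do
        echo "process $rank -> process $(( (rank + 1) % n ))"
    done
}

# For 4 processes: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 0
ring_order 4
```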
The command to submit an MPI parallel job script is similar to the one for a serial job script, but you also need to pass the -pe orte N option, where N is the number of processes that you want to allocate to the MPI program. Here's an example of submitting a job that will use 2 processors:
    $ qsub -pe orte 2 mpi-ring.qsub
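Inside the running job, SGE exposes the number of slots it granted as the environment variable NSLOTS, which is why the job script passes $NSLOTS to mpirun rather than a fixed number. The sketch below only prints the command line that would be executed; launch_line is a hypothetical helper, and the fallback to 1 when NSLOTS is unset is an assumption made so that the snippet also runs outside SGE.

```shell
#!/bin/bash
# Print the mpirun command line the job script would execute.
# NSLOTS is set by SGE inside a parallel job; the default of 1 is an
# assumption so that this sketch can be tried outside SGE.
launch_line() {
    local prog=$1
    echo "/opt/openmpi/bin/mpirun -np ${NSLOTS:-1} $prog"
}

launch_line "$HOME/test/mpi-ring"
```

With the submission above (-pe orte 2), the job would see NSLOTS=2 and the printed line would contain -np 2.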
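When the job completes, its output will be in the file mpi-ring.qsub.o*, where * is the job ID. Because the script sets -j y, the job's own error messages are merged into that file; messages from the parallel environment pertaining to the job will be in mpi-ring.qsub.po*.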
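The two file names mentioned above follow the pattern job name, then .o or .po, then the job ID, and the job name defaults to the script name. A minimal sketch for building them is given here; output_files is a hypothetical helper and the job ID 17 is a made-up value for illustration. Since the script uses -cwd, the files appear in the directory from which the job was submitted.

```shell
#!/bin/bash
# Print the expected output file names for a parallel job.
#   $1: job name (defaults to the script name, e.g. mpi-ring.qsub)
#   $2: job ID reported by qsub
output_files() {
    local job_name=$1 job_id=$2
    echo "${job_name}.o${job_id}"     # job output (stderr merged in by -j y)
    echo "${job_name}.po${job_id}"    # parallel environment output
}

# 17 is a hypothetical job ID.
output_files mpi-ring.qsub 17
```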
To run the job on more processors, just change the number supplied to the -pe orte flag. Here's how to run the job on 16 processors:
    $ qsub -pe orte 16 mpi-ring.qsub
If you need to delete an already submitted job, use qdel with its job ID. Here's an example of deleting a fluent job under SGE:
    [sysadm1@frontend-0 sysadm1]$ qsub fluent.sh
    your job 31 ("fluent.sh") has been submitted
    $ qstat
    job-ID  prior  name       user     state  submit/start at      queue       master  ja-task-ID
    ---------------------------------------------------------------------------------------------
        31      0  fluent.sh  sysadm1  t      12/24/2003 01:10:28  comp-pvfs-  MASTER
    $ qdel 31
    sysadm1 has registered the job 31 for deletion
    $ qstat
    $
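If the job ID is no longer at hand, it can be recovered from the qstat listing by job name. The sketch below assumes the column layout of the qstat output in this example (two header lines, job ID in the first column, name in the third); job_ids_by_name is a hypothetical helper. Note that qstat may truncate long job names, so match on the name as it is displayed.

```shell
#!/bin/bash
# Read qstat output on stdin and print the IDs of jobs with the given name.
# Assumes two header lines, job ID in column 1 and job name in column 3.
job_ids_by_name() {
    local name=$1
    awk -v name="$name" 'NR > 2 && $3 == name { print $1 }'
}

# Demonstration with the listing from the example (no SGE needed); prints 31.
job_ids_by_name fluent.sh <<'EOF'
job-ID  prior  name       user     state  submit/start at      queue       master  ja-task-ID
---------------------------------------------------------------------------------------------
    31      0  fluent.sh  sysadm1  t      12/24/2003 01:10:28  comp-pvfs-  MASTER
EOF

# With SGE available, every matching job could then be deleted:
#   for id in $(qstat | job_ids_by_name fluent.sh); do qdel "$id"; done
```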
Although the example job scripts are bash scripts, SGE also accepts other types of shell scripts. It is trivial to wrap a serial program in an SGE job script. Similarly, for MPI parallel jobs, you just need to use the correct mpirun launcher and pass the SGE variable $NSLOTS to it within the job script. For parallel jobs other than MPI, a Parallel Environment (PE) needs to be defined. This is covered within the SGE documentation found on Sun's web site.
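As a sketch of wrapping a serial program, the helper below prints a job script that uses the same three SGE options as sleep.sh, followed by the command to run. make_job_script is a hypothetical name, and the command is written out as given, so arguments that contain spaces would need additional quoting.

```shell
#!/bin/bash
# Print an SGE job script that runs the given command line.
# The header matches the options used by sleep.sh in this section.
make_job_script() {
    cat <<'EOF'
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
EOF
    # Append the command to run; "$*" joins the arguments with spaces.
    printf '%s\n' "$*"
}

# Demonstration: generate a script that runs hostname.
make_job_script hostname
```

The output could be redirected to a file and submitted, for example `make_job_script hostname > hostname.sh` followed by `qsub hostname.sh` (hostname.sh is an illustrative file name).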