3.3. Monitoring SGE Jobs

To monitor jobs under SGE, use the qstat command. When run with no arguments, it displays a summary list of jobs:

[sysadm1@frontend-0 sysadm1]$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
     20     0 sleep.sh   sysadm1      t     12/23/2003 23:22:09 frontend-0 MASTER
     21     0 sleep.sh   sysadm1      t     12/23/2003 23:22:09 frontend-0 MASTER
     22     0 sleep.sh   sysadm1      qw    12/23/2003 23:22:06
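
When many jobs are listed, it can help to summarize them by state. In SGE the state code qw normally marks a job waiting in the queue, and t marks a job being transferred to an execution host. The following is a minimal sketch, not part of SGE itself: the helper name count_job_states is hypothetical, and it assumes the column layout shown above, where the state is the fifth field and every job line starts with a numeric job ID.

```shell
# Hypothetical helper: count jobs per state from `qstat` output on stdin.
# Assumption: the state is the 5th whitespace-separated field, and only
# job lines begin with a purely numeric job ID (headers are skipped).
count_job_states() {
    awk '$1 ~ /^[0-9]+$/ { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live cluster (not run here):
#   qstat | count_job_states
```

For the listing above, this would report one job in state qw and two in state t.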

Use qstat -f to display a more detailed listing that shows each queue with its slot usage and load average, followed by the jobs assigned to it:

[sysadm1@frontend-0 sysadm1]$ qstat -f
queuename            qtype used/tot. load_avg arch      states
comp-pvfs-0-0.q      BIP   0/2       0.18     glinux    
comp-pvfs-0-1.q      BIP   0/2       0.00     glinux    
comp-pvfs-0-2.q      BIP   0/2       0.05     glinux    
frontend-0.q         BIP   2/2       0.00     glinux
     23     0 sleep.sh   sysadm1      t     12/23/2003 23:23:40 MASTER
     24     0 sleep.sh   sysadm1      t     12/23/2003 23:23:40 MASTER
     25     0 linpack.sh sysadm1      qw    12/23/2003 23:23:32
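
The used/tot. column shows how many slots of each queue are occupied. As a rough sketch, the queues with no free slots can be picked out of this listing with awk. The helper name full_queues is hypothetical, and the code assumes the layout shown above, where the third field of a queue line has the form used/total.

```shell
# Hypothetical helper: print the names of queues whose slots are all in use,
# reading `qstat -f` output on stdin.
# Assumption: queue lines carry "used/total" as their 3rd field; header and
# job lines do not match that pattern and are ignored.
full_queues() {
    awk '$3 ~ /^[0-9]+\/[0-9]+$/ {
             split($3, slots, "/")
             if (slots[1] == slots[2]) print $1
         }'
}

# Usage on a live cluster (not run here):
#   qstat -f | full_queues
```

For the listing above, only frontend-0.q has all of its slots (2/2) in use.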

You can also use qstat to query the status of a single job, given its job ID. To do so, use the -j N option, where N is the job ID:

[sysadm1@frontend-0 sysadm1]$ qsub -pe mpich 1 single-xhpl.sh
your job 28 ("single-xhpl.sh") has been submitted
[sysadm1@frontend-0 sysadm1]$ qstat -j 28
job_number:                 28
exec_file:                  job_scripts/28
submission_time:            Wed Dec 24 01:00:59 2003
owner:                      sysadm1
uid:                        502
group:                      sysadm1
gid:                        502
sge_o_home:                 /home/sysadm1
sge_o_log_name:             sysadm1
sge_o_path:                 /opt/sge/bin/glinux:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/ganglia/bin:/opt/maui/bin:/opt/OpenPBS/bin:/opt/OpenPBS/sbin:/opt/rocks/bin:/opt/rocks/sbin:/home/sysadm1/bin
sge_o_mail:                 /var/spool/mail/sysadm1
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/sysadm1
sge_o_host:                 frontend-0
account:                    sge
cwd:                        /home/sysadm1
path_aliases:               /tmp_mnt/ * * /
merge:                      y
mail_list:                  [email protected]
notify:                     FALSE
job_name:                   single-xhpl.sh
shell_list:                 /bin/bash
script_file:                single-xhpl.sh
parallel environment:  mpich range: 1
scheduling info:            queue "comp-pvfs-0-1.q" dropped because it is temporarily not available
                            queue "comp-pvfs-0-2.q" dropped because it is temporarily not available
                            queue "comp-pvfs-0-0.q" dropped because it is temporarily not available
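
Because qstat -j prints one "name: value" pair per line, a single field can be extracted in a script. The following is a minimal sketch under that assumption. The helper name job_field is hypothetical, and it only handles field names without spaces, so entries such as "parallel environment" or "scheduling info" are not covered.

```shell
# Hypothetical helper: print the value of one field from `qstat -j N`
# output read on stdin. $1 is the field name without the trailing colon.
# Assumption: the field name is a single word followed directly by ":".
job_field() {
    awk -v key="$1:" '$1 == key {
        sub(/^[^:]*:[ \t]*/, "")   # drop "name:" and the padding after it
        print
        exit                       # stop at the first match
    }'
}

# Usage on a live cluster (not run here):
#   qstat -j 28 | job_field owner
#   qstat -j 28 | job_field submission_time
```

Only the text up to the first colon is removed, so values that themselves contain colons, such as sge_o_path or the submission time, are printed whole.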