
 




Check your job status on the cluster


Intro: The HP XC-4000 cluster uses LSF and SLURM together to manage and monitor your jobs. The command to monitor your jobs is "bjobs".

Example output:

[testy@n137 ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
667988  testy   PEND  private    n63                     gethosts   Aug 31 09:47


[testy@n137 ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
667988  testy   RUN   private    n63         lsfhost.l   gethosts   Aug 31 09:47


The STAT field shows the status of the job. The values you will see most often are:

* PEND: the job is pending in the queue
* RUN: the job has been launched and is running on a system
* SSUSP: the job has been automatically suspended by the system (and will be automatically resumed when favorable conditions return)
* USUSP: the job has been suspended by the owner or administrator via the bstop command (bresume resumes that job)
* PSUSP: the job has been suspended by the owner or administrator while pending.
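
When scripting around these states, it can help to pull out just the STAT value for one job. The following is a minimal sketch, not part of LSF: job_stat is a hypothetical helper name, and it assumes the default bjobs column layout shown above, where STAT is the third column.

```shell
# Print the STAT column for one job ID from bjobs-style output on stdin.
# job_stat is a hypothetical helper, not an LSF command.
# Assumes the default layout: JOBID is column 1 and STAT is column 3.
job_stat() {
    awk -v id="$1" '$1 == id { print $3; exit }'
}

# On the cluster (assumes bjobs is on your PATH):
#   bjobs 667988 | job_stat 667988
```

Matching on the job ID in the first column skips the header line, and the result does not depend on whether EXEC_HOST is filled in yet.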

For detailed information about a job, pass the -l option together with the job ID:
[testy@n63 ~]$ bjobs -l 667988


If you want to know which node has been allocated to your job, use the following command to grep for the SLURM allocation:

[testy@n137 ~]$ bjobs -l 667988 | grep slurm
Mon Sep  4 13:52:43: slurm_id=284200;ncpus=4;slurm_alloc=n[18];

The example output shows slurm_alloc=n[18], which means the system allocated node n18 to your program. From time to time you will also want to know how busy the system is, because that determines when resources become available and when your job will start running.
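
To pull only the node list out of the slurm_alloc field described above, a small sed filter can be used. This is a sketch under assumptions: slurm_alloc_nodes is a hypothetical helper name, and it assumes the field appears on a single line and ends with a semicolon, as in the example output.

```shell
# Print the value of the slurm_alloc field from "bjobs -l" output on stdin.
# slurm_alloc_nodes is a hypothetical helper, not an LSF or SLURM command.
# Assumes the field sits on one line and is terminated by ';'.
slurm_alloc_nodes() {
    sed -n 's/.*slurm_alloc=\([^;]*\);.*/\1/p'
}

# On the cluster (assumes bjobs is on your PATH):
#   bjobs -l 667988 | slurm_alloc_nodes
```

If bjobs -l wraps a long line so that the field is split across two lines, this filter will miss it, and the plain grep shown earlier is the safer check.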

To check all jobs on the cluster:
[testy@n137 ~]$ bjobs -u all

To check all jobs in PEND status:
[testy@n137 ~]$ bjobs -u all | grep PEND

If you just want to know the number of pending jobs:
[testy@n137 ~]$ bjobs -u all | grep PEND | wc -l
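
Note that grep PEND matches the text anywhere on the line, so a job name or user name containing "PEND" would also be counted. As an alternative sketch, the counts for every status can be taken from the STAT column alone. count_by_stat is a hypothetical helper name, and the code assumes the default bjobs layout with STAT in the third column.

```shell
# Count jobs per STAT value from bjobs-style output on stdin.
# count_by_stat is a hypothetical helper, not an LSF command.
# Assumes the default layout with STAT in column 3; the header line
# (which starts with JOBID) and short or blank lines are skipped.
count_by_stat() {
    awk '$1 != "JOBID" && NF >= 3 { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On the cluster (assumes bjobs is on your PATH):
#   bjobs -u all | count_by_stat
```

Each output line is a status followed by the number of jobs in that status, which gives a quick picture of how busy the cluster is.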

If you have a job array, you will see output like the following (please refer to "how to submit massive serial job"):
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
667990  testy   RUN   private    n63         lsfhost.loc *thosts[1] Aug 31 10:24
667990  testy   RUN   private    n63         lsfhost.loc *thosts[2] Aug 31 10:24
667990  testy   RUN   private    n63         lsfhost.loc *thosts[3] Aug 31 10:24
667990  testy   RUN   private    n63         lsfhost.loc *thosts[4] Aug 31 10:24
667990  testy   RUN   private    n63         lsfhost.loc *thosts[5] Aug 31 10:24
667990  testy   PEND  private    n63         lsfhost.loc *thosts[6] Aug 31 10:24
667990  testy   PEND  private    n63         lsfhost.loc *thosts[7] Aug 31 10:24
667990  testy   PEND  private    n63         lsfhost.loc *thosts[8] Aug 31 10:24
As you can see, some array elements are running and some are pending.
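
For a large array it may be easier to list only the element indices that are still pending. The following is a minimal sketch: pending_indices is a hypothetical helper name, and it assumes the array index is the first bracketed number on each line, as in the output above.

```shell
# List the array indices of one job that are still in PEND status,
# reading bjobs-style output on stdin.
# pending_indices is a hypothetical helper, not an LSF command.
# Assumes STAT is column 3 and the index is the first "[N]" on the line.
pending_indices() {
    awk -v id="$1" '$1 == id && $3 == "PEND" && match($0, /\[[0-9]+\]/) {
        print substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

# On the cluster (assumes bjobs is on your PATH):
#   bjobs 667990 | pending_indices 667990
```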