Partners High Performance Computing Cluster



A basic "must read" for first-time users of the Partners High Performance Computing (pHPC) clusters.


What are the pHPC clusters?
How to log on to the clusters?
Who uses the pHPC clusters?
Who maintains and supports the pHPC clusters?
How do I run my computational job?
My program is not parallel, how can I allocate multiple computing nodes at the same time?
I need to interact with my program during the run; how can I do that?
What operating system and applications are available on the pHPC clusters?
How does the system prioritize jobs submitted by users?
How busy is the system?
My job does not run correctly, how do I know if the issue is with my program or with the system?


What are the pHPC clusters?

The Partners HPC (pHPC) environment comprises a growing number of computational resources. It includes two major clusters (HPRES and RCCLU) with more than 600 computing cores, 1.3TB of memory and 40TB of storage. Both clusters were delivered by Hewlett-Packard and are Linux-based. You need some basic Linux/Unix knowledge to use the clusters efficiently. There are also various supporting services, including smaller-scale clusters, databases and storage. Please visit the pHPC service pages for details.

A test Windows cluster (40 computing cores, 1TB storage) has been available since August 2008; please contact rcc@partners.org for details.


How to log on to the clusters?

If you are a Windows user, we suggest that you download ssh client software and use it to connect to one of the following addresses:

HPRES user: hpres.partners.org
RCCLU user: rcclu.partners.org

Once you log in, you will land on a login node: "n254" or "n255" for the HPRES cluster, "n136" or "n137" for the RCCLU cluster.

The clusters are behind a firewall, so you might not be able to access them from outside the Partners network.
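
As a minimal sketch of the login step from a command-line ssh client, the lines below build the ssh command for each cluster. The username "jdoe" is a hypothetical placeholder; the host names are the ones listed above.

```shell
# Hypothetical account name; replace with your own pHPC username.
PHPC_USER="jdoe"

# Login addresses as listed above.
HPRES_LOGIN="${PHPC_USER}@hpres.partners.org"
RCCLU_LOGIN="${PHPC_USER}@rcclu.partners.org"

# Print the commands instead of running them, so the sketch is safe to paste anywhere.
echo "ssh ${HPRES_LOGIN}"
echo "ssh ${RCCLU_LOGIN}"
```

Running one of the printed commands from inside the Partners network should prompt for your password and place you on one of the login nodes.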

Who uses the pHPC clusters?

Users of the pHPC clusters come from numerous labs and departments across Partners and Harvard University, along with collaborators from other institutions nationwide. They include biologists, geneticists, doctors, chemists and others. To see the most recently active users, type "busers all" in your terminal:

[testy@n136 ~]$ busers all
USER/GROUP          JL/P    MAX  NJOBS   PEND    RUN  SSUSP  USUSP    RSV
boian                  -      -      0      0      0      0      0      0
clange                 -      -    500      0    120      0      0      0
cpas                   -      -      0      0      0      0      0      0
default                -      -      -      -      -      -      -      -
dennis                 -      -      0      0      0      0      0      0
dherman                -      -      1      0      1      0      0      0
dongmei                -      -     34      0     34      0      0      0
ds668                  -      -      0      0      0      0      0      0
esc11                  -      -      0      0      0      0      0      0
jsu                    -      -      0      0      0      0      0      0
jxu                    -      -      0      0      0      0      0      0
lsfadmin               -      -      0      0      0      0      0      0
mdupin                 -      -      0      0      0      0      0      0
others*0               -      -      0      0      0      0      0      0
sc567                  -      -      0      0      0      0      0      0
sg804                  -      -     20      0     10      0      0      0
testy                  -      -      0      0      0      0      0      0
yew                    -      -      0      0      0      0      0      0
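
The RUN column of this output can be filtered to list only the users who currently have running jobs. The following is a small sketch: the helper name "active_users" is made up for illustration, the sample rows are copied from the output above, and the filter assumes the column layout shown there, with RUN as the sixth field.

```shell
# active_users: hypothetical helper that reads busers-style output on stdin
# and prints "user running_jobs" for every row whose RUN count is above zero.
active_users() {
    # Skip the header (NR > 1); ignore rows such as "default" where RUN is "-".
    awk 'NR > 1 && $6 ~ /^[0-9]+$/ && $6 > 0 { print $1, $6 }'
}

# Sample rows copied from the output above.
sample='USER/GROUP          JL/P    MAX  NJOBS   PEND    RUN  SSUSP  USUSP    RSV
clange                 -      -    500      0    120      0      0      0
default                -      -      -      -      -      -      -      -
dherman                -      -      1      0      1      0      0      0
testy                  -      -      0      0      0      0      0      0'

printf '%s\n' "$sample" | active_users
```

On a login node, the same helper could be fed live data with: busers all | active_users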



Who maintains and supports the pHPC clusters?

The pHPC clusters are supported by the Enterprise Research Infrastructure Services (ERIS) group in the Partners research computing department. The following staff members offer day-to-day assistance with the clusters.

Dennis Gurgul, email: dgurgul@partners.org, tel: 617-724-3169
Jerry Xu, email: yxu11@partners.org, tel: 617-726-5832

How do I run my computational job?

First, you need to set up your working environment on the clusters so that you can fully utilize the system. (This step is optional if you already know how to configure a Linux bash shell environment yourself.)

Second, you need to check which resources (e.g. system, software and hardware) are available and whether they are sufficient to run your job.

Third, the system uses scheduling software (Platform LSF) to manage users and their computing tasks. You submit your job to a "queue". Each queue has different properties and constraints with respect to job priority and resource allocation. To see which queues are available, type 'bqueues' in your terminal. Based on the information you provided when you applied for an account, we will assign you the appropriate queues. Otherwise, the system uses the "normal" queue for your job by default. The "normal" queue has a run time limit of 3 hours. If you have specific questions regarding queues, please contact us.

[testy@n137 ~]$ bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
priority         43  Open:Active       -    -    -    -     0     0     0     0
chkpnt_rerun_qu  40  Open:Active       -    -    -    -     0     0     0     0
steele           40  Open:Active       -    -    -    -     0     0     0     0
short            35  Open:Active       -    -    -    -     0     0     0     0
license          33  Open:Active       -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -     0     0     0     0
8core            30  Open:Active       -    -    -    -     0     0     0     0
long             25  Open:Active       -    -    -    -   195     0   195     0
bioperl          25  Open:Active       -    -    -    -     0     0     0     0
matlab           19  Open:Active      35    -    -    -     0     0     0     0


Fourth, check the job submission templates provided on this website. If you feel your situation is very specific, please contact us.
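
Putting these steps together, a first job script might look like the sketch below. The "#BSUB" lines are read by bsub as submission options: -q selects the queue, -W sets a run limit in hours:minutes (kept under the 3-hour limit of the "normal" queue), -J names the job, and -o/-e name the output and error files, with %J expanded to the job ID. The job name, file names and placeholder workload are assumptions for illustration and are not taken from the site's templates.

```shell
#!/bin/bash
# Submission options (queue, run limit, job name, output and error files).
#BSUB -q normal
#BSUB -W 2:30
#BSUB -J first_job
#BSUB -o first_job.%J.out
#BSUB -e first_job.%J.err

# LSB_JOBID is set by LSF inside a running job; fall back to a label for a dry run outside LSF.
JOB_ID="${LSB_JOBID:-dry-run}"
RUN_HOST="$(uname -n)"
echo "Job ${JOB_ID} started on ${RUN_HOST}"

# Placeholder workload: replace with your own program, e.g. ./my_program input.dat
echo "(your program would run here)"
```

If the script is saved as first_job.sh (a hypothetical name), it can be submitted with "bsub < first_job.sh" and monitored with "bjobs".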

My program is not parallel, how can I allocate multiple computing nodes at the same time?

If your program is MPI (Message Passing Interface) enabled and is itself parallel, you can refer to "how to submit a parallel job" to submit your job. However, many parallel applications have built-in rules for submitting a parallel job. Please read your software manual carefully. If you have questions, please call us or send us an email.

If your program is not parallel, it is acceptable to allocate multiple computing nodes at the same time, with each node running independently. We encourage you to use a Job Array to do so, especially when you have different input or output files for each node. Please refer to how to submit massive serial jobs.

Many serial bioinformatics jobs are "embarrassingly parallel" and can easily be parallelized with MPI, which can give additional control over the run and improve job efficiency.
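
As a sketch of the Job Array approach mentioned above: LSF runs one copy of the script per array element and sets LSB_JOBINDEX to that element's index, which the script can use to pick its own input and output files. The array name "align", the range 1-20 and the file naming pattern are hypothetical; adapt them to your data.

```shell
#!/bin/bash
# Job array with 20 elements; %I in the output file name expands to the element index.
#BSUB -q normal
#BSUB -J "align[1-20]"
#BSUB -o align.%J.%I.out

# LSB_JOBINDEX is set by LSF for each array element; default to 1 for a dry run outside LSF.
INDEX="${LSB_JOBINDEX:-1}"

# Hypothetical naming scheme: one input file and one output file per element.
INPUT="sample_${INDEX}.fasta"
OUTPUT="result_${INDEX}.txt"

# Placeholder for the real serial program, e.g. ./my_program "$INPUT" > "$OUTPUT"
echo "Element ${INDEX}: would process ${INPUT} into ${OUTPUT}"
```

Because each element only reads and writes its own files, the elements can run on different nodes at the same time without coordinating with each other.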

I need to interact with my program during the run; how can I do that?

Please refer to how to submit an interactive job.

What operating system and applications are available on the pHPC clusters?

Please refer to system and applications in pHPC clusters.

How does the system prioritize jobs submitted by users?

By default, LSF calculates the dynamic priority based on the following information about each user:
1. Number of shares assigned to the user
2. Resources used by jobs belonging to the user:
    a. Number of job slots reserved and in use.
    b. Run time of running jobs.
    c. Cumulative actual (not normalized) CPU time.
    d. Historical run time of finished jobs.
    e. Committed run time, specified at job submission with the -W option of bsub, or in the queue with the RUN_LIMIT parameter.

In other words, for users with the same number of shares in the same queue, a job that requires fewer resources will be placed nearer the top of the queue. The highest-priority queue is "private", which belongs to the system administrator. By default, the system uses the "normal" queue.
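
To make the effect of these terms concrete, the sketch below uses a simplified form of the fairshare formula described in the LSF documentation, in which a user's dynamic priority is the number of shares divided by a weighted sum of CPU time, run time and job slots. The weights are named after the LSF parameters CPU_TIME_FACTOR, RUN_TIME_FACTOR and RUN_JOB_FACTOR, but the values used here are assumptions for illustration, the historical and committed run time terms are left out, and the actual configuration on the pHPC clusters may differ.

```shell
# Illustrative weights (assumed values; the real ones are set by the LSF administrators).
CPU_TIME_FACTOR=0.7
RUN_TIME_FACTOR=0.7
RUN_JOB_FACTOR=3.0

# dynamic_priority SHARES CPU_TIME RUN_TIME JOB_SLOTS
# Hypothetical helper: shares / (weighted CPU time + weighted run time + weighted slots).
dynamic_priority() {
    awk -v shares="$1" -v cpu="$2" -v run="$3" -v slots="$4" \
        -v cf="$CPU_TIME_FACTOR" -v rf="$RUN_TIME_FACTOR" -v jf="$RUN_JOB_FACTOR" \
        'BEGIN { printf "%.4f\n", shares / (cpu * cf + run * rf + (1 + slots) * jf) }'
}

# Two users with the same shares: one idle, one already holding 3 job slots.
dynamic_priority 12 0 0 0
dynamic_priority 12 0 0 3
```

With these assumed weights the idle user gets 4.0000 and the user holding three slots gets 1.0000, so the idle user's next job would be considered first.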

We can also adjust a user's job priority based on that user's service model.
To check our service models, please visit ERIS pHPC services models.


How busy is the system?

You can use the command "bjobs -u all" to check how many jobs have been submitted (running or pending) on the system. You can also use the command "bhosts" to find more information about the availability of resources in the cluster.

[testy@n137 ~]$ bhosts
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
8core              ok              -    192      7      7      0      0      0
general1           ok              -    112     18     18      0      0      0
general2           ok              -    112     14     14      0      0      0
n0                 closed          -      0      0      0      0      0      0
n136               closed          -      0      0      0      0      0      0
n137               closed          -      0      0      0      0      0      0
steele             ok              -     32     29     29      0      0      0
The first column is the resource group. For example, "general1" is a node group that contains a number of computing nodes. In the output above, it can take a maximum of 112 computing tasks and has 18 jobs running. The status "ok" means it can still accept jobs.
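
The same output can be reduced to the number of free job slots per resource group by subtracting NJOBS from MAX. This is a sketch only: the helper name "free_slots" is hypothetical, and it assumes the column layout shown above (STATUS in the second field, MAX in the fourth, NJOBS in the fifth).

```shell
# free_slots: hypothetical helper that reads bhosts-style output on stdin and
# prints "host free_slots" for every host whose status is "ok".
free_slots() {
    # Skip the header; require a numeric MAX so rows showing "-" are ignored.
    awk 'NR > 1 && $2 == "ok" && $4 ~ /^[0-9]+$/ { print $1, $4 - $5 }'
}

# Sample rows copied from the output above.
sample='HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
general1           ok              -    112     18     18      0      0      0
n136               closed          -      0      0      0      0      0      0
steele             ok              -     32     29     29      0      0      0'

printf '%s\n' "$sample" | free_slots
```

On a login node, the live equivalent would be: bhosts | free_slots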


My job does not run correctly, how do I know if the issue is with my program or with the system?

In most cases, we encourage you to first check your code and your software manual carefully. If you still have a question, please do not hesitate to contact us. The ERIS team provides scientific computing support to help identify the problem, but we might not be able to rewrite your application because of our limited resources. Remember that the HPC system continues to evolve and may need to catch up with the almost daily changes in the biocomputing field.

Please read through the instructions on this website. If you have a question, you can either post it on the web or send us an email.