A basic "MUST READ" for first-time users of the Partners High Performance Computing (pHPC) clusters.
What are the pHPC clusters?
How do I log on to the clusters?
Who uses the pHPC clusters?
Who maintains and supports the pHPC clusters?
How do I run my computational job?
My program is not parallel, how can I allocate multiple computing nodes at the same time?
I need to interact with my program during the run, can I do that?
What operating system and applications are available on pHPC clusters?
How does the system prioritize jobs submitted by users?
How busy is the system?
My job does not run correctly, how do I know if the issue is with my program or with the system?
What are the pHPC clusters?
The Partners HPC (pHPC) environment comprises a growing number of computational resources. It includes two major clusters (HPRES and RCCLU) with more than 600 computing cores, 1.3TB of memory, and 40TB of storage. Both clusters were delivered by Hewlett-Packard and are Linux-based. You need some basic Linux/Unix knowledge to use the clusters efficiently. Various other services are also available, including smaller-scale clusters, databases, and storage. Please visit the pHPC service pages for details.
A test Windows cluster (40 computing cores, 1TB storage) has been available since Aug 2008; please contact rcc@partners.org for details.
How do I log on to the clusters?
If you are a Windows user, we suggest you download an SSH client and connect to one of the following addresses:
HPRES users: hpres.partners.org
RCCLU users: rcclu.partners.org
Once you log in, you will land on a login node named "n254" or "n255" on the HPRES cluster, or "n136" or "n137" on the RCCLU cluster.
The clusters are behind a firewall, so you might not be able to access them from outside the Partners network.
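From a Linux, Mac, or other command-line SSH client, the login step might look like the sketch below. The helper name phpc_host and the placeholder your_username are illustrative assumptions, not part of the cluster setup; only the two addresses come from this page.

```shell
# Map a cluster name to its login address (addresses as listed on this page).
# phpc_host is a hypothetical helper, not a command provided by the clusters.
phpc_host() {
    case "$1" in
        hpres) echo "hpres.partners.org" ;;
        rcclu) echo "rcclu.partners.org" ;;
        *)     echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Usage, from a machine on the Partners network
# (your_username is a placeholder for your own account name):
#   ssh "your_username@$(phpc_host hpres)"
```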
Who uses the pHPC clusters?
Users of the pHPC clusters come from numerous labs and departments across Partners and Harvard University, along with collaborators from other institutes nationwide. They are biologists, geneticists, doctors, chemists, etc.
To see the most recent active users, type "busers all" in your terminal:
[testy@n136 ~]$ busers all
USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
boian - - 0 0 0 0 0 0
clange - - 500 0 120 0 0 0
cpas - - 0 0 0 0 0 0
default - - - - - - - -
dennis - - 0 0 0 0 0 0
dherman - - 1 0 1 0 0 0
dongmei - - 34 0 34 0 0 0
ds668 - - 0 0 0 0 0 0
esc11 - - 0 0 0 0 0 0
jsu - - 0 0 0 0 0 0
jxu - - 0 0 0 0 0 0
lsfadmin - - 0 0 0 0 0 0
mdupin - - 0 0 0 0 0 0
others*0 - - 0 0 0 0 0 0
sc567 - - 0 0 0 0 0 0
sg804 - - 20 0 10 0 0 0
testy - - 0 0 0 0 0 0
yew - - 0 0 0 0 0 0
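To pick out only the users who currently have running jobs, the listing above can be filtered on its RUN column. The following is a minimal sketch that assumes the column layout shown in the output above (RUN is the sixth field); the function name active_users is hypothetical.

```shell
# Print "user running_jobs" for every user with at least one running job.
# Assumes the busers column order shown above: RUN is field 6.
# Rows such as "default", which contain "-" instead of numbers, are skipped.
active_users() {
    awk 'NR > 1 && $6 ~ /^[0-9]+$/ && $6 > 0 { print $1, $6 }'
}

# Usage on a login node:
#   busers all | active_users
```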
Who maintains and supports the pHPC clusters?
The pHPC clusters are supported by the Enterprise Research Infrastructure Services (ERIS) group in the Partners research computing department. The following staff offer day-to-day assistance with the clusters.
Dennis Gurgul, email: dgurgul@partners.org, tel: 617-724-3169
Jerry Xu, email: yxu11@partners.org, tel: 617-726-5832
How do I run my computational job?
First, you need to set up your working environment on the clusters so that you can fully utilize the system. (This is not strictly required if you know how to configure your Linux bash shell environment in the conventional way.)
Second, you need to check what resources (e.g. system, software, hardware) are available and whether they are sufficient to run your job.
Third, the system uses workload-management software (Platform LSF) to manage users and their computing tasks. You need to submit your job to a "queue". Each queue has different properties and constraints with respect to job priority and resource allocation. To see which queues are available, type 'bqueues' in your terminal. We will assign you the appropriate queues based on the information you provide when you apply for an account. Otherwise, the system uses the "normal" queue for your job by default. The "normal" queue has a run time limit of 3 hours. If you have specific questions regarding queues, please contact us.
[testy@n137 ~]$ bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
priority 43 Open:Active - - - - 0 0 0 0
chkpnt_rerun_qu 40 Open:Active - - - - 0 0 0 0
steele 40 Open:Active - - - - 0 0 0 0
short 35 Open:Active - - - - 0 0 0 0
license 33 Open:Active - - - - 0 0 0 0
normal 30 Open:Active - - - - 0 0 0 0
8core 30 Open:Active - - - - 0 0 0 0
long 25 Open:Active - - - - 195 0 195 0
bioperl 25 Open:Active - - - - 0 0 0 0
matlab 19 Open:Active 35 - - - 0 0 0 0
Fourth, check the job submission templates provided on this website. If your situation is unusual, please contact us.
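As a rough illustration of the four steps, a minimal LSF job script might look like the sketch below. It is not one of the site's own templates; the job name, file names, and the commented-out program are hypothetical, and the 2-hour limit is chosen only to stay under the 3-hour cap of the "normal" queue.

```shell
#!/bin/bash
# Queue to submit to ("normal" is the default queue).
#BSUB -q normal
# Job name (hypothetical).
#BSUB -J example_job
# Run time limit in hours:minutes.
#BSUB -W 2:00
# Standard output and error files; %J expands to the job ID.
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

# LSF sets LSB_JOBID inside a job; fall back to a label when run by hand.
job_id="${LSB_JOBID:-interactive}"
echo "Job ${job_id} running on $(uname -n)"

# ./my_program input.dat    # hypothetical program and input file
```

Assuming the script is saved as example_job.sh, it would be submitted with "bsub < example_job.sh" and its state checked with "bjobs".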
My program is not parallel, how can I allocate multiple computing nodes at the same time?
If your program is MPI (Message Passing Interface) enabled and is itself parallel, you can refer to "how to submit a parallel job" to submit your job.
However, many parallel applications have built-in rules for submitting a parallel job. Please read your software manual carefully. If you have questions, please call us or send us an email.
If your program is not parallel, it is acceptable to allocate multiple computing nodes at the same time, with each node running independently. We encourage you to use a job array to do so, especially when you have different input or output files for each node. Please refer to how to submit massive serial jobs.
Many serial bioinformatics jobs are "embarrassingly parallel" and can easily be parallelized with MPI, which gives additional control over the run and can improve job efficiency.
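A job array for a serial program can be sketched as follows. This is an illustration under stated assumptions rather than the site's recommended template: the array size of 10, the input_N.txt and result_N.txt naming scheme, and the commented-out program are all hypothetical.

```shell
#!/bin/bash
#BSUB -q normal
# Array of 10 elements; each element runs this script once.
#BSUB -J "serial_array[1-10]"
#BSUB -W 1:00
# %J expands to the job ID and %I to the array index.
#BSUB -o serial_array.%J.%I.out

# LSF sets LSB_JOBINDEX to the index of the current array element.
# Default to 1 so the script can also be tried outside LSF.
index="${LSB_JOBINDEX:-1}"
input_file="input_${index}.txt"     # hypothetical naming scheme
output_file="result_${index}.txt"   # hypothetical naming scheme
echo "Element ${index}: would process ${input_file} into ${output_file}"

# ./my_program "$input_file" > "$output_file"    # hypothetical program
```

Each array element is scheduled independently, so the elements can run on different nodes at the same time without any changes to the serial program.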
I need to interact with my program during the run; how can I do that?
Please refer to how to submit an interactive job.
What operating system and applications are available on pHPC clusters?
Please refer to system and applications in pHPC clusters.
How does the system prioritize jobs submitted by users?
By default, LSF calculates the dynamic priority based on the following information about each user:
1. Number of shares assigned to the user
2. Resources used by jobs belonging to the user:
a. Number of job slots reserved and in use.
b. Run time of running jobs.
c. Cumulative actual (not normalized) CPU time.
d. Historical run time of finished jobs.
e. Committed run time, specified at job submission with the -W option of bsub, or in the queue with the RUN_LIMIT parameter.
In other words, among users with the same number of shares in the same queue, jobs from the user who has consumed fewer resources are placed nearer the top of the queue.
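The items listed above are combined into a single ratio. The sketch below follows the form of the dynamic priority formula described in the LSF fairshare documentation; the weighting factors (0.7, 0.7, 3.0) are the commonly documented LSF defaults and are an assumption here, since the values configured on these clusters are not given on this page. Times are taken to be in hours, and the historical and committed run time terms (items d and e) are left out for simplicity.

```shell
# Approximate LSF dynamic priority for one user (higher means scheduled sooner).
# Arguments: shares cpu_time_hours run_time_hours job_slots
# The three factors are assumed defaults; override them via the environment.
dynamic_priority() {
    awk -v shares="$1" -v cpu="$2" -v run="$3" -v slots="$4" \
        -v cpu_f="${CPU_TIME_FACTOR:-0.7}" \
        -v run_f="${RUN_TIME_FACTOR:-0.7}" \
        -v job_f="${RUN_JOB_FACTOR:-3.0}" \
        'BEGIN {
            printf "%.4f\n", shares / (cpu * cpu_f + run * run_f + (1 + slots) * job_f)
        }'
}

# Two users with 10 shares each and no accumulated CPU or run time:
#   dynamic_priority 10 0 0 0    # no slots in use   -> 3.3333
#   dynamic_priority 10 0 0 4    # four slots in use -> 0.6667
```

The user with fewer slots in use gets the larger value, which matches the rule of thumb stated above.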
The highest-priority queue is "private", which belongs to the system administrator. By default, the system uses the "normal" queue.
We can also adjust a user's job priority based on that user's service model. To check our service models, please visit ERIS pHPC services models.
How busy is the system?
You can use the command "bjobs -u all" to check how many jobs have been submitted (running or pending) on the system. You can also use the command "bhosts" to find out more about the availability of resources in the cluster:
[testy@n137 ~]$ bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
8core ok - 192 7 7 0 0 0
general1 ok - 112 18 18 0 0 0
general2 ok - 112 14 14 0 0 0
n0 closed - 0 0 0 0 0 0
n136 closed - 0 0 0 0 0 0
n137 closed - 0 0 0 0 0 0
steele ok - 32 29 29 0 0 0
The first column is the resource group. For example, "general1" is a node group that contains a number of computing nodes. In the output above, it can take a maximum of 112 computing tasks and has 18 jobs running. The status "ok" means it can still accept jobs.
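The remaining capacity of each group can be estimated by subtracting NJOBS from MAX. The following is a small sketch that assumes the bhosts column layout shown above (STATUS is field 2, MAX is field 4, NJOBS is field 5); the function name free_slots is hypothetical.

```shell
# Print "group free_slots" for every resource group that is accepting jobs.
# Assumes the bhosts column order shown above: STATUS=2, MAX=4, NJOBS=5.
free_slots() {
    awk 'NR > 1 && $2 == "ok" { printf "%s %d\n", $1, $4 - $5 }'
}

# Usage on a login node:
#   bhosts | free_slots
# For the sample output above, "general1" would show 112 - 18 = 94 free slots.
```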
My job does not run correctly, how do I know if the issue is with my program or with the system?
Most of the time, we encourage you to check your code and your software manual carefully first. If you still have a question, please do not hesitate to contact us. The ERIS team provides scientific computing support to help identify the problem, but we might not be able to rewrite your application because of our limited resources. Remember that the HPC system continues to evolve and may need to catch up with the almost daily changes in the biocomputing field.
Please read through the instructions on this website. If you have a question, you can either post it on the web or send us an email.