Partners High Performance Computing Cluster


Resources in HPC Clusters

The HPRES and RCCLU clusters share the same file system and have nearly identical OS configurations, so the applications listed on this page are available on both systems.

Notes:
1. HPRES uses AMD64 processors; RCCLU uses Intel Xeon processors. Some applications may need different arguments depending on the cluster, although most behave the same on both.
2. In the Linux environment, users can install additional software into their own home directories without special permissions or privileges. Our staff is happy to assist if you have difficulties.
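
Because the two clusters use different processor families, a job wrapper may want to check which kind of node it is running on before choosing arguments. The following is a minimal sketch, assuming a Python 3 interpreter is available (for example, one installed in your home directory). The function names are hypothetical; the vendor strings are the standard values Linux reports in /proc/cpuinfo.

```python
def cpu_vendor(cpuinfo_text):
    """Return 'AMD', 'Intel', or 'unknown' from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "vendor_id":
            vendor = value.strip()
            if vendor == "AuthenticAMD":
                return "AMD"
            if vendor == "GenuineIntel":
                return "Intel"
            return "unknown"
    # No vendor_id line was found at all.
    return "unknown"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a compute node, `cpu_vendor(read_cpuinfo())` would be expected to report AMD on HPRES and Intel on RCCLU, and a script could branch on that value.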

News: A 10-node, 40-core Windows 2008 HPC test cluster is currently deployed for Windows users. Please contact yxu11@partners.org for more information about the applications installed there. More information will be posted on the website later.


__________________________________________________________________________


System, OS, Scheduler and Resource Management

Names | Description and References
SSH address | RCCLU cluster: rcclu.partners.org; HPRES cluster: hpres.partners.org
  * How to request your account
  * How to log in to the cluster
Login node | RCCLU cluster: n136 or n137; HPRES cluster: n62 or n63
  * How to set up your login
Linux OS | 2.6.9-42.9hp.2sp.XCsmp x86_64 GNU/Linux
Default shell | Bash
User home | /shr/home/your_username
Cluster software | HP XC V3.2
  * XC User Manual from HP
Scheduler, queue and resource management | HP LSF-6.2
  * Quickstart for job submission
  * Complete Platform LSF 6.2 User Guide
User environment control | HP Environment Modules
  * How to set up your environment
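
Since jobs on both clusters go through LSF, it can help to see how a submission command is put together. This is a minimal sketch, assuming Python 3; it only composes the argument list for `bsub` and does not submit anything. The helper name, the queue name "normal" and the program name are hypothetical; `-n`, `-o`, `-q`, `-J` and the `%J` job-ID placeholder are standard `bsub` options. Check the LSF quickstart for the queues actually configured on the cluster.

```python
import shlex


def build_bsub_command(command, cores=1, queue=None, job_name=None,
                       output_pattern="job.%J.out"):
    """Compose an LSF `bsub` argument list for a batch job."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # -n: number of job slots; -o: output file (%J expands to the job ID).
    args = ["bsub", "-n", str(cores), "-o", output_pattern]
    if queue:
        args += ["-q", queue]      # queue name is site-specific
    if job_name:
        args += ["-J", job_name]
    # Split the user's command line the way a POSIX shell would.
    args += shlex.split(command)
    return args


def as_shell_line(args):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(a) for a in args)
```

For example, `as_shell_line(build_bsub_command("./my_program input.dat", cores=4, queue="normal"))` yields a line that can be pasted into a login-node shell, or the list can be passed to `subprocess.run`.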


__________________________________________________________________________


General and High-Performance Compilers, Math Libraries and Modules on the Cluster(s)


Compilers
Names | Path | Description and References
GNU gcc version 3.4.6 (gcc, g++, g77) | /usr/bin | GNU compiler (system default)
GNU gcc version 4.2 (gcc, g++, gfortran) | /source/gnu_4.2/bin | GNU compiler (with OpenMP support)
  * How to launch a multi-threaded OpenMP job
Intel (C/C++/Fortran) and the debugging tool | /source/intel/cce, /source/intel/fce, /source/intel/idbe | Intel 10.1 compiler (recommended)
  * Quick example of how to use the Intel compiler
  * Complete Intel 10.1 C compiler user guide (PHS network)
Intel Threading Building Blocks (TBB) | /source/intel/tbb | Intel TBB 2.0
  * Intel TBB user guide (PHS network)
HP MPI 2.01.00-08 (mpicc, mpiCC, mpif77, mpif90) | /opt/hpmpi/bin | HP MPI C/C++ compiler, fully compatible with MPICH 1.2 (recommended for MPI jobs)
  * How to use MPI on the cluster
MPICH 1.2 | /source/mpich | MPICH compiler
MPICH 2 | /source/mpich2 | MPICH2 compiler
JDK5 | /source/jdk1.5.0 | Java compiler
JDK6 | /source/jdk1.6.0 | Java compiler
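
As an illustration of the OpenMP support in gcc 4.2, the sketch below composes a compile command and a run-time environment for a multi-threaded program. It assumes Python 3 and a hypothetical source file; the helper names are illustrative only. The compiler path comes from the table above, `-fopenmp` is the standard gcc flag that enables OpenMP, and `OMP_NUM_THREADS` is the standard OpenMP variable that sets the thread count.

```python
import os

GCC42 = "/source/gnu_4.2/bin/gcc"  # path taken from the compiler table


def openmp_compile_command(source, output):
    """Argument list that compiles `source` with OpenMP enabled."""
    return [GCC42, "-fopenmp", "-O2", "-o", output, source]


def openmp_environment(threads, base_env=None):
    """Copy of the environment with OMP_NUM_THREADS set to `threads`."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env
```

The two pieces could then be used with `subprocess.check_call(openmp_compile_command("hello.c", "hello"))` followed by `subprocess.check_call(["./hello"], env=openmp_environment(4))`. When running under the scheduler, the thread count should normally match the number of cores requested for the job.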

Extra Math Libraries
Names | Path | Description and References
Numeric C | /source/numeric_c | From the Numeric C publisher; includes a large collection of source code for mathematical simulation and modeling
  * External resource: Numerical Recipes in C
The GNU Scientific Library | /source/gsl | Mathematical libraries
  * How to use GSL on the cluster
FFTW Library | /source/fftw | Discrete Fourier transform
Intel MKL 10.1 | /source/intel/cmkl/ | Intel Math Kernel Library
  * How to use Intel MKL on the cluster



__________________________________________________________________________


Scientific Computing Software and Applications

Note: Most users can install applications in their own home directory. Please email us if you think an application needs to be installed in a public directory (for example, because it needs a large amount of storage, group access is required, or the license imposes restrictions).


Matlab 2007
Names | Path | Description
Matlab2007b | /source/matlab2007b | Matlab
Toolboxes currently available:
MATLAB Version 7.5 (R2007b)
Distributed Computing Toolbox/Engine (R2007b)
Optimization Toolbox (R2007b)
Statistics Toolbox (R2007b)
Signal Processing Toolbox (R2007b)
Image Processing Toolbox (R2007b)
Wavelet Toolbox (R2007b)
Bioinformatics Toolbox (R2007b)
The number of licenses for the toolboxes is limited.

* How to use Matlab
* How to use Matlab DCE to launch parallel jobs
(Matlab R2008b is scheduled to be deployed later this year.)

General biostatistics software
Names | Path | Description
BioPerl Module | /usr/bin/perl | BioPerl 1.5, including core packages and run packages
R version 2.5 and BioConductor | /source/R_2.5 | Open-source statistics software (including Bioconductor modules for microarray analysis)


Sequence alignment and analysis
Names | Path | Description and References
Solexa Pipeline | /source/Solexa/GAPipeline-0.3.0 | A "next-generation sequencing" pipeline. Multiple versions compiled with different libraries are available. Running Solexa requires knowledge of some basic pipeline and system configuration; please consult scientific computing support.
NCBI Toolbox, March 2007 release | /source/ncbi/bin | NCBI Toolbox, including blastall, megablast, wblast, etc.
  * How to run BLAST on the cluster
  * External resources: NCBI BLAST
UCSC BLAT | /source/blat | BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more.
  * How to run BLAT on the cluster
  * External resources: UCSC BLAT
EMBL EMBOSS | /source/emboss_5.0 | Open-source software analysis package from EBI, specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
  * External resources: EMBL EMBOSS
ClustalW | /source/clustalw2.0 | Open-source software for sequence alignment
HMMER | /source/hmmer_2.3 | Biosequence analysis using profile hidden Markov models
MrBayes | /source/mrbayes_3.1 | Bayesian analysis of phylogeny
pbat, p2bat | Belongs to specific users; contact Chris Lange (HSPH) | An interactive software package for the design of genetic family-based association studies
  * Resource from the pbat manual
plink | /source/plink_1.0 | A free, open-source whole-genome association analysis toolset developed at MGH
  * External resource from the plink website


Protein structural dynamics
Names | Path | Description and References
eHits | /source/eHits_6.2/ehits.sh | Protein-ligand docking program; fast screening of large-scale ligand databases
  * How to use eHits on the cluster
  * External resources: SimBioSys eHits
Dock 6 | /source/dock6 | Protein-ligand docking simulation (MPI-enabled; can run across multiple nodes)
AutoDock | /source/autodock | Protein-ligand docking simulation (single-threaded)
NAMD | /source/namd | Molecular dynamics simulation
  * How to use NAMD on the cluster
  * External resources: UIUC NAMD
CHARMM | /source/charmm | Molecular dynamics simulation (compiled with the Intel Fortran compiler)



Public genomic/genetic databases
Names | Path | Description and References
NCBI BLAST DB | /pub1/ftp-ncbi-blast/blast/db | BLAST databases, including the latest versions of nt, nr, human_genomic, mouse_genomic, est_others and est_human (last updated Dec 2007)
NCBI genome data | /pub1/ftp-ncbi-blast/blast/genome | Genome data (last updated Jan 2008)
UCSC genome | /pub1/ucsc-genome | .2bit format, for use with BLAT
(Note: other large-scale databases can be uploaded to the system on request.)
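
To show how the shared BLAST databases above fit together with the NCBI Toolbox installation, here is a minimal sketch, assuming Python 3, that composes a `blastall` command line. The helper name and the query and output file names are hypothetical; the two directories are the ones listed on this page, and `-p`, `-d`, `-i`, `-o`, `-e` and `-m 8` (tabular output) are standard options of the legacy `blastall` program.

```python
import posixpath

BLAST_BIN = "/source/ncbi/bin"                  # NCBI Toolbox path from this page
BLAST_DB_DIR = "/pub1/ftp-ncbi-blast/blast/db"  # shared database path from this page

VALID_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def build_blastall_command(program, database, query, output,
                           evalue="1e-5", tabular=True):
    """Compose a legacy `blastall` argument list against a shared database."""
    if program not in VALID_PROGRAMS:
        raise ValueError("unknown BLAST program: %s" % program)
    args = [
        posixpath.join(BLAST_BIN, "blastall"),
        "-p", program,                                # search program
        "-d", posixpath.join(BLAST_DB_DIR, database), # database name, e.g. nt or nr
        "-i", query,                                  # FASTA query file
        "-o", output,                                 # results file
        "-e", evalue,                                 # expectation-value cutoff
    ]
    if tabular:
        args += ["-m", "8"]                           # tab-delimited hit table
    return args
```

For instance, `build_blastall_command("blastn", "nt", "query.fa", "hits.tsv")` returns a list that could be handed to the scheduler rather than run on a login node, since searches against nt or nr can be long-running.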

Text-Mining applications
Names | Path | Description
MMTx | /pub1/nlp/ | Biomedical text mining software from NLM
  * How to run MMTx on the cluster
UIMA | /source/uima | General text mining software



3D Graphics Library
Names | Path | Description
Mesa | /source/mesa | Open-source 3D computer graphics library that provides a generic OpenGL implementation for rendering three-dimensional graphics on multiple platforms.



Tomographic image analysis
Names | Path | Description
GATE/GEANT | /source/geant/geant4.9.1.p01/work/bin/Linux++ | Used for the development and evaluation of tomographic reconstruction algorithms and other numerical observer studies.

Running GATE/GEANT requires a complicated environment setup and configuration; please consult scientific computing support for the environment variables.

More information is available from the nationwide health grid.


Cell image analysis application
Names | Path | Description and References
CProfiler | /source/CProfiler/ | Open-source cell image analysis software from the Broad Institute. Both the Matlab source version and the executable version are available on the cluster; please consult scientific computing support about how to build the cluster version of the pipeline.
  * CellProfiler Manual



__________________________________________________________________________


Services Integrated with the Clusters
(The following services can only be accessed from within the PHS network.)


Database Service
Names | Address | Description and References
MySQL 5.2 | hpcdb.research.partners.org | Currently stores data for text mining and sequence analysis. Please consult scientific computing support if you want to upload your data.
PostgreSQL 8.2 | hpcdb.research.partners.org | Currently used for proteomics analysis. Please consult the system administrator to use it.


GRID Web Service
Names | Address | Description and References
EngineFrame BioGrid access | hpcweb.research.partners.org:8081/engineframe/ | Provides web access to the cluster and allows users to control and manage their own web interface for interacting with the cluster.
LabKey 2.1 | hpcweb.research.partners.org | A web-based, high-throughput proteomics mass spectrometry analysis application integrated with the cluster. Please consult the system administrator.

Visit the HPCGG Proteomics Web Portal (if you do not have a LabKey password, you will see a demo project as an anonymous user).


Windows GUI access
Names | Address | Description and References
Windows-based applications | hpcwin.research.partners.org | Provides a Windows GUI for users to retrieve data and results from the cluster remotely. Applications can be installed on request.