A basic "MUST READ" for first-time users of the Partners High Performance Computing (pHPC) Windows cluster.
What is the Windows cluster, and why did we deploy it?
How do I access the Windows cluster?
Who can use the Windows cluster?
Who maintains and supports the Windows cluster?
Where do I store my executable files and data files?
How do I run my computational job on the Windows cluster?
How do I allocate multiple nodes for my parallel and non-parallel batch jobs?
How can I interact with the program during the computation?
What operating system and applications are available to users on the pHPC Windows cluster?
What applications can be installed on the pHPC Windows cluster?
My job does not run correctly. How do I troubleshoot?
Where can I find more resources about the Windows cluster?
What is the Windows cluster, and why did we deploy it?
The current Windows cluster in the Partners HPC (pHPC) infrastructure runs Windows HPC Server 2008, a Microsoft product for high performance computing that was released in September 2008.
Until now, the primary pHPC offering has consisted of two major clusters (HPRES and RCCLU) and several smaller clusters. These have been the main computing facilities for our users, but they are all Linux-based. The new Windows cluster gives users the opportunity to develop computationally intensive applications in a Windows environment, particularly VC++ or dotNet applications. It is also easy to launch batch jobs on the Windows cluster, such as NCBI BLAST runs or Matlab, Perl, or R scripts.
The Windows cluster hardware consists of 10 DL145 G3 servers, each with 4 cores and 8 GB of memory. All nodes share the storage mounted on the head node; total storage is currently 1 TB. We expect to expand the system based on users' research needs.
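When sizing a job, it can help to work out the cluster's aggregate capacity from the figures above. The minimal Python sketch below does that arithmetic; the helper name `cluster_capacity` is hypothetical, and the per-core memory figure is only an average budget for running one task per core (the operating system also needs some of each node's memory, so the usable amount is somewhat lower).

```python
# Figures taken from the hardware description above.
NODES = 10
CORES_PER_NODE = 4
MEMORY_PER_NODE_GB = 8


def cluster_capacity(nodes, cores_per_node, mem_per_node_gb):
    """Return (total cores, total memory in GB, average memory per core in GB)."""
    total_cores = nodes * cores_per_node
    total_mem_gb = nodes * mem_per_node_gb
    mem_per_core_gb = mem_per_node_gb / cores_per_node
    return total_cores, total_mem_gb, mem_per_core_gb


if __name__ == "__main__":
    cores, mem, per_core = cluster_capacity(NODES, CORES_PER_NODE, MEMORY_PER_NODE_GB)
    print(f"{cores} cores, {mem} GB total memory, {per_core:g} GB per core")
```

Running it prints `40 cores, 80 GB total memory, 2 GB per core`, so a job that needs much more than about 2 GB per task may need fewer tasks per node.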
How do I access the Windows cluster?
The best way to use the cluster is to install a client utility called "HPC Job Manager" on your own machine and use it to submit jobs to the cluster. Please refer to "Access windows cluster".
You do not need RDP (Remote Desktop) access to the cluster unless you need a high-resolution GUI. The Enterprise Research Infrastructure Services (ERIS) group is working on a solution for that specific type of usage.
Like the Linux clusters, the Windows cluster is behind a firewall, so you may not be able to access it from outside the Partners network.
Who can use the Windows cluster?
Any researcher or developer who is affiliated with Partners Healthcare and has a Partners logon account can apply for an account on the Windows cluster through the registration form.
We expect usage of this cluster to grow, and we welcome collaboration with users who are interested in developing high performance computing applications in the Windows development environment or on the dotNet framework. A group of image analysis researchers and bioinformaticians has already begun exploring the Windows cluster.
Who maintains and supports the Windows cluster?
The Windows cluster is supported by the Enterprise Research Infrastructure Services (ERIS) group in the Partners research computing department. The following staff members provide day-to-day assistance with the cluster:
Dennis Gurgul, email: dgurgul@partners.org, tel: 617-724-3169
Jerry Xu, email: yxu11@partners.org, tel: 617-726-5832
Where do I store my executable files and data files?
In principle, you can store your files on any file system in the Partners network (including pfs$), as long as they can be accessed with your Partners credentials (your Partners logon account). However, to avoid network bottlenecks, we recommend storing your files in \\rcwinclu\$PARTNERID\ ($PARTNERID is your Partners logon account name). That storage is on the Windows cluster network and can be reached easily by the computing nodes. Space is currently limited, but we will expand it in the future. Please contact Dennis Gurgul for more information.
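If you script your job submissions, it can be convenient to build paths into your cluster share programmatically so that every task reads from and writes to \\rcwinclu\$PARTNERID\. The Python sketch below uses the standard library's `pathlib.PureWindowsPath`, which handles UNC paths on any operating system. The helper `cluster_path` and the logon name `abc12` are hypothetical examples, not part of the cluster setup.

```python
from pathlib import PureWindowsPath

# File server hosting the per-user cluster shares, as named in this FAQ.
CLUSTER_SERVER = "rcwinclu"


def cluster_path(partner_id: str, *parts: str) -> PureWindowsPath:
    r"""Return \\rcwinclu\<partner_id>\ joined with the given sub-path parts."""
    # Reject empty IDs or IDs containing separators, which would yield a wrong share.
    if not partner_id or any(c in partner_id for c in "\\/"):
        raise ValueError("partner_id must be a plain Partners logon name")
    share = PureWindowsPath(f"//{CLUSTER_SERVER}/{partner_id}/")
    return share.joinpath(*parts)


if __name__ == "__main__":
    # "abc12" is a made-up logon name used only for illustration.
    print(cluster_path("abc12", "project", "input.txt"))
```

The example prints `\\rcwinclu\abc12\project\input.txt`. On a Windows client, the `USERNAME` environment variable usually holds your logon name, so `cluster_path(os.environ["USERNAME"], "results")` would point at a results folder in your own share, assuming that name matches your Partners ID.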
How do I run my computational job on the Windows cluster?
Please refer to "How to use Windows cluster for computational jobs".
How do I allocate multiple nodes for my parallel and non-parallel batch jobs?
If your program is MPI (Message Passing Interface) enabled or MPI.NET enabled, please refer to "how to submit a parallel MPI job in windows cluster" to submit your job. However, many parallel applications have their own built-in rules for submitting a parallel job, so please read your software manual carefully. If you have questions, please call or email us.
If your program is not parallel, a common approach is to submit a large number of batch jobs, each with different input arguments or input data. In particular, the Windows cluster "Job Manager" utility provides a "parametric sweep" function that lets you quickly submit a large number of batch jobs. You can also write your own script or program for more flexibility in submitting large numbers of batch jobs. Please refer to "how to submit a large number of batch jobs in Windows cluster".
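The idea behind a parametric sweep can be illustrated with a small Python sketch that expands one command template into one command line per index. It assumes, as in the Job Manager's sweep, that an asterisk in the command line marks where the sweep index is substituted; this is a simplified model that replaces every asterisk with the plain index and ignores any padding or other options the real tool may offer, so check the Job Manager documentation for the exact behavior. The helper `expand_sweep` and the program `myprog.exe` are hypothetical.

```python
from __future__ import annotations


def expand_sweep(template: str, start: int, end: int, step: int = 1) -> list[str]:
    """Return one command line per sweep index from start to end (inclusive).

    Simplified model: every '*' in the template is replaced by the index.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return [template.replace("*", str(i)) for i in range(start, end + 1, step)]


if __name__ == "__main__":
    # myprog.exe and the file names are hypothetical placeholders.
    for cmd in expand_sweep("myprog.exe input*.txt output*.txt", 1, 3):
        print(cmd)
```

This prints `myprog.exe input1.txt output1.txt`, then the same line for indices 2 and 3. A home-grown submission script could generate its command lines the same way and submit each one as a separate task.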
If your program is multithreaded (uses multiple cores within one node), you can submit many of these multithreaded jobs as batch jobs, but you may want to allocate each node "exclusively" to a single job so that jobs do not compete for the same cores. You select this allocation during job submission.
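A short calculation shows why exclusive allocation matters. Many multithreaded programs start one thread per core they can see on the machine; if the scheduler places several such jobs on the same 4-core node because each job requested only one core, the threads end up sharing cores. The Python sketch below computes that oversubscription ratio; the helper names are hypothetical and the scenario is illustrative.

```python
import os

CORES_PER_NODE = 4  # each pHPC Windows node has 4 cores (see hardware description)


def oversubscription(jobs_on_node: int, threads_per_job: int,
                     cores: int = CORES_PER_NODE) -> float:
    """Return runnable threads per core; values above 1.0 mean threads compete for cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return jobs_on_node * threads_per_job / cores


def default_thread_count() -> int:
    """Threads a program would start if it assumes it has the whole machine to itself."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Exclusive allocation: one 4-thread job per node -> 1.0 thread per core.
    print(oversubscription(jobs_on_node=1, threads_per_job=4))
    # Shared allocation: four 4-thread jobs on one node -> 4.0 threads per core.
    print(oversubscription(jobs_on_node=4, threads_per_job=4))
```

With exclusive allocation the ratio stays at 1.0; in the shared case each core is time-sliced among four threads, which usually slows every job on that node.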
Many serial bioinformatics jobs are "embarrassingly parallel" and can easily be parallelized with MPI, which often gives you more control over the job and can improve its efficiency.
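The usual way to parallelize such a job with MPI is to give each process (rank) a disjoint share of the inputs. The Python sketch below shows only that partitioning logic and simulates three ranks so it runs without an MPI installation; in a real MPI program, rank and size come from the MPI runtime (for example, `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()` in mpi4py). The helper `my_share` and the file names are hypothetical.

```python
def my_share(items, rank: int, size: int):
    """Return the items this rank should handle: every size-th item, starting at rank."""
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need size > 0 and 0 <= rank < size")
    return items[rank::size]


if __name__ == "__main__":
    # Hypothetical input files; under MPI, each process would call my_share once
    # with its own rank instead of looping over all ranks as done here.
    inputs = [f"seq{i}.fasta" for i in range(10)]
    for rank in range(3):
        print(rank, my_share(inputs, rank, 3))
```

Rank 0 gets seq0, seq3, seq6, and seq9, rank 1 gets seq1, seq4, and seq7, and rank 2 gets seq2, seq5, and seq8, so every input is processed exactly once without the ranks needing to communicate.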
How can I interact with the program during the computation?
In general, the Windows cluster does not allow users to connect to the computing nodes through Remote Desktop and run applications interactively.
However, some applications, such as Matlab, can interact with the cluster directly from the client. We also have some computing nodes that can be accessed directly, but they must still be allocated through the job scheduler. Please contact us at rcc@partners.org; we are happy to work with you to resolve these specific problems.
What operating system and applications are available to users on the pHPC Windows cluster?
Please refer to "system and applications in pHPC clusters".
What applications can be installed on the pHPC Windows cluster?
In general, many 32-bit or 64-bit applications that currently run under Windows XP, Vista, or Windows Server 2003 can be migrated successfully to the current pHPC Windows cluster. However, to use multiple nodes in the cluster, the application must either be able to run in batch mode or be parallel-aware. If the application is single-threaded and needs a GUI, please refer to "How can I interact with the program during the computation?" above. Also, if your application contains libraries compiled under dotNet 1.1 or an earlier environment, it may encounter various problems on the current pHPC Windows cluster.
My job does not run correctly. How do I troubleshoot?
The "Job Manager" tool provides useful log information for troubleshooting, but we strongly suggest that you specify a standard output file and a standard error file when you prepare your computing tasks in the "Job Manager"; these two files provide extra information. In most cases, we also encourage you to check your code and your software manual carefully. If you do have a question, please do not hesitate to contact us. The ERIS team can provide scientific computing support to help identify the problem, but please keep in mind that, due to our limited resources, we may not be able to rewrite your application for you. Please also remember that the HPC system continues to evolve and must keep pace with the almost daily changes in the biocomputing field.
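The standard output and standard error files are most useful when the program itself writes clear progress messages to standard output, writes problems to standard error, and exits with a nonzero code on failure. The Python sketch below shows that pattern for a batch-mode task; the script name `count_lines.py` and its line-counting work are hypothetical stand-ins for a real computation.

```python
from __future__ import annotations

import sys


def main(argv: list[str]) -> int:
    """Count lines in an input file; report progress on stdout and problems on stderr."""
    if len(argv) != 1:
        print("usage: count_lines.py INPUT_FILE", file=sys.stderr)
        return 2
    path = argv[0]
    print(f"reading {path}")  # goes to the task's standard output file
    try:
        with open(path, encoding="utf-8") as f:
            n = sum(1 for _ in f)
    except OSError as exc:
        # Goes to the task's standard error file, with the reason for the failure.
        print(f"error: cannot read {path}: {exc}", file=sys.stderr)
        return 1  # a nonzero exit code signals failure to whatever launched the program
    print(f"done: {n} lines")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

When such a task fails, the standard error file then states which input could not be read and why, which is usually faster to diagnose than a bare failure status.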
Where can I find more resources about the Windows cluster?
We maintain some documentation at \\rcwinclu\public\windowsHPC_doc, which you can retrieve easily from within the PHS network. You can also visit Microsoft TechNet for more information.