|
Submit an MPI program on the cluster
The key is to combine the mpirun command with the -lsb_hosts option to submit an MPI job on the hpres cluster.
Attention: You must use HP-MPI to run the following script. HP-MPI is largely compatible with MPICH 1.x.
Step 1: Download the example mpiintegral.c
Step 2: Compile it with the HP-MPI compiler
If you have loaded the right module (refer to "setup your working environment"), HP-MPI should be available.
[testy@n137 ~]$ which mpicc
/opt/hpmpi/bin/mpicc
[testy@n137 ~]$ mpicc -o mpiintegral mpiintegral.c
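The "which mpicc" check above can be scripted so that a wrong compiler wrapper is caught before compiling. This is a minimal sketch: check_tool_prefix is a hypothetical helper name, not part of HP-MPI or LSF, and the /opt/hpmpi prefix is simply taken from the transcript above.

```shell
# Hypothetical helper: succeed only if tool $1 resolves to a path under prefix $2.
check_tool_prefix() {
    local tool="$1" prefix="$2" path
    # command -v prints the path the shell would execute, or fails if none is found
    path=$(command -v "$tool") || { echo "$tool not found in PATH" >&2; return 2; }
    case "$path" in
        "$prefix"/*) echo "$tool found at $path" ;;
        *) echo "$tool resolves to $path, expected under $prefix" >&2; return 1 ;;
    esac
}
```

For example, "check_tool_prefix mpicc /opt/hpmpi && mpicc -o mpiintegral mpiintegral.c" compiles only when the HP-MPI wrapper is the one found first in PATH.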
Step 3: Create your job script mympijob.lsf
[testy@n137 ]$ vi mympijob.lsf
#!/bin/bash
# enable your environment, which uses the .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the name of your job showing on the queue system
#BSUB -J mpitest
# the queue that you will use; this example uses the queue called "normal"
# use the bqueues command to check the available queues
#BSUB -q normal
# files for standard output and error messages; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err
# the number of cores to request (Attention: each node has 4 to 8 cores)
#BSUB -n 20
# send an email notification to this address when the job finishes
#BSUB -u youremail@partners.org
#BSUB -N
# change to your working directory; replace this with your own directory
cd /shr/home/$USER/
# Finally, start the MPI program. You MUST make sure the argument for -np (here "20") is the same as the
# number given to "#BSUB -n"
mpirun -np 20 -lsb_hosts ./mpiintegral
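Because the job script requires the -np value to match "#BSUB -n", it can help to check the two numbers before submitting. The following is a minimal sketch under stated assumptions: check_np is a hypothetical helper (not an LSF or HP-MPI command), and it assumes the script contains one "#BSUB -n <number>" directive and one uncommented mpirun line with "-np <number>".

```shell
# Hypothetical helper: compare "#BSUB -n N" with "mpirun -np M" in a job script.
check_np() {
    local script="$1" bsub_n mpirun_np
    # first "#BSUB -n <number>" directive ("#BSUB -N" does not match: no number follows)
    bsub_n=$(sed -n 's/^#BSUB[[:space:]]\{1,\}-n[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$script" | head -n 1)
    # first "-np <number>" on an mpirun line that is not a comment
    mpirun_np=$(grep -v '^[[:space:]]*#' "$script" \
        | sed -n 's/.*mpirun.*[[:space:]]-np[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' | head -n 1)
    if [ -z "$bsub_n" ] || [ -z "$mpirun_np" ]; then
        echo "could not find both '#BSUB -n' and 'mpirun -np' in $script" >&2
        return 2
    fi
    if [ "$bsub_n" -eq "$mpirun_np" ]; then
        echo "OK: both request $bsub_n processes"
    else
        echo "MISMATCH: #BSUB -n is $bsub_n but mpirun -np is $mpirun_np" >&2
        return 1
    fi
}
```

Running "check_np mympijob.lsf && bsub < mympijob.lsf" submits the job only when the two numbers agree.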
Step 4: Submit your job:
[testy@n137 ~]$ bsub < mympijob.lsf
You can check the job's status at any time by typing "bjobs". Once the queue system has accepted the job, it is listed as shown below; STAT is PEND while the job waits to be dispatched. Here the job ID is 14894.
[testy@n137 ]$ bjobs
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
14894   testy   PEND  normal   n137                    mpitest    Dec 17 14:35
[jxu@n137 examples]$ bjobs -l
Job <14894>, Job Name , User , Project , Mail , Status , Queue , Command <#!/bin/bash;
    # enable your environment, which will use .bashrc configuration in your home directory;
    #BSUB -L /bin/bash; # the name of your job showing on the queue system;#BSUB -J gethosts;
    # the queue that you will use, the example here use the queue called "normal";
    # please use bqueus command to check the available queues;#BSUB -q normal;
    # the system output and error message output, %J will show as your jobID;
    #BSUB -o %J.out;#BSUB -e %J.err; #the CPU number that you will collect >
Thu Dec 17 14:35:40: Submitted from host , CWD <$HOME/TestMPI/examples>,
    Output File <%J.out>, Error File <%J.err>, Notify when job ends,
    20 Processors Requested, Login Shell ;
Thu Dec 17 14:35:44: Started on 20 Hosts/Processors <3*n25> <2*n3> <2*n8> <2*n27>
    <2*n5> <2*n4> <2*n23> <2*n14> <2*n15> <1*n10>, Execution Home , Execution CWD ;
Thu Dec 17 14:35:44: Resource usage collected.
MEM: 1 Mbytes; SWAP: 10 Mbytes; NTHREAD: 1
PGID: 12476; PIDs: 12476
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
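To pull just the state of one job out of the short "bjobs" listing, the STAT column (third field) can be filtered with awk. This is a sketch only: job_status is a hypothetical helper name, and it assumes the default column order shown in the listing above (JOBID first, STAT third).

```shell
# Hypothetical helper (not an LSF command): print the STAT field for one job ID.
# Reads the short "bjobs" listing on standard input.
job_status() {
    # The header line never matches because its first field is the word JOBID.
    awk -v id="$1" '$1 == id { print $3 }'
}
```

For example, "bjobs | job_status 14894" prints the job's current state, such as PEND while it waits or RUN once it has started.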
Step 5: Check your output results:
When the job finishes, the system writes its output to the *.out file.
[testy@n137 ~]$ vi 14894.out
n57 n57 n57 n57 n59 n59 n59 n59 n58 n58 n58 n58 n61 n61 n61 n60 n60 n60 n63 n63
Process 0 has the partial integral of 0.078459
Process 1 has the partial integral of 0.077975
Process 3 has the partial integral of 0.075572
Process 2 has the partial integral of 0.077011
Process 4 has the partial integral of 0.073666
Process 9 has the partial integral of 0.057659
Process 13 has the partial integral of 0.038366
Process 16 has the partial integral of 0.021313
Process 19 has the partial integral of 0.003083
The Integral =1.000000
Process 5 has the partial integral of 0.071307
Process 6 has the partial integral of 0.068508
Process 7 has the partial integral of 0.065287
Process 8 has the partial integral of 0.061663
Process 11 has the partial integral of 0.048611
Process 10 has the partial integral of 0.053299
Process 12 has the partial integral of 0.043623
Process 14 has the partial integral of 0.032873
Process 15 has the partial integral of 0.027177
Process 17 has the partial integral of 0.015318
Process 18 has the partial integral of 0.009229
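The "Process N" lines appear out of rank order because each process prints on its own. A quick sanity check is to count the partial results and add them up. The sketch below assumes the output format shown above, with the value as the last field of each "partial integral of" line; sum_partials is a hypothetical helper name. Since each value is printed to six decimals, the sum may differ from the reported integral in the last digit.

```shell
# Hypothetical helper: count the "partial integral" lines in a job output file
# and add up their values (the last field on each matching line).
sum_partials() {
    awk '/partial integral of/ { sum += $NF; n++ }
         END { printf "%d processes, sum = %.6f\n", n, sum }' "$1"
}
```

Running "sum_partials 14894.out" should report as many processes as were requested with "#BSUB -n".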
|