Submit an MPICH2 MPI program on the cluster
MPICH2 is an all-new implementation of MPI, designed to support research into high-performance implementations of MPI-1 and MPI-2 functionality. Please check the MPICH2 website for details.
To run an MPICH2 MPI program, we need to start the MPICH2 mpd ring on the nodes allocated by the LSF scheduler, and then use mpiexec to launch the MPI job on those nodes.
Step 1: You must load the MPICH2 module. By default, the system uses HP-MPI, which is compatible with MPICH 1.x. To use MPICH2, add "module load mpi/mpich2/default" to the module-load section of your .bashrc file. For example, the section will look like this:
# modules definitions
if [ -n "$MODULESHOME" ]; then
module use /shr/modules
module load mpi/mpich2/default
module load matlab/default
module load intel/cce/10.1/default
module load java/1.6/default
fi
Type ". .bashrc" (a dot, a space, then the file name) in the terminal from your home directory to apply the environment configuration.
[testy@n137 ~]$ . .bashrc
Step 2: Download the example MPICH2Example.zip. It also contains an example Makefile that compiles multiple C++ source files into a single executable file. Please download and unzip it to /shr/home/$USER/TestMPICH2/
Step 3: Compile it with the MPICH2 compiler
[testy@n137 ~]$ which mpicxx
/source/mpich2/bin/mpicxx
[testy@n137 ~]$ which mpiexec
/source/mpich2/bin/mpiexec
[testy@n137 ~]$ cd TestMPICH2
[testy@n137 TestMPICH2]$ make
mpicxx -o main.o -c main.cpp
mpicxx -o funkywork.o -c funkywork.cpp
mpicxx -o myexe main.o funkywork.o -lm -O2 -Wall
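Because the default environment uses HP-MPI, it can be worth confirming that the compiler wrapper and the launcher resolve to the same installation before building. The following is a minimal sketch, not part of the original example: the function name same_install is hypothetical, and the two paths are copied from the "which" output above so that the snippet runs anywhere.

```shell
# same_install PATH1 PATH2: succeed when both paths live in the same directory.
# (hypothetical helper, introduced only for this illustration)
same_install() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Paths copied from the "which" output in this step. On the cluster one would
# use "$(command -v mpicxx)" and "$(command -v mpiexec)" instead.
mpicxx_path=/source/mpich2/bin/mpicxx
mpiexec_path=/source/mpich2/bin/mpiexec

if same_install "$mpicxx_path" "$mpiexec_path"; then
    result="consistent"
else
    result="mismatch"
fi
echo "MPI toolchain: $result"
```

If the check reports a mismatch, the likely cause is that the MPICH2 module was not loaded in the current shell.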
Step 4: Create a job script called "mpich2.lsf":
#BSUB -L /bin/bash
# the name of your job on the queue system
#BSUB -J mpich2_test
# the queue that you will use; this example uses the queue called "normal"
# please use the bqueues command to check the available queues
#BSUB -q normal
# the standard output and error files; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err
# the number of processors that you will use
#BSUB -n 10
# send an email notification when the job finishes
#BSUB -u testy@partners.org
#BSUB -N
############ enter your working directory, change to your own dir ###
work_dir="/shr/home/$USER/TestMPICH2"
cd $work_dir
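############ create mpd ring, DO NOT modify this section unless you know what you are doing ####
# start from empty host files so entries from an earlier run do not accumulate
rm -f mpd.procs mpd.nodes
# write one line per allocated processor slot and count the slots
nproc=0
for proc in $LSB_HOSTS ; do
echo $proc >> mpd.procs
nproc=$((nproc + 1))
done
echo $LSB_HOSTS
echo $nproc
# keep one line per distinct host; mpdboot starts one mpd daemon on each
sort -u mpd.procs > mpd.nodes
nhosts=$(wc -l < mpd.nodes)
mpdboot -n $nhosts -v -f mpd.nodes
############ ONLY change myexe to your own application and add any necessary arguments #####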
mpiexec -machinefile mpd.procs -np $nproc ./myexe
############ exit the mpd ring and clean off the nodes ###################
mpdallexit
mpdcleanup
rm -f mpd.nodes
rm -f mpd.procs
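The host-list bookkeeping in the mpd ring section can be tried outside LSF. The sketch below is an illustration only: LSB_HOSTS is normally set by LSF, so here it is assigned a stand-in value copied from the sample output on this page, and the files are written to a scratch directory (an assumption, to keep the demo from touching your working directory). It shows why two files are needed: one line per processor slot for mpiexec, and one line per distinct host for mpdboot.

```shell
# Stand-in for the variable that LSF sets inside a real job.
LSB_HOSTS="n7 n7 n7 n7 n8 n8 n8 n18 n18 n18"

# Scratch directory for the demo (assumption; the job script uses the work dir).
tmp_dir=$(mktemp -d)
procs_file="$tmp_dir/mpd.procs"
nodes_file="$tmp_dir/mpd.nodes"

# One line per allocated slot: this is the file passed to mpiexec -machinefile.
nproc=0
for proc in $LSB_HOSTS; do
    echo "$proc" >> "$procs_file"
    nproc=$((nproc + 1))
done

# One line per distinct host: this is the file passed to mpdboot -f.
sort -u "$procs_file" > "$nodes_file"
nhosts=$(wc -l < "$nodes_file" | tr -d ' ')

echo "slots: $nproc, hosts: $nhosts"
rm -r "$tmp_dir"
```

With the stand-in value, the ten slots are spread over three hosts, so mpdboot would start three daemons while mpiexec would launch ten processes.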
Step 5: Submit your job:
[testy@n137 ~]$ bsub < mpich2.lsf
You can check the job at any time by typing "bjobs". Once your job is dispatched, the output looks like the following. Here the job ID is 496886.
[testy@n137 TestMPICH2]$ bsub < mpich2.lsf
Job <496886> is submitted to queue
[testy@n137 TestMPICH2]$ bjobs -l
Job <496886>, Job Name , User , Project , Mail , Status , Queue , Command <#B
SUB -L /bin/bash; # the name of your job on the queue syst
em;#BSUB -J mpich2_test; # the queue that you will use, th
e example here use the queue called "normal";# please use
bqueus command to check the available queues;#BSUB -q norm
al; # the system output and error message output, %J will
show as your jobID;#BSUB -o %J.out;#BSUB -e %J.err; #the n
umber of processors that you will use;#BSUB -n 10; #when j
ob finish that you will get email notification;#BSUB -u yx
u11@partners.org;#BSU>
Wed Apr 8 19:17:47: Submitted from host , CWD <$HOME/TestMPICH2>, Output
File <%J.out>, Error File <%J.err>, Notify when job ends,
10 Processors Requested, Login Shell ;
Wed Apr 8 19:17:52: Started on 10 Hosts/Processors <4*n7> <3*n8> <3*n18>, Exec
ution Home , Execution CWD ;
Wed Apr 8 19:17:52: Resource usage collected.
MEM: 1 Mbytes; SWAP: 12 Mbytes; NTHREAD: 1
PGID: 7728; PIDs: 7728
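For scripting, it can help to capture the job ID that bsub prints, for example to locate the matching .out file later. The following is a minimal sketch, assuming the confirmation line has the form shown above ("Job <ID> is submitted to queue ...") and is written to standard output. The function name extract_job_id is hypothetical, and the sample line is fed in directly so that the snippet runs without LSF.

```shell
# extract_job_id: read bsub's confirmation line on stdin, print the numeric ID.
# (hypothetical helper, introduced only for this illustration)
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Sample confirmation line copied from the session above.
sample_line='Job <496886> is submitted to queue'

job_id=$(printf '%s\n' "$sample_line" | extract_job_id)
echo "job id: $job_id"
echo "expected output file: ${job_id}.out"
```

On the cluster, the same function could be used as "job_id=$(bsub < mpich2.lsf | extract_job_id)", followed by "bjobs -l $job_id".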
Step 6: Check your output results:
When the job finishes, the system writes its output to the *.out file.
[jxu@n137 TestMPICH2]$ vi 496886.out
n7 n7 n7 n7 n8 n8 n8 n18 n18 n18
running mpdallexit on n7
LAUNCHED mpd on n7 via
RUNNING: mpd on n7
LAUNCHED mpd on n18 via n7
LAUNCHED mpd on n8 via n7
RUNNING: mpd on n8
RUNNING: mpd on n18
Hello, 1 and n7 say hi in a C++ statement
Hello, 0 and n7 say hi in a C++ statement
Hello, 2 and n7 say hi in a C++ statement
Hello, 4 and n8 say hi in a C++ statement
Hello, 9 and n18 say hi in a C++ statement
Hello, 3 and n7 say hi in a C++ statement
Hello, 8 and n18 say hi in a C++ statement
Hello, 7 and n18 say hi in a C++ statement
Hello, 5 and n8 say hi in a C++ statement
Hello, 6 and n8 say hi in a C++ statement
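The "Hello" lines appear in arbitrary order because the ranks run concurrently, so a quick way to confirm that every process reported is to count them rather than read them. The sketch below assumes the greeting format shown above; the function name count_ranks is hypothetical, and a shortened three-rank stand-in file is used in place of 496886.out so that the snippet runs anywhere.

```shell
# count_ranks FILE: print how many "Hello, <rank> and ..." lines FILE contains.
# (hypothetical helper, introduced only for this illustration)
count_ranks() {
    grep -c '^Hello, [0-9][0-9]* and ' "$1"
}

# Stand-in for the job's .out file, shortened to three ranks for the demo.
out_file=$(mktemp)
cat > "$out_file" <<'EOF'
n7 n7 n8
RUNNING: mpd on n8
Hello, 1 and n7 say hi in a C++ statement
Hello, 0 and n7 say hi in a C++ statement
Hello, 2 and n8 say hi in a C++ statement
EOF

expected=3   # for the real job this would be the value given to #BSUB -n
found=$(count_ranks "$out_file")
if [ "$found" -eq "$expected" ]; then
    echo "all $expected ranks reported"
else
    echo "only $found of $expected ranks reported"
fi
rm -f "$out_file"
```

For the job on this page, one would call count_ranks on 496886.out and compare the result with 10.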