How to run Rmpi on clusters
Notes:
1. Use lammpi, not hpmpi.
2. Use R 2.8.
3. R_LIBS must be configured correctly.
4. The job script must pass the list of allocated nodes to lammpi, and must start and stop the lam daemon on every allocated node.
Step 1: Please read "how to setup your working environment" for the bashrc configuration:
a. The module-loading section of your .bashrc file should contain the following block:
# modules definitions
if [ -n "$MODULESHOME" ]; then
module use /shr/modules
module load mpi/lammpi/default
module load r/2.8/default
module load gcc/4.2/default
fi
export LAMRSH="ssh -x"
The last line makes lammpi use ssh instead of rsh to reach the other nodes.
b. Type ". .bashrc" in your terminal, then type "module list", "which R", "which mpicc" and "echo $R_LIBS". You should see output like this:
[testy@n137 ~]$ . .bashrc
[testy@n137 ~]$ module list
Currently Loaded Modulefiles:
1) r/2.8/default 2) mpi/lammpi/default 3) gcc/4.2/default
[testy@n137 ~]$ which R
/source/R_2.8/bin/R
[testy@n137 ~]$ which mpicc
/source/lam_7.1.4/bin/mpicc
[testy@n137 ~]$ echo $R_LIBS
/source/R_2.8/lib64/R/library
If you see the above output in your terminal, your environment is set up correctly for Rmpi.
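The manual checks above can also be scripted. The following is a minimal sketch, not part of the cluster setup itself: check_commands is a hypothetical helper name, and it only tests whether each command is on PATH, not whether it is the right version.

```shell
#!/bin/bash
# Report whether each named command can be found on PATH.
# Prints one "found: NAME -> PATH" or "missing: NAME" line per command
# and returns non-zero if at least one command is missing.
# (check_commands is a hypothetical helper, not a cluster tool.)
check_commands() {
    local status=0
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    return $status
}
```

On the cluster you could call it as: check_commands R mpicc lamboot && echo "R_LIBS=$R_LIBS". Any "missing" line suggests that the module block in .bashrc was not loaded.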
Step 2: Create a folder called "TestR" in your home directory, then download the Sample R mpi script and save it as "example_rmpi.R" in your TestR directory.
Step 3: Create the job script. The number of cores requested in the job script (#BSUB -n) must be the same as the number specified in your Rmpi script (10 in our example).
The job script must also run lamboot before it starts R, and shut down the lam daemon on each of the allocated nodes after R finishes.
Here is the sample job script, example_rmpi.lsf:
# enable your environment, which will use .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the name of your job showing on the queue system
#BSUB -J Rmpitest
# the queue to use; this example uses the queue called "normal"
# use the bqueues command to check the available queues
#BSUB -q normal
# the system output and error message output, %J will show as your jobID
#BSUB -o %J.out
#BSUB -e %J.err
# the number of cores to request (attention: the number 10 must match the number in the R script)
#BSUB -n 10
# send an email notification when the job finishes
#BSUB -u testy@partners.org
#BSUB -N
#enter your working directory, change to your own dir
work_dir="/shr/home/$USER/TestR"
echo $work_dir
cd $work_dir
#### this section creates a host file for lammpi to use #####
# remove any stale host file left behind by an earlier job
rm -f $work_dir/lamhosts
for host in $LSB_HOSTS ; do
echo $host >> $work_dir/lamhosts
done
lamboot -v $work_dir/lamhosts
### change the following command when you use a different R script #####
R --no-save < example_rmpi.R > example_rmpi.out
#### the following section logs in to each allocated node and kills the lam daemon ####
for host in $LSB_HOSTS ; do
ssh $host ps -ef | grep "lam_7.1.4" | awk '{ print $2}' > /tmp/lampid.$USER
ssh $host kill `cat /tmp/lampid.$USER`
done
rm $work_dir/lamhosts
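LSF lists a host once per allocated slot in $LSB_HOSTS, so the host file written by the loop in the job script can contain the same host several times. The sketch below shows one way to collapse that list into one line per host with a slot count. It assumes that the installed LAM version accepts the "hostname cpu=N" form in its boot schema, which you should confirm before relying on it. make_lam_schema is a hypothetical helper name, and the host names in the example are made up.

```shell
#!/bin/bash
# Collapse a whitespace-separated host list (one entry per slot, as in
# $LSB_HOSTS) into "hostname cpu=N" lines, sorted by host name.
# ASSUMPTION: the installed LAM boot schema accepts the "cpu=N" key.
make_lam_schema() {
    local host
    # $1 is left unquoted on purpose so that it splits into host names
    for host in $1; do
        echo "$host"
    done | sort | uniq -c | awk '{ print $2 " cpu=" $1 }'
}

# Example with a made-up allocation of three slots on two hosts:
make_lam_schema "n002 n001 n002"
# n001 cpu=1
# n002 cpu=2
```

In a job script this could replace the host file loop, for example: make_lam_schema "$LSB_HOSTS" > $work_dir/lamhosts.$LSB_JOBID. Adding the job ID to the file name keeps two jobs that run at the same time from writing to the same host file.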
Step 4: Submit your Rmpi job
[testy@n137 ~]$ bsub < example_rmpi.lsf
In this example, the job will run on 10 cores and generate a plot in PDF format.
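Because the number after "#BSUB -n" has to match the number used in the R script, a quick check before submitting may catch a mismatch early. The following is a minimal sketch: bsub_slots and check_slots are hypothetical helper names, and the expected count is passed by hand because no assumption is made about how the R script states it.

```shell
#!/bin/bash
# Print the slot count from the first "#BSUB -n N" line of a job script.
# The comparison is case-sensitive, so "#BSUB -N" (email flag) is ignored.
bsub_slots() {
    awk '$1 == "#BSUB" && $2 == "-n" { print $3; exit }' "$1"
}

# Compare the slot count in job script $1 with the expected number $2.
# Returns non-zero on a mismatch or when no "#BSUB -n" line is found.
check_slots() {
    local requested
    requested="$(bsub_slots "$1")"
    if [ "$requested" = "$2" ]; then
        echo "ok: $1 requests $requested slots"
    else
        echo "mismatch: $1 requests '$requested' slots, expected $2"
        return 1
    fi
}
```

For the example in this guide you could run: check_slots example_rmpi.lsf 10 && bsub < example_rmpi.lsf. The job is then submitted only when the count matches.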