


Run MMTx on clusters


Intro: MMTx is a complex text-mining package distributed by the NLM (National Library of Medicine). It is an effort to make the MetaMap program available to biomedical researchers in a generic, configurable environment. MetaMap maps arbitrary text to concepts in the UMLS Metathesaurus; equivalently, it discovers Metathesaurus concepts in text. MMTx processes text through a series of modules. First, the text is parsed into components: paragraphs, sentences, phrases, lexical elements, and tokens. Variants are then generated from the resulting phrases. Candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against the phrases. Finally, the best candidates are organized into a final mapping that covers the text as well as possible.

For more information about MMTx, please visit the NLM website for MMTx.

MMTx documentation is also available in the HPC web portal: MMTx user guide.

Step 1:

a. Please read how to set up your working environment.


b. Load application environment

Load the MMTx environment: include "module load mmtx/default" in the module-loading block of your .bashrc file.
Load the JDK 1.5 environment: include "module load jdk/1.5/default" in the module-loading block of your .bashrc file.

c. Check your modules. Type ". .bashrc" in your terminal, then type "module list", "which MMTx", and "which java". You should see:

[testy@n137 ~]$ . .bashrc
[testy@n137 ~]$ module list
Currently Loaded Modulefiles:
   1) mmtx/default 2) jdk/1.5/default
[testy@n137 ~]$ which MMTx
/pub1/nlp/mmtx/bin/MMTx
[testy@n137 ~]$ which java
/source/jdk1.5.0_01/bin/java
If you see the above output in your terminal, your environment has been set up correctly.
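The same check can be scripted. The following is a minimal sketch, not part of the cluster's tooling: `check_tool` is a hypothetical helper name, and it only tests whether a command resolves on your PATH, using the shell built-in `command -v`.

```shell
#!/bin/bash
# check_tool is a hypothetical helper (not provided by MMTx or the cluster).
# It reports where a command resolves on PATH, or returns non-zero if it does not.
check_tool() {
    local tool_path
    if tool_path=$(command -v "$1"); then
        echo "found $1 at $tool_path"
    else
        echo "missing $1: check the module lines in your .bashrc" >&2
        return 1
    fi
}
```

On a cluster node you would call it as `check_tool MMTx && check_tool java` after sourcing your .bashrc. If either command is missing, the message points you back to the module lines from step b.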

Note: The current release of MMTx uses several third-party components that have problems on clustered NFS file systems. To run MMTx successfully, please follow the instructions here. (The trick is to copy all of the data files into /tmp on each compute node before MMTx runs, which avoids the NFS problem.)
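The workaround in the note can be sketched as a small shell function. This is an illustration under assumptions, not the cluster's official procedure: `stage_data` is a hypothetical helper name, and it uses `cp -r` because the copy is purely local to the node.

```shell
#!/bin/bash
# stage_data is a hypothetical helper: refresh a node-local copy of a data directory.
#   $1 = source directory (for example, on the shared file system)
#   $2 = node-local staging directory; the copy ends up in <staging dir>/data
stage_data() {
    local src=$1 dest=$2
    [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
    rm -rf "$dest"              # discard any stale copy left by an earlier job
    mkdir -p "$dest"
    cp -r "$src" "$dest/data"   # dest/data does not exist yet, so it becomes the copy
}
```

With the paths used by the job script in Step 2, the call would be `stage_data /pub1/nlp/nls/mmtx/data_backup /tmp/nlp_data`. Removing the old copy first keeps a half-finished copy from an earlier job from being reused.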

Step 2: Create a job script file called mmtx.lsf in your home directory and copy the following MMTx job script into it.
#!/bin/bash

#BSUB -L /bin/bash

# define the job environment

#BSUB -J mmtx
#BSUB -q long
#BSUB -o %J.%I.out
#BSUB -e %J.%I.err

# number of compute cores for each job element (use one core for each job)
#BSUB -n 1
#BSUB -u YourEmail@partners.org
#BSUB -N

# this NLP job performs better with more memory, so you might need the entire
# 8GB of memory on the node; this option gives each job exclusive use of the node

#BSUB -x

# change to the working directory; stop if it does not exist
workdir=/shr/home/$USER/mmtx
cd "$workdir" || exit 1

# copy all the data files to local /tmp, replacing any stale copy
TMP=/tmp/nlp_data
if [ -d "$TMP" ]; then
    rm -rf "$TMP"
fi
mkdir "$TMP"
scp -r /pub1/nlp/nls/mmtx/data_backup "$TMP/data"

# run the single MMTx analysis
MMTx --fileName=inputexample.txt --outputFileName=myoutput.mmtx
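The `%I` in the output file names of the script is the LSF job-array index, so the same script could in principle process a different input file per array element. The following is only a sketch under assumptions: the job would have to be submitted as an array (for example, with `#BSUB -J "mmtx[1-10]"` in place of the plain job name), `inputs.txt` is a hypothetical list file with one input file name per line, and `pick_input` is a hypothetical helper. LSF sets `LSB_JOBINDEX` to the element's index inside an array job.

```shell
#!/bin/bash
# pick_input is a hypothetical helper: print line number $2 of list file $1.
pick_input() {
    local list=$1 index=$2
    sed -n "${index}p" "$list"
}

# Only act inside an LSF array job, where LSB_JOBINDEX is 1, 2, 3, ...
# (it is unset outside LSF and 0 for non-array jobs).
if [ "${LSB_JOBINDEX:-0}" -gt 0 ]; then
    # inputs.txt is an assumed file listing one input document per line
    input=$(pick_input inputs.txt "$LSB_JOBINDEX")
    MMTx --fileName="$input" --outputFileName="${input%.txt}.mmtx"
fi
```

Each array element then writes its own `.mmtx` file, named after its input, next to the per-element `.out` and `.err` files.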




Step 3: Submit your MMTx job
[testy@n137 ~]$ bsub < mmtx.lsf
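When the job is accepted, bsub prints a confirmation line of the form `Job <1234> is submitted to queue <long>.` The sketch below shows one way to capture the job ID from that line so you can follow the job afterwards. `extract_job_id` is a hypothetical helper name, and the job number shown is made up for illustration.

```shell
#!/bin/bash
# extract_job_id is a hypothetical helper: pull the numeric ID out of the
# confirmation line printed by bsub. It prints nothing if the line does not match.
extract_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}
```

On the cluster, `jobid=$(extract_job_id "$(bsub < mmtx.lsf)")` stores the ID, and `bjobs "$jobid"` then shows the job's status. Because of the `-o` and `-e` lines in the script, the job's output and error files are named after the job ID and index, for example `1234.0.out` and `1234.0.err` for a non-array job.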