How to run BLAST on the cluster
Notes:
1. We update our copy of the BLAST databases frequently from the NCBI FTP site ftp://ftp.ncbi.nih.gov/ncbi/blast/db, so you DO NOT have to download the databases into your own home directory.
2. The preformatted (fragmented) databases are located at /pub1/ftp-ncbi-blast/blast/db on the cluster, and the FASTA databases are located at /pub1/ftp-ncbi-blast/blast/db/FASTA.
3. You do not have to install the NCBI tools! The latest NCBI tools are installed at /source/ncbi on the cluster.
4. Steps 1 and 2 below are only needed the first time you use the NCBI tools on the hpres cluster.
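Before you point blastall's -d option at a database, you can check that its preformatted files are really there. The helper below, blast_db_exists, is a hypothetical sketch and not a tool installed on the cluster. It assumes the usual formatdb-style file extensions: .nal/.nin for nucleotide databases and .pal/.pin for protein databases. You can paste it into your shell or save it to a file and source it.

```shell
# Hypothetical helper: return 0 if a preformatted BLAST database exists
# at the given path prefix (for example /pub1/ftp-ncbi-blast/blast/db/nt).
# Assumption: formatdb-style extensions, .nal/.nin for nucleotide
# databases and .pal/.pin for protein databases.
blast_db_exists() {
    prefix="$1"
    for ext in nal nin pal pin; do
        if [ -e "$prefix.$ext" ]; then
            return 0
        fi
    done
    echo "no BLAST database files found for: $prefix" >&2
    return 1
}
```

For example, blast_db_exists /pub1/ftp-ncbi-blast/blast/db/nt && echo "nt is available" prints the message only if an nt alias or index file is found.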
Step 1: Please read "how to setup your working environment" and do the following:
a. Add the line "module load ncbi/default" to the module-loading block in your .bashrc file.
b. Type ". .bashrc" in your terminal to reload it, then type "module list" or "which blastall". You should see:
[testy@n137 ~]$ . .bashrc
[testy@n137 ~]$ module list
Currently Loaded Modulefiles:
1) ncbi/default
[testy@n137 ~]$ which blastall
/source/ncbi/bin/blastall
If you see the output above, your environment has been set up correctly.
Step 2: Set up your NCBI data path.
Create a file named ".ncbirc" in your home directory with the following contents:
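If you prefer to script step 1a, the sketch below adds the module line to a bashrc file only when it is not already present, so running it twice does not create duplicates. The function name ensure_ncbi_module is hypothetical. Note the simplifications: it appends the line at the end of the file rather than inside your existing module-loading block, and it only recognizes an exact, unindented "module load ncbi/default" line.

```shell
# Hypothetical helper: append "module load ncbi/default" to a bashrc file
# (default ~/.bashrc) unless an identical line is already there.
ensure_ncbi_module() {
    rc="${1:-$HOME/.bashrc}"
    line="module load ncbi/default"
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        # avoid gluing the new line onto an unterminated last line
        if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
            echo >> "$rc"
        fi
        echo "$line" >> "$rc"
    fi
}
```

After running ensure_ncbi_module, reload the file with ". .bashrc" and check the result with "module list" as shown above.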
[NCBI]
Data=/source/ncbi/data
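The .ncbirc file can also be written from the command line. The sketch below uses a here-document to write exactly the two lines shown above. The helper name write_ncbirc and its backup behaviour (copying any existing file to .ncbirc.bak before overwriting it) are illustrative assumptions, not part of the cluster setup.

```shell
# Hypothetical helper: write the NCBI data path configuration
# to ~/.ncbirc (or to the path given as the first argument).
write_ncbirc() {
    target="${1:-$HOME/.ncbirc}"
    # keep a copy of any existing file before overwriting it
    if [ -e "$target" ]; then
        cp "$target" "$target.bak"
    fi
    # the quoted 'EOF' stops the shell from expanding anything inside
    cat > "$target" <<'EOF'
[NCBI]
Data=/source/ncbi/data
EOF
}
```

Run write_ncbirc once, then "cat ~/.ncbirc" to confirm the contents.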
Step 3: Run your BLAST job
Example 1: Compare your sequence against the entire nt database.
a. Create a folder called "blast_example" in your home directory, then create the following blast.lsf file inside that folder:
#!/bin/bash
# use bash as the login shell, so the job picks up the .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the job name shown in the queue
#BSUB -J blasttest
# the queue to use; this example uses the queue called "normal"
# use the bqueues command to check the available queues
#BSUB -q normal
# files for standard output and error messages; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err
# the number of CPUs to request (note: each node has 2 CPUs)
#BSUB -n 1
# send an email notification to this address when the job finishes
#BSUB -u yourID@partners.org
#BSUB -N
# enter your working directory; change this to your own directory
cd /shr/home/$USER/blast_example
# finally, start the BLAST program
blastall -p blastn -d /pub1/ftp-ncbi-blast/blast/db/nt -i example.fas -o blast_results.txt
You can download example.fas to use as a test input.
b. Submit your BLAST job from the blast_example folder:
[testy@n137 ~]$ cd blast_example
[testy@n137 blast_example]$ bsub < blast.lsf
In this example, the search against the latest full nucleotide (nt) database takes about 4 minutes, and you will receive an email when the job finishes.
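While you wait for the email, you can check on the job with bjobs. When a job is submitted, bsub prints a confirmation line of the form "Job <12345> is submitted to queue <normal>."; the hypothetical helper below extracts the job ID from that line so you can pass it to bjobs. It assumes that standard confirmation format and ignores any other output.

```shell
# Hypothetical helper: read bsub output on stdin and print only the job ID,
# e.g. "Job <12345> is submitted to queue <normal>." -> 12345
parse_job_id() {
    sed -n 's/^Job <\([0-9]*\)>.*/\1/p'
}
```

For example, run jobid=$(bsub < blast.lsf | parse_job_id) and then bjobs "$jobid" to see whether the job is pending (PEND), running (RUN), or finished (DONE or EXIT).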
Example 2: You have several sequence files and need to compare each of them against the entire nt database. When you need to run many similar jobs like this, we suggest using an LSF job array.
a. Create a folder called "blast_example" in your home directory (if it does not already exist), then create the following blastArray.lsf file inside that folder. (Note: this assumes you have 6 different sequence files named example1.fas, example2.fas, ..., example6.fas.)
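In an LSF job array, a single submission creates many jobs that differ only in their index, which each job can read from the LSB_JOBINDEX environment variable. The dry-run sketch below illustrates this by printing the blastall command that elements 1 to N would run when input and output names are built from the index. A plain loop stands in for LSF here, so nothing is submitted; the helper name print_array_commands is hypothetical.

```shell
# Hypothetical dry run: print the blastall command that array elements 1..N
# would execute. Inside a real array job, LSF sets LSB_JOBINDEX instead.
print_array_commands() {
    n="$1"
    db="/pub1/ftp-ncbi-blast/blast/db/nt"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "blastall -p blastn -d $db -i example$i.fas -o results$i.txt"
        i=$((i + 1))
    done
}
```

Running print_array_commands 6 shows that element 1 reads example1.fas and writes results1.txt, element 2 reads example2.fas, and so on.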
#!/bin/bash
# use bash as the login shell, so the job picks up the .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the job name shown in the queue; [1-6] creates a job array of 6 BLAST jobs with indices 1 to 6
#BSUB -J blasttest[1-6]
# the queue to use; this example uses the queue called "normal"
# use the bqueues command to check the available queues
#BSUB -q normal
# files for standard output and error messages; %J is replaced by the job ID and %I by the array index
#BSUB -o "/tmp/%J.%I.out"
#BSUB -e "/tmp/%J.%I.err"
# the number of CPUs to request for each job (note: each node has 2 CPUs)
#BSUB -n 1
# send an email notification to this address when the job finishes
#BSUB -u yourID@partners.org
#BSUB -N
# enter your working directory; change this to your own directory
cd /shr/home/$USER/blast_example
# finally, start the BLAST program; LSF sets $LSB_JOBINDEX to this job's array index (1 to 6)
blastall -p blastn -d /pub1/ftp-ncbi-blast/blast/db/nt -i example$LSB_JOBINDEX.fas -o results$LSB_JOBINDEX.txt
You can download example.fas as a test input and copy it to example1.fas through example6.fas to try the job array.
b. Submit your BLAST job array from the blast_example folder:
[testy@n137 ~]$ cd blast_example
[testy@n137 blast_example]$ bsub < blastArray.lsf
In this example, each search against the latest full nucleotide (nt) database takes about 4 minutes, and you will receive an email when the job is finished.
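If one of the input files is missing, the corresponding array element fails while the others run. The hypothetical helper below checks, in the current directory, that every non-empty input file example1.fas to exampleN.fas expected by "#BSUB -J blasttest[1-N]" is present before you submit. It assumes the exampleN.fas naming used in blastArray.lsf.

```shell
# Hypothetical helper: report which of example1.fas .. exampleN.fas are
# missing or empty in the current directory; return 0 only if all exist.
check_array_inputs() {
    n="$1"
    missing=0
    i=1
    while [ "$i" -le "$n" ]; do
        if [ ! -s "example$i.fas" ]; then
            echo "missing or empty: example$i.fas" >&2
            missing=1
        fi
        i=$((i + 1))
    done
    return "$missing"
}
```

For example, inside blast_example run check_array_inputs 6 && bsub < blastArray.lsf, so the array is submitted only when all six inputs are in place.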