Partners Research Computing Cluster Resources



How to run BLAT on clusters




Notes:

Human Genome
1. Per-chromosome data (from NCBI) is located in the /pub1/ftp-ncbi-blast/genomes/H_sapiens directory.
2. BLAT is installed at /shr/home/source/blat.
3. The entire human genome in 2bit format (needed to run UCSC BLAT) is located at /pub1/ucsc-genome/hg18.2bit.
4. To speed up your search, a BLAT server for the human chromosomes is running on node n1, port 17779.


Mouse Genome
1. Per-chromosome data (from NCBI) is located in the /pub1/ftp-ncbi-blast/genomes/M_musculus directory.
2. BLAT is installed at /shr/home/source/blat.
3. The entire mouse genome in 2bit format (needed to run UCSC BLAT) is located at /pub1/ucsc-genome/mm8.2bit.
4. To speed up your search, a BLAT server for the mouse chromosomes is running on node n1, port 17780.
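The two servers in the notes above differ only in their port number, so a script can choose between them by genome name. The following is a minimal sketch, not part of the cluster's tooling: the function name blat_command is hypothetical, and only the host (n1) and the ports (17779, 17780) come from the notes. The command is printed rather than executed, so the sketch can be tried anywhere.

```shell
# Hypothetical helper: print the gfClient command line for a given genome.
# Host n1 and the two port numbers are taken from the notes above.
blat_command() {
    genome=$1
    query=$2
    out=$3
    case "$genome" in
        human) port=17779 ;;
        mouse) port=17780 ;;
        *)     echo "unknown genome: $genome" >&2; return 1 ;;
    esac
    # "/" is the sequence directory argument, as used in the example job script
    echo "gfClient n1 $port / $query $out"
}

# Print (but do not run) the command for a mouse search
blat_command mouse testmouse.fas outputtest.psl
```

Search options such as -minScore or -minIdentity can be appended to the printed command as needed.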


A much more powerful web portal will be available in February 2008; an email notification will be sent out.

Step 1: Please read "how to setup your working environment" and do the following:

a. Add the line "module load blat/default" to the module-loading block in your .bashrc file.
b. Type ". .bashrc" in your terminal, then type "module list" and "which blat". You should see:

[testy@n137 ~]$ . .bashrc
[testy@n137 ~]$ module list
Currently Loaded Modulefiles:
   1) blat/default
[testy@n137 ~]$ which blat
/source/blat/bin/x86_64-redhat-linux-gnu/blat
If you see the above output in your terminal, your environment has been set up correctly.
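The same check can be scripted, for example at the top of a job script. This is a minimal sketch: the helper name have_tool is hypothetical, and it assumes that gfClient is provided by the same blat module as blat itself, which this page does not state explicitly.

```shell
# Hypothetical helper: succeed if the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the programs used in this guide.
# Assumption: gfClient is installed alongside blat by the blat/default module.
for tool in blat gfClient; do
    if have_tool "$tool"; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool not found; check the module load line in your .bashrc" >&2
    fi
done
```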

Step 2: Run your BLAT search with the gfClient command

Example 1: You need to compare your sequence against the entire mouse genome.

a. Create a folder called "blat_example" in your home directory, then create the following blat.lsf file inside that folder:
#!/bin/bash

# Use bash as the login shell so that the .bashrc configuration in your home directory is loaded
#BSUB -L /bin/bash

# The name of your job as shown in the queue system
#BSUB -J blattest

# The queue to use; this example uses the queue called "normal"
# Use the bqueues command to check the available queues
#BSUB -q normal

# Files for standard output and error messages; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err

# The number of CPUs to request
#BSUB -n 1

# Send an email notification when the job finishes
#BSUB -u yourID@partners.org
#BSUB -N

# Enter your working directory; change this to your own directory
cd /shr/home/$USER/blat_example

# Finally, start BLAT. This searches against mouse; change the port number to 17779 to search against human

gfClient n1 17780 / testmouse.fas outputtest.psl -out=blast -minScore=30 -minIdentity=90 -q=dna
The file testmouse.fas can be downloaded as a test input.

b. Submit your BLAT job from inside the blat_example folder.

[testy@n137 blat_example]$ bsub < blat.lsf
 
In this example, it will take about a second to get your results.
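Because the job script names its output files %J.out and %J.err, it helps to capture the job ID that bsub prints on submission. The sketch below assumes the usual LSF confirmation line of the form "Job <12345> is submitted to queue <normal>."; the helper name job_id_from_bsub and the sample ID 12345 are made up for illustration.

```shell
# Hypothetical helper: extract the numeric job ID from bsub's confirmation line.
# Prints nothing if the line does not match the expected format.
job_id_from_bsub() {
    echo "$1" | sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example with a sample confirmation line (the ID 12345 is invented)
sample='Job <12345> is submitted to queue <normal>.'
id=$(job_id_from_bsub "$sample")
echo "output file: $id.out, error file: $id.err"
```

On the cluster, the sample line would be replaced by the real output, for example msg=$(bsub < blat.lsf), and the job's state can then be checked with bjobs followed by the job ID.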


If you want to know more about the options or arguments of using BLAT, please refer to http://genome.ucsc.edu

If you have many sequences or jobs to run through BLAT, please refer to the page on how to run multiple blast with job array (example B).
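As a rough idea of what a job array looks like for BLAT, the sketch below picks one input file per array element. It is not taken from the job array page: the file naming scheme (query_1.fas, query_2.fas, ...) and the helper names are hypothetical. It relies on LSF setting the LSB_JOBINDEX variable for each array element when the job is submitted with a job name such as "blatarray[1-10]". The gfClient command is printed rather than executed.

```shell
# Hypothetical naming scheme: query_N.fas is searched and written to result_N.psl.
array_query() {
    echo "query_$1.fas"
}
array_result() {
    echo "result_$1.psl"
}

# Inside an LSF job array, LSB_JOBINDEX holds the index of the current element.
# Default to 1 so the sketch also runs outside LSF.
index=${LSB_JOBINDEX:-1}

# Print the command this array element would run against the mouse server
echo "gfClient n1 17780 / $(array_query "$index") $(array_result "$index") -minScore=30 -minIdentity=90 -q=dna"
```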