“V Jornada de Usuarios de la RES”
Use of TBLASTX to find regions of homology
among multiple large-size mammalian genomes
Francisco Câmara Ferreira
Bioinformatics & Genomics Unit (Roderic Guigó, CRG)
Why TBLASTX?
• SGP2 (Parra et al. 2003)
• ab initio Geneid + sequence similarity search algorithm (TBLASTX)
SGP2 is a comparative gene prediction tool: QUERY sequences from one genome (e.g. H. sapiens) are compared with TBLASTX against a collection of sequences from a second TARGET (REFERENCE; e.g. M. musculus) genome. The high-scoring segment pairs ("HSPs") produced by that comparison are then used to modify the scores of the exons predicted by the underlying ab initio gene prediction tool, GENEID.
WHAT IS SGP2??
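The idea of letting HSPs modify ab initio exon scores can be sketched as follows. This is a minimal illustration, not the SGP2 implementation: the `Exon` and `HSP` classes, the `rescore_exon` function, the overlap-proportional bonus and the `weight` value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    start: int    # 1-based, inclusive
    end: int
    score: float  # ab initio score (hypothetical scale)


@dataclass
class HSP:
    start: int    # HSP coordinates projected onto the query sequence
    end: int
    score: float


def overlap(a_start, a_end, b_start, b_end):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def rescore_exon(exon, hsps, weight=0.2):
    """Add a similarity bonus to an exon's ab initio score.

    Assumption: each HSP contributes its score in proportion to the
    fraction of the HSP that overlaps the exon; `weight` is illustrative.
    """
    bonus = 0.0
    for hsp in hsps:
        shared = overlap(exon.start, exon.end, hsp.start, hsp.end)
        if shared:
            bonus += hsp.score * shared / (hsp.end - hsp.start + 1)
    return exon.score + weight * bonus


exon = Exon(start=100, end=199, score=2.0)
hsps = [HSP(start=150, end=249, score=80.0), HSP(start=500, end=600, score=90.0)]
new_score = rescore_exon(exon, hsps)
```

Only the first HSP overlaps the exon here, so only it raises the score; the second is ignored.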
Geneid:
• Geneid is a protein-coding gene prediction tool: can be optimized for prediction in different species.
• Geneid follows a hierarchical structure: signal -> exon -> gene
• Exon score: Score of exon-defining signals + protein-coding potential
• Dynamic programming algorithm: maximize score of assembled exons -> assembled gene
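The dynamic programming step (maximize the score of assembled exons) can be illustrated with a simplified exon-chaining sketch. It is not Geneid's actual algorithm: it assumes exons are plain `(start, end, score)` tuples, ignores reading frame and gene-model constraints, and uses a quadratic-time scan; the function name `best_gene` is made up for this example.

```python
def best_gene(exons):
    """Return (total_score, chain) for the best set of non-overlapping exons.

    exons: list of (start, end, score) tuples with inclusive coordinates.
    Simplification: any two non-overlapping exons may be joined.
    """
    exons = sorted(exons, key=lambda e: e[1])  # process in order of end position
    best = []  # best[i] = (score, chain) of the best chain ending with exon i
    for i, (start, end, score) in enumerate(exons):
        top_score, top_chain = 0.0, []
        for j in range(i):
            # exon j can precede exon i only if it ends before exon i starts
            if exons[j][1] < start and best[j][0] > top_score:
                top_score, top_chain = best[j]
        best.append((top_score + score, top_chain + [exons[i]]))
    return max(best, key=lambda b: b[0]) if best else (0.0, [])


candidates = [(1, 100, 5.0), (50, 150, 8.0), (120, 200, 4.0), (210, 300, 3.0)]
total, chain = best_gene(candidates)
```

In this toy input the highest-scoring single exon, `(50, 150, 8.0)`, is left out because the chain built from the three compatible exons scores higher overall.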
SGP2
TBLASTX as a gene-prediction tool
Coding sequences evolve slowly
compared to surrounding DNA
“Proper” evolutionary distance?
TBLASTX CHR5_1_5000000 CHR1_mm -hspmax=0 -gspmax=0 W=5 E=0.01 E2=0.01 -nogap -filter=xnu+seg S2=80 -matrix=blosum62 -altscore="* any -999" -altscore="any * -999"
TBLASTX is computationally expensive
“flavour” of BLAST
6-frame translation of query/target
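The 6-frame translation that makes TBLASTX expensive can be sketched in a few lines: each nucleotide sequence is translated in three frames on the forward strand and three on the reverse complement, and all translations of the query are compared with all translations of the target. The sketch below uses the standard genetic code; the function names are illustrative and ambiguous codons are written as `X` by assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Return the six conceptual translations keyed by frame (+1..+3, -1..-3)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")
```

Because both query and target are expanded this way, a single TBLASTX comparison amounts to 36 protein-level frame pairings, which is one reason the search is so much costlier than a nucleotide-level BLAST.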
Why MareNostrum?
• H.sapiens vs. M.musculus
• 7-10 days on a 20-25 CPU grid
• 12-13 hours on 256 CPUs
• Multiple genomes compared
concurrently
PARALLELIZATION!
LARGE SIZE OF MAMMALIAN GENOMES (e.g. Human & Cow ~3 Gbases, Mouse 2.5 Gb…)
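The timings quoted above can be turned into rough CPU-hour and speedup figures. This is back-of-the-envelope arithmetic only: it assumes the CPUs of the two systems are comparable and that the quoted ranges are wall-clock times, neither of which the slides state.

```python
def cpu_hours(wall_hours, cpus):
    """Total CPU-hours consumed by a run of `wall_hours` on `cpus` processors."""
    return wall_hours * cpus


# Small grid: 7-10 days on 20-25 CPUs (low and high ends of the quoted ranges).
grid = (cpu_hours(7 * 24, 20), cpu_hours(10 * 24, 25))

# MareNostrum: 12-13 hours on 256 CPUs.
mn = (cpu_hours(12, 256), cpu_hours(13, 256))

# Wall-clock speedup: most conservative and most favourable pairings.
speedup = ((7 * 24) / 13, (10 * 24) / 12)

print(f"grid: {grid[0]}-{grid[1]} CPU-hours")
print(f"MareNostrum: {mn[0]}-{mn[1]} CPU-hours")
print(f"wall-clock speedup: {speedup[0]:.1f}x to {speedup[1]:.1f}x")
```

Under these assumptions the total CPU-hours are of the same order on both systems, so the gain comes from spreading the same work over more processors.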
Strategy for MN TBLASTX:
• Fragment “query” genome:
• H.sapiens genome: >650 5-Mbase fragments
• Reference genome divided into 10-Mbase fragments (internally)
• 22 chromosomes for M.musculus
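The fragmentation strategy can be sketched as a job list: cut each query chromosome into 5-Mbase windows and pair every window with every reference chromosome. The `CHROM_start_end` naming is inferred from the `CHR5_1_5000000` argument shown earlier. The helper names, the toy chromosome length and the one-job-per-pair layout are assumptions for illustration, not a description of the actual MareNostrum pipeline.

```python
def fragment_coords(chrom_length, size=5_000_000):
    """1-based inclusive (start, end) windows covering a chromosome."""
    return [(start, min(start + size - 1, chrom_length))
            for start in range(1, chrom_length + 1, size)]


def fragment_names(chrom, chrom_length, size=5_000_000):
    """Fragment labels such as 'CHR5_1_5000000' (naming scheme assumed)."""
    return [f"{chrom}_{start}_{end}"
            for start, end in fragment_coords(chrom_length, size)]


def make_jobs(query_fragments, reference_chroms):
    """One independent job per (query fragment, reference chromosome) pair."""
    return [(q, r) for q in query_fragments for r in reference_chroms]


# Toy example: a 12-Mbase query chromosome against two reference chromosomes.
fragments = fragment_names("CHR5", 12_000_000)
jobs = make_jobs(fragments, ["CHR1_mm", "CHR2_mm"])
```

With more than 650 query fragments and 22 reference chromosomes, this layout would yield over 14,000 independent jobs, which is what makes the comparison easy to distribute across many CPUs.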
TBLASTX MN PIPELINE: David García Cortés/Xavier Pastor
Significant publications (MN-derived)
SGP2 importance as an annotation tool
component of the comparative gene prediction pipelines to annotate:
• Human (MN)
• Mouse
• Rat
• Cow (MN)
• Chicken
• Paramecium
• Also several species of insects and plants (Melon/Bean)
UCSC Genome browser: http://www.genome.ucsc.edu
GBL Web server: http://genome.crg.es/genepredictions/
Acknowledgments
• BSC-CNS/U. de Cantabria
• Xavier Pastor
• David García Cortés
• Genis Parra/Josep Abril/Roderic Guigó (developers of SGP2)