Bioinformatics and the post-genomic era. Coral del Val Muñoz, Dept. of Computer Science and...



Slide 1
Bioinformatics and the post-genomic era
Coral del Val Muñoz
Dept. of Computer Science and Artificial Intelligence, Universidad de Granada
Dept. of Molecular Biophysics, German Cancer Research Center, Heidelberg, Germany
Dept. of Molecular Microbiology, HHMI, Washington University, St. Louis, USA

Slide 2: Back to basics: the central dogma
DNA → RNA → protein. DNA is copied by replication, transcribed into RNA (transcription), and the RNA is translated into protein (translation).

Slide 3: Molecular biology: prokaryotes vs. eukaryotes
- Prokaryotes (bacteria): have no nuclear membrane and no membrane-bound organelles
- Eukaryotes (plants, animals, fungi, ...): have a nuclear membrane and organelles
- Not all unicellular organisms are prokaryotes (e.g. yeast is a unicellular eukaryote)
Source: BIOS Scientific Publishers Ltd, 1999

Slide 4
ATGCCAGGCCCCCCACCAGCCACGTTGGGGCAGCCCCCACAGCTCCCGGCCTTCGGGCCAAGGTGTCGGGGTGCGTCTCCTGGCCCATC
AATACAGATTACATATTTATATCAATCGCGGGCTCTGAGGGCGCCCTCGGAGAGCGGCCCCGCGCCTACGAAACCAAACTGGGAGTGG
TCGCGCGGAAACTCTGGCTCGGGATTGGCTGCGGGCGCCCGCCGCGGTGCGGGGGGATTGCTAATCGTATTCAGCATGTTTTGCACAAG
AAATGTCAGCCAGAAAGGGCTATCTGCTCCCTTCGCCAAATTATCCCACAACAATGTCATGCTCGGAGAGCCCCGCCGCGAACTCTTTT
TTGGTCGACTCGCTCATCAGCTCGGGCAGAGGCGAGGCAGGCGGCGGTGGTGGTGGCGCGGGGGGCGGCGGCGGTGGCGGTTACTACG
CCCACGGCGGGGTCTACCTGCCGCCCGCCGCCGACCTGCCATACGGGCTGCAGAGCTGCGGGCTCTTCCCCACGCTGGGCGGCAAGCGC
AATGAGGCAGCGTCGCCGGGCAGCGGTGGCGGTGGCGGGGGTCTAGGTCCCGGGGCGCACGGCTACGGGCCCTCGCCCATAGACCTGT
GGCTAGACGCGCCCCGGTCTTGCCGGATGGAGCCGCCTGACGGGCCGCCGCCGCCGCCCCAGCAGCAGCCGCCGCCCCCGCCGCAACC
ACCCCAGCCAGCGCCGCAGGCCACCTCGTGCTCTTTCGCGCAGAACATCAAAGAAGAGAGCTCCTACTGCCTCTACGACTCGGCGGACA
AATGCCCCAAAGTCTCGGCCACCGCCGCCGAACTGGCTCCCTTCCCGCGGGGCCCGCCGCCCGACGGCTGCGCCCTGGGCACCTCCAGC
GGGGTGCCAGTGCCTGGCTACTTCCGCCTTTCTCAGGCCTACGGCACCGCCAAGGGCTATGGCAGCGGCGGCGGCGGCGCGCAGCAACT
CGGGGCTGGCCCGTTCCCCGCGCAGCCCCCGGGGCGCGGTTTCGATCTCCCGCCCGCGCTAGCCTCCGGCTCGGCCGATGCGGCCCGGA
AGGAGCGAGCCCTCGATTCGCCGCCGCCCCCCACGCTGGCTTGCGGCAGCGGCGGGGGCTCGCAGGGCGACGAGGAGGCGCACGCGTC
GTCCTCGGCCGCGGAGGAGCTCTCCCCGGCCCCTTCCGAGAGCAGCAAAGCCTCGCCGGAGAAGGATTCCCTGGGTAAGCAGGGCTGC
AGAGGGCTGCAGTCAGGCGGGCAGACAGGCAGACACAAGGAGGAGAAGGATCAGAAAACTAGGAGCCCGCGCAGCAGCCGGCCGGC
CTTGGCCCAAGCTGCAGGCAGGCTGACCTTGTGAACTTGCTTTTTAATATTTGGGCGTGGGGGCGCAGTAAAATTCATGTCCGGCTTAG
CGCCCCACAGCAAGACGTCCTCGGCGCTGGCCTCAGCTCCCCCTGACTAGGGACGAGGACACCAGCGAGCAGGCCCCCTCCTGTGCGCT
CTTTCCTGTGGCCGGGAGGACCCAGAGCCCTGGTCCCTGCCCAGCCTGCGCGGCGCGGCCCACGCGGGGGGAGGGGGAGGGAGGGAAA
GTAGCTCGCCCGCAGATAGCGCGGATGTTTGTAAGGCATCCAAAATAAGCAGCCGCCAGCGCCAATAAATAAGCCCATTAACCGGCGA
AGTTCGAGTGTACGATCCCCCATGCTTTTTTCAAAGTTGCTGAGGGGCGGGAATCTTCGTGGCGGGAAGAAGAAAAGGCAAATCCGGC
CTGGAAGCGGGGGGCCCTGAGCTGAGAGCCAGAGAAGGGCCATTTCCCTTCCCCTGGACCTCGGAATCGCCCAGCTATGTATCCTGGCT
CCTGGAGAAACTTGAGGGAGGGCCCTTGACCCCCGAATCGGTTTTTCCTGCCTTCCCCATTGGACCAATGATGCCCTTCTTTCTCCCCTT
ATCGAGTCTTGGGCAATCAGGGCCCTGGGGTGAGACAGCCAAGCTGCCTGGCCCATCTTCCAAGTAAGCACCCCGCGCTCCTAGCCTGG
GGGCTACAGGAAATGCTTGTCTGCCATATGGCAAGAGGCAAAGAAAAGCGTTAAGTTCAAGATGTACAGCCTGCCCTCCCAGGCCTTTC
CTTCTGCAAGCATCTACGGCTTAGCGCTAAAACAGGTGTTTGGAAAAGTGGGGGAAATGTAAATTGGAAGGGTCATGTAGATTGAAGG
CCCACTCAATTTTTGTCATGACTTATGGAGGAACTGCTTGCTCTCAGCAAGCCAAAAACGGGGGCACGACTCTCTTCTCTGTGACTTGGG
ACATCTCTCTTATGGGAGAAACGGAGGCAATTCACCCCCGCGGGCAGCCCGTGTGGCCTCGACTTAATCATCCCCTCTTTATTCTCTTAC
ATGCCAGGCAATTCCAAAGGTGAAAACGCAGCCAACTGGCTCACGGCAAAGAGTGGTCGGAAGAAGCGCTGCCCCTACACGAAGCAC
CAGACACTGGAGCTGGAGAAGGAGTTTCTGTTCAATATGTACCTTACTCGAGAGCGGCGCCTAGAGATTAGCCGCAGCGTCCACCTCAC
GGACAGACAAGTGAAAATCTGGTTTCAGAACCGCAGGATGAAACTGAAGAAAATGAATCGAGAAAACCGGATCCGGGAGCTCACAGC
CAACTTTAATTTTTCCTGATGAATCTCCAGGCGAC
How and where do we find genes?
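The central dogma (slide 2) and the question that closes slide 4 can be made concrete with a minimal Python sketch. It transcribes DNA into mRNA, translates codons with the standard genetic code, and lists open reading frames (ATG ... stop codon) in the three forward reading frames. The function names and the `min_codons` cutoff are hypothetical choices made for illustration. The sketch ignores the reverse strand, introns and alternative start codons, all of which real gene finders have to deal with.

```python
from itertools import product

# Standard genetic code built from the classic TCAG ordering: codons are
# enumerated TTT, TTC, TTA, TTG, TCT, ... and mapped, in that order, to the
# one-letter amino acids below ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence codon by codon (standard code)."""
    seq = dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons containing N, etc.
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def find_orfs(dna: str, min_codons: int = 30):
    """Return sorted (start, end, protein) ORFs in the 3 forward frames.

    start/end are 0-based; end is exclusive and includes the stop codon.
    Internal ATGs inside an open ORF are treated as methionines, not new starts.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(seq[start:i + 3])))
                start = None
    return sorted(orfs)


print(transcribe("ATGGCTTGGTAA"))                      # AUGGCUUGGUAA
print(translate("ATGGCTTGGTAA"))                       # MAW
print(find_orfs("CCATGGCTTGGTAAGG", min_codons=2))     # [(2, 14, 'MAW')]
```

The default of 30 codons reflects a common heuristic: very short ORFs appear by chance in random sequence. It is an assumption, not a value taken from the slides.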
Slide 5: Prokaryotic gene (bacteria)
- High gene density and simple structure
- Short genes carrying little information
- Overlapping genes

Slide 6: Example of a prokaryotic promoter
- Pribnow box located at -10 (6-7 bp)
- Promoter sequence located at -35 (6 bp)

Slide 7: Eukaryotic gene organisation
Transcription:
- Core promoter: loosely conserved initiator region (Inr) around the transcription start site (TSS); TATA box at about -25
- Proximal promoter: CAT (CCAAT) box at about -75; GC box at about -170
- Enhancer / silencer: upstream or downstream of the promoter
[Figure: core and proximal promoter around the TSS, showing the GC box, CAAT box, TATA box and Inr]
Translation:
- 5' Kozak sequence: GCCACCATG
- 3' polyadenylation site: AATAAA

Slide 8: Eukaryotic gene (with a nucleus)
- Low gene density and complex structure
- Alternative splicing
- Pseudogenes
[Figure: the pre-mRNA (5'UTR, exon 1, intron 1, exon 2, intron 2, exon 3, 3'UTR) is spliced and polyadenylated into mRNA (ATG ... TAA, poly-A tail AAAAAAAAA); translation yields the active protein CPLTW...GFL, a splice variant CPLTW...PJC, and a post-translationally modified protein CPLTW...LAC]

Slide 9: Prokaryotic vs. eukaryotic genes

Slide 10: Spliceosome

Slide 11: How do we recognise a gene?
- By homology (sequence similarity): requires a similar sequence that is not too distant
- Ab initio: requires information about sequence composition and about signals

Slide 12: Identification by homology
- Pairwise alignments: global (Needleman & Wunsch), local (Smith & Waterman)
- Multiple alignments: ClustalW, T-Coffee, DIALIGN, DSC
- Database searches: BLAST, PHI-BLAST, FASTA, MegaBLAST, PSI-BLAST, BLAT, WU-BLAST

Slide 13: Homology-based methods
Problems:
- Genes with no homologues in the databases are not detected
- Close homologues are needed to infer the gene structure

Slide 14: Ab initio methods
- Integrate signal detection with coding statistics, which are derived from a training set
- Detect small DNA motifs (promoters, start/stop codons, splice sites, etc.)
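Slides 14 to 16 list position-specific weight matrices (PSWM) as one way to detect short, degenerate DNA signals such as the -10 Pribnow box of slide 6. The following is a minimal sketch under several assumptions: a toy set of aligned -10 sites invented for illustration (not real data), a uniform background of 0.25 per base, log2 odds, and a pseudocount of 1. It builds the matrix and slides it along a sequence, reporting every window whose score reaches a threshold. The names `build_pswm`, `score` and `scan`, and the threshold value, are hypothetical.

```python
import math

BASES = "ACGT"


def build_pswm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PSWM from equal-length aligned sites.

    Returns one dict per position: base -> log2(p(base at pos) / background).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    matrix = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def score(matrix, window):
    """Sum of per-position log-odds; unknown bases (e.g. N) contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(matrix, window.upper()))


def scan(matrix, sequence, threshold):
    """Return (position, window, rounded score) for windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        s = score(matrix, window)
        if s >= threshold:
            hits.append((i, window, round(s, 2)))
    return hits


# Toy aligned -10 boxes, invented for illustration (consensus TATAAT).
toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TAAAAT", "GATAAT", "TATACT"]
pswm = build_pswm(toy_sites)
print(scan(pswm, "GCGCTATAATGCGCGC", threshold=5.0))  # [(4, 'TATAAT', 8.02)]
```

Any threshold is a trade-off between true and false positives, which is exactly the signal-detection problem raised in slide 15. Real tools estimate it from data rather than fixing it by hand.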
Ab initio methods (continued): a scoring system can be used to evaluate these predictions.

Slide 15: Signal detection
The signal-detection problem:
- DNA signals carry little information
- They are highly non-specific and degenerate
- It is hard to distinguish a true positive (TP) from a false positive (FP)
How to improve signal detection:
- Take the context into account (e.g. an acceptor site must lie between an intron and an exon)
- Combine it with coding statistics
Methods: Gribskov profiles, PSWM, hidden Markov models, neural networks

Slide 16: Ab initio methods: searching for signals and coding regions
DNA sequence → probability of being a coding region
Methods: Gribskov profiles, PSWM, hidden Markov models, neural networks

Slide 17: Computational annotation tools
- Gene finding
- Repeat finding
- EST/cDNA alignment
- Homology searching: BLAST, FASTA, HMM-based methods, etc.
- Protein family searching: PFAM, Prosite, etc.

Slide 18: Which analyses need to be run?
Similarity searches:
- BLAST (Altschul et al., 1990): BLASTN (nucleotide query against nucleotide databases), BLASTX (translated nucleotide query against amino acid databases), TBLASTX (six-frame translated query against a six-frame translated nucleotide database)
- sim4 (Miller et al., 1998): sequence alignment program for finding near-perfect matches between nucleotide sequences containing introns
Gene predictors:
- Genefinder (Green, unpublished)
- GenScan (Burge and Karlin, 1997)
- Genie (Reese et al., 1997)
Other analyses:
- tRNAscan-SE (Lowe and Eddy, 1996)

Slide 19: Which analyses need to be run, and how?
mRNAs:
- ORFFinder (Frise, unpublished)
Protein translations:
- HMMPFAM 2.1 (Eddy, 1998) against PFAM (v2.1.1; Sonnhammer et al., 1997; Bateman et al., 1999)
- Ppsearch (Fuchs, 1994) against ProSite (release 15.0), filtered with EMOTIF (Nevill-Manning et al., 1998)
- Psort II (Horton and Nakai, 1997)
- ClustalW (Higgins et al., 1996)

Slide 20: Raw sequence: Adh.fa
GAATTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATAC
TTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTG
TTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGG
GCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAA
ACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGG
AATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGC
GGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTT
TGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGT
TTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCC
CTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCAC
TGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGG
AATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGA
GATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACT
GGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCC
AGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAA
TTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTA
CAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCAT
TGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCC
GTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGT
AAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGA
GACCCTTCTCGCTTAGCATCGAAAAGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAA
TTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTC
TCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAG
CTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGA
CCTGATCCTGTTTGACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAG
AACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCAT
GTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCA
ACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGA
CTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTAT
CAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGA
TAGGAGAAGAAAACAGAACAACAGCAAATACTGTGCGGCGATCTCGTACTGGACGGAAATGTCAGGAGATAGGAGAAGAAAA

Slide 21: Promoters
Core promoter elements:
- TATA box
- Initiator (Inr)
- Downstream promoter element (DPE)
Transcription factor (TF) sites:
- CAAT box
- GC box
- SP-1 sites
- GAGA box
Transcription activator sites
Regulatory sequences

Slide 22: Spliceosome

Slide 23: Thank you for your attention
http://www.m4m.es
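The promoter elements listed in slide 21 can be located in a raw sequence such as the Adh.fa example of slide 20. This minimal sketch finds exact matches to simplified textbook consensus sequences on both strands. The consensus patterns are an assumption and are not given in the slides: TATAWAW for the TATA box, CCAAT for the CAAT box, and GGGCGG for the Sp1 GC box, written in IUPAC codes (W = A or T). The names `MOTIFS`, `iupac_to_regex`, `reverse_complement` and `find_elements` are hypothetical. Slide 15 points out that such short, degenerate signals are highly non-specific, so on real sequence a scan like this is likely to return many false positives and only flags candidates.

```python
import re

# IUPAC nucleotide codes -> regex character classes (only the subset used here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Simplified textbook consensus sequences (an assumption, not from the slides).
MOTIFS = {
    "TATA box": "TATAWAW",
    "CAAT box": "CCAAT",
    "GC box (Sp1)": "GGGCGG",
}


def iupac_to_regex(consensus: str) -> str:
    """Turn an IUPAC consensus such as TATAWAW into a regex pattern."""
    return "".join(IUPAC[c] for c in consensus.upper())


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_elements(seq: str):
    """Report exact consensus matches on both strands as (name, strand, start).

    Starts are 0-based forward-strand coordinates. The zero-width lookahead
    lets overlapping matches be reported too.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, consensus in MOTIFS.items():
        pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
        width = len(consensus)
        for m in pattern.finditer(seq):
            hits.append((name, "+", m.start()))
        for m in pattern.finditer(rc):
            # Map a match on the reverse complement back to forward coordinates.
            hits.append((name, "-", len(seq) - m.start() - width))
    return sorted(hits, key=lambda h: (h[2], h[0]))


print(find_elements("GGGCGGAACCAATCGTATAAAAGGC"))
# [('GC box (Sp1)', '+', 0), ('CAAT box', '+', 8), ('TATA box', '+', 15)]
```

A PSWM or hidden Markov model (slides 15 and 16) would replace the all-or-nothing match with a graded score, which is usually the better choice for degenerate signals.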