![Page 1: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/1.jpg)
How do we get the overview?
![Page 2: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/2.jpg)
Annotating the sequence
![Page 3: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/3.jpg)
Chromosome 16: one of the smallest
![Page 4: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/4.jpg)
Finding genes
What are we looking for?
- Proteins encoded in mRNA
- Non-coding RNA (ncRNA) genes
Where are we looking?
- Prokaryotes
- Eukaryotes (often introns)
![Page 5: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/5.jpg)
Classes of RNA
- fRNA: Functional RNA, essentially synonymous with non-coding RNA
- mRNA: Messenger RNA, coding for proteins
- miRNA: MicroRNA, putative translational regulatory gene family
- ncRNA: Non-coding RNA, all RNAs other than mRNA
- rRNA: Ribosomal RNA
- siRNA: Small interfering RNA, active molecules in RNA interference
- snRNA: Small nuclear RNA, includes spliceosomal RNAs
- snmRNA: Small non-mRNA, essentially synonymous with small ncRNAs
- snoRNA: Small nucleolar RNA, usually involved in rRNA modification
- stRNA: Small temporal RNA, e.g. lin-4 and let-7 in C. elegans
- tRNA: Transfer RNA
Source: Eddy SR (2001) Nature Reviews Genetics
![Page 6: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/6.jpg)
Information in the sequence that can be used to find genes
- "Signals" in the sequence: splice signals, promoters, termination signals, polyA signals, CpG islands (gene search by signal)
- The "content" of the sequence: ORFs, codon statistics, etc. (gene search by content)
- Similarity to known genes (gene search by similarity)
![Page 7: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/7.jpg)
From gene to protein: so easy for the cell, so hard for us
![Page 8: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/8.jpg)
![Page 9: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/9.jpg)
Simple protein finding
- Examine all 6 possible reading frames: 3 frames on the forward strand and 3 frames on the reverse strand
- Plot the positions of the initiation (start) codon ATG (methionine) and the termination (stop) codons TAA, TAG and TGA
- Look for long stretches without stop codons after a start codon
Source: http://cwx.prenhall.com/horton/medialib/media_portfolio/
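The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular program: the function names, the `min_codons` threshold and the toy sequence are assumptions made for the example, and coordinates on the reverse strand refer to the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) for each ATG...stop stretch in one reading frame.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon after the previous stop
        elif start is not None and codon in STOPS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(seq, min_codons=3):
    """Scan all six reading frames (+1..+3 and -1..-3)."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                hits.append((f"{strand}{offset + 1}", start, end, s[start:end]))
    return hits


# Toy sequence (hypothetical): one short ORF in frame +3.
for hit in find_orfs("CCATGAAATTTTAGGG"):
    print(hit)
```

On real genomic sequence `min_codons` would be set much higher, since short open stretches occur by chance in every frame.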
![Page 10: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/10.jpg)
Standard Genetic Code
The standard genetic code is used in most organisms
Other codes are used in mitochondria and in some organisms
Overview of genetic codes in various organisms: http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
![Page 11: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/11.jpg)
Start and stop codon distribution
Distribution of start codons (short lines) and stop codons (long lines) in the six reading frames along a genomic sequence (lacZ operon in E. coli).
There is an open reading frame (lacZ) in frame +3 from position 1284 to 4355.
Created by DNA STRIDER.
![Page 12: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/12.jpg)
Prokaryotic promoter regions
Source: http://cwx.prenhall.com/horton/medialib/media_portfolio/
![Page 13: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/13.jpg)
Transcription termination
![Page 14: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/14.jpg)
Shine-Dalgarno (SD) sequence
The ribosome binding site on the mRNA, complementary to the 3' end of the 16S rRNA
![Page 15: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/15.jpg)
![Page 16: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/16.jpg)
Transcription and translation
[Figure: genomic DNA (promoter, Exon1, Intron1, Exon2, Intron2, Exon3, terminator; introns bounded by GU…AG) → primary transcript → spliced mRNA (cap, 5'UTR, start codon AUG, stop codon TAA/TAG/TGA, 3'UTR, poly-A tail AAAA…) → protein (beginning with M)]
![Page 17: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/17.jpg)
Gene, exon and intron number for whole ExInt and subdivisions

| | Gene number | Exon number | Intron number |
|---|---|---|---|
| Whole ExInt | 94 615 | 518 169 | 525 870 |
| Non-redundant ExInt | 15 271 | 113 457 | 128 065 |
| Rattus norvegicus | 835 | 4889 | 7191 |
| Homo sapiens | 8287 | 60 499 | 43 127 |
| Mus musculus | 3044 | 18 920 | 15 407 |
| Drosophila melanogaster | 15 220 | 64 271 | 89 969 |
| Caenorhabditis elegans | 18 924 | 121 708 | 108 803 |
| Arabidopsis thaliana | 25 216 | 158 629 | 127 386 |
| Saccharomyces cerevisiae | 589 | 1695 | 1438 |
![Page 18: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/18.jpg)
Distribution of exon sizes in ExInt
![Page 19: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/19.jpg)
Distribution of intron sizes in ExInt
![Page 20: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/20.jpg)
Intron phase: exon/intron boundaries between codons or within them

| Intron phase | 0 | 1 | 2 |
|---|---|---|---|
| All ExInt | 257 713 (49%) | 147 625 (28%) | 120 532 (23%) |
| Non-redundant | 60 979 (48%) | 35 438 (28%) | 31 608 (24%) |
| Rattus norvegicus | 2842 (39%) | 2365 (33%) | 1384 (28%) |
| Mus musculus | 6703 (44%) | 5921 (38%) | 2783 (18%) |
| Caenorhabditis elegans | 51 251 (47%) | 28 553 (26%) | 28 999 (27%) |
| Homo sapiens | 19 102 (44%) | 15 423 (36%) | 8602 (20%) |
| Arabidopsis thaliana | 71 958 (56%) | 28 178 (22%) | 27 250 (22%) |
| Drosophila melanogaster | 38 101 (42%) | 28 896 (32%) | 22 972 (26%) |
| Saccharomyces cerevisiae | 641 (45%) | 428 (30%) | 369 (25%) |
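As a small worked illustration of the phase concept: an intron is in phase 0 when it falls between two codons, phase 1 when it follows the first base of a codon, and phase 2 when it follows the second base. The sketch below assumes the exon lengths given are coding lengths only (no UTR); the function names and the example lengths are hypothetical, while the Mus musculus counts are taken from the table on this slide.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron, from the coding bases that precede it.

    The phase is the number of coding bases upstream of the intron, modulo 3.
    """
    phases = []
    coding_bases = 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


def phase_percentages(counts):
    """Turn raw phase counts (phase 0, 1, 2) into rounded percentages."""
    total = sum(counts)
    return [round(100 * n / total) for n in counts]


# Hypothetical gene with three coding exons of 90, 100 and 50 bases.
print(intron_phases([90, 100, 50]))

# Mus musculus counts for phase 0, 1 and 2 from the table.
print(phase_percentages([6703, 5921, 2783]))
```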
![Page 21: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/21.jpg)
How do we find splice signals and exons?
- Weight matrices: what is the distribution of nucleotides around splice sites?
- "Weight array matrices", which take neighbouring nucleotides into account
- "Maximal dependence decomposition": correlations with non-neighbouring nucleotides
- Hidden Markov models
- Neural networks: a pattern recognition technique that "learns"
![Page 22: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/22.jpg)
How a weight matrix is built
![Page 23: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/23.jpg)
And how it is used
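The two steps can be sketched as follows. This is one common way to build and apply a weight matrix (log-odds against a uniform background, with pseudocounts), and it may differ in detail from the scheme shown on the slides. The aligned donor-like sites, the pseudocount and all function names are assumptions made for the example.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def score(matrix, window):
    """Sum the weights of the bases observed at each position."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_site(matrix, seq):
    """Slide the matrix along seq and return (best score, position)."""
    width = len(matrix)
    return max((score(matrix, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned sites resembling a splice donor region.
sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
matrix = build_weight_matrix(sites)
print(best_site(matrix, "TTTTAGGTAAGTCCCC"))
```

Each position is scored independently here, which is exactly the limitation that weight array matrices and maximal dependence decomposition try to address.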
![Page 24: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/24.jpg)
Consensus sequences for exon/intron boundaries
![Page 25: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/25.jpg)
Different classes of exons that must be detected in different ways
Initial exons: begin with a start codon and end with a splice donor site
Internal exons: begin with an acceptor site and end with a donor site
Terminal exons: begin with an acceptor site and end with a stop codon
Single-exon genes: begin with a start codon and end with a stop codon
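The four classes differ only in which signal bounds the exon on each side, which a toy classifier can make explicit. The sketch below is a deliberate simplification: it assumes the canonical AG acceptor and GT donor dinucleotides, treats any exon that begins with ATG as starting at a start codon, and uses a made-up two-exon sequence. A real gene finder would score these signals rather than test them exactly.

```python
STOPS = {"TAA", "TAG", "TGA"}

CLASSES = {
    ("start", "donor"): "initial",
    ("acceptor", "donor"): "internal",
    ("acceptor", "stop"): "terminal",
    ("start", "stop"): "single-exon gene",
}


def classify_exon(genome, start, end):
    """Classify the candidate exon genome[start:end] by its boundary signals."""
    exon = genome[start:end]
    if exon[:3] == "ATG":
        left = "start"
    elif genome[max(start - 2, 0):start] == "AG":  # intron ends in AG
        left = "acceptor"
    else:
        left = None
    if exon[-3:] in STOPS:
        right = "stop"
    elif genome[end:end + 2] == "GT":  # intron begins with GT
        right = "donor"
    else:
        right = None
    return CLASSES.get((left, right))


# Hypothetical gene: exon 1, a short intron (GT...AG), exon 2.
genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(classify_exon(genome, 0, 6))    # first exon
print(classify_exon(genome, 18, 24))  # second exon
```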
![Page 26: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/26.jpg)
![Page 27: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/27.jpg)
Integrated gene finding: what follows what?
![Page 28: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/28.jpg)
Neural networks: an example
The Grail II system for finding exons in eukaryotic genes (Uberbacher and Mural 1991; Uberbacher et al. 1996). The method uses a neural network to identify patterns characteristic of coding sequences. The network includes three layers: an input layer for the data, with the data coming from a candidate exon sequence; a hidden layer for discerning relationships among the input data; and an output layer comprising one neuron that indicates whether or not the region is likely to be an exon. Each neuron receives information from a set of neurons in the layer above, some with a positive value and others with a negative value; sums these values; and then converts them to an output of approximately 0 or 1.
The system is trained using a set of known coding sequences, and as each sequence is utilized, the strengths and types of connections (positive or negative) between the neurons are adjusted, decreasing or increasing the signal to the next neuron in a manner that produces the correct output. The major difference between neural networks for exon and secondary structure prediction is that the exon prediction uses sequence pattern information as input whereas secondary structure prediction uses a window of amino acid sequence in the protein. In Grail II, a candidate sequence is evaluated by calculating pattern frequencies in the sequence and applying these values to the neural network. If the output is close to a value of 1, then the region is predicted to be an exon.
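The "sum the weighted inputs, then squash to roughly 0 or 1" behaviour of a single neuron, and a forward pass through one hidden layer, can be sketched as below. This is not the Grail II network: the sigmoid activation, the two input features and every weight are hand-picked assumptions for illustration, and no training is shown.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, squashed by a sigmoid to the range 0..1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def forward(features, hidden_layer, output_neuron):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = [neuron(features, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias)


# Hypothetical weights: positive and negative connections, as on the slide.
hidden_layer = [([6.0, 6.0], -3.0), ([-6.0, -6.0], 3.0)]
output_neuron = ([8.0, -8.0], 0.0)

# Two made-up input features in the range 0..1 (e.g. pattern-frequency scores).
print(forward([1.0, 1.0], hidden_layer, output_neuron))  # close to 1
print(forward([0.0, 0.0], hidden_layer, output_neuron))  # close to 0
```

Training would consist of adjusting the weights and biases until known coding sequences give outputs near 1 and non-coding sequences give outputs near 0.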
![Page 29: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/29.jpg)
Sequence "content": differences between the true reading frame and the other two
Frame 1 is the true one and contains codons that encode a protein with average amino acid composition
![Page 30: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/30.jpg)
Codon usage in the three reading frames
![Page 31: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/31.jpg)
Base distribution at the three codon positions
![Page 32: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/32.jpg)
Distinguishing coding from non-coding sequences based on the base composition of the three codon positions
Number of times base $i$ occurs at codon position $j$ in the window: $N_{ij}$.
Expected value for each base at each of the three codon positions: $E_{ij} = (N_{i1} + N_{i2} + N_{i3})/3$
The divergence: $D = \sum_{i,j} |E_{ij} - N_{ij}|$
Window: 67 codons
EMBL database 1984
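A direct transcription of these formulas into Python might look like the sketch below. The function name and the two toy windows are assumptions for illustration; the slide's window of 67 codons corresponds to 201 bases, and the threshold for calling a window coding is not given in the source, so none is applied here.

```python
def position_divergence(window):
    """Divergence D = sum over bases i and positions j of |E_ij - N_ij|.

    N_ij counts base i at codon position j, and E_ij is the mean of
    N_i1, N_i2 and N_i3, i.e. the count expected if the base were
    spread evenly over the three codon positions.
    """
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for k, base in enumerate(window.upper()):
        if base in counts:
            counts[base][k % 3] += 1
    divergence = 0.0
    for per_position in counts.values():
        expected = sum(per_position) / 3
        divergence += sum(abs(expected - n) for n in per_position)
    return divergence


# Strong positional bias: the same codon repeated (high D).
print(position_divergence("GCAGCAGCAGCA"))
# No positional bias: each base equally often at each position (D = 0).
print(position_divergence("ACGTACGTACGT"))
```

Sliding this over a sequence in all three frames gives a profile in which coding regions tend to stand out through a higher D.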
![Page 33: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/33.jpg)
Codon usage in the E. coli genome
Source: http://www.kazusa.or.jp/codon/
Escherichia coli [gbbct]: 11865 CDS's (3662594 codons)
fields: [triplet] [amino acid] [fraction] [frequency: per thousand] ([number])
UUU F 0.58 22.1 (80995)  UCU S 0.17 10.4 (38027)  UAU Y 0.59 17.5 (63937)  UGU C 0.46 5.2 (19138)
UUC F 0.42 16.0 (58774)  UCC S 0.15 9.1 (33430)  UAC Y 0.41 12.2 (44631)  UGC C 0.54 6.1 (22188)
UUA L 0.14 14.3 (52382)  UCA S 0.14 8.9 (32715)  UAA * 0.61 2.0 (7356)  UGA * 0.30 1.0 (3623)
UUG L 0.13 13.0 (47500)  UCG S 0.14 8.5 (31146)  UAG * 0.08 0.3 (989)  UGG W 1.00 13.9 (50991)
CUU L 0.12 11.9 (43449)  CCU P 0.18 7.5 (27340)  CAU H 0.57 12.5 (45879)  CGU R 0.36 20.0 (73197)
CUC L 0.10 10.2 (37347)  CCC P 0.13 5.4 (19666)  CAC H 0.43 9.3 (34078)  CGC R 0.36 19.7 (72212)
CUA L 0.04 4.2 (15409)  CCA P 0.20 8.6 (31534)  CAA Q 0.34 14.6 (53394)  CGA R 0.07 3.8 (13844)
CUG L 0.47 48.4 (177210)  CCG P 0.49 20.9 (76644)  CAG Q 0.66 28.4 (104171)  CGG R 0.11 5.9 (21552)
AUU I 0.49 29.8 (109072)  ACU T 0.19 10.3 (37842)  AAU N 0.49 20.6 (75436)  AGU S 0.16 9.9 (36097)
AUC I 0.39 23.7 (86796)  ACC T 0.40 22.0 (80547)  AAC N 0.51 21.4 (78443)  AGC S 0.24 15.2 (55551)
AUA I 0.11 6.8 (24984)  ACA T 0.17 9.3 (33910)  AAA K 0.74 35.3 (129137)  AGA R 0.07 3.6 (13152)
AUG M 1.00 26.4 (96695)  ACG T 0.25 13.7 (50269)  AAG K 0.26 12.4 (45459)  AGG R 0.04 2.1 (7607)
GUU V 0.28 19.8 (72584)  GCU A 0.18 17.1 (62479)  GAU D 0.63 32.7 (119939)  GGU G 0.35 25.5 (93325)
GUC V 0.20 14.3 (52439)  GCC A 0.26 24.2 (88721)  GAC D 0.37 19.2 (70394)  GGC G 0.37 27.1 (99390)
GUA V 0.17 11.6 (42420)  GCA A 0.23 21.2 (77547)  GAA E 0.68 39.1 (143353)  GGA G 0.13 9.5 (34799)
GUG V 0.35 24.4 (89265)  GCG A 0.33 30.1 (110308)  GAG E 0.32 18.7 (68609)  GGG G 0.15 11.3 (41277)
Coding GC 50.58%; 1st letter GC 57.71%; 2nd letter GC 40.68%; 3rd letter GC 53.36%
Genetic code 1: Standard
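To make the fields concrete, the sketch below recomputes the "fraction" and "frequency per thousand" columns for the two phenylalanine codons from the raw counts and the total of 3662594 codons given in the E. coli table. The function name is an assumption; the interpretation of "fraction" as the share among synonymous codons is consistent with the tabulated values.

```python
def codon_usage(synonymous_counts, total_codons):
    """Return {codon: (fraction, per_thousand)} for one amino acid.

    fraction     = count / sum of counts over the synonymous codons
    per_thousand = 1000 * count / all codons in the data set
    """
    group_total = sum(synonymous_counts.values())
    return {
        codon: (round(n / group_total, 2), round(1000 * n / total_codons, 1))
        for codon, n in synonymous_counts.items()
    }


# Phenylalanine codons and total codon count from the E. coli table.
print(codon_usage({"UUU": 80995, "UUC": 58774}, 3662594))
```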
![Page 34: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/34.jpg)
Codon usage in the human genome
Source: http://www.kazusa.or.jp/codon/
Homo sapiens [gbpri]: 44580 CDS's (19894411 codons)
fields: [triplet] [amino acid] [fraction] [frequency: per thousand] ([number])
UUU F 0.45 16.9 (336562)  UCU S 0.18 14.6 (291040)  UAU Y 0.44 12.0 (239268)  UGU C 0.45 9.9 (197293)
UUC F 0.55 20.4 (406571)  UCC S 0.22 17.4 (346943)  UAC Y 0.56 15.6 (310695)  UGC C 0.55 12.2 (243685)
UUA L 0.07 7.2 (143715)  UCA S 0.15 11.7 (233110)  UAA * 0.28 0.7 (14322)  UGA * 0.50 1.3 (25383)
UUG L 0.13 12.6 (249879)  UCG S 0.06 4.5 (89429)  UAG * 0.22 0.5 (10915)  UGG W 1.00 12.8 (255512)
CUU L 0.13 12.8 (253795)  CCU P 0.28 17.3 (343793)  CAU H 0.41 10.4 (207826)  CGU R 0.08 4.7 (93458)
CUC L 0.20 19.4 (386182)  CCC P 0.33 20.0 (397790)  CAC H 0.59 14.9 (297048)  CGC R 0.19 10.9 (217130)
CUA L 0.07 6.9 (138154)  CCA P 0.27 16.7 (331944)  CAA Q 0.25 11.8 (234785)  CGA R 0.11 6.3 (126113)
CUG L 0.41 40.3 (800774)  CCG P 0.11 7.0 (139414)  CAG Q 0.75 34.6 (688316)  CGG R 0.21 11.9 (235938)
AUU I 0.36 15.7 (313225)  ACU T 0.24 12.8 (255582)  AAU N 0.46 16.7 (331714)  AGU S 0.15 11.9 (237404)
AUC I 0.48 21.4 (426570)  ACC T 0.36 19.2 (382050)  AAC N 0.54 19.5 (387148)  AGC S 0.24 19.4 (385113)
AUA I 0.16 7.1 (140652)  ACA T 0.28 14.8 (294223)  AAA K 0.42 24.0 (476554)  AGA R 0.20 11.5 (228151)
AUG M 1.00 22.3 (443795)  ACG T 0.12 6.2 (123533)  AAG K 0.58 32.9 (654280)  AGG R 0.20 11.4 (227281)
GUU V 0.18 10.9 (216818)  GCU A 0.26 18.6 (370873)  GAU D 0.46 22.3 (443369)  GGU G 0.16 10.8 (215544)
GUC V 0.24 14.6 (290874)  GCC A 0.40 28.5 (567930)  GAC D 0.54 26.0 (517579)  GGC G 0.34 22.8 (453917)
GUA V 0.11 7.0 (139156)  GCA A 0.23 16.0 (317338)  GAA E 0.42 29.0 (577846)  GGA G 0.25 16.3 (325243)
GUG V 0.47 28.9 (575438)  GCG A 0.11 7.6 (150708)  GAG E 0.58 40.8 (810842)  GGG G 0.25 16.4 (326879)
Coding GC 52.65%; 1st letter GC 56.26%; 2nd letter GC 42.37%; 3rd letter GC 59.31%
Genetic code 1: Standard
![Page 35: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/35.jpg)
Codon usage diagram
Usage of various codons along the sequence of lacZ
- O: Optimal codon usage
- S: Suboptimal codon usage
- R: Rare codon usage
- U: Unique codon usage
Created by DNA STRIDER.
![Page 36: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/36.jpg)
Comparative genomics methods
Gene finding by sequence comparison to sequences known to be transcribed or translated
- Compare the genomic sequence to sequence databases: proteins, mRNA sequences, EST sequences (mRNA)
- Both exact matches and approximate matches are interesting
- Conserved sequences between species
- Program: Procrustes
![Page 37: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/37.jpg)
![Page 38: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/38.jpg)
An example of a result from the gene-finding program Genscan
![Page 39: Hvordan få oversikten?](https://reader036.vdocuments.net/reader036/viewer/2022081418/5681581b550346895dc58175/html5/thumbnails/39.jpg)
Gene finders on the web