How can we find genes? Search for them Look them up

Upload: jonathan-stevenson

Post on 21-Jan-2016

Page 1: How can we find genes? Search for them Look them up

How can we find genes?

Search for them
Look them up

Page 2

How do I get from this…

>mouse_ear_cress_1080 GAAATAATCAATGGAATATGTAGAGGTCTCCTGTACCTTCACAGAGATTCTAGGCTGAGAGCAGTGCATATAGATATCTTTCGTACTCATCTGCTTTTTCTGGTCTCCATCACAAAAGCCAACTAGGTAATCATATCAATCTCTCTTTACCGTTTACTCGACCTTTTCCAATCAGGTGCT TCTGGTGTGTCTACTACTATCAGTTTTAGGTCTTTGTATACCTGATCTTATCTGCTACTG AGGCTTGTAAAAGTGATTAAAACTGTGACATTTACTCTAAGAGAAGTAACCTGTTTGATGCATTTCCCTAATATACCGGTGTGGAAAAGTGTAGGTATCTGTACTCAGCTGAAATGGTGGACGATTTTGAAGAAGATGAACTCTCATTGACTGAAAGCGGGTTGAAGAGTGAAGATGGCGTTATTATCGAGATGAATGTCTCCTGGATGCTTTTATTATCATGTTTGGGAATTTACCAAGGGAGAGGTATCAGAATCTATCTTAGAAGGTTACATTTAGCTCAAGCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTAGTGTGTTTGAAGTTTCTTAACTCCTAGTATAATTAGAATCTTCTGCAGCAGACTTTAGAGTTTTGGGATGTAGAGCTAACCAGAGTCGGTTTGTTTAAACTAGAATCTTTTTATGTAGCAGACTTGTTCAGTACCTGAATACCAGTTTTAAATTACCGTCAGATGTTGATCTTGTTGGTAATAATGGAGAAACGGAAGAATAATTAGACGAAACAAACTCTTTAAGAACGTATCTTTCAGTTTTCCATCACAAATTTTCTTACAAGCTACAAAAATCGAACTATATATAACTGAACCGAATTTAAACCGGAGGGAGGGTTTGACTTTGGTCAATCACATTTCCAATGATACCGTCGTTTGGTTTGGGGAAGCCTCGTCGTACAAATACGACGTCGTTTAAGGAAAGCCCTCCTTAACCCCAGTTATAAGCTCAAAGTTGTACTTGACCTTTTTAAAGAAGCACGAAACGAAAAACCCTAAAATTCCCAAGCAGAGAAAGAGAGACAGAGCAAGTACAGATTTCAACTAGCTCAAGATGATCATCCCTGTTCGTTGCTTTACTTGTGGAAAGGTTGATATTTTCCCCTTCGCTTTGGTCTTATTTAGGGTTTTACTCCGTCTTTATAGGGTTTTAGTTACTCCAAATTTGGCTAAGAAGAGATCTTTACTCTCTGTATTTGACACGAATGTTTTTAATCGGTTGGATACATGTTGGGTCGATTAGAGAAATAAAGTATTGAGCTTTACTAAGCTTTCACCTTGTGATTGGTTTAGGTGATTGGAAACAAATGGGATCAGTATCTTGATCTTCTCCAGCTCGACTACACTGAAGGGTAAGCTTACAATGATTCTCACTTCTTGCTGCTCTAATCATCATACTTTGTGTCAAAAAGAGAGTAATTGCTTTGCGTTTTAGAGAAATTAGCCCAGATTTCGTATTGGGTCTGTGAAGTTTCATATTAGCTAACACACTTCTCTAATTGATAACAGAAGCTATAAAATAGATTTGCTGATGAAGGAGTTAGCTTTTTATAATCTTCTGTGTTTGTGTTTTACTGTCTGTGTCATTGGAAGAGACTATGTCCTGCCTATATAATCTCTATGTGCCTATCTAGATTTTCTATACAATTGATATTTGATAGAAGTAGAAAGTAAGACTTAAGGTCTTTTGATTAGACTTGTGCCCATCTACATGATTCTTATTGGACTAATCATTCTTTGTGTGAAAATAGAATACTTTGTCTGAACATGAGAGAATGGTTCATAATACGTGTGAAGTATGGGATTAGTTCAACAATTTCGCTATTGGAGAAGCAAACCAAGGGTTAATCGTTTATAGGGTTAAGCTAATGCTCTGCTCTTTATATGTTATTGGAACAGACTATTGTTGTGCCTATCTTGTTTAGTTGTAG
ATTCTATCTCGACTGTTATAAGTATGACTGAAGGCTTGATGACTTATGATTCTCTTTACACCTGTAGAAGGATTTAAGCTTGGTGTCTAGATATTCAATCTGTGTTGGTTTTGTCTTTCTTTTGGCTCTTAGTGTTGTTCAATCTCCTCAATAGGTATGAAGTTACAATATCCTTATTATTTTGCAGGGACGCACTTGATGCACTCCAGCTAGTCAGATACTGCTGCAGGCGTATGCTAATGACCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTAGTGTGT

Page 3

…to this?

Page 4

Meaning?

Page 5

Mathematical Tools (Code; statistics)

Page 6

Comparative Tools (Database searches)

Page 7

What do we know about genes?

• Expressed (Transcribed)
– Transcriptional start & termination sites (TXSS, TXTS)
– Transcription artefacts (cDNA & ESTs)

• Regulated
– Promoters (TATAAA)
– Transcription Factor Binding Sites
– CpG (Cytosine methylation)

• Meaningful (Translated)
– 3n basepairs
– Codon usage
– Translational start & stop/termination codons (TLSS, TLTS)
– Translation artefacts (proteins)

• Spliced
– Splice sites (GT-AG)

• Derived (Homology: Paralogy/Orthology)
– Search for known genes, proteins (BLAST)
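Several of the features listed above (TATAAA, GT, AG) are short sequence signals, so candidate sites can be located by plain string matching. The following is a minimal sketch, assuming an uppercase DNA string; the function name `find_motif` and the toy sequence are made up for illustration and are not part of the course material.

```python
def find_motif(sequence, motif):
    """Return the 0-based start positions of every match, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also found.
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy sequence, not taken from the mouse-ear cress example.
toy = "CCTATAAAGGATGGTAAGTCCAGGCTAA"
print(find_motif(toy, "TATAAA"))  # candidate promoter signal
print(find_motif(toy, "GT"))      # candidate 5' splice sites
print(find_motif(toy, "AG"))      # candidate 3' splice sites
```

A match only marks a candidate: such short motifs also occur by chance, which is why signals like these are combined with other evidence.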

Page 8

How might this knowledge help to find genes?

• Predict genes
– Look for potential starts and stops.
– Connect them into open reading frames (ORFs).
– Filter for “correct” length & codon usage.

• Search databases
– Known genes: UniGene
– Known proteins: UniProt

• Use transcript evidence
– cDNA
– ESTs
– proteins
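Filtering on codon usage first requires counting codons in a candidate ORF. The sketch below shows one way to do that, assuming the ORF is given in frame from its first base; `codon_usage` and `codon_frequencies` are illustrative names, and no reference codon table is supplied here.

```python
from collections import Counter


def codon_usage(orf):
    """Count codons in an ORF read in frame from its first base.

    Trailing bases that do not fill a complete triplet are ignored.
    """
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(orf):
    """Convert codon counts to relative frequencies that sum to 1."""
    counts = codon_usage(orf)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Hypothetical toy ORF: ATG GCT GCT AAA TAA
print(codon_usage("ATGGCTGCTAAATAA"))
```

To act as a filter, these frequencies would be compared against those of known genes from the same species; how to score that comparison is left open here.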

Page 9

Operating computationally

• Go to beginning of sequence: start SCAN.
• If ATG, register putative TLSS; then
– Move in 3-steps & count steps (= COUNTS).
– If 3-step = (TAA or TAG or TGA), register putative TLTS.
– If TLTS is registered, evaluate COUNTS (= triplets):
    If COUNTS < minimum, discard; then go behind ATG above and start SCAN.
    If COUNTS > maximum, discard; then go behind ATG above and start SCAN.
    If minimum < COUNTS < maximum, record as GENE with TLSS, TLTS; then go behind ATG above and start SCAN.
• Arrive at end of sequence: stop SCAN.
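The scan described above can be sketched in a few lines of Python. This is one possible reading of the procedure, not a reference implementation. It assumes that "go behind ATG" means resuming directly after that ATG, that COUNTS includes the ATG but not the stop codon, and that the minimum and maximum are inclusive (the slide does not say what happens when COUNTS equals a bound). The name `scan_orfs` is made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_orfs(sequence, minimum, maximum):
    """Scan one strand for ORFs whose length lies within [minimum, maximum].

    Returns (tlss, tlts, counts) tuples: the 0-based index of the ATG,
    the 0-based index of the stop codon, and the number of triplets
    from the ATG up to, but not including, the stop codon.
    """
    sequence = sequence.upper()
    genes = []
    position = 0
    while position <= len(sequence) - 3:
        if sequence[position:position + 3] != "ATG":
            position += 1
            continue
        tlss = position
        # Move in steps of three from the ATG until a stop codon appears.
        for codon_start in range(tlss, len(sequence) - 2, 3):
            if sequence[codon_start:codon_start + 3] in STOP_CODONS:
                counts = (codon_start - tlss) // 3
                if minimum <= counts <= maximum:  # assumption: bounds inclusive
                    genes.append((tlss, codon_start, counts))
                break
        # Recorded or discarded, resume the scan directly behind this ATG.
        position = tlss + 3
    return genes


print(scan_orfs("CCATGAAATTTTAGGG", minimum=1, maximum=100))
```

The sketch reads only the given strand and knows nothing about introns, so at best it finds non-spliced genes; the opposite strand would need a second pass over the reverse complement.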

Page 10

Annotation workflow (diagram). Box labels: Get/Generate sequence; Mathematical evidence; Biological evidence; Construct gene models; Browse results; Browse in context; Find gene families; Analyze large data sets.

Page 11

Annotation Cheat Sheet

A. DNA Subway

• Open existing project or generate new (Red square)
• Run RepeatMasker
• Generate evidence (Predictions, BLAST searches)
• Synthesize evidence into gene models (Apollo)
• Browse results locally and in context (Phytozome)
• Conduct functional analysis (link from Browser)
• Prospect for gene family (Yellow Line from Browser)

B. Apollo

• Select region that holds biological gene evidence
• Optimize work space and zoom to region (View tab)
• Expand all tiers (Tiers tab)
• Drag evidence item(s) onto workspace (mouse)
• Edit to match biol. evidence (right-click item for tools)
• Record what was done in Annotation Info Editor
• Assess necessity to build alternative model(s)
• Upload model(s) to DNA Subway (File tab)

Page 12

Predictors (mathematical evidence)

• Utilize predominantly mathematical (statistical) methods.
• Search for patterns:
– Some score starts, stops, splice sites (GenScan).
– Some score nucleotides (Augustus, FGenesH).
• Few incorporate EST data and/or known genes/proteins.
• Require optimization for each new species (training).
• Accuracy:
– False positives (scoring non-genes as genes): 5%-50%.
– False negatives (missed genes): 5%-40%.
– Weak at, or incapable of, determining first and last exons and UTRs.
• Specific for gene models (spliced genes, non-spliced genes).
• Specialty predictors (tRNAscan-SE, RepeatMasker).
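To make the two accuracy figures concrete, the sketch below compares a set of predicted gene models against a trusted reference set. The slide does not state the denominators, so this assumes false positives are reported as a share of all predictions and false negatives as a share of all reference genes. It also assumes a prediction is correct only if its coordinates match exactly. The function name and the coordinates are invented for illustration.

```python
def prediction_error_rates(predicted, known):
    """Return (false-positive rate, false-negative rate) for a prediction set.

    Both arguments are collections of hashable gene descriptions, here
    (start, end) tuples. Only exact matches count as correct predictions.
    """
    predicted, known = set(predicted), set(known)
    false_positives = predicted - known   # predicted, but not in the reference
    false_negatives = known - predicted   # in the reference, but missed
    fp_rate = len(false_positives) / len(predicted) if predicted else 0.0
    fn_rate = len(false_negatives) / len(known) if known else 0.0
    return fp_rate, fn_rate


# Made-up coordinates, purely illustrative.
predicted = [(100, 400), (900, 1500), (2000, 2600), (3000, 3300)]
known = [(100, 400), (900, 1500), (2000, 2600), (5000, 5600), (7000, 7900)]
print(prediction_error_rates(predicted, known))
```

Exact matching is a strict criterion; since predictors are weak at first and last exons, a looser overlap-based comparison may be more informative in practice.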

Page 13

Search tools (biological evidence)

• Search sequence (molecules; tangible) databases:
– Known genes
– Known proteins
– cDNAs & ESTs
• Utilize alignment methods (BLAST, BLAT).
• Reliability:
– Good at determining gene locations and general gene structures.
– Weak at determining exact exon/intron borders.
– Unlikely to correctly determine TXSS and TXTS.
– Should be used with cDNA/EST from same species as genome.
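Alignment hits usually need filtering before they serve as gene evidence. The sketch below parses BLAST+ tabular output (`-outfmt 6`, default twelve columns) and keeps only strong, near-identical hits, in line with the advice to use cDNA/EST from the same species. The thresholds, the function names, and the example rows are assumptions made up for illustration.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_blast_tab(lines):
    """Turn tab-separated BLAST hit lines into dictionaries with typed values."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def filter_hits(hits, max_evalue=1e-10, min_identity=90.0):
    """Keep hits that are both statistically significant and highly similar."""
    return [h for h in hits
            if h["evalue"] <= max_evalue and h["pident"] >= min_identity]


# Made-up example rows; identifiers and numbers are purely illustrative.
example = [
    "contig1\tEST_A\t98.5\t400\t6\t0\t1200\t1599\t1\t400\t1e-150\t700",
    "contig1\tEST_B\t82.0\t120\t20\t2\t5000\t5119\t30\t149\t2e-08\t95.5",
]
for hit in filter_hits(parse_blast_tab(example)):
    print(hit["sseqid"], hit["qstart"], hit["qend"])
```

The surviving hits give approximate gene locations on the query (`qstart`, `qend`); exact exon/intron borders would still need to be checked by hand.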

Page 14

Sequence & course material repository

http://gfx.dnalc.org/files/evidence

Don’t open items; save them to your computer!

• Annotation (sequences & evidence)
• Manuals (DNA Subway, Apollo, JalView)
• Presentations (.ppt files)
• Prospecting (sequences)
• Readings (Bioinformatics tools, splicing, etc.)
• Worksheets (Word docs, handouts, etc.)
• BCR-ABL (temporary; not course-related)