How to find a gene?*


Upload: naava

Post on 02-Feb-2016


TRANSCRIPT

Page 1: How to find a gene?*

How to find a gene?*

• One way is to search for an open reading frame (ORF).

• An ORF is a sequence of codons in DNA that starts with a Start codon, ends with a Stop codon, and has no other Stop codons inside.

* = inexact science
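The ORF definition above can be sketched as a scan of one reading frame. This is only an illustration: the function name `find_orfs`, the use of ATG as the sole Start codon, and the half-open coordinates are assumptions, and the test sequence is the 45-nt example that appears on the next slide.

```python
# Minimal single-frame ORF scan (illustrative sketch, not a full gene finder).
START_CODON = "ATG"                    # assumption: ATG is the only Start codon
STOP_CODONS = {"TAA", "TAG", "TGA"}    # the three standard Stop codons


def find_orfs(dna, frame=0):
    """Return (start, end) index pairs of ORFs in one reading frame.

    `start` is the index of the Start codon; `end` is exclusive and
    includes the Stop codon.
    """
    dna = dna.upper()
    orfs = []
    start = None  # index of the Start codon of the ORF currently being read
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == START_CODON:
            start = i                      # open a new ORF
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))    # close it at the first Stop codon
            start = None
    return orfs


example = "atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa"
for frame in range(3):
    print(frame + 1, find_orfs(example, frame))
```

Only frame 1 of the example contains a Start codon followed by an in-frame Stop codon, so it is the only frame that reports an ORF.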

Page 2: How to find a gene?*

Each strand has 3 possible reading frames (6 in total for double-stranded DNA), and an ORF can lie in any of them.

5' atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa 3'

Frame 1:
atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag taa
M   P   K   L   N   S   V   E   G   F   S   S   F   E   *

Frame 2:
 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agt
 C   P   S   *   I   A   *   R   G   F   H   H   L   S

Frame 3:
  gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gta
  A   Q   A   E   *   R   R   G   V   F   I   I   *   V
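The three translations on this slide can be reproduced with a short script. This is a sketch that assumes the standard genetic code; the names `CODON_TABLE`, `translate`, and `three_frames` are illustrative, and only the forward strand is handled.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order and paired with the matching one-letter amino acids.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one forward reading frame; '*' marks a Stop codon.

    Trailing bases that do not fill a whole codon are ignored.
    Characters other than A, C, G, T raise KeyError.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Return the translations of forward frames 1, 2 and 3."""
    return [translate(dna, frame) for frame in range(3)]


seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgagtaa"
for number, protein in enumerate(three_frames(seq), start=1):
    print(number, protein)
```

Frame 1 is the only one that reads from a Start codon to a Stop codon without interruption; frames 2 and 3 hit Stop codons early.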

Page 3: How to find a gene?*

Eukaryotic Genomes

• Finding a gene is much more difficult in eukaryotic genomes than in prokaryotic genomes. Why?

Page 4: How to find a gene?*

Prokaryotic (bacterial) genomes:

• Are much smaller than eukaryotic genomes:

E. coli = 4,639,221 bp (4.6 Mb)

Human = ~3,300 Mb

Page 5: How to find a gene?*

Prokaryotic (bacterial) genomes:

• Contain fewer genes:

E. coli: 4,285 protein-coding genes and 122 structural RNA genes

H: ~32,000 genes (a very rough estimate)
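As a rough worked comparison, the genome sizes and gene counts quoted on these slides can be turned into an average amount of DNA per gene. The figures are the slides' own; the variable names are illustrative, and the human numbers are approximate, so the result is an order-of-magnitude estimate only.

```python
# Figures quoted on the slides (human values are approximate).
ECOLI_GENOME_BP = 4_639_221
ECOLI_GENES = 4285 + 122          # protein-coding + structural RNA genes
HUMAN_GENOME_BP = 3_300_000_000   # ~3,300 Mb
HUMAN_GENES = 32_000              # rough estimate


def bp_per_gene(genome_bp, genes):
    """Average number of base pairs of genome per annotated gene."""
    return genome_bp / genes


ecoli_density = bp_per_gene(ECOLI_GENOME_BP, ECOLI_GENES)
human_density = bp_per_gene(HUMAN_GENOME_BP, HUMAN_GENES)

print(f"E. coli: about {ecoli_density:,.0f} bp per gene")
print(f"Human:   about {human_density:,.0f} bp per gene")
print(f"Ratio:   about {human_density / ecoli_density:.0f}x")
```

On these numbers a human gene sits in roughly a hundred times more DNA than an E. coli gene, which is one reason a simple ORF scan works far better in bacteria.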

Page 6: How to find a gene?*

Prokaryotic (bacterial) genomes:

• Contain a small amount of noncoding DNA:

E. coli = ~11% (average intergenic distance = 130 bp)

Human = > 95% (there are stretches of hundreds of thousands of bp that apparently contain no gene)

Page 7: How to find a gene?*

Eukaryotic Genomes:

• Contain massive amounts of repetitive DNA sequences (Define).

• Human: repeat sequences comprise over 50% of the genome.

• E. coli: DNA is almost entirely unique.

Page 8: How to find a gene?*

What are the human repetitive DNA sequences?

1) Simple ‘stutters’ (CAGCAGCAGCAGCAGCAG . . . .)

2) Pseudogenes

3) Transposable elements (> 40% of the human genome)

4) Segmental duplications (~10-300 kb)

5) Gene Families (maybe a reflection of genomic duplications)
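The first category, simple 'stutters', is easy to detect computationally. A minimal sketch using a regular-expression back-reference follows; the function name `find_stutters` and its default thresholds (3-base units, at least 4 copies) are assumptions chosen for illustration, and the other repeat classes need far more elaborate methods.

```python
import re


def find_stutters(dna, unit_len=3, min_copies=4):
    """Find tandem repeats of a short unit, e.g. CAGCAGCAGCAG.

    Returns a list of (start_index, unit, copy_count) tuples.
    """
    # ([ACGT]{n}) captures one unit; \1{m,} requires at least m more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(dna.upper())
    ]


print(find_stutters("ttCAGCAGCAGCAGCAGCAGtt"))
```

The search is left to right and non-overlapping, so a repeat is reported once, starting at the first position where a unit begins to repeat.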

Page 9: How to find a gene?*

Shocking discovery in mid 1970s:

Eukaryotic genes are interrupted by noncoding DNA!

Almost all transcripts (mRNA) are spliced before leaving the nucleus.

Page 10: How to find a gene?*
Page 11: How to find a gene?*

Exon = Genetic code

Intron = Non-essential DNA??

Page 12: How to find a gene?*
Page 13: How to find a gene?*

• The mechanism of splicing is not well understood.

Page 14: How to find a gene?*
Page 15: How to find a gene?*

Variable mutation rate?

• Most mutations in introns and intergenic DNA are (apparently) harmless

• Consequently, intron and intergenic DNA sequences diverge much more quickly than exons.

Page 16: How to find a gene?*

Shocking discovery in the late 1990s:

• Some eukaryotic genomes have thousands of genes that are alternatively spliced.

• In the human genome, it is now estimated that 35% of the genes undergo alternative splicing

Page 17: How to find a gene?*

Alternative splice sites generate various protein isoforms

Page 18: How to find a gene?*

Bacterial cells are different:

• Prokaryotic cells: no splicing (i.e., no split genes)

• Eukaryotic cells: intronless genes are rare (the average number of introns per gene in the human genome is 3-7; the highest is 234); the dystrophin gene is > 2.4 Mb.

Page 19: How to find a gene?*

Identifying all of the human genes

a) Is tough

b) Is easy

c) Is really tough

Page 20: How to find a gene?*

Making it tough:

• Pseudogenes

• Large intergenic regions

• Prevalent and long introns

• Alternative splicing

Page 21: How to find a gene?*
Page 22: How to find a gene?*

Comparison of 4 plant genomes:

Page 23: How to find a gene?*

8 genes in C. elegans- 5 intronic genes:

Page 24: How to find a gene?*

12 of the 64 genes duplicated between human chr. #18 and #20

Page 25: How to find a gene?*

Is there a gene in there? 5’

CAGACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGTCGT

CGAGGCACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTGGTAGTCGTAGTCGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGT

CAGACTGTAGTCGTAGTCGTGTAGTCGTATGGCCGTAGTCGTAGTCGATCGTGATTCGTAGTCGTAGTCGTAGTCGTAGTCGGGCTTGTAGTCGAGTCGTAGTCGTAGTCGTAGTCGTAGCTGTAGTCGTAGTCGTAGTCGAGTCGTAGTCGTGTACGTGTAGTCGTAGTCGTAGCTGTACTAGTCGTATGCGTAGTCGTAGTCGTAGCGAGTCTGAGTGTACGTCGTAGTGCTAGTTGCGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGCTGTAGTCGTAGTCGTAGTCGTAGTCGTGTACGTAGTGTCGTATGCGAGGCTAGTCAGGTCGTATGGCTAGTATGCGTAGTCGAGTCGTAGTCGTAGTGTACGTCGTAGTGTCAGTCGTCAGTTGACGTACGTAGTGTCGTAGTCGTAGTCGTAGTCGTAGTCGTAGTGAGTGTACGTTGCGTATGGCTATGTATGTGCAGTGCTGTAGTCGTAGTGCTGTAGTCAGTTGCGTAGTGATGTACGTGTATGCGTATGCGTAGTCTGAGTTGCTGAGTGCTAGTCTGAGTGTCGTAGTCGTAGTGCGTAGTCGTATGCGTATGCGTATCGGATTGCGTAGTGTAGCTGTAGTCGTAGTCGTAGTGTCGTAGTCGTGTAGTCAGTCGTGTAGTAGTCGTATGACCGCGGCGCGAGTTGGTGCGGCGGGGGCTATTTTTCGGAGCGTGTAAGGTTATTAGGTTTTTCCTATTATATGCGCTTAGCGTAGCGCGATTAGCGTATAGCGCATTATATATGCGCCTTCTCTCTTCGAGAGATCTCAGCGTCGTAGTGTACGTCGC

3’

Page 26: How to find a gene?*

How to confirm the identification of a gene?

• Possible answer: identify the gene by identifying its promoter.

Page 27: How to find a gene?*

Promoters are DNA regions that control when genes are activated.

[ Promoter ] [ Coding region ]

Page 28: How to find a gene?*

• Exons encode the information that determines what product will be produced.

• Promoters encode the information that determines when the protein will be produced.

Page 29: How to find a gene?*

Nucleotides of a particular gene are often numbered:

Page 30: How to find a gene?*

Demonstration of a consensus sequence.

• De
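A consensus sequence can be derived from a set of aligned sites by taking the most common base in each column. The sketch below assumes the sequences are already aligned and of equal length; the function name `consensus` and the five example sites are hypothetical and are not taken from the slide.

```python
from collections import Counter


def consensus(aligned_sites):
    """Return the most frequent base in each column of equal-length sites."""
    if len({len(site) for site in aligned_sites}) != 1:
        raise ValueError("all sites must have the same length")
    # zip(*...) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sites)
    )


# Hypothetical promoter-like sites, chosen so that no column is tied.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATACT"]
print(consensus(sites))
```

No single site has to equal the consensus; it only summarizes what most sites share at each position, which is why promoter searches tolerate mismatches.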

Page 31: How to find a gene?*

Three current bioinformatic challenges:

• 1) Verification of the data (is it correct?)

• 2) Thorough annotation of the data (includes developing appropriate means of annotating)

• 3) Handling ever-larger chunks of data

Page 32: How to find a gene?*

A dot = a promoter. Dark purple = left to right, light purple = right to left. Overlapping genes = green.

Page 33: How to find a gene?*

Inner circle = ccw direction, outer circle = cw direction

Page 34: How to find a gene?*

How to find a gene?

• Look for substantial ORFs and associated 'features'.

ORF = open reading frame

Page 35: How to find a gene?*

• Two nucleic acids that are exact complements of each other will hybridize.

• Two nucleic acids that are mostly complementary (some mismatches) will hybridize under the right conditions.
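Complementarity between two strands can be checked with a reverse complement. The sketch below is a simplification: it assumes two DNA strands of equal length paired antiparallel with no gaps, the names `reverse_complement` and `complementarity` and the probe sequences are hypothetical, and it says nothing about the temperature or salt conditions that decide real hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementarity(strand_a, strand_b):
    """Fraction of positions at which strand_b can base-pair with strand_a.

    Both strands are written 5'->3' and are assumed to be the same length.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    partner = reverse_complement(strand_b)
    pairs = sum(x == y for x, y in zip(strand_a.upper(), partner))
    return pairs / len(strand_a)


probe = "ATGCCGTGCT"
exact = reverse_complement(probe)   # perfect partner
near = "AGCACGGCAA"                 # same partner with one base changed

print(exact, complementarity(probe, exact))
print(near, complementarity(probe, near))
```

The exact complement scores 1.0 and the one-mismatch strand 0.9, the kind of partner that would pair only "under the right conditions".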

Page 36: How to find a gene?*

Recombinant DNA techniques?

• Many popular tools of recDNA rely on the principle of DNA hybridization.

• In large mixes of DNA molecules, complementary sequences will pair.

Page 37: How to find a gene?*

Hybridization ‘in silico’

• Algorithms have been written that will compare two nucleic acid sequences. Two similar DNA sequences (they would hybridize in solution) are said ‘to match’ when software determines that they are of significant similarity.

Page 38: How to find a gene?*

8/10= 80%

Mouse ATGCCGTGCTA

: : : : : : : :

Human ATG--CGGGCAA
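A percent-identity figure such as 8/10 = 80% can be computed from a pairwise alignment by counting identical columns. The sketch below makes two assumptions: gap columns are left out of the denominator (conventions differ between tools), and the aligned pair is a hypothetical example of equal length rather than the exact sequences on the slide.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, compared_columns, percent) for two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    some tools count them as mismatches instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return matches, compared, percent


# Hypothetical aligned pair: 12 columns, 2 of them gaps.
mouse = "ATGCCCGTGCTA"
human = "ATG--CGGGCAA"
print(percent_identity(mouse, human))
```

Whether 80% counts as a significant match depends on sequence length and on the scoring scheme used by the search software.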

Page 39: How to find a gene?*

Protein-protein similarity searches?

• Many algorithms have been designed to compare strings of amino acids (single letter amino acid code) and find those of a defined degree of similarity.

Page 40: How to find a gene?*

60 70 80 90

#1 TSIDQLRATTSYDELRQDGSTTISYDDYSR
   :::.::::::::::::::::::::.:::::
#2 TSIEQLRATTSYDELRQDGSTTISTDDYSR
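The marker line between the two protein sequences can be regenerated with a few lines of code. In this sketch ':' marks an identical residue and '.' marks any difference; that is a simplification, since search programs usually reserve '.' for conservative substitutions judged with a scoring matrix. The names `match_line` and `mismatch_positions` are illustrative.

```python
def match_line(seq_a, seq_b):
    """Return ':' for identical residues and '.' for differing ones."""
    return "".join(":" if x == y else "." for x, y in zip(seq_a, seq_b))


def mismatch_positions(seq_a, seq_b):
    """Return 1-based positions at which the two sequences differ."""
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b), start=1) if x != y]


protein_1 = "TSIDQLRATTSYDELRQDGSTTISYDDYSR"
protein_2 = "TSIEQLRATTSYDELRQDGSTTISTDDYSR"

print(protein_1)
print(match_line(protein_1, protein_2))
print(protein_2)
print(mismatch_positions(protein_1, protein_2))
```

The two sequences differ at only two of their 30 residues (D/E at position 4 and Y/T at position 25), which is 28/30 or about 93% identity.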

Page 41: How to find a gene?*

Significance of sequence similarity

• DNA similarity suggests:

• Similar function

• Similar structure

• Evolutionary relationship

Page 42: How to find a gene?*

The End