Genome sequencing projects

Genome projects and Human Genome Projects. Sucheta Tripathy, 27th September 2012. https://sites.google.com/site/suchetala

Uploaded by sucheta-tripathy on 13 June 2015


DESCRIPTION

Coursework 100 Bioinformatics CSIR - IICB

TRANSCRIPT

Page 1: Genome projects and Human Genome Projects

Sucheta Tripathy, 27th September 2012

https://sites.google.com/site/suchetalab/

Page 2: Topics to be covered

• Introduction.
• History of genome sequencing.
• Rationale behind genome sequencing.
• How genomes are sequenced.
• What happens next.
  ◦ Assembly and annotation.
  ◦ Sequence submissions.
• Microbial genome sequencing.
• Human Genome Project.
  ◦ ENCODE project.
  ◦ 1000 Genomes Project.

Page 3: What is a Genome?

Gene + Chromosome -> Genome

DNA alphabet: A/T/G/C

RNA alphabet: A/U/G/C
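A genome can be handled computationally as a string over a four-letter alphabet. The short Python sketch below checks that a string uses only the DNA letters and converts it to its RNA form by replacing T with U. It assumes the input is the coding (sense) strand and deliberately rejects ambiguity codes such as N, which real sequence files often contain; the function names are illustrative only.

```python
# A minimal sketch: DNA and RNA as strings over four-letter alphabets.
# Assumption: input is the coding (sense) strand; ambiguity codes are rejected.
DNA_ALPHABET = set("ATGC")
RNA_ALPHABET = set("AUGC")


def is_dna(seq: str) -> bool:
    """Return True if every character is one of A, T, G, C (case-insensitive)."""
    return set(seq.upper()) <= DNA_ALPHABET


def transcribe(dna: str) -> str:
    """Return the RNA form of a coding-strand DNA sequence (T replaced by U)."""
    if not is_dna(dna):
        raise ValueError("sequence contains non-ATGC characters")
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```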

Page 4: Why determine the order of nucleotides?

• Sequencing determines the order of the billions of chemical units (nucleotides) that make up the genetic material.
  ◦ The secrets of life are locked up in the order of the 4 letters!
• There are an estimated 5-100 million living species.
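A quick count shows why the order of the letters carries the information. With 4 possible letters at each position, a stretch of n nucleotides can be arranged in 4^n ways. The human genome size of about 3.3 × 10^9 letters comes from the history table in this deck; the 2-bits-per-base storage figure is a back-of-envelope estimate that assumes one strand and no compression.

```latex
\begin{aligned}
N(n) &= 4^{n} && \text{possible orders of } n \text{ nucleotides} \\
N(3) &= 4^{3} = 64 && \text{the number of possible codons} \\
\log_2 N(n) &= 2n\ \text{bits} && \text{information in } n \text{ bases} \\
2 \times 3.3\times10^{9} &= 6.6\times10^{9}\ \text{bits} \approx 825\ \text{MB} && \text{human genome, one strand}
\end{aligned}
```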

Page 5: Genome Sequencing History

| Organism | Year | Institute | Genome size |
|---|---|---|---|
| Bacteriophage MS2 | 1976 | Walter Fiers, University of Ghent | 3,569 nt (single-stranded RNA) |
| Phage ΦX174 | 1977 | Fred Sanger, Cambridge | 5,386 nt (single-stranded DNA) |
| Haemophilus influenzae | 1995 | TIGR | 1,830,138 bp |
| Saccharomyces cerevisiae | 1996 | European effort | 12,495,682 bp (16 chromosomes) |
| Human Genome Project | 2000 | Multiple organizations | 3.3 × 10^9 bp (about 3 billion letters) |

Page 7: How genomes are sequenced

• Sanger dideoxy sequencing method (1977).
• Maxam-Gilbert chemical degradation method (1977).
• Two labs built the first automated sequencers:
  1. Leroy Hood at Caltech, 1986 (commercialized by Applied Biosystems).
  2. Wilhelm Ansorge at EMBL, 1986 (commercialized by Pharmacia-Amersham and GE Healthcare).
• Hypoxanthine-guanine phosphoribosyltransferase (HGPRT), Alu sequences.
• 1991: EMBL filed a patent on media-less, solid-support-based sequencing.
• 1996: Hitachi Laboratory developed a high-throughput capillary array sequencer.
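In the Sanger dideoxy method, DNA polymerase copies a template, and occasional incorporation of a chain-terminating dideoxynucleotide (ddNTP) stops each copy at a different length. Separating the fragments by size (on a gel or in a capillary) and noting which ddNTP ended each one spells out the sequence. The sketch below idealizes this: it assumes every termination point is present, that the template is written 3'->5' so the product reads 5'->3' in the same order, and it ignores band intensity and chemistry; `run_reactions` and `read_gel` are hypothetical names.

```python
# A minimal sketch of reading a Sanger (dideoxy chain-termination) gel.
# Assumptions: every termination point appears in the reaction, and the
# template is written 3'->5' so the product reads 5'->3' in the same order.
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_reactions(template: str) -> Dict[str, List[int]]:
    """Simulate the four ddNTP reactions (ddA, ddC, ddG, ddT).

    Each band is the length of a newly synthesized chain that was stopped
    when a dideoxynucleotide was incorporated at that position.
    """
    product = [COMPLEMENT[b] for b in template.upper()]
    lanes: Dict[str, List[int]] = {b: [] for b in "ACGT"}
    for length, base in enumerate(product, start=1):
        lanes[base].append(length)  # chain terminated after `length` bases
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Order all bands by size (shortest first) and read off the lane of each."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = run_reactions("TACGGT")
    print(lanes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3], 'T': [2]}
    print(read_gel(lanes))  # ATGCCA
```

Reading bands from shortest to longest mirrors the classic four-lane gel, where the smallest fragments run furthest.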

Page 8: How genomes are sequenced (next-generation methods)

• 454 sequencing (2006).
  ◦ Based on the principle of pyrophosphate detection (1985, 1988).
• Illumina (Solexa) genome sequencing (2007).
• Applied Biosystems ABI SOLiD System (2007).
• Helicos single-molecule sequencing (HeliScope, 2007).
• Pacific Biosciences single-molecule real-time (SMRT) technology (2010).
• Sequenom: nanotechnology-based sequencing.
• BioNanomatrix nanofluidics, RNAP technology.

http://www.ncbi.nlm.nih.gov/books/NBK20261/
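454 sequencing relies on pyrophosphate detection: nucleotides are flowed over the template one at a time in a fixed cycle, and light is emitted only when the flowed nucleotide is incorporated, with a signal roughly proportional to the length of the homopolymer run. The noise-free sketch below turns a sequence into such a flowgram and decodes it back. The flow order "TACG" and the perfect integer signals are assumptions for illustration; real flowgrams are noisy, which is why long homopolymers are hard to call.

```python
# A minimal, noise-free sketch of pyrosequencing (the 454 principle).
# Assumptions: the flow order "TACG" and perfect integer light signals;
# real flowgrams are noisy, so homopolymer lengths must be estimated.
from typing import List, Tuple

FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def flowgram(seq: str, flow_order: str = FLOW_ORDER) -> List[Tuple[str, int]]:
    """Return (nucleotide, signal) per flow; signal = bases incorporated in that flow."""
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:  # without this check an unknown base would loop forever
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    flows: List[Tuple[str, int]] = []
    i, f = 0, 0
    while i < len(seq):
        nuc = flow_order[f % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == nuc:  # a homopolymer is incorporated in one flow
            run += 1
            i += 1
        flows.append((nuc, run))
        f += 1
    return flows


def base_call(flows: List[Tuple[str, int]]) -> str:
    """Rebuild the sequence: each flow contributes `signal` copies of its nucleotide."""
    return "".join(nuc * signal for nuc, signal in flows)


if __name__ == "__main__":
    fl = flowgram("TTAGC")
    print(fl)             # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
    print(base_call(fl))  # TTAGC
```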

Page 9: Shotgun sequencing

http://www.scq.ubc.ca/genome-projects-uncovering-the-blueprints-of-biology/
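Shotgun sequencing breaks the genome into many random fragments, sequences them, and then relies on computation to stitch the reads back together wherever they overlap. The toy sketch below uses a greedy strategy (repeatedly merge the two reads with the longest suffix-prefix overlap). It assumes error-free reads from a single strand and a genome without long repeats; real assemblers use overlap or de Bruijn graphs and handle errors, both strands, and repeats. The names and the `min_len` threshold are hypothetical.

```python
# A toy greedy assembler for error-free shotgun reads (single strand).
# Assumptions: reads have no sequencing errors, all come from the same strand,
# and the genome has no long repeats; real assemblers are far more robust.
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the contigs."""
    # Drop reads contained in another read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while True:
        best_k, best_pair = 0, None
        for a, b in permutations(range(len(contigs)), 2):
            k = overlap(contigs[a], contigs[b], min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:  # no overlaps left: remaining strings are the contigs
            return contigs
        a, b = best_pair
        merged = contigs[a] + contigs[b][best_k:]
        contigs = [c for i, c in enumerate(contigs) if i not in (a, b)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGTA", "CGTACGT", "ACGTTAG"]))  # ['ATGCGTACGTTAG']
```

The greedy choice is simple but can mis-join reads in repetitive regions, which is one reason graph-based assemblers replaced it.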

Page 11: Annotation

• Gene prediction
• Comparative genomics
• Ortholog search
• BLAST analysis
• Functional categories
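The simplest form of gene prediction is scanning for open reading frames (ORFs): a start codon (ATG) followed, in the same frame, by a stop codon (TAA, TAG or TGA). The sketch below scans the three forward frames only. It assumes the standard genetic code, no introns (prokaryote-style genes), and a hypothetical `min_codons` cut-off; real gene finders add the reverse strand and statistical models of coding sequence.

```python
# A minimal ORF scan as a stand-in for gene prediction.
# Assumptions: forward strand only, standard start (ATG) and stop codons,
# a hypothetical minimum length `min_codons`, and no introns.
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs: List[Tuple[int, int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    s = "CCATGAAATTTTAGCC"
    for frame, start, end in find_orfs(s):
        print(frame, s[start:end])  # 2 ATGAAATTTTAG
```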

Page 13: Sequence submissions

• http://www.insdc.org/ (International Nucleotide Sequence Database Collaboration)
• http://www.ebi.ac.uk/embl/Contact/collaboration.html
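Sequences destined for the INSDC databases are commonly prepared in FASTA format: a header line starting with ">" followed by the sequence wrapped over several lines. The sketch below writes and re-reads such records. The identifier "contig_1" is hypothetical, the 60-character line width is a common convention rather than a requirement, and a real submission also needs metadata (organism, annotation, and so on) that FASTA alone does not carry.

```python
# A minimal FASTA writer/reader sketch.
# Assumptions: the record identifier "contig_1" is hypothetical, and 60
# residues per line is a common convention rather than a requirement.
from typing import Dict, Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text back into {header: sequence}."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


if __name__ == "__main__":
    print(to_fasta([("contig_1", "ACGT" * 20)]), end="")
```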

Page 14: Microbial Genome Sequencing

• JGI – IMG [http://img.jgi.doe.gov/]
• Broad Institute
• TIGR
• Washington University (WashU)
• VBI at Virginia Tech

Page 15: Human Genome Project

• October 1990: Human Genome Project started.
• 2000: Working draft announced (first publications appeared in 2001).
• 2003: Finished sequence.
• NHGRI solicited pilot proposals for ENCODE.
• 2005: GWAS: about 90% of disease-associated variants lie outside coding regions.
• 2007: First report on ENCODE published.
• Requests for applications (RFAs) were sought for the full ENCODE.
• 2012: ENCODE results published.

Page 17: What we knew

• 95% of the genome is "junk".
  – 2.94% of the genome is coding.
• Cis-regulatory elements occur within a limited genomic distance.
• Most of the genome consists of transposable elements of obscure origin that are dying.
• Transcribed elements are more often translated than not.

Page 18: ENCODE: Encyclopedia of DNA Elements

Page 19: Some of the useful links

• http://www.nature.com/encode/
• http://www.encodeproject.org/ENCODE/
• http://www.factorbook.org/
• http://encodeproject.org/ENCODE/dataStandards.html
• http://1000genomes.org
• http://genome.ucsc.edu/ENCODE/

Page 20: GENCODE

http://www.gencodegenes.org/data.html

Page 21

http://www.nature.com/nature/journal/v489/n7414/full/489049a.html

Page 22: Key Findings

• 80% of the human genome is biochemically active!
  – 70,000 promoters and 400,000 enhancers.
• 75% of the genome is transcribed in some tissue or other at some point during a lifetime.
• The environment plays a great role in switching many genes on or off [epigenetics].
• Most disease-associated variants lie not in the genes but in the switches!
• The "dark matter" that controls the genes is physically close to the genes it controls.

Page 23: Key Findings (continued)

• Genes and switches do not have a one-to-one relationship!
• About 4 million switches control 21,000 genes!
• Identical twins are NOT identical: they are greatly influenced by their environments.
• Astronomy and genome biology look similar (about 95% of the Universe is dark matter and dark energy, which we don't understand).
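A one-line average makes the "not one-to-one" point concrete, using the two figures from this slide. It treats switches as spread evenly across genes, which is only a simplifying assumption; the real distribution is very uneven, and one switch can act on several genes.

```latex
\frac{4 \times 10^{6}\ \text{switches}}{2.1 \times 10^{4}\ \text{genes}} \approx 190\ \text{switches per gene (on average)}
```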

Page 25: HapMap Project

• Aim: to study the effects of the environment on diseases.
• Any two people share about 99.5% of their DNA.
• 269 individuals genotyped; one million SNPs genotyped.
  ◦ Rose to 10 million, including polymorphic sites.
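At 99.5% similarity and a genome of about 3.3 × 10^9 bases, two people would differ at roughly 0.005 × 3.3 × 10^9 ≈ 1.65 × 10^7 positions. The sketch below shows the two basic calculations behind such statements: percent identity between two aligned sequences, and the positions (candidate SNP sites) where a set of aligned sequences disagree. It assumes the sequences are already aligned, of equal length, and differ only by substitutions; projects like HapMap genotyped known SNPs with arrays rather than comparing whole sequences.

```python
# A minimal sketch: SNP sites and percent identity from pre-aligned sequences.
# Assumptions: sequences are already aligned, equal length, and contain only
# substitutions (no insertions/deletions).
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions where the two sequences carry the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def snp_sites(aligned: List[str]) -> List[int]:
    """0-based positions where not all sequences share the same base."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))                 # 90.0
    print(snp_sites(["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]))        # [3, 4]
```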