Genome sequencing projects
Coursework 100 Bioinformatics, CSIR - IICB
Genome Projects and Human Genome Projects
Sucheta Tripathy, 27th September 2012
https://sites.google.com/site/suchetalab/
Topics to be covered
Introduction.
History of Genome Sequencing.
Rationale behind genome sequencing.
How genomes are sequenced.
What happens next.
◦ Assembly and Annotation.
◦ Sequence Submissions.
Microbial Genome Sequencing.
Human Genome Project.
◦ ENCODE Project.
◦ 1000 Genomes Project.
What is a Genome?
Gene + Chromosome -> Genome
DNA: A/T/G/C
RNA: A/U/G/C
Why determine the order of nucleotides?
Determining the order of the billions of chemical units that build the genetic material.
◦ The secrets of life are locked up in the order of the 4 letters!
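As a small illustration of the four-letter alphabets (A/T/G/C for DNA, A/U/G/C for RNA), the sketch below counts the letters in a DNA string and rewrites it in the RNA alphabet. The function names and the example sequence are made up for illustration; they are not taken from the lecture.

```python
from collections import Counter

DNA_ALPHABET = set("ATGC")


def base_counts(dna: str) -> Counter:
    """Count each nucleotide in a DNA string, rejecting unknown letters."""
    dna = dna.upper()
    unknown = set(dna) - DNA_ALPHABET
    if unknown:
        raise ValueError(f"unexpected letters: {sorted(unknown)}")
    return Counter(dna)


def transcribe(dna: str) -> str:
    """Rewrite a DNA coding-strand sequence in the RNA alphabet (T becomes U)."""
    return dna.upper().replace("T", "U")


# Hypothetical toy sequence, for illustration only.
example = "ATGGCCATTGTAATGGGCCGCTGA"
print(base_counts(example))
print(transcribe(example))
```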
5-100 million living species???
Genome Sequencing History
Organism | Year | Institute | Genome Size
Bacteriophage MS2 | 1976 | Walter Fiers at the University of Ghent | 3,569 bp
Phage Φ-X174 | 1977 | Fred Sanger, Cambridge | 5,386 bp
Haemophilus influenzae | 1995 | TIGR | 1,830,138 bp
Saccharomyces cerevisiae | 1996 | European effort | 12,495,682 bp (16 chromosomes)
Human Genome Project | 2000 | Multiple organizations | 3.3 x 10^9 bp (3 billion letters)
Genomes Sequenced so far… 19987 – 19718 (26th Sept 2012)
Eukaryotes [2231]
◦ Animal
◦ Fungi
◦ Plants
◦ Protists
◦ Others
Prokaryotes [14268]
Viruses [3219]
Ref: http://www.ncbi.nlm.nih.gov/genome/browse/
How Genomes are sequenced?
Sanger dideoxy sequencing method (1977)
Maxam-Gilbert chemical degradation method (1977)
Two labs that owned automated sequencers:
1. Leroy Hood at Caltech, 1986 (commercialized by AB)
2. Wilhelm Ansorge at EMBL, 1986 (commercialized by Pharmacia-Amersham and GE Healthcare)
3. Hypoxanthine-guanine phosphoribosyltransferase (HGPRT), Alu sequences
4. Hitachi Laboratory developed a high-throughput capillary array sequencer, 1996.
1991: a patent filed by EMBL on media-less, solid-support-based sequencing.
How Genomes are sequenced
454 sequencing method (2006)
◦ Principles of pyrophosphate detection (1985, 1988)
Illumina (Solexa) genome sequencing method (2007)
Applied Biosystems ABI SOLiD System (2007)
Helicos single-molecule sequencing (HeliScope, 2007)
Pacific Biosciences single-molecule real-time (SMRT) technology, 2010
Sequenom for nanotechnology-based sequencing.
BioNanomatrix nanofluidics
RNAP technology
http://www.ncbi.nlm.nih.gov/books/NBK20261/
Shotgun sequencing http://www.scq.ubc.ca/genome-projects-uncovering-the-blueprints-of-biology/
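The idea of shotgun sequencing, breaking the genome into many random overlapping fragments and reading each one, can be sketched in a few lines. This is a toy simulation under simplifying assumptions (fixed read length, error-free reads, uniform random start positions); `shotgun_reads`, `mean_coverage` and the toy genome are hypothetical names and data chosen for illustration.

```python
import random


def shotgun_reads(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Sample fixed-length reads from uniformly random start positions."""
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def mean_coverage(genome_len: int, n_reads: int, read_len: int) -> float:
    """Average number of reads covering each base: (reads * length) / genome size."""
    return n_reads * read_len / genome_len


# Hypothetical toy genome, for illustration only.
toy_genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTCATGCAAT"
reads = shotgun_reads(toy_genome, n_reads=20, read_len=10, seed=1)
print(reads[:3])
print(mean_coverage(len(toy_genome), n_reads=20, read_len=10))
```

Real projects sequence to many-fold coverage so that random sampling leaves few gaps; the coverage formula here is only the simple average, not a gap estimate.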
Assembly
http://www.springerimages.com/Images/Biomedicine/1-10.1007_s12575-009-9004-1-1
http://en.wikipedia.org/wiki/Sequence_assembly
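Assembly stitches overlapping reads back into longer contiguous sequences. A minimal sketch of one classic approach, greedy overlap merging, is given here; it assumes error-free reads from a single strand and a repeat-free toy genome, which real assemblers cannot assume. The function names and the toy sequence are illustrative, not from the lecture.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical toy genome and three overlapping reads, given out of order.
toy = "ATGGCGTACGTTAGC"
toy_reads = [toy[7:15], toy[0:8], toy[3:11]]
print(greedy_assemble(toy_reads))
```

Greedy merging can misassemble repeated regions, which is one reason real assemblers use overlap graphs or de Bruijn graphs instead.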
Annotation
Gene Prediction
Comparative Genomics
Orthologs search
Blast Analysis
Functional Categories
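To give a feel for the gene prediction step, here is a deliberately simple open reading frame (ORF) scan: it looks for an ATG start codon followed, in the same frame, by a stop codon. This is only a sketch, assuming the forward strand and the standard stop codons; real gene predictors use statistical models and also handle the reverse strand and introns. `find_orfs` and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand ORFs running ATG ... stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


# Hypothetical toy sequence: one ORF (ATG AAA TTT TAG) starting at position 2.
print(find_orfs("CCATGAAATTTTAGGG"))
```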
http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
http://www.insdc.org/
http://www.ebi.ac.uk/embl/Contact/collaboration.html
Microbial Genome Sequencing
JGI – IMG [http://img.jgi.doe.gov/]
Broad
TIGR
WashU
VBI at Virginia Tech
Human Genome Project
In October 1990 the Human Genome Project started
First publication in 2000
Finished paper in 2003
NHGRI solicited pilot proposals for ENCODE
First report on ENCODE published in 2007
RFAs were sought for full ENCODE
ENCODE published 2012
GWAS: 90% lies outside coding regions
2005
http://www.youtube.com/watch?v=N4i6lYfYQzY
What we knew
• 95% of the genome is “junk”.
– 2.94% of the genome is coding
• cis regulatory elements occur within a limited genomic distance.
• Most of the genome is transposable elements that are of obscure origin and are dying.
• Transcribed elements are more often translated than not.
Encyclopedia Of DNA Elements
Some of the useful links:
• http://www.nature.com/encode/
• http://www.encodeproject.org/ENCODE/
• http://www.factorbook.org/
• http://encodeproject.org/ENCODE/dataStandards.html
• http://1000genomes.org
• http://genome.ucsc.edu/ENCODE/
http://www.gencodegenes.org/data.html
http://www.nature.com/nature/journal/v489/n7414/full/489049a.html
Key Findings:
• 80% of the human genome is active!
– 70,000 promoters and 400,000 enhancers
• 75% of the genome is transcribed in some tissue or other during a lifetime.
• Environment plays a great role in switching many genes on or off. [Epigenetics]
• Most diseases lie not with the genes but with the switches!
• The dark matter controlling the genes is physically close to the genes it controls.
Key Findings (continued):
• Genes and switches do not hold a one-to-one relationship!
• 4 million switches control 21,000 genes!
• Identical twins are NOT identical: they are greatly influenced by their environments.
• Astronomy and genetic biology look similar (95% of the Universe is called dark matter, which we don't understand).
1000 genomes project
http://en.wikipedia.org/wiki/1000_Genomes_Project
Copy Number Variation
SNPs
Indels
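The difference between a SNP and an indel can be made concrete by comparing a reference allele with an alternate allele, in the style of the REF/ALT columns used by variant files. The sketch below is an illustrative assumption, not the 1000 Genomes pipeline; `classify_variant` is a hypothetical helper, and copy number variation is left out because it is detected from read depth rather than from a single allele pair.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a reference/alternate allele pair by comparing allele lengths."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # single base replaced by another
    if len(alt) > len(ref):
        return "insertion"  # alternate allele is longer
    if len(alt) < len(ref):
        return "deletion"  # alternate allele is shorter
    return "multi-base substitution"


# Hypothetical allele pairs, for illustration only.
for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("AT", "GC")]:
    print(ref, alt, classify_variant(ref, alt))
```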
HapMap Project
Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Peruvians in Perú; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles
To study the effect of the environment on diseases.
99.5% of DNA is similar. 269 individuals genotyped. One million SNPs genotyped
◦ Rose to 10 million including polymorphic sites.