Introduction to genomes & genome browsers

Celia van Gelder
CMBI, UMC Radboud
December 2013

Content

• Introduction
• The human genome
• Human genetic variation
– SNPs
– CNVs
– Alternative splicing
• Browsing the human genome



Exponential Growth in Genomic Sequence Data

[Figure: number of completed genomes over time]

• First 2 bacterial genomes complete
• First eukaryote complete (yeast)
• First metazoan complete (roundworm, C. elegans)
• Currently 1000+ completed genomes


©CMBI 2013

Genome projects

http://www.genomesonline.org/


The pig genome


The human genome

• Genome: the entire sequence of DNA in a cell

• 3 billion base pairs (3 Gb)

• 22 pairs of autosomes + the X and Y chromosomes

• Chromosome length varies from ~50 Mb to ~250 Mb

• About 20,000 protein-coding genes (average gene length 3000 bases, but the largest known gene, dystrophin, is 2.4 Mb)

• The human genome is 99.9% identical among individuals. This means that any 2 persons differ in about 3 million nucleotides!
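
The 3 million figure follows directly from the genome size and the 99.9% identity quoted above. A minimal sketch of that arithmetic; the function name is only illustrative:

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs, as quoted on this slide
IDENTITY = 0.999                # 99.9% identical among individuals


def expected_differences(genome_size_bp: int, identity: float) -> int:
    """Number of positions at which two genomes are expected to differ."""
    return round(genome_size_bp * (1 - identity))


if __name__ == "__main__":
    print(expected_differences(GENOME_SIZE_BP, IDENTITY))
```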


Eukaryotic Genomes: more than collections of genes

• Genes & regulatory sequences make up 5% of the genome

– Protein-coding genes
– RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
– Structural DNA (centromeres, telomeres)
– Regulation-related sequences (promoters, enhancers, silencers, insulators)
– Parasite sequences (transposons)
– Pseudogenes (non-functional gene-like sequences)
– Simple sequence repeats


The human genome (continued)

From: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002)

• Only 1.2% codes for proteins

• Long introns, short exons

• Large spaces between genes

• More than half consists of repetitive DNA

Alu repeat: ~300 bp, > 1 million copies


Chromosome organisation (1)

Genes that are OFF

Genes that are ON


Human Genetic Variation

• Every human has essentially the same set of genes, but there are different forms of each gene, known as alleles

• Genetic variation explains some of the differences among people, such as:
– Blood group
– Eye color
– Skin color
– Hair color
– Higher or lower risk for getting particular diseases:
  • Cystic fibrosis, Sickle cell disease
  • Diabetes, Cancer, Arthritis, Asthma
  • Stroke, Heart disease
  • Alzheimer's disease, Parkinson's disease
  • Depression, Alcoholism


Variations in the Genome

[Figure: common sequence variations on a chromosome: polymorphisms, deletions, insertions, translocations]


Today’s focus

1. Single Nucleotide Polymorphisms (SNPs)

2. Copy number variations (CNV)

3. Alternative transcripts


Single Nucleotide Polymorphisms (SNPs)

• SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.

• For a variation to be considered a SNP, it must occur in at least 1% of the population.

• SNPs make up about 90% of all human genetic variation and occur every 100 to 300 bases.

• SNPs can occur in coding (gene) and noncoding regions of the genome; <1% alter the protein sequence
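
The 1% rule and the "every 100 to 300 bases" spacing can be turned into a small calculation. A minimal sketch, assuming the 3 Gb genome size quoted earlier; the function names are illustrative only:

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, from the human genome slide


def is_snp(variant_frequency: float, threshold: float = 0.01) -> bool:
    """A variation counts as a SNP if it occurs in at least 1% of the population."""
    return variant_frequency >= threshold


def expected_snp_count(genome_size_bp: int, spacing_bp: int) -> int:
    """Rough number of SNPs if one occurs every `spacing_bp` bases."""
    return genome_size_bp // spacing_bp


if __name__ == "__main__":
    print(is_snp(0.05), is_snp(0.001))
    # One SNP per 300 bases (lower bound) and per 100 bases (upper bound)
    print(expected_snp_count(GENOME_SIZE_BP, 300),
          expected_snp_count(GENOME_SIZE_BP, 100))
```

Under these assumptions the genome would hold roughly 10 to 30 million SNPs.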


SNPs

• determine properties like eye color, hair (curly or straight), or whether you can taste bitter or not
• are used for identification and forensics
• are used for estimating predisposition to disease
• can cause drug side effects and/or non-responsiveness to the drug
• have an impact on how humans respond to environmental factors like bacteria, viruses, toxins and chemicals
• are used to predict specific genetic traits
• are used for classifying patients in clinical trials
• are used for mapping and genome-wide association studies of complex diseases


SNP - Bitter tasting, TAS2R38


SNP & disease, Alzheimer

Alzheimer's disease (AD) & apolipoprotein E (APOE)

• Apolipoprotein E is a cholesterol carrier that is found in the brain and other organs.

• APOE is suspected to be involved in amyloid beta aggregation and clearance, influencing the onset of amyloid beta deposition.

• APOE contains 2 SNPs that result in 3 possible alleles: E2, E3, E4.

• Variant   rs429358   rs7412
  E2        T          T
  E3        T          C
  E4        C          C

• A person who inherits at least one E4 allele will have a greater chance of developing AD.
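
The allele table on this slide is a simple lookup from the two SNPs to E2, E3 or E4. A minimal sketch, assuming each input is a phased haplotype (the rs429358 and rs7412 nucleotides from the same chromosome copy); the function names are illustrative, and the combination not listed on the slide is returned as unknown:

```python
from typing import Optional, Tuple

# (rs429358, rs7412) -> APOE allele, exactly as in the table on this slide
APOE_ALLELES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}


def apoe_allele(rs429358: str, rs7412: str) -> Optional[str]:
    """Return E2/E3/E4 for one haplotype, or None if it is not in the table."""
    return APOE_ALLELES.get((rs429358.upper(), rs7412.upper()))


def carries_e4(hap1: Tuple[str, str], hap2: Tuple[str, str]) -> bool:
    """True if at least one of the two inherited haplotypes is E4."""
    return "E4" in (apoe_allele(*hap1), apoe_allele(*hap2))


if __name__ == "__main__":
    print(apoe_allele("T", "C"))
    print(carries_e4(("T", "C"), ("C", "C")))
```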


Copy Number Variation

• Copy Number Variations (CNVs): gains and losses of large chunks of DNA sequence (10 kb – 5 Mb)

• When there are genes in the CNV areas, this can lead to variations in the number of gene copies between individuals

• CNVs contribute to our uniqueness.

• CNVs can also influence the susceptibility to disease.

• CNVs may either be inherited or caused by de novo mutation


Copy Number Variation

• Deletion: CN=0, CN=1
• Normal cell: CN=2
• Amplification: CN=3, CN=4
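
The copy-number labels on this slide amount to a comparison against the normal diploid count of 2. A minimal sketch; the function name and the returned strings are illustrative:

```python
def classify_copy_number(cn: int, normal: int = 2) -> str:
    """Classify a copy number relative to the normal count (CN=2)."""
    if cn < 0:
        raise ValueError("copy number cannot be negative")
    if cn < normal:
        return "deletion"       # CN=0 or CN=1 on the slide
    if cn > normal:
        return "amplification"  # CN=3, CN=4, ...
    return "normal"             # CN=2


if __name__ == "__main__":
    for cn in range(5):
        print(cn, classify_copy_number(cn))
```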


CNVs & disease

• Many inherited genetic diseases result from CNVs:
– Gene copy number can be elevated in cancer cells
– Autism
– Schizophrenia (dept. human genetics)
– Mental retardation (dept. human genetics)
– Parkinson's disease

• There are CNVs that protect against HIV infection and malaria.

• The contribution of CNV to the common, complex diseases, such as diabetes and heart disease, is currently less well understood


Alternative splicing


Alternative splicing

• Defects in alternative splicing have been implicated in many diseases, including:

– neuropathological conditions such as Alzheimer's disease

– cystic fibrosis

– conditions involving growth and developmental defects

– many human cancers, e.g. BRCA1 in breast cancer

– beta-globin in beta-thalassemia

– Parkinson's disease


Annotating the genome

Annotation: attaching biological information to sequences.

Two main steps:

• identifying elements on the genome
• attaching biological information to these elements
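
The two annotation steps can be pictured as a record that first gets a location and then gets information attached. A minimal sketch; the `Feature` class, its field names and the coordinates are all hypothetical and do not come from any annotation tool:

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One element identified on the genome (step 1)."""
    chromosome: str
    start: int
    end: int
    strand: str        # "+" for forward, "-" for reverse
    feature_type: str  # e.g. "gene", "exon", "repeat"
    annotations: dict = field(default_factory=dict)  # filled in step 2

    def length(self) -> int:
        return self.end - self.start + 1


# Step 1: identify an element (made-up coordinates)
gene = Feature(chromosome="1", start=1_000, end=5_000, strand="+", feature_type="gene")

# Step 2: attach biological information to it
gene.annotations["name"] = "Gene X"
gene.annotations["description"] = "hypothetical example gene"

if __name__ == "__main__":
    print(gene.length(), gene.annotations)
```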


Basic & Advanced Genome Annotation

• Basic:
– Genomic location
– Gene features: exons, introns, UTRs
– Transcript(s)
– Pseudogenes, non-coding RNA
– Protein(s)
– Links to other sources of information

• Advanced:
– Cytogenetic bands
– Polymorphic markers
– Genetic variation, including SNPs & CNVs
– Repetitive sequences
– cDNAs or mRNAs from related species
– Genomic sequence variation
– Regulatory sequences (enhancers, silencers, insulators)


[Human] Genome Browsers

• EBI Ensembl

• NCBI Map Viewer

• UCSC Genome Browser

Not limited to only human data


Ensembl

©EMBL-EBI


Other Ensembl Installations

©EMBL-EBI


Organized Data Based on Chromosome Location

Tracks:
• genes & predictions
• variations & repeats
• cross-species comparative data
• & many more types of data, from expression & regulation to mRNA and ESTs…

Gene X:
• Description
• Transcript data
• Structure
• Gene Ontology
• Pathway Data
• Homologous Genes
• Expression Data
• Etc.
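
Organizing data by chromosome location means that each track is a list of intervals, and viewing a region means collecting whatever overlaps it in every track. A minimal sketch with made-up coordinates and labels; it does not reflect how any particular browser stores its data:

```python
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    start: int
    end: int
    label: str


# Hypothetical tracks on one chromosome (all coordinates invented)
TRACKS: Dict[str, List[Item]] = {
    "genes": [Item(50, 150, "Gene X"), Item(500, 900, "Gene Y")],
    "variations": [Item(120, 120, "SNP 1"), Item(300, 300, "SNP 2")],
    "repeats": [Item(180, 480, "repeat 1")],
}


def overlapping(items: List[Item], start: int, end: int) -> List[Item]:
    """Items sharing at least one base with the region [start, end]."""
    return [i for i in items if i.start <= end and i.end >= start]


def view_region(tracks: Dict[str, List[Item]], start: int, end: int) -> Dict[str, List[str]]:
    """Labels visible in each track for the requested region."""
    return {name: [i.label for i in overlapping(items, start, end)]
            for name, items in tracks.items()}


if __name__ == "__main__":
    print(view_region(TRACKS, 100, 200))
```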


HGNC – a unique name and symbol for every gene in human http://www.genenames.org/

ENSG###  Ensembl Gene ID
ENST###  Ensembl Transcript ID
ENSP###  Ensembl Peptide ID
ENSE###  Ensembl Exon ID
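
Because the prefix encodes the kind of object, the type of an Ensembl ID can be read off its first four letters. A minimal sketch that handles only the four human prefixes listed on this slide (IDs for other species carry an extra species code and are not covered); the digits in the example are made up:

```python
import re
from typing import Optional

ID_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# "ENS" + one type letter + digits, e.g. ENSG followed by a number
_PATTERN = re.compile(r"^ENS([GTPE])(\d+)$")


def ensembl_id_type(stable_id: str) -> Optional[str]:
    """Return 'gene', 'transcript', 'peptide' or 'exon', or None if unrecognised."""
    match = _PATTERN.match(stable_id)
    return ID_TYPES[match.group(1)] if match else None


if __name__ == "__main__":
    # Placeholder identifiers, not real database entries
    for example in ("ENSG00000000001", "ENST00000000001", "XYZ123"):
        print(example, ensembl_id_type(example))
```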


Ensembl: An Example

[Figure: Ensembl region view, with the labels "Click for more details" and "tracks"]


Direction of transcription

Above blue line: forward strand

Below blue line: reverse strand


Ensembl Transcripts

©EMBL-EBI


Synopsis: What can I do with Ensembl?

• View, examine & explore annotated information for any chromosomal region:
– Genes
– ESTs, mRNAs, alternative transcripts
– Proteins
– SNPs, and SNPs across strains (rat, mouse), populations (human), or even breeds (dog)
– Homologues and phylogenetic trees across more than 40 species
– Whole genome alignments
– Conserved regions across species
– Gene expression profiles

• Upload your own data and use BLAST/BLAT against any Ensembl genome

• Export sequence, or create a table of gene information