Introduction to Molecular Biology, Genetics and Genomics Sushmita Roy www.biostat.wisc.edu/bmi576/ sroy@biostat.wisc.edu September 6, 2012 BMI/CS 576

Introduction to Molecular Biology, Genetics and Genomics

Sushmita Roy

www.biostat.wisc.edu/bmi576/

sroy@biostat.wisc.edu

September 6, 2012

BMI/CS 576

Goals for today

• Molecular biology crash course:
  – The different parts of a cell
  – DNA, RNA, chromosomes, nucleus, cytoplasm
  – Biochemical entities of a cell: mRNA, proteins, metabolites
  – Genes, heredity, transcription, translation, gene regulation, gene expression, alternative splicing

• Genomics crash course:
  – Genomes, functional genomics, other -omes, networks

Organization of biological information

Organism → Tissue → Cell → Chromosome → Gene (largest to smallest)

http://publications.nigms.nih.gov/thenewgenetics/chapter1.html

The central dogma of Molecular biology

DNA → (transcription) → RNA → (translation) → Proteins

Image from the DOE Human Genome Program, http://www.ornl.gov/hgmis

DNA

• Short for deoxyribonucleic acid

• Composed of small chemical units called nucleotides (or bases):
  – adenine (A), cytosine (C), guanine (G) and thymine (T)
  – ATGC is the alphabet

• DNA is double stranded: it is made up of two strands twisted around each other

• Each strand of DNA is a string composed of the four letters A, C, G, T
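Because a DNA strand can be treated as a string over the alphabet {A, C, G, T}, simple computations on it are easy to write. The Python sketch below checks that a sequence uses only these letters and counts each base. The function names are illustrative and do not come from the course, and upper-case input is assumed. The example input is the start of the gene fragment shown later in these slides.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")

def is_dna(seq: str) -> bool:
    """Return True if seq uses only the letters A, C, G, T (upper case assumed)."""
    return set(seq) <= DNA_ALPHABET

def base_counts(seq: str) -> dict:
    """Count occurrences of each of the four bases in a DNA string."""
    if not is_dna(seq):
        raise ValueError("sequence contains letters outside A, C, G, T")
    counts = Counter(seq)
    # Report all four bases, including any that do not occur.
    return {base: counts.get(base, 0) for base in "ACGT"}

print(base_counts("GTATGTCTAAGCC"))  # {'A': 3, 'C': 3, 'G': 3, 'T': 4}
```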

DNA is a double helical molecule

DNA molecules consist of two strands arranged in a double helix

• DNA is made up of nucleotides

The double-helical structure is what lets the DNA molecule store genetic information and pass it on with great precision

James Watson, Francis Crick, Maurice Wilkins and Rosalind Franklin

Watson-Crick Base Pairs

A always pairs with T, and C always pairs with G.

This is called base pairing. A and G are double-ringed structures called purines; C and T are single-ringed structures called pyrimidines.
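The pairing rules and the purine/pyrimidine classes above can be written as small lookup sets. The sketch below encodes both and checks that every Watson-Crick pair joins one purine with one pyrimidine. The set and function names are illustrative only.

```python
PURINES = {"A", "G"}       # double-ringed bases
PYRIMIDINES = {"C", "T"}   # single-ringed bases
# Watson-Crick pairs, in both orientations.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def is_watson_crick_pair(b1: str, b2: str) -> bool:
    """True if the two bases form a Watson-Crick pair."""
    return (b1, b2) in PAIRS

def purine_with_pyrimidine(b1: str, b2: str) -> bool:
    """True if exactly one base is a purine and the other a pyrimidine."""
    return (b1 in PURINES and b2 in PYRIMIDINES) or (b1 in PYRIMIDINES and b2 in PURINES)

# Every Watson-Crick pair joins one purine with one pyrimidine.
assert all(purine_with_pyrimidine(x, y) for x, y in PAIRS)
print(is_watson_crick_pair("A", "T"), is_watson_crick_pair("A", "C"))  # True False
```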

5’ and 3’ ends of a DNA molecule

• The backbone of each strand consists of alternating sugar (deoxyribose) and phosphate groups

• Each strand of DNA has a “direction”:
  – at one end, the terminal carbon atom in the backbone is the 5’ carbon atom of the terminal sugar
  – at the other end, the terminal carbon atom is the 3’ carbon atom of the terminal sugar

• Therefore we can talk about the 5’ and the 3’ ends of a DNA strand
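The two strands of a double helix run in opposite directions (they are antiparallel). As a result, the partner of a strand written 5’→3’ is obtained by complementing every base and reversing the order. This is usually called the reverse complement. The sketch below is a minimal version that assumes upper-case A/C/G/T input; the function name is illustrative.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3', of a DNA strand given 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original strand. This mirrors the point that one strand carries all the information needed to rebuild the other.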

DNA stores the blueprint of an organism

• The molecule of heredity
• Has the information needed to make an organism
• Base pairing enables self-replication:
  – one strand has all the information needed to rebuild the other

Chromosomes

• All the DNA of an organism is divided up into individual chromosomes

• prokaryotes (single-celled organisms lacking nuclei) typically have a single circular chromosome

• eukaryotes (organisms with nuclei) have a species-specific number of chromosomes

Image from www.genome.gov

DNA packaging in Chromatin

DNA is very long (about 2 m per human cell), while the cell is very small. Chromosomes compress the DNA molecule about 50,000-fold. The collection of DNA and proteins is called chromatin.
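The total length of DNA in a human cell can be estimated with a back-of-the-envelope calculation. The sketch below assumes the common textbook value of about 0.34 nm between adjacent base pairs in B-form DNA; this value is not from the slides. It also uses the 3.1 billion bp human genome size from the genome-size table. A typical (diploid) cell carries two copies of the genome, which gives roughly 2 m.

```python
BP_RISE_NM = 0.34          # approximate distance between adjacent base pairs in B-DNA (nm)
HUMAN_HAPLOID_BP = 3.1e9   # human genome size from the genome-size table

def dna_length_m(n_bp: float) -> float:
    """Contour length, in metres, of a stretched-out DNA molecule with n_bp base pairs."""
    return n_bp * BP_RISE_NM * 1e-9

# A diploid human cell carries two copies of the genome.
per_cell = dna_length_m(2 * HUMAN_HAPLOID_BP)
print(f"{per_cell:.1f} m")  # 2.1 m
```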

Different organisms have different numbers of chromosomes

Organism       Number of chromosomes (diploid)
Yeast          32
Human          46
Fly            8
Mouse          40
Arabidopsis    10
Worm           12

Genes

• genes are the basic units of heredity

• a gene is a sequence of bases that specifies a protein or a functional RNA

• the human genome comprises ~25,000 protein-coding genes (still being revised)

• One gene can have many functions
• One function can require many genes

…GTATGTCTAAGCCTGAATTCAGTCTGCTTTAAACGGCTTC…

Structure of genes

Figure labels: DNA; gene; promoter; non-coding region; Gene A, Gene B, Gene C

Genomes

• Refers to the complete complement of DNA for a given species

• the human genome consists of 2 × 23 = 46 chromosomes (23 pairs)

• every cell contains the complete genome of an organism, except egg and sperm cells (which carry a single copy of each chromosome) and mature red blood cells (which lack a nucleus)

Some Greatest Hits

Genome                          Where                            Year
H. influenzae                   TIGR                             1995
E. coli K-12                    Wisconsin                        1997
S. cerevisiae (yeast)           international collaboration      1997
C. elegans (worm)               Washington U./Sanger             1998
D. melanogaster (fruit fly)     multiple groups                  2000
E. coli O157:H7 (pathogen)      Wisconsin                        2000
H. sapiens (that’s us)          international collab./Celera     2001
Mus musculus (mouse)            international collaboration      2002
Rattus norvegicus (rat)         international collaboration      2004

Some Genome Sizes

Genome            Size (base pairs)
HIV               9,750
E. coli           4.6 million
S. cerevisiae     12 million
C. elegans        97 million
D. melanogaster   137 million
Human             3.1 billion

Number of sequenced genomes

The central dogma of Molecular biology

DNA → (transcription) → RNA → (translation) → Proteins

RNA

• RNA is like DNA except:
  – it is single stranded
  – U (uracil) is used in place of T (thymine)

• a strand of RNA can be thought of as a string composed of the four letters: A, C, G, U
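Moving from the DNA alphabet to the RNA alphabet is a single letter substitution in code. This gives the mRNA sequence only when the input is the coding (non-template) strand written 5’→3’. The function name is illustrative.

```python
def dna_to_rna(seq: str) -> str:
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    return seq.replace("T", "U")

print(dna_to_rna("GTATGTCTAAGCC"))  # GUAUGUCUAAGCC
```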

Transcription

• In eukaryotes, transcription happens inside the nucleus

• RNA polymerase is an enzyme that builds an RNA strand from a gene

• RNA polymerase II (Pol II), which transcribes protein-coding genes, is recruited to specific parts of the genome in a condition-specific way

• Transcription factor proteins are responsible for recruiting Pol II

• RNA transcribed from a protein-coding gene is called messenger RNA (mRNA)
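During transcription, RNA polymerase reads the template strand 3’→5’ and builds the RNA 5’→3’. At each step it adds the RNA base that pairs with the template base, with U taking the place of T. The sketch below assumes the template strand is supplied already written 3’→5’, so no reversal is needed. The names are illustrative.

```python
# RNA base that pairs with each DNA template base (A pairs with U in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3to5: str) -> str:
    """Build mRNA 5'->3' from a DNA template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5)

print(transcribe("TACGGT"))  # AUGCCA
```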

Transcription: Process of turning DNA into RNA

mRNA

The central dogma of Molecular biology

DNA → (transcription) → RNA → (translation) → Proteins

Translation

• Process of turning mRNA into proteins.

• Happens inside the cytoplasm in ribosomes

• ribosomes are the machines that synthesize proteins from mRNA

• Translation process reads one codon at a time

• translation begins at the start codon (AUG)

• translation ends at a stop codon (UAA, UAG or UGA)
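The sketch below shows a simplified translation that follows the bullets above. It assumes the standard genetic code and starts at the first AUG in the mRNA; real start-codon selection also depends on the surrounding sequence. The 64-letter string encodes the standard code with codons in the order UUU, UUC, UUA, UUG, UCU, and so on, and `*` marks stop codons. The function name and input sequence are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Step one codon (three bases) at a time, stopping before a partial codon.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGGUGCUGUCUUAAGC"))  # MVLS
```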

Translation happens in ribosomes

Codons

• Each triplet of bases is called a codon

• How many codons are possible?

• Each codon codes for a particular amino acid, or, in the case of a stop codon, signals the end of translation
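The question on the slide has a short answer. Each of the three positions can hold any of four bases, so there are 4^3 = 64 possible codons. Since there are only 20 amino acids, most amino acids are encoded by more than one codon, and three codons act as stop signals. A minimal enumeration in Python:

```python
from itertools import product

# All ordered triplets over the RNA alphabet.
codons = ["".join(c) for c in product("ACGU", repeat=3)]
print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAU']
```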

The Genetic Code

Codons and Reading Frames

Figure: codons read in different reading frames (labels: Alanine, Threonine)
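A reading frame is set by where codon reading starts, so any sequence can be read in three forward frames (offsets 0, 1 and 2). The sketch below uses a made-up sequence in which the same letters give an alanine codon (GCA) in frame 0 and a threonine codon (ACC) in frame 2. The function name is illustrative.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Split seq into complete codons in each of the three forward reading frames."""
    frames = []
    for offset in range(3):
        frames.append([seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)])
    return frames

for offset, codons in enumerate(reading_frames("GCACCU")):
    print(offset, codons)
# 0 ['GCA', 'CCU']
# 1 ['CAC']
# 2 ['ACC']
```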

Proteins

• Proteins are long chains composed of amino acids

• The standard genetic code specifies 20 different amino acids

Amino Acids

Amino acid       Three-letter   One-letter
Alanine          Ala            A
Arginine         Arg            R
Aspartic acid    Asp            D
Asparagine       Asn            N
Cysteine         Cys            C
Glutamic acid    Glu            E
Glutamine        Gln            Q
Glycine          Gly            G
Histidine        His            H
Isoleucine       Ile            I
Leucine          Leu            L
Lysine           Lys            K
Methionine       Met            M
Phenylalanine    Phe            F
Proline          Pro            P
Serine           Ser            S
Threonine        Thr            T
Tryptophan       Trp            W
Tyrosine         Tyr            Y
Valine           Val            V
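The table of abbreviations above works directly as a lookup dictionary. The sketch below converts a hyphen-separated three-letter sequence into the one-letter form used in protein sequence files. The function name and the separator default are illustrative choices.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(three_letter_seq: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Val-Leu' to 'MVL' using the 20 standard abbreviations."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split(sep))

print(to_one_letter("Met-Val-Leu-Ser-Pro"))  # MVLSP
```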

Proteins are the workhorses of the cell

• structural support
• storage of amino acids
• transport of other substances
• coordination of an organism’s activities
• response of the cell to chemical stimuli
• movement
• protection against disease
• selective acceleration of chemical reactions

Proteins are complex molecules

• Primary structure: the amino acid sequence
• Secondary structure
• Tertiary structure
• Quaternary structure

Some well-known proteins

Hemoglobin: carries oxygen
Insulin: metabolism of sugar
Actin: maintenance of cell structure

Hemoglobin subunit alpha (gene HBA1)

>gi|224589807:226679-227520 Homo sapiens chromosome 16, GRCh37.p9 Primary Assembly

1 CCCACAGACT CAGAGAGAAC CCACCATGGT GCTGTCTCCT GACGACAAGA CCAACGTCAA

61 GGCCGCCTGG GGTAAGGTCG GCGCGCACGC TGGCGAGTAT GGTGCGGAGG CCCTGGAGAG

121 GATGTTCCTG TCCTTCCCCA CCACCAAGAC CTACTTCCCG CACTTCGACC TGAGCCACGG

181 CTCTGCCCAG GTTAAGGGCC ACGGCAAGAA GGTGGCCGAC GCGCTGACCA ACGCCGTGGC

241 GCACGTGGAC GACATGCCCA ACGCGCTGTC CGCCCTGAGC GACCTGCACG CGCACAAGCT

301 TCGGGTGGAC CCGGTCAACT TCAAGCTCCT AAGCCACTGC CTGCTGGTGA CCCTGGCCGC

361 CCACCTCCCC GCCGAGTTCA CCCCTGCGGT GCACGCCTCC CTGGACAAGT TCCTGGCTTC

421 TGTGAGCACC GTGCTGACCT CCAAATACCG TTAAGCTGGA GCCTCGGTGG CCATGCTTCT

481 TGCCCCTTTG G

DNA sequence (491 bp)

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

Amino acid sequence (142 aa)

Figure: protein 3D structure

RNA Processing in Eukaryotes

• eukaryotes are organisms that have enclosed nuclei in their cells

• in many eukaryotes, newly transcribed RNAs (pre-mRNAs) consist of alternating exon and intron segments

• exons are the segments kept in the mature mRNA; they contain the coding parts

• introns are spliced out before translation
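Splicing can be pictured as string surgery: given the exon positions in a pre-mRNA, the mature mRNA is the exons joined in order. The sketch below uses a hypothetical pre-mRNA with hypothetical 0-based, end-exclusive exon coordinates. In real cells the spliceosome recognizes the splice sites. Introns commonly begin with GU and end with AG, as the toy intron here does.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order; the introns between them are dropped."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical pre-mRNA: exon 1, then an intron (GU ... AG), then exon 2.
pre = "AUGGCC" + "GUAAGUCUUAG" + "GAAUAA"
print(splice(pre, [(0, 6), (17, 23)]))  # AUGGCCGAAUAA
```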

RNA Splicing

RNA Genes

• not all genes encode proteins
• for some genes the end product is RNA:
  – ribosomal RNAs (rRNAs), which are major constituents of ribosomes
  – transfer RNAs (tRNAs), which carry amino acids to ribosomes
  – microRNAs (miRNAs), which play an important regulatory role in various plants and animals
  – lincRNAs (long intergenic non-coding RNAs), which also play important regulatory roles

Central Dogma revisited

DNA → (transcription) → RNA → (translation) → Proteins

RNA → (non-coding RNA processing) → ncRNAs, miRNAs, rRNAs

Summary

• Key concepts in molecular biology
  – Central dogma
  – DNA, RNA, proteins
  – Chromosomes, nucleus, ribosomes

• Important processes
  – Transcription
  – Translation
  – RNA splicing

Functional Genomics

• Aims to characterize the genes and proteins of an organism in an unbiased way, using high-throughput technologies

• Focused on what lies “beyond the genetic sequence”:
  – What does a piece of DNA do? Is it a gene, a regulatory element, a mutation?

• Has generated large collections of “omics” datasets:
  – Gene expression
  – Protein expression
  – Metabolite levels

Metabolites

• Metabolism:
  – A set of chemical processes in cells
  – Needed to sustain life

• Small molecules that are intermediates of metabolism, e.g.:
  – Sugar
  – Glycerol

• Metabolic pathway:
  – A set of chemical reactions in a cell

The Tricarboxylic Acid (TCA) cycle

Figure labels: metabolites; enzyme

Courtesy KEGG Pathways
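A metabolic pathway maps naturally onto a directed graph, with metabolites as nodes and enzyme-catalysed reactions as edges. The sketch below stores the TCA cycle as a Python dictionary. The metabolite and enzyme names are standard textbook ones rather than being read off the KEGG figure. Cofactors (such as NAD+) and side products (such as CO2) are left out, and the function name is illustrative.

```python
# Each reaction: substrate -> (product, enzyme). Cofactors and side products omitted.
TCA_CYCLE = {
    "oxaloacetate": ("citrate", "citrate synthase"),  # also consumes acetyl-CoA
    "citrate": ("isocitrate", "aconitase"),
    "isocitrate": ("alpha-ketoglutarate", "isocitrate dehydrogenase"),
    "alpha-ketoglutarate": ("succinyl-CoA", "alpha-ketoglutarate dehydrogenase"),
    "succinyl-CoA": ("succinate", "succinyl-CoA synthetase"),
    "succinate": ("fumarate", "succinate dehydrogenase"),
    "fumarate": ("malate", "fumarase"),
    "malate": ("oxaloacetate", "malate dehydrogenase"),
}

def walk_cycle(start: str) -> list[str]:
    """Follow reactions from start until a metabolite repeats; return the visited metabolites."""
    visited, current = [], start
    while current not in visited:
        visited.append(current)
        current = TCA_CYCLE[current][0]
    return visited

steps = walk_cycle("citrate")
print(len(steps), steps[-1])  # 8 oxaloacetate
```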

Yeast metabolic pathways

Context-specific expression in a cell

• The DNA is static
• But the set of mRNAs can differ by cell type, environment and time point
• A key process is gene regulation:
  – it determines which genes are expressed when

Figure label: environmental signal

Transcriptional gene regulation

• Key control process that determines what genes are expressed when

• Requires:
  – RNA polymerase
  – Transcription factors
  – Energy

http://www.youtube.com/watch?v=WsofH466lqk

Transcriptional gene regulation

Figure (gene HSP12). Labels: transcription factor level (trans); transcription factor binding sites (cis); P1, P2; promoter; mRNA levels
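The figure relates trans inputs (transcription factor levels) and cis elements (binding sites in the promoter) to mRNA levels. One common, simplified way to model such a relationship is a Hill activation function; this model is not taken from the slide. In it the TF level is the input and the half-activation threshold k stands in for binding-site affinity. The function name and all parameter values below are hypothetical.

```python
def hill_activation(tf_level: float, k: float, n: float = 2.0,
                    max_rate: float = 1.0, basal: float = 0.0) -> float:
    """Expression level predicted by a Hill activation function.

    tf_level: transcription factor concentration (trans input)
    k: TF level giving half-maximal activation (stands in for cis binding-site affinity)
    n: Hill coefficient (cooperativity)
    """
    return basal + max_rate * tf_level**n / (k**n + tf_level**n)

# A stronger binding site (smaller k) gives more expression at the same TF level.
print(hill_activation(1.0, k=1.0))  # 0.5
print(hill_activation(1.0, k=0.5))  # 0.8
```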

Regulation of GAL genes

• GAL genes are required for yeast to grow on galactose.

• Four of them are metabolic: GAL1, GAL10, GAL2 and GAL7

• Three are regulatory: GAL4, GAL80 and GAL3

Regulation of GAL genes

Figure panels: no galactose; in galactose (each showing a metabolic GAL gene)
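The switch in the figure can be summarized as a simplified Boolean sketch. It follows the standard textbook picture of this system rather than anything stated on the slide: Gal4 activates the metabolic GAL genes, Gal80 inhibits Gal4, and in galactose Gal3 relieves that inhibition. Glucose repression and the graded kinetics of induction are ignored, and the function name is illustrative.

```python
def gal_genes_on(galactose: bool, gal4: bool = True,
                 gal80: bool = True, gal3: bool = True) -> bool:
    """Boolean sketch of the GAL switch (glucose repression ignored).

    Gal4 activates the metabolic GAL genes; Gal80 blocks Gal4;
    in galactose, Gal3 relieves the Gal80 block.
    """
    gal80_blocks_gal4 = gal80 and not (galactose and gal3)
    return gal4 and not gal80_blocks_gal4

print(gal_genes_on(galactose=False))               # False: Gal80 keeps Gal4 inactive
print(gal_genes_on(galactose=True))                # True: Gal3 relieves Gal80 inhibition
print(gal_genes_on(galactose=False, gal80=False))  # True: no repressor present
```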

Transcriptome

• The entire set of RNA products in a cell
• A cell can make more or less of a particular RNA: levels change
• Its constituents are context-specific
• The context is determined by the cell’s environment

Transcriptional Regulatory networks

• The entire set of regulatory interactions between transcription factors (TFs) and their target genes in an organism

• The transcriptome is the output of a regulatory network

Image courtesy: Dr. Mike Snyder, http://compbio.pbworks.com/w/page/16252928/Transcription%20Regluatory%20Network#1
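A transcriptional regulatory network is commonly stored as a directed graph, with an edge from each TF to each of its target genes. The sketch below uses hypothetical TF and gene names. It shows how to invert the edges to ask which TFs regulate a given gene, and how to read off each TF's number of targets (its out-degree).

```python
from collections import defaultdict

# Hypothetical network: transcription factor -> set of target genes.
NETWORK = {
    "TF1": {"geneA", "geneB"},
    "TF2": {"geneB", "geneC"},
}

def regulators_of(network: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert TF -> targets edges into gene -> regulating TFs."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for tf, targets in network.items():
        for gene in targets:
            inverted[gene].add(tf)
    return dict(inverted)

regs = regulators_of(NETWORK)
print(sorted(regs["geneB"]))                      # ['TF1', 'TF2']
print({tf: len(t) for tf, t in NETWORK.items()})  # {'TF1': 2, 'TF2': 2}
```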

Understanding cells requires an iterative approach spanning multiple levels

Ideker et al., Science 2002

Summary

• Cells are made up of many different molecular entities

• Functional genomics enables us to identify these entities

• Cells function via the interaction of these entities
• Putting these together into comprehensive models is a major goal of systems biology