The Organization and Sequences of Cellular Genomes

Post on 07-Jul-2020

Page 1: The Organization and Sequences of Cellular Genomescontents.kocw.net/KOCW/document/2014/gacheon/parktaesik/... · 2016-09-09 · The Sequences of Complete Genomes Complete genome sequences

The Organization and Sequences of Cellular Genomes

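The Complexity of Eukaryotic Genomes

Molecular biology: the study of the structure and function of genes (DNA → RNA → protein).

Large-scale sequencing projects have determined the complete genome sequences of bacteria, yeast, plants, and animals.

Does this mean the study of genes is finished?

The genomes of most eukaryotes are larger than those of prokaryotes, but genome size does not necessarily correspond to genetic complexity.

The genomes of most eukaryotes contain large amounts of noncoding DNA.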

Figure 5.1 Genome size

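The Complexity of Eukaryotic Genomes

The human genome contains 20,000 to 25,000 genes, only about 5 times as many as E. coli.

The complexity of eukaryotic genomes comes from noncoding sequences, which make up most of the DNA of higher eukaryotes.

Gene: a segment of DNA that is expressed to yield a functional product (e.g., rRNA, tRNA, or a polypeptide).

Long DNA sequences (spacer sequences) lie between genes.

Noncoding DNA is also found within genes (introns).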

Figure 5.2 The structure of eukaryotic genes

• Genes contain both coding sequences (exons) and noncoding sequences (introns).

• When a gene is transcribed into RNA, the introns are removed by splicing, so only the exons appear in the mRNA.
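The removal of introns by splicing can be illustrated with a minimal sketch. The transcript, the exon coordinates, and the function name `splice` are hypothetical and chosen only for illustration; real splicing is carried out by cellular machinery, not by coordinate lookup. The toy intron is written in lower case and begins with GU and ends with AG, the pattern typically seen at intron boundaries.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: two exons (upper case) around one intron (lower case).
pre_mrna = "AUGGCC" + "guaaguuucag" + "UUUUAA"
exons = [(0, 6), (17, 23)]  # assumed exon coordinates for this toy example

intron = pre_mrna[6:17]
mrna = splice(pre_mrna, exons)

print("intron:", intron)
print("mRNA:  ", mrna)
```

Only the exon segments survive in `mrna`; everything between the listed coordinates is discarded, which mirrors the exon/intron distinction drawn in the bullets above.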

Key Experiment 5.1 The Discovery of Introns: An electron micrograph and tracing of hexon mRNA hybridized to adenovirus DNA

• Introns were discovered in 1977 during studies of adenovirus in cultured human cells.

• mRNA-DNA hybrids

Figure 5.3 Identification of introns in adenovirus mRNA

RNA splicing: the process that removes introns from the RNA transcript to produce an mRNA containing only the joined coding sequences (lariat, GU-AG).

Figure 5.4 The mouse β-globin gene

The amount of DNA in intron sequences can exceed that in exons.

The average human gene contains 9 exons and 8 introns and spans about 30,000 base pairs (30 kilobases, or kb).

The Complexity of Eukaryotic Genomes

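Introns are present in most genes but not in all of them; histone genes, for example, lack introns.

Most genes of simple eukaryotes such as yeast lack introns.

The fact that many introns are conserved in the genes of both plants and animals indicates that they arose early in evolution.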

The Complexity of Eukaryotic Genomes

Many introns encode functional RNAs, and many also contain regulatory sequences that control transcription and mRNA processing.

Introns make alternative splicing possible: the exons of a gene can be joined in different combinations, producing different proteins from the same gene.
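To see how joining exons in different combinations multiplies the products of one gene, here is a small sketch under a deliberately simplified assumption: the first and last exons are always kept, and any subset of the internal exons may be skipped while the order is preserved. The exon labels and the function name `exon_skipping_isoforms` are hypothetical; real alternative splicing is regulated and does not use every combination.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Enumerate isoforms that keep the first and last exon plus any
    order-preserving subset of the internal exons (simplified model)."""
    first, *internal, last = exons  # requires at least two exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append("-".join([first, *kept, last]))
    return isoforms


isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
for name in isoforms:
    print(name)
```

With n internal exons this simplified model yields 2^n possible isoforms, which is why even a modest number of exons can give rise to many different mRNAs.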

The Complexity of Eukaryotic Genomes

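Introns have also played an important role in evolution by allowing exons from different genes to recombine (exon shuffling).

Recombination between introns of different genes can give rise to genes with new protein-coding sequences.

Eukaryotic genomes contain highly repeated noncoding DNA sequences.

These were discovered by studying the reassociation of denatured DNA fragments, the rate of which depends on the concentration of the DNA strands.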
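The concentration dependence can be made concrete with the standard idealized second-order model of reassociation, C/C0 = 1 / (1 + k·C0·t), where C0 is the starting concentration of a given sequence. This model is not stated in the slides and is used here as an assumption; the rate constant, time, and concentrations below are hypothetical, unitless values chosen only to show the trend.

```python
def fraction_single_stranded(c0: float, t: float, k: float = 1.0) -> float:
    """Idealized second-order reassociation: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * c0 * t)


# Hypothetical toy values: the repeated sequence is present at 10^5 times
# the concentration of the single-copy sequence in the same DNA sample.
t = 10.0
unique = fraction_single_stranded(c0=0.001, t=t)
repeated = fraction_single_stranded(c0=100.0, t=t)

print(f"single-copy sequence still denatured: {unique:.3f}")
print(f"repeated sequence still denatured:    {repeated:.6f}")
```

Because a sequence repeated many times is effectively present at a higher concentration, it reassociates much faster than a single-copy sequence, which is how repetitive DNA shows up in such experiments.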

Figure 5.6 Identification of repetitive sequences by DNA reassociation

The Complexity of Eukaryotic Genomes

About 50% of mammalian DNA consists of highly repetitive sequences, some of which are repeated 10^5 to 10^6 times.

Genome sequencing and analysis have revealed several types of highly repeated sequences.

The Complexity of Eukaryotic Genomes

Simple-sequence repeats—tandem arrays of short sequences, from 1 to 500 nucleotides.

They can be separated by equilibrium centrifugation in CsCl density gradients: AT-rich sequences are less dense than GC-rich sequences.

Such repeat-sequence DNAs band as “satellites” separate from the main band of DNA, so they are called satellite DNAs.

They are not transcribed but some play important roles in chromosome structure.
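As a rough illustration of what a tandem array of a short sequence looks like computationally, the sketch below searches a DNA string for a motif of fixed length repeated back to back. The function name `find_tandem_repeats`, the 7-nucleotide toy motif, and the flanking bases are hypothetical; this is a minimal regular-expression approach, not a method described in the slides.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 3):
    """Return (start, unit, copies) for each run in which a motif of
    unit_len nucleotides is repeated at least min_copies times in tandem."""
    # ([ACGT]{n}) captures one unit; \1{m,} requires m further identical copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Hypothetical toy sequence: four tandem copies of a 7-nucleotide motif
# flanked by unrelated bases.
seq = "GGCG" + "ACAAACT" * 4 + "TTGC"
hits = find_tandem_repeats(seq, unit_len=7)
print(hits)
```

Depending on the flanking bases, a regex scan like this can report the same array in a rotated phase (for example starting one base earlier), so the reported unit should be read as one representative of the repeat rather than a canonical form.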

Figure 5.7 Satellite DNA

The Complexity of Eukaryotic Genomes

SINEs (short interspersed elements) are transcribed, but their function is unknown.

LINEs (long interspersed elements) are transcribed and some encode proteins, but they have no known function.

SINEs and LINEs are transposable elements (“jumping genes”), capable of moving to different sites in genomic DNA.

Both are retrotransposons: their transposition is mediated by reverse transcription.

Figure 5.8 Movement of retrotransposons

The Complexity of Eukaryotic Genomes

Retrovirus-like elements also move within the genome by reverse transcription.

DNA transposons move through the genome by being copied and reinserted as DNA sequences.

Nearly half the human genome consists of repetitive elements that have replicated and moved through the genome by RNA or DNA intermediates.

Some may help regulate gene expression, but most appear not to make a useful contribution to the cell.

Transposable elements have, however, played a major role in stimulating gene rearrangements that have contributed to the generation of genetic diversity.

The Complexity of Eukaryotic Genomes

Many eukaryotic genes are present in multiple copies.

Multiple copies of some genes are needed to produce RNAs or proteins in large quantities, such as ribosomal RNAs or histones.

Members of a group of related genes (a gene family) may be transcribed in different tissues or at different stages of development.

α and β subunits of hemoglobin are encoded by gene families; different members of these families are expressed in embryonic, fetal, and adult tissues.

Figure 5.9 Globin gene families

The Complexity of Eukaryotic Genomes

Gene families are thought to have arisen by duplication of an original ancestral gene, followed by mutation and divergence of different family members.

The result is proteins optimized for different functions. E.g., fetal globins have a higher affinity for O2 than do adult globins.

But some mutations result in loss of function.

Pseudogenes are nonfunctional gene copies.

There are more than 20,000 pseudogenes in the human genome.

The Complexity of Eukaryotic Genomes

Gene duplications can arise in two ways:

• Duplication of a segment of DNA results in the transfer of a block of DNA to a new location in the genome.

• Duplication by reverse transcription of an mRNA, followed by integration of the cDNA copy into a new chromosomal site.

Duplication of a gene by reverse transcription usually yields an inactive gene copy called a processed pseudogene, which lacks introns and the normal sequences that direct transcription.

Figure 5.10 Formation of a processed pseudogene

The Complexity of Eukaryotic Genomes

The increased size of the genomes of higher eukaryotes is due far more to the presence of large amounts of repetitive sequences and introns than to an increased number of genes.

The genomes of prokaryotes are contained in single chromosomes, usually circular DNA molecules.

The genomes of eukaryotes are composed of multiple chromosomes, each containing a linear molecule of DNA.

The basic structure of chromosomes is the same in all eukaryotes.

Chromosomes and Chromatin

Chromatin is a complex of eukaryotic DNA and proteins.

The major proteins are the histones.

There are five major types (H1, H2A, H2B, H3, and H4), which are very similar among different species of eukaryotes.

Chromosomes and Chromatin

The nucleosome is the basic structural unit of chromatin.

The structure was first described by Kornberg in 1974.

The DNA is wrapped around histones in nucleosome core particles and sealed by histone H1. Nonhistone proteins bind to linker DNA between nucleosome core particles.

Chromosomes and Chromatin

The nucleosome core particles contain 147 base pairs of DNA wrapped 1.67 turns around a histone core consisting of two each of H2A, H2B, H3, and H4 (the core histones).

A chromatosome consists of 166 base pairs of DNA wrapped around the histone core and held in place by H1 (a linker histone).

Chromosomes and Chromatin

Packaging of DNA with histones yields a chromatin fiber approximately 10 nm in diameter, which shortens its length about sixfold.

It is further condensed by coiling into 30-nm fibers, resulting in a total condensation of about fiftyfold.
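The order of magnitude of the first packaging step can be checked with back-of-the-envelope arithmetic. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA, a nucleosome repeat of about 200 base pairs (core plus linker), and about 10 nm of fiber length per nucleosome. The last two values are assumptions introduced for this estimate and are not taken from the slides.

```python
RISE_PER_BP_NM = 0.34   # rise per base pair of B-form DNA
REPEAT_BP = 200         # assumed nucleosome repeat length (core + linker)
FIBER_STEP_NM = 10.0    # assumed fiber length occupied by one nucleosome

# Length of the same DNA if it were fully extended.
extended_nm = REPEAT_BP * RISE_PER_BP_NM

# Ratio of extended length to packaged length along the fiber.
compaction = extended_nm / FIBER_STEP_NM

print(f"extended DNA per nucleosome: {extended_nm:.1f} nm")
print(f"estimated compaction:        {compaction:.1f}-fold")
```

Under these assumptions the estimate comes out between six- and sevenfold, in line with the figure of about sixfold quoted for the 10-nm fiber.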

Chromosomes and Chromatin

The extent of chromatin condensation varies during the life cycle of the cell.

In interphase (nondividing) cells, most of the chromatin (called euchromatin) is relatively decondensed and distributed throughout the nucleus.

About 10% of interphase chromatin (called heterochromatin) is in a very highly condensed state that resembles the chromatin of cells undergoing mitosis.

Heterochromatin is transcriptionally inactive and contains highly repeated DNA sequences.

Chromosomes and Chromatin

As cells enter mitosis, their chromosomes become highly condensed.

At metaphase, the DNA has been condensed nearly ten-thousandfold.

Chromosomes and Chromatin

Electron micrographs show that DNA in metaphase chromosomes is organized into large loops attached to a protein scaffold, but the detailed structure and the mechanism of chromatin condensation are not currently understood.

Chromosomes and Chromatin

Metaphase chromosomes are so highly condensed that their morphology can be studied using the light microscope.

Staining yields characteristic patterns of light and dark bands, resulting from preferential binding of stains to AT-rich versus GC-rich DNA sequences.

Chromosomes and Chromatin

The centromere is a specialized region of a chromosome that helps ensure the correct distribution of duplicated chromosomes to daughter cells during mitosis.

Chromosomes and Chromatin

DNA is replicated during interphase, resulting in two copies of each chromosome.

Metaphase chromosomes consist of two identical sister chromatids, held together at the centromere.

Microtubules of the mitotic spindle attach to the centromere, and the two sister chromatids separate and move to opposite poles.

Then the nuclear membrane re-forms, and the chromosomes decondense.

Each daughter nucleus contains one copy of each parental chromosome.

Chromosomes and Chromatin

Centromeres are DNA sequences to which proteins bind, forming a kinetochore.

Spindle microtubules bind to the kinetochore.

Proteins associated with the kinetochore act as “molecular motors” to drive the movement of chromosomes along the spindle fibers.

Chromosomes and Chromatin

Centromeric sequences were initially defined in yeasts, by assaying plasmid segregation.

Plasmids with functional centromeres segregate like chromosomes and are equally distributed to daughter cells.

With no centromere, plasmids don’t segregate properly, and many daughter cells fail to inherit plasmid DNA.

Chromosomes and Chromatin

These assays have allowed determination of sequences required for centromere function.

Centromere sequences vary considerably in different organisms.

Chromosomes and Chromatin

The chromatin at centromeres has a unique structure.

Histone H3 is replaced by an H3-like variant called centromeric H3 (CenH3) or CENP-A. It is uniformly present at centromeres of all organisms.

CenH3-containing nucleosomes are required for assembly of the other kinetochore proteins needed for centromere function.

Unique chromatin structure allows centromeres to be stably maintained at cell division.

This is an example of epigenetic inheritance—the transfer of information that is not based on DNA sequences.

Chromosomes and Chromatin

When chromosomal DNA replicates, the parental nucleosomes are distributed to the progeny strands, so CenH3 is present in the centromeres of both newly replicated chromosomes.

These CenH3-containing nucleosomes direct assembly of new CenH3-containing nucleosomes into chromatin.

Chromosomes and Chromatin

Telomeres are sequences at the ends of chromosomes, and are required for the replication of linear DNA molecules.

The sequences are similar among eukaryotes, with repeats containing clusters of G residues on one strand.

They are repeated hundreds or thousands of times and end with a 3′ overhang of single-stranded DNA.

Chromosomes and Chromatin

The telomere sequences of some organisms (including humans) form loops at the ends.

They bind a protein complex (shelterin) that protects the chromosome termini from degradation.

Chromosomes and Chromatin

The ends of linear chromosomes can’t be replicated by DNA polymerase.

Instead, telomerase, which uses reverse transcriptase activity, replicates telomeric DNA sequences.

Maintenance of telomeres appears to be important in determining the lifespan and reproductive capacity of cells.

Studies of telomeres and telomerase may provide new insights into aging and cancer.

The Sequences of Complete Genomes

Complete genome sequences of many organisms have now been established.

These provide the potential of identifying all the genes in an organism, making them accessible for study, and for identifying the sequences that regulate gene expression.

The Sequences of Complete Genomes

The first complete genome sequence, that of the bacterium Haemophilus influenzae, was reported in 1995 by Venter and colleagues.

It contains 1.8 × 10^6 base pairs (1.8 megabases, or Mb).

Potential protein-coding regions were identified by computer analysis to detect open reading frames: long stretches of nucleotide sequence that don’t contain any stop codons.
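The idea of detecting open reading frames by computer can be sketched in a few lines. This is a minimal illustration, not the method used for H. influenzae: it scans only the three forward reading frames, requires an ATG start codon, and uses the standard stop codons TAA, TAG, and TGA. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Scan the three forward reading frames for ATG ... stop stretches.

    Returns (start, end, frame) tuples; end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the frame and keep scanning
    return orfs


# Hypothetical toy sequence with one short open reading frame.
seq = "CCATGAAACCCGGGTTTTAGCC"
orfs = find_orfs(seq)
print(orfs)
print(seq[orfs[0][0]:orfs[0][1]])
```

A realistic gene finder would also scan the reverse-complement strand and use a much larger length threshold, since short stretches without stop codons occur frequently by chance.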

The Sequences of Complete Genomes

Complete genome sequences of more than 500 different bacteria are now known.

The origin of mitochondria by endosymbiosis was supported by the genome of Rickettsia prowazekii, which is closely related to genomes of present-day mitochondria.

Sequences of archaebacteria showed that eukaryotic genes for proteins involved in DNA replication, transcription, and translation were derived from archaebacteria rather than eubacteria.

The E. coli genome was completely sequenced in 1997 (4288 genes).

Many of the genes could be identified or their functions deduced, but the function of 1/3 of the genes remains to be determined.

The yeast Saccharomyces cerevisiae has the simplest eukaryotic genome, making it a useful model for eukaryotic cells.

An entire yeast chromosome was sequenced in 1992, and the entire genome was sequenced in 1996 (1.2 × 10^7 base pairs; 6000 genes).

Figure 5.25 Yeast chromosome III

The Sequences of Complete Genomes

This was followed by creation of strains in which each of the 6000 identified genes had been inactivated by homologous recombination.

Analysis of these mutants has now revealed functions for over 5000 genes.

Another yeast, S. pombe, has also been sequenced.

The two yeasts have very different biology and genomes.

S. pombe has only 4800 genes, plus more and larger introns. Most of the genes are homologs, but about 700 genes are unique to S. pombe.

The Sequences of Complete Genomes

The genome sequence of C. elegans was determined in 1998.

The initial phase used DNA fragments cloned in cosmids, but complete sequencing was accomplished by cloning of much larger pieces of DNA in yeast artificial chromosome (YAC) vectors.

YACs contain centromeres and telomeres, allowing them to replicate as linear chromosome-like molecules.

The Sequences of Complete Genomes

The C. elegans genome is 97 × 10^6 base pairs (19,000 genes).

Protein-coding sequences account for only about 25% of the C. elegans genome.

Compare this with 60–70% for S. pombe and S. cerevisiae and nearly 90% for bacterial genomes.
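The same trend can be seen from gene density, using only the genome sizes and gene counts quoted in these slides for S. cerevisiae and C. elegans. The short calculation below is a sketch for comparison; the variable names are arbitrary, and the gene counts are the rounded figures given above.

```python
# (genome size in base pairs, number of genes), as quoted in the slides.
genomes = {
    "S. cerevisiae": (1.2e7, 6000),
    "C. elegans": (97e6, 19000),
}

density = {}  # genes per megabase
spacing = {}  # average base pairs of genome per gene
for name, (size_bp, genes) in genomes.items():
    density[name] = genes / (size_bp / 1e6)
    spacing[name] = size_bp / genes
    print(f"{name}: {density[name]:.0f} genes/Mb, "
          f"about {spacing[name]:.0f} bp per gene")
```

Yeast packs roughly 500 genes into each megabase, while C. elegans has fewer than 200, consistent with the much smaller protein-coding fraction of the C. elegans genome.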