anatomy of a gene
TRANSCRIPT
-
8/8/2019 Anatomy of a Gene
1/33
-
BASIC GENETIC MECHANISMS
-
How did we know that genes are made of DNA?
Streptococcus pneumoniae comes in 2 forms that differ from one another in their
microscopic appearance and in their ability to cause disease. Cells of the pathogenic
strain, designated the S form, are lethal when injected into mice and are encased in a slimy, glistening
polysaccharide capsule. The harmless strain lacks this protective coat; it forms colonies that appear flat and rough, referred to as the R form.
Fred Griffith found in the 1920s that a substance present in the virulent S strain could
permanently change, or transform, the nonlethal R strain into the deadly S strain.
-
In 1944, Avery, MacLeod, and McCarty prepared an extract from
the disease-causing S strain and identified the transforming
principle that would permanently change R-strain pneumococci into
the lethal S strain as DNA. This was the first evidence that DNA could serve as the genetic material.
-
(A) In 1952, Hershey and Chase worked with T2 viruses, which are
made of protein and DNA. (B) To determine whether the genetic
material of the T2 virus is protein or DNA, the researchers radioactively
labeled the DNA in one batch of viruses with 32P and the proteins in a
second batch of viruses with 35S. These labeled viruses were then allowed
to infect E. coli, and the mixture was disrupted by brief pulsing in a
Waring blender to separate the infected bacteria from the empty viral
heads. When radioactivity was measured, they found that most of the
32P-labeled DNA had entered the bacterial cells, while most of the
35S-labeled proteins remained in solution with the spent viral particles.
-
WHAT IS A GENE?
In molecular terms, a GENE is the entire DNA sequence required for synthesis of a functional
protein or RNA molecule.
A gene includes exons (coding), control or
regulatory regions, and introns (non-coding).
Most bacterial and yeast genes lack introns,
whereas most genes in multicellular organisms
contain them. The total length of intron
sequences often is much longer than that of exon
sequences.
A simple eukaryotic transcription unit produces a
single monocistronic mRNA, which is translated
into a single protein.
-
A bacterial operon comprises a single transcription
unit, which is transcribed from a particular
promoter into a single primary transcript. Because that one transcript can carry several genes,
genes and transcription units are distinguishable in
prokaryotes.
Most eukaryotic genes are identical to their transcription units,
so the two terms are used
interchangeably.
-
A complex eukaryotic transcription unit is
transcribed into a primary transcript that can be
processed into 2 or more different monocistronic mRNAs, depending on the choice of splice sites or
polyadenylation sites. Eukaryotic transcription units
are classified into 2 types, depending on the fate of
the primary transcript:
1. The primary transcript produced from a simple
transcription unit is processed to yield a single
type of mRNA, encoding a single protein.
2. In complex transcription units, the primary RNA
transcript can be processed in more than one way,
leading to formation of mRNAs containing
different exons. Each mRNA is monocistronic, with
translation usually initiating at the first AUG in the
mRNA.
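The idea of one primary transcript yielding several monocistronic mRNAs can be illustrated with a minimal Python sketch. The exon names and sequences below are hypothetical and chosen only for illustration; they do not come from any real gene.

```python
# Hypothetical exons of one complex transcription unit (illustrative sequences only).
EXONS = {
    "E1": "GGAUGGCU",   # shared 5' exon, contains the first AUG
    "E2a": "GAAUUU",    # alternative internal exon
    "E2b": "CCGAAA",    # alternative internal exon
    "E3": "UGAUAA",     # shared 3' exon
}

def splice(exon_names):
    """Join the chosen exons, in order, into one mRNA string."""
    return "".join(EXONS[name] for name in exon_names)

def first_aug(mrna):
    """Return the index of the first AUG in the mRNA, or -1 if there is none."""
    return mrna.find("AUG")

# Two splicing choices applied to the same primary transcript.
mrna_a = splice(["E1", "E2a", "E3"])
mrna_b = splice(["E1", "E2b", "E3"])

print(mrna_a, first_aug(mrna_a))
print(mrna_b, first_aug(mrna_b))
```

Both mRNAs share the same 5' and 3' exons and the same start position, yet differ in their internal exon, which is the situation described for alternative splice sites.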
-
(Top) If a primary transcript contains alternative splice sites, it can be
processed into mRNAs with the same 5' and 3' exons but different internal exons.
(Bottom) If a primary transcript has two poly(A) sites, it can be
processed into mRNAs with alternative 3' exons.
-
If alternative promoters (f or g) are active in different cell types, mRNA1,
produced in a cell type in which f is activated, has a different exon (1A) than
mRNA2 has, which is produced in a cell type in which g is activated (and
where exon 1B is used). Mutations in control regions (a and b) and those
designated c within exons shared by the alternative mRNAs affect the proteins encoded by both alternatively processed mRNAs. In contrast,
mutations (d and e) within exons unique to one of the alternatively processed
mRNAs affect only the protein translated from that mRNA. For genes that are
transcribed from different promoters in different cell types (bottom),
mutations in different control regions (f and g) affect expression only in the
cell type in which that control region is active.
-
(a) The tryptophan (trp) operon is a continuous segment of the E. coli chromosome, containing 5 genes (blue) that encode the enzymes necessary for the stepwise
synthesis of tryptophan. The order of the genes in the bacterial genome parallels the sequential
function of the encoded proteins in the tryptophan pathway. (b) The 5 genes encoding the enzymes
required for tryptophan synthesis in yeast (Saccharomyces cerevisiae) are carried on 4 different
chromosomes. Each gene is transcribed from its own promoter to yield a primary transcript that is
processed into a functional mRNA encoding a single protein.
-
MAJOR CLASSES OF EUKARYOTIC DNA AND THE HUMAN GENOME
-
LINEs, SINEs, retroviral-like elements, and DNA-only transposons are all mobile
genetic elements that have multiplied in our genome by replicating themselves and inserting the new copies in different positions. Simple sequence repeats are short
nucleotide sequences (less than 14 nucleotide pairs) that are repeated for long
stretches. Segmental duplications are large blocks of the genome (1000 to 200,000
nucleotide pairs) that are present at two or more locations in the genome. Over half
of the unique sequence consists of genes and the remainder is probably regulatory
DNA. Most of the DNA present in heterochromatin has not yet been sequenced.
-
PROTEIN-CODING GENES
1. Solitary genes: roughly 25 to 50% of the protein-coding
genes are represented only once in the haploid genome.
2. Duplicated genes constitute the second group of
protein-coding genes. They have close but nonidentical
sequences and generally are located within 5 to 50
kb of one another. In vertebrate genomes, duplicated genes constitute half the protein-coding
DNA sequences.
3. A gene family is a set of duplicated genes that
encode proteins with similar but nonidentical
amino acid sequences. The encoded, closely related, homologous proteins constitute a protein
family. A few protein families, such as protein
kinases, transcription factors, and vertebrate
immunoglobulins, include hundreds of members.
-
GENE FAMILY FUNCTION                                   NUMBER OF FAMILIES
Translation, ribosomal structure and biogenesis        61
Transcription                                          5
Replication, repair, recombination                     13
Cell division and chromosome partitioning              1
Molecular chaperones                                   9
Outer membrane, cell-wall biogenesis                   3
Secretion                                              4
Inorganic ion transport                                9
Signal transduction                                    1
Energy production and conversion                       18
Carbohydrate metabolism and transport                  14
Amino acid metabolism and transport                    40
Nucleotide metabolism and transport                    15
Coenzyme metabolism                                    23
Lipid metabolism                                       8
General biochemical function predicted;
specific biological role unknown                       33
Function unknown                                       1
Numbers of gene families, classified by function, that are common to all 3 domains of the living world
-
TANDEMLY REPEATED GENES encode rRNAs, tRNAs, and
histones
rRNAs are encoded in tandem arrays in genomic DNA.
Multiple copies of tRNA and histone genes also occur, often in clusters, but not generally in tandem arrays.
REPETITIOUS DNA is concentrated in specific
chromosomal locations
1. Simple-sequence or satellite DNA consists largely of
quite short sequences repeated in long tandem arrays and is preferentially located in centromeres (where it assists
in attaching chromosomes to spindle fibers during
mitosis), telomeres, and specific locations within the arms
of particular chromosomes.
Repeats containing 1 to 13 bp are often called microsatellites; their expansion is associated with about 14 neuromuscular diseases
(e.g., myotonic dystrophy, spinocerebellar ataxia).
The length of a particular simple-sequence tandem array
is quite variable between individuals in a species. These
differences form the basis for DNA fingerprinting.
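The length variation that underlies DNA fingerprinting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences and the CAG repeat unit are made up, and real fingerprinting measures fragment lengths experimentally rather than by string search.

```python
import re

def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # Find every stretch made only of consecutive copies of the repeat unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical alleles from two individuals: same flanks, different repeat counts.
individual_1 = "TTGA" + "CAG" * 5 + "GGTC"
individual_2 = "TTGA" + "CAG" * 9 + "GGTC"

print(longest_tandem_run(individual_1, "CAG"))
print(longest_tandem_run(individual_2, "CAG"))
```

The flanking sequence is identical in both individuals, so the difference in array length alone distinguishes them.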
-
2. Mobile DNA elements are moderately repeated DNA
sequences interspersed at multiple sites throughout
the genomes of higher eukaryotes. They are less
frequent in prokaryotes.
a. DNA transposons are mobile DNA elements that
transpose to new sites directly as DNA.
b. Retrotransposons are first transcribed into an RNA copy of the element, which then is
reverse-transcribed into DNA.
A common feature of all mobile elements is the
presence of short direct repeats flanking the
sequence. Enzymes encoded by mobile elements themselves
catalyze insertion of these sequences at new sites in
genomic DNA.
-
(a) Eukaryotic DNA transposons (orange) move via a DNA intermediate,
which is excised from the donor site. (b) Retrotransposons (green)
are first transcribed into an RNA molecule, which then is reverse-transcribed into
double-stranded DNA. In both cases, the double-stranded DNA intermediate
is integrated into the target-site DNA to complete movement. Thus DNA
transposons move by a cut-and-paste mechanism,
whereas retrotransposons move by a copy-and-paste mechanism.
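The difference between the two mechanisms can be contrasted in a small Python sketch. It is a deliberately simplified model: the genome is a list of labeled segments, the names (geneA, TE, and so on) are hypothetical, and the RNA intermediate and enzymes are not modeled.

```python
def cut_and_paste(genome, element, target_index):
    """DNA transposon: excise the element from the donor site, then insert it.

    `target_index` is interpreted relative to the genome after excision.
    """
    donor_index = genome.index(element)
    remaining = genome[:donor_index] + genome[donor_index + 1:]
    return remaining[:target_index] + [element] + remaining[target_index:]

def copy_and_paste(genome, element, target_index):
    """Retrotransposon: the donor copy stays in place and a new copy is inserted."""
    return genome[:target_index] + [element] + genome[target_index:]

genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, "TE", 2)
copied = copy_and_paste(genome, "TE", 3)

print(moved)
print(copied)
```

After cut-and-paste the element count is unchanged, whereas copy-and-paste adds one copy per event, which is consistent with retrotransposons accumulating to high copy numbers.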
-
Retrotransposons are much more abundant than DNA transposons in vertebrates. However, DNA transposons that are similar in structure to bacterial IS elements also occur in eukaryotes (e.g.,
the Drosophila P element). The relatively large central region of an IS element,
which encodes one or two enzymes required for transposition, is flanked by an
inverted repeat at each end. The sequences of the inverted repeats are nearly
identical, but they are oriented in opposite directions. The sequence is
characteristic of a particular IS element. The 5' and 3' short direct (as opposed to inverted) repeats are not transposed with the insertion element; rather, they
are insertion-site sequences that become duplicated, with one copy at each
end, during insertion of a mobile element. The length of the direct repeats is
constant for a given IS element, but their sequence depends on the site of
insertion and therefore varies with each transposition of the IS element.
Arrows indicate sequence orientation.
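The two kinds of repeats described here can be made concrete with a short Python sketch. All sequences, the function names, and the 5-bp duplication length are illustrative assumptions, not values for any real IS element.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_is_element(inverted_repeat, central_region):
    """Model an IS element: a central region flanked by an inverted repeat at each end."""
    return inverted_repeat + central_region + reverse_complement(inverted_repeat)

def insert_with_tsd(target, position, element, tsd_length):
    """Insert `element` at `position`, duplicating the `tsd_length` target bases
    that start there so that one copy flanks each end of the element."""
    site = target[position:position + tsd_length]
    return target[:position] + site + element + site + target[position + tsd_length:]

# Hypothetical element and target site.
element = build_is_element("GGTTCA", "ATGCCCGGGTAA")
target = "AAAACGTACTTTT"
inserted = insert_with_tsd(target, 4, element, 5)

print(element)
print(inserted)
```

The inverted repeats belong to the element and travel with it, while the short direct repeats are copies of target DNA, so their sequence changes with every insertion site even though their length stays fixed.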
-
LTR retrotransposons or viral retrotransposons (8% of
human genomic DNA) are flanked by long terminal
repeats (LTRs), similar to those in retroviral DNA; they encode reverse transcriptase and integrase.
They move in the genome by being transcribed into RNA,
which then undergoes reverse transcription and
integration into the host-cell chromosome.
The central protein-coding region is flanked by 2 long terminal repeats (LTRs),
which are element-specific direct repeats. Like other mobile elements, integrated
retrotransposons have short target-site direct repeats at each end. The protein-
coding region constitutes 80% or more of a retrotransposon and encodes reverse
transcriptase, integrase, and other retroviral proteins.
-
The left LTR directs cellular RNA polymerase II to initiate transcription at the
first nucleotide of the left R region. The resulting primary transcript extends
beyond the right LTR. The right LTR, now present in the RNA primary transcript, directs cellular enzymes to cleave the primary
transcript at the last nucleotide of the right R region and to add a poly(A)
tail, yielding a retroviral RNA genome. A similar mechanism generates the
RNA intermediate during transposition of retrotransposons. The short
direct-repeat sequences (black) of target-site DNA are generated during
integration of the retroviral DNA into the host-cell genome.
-
The genomic RNA is packaged in the virion with a retrovirus-specific
cellular tRNA hybridized to a complementary sequence near its 5' end called the
primer-binding site (PBS). The retroviral RNA has a short direct-repeat terminal sequence (R) at
each end. The overall reaction is carried out by reverse transcriptase.
-
Nonviral retrotransposons are the most abundant
mobile elements in mammals. They form two classes
in mammalian genomes: LINEs and SINEs (long and short interspersed elements).
Both LINEs and SINEs lack LTRs and have an A/T-rich
stretch at one end. They move by a nonviral
retrotransposition mechanism that is mediated by
LINE-encoded proteins and involves priming by chromosomal
DNA.
SINE sequences exhibit extensive homology with
small cellular RNAs transcribed by RNA polymerase
III. Alu elements, the most common SINEs in humans,
are 300-bp sequences found scattered throughout
the human genome.
-
The length of the target-site direct repeats varies among
copies of the element at different sites in the genome.
Although the full-length L1 sequence is 6 kb long,
variable amounts of the left end are absent at over 90% of
the sites where this mobile element is found. The shorter open reading frame (ORF1), 1 kb in length, encodes an
RNA-binding protein. The longer ORF2, 4 kb in length,
encodes a bifunctional protein with reverse transcriptase
and DNA endonuclease activity.
-
Only ORF2 protein is represented.
Newly synthesized LINE DNA is shown in black.
-
Some moderately repeated DNA sequences are
derived from cellular RNAs that were reverse-transcribed
and inserted into genomic DNA at some time in evolutionary history.
Processed pseudogenes are derived from mRNAs
and lack introns, a feature that distinguishes them from
pseudogenes that arose by sequence drift of
duplicated genes.
The human globin gene cluster contains two pseudogenes
(white); these regions are related to the functional globin-type
genes but are not transcribed. Each red arrow indicates the
location of an Alu sequence, an approximately 300-bp noncoding repeated
sequence that is abundant in the human genome.
-
Mobile DNA elements were earlier viewed as
selfish molecular parasites. Today, they are
viewed as contributors to the evolution of
higher organisms by promoting
the generation of gene families via gene
duplication,
the creation of new genes via shuffling of
preexisting exons, and
the formation of more complex regulatory
regions that provide multifaceted control of
gene expression.
-
Mobile DNA elements most likely influenced evolution
significantly by serving as recombination sites and by
mobilizing adjacent DNA sequences. They have also been found in mutant alleles associated with several
human genetic diseases.
Recombination between interspersed repeats in the introns of separate
genes produces transcription units with a new combination of exons.
A double crossover between two sets of Alu repeats results in an exchange
of exons between the two genes.
-
Transposase can
recognize and cleave the DNA at the ends of the transposon
inverted repeats. In gene 1, if the transposase cleaves at the left end of the transposon on the left and at the right end of the
transposon on the right, it can transpose all the intervening DNA,
including the exon from gene 1, to a new site in an intron of gene 2.
The net result is an insertion of the exon from gene 1 into gene 2.
-
Some LINEs have weak
poly(A) signals. If such a LINE is in the 3'-most intron of gene 1,
during transposition its transcription may continue beyond its own poly(A) signals and extend into the 3' exon, transcribing the
cleavage and polyadenylation signals of gene 1 itself. This RNA
can then be reverse transcribed and integrated by the LINE ORF2
protein into an intron of gene 2, introducing a new 3' exon (from
gene 1) into gene 2.