all nucleotides contain three components: 1. a nitrogen heterocyclic base 2. a pentose sugar 3. a...


Upload: julia-alyson-hancock

Post on 18-Dec-2015


All nucleotides contain three components:
1. A nitrogenous heterocyclic base
2. A pentose sugar
3. A phosphate residue

Nucleic Acids

DNA and RNA are nucleic acids: long, thread-like polymers made up of a linear array of monomers called nucleotides.

Ribonucleotides have a 2’-OH.

Deoxyribonucleotides have a 2’-H.

Chemical Structure of DNA vs RNA

Structure of Nucleotide Bases

Bases are classified as pyrimidines or purines.

The nucleus contains the cell’s DNA (genome).

RNA is synthesized in the nucleus and exported to the cytoplasm.

Nucleus

Cytoplasm

DNA

RNA (mRNA)

Proteins

replication

transcription

translation

dA dG dT dC

Deoxyribonucleotides found in DNA

Nucleotides are linked by phosphodiester bonds

Bases form a specific hydrogen bond pattern

DNA is double stranded

The strands of DNA are antiparallel

The strands are complementary

There are Hydrogen bond forces

There are base stacking interactions

There are 10 base pairs per turn

Properties of a DNA double helix

DNA is a Double-Helix

RNase P M1 RNA

Transcription of a DNA molecule results in a mRNA molecule that is single-stranded.

RNA molecules do not have a regular structure like DNA.

Structures of RNA molecules are complex and unique.

RNA molecules can base pair with complementary DNA or RNA sequences.

G pairs with C, A pairs with U, and G pairs with U.
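The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `RNA_PAIRS`, `can_pair`, and `can_form_stem` are hypothetical, and the check ignores everything except the three pair types listed on the slide (G-C, A-U, and the G-U wobble pair).

```python
# Allowed RNA base pairs as listed on the slide: G-C, A-U, and G-U (wobble).
RNA_PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def can_pair(base1: str, base2: str) -> bool:
    """Return True if the two RNA bases form one of the allowed pairs."""
    return frozenset((base1.upper(), base2.upper())) in RNA_PAIRS


def can_form_stem(strand_a: str, strand_b: str) -> bool:
    """Check whether two segments (both written 5'->3') can pair antiparallel.

    The first base of strand_a is matched with the last base of strand_b,
    as in the stem of a hairpin.
    """
    if len(strand_a) != len(strand_b):
        return False
    return all(can_pair(a, b) for a, b in zip(strand_a, reversed(strand_b)))


if __name__ == "__main__":
    print(can_pair("G", "U"))             # wobble pair
    print(can_form_stem("GGAU", "AUCC"))  # fully Watson-Crick stem
```

A real RNA folding program also scores loops, bulges, and stacking energies; this sketch only tests whether every position in a candidate stem is an allowed pair.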

bulge

internal loop

hairpin

Nucleic Acids in Acid and Base

The glycosidic bond of DNA and RNA is hydrolyzed by acids.

Order of stability: dA, dG < rA, rG < dC, dT < rC, rU

dA, dG hydrolyzed in boiling 0.1 M hydrochloric acid in 30 min
rA, rG hydrolyzed in boiling 1 M hydrochloric acid in 60 min
rC, rU hydrolyzed in boiling 12 M perchloric acid in 60 min

DNA is quite stable under basic conditions.

RNA is readily hydrolyzed by base.

RNA is hydrolyzed under alkaline (basic) conditions

Methylation of Nucleotide Bases

Certain nucleotide bases in DNA molecules are methylated in reactions catalyzed by enzymes.

Adenine and cytosine are methylated more often than guanine and thymine.

Methylation is confined to specific regions of DNA and aids in biological processes.

E. coli DNA is methylated to distinguish its DNA from that of foreign invaders.

In eukaryotic cells about 5% of cytidines are methylated, producing 5-methylcytidine.

Spontaneous Alterations in Nucleic Acids

In a human cell, DNA undergoes spontaneous alterations in structure (mutations).

As a cell ages, the number of mutations increases, making it likely that a cell’s normal processes may be altered.

There is a link between spontaneous mutation, aging, and carcinogenesis.

Depurination

Hypothesis:

If DNA contained uracil, during replication of DNA the uracils would be base-paired with adenine.

Deaminated cytosines (which are uracils) would also be base-paired with adenine.

This would decrease the number of G-C base pairs over time and increase the number of A-U base pairs.

Eventually all the G-C base pairs could be lost.

The genetic code would not exist as we know it.

Why does DNA contain thymine and not uracil?

Ultraviolet light is damaging to DNA

Near-UV radiation (wavelengths of 200 – 400 nm) is a significant portion of the solar spectrum.

Upon exposure to ultraviolet radiation, two adjacent pyrimidine bases can dimerize.

This happens most often between two adjacent thymines.

Two products often form:
cyclobutane thymine dimer
6-4 photoproduct

Nucleic Acids

Where are they found in nature?

and

What do they look like?

Genomes

Source of DNA        Size (bases)     Type
Escherichia coli     9,200,000        Closed-circular double-stranded DNA
Bacillus subtilis    4,200,000        Closed-circular double-stranded DNA
F plasmid            95,000           Closed-circular double-stranded DNA
λ phage              48,500           Linear double-stranded DNA
T7 phage             40,000           Linear double-stranded DNA
M13 phage            6,400            Closed-circular single-stranded DNA
MS2 phage            3,600            Linear single-stranded RNA
Human                6,000,000,000    Linear double-stranded DNA
Fruit fly            270,000,000      Linear double-stranded DNA
HIV                  9,700            Linear single-stranded RNA

DNA molecules are packaged in the cell as structures called chromosomes.

Bacteria have a single chromosome. Eukaryotes have multiple chromosomes.

A single chromosome contains thousands of genes, each encoding a protein.

All of an organism’s chromosomes make up the genome.

Humans have 46 chromosomes.

The human genome has about 3 billion nucleotide base pairs.

The Human Genome
http://www.ncbi.nlm.nih.gov/genome/guide/human/

How is DNA packaged into a cell?

E. coli has a single double-stranded DNA molecule as its genome.

There are 4,639,221 base pairs in the E. coli genome.

The DNA is 1.7 mm long, 850 times the length of an E. coli cell.
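As a rough check on the length quoted above, the contour length of a double helix can be estimated from the number of base pairs. The sketch below assumes a rise of 0.34 nm per base pair (the usual textbook value for B-form DNA, not stated on the slide) and an E. coli cell length of about 2 µm (also an assumption); the function name is hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumption: rise per base pair in B-form DNA, in nm


def contour_length_mm(base_pairs: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Estimate the end-to-end contour length of duplex DNA in millimetres."""
    return base_pairs * rise_nm * 1e-6  # 1 nm = 1e-6 mm


ECOLI_BP = 4_639_221   # from the slide
CELL_LENGTH_UM = 2.0   # assumption: approximate length of an E. coli cell

length_mm = contour_length_mm(ECOLI_BP)
ratio = (length_mm * 1000.0) / CELL_LENGTH_UM  # mm -> um, then divide

if __name__ == "__main__":
    print(f"Estimated contour length: {length_mm:.2f} mm")
    print(f"Length relative to the cell: about {ratio:.0f}x")
```

With these assumptions the estimate comes out near 1.6 mm, several hundred times the length of the cell, which is the same order of magnitude as the figures on the slide.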

plasmid

Large DNA molecules are compacted in a cell by supercoiling.

relaxed supercoiled

DNA in eukaryotic cells is packaged into nucleosomes, which contain proteins called histones.

DNA wrapped around a histone core (side view)

Nucleosomes are packaged to form 30 nm fibers

Compaction of 30 nm fibers uses nuclear scaffolds

In eukaryotes, genes contain exons (coding regions) and introns (non-coding regions).

Prokaryotic genes generally do not contain introns.

Telomeres

Telomeres are sequences at the end of eukaryotic chromosomes that help stabilize the chromosome.

Telomeres are repeats of the following sequence:

5’-(TxGy)n     x and y = 1 to 4
3’-(AxCy)n     The TG strand is longer

5’-TTTGGTTTGGTTTGGTTTGGTTTGGTTTGG…
3’-AAACCAAACCAAACC…

Can be >10,000 nucleotides in mammals.

The ends of the chromosome are replicated by the enzyme telomerase.
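The (TxGy)n repeat pattern described above can be written out programmatically. The following is a minimal sketch with a hypothetical function name; the `overhang_repeats` parameter is an illustrative way to model the longer TG strand and is not taken from the slides.

```python
def telomere_strands(x: int, y: int, n: int, overhang_repeats: int = 0):
    """Build the two strands of a (TxGy)n telomere repeat.

    Returns (tg_strand, ac_strand). The TG strand is written 5'->3' and the
    AC strand 3'->5', so the two strings pair position by position.
    The TG strand carries `overhang_repeats` extra unpaired repeats.
    """
    if not (1 <= x <= 4 and 1 <= y <= 4):
        raise ValueError("x and y must each be between 1 and 4")
    tg_unit = "T" * x + "G" * y
    ac_unit = "A" * x + "C" * y
    return tg_unit * (n + overhang_repeats), ac_unit * n


if __name__ == "__main__":
    tg, ac = telomere_strands(x=3, y=2, n=3, overhang_repeats=3)
    print("5'-" + tg + "...")
    print("3'-" + ac + "...")
```

With x = 3 and y = 2 this reproduces the TTTGG/AAACC example shown on the slide.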

Telomeres and aging

There appears to be a relationship between the length of telomeres at the end of chromosomes and the age of an individual.

The older you are, the shorter your telomeres are.

Germ-line cells (reproductive cells) contain telomerase activity.

Non-germ-line cells (somatic cells) do not contain telomerase activity.

We have a certain length of telomeres that we are born with.

As we age, the telomeres get shorter.

Is our life-span pre-determined by the length of our telomeres?

Internet Resources

Nucleic Acids

National Center for Biotechnology Information (NCBI)

National Library of Medicine (NLM)

National Institutes of Health (NIH)

http://www.ncbi.nlm.nih.gov/

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2011 Jan;39(Database issue):D32-7). There are approximately 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS division as of April 2011.

GenBank

BLAST SEARCH

What is BLAST?

BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses an algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity.

The core of NCBI's BLAST services is BLAST 2.0, otherwise known as "Gapped BLAST". This service is designed to take protein and nucleic acid sequences and compare them against a selection of NCBI databases.

Instead of relying on global alignments (commonly seen in multiple sequence alignment programs), BLAST emphasizes regions of local alignment to detect relationships among sequences which share only isolated regions of similarity.

BLAST is therefore not just a tool to view sequences aligned with each other or to calculate percent homology, but a program to locate regions of sequence similarity with a view to comparing structure and function.

Below is a table of these programs.

Program   Description

blastp    Compares an amino acid query sequence against a protein sequence database.

blastn    Compares a nucleotide query sequence against a nucleotide sequence database.

blastx    Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.

tblastn   Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.

tblastx   Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
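The program table above amounts to a lookup from query type and database type to a BLAST program. The sketch below encodes only what the table states; the dictionary and function names are hypothetical and this is not part of any NCBI software.

```python
# (query type, database type, whether translation is involved), per the table.
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein", False),
    "blastn": ("nucleotide", "nucleotide", False),
    "blastx": ("nucleotide", "protein", True),      # query translated
    "tblastn": ("protein", "nucleotide", True),     # database translated
    "tblastx": ("nucleotide", "nucleotide", True),  # both translated
}


def choose_programs(query_type: str, database_type: str) -> list:
    """List the BLAST programs that accept this query/database combination."""
    return [
        name
        for name, (query, database, _translated) in BLAST_PROGRAMS.items()
        if query == query_type and database == database_type
    ]


if __name__ == "__main__":
    # An unknown nucleotide sequence searched against protein sequences:
    print(choose_programs("nucleotide", "protein"))
```

For a nucleotide query against a nucleotide database, two programs qualify; the choice between them depends on whether the comparison should be made at the nucleotide level or through six-frame translations.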

The BLAST search pages allow you to select from several different programs

Database Description

nr All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).

month All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.

dbest Non-redundant database of GenBank+EMBL+DDBJ EST Divisions.

dbsts Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.

mouse ests The non-redundant Database of GenBank+EMBL+DDBJ EST Divisions limited to the organism mouse.

human ests The Non-redundant Database of GenBank+EMBL+DDBJ EST Divisions limited to the organism human.

other ests The non-redundant database of GenBank+EMBL+DDBJ EST Divisions all organisms except mouse and human.

yeast Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all yeast nucleotide sequences, but the sequence fragments from the yeast complete genome.

E. coli E. coli (Escherichia coli) genomic nucleotide sequences.

pdb Sequences derived from the 3-dimensional structure of proteins.

kabat [kabatnuc] Kabat's database of sequences of immunological interest.

patents Nucleotide sequences derived from the Patent division of GenBank.

vector Vector subset of GenBank(R), NCBI

mito Database of mitochondrial sequences

alu Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at

epd Eukaryotic Promoter Database, ISREC in Epalinges s/Lausanne (Switzerland).

gss Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.

htgs High Throughput Genomic Sequences.

Nucleotide Databases

CGTGATGAACGGCTTCGAGCGATACGAGGGAGTGCGTCACTGCCGCTATGTGGACGAGTTGCAGATCGTCCAGAATGCGCCATGGACTCTGTCCGATGAATTCATCGCCGACAACAAAATCGACTTTGTGGCCCACGACGACATTCCGTATGTAACCGATGGCATGGACGACATCTATGCTCCTCTCAAGGCGCGCGGCATGTTTGTGGCCACGGAGCGCACTGAGGGTGTGTCCACCTCGGACATCGTAGCCCGGATCGTCAAGGATTACGATCTGTATGTGCGTCGTAATCTGGCCAGAGGCTATTCGGCCAAGGAACTCAATGTGTCGTTCCTGTCCGAGAAGAAGTTCCGGCTGCAGAACAA

Nucleic Acid Sequence

What does it encode?
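One way to start answering this question without a database search is to translate the sequence in all six reading frames, which is what blastx does before comparing against proteins. The sketch below is a minimal illustration that assumes the standard genetic code; all names are hypothetical and unknown codons are shown as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate a DNA sequence from its first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate all three forward and all three reverse reading frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    fragment = "CGTGATGAACGGCTTCGAGCGATACGAGGGAGTGCGTCACTGCCGC"  # start of the slide sequence
    for frame, protein in six_frame_translations(fragment).items():
        print(frame, protein)
```

A frame with a long stretch free of stop codons is a candidate open reading frame; identifying the protein still requires a similarity search such as BLAST.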

Problem

• As an employee of the Environmental Protection Agency (EPA) you are charged with maintaining safe public swimming in lakes.

• In a sample isolated from a lake used for swimming and boating you discover the following nucleic acid molecule that you believe is part of a larger gene sequence. You suspect the organism from which the gene came may be harmful to the public.

• 5’-CATCCAGGGAATCACCAAGCCCGCCATTCGCCGTCTGGCTCGCCG-3’

• Determine if you should shut down public access to the lake, or if the lake is safe.

Problem #2

• In the middle of the swimming season you re-test the lake to make sure it is safe for human use.

• In a sample isolated from the lake you discover the following nucleic acid molecule that you again believe is part of a larger gene sequence. You wonder whether the organism from which the gene came may be harmful to the public.

• 5’-GTCGAAGCGCCACTCGAAGGAGAAGGACACGCTCGGGGGCATCAC-3’

• As before, determine if you should shut down public access to the lake, or if the lake is safe.

DNA sequences recognized by regulatory proteins are often inverted repeats of a short DNA sequence. These repeats form a palindrome with two-fold symmetry about a central axis.

DNA binding proteins are often dimeric, with two identical protein subunits.

Each subunit binds to one strand of the DNA.

5’-TACGGTACTGTGCTCGAGCACTGCTGTACT-3’
3’-ATGCCATGACACGAGCTCGTGACGACATGA-5’

central axis

Regulatory Proteins

Proteins often bind to specific sequences of DNA.

Example: Restriction enzyme EcoRI binds to the DNA sequence

5’-GAATTC-3’
3’-CTTAAG-5’
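A DNA palindrome in this sense is a sequence that reads the same 5'→3' on both strands, which means it equals its own reverse complement. The sketch below checks that property and scans a sequence for palindromic windows; the function names and the default window length of 6 are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def find_palindromic_sites(dna: str, length: int = 6) -> list:
    """Return (position, site) for every palindromic window of given length."""
    dna = dna.upper()
    return [
        (i, dna[i:i + length])
        for i in range(len(dna) - length + 1)
        if is_palindromic(dna[i:i + length])
    ]


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))  # the EcoRI site
    print(find_palindromic_sites("TACGGTACTGTGCTCGAGCACTGCTGTACT"))
```

In the 30-base regulatory sequence shown earlier, the window CTCGAG sits at the centre of the sequence, matching the central axis marked on the slide.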

Protein – DNA interaction

A variation in sizes of DNA seen after cutting with restriction enzymes.

Restriction enzymes cut DNA at a specific site. For example, the EcoRI restriction enzyme cuts DNA wherever it finds the sequence GAATTC:

DNA before cutting by EcoRI:
5’-AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA-3’
3’-TTAGATCCCTTAAGTGTCGCTACGCTTAAGCGTTAAT-5’

DNA after cutting by EcoRI:
5’-AATCTAGGG AATTCACAGCGATGCG AATTCGCAATTA-3’
3’-TTAGATCCCTTAA GTGTCGCTACGCTTAA GCGTTAAT-5’

In this example, EcoRI has cut one double-stranded DNA molecule of 37 base pairs into 3 smaller fragments. If another person has slightly different DNA, EcoRI may cut the DNA into pieces of different lengths. (For example: if the second GAATTC is GAATTT, EcoRI will cut this other person's DNA in only one place, producing 2 smaller fragments.)
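The digest described above can be simulated on the top strand. This is a minimal sketch with a hypothetical `digest` function; it places the cut one base into the GAATTC site (after the G), as drawn in the example, and reports fragment lengths the way a gel would.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut the top strand at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for a cut between G and AATTC).
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


person_1 = "AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA"
# Hypothetical second individual: the second GAATTC has changed to GAATTT.
person_2 = "AATCTAGGGAATTCACAGCGATGCGAATTTGCAATTA"

lengths_1 = [len(fragment) for fragment in digest(person_1)]
lengths_2 = [len(fragment) for fragment in digest(person_2)]

if __name__ == "__main__":
    print("Person 1 fragment lengths:", lengths_1)
    print("Person 2 fragment lengths:", lengths_2)
```

The two individuals give different sets of fragment lengths from the same region, which is the length polymorphism that RFLP analysis detects.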

The words "fragment length polymorphism" mean "DNA pieces of different lengths." RFLPs are a quick way to see if two pieces of DNA are identical, without having to look at the entire DNA sequence.

Restriction Fragment Length Polymorphism (RFLP)

IS6110 Fingerprints of M. tuberculosis

Each person has a unique set of fingerprints. As with a person’s fingerprint no two individuals share the same genetic makeup. This genetic makeup, which is the hereditary blueprint imparted to us by our parents, is stored in the chemical deoxyribonucleic acid (DNA), the basic molecule of life. Examination of DNA from individuals, other than identical twins, has shown that variations exist and that a specific DNA pattern or profile could be associated with an individual. These DNA profiles have revolutionized criminal investigations and have become powerful tools in the identification of individuals in criminal and paternity cases.

The first widespread use of DNA tests involved RFLP (restriction fragment length polymorphism) analysis, a test designed to detect variations in the DNA from different individuals. In the RFLP method, DNA is isolated from a biological specimen (e.g., blood, semen, vaginal swabs) and cut by an enzyme into restriction fragments. The DNA fragments are separated by size into discrete bands in a gel (gel electrophoresis), transferred onto a membrane, and identified using probes (known DNA sequences that are "tagged" with a chemical tracer). The resulting DNA profile is visualized by exposing the membrane to a piece of x-ray film which allows the scientist to determine which specific fragments the probe identified among the thousands in a sample of human DNA. A "match" is made when similar DNA profiles are observed between an evidentiary sample and those from a suspect’s DNA. A determination is then made as to the probability that a person selected at random from a given population would match the evidence sample as well as the suspect. The entire analysis may require from 6 to 10 weeks for completion.

DNA Profiling

Technique, also known as DNA fingerprinting, that allows familial relationships to be established by comparing the characteristic polymorphic patterns that are obtained when certain regions of genomic DNA are amplified (typically by PCR) and cut with certain restriction enzymes. In principle, an individual can be identified unambiguously by RFLP (hence the use of RFLP in forensic analysis of blood, hair or semen). Similarly, if a polymorphism can be identified close to the locus of a genetic defect, it provides a valuable marker for tracing the inheritance of the defect.

restriction fragment length polymorphism (= RFLP)

Parentage Testing

The matching process for identifying DNA profile patterns which either "exclude" or "include" a person as being the parent of a child is shown in the figure below. In this instance man 1 is excluded from paternity and man 2 is included as a possible father of the child.

RNA and DNA Viruses

Viruses

• Disease-causing agents that can multiply only in cells.

• Viruses are DNA or RNA enclosed by a protective coat that enables them to move from one cell to another.

• Virus-infected cells often break open (lyse), allowing viruses access to nearby cells.

• A protein shell (capsid) surrounds the nucleic acid of most viruses. In many viruses the protein capsid is further enclosed by a lipid bilayer membrane that contains proteins.

Viral capsid

The capsids of some viruses, all shown at the same scale. (A) Tomato bushy stunt virus; (B) poliovirus; (C) simian virus 40 (SV40); (D) satellite tobacco necrosis virus.

Acquisition of a viral envelope

The Coats of Viruses

Bacteriophage T4, a large DNA-containing virus that infects E. coli.

Potato virus X, a filamentous plant virus that contains an RNA genome.

Adenovirus, a DNA-containing virus that can infect human cells. The protein capsid forms the outer surface of this virus.

Influenza virus, a large RNA-containing animal virus whose protein capsid is enclosed in a lipid envelope with protruding spikes of viral glycoprotein

Several types of viral genomes

The smallest viruses contain only a few genes and can have an RNA or a DNA genome; the largest viruses contain hundreds of genes and have a double-stranded DNA genome.

T4 bacteriophage chromosome

This schematic shows the positions of the more than 30 genes involved in T4 DNA replication. The genome of bacteriophage T4 consists of 169,000 nucleotide pairs and encodes about 300 different proteins.

The life cycle of the Semliki forest virus

The virus parasitizes the host cell for most of its biosyntheses

The life cycle of a retrovirus

• The retrovirus genome consists of an RNA molecule of about 8500 nucleotides; two such molecules are packaged into each viral particle.

• The enzyme reverse transcriptase first makes a DNA copy of the viral RNA molecule and then a second DNA strand, generating a double-stranded DNA copy of the RNA genome.

• The integration of this DNA double helix into the host chromosome, catalyzed by the viral enzyme integrase, is required for the synthesis of new viral RNA molecules by the host-cell RNA polymerase.

reverse transcription

messenger RNA (mRNA)

transfer RNA (tRNA)

ribosomal RNA (rRNA)

The life cycle of a retrovirus

The AIDS Virus Is a Retrovirus

• In 1982 physicians first became aware of a new sexually transmitted disease that was associated with an unusual form of cancer (Kaposi's sarcoma) and a variety of unusual infections. Because both of these problems reflect a severe deficiency in the immune system - specifically in helper T lymphocytes - the disease was named acquired immune deficiency syndrome (AIDS). By culturing lymphocytes from patients with an early stage of the disease, a retrovirus was isolated that is now known to be the causative agent of AIDS.

• The retrovirus, called human immunodeficiency virus (HIV), enters helper T lymphocytes by first binding to a functionally important plasma membrane protein called CD4. There are two features of HIV that make it especially deadly. First, it eventually kills the helper T cells that it infects rather than living in symbiosis with them, as do most other retroviruses, and helper T cells are vitally important in defending us against infection. Second, the provirus tends to persist in a latent state in the chromosomes of an infected cell without producing virus until it is activated by an unknown rare event; this ability to hide greatly complicates any attempt to treat the infection with antiviral drugs.

• Much current research on AIDS is aimed at understanding the life cycle of HIV. The complete nucleotide sequence of the viral RNA, which encodes nine genes, has been determined. This has made it possible to identify and study each of the proteins that it encodes. The three-dimensional structure of its reverse transcriptase is being used to help design new drugs that inhibit the enzyme.

A map of the HIV genome

The HIV genome is about 9000 nucleotides and contains nine genes. Three of the genes (green) are common to all retroviruses: gag encodes capsid proteins, env encodes envelope proteins, and pol encodes both the reverse transcriptase and the integrase proteins. The HIV genome contains six small genes (in red) plus the three (in green) that are normally required for the retrovirus life cycle.

1. Attachment
   • CD4-gp120 Interaction
   • Gp120-Chemokine Receptor Interaction
2. Viral Fusion/Uncoating
3. Reverse Transcription
4. RNaseH Degradation
5. Second Strand Synthesis
6. Migration to Nucleus
7. Integration
8. Latency
9. Early Transcription
10. Late Transcription
11. RNA Processing
12. Protein Synthesis
13. Protein Glycosylation
14. Assembly of Virion
15. Viral Budding
16. Virion Maturation

HIV binds to the CD4 receptor on the host cell. CD4 is present on the surface of many lymphocytes, which are a critical part of the body's immune system. A coreceptor, CXCR4 and/or CCR5, is needed for HIV to enter the cell.

The HIV envelope fuses with the host cell membrane.

The viral capsid and its contents enter the host cell

The RNA HIV genome and the enzyme reverse transcriptase are released into the host cell

Reverse transcriptase makes a DNA copy of the RNA HIV genome. First, a single-stranded DNA is made

Reverse transcriptase then makes a double-stranded DNA copy of the HIV genome

The enzyme integrase fuses the double-stranded copy of the DNA genome with the host cell genome in the nucleus

mRNA is produced encoding HIV proteins

mRNA is translated to produce HIV-encoded polypeptides, including HIV protease

HIV protease cleaves polypeptides and makes functional HIV proteins

A new HIV particle is assembled at the cell surface and buds off

The HIV virus particle leaves to infect other cells

Human immunodeficiency virus (HIV) leaving an infected T lymphocyte

Preventing and treating AIDS

Vaccines?

Modern vaccines for viral infections often consist of one or more coat proteins of the virus that are not themselves infectious, but elicit an immune response from the person receiving the vaccine.

HIV reverse transcriptase has an error rate of one nucleotide per 2000. This means that the amino acid sequence of the HIV coat proteins is constantly changing.
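To see why this error rate matters, it helps to work out how many errors to expect each time the genome is copied. The sketch below uses the error rate above and the genome length of about 9000 nucleotides quoted elsewhere in these slides; it assumes errors occur independently at each position, which is a simplification, and the function names are hypothetical.

```python
def expected_errors(genome_length_nt: int, errors_per_nt: float) -> float:
    """Expected number of misincorporated nucleotides per genome copy."""
    return genome_length_nt * errors_per_nt


def probability_error_free(genome_length_nt: int, errors_per_nt: float) -> float:
    """Probability that a copy has no errors, assuming independent positions."""
    return (1.0 - errors_per_nt) ** genome_length_nt


HIV_GENOME_NT = 9000      # approximate genome length from the slides
ERROR_RATE = 1.0 / 2000   # one error per 2000 nucleotides, from the slide

mean_errors = expected_errors(HIV_GENOME_NT, ERROR_RATE)
p_perfect = probability_error_free(HIV_GENOME_NT, ERROR_RATE)

if __name__ == "__main__":
    print(f"Expected errors per reverse-transcribed genome: {mean_errors:.1f}")
    print(f"Chance of an error-free copy: {p_perfect:.1%}")
```

Under these assumptions nearly every new DNA copy of the genome differs from its template at several positions, which is consistent with the slide's point that the coat proteins keep changing.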

Preventing and treating AIDS

Drugs?

What should we target?

Anti-HIV chemotherapy

Antiretroviral Agents Currently Available (generic name/Trade name)
• zidovudine/Retrovir (AZT, ZDV)
• didanosine/Videx, Videx EC (ddI)
• zalcitabine/HIVID (ddC)
• stavudine/Zerit (d4T)
• lamivudine/Epivir (3TC)
• abacavir/Ziagen (ABC)
• nevirapine/Viramune (NVP)
• delavirdine/Rescriptor (DLV)
• efavirenz/Sustiva (EFV)
• tenofovir DF/Viread (TDF)
• indinavir/Crixivan
• ritonavir/Norvir
• saquinavir/Invirase, Fortovase
• nelfinavir/Viracept
• amprenavir/Agenerase
• lopinavir/ritonavir, Kaletra
• FUZEON (enfuvirtide, ENF or T-20)
• Anti-PDI antibodies
• T22 – [Tyr-5,12, Lys-7]polyphemusin II

Pharmacogenomics

The study of how an individual's genetic inheritance affects the body's response to drugs.

Holds the promise that drugs might one day be tailor-made for individuals and adapted to each person's own genetic makeup.

Combines traditional pharmaceutical sciences such as biochemistry with annotated knowledge of genes, proteins, and single nucleotide polymorphisms.

Wouldn’t it be wonderful…

Wouldn't it be wonderful if you knew exactly what measures you could take to stave off, or even prevent, the onset of disease?

Wouldn't it be a relief to know that you are not allergic to the drugs your doctor just prescribed?

Wouldn't it be a comfort to know that the treatment regimen you are undergoing has a good chance of success because it was designed just for you?

With the recent harvest of millions of Single Nucleotide Polymorphisms (SNPs), biomedical researchers now believe that such exciting medical advances are not that far away.

Sanger dideoxy nucleotide DNA sequencing uses a DNA polymerase to determine the sequence of DNA

Sanger dideoxy sequencing incorporates dideoxynucleotides, preventing further synthesis of the DNA strand

Automated DNA Sequencing

Automated DNA sequencing uses a mixture of unlabeled deoxynucleotides and dideoxynucleotides labeled with a fluorescent dye. A computer then determines the identity of the labeled nucleotide as each DNA fragment migrates through a polyacrylamide gel.

A Single Nucleotide Polymorphism, or SNP (pronounced "snip"), is a small genetic change, or variation, that can occur within a person's DNA sequence. A single base change found in at least 1% of an ethnically diverse population is defined as a SNP.

An example of a SNP is the alteration of the DNA segment AAGGTTA to ATGGTTA. Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences." SNPs found within a coding sequence are of particular interest to researchers as they are more likely to alter the biological function of a protein. Due to recent advances in technology, coupled with the unique ability of these genetic variations to facilitate gene identification, there has been a recent flurry of SNP discovery and detection.
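The example above can be turned into a small comparison routine. This is a minimal sketch, not a real variant caller: it assumes the two sequences are already aligned and of equal length, all names are hypothetical, and the 1% frequency threshold is the one used in the definition of a SNP given in this section.

```python
def single_base_differences(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. Both sequences must be aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def meets_snp_frequency(carriers: int, population: int, threshold: float = 0.01) -> bool:
    """True if the variant is found in at least `threshold` of the population."""
    return population > 0 and carriers / population >= threshold


if __name__ == "__main__":
    print(single_base_differences("AAGGTTA", "ATGGTTA"))
    print(meets_snp_frequency(carriers=15, population=1000))
```

Real SNP discovery compares sequenced regions from many individuals and has to deal with sequencing errors, insertions, and deletions, none of which this sketch handles.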

Although many SNPs do not produce physical changes in people, scientists believe that other SNPs may predispose a person to disease and even influence their response to a drug regimen.

What are SNPs and How are They Found?

Finding single nucleotide changes in the human genome seems like a daunting prospect, but, over the last 20 years, advances in DNA sequencing and recombinant DNA technology have made it possible to do just that. Selected regions of a DNA sequence obtained from multiple individuals who share a common trait are compared.

Many common diseases in humans are caused by genetic variation within genes, some influenced by complex interactions among multiple genes. Therefore, a person may have a genetic predisposition, or the potential to develop a disease based on genes and hereditary factors.

Genetic factors may also determine the severity or progression of disease. Since we do not yet know all of the factors involved in these intricate pathways, researchers have found it difficult to develop screening tests for most diseases and disorders. By studying stretches of DNA that have been found to harbor a SNP associated with a disease trait, researchers may begin to reveal relevant genes associated with a disease.

Needles in a Haystack

Estimate: Most commonly used drugs will only be effective in 30% to 60% of patients with the same disease. In addition, a subset of these patients may suffer side effects.

Adverse drug reactions have been reported to be in the top five leading causes of death in the United States, with an economic impact up to $100 billion annually.

Severe adverse effects have led to the withdrawal of blockbuster drugs Rezulin, Seldane, Redux, and Pondimin.

Bringing a new drug to market is estimated to cost as much as $500 million. Being able to predict a population’s response to a drug would be invaluable to the pharmaceutical industry.

Why are SNPs important to pharmaceutical companies?

SNPs and Drug Interactions

[Figure labels: Transporter; Drug; Absorption in the breast; Drug in breast tissue; Transportation in the blood; Drug in bloodstream; Metabolism in the liver; Drug becomes inactive or toxic; Excretion in the kidney]

SNP Profiles and Response to Drug Therapy

[Figure: the individual SNP profiles (A to E) of breast cancer patients are sorted into two groups, "Responds to Standard Drug Treatment" and "Does Not Respond to Standard Drug Treatment".]

"WE ALREADY know that if we sample tumor tissues from 100 different women, those tissues would have a molecular makeup that would break up into different categories.

In essence, those patients [each] have a different disease, but we just happen to be calling it the same thing--breast cancer," Conway says.

"We think we're going to subdivide diseases. Once we get people with the right disease diagnosis, the disease definitions are going to change from 'You have breast cancer' to 'You have molecular profile A, B, C, or D.' The treatments of those diseases are going to be different."

Cancer is many diseases

SNPs and Cancer

SNPs A SNPs B

SNPs C SNPs D

SNPs May Be the Solution

What Is Variation in the Genome?

[Figure labels: Chromosome; Common Sequence Variations; Polymorphism; Deletions; Translocations; Insertions; Variations Causing Latent Changes; Many years later. Legend: variations in DNA that cause latent effects.]


Sequenom's scientists are interested in changes in the frequency of SNPs as the population ages. "We take advantage of the fact that most human diseases are late-onset. Age is a major risk factor," Cantor says. "If young people are carrying a harmful variation, they're still well, whereas an old person carrying that same variation has a very high chance that he's been made sick or killed by it. You make the prediction that variations that are harmful to health should decline in frequency as a function of age in the healthy population."

One percent of genes appears to show an age-dependent frequency in SNPs, Cantor says. He suspects that only 200 to 400 genes will be involved in disorders that affect a major population. Finding these genes in healthy people, however, gives no indication of what the diseases actually are. "After we find them in the healthy population, we have to go back and look at biochemically stratified populations or clinically stratified populations," he explains. "The advantage is that instead of having to do all the genes with these tricky populations, we only have to do 200 to 400. We can pay a lot more attention to the details."

Age-dependent Frequency in SNPs

Laboratory Experiment

Isolation of genomic DNA from the bacterium Escherichia coli

(E. coli)

E. coli has a single double-stranded DNA molecule as its genome.

There are 4,639,221 base pairs in the E. coli genome.

Promega Wizard® Genomic DNA Purification Kit

The Wizard Genomic DNA Purification Kit is designed for isolation of DNA from white blood cells, tissue culture cells and animal tissue, plant tissue, yeast, Gram-positive and Gram-negative bacteria.

1. Lyse (break open) the cells and the nuclei. An RNase digestion step may be included at this time. Depending on the DNA isolation method used, RNA will be co-purified with genomic DNA. Spectrophotometric measurements do not differentiate between DNA and RNA, so RNA contamination can lead to overestimation of DNA concentration. Treatment with RNase A will remove contaminating RNA; this can either be incorporated into the purification procedure or performed after the DNA has been purified.

2. Remove the cellular proteins by a salt-precipitation step, which precipitates the proteins but leaves the high molecular weight genomic DNA in solution.

3. Concentrate and desalt the genomic DNA by isopropanol precipitation.

Laboratory Experiment

1. Determination of the molar absorptivity of adenosine 5’-monophosphate (AMP)

2. Determination of the concentration of AMP in an aqueous solution

3. Determination of the concentration of DNA in purified E. coli genomic DNA solution

4. Determination of the concentration of DNA in oligonucleotide solutions
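The four determinations listed above all rest on the Beer-Lambert law, A = ε · c · l. The sketch below shows the arithmetic only. The constants are commonly cited approximate values and are assumptions here, not values from these slides: a molar absorptivity of about 15,400 M⁻¹ cm⁻¹ for AMP near 260 nm, and the rule of thumb that an A260 of 1.0 corresponds to roughly 50 µg/mL of double-stranded DNA. In the laboratory the molar absorptivity would be measured from standards, as in step 1. Function names are hypothetical.

```python
EPSILON_AMP_260 = 15_400.0      # assumption: M^-1 cm^-1 for AMP near 260 nm
DSDNA_UG_PER_ML_PER_A260 = 50.0  # assumption: rule of thumb for duplex DNA


def molar_absorptivity(absorbance: float, concentration_m: float, path_cm: float = 1.0) -> float:
    """Solve A = epsilon * c * l for epsilon (M^-1 cm^-1)."""
    return absorbance / (concentration_m * path_cm)


def concentration_molar(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Solve A = epsilon * c * l for the concentration c (mol/L)."""
    return absorbance / (epsilon * path_cm)


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate double-stranded DNA concentration from A260 (1 cm path)."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    # AMP solution with a measured absorbance of 0.77 in a 1 cm cuvette:
    print(f"AMP: {concentration_molar(0.77, EPSILON_AMP_260) * 1e6:.0f} uM")
    # Genomic DNA diluted 10-fold before reading A260 = 0.50:
    print(f"DNA: {dsdna_ug_per_ml(0.50, dilution_factor=10):.0f} ug/mL")
```

Because RNA also absorbs at 260 nm, the DNA estimate is only meaningful after the RNase treatment described in the purification steps.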