The Genomics Era: A Vast Resource for Educators



Page 1: The Genomics Era: A Vast Resource for Educators

ASMCUE 2008- “The year genomics bombarded ASMCUE”

David J. Baumler, Genome Center of Wisconsin


The Genomics Era: A Vast Resource for Educators

#1) If you haven't already, download all materials in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/

#2) Download Progressive Mauve at http://asap.ahabs.wisc.edu/mauve/download.php

(Perna et al. Nature 2001)

Page 2: The Genomics Era: A Vast Resource for Educators

Dispel a few myths

-I need a supercomputer to run genome alignments

-There are so many sequenced genomes, why do we need more?

-How do I get students excited and to relate to genomics?

-I have been teaching too long to get into genomics

-the more I use computers in teaching, the more things go wrong

ASMCUE 2007 data

Page 3: The Genomics Era: A Vast Resource for Educators

My teaching philosophy: 3rd dimension of teaching

-Go beyond 2 dimensions with paper and presentations.

-For topics on genomics, you must get computers in the students' hands.

•Look towards the future

-iPhones, small laptops that fit in a Ziploc bag, personal communication devices with fold-out keyboards and magnifying screens

Introductory Biology: “Bring your wireless-ready laptop to class” day

Small laptops. Examples: Eee PC, Classmate, HP Mini-Note, IdeaPad

-One laptop per child... What about one laptop per college student?

-Look into laptop checkout at your campus

2005: 56% of UW-Madison students own a laptop

Photo by Dave Baumler of UW-Madison Introductory Biology class

Page 4: The Genomics Era: A Vast Resource for Educators

[Figure: Projection for wireless internet in college classrooms. x-axis: Year (2003 to 2015); y-axis: % of colleges with wireless classrooms (0 to 100)]

In 2004, only about a third of classrooms provided wireless Internet ... Wireless networks now cover more than half (51.2%) of college classrooms. ...

As of 1/26/2007, according to the Campus Computing Survey

Page 5: The Genomics Era: A Vast Resource for Educators

Today’s session overview:

Introduction

Module #1) Annotate a gene from a phage genome

-key concepts: using ERIC database, BLAST, Interproscan, biological annotations

Module #2) Conduct genome alignments of phage genomes

-using Mauve to conduct whole genome alignments, familiarize yourself with Mauve

Module #3) Compare genomes from 3 outbreaks of E. coli O157:H7

-identify genomic islands using Mauve & conservation of virulence factors

Module #4) Compare genomes from 5 strains of Yersinia pestis

-identify genomic islands and conservation of virulence factors, analyze mutations with phenotypic consequences due to insertion and/or deletion events and single nucleotide polymorphisms (SNPs), and explore paleomicrobiology

Conclusion

difficulty

Page 6: The Genomics Era: A Vast Resource for Educators

Insect Pathogens /Endosymbionts

-Arsenophonus

-Buchnera

-Sodalis

-Wigglesworthia

-Xenorhabdus

Human Pathogens

-Calymmatobacterium

-Cedecea

-Citrobacter

-Edwardsiella

-Enterobacter

-Escherichia

-Ewingella

-Hafnia

-Klebsiella

-Kluyvera

-Leclercia

-Leminorella

-Moellerella

-Morganella

-Plesiomonas

-Proteus

-Providencia

-Rahnella

-Salmonella

-Serratia

-Shigella

-Tatumella

-Yersinia

-Yokenella

Environmental/Animals/Industrial

-Alterococcus

-Budvicia

-Buttiauxella

-Obesumbacterium

-Pragia

-Trabulsiella

Phytopathogens/Plant-associated

-Brenneria

-Dickeya

-Erwinia

-Pantoea

-Pectobacterium

-Phlomobacter

-Saccharobacter

-Samsonia

The ERIC database houses all of the available genomes of the members of the family Enterobacteriaceae, all of which are thought to have descended from a common ancestor

Boxes represent organisms with at least one sequenced genome

Ancestor

Page 7: The Genomics Era: A Vast Resource for Educators

Orthologs

If at least two of these criteria are met for the pair of genes in question, they are typically assigned as orthologs.

•Percentage identity and alignment percentage are in the typical range (see attached spreadsheet).

•Local genome context, the conserved gene is part of an operon with other genes that are already considered orthologs.

•Larger scale conservation of genomic context, the conserved gene is in the same general genomic context as other orthologs.

•Functional conservation, the conserved gene is predicted or known to perform the same function as the potential ortholog in another genome.

[Diagram: Reciprocal best BLAST hits. BlastP with X finds Y, and BlastP with Y finds X; both arrows are labeled >60%.]
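The reciprocal-best-hit idea in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit tables are hand-made stand-ins for BlastP output, the gene names are hypothetical, and the ">60%" label is treated here as a percent-identity cutoff, which the slide does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

# query gene -> list of (subject gene, percent identity); hand-made example data
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str, min_identity: float = 60.0) -> Optional[str]:
    """Return the subject with the highest identity above the cutoff, if any."""
    candidates = [(ident, subj) for subj, ident in hits.get(query, []) if ident > min_identity]
    if not candidates:
        return None
    return max(candidates)[1]


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits,
                         min_identity: float = 60.0) -> List[Tuple[str, str]]:
    """Pairs (gene_a, gene_b) where each gene is the other's best hit."""
    pairs = []
    for gene_a in sorted(a_vs_b):
        gene_b = best_hit(a_vs_b, gene_a, min_identity)
        if gene_b is not None and best_hit(b_vs_a, gene_b, min_identity) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Hypothetical results: genome A searched against genome B, and the reverse.
a_vs_b = {"X": [("Y", 92.0), ("Z", 65.0)], "W": [("Z", 70.0)]}
b_vs_a = {"Y": [("X", 91.5)], "Z": [("X", 66.0), ("W", 61.0)]}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here W's best hit is Z, but Z's best hit is X, so only the X/Y pair is reciprocal. A reciprocal best hit would still count as just one line of evidence alongside the criteria listed above.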

Page 8: The Genomics Era: A Vast Resource for Educators

Enterobacteria cont.

Generated from 180 orthologs (Nicole T. Perna unpublished data)

Page 9: The Genomics Era: A Vast Resource for Educators

ERIC: Enteropathogen Resource Integration Center (http://www.ericbrc.org)

Genomes

Tools & Annotations

Genome Views and Comparisons

Page 10: The Genomics Era: A Vast Resource for Educators

Why Phage? Genomics timeline

-1977: Phage ΦX174, 10 genes

-1982: Phage λ, 46 genes

-1995: Haemophilus influenzae, 1,709 genes

-1996: Saccharomyces cerevisiae, 6,269 genes

-1997: E. coli MG1655, 4,200 genes

-1998: Caenorhabditis elegans, 19,000 genes

-2000: Drosophila melanogaster, 13,000 genes

-2001: Humans, ~30-40,000 genes; E. coli EDL933, 5,200 genes

-2008: 643 complete microbial genomes & 970 in progress

Teach annotation with a phage genome

Page 11: The Genomics Era: A Vast Resource for Educators

Structural annotation consists of the identification of genomic elements (e.g. genes).

•Open Reading Frames (ORFs), also called coding sequences (CDSs), must have a start codon and a stop codon

•Location of regulatory motifs (such as promoters and ribosome binding sites)

•This step is typically automated using gene prediction software (automation only finds ~50-90% of the genes)

Annotation step #1: Structural Annotation

Example of a gene - the start codon is green and the stop codon is red

The genetic code – (Courtesy of http://history.nih.gov)
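As a rough illustration of the start-codon/stop-codon rule for ORFs, the sketch below scans the three forward reading frames of a DNA string. It is a teaching toy, not a gene predictor: it assumes ATG as the only start codon, ignores the reverse strand and regulatory motifs, and the example sequence is invented.

```python
from typing import List, Tuple

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs.

    Positions are 0-based, end is exclusive, and the ORF includes its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Invented sequence with one ORF: ATG AAA TTT TAG
print(find_orfs("CCATGAAATTTTAGGG"))
```

Real gene prediction software weighs much more evidence than this, which is part of why automated predictions miss some genes.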

Page 12: The Genomics Era: A Vast Resource for Educators

Functional annotation consists of attaching biological information to genomic elements.

•biochemical function

•regulation and interactions involved

•expression

•cellular location

Three examples of annotations for one gene:

•Name/synonym: a short “word” used to refer to the gene (Ex. ureC)

•Product: a descriptive protein name (Ex. Urease alpha subunit)

•Function : Describes what the protein does (Ex. Catalyzes the hydrolysis of urea to form ammonia and carbon dioxide)

Annotation step #2: Functional Annotation

Page 13: The Genomics Era: A Vast Resource for Educators

Tools you will use to annotate today

• #1 ERIC database: this is where you will get the sequences and record your functional annotations.

• #2 BLASTP: this is a tool you will use to find similar sequences in the NCBI database of all publicly available known and predicted proteins

• #3 InterproScan: this is a tool you will use to find similar sequences in a database of protein families (groups of related proteins) and domains (functionally significant subregions of proteins)

Note: For background information about Interproscan and Blast, I recommend the book “Bioinformatics for Dummies”.

Page 14: The Genomics Era: A Vast Resource for Educators

We are going to annotate a phage genome today

What type of genes should we anticipate finding in the phage genome?

•Structural components of a phage

•Phage replication proteins

•Machinery for integration into the host genome

•Hypothetical proteins

You are going to annotate the bacteriophage 933W genome. This phage was found in the genome of E. coli O157:H7 strain EDL933. The phage genome contains the genes stx2A and stx2B, which encode Shiga toxin 2, a protein that contributes to disease in humans.

Animation Courtesy of Microbelibrary.org

Page 15: The Genomics Era: A Vast Resource for Educators

Welcome to the Enteropathogen Resource Integration Center.

Using your web browser,

#1) go to http://www.ericbrc.org/

#2) in the upper right portion of the screen click on login

Page 16: The Genomics Era: A Vast Resource for Educators

Click on log on under ERIC user accounts.

Then type in the username and password (case sensitive):

Session #1 username: ASMCUE / password: genome

Session #2 username: ASMCUE2 / password: genomes

Then click the log on button. Note: your class has been given access to a unique version of the genome, in which you and your classmates will be the only people annotating the phage genome.


Page 17: The Genomics Era: A Vast Resource for Educators

Click on Annotations

Then use the pull-down bar to select bacteriophage 933W (the last one on the list), then click the OK button


Page 18: The Genomics Era: A Vast Resource for Educators

Every gene in a genome in the ERIC database has what we call a feature ID, which consists of three capital letters, a dash, and seven digits, for example ABC-1234567.

Your genome will have a unique 3-letter code, and each gene or coding sequence (CDS) will have a unique seven-digit number. Choose the gene from the list that corresponds to your birthday, type in the feature ID, and click Submit.

On the next page, click on the link for the feature ID
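The feature ID format described above (three capital letters, a dash, seven digits) is easy to check before submitting. A minimal sketch, assuming only what the slide states about the format; the function name is made up for illustration and is not part of ERIC.

```python
import re

# Three capital letters, a dash, then exactly seven digits (e.g. ABC-1234567).
FEATURE_ID = re.compile(r"[A-Z]{3}-[0-9]{7}")


def is_feature_id(text: str) -> bool:
    """True if text (ignoring surrounding spaces) looks like a feature ID."""
    return FEATURE_ID.fullmatch(text.strip()) is not None


for candidate in ["ABC-1234567", "abc-1234567", "ABC-123456", " ABC-1234567 "]:
    print(repr(candidate), is_feature_id(candidate))
```

A check like this can catch a mistyped ID (lowercase letters, a missing digit) before a student pastes it into the search box.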


Page 19: The Genomics Era: A Vast Resource for Educators

Your webpage should look like this

On the left there is information about your coding sequence and also some links for tools you will be using

On the right are the annotations, this is where you will be adding annotations

Page 20: The Genomics Era: A Vast Resource for Educators

Let's split up the class

Left half of the classroom: use Interproscan to add annotations; refer to slide #8 and proceed through #14 (in the students instructions for adding annotations.ppt file located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/)

Right half of the classroom: use BlastP at NCBI to add annotations; refer to slide #15 and proceed through #21 (in the students instructions for adding annotations.ppt file located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/)

-If there is no good match, it is called a hypothetical protein

-add an annotation for product as hypothetical protein

-use Unpublished Sequence analysis as Evidence

-type in author name, email

-submit to Database

Page 21: The Genomics Era: A Vast Resource for Educators

Once you have completed the annotations for your gene(s), you can view the genome of the phage and see how your classmates are doing by clicking on Show Feature Context (GaPP)

Page 22: The Genomics Era: A Vast Resource for Educators

A new window will appear in a few seconds.

The gene you are working on is highlighted in blue, and you are viewing the entire bacteriophage 933W genome. Scroll over each gene (in pink) and you should see the name and the product information in the boxes below the genome. You can also double click any of the genes, and your web browser will open its annotation page in ERIC, where you can view the function annotation, evidence, etc.

Page 23: The Genomics Era: A Vast Resource for Educators

Learning assessment: Pre and Post test

[Figure: bar chart of pre-test and post-test results. x-axis: Question (Q1 to Q5); y-axis: Percentage with correct response (0 to 100); series: pre, post]

#1. Within a sequenced microbial genome, identification of a gene predicted to encode a protein should contain which of the following characteristics?

#2. What percentage of the protein coding genes do you think automated computer approaches applied to a newly sequenced microbial genome will find:

#3. What type of biological annotation cannot be assigned to a newly sequenced gene based solely on comparisons to known protein/gene(s)?

#4. In a newly sequenced microbial genome, every identified gene produces a protein that is similar to a known protein?

#5. Which of these web-based resources are useful to find biological information about a gene sequence?

“I really enjoyed learning more about bacterial genetics and the tools that are available online for genomic research and gene identification. This is an area of bacteriology that I have little experience in and I think that having experience using these websites will prove valuable as my research continues.” –UW-Madison student in Bacteriology 650

“The concepts of using BLAST and Interproscan are pretty neat, and it is great that anyone can access this information, not just the insider scientists that put it together. Thank you for teaching our class how to use these tools! I doubt I would have ever learned this stuff on my own had you not taught us.” – UW-Madison student in Bacteriology 650

Student Testimonials

P<0.2 P<0.02 P<0.01

Page 24: The Genomics Era: A Vast Resource for Educators

Module #2 Conduct genome alignments of phage genomes

-This module was developed to teach the use of Mauve with enterobacteria phage

-Phage genomes can be aligned using Mauve in a matter of minutes.

-Applicable as a teaching tool to decipher the mosaicism of phage genomes.

-Comparative studies of 30 mycobacteriophage genomes reveal new insights into their diverse architecture and into gene exchange (Hatfull et al. PLoS Genetics 2006)

You could align EVERY mycobacteriophage genome using Mauve!!!

-How diverse are enterobacteriophage?

(The following series of slides are Mauve alignments of phage isolated from E. coli, Salmonella spp., Yersinia spp., and Shigella spp.; all alignments are also provided for further inquiry.)

-Since we just annotated a stx2-containing phage from E. coli O157:H7, we will run alignments with 3 phage genomes

Page 25: The Genomics Era: A Vast Resource for Educators

Mauve: Multiple Genome Aligner

• Able to identify and align collinear regions of multiple genomes even in the presence of rearrangements

• Find and extend seed matches

• Group into locally collinear blocks

• Align intervening regions

(Darling et al. Genome Res. 2004 Jul;14(7):1394-403.)
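To give a feel for "find and extend seed matches", here is a toy two-sequence version in Python. It is not Mauve's algorithm (Mauve handles multiple genomes, rearrangements, and far more sophisticated seeding); it only assumes that a seed is an exact k-mer occurring once in each sequence and that extension means growing the exact match in both directions. The sequences are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple


def unique_kmers(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in seq to its position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def extend(a: str, b: str, i: int, j: int, k: int) -> Tuple[int, int, int]:
    """Grow the exact match a[i:i+k] == b[j:j+k]; return (start_a, start_b, length)."""
    while i > 0 and j > 0 and a[i - 1] == b[j - 1]:
        i, j, k = i - 1, j - 1, k + 1  # extend to the left
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1  # extend to the right
    return i, j, k


def seed_matches(a: str, b: str, k: int = 4) -> List[Tuple[int, int, int]]:
    """Extended matches seeded by k-mers that are unique in both sequences."""
    ka, kb = unique_kmers(a, k), unique_kmers(b, k)
    return sorted({extend(a, b, ka[s], kb[s], k) for s in ka.keys() & kb.keys()})


# Invented sequences sharing the 7-base region ACGTTCA.
print(seed_matches("GGACGTTCAGG", "TTACGTTCATT"))
```

Several seeds inside the same conserved region extend to the same match, so they collapse into one entry; grouping such matches into locally collinear blocks is the next step listed on the slide.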

Page 26: The Genomics Era: A Vast Resource for Educators

Module #2 Understanding phage, the viruses that infect microorganisms, via genome alignments

I recently aligned 56 enterobacterial phage genomes. Phage genomes are ideal training tools for teaching how to set up Mauve alignments. In the ASMCUE2008 folder, under module #2, you are provided with ~50 enterobacteriophage genome files to conduct alignments.

Page 27: The Genomics Era: A Vast Resource for Educators

Step #1: Copy the folder called 3 phage genomes for ASMCUE workshop, and paste it on the hard drive of your computer (C: drive)

Step #2: From the Start menu, under Programs, select Mauve 2.1.1

Step #3: Under the File pull-down, select Align with progressive Mauve

This new window will appear

Step #4: Click here to choose where to send the output file, find the folder (from Step #1), and double click on the folder

Step #5: Type in a file name, and click on Save

Page 28: The Genomics Era: A Vast Resource for Educators

Next add the sequences to align

Click on Add sequence

Select the first phage genome and click on Open, then continue with the 2nd and 3rd phage genomes. Then click on Align to start the genome alignment
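For classes that prefer scripting, the same alignment can be started outside the graphical interface. The sketch below only builds (and, if possible, runs) a command line; it assumes the progressiveMauve executable is installed and on the PATH, and that it accepts an --output=<file> option followed by the genome files. The file names are hypothetical placeholders for the three phage genomes.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_mauve_command(output: str, genomes: Sequence[str],
                        exe: str = "progressiveMauve") -> List[str]:
    """Build the argument list: executable, --output=<file>, then the genome files."""
    if len(genomes) < 2:
        raise ValueError("an alignment needs at least two genome files")
    return [exe, f"--output={output}", *genomes]


# Hypothetical file names standing in for the three phage genomes.
genomes = ["phage1.gbk", "phage2.gbk", "phage3.gbk"]
cmd = build_mauve_command("three_phage.xmfa", genomes)
print(" ".join(cmd))

# Only run the aligner if it is installed and the input files exist.
if shutil.which(cmd[0]) and all(Path(g).exists() for g in genomes):
    subprocess.run(cmd, check=True)
```

The resulting alignment file could then be opened in the Mauve viewer in the same way as the alignments produced through the menus.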

Page 29: The Genomics Era: A Vast Resource for Educators

When viewing the LCBs, Mauve displays regions that are highly conserved/identical in full color.

Areas that are unique/variable to one genome appear in white and represent unique islands

Page 30: The Genomics Era: A Vast Resource for Educators

Your tool bar is at the top left. The tools you will use are in the View pull-down and also on the buttons:

-Search for features

-Zoom in/out (you can also hold down the Ctrl button and use the arrows on the keyboard)

-Move left or right; you will find this useful to center a region of interest in the middle of the screen prior to zooming in

-Returns the viewer back to home

Page 31: The Genomics Era: A Vast Resource for Educators

Other useful commands in Mauve

Function                               Key
Zoom in                                Ctrl+Up
Zoom out                               Ctrl+Down
Scroll left                            Ctrl+Left
Scroll right                           Ctrl+Right
Export the current view as an image    Ctrl+E

Page 32: The Genomics Era: A Vast Resource for Educators

Module #3) Dissecting virulence of E. coli O157:H7 using genome alignments

Page 33: The Genomics Era: A Vast Resource for Educators

-determination of the complete E. coli sequence required almost 6 years

-E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism

(Blattner et al. Science 1997)

The first E. coli genome sequenced was the non-pathogenic E. coli K-12 genome MG1655

Page 34: The Genomics Era: A Vast Resource for Educators

(Perna et al. Nature 2001)

-In 1982, Escherichia coli O157:H7 was recognized as a human pathogen

-The sequenced strain, EDL933, came from the 1982 Michigan outbreak linked to ground beef

-Shiga toxin-producing E. coli (STEC)

The first pathogenic E. coli genome sequenced was enterohaemorrhagic (EHEC) Escherichia coli O157:H7 strain EDL933

Page 35: The Genomics Era: A Vast Resource for Educators

(Hayashi et al. DNA Res. 2001)

-In July 1996, an outbreak of Escherichia coli O157:H7 infection occurred among schoolchildren in Sakai City, Osaka, Japan.

-8,938 schoolchildren sickened, 3 deaths

-We are starting to ask: What genomic differences determine differences in virulence, epidemiology, and fatality?

The completion of the 2nd E. coli O157:H7 (EHEC) sequence: strain Sakai

Page 36: The Genomics Era: A Vast Resource for Educators

2006 E. coli O157:H7 outbreak from bagged spinach (from CDC)

-multistate outbreak

-205 people sickened, 3 deaths

-Produce-associated outbreak strains caused a higher incidence of hemolytic-uremic syndrome (HUS) (Manning et al. PNAS 2008)

-genome alignments can be used to find variations

Page 37: The Genomics Era: A Vast Resource for Educators

Currently there are 13 sequenced E. coli O157:H7 genomes. We will have you focus on three that are all in the Enteropathogen Resource Integration Center (ERIC) database (www.ericbrc.org)

The three strains you will focus on are:

Escherichia coli EDL933 (EHEC) - 1982 ground beef outbreak

Escherichia coli Sakai (EHEC) (also called RIMD) - 1996 radish sprout outbreak

Escherichia coli EC4042 (EHEC) - 2006 fresh bagged spinach outbreak

Page 38: The Genomics Era: A Vast Resource for Educators

In your Start menu under Programs, go to Mauve 2.1.1 and start up Mauve. Notice there is a user's guide in PDF form in this folder; it contains useful information and commands for navigating.

Note: your computer may need to update Java, since Mauve runs on the Java platform.

You should see a window for Mauve appear

Page 39: The Genomics Era: A Vast Resource for Educators

Next, double click on the 3 O157H7 folder in the ASMCUE2008 folder. It should contain the following 19 files. Take the first one (3 O157 alignment) and drag and drop it into the Mauve window.

It should start to say "reading sequences" here, and in a few seconds the alignment will appear. Note: computers with less than 512 MB of RAM may not be able to open the file.

Page 40: The Genomics Era: A Vast Resource for Educators

Your alignment should look like this

Organism name: notice that the first is EDL933, the second is RIMD (Sakai), and the third is EC4042 (spinach)

Using the up or down arrows, you can switch the position of the genomes

Page 41: The Genomics Era: A Vast Resource for Educators

The colored blocks are called locally collinear blocks (LCBs) and represent regions of the genome that Mauve has identified as conserved. The lines connect the LCBs. Notice that some are in different positions in the other genomes; some are inverted and appear on the bottom strand of the double-stranded genome.

Top strand

Bottom strand

Page 42: The Genomics Era: A Vast Resource for Educators

Notice that when you scroll (slowly) over a white region (island), the black boxes pause in the other genomes, then come back once you have passed over the island and back into conserved regions

Page 43: The Genomics Era: A Vast Resource for Educators

When you move your mouse over a region of one genome, it will show a black box and also show the corresponding region (boxes) in the other two genomes. Try scrolling left to right on one genome.

Page 44: The Genomics Era: A Vast Resource for Educators

If you would like to look at all three LCB’s, even though one is in a different position, scroll over one LCB and click the mouse button

Page 45: The Genomics Era: A Vast Resource for Educators

Let's use the zoom function. Press the home button to restore the alignment to the original view.

Now click on the white island in the top genome and, using the right button, bring it to the center of the screen. Then zoom in multiple times.

You will start to see the genes. Scroll over one and pause, and a window will pop up with the product annotation, so here you can view which genes are present in this EDL933 island and not in the other two genomes.

Page 46: The Genomics Era: A Vast Resource for Educators

Now place your mouse over one of the genes; in my example I have iha (irgA homolog adhesin)

Click your mouse once on the gene, and a window will pop up; scroll down and select View CDS iha in ERICdb

This will open the page in the ERIC database for that gene, containing all of the annotations. You can look to see if it is involved in virulence.

Page 47: The Genomics Era: A Vast Resource for Educators

Let's use the search feature

#1) Click on the search feature

#2) Choose a genome (EDL933)

#3) Type in a gene name (stx2A)

#4) Click on search

Page 48: The Genomics Era: A Vast Resource for Educators

Notice that it has found the stx2A gene (highlighted in blue), and also in the RIMD strain. Just because it isn't aligned in the EC4042 strain does not mean it isn't there; if you look to the right in the EC4042 genome, you will find it.

Stx2A

Page 49: The Genomics Era: A Vast Resource for Educators

One last feature you can use in Mauve: to find an island that is in 2 out of 3 strains, use the backbone view

Press the home button first

Then go to the View pull-down, select color scheme, then backbone color

Page 50: The Genomics Era: A Vast Resource for Educators

Your alignment should look like this in backbone color. Regions present in all three genomes appear light purple; regions in other colors are present in only 2 out of 3 genomes (you may have to zoom in a bit to see these regions).

Regions in only EDL933 and RIMD appear olive green

Regions in only EDL933 and EC4042 appear maroon

Regions in only RIMD and EC4042 appear tan/brown

This is how you identify islands unique to 2/3 strains
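The backbone coloring boils down to asking, for each aligned segment, which genomes contain it. The sketch below does that count in Python. It assumes a tab-delimited table laid out like the `.backbone` file that progressiveMauve writes next to an alignment (a left-end/right-end column pair per genome, with 0 0 meaning "absent" and negative values meaning the reverse strand); treat that layout as an assumption and check the Mauve user guide. The coordinates are invented.

```python
from typing import Dict, List, Tuple

GENOMES = ["EDL933", "RIMD", "EC4042"]  # order of the genomes in the alignment


def classify_segments(table: str, genomes: List[str] = GENOMES) -> Dict[Tuple[str, ...], int]:
    """Count segments by the set of genomes that contain them."""
    counts: Dict[Tuple[str, ...], int] = {}
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        cols = [int(c) for c in line.split("\t")]
        present = tuple(
            name for idx, name in enumerate(genomes)
            if cols[2 * idx] != 0 or cols[2 * idx + 1] != 0
        )
        counts[present] = counts.get(present, 0) + 1
    return counts


# Invented example rows, not real coordinates.
example = "\n".join([
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\tseq2_leftend\tseq2_rightend",
    "1\t5000\t1\t5000\t1\t5000",
    "5001\t7000\t5001\t7000\t0\t0",
    "7001\t9000\t0\t0\t0\t0",
    "0\t0\t7001\t8000\t-6000\t-5001",
])
for present, n in sorted(classify_segments(example).items()):
    print(", ".join(present), "->", n, "segment(s)")
```

Segments found in exactly two genomes correspond to the olive green, maroon, and tan/brown regions described above; segments found in one genome are the unique islands.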

Page 51: The Genomics Era: A Vast Resource for Educators

Learning assessment results Module #3

Individual projects: How did they do? Scores range from 16-20/20, avg. 18.5

#1) (5 pts) Run a BLAST analysis with your virulence gene against the other two strains and provide the % identity results in a list or table. Is the gene (and the corresponding protein) conserved in all 3 genomes? Are they all the same length? Is there more than one copy? Are they present in the Mauve genome alignment in the 3 genomes? Provide the coordinate positions or create an image to include.

#2) (5 pts) How is this gene involved in virulence? Briefly summarize the supporting evidence by clicking the link from the ERIC database subsystem virulence or putative virulence factor and reading the evidence.

#3) (5 pts) Is this gene or a homolog found in other Enterobacteria? (Hint: run a BLAST in ERIC against all other organisms.) Is this gene or a homolog found in other microorganisms? (Hint: run a BLAST search at NCBI against all bacteria and archaea. Briefly provide the five best BLAST “hits” with % identity.)

#4) (5 pts) Using Mauve, identify a unique island in one strain and briefly summarize the predicted products that it contains (provide coordinates or an image). Identify a region that is unique to two strains and briefly summarize the predicted products (provide coordinates or an image). Overall, based on your analysis of your two identified regions, do you think that they play a role in virulence and evolution of E. coli O157:H7 genomes? How important do you think phage are in variation of the genomes?

Student testimonials:

“I think it was an approach that is valuable because we learn about some of these virulence factors they find in a strain or how two strains are similar to each other, but we don’t see how that information is found. It gives a better understanding of what you can learn by looking at genomes and the comparison of different genomes. I think it would be easier to follow and do the assignment if you provide some details in your slides or a handout on exactly what you click on to do what is needed.”

“I know you wanted constructive criticism, but I don't actually have any. Once things were explained, i had a really easy time doing things. I actually don't like genetics that much, I usually find it kind of boring, but it was kind of fun working with Mauve. It's cool being able to do all of that stuff!”

Page 52: The Genomics Era: A Vast Resource for Educators

Using genomics to track the dissemination of Yersinia pestis strains

Courtesy of www.cdc.gov

Deng et al. 2002 J. Bacteriol. 184(16):4601-4611

Page 53: The Genomics Era: A Vast Resource for Educators

Transmission cycle of Plague

Page 54: The Genomics Era: A Vast Resource for Educators

The 3 historic pandemics of plague

-Pandemic: an epidemic that spreads throughout the human population across a large region, such as a continent, or worldwide

-1st pandemic: ~550 A.D., confined mainly to Africa and some parts of the Middle East

-2nd pandemic: originated in Central Asia and spread via trading routes into Europe (killed ~30% of Europe's population)

-3rd pandemic: started in the 1850s in China's Yunnan province, confined mainly to Asia

Courtesy of edsitement.neh.gov

Page 55: The Genomics Era: A Vast Resource for Educators

Older methods for comparison of two genomes of Yersinia pestis CO92 & KIM were not interactive

Parkhill et al. 2001 Nature 413:523-527; Deng et al. 2002 J. Bacteriol. 184(16):4601-4611

FIG. 2. Comparison of KIM and CO92 at the DNA level. The outer circles show the CO92 C-G skew. The second circle shows CO92 IS elements: IS100 (red), IS1541A (blue), IS285 (green), and IS1661 (yellow); short ticks represent partial IS elements. The third circle shows CO92 rRNA operons. The fourth circle shows the CO92 genome in 27 blocks (numbered according to KIM genome order), regions that are conserved by both locations and orientations (red), a single intrareplichore inversion region (yellow), multiple-inversion regions (various blues), and genome-specific sequences (green). The inner four circles show KIM rRNA operons, the KIM genome in blocks, KIM IS elements, and KIM C-G skew. Colors are coded as for CO92. (Deng et al. 2002)

Page 56: The Genomics Era: A Vast Resource for Educators

As of 05/2008 there are 7 complete and 14 draft Y. pestis genomes

Traditionally the strains are classified as serovars (Antiqua, Mediaevalis, Orientalis, and other) based on the following phenotypic characteristics:

-Antiqua = East Africa: (glycerol positive, arabinose positive, and nitrate positive)

-Mediaevalis = Central Asia: (glycerol positive, arabinose positive, and nitrate negative)

-Orientalis = Central Asia: (glycerol negative, arabinose positive, and nitrate positive)

-other (i.e., Microtus, Pestoides): not consistent for these phenotypes
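The phenotype-based classification listed above is effectively a lookup table, which can be written out as a short Python sketch. The function and variable names are made up for illustration; the three phenotype combinations are taken directly from the list, and any other combination falls into "other".

```python
from typing import Dict, Tuple

# (glycerol, arabinose, nitrate) -> classification, as listed on the slide
PHENOTYPE_TABLE: Dict[Tuple[bool, bool, bool], str] = {
    (True, True, True): "Antiqua",
    (True, True, False): "Mediaevalis",
    (False, True, True): "Orientalis",
}


def classify_strain(glycerol: bool, arabinose: bool, nitrate: bool) -> str:
    """Return the classification for a phenotype, or 'other' if none matches."""
    return PHENOTYPE_TABLE.get((glycerol, arabinose, nitrate), "other")


print(classify_strain(glycerol=True, arabinose=True, nitrate=False))
print(classify_strain(glycerol=False, arabinose=False, nitrate=True))
```

Students can use a table like this to predict which of the genes glpD, napA, or araC is likely to carry a disabling mutation in a given strain before looking at the sequence.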

Page 57: The Genomics Era: A Vast Resource for Educators

Partial view of the grave in Dreux investigated in this work, which illustrates anthropologic features of a mass grave suitable for paleomicrobiology research. (courtesy of www.cdc.gov)

Paleomicrobiology

-the prefix paleo comes from the Greek word palaios, meaning “ancient”

-bacterial colonization of dental pulp can occur during bacteremia

-Bacteremia (also known as plague septicaemia with Y. pestis) is the presence of bacteria in the blood

Courtesy of www.nidcr.nih.gov

Page 58: The Genomics Era: A Vast Resource for Educators

Figure 1   The original protocol developed in our study allows recovering the dental pulp and minimizes the risk of laboratory-acquired contamination of the specimen. The tooth was encasted into sterile resin (1a) ; the apex was sterily sectioned (1b) to give access to the canal system (1c) ; solutions were injected (1d) ; after incubation, the tooth was put upside down into sterile tube (1e) and centrifuged (1f).

Tran-Hung et al. PLoS ONE v.2(10); 2007

Extraction of bacterial DNA from Dental pulp

-Some historians believed that a flu-like virus and not Y. pestis was responsible for the 1st and 2nd pandemics

-DNA detected in dental pulp confirms that Y. pestis was the cause

-Which serovar(s) are most similar to the Y. pestis strain(s) from the dental pulp from the corpses?

Page 59: The Genomics Era: A Vast Resource for Educators

Use of genomic tools to study Y. pestis

Concepts in this module that you will address:

#1) Mutations that prevent production of a fully functional gene product and so have phenotypic consequences (insertions, deletions, single nucleotide polymorphisms [SNPs]), studied in the genes glpD, napA, and araC

#2) Paleomicrobiology investigation: determine which serovar(s) have the most similar matching genes compared to the amplified sequence from the dental pulp of 3 corpses.

#3) Use of genome alignments: identify an island that is shared by the 4 genomes of strains that infect humans and is absent in Y. pestis strain 91001

#4) Determine the conservation of a virulence factor in the 5 strains in the genome alignment. Determine if it is a fully functional product in strain 91001.

Page 60: The Genomics Era: A Vast Resource for Educators

Next, double click on the uncompressed Yersinia pestis alignment 5 genome folder. It should contain the following 29 files. Take the one named yersinia_pestis_alignment_5genomes and drag and drop it into the Mauve window.

It should start to say "reading sequences" here, and in a few seconds the alignment will appear. Note: computers with less than 512 MB of RAM may not be able to open the file.

Page 61: The Genomics Era: A Vast Resource for Educators

Your alignment should look like this

Organism name: notice that the first is CO92, the second is KIM, the third is 91001, the fourth is Antiqua, and the fifth is Nepal516

Using the up or down arrows, you can switch the position of the genomes

Page 62: The Genomics Era: A Vast Resource for Educators

You may find it easier to view the 5 genome alignment without the connecting lines: on your keyboard press Shift+L (pressing this again makes them reappear)

Page 63: The Genomics Era: A Vast Resource for Educators

Now place your mouse over one of the genes.

Click your mouse once on a gene, and a window will pop up; scroll down and select View CDS in ERICdb

This will open the page in the ERIC database for that gene, containing all of the annotations. You can look to see what is known about it and/or if it is involved in virulence (note: you may be sent to a log-in screen; click on the button that says “Enter ASAP”)

Page 64: The Genomics Era: A Vast Resource for Educators

Let's use the search feature to find the genes glpD, napA, and araC

#1) Click on the search feature

#2) Choose a genome or search all of the genomes

#3) Type in a gene name (glpD)

#4) Click on search

Page 65: The Genomics Era: A Vast Resource for Educators

Notice that it has found the glpD gene (highlighted in blue), and also a corresponding gene in each genome. You need to determine which of the five CDSs produce the full-length functional protein.

Method #1: click on each gene and go to View CDS in ERICdb. Look at the length and check whether any are labeled as pseudogenes. If so, look for a note that describes why the gene is thought to be a pseudogene
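The length comparison in Method #1 can be sketched in a few lines of Python. This is only an illustration: the genome labels, the CDS lengths, and the 95% cutoff are all made-up placeholders, not values taken from ERIC.

```python
# Hypothetical CDS lengths (in base pairs) for one gene in five genomes.
# Labels and numbers are placeholders for illustration, not real ERIC data.
cds_lengths = {
    "genome_1": 1500,
    "genome_2": 1500,
    "genome_3": 1500,
    "genome_4": 1407,
    "genome_5": 1500,
}


def flag_short_cds(lengths, min_fraction=0.95):
    """Return genomes whose CDS is shorter than min_fraction of the longest copy."""
    longest = max(lengths.values())
    return sorted(g for g, n in lengths.items() if n < min_fraction * longest)


# Copies flagged here are candidates for a closer look in ERIC
# (for example, a pseudogene note explaining a deletion).
short_copies = flag_short_cds(cds_lengths)
print(short_copies)
```

A shorter CDS is only a hint; the annotation note in ERIC is still what tells you why a copy is considered a pseudogene.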

Page 66: The Genomics Era: A Vast Resource for Educators

Identifying mutations in glpD, napA, and araC cont.

Method #2: from the feature page in ERIC

Scroll down to the feature context part of the page

This is a list of all features that neighbor your gene in the genome. Notice that some are upstream, some are downstream, and some are contained within the gene

Notice that contained within your glpD gene there are polymorphic sites (otherwise known as SNPs)

For SNP analysis, you will use a new tool called “Snippy”

Page 67: The Genomics Era: A Vast Resource for Educators

In a new tab or web browser window go to http://asap.ahabs.wisc.edu/~cabot/aep/snippy.php

It should look like this:

Highlight and copy all feature IDs for polymorphic sites from glpD, paste them into here, and click submit

feature IDs

Page 68: The Genomics Era: A Vast Resource for Educators

In the middle of each region you will see the polymorphic site (in this case, capital Gs) and the corresponding base in each genome. Note that you are interested in variations among YPKIM, YPCO92, YP91001, YPNepal, and YpAntiqua.

-in this case there is no difference among these 5 genomes. Scroll down and check the remaining polymorphic sites to see whether any of them differ among the 5 genomes; if none do, the mutation is probably a larger deletion or insertion event
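The check described above (does any polymorphic site actually differ among the five genomes?) can be sketched as follows. The genome labels follow the Snippy output, but the site names and bases are invented for illustration.

```python
# Hypothetical bases at three polymorphic sites; site IDs and bases are made up.
sites = {
    "site_1": {"YPKIM": "G", "YPCO92": "G", "YP91001": "G", "YPNepal": "G", "YpAntiqua": "G"},
    "site_2": {"YPKIM": "C", "YPCO92": "C", "YP91001": "T", "YPNepal": "C", "YpAntiqua": "C"},
    "site_3": {"YPKIM": "A", "YPCO92": "A", "YP91001": "A", "YPNepal": "A", "YpAntiqua": "A"},
}


def variable_sites(site_table):
    """Return the IDs of sites where the genomes do not all share the same base."""
    return [site_id for site_id, bases in site_table.items()
            if len(set(bases.values())) > 1]


differing = variable_sites(sites)
if differing:
    print("Sites that differ among the 5 genomes:", differing)
else:
    print("No SNP differences; consider a larger insertion or deletion")
```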

Page 69: The Genomics Era: A Vast Resource for Educators

In your SNP analysis, you want to look for SNPs that change the encoded amino acid. In some cases the change creates a premature stop codon, which may generate a truncated, non-functional protein

#1) Note that Snippy shows whether the SNP variation results in an amino acid change, in this case A (alanine) to T (threonine)

#2) In this second SNP, the change resulted in a stop codon
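To see why a single base change can swap an amino acid or end the protein early, here is a minimal Python sketch using the standard genetic code. It is not how Snippy is implemented; the function name and the example codons (GCA and TGG) are chosen only for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snp_effect(codon, position, new_base):
    """Classify a single-base change within one codon (position is 0, 1, or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # premature stop codon
    return f"missense {before}->{after}"


print(snp_effect("GCA", 0, "A"))  # alanine codon GCA becomes threonine codon ACA
print(snp_effect("TGG", 2, "A"))  # tryptophan codon TGG becomes stop codon TGA
```

A "nonsense" result corresponds to the premature stop codon case, where the protein is likely to be truncated.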

Page 70: The Genomics Era: A Vast Resource for Educators

Using the DNA sequences obtained from the dental pulp of three corpses (found in the file called Ypestis corpse and CA88-4125YPE genes.doc), conduct a BlastN search within the ERIC database with each sequence against the 91001, Nepal, KIM, Antiqua, and CO92 genomes. For each of the three corpses, which serovar is most similar to the strains that caused the 1st and 2nd pandemics?

From the ERIC home page you can select to run a Blast search here

(http://www.ericbrc.org/)

Page 71: The Genomics Era: A Vast Resource for Educators

Paste the first nucleotide sequence from corpse #1

Select entire genomes

Select the genomes to query: hold down the Ctrl key and select Y. pestis genomes 91001, Antiqua, CO92, KIM, and Nepal

Finally, click the Submit Query button, then repeat with the sequences from the other two corpses
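The idea behind ranking the BlastN hits ("which genome is most similar to my query?") can be illustrated with a much simpler stand-in. The sketch below is not BLAST: it only counts matching positions between equal-length, already-aligned sequences, and the sequences and genome labels are invented.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up query and matching regions from three hypothetical genomes.
query = "ATGGCTAAGC"
candidates = {
    "genome_A": "ATGGCTAAGC",
    "genome_B": "ATGGCTTAGC",
    "genome_C": "ATGACTTAGC",
}

# Rank genomes from most to least similar to the query.
ranked = sorted(candidates,
                key=lambda name: percent_identity(query, candidates[name]),
                reverse=True)
for name in ranked:
    print(name, percent_identity(query, candidates[name]))
```

BlastN does much more (local alignment, gaps, statistical scoring), so use its reported identity and score values for the actual exercise.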

Page 72: The Genomics Era: A Vast Resource for Educators

Next, repeat the BlastN process using the gene sequences from a known North American ancestor (Y. pestis CA88-4125/YPE) for glpD, napA, and araC. Of the 5 genomes (91001, Antiqua, CO92, KIM, and Nepal) representing the three serovars, which is most similar to the known North American ancestor?

Based on your analysis, did Y. pestis arrive in North America via shipping routes over the Atlantic or the Pacific?

Atlantic?

(Serovar Antiqua of African origin)

Pacific?

(Serovar Orientalis or Mediaevalis of Asian origin)

Courtesy of education.usgs.gov

Page 73: The Genomics Era: A Vast Resource for Educators

Your alignment should look like this in backbone color. Regions present in all five genomes appear in light purple. Regions in other colors are present in only 2, 3, or 4 of the 5 genomes (you may have to zoom in a bit to see these regions)

Look for a region in the lightest blue color that is present in CO92, KIM, Antiqua, and Nepal, but absent in the 91001 strain. Analyze the contents and determine whether any of the genes may contribute to human infection by Y. pestis.
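The logic of this search (present in four genomes, missing only from 91001) can be written as a small set comparison. The strain names come from the alignment, but the region names and their presence patterns below are hypothetical, not read from the Mauve file.

```python
ALL_GENOMES = {"CO92", "KIM", "91001", "Antiqua", "Nepal516"}

# Hypothetical regions and the genomes each one is found in.
regions = {
    "region_1": {"CO92", "KIM", "91001", "Antiqua", "Nepal516"},  # shared by all five
    "region_2": {"CO92", "KIM", "Antiqua", "Nepal516"},           # missing only in 91001
    "region_3": {"CO92", "KIM"},                                  # found in two genomes
}


def regions_missing_only_from(region_table, all_genomes, absent_genome):
    """Return regions found in every genome except absent_genome."""
    target = set(all_genomes) - {absent_genome}
    return [name for name, present in region_table.items()
            if set(present) == target]


candidate_islands = regions_missing_only_from(regions, ALL_GENOMES, "91001")
print(candidate_islands)
```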

Page 74: The Genomics Era: A Vast Resource for Educators

Thanks for your time

Collaborators:

Dr. Kai F. (Billy) Hung (UW-Madison/Assistant Prof. at Eastern Illinois University, Fall 2008)

Dr. Amy C. Wong (UW-Madison)

Dr. Lois Banta (Williams College)

Mentors:

Dr. Nicole Perna (UW-Madison)

Dr. Charles Kaspar (UW-Madison)

Dr. Jeffrey Byrd (St. Mary’s College)

Dr. Bob Kadner and the ASM Summer Institute

Thank you: everyone on the ERIC database team (especially Guy Plunkett III for setting up module #1 & Eric Cabot for making Snippy) and all of the members of the Perna Genome Evolution Laboratory

Funding: This project has been funded with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract No. HHSN266200400040C