Virology 345 (2006) 88–95

www.elsevier.com/locate/yviro

High throughput sequence analysis reveals hitherto unreported recombination in the genus Norovirus

Graham J. Etherington a,b,*, Jo Dicks b, Ian N. Roberts a

a Department of Food Safety Science, Institute of Food Research, Norwich Research Park, Colney Lane, Norwich NR4 7UA, UK

b Computational Biology Group, John Innes Centre, Norwich Research Park, Colney Lane, Norwich NR4 7HA, UK

Received 2 June 2005; returned to author for revision 17 September 2005; accepted 23 September 2005

Available online 2 November 2005

Abstract

Viruses of the Norovirus genus (Caliciviridae family) are a major cause of human gastroenteritis. Recombination is an important evolutionary process in some viruses, so it is worth establishing how often such events occur in Noroviruses and what form they take. To identify recombination events, multiple sequence alignments were assembled from publicly available strains and tested using RAT, a recently developed software tool. Strains identified by RAT as putative recombinants were tested further, using a phylogenetic approach, the LARD software, and a Monte Carlo method, to gain additional support for their status. Two previously described recombinants, WUG1 and Snow Mountain, were identified. Furthermore, three instances of hitherto unreported recombination implicating Norovirus strains MD 145-12, Gifu’96 and Saitama U4 were found, with good statistical support for the latter two of these cases. Lordsdale-like viruses were highlighted as major contributors to recombination events during Norovirus evolution. Finally, the relevance of recombinants to the worldwide transmission of Norovirus is discussed.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Calicivirus; Norovirus; Norwalk virus; Recombination; Gastroenteritis; Recombination analysis tool

0042-6822/$ - see front matter © 2005 Elsevier Inc. All rights reserved.

doi:10.1016/j.virol.2005.09.051

* Corresponding author. Department of Food Safety Science, Institute of Food Research, Norwich Research Park, Colney Lane, Norwich NR4 7UA, UK. Fax: +44 1603 450595.

E-mail address: [email protected] (G.J. Etherington).

Introduction

The Caliciviridae are a family of positive-sense, single-stranded RNA viruses that comprise the genera Vesivirus, Lagovirus, Norovirus, and Sapovirus and infect a broad host range of animals that include reptiles, cetaceans, cattle, lagomorphs (rabbits and hares), pigs, cats, skunks, chimpanzees, and humans (Van Regenmortel et al., 2000). They have a genome of approximately 7.5 kilobases (kb), usually containing 3 open reading frames (ORFs). ORF1 encodes the non-structural proteins, ORF2 encodes the capsid protein, and ORF3 encodes a minor structural protein. In some Caliciviruses, ORF1 and ORF2 are fused to form a contiguous polyprotein (Clarke and Lambden, 1997). Caliciviruses in the genera Norovirus and Sapovirus are the major cause of viral gastroenteritis in humans (Koopmans and Duizer, 2004) with symptoms that may include nausea, diarrhoea, vomiting, abdominal cramping, fever, and general malaise (Hardy and Estes, 1996; Van Regenmortel et al., 2000). Norovirus was found to be the most common cause of intestinal disease in the UK, infecting between 20 and 160 people per month, per 100,000 population (FSA, 2000). In the USA, Norovirus is the most common cause of gastroenteritis, with an estimated 23 million cases annually (CDC, 2003).

On the basis of sequence variation, the Noroviruses are represented by at least three genogroups (Green et al., 1994; Oliver et al., 2003). Several studies have indicated that within-genogroup recombination in Noroviruses gives rise to hybrid strains capable of causing outbreaks in human populations. For example, the strain Arg320 was suggested to be a recombinant of Lordsdale virus and Mexico virus (Jiang et al., 1999). Phylogenetic analysis showed that Norovirus strain Seacroft/90/UK was a recombinant, with sequence from Leeds/90/UK occurring in ORF1, and sequence from another, unknown, strain occurring in ORF2 (Vinje et al., 2000). In the same study, Wortley/90/UK was also demonstrated to be a recombinant of Lordsdale virus in ORF1 and an unknown strain in ORF2. Phylogenetic and distance analysis of Norovirus Mc37, isolated in Thailand (Hansman et al., 2004), showed it to be a recombinant, with the Saitama U1 virus, isolated in Japan, as a potential parent of the ORF1 region. Using a similar method, the genomes of both Norovirus WUG1 and Saitama U1 were also found to be recombinants of Southampton virus and BS5, and of Lordsdale virus and an unknown virus, respectively (Katayama et al., 2002). Also, Snow Mountain virus was found to be a recombinant of the Melksham and Hawaii viruses (Lochridge and Hardy, 2003). More recently, Bo/Thirsk10/00/UK was shown to be a recombinant of Bo/Newbury2/76/UK and Bo/Jena/80/DE, the first known recombinant of a genogroup 3 Norovirus (Oliver et al., 2004). Finally, working only on capsid sequences (ORF2), seven recombinants were found within a number of genogroups (Rohayem et al., 2005).

In this paper, we use the Recombination Analysis Tool (RAT) (Etherington et al., 2005) to examine several Norovirus viral sequences for evidence of recombination. The putative recombinants discovered by RAT may then be tested individually for additional statistical evidence of a recombination event. Here, we chose to analyse our putative recombinants using firstly a phylogenetic approach and secondly the LARD software (Holmes et al., 1999) together with a Monte Carlo procedure. In addition to finding two known recombinants, we discover three novel examples where recombination appears to have played a key role in Norovirus evolution. We conclude by discussing the relevance of Norovirus recombination to the worldwide transmission of the viruses.

Results

Discovery of putative recombinants

Following alignment of the 124 Norovirus sequences available at the time of the analysis, two major homologous regions were found. The first consisted of 74 sequences of 770 nucleotides and the second of 46 sequences of 2360 nucleotides. Both regions spanned the junction of ORF1 and ORF2. Together, these regions allow for 240,012 potential recombinants (and 256,977,828 recombination events once positional information is considered). Initial screens of these two regions, in addition to the complete genome alignments, using RAT revealed five putative recombinant Noroviruses. In four cases, two parental sequences were observed, while in the remaining case, only one parental sequence was thought to be present in the data set. The latter is likely to be a result of the second parent being an as yet unsequenced strain or one currently unavailable within public databases.

Our analysis identified two of the eight previously discovered recombinants described in the Introduction (Figs. 1 and 2). The first of these (Fig. 1) confirmed the finding that WUG1 is a recombinant of Southampton virus and Norovirus strain BS5. In the case of Snow Mountain virus (Fig. 2), our findings differed slightly from those of the authors of the previous work (Lochridge and Hardy, 2003), suggesting that this virus was a recombinant of Melksham and Lordsdale viruses and not of Melksham and Hawaii viruses. Separate distance analysis of the areas of sequence in which the recombination event occurred showed that the implicated region in Hawaii virus is slightly more closely related to the Snow Mountain sequence than the similar region in Lordsdale virus (0.850988701 for Hawaii and 0.837570621 for Lordsdale). Failure to identify five of the remaining six cases described in the Introduction is due to lack of sequence data rather than a failure in the algorithmic approach. For the remaining case, RAT did suggest that Saitama U1 was a recombinant of Lordsdale virus and an unknown virus but, as we were unable to identify a second parental strain, we will not analyse this case further.

Fig. 1. An example of the RAT output for a previously identified recombinant. The lines on the graph represent the genetic distances of the sequences of Southampton virus and Norwalk BS5 from that of the test sequence, WUG1 in this case.

Fig. 2. RAT output for Snow Mountain virus, showing the similarity of its sequence to the parental sequences of Norovirus strain Melksham and Lordsdale virus. Previous work by other authors has suggested that Snow Mountain virus is a recombinant between Melksham and Hawaii virus and our own distance analysis of those viruses suggests that this may be true.

In addition to identifying these previously described recombinants, we also found evidence of three hitherto unreported putative Norovirus recombinants. In the first of these cases, the RAT output suggested that the sequence of MD 145-12 might have undergone recombination (Fig. 3). This sequence shows evidence of having Gifu’96 or Saitama U1 as one of its parental sequences. Unfortunately, the identity of the second parental sequence remains to be determined. Both Gifu’96 and its highly homologous isolate Saitama U1 show similarity to MD 145-12 in the first ~5 kb but subsequently show a large drop in sequence similarity and become distantly related after this point. A BLAST homology search (Altschul et al., 1990) was carried out using a 3 kb stretch of the 3′ end of MD 145-12 in an attempt to identify GenBank sequences with high sequence similarity to the final 2 kb of the 3′ end, but low sequence similarity to the 1 kb upstream region. No such sequences were found (data not shown). Restricting the area of interest in the main RAT interface and ‘zooming in’ showed the putative recombination crossover point to be within the first 20 nucleotides of the capsid sequence (ORF2). As we were unable to identify two parental strains in this case, no further statistical testing was carried out on this strain.

Fig. 3. RAT output for Norovirus MD 145-12. Shown here are the closely related Lordsdale virus, the two closely related and potentially parental sequences of Norovirus Gifu’96 and Norovirus Saitama U1, and the rather distantly related sequences Saitama U3, U4, U16, U17, and U25 (Saitama cluster). Note that Gifu’96 and Saitama U1 possess high similarity to MD 145-12 and Lordsdale virus in the first 5 kb of the sequence and subsequently to the Saitama cluster in the final 2 kb.

The second Norovirus found to demonstrate putative

recombination, Gifu’96 (Fig. 4), had Norovirus strain Vietnam

026 and Hawaii virus as its parental sequences, with the

recombination most likely occurring between ORF1 and

ORF2. Finally, Norwalk isolate Saitama U4 was highlighted

as a possible recombinant of isolate Saitama U17 and

Gwynedd 273 (Fig. 5). The area examined in this case is the

final 700 nucleotides of ORF1 and the full length of ORF2 and

ORF3. A switch in similarity occurs around the ORF1–ORF2

join between the two parental sequences. Unfortunately, the

area further upstream cannot be examined due to the absence of

sequence information for this region in Gwynedd 273.
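The counts of potential recombinants quoted at the start of this subsection can be reproduced with a short calculation. The counting rule is not spelled out in the text, so the sketch below rests on two assumptions: each of the n sequences in a region is taken in turn as the test sequence and paired with every unordered pair of the remaining n − 1 sequences as parents, and a breakpoint may fall between any two adjacent alignment positions (L − 1 positions for a region of length L). The function names are illustrative only.

```python
from math import comb  # available from Python 3.8


def potential_recombinants(n_sequences: int) -> int:
    """Test sequences times unordered pairs of parents drawn from the rest."""
    return n_sequences * comb(n_sequences - 1, 2)


def potential_events(n_sequences: int, alignment_length: int) -> int:
    """As above, with one candidate breakpoint between each pair of adjacent sites."""
    return potential_recombinants(n_sequences) * (alignment_length - 1)


# (number of sequences, alignment length in nucleotides) for the two regions
regions = [(74, 770), (46, 2360)]

total_recombinants = sum(potential_recombinants(n) for n, _ in regions)
total_events = sum(potential_events(n, length) for n, length in regions)

print(total_recombinants)  # 240012
print(total_events)        # 256977828
```

Under these assumptions the two regions contribute 194,472 and 45,540 candidate recombinants respectively, and the totals match the figures given in the text.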

Fig. 4. RAT output for the whole genome alignment of Norovirus strain Gifu’96, showing the similarity of the parental sequences of Hawaii virus and Vietnam 026 to that of Gifu’96. The putative recombination crossover point can be seen at around the 5310 nucleotide position.

Fig. 5. RAT output for Norwalk isolate Saitama U4 virus, showing the parental sequences of Saitama U17 and Gwynedd 273. The availability of longer sequences that include more information of the upstream area of these Noroviruses may provide better support for recombination.

Phylogenetic analysis of putative recombinants

For each putative recombinant discovered by RAT, a maximum likelihood phylogenetic tree was obtained on each side of the recombination breakpoint (Figs. 6, 7, 8, and 9), using the fastDNAml software (see Methods). In every case, the recombinant shows a close phylogenetic relationship to one of its parental strains before the recombination breakpoint and to the other parental strain after this point. In the case of Snow Mountain virus (Figs. 2 and 7), it was found that this virus was slightly more closely related to Hawaii virus than to Lordsdale virus (as suggested by Lochridge and Hardy, 2003 and our further distance-based analysis) before the recombination breakpoint.

Fig. 6. Maximum Likelihood (ML) trees showing reported recombinant Norwalk WUG1 and its phylogenetic relationship to the parental strains BS5 and Southampton virus both (a) before the recombination crossover point and (b) after this point.

Statistical analysis of putative recombinants

The two previously discovered recombinants, WUG1 and Snow Mountain virus, and the two newly found putative recombinants, Gifu’96 and Saitama U4, were analysed further along with their parental strains. MD 145-12 was not analysed, as two parents could not be found. For each of our four putative recombinants, 500 log likelihood ratios were simulated using the SeqGen and LARD software tools, representing the difference in support for the hypothesis of no recombination versus that of recombination (see Methods for full details). A table showing the relationship of the real log likelihood ratios for the putative recombinants to the simulated distributions is given (Table 1). As is apparent from Table 1, the values for the putative recombinant data sets fall well outside the range of the simulated values in each of the four cases, lending statistical support to the alternative hypothesis of recombination.

Discussion

Using a computational approach, two previously identified Norovirus recombinants and three novel recombinants were discovered. In all three of the newly discovered recombinants, the recombination breakpoint was found to lie between ORF1 and ORF2. As at least six of the 11 Norovirus recombinants of which we are aware (not counting the seven recently discovered recombinants in the capsid sequences; Rohayem et al., 2005) possess this breakpoint, the Norovirus genome appears to have a ‘hot spot’ for recombination in a region close to the end of ORF1 (non-structural polyprotein) and the start of ORF2 (capsid protein). This phenomenon is particularly noticeable in Lordsdale virus (and its close relatives), which appears from earlier work (Hansman et al., 2004; Jiang et al., 1999; Katayama et al., 2002) and from the results presented here to have a history of frequent recombination.

It is also striking that recombinant Noroviruses share sequences with isolates from geographically separate locations. The accession information provided by GenBank indicates that putative recombination involves sequences from very distant countries. For example, a previously identified recombinant sequence isolated in Japan (WUG1) has parental sequences from viruses (or at least close relatives of viruses) isolated in the UK (Southampton virus) and in Germany (Norwalk BS5). Furthermore, a recombinant sequence isolated in California, USA (Snow Mountain virus) has its parental sequences originating in the UK (Norovirus strain Melksham) and in Hawaii (Hawaii virus). In the new results presented in this paper, a recombinant isolated in Maryland, USA (MD 145-12) has a homologous sequence with a UK isolate (Lordsdale virus) and has a parental sequence originating in Japan (Saitama U1 or Gifu’96). Similarly, a recombinant sequence isolated in Japan (Gifu’96) has parental sequences originating in Hawaii (Hawaii virus) and in Vietnam (Vietnam 026). We have also found a recombinant sequence isolated in Japan (Saitama U4) to have parental sequences originating in Japan (Saitama U17) and in the USA (Gwynedd 273).

Fig. 7. ML trees showing reported recombinant Snow Mountain virus and its phylogenetic relationship to the parental strains Hawaii or Lordsdale viruses and Melksham virus both (a) before the recombination crossover point and (b) after this point. As can be seen from this analysis, Hawaii virus shows a slightly closer phylogenetic relationship to Snow Mountain virus than Lordsdale virus does.

Fig. 8. ML trees showing reported recombinant Gifu’96 and its phylogenetic relationship to the parental strains Hawaii virus and Vietnam 026 both (a) before the recombination crossover point and (b) after this point.

Given the rapid rate of RNA virus evolution, the putative

parental sequences identified may not necessarily be the

parental sequences, but rather descendants or relatives of those

sequences. It is therefore possible that there are extant viruses

that are more closely related to the original parental sequences

and which would show greater similarity to the recombinant in

their zone of recombination if sequence data for these viruses

were available. Absence of sequence data may be due to failure

to isolate from outbreaks, failure to cause outbreaks, or

convergent evolution. Similarly, it is possible that ancestral

recombinant viruses were more closely related to the parental

sequences found by RAT, but sequence data for these

recombinants were never obtained from outbreaks.

Overall, our results are consistent with an emerging pattern

indicating that mosaic genomes and recombination events are

far more common than hitherto reported (Magiorkinis et al.,

2004). Given a geographically widespread occurrence of

recombinants, one implication is that correct identification of

viruses from outbreaks must include diagnostic information

from both the ORF1 and ORF2 regions. If the source of an

outbreak is a recombinant, including information from only

one ORF may result in a misleading identification of the viral

strain. Other authors (Oliver et al., 2004) have also highlighted

this factor.

It is well documented that new Norovirus strains spread

rapidly around the world. The results presented here indicate

that they readily produce recombinants and the resulting

recombinants, or descendants thereof, may also perform rapid

global movements. With the increase in genomic DNA

sequencing, and the greater availability of genomic viral

sequences, it will be possible to investigate Norovirus

recombination in greater detail. We anticipate that the approach

described here will provide a valuable tool for the preliminary

screening and identification of recombinant sequences and

recombination hotspots within these forthcoming data sets.

Fig. 9. ML trees showing reported recombinant Saitama U4 and its phylogenetic relationship to the parental strains Gwynedd 273 and Saitama U17 both (a) before the recombination crossover point and (b) after this point.

Table 1. Comparison of the log likelihood ratios for the putative recombinant sequences to their respective simulated distributions under the null hypothesis of no recombination

Putative recombinant sequence | Range of log likelihood ratios for 500 simulated data sets | Log likelihood ratio for the putative recombinant
Norovirus WUG1 | 2–14 | 93
Snow Mountain virus | 2–19 | 155
Saitama U4 | 2–18 | 121
Norovirus Gifu’96 | 2–12 | 518

Note that for each putative recombinant sequence, the real value falls outside the range of the simulated data set, lending further support to the recombinant status of the sequence.

Methods

Discovery of putative recombinants

All publicly available sequences for the Noroviruses (up to and including 1 Sep 2004) were downloaded from GenBank (Benson et al., 2002) via the NCBI Taxonomy browser (Wheeler et al., 2002). All sequences of less than 3 kb in size were removed to increase the likelihood of producing long ungapped alignments and hence to maximise the portion of the genome available for analysis. This method produced 124 sequences. A separate alignment for strains that had complete genomes available was also produced. The sequences were translated into their amino acid counterparts using BioEdit (Hall, 1999) and were subsequently aligned using ClustalW (Thompson et al., 1994) (some minor manual editing of the sequences was made to adjust for frame-shifts between ORFs). Following alignment, the sequences were analysed to find homologous genomic regions represented by aligned sequence in as many strains as possible. Once potential regions had been identified, they were back-translated to DNA sequence and were edited manually to remove any remaining gapped positions.

The Recombination Analysis Tool (RAT) software (Etherington et al., 2005) was then used to find recombination in the aligned sequence blocks. RAT is a cross-platform Java-based application used for high-throughput recombination analysis of both DNA and protein multiple sequence alignments. It uses a distance-based method of recombination detection. Compared with other recombination programs, RAT is easier and more intuitive to use and allows the analysis of large amounts of data with minimum user intervention. The lack of complex algorithms allows users to speedily obtain a clear insight into the history of recombination by automatically searching through sequence data. RAT is especially useful as an initial, fast, exploratory tool to form the basis for a more detailed analysis.

RAT’s Auto Search option was used for this purpose. The search proceeds by taking each sequence in turn (the Test Sequence) and by calculating distances from all possible Target Sequences within a sliding window that traverses the alignment. In order to assess whether a given Test Sequence is a recombinant of any two given Target Sequences, three parameters were considered: lower threshold (the distance between a Target Sequence and the Test Sequence must be below this value within a given region of comparison), upper threshold (the distance between a Target Sequence and the Test Sequence must jump from below the lower threshold to above this value, or vice versa, to qualify as a putative recombinant), and number of parental sequences (the maximum number of sequences allowed to contribute to a putative recombinant). Each alignment was taken in turn and was examined using a range of search parameters. For the lower threshold, the lowest value used was 60% and for the upper threshold, the highest value used was 99%. These parameter ranges were chosen as they were found to identify a large proportion of known recombinants in simulated and real data sets used to test the RAT software prior to this analysis.
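The general idea of a sliding-window, threshold-based screen of this kind can be illustrated with a minimal sketch. This is not RAT's implementation. It assumes that closeness is measured as the fraction of identical positions in each window, that a target counts as "close" when that fraction is at or above an upper threshold and "far" when it is at or below a lower threshold, and that a pair of targets is flagged when each is close in at least one window where the other is far. All function and sequence names are hypothetical.

```python
from typing import Dict, List, Tuple


def window_identity(a: str, b: str, start: int, size: int) -> float:
    """Fraction of identical positions between a and b in [start, start + size)."""
    seg_a, seg_b = a[start:start + size], b[start:start + size]
    return sum(x == y for x, y in zip(seg_a, seg_b)) / len(seg_a)


def identity_profile(test: str, target: str, size: int, step: int) -> List[float]:
    """Identity of target to test in each window along the alignment."""
    return [window_identity(test, target, s, size)
            for s in range(0, len(test) - size + 1, step)]


def putative_parent_pairs(alignment: Dict[str, str], test_name: str,
                          size: int, step: int,
                          lower: float, upper: float) -> List[Tuple[str, str]]:
    """Pairs of targets that swap between 'close' and 'far' along the test sequence."""
    test = alignment[test_name]
    profiles = {name: identity_profile(test, seq, size, step)
                for name, seq in alignment.items() if name != test_name}
    names = sorted(profiles)
    hits = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            paired = list(zip(profiles[p], profiles[q]))
            p_close = any(x >= upper and y <= lower for x, y in paired)
            q_close = any(y >= upper and x <= lower for x, y in paired)
            if p_close and q_close:
                hits.append((p, q))
    return hits


# Toy alignment (made up for illustration): the test sequence matches
# parent_1 in its first half and parent_2 in its second half.
toy = {
    "test": "A" * 20 + "C" * 20,
    "parent_1": "A" * 20 + "G" * 20,
    "parent_2": "T" * 20 + "C" * 20,
    "unrelated": "G" * 40,
}
hits = putative_parent_pairs(toy, "test", size=10, step=10, lower=0.60, upper=0.99)
print(hits)
```

On the toy data only the pair parent_1 and parent_2 is flagged, because the unrelated sequence is never close to the test sequence in any window.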

Phylogenetic analysis of putative recombinants

For each putative recombinant discovered by RAT, a

maximum likelihood phylogenetic tree was obtained from

each side of the recombination breakpoint, using fastDNAml

(Olsen et al., 1994), which uses a maximum likelihood model

of evolution. Each tree contained the recombinant, the two

parental strains, and a number of other sequences from the

alignment (in alignments with many sequences, some

sequences that were distantly related from the recombinant

and parental strains were omitted). Each pair of trees was then

compared by eye to identify the phylogenetic relationships of

the recombinants to their respective parental strains either side

of the recombination breakpoint.
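Building a tree from each side of a breakpoint first requires cutting the alignment into two column blocks. A minimal sketch of that step follows, assuming the alignment is held as a mapping from strain name to aligned sequence and the breakpoint is a 0-based column index. The helper names and the toy sequences are hypothetical, and the FASTA writer is only a convenience, since the tree-building program may expect a different input format.

```python
from typing import Dict, Tuple

Alignment = Dict[str, str]


def split_at_breakpoint(alignment: Alignment, breakpoint: int) -> Tuple[Alignment, Alignment]:
    """Return the columns before the breakpoint and the columns from it onwards."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    if not 0 < breakpoint < length:
        raise ValueError("breakpoint must fall strictly inside the alignment")
    left = {name: seq[:breakpoint] for name, seq in alignment.items()}
    right = {name: seq[breakpoint:] for name, seq in alignment.items()}
    return left, right


def to_fasta(alignment: Alignment) -> str:
    """Serialise an alignment block as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Toy example (made-up sequences): the recombinant matches parent_1
# before column 4 and parent_2 from column 4 onwards.
toy = {"recombinant": "AAAACCCC", "parent_1": "AAAAGGGG", "parent_2": "TTTTCCCC"}
left, right = split_at_breakpoint(toy, 4)
print(to_fasta(left))
print(to_fasta(right))
```

Each block can then be passed to a tree-building program separately, and the two resulting trees compared as described above.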

Statistical analysis of putative recombinants

For each putative recombinant discovered by RAT, the sequences of the recombinant and the two parental strains were analysed for further evidence of recombination (we did not identify any recombinants with more than two parental strains). Here, we used an approach where the null hypothesis (H0) of a simple unrooted tree of the three sequences (i.e. no recombination) is compared to the alternative hypothesis (H1) of different trees representing the relationship of the three sequences on either side of a recombination breakpoint, using the LARD software (Holmes et al., 1999). For each set of three sequences (i.e. the recombinant and its two parents), the SeqGen tool (Rambaut and Grassly, 1997) was used to create 500 data sets under the null hypothesis of no recombination and each simulated data set was subsequently analysed by LARD. The HKY model of substitution was used for both simulated and real data sets. For each data set undergoing analysis, the LARD software (Holmes et al., 1999) was used to give a maximum likelihood estimate (MLE) of the recombination breakpoint and a log likelihood ratio. This latter value represents the difference in support for the null hypothesis of no recombination and the competing alternative hypothesis of recombination, where that event occurred at the MLE identified by LARD. In this way, we can build up a distribution of log likelihood ratios under H0, to which the log likelihood ratio of our putative recombinant data set may be compared.
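The comparison of an observed value with a simulated null distribution can be summarised as a Monte Carlo p-value. The sketch below uses the common add-one convention, (k + 1) / (N + 1), where k is the number of simulated values at least as large as the observed one. That convention is an assumption brought in here for illustration, since the paper itself reports only the simulated ranges. The individual simulated ratios are not published, so each distribution is replaced by a conservative stand-in in which every simulated value equals the reported maximum from Table 1.

```python
from typing import Dict, Sequence, Tuple


def monte_carlo_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Add-one Monte Carlo p-value for an upper-tailed test."""
    n_as_extreme = sum(1 for value in simulated if value >= observed)
    return (n_as_extreme + 1) / (len(simulated) + 1)


N_SIMULATIONS = 500

# (largest simulated log likelihood ratio, observed ratio), taken from Table 1
TABLE_1: Dict[str, Tuple[float, float]] = {
    "Norovirus WUG1": (14, 93),
    "Snow Mountain virus": (19, 155),
    "Saitama U4": (18, 121),
    "Norovirus Gifu’96": (12, 518),
}

p_values = {}
for name, (simulated_max, observed) in TABLE_1.items():
    # Stand-in distribution: no real simulated value can exceed simulated_max.
    stand_in = [simulated_max] * N_SIMULATIONS
    p_values[name] = monte_carlo_p_value(observed, stand_in)

for name, p in p_values.items():
    print(f"{name}: p <= {p:.4f}")
```

Because every observed ratio exceeds the largest simulated one, each case gives 1/501, or about 0.002, which is the smallest value this convention can return with 500 simulations.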

Accession numbers

The accession numbers for viral sequences used in this paper are: Arg320 (AF190817), Lordsdale virus (X86557), Mexico virus (U22498), Seacroft/90/UK (AJ277620), Leeds/90/UK (AJ277608), Wortley/90/UK (AJ277618), Norovirus Mc37 (AY237415), Saitama U1 (AB039775), Saitama U3 (AB039776), Saitama U4 (AB039777), Saitama U16 (AB039778), Saitama U17 (AB039779), Saitama U25 (AB039780), WUG1 (AB081723), Southampton virus (L07418), Norovirus BS5 (AF093797), Bo/Thirsk10/00/UK (AY126468), Bo/Newbury2/76/UK (AF097917), Bo/Jena/80/DE (AJ011099), MD 145-12 (AY032605), Gifu’96 (AB045603), Hawaii virus (U07611), Norovirus Vietnam 026 (AF504671), Snow Mountain virus (AY134748), Melksham virus (X81879), and Gwynedd 273 (AF414409).

Acknowledgments

GJE was supported by a PhD studentship from the

Biotechnology and Biological Sciences Research Council

(BBSRC). JD and INR are supported by the BBSRC.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215 (3), 403–410.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Rapp, B.A., Wheeler, D.L., 2002. GenBank. Nucleic Acids Res. 30 (1), 17–20.

CDC, 2003. Center for Disease Control: Norovirus Activity—United States,

2002. Morb. Mort. Wkly. Rep. 52 (3), 41–45.

Clarke, I.N., Lambden, P.R., 1997. The molecular biology of caliciviruses.

J. Gen. Virol. 78, 291–301.

Etherington, G.J., Dicks, J., Roberts, I.N., 2005. Recombination Analysis Tool

(RAT): a program for the high-throughput detection of recombination.

Bioinformatics 21 (3), 278–281.

FSA, 2000. Food Standards Agency: A Report of the Study of Infectious

Intestinal Disease in England. TSO, London.

Green, S.M., Dingle, K.E., Lambden, P.R., Caul, E.O., Ashley, C.R., Clarke,

I.N., 1994. Human enteric Caliciviridae: a new prevalent small round-

structured virus group defined by RNA-dependent RNA polymerase and

capsid diversity. J. Gen. Virol. 75 (Pt 8), 1883–1888.

Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor

and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41,

95–98.

Hansman, G.S., Katayama, K., Maneekarn, N., Peerakome, S., Khamrin, P.,

Tonusin, S., Okitsu, S., Nishio, O., Takeda, N., Ushijima, H., 2004. Genetic

diversity of norovirus and sapovirus in hospitalized infants with sporadic

cases of acute gastroenteritis in Chiang Mai, Thailand. J. Clin. Microbiol.

42 (3), 1305–1307.

Hardy, M.E., Estes, M.K., 1996. Completion of the Norwalk virus genome

sequence. Virus Genes 12 (3), 287–290.

Holmes, E.C., Worobey, M., Rambaut, A., 1999. Phylogenetic evidence for

recombination in dengue virus. Mol. Biol. Evol. 16 (3), 405–409.

Jiang, X., Espul, C., Zhong, W.M., Cuello, H., Matson, D.O., 1999.

Characterization of a novel human calicivirus that may be a naturally

occurring recombinant. Arch. Virol. 144 (12), 2377–2387.

Katayama, K., Shirato-Horikoshi, H., Kojima, S., Kageyama, T., Oka, T.,

Hoshino, F., Fukushi, S., Shinohara, M., Uchida, K., Suzuki, Y., Gojobori,

T., Takeda, N., 2002. Phylogenetic analysis of the complete genome of 18

Norwalk-like viruses. Virology 299 (2), 225–239.

Koopmans, M., Duizer, E., 2004. Foodborne viruses: an emerging problem. Int.

J. Food Microbiol. 90 (1), 23–41.

Lochridge, V.P., Hardy, M.E., 2003. Snow Mountain virus genome sequence

and virus-like particle assembly. Virus Genes 26 (1), 71–82.

Magiorkinis, G., Magiorkinis, E., Paraskevis, D., Vandamme, A.M., Van Ranst,

M., Moulton, V., Hatzakis, A., 2004. Phylogenetic analysis of the full-

length SARS-CoV sequences: evidence for phylogenetic discordance in

three genomic regions. J. Med. Virol. 74 (3), 369–372.

Oliver, S.L., Dastjerdi, A.M., Wong, S., El-Attar, L., Gallimore, C., Brown,

D.W., Green, J., Bridger, J.C., 2003. Molecular characterization of bovine

enteric caliciviruses: a distinct third genogroup of noroviruses (Norwalk-

like viruses) unlikely to be of risk to humans. J. Virol. 77 (4), 2789–2798.

Oliver, S.L., Brown, D.W., Green, J., Bridger, J.C., 2004. A chimeric bovine

enteric calicivirus: evidence for genomic recombination in genogroup III of

the Norovirus genus of the Caliciviridae. Virology 326 (2), 231–239.

Olsen, G.J., Matsuda, H., Hagstrom, R., Overbeek, R., 1994. fastDNAmL: a

tool for construction of phylogenetic trees of DNA sequences using

maximum likelihood. Comput. Appl. Biosci. 10 (1), 41–48.

Rambaut, A., Grassly, N.C., 1997. Seq-Gen: an application for the Monte Carlo

simulation of DNA sequence evolution along phylogenetic trees. Comput.

Appl. Biosci. 13 (3), 235–238.

Rohayem, J., Munch, J., Rethwilm, A., 2005. Evidence of recombination in the

norovirus capsid gene. J. Virol. 79 (8), 4977–4990.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving

the sensitivity of progressive multiple sequence alignment through

sequence weighting, position-specific gap penalties and weight matrix

choice. Nucleic Acids Res. 22 (22), 4673–4680.

Van Regenmortel, M.H.V., Fauquet, C.M., Bishop, D.H.L., Carstens, E.B.,

Estes, M.K., Lemon, S.M., Maniloff, J., Mayo, M.A., McGeoch, D.J.,

Pringle, C.R., Wickner, R.B., 2000. Virus Taxonomy: Classification and

Nomenclature of Viruses: Seventh Report of the International Committee

on Taxonomy of Viruses. Academic Press, San Diego.

Vinje, J., Green, J., Lewis, D.C., Gallimore, C.I., Brown, D.W., Koopmans,

M.P., 2000. Genetic polymorphism across regions of the three open reading

frames of “Norwalk-like viruses”. Arch. Virol. 145 (2), 223–241.

Wheeler, D.L., Chappey, C., Lash, A.E., Leipe, D.D., Madden, T.L., Schuler,

G.D., Tatusova, T.A., Rapp, B.A., 2002. Database resources of the National

Center for Biotechnology Information. Nucleic Acids Res. 28 (1), 10–14.