origin and evolution of the unique hepatitis c virus...

9
Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant Form 2k/1b Jayna Raghwani, a Xiomara V. Thomas, b Sylvie M. Koekkoek, b Janke Schinkel, b Richard Molenkamp, b Thijs J. van de Laar, c Yutaka Takebe, d Yasuhito Tanaka, e Masashi Mizokami, f Andrew Rambaut, a,g and Oliver G. Pybus h Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom a ; Academic Medical Center, Department of Medical Microbiology, Section of Clinical Virology, Amsterdam, The Netherlands b ; VU University Medical Centre, Department of Medical Microbiology and Infection Control, Amsterdam, The Netherlands c ; AIDS Research Center, National Institute of Infectious Diseases, Tokyo, Japan d ; Department of Virology and Liver Unit, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan e ; The Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Ichikawa, Japan f ; Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA g ; and Department of Zoology, University of Oxford, Oxford, United Kingdom h Since its initial identification in St. Petersburg, Russia, the recombinant hepatitis C virus (HCV) 2k/1b has been isolated from several countries throughout Eurasia. The 2k/1b strain is the only recombinant HCV to have spread widely, raising questions about the epidemiological background in which it first appeared. In order to further understand the circumstances by which HCV recombinants might be formed and spread, we estimated the date of the recombination event that generated the 2k/1b strain using a Bayesian phylogenetic approach. Our study incorporates newly isolated 2k/1b strains from Amsterdam, The Neth- erlands, and has employed a hierarchical Bayesian framework to combine information from different genomic regions. We esti- mate that 2k/1b originated sometime between 1923 and 1956, substantially before the first detection of the strain in 1999. The timescale and the geographic spread of 2k/1b suggest that it originated in the former Soviet Union at about the time that the world’s first centralized national blood transfusion and storage service was being established. We also reconstructed the epi- demic history of 2k/1b using coalescent theory-based methods, matching patterns previously reported for other epidemic HCV subtypes. This study demonstrates the practicality of jointly estimating dates of recombination from flanking regions of the breakpoint and further illustrates that rare genetic-exchange events can be particularly informative about the underlying epide- miological processes. H epatitis C virus (HCV) infection presents a major global health burden, with the WHO estimating that 170 million chronic carriers are at risk of developing severe clinical outcomes such as cirrhosis and hepatic cellular carcinoma (56, 71). The virus belongs to the single-stranded positive-sense RNA virus family Flaviviridae and is characterized by considerable genetic diversity. HCV diversity is classified into six main genotypes (genotypes 1 to 6), each of which is further divided into numerous subtypes, and the virus exhibits nucleotide sequence divergences of 30 and 20% at the genotype and subtype levels, respectively (58). The high genomic heterogeneity of HCV is a result of both its high rate of evolution and its long-term association with human populations (60). Although there is no indication for a zoonotic virus reser- voir, a related virus has recently been discovered in dogs (22). The greatest diversity of HCV is found in West and Central Africa and in Southeast Asia, where the virus appears to have persisted endemically for at least several centuries (49, 60). The current distribution of HCV genotypes and subtypes is geograph- ically structured, reflecting differences in the rates and routes of transmission of the various subtypes and genotypes. Epidemic strains, exemplified by subtypes 1a, 1b, and 3a, are characterized by high prevalence, low genetic diversity, and a global distribution and are typically associated with transmission via infected blood products and injecting drug use (IDU) during the 20th century (13, 44–46, 54, 57). In contrast, endemic strains are more spatially restricted but harbor greater genetic diversity than epidemic strains, and it is currently thought that this endemic diversity pro- vided the source of the epidemic strains that constitute the major- ity of HCV infections worldwide (47, 60). Recombination is thought to play a comparatively minor role in shaping the genetic diversity of HCV; however, an in- creasing number of reports suggests that it is not entirely insig- nificant in HCV evolution. Most notable of these was the initial discovery of a natural recombinant form of HCV circulating in injecting drug users resident in St. Petersburg, Russia (20). This recombinant, labeled 2k/1b, has a 5= genome region that is most closely related to subtype 2k and a 3= genome region that is most closely related to the global epidemic subtype 1b, with a single recombination breakpoint located at genomic position 3175 or 3176 in the NS2 gene (20). Since the discovery of 2k/1b, several other studies have reported both inter- and intrageno- typic HCV recombinants in natural populations, although the evidence presented for recombination varies in strength; the weakest studies report only discordant genotyping results be- tween genome regions (which could also result from coinfec- tion), whereas the most convincing studies repeatedly se- quence the same recombination breakpoint from independent extractions (thereby excluding the possibility of in vitro genetic Received 31 August 2011 Accepted 14 November 2011 Published ahead of print 23 November 2011 Address correspondence to Jayna Raghwani, [email protected], or Oliver G. Pybus, [email protected]. Supplemental material for this article may be found at http://jvi.asm.org/. Copyright © 2012, American Society for Microbiology. All Rights Reserved. doi:10.1128/JVI.06184-11 2212 jvi.asm.org 0022-538X/12/$12.00 Journal of Virology p. 2212–2220

Upload: others

Post on 12-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

Origin and Evolution of the Unique Hepatitis C Virus CirculatingRecombinant Form 2k/1b

Jayna Raghwani,a Xiomara V. Thomas,b Sylvie M. Koekkoek,b Janke Schinkel,b Richard Molenkamp,b Thijs J. van de Laar,c

Yutaka Takebe,d Yasuhito Tanaka,e Masashi Mizokami,f Andrew Rambaut,a,g and Oliver G. Pybush

Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdoma; Academic Medical Center, Department of MedicalMicrobiology, Section of Clinical Virology, Amsterdam, The Netherlandsb; VU University Medical Centre, Department of Medical Microbiology and Infection Control,Amsterdam, The Netherlandsc; AIDS Research Center, National Institute of Infectious Diseases, Tokyo, Japand; Department of Virology and Liver Unit, Nagoya CityUniversity Graduate School of Medical Sciences, Nagoya, Japane; The Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine,Ichikawa, Japanf; Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USAg; and Department of Zoology, University of Oxford, Oxford,United Kingdomh

Since its initial identification in St. Petersburg, Russia, the recombinant hepatitis C virus (HCV) 2k/1b has been isolated fromseveral countries throughout Eurasia. The 2k/1b strain is the only recombinant HCV to have spread widely, raising questionsabout the epidemiological background in which it first appeared. In order to further understand the circumstances by whichHCV recombinants might be formed and spread, we estimated the date of the recombination event that generated the 2k/1bstrain using a Bayesian phylogenetic approach. Our study incorporates newly isolated 2k/1b strains from Amsterdam, The Neth-erlands, and has employed a hierarchical Bayesian framework to combine information from different genomic regions. We esti-mate that 2k/1b originated sometime between 1923 and 1956, substantially before the first detection of the strain in 1999. Thetimescale and the geographic spread of 2k/1b suggest that it originated in the former Soviet Union at about the time that theworld’s first centralized national blood transfusion and storage service was being established. We also reconstructed the epi-demic history of 2k/1b using coalescent theory-based methods, matching patterns previously reported for other epidemic HCVsubtypes. This study demonstrates the practicality of jointly estimating dates of recombination from flanking regions of thebreakpoint and further illustrates that rare genetic-exchange events can be particularly informative about the underlying epide-miological processes.

Hepatitis C virus (HCV) infection presents a major globalhealth burden, with the WHO estimating that 170 million

chronic carriers are at risk of developing severe clinical outcomessuch as cirrhosis and hepatic cellular carcinoma (56, 71). The virusbelongs to the single-stranded positive-sense RNA virus familyFlaviviridae and is characterized by considerable genetic diversity.HCV diversity is classified into six main genotypes (genotypes 1 to6), each of which is further divided into numerous subtypes, andthe virus exhibits nucleotide sequence divergences of 30 and 20%at the genotype and subtype levels, respectively (58). The highgenomic heterogeneity of HCV is a result of both its high rate ofevolution and its long-term association with human populations(60). Although there is no indication for a zoonotic virus reser-voir, a related virus has recently been discovered in dogs (22).

The greatest diversity of HCV is found in West and CentralAfrica and in Southeast Asia, where the virus appears to havepersisted endemically for at least several centuries (49, 60). Thecurrent distribution of HCV genotypes and subtypes is geograph-ically structured, reflecting differences in the rates and routes oftransmission of the various subtypes and genotypes. Epidemicstrains, exemplified by subtypes 1a, 1b, and 3a, are characterizedby high prevalence, low genetic diversity, and a global distributionand are typically associated with transmission via infected bloodproducts and injecting drug use (IDU) during the 20th century(13, 44–46, 54, 57). In contrast, endemic strains are more spatiallyrestricted but harbor greater genetic diversity than epidemicstrains, and it is currently thought that this endemic diversity pro-vided the source of the epidemic strains that constitute the major-ity of HCV infections worldwide (47, 60).

Recombination is thought to play a comparatively minorrole in shaping the genetic diversity of HCV; however, an in-creasing number of reports suggests that it is not entirely insig-nificant in HCV evolution. Most notable of these was the initialdiscovery of a natural recombinant form of HCV circulating ininjecting drug users resident in St. Petersburg, Russia (20). Thisrecombinant, labeled 2k/1b, has a 5= genome region that ismost closely related to subtype 2k and a 3= genome region thatis most closely related to the global epidemic subtype 1b, with asingle recombination breakpoint located at genomic position3175 or 3176 in the NS2 gene (20). Since the discovery of 2k/1b,several other studies have reported both inter- and intrageno-typic HCV recombinants in natural populations, although theevidence presented for recombination varies in strength; theweakest studies report only discordant genotyping results be-tween genome regions (which could also result from coinfec-tion), whereas the most convincing studies repeatedly se-quence the same recombination breakpoint from independentextractions (thereby excluding the possibility of in vitro genetic

Received 31 August 2011 Accepted 14 November 2011

Published ahead of print 23 November 2011

Address correspondence to Jayna Raghwani, [email protected], orOliver G. Pybus, [email protected].

Supplemental material for this article may be found at http://jvi.asm.org/.

Copyright © 2012, American Society for Microbiology. All Rights Reserved.

doi:10.1128/JVI.06184-11

2212 jvi.asm.org 0022-538X/12/$12.00 Journal of Virology p. 2212–2220

Page 2: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

exchange). Thus far there have been nine descriptions of HCVrecombinant forms, although only in six cases have the break-points been sequenced (6–8, 19, 28, 29, 42).

Inspection of the recombination breakpoint positions withinthe HCV genome reveals a difference between inter- and intra-genotypic recombinants. Breakpoints in the intrasubtypic recom-binants (1a/1c and 1b/1a) are located in the E1/E2 region, while inthe intergenotypic recombinants (including 2k/1b), the break-points are consistently found in the NS2-NS3 region (8, 19, 28, 29,39, 42). Interestingly, naturally occurring intergenotypic HCV re-combinants have more often than not involved genotype 2 in the5= genome region (19, 20, 28, 29, 42). This may reflect some in-herent yet unknown biological or ecological properties of this ge-notype to produce viable recombinant viruses.

Nevertheless, the low rate of discovery of novel recombinantforms suggests that although recombination does occur in HCV, itis an uncommon event, at least in comparison to HIV-1. Indeed,the 2k/1b strain discovered in St. Petersburg is the only knowncirculating recombinant form (CRF) of HCV and has thereforebeen designated CRF01_1b2k, following a naming scheme similarto that developed for HIV-1 recombinants (23). A CRF is a recom-binant form that is found repeatedly in different patients. Since itsdiscovery, CRF01_1b2k has been isolated from patients in manycountries, including Ireland, France, Cyprus, Azerbaijan, Uzbeki-stan, and Russia (20, 25, 26, 29, 37–39). CRF01_1b2k is the onlyrecombinant strain of HCV to have transmitted widely; therefore,it is important to investigate its genesis and dissemination in orderto understand why it might be unique and to evaluate the likeli-hood that other HCV recombinant forms could increase in prev-alence in the future.

Evolutionary analysis of viral genomes using methods based onmolecular clocks and coalescent theory has previously proved use-ful in reconstructing the epidemic history of various HCV strains,including subtypes 1a and 1b worldwide (33, 46) and subtype 1bin Japan (66). Similar analyses of HCV genotype 4 in Egypt haveestimated the timescale of the large HCV epidemic in that countryand have confirmed its iatrogenic cause (48, 65). Very little isknown about the evolutionary history of HCV subtype 2k, mostlikely because of the lack of sequence data for the strain.

In order to address the lack of information about the epidemi-ological and transmission history of HCV CRF01_1b2k, we haveconducted a comprehensive evolutionary analysis of all availableviral genome sequences using a well-established Bayesian frame-work (11). Since previously published sequence data onCRF01_1b2k are limited, we sought to increase the sample size byisolating and sequencing a panel of new recombinant isolatesfrom patients of Russian origin resident in Amsterdam, The Neth-erlands.

By combining information from different genomic regions in asingle analysis, we provide the first estimate of the date of therecombination event that generated CRF01_1b2k. The date thatwe obtained considerably predates the discovery of the strain andrequires a reevaluation of the circumstances surrounding its ori-gin. This is the first time that a recombination event has beendated for any virus other than HIV-1 (32, 51, 67) or influenza Avirus (27). Further, we estimate the CRF’s past rate of transmis-sion and its pattern of global geographic spread. To obtain moreprecise parameter estimates when dating recombination events,we employed a joint phylogenetic approach that improves onmethods previously applied to HIV-1 CRFs (51, 67). The methods

introduced here should serve as a model for future phylogeneticinvestigations of genetic-exchange events in RNA virus popula-tions.

MATERIALS AND METHODSIdentification and sequencing of new HCV 2k/1b isolates from Amster-dam. In the course of a study of HCV-infected patients resident in Am-sterdam (unpublished data), it was found that HCV genotyping resultsfrom the 5= untranslated and NS5B regions were discordant for 6 (out of200) patients. Of these, five were male and one was female, and their meanage was 34 years (Table 1). For this study, the 5=-end sequences were notused.

HCV RNA was isolated from 200 �l plasma using the purificationmethod described by Boom et al. (4). cDNA was generated using randomhexamer primers as described before (4). The amplification was per-formed using a conventional PCR with the following cycling conditions: 2min at 50°C and 10 min at 95°C, followed by 45 cycles, each consisting of30 s at 95°C, 30 s at 55°C, and 1 min at 72°C. Amplicons were purifiedfrom a 1% agarose gel as described by Boom et al. (4). Amplification of a724-nucleotide (nt) fragment (unpublished data) of the NS5B region wasperformed as previously described (40). Amplification of core/E1 wasperformed in a 25-�l volume using HCV1b/2 F (GCGTGAGRGTCCTGGAG) as the forward primer and HCV1b/2 R (TGCCARCARTANGGCYTCAT) (positions 292 to 312; all primer locations are indicated relative tothe H77 reference genome) as the reverse primer, using the same ampli-fication conditions mentioned above. Amplicons were also purified froma 1% agarose gel as described by Boom et al. (4). Sequencing was per-formed using HCV1b/2F seq (CTTCYTACTAGCTCTYTTGTCTT; posi-tions 128 to 112) as the forward sequencing primer and HCV1b/2R seq(TGCCAACTGCCRTTGGTGT; positions 2386 to 2410) as the reversesequencing primer.

To confirm that the viral variants were recombinants, a 234-nt frag-ment harboring the known breakpoint in NS2 was also amplified andsequenced. Amplification and sequencing of the NS2 breakpoint wereperformed using HCV2K_F (GCACGCCATACTTCGTCAGAG) as theforward primer and HCV1B_R (CAGGTAATGATCTTGGTCTCCATGT) as the reverse primer, also using the same cycling conditions men-tioned above.

In addition to new CRF01_1b2k sequences from Amsterdam, 6 re-combinant isolates from an unpublished study were also included. Thesesequences were sampled from IDUs in Azerbaijan (Table 1). Most of theremaining CRF01_1b2k isolates came from a study of IDUs in Uzbekistan(25) and from a cohort study of HCV-positive patients from seven coun-tries (26); detailed demographic information on isolates from these twostudies is provided here (Table 1). In the latter study, all individuals in-fected with CRF01_1b2k came from either Russia or Uzbekistan, and 6 ofthese patients were linked to high-risk groups, namely, blood transfusionrecipients or intravenous drug users.

Collation of HCV sequence alignments. To investigate the evolution-ary origin and spread of HCV CRF01_1b2k, a data set comprising allsubtype 2k/1b (n � 27), 2k (n � 15), and 1b (n � 71) isolates for whichboth core/E1 and NS5B sequences were available was collated fromGenBank and the HCV Sequence Database (24) (see Table S1 in the sup-plemental material). This collection included the 6 newly sequenced iso-lates obtained from HCV patients in Amsterdam (see above). Alignmentsfor both genome regions were constructed manually, and each alignmentcontained exactly the same set of taxa. The date and sampling location ofeach sequence were obtained from the literature or via personal commu-nication (Table 1).

Estimation of genome region-specific rates of evolution. The evolu-tionary rates of the core/E1 and NS5B regions used in this study could notbe estimated directly from our data, because the sample size and range ofsample dates were not large or wide enough. In line with previous studiesof HCV epidemic history (e.g., see reference 48), we employed an inde-pendent data set with significant temporal information to provide the

Origin and Evolution of HCV 2k/1b

February 2012 Volume 86 Number 4 jvi.asm.org 2213

Page 3: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

substitution rates of the subgenomic regions of interest. Specifically, weused the data and analysis strategy of a recent study that reported rates ofevolution for all HCV genome regions for subtypes 1a and 1b (14). Thatstudy utilized an alignment-partition approach, which was implementedin the BEAST program, to estimate region-specific rates (11). We applieda codon-structured nucleotide substitution model (55), an uncorrelatedrelaxed lognormal molecular clock (10), and a Bayesian skyline coalescentmodel (12) to both subtype 1a and 1b whole-genome alignments, fromwhich we obtained rate estimates for the precise subgenomic regions(core/E1 and NS5B) used in our analysis of CRF01_1b2k. The Markovchain Monte Carlo chains (MCMCs) were run for 200 million generationsand sampled regularly to yield a posterior tree distribution based upon10,000 estimates. For further analysis details, see reference 14.

Phylogenetic analysis. Preliminary phylogenetic analyses were un-dertaken to confirm that CRF01_1b2k originated from a single recombi-nation event. Neighbor-joining (NJ) trees of the core/E1 and NS5B datasets were estimated using the PAUP* program (64) with an HKY85 nu-cleotide substitution model and a gamma distributed among site rateheterogeneity (data not shown).

Next, in order to directly test the hypothesis of a single recombinantorigin, we performed Bayesian MCMC analysis of the core/E1 and NS5Bdata sets in two ways: (i) we constrained the CRF01_1b2k isolates to be amonophyletic clade, and (ii) no phylogenetic constraints were imposed.The hypothesis of a single origin was then tested by performing a Bayesfactor (BF) comparison of the marginal likelihoods (41, 63) obtainedfrom these two analyses. This revealed an insignificant difference betweenthe two competing hypotheses; thus, a single recombination event wasassumed in following analyses.

Molecular clock analysis. In order to estimate the date of the recom-bination event that formed CRF01_1b2k, we created separate data sets for

the core/E1 and NS5B regions that contained the CRF isolates, plus allavailable closely related parental subtype reference sequences (belongingto subtype 2k for the core/E1 region and to subtype 1b for the NS5Bregion). A hierarchical phylogenetic model (62) was used to combine bothdata sets and thereby provide a joint estimate of the time to the mostrecent common ancestor (TMRCA) of the CRF clade, while accountingfor uncertainty in both genome regions.

As the CRF clade in the different genome regions is known to representa common evolutionary history, jointly estimating the age of this cladewill maximize the explanatory power of the data (62) and is thus morepowerful than analyzing the regions independently, as has been done pre-viously (e.g., see reference 67). However, due to the recombination of thedifferent subtypes, we cannot simply assume that a single tree representsthe entire genome. Instead, we allow independent trees for each genomicregion but maintain the TMRCA of the CRF clade in each region to liewithin a small time of each other, while estimating the mean of these as theparameter of interest.

Specifically, separate phylogenies, molecular clock models, and sub-stitution models were estimated for the core/E1 and NS5B regions. Thegenome region-specific rates (estimated as described above) were used asprior distributions for the evolutionary rates for the core/E1 and NS5Bregions. For the NS5B region, the rates estimated from subtype 1b wereapplied, while for the core/E1 region, the average of the 1a and 1b rates wasused (because a subtype 2k-specific rate was not available).

For each pair of sampled phylogenies (core/E1 and NS5B) in the pos-terior distribution of the MCMC, three node dates were obtained (aslabeled in Fig. 2): A, the joint TMRCA of the CRF clade; B, the date of theparental node of the CRF clade in the core/E1 subtype 2k phylogeny; andC, the date of the parental node of the CRF clade in the NS5B subtype 1bphylogeny. The former date, together with the more recent of the last two

TABLE 1 Epidemiological and sequence information of the CRF01_1b2k isolates used in this studya

Isolate nameAge(yr)

GenderSamplinglocation

Riskfactor(s)

Country oforigin

GenBank accession no.NS2breakpoint

Reference orsourceNS5B

Core/E1accession

1b2k_AZ_01AZ051_2000 34 M Azerbaijan IDU Azerbaijan FJ435529 FJ435462 NA Unpublished1b2k_AZ_01AZ082_2000 38 M Azerbaijan IDU Azerbaijan FJ435544 FJ435480 NA Unpublished1b2k_AZ_02AZ105_2001 32 M Azerbaijan IDU Azerbaijan FJ435550 FJ435490 NA Unpublished1b2k_AZ_02AZ114_2001 41 M Azerbaijan IDU Azerbaijan FJ435556 FJ435497 NA Unpublished1b2k_AZ_02AZ129_2001 30 M Azerbaijan IDU Azerbaijan FJ435564 FJ435505 NA Unpublished1b2k_AZ_02AZ139_2001 33 M Azerbaijan IDU Azerbaijan FJ435572 FJ435514 NA Unpublished1b2k_CY_CYHCV037_2005 NA NA Cyprus ST/IDU Georgia EU684614 EU684686 NA 91b2k_CY_CYHCV093_2007 NA NA Cyprus ST/IDU Georgia EU684649 EU684728 NA 91b2k_AM_P077_2006 35 M Amsterdam IDU Russia JF949902 JF949908 Confirmed This study1b2k_AM_P079_2006 42 M Amsterdam NA Georgia JF949897 JF949903 Confirmed This study1b2k_AM_P108_2007 35 M Amsterdam IDU Georgia JF949898 JF949904 Confirmed This study1b2k_AM_P135_2005 35 F Amsterdam NA Georgia JF949899 JF949905 Confirmed This study1b2k_AM_P159_2007 39 M Amsterdam NA Georgia JF949900 JF949906 Confirmed This study1b2k_AM_P179_2000 21 M Amsterdam IDU Georgia JF949901 JF949907 Confirmed This study1b2k_FR_M21_2007 30 M France IDU Georgia FJ821465 FJ821465 Confirmed 381b2k_IE_HC9A99966_2006 NA NA Ireland NA Russia AB327058 AB327018 Confirmed 371b2k_RU_747_1999 NA NA Russia IDU Russia AF388411 AY070214 Confirmed 20, 211b2k_RU_796_1999 NA NA Russia NA Russia AF388412 AY070215 Confirmed 20, 211b2k_RU_ALT30_2000 34 F Russia NA Russia AB327055 AB327015 Confirmed 261b2k_RU_HIA1002_2003 NA NA Russia NA Russia DQ001221 AB327011 Confirmed 261b2k_RU_KNG318_2002 30 M Russia IDU Russia AY764172 AB327010 Confirmed 261b2k_RU_KNG327_2002 25 M Russia NA Russia AY764176 AB327012 Confirmed 261b2k_RU_N687_1999 NA NA Russia NA Russia AY587845 AY587845 Confirmed 261b2k_RU_PSA108_2005 37 M Russia NA Russia AB327053 AB327013 Confirmed 261b2k_RU_PSA62_2005 50 M Russia BT Russia AB327054 AB327014 Confirmed 261b2k_UZ_AZ15 22 M Uzbekistan BT/IDU Uzbekistan AB327056 AB327016 Confirmed 251b2k_UZ_UZIDU19_2006 NA NA Uzbekistan IDU Uzbekistan AB327120 AB327122 Confirmed 25a M, male; F, female; NA, not available; IDU, intravenous drug usage; BT, blood transfusion; ST, sexual transmission.

Raghwani et al.

2214 jvi.asm.org Journal of Virology

Page 4: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

dates, therefore defines a time range during which the recombinationevent must have occurred (see Fig. 2). The posterior distribution of thistime range was then compiled by repeating the above-described proce-dure for each pair of phylogenies in the MCMC output. The BEAST anal-ysis model settings were the same as those outlined in the section ongenome region-specific rates above.

CRF01_1b2k transmission history. Further Bayesian MCMC phylo-genetic analyses were performed solely on the CRF01_1b2k isolates inorder to estimate the epidemic history and basic reproductive number, R0,of the strain since its emergence. BEAST model settings were the same asthose outlined above, except that different coalescent models were em-ployed to reconstruct the transmission history of the CRF. Both theGMRF skyride (36) and exponential-growth coalescent models were used.

RESULTSEstimation of genome region-specific rates of evolution. Therates of evolution of the core/E1 and NS5B genomic regions usedin this study were estimated from subtype 1a and 1b whole-genome data sets (see Materials and Methods) and are given inTable 2. The rates for the NS5B region corresponded betweensubtypes, with similar 95% highest posterior density (HPD) inter-vals, while the rate estimates for the core/E1 region did show somevariation among subtypes (Table 2).

Phylogenetic analysis. To establish the evolutionary origins ofthe HCV 2k/1b strain, we analyzed the core/E1 (644 nt) and NS5B(741 nt) regions of 27 CRF 1b/2k, 15 subtype 2k, and 71 subtype 1bisolates. The 2k/1b isolates were sampled between 1999 and 2007and were from the following locations: Ireland, Uzbekistan, Azer-baijan, Cyprus, Amsterdam, France, and Russia (Table 1). Sincethe BF test supported the hypothesis that the 2k/1b isolates weremonophyletic, a single origin of the CRF was inferred (BFs for thecomparisons of monophyly and nonmonophyly models were 0.31for the core/E1 data set and �0.68 for the NS5B data set). Further-more, the monophyletic origin of the CRF clade was supported inboth genome regions when no phylogenetic constraints were im-posed. However, monophyly of CRF 1b/2k isolates was not sup-ported by a high posterior probability (0.67) in the NS5B maxi-mum clade credibility (MCC) tree, most likely reflecting theuncertainty associated with the star-like phylogeny and relativelyshort sequence length. However, the recombinant nature of manyof the isolates in this study was confirmed by direct observation ofthe breakpoint in the NS2 gene region (confirmed isolates arerepresented by filled circles in Fig. 1).

Molecular clock analysis. Figure 1 and Table 3 provide theestimated TMRCAs of the CRF clade obtained using the joint andindependent molecular clock analyses. The estimates obtainedseparately from the core/E1 and NS5B data sets were in closeagreement and exhibit overlapping 95% HPD intervals. These es-timates also agree with the joint estimate of TMRCA of the CRF(node A), which was 1946 (1932 to 1959; Table 3). We also esti-mated the date of the most recent common ancestor of the CRFclade and its most closely related parental isolate (Fig. 2). Theseestimates were again similar for both genome regions, at about1933 (Table 3). Lastly, by comparing the HPDs of the node A datewith the dates of the more recent of the two parental nodes (eitherB or C), we were able to generate bounds for the time of the

TABLE 2 Evolutionary rate estimates of the genomic regions used inthis study

Genomic regionNo. of substitutions/site/yr (10�3)a

HCV subtype 1a HCV subtype 1b Average rate

Core/E1 1.75 (1.41, 2.10) 1.36 (1.01, 1.76) 1.56 (1.21, 1.93)NS5B 0.89 (0.71, 1.07) 0.91 (0.68, 1.14) 0.90 (0.70, 1.11)a The estimates were obtained from HCV subtype 1a and 1b whole genomes using agenomic partition model (see Materials and Methods). The numbers inside parenthesesrepresents the 95% highest posterior density credibility interval.

FIG 1 Molecular clock phylogenies of CRF01_1b2k and its parental subtypes, estimated from the core/E1 alignment and the NS5B alignment. The horizontalbars in each phylogeny contain the dating estimates for two nodes: the common ancestor of the CRF clade (1931 or 1932 to 1958) and the common ancestor ofthe CRF and the most closely related parental strain (1910 to 1912 to 1952 or 1953). Filled circles, 2k/1b isolates that were confirmed as being recombinant bysequencing of the breakpoint in the NS2 region; open circles, those isolates for which NS2 sequences were not available. The trees shown are the maximum cladecredibility trees from the Bayesian MCMC analysis.

Origin and Evolution of HCV 2k/1b

February 2012 Volume 86 Number 4 jvi.asm.org 2215

Page 5: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

recombination event that generated the CRF, which was between1923 and 1956.

CRF01_1b2k transmission history. Figure 3 shows the esti-mated epidemic history of CRF01_1b2k, as estimated using theBayesian skyride plot method, which depicts the effective popula-tion size of the CRF epidemic over time. The plot indicates ap-proximately constant exponential growth since the emergence ofthe CRF lineage (Fig. 3) until the mid-1990s, after which the ef-fective population size declines or stabilizes (either is plausible,given the size of the credible region of the estimate). This decrease/stabilization coincides with the advent of screening for HCV inblood donors, which greatly reduced the risk of HCV infection viablood transfusion (31, 52, 70).

To ascertain the CRF’s exponential growth rate (r), the CRFdata set was also analyzed using an exponential-growth coalescentmodel. The estimated growth rate was 0.116 year�1 (95% HPDinterval, 0.079 to 0.159). This estimate was subsequently used tocalculate R0 values for the CRF01_1b2k strain under a plausiblerange of average durations of infectiousness (D), using the equa-tion R0 � rD � 1 (46). Both the estimated growth rate (r � 0.1year�1) and the estimated R0 values (R0, �2 to 4) are compatiblewith a number of equivalent estimates for other HCV subtypes,including those from IDU risk groups (46–48, 66).

The MCC tree of the CRF01_1b2k clade (Fig. 4), when com-bined with all available epidemiological information (Table 1),indicates a clear pattern of phylogenetic clustering according tothe geographic location and risk factor of each patient. For exam-ple, where these details were available, 14 out of 15 isolates wereassociated with IDU, while only 1 was isolated from a patient with

a history of blood transfusion (Table 1). These observations fur-ther support the view that CRF01_1b2k transmission is stronglylinked with the IDU transmission route. Although it is not possi-ble to reconstruct the location of the common ancestor of the CRFlineage with any certainty (because our sample size is small and thebasal branches of the phylogeny are poorly supported), all of theisolates included in this study are from or have an epidemiologicallink to the former Soviet Union, and the oldest strain was sampledin Russia in 1999. A well-supported cluster of strains from Azer-baijan originated in about 1970, and therefore, the CRF has likelycirculated there since that time. Hence, it appears thatCRF01_1b2k disseminated throughout the Soviet Union beforeits dissolution in the late 1980s and for some time prior to thediscovery of the CRF in St. Petersburg in 1999.

DISCUSSION

As the only known HCV recombinant in widespread circula-tion, the existence and emergence of CRF01_1b2k present aninteresting question in HCV epidemiology and evolution. In-vestigating its evolutionary origins and transmission historyhelps to understand the circumstances that led to its uniqueproperties. In contrast to HIV, which has 49 known CRFs anda much greater number of unique recombinant forms (30),recombination typically contributes little to the generation andmaintenance of HCV genetic diversity. Given that HCV has ahigher global prevalence than HIV and, thus, all else beingequal, there is a high likelihood of dual infections with diver-gent HCV strains, it is unlikely that epidemiological factors are

TABLE 3 Estimates of TMRCA of the CRF clade obtained from separate genome regions and from joint (hierarchical) phylogenetic analysis

Clade (nodea)TMRCA (yr)b

Core/E1 NS5B Joint (hierarchical) estimate

CRF (A) 1945.2 (1931.6, 1958.3) 1945.8 (1932.4, 1958.5) 1946.0 (1932.5, 1959.0)CRF � 2k (B) 1932.0 (1909.7, 1952.8) NA NACRF � 1b (C) NA 1933.1 (1912.0, 1952.1) NAa The node labels highlighted in Fig. 2.b The numbers inside parentheses represent the 95% highest posterior density credibility interval. NA, not available.

FIG 2 An illustration to explain the structure of the joint phylogenetic method that we employed to estimate the date of the recombination event that generatedCRF01_1b2k. The core/E1 tree is shown on the left and the NS5b tree on the right; in both, the shaded box highlights the CRF01_1b2k clade. The left axis depictsthe time of the ancestral nodes (node B in the core/E1 tree and node C in the NS5B tree). The right axis shows the TMRCA of CRF01_1b2k (node A). Both CRFnode A (estimated jointly from the core/E1 and NS5B regions) and the youngest parental node (either B or C) were used to calculate the minimum time windowfor the recombination event to have occurred.

Raghwani et al.

2216 jvi.asm.org Journal of Virology

Page 6: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

restricting the opportunities for HCV to generate CRFs. Mixedinfections with divergent HCV strains have been reported formany different populations and are noted to be prevalentamong high-risk groups, particularly IDUs and some hemo-philiacs (3, 5, 16, 50, 53).

Since the opportunities for HCV recombination are not lim-ited, it is more likely that fundamental molecular and evolution-ary differences between HIV and HCV explain why HIV has manyCRFs and HCV has few. These could include differences in the rateof template switching or differences in genomic or immunologicalconstraints, such that HCV recombinants have, on average, lowerfitness than HIV recombinants and therefore are rarely transmit-ted (72). Although both viruses are associated with chronic infec-tions, unlike HIV, HCV can be spontaneously cleared by the host.This may explain in part the differences in the number of recom-binants between HIV and HCV, where partial protective immu-nity against the latter reduces that chance of in vivo recombinationof HCV strains (1, 43, 69). However, the high rate of mixed infec-tions observed suggests that this is likely at best to play a minorrole in HCV recombination. The low frequency of HCV recom-binants is more likely to reflect mechanistic constraints on viralreplication. There is evidence that template switching in HCV is

FIG 3 Estimated Bayesian skyride plot of the CRF01_1b2k clade. The verticalaxis represents the product of viral generation time and the effective number ofinfections (Ne). The solid line shows the best estimate, and the shaded areashows the 95% credible region of this estimate.

FIG 4 Molecular clock phylogeny of the CRF01_1b2k clade, estimated using a relaxed uncorrelated lognormal clock model and the SDR06 nucleotidesubstitution model (see Materials and Methods). Sequences are color labeled according to the country of origin (diamonds) and the country of sampling (circles).All isolates were found to have an epidemiological link to Russia or the former Soviet Union (Table 1). Filled circles/diamonds, 2k/1b isolates that were confirmedas being recombinant by sequencing of the breakpoint in the NS2 region; open circles/diamonds, those isolates for which NS2 sequences were not available. Thenumbers above the branches show the posterior probability of nodes in the MCC tree; numbers below the branches represent bootstrap support values usingmaximum likelihood.

Origin and Evolution of HCV 2k/1b

February 2012 Volume 86 Number 4 jvi.asm.org 2217

Page 7: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

especially rare and that the replication complex is typically en-coded on the same genomic strand that it will replicate and tran-scribe (2). It is also interesting that when replication complexesare exchanged between different genotypes, the replication effi-ciency is substantially reduced (15). The pseudodiploidy of theHIV genome certainly increases the likelihood of recombinationoccurring due to the ability of the virus to package two RNA tem-plates (17), while the secondary RNA structure in the HCV ge-nome may limit the production of viable hybrid HCVs (59, 68).

Our study of previously reported and newly obtained HCVisolates provides the first estimates of the date of the recombina-tion event that generated CRF01_1b2k. We estimated the time oforigin of CRF01_1b2k to be between 1923 and 1956, which is notmuch later than the origin and global spread of the parental sub-type 1b (33). This date is significantly earlier than we expected: weexpected that the CRF’s creation might be linked to the dramaticincrease in IDU behavior following the breakup of the formerSoviet Union. This result is robust to the manner of isolate sam-pling: if, because of nonrandom sampling, our isolates are moreclosely related to each other than under random sampling, thenthe TMRCA of CRF01_1b2k would be biased toward more recentdates. Furthermore, despite its small size, our data set provides arelatively short time window during which the recombinant musthave arisen (Fig. 2). The involvement of subtype 1b in the recom-binant is not surprising, as it is one of the most prevalent subtypesworldwide. However, to fully appreciate the origin of CRF01_1b2k,we need to consider the evolutionary history of both parental sub-types and of the recombinant lineage itself. Genotype 2 harbors con-siderable genetic diversity, especially in West Africa, which is wherethe genotype is thought to have originated (34). Although the smallnumber of subtype 2k isolates sampled to date likely underestimatesthe true extent of subtype 2k distribution, such viruses have beenisolated from Martinique and Madagascar, implicating a role for thehistorical trans-Atlantic slave trade in the dissemination of the virusfrom West Africa (34).

The current distribution of subtype 2k is associated with fran-cophone regions and former Soviet Union countries. In contrast,CRF01_1b2k is more spatially limited, with all isolates being di-rectly or indirectly linked to the former Soviet Union. As the non-recombinant subtype 2k isolates that are most closely related toCRF01_1b2k are from Moldova and Azerbaijan (see Fig. S1 in thesupplemental material), it seems most likely that CRF01_1b2k wasgenerated in the former Soviet Union. An equivalent analysis ofsubtype 1b viruses provides no reliable phylogeographic linkage,due to the low phylogenetic resolution of the NS5B data set.

Our estimated date of CRF origin coincides with an interestingperiod of history in the former Soviet Union, which was an earlyleader in transfusion technology. Under the leadership of Alexan-der Bogdanov in the 1920s, a nationwide network of blood trans-fusion centers and research institutes, as well as the Central Insti-tute of Hematology in Moscow, Russia, in 1926, was establishedthroughout the Soviet republics (61). This expanded into a net-work of �1,500 blood donating centers across the republics (18).The Soviets also adopted blood storage and preservation tech-niques at an early stage. They established more than 60 primaryand 500 subsidiary blood storage centers by the mid-1930s, whichshipped blood across the entire Soviet Union (61). During theSecond World War, these networks were swiftly readapted to sup-port the front line; in Moscow alone, about 2,000 blood donationswere given per day (18, 61). The impressive scale of the blood

service in the former Soviet Union is likely to have favored HCVtransmission by increasing the efficiency and geographic range ofthe virus’s dissemination. Whether specific medical practices atthis time increased the probability of mixed viral infections re-mains unknown. It is interesting to note that Bogdanov himselfwas fascinated by the ideological interpretation of blood sharingand frequently practiced what he called “physiological collectiv-ism”: the exchange of blood with others through mutual transfu-sions (61).

Although unscreened blood transfusions can provide a credi-ble hypothesis for the origin of CRF01_1b2k in the Soviet Unionsome time from 1923 to 1956, we must also attempt to explainhow subtype 2k or the CRF itself arrived in the Soviet Union fromWest Africa or the Caribbean. Migration from Africa to the formerSoviet Union did occur during the late 1950s and 1970s as a resultof alliances forged by the Soviet government with newly indepen-dent African states such as Ghana and Angola (35). However,these connections are too late to have contributed to the emer-gence of CRF01_1b2k, according to our dating estimates. Al-though we cannot reject the hypothesis that the CRF was formedin West Africa and subsequently moved to the Soviet Union, ourresults are more consistent with the recombination event occur-ring in the latter. This uncertainty is likely to be reduced withfurther samples, especially subtype 2k viruses, from African andformer Soviet Union locations.

The epidemic history CRF01_1b2k (Fig. 3) since its emergenceis similar to that estimated for other epidemic subtypes of HCV(e.g., see reference 33). The growth in the CRF01_1b2k effectivepopulation sizes coincides with a substantial increase in bloodtransfusion, including during the Second World War, and withthe subsequent rise in intravenous drug usage. CRF01_1b2ktransmission seems to have slowed or stabilized since the early1990s, coinciding with the onset of the anti-HCV screening ofdonors. In the absence of any data to the contrary, the transmis-sion of this recombinant and its spread from the former SovietUnion reflect the peculiar epidemiological properties of the riskgroups that it has been associated with rather than any intrinsicproperties of the virus.

We demonstrate the practicality and benefits of using a hierar-chical phylogenetic model to jointly estimate parameters of inter-est when analyzing multipartite sequence data that result fromgenetic exchange. This method yields more accurate parameterestimates than previous methods (e.g., see references 27 and 67)by incorporating the phylogenetic information and uncertainty indifferent genomic regions. We recommend that this improvedstatistical framework be used in future investigations of recombi-nation in fast-evolving RNA viruses.

This study has made significant steps in understanding the ep-idemic history and spread of the unique circulating HCV recom-binant 2k/1b. Most significantly, we show that this strain origi-nated many decades before the post-Soviet rise in injectionbehavior with which it is currently associated. On the basis of thedate of its origin and its molecular epidemiology, there are reason-able grounds to suppose that the Soviet Union’s revolutionaryblood service was instrumental in the CRF’s early generation andcontinental-scale spread. This infrastructure may have facilitatedthe pan-Eurasian spread of other parenterally transmitted blood-borne infections, and this is an interesting question for futureresearch.

Raghwani et al.

2218 jvi.asm.org Journal of Virology

Page 8: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

REFERENCES1. Aitken CK, et al. 2008. High incidence of hepatitis C virus reinfection in

a cohort of injecting drug users. Hepatology 48:1746 –1752.2. Appel N, Herian U, Bartenschlager R. 2005. Efficient rescue of hepatitis

C virus RNA replication by trans-complementation with nonstructuralprotein 5A. J. Virol. 79:896 –909.

3. Blackard JT, Sherman KE. 2007. Hepatitis C virus coinfection and su-perinfection. J. Infect. Dis. 195:519 –524.

4. Boom R, et al. 1990. Rapid and simple method for purification of nucleic-acids. J. Clin. Microbiol. 28:495–503.

5. Bowden S, McCaw R, White PA, Crofts N, Aitken CK. 2005. Detectionof multiple hepatitis C virus genotypes in a cohort of injecting drug users.J. Viral Hepat. 12:322–324.

6. Calado RA, et al. 2011. Hepatitis C virus subtypes circulating amongintravenous drug users in Lisbon, Portugal. J. Med. Virol. 83:608 – 615.

7. Colina R, et al. 2004. Evidence of intratypic recombination in naturalpopulations of hepatitis C virus. J. Gen. Virol. 85:31–37.

8. Cristina J, Colina R. 2006. Evidence of structural genomic region recom-bination in hepatitis C virus. Virology J. 3:53.

9. Demetriou VL, de Vijver DAMCV, Kostrikis LG, Network CHCV.2009. Molecular epidemiology of hepatitis C infection in Cyprus: evidenceof polyphyletic infection. J. Med. Virol. 81:238 –248.

10. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxedphylogenetics and dating with confidence. PLoS Biol. 4:699 –710.

11. Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary anal-ysis by sampling trees. BMC Evol. Biol. 7:214.

12. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. 2005. Bayesiancoalescent inference of past population dynamics from molecular se-quences. Mol. Biol. Evol. 22:1185–1192.

13. Dubois F, et al. 1997. Hepatitis C in a French population-based survey,1994: seroprevalence, frequency of viremia, genotype distribution, andrisk factors. Hepatology 25:1490 –1496.

14. Gray RR, et al. 2011. The mode and tempo of hepatitis C virus evolutionwithin and among hosts. BMC Evol. Biol. 11:131.

15. Herlihy KJ, et al. 2008. Development of intergenotypic chimeric repli-cons to determine the broad-spectrum antiviral activities of hepatitis Cvirus polymerase inhibitors. Antimicrob. Agents Chemother. 52:3523–3531.

16. Herring BL, Page-Shafer K, Tobler LH, Delwart EL. 2004. Frequenthepatitis C virus superinfection in injection drug users. J. Infect. Dis. 190:1396 –1403.

17. Hu WS, Temin HM. 1990. Genetic consequences of packaging two RNAgenomes in one retroviral particle: pseudodiploidy and high rate of geneticrecombination. Proc. Natl. Acad. Sci. U. S. A. 87:1556 –1560.

18. Huestis DW. 2002. Russia’s National Research Center for Hematology: itsrole in the development of blood banking. Transfusion 42:490 – 494.

19. Kageyama S, et al. 2006. A natural inter-genotypic (2b/1b) recombinantof hepatitis C virus in the Philippines. J. Med. Virol. 78:1423–1428.

20. Kalinina O, Norder H, Mukomolov S, Magnius LO. 2002. A naturalintergenotypic recombinant of hepatitis C virus identified in St. Peters-burg. J. Virol. 76:4034 – 4043.

21. Kalinina O, et al. 2001. Shift in predominating subtype of HCV from 1bto 3a in St. Petersburg mediated by increase in injecting drug use. J. Med.Virol. 65:517–524.

22. Kapoor A, et al. 2011. Characterization of a canine homolog of hepatitisC virus. Proc. Natl. Acad. Sci. U. S. A. 108:11608 –11613.

23. Kuiken C, Simmonds P. 2009. Nomenclature and numbering of thehepatitis C virus. Methods Mol. Biol. 510:33–53.

24. Kuiken C, Yusim K, Boykin L, Richardson R. 2005. The Los Alamoshepatitis C sequence database. Bioinformatics 21:379 –384.

25. Kurbanov F, et al. 2008. Detection of hepatitis C virus natural recombi-nant RF1_2k/1b strain among intravenous drug users in Uzbekistan.Hepatol. Res. 38:457– 464.

26. Kurbanov F, et al. 2008. Molecular epidemiology and interferon suscep-tibility of the natural recombinant hepatitis C virus strain RF1_2k/1b. J.Infect. Dis. 198:1448 –1456.

27. Lam TY, et al. 2008. Evolutionary analyses of European H1N2 swineinfluenza A virus by placing timestamps on the multiple reassortmentevents. Virus Res. 131:271–278.

28. Lee YM, et al. 2010. Molecular epidemiology of HCV genotypes amonginjection drug users in Taiwan: full-length sequences of two new subtype6w strains and a recombinant form_2b6w. J. Med. Virol. 82:57– 68.

29. Legrand-Abravanel F, et al. 2007. New natural intergenotypic (2/5) re-combinant of hepatitis C virus. J. Virol. 81:4357– 4362.

30. Leitner T, Korber B, Daniels M, Calef C, Foley B. 2005. HIV-1 subtypeand circulating recombinant form (CRF) reference sequences, 2005,p 41– 48. In Leitner T, et al (ed), HIV sequence compendium 2005. The-oretical Biology and Biophysics Group, Los Alamos National Laboratory,Los Alamos, NM.

31. Lemon SM, Brown EA. 1995. Hepatitis C virus, p 1474 –1486. In MandelGL, Bennett JE, Dolin R (ed), Principle and practice of infectious disease,4th ed. Churchill Livingstone, New York, NY.

32. Li Y, et al. 2010. Identification of a novel second-generation circulatingrecombinant form (CRF48_01B) in Malaysia: a descendant of the previ-ously identified CRF33_01B. J. Acquir. Immune Defic. Syndr. 54:129 –136.

33. Magiorkinis G, et al. 2009. The global spread of hepatitis C virus 1a and1b: a phylodynamic and phylogeographic analysis. PLoS Med. 6:e1000198.

34. Markov PV, et al. 2009. Phylogeography and molecular epidemiology ofhepatitis C virus genotype 2 in Africa. J. Gen. Virol. 90:2086 –2096.

35. Matusevich M. 2009. Black in the U.S.S.R.: Africans, African Americans,and the Soviet society. Transition 100:56 –75.

36. Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyridethrough a rough skyline: Bayesian coalescent-based inference of popula-tion dynamics. Mol. Biol. Evol. 25:1459 –1471.

37. Moreau I, et al. 2006. Serendipitous identification of natural intergeno-typic recombinants of hepatitis C in Ireland. Virol. J. 3:95–102.

38. Morel V, et al. 2010. Emergence of a genomic variant of the recombinant2k/1b strain during a mixed hepatitis C infection: a case report. J. Clin.Virol. 47:382–386.

39. Moreno P, et al. 2009. Evidence of recombination in hepatitis C viruspopulations infecting a hemophiliac patient. Virol. J. 6:203.

40. Murphy DG, et al. 2007. Use of sequence analysis of the NS5B region forroutine genotyping of hepatitis C virus with reference to C/E1 and 5=untranslated region sequences. J. Clin. Microbiol. 45:1102–1112.

41. Newton MA, Raftery AE. 1994. Approximate Bayesian-inference with theweighted likelihood bootstrap. J. R. Stat. Soc. Ser. B Methodol. 56:3– 48.

42. Noppornpanth S, et al. 2006. Identification of a naturally occurringrecombinant genotype 2/6 hepatitis C virus. J. Virol. 80:7569 –7577.

43. Osburn WO, et al. 2010. Spontaneous control of primary hepatitis Cvirus infection and immunity against persistent reinfection. Gastroenter-ology 138:315–324.

44. Pawlotsky JM, et al. 1995. Relationship between hepatitis-C virus geno-types and sources of infection in patients with chronic hepatitis-C. J. In-fect. Dis. 171:1607–1610.

45. Pol S, et al. 1995. The changing relative prevalence of hepatitis-C virusgenotypes— evidence in hemodialyzed patients and kidney recipients.Gastroenterology 108:581–583.

46. Pybus OG, et al. 2001. The epidemic behavior of the hepatitis C virus.Science 292:2323–2325.

47. Pybus OG, Cochrane A, Holmes EC, Simmonds P. 2005. The hepatitisC virus epidemic among injecting drug users. Infect. Genet. Evol. 5:131–139.

48. Pybus OG, Drummond AJ, Nakano T, Robertson BH, Rambaut A.2003. The epidemiology and iatrogenic transmission of hepatitis C virus inEgypt: a Bayesian coalescent approach. Mol. Biol. Evol. 20:381–387.

49. Pybus OG, Markov PV, Wu A, Tatem AJ. 2007. Investigating the en-demic transmission of the hepatitis C virus. Int. J. Parasitol. 37:839 – 849.

50. Qian KP, Natov SN, Pereira BJ, Lau JY. 2000. Hepatitis C virus mixedgenotype infection in patients on haemodialysis. J. Viral Hepat. 7:153–160.

51. Ristic N, et al. 2011. Analysis of the origin and evolutionary history ofHIV-1 CRF28_BF and CRF29_BF reveals a decreasing prevalence in theAIDS epidemic of Brazil. PLoS One 6:e17485.

52. Schreiber GB, Busch MP, Kleinman SH, Korelitz JJ. 1996. The risk oftransfusion-transmitted viral infections. The Retrovirus EpidemiologyDonor Study. N. Engl. J. Med. 334:1685–1690.

53. Schroter M, Feucht HH, Zollner B, Schafer P, Laufs R. 2003. Multipleinfections with different HCV genotypes: prevalence and clinical impact.J. Clin. Virol. 27:200 –204.

54. Seeff LB, et al. 2000. 45-year follow-up of hepatitis C virus infection inhealthy young adults. Ann. Intern. Med. 132:105–111.

55. Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing appropriatesubstitution models for the phylogenetic analysis of protein-coding se-quences. Mol. Biol. Evol. 23:7–9.

Origin and Evolution of HCV 2k/1b

February 2012 Volume 86 Number 4 jvi.asm.org 2219

Page 9: Origin and Evolution of the Unique Hepatitis C Virus ...evolve.zoo.ox.ac.uk/Evolve/Oliver_Pybus_files... · Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant

56. Shepard CW, Finelli L, Alter MJ. 2005. Global epidemiology of hepatitisC virus infection. Lancet Infect. Dis. 5:558 –567.

57. Silini E, et al. 1995. Molecular epidemiology of hepatitis-C virus-infection among intravenous-drug-users. J. Hepatol. 22:691– 695.

58. Simmonds P. 2004. Genetic diversity and evolution of hepatitis C vi-rus—15 years on. J. Gen. Virol. 85:3173–3188.

59. Simmonds P, Smith DB. 1999. Structural constraints on RNA virusevolution. J. Virol. 73:5787–5794.

60. Smith DB, et al. 1997. The origin of hepatitis C virus genotypes. J. Gen.Virol. 78(Pt 2):321–328.

61. Starr D. 1999. Blood: an epic history of medicine and commerce.Little,Brown & Company, London, United Kingdom.

62. Suchard MA, Kitchen CMR, Sinsheimer JS, Weiss RE. 2003. Hierarchi-cal phylogenetic models for analyzing multipartite sequence data. Syst.Biol. 52:649 – 664.

63. Suchard MA, Weiss RE, Sinsheimer JS. 2001. Bayesian selection of continuous-time Markov chain evolutionary models. Mol. Biol. Evol. 18:1001–1013.

64. Swofford D. 2003. PAUP*. Phylogenetic analysis using parsimony (*andother methods), version 4.Sinauer Associates, Sunderland, MA.

65. Tanaka Y, et al. 2004. Exponential spread of hepatitis C virus genotype 4ain Egypt. J. Mol. Evol. 58:191–195.

66. Tanaka Y, et al. 2005. Molecular evolutionary analyses implicate injectiontreatment for schistosomiasis in the initial hepatitis C epidemics in Japan.J. Hepatol. 42:47–53.

67. Tee KK, et al. 2009. Estimating the date of origin of an HIV-1 circulatingrecombinant form. Virology 387:229 –234.

68. Tuplin A, Wood J, Evans DJ, Patel AH, Simmonds P. 2002. Thermo-dynamic and phylogenetic prediction of RNA secondary structures in thecoding region of hepatitis C virus. RNA 8:824 – 841.

69. van de Laar TJ, et al. 2009. Frequent HCV reinfection and superin-fection in a cohort of injecting drug users in Amsterdam. J. Hepatol.51:667– 674.

70. van der Poel CL. 1999. Hepatitis C virus and blood transfusion: past andpresent risks. J. Hepatology 31(Suppl 1):101–106.

71. WHO. 2003, posting date. Global alert and response (GAR): hepatitis C.WHO, Geneva, Switzerland.

72. Worobey M, Holmes EC. 1999. Evolutionary aspects of recombination inRNA viruses. J. Gen. Virol. 80:2535–2543.

Raghwani et al.

2220 jvi.asm.org Journal of Virology