Promises and Challenges of Next Generation Sequencing for HIV and HCV

The UC San Diego AntiViral Research Center sponsors weekly presentations by infectious disease clinicians, physicians and researchers. The goal of these presentations is to provide the most current research, clinical practices and trends in HIV, HBV, HCV, TB and other infectious diseases of global significance. The slides from the AIDS Clinical Rounds presentation that you are about to view are intended for the educational purposes of our audience. They may not be used for other purposes without the presenter’s express permission. AIDS CLINICAL ROUNDS

Uploaded by: uc-san-diego-antiviral-research-center

Posted on 04-Jul-2015


DESCRIPTION

Sergei L. Kosakovsky Pond, PhD (UC San Diego AntiViral Research Center) presents "Promises and Challenges of Next Generation Sequencing for HIV and HCV"

TRANSCRIPT

Page 1: Promises and Challenges of Next Generation Sequencing for HIV and HCV


Page 2: Promises and Challenges of Next Generation Sequencing for HIV and HCV

January 11, 2013

Promises and Challenges of Next Generation Sequencing for HIV and HCV

Sergei L Kosakovsky Pond, PhD, Associate Professor, UCSD Department of Medicine.

Page 3: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Outline

✤ Next generation / Ultradeep sequencing (NGS/UDS) technology

✤ NGS applications for HIV and HCV

✤ What are the unique advantages of NGS?

✤ What are the limitations of NGS?

✤ Clinical relevance of NGS-based assays

✤ Regulatory approval

Page 4: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Genomic sequencing

✤ In recent years, sequencing (DNA, RNA) has rapidly become one of the cheapest and fastest assays in many applications

✤ A sub-$1000 human genome is expected very shortly.

Figure: sequencing costs over time (source: http://www.genome.gov/sequencingcosts/), marking when NGS (Solexa) was introduced commercially.

Page 5: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Is NGS relevant for medicine?

✤ In 2012, 6 out of TIME magazine’s Top 10 Medical Breakthroughs relied on NGS

1 The ENCODE project (non-coding DNA)

2 The Human Microbiome Project

6 Cancer Genome Atlas

7 Neo-/pre-natal screening for rare diseases

8 Pediatric Cancer Diagnostics

10 P. acnes phage characterization

Page 6: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Next generation sequencing

✤ Traditional (Sanger) sequencing generates a small number of intermediate length reads (~1000 bp)

✤ All NGS technologies perform millions of parallel sequencing reactions to generate many, typically short, reads per run.

✤ Two canonical applications for NGS

✤ Assembling long sequences from short fragments (human genome, cancer)

✤ Characterizing diverse populations (HIV, HCV, immune repertoire, metagenomics)

Page 7: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Platform comparison

Instrument: first introduced; output per run; run time; use in HIV/HCV settings

Roche 454 FLX+/Junior: 2005; 10^5-10^6 reads of 400-700 bp; 10-20 hrs; extensive (>300 papers)

Illumina HiSeq/MiSeq: 2007; 10^7-10^9 reads of 36-250 bp; 7 hrs to 11 days; limited (~30 papers)

Life Technologies Ion Torrent: 2010; 10^5-10^7 reads of 35-400 bp; 1-8 hrs; limited (<10 papers)

Pacific Biosciences PacBio RS: 2011; 10^4-10^5 reads of 1000-10000 bp; 1-2 hrs; limited (<10 papers)

Page 8: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Characterizing viral diversity within a host

✤ Being able to characterize HIV-1 populations rapidly and accurately is important for understanding pathogenesis, the interplay between viruses and humoral responses, and the evolution of drug resistance

✤ Both HIV-1 and HCV exist as viral quasispecies in a host, i.e. many distinct viral strains circulate at any given moment

✤ NGS has the potential to directly sequence many such strains

✤ Using multiplexing (multiple samples/run), high throughput can be achieved

Page 9: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Characterizing minority DRAMs

✤ Perhaps the clearest clinical application of NGS for HIV and HCV.

✤ We already know which mutations we are looking for (e.g. K103N).

✤ Which mutations are real?

✤ Sequencing error

✤ Assay error / reproducibility

✤ What frequency of mutations matter clinically?

Page 10: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Drug resistance associated mutations (DRAMs)

✤ Using bulk sequencing (standard tests): all viral strains from a biological sample are PCR amplified and sequenced together

✤ Generates a “population” virus sequence that may hide mutations present in minority variants

✤ The basis of all current FDA-approved sequencing tests

✤ Ambiguous peaks on the electropherogram reflect mixed populations

✤ Can only detect minority variants at frequencies ≥20%

Page 11: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Bulk sequence: mixed bases

Figure: bulk (population) Sanger sequence with its electropherogram, positions ~5-150; mixed-base calls such as Y (C/T) and R (A/G) appear where peaks overlap.

✤ Are we missing lower frequency variants?

✤ Do all four combinations of resolved mixtures (CA, CG, TA, TG) actually exist in the sample?
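To make the second question concrete, here is a small sketch that enumerates every unambiguous sequence compatible with a bulk call containing IUPAC mixture codes. The lookup table covers only the single bases, the two-base codes and N (an assumption; the three-base codes B, D, H and V are left out), and the function name is illustrative. A bulk read cannot say which of these combinations are really present; only clonal or NGS data can.

```python
from itertools import product

# IUPAC nucleotide codes for mixed positions (subset: three-base codes B, D, H, V omitted).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "ACGT",
}

def resolve_mixtures(bulk_seq: str) -> list[str]:
    """Return every unambiguous sequence consistent with a bulk call containing mixture codes."""
    choices = [IUPAC[base] for base in bulk_seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Only the two mixed positions (Y, then R), taken in isolation.
print(resolve_mixtures("YR"))  # ['CA', 'CG', 'TA', 'TG']
```

The number of candidate haplotypes doubles with every two-base mixture, which is why linkage between distant mixed positions quickly becomes unknowable from a bulk sequence alone.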

Page 12: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Cloning/Single genome sequencing

✤ Cloning, or limiting dilution PCR followed by Sanger sequencing: single genome sequencing (SGS)

✤ Generates ~10-100 sequences; how representative is this of the entire population?

Figure: phylogeny of clonal sequences from samples AB819, AB958 and AB570 (dated November-December 2002) and AB595 (dated 2-20-1997), with the pNL4-3 p6-RT reference.

Page 13: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Cloning/SGS

✤ Now have 3 variants / 20 clones

✤ Are we still missing lower frequency variants?

✤ Would we get the same counts if the experiment were repeated?

Figure: phylogeny of 20 clones (Clone_0 to Clone_19); scale bar 0.01.

Page 14: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Cloning/SGS

✤ Sampling variance could be quite high.

Figure: phylogenies of 20 clones (Clone_0 to Clone_19) from two replicate experiments, Replicate 1 and Replicate 2 (scale bars 0.01 and 0.001).

Page 15: Promises and Challenges of Next Generation Sequencing for HIV and HCV

NGS approach

✤ Prepare amplicons, e.g. Blood → HIV RNA → cDNA → PCR 3 regions

✤ Multiplex several samples/regions on one plate

✤ Obtain 1000s of reads / sample from a single run

Workflow: Library prep → emulsion PCR → Sequencing → Data analysis

Amplicons: Env: C2-V3-C3 (416 bp); Pol: RT (534 bp); Gag: p24 (253 bp)

Instruments pictured: 454 Junior, PacBio RS

Page 16: Promises and Challenges of Next Generation Sequencing for HIV and HCV

>FYJLQU001AI1WJ rank=0036132 x=99.0 y=3537.0 length=250
GGACATCAAGCAGCCATGCAAATGTTAAAAGAGACCATCAATGAG...
>FYJLQU001AI1WJ rank=0036132 x=99.0 y=3537.0 length=250
28 28 28 35 37 37 37 37 37 35 33 33 35 35 35 ...

>FYJLQU001AWHGJ rank=0036147 x=252.0 y=3537.5
AAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAGACAGT...
>FYJLQU001AWHGJ rank=0036147 x=252.0 y=3537.5 length=354
21 18 18 32 33 33 35 35 35 35 25 27 31 28 31 ...

Sequence and per-base quality output (FASTA/QUAL records like these) needs to be converted to interpretable results: 10,000-1,000,000 records like these

Massive data sets need dedicated analysis tools.

Quality informatics tools are essential.
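As a minimal sketch of the first informatics step, the code below pairs sequence (FASTA) and quality (QUAL) records by read ID, using the header layout shown above, and keeps reads whose mean Phred quality reaches a cutoff. The cutoff of 20 and all function names are illustrative assumptions, not part of any particular pipeline.

```python
from statistics import mean

def read_records(lines):
    """Yield (read_id, body_lines) for '>'-headed records; the ID is the header's first token."""
    read_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, body
            read_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, body

def pair_reads(fasta_lines, qual_lines, min_mean_q=20):
    """Join each sequence with its per-base Phred scores; keep reads at or above the mean cutoff."""
    seqs = {rid: "".join(body) for rid, body in read_records(fasta_lines)}
    kept = {}
    for rid, body in read_records(qual_lines):
        quals = [int(q) for chunk in body for q in chunk.split()]
        seq = seqs.get(rid)
        if seq is None or len(seq) != len(quals):
            continue  # unmatched or inconsistent record: skip it
        if mean(quals) >= min_mean_q:
            kept[rid] = (seq, quals)
    return kept

fasta = [">r1 length=4", "ACGT", ">r2 length=3", "GGA"]
qual = [">r1 length=4", "30 32 35 28", ">r2 length=3", "10 12 9"]
print(sorted(pair_reads(fasta, qual)))  # ['r1']
```

Real runs would stream files rather than lists, but the pairing and filtering logic stays the same.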

Page 17: Promises and Challenges of Next Generation Sequencing for HIV and HCV

NGS/454

✤ 9 variants identified.

✤ Would need >200 clones to detect the lowest-frequency variants reliably.
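The clone count follows from simple sampling arithmetic. Assuming clones are drawn independently from a large population, a variant at frequency p appears at least once among n clones with probability 1 - (1 - p)^n, so reaching detection probability q needs n ≥ ln(1 - q) / ln(1 - p). The sketch below evaluates that bound; the 95% target and the example frequencies are illustrative choices.

```python
import math

def clones_needed(freq: float, detect_prob: float = 0.95) -> int:
    """Smallest n with P(a variant at `freq` appears in at least one of n clones) >= detect_prob.

    Assumes independent sampling from a large population (binomial model).
    """
    if not (0 < freq < 1 and 0 < detect_prob < 1):
        raise ValueError("freq and detect_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - detect_prob) / math.log(1 - freq))

for f in (0.10, 0.05, 0.015, 0.01):
    print(f"variant at {f:.1%}: {clones_needed(f)} clones for 95% detection")
```

Under these assumptions a 1.5% variant already needs 199 clones and a 1% variant 299, which is why cloning/SGS becomes impractical for rare variants.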

Page 18: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Sources of error

✤ Library prep: viral template resampling, PCR recombination, PCR error

✤ Emulsion PCR: PCR error, multiple templates on a bead

✤ Sequencing: base-calling errors, detection errors

✤ Data analysis: software limitations, improper statistical analyses

Page 19: Promises and Challenges of Next Generation Sequencing for HIV and HCV

454 sequencing error rates

✤ Sequencing clonal populations of bacteriophages yielded a measured sequencing error rate of 0.25% per base.

✤ The most common errors are homopolymer runs reported too long or too short, e.g. AAAA could be reported as AAA or AAAAA.

✤ Solution: We developed an algorithm that maps reads to “reference sequences” (e.g. subtype-specific HIV/HCV sequences or germline IgG alleles) and corrects most such errors.

✤ Many such algorithms exist; we are currently conducting a rigorous comparison among them.
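The homopolymer problem can be illustrated with run-length encoding: if a read and a reference collapse to the same series of bases and differ only in run lengths, their only disagreements are homopolymer indels. The sketch below flags such pairs. It is a toy check that assumes the read and reference already span the same interval; it is not the UCSD mapping algorithm described above, and the function names are hypothetical.

```python
from itertools import groupby

def homopolymer_runs(seq: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence, e.g. 'GAAAT' -> [('G', 1), ('A', 3), ('T', 1)]."""
    return [(base, len(list(run))) for base, run in groupby(seq.upper())]

def homopolymer_only_difference(read: str, reference: str) -> bool:
    """True if the sequences differ, but only in the lengths of homopolymer runs."""
    read_runs, ref_runs = homopolymer_runs(read), homopolymer_runs(reference)
    same_bases = [b for b, _ in read_runs] == [b for b, _ in ref_runs]
    return same_bases and read_runs != ref_runs

print(homopolymer_only_difference("GAAAT", "GAAAAT"))  # True: AAA vs AAAA
print(homopolymer_only_difference("GACAT", "GAAAT"))   # False: a genuine substitution
```

A mapping-based corrector would use the reference to decide which run length is right; this check only identifies where that decision is needed.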

Page 20: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Correcting sequencing error

✤ If one has 10000 reads covering a 400 bp amplicon and the reported sequencing error rate is a uniform 1%, then, on average:

✤ each read will have 4 errors

✤ each nucleotide position will have 100 (random) mutations

✤ Just because a sequencer reports the presence of a mutation does not mean that the mutation is real.

✤ We (and other groups) have developed statistical models and algorithms that can reliably detect minority variants at 0.25-0.5% frequencies, given sufficient coverage.
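The arithmetic above (400 bp × 1% ≈ 4 errors per read; 10,000 reads × 1% ≈ 100 erroneous calls per position) suggests the simplest possible filter: model the mismatch count at a position as Binomial(coverage, error rate) and call a variant only when its read count would be improbable under error alone. The sketch below does just that. The uniform, position-independent error rate and the significance level alpha are simplifying assumptions; the statistical models mentioned above are more elaborate.

```python
import math

def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p); lgamma avoids overflow at high coverage."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def error_tail(k: int, n: int, p: float) -> float:
    """P(X >= k): chance that sequencing error alone puts k or more mismatches at one position."""
    return sum(math.exp(binom_logpmf(i, n, p)) for i in range(k, n + 1))

def min_reads_to_call(n: int, p: float, alpha: float = 1e-3) -> int:
    """Smallest mismatch count k with error_tail(k, n, p) <= alpha."""
    tail = 1.0  # P(X >= 0)
    for k in range(n + 1):
        if tail <= alpha:
            return k
        tail -= math.exp(binom_logpmf(k, n, p))  # tail is now P(X >= k + 1)
    return n + 1  # only reached if alpha is (effectively) zero

coverage, error_rate, amplicon_len = 10_000, 0.01, 400
print("expected errors per read:", amplicon_len * error_rate)
print("expected error calls per position:", coverage * error_rate)
k = min_reads_to_call(coverage, error_rate)
print(f"call a variant only with >= {k} supporting reads ({k / coverage:.2%} of coverage)")
```

With a pessimistic uniform 1% error rate the cutoff lands above 1% of coverage; rerunning with the lower 0.25% rate measured for 454 moves it well below 1%, which illustrates why accurate error models matter for minority-variant detection.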

Page 21: Promises and Challenges of Next Generation Sequencing for HIV and HCV

UCSD processing pipeline site report (http://www.datamonkey.org)

Figure: example report distinguishing real variants from instrument error.

Page 22: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Experimental error

✤ In order to detect low frequency variants, we need a lot of input templates (e.g. high viral load).

✤ With few input templates, NGS can create a false sense of depth by resampling the same templates over and over again.

✤ PCR amplification biases can cause allelic skewing (inflating or deflating the frequencies of specific variants)
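The false-depth effect is easy to see in a simulation. The sketch below assumes each read derives from one input template chosen uniformly at random (a deliberate simplification that ignores PCR bias) and counts how many distinct templates the reads actually represent; the closed-form expectation T·(1 - (1 - 1/T)^R) is printed alongside. Template and read counts are arbitrary example values.

```python
import random

def simulate_distinct(n_templates: int, n_reads: int, seed: int = 1) -> int:
    """Number of distinct input templates represented among n_reads resampled reads.

    Assumes each read comes from one template chosen uniformly at random.
    """
    rng = random.Random(seed)
    return len({rng.randrange(n_templates) for _ in range(n_reads)})

def expected_distinct(n_templates: int, n_reads: int) -> float:
    """Closed-form expectation of the same quantity: T * (1 - (1 - 1/T) ** R)."""
    return n_templates * (1 - (1 - 1 / n_templates) ** n_reads)

reads = 10_000
for templates in (50, 500, 50_000):
    print(f"{reads} reads from {templates} templates -> "
          f"{simulate_distinct(templates, reads)} distinct "
          f"(expected {expected_distinct(templates, reads):.0f})")
```

With 50 templates, 10,000 reads still carry at most 50 independent observations, so the effective depth is set by the number of input templates, not the number of reads.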

Page 23: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Reproducibility

Gianella et al, 2011 J Virol

Table: percentage of reads carrying drug-resistance mutations (%DRM) at HXB2 positions in reverse transcriptase, two replicates per participant (PID). Column headers (HXB2 RT positions): 65, 100, 103, 106, 179, 181, 184, 188, 190, 215, 230

PID replicate: values
I4 1: 0.19 0.23 0.33 99.77 0.05 0.00 0.00 0.20 0.15 0.05 0.00 0.00
I4 2: 0.18 0.18 0.00 99.32 0.10 0.00 0.06 0.25 0.06 0.00 0.08 0.00
J6 1: 0.27 0.33 0.00 0.00 0.10 0.00 0.00 0.30 0.40 0.20 0.00 0.00
J6 2: 0.18 0.60 0.00 0.00 0.20 0.00 0.00 1.90 0.00 0.38 0.00 0.00
L3 1: 0.17 0.12 0.00 IR 0.00 0.00 0.00 0.14 0.57 0.00 0.00 0.00
L3 2: 0.22 0.57 0.13 4.55 0.04 0.00 0.10 0.24 0.14 0.00 0.10 0.00
R2 1: 0.38 0.00 0.00 17.74 0.00 0.00 0.00 0.00 0.81 0.00 0.00 IR
R2 2: 0.27 2.27 0.00 IR 0.00 0.23 0.00 0.89 0.22 0.00 0.00 IR
R6 1: 0.23 0.15 0.00 0.00 0.10 0.00 0.00 0.14 0.07 0.07 0.00 0.00
R6 2: 2.36 0.17 0.09 1.19 0.00 0.00 0.00 0.30 0.00 0.11 0.00 0.00
U1 1: 0.27 0.20 0.00 IR 0.00 0.00 0.17 0.17 0.00 0.00 0.00 0.64
U1 2: 0.34 0.00 0.00 IR 0.00 0.00 0.00 0.00 0.19 0.00 0.00 0.00
U6 1: 0.25 0.00 0.00 0.00 0.20 0.00 0.00 0.25 0.08 0.00 0.00 0.00
U6 2: 0.10 0.84 0.00 0.15 0.34 0.00 0.09 0.36 0.00 0.27 0.00 0.00
U7 1: 0.35 0.00 0.00 100 0.00 0.00 0.25 0.25 0.00 0.00 0.00 0.63
U7 2: 0.14 0.61 0.00 100 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00

Page 24: Promises and Challenges of Next Generation Sequencing for HIV and HCV

One possible solution: Primer ID

✤ Tag each template with a random sequence tag/Primer ID in the cDNA primer.

✤ Use the sequence tag/Primer ID to identify PCR resampling.

✤ Use the resampled sequences to create a consensus sequence.

✤ Use the number of sequence tags/Primer IDs to define the number of templates.

Jabara C et al PNAS 2011

Page 25: Promises and Challenges of Next Generation Sequencing for HIV and HCV

✤ Creating a consensus sequence for each resampled template using Primer ID mitigates error from PCR and sequencing

Figure: resampled templates carrying PCR and sequencing errors share a Primer ID (e.g. ATGACGTC) and collapse into a single consensus sequence.

Jabara C et al PNAS 2011
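The Primer ID steps can be sketched in a few lines: group reads by tag, drop tags seen too few times, and build a per-position majority consensus for each remaining tag. This is a simplified illustration, not the Jabara et al. implementation; it assumes reads are already trimmed, aligned and of equal length, and the `min_reads=3` cutoff and function name are arbitrary examples.

```python
from collections import Counter, defaultdict

def primer_id_consensus(reads, min_reads=3):
    """Collapse (primer_id, sequence) pairs into one majority-consensus sequence per template.

    Assumes sequences sharing a Primer ID are aligned and equal in length.
    Returns {primer_id: consensus}; the number of keys estimates the template count.
    """
    by_tag = defaultdict(list)
    for tag, seq in reads:
        by_tag[tag].append(seq)

    consensus = {}
    for tag, seqs in by_tag.items():
        if len(seqs) < min_reads:
            continue  # too few copies to out-vote PCR/sequencing errors
        columns = zip(*seqs)  # walk the alignment position by position
        consensus[tag] = "".join(Counter(col).most_common(1)[0][0] for col in columns)
    return consensus

reads = [
    ("ATGACGTC", "ACGTTA"),
    ("ATGACGTC", "ACGTTA"),
    ("ATGACGTC", "ACCTTA"),  # one PCR/sequencing error, out-voted
    ("GGTTCCAA", "ACGTAA"),
    ("GGTTCCAA", "ACGTAA"),
    ("GGTTCCAA", "ACGTAA"),
    ("TTTTAAAA", "ACGTTT"),  # singleton tag, dropped
]
print(primer_id_consensus(reads))  # {'ATGACGTC': 'ACGTTA', 'GGTTCCAA': 'ACGTAA'}
```

Here seven reads reduce to two templates, and the error in the third read disappears from the consensus.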

Page 26: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Good reproducibility between runs

Figure: Run 1 (y-axis) vs Run 2 (x-axis), both axes 0-25; fitted line y = 0.9943x, R² = 0.80872.

Ron Swanstrom (pers. comm.)

Page 27: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Lowering the limit of detection

Fisher et al J Virol 2012

TABLE 2 Resistance detected with bulk sequencing during first-line (bulk sequencing) and second-line (bulk sequencing and UDPS) failure

Columns: patient no.; mutation(s) during first-line NNRTI failure detected by bulk sequencing (a), in reverse transcriptase (RT) and protease; mutation(s) during second-line PI failure detected (b) by bulk sequencing (RT, protease) and by UDPS (frequency [%]; NRTI, NNRTI, PI)

Patient 1
First-line failure, bulk: RT A62V, M184I, V108I, Y181C, H221Y; protease M46I, L89M, I93L
Second-line failure, bulk: RT none; protease M36I, L89M, I93L
Second-line failure, UDPS: NRTI K65R (1.1), D67N (0.9), D67E (0.9), K219R (0.7); NNRTI V90I (0.8), A98E (0.9), K101E (5.9), K103R (44.7), K103N (5.1), K103E (3.2), V179I (0.5), Y181C (0.6), F227L (0.8), F227S (0.6), K238R (0.5); PI I54T (0.5), M36I (99.3), L63P (0.9), L89M (99.3), I93L (99.7)

Patient 2
First-line failure, bulk: RT M184V, V106M; protease M36I, L63P, L89M, I93L
Second-line failure, bulk: RT none; protease M36I, L63P, L89M, I93L
Second-line failure, UDPS: NRTI K65R (3.8), D67N (8.4), F77L (0.7), M184V (8.0), L210S (0.7), T215 (0.8); NNRTI V90I (0.8), K101R (0.7), K103R (2.2), Y181C (1.9), F227S (0.5); PI L23P (0.5), M36I (98.7), L63P (99.6), L89M (99.5), I93L (99.1)

Patient 3 (c)
First-line failure, bulk: RT M184V, V90I, K103N, Y181C; protease K20R, M36I, L63P, L89M, I93L
Second-line failure, bulk: RT none; protease K20R, M36I, L89M, I93L
Second-line failure, UDPS: NRTI V118A (0.5)*, K219E (0.5)*; NNRTI V179I (5.8); PI I54M (0.7)*, I84V (0.9)*, K20R (98.3), M36I (98.6), I62V (1.2), L63P (2.0), A71T (1.3), L89M (99.6), I93L (99.8)

Patient 4
First-line failure, bulk: RT M184V, V108I, Y181C, H221Y; protease K20R, M36I, D60E, L89M, I93L
Second-line failure, bulk: RT none; protease K20R, M36I, D60E, L89M, I93L
Second-line failure, UDPS: NRTI K65R (2.7), D67N (1.5), F116S (0.5), M184V (2.6); NNRTI K101E (0.6), K101R (0.6), P225T (1.5), F227L (0.6), K238T (3.3), K238R (0.7); PI M46V (0.7), F53L (0.6), F53S (0.5), K20R (73.5), M36I (67.5), M36L (31.6), D60E (59.3), L63P (38.0), L89M (79.0), I93L (99.7)

Patient 5
First-line failure, bulk: RT M184V; protease M36L, L63P, I93L
Second-line failure, bulk: RT K103N; protease M36L, L63P, I93L
Second-line failure, UDPS: NRTI K65R (1.6), K65E (0.5), D67N (2.6), M184V (3.1); NNRTI K101E (1.3), K101R (0.6), K103N (55.2), V179D (1.0), P225T (1.0), F227L (0.6), F227S (0.6), K238T (2.9); PI F53L (0.7), N88S (0.7), N88D (0.6), K20R (0.5), M36I (3.0), M36L (96.5), I93L (99.7)

Patient 6
First-line failure, bulk: RT M184V, K103N; protease D60E, L63P, I93L
Second-line failure, bulk: RT none; protease D60E, L63P, L93L
Second-line failure, UDPS: NRTI K65R (1.0), K65E (0.6), K219E (0.5); NNRTI K103E (0.8), G190E (0.7); PI V82A (0.6), K20R (34.5), M36I (28.1), D60E (97.9), I62V (0.7), L63P (81.2), L93L (99.6)

Patient 7
First-line failure, bulk: RT M41L, K65R, V75I, M184V, K103R, V179D; protease M36I, L63S, T74S, I93L
Second-line failure, bulk: RT V179D; protease M36I, L63S, T74S, I93L
Second-line failure, UDPS: NRTI K65R (2.2), T215A (0.7), K219R (0.7), K219E (0.5); NNRTI V90I (1.0), V179D (6.1); PI M36I (93.2), D60E (8.1), L63S (90.7), L63P (9.0), T74S (90.2), I93L (99.6)

a For the NNRTI failure episode, NRTI mutations are in roman, and NNRTI mutations are in italics.
b For UDPS mutations, major PI resistance mutations are shown in bold, accessory mutations are in italics, and other amino acid variants at PI resistance loci are in roman type.
c Asterisks indicate the detection of minor variants below the predicted threshold, based on the sample input (viral load of 520 copies/ml).

Fisher et al., Journal of Virology (jvi.asm.org), p. 6234

Page 28: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Clinical relevance

✤ NGS-based assays will detect many more DRAMs than current tests.

✤ Multiple studies provide evidence that SOME low-level NRTI and NNRTI DRAMs are associated with subsequent virologic failure (also for FI)

✤ The picture is less clear for PIs, likely due to the polyallelic nature of resistance

✤ Integrase inhibitors (II) remain to be investigated directly, as do HCV antivirals

✤ “The extent to which the detection of low-abundance DRMs will affect patient management is still unknown but it is hoped that use of such an assay in clinical practice, will help resolve this important question”

Evaluation of a Bench-Top HIV Ultra-Deep Pyrosequencing Drug-Resistance Assay in the Clinical Laboratory, Avidor et al J Clin Microbiol 2013.

Page 29: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Tropism analysis using NGS

✤ Because NGS provides whole sequences (not just individual mutations), one can ask questions that require knowledge of the entire sequence.

✤ CCR5 vs CXCR4 usage has implications for treatment (with fusion inhibitors), and clinical outcomes

Page 30: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Tropism analysis, clinical relevance

✤ Can either be measured experimentally (e.g. Enhanced Sensitivity Trofile Assay, ESTA), or by computational analyses of env V3 loop sequences (e.g. Geno2Pheno)

✤ Low level (e.g. 2%) X4 variants are predictive of FI failure, e.g. in the Maraviroc versus Efavirenz in Treatment-Naive Patients (MERIT) study

Swenson L C et al. Clin Infect Dis. 2011;53:732-742

Figure panels: N=312 and N=35

Page 31: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Does the choice of platform matter?

✤ Largely, no.

Archer et al PLoS ONE 2012

Page 32: Promises and Challenges of Next Generation Sequencing for HIV and HCV

High throughput dual infection detection

✤ Blood → HIV RNA → cDNA → PCR 3 regions

✤ Sequenced 16 samples concurrently on single 454 GS FLX Titanium plate

✤ Processed reads (~5 min/patient on a computer cluster) and generated phylogenies

✤ Interpreted nucleotide diversity > 2% (RT, gag) and > 5% (env), confirmed by phylogenetic bootstrap, as evidence of dual infection

Env: C2-V3-C3(416 bp)

Pol: RT(534 bp)

Gag: p24(253 bp)

Pacold et al ARHR 2010
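The screening rule in the last bullet can be sketched as follows: compute nucleotide diversity as the mean pairwise p-distance (proportion of differing aligned sites) among one sample's reads for a region, then compare it with that region's threshold. This sketch assumes the reads are already aligned to equal length, skips gapped sites, and leaves out the phylogenetic bootstrap confirmation step; the function names and example reads are illustrative.

```python
from itertools import combinations

# Region-specific thresholds from the slide: >2% for RT and gag, >5% for env.
DI_THRESHOLDS = {"rt": 0.02, "gag": 0.02, "env": 0.05}

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)

def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean p-distance over all pairs of sequences from one sample and region."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def exceeds_di_threshold(seqs: list[str], region: str) -> bool:
    """True if diversity exceeds the region's threshold (still needs phylogenetic confirmation)."""
    return nucleotide_diversity(seqs) > DI_THRESHOLDS[region]

# Made-up example: two groups of reads that differ at 10 of 100 sites.
strain_1 = "A" * 100
strain_2 = "A" * 90 + "C" * 10
mixed = [strain_1, strain_1, strain_2, strain_2]
print(round(nucleotide_diversity(mixed), 4), exceeds_di_threshold(mixed, "env"))  # 0.0667 True
```

Diversity alone cannot distinguish two strains from one very diverse strain, which is why the slide pairs the threshold with bootstrap-supported phylogenetic structure.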

Page 33: Promises and Challenges of Next Generation Sequencing for HIV and HCV

identified samples A, B, C, E, F, and G as singly infected (Supplementary Figs. 1–6 and 11–13; Supplementary Data are available online at www.liebertonline.com/aid) and samples D1, D2, H, and I as dually infected (Fig. 2 and Supplementary Figs. 7–10 and 14–16). DI results specific to the coding regions of each sample are shown in Table 2.

For nearly all the samples, the high read coverage of UDS identified greater maximum divergence than SGS (Table 2). Duplicate UDS runs performed on the same sample cDNA for the same coding regions agreed in DI status for all 20 cases. Combined phylogenies of UDS and SGS for each sample are shown in Figure 2 and the supplemental figures. The one sample (H) in which the divergence found by SGS in both C2–V3 and RT exceeded that of UDS was the sample with the lowest viral load tested, 1113 HIV RNA copies/ml, in which the calculated input copy number that was interrogated by UDS was only 52.3. UDS of the gag p24 region identified DI only for sample I, which had the highest SM-Index of the cohort and was also the only sample whose UDS and SGS of the C2–V3 and RT coding regions both identified DI (Fig. 2).

Cost and time analyses

We estimated cost and time per sample for SM-Index, SGS, and UDS based on a batch of 16 samples (corresponding to a single UDS run). The cost per sample for population-based pol sequence was $278.18, for SGS of two coding regions $2,646.39, and for UDS of three coding regions $1,075.10. Costs of each sequencing type are summarized in Table 3. It took 3 hours to produce one sample's population-based pol sequence, 42 hours for one sample's SGS, and 9.5 hours for one sample's UDS. Cost and time estimates for parallel steps like RNA extraction are highly throughput-dependent. UDS can be customized to produce fewer reads per sample at a lower cost. As previously noted [11], many factors (such as price reductions related to quantity) influence cost estimates and may cause large price differences for experiments using the same technologies.

Discussion

Systematic identification of HIV DI in large cohorts has previously relied on a variety of screening methods, including population-based sequencing analysis from different time points [2], counting sequencing ambiguities [9], heteroduplex mobility assays [29], and molecular analysis of a single coding region [2]. Single genome sequencing is the current standard to identify distinct strains in a viral population; however, SGS is too slow, expensive, and labor-intensive to be used as a screening method for the presence of DI in hundreds or thousands of biological samples. In this study, two alternative methods to detect DI were assessed. The SM-Index identified

FIG. 2. Sample I, UDS duplicate 1. First year of infection. DI in env, pol, and gag. UDS are represented as red circles and SGS as blue squares. Variant abundances per node and branches with >90% bootstrap support are labeled.

DETECTION OF HIV DUAL INFECTION 1295

Pacold et al ARHR 2010

Page 34: Promises and Challenges of Next Generation Sequencing for HIV and HCV

High throughput dual infection detection

Figure: within-sample divergence for samples A-H by SGS (“gold standard”, 25 reads per sample-region) and UDS (4,650 reads per sample-region), with low-viral-load samples annotated.

✤ For all dually infected samples, UDS identified a greater within-sample divergence than SGS.

✤ Samples E and F both had divergence exceeding the DI threshold, but only Sample F exhibited DI-like population structure.

✤ UDS required 40% of the cost and 20% of the time of SGS.

Pacold et al ARHR 2010

Page 35: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Method comparison (SGS vs NGS)

Robustness for confirming DI: SGS high; NGS high

Throughput potential: SGS low; NGS high

Labor: SGS high; NGS medium

Time: SGS high; NGS low

Cost: SGS high; NGS medium (and dropping)

Page 36: Promises and Challenges of Next Generation Sequencing for HIV and HCV

San Diego Primary Infection Cohort

Figure: strains detected over months after initial infection (0-36) for 11 dually infected participants (L537, Q294, U189, N112, D224, K613, K908, P265, P853, S155, U796): 4 coinfections (CI) and 7 superinfections (SI); legend: 1 strain detected, 2 strains detected.

✤ Samples sequenced to date show a DI prevalence of 11/61 = 18%.

✤ Of the 7 SI cases:

✤ 5 were SI in the first year of initial infection (incidence: 8.2%)

✤ 2 in the second year (incidence: 3.3%)

✤ Dual infections are much more frequent than expected.

Pacold et al AIDS 2012

Page 37: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Viral dynamics of SI cases

Subject (ID), coding region: initial strain; superinfecting strain; recombinant

1 (K6), RT: replaced; persists; not detected
1 (K6), C2-V3: replaced; persists; not detected
2 (K9), RT: replaced; persists; not detected
2 (K9), C2-V3: replaced; persists; not detected
3 (D2), RT: persists; persists; persists
3 (D2), C2-V3: persists; transient; persists
4 (P2), RT: persists; not detected; not detected
4 (P2), C2-V3: persists; persists; persists
5 (P8), RT: persists; transient; not detected
5 (P8), C2-V3: persists; transient; transient
6 (S1), RT: persists; transient; persists
6 (S1), C2-V3: replaced; transient; persists
7 (U7), RT: persists; persists; transient
7 (U7), C2-V3: persists; transient; transient

Page 38: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Clinical consequences

Figure: viral load dynamics for seven superinfected patients, plotted as sqrt(viral load) against EDI (months); open circles before and shaded circles after superinfection. p-values are for the presence of a structural shift: K6 (p = 0.35), K9 (p = 0.10), D2 (p = 0.0026), P2 (p = 0.66), P8 (p = 0.093), S1 (p = 0.0061), U7 (p = 0.0044).

Page 39: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Molecular epidemiology of HIV-1

✤ Because HIV is a measurably evolving pathogen that accumulates sequence diversity within hosts at rates as high as 1-2% per year within the polymerase (pol) gene, viral sequences are nearly unique to each infected person.

✤ This distinct feature of the virus allows one to interrogate sequences for evidence of recent relatedness, and thus infer potential transmission links.

Page 40: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Establishing links

✤ Putative transmission links are established if the genetic distance between two pol sequences is below a threshold D (e.g. 1.5%)

✤ Median intra-subtype pairwise genetic distance is ~5%, and the probability that two randomly selected HIV-1 subtype B sequences are ≤1.5% distant is very low (p = 0.0022 for the SD AEH cohort and p = 0.0002 for a random sample)

Figure: density (AU) of pairwise genetic distances in the San Diego Acute and Early Cohort and in a random database sample, each with an inset zooming in on distances of 0-1.5.
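A minimal sketch of the linking rule: connect every pair of sequences whose pairwise distance is within the threshold D (1.5% here, as above), then read off clusters as connected components. It takes precomputed pairwise distances as input (the example values are made up) and uses a small union-find structure rather than any specific network tool; the function name is hypothetical.

```python
def link_and_cluster(distances: dict[tuple[str, str], float], threshold: float = 0.015):
    """Return (edges, clusters) for a putative transmission network.

    distances maps (id_a, id_b) -> pairwise genetic distance as a fraction (0.012 = 1.2%).
    A pair is linked when its distance is at or below `threshold`; clusters are the
    connected components (size >= 2) of the resulting graph.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    edges = []
    for (a, b), d in distances.items():
        if d <= threshold:
            edges.append((a, b))
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    clusters: dict[str, set[str]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return edges, list(clusters.values())

# Hypothetical distances between four sequences (P1..P4).
dists = {("P1", "P2"): 0.010, ("P2", "P3"): 0.012, ("P1", "P3"): 0.020, ("P3", "P4"): 0.050}
edges, clusters = link_and_cluster(dists)
print(edges)     # [('P1', 'P2'), ('P2', 'P3')]
print(clusters)  # one cluster containing P1, P2 and P3
```

Note that P1 and P3 end up in the same cluster through P2 even though their own distance (2%) exceeds the threshold; clusters are transitive, individual links are not.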

Page 41: Promises and Challenges of Next Generation Sequencing for HIV and HCV

San Diego HIV molecular network (bulk sequences)

Figure: putative transmission network; edge direction resolved based on EDI; nodes colored by viral load, log10 (copies/ml): N/A, 1.5-2.5, 2.5-3.5, 3.5-4.5, 4.5-5.5, 5.5-6.5, >6.5; edges distinguished by TNS < 0.8 vs TNS ≥ 0.8; node labels give the number of timepoints (if > 1).

Page 42: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Linking transmission partners using NGS

✤ Because a substantial proportion of individuals may be multiply infected, we need to be able to draw links between minority populations.

✤ NGS data have been used in HPTN 052 (to confirm transmission links between serodiscordant couples)

Page 43: Promises and Challenges of Next Generation Sequencing for HIV and HCV

A denser network of connections

✤ 64 new edges and 16 new nodes (a yield of ~1 connection per 2 NGS samples) were added to the network.

✤ The inclusion of NGS data:

✤ increased the size of the largest cluster from 62 to 156 nodes

✤ increased the number of “hubs” by 7 (from 51 to 58).

Page 44: Promises and Challenges of Next Generation Sequencing for HIV and HCV

It pays to target highly connected nodes

Figure: targeting a low-degree node (degree = 1) has a local effect; targeting a high-degree node (degree = 7) has a global effect.

Concept: contact network vs transmission network

Node: an individual (contact network); an HIV+ individual (transmission network)

Edge: a contact that could lead to HIV transmission, e.g. sexual, shared needle (contact network); a transmission event (transmission network)

Degree = edges connected to a node: the number of contacts associated with a node (contact network); the number of transmissions associated with a node (transmission network)

Figure legend: HIV+, HIV-, contact without transmission, transmission; example nodes of degree 7, 1 and 3.

Transmission network is a subset of the contact network
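Degree, as defined above, is just the number of edges touching a node. The sketch below counts degrees from an edge list and ranks nodes so that the most connected ones, the candidates for targeted intervention, come first. The edge list is made up, and the `min_degree=3` cutoff for calling a node a hub is an arbitrary assumption.

```python
from collections import Counter

def node_degrees(edges):
    """Degree of each node: the number of edges (contacts or transmissions) touching it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def rank_hubs(edges, min_degree=3):
    """Nodes with at least `min_degree` edges, most connected first (cutoff is arbitrary)."""
    return [node for node, d in node_degrees(edges).most_common() if d >= min_degree]

# Hypothetical network: H links to seven people; X and Y are an isolated pair.
network = [("H", other) for other in "ABCDEFG"] + [("X", "Y")]
degrees = node_degrees(network)
print(degrees["H"], degrees["X"])  # 7 1
print(rank_hubs(network))          # ['H']
```

Removing H would disconnect seven nodes at once, while removing X affects only Y: the local versus global effect pictured above.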

Page 45: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Regulatory approval: the bad news

✤ No NGS platforms have been cleared/approved by FDA

✤ No standards to use for comparison

✤ No clear agreement on bioinformatics handling

✤ Lack of proficiency panels and reference materials

✤ Rapid change

Page 46: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Regulatory approval: the good news

✤ The industry, academia, and agencies (FDA, CAP, NCBI, etc.) are actively collaborating on the issue

✤ Informatics rapidly improving and stabilizing

✤ Clinical relevance studies are ongoing

✤ This is primarily driven by human genomic applications, so HIV/HCV applications will benefit from the larger effort

✤ The Forum for Collaborative HIV Research has held a series of roundtables to discuss issues relevant to HIV/HCV research, including the “Next Generation Sequencing Roundtable” in December 2012.

Page 47: Promises and Challenges of Next Generation Sequencing for HIV and HCV

Acknowledgements

UCSD: Davey Smith, Jason Young, Sara Gianella Weibel, Susan Little, Douglas Richman, Richard Haubrich, Gabe Wagner, Lance Hepler

UBC: Richard Harrigan, Art FY Poon

Life Inc: Mary Pacold