The first steps in adaptive evolution


NATURE GENETICS | VOLUME 37 | NUMBER 4 | APRIL 2005 337

The usual suspects

Using the traditional dragnet of genome-wide linkage analysis and candidate gene association studies, only a tiny fraction of the genetic culprits in common disease have been identified. In the face of this uncertainty, two distinguishable but complementary models drive much of today’s research. The first, in analogy to mendelian disease, is based on the assumption that disease can be caused by a heterogeneous collection of individually rare, highly penetrant mutations. Finding such alleles requires deep resequencing of cases1,2. Until resequencing becomes much cheaper and faster, and until functional noncoding changes can be identified from primary sequence data, this approach will probably be restricted to culprits that lurk in the coding regions of candidate genes and carry smoking guns3.

Second is the idea of comprehensively testing common variation for association to disease. This approach is motivated by two observations: most human heterozygosity is due to common ancestral polymorphisms, and common, late-onset, environmentally triggered diseases might not have been disadvantageous during human history. Two trends make it practical to start carrying out hypothesis-free, genome-wide association studies4–6. First, the public catalog of common sequence variants now contains more than nine million SNPs, and more than one million of these have been typed in population samples to determine frequencies and correlations7,8. Second, SNP genotyping technology is becoming sufficiently high-throughput, accurate and affordable to allow large collections of markers to be tested. These markers need not be restricted by hypotheses about the identities of candidate genes or about whether the changes are coding or regulatory4–6.

On the basis of the very limited association studies done to date, a modest number of common variants have reproducible influences on common diseases9–11. In all confirmed cases, the individual effect of each variant is weak, on its own explaining only a tiny fraction of the overall heritability of disease.

Conspiracy theories

Common variants with small individual effects might contribute more substantially to disease risk through nonadditive interactions among loci (epistasis). Such a model raises the concern that by examining only a single locus at a time, these effects might be missed. Because only very common variants will be found in combination at a measurable frequency, the study of gene-gene interactions in common disease is implicitly most relevant to the second approach of studying common variants.

Partners in crime

Mark J Daly & David Altshuler

The genetic culprits that contribute to common diseases remain at large, despite dedicated sleuthing by many laboratories. A new study evaluates the power of genome-wide searches for variants acting in combination, with results that are both unexpected and encouraging.

Mark J. Daly is at the Massachusetts General Hospital, Harvard Medical School, the Broad Institute of Harvard and MIT, and the Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. David Altshuler is at the Massachusetts General Hospital, Harvard Medical School, and the Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. e-mail: [email protected] or [email protected]

Cartoon by Sean Taverna

NEWS AND VIEWS © 2005 Nature Publishing Group http://www.nature.com/naturegenetics


Large-scale (ultimately, genome-wide) data collection makes it possible to carry out unbiased screens for combinations of unlinked genetic variants that may together predispose to or protect from disease12–14. An overriding concern in the development of these approaches has been whether the penalty due to testing a vast number of hypotheses might, even in the presence of true interactions, decrease rather than increase the overall power of the study to discover true associations.

To evaluate this set of questions, on page 413 of this issue, Jonathan Marchini and colleagues15 present simulated genome-wide association studies under a range of scenarios and analytical approaches. The authors focused on biologically motivated scenarios of gene-gene interactions: (i) models in which disease risk increases multiplicatively with each additional risk allele, whether at the same or a different locus, and (ii) models in which each allele acts only in the presence of the other factor. In both cases, statistical associations would be observed in single-locus tests, driven partially or entirely (respectively) by a subset of individuals carrying both factors.
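The two interaction scenarios can be made concrete as two-locus penetrance tables. The following is a minimal sketch, and every parameter in it is a hypothetical illustration rather than a value from Marchini and colleagues. These include the baseline risk, the per-allele relative risk and the coding of 0, 1 or 2 risk alleles per locus. The helper `marginal_risk` averages over genotypes at the second locus, assuming Hardy-Weinberg proportions. It shows why even the interaction-only model leaves a signal in a single-locus test.

```python
from itertools import product

BASELINE = 0.01  # hypothetical risk for carriers of no risk alleles
RR = 1.5         # hypothetical relative risk per additional risk allele


def multiplicative(a: int, b: int) -> float:
    """Scenario (i): every risk allele, at either locus, multiplies risk by RR."""
    return BASELINE * RR ** (a + b)


def interaction_only(a: int, b: int) -> float:
    """Scenario (ii), one possible encoding: risk rises only if both loci carry risk alleles."""
    if a > 0 and b > 0:
        return BASELINE * RR ** (a + b)
    return BASELINE


def penetrance_table(model):
    """Map (alleles at locus 1, alleles at locus 2) to disease risk."""
    return {(a, b): model(a, b) for a, b in product(range(3), repeat=2)}


def marginal_risk(model, a: int, q: float) -> float:
    """Risk for genotype a at locus 1, averaged over locus 2 in Hardy-Weinberg
    proportions with risk-allele frequency q (what a single-locus test 'sees')."""
    freqs = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    return sum(f * model(a, b) for b, f in enumerate(freqs))


if __name__ == "__main__":
    for name, model in [("multiplicative", multiplicative), ("interaction-only", interaction_only)]:
        table = penetrance_table(model)
        print(name)
        for a in range(3):
            print("  ", [round(table[(a, b)], 5) for b in range(3)])
        print("   marginal risk by genotype at locus 1:",
              [round(marginal_risk(model, a, q=0.3), 5) for a in range(3)])
```

Under the interaction-only encoding, carriers at locus 1 still show an elevated marginal risk whenever the partner allele is common. This is the "driven entirely by a subset of individuals carrying both factors" case described above.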

The authors compare three distinct analytical strategies, in each case using a correction for multiple testing: (i) locus-by-locus scanning, requiring a stringent genome-wide level of single-locus significance; (ii) complete evaluation of all pairs of loci, declaring success if any given pair surpasses the genome-wide threshold; and (iii) an intermediate, two-stage strategy, first carrying out single-locus tests and then examining those loci that surpass a much more modest level of significance in all pairwise combinations.
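Strategy (iii) amounts to a lenient filter followed by an exhaustive pairwise step. A minimal sketch follows, with several assumptions that are illustrative only: the stage-1 threshold of 0.1, the Bonferroni correction over the number of pairs actually tested, and the hypothetical function name `two_stage_candidates`. The thresholds and corrections in the simulations may differ.

```python
from itertools import combinations


def two_stage_candidates(single_locus_p, stage1_alpha=0.1, family_alpha=0.05):
    """Hypothetical two-stage scan: return (pairs to test, per-pair p threshold).

    single_locus_p maps marker name -> single-locus p-value.
    Stage 1 keeps markers with p < stage1_alpha (a deliberately modest filter).
    Stage 2 tests every pair of surviving markers, Bonferroni-corrected for the
    number of pairs actually tested.
    """
    survivors = sorted(m for m, p in single_locus_p.items() if p < stage1_alpha)
    pairs = list(combinations(survivors, 2))
    threshold = family_alpha / len(pairs) if pairs else None
    return pairs, threshold


if __name__ == "__main__":
    p_values = {"rs1": 0.03, "rs2": 0.4, "rs3": 0.08, "rs4": 0.002}  # made-up values
    pairs, threshold = two_stage_candidates(p_values)
    print(pairs, threshold)
```

The design choice is easiest to see through arithmetic. If a fraction f of n markers survives stage 1, roughly (fn)²/2 pairs remain, so a 10% filter cuts the pairwise testing burden about 100-fold compared with all n(n − 1)/2 pairs.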

In addition to showing that genome-wide interaction analyses can be computationally tractable, these simulations offer several important results. First, across a range of reasonable scenarios, screening all pairwise combinations has comparable or greater power than locus-by-locus testing, despite the substantial penalty for the O(n²) pairwise tests. Second, for loci with independent main effects as well as interactions, single-locus scanning remains the most powerful approach. In contrast, where locus effects are entirely dependent on alleles at unlinked sites, the pairwise testing procedure is preferable and has good power. These results offer considerable reassurance that, under reasonable models of gene-gene interactions, epistasis is not an overwhelming barrier to genome-wide association analysis.
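The size of the O(n²) penalty is easier to judge on the z-score scale than as a raw p-value. The sketch below assumes a hypothetical panel of 300,000 markers and uses a plain Bonferroni correction, the simplest choice, which is not necessarily the correction used in the simulations. For each scan it computes the per-test threshold and the matching two-sided z-score.

```python
from statistics import NormalDist


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05):
    """Per-test p-value threshold and the matching two-sided |z| threshold."""
    alpha = family_alpha / n_tests
    z = -NormalDist().inv_cdf(alpha / 2)  # upper-tail critical value
    return alpha, z


if __name__ == "__main__":
    n_markers = 300_000  # hypothetical marker count
    n_pairs = n_markers * (n_markers - 1) // 2
    for label, m in [("single-locus", n_markers), ("all pairs", n_pairs)]:
        alpha, z = bonferroni_threshold(m)
        print(f"{label:>12}: {m:.3g} tests, p < {alpha:.2e}, |z| > {z:.2f}")
```

Going from single loci to all pairs multiplies the number of tests by about 150,000. Yet the critical |z| rises only from roughly 5.2 to roughly 7.1, because normal tail probabilities shrink very quickly.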

Implications for law enforcement

Though perhaps unexpected, these conclusions are in retrospect quite intuitive. Given a sufficiently well-powered sample, the correctly specified model results in the most powerful testing procedure, regardless of statistical penalties. This is because the gain in significance from testing the right model grows with sample size, whereas the penalty for multiple comparisons is fixed by the number of tests performed. Thus, multilocus scanning performs best when the effect is entirely due to an allelic combination, whereas single-locus scanning wins when each individual locus has an effect on its own, even if multiplied by other loci.
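The argument can be put in rough numbers with a standard normal approximation. The expected z-score of a test grows as √N times a per-individual effect size. The sample size needed to exceed a threshold z_α with power 1 − β therefore scales as (z_α + z_β)². The sketch below assumes a hypothetical 300,000 markers, 80% power and an identical per-individual effect under both scans. It estimates how much larger N must be to absorb the all-pairs Bonferroni penalty.

```python
from statistics import NormalDist


def z_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Two-sided Bonferroni critical z for n_tests tests."""
    return -NormalDist().inv_cdf(family_alpha / n_tests / 2)


def sample_size_inflation(n_markers: int, power: float = 0.8) -> float:
    """Factor by which N must grow for an all-pairs scan to keep the power of a
    single-locus scan, assuming the same per-individual effect size in both."""
    z_beta = NormalDist().inv_cdf(power)
    z_single = z_threshold(n_markers)
    z_pairs = z_threshold(n_markers * (n_markers - 1) // 2)
    return ((z_pairs + z_beta) / (z_single + z_beta)) ** 2


if __name__ == "__main__":
    print(f"N inflation for an all-pairs scan: {sample_size_inflation(300_000):.2f}x")
```

Under these assumptions the penalty costs well under a doubling of N. If the effect exists mainly in combination, the single-locus statistic captures only a diluted marginal effect. Its per-individual effect is then smaller, and this modest penalty can be outweighed, which is the trade-off the paragraph describes.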

The authors suggest that the two-stage strategy may be the most powerful of those considered: the locus-by-locus test is done, and all pairs of loci reaching a very modest nominal level of association are examined in pairwise combination. Another natural strategy not explicitly considered is to carry out the locus-by-locus screen with a more stringent threshold and then repeat the genome-wide analysis conditional on any individually significant locus. Such an approach most closely resembles standard practice today, where once individual loci are discovered and confirmed, subsequent scans are routinely examined for interactions with those loci by the addition of n two-locus tests (i.e., each new marker in combination with the confirmed locus). Because both the single-locus tests and tests conditional on confirmed positives will probably remain the first two steps of any whole-genome association analysis, exploration of methods that optimally complement and supplement the results of these tests may provide further insight.
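The conditional scan is cheap because it adds only about n tests per confirmed locus rather than n(n − 1)/2. A minimal sketch tabulates the test counts and plain Bonferroni per-test thresholds for the designs discussed here. It assumes a hypothetical 300,000 markers and two confirmed loci, and it ignores linkage disequilibrium between markers.

```python
def bonferroni_alpha(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level under a simple Bonferroni correction."""
    return family_alpha / n_tests


def scan_sizes(n_markers: int, n_confirmed: int = 1) -> dict:
    """Number of tests for three scan designs (simple counts, ignoring LD)."""
    return {
        "single-locus": n_markers,
        "conditional on confirmed loci": n_confirmed * (n_markers - n_confirmed),
        "all pairs": n_markers * (n_markers - 1) // 2,
    }


if __name__ == "__main__":
    for design, m in scan_sizes(300_000, n_confirmed=2).items():
        print(f"{design:>30}: {m:>14,} tests, per-test alpha {bonferroni_alpha(m):.1e}")
```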

Finally, the authors suggest that the widespread irreproducibility of genetic association results might be explained (at least in part) by gene-gene interactions. Specifically, they point out that if the frequency of each genetic risk factor varies across populations, and if these variants act in combinations, then the frequencies of specific combinations may vary to an even greater degree, contributing to failure of replication. Although this principle is sound and must certainly have a role, we offer a word of caution as a counterbalance. Consider the very modest statistical significance of most irreproducible association results and the low prior probability for each locus in the genome. Given these, it seems to us more likely that statistical fluctuation, rather than true heterogeneity, is the predominant explanation for failure of replication. We would argue that each indicted (but not yet convicted) genetic culprit should be considered innocent until proven guilty beyond reasonable statistical doubt. Otherwise overzealous prosecutors (i.e., investigators), who have unfortunately strong incentives to claim success in nabbing the crook, may declare a gene guilty even in the face of exculpatory evidence.
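The prior-probability argument follows from Bayes' rule. The probability that a significant result is a false positive depends on three things: the significance level, the power, and the prior probability that the locus is truly involved. In the sketch below all numbers are illustrative assumptions: a prior of 1 in 10,000, 50% power, and significance levels of 0.01 and 5 × 10⁻⁸.

```python
def prob_false_positive(alpha: float, power: float, prior: float) -> float:
    """P(no true effect | result significant at level alpha), by Bayes' rule."""
    false_pos = alpha * (1 - prior)  # null loci that pass the threshold
    true_pos = power * prior         # real loci that pass the threshold
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    for alpha in (0.01, 5e-8):
        fp = prob_false_positive(alpha=alpha, power=0.5, prior=1e-4)
        print(f"alpha={alpha:g}: P(false positive | significant) = {fp:.3f}")
```

With a nominal p < 0.01 and a low prior, about 99.5% of "hits" are false, so failure to replicate is the expected outcome even without any heterogeneity. A stringent genome-wide threshold reverses the picture.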

At root, a sound approach to apprehending genetic culprits depends on knowing how many there are, how frequent they are in the population and how often they band together to commit crimes. The challenge for the field is to balance the virtue of considering a broad range of scenarios (thereby avoiding overlooking guilty parties) with the risk of inappropriately incarcerating the innocent. Marchini and colleagues reassure us that gene-gene interactions are not show-stoppers in the fight against genetic crime and point the way towards design of analytical approaches that are robust to multiple models of gene effects. Answers about genetic architecture can only come from empirical data, which we are thankfully soon to have in abundance.

1. Botstein, D. & Risch, N. Nat. Genet. 33 Suppl, 228–237 (2003).

2. Cohen, J.C. et al. Science 305, 869–872 (2004).

3. Hirschhorn, J.N. & Altshuler, D. J. Clin. Endocrinol. Metab. 87, 4438–4441 (2002).

4. Wang, W.Y., Barratt, B.J., Clayton, D.G. & Todd, J.A. Nat. Rev. Genet. 6, 109–118 (2005).

5. Hirschhorn, J.N. & Daly, M.J. Nat. Rev. Genet. 6, 95–108 (2005).

6. Altshuler, D. & Clark, A.G. Science 307, 1052–1053 (2005).

7. Hinds, D.A. et al. Science 307, 1072–1079 (2005).

8. The International HapMap Consortium. Nature 426, 789–796 (2003).

9. Lohmueller, K., Pearce, C.L., Pike, M., Lander, E. & Hirschhorn, J.N. Nat. Genet. 33, 177–182 (2003).

10. Florez, J.C., Hirschhorn, J. & Altshuler, D. Annu. Rev. Genomics Hum. Genet. 4, 257–291 (2003).

11. Daly, M.J. & Rioux, J.D. Inflamm. Bowel Dis. 10, 312–317 (2004).

12. Nelson, M.R., Kardia, S.L., Ferrell, R.E. & Sing, C.F. Genome Res. 11, 458–470 (2001).

13. Hoh, J. & Ott, J. Nat. Rev. Genet. 4, 701–709 (2003).

14. Moore, J.H. & Ritchie, M.D. JAMA 291, 1642–1643 (2004).

15. Marchini, J., Donnelly, P. & Cardon, L.R. Nat. Genet. 37, 413–417 (2005).


Scientists should avoid using words like ‘miraculous’. But if ever there was a reason to make an exception, resveratrol is it. This small nontoxic molecule from Asian medicinal herbs and red wine is currently in human clinical trials to treat colon cancer and oral herpes; in rodents, it protects against inflammatory disorders, stroke, myocardial infarction, spinal cord injury and heart disease and is one of the most effective cancer chemopreventive agents known1. No one really knows how resveratrol achieves such feats, but there is little doubt that this knowledge could open new avenues to develop truly revolutionary drugs. Now, on page 349 of this issue, Alex Parker and colleagues2 show that resveratrol might be effective in treating hereditary polyglutamine (polyQ) disorders such as Huntington disease; moreover, they provide evidence for a mechanism.

STACs come of age

The foundation for understanding how resveratrol works came from a small group of researchers examining why yeast cells grow old and which genes, if any, control the process. They found that genomic instability of DNA repeats is a key cause of yeast aging3 and that overexpression of a gene called SIR2 suppresses genomic instability, leading to an increase in lifespan of ∼30% (ref. 4). Now we know that SIR2 genes are found in all eukaryotes (humans have seven, SIRT1–SIRT7) and that many of them encode NAD+-dependent deacetylases that direct the behavior of target proteins by removing acetyl groups from specific lysines. Overexpression of SIR2 homologs also extends the lifespans of organisms that age by mechanisms ostensibly unrelated to those in yeast, namely Caenorhabditis elegans5 and Drosophila melanogaster6.

A set of 18 small polyphenolic molecules including resveratrol was recently found to increase the affinity of the SIRT1 enzyme for certain protein targets, probably through an allosteric mechanism (Fig. 1). These molecules, known as sirtuin-activating compounds (STACs), extend the lifespan of yeast, worms and flies in a Sir2-dependent manner. Sir2 and STACs seem to work by increasing cell defenses against stress, through the same or similar pathways as caloric restriction, the strict diet that increases the life expectancy of mammals by delaying diseases of old age such as cancer, heart disease, diabetes and even neurodegeneration7.
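The Figure 1 caption for this article attributes two kinetic effects to STACs. Polyphenols such as resveratrol lower the Km of sirtuins, that is, they increase affinity for substrate. Nicotinamide analogs raise Vmax by relieving inhibition by nicotinamide. A minimal Michaelis–Menten sketch shows why lowering Km matters most when substrate is scarce. All parameter values are hypothetical, in arbitrary units. Treating the nicotinamide-analog effect as a simple doubling of Vmax is a deliberate simplification.

```python
def mm_rate(substrate: float, vmax: float, km: float) -> float:
    """Michaelis-Menten initial rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


VMAX, KM = 1.0, 10.0  # hypothetical baseline parameters (arbitrary units)

if __name__ == "__main__":
    for s in (1.0, 10.0, 100.0):
        base = mm_rate(s, VMAX, KM)
        lower_km = mm_rate(s, VMAX, KM / 5)      # Km-lowering activator (resveratrol-like)
        higher_vmax = mm_rate(s, 2 * VMAX, KM)   # Vmax-raising activator (simplified NAM-analog effect)
        print(f"[S]={s:>5}: {lower_km / base:.2f}-fold (Km/5) vs {higher_vmax / base:.2f}-fold (2x Vmax)")
```

A Km-lowering activator gives a large boost at low substrate concentrations and almost none near saturation. A Vmax-raising activator gives the same fold increase at every concentration.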

Resveratrol versus huntingtin

This brings us to the work of Parker and colleagues2. There are four main polyQ disorders, one of which is Huntington disease, a progressive neurodegenerative disorder that typically presents around age 40 with uncontrolled movements and cognitive deterioration. PolyQ disorders are so named because they stem from expansions of CAG repeats in DNA. Because CAG encodes the amino acid glutamine, the affected protein (in this case, huntingtin) carries many more sequential glutamines than normal. For reasons that are not known, mutant huntingtin is toxic to cells, leading to neuronal dysfunction and cell death.
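The link between CAG repeats and polyglutamine tracts is simple to illustrate, because each in-frame CAG codon adds one glutamine to the protein. The sketch below assumes a coding sequence that is already in frame and reports the longest uninterrupted run of glutamine codons. The function name and example sequence are hypothetical. The optional `count_caa` flag also counts CAA, the other glutamine codon. This detail goes beyond the article but matters because CAA codons can interrupt CAG tracts.

```python
def longest_polyq_run(cds: str, count_caa: bool = False) -> int:
    """Length of the longest run of consecutive glutamine codons in an in-frame CDS.

    By default only CAG codons are counted; with count_caa=True, CAA (which
    also encodes glutamine) is counted as well.
    """
    gln_codons = {"CAG", "CAA"} if count_caa else {"CAG"}
    seq = cds.upper()
    best = current = 0
    for i in range(0, len(seq) - 2, 3):  # step through complete codons only
        if seq[i:i + 3] in gln_codons:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


if __name__ == "__main__":
    example = "ATG" + "CAG" * 21 + "CAA" + "CAG" + "CCG"  # hypothetical sequence
    print(longest_polyq_run(example), longest_polyq_run(example, count_caa=True))
```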

Parker and colleagues2 investigated two models of Huntington disease. The first involved overexpressing a pathogenic version of huntingtin in C. elegans touch receptor neurons. In these worms, a subset of mechanosensory neurons accumulates huntingtin aggregates, and the worms often fail to twitch when prodded. The second model involved culturing neurons derived from transgenic mice engineered to overexpress mutant (109Q) huntingtin. In both model systems, resveratrol suppressed the deleterious effects of mutant huntingtin, as assessed by loss of the twitch response in the worms and cell death of the mouse neurons.

Sirtuins for healthy neurons

David Sinclair

Sir2 deacetylases are believed to promote the survival and longevity of organisms during times of adversity. A new study shows that activation of Sir2 by small molecules called sirtuin-activating compounds increases neuronal survival in two different models of Huntington disease, possibly opening new avenues for treatment.

David Sinclair is in the Department of Pathology, Harvard Medical School, Boston, Massachusetts, 02115 USA. e-mail: [email protected]

[Figure 1 schematic. Labels: NAD+ salvage pathway; NAM; PNC1/PBEF; NMNAT; NAD+; Sirtuins; Resveratrol (plant-derived STACs); Isonicotinamide (NAM-derivative STACs); NAM riboside (NAD+ precursors); PBEF (visfatin) (secreted factors); routes numbered 1–4, with two speculative routes marked ‘?’; chemical structure drawings.]

Figure 1 Routes to increasing the activity of sirtuin deacetylases. Increasing the abundance or activity of sirtuin (class III) deacetylases protects neurons from the toxic effects of mutant huntingtin. There are four known ways to increase the activity of sirtuins. STACs, which work by contacting the enzymes directly, fall into two classes: polyphenols produced by stressed plants, such as resveratrol and fisetin, lower the Km of sirtuins, possibly through an allosteric mechanism (1), and nicotinamide (NAM) analogs increase the Vmax by blocking inhibition by nicotinamide, a physiological regulator of sirtuin activity (2). Other routes include increasing the amount or activity of the NAD+ salvage pathway gene PNC1 or its functional equivalent in mammals, known as PBEF (visfatin) (3), or increasing the levels of NMNAT, as exemplified by the WldS mouse mutant that is resistant to axonal degeneration due to an NMNAT gene duplication (4). It is also feasible that sirtuin activity is stimulated by NAD+ precursors such as nicotinamide riboside or by secreted PBEF (visfatin).

Resveratrol is commonly referred to as a ‘dirty’ molecule in the pharmaceutical industry, meaning that it seems to interact with many different proteins, including COX1/2, ribonucleotide reductase and SIRT1. What makes this new study important is that resveratrol is almost certainly working through Sir2 enzymes. Overexpression of Sir2 in the worm suppresses cell toxicity, whereas worms lacking Sir2 are no longer protected by resveratrol. Similarly, in mouse neurons, the action of resveratrol is blocked by the Sir2 inhibitors sirtinol and nicotinamide. Although the authors did not test whether protection is provided by SIRT1 specifically, this seems a reasonable bet, given that SIRT1 has previously been linked to cell survival in a variety of cell types, including neurons8. Fisetin, a chemically distinct class of STAC, worked even better than resveratrol at protecting worm neurons, consistent with the finding that fisetin is a considerably more potent activator of C. elegans Sir2 than is resveratrol9.

The present study comes on the heels of a string of papers linking the effects of resveratrol in mammals to SIRT1 activation, including increased survival of neurons whose axons are cut8, mobilization of fatty acids in adipocytes10 and modulation of NF-κB transactivation and TNFα-induced apoptosis11.

To your health

Probably the most pressing question for many readers is whether a glass of red wine can provide enough resveratrol to activate SIRT1. Older literature indicates that it does not. Drinking red wine produces serum levels of resveratrol that barely reach the micromolar concentrations thought to be required for SIRT1 activation; what’s more, resveratrol is metabolized into sulfated and glucuronidated forms within ∼15 min of entering the bloodstream12. But Parker et al.2 show that we may need a concentration of only ∼500 nM of resveratrol to protect our neurons. We should also be aware that the metabolites of resveratrol, which circulate in serum for some 9 h (ref. 12), might also activate SIRT1.

One overarching question remains: why does a group of structurally related polyphenols produced by stressed plants protect cells, improve health and, in some cases, promote longer life in various organisms? STACs may resemble an as-yet-unidentified endogenous activator. Another explanation, known as the xenohormesis hypothesis13, holds that plants synthesize STACs during stressful times to activate their own Sir2-mediated defenses. Animals, in turn, benefit from picking up on these chemical stress cues from the plant world because the cues allow them to mobilize defenses in anticipation of a deteriorating environment. Which of these theories is correct may not be known for many years, but whether STACs can effectively treat the major diseases of our time, including neurodegenerative disorders, may be known far sooner than that. Wouldn’t that be a miracle?

1. Aggarwal, B.B. et al. Anticancer Res. 24, 2783–2840 (2004).

2. Parker, J. et al. Nat. Genet. 37, 349–350 (2005).

3. Sinclair, D.A. & Guarente, L. Cell 91, 1033–1042 (1997).

4. Kaeberlein, M., McVey, M. & Guarente, L. Genes Dev. 13, 2570–2580 (1999).

5. Tissenbaum, H.A. & Guarente, L. Nature 410, 227–230 (2001).

6. Rogina, B. & Helfand, S.L. Proc. Natl. Acad. Sci. USA 101, 15998–16003 (2004).

7. Wang, J. et al. FASEB J. (in the press).

8. Araki, T., Sasaki, Y. & Milbrandt, J. Science 305, 1010–1013 (2004).

9. Wood, J.G. et al. Nature 430, 686–689 (2004).

10. Picard, F. et al. Nature 429, 771–776 (2004).

11. Yeung, F. et al. EMBO J. 23, 2369–2380 (2004).

12. Walle, T., Hsieh, F., DeLegge, M.H., Oatis, J.E. Jr. & Walle, U.K. Drug Metab. Dispos. 32, 1377–1382 (2004).

13. Lamming, D.W., Wood, J.G. & Sinclair, D.A. Mol. Microbiol. 53, 1003–1009 (2004).

Defining stroke risks in sickle cell anemia

James F Meschia & V Shane Pankratz

Children and young adults with sickle cell anemia at risk for stroke are identified principally by screening for cerebral vasculopathy using transcranial Doppler ultrasonography. Investigators now show how Bayesian networks can generate useful predictive models and highlight relationships between genes and the occurrence of stroke in those with sickle cell anemia.

James F. Meschia is in the Department of Neurology, Mayo Clinic, Cannaday Building, 2 East, Jacksonville, Florida 32224, USA. V. Shane Pankratz is in the Division of Biostatistics, Mayo Clinic, Rochester, Minnesota, 55905 USA. e-mail: [email protected]

Sickle cell anemia (SCA) is an autosomal recessive disorder caused by a missense mutation in the gene encoding the beta chain of hemoglobin. Symptomatic stroke can occur in as many as 11% of affected individuals by the age of 20 years (ref. 1), and many more will show evidence of silent infarction by magnetic resonance imaging (Fig. 1). The number of families with SCA with at least two children with stroke is greater than would be expected by chance alone2. A study of 29 families with more than one child with SCA examined elevated cerebral blood flow velocities on transcranial Doppler ultrasonography (TCD). Having one child with elevated velocities increased the odds that a sibling also had elevated velocities by a factor of 50, a finding consistent with familial predisposition to cerebral vasculopathy3. These studies signal that genetic factors may influence the risk of stroke in individuals with SCA; the search for risk-modifying genes is ongoing4,5. Clinical and genetic information about risk factors is accumulating; the challenge is how best to analyze and interpret it. On page 435 of this issue, Paola Sebastiani and colleagues6 use the new approach of Bayesian networks to analyze variations in 108 SNPs in 39 candidate genes in 1,398 individuals with SCA. Their findings show the potential for Bayesian networks to generate predictive models and highlight relationships between genes and a disease phenotype.

Stratifying stroke risk

The current approach to stratifying stroke risk among SCA cases is to test noninvasively for presymptomatic cerebral vasculopathy with TCD ultrasonography. The technique measures flow velocity in large proximal intracranial arteries; velocity increases where these arteries are narrowed. A cohort study of 190 children and adolescents with SCA monitored for an average of 29 months showed that abnormal blood flow velocities of 170 cm s⁻¹ or greater were associated with a 44-times-higher relative risk of stroke7. Longer follow-up with an additional 125 affected individuals confirmed that elevated TCD-detected velocities, particularly average mean maximum blood flow velocities of 200 cm s⁻¹ or more, are strongly associated with stroke risk8. The STOP trial enrolled individuals with SCA whose time-averaged mean blood flow velocity in the internal carotid or middle cerebral artery was at least 200 cm s⁻¹. In this trial, blood transfusions reduced the risk of stroke by 92% (ref. 9). TCD monitoring and transfusion therapy might explain the recent marked decline in stroke rates in California10.

In the future, multiplex genetic testing may prove superior to TCD for stratifying risk of stroke in SCA. Bayesian networks, and other modern data analytical methods, may leverage information generated by multiplex gene testing. For example, the model developed by Sebastiani and colleagues showed great predictive value, correctly classifying all 7 subjects who experienced a stroke and 105 of the 107 who did not6. Testing for risk-modifying genes might reduce the need for on-site technical expertise in ultrasonography and thereby decentralize the process of screening for at-risk individuals. This could also reduce the need for long-term follow-up of at-risk individuals, which has been necessary to optimize the usefulness of TCD screening11.
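The classification counts quoted above translate directly into the usual performance measures. The sketch below takes the counts from the text. The 95% Wilson score interval is our own addition; it illustrates how little 7 stroke cases constrain the true sensitivity. Whether these counts come from an independent validation set is not stated here.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Counts quoted in the text for the model of Sebastiani and colleagues
tp, fn = 7, 0     # strokes classified correctly / incorrectly
tn, fp = 105, 2   # non-strokes classified correctly / incorrectly

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
low, high = wilson_interval(tp, tp + fn)

if __name__ == "__main__":
    print(f"sensitivity {sensitivity:.3f} (95% CI {low:.2f}-{high:.2f})")
    print(f"specificity {specificity:.3f}, accuracy {accuracy:.3f}")
```

A perfect 7 of 7 is compatible with a true sensitivity as low as about 65%. This is one reason the request, later in this article, for full reporting of model-building details matters.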

Testing for risk-modifying genes could identify individuals at risk for stroke earlier in the course of their disease, before they develop ultrasonographic evidence of cerebral vasculopathy, a condition that may not be benign in the stage preceding symptomatic stroke. In a study of children with SCA but no history of stroke, children with abnormal TCD readings performed less well on certain cognitive tests than those with conditional or normal TCD readings12. Genetic testing might help not only individuals at risk for stroke but also those at risk for cerebral vasculopathy before they manifest symptomatic stroke.
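The velocity cut-offs quoted in this article map onto the normal, conditional and abnormal TCD categories mentioned here. The sketch below assumes a three-way rule: normal below 170 cm s⁻¹, conditional from 170 to below 200, and abnormal at 200 or above, matching the STOP entry criterion. The function name and boundary handling are illustrative assumptions, and the code is not a clinical protocol.

```python
def classify_tcd(velocity_cm_s: float) -> str:
    """Categorize a time-averaged mean maximum TCD velocity (cm/s).

    Illustrative cut-offs (assumed): <170 normal, 170 to <200 conditional,
    >=200 abnormal.
    """
    if velocity_cm_s < 0:
        raise ValueError("velocity must be non-negative")
    if velocity_cm_s >= 200:
        return "abnormal"
    if velocity_cm_s >= 170:
        return "conditional"
    return "normal"


if __name__ == "__main__":
    for v in (150, 185, 210):
        print(v, classify_tcd(v))
```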

Sebastiani and colleagues started with 235 SNPs in 80 candidate genes and used a Bayesian network approach to ultimately identify 31 SNPs in 12 genes that were associated with the occurrence of stroke in individuals with SCA. Among these were genes that are good candidates for underlying stroke, including genes in the TGF-β pathway. Agonist and antagonist studies on stroke models show that TGF-β1 can be neuroprotective, reducing neuronal cell death and infarct volumes13. TGF-β plasma concentrations are elevated in SCA in steady state, and the elevation is contingent on whether individuals have high or low fetal hemoglobin, an inhibitor of polymerization of sickle hemoglobin in red cells14.

Bayesian and other approaches

Bayesian networks, and other modern methods of evaluating multilocus data, are promising alternatives to commonly used model selection procedures such as stepwise logistic regression (e.g., ref. 15). One strength of these methods is their flexibility in accounting for genetic interactions among the selected genes. This is in contrast to logistic regression, where one must explicitly model any single-gene effects and multigene interactions of interest. Bayesian networks therefore show promise in identifying potential structure that may be useful in interpreting large amounts of genetic information.
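Consider a toy network to see why a Bayesian network need not spell out interaction terms. In it, two SNPs are parents of a stroke node. The conditional probability table for stroke lists a probability for every genotype combination, so any interaction pattern is representable directly. A main-effects logistic model would instead need an explicit product term. The network structure, genotype frequencies and probabilities below are all hypothetical. The networks of Sebastiani and colleagues are far larger and were learned from data; this sketch only shows exact inference by enumeration on a fixed structure.

```python
from itertools import product

# Hypothetical probability that each SNP carries the risk genotype
P_SNP = {"A": 0.3, "B": 0.2}

# Hypothetical CPT P(stroke | A risk?, B risk?): risk is raised only when both
# risk genotypes are present, i.e. a pure interaction with no main effects.
P_STROKE = {(0, 0): 0.05, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.40}


def joint(a: int, b: int, stroke: int) -> float:
    """Joint probability, factorized along the network A -> Stroke <- B."""
    pa = P_SNP["A"] if a else 1 - P_SNP["A"]
    pb = P_SNP["B"] if b else 1 - P_SNP["B"]
    ps = P_STROKE[(a, b)] if stroke else 1 - P_STROKE[(a, b)]
    return pa * pb * ps


def posterior_a_given_stroke() -> float:
    """P(A = risk genotype | stroke), summing out the other parent B."""
    numerator = sum(joint(1, b, 1) for b in (0, 1))
    denominator = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
    return numerator / denominator


if __name__ == "__main__":
    print(f"P(A risk | stroke) = {posterior_a_given_stroke():.3f} vs prior {P_SNP['A']}")
```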

Despite their promise, it is important to temper enthusiasm for Bayesian networks and similar analytical techniques. Results from these techniques depend on the structure of the model, and their statistical properties have not yet been fully studied. The complex analytical techniques that can be applied to large data sets make it possible to obtain seemingly persuasive results for almost any given data set. If nuances in sample set identification, data collection and application of the method are overlooked, then the results that are obtained may be difficult to replicate, or even misleading. We endorse the continued study of these new analytical techniques. To validate more fully the methods and the results obtained with them, however, we recommend that extensive detail be provided about every component of the study design and analysis, including a detailed description of the intermediate steps in model creation. This would facilitate a full evaluation of the analytical methods and permit an objective weighing of findings.

1. Ohene-Frempong, K. et al. Blood 91, 288–294 (1998).

2. Driscoll, M.C. et al. Blood 101, 2401–2404 (2003).

3. Kwiatkowski, J.L. et al. Br. J. Haematol. 121, 932–937 (2003).

4. Hoppe, C. et al. Blood 103, 2391–2396 (2004).

5. Adams, G.T. et al. BMC Med. Genet. 4, 6 (2003).

6. Sebastiani, P. et al. Nat. Genet. 37, 435–440 (2005).

7. Adams, R. et al. N. Engl. J. Med. 326, 605–610 (1992).

8. Adams, R.J. et al. Ann. Neurol. 42, 699–704 (1997).

9. Adams, R.J. et al. N. Engl. J. Med. 339, 5–11 (1998).

10. Fullerton, H.J. et al. Blood 104, 336–339 (2004).

11. Adams, R.J. et al. Blood 103, 3689–3694 (2004).

12. Kral, M.C. et al. Pediatrics 112, 324–331 (2003).

13. Dhandapani, K.M. & Brann, D.W. Cell Biochem. Biophys. 39, 13–22 (2003).

14. Croizat, H. & Nagel, R.L. Am. J. Hematol. 60, 105–115 (1999).

15. Hoh, J. & Ott, J. Nat. Rev. Genet. 4, 701–709 (2003).

[Figure 1 schematic labels: HbSS; modifier genes TGFBR2, TGFBR3, BMP6 and others; cerebral infarcts; stroke; cognitive deficits; school problems]

Figure 1 Multiple genes probably modify the risk of stroke in individuals with SCA. HbSS, hemoglobin in an individual who is homozygous with respect to the mutation that causes SCA; TGFBR2, transforming growth factor, beta receptor II; TGFBR3, transforming growth factor, beta receptor III; BMP6, bone morphogenetic protein 6.

©2005 Nature Publishing Group http://www.nature.com/naturegenetics

NEWS AND VIEWS

342 VOLUME 37 | NUMBER 4 | APRIL 2005 | NATURE GENETICS

The first steps in adaptive evolution
James J Bull & Sarah P Otto

The first empirical test of an evolutionary theory provides support for a mutational landscape model underlying the process of adaptation. The study shows that it is possible to predict at least the first step in an adaptive walk and also shows the importance of incorporating mutation bias, as well as the fitness effects of mutations.

James J. Bull is in the Section of Integrative Biology and Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas 78712-0253, USA. Sarah P. Otto is in the Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada. e-mail: [email protected]

Adaptation is everywhere. Sometimes it gets in our way, as with drug-resistant microbes, pesticide-resistant insects and cancer. Sometimes it does us good, as in the domestication of plants and animals and industry's use of directed evolution to create useful molecules. Nevertheless, although the products of adaptation are well known, the mechanism and quantitative nature of the adaptive process remain poorly understood. Early attempts to describe the adaptive process on geometrical grounds1 did not lead much beyond a rudimentary understanding of whether small-effect or large-effect mutations contribute most to adaptation2. Real progress towards understanding the adaptive process came from explicit models of genetic sequences3–6. These models predicted the fitness effects of mutations that arise and fix within a population as it adapts to its environment. On page 441 of this issue, Rokyta et al.7 now provide an empirical test of this theory, finding that the first steps that adaptation takes are consistent with the theoretical predictions of the adaptive process.

Testing evolution

Theories that make real, a priori predictions about adaptation have been gaining momentum, and the report from Rokyta et al. in this issue provides the empirical support needed to launch these theories into the spotlight. This paper presents a quantitative experimental test of a theory of adaptation initially developed by John Gillespie in the 1980s (refs. 3–5) and recently extended by Allen Orr6. Gillespie realized that in a population slightly displaced from its closest fitness optimum, there will be but a handful of mutations that improve fitness compared with an overwhelming number that reduce fitness3–5. Therefore, beneficial mutations that lead to adaptation should represent the most-fit tail of the distribution of all possible mutations.

Gillespie's model made use of a common statistical theory called extreme-value theory, which indicates that samples drawn from the tail of a distribution have properties that, for a broad class of distributions, do not depend on the exact nature of the distribution. Applied to evolution, extreme-value theory predicts an ordered progression of fitness effects among the handful of beneficial alleles: the best allele should be substantially fitter than the next-best allele, and fitness differences between pairs of next-most beneficial alleles should decline so that most of the beneficial alleles have small effects. Using this insight, Orr6 derived predictions about the distribution of fitness effects of the mutations that arise and fix during an adaptive walk. Orr's model allowed for testable predictions about the course of adaptive evolution. In particular, he derived the probability that a mutated allele with a given fitness rank would be fixed at the next adaptive step.

The first adaptive step

Rokyta et al.7 now supply the needed empirical test of this theory. The authors used a single-stranded DNA virus to determine whether the beneficial mutation fixed at the first substitution in an adaptive walk agreed with that predicted by Orr's theory (Fig. 1). The empirical test was not trivial, requiring considerable replication as well as detailed information on the identity and fitness of substituted alleles. To accomplish this, Rokyta et al. focused on the first step in adaptation, allowing for greater replication and predictive ability.

Figure 1 A model bacteriophage takes its first adaptive step on a fitness landscape. An adaptive walk along a mutational landscape, reflecting all possible mutations deriving from the initial sequence, can represent the evolution of a virus. Now, a virus's first steps in an adaptive walk have been defined within a likely mutational landscape. Photo by J. Palmersheim; phage sculpture by H. Wichman; landscape by A. Johnston.
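The extreme-value argument behind these predictions (the best beneficial allele well ahead of the next-best, with gaps shrinking down the ranks) can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not the analysis of Gillespie, Orr or Rokyta et al.: it draws fitness effects from a unit-mean exponential distribution, one of a broad class of distributions with similar tail behaviour, and averages the gaps between the largest draws. The helper name `mean_top_gaps` and its parameter values are hypothetical.

```python
import heapq
import random

def mean_top_gaps(n_alleles=500, top=5, reps=2000, seed=1):
    """Average gap between successive best fitness draws (hypothetical helper).

    Each replicate draws n_alleles fitness values from an exponential
    distribution with unit mean (an assumed stand-in for the unknown
    fitness distribution) and records the gaps between the `top` largest.
    Returns [mean gap rank1-rank2, mean gap rank2-rank3, ...].
    """
    rng = random.Random(seed)
    totals = [0.0] * (top - 1)
    for _ in range(reps):
        # nlargest returns the top values sorted from largest to smallest
        best = heapq.nlargest(top, (rng.expovariate(1.0) for _ in range(n_alleles)))
        for i in range(top - 1):
            totals[i] += best[i] - best[i + 1]
    return [t / reps for t in totals]

gaps = mean_top_gaps()
for rank, gap in enumerate(gaps, start=1):
    # For an exponential tail the expected gap below rank `rank` is 1 / rank.
    print(f"gap between ranks {rank} and {rank + 1}: {gap:.3f} (theory {1 / rank:.3f})")
```

For an exponential tail this pattern is exact: the gap between the i-th and (i + 1)-th largest draws has mean 1/i in units of the distribution's scale, whatever the number of alleles, so the first gap is about twice the second and three times the third.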

Rokyta et al. carried out 20 replicate single-step adaptations using a wild relative of ΦX174 grown in liquid culture, allowing each line to adapt independently to the same conditions. In each replicate, the first mutation to both arise and spread in the line was identified by whole-genome sequencing. The fitness effect of each mutation was measured as the growth rate of the virus, and the 20 fitness effects were ranked. Of the 20 first adaptive steps examined, all mutations were nonsynonymous, involving nine distinct amino acid changes, and all increased fitness, by 11% to 39%.

Comparing these experimental results with previous theory, Rokyta et al. found that Orr's model fit the observed fitness distribution of the mutations reasonably well. The predictions of Orr, however, are only expectations over all possible genomic starting points and over all possible adaptive walks.
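Orr's rank-based prediction can be computed directly. The sketch below assumes the form of Orr's (2002) result as we understand it: with rank 1 the fittest allele and the wild type at rank i (so i − 1 fitter alleles are one mutation away), under strong selection, weak mutation, equal mutation rates and fixation probability proportional to the selective advantage, the chance that the next substitution is the allele of rank j is P_ij = (1/(i − 1)) Σ_{k=j}^{i−1} 1/k. The function names are hypothetical; exact fractions are used so the checks are not blurred by rounding.

```python
from fractions import Fraction

def orr_jump_probability(i, j):
    """Chance that a wild type of fitness rank i (rank 1 = fittest) is next
    replaced by the allele of rank j, under the assumptions stated above."""
    if not 1 <= j < i:
        raise ValueError("need 1 <= j < i")
    return Fraction(1, i - 1) * sum(Fraction(1, k) for k in range(j, i))

def expected_next_rank(i):
    """Mean rank of the allele fixed at the next step; equals (i + 2) / 4."""
    return sum(j * orr_jump_probability(i, j) for j in range(1, i))

i = 10  # hypothetical: the wild type is the 10th-best allele in its neighbourhood
for j in range(1, i):
    print(j, round(float(orr_jump_probability(i, j)), 3))
print("expected rank:", expected_next_rank(i))  # (10 + 2) / 4 = 3
```

Summing the formula over j gives 1, and its mean gives (i + 2)/4. The best allele is the single most likely next step, yet for i = 10 it is fixed only about 31% of the time, which is why predictions of this kind are expectations over many replicate walks rather than forecasts of any one walk.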

Rokyta et al. found a substantially improved fit to the data by incorporating differences in the rate at which each of the nine observed amino acid changes arises from their starting sequence. A slightly better fit was obtained using all the available data (including the mutation rate differences, fitness effects and population size dynamics). Thus, the authors found that models tailored to the specifics of the population could better describe the process of adaptation. It is notable that, without this additional knowledge of the starting and mutant sequences, Orr's predictions fared so well.
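One simple way to see why mutation bias matters is to weight each beneficial mutation by how often it arises as well as by how likely it is to fix. The sketch below is not necessarily the likelihood Rokyta et al. fitted; it assumes strong selection and weak mutation, so that candidate k appears at a rate proportional to its mutation rate μ_k and fixes with probability of roughly 2s_k, making its share of first steps proportional to μ_k s_k. The function name and all numbers are hypothetical.

```python
def first_step_probabilities(mutation_rates, selection_coeffs):
    """Share of first adaptive steps expected for each beneficial mutation.

    Assumes the chance a mutation is the first to arise and fix is
    proportional to mu_k * s_k (the factor 2 in 2*s_k cancels on normalizing).
    """
    if len(mutation_rates) != len(selection_coeffs):
        raise ValueError("need one mutation rate per selection coefficient")
    weights = [mu * s for mu, s in zip(mutation_rates, selection_coeffs)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical: three beneficial mutations; the weakest is twice as mutable
s = [0.3, 0.2, 0.1]
equal = first_step_probabilities([1.0, 1.0, 1.0], s)
biased = first_step_probabilities([1.0, 1.0, 2.0], s)
print([round(p, 3) for p in equal])
print([round(p, 3) for p in biased])
```

With equal rates the ranking follows fitness alone; doubling the rate at which the weakest mutation arises lets it become the first step noticeably more often, even though its fitness effect is unchanged.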

Rigorous biological tests of these models are presently limited to small genomes (viruses, plasmids and single molecules subjected to in vitro selection) and to computer models of fitness landscapes. Tests using bacteria, yeast and higher eukaryotes await affordable sequencing of large genomes across multiple replicates of a single experiment. But even now, tests are feasible and of obvious relevance for predicting the evolution of drug resistance in viruses, including HIV and influenza, two viruses for which drugs have or could have a crucial role in treatment and for which we already know that evolution causes problems. In some cases, we might want the theory tailored to the individual genome, a combination of Orr's model and the modifications offered by Rokyta et al.

Next steps

In showing the relevance of existing adaptation theories to real experimental conditions, this work brings new excitement to the study of adaptive walks. For any particular system, general properties about the course of adaptive walks can now be predicted based on only a few parameters. To borrow from Fisher, no practical biologist would have dared imagine that the details of adaptive walks might be largely independent of the biology, yet that is precisely what the current results suggest. This work shows that it is possible to predict the spectrum of possible first steps of an adaptive walk. The next steps will be to extend this work to the full course of an adaptive walk. Such predictive power would be extremely valuable when anticipating the evolutionary response of pathogens to new antimicrobial drugs and when using directed evolution to create molecules with specific functions for industry.

1. Fisher, R.A. The Genetical Theory of Natural Selection (Oxford University Press, Oxford, UK, 1930).
2. Orr, H.A. Nat. Rev. Genet. 6, 119–127 (2005).
3. Gillespie, J.H. Theor. Popul. Biol. 23, 202–215 (1983).
4. Gillespie, J.H. Evolution 38, 1116–1129 (1984).
5. Gillespie, J.H. The Causes of Molecular Evolution (Oxford University Press, Oxford, UK, 1991).
6. Orr, H.A. Evolution 56, 1317–1330 (2002).
7. Rokyta, D.R., Joyce, P., Caudle, S.B. & Wichman, H.A. Nat. Genet. 37, 441–444 (2005).

The X chromosome: not just her brother's keeper
Eric J Vallender, Nathaniel M Pearson & Bruce T Lahn

The X chromosome has traditionally been characterized as a conscientious sister to her derelict brother, the Y. Beyond dutifully maintaining the family heritage, however, the X has developed its own unique identities. Now, the complete sequence of the human X allows us to appreciate its distinctiveness at an unprecedented resolution.

Eric J. Vallender, Nathaniel M. Pearson and Bruce T. Lahn are in the Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. e-mail: [email protected]

Since its discovery, the human X chromosome has been defined by its relationships to other chromosomes. Its hemizygosity in males and unusual patterns of inheritance immediately separate it from its autosomal cousins, and its length and gene content set it apart from its Y chromosome sibling. The two sex chromosomes in mammals descended from a pair of autosomes1. The Y underwent massive degeneration, losing size and gene content, whereas the X was maintained, retaining its size and most of its genes. But the X is much more than a faithful copy of its autosomal progenitor; it also evolved many distinctive features2. The most notable of these include X inactivation, the extensive flux (both accretion and loss) of sex-specific genes, and a deficit in polymorphism. Additionally, the X has a disproportionately large representation of genes involved in mendelian diseases, probably owing to the relative ease of identifying such genes when X-linked. As recently reported by Mark Ross and colleagues in Nature3, the sequence of the human X brings these and other features of this chromosome into sharp focus.

Origin of the sex chromosomes

The mammalian sex chromosomes arose from autosomal progenitors ∼300 million years ago. Before then, sex determination probably relied on environmental cues such as egg incubation temperature, as is the case in many extant reptiles. Ross et al.3 confirmed that much of the long arm of the human X is homologous to the short arm of chicken chromosome 4, whereas most of the short arm of the human X matches a stretch of chicken chromosome 1. The bird sex chromosomes Z and W, on the other hand, show homology to human chromosome 9. These observations demonstrate the independent origins of genetic sex determination in mammals and birds. Furthermore, comparison of the human X with that of the mouse, rat and dog indicates that gene order of the X in human and dog largely represents that of the ancestral X, whereas many rearrangements have occurred on the murine X.

After the sex chromosomes were established, recombination between X and Y was suppressed progressively over time in a block-by-block manner along the chromosomes4. This was probably accomplished by a series of large-scale inversions on the Y. Previous studies posited at least four evolutionary ‘strata’ on the X chromosome, each corresponding to a block of the chromosome where recombination was suppressed at once (or within a short period)4. Ross et al.3 now identify a fifth stratum on the telomeric end of the X short arm. On the whole, the data portray an unrelenting advance of the pseudoautosomal boundary towards the telomeres over evolutionary time, with the most recent shift occurring ∼30 million years ago.

X inactivation

As the sex chromosomes diverged, they developed their own identities. The Y underwent massive genic atrophy and a corresponding size reduction (and also accumulated male-beneficial genes5,6). The X remained more stable in terms of gene content and, in response to Y decay, evolved a mechanism of inactivating one of its two copies in each female somatic cell to compensate for the gene dosage difference between males and females. Although most genes on the X are subject to this haplo-inactivation, many escape it7. Typically, these escapees have functional (or very recently decayed) Y homologs. This is consistent with the notion that X inactivation evolved gene by gene and, in each case, as a delayed response to the degeneration of a corresponding Y homolog8.

The mechanisms by which X inactivation occurs are only partly understood. In particular, it is uncertain how X inactivation spreads from the X inactivation center (where the gene XIST resides) across the rest of the chromosome. One hypothesis holds that LINE1 (L1) repetitive elements may act as ‘way stations’ to facilitate the spread of X inactivation9. Ross et al.3 show not only that L1 repeats are enriched on the X relative to the rest of the genome, but also that their density in a given region roughly correlates with the region's age of divergence from the Y chromosome and its completeness of inactivation. Regions long diverged from the Y show high L1 density and more thorough X inactivation, whereas regions recently diverged from the Y show low L1 density and greater tendency to escape inactivation. These data are consistent with (but do not yet prove) the hypothesis that L1 repeats are the way stations of X inactivation.

Selective forces on the X

The evolutionary history of the sex chromosomes differs categorically from that of the autosomes. Notably, both the X and the Y are present in fewer copies than an autosome in the population. As predicted by population genetics theory, this smaller effective population size should lead to fewer polymorphisms on the sex chromosomes. Ross et al.3 report that, as expected, the polymorphism level of the X is only ∼57% that of an autosome.
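The expectation can be made explicit with the standard neutral result that per-site nucleotide diversity π scales with the number of copies of a chromosome in the population. A minimal worked sketch, assuming neutrality, equal numbers of breeding males and females, and equal per-site mutation rates μ on every chromosome (assumptions the real X and Y meet only approximately), with N_e the effective number of individuals and copies counted per female–male pair:

```latex
\begin{aligned}
\pi_A &\approx 4 N_e \mu, &
\pi_X &\approx 3 N_e \mu, &
\pi_Y &\approx N_e \mu, \\
\frac{\pi_X}{\pi_A} &\approx \frac{2 + 1}{2 + 2} = \frac{3}{4}, &
\frac{\pi_Y}{\pi_A} &\approx \frac{0 + 1}{2 + 2} = \frac{1}{4}.
\end{aligned}
```

These ratios are the copy numbers listed in Table 1. The same count gives the share of X copies carried by females, 2/(2 + 1) = 2/3, the figure used in the sexual-antagonism argument.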

Selective regimes also differ greatly between the two sex chromosomes. For the Y, two particularly prominent forces, asexual degeneration and constant directional selection, have been postulated to result in a chromosome that is generally gene-poor but specifically enriched for male-beneficial functions2. For the X, hemizygous exposure is theorized to drive a subtler accumulation of male-beneficial genes on the assumption that recessive male-beneficial alleles can more readily manifest their benefit from the X (a chromosome that is hemizygous, or ‘exposed’, in males) than from autosomes2,10. Extending previous findings11,12, Ross et al. showed that the X is enriched for cancer-testis antigen genes, a subset of testis genes that are also expressed in cancer cells. Of note, a recent study confirmed that the mammalian X is enriched for genes involved in early stages of spermatogenesis, consistent with the influence of hemizygous exposure13. But the same study also showed a deficit of genes implicated in late stages of spermatogenesis, which was hypothesized to be an evolutionary response to male germline X inactivation that occurs at the onset of male meiosis13. Thus, the cancer-testis antigen genes shown by Ross et al. to be enriched on the X are probably early spermatogenesis genes. The human X is also mildly enriched for brain and skeletal muscle genes, which could potentially be explained by hemizygous exposure2.

Table 1 General properties of the human sex chromosomes and autosomes

Property | X chromosome | Y chromosome (a) | Autosomes
Size of euchromatic region (Mb) | 150 | 23 | 2,863
Gene number (count per Mb) | ∼1,098 (7) | ∼78 (3.5) | ∼29,800 (11)
Copy number in population, scaled to autosomes | 3/4 | 1/4 | 1
Heterozygosity (b) | 4.7 × 10⁻⁴ | 1.5 × 10⁻⁴ | 7.5 × 10⁻⁴
Fraction of LINE1 sequence | 29% | 23% | 16%
Unique evolutionary forces affecting gene content | Sexual antagonism; hemizygous exposure; evasion of male germline X inactivation | Asexual degeneration; constant directional selection; sexual antagonism | None
Genes that show enrichment | Early spermatogenesis genes; brain genes; skeletal muscle genes; ovary genes; placenta genes | Spermatogenesis genes | None
Genes that show deficit | Late spermatogenesis genes | Most types of genes | None

(a) Pseudoautosomal regions are excluded. (b) A measure of polymorphism defined as the average number of nucleotide differences per base between two randomly sampled copies of the chromosome.

In addition to the accretion of certain male-beneficial genes, the mammalian X also seems to be enriched for female-beneficial genes such as those expressed in ovary and placenta2,13. This could be explained by sexual antagonism: because the X spends most of its time (two thirds) in females, genes benefiting females but harming males may tend to move from autosomes to the X, whereas genes benefiting males but harming females may tend to move from the X to autosomes (or the Y)2,11,14.

Regardless of the selective forces, the X is predicted to be a hotbed of gene traffic. This was demonstrated by a recent study showing that the X has disseminated as well as recruited a disproportionately high number of functional retroposed genes15.

Thus, opposing selective forces have driven the gene content of the X to change in a manner far more complex than that of autosomes. The result is a suite of evolutionary outcomes that includes masculinization fostered by hemizygous exposure (e.g., enrichment of early spermatogenesis genes), demasculinization driven by sexual antagonism or the need to evade male germline X inactivation (deficit of late spermatogenesis genes) and feminization propelled by sexual antagonism (accretion of ovary and placenta genes).

Compare and contrast

The X and Y chromosomes have such tightly intertwined evolutionary histories that each can only be viewed in the light of the other (Table 1). The X chromosome gained early fame for its role in the discovery of recessive disease-associated alleles. Beyond this illustrious contribution to medical genetics, the X has come to have an important role in the understanding of genome evolution. The newest work by Ross et al.3 now allows virtually every nucleotide of the X chromosome to be studied. Such a resource will surely produce additional insights into the function and evolution of the X chromosome and, by extension, the Y chromosome.

1. Ohno, S. Sex Chromosomes and Sex-Linked Genes (Springer, Berlin, 1967).
2. Vallender, E.J. & Lahn, B.T. Bioessays 26, 159–169 (2004).
3. Ross, M.T. et al. Nature 434, 325–337 (2005).
4. Lahn, B.T. & Page, D.C. Science 286, 964–967 (1999).
5. Lahn, B.T. & Page, D.C. Science 278, 675–680 (1997).
6. Skaletsky, H. et al. Nature 423, 825–837 (2003).
7. Carrel, L., Cottle, A.A., Goglin, K.C. & Willard, H.F. Proc. Natl. Acad. Sci. USA 96, 14440–14444 (1999).
8. Jegalian, K. & Page, D.C. Nature 394, 776–780 (1998).
9. Lyon, M.F. Cytogenet. Cell Genet. 80, 133–137 (1998).
10. Rice, W.R. Evolution 38, 735–742 (1984).
11. Saifi, G.M. & Chandra, H.S. Proc. R. Soc. Lond. B Biol. Sci. 266, 203–209 (1999).
12. Wang, P.J., McCarrey, J.R., Yang, F. & Page, D.C. Nat. Genet. 27, 422–426 (2001).
13. Khil, P.P., Smirnova, N.A., Romanienko, P.J. & Camerini-Otero, R.D. Nat. Genet. 36, 642–646 (2004).
14. Wu, C.I. & Xu, E.Y. Trends Genet. 19, 243–247 (2003).
15. Emerson, J.J., Kaessmann, H., Betran, E. & Long, M. Science 303, 537–540 (2004).
