nested association mapping for identification of functional ... · association genetic mapping,...

11
Copyright Ó 2010 by the Genetics Society of America DOI: 10.1534/genetics.110.115782 Nested Association Mapping for Identification of Functional Markers Baohong Guo,* David A. Sleper and William D. Beavis* ,1 *Department of Agronomy, Iowa State University, Ames, Iowa 50011 and Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211 Manuscript received February 21, 2010 Accepted for publication May 7, 2010 ABSTRACT Identification of functional markers (FMs) provides information about the genetic architecture underlying complex traits. An approach that combines the strengths of linkage and association mapping, referred to as nested association mapping (NAM), has been proposed to identify FMs in many plant species. The ability to identify and resolve FMs for complex traits depends upon a number of factors including frequency of FM alleles, magnitudes of their genetic effects, disequilibrium among functional and nonfunctional markers, statistical analysis methods, and mating design. The statistical characteristics of power, accuracy, and precision to identify FMs with a NAM population were investigated using three simulation studies. The simulated data sets utilized publicly available genetic sequences and simulated FMs were identified using least-squares variable selection methods. Results indicate that FMs with simple additive genetic effects that contribute at least 5% to the phenotypic variability in at least five segregating families of a NAM population consisting of recombinant inbred progeny derived from 28 matings with a single reference inbred will have adequate power to accurately and precisely identify FMs. This resolution and power are possible even for genetic architectures consisting of disequilibrium among multiple functional and nonfunctional markers in the same genomic region, although the resolution of FMs will deteriorate rapidly if more than two FMs are tightly linked within the same amplicon. Finally, nested mating designs involving several reference parents will have a greater likelihood of resolving FMs than single reference designs. T HE primary purpose for identifying functional markers (FMs) associated with complex traits in plant species is to provide molecular genetic information underlying variability upon which both artificial and natural selection are based. FMs are defined as poly- morphic sites within genomes that causally affect phe- notypic trait variability (Andersen and Lubberstedt 2003). This definition is a pragmatic recognition that phenotypic variability can be due to genomic variability located outside of open reading frames. Forward genet- ics approaches to associate naturally occurring structural genomic variants with phenotypic variability can be broadly categorized as (1) linkage mapping, also re- ferred to as quantitative trait locus (QTL) mapping, (2) association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage and LD mapping. The third approach based on the concept of combin- ing LD with QTL mapping is a natural extension of the multifamily QTL approach and has been referred as joint linkage and linkage disequilibrium mapping ( JLLDM) (Xiong and Jin 2000; Farnir et al. 2002; Wu et al. 2002; Perez-Enciso 2003; Jung et al. 2005) in samples from natural populations. The combined ap- proach also has been applied to designed mapping families sampled from plant breeding populations (Xu 1998a; Jannink and Jansen 2000; Jannink and Wu 2003; Jansen et al. 2003). A special case of designed mapping families that are interconnected, known as nested association mapping (NAM), was proposed by Yu et al. (2008). As originally proposed, a NAM population consists of multiple families of recombinant inbred lines (RILs) derived from multiple inbred lines crossed to a single reference inbred line. Implicitly, genomic in- formation is composed of high-density genotypes of parental inbred lines and low-density genotypes from segregating progeny. If the segregating progeny are RILs or doubled haploid lines (DHLs), then the genomic information can be ‘‘immortalized’’ for associations with phenotypes obtained through long-term longitudinal studies (Nordborg and Weigel 2008). A NAM population consisting of 25 families with 200 RILs for each family has been developed and released as a genetic resource for identification of FMs in maize (Yu et al. 2008). Other publicly available NAM populations are being developed for several species including Arabidopsis thaliana (Buckler and Gore 2007), barley (R. Wise, personal communication), sorghum ( J. Yu, personal communication), and soybean (B. Diers, personal communication). The power, accuracy, and precision of identifying FMs in experimental NAM populations have not been Available freely online through the author-supported open access option. 1 Corresponding author: 1208 Agronomy Hall, Iowa State University, Ames, IA 50011. E-mail: [email protected] Genetics 186: 373–383 (September 2010)

Upload: others

Post on 13-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

Copyright � 2010 by the Genetics Society of AmericaDOI: 10.1534/genetics.110.115782

Nested Association Mapping for Identification of Functional Markers

Baohong Guo,* David A. Sleper† and William D. Beavis*,1

*Department of Agronomy, Iowa State University, Ames, Iowa 50011 and †Division of Plant Sciences, University of Missouri,Columbia, Missouri 65211

Manuscript received February 21, 2010Accepted for publication May 7, 2010

ABSTRACT

Identification of functional markers (FMs) provides information about the genetic architectureunderlying complex traits. An approach that combines the strengths of linkage and association mapping,referred to as nested association mapping (NAM), has been proposed to identify FMs in many plant species.The ability to identify and resolve FMs for complex traits depends upon a number of factors includingfrequency of FM alleles, magnitudes of their genetic effects, disequilibrium among functional andnonfunctional markers, statistical analysis methods, and mating design. The statistical characteristics ofpower, accuracy, and precision to identify FMs with a NAM population were investigated using threesimulation studies. The simulated data sets utilized publicly available genetic sequences and simulated FMswere identified using least-squares variable selection methods. Results indicate that FMs with simple additivegenetic effects that contribute at least 5% to the phenotypic variability in at least five segregating families of aNAM population consisting of recombinant inbred progeny derived from 28 matings with a single referenceinbred will have adequate power to accurately and precisely identify FMs. This resolution and power arepossible even for genetic architectures consisting of disequilibrium among multiple functional andnonfunctional markers in the same genomic region, although the resolution of FMs will deteriorate rapidly ifmore than two FMs are tightly linked within the same amplicon. Finally, nested mating designs involvingseveral reference parents will have a greater likelihood of resolving FMs than single reference designs.

THE primary purpose for identifying functionalmarkers (FMs) associated with complex traits in

plant species is to provide moleculargenetic informationunderlying variability upon which both artificial andnatural selection are based. FMs are defined as poly-morphic sites within genomes that causally affect phe-notypic trait variability (Andersen and Lubberstedt

2003). This definition is a pragmatic recognition thatphenotypic variability can be due to genomic variabilitylocated outside of open reading frames. Forward genet-ics approaches to associate naturally occurring structuralgenomic variants with phenotypic variability can bebroadly categorized as (1) linkage mapping, also re-ferred to as quantitative trait locus (QTL) mapping, (2)association genetic mapping, also known as linkagedisequilibrium (LD) mapping, and (3) designs thatcombine linkage and LD mapping.

The third approach based on the concept of combin-ing LD with QTL mapping is a natural extension ofthe multifamily QTL approach and has been referredas joint linkage and linkage disequilibrium mapping( JLLDM) (Xiong and Jin 2000; Farnir et al. 2002; Wu

et al. 2002; Perez-Enciso 2003; Jung et al. 2005) insamples from natural populations. The combined ap-

proach also has been applied to designed mappingfamilies sampled from plant breeding populations (Xu

1998a; Jannink and Jansen 2000; Jannink and Wu 2003;Jansen et al. 2003). A special case of designed mappingfamilies that are interconnected, known as nestedassociation mapping (NAM), was proposed by Yu et al.(2008). As originally proposed, a NAM populationconsists of multiple families of recombinant inbred lines(RILs) derived from multiple inbred lines crossed to asingle reference inbred line. Implicitly, genomic in-formation is composed of high-density genotypes ofparental inbred lines and low-density genotypes fromsegregating progeny. If the segregating progeny are RILsor doubled haploid lines (DHLs), then the genomicinformation can be ‘‘immortalized’’ for associations withphenotypes obtained through long-term longitudinalstudies (Nordborg and Weigel 2008).

A NAM population consisting of 25 families with 200RILs for each family has been developed and released asa genetic resource for identification of FMs in maize (Yu

et al. 2008). Other publicly available NAM populationsare being developed for several species includingArabidopsis thaliana (Buckler and Gore 2007), barley(R. Wise, personal communication), sorghum ( J. Yu,personal communication), and soybean (B. Diers,personal communication).

The power, accuracy, and precision of identifyingFMs in experimental NAM populations have not been

Available freely online through the author-supported open accessoption.

1Corresponding author: 1208 Agronomy Hall, Iowa State University,Ames, IA 50011. E-mail: [email protected]

Genetics 186: 373–383 (September 2010)

Page 2: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

investigated for complex genetic architectures. Thesestatistical properties depend upon a number of factorsincluding the following:

1. Data analysis method: Some methods are more power-ful than others; however, experimental biologistsprefer methods implemented in existing softwarepackages. Are least-squares methods sufficiently pow-erful to identify FMs in established and developingNAM populations?

2. Frequency of functional markers and magnitudes of geneticeffects: Development of a NAM population willchange the allele frequencies of the FM relative tothe reference population from which the lines aresampled. How will allele frequency and magnitude ofgenetic effects in a typical NAM population affect theability to identify FMs?

3. Disequilibrium among functional and nonfunctionalmarkers: Disequilibrium may exist among alleleswithin subpopulations even when there is no physicalbasis for genetic linkage. To what extent can theNAM design address consequences of gametic dis-equilibrium (population structure) in the referencepopulation?

4. Multiple FMs in the same genomic region: If multiple FMsare physically located in the same genomic region,will equilibrium among the parental lines enableresolution of multiple FMs?

5. Mating design: An appropriate mating design canmaximize the number of families that are informa-tive for FMs. Will multiple-reference mating designsimprove the probability of identifying FMs?

These five questions were addressed.

METHODS

Models: Consider multiple families of segregatingprogeny with only two genotypic classes per locus, suchas will occur with DHLs and RILs derived from a cross oftwo inbred lines (Figure 1). For the sake of brevity, werefer to such loci as Q loci; i.e., these are potential FMs.Segregating progeny with three genotypic classes perlocus, such as will occur in an F2, are obvious (Xu 1998a)and are not explicitly developed herein. A pooled analysisof data from multiple families is enabled through thelinear genetic model first proposed by Haley and Knott

(1992), extended to multiple families by Xu (1998a) andto account for background QTL by Guo et al. (2006),

Yij ¼ m 1 ai 1 bXij 1X

cidciZcij

1 eij ; ð1Þ

where Yij is the observed trait value of the jth segregat-ing line in the ith family, m is the overall mean, and ai isthe effect of the ith family. Xij represents a Q locusunder consideration and assumes the values of �1 and11 for the two homozygous genotypic classes. The vastmajority of SNP loci have only two alleles. Thus most

SNP loci that are FMs will be detectable as a biallelicsystem across families in which these alleles are segre-gating. Given current sequencing capabilities and costs,the alleles at the Q locus will be known in the parentallines, but unknown in the segregating progeny (Figure1). The values for Xij can be imputed in the segregatingprogeny by the conditional expectation, E(Xij j IP, IM) (Xu

1998a), where IP includes the known genotypes of theparental alleles at the Q locus, and IM is the genotypicinformation from the flanking markers in both parentsand progeny (see below). ci is a cofactor marker in family iused for the effects of background QTL. Zcij

is anindicator of cofactor marker in the jth line of the ithfamily. The values for Zcij

assume values of 1 or�1 for thetwo known genotypic classes in the segregating progenyand are identified on the basis of analyses within in-dividual families (see below). ai, b, and dci are parametersthat need to be estimated. It is assumed that eij aresampled from an identical and independent normaldistribution. Epistasis among Q loci was not modeled.

Under the null hypothesis model, (1) is reduced to

Yji ¼ m 1 ai 1X

cidciZcij

1 eij : ð2Þ

Identification of functional markers: Cofactormarkers associated with background QTL were identifiedwith an initial scan of the genome in individual families,using composite-interval mapping (Zeng 1994). This isamenable to least-squares computation that is readilyavailable through standard software packages and searchprocedures routinely used for multiple-regression prob-lems (Christensen 2002).

After background QTL were identified, all Q loci withinthe genomic region of interest were tested for significantassociations with simulated phenotypic variability. Q lociincluded in model (1) were declared to be FMs if thevariability explained by the model with a parameterrepresenting the locus exceeds that of the model withoutthe locus (Equation 2) at a predetermined thresholdvalue. The threshold for declaring a Q locus as a FM with atype 1 error rate of 0.01 was determined initially using3000 replicates of data generated under the null hypoth-esis of no genetic effects. The resulting threshold was�log10(p)¼ 3.80. Because SNP loci in sh1 amplicons wereused to simulate FMs (see below), we noted that this valueis very close to a Bonferroni adjusted value of 4.09 ¼log10(0.01/123), where 123 represents the number oftagged SNPs among sh1 amplicons. Therefore, forsimulations involving both sh1 and bt2 amplicons, thresh-old values were based on Bonferroni adjusted values of4.2 (¼ log10(0.01/150)), where 150 represents the totalnumber of tagged SNP loci in both amplicons.

If a FM was identified (conditional on backgroundQTL), additional Q loci were evaluated for significanceusing two variable selection methods: forward selectionand a modified stepwise procedure. The forward selec-

374 B. Guo, D. A. Sleper and W. D. Beavis

Page 3: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

tion procedure has been used previously in associationmapping (Nair et al. 2009), so it was chosen as acomparator for the modified stepwise procedure. Themodified stepwise procedure consisted of an addingstep and an updating step (Christensen 2002). In theadding step, all possible Q loci are tested conditional oncovariate FMs and background QTL. In the updatingstep, previously identified FMs are reevaluated as eachFM is sequentially excluded while all other FMs areincluded in the model as covariate(s). This updatingstep is repeated for each of the previously identified FMs.The adding and updating steps are repeated until nofurther significant associations are identified.

Imputation of parental SNPs onto segregatingprogeny: Assume a Q locus is genotyped in parentallines but not in their progeny and this locus is flanked bytwo SNP loci, A and B, which are genotyped in parentallines and their progeny within a family, and theexpectation of genotype score is based on the following:

1. The transition probabilities from one genotype atone locus to one genotype at another locus [P(Q¼ q jA¼ a), P(B¼ b jQ¼ q)] are obtained from Jiang andZeng (1995). These transition probabilities arefunctions of the frequency of recombinants betweenthe two flanking loci and the number of selfinggenerations.

2. The conditional probability of genotypes at Q givenflanking SNP loci A and B is computed as P(Q¼ q jA¼a, B¼ b)¼ P(Q¼ q j A¼ a)P(B¼ b jQ¼ q)/

PqP(Q¼

q j A ¼ a)P(B ¼ b j Q ¼ q) ( Jiang and Zeng 1995).

3. The expectation of genetic score at Q is computedas (1)P(Q¼ 1 j A¼ a, B¼ b) 1 (0)P(Q¼ 0 j A¼ a, B¼b) 1 (�1)P(Q¼�1 j A¼ a, B¼ b)¼ P(Q¼ 1 j A¼ a,B¼ b)� P(Q¼�1 j A¼ a, B¼ b). In situations wherecomputation is needed at terminal ends of a linkagegroup, SNP locus Q will have only one adjacentpolymorphic SNP locus. For this situation, theconditional probability is computed as P(Q ¼ q jA ¼ a) ¼ P(Q ¼ q j A ¼ a)/

PqP (Q ¼ q j A ¼ a).

The expectation of genetic score is computedby (1)P(Q ¼ 1 j A ¼ a) 1 (0)P(Q ¼ 0 j A ¼ a) 1

(�1)P(Q¼�1 jA¼ a)¼ P(Q¼ 1 jA¼ a)� P(Q¼�1j A ¼ a).

Statistical properties: The ability to identify FMs wascharacterized using power, accuracy, and precision fromreplicated simulations (see below). Power was estimatedas the frequency, among 100 replicates, of significantassociations for one or more Q loci within the sequencedamplicon containing a simulated FM. Note that thisdefinition of power indicates that a segregating Q locuswithin the amplicon containing the FM is significantlyassociated with the phenotypic variability; the identifiedQ locus may not be the actual FM. Accuracy wasevaluated by two criteria: (1) as the frequency ofcorrectly identified FM(s) and (2) as the differencebetween estimated genetic effects and the actual simu-lated genetic effects. Precision is a measure of consis-tency among the replicates. Identified FMs from allreplicates were used to construct a 95% linkage disequi-librium confidence interval (LD C.I.) to represent the

Figure 1.—Generalizeddesign for development of anested association mapping(NAM) population. Essentialelements depicted include(1) structure (nested) andpurpose (association map-ping) of a population ofsegregating progeny, (2)high-density genotypic datafrom genomic regions of in-terest in parental inbredlines, and (3) low-densitygenotypic data through-out the genomes of theparental lines and the segre-gating progeny.

Identification of Functional Markers 375

Page 4: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

lack of consistency among the replicate data sets.Explicitly, let the statistically identified FM be the Qlocus with the largest �log10(p) exceeding a thresholdvalue be designated as R 2, and let r 2 be the LD betweenthe identified FM and the ‘‘true’’ simulated FM. The1�a LD C.I. is defined as P(r 2 $ R 2) $ 1�a, where r 2 $

R 2 includes the simulated FM with probability $1 � a.The 95% confidence interval was obtained as theinterval that included 95% of the replicates after rankingtheir r2 values among alleles from the largest to thesmallest. All SNP loci contained in this interval werereferred to as the C.I. SNPs and might be regarded ascandidate FMs.

Simulations of FMs: The genomes of 28 segregatingfamilies in a NAM population, each consisting of 100DHLs, were simulated using R/QTL (http://www.rqtl.org/). The genomes of the simulated families consistedof five independent linkage groups. Each linkage groupconsisted of 100 cM of recombination with 11 anony-mous biallelic markers placed every 10 cM. In additionto simulated polymorphic marker loci actual sequencesof the sh1 and bt2 amplicons from 28 maize inbreds wereassigned to the segregating families. Depending uponthe genetic architecture and their relative genomiclocations (see below), SNPs within these sequences wereselected to serve the role of FMs in the expression ofquantitative phenotypes. The majority of SNP loci havetwo alternate alleles and only in rare cases will a SNPlocus have three or four alternative alleles. For these rarecases, multiple classes of alleles can be reduced to twoclasses: one representing the allele of highest frequencyand one representing all other alleles. Thus, for pur-poses of these studies, we considered only biallelicFMs.

Sequences from the sh1 amplicons were placed in themiddle of one linkage group, midway between the fifthand sixth markers. For genetic architectures involvingtwo FMs, sequences of the bt2 amplicon also were placedat various locations in the simulated genomes of theparental lines. In addition to the FMs, three backgroundQTL were simulated on independently segregatingchromosomes in all families, including families forwhich no simulated FMs were segregating. Each back-ground QTL contributed an equal amount to the overallphenotypic variability. Note that while the genotypes ofthe SNP loci within the sequences are known in theparental lines, in a NAM analysis they may not begenotyped in the segregating progeny (Figure 1). Thus,the actual SNP data were deleted from the simulateddata sets and expected genotypic values, as describedabove (Xu 1998a), were used in data analyses.

Sequences of sh1 and bt2 amplicons from the maizediversity project (http://www.panzea.org) consist of7029 bp and 6196 bp, respectively. Among maizeaccessions 235 SNPs have been identified on sh1 and46 SNPs on bt2. While LD (r2) among these SNPs decaysrapidly with distance, sets of 110 SNPs within sh1 and 22

within bt2 are in complete LD (r2¼ 1) among the maizeaccessions at http://www.panzea.org. For purposes ofthese simulations, a single SNP was used to representsets of SNPs that are in complete LD, resulting in a totalof 123 tagged SNPs on sh1 and 24 tagged SNPs on bt2.

Phenotypes were simulated as the sum of additivegenetic effects from FMs, additive genetic effects frombackground QTL, and random error. Three studiesinvolving multiple genetic architectures were used toevaluate the ability to identify FMs. The goals of thesestudies were to (1) determine impacts of multiple allelicfrequencies and magnitudes of simple additive geneticeffects, (2) determine whether the NAM design canreduce false positive consequences of disequilibrium(population structure) between FMs and non-FMs inthe reference population, and (3) determine if equilib-rium among the parental lines enables resolution oflinked FMs within families.

Study 1. Impact of frequencies and genetic effects offunctional markers: SNPs that occur at two differentfrequencies in the reference population (http://www.panzea.org) were simulated as FMs (Table 1). TheseSNPs occur at positions 3822, 4884, and 5769 on the sh1amplicon and are designated as sh1-3822, sh1-4884, andsh1-5769. The allele frequencies for the sh1-4884 andsh1-5769 loci each have an estimated frequency of 0.19in the reference population, but they were associatedwith different numbers of informative NAM families: 24and 5, respectively. The minor allele at sh1-3822 occursat a frequency of 0.47 and was informative in 15simulated NAM families. Differences in additive geneticeffects of the FMs accounted for 1, 2, 5, 10, 15, and 20%of the total phenotypic variability within families thatwere segregating FMs. Three additional independentlysegregating background QTL were simulated such thatthe total additive genetic variability from all sourcesexplained 70% of the phenotypic variability withinfamilies. The remaining phenotypic variability was dueto random variability from a random normal distribu-tion. Phenotypes were simulated across all DHL families100 times. Thus for each of the three FMs with sixdifferent magnitudes of genetic effects there were 100replicates, i.e., 1800 data sets, used to characterize thestatistical properties.

Study 2. Disequilibrium among functional and non-functional markers: To evaluate the ability of the NAMdesign to alter consequences of disequilibrium in thereference population, a SNP allele in complete disequi-librium with the allele at the sh1-3822 locus wassimulated and used as a FM (Table 2). No genetic effectswere simulated at the sh1-3822 locus, while the simulatedFM was located at 10, 20, 30, and 40 cM and independentof sh1 in segregating families. Because the FM was incomplete disequilibrium with sh1-3822, it also is in-formative in 15 of the 28 families. Differences in additivegenetic effects of the FMs accounted for 5, 20, and 30%of the total phenotypic variability among progeny within

376 B. Guo, D. A. Sleper and W. D. Beavis

Page 5: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

families in which the FMs were segregating. Three addi-tional independently segregating background QTLwere simulated such that the total additive geneticvariability accounted for 70% of the phenotypic variabil-ity within families. A total of 15 combinations (3 additivegenetic effects 3 5 genomic locations) were simulated100 times, providing a total of 1500 data sets.

Study 3. Multiple functional markers in the same linkedgenomic region: To evaluate whether the NAM design canresolve multiple FMs located in the same genomic region,four cases were simulated (Table 3): (a) two FMs within asingleamplicon, (b) threeFMswithina singleamplicon, (c)two FMs located in two linked amplicons, and (d) two FMswithin one amplicon and a third FM in a linked amplicon.

For cases a and b three SNP alleles, located at positions819, 3822, and 5769 in the sh1 amplicons were used tosimulate FMs. The minor alleles at sh1-819 and sh1-3822have estimated frequencies of 0.41 and 0.47 in thereference population and were segregating in 13 and 15families, respectively. The minor allele at sh1-5769 has anestimated frequency of 0.19 in the reference population

and was segregating in 5 families. For the three pairwisecombinations of FMs within a single amplicon, differ-ences in additive genetic effects of the FMs accountedfor 5 and 10% of the total phenotypic variability amongprogeny within families in which the FMs were segregat-ing. For the triplet, differences in additive genetic effectsat sh1-819 and sh1-5769 accounted for 5% and those atsh1-3822 accounted for 10% of the phenotypic variabilityamong progeny within families in which these FMs weresegregating.

For case c position 3822 in the sh1 amplicon wasdesignated as one of the FMs and alleles at positions 900and 2854 in the bt2 amplicon (designated as bt2-900 andbt2-2854, respectively) were used as FMs in the samegenomic region. Sequences from the bt2 amplicon werelocated at 10 or 40 cM from sh1 within segregatingfamilies. The estimated minor allele frequency at bt2-900is 0.16 and is segregating in 5 families, while theestimated allele frequency for bt2-2854 is 0.5 and issegregating in 16 of the NAM families. The alleles at sh1-3822 were used to simulate additive genetic effects that

TABLE 1

Power, accuracy, and precision in identification of simulated functional markers (FMs) based on Minor Allele Frequencies (MAF),number of segregating families (SF) and magnitude of genetic effects

Simulated additive genetic effects (% of phenotypicvariability within segregating families)

Simulated FMa Statistical measure0.183(1%)

0.258(2%)

0.409(5%)

0.577(10%)

0.707(15%)

0.817(20%)

sh1-5769 Powerb 0.07 0.42 0.95 1.0 1.0 1.0MAFR ¼ 0.19

SF ¼ 5Accuracy: f(AIFM)c 1.0 0.93 0.99 1.0 1.0 1.0Average estimated

genetic effectsd

0.305 0.310 0.424 0.597 0.704 0.833

Precision: 95%LD C.I. (r2)e

NA $0.14 1.0 1.0 1.0 1.0

SNP loci includedin the C.I.

NA sh1-5769, 6027 sh1-5769 sh1-5769 sh1-5769 sh1-5769

sh1-3822 Power 1.0 1.0 1.0 1.0 1.0 1.0MAFR ¼ 0.47

SSF ¼ 15f(AIFM) 0.33 0.51 0.81 0.94 1.0 0.99Average estimated

genetic effects0.171 0.226 0.409 0.574 0.705 0.817

95% LD C.I. (r2) $0.51 $0.66 $0.88 $0.88 1.0 1.0SNP loci included

in the C.I.sh1-3822, 3982, 3564,

4325, 4133, 3861sh1-3822,

3564, 4325sh1-3822, 3982 sh1-3822, 3982 sh1-3822 sh1-3822

sh1-4884 Power 1.0 1.0 1.0 1.0 1.0 1.0MAFR ¼ 0.19

SSF ¼ 24f(AIFM) 0.90 0.98 1.0 1.0 1.0 1.0Average estimated

genetic effects0.1984 0.2665 0.4194 0.5895 0.7167 0.8294

95% LD C.I. (r2) $0.44 1.0 1.0 1.0 1.0 1.0SNP loci included

in the C.I.sh1-4884, 3870 sh1-4884 sh1-4884 sh1-4884 sh1-4884 sh1-4884

a MAFR designates minor allele frequency among maize lines at http://www.panzea.org, SF designates the number of families inthe NAM population that segregate the FM.

b Power indicates that a segregating Q locus within an amplicon containing the FM has a statistically significant association withthe phenotypic variability; the significant Q locus may not be the actual FM.

c Frequency of accurately identified functional markers.d The average consists only of estimates for correctly identified FMs.e The 1 � a LD C.I. is defined as P(r2 $ R2) $ 1 � a, where r2 is the LD between the identified FM and the ‘‘true’’ simulated FM

and R2 is designated as a threshold value associated with the probability $1 � a.

Identification of Functional Markers 377

Page 6: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

accounted for 10% of the phenotypic variability and theFMs at bt2-900 and bt2-2854 were responsible for 5% ofthe phenotypic variability among progeny within fami-lies in which these FMs were segregating. The magni-tudes of additive genetic effects were also reversedbetween the pairs of loci (Table 3).

The fourth case of two FMs within one amplicon and athird FM in a second amplicon was evaluated using SNPalleles at sh1-3822, sh1-5769, and bt2-2854. The bt2 locuswas placed 10 and 40 cM from the sh1 loci in segregatingfamilies. Simulated additive genetic effects at the sh1-3822, sh1-5769, and bt2-2854 loci were respectively re-sponsible for 5, 10, and 5% of the phenotypic variabilityin segregating families, while the simulated geneticeffects at the sh1-3822, sh1-819, and bt2-900 loci wererespectively responsible for 5, 5, and 10% of the pheno-typic variability in segregating families.

For all simulated phenotypes in all four cases, threeadditional independently segregating background QTLwere simulated to contribute 60% of the phenotypicvariability within all families. Thus, the total contribu-tion to phenotypic variability from genetically segregat-ing loci was 60%, 70%, or 80%, depending upon themagnitude of additive genetic effects at segregatingFMs. For all four cases 100 replicates were simulated,producing 1400 data sets for analyses.

Mating designs: For the simulation studies it wasstraightforward to determine the number of polymor-phic families associated with SNPs that were designated

as FMs. However, experimentalists must choose a matingdesign without such knowledge. With the emergence ofcurrent sequencing and genotyping technologies, con-sider use of sequence data to determine the relationshipsbetween minor allele frequencies (MAFs) and number ofinformative families. Eleven maize amplicons represent-ing Id1, d8, sh2, su1, fea2, bt2, ae1, wx1, d3, sh1, and zfl1 locihave been resequenced in 30 maize inbreds (http://www.panzea.org/). Among the resulting 65,966 bp ofsequence there are 1552 SNPs with estimated MAFsranging from 0.03 to 0.50. For three single referencemating designs, sequences from 27 maize inbred lineswere contrasted with sequences from 3 lines selected asleast similar to the remaining 27. These 3 lines were alsoused in a multi-reference design where the 27 lines wererandomly distributed into 3 subgroups of 9. SNPs in eachsubgroup were compared with the three references todetermine the number of informative families in a multi-reference design.

RESULTS

Frequencies and genetic effects of functionalmarkers: Typical association profiles in the genomicregion associated with the sh1 amplicon for some of thesimulated phenotypes are presented in Figure 2. Thesimulated FMs are at the sh1-3822, sh1-4884, and sh1-5769loci, where the magnitudes of the genetic effects account

TABLE 2

Ability of NAM populations to distinguish functional and nonfunctional markers that are in complete disequilibrium in thereference population, but located on separate amplicons in segregating families (SFs)

Simulated additive geneticeffect, as a percentage ofphenotypic variability in SFs

Genetic linkage betweenfunctional marker

and sh1-3822a (cM)

False positiveassociations with

sh1 locib

Frequency ofcorrectly identifiedfunctional markersc

Differencebetweenmodelsd

5% 10 1.0 0.98 4.9720 1.0 1.0 16.1630 1.0 1.0 22.2840 1.0 1.0 27.59

Independent 0.01 1.0 33.53

20% 10 1.0 1.0 21.2420 1.0 1.0 65.2430 1.0 1.0 86.3540 1.0 1.0 103.06

Independent 0.01 1.0 132.37

30% 10 1.0 1.0 33.2720 1.0 1.0 97.0030 1.0 1.0 127.9040 1.0 1.0 149.26

Independent 0.0 1.0 193.27

a Among the parental lines, the simulated FM exhibits complete disequilibrium with sh1-3822.b Frequency of false positive sh1 loci that exhibit a significant association with the phenotype.c Frequency of correctly identified FMs when the FM is included in the model.d The average difference of the �log10(p) between models that included the correct FM and models that included only false

positive sh1-3822 markers.

378 B. Guo, D. A. Sleper and W. D. Beavis

Page 7: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

for 1–10% of the phenotypic variability in families withsegregating FMs. FMs with larger simulated geneticeffects reveal similar association profiles except that thestatistical significance, �log10(p), has larger values.

A summary of all analyses involving FMs responsiblefor 1–20% of the phenotypic variability reveals that ifthere are a large number of informative progeny at thelocus (1500 for sh1-3822), the power of the test is 1.0,across all magnitudes of genetic effects (Table 1). ForFMs that may be at lower frequencies (0.19) in thereference population, but are informative in mostfamilies (sh1-4884), the power of the test is also closeto certainty, even for small additive genetic effects. ForFMs that were segregating in only a few families thepower to identify the FM is low if the FM contributes littleto the overall phenotypic variability. If the magnitude ofthe genetic effects accounts for at least 5% of thephenotypic variability and segregates in at least 500 ofthe 2800 progeny, the power to identify the FM is veryhigh. Thus, except in cases where low-frequency FMscontribute little to the phenotypic variability and aresegregating in ,18% of at least 2800 progeny, the NAMdesign will have adequate power to identify FMs involvedin simple additive genetic architectures.

Accuracy, assessed as the frequency of accuratelyidentified functional markers, f(AIFM), increased withthe number of families segregating for the simulated FMand the magnitude of the simulated genetic effect(Table 1). Accuracy, as assessed by the estimated geneticeffects relative to the actual simulated genetic effects,indicated that all estimates were slightly inflated, exceptin cases where there was low power, such as the FM (sh1-

5769). As with power, the ability to accurately identify aFM and estimate its genetic effects is extremely good aslong as the segregating genetic effect contributes at least5% to the total phenotypic variability.

Inaccurate identification of non-FMs is a function ofLD for those SNP loci in close proximity to the FM. Incontrast to association mapping where the rapid decayof LD with physical distance from the FM will result infew false positive associations (Kathiresan et al. 2008;Sabatti et al. 2009), replicates of the simulated NAMpopulation resulted in some false positive associations;i.e., the f(AIFM) , 1.

Precision was quantified using a 95% LD C.I. based onan amplicon (gene)-wise significance threshold (Table1). The size of the LD C.I. decreased with increasing FMsresponsible for larger amounts of phenotypic variability.Simulated FMs responsible for .5% (sh1-5769), 15%(sh1-3822), and 2% (sh1-4884) of the phenotypic vari-ability were the sole loci contained in the LD C.I.Occasionally, nonfunctional SNPs that are in LD withthe simulated FM were included in the LD C.I. Forexample, the simulated FMs at sh1-4884 and sh1-5769 arein LD (r2 ¼ 0.50) with the minor alleles at sh1-5469 andsh1-5715 and both are included in their respective LDC.I. Also note that sh1-3822 has a larger LD C.I. than sh1-5769 in situations where the simulated genetic effectswere responsible for 5 and 10% of the phenotypicvariability. This is somewhat counterintuitive given thegreater number of informative families for sh1-3822. Itwas noted, however, that there are a larger than averagenumber of SNP loci in LD with the simulated FM at thislocus (data not shown).

TABLE 3

Accuracy in identification of multiple functional markers (FMs) within the same linked genomic regionusing two variable selection procedures

Amplicons

GeneticDistance

(cM)Sets of simulated

functional marker loci

% contribution ofadditive genetic effects

to phenotypic variabilitywithin segregating families

Accuracya

(modified stepwiseprocedure)

Accuracy(forward selection

procedure)

sh1 sh1-3822, sh1-819 10, 10 0.94 0.24sh1-3822, sh1-819 5, 5 0.93 0.32sh1-3822, sh1-819 10, 5 0.95 0.73sh1-3822, sh1-5769 10, 5 0.97 0.95sh1-3822, sh1-5769 5, 10 0.81 0.43sh1-3822, sh1-5769, sh1-819 5, 10, 5 0.44 0.07

sh1, bt2 10 sh1-3822, bt2-2854 10, 5 0.93 0.9840 sh1-3822, bt2-2854 10, 5 0.95 1.010 sh1-3822, bt2-900 10, 5 0.95 0.8840 sh1-3822, bt2-900 10, 5 0.97 0.9310 sh1-3822, bt2-900 5, 10 0.86 0.7840 sh1-3822, bt2-900 5, 10 0.73 0.6710 sh1-3822, sh1-5769, bt2-2854 5, 10, 5 0.88 0.8240 sh1-3822, sh1-819, bt2-900 5, 5, 10 0.75 0.23

Analyses were able to identify a statistically significant association between segregating Q loci within an amplicon containing theFMs in all data sets; i.e., the power was 1.0 for both variable selection methods.

a The frequency of correctly identified sets of functional markers across all replicates of simulated phenotypes.

Identification of Functional Markers 379

Page 8: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

Disequilibrium between functional and nonfunc-tional markers: An allele in complete disequilibriumwith the minor allele at the sh1-3822 locus was simulatedand placed at 10, 20, 30, and 40 cM from the sh1amplicons as well as on an independently segregatinglinkage group (Table 2). An initial scan (excluding thesimulated FM) of all SNP loci indicated that the sh1-3822alleles were significantly associated with the segregatingtrait in all replicates. These represent false positiveassociations. If the FM was simulated on an independentchromosome, no false positive associations were de-tected. Application of the variable selection procedureto data sets that included the simulated FM resulted incorrectly identifying the FM instead of the sh1-3822locus. The impact of tighter linkage, the number ofsegregating progeny, and the ability to resolve FM withnon-FM was not investigated but needs to be.

Multiple functional markers within a linked genomicregion: Phenotypes were simulated on the basis of two orthree FMs within a single candidate amplicon or two orthree FMs located in two amplicons located in a linkedregion, with additive genetic effects at the FMs account-ing for 5 or 10% of the total phenotypic variability (Table

3). The correct sets of FMs were identified in .70% ofthe simulated data sets, indicating that it is usuallypossible to distinguish two FMs within the same candi-date gene or whether there are two or three FMs locatedin two candidate genes (amplicons) located in a linkedregion. Note that this ability to accurately identify FMsdepends upon the variable selection procedure; i.e.,frequencies of correctly identified FMs using forwardselection were less than those using the modified step-wise procedure (Table 3).

Simulated phenotypes in NAM families involving sh1-3822 paired with either sh1-819 or bt2-2854 produced thelargest frequencies of correctly identified FMs. For eachof these three loci the frequency of segregating familieswas �0.5 while the number of cosegregating loci oc-curred in 5 (sh1-3822, sh1-819) or 8 (sh1-3822, bt2-2854) of28 families. By way of contrast, the least favorableresolution of pairs of loci involved sh1-3822 and eithersh1-5769 or bt2-900. The frequency of segregating familiesfor the latter two loci was 5/28 and the number ofcosegregating loci occurred in only 2 (sh1-3822, sh1-5769)or 3 (sh1-3822, bt2-900) families. The ability to resolve thepair of FMs was strengthened if the more prevalent locus

Figure 2.—Typical asso-ciation profiles obtainedfrom simulated NAM popu-lations. Each solid circle rep-resents analysis results at aSNP within the amplicon.Circle colors represent cor-relation coefficients withthe simulated functionalmarker that is representedby the red circle (orange,0.8–1.0; yellow, 0.6–0.8;blue, 0.4–0.6; green, 0.2–0.4; black, 0–0.2). The redhorizontal line representsthe significance thresholdat �log10(p) ¼ 3.8. Resultsare shown for simulated ge-netic effects that accountfor 1, 2, 5, and 10% of the to-tal phenotypic variabilitywithin segregating families.Note that the vertical scalesare unique to each plot.

380 B. Guo, D. A. Sleper and W. D. Beavis

Page 9: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

(sh1-3822) was responsible for a larger amount ofphenotypic variability than its less prevalent partner(sh1-5769). These results suggest that if there is a largedifference in genetic effects between FMs and thenumber of cosegregating families is small, it will bepossible to resolve pairs of FMs on the same amplicon.In contrast, if the genetic effects of multiple FMs areresponsible for similar amounts of phenotypic variabilityand the numbers of cosegregating families are large,resolution of FMs will be poor. These results are some-what counterintuitive and need further investigation.

In the case of three FMs within a single candidate gene(amplicon) only 44% of the analyses managed tocorrectlyinclude all three FMs, indicating that the NAM design andthe variable selection procedure are limited in their abilityto resolve multiple FMs within a single ampiicon.

Mating designs: The results (Figure 3) indicate thatregardless of choice of reference, SNP loci with rarealleles will be either highly informative (segregating in alarge number of families) or largely noninformative(fewer than five informative familes) for the singlereference design (Figure 3, A–C). Alternatively, use ofthree references resulted in about one-third as manySNP loci in noninformative families as there are in thesingle reference designs (Figure 3D). On the basis ofthese results the single reference design is not optimal,particularly if the frequencies of FMs are rare. If thereference line carries a minor allele that occurs at lowfrequency in the reference population, then expectedheterozygosity for the locus could approach 1.0 amongNAM crosses, providing considerable power to identifythe FM. On the other hand, if the MAF is low and thecommon reference does not have this allele, then theexpected heterozygosity could approach 0.0, resultingin failure to detect the FM (Xu 2003).

DISCUSSION

Replication of genetic effects, genomic architecture,and population structure with simulations providedquantitative measures of power, accuracy, and precisionin identification of FMs in NAM populations. By usingsequences from the maize diversity project (http://www.panzea.org) the simulations were based on realisticrepresentations of allelic diversity and disequilibriumamong SNPs. Although resequenced amplicons frommaize show greater diversity and less disequilibrium perkilobase than those from most plant species, the resultsare applicable to all NAM populations because resultsare based on disequilibrium among markers regardlessof their density per kilobase of sequence. However, theresults also suggest caution should be exercised if acandidate gene approach is pursued with a NAM popu-lation. An actual statistical test based on the variableselection procedure was needed to eliminate falsepositives from the model. It is important to recognizethat such ability to resolve FMs was possible only because

genotypic information from the actual FMs was avail-able. If the FMs are not genotyped, false positive asso-ciations may be declared (Nair et al. 2009).

Data analyses: The data analyses are based on a biallelicmodel per SNP locus, which is reasonable for the vastmajority of SNP loci. This FM model can be extended tothe concept of multiple haplotypes, i.e., multiple FMalleles. The concept of multiple FM alleles can be regardedas a combination of biallelic variants within a haplotype(or amplicon) and the goal of identifying multiple FMalleles within an amplicon would be addressed exactly aswe proposed in our third study on identifying multiplefunctional markers in the same genomic region. However,the model will need to be extended to account for epistasisamong the alleles at the SNP loci.

Conceptually, the data analyses used herein and by Yu

et al. (2008) are similar to that proposed for quantitativetrait transmission disequilibrium tests (QTDT) in humanfamilies when combined with imputation of genotypes ofrelatives (Abecasis et al. 2000; Burdick et al. 2006). Thedata analysis method proposed herein has the followingdistinctive features from these other approaches: (1)Background QTL are controlled with cofactor markersidentified in a model-building process, whereas they arecontrolled with a covariance matrix based on familyrelationships in QTDT (Abecasis et al. 2000) or based ona forward selection process (Yu et al. 2008); (2) genotypesat Q loci are imputed in the progeny using the condi-tional expectation based on known genotypic informa-tion of the parental alleles and genotypic informationfrom anonymous flanking markers, whereas they areimputed using a random draw from a conditionaldistribution by Yu et al. (2008); and (3) implementationherein is through least squares, whereas maximum-likelihood algorithms are used in the other two.

The decision to use cofactors to represent backgroundQTL in the modeling process is idealistic because itassumes that background genetic effects are fully ex-plained by the identified cofactors. For the simulated datasets this assumption was not violated, and thus the resultsrepresent a set of ‘‘best cases.’’ It should be recognized,however, that such an assumption is not reasonable forexperimental NAM populations. Further, it is very diffi-cult to test the assumption. Intuitively, it seems that amore reasonable approach for experimental NAM pop-ulations is to model the background genetic effects as arandom effect variance component in a mixed-modelframework. At this point, however, it remains an openquestion as to which approach to modeling the back-ground QTL is better for various possible geneticarchitectures and population structures.

The challenge of imputing genotypes in segregatingprogeny on the basis of parental genotypes has beenaddressed using three approaches (Yi and Shriner

2008): (1) Estimate all missing genotypes by theirexpected values conditional on observed flankingmarkers (Haley and Knott 1992), (2) consider QTL

Identification of Functional Markers 381

Page 10: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

genotypes as unknowns to be predicted using an MCMCupdate procedure, and (3) perform multiple samplingof genotypes from a conditional probability distributionfor each unknown locus (Sen and Churchill 2001).Given the large number of SNP loci and large number offamilies and progeny of NAM populations, the latter twoapproaches are computationally daunting. We foundthe first to be accurate while computationally feasible.Using publicly available data resources for maize and themethods described herein, we found that it is possible toassign 90% of 1.5 million genotyped SNPs from themaize HapMap project to linkage map positions inthe maize NAM progeny using �1100 mapped markers(B. Guo and W. D. Beavis, unpublished results).

The motivation for development of a least-squaresalgorithm to map FMs is that it is accessible to exper-imentalists through many statistical software packagesand is computationally fast even in situations with largenumbers of SNPs in a large NAM population. It is wellknown that variable selection methods may not result inthe correct model (Christensen 2002), although forthe situations simulated herein, FMs were correctlyidentified in most cases, even when the model was builtunder potentially confounding influences of disequilib-rium among functional and nonfunctional makers inthe reference population. Another modification tovariable selection is to consider an iterative reweightedleast-squares algorithm and likelihood-ratio test, such asdescribed by Xu (1998b).

The i.i.d. assumption associated with error in model(1) may not be seriously violated if all families areevaluated under relatively uniform conditions and co-factor markers are used to reduce background QTLeffects among different families. It is possible, however,that some families may be evaluated in different environ-ments. This can occur when data are pooled frommultiple single-family studies or if different familiesare adapted to different sets of environments. Inaddition to transformations, standardized phenotypeshave been suggested for handling this issue (Walling

et al. 2000; Guo et al. 2006). Another solution is the use ofpermutation tests (Churchill and Doerge 1994), butin a nested model (Anderson and Ter Braak 2003),although it is unclear how heterogeneous error varian-ces among families will affect permutation tests. Finally,we recognize that thresholds are essential for inferentialstatistics but irrelevant to Bayesian approaches.

We also recognize that a measure of precision forNAM experiments is needed. While the LD C.I. waspossible with 100 replicates of the simulated genotypiceffects, an individual NAM experiment needs a methodto estimate precision for the identified FMs. An estima-tor that is conceptually related to the LD C.I. would bebootstrapping (Visscher et al. 1996; Lebreton andVisscher 1998), but would likely require significantcomputational resources. An alternative method pro-posed for linkage mapping is the confidence intervalbased on population size and the estimated QTL effect

Figure 3.—Number ofinformative families basedon expected heterozygos-ity, between pairs of paren-tal lines plotted against theminor allele frequenciesfor 11 resequenced ampli-cons sampled from theworldwide collection ofmaize. The set of 27 maizeinbred lines includes (1)A272, (2) D940Y, (3) EP1,(4) F2, (5) I205, (6) T232,(7) B103, (8) B97, (9)CI187-2, (10) Ki21, (11)Ky21, (12) M162W, (13)Mo17, (14) NC260, (15)Oh43, (16) Pa91, (17)W153R, (18) II101, (19)P39, (20) N28Ht, (21) A6,(22) CML254, (23)CML258, (24) CML333,(25) Ki3, (26) NC348, and(27) Tx601 (Liu et al.2003). Sequences fromthe 27 lines are comparedwith B14A (A), Ia2132(B), and IDS28 (C). The27 lines also were randomly

divided into three subgroups. (D) Group 1 includes 1, 4, 7, 10, 13, 16, 19, 22, and 25. Group 2 includes 2, 5, 8, 11, 14, 17, 20, 23,and 26. Group 3 includes 3, 6, 9, 12, 15, 18, 21, 24, and 27. Group 1 is compared with B14A, group 2 with Ia2132, and group 3 withIDS28 and the number of informative families were tallied across all three.

382 B. Guo, D. A. Sleper and W. D. Beavis

Page 11: Nested Association Mapping for Identification of Functional ... · association genetic mapping, also known as linkage disequilibrium (LD) mapping, and (3) designs that combine linkage

(Darvasi and Soller 1997) and can be adapted to themultiple-family situation of a NAM design.

Mating designs: While the originally proposed NAMdesign was based on use of a single reference, it isreasonable to consider alternative designs such as theNorth Carolina Design I (NC-DI), routinely used bycommercial maize and soybean breeders. The choice ofa single reference for the maize NAM population wasbased on the desire to identify FMs from the breadth ofmaize germplasm while reducing the potential con-founding of photoperiod on most traits through use of asingle reference line adapted to temperate day length(Myles et al. 2009). Alternatively if all parental lines usedin the NAM design are adapted to a set of targetenvironments, results reported herein suggest that themating design should be based upon the criterion ofmaximizing the number of informative families.

While three references are more optimal than one foradapted NAM populations, the optimum number ofreferences was not determined. Determination of anoptimum will depend upon the power of the experi-ment. It is reasonable to expect power to increase withfamily size and number of families (Xu 1998a). Poweralso will be affected by family heterogeneity arising froma variety of genetic mechanisms and experimentalprotocols. An investigation of optimal NAM designs willneed to be based on multiple criteria and will requiredevelopment of constraints in the context of an appro-priate objective function (Hillier 2005).

Funding for this research was provided by the Plant SciencesInstitute and the G.F. Sprague endowment for population geneticsin the Department of Agronomy at Iowa State University.

LITERATURE CITED

Abecasis, G. R., L. R. Cardon and W. O. C. Cookson, 2000 A gen-eral test of association for quantitative traits in nuclear families.Am. J. Hum. Genet. 66: 279–292.

Andersen, J. R., and T. Lubberstedt, 2003 Functional markers inplants. Trends Plant Sci. 8: 554–560.

Anderson, M. J., and C. J. F. Ter Braak, 2003 Permutation tests formulti-factorial analysis of variance. J. Stat. Comput. Simul. 73: 85–113.

Buckler, E., and M. Gore, 2007 An Arabidopsis haplotype maptakes root. Nat. Genet. 39: 1056–1057.

Burdick, J. T., W. Chen, G. R. Abecasis and V. G. Cheung, 2006 Insilico method for inferring genotypes in pedigrees. Nat. Genet.38: 1002–1004.

Christensen, R., 2002 Plane Answers to Complex Questions: The Theoryof Linear Models, Ed. 3. Springer, New York.

Churchill, G. A., and R. W. Doerge, 1994 Empirical threshold val-ues for quantitative trait mapping. Genetics 138: 963–971.

Darvasi, A., and M. Soller, 1997 A simple method to calculateresolving power and confidence interval of QTL map location.Behav. Genet. 27: 125–132.

Farnir, F., B. Grisart, W. Coppieters, J. Riquet, P. Berzi et al.,2002 Simultaneous mining of linkage and linkage disequilib-rium to fine map quantitative trait loci in outbred half-sib pedi-grees: revisiting the location of a quantitative trait locus withmajor effect on milk production on bovine chromosome 14.Genetics 161: 275–287.

Guo, B., D. A. Sleper, J. Sun, H. T. Nguyen, P. R. Arelli et al.,2006 Pooled analysis of data from multiple quantitative traitlocus mapping populations. Theor. Appl. Genet. 113: 39–48.

Haley, C. S., and S. A. Knott, 1992 A simple regression method formapping quantitative trait loci in line crosses using flankingmarkers. Heredity 69: 315–324.

Hillier, F. S., 2005 Introduction to Operations Research, Ed. 8. McGraw-Hill, New York.

Kathiresan, S., K. Musunuru and M. Orho-Melander 2008 De-fining the spectrum of alleles that contribute to blood lipidconcentrations in humans. Curr. Opin. Lipidol. 19: 122–127.

Jannink, J., and R. Jansen, 2000 The diallel mating design formapping interacting QTLs, pp. 81–88 in Quantitative Geneticsand Breeding Methods: The Way Ahead, edited by A. Gallais,C. Dillmann and I. Goldringer. Institut National de la Re-cherche Agronomique, Paris.

Jannink, J., and X. L. Wu, 2003 Estimating allelic number and identityin state of QTLs in interconnected families. Genet. Res. 81: 133–144.

Jansen, R. C., J. L. Jannink and W. D. Beavis, 2003 Mapping quan-titative trait loci in plant breeding populations: use of parentalhaplotype sharing. Crop Sci. 43: 829–834.

Jiang, C., and Z.-B. Zeng, 1995 Multiple trait analysis of geneticmapping for quantitative trait loci. Genetics 140: 1111–1127.

Jung, J., R. Fan and L. Jin, 2005 Combined linkage and associationmapping of quantitative trait loci by multiple markers. Genetics170: 881–898.

Lebreton, C. M., and P. M. Visscher, 1998 Empirical nonparamet-ric bootstrap strategies in quantitative trait loci mapping: condi-tioning on the genetic model. Genetics 148: 525–535.

Liu, K., M. Goodman, S. Muse, J. S. Smith and E. Buckler,2003 Genetic structure and diversity among maize inbredlines as inferred from DNA microsatellites. Genetics 165:2117–2128.

Myles, S., J. Peiffer, P. J. Brown, E. S. Ersoz, Z. Zhang et al.,2009 Association mapping: critical considerations shift fromgenotyping to experimental design. Plant Cell 21: 2194–2202.

Nair, R. P., K. C. Duffin, C. Helms, J. Ding, P. E. Stuartr et al.,2009 Genome-wide scan reveals association of psoriasis withIL-23 and NF-kB pathways. Nat. Genet. 41: 199–204.

Nordborg, M., and D. Weigel, 2008 Next-generation genetics inplants. Nature 456: 720–723.

Perez-Enciso, M., 2003 Fine mapping of complex trait genes com-bining pedigree and linkage disequilibrium information: a Bayes-ian unified framework. Genetics 163: 1497–1510.

Sabatti, C., S. K. Service, A. L. Hartikainen, A. Pouta, S. Ripatti

et al., 2009 Genome-wide association analysis of metabolic traitsin a birth cohort from a founder population. Nat. Genet. 41: 35–46.

Sen, S., and G. A. Churchill, 2001 A statistical framework for quan-titative trait mapping. Genetics 159: 371–387.

Visscher, P. M., R. Thompson and C. S. Haley, 1996 Confidenceintervals in QTL mapping by bootstrapping. Genetics 143:1013–1020.

Walling, G. A., P. M. Visscher, L. Andersson, M. F. Rothschild, L.Wang et al., 2000 Combined analysis of data from quantitativetrait loci mapping studies: chromosome 4 effects on porcinegrowth and fatness. Genetics 155: 1369–1378.

Wu, R., C. Ma and G. Casella, 2002 Joint linkage and linkage dis-equilibrium mapping of quantitative trait loci in natural popula-tions. Genetics 160: 779–792.

Xiong, M., and L. Jin, 2000 Combined linkage and linkage disequilib-rium mapping for genome screens. Genet. Epidemiol. 19: 211–234.

Xu, S., 1998a Mapping quantitative trait loci using multiple familiesof line crosses. Genetics 148: 517–524.

Xu, S., 1998b Iteratively reweighted least squares mapping of quan-titative trait loci. Behav. Genet. 28: 341–355.

Xu, S., 2003 Estimating polygenic effects using markers of the entiregenome. Genetics 163: 789–801.

Yi, N., and D. Shriner, 2008 Advances in Bayesian multiple QTLmapping in experimental 11 designs. Heredity 100: 240–252.

Yu, J., J. B. Holland, M. D. McMullen and E. S. Buckler,2008 Genetic design and statistical power of nested associationmapping in maize. Genetics 178: 539–551.

Zeng, Z. B., 1994 Precision mapping of quantitative trait loci. Genet-ics 136: 1457–1468.

Communicating editor: K. W. Broman

Identification of Functional Markers 383