Estimating allele frequencies of hypervariable DNA systems



  • Forensic Science International, 51 (1991) 273-280 Elsevier Scientific Publishers Ireland Ltd.




    Immunohematology Laboratory, Istituto di Medicina Legale, Università Cattolica del S. Cuore, Largo F. Vito, 1, I-00168 Roma (Italy)

    (Received May 13th, 1991) (Revision received August 22nd, 1991) (Accepted August 29th, 1991)


    Several polymorphisms of human DNA have been shown to be hypervariable due to the recurrence of a variable number of tandem repeats (VNTRs) in the lengths of allelic restriction fragments. The recurrence of allelic variants in this novel class of polymorphisms seems to comply well with a model of continuous random variables. Based on this assumption, we have compiled some simple algorithms for the classification of continuous data and the estimation of classes of relative frequencies, and have implemented these routines for the management of databases storing hypervariable single-locus DNA genetic systems. The algorithms are written in BASIC and can be incorporated in task-oriented computer programs. Three procedures are discussed, based in turn on: (a) predetermined, arbitrary classes; (b) point estimates of frequencies for single fragments, using the measurement error associated with the kilobase value assignment; (c) estimates of phenotype frequencies according to the measurement error. Measurement errors are obtained from statistics on the values of several restriction fragments (genomic controls) repeatedly tested in different experiments. Problems related to these approaches are discussed.

    Key words: DNA profiles; Single-locus probes; Gene frequencies; BASIC algorithms.


    Hypervariable DNA markers of the human genome are a vast category of polymorphisms whose importance is well established in several fields of biological research [1,2]. They are used in forensic biology because of their considerable power of individualization and have revolutionized the fields of biological stain analysis [3] and parenthood investigations [4]. The prominent feature of HVRs (hypervariable regions; VNTRs, variable number of tandem repeats) lies in the high number of allelic restriction fragments for each genetic locus.

    Unequal crossing over [5], as well as premeiotic germline mutations [6], have been proposed to explain a continuous generation of new alleles. According to these hypotheses, a high rate of mutation per locus [7] has been reported for some of these loci. Whatever the nature of the underlying genetic mechanism, DNA restriction fragments with different lengths result from the iteration of simple tandem repeats of core nucleotide sequences.

    0379-0738/91/$03.50 © 1991 Elsevier Scientific Publishers Ireland Ltd. Printed and Published in Ireland

    In VNTR systems, allelic products may theoretically differ from each other by a length portion equivalent to at least one repeat unit [8]. In low molecular weight systems, resolution is much improved so that individual alleles can be identified and classified even in the presence of a high polymorphism (e.g. YNZ22-D17S5).

    However, many VNTRs have short repeat units assembled in large-size alleles. Small differences in length, together with measurement errors (caused by the low resolution of high molecular weight alleles in agarose electrophoresis), create an array of continuous data. As a consequence, individual alleles cannot be identified [8].

    Systems of this sort fit the model of continuous random variables. Therefore, use of VNTRs as genetic markers must address allelic length variation against a background of random measurement fluctuation.

    This paper deals with several procedures which we have adopted to estimate the frequency of occurrence of hypervariable fragments. We have developed some simple routine algorithms, written in BASIC language, which enable calculation of allele frequencies of hypervariable DNA and may be useful as part of a task-oriented computer program. Some properties of hypervariable DNA regions (HVRs) to which the algorithms apply, as well as problems related to the computational approach are discussed.

    Materials and Methods

    Genomic DNA was obtained as follows. Blood samples (0.8 ml each) were first frozen, then thawed, and the red cells were selectively lysed with 1× saline sodium citrate (SSC). The white cells were pelleted and subsequently incubated in a sodium dodecyl sulphate (SDS)/sodium acetate/proteinase K buffer [3]. DNA was finally obtained by phenol/chloroform extraction and ethanol precipitation.

    Enzymatic restriction was carried out by overnight incubation with a fivefold excess of Hinf I (Boehringer, Mannheim, no. 1274082). Digests of 3 µg were then electrophoresed on 0.8% agarose gels in 1× TBE (Tris Borate EDTA buffer; 10× = 1.3 M Tris; 0.75 M boric acid; 0.015 M EDTA, pH 8.8). Each run typically contained two visible mol. wt markers (1 kb ladder, BRL, cat. no. 520-5615SB, 1 µg per lane) in the outermost lanes of the gel. Adjacent to these lanes, two more lanes contained a smaller aliquot of the same marker (1 kb ladder, 2 ng). A central lane contained one genomic control digest, whose polymorphic profile was known from previous analyses. Each genomic control digest was typically subjected to serial analyses (30 experiments in most cases), after which it was replaced by another digest. In this way, data were collected on several genomic control digests with different molecular sizes. At the end of the experiments, data were available on genomic control fragments spanning from 7 to 0.5 kb at roughly regular intervals of about 500 base pairs (bp).

    The gels were run (35 V, 20 mA) until the 2 kb marker band had reached 160 mm from the well line. On completion of electrophoresis, all digests were Southern blotted onto a nylon membrane (Hybond, Amersham) and hybridised to two different hypervariable probes (YNH24/D2S44; 3HVR/D16) under high stringency conditions after Church and Gilbert [9]. Autoradiography of the hybridised membranes was carried out for 3-5 days at -80°C.

    The relative positions of the detected fragments were measured on a semi-automated basis, using a digitizing tablet (Summagraphics), then converted into kb values by an algorithm based on the reciprocal method of Elder and Southern [10] and stored in sequential files. Different files were created for the genomic controls and the population data. In both cases, additional information (on the geographical and ethnic origin of the individual and on the relevant experiment) was attached to each pair of kb values and recorded in duplicate sequential files.
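    The conversion from band position to kb can be illustrated with a minimal sketch of a reciprocal fit. It is written in Python rather than the authors' BASIC, and it assumes the simplest three-point form, L = L0 + c/(m - m0), fitted exactly through three ladder bands; the published method and the authors' own routine may differ in detail. The function names and the marker positions are hypothetical, chosen only for illustration.

```python
def fit_reciprocal(points):
    """Fit L = L0 + c / (m - m0) exactly through three (mobility_mm, size_kb) markers."""
    (m1, l1), (m2, l2), (m3, l3) = points
    # Ratio that eliminates c and L0, leaving m0 as the only unknown.
    a = (l1 - l2) / (l2 - l3) * (m3 - m2) / (m2 - m1)
    m0 = (m3 - a * m1) / (1.0 - a)
    c = (l1 - l2) * (m1 - m0) * (m2 - m0) / (m2 - m1)
    l0 = l1 - c / (m1 - m0)
    return m0, l0, c


def mobility_to_kb(mobility_mm, params):
    """Convert a migration distance (mm) into a fragment size (kb)."""
    m0, l0, c = params
    return l0 + c / (mobility_mm - m0)


# Hypothetical ladder bands: (distance from the well in mm, size in kb).
markers = [(95.0, 4.0), (120.0, 3.0), (160.0, 2.0)]
params = fit_reciprocal(markers)

# A band measured at 140 mm lies between the 3 kb and 2 kb markers.
unknown_kb = mobility_to_kb(140.0, params)
print(round(unknown_kb, 3))
```

    In practice the three markers flanking the unknown band would be chosen, so that the curve is only used locally.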

    Files containing genomic controls were used to derive the standard deviations and the percentages of error underlying the procedure of kb value assignment. Files containing population data were used to calculate allele and phenotype frequencies. Two hundred samples (unrelated individuals from Central and Southern Italy) were processed in about 40 electrophoretic runs.
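    As an illustration of how a genomic control file might yield a standard deviation and a percentage of error, the following Python sketch (not the authors' BASIC code) assumes that the percentage of error is the sample standard deviation expressed as a percentage of the mean kb value. The function name and the six measurements are invented for the example.

```python
from statistics import mean, stdev


def control_error(kb_values):
    """Return (mean kb, sample SD, percent error) for one genomic control fragment."""
    avg = mean(kb_values)
    sd = stdev(kb_values)  # sample standard deviation (n - 1 denominator)
    percent_error = 100.0 * sd / avg
    return avg, sd, percent_error


# Invented repeat measurements (kb) of a single control fragment across runs.
control_runs = [4.02, 3.97, 4.05, 3.99, 4.01, 3.96]
avg, sd, percent_error = control_error(control_runs)
print(f"mean={avg:.3f} kb  sd={sd:.4f} kb  error={percent_error:.2f}%")
```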


    As shown by the diagram in Fig. 1, a system was devised in which genomic control files act as keys of access to the population database. The profile of an unknown DNA sample is first converted into kb values, and a class interval is sized around each value by ascribing ±1 to ±3 standard deviations to it. Gene frequencies at the corresponding confidence levels are finally derived by scanning the population database and counting how many kb values fall within the class interval. Care is taken to use the percentage of error of the genomic control that is as close as possible in length to the fragment whose frequency is sought.
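    The point-estimate procedure described above can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' BASIC routine: the population database is taken to be a flat list of fragment sizes in kb, the frequency is the fraction of those fragments inside the window, and the control errors, population values and function names are all hypothetical.

```python
def nearest_control_error(kb, control_errors):
    """Pick the percent error of the genomic control closest in size to kb.

    control_errors is a list of (control_mean_kb, percent_error) pairs.
    """
    return min(control_errors, key=lambda ce: abs(ce[0] - kb))[1]


def window_frequency(kb, population_kb, control_errors, n_sd=2):
    """Return (low, high, frequency) for a window of +/- n_sd standard deviations."""
    percent_error = nearest_control_error(kb, control_errors)
    sd = kb * percent_error / 100.0  # percent error converted back to kb
    low, high = kb - n_sd * sd, kb + n_sd * sd
    hits = sum(1 for value in population_kb if low <= value <= high)
    return low, high, hits / len(population_kb)


# Hypothetical inputs: three genomic controls and ten population fragments.
control_errors = [(2.0, 0.6), (4.0, 0.8), (6.0, 1.2)]
population_kb = [1.95, 2.80, 3.10, 3.12, 3.15, 3.90, 4.00, 4.03, 5.20, 6.10]

for n_sd in (1, 2, 3):
    low, high, freq = window_frequency(3.12, population_kb, control_errors, n_sd)
    print(f"+/-{n_sd} SD: {low:.3f}-{high:.3f} kb, frequency {freq:.2f}")
```

    Widening the window from ±1 to ±3 standard deviations can only keep or raise the estimated frequency, which is why the choice of confidence level matters for the figure that is finally reported.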


    The procedure for calculating gene frequencies outlined above treats VNTR alleles as continuous random variables [11,12]. While the population distribution of VNTR alleles does not fit the Gaussian distribution (being multimodal), it can nevertheless be demonstrated that each individual fragment, if repeatedly measured, generates its own subset of values which fit a normal distribution curve. As a consequence, and as long as the hypothesis of normally distributed measurement errors holds, fragments from a population may be sampled according to the variance of every assigned weight.

    To classify fragments of a given population, a variance(s) should be experimentally ascribed to each kb measurement. This would involve repeating every measurement an adequate number of times under standard experimental conditions. Such a procedure is obviously not practical. A possible way to circumvent this problem is to compile statistics from serial measurements of a few selected restriction fragments and to assume that their variance represents the error in assigning fragment weights, regardless of fragment size.

    Fig. 1. A scheme of the procedure by which relative frequencies of hypervariable alleles are computed. A standardized protocol of Southern blot analysis feeds the population database, and a selected array of fragments is scattered at roughly regular kb intervals (genomic controls). Separate computer files are provided for the archive and for each genomic control. Serial measures of genomic controls contribute standard deviations of the procedure of kb assignment. These are converted into percentages of error and ascribed to every fragment size whose frequency is sought. Point estimates on variable confidence limits

