

Genetic Epidemiology 11:87-98 (1994)

LABMAN and LINKMAN: A Data Management System Specifically Designed for Genome Searches of Complex Diseases

Phillip Adams

Department of Psychiatry, College of Physicians and Surgeons of Columbia University and Division of Clinical and Genetic Epidemiology, New York State Psychiatric Institute, New York, New York

Two programs have been developed to manage linkage analysis data. The first program, LABMAN, is a comprehensive laboratory data management system organizing pedigrees, blood DNA samples, DNA markers, Southern blot or polyacrylamide gels, autoradiographs, and marker-allele typings generated from these samples. Output includes mendelization checks for genetic incompatibilities in typings and formatted files ready for linkage analysis. LABMAN can also compress highly polymorphic allele systems into smaller allele systems, facilitating analysis of large systems. The second program, LINKMAN, provides data management for lod score output from linkage analyses. It reads linkage analysis output files, calculates lod scores by family, associates lod scores with specific marker and family identifiers, and stores these data in a database where they can be combined with lod scores from previous analyses. LINKMAN easily incorporates any of a wide variety of genetic models. It produces formatted output of lod scores by user-specified criteria for reports or as ASCII files for input to other programs. If desired, tests of homogeneity of linkage across families can be run via the HOMOG program [Ott, 1991] and their output included in reports.

The programs include features critical for conducting genome searches of complex diseases: They are easy to use, well tested, and reliable. Data from multicenter investigations can be easily combined for analysis. Moreover, they include extensive error-checking capabilities, and they are specifically set up to protect blindness between laboratory workers and data analysts. LABMAN and LINKMAN are currently available free of charge under DOS. © 1994 Wiley-Liss, Inc.

Key words: linkage, data management, lod scores

Received for publication November 11, 1992; revision accepted July 2, 1993.

Address reprint requests to Phillip Adams, New York State Psychiatric Institute, 722 West 168th Street, Unit 14, New York, NY 10032.

© 1994 Wiley-Liss, Inc.


INTRODUCTION

Linkage analysis of complex phenotypes in humans is now commonplace as a result of recent advances in several areas: laboratory technique, including identification of genetic markers [Botstein et al., 1980; Weber and May, 1989], computer technology, appropriate computer algorithms [Elston and Stewart, 1971], and computer programs such as LIPED [Ott, 1974] and LINKAGE [Lathrop et al., 1984]. Typically such investigations require integrating data across complex data structures (pedigrees) with hundreds, or thousands, of data points for each individual within the pedigree. The results of linkage analyses can be extremely voluminous because of numerous variations of input parameters.

Thus, when one is conducting a genome search for linkage, it is essential to have an efficient method for data management. Moreover, recent experiences with retractions and failures to replicate linkage results in complex diseases have underlined the importance of maintaining great care in all the apparently minor technical details involved in a major linkage study. These include error-checking, maintaining blindness, being able to combine results from different collaborative studies, using multiple disease phenotypes, updating analyses with new or re-typed persons, and ensuring that all available DNA samples are analyzed.

We describe two coordinated software programs, LABMAN and LINKMAN, written specifically to facilitate a linkage-analysis genome search of a complex disease. These programs simplify and check the collection and manipulation of marker-allele typings and lod scores. LABMAN manages laboratory data and creates output files for linkage analysis. LINKMAN processes linkage analysis input/output files, stores lod scores in a database, and produces output files and reports. The programs facilitate data processing for several types of users: clinicians collecting pedigree data, molecular geneticists in the laboratory, and statisticians performing linkage analysis. Little knowledge of computer operation and minimal training are required to use these programs.

To our knowledge there are three published programs which provide similar data management of marker data for linkage analysis. dGENE [Lange et al., 1988] is written in dBASE III and is designed to create input files for MENDEL, FISHER, and LINKAGE. KINDRED is designed to store, display, and edit family history data, including genotypes, in the form of pedigrees. The GEMS package [Kompanek et al., 1992] has been recently announced but is currently unavailable. KINDRED and GEMS both can draw and print pedigrees, which LABMAN does not do. However, LABMAN can create output files for plotting by several popular pedigree drawing packages: FTREE (Go, unpublished material), KINDRED, and PEDRAW [Curtis, 1990]. In addition, none of these programs provides internal consistency checks of pedigree structure or error-checking routines on the consistency of within-family typings (i.e., mendelization), as does LABMAN. Also, none of these programs is designed to manage linkage output, as is done by the second program in our package, LINKMAN.

Among our requirements in developing a molecular genetic laboratory database (the LABMAN system) were that it should

Manage blood DNA samples for multiple studies, multiple families, multiple DNA samples on the same individual, and multiple typings of the same DNA sample

Manage RFLP typings for markers run with multiple enzymes and detecting multiple systems of alleles

Display data screens which conform to the layout and organization of data forms as used in the laboratory

Identify subsets of pedigrees for subanalyses and associate multiple disease phenotype/liability class assignments for individuals within the pedigree

Separate disease phenotype assignment from pedigree structure, to ensure that laboratory personnel remain blind to disease status

Identify and create reports of typing data which are genetically “impossible” or inconsistent with other family typings

Create output files usable by two popular linkage analysis programs, dGENE, MENDEL [Lange et al., 1988], and LINKAGE [Lathrop et al., 1984]

Among our requirements leading to the development of a lod score database (the LINKMAN system) were that it should

Relate LINKAGE program [Lathrop et al., 1984] output to its input marker(s) and pedigree name(s)

Calculate lod scores by family and across families

Store fully identified lod scores in a database, where they can be processed with similar data from other analyses

Discriminate/identify lod scores (for any marker) by multiple family compositions including phenotype assignments, multiple allele frequency assignments, and multiple genetic models

Print lod scores in a readable format

Create output (ASCII) files of tabled lod scores for further analysis

The two programs, LABMAN and LINKMAN, share a similar interface and are fully mouse-aware. Although designed to be used in tandem as shown in Figure 1, each program can be run independently of the other. LABMAN manages and prepares data for linkage analysis; LINKMAN manages the lod score results of linkage analyses.

HARDWARE AND SYSTEMS SOFTWARE

LABMAN and LINKMAN are written in FoxPro 2.0 and 2.5 (for DOS and Windows) and are supplied as executable programs or compiled FoxPro applications.

Fig. 1. Flow diagram of linkage analysis data processing using LABMAN and LINKMAN. [Diagram not reproduced; recoverable labels: blood/DNA, autoradiograms, input files, output files, reports, ASCII files for further analyses.]


These programs require 640K of RAM and at least 2 megabytes of hard disk storage. Optimal performance of FoxPro is achieved with 4 or more megabytes of extended memory, a high-speed processor, and a large, fast hard disk. The program and support files require 500K each for LABMAN and LINKMAN. Data files for each program vary in size depending on the amount of data retained. A math co-processor is not required.

LABORATORY DATA MANAGEMENT SYSTEM (LABMAN)

The Laboratory Data Management System (LABMAN) is a computer software program which manages marker-allele-typing data and can create output files for the LINKAGE program [Lathrop et al., 1984]. The system was designed to facilitate the organization of blood DNA samples, Southern blots or polyacrylamide gels, autoradiographs, and marker-allele typings generated from these samples. The data structures are fully normalized to permit recording data on multiple studies, families, and markers; multiple DNA samples from the same person; and multiple marker-typings on the same marker from the same person [Codd, 1990]. Flags are used to mark data for output, thereby permitting selection from multiple/duplicate records.

The program is now described in terms of A) data structures, B) error checking, C) reports, and D) specific features facilitating efficient processing of multiple allelic typings.

A. Data Tables

LABMAN uses six primary data tables: studies, blood/DNA samples, pedigrees, collections of DNA samples arranged on gels or blots, autoradiographs of gels/blots, and marker-probes. For some of these tables we describe specific features pertinent to molecular genetic analyses.

Master pedigree. Separate pedigree files are set up for each study. These files are used to establish relations between family members by specifying the IDs of each person’s biological parents. This table also includes fields for sex, age, twin status, and ten disease phenotype/liability class assignment codes. Disease phenotype codes can be selectively included in output files created for linkage analysis or for plotting. Similarly, the user can define up to ten subsets of persons within pedigrees and create output files including only persons in the subset.

Gels/blots. Collections of DNA are processed together on Southern blots or polyacrylamide gels. This table records the lane position of blood/DNA samples. Thirty lanes are available. Larger gels can be accommodated by dividing them up into smaller collections, each of which contains only samples from a single family. For example, a 90-lane gel with data on four families could be entered as four separate gels/blots with IDs PLO42-PLO44. See Figure 2 for an example.

Autoradiographs. Marker allele typings are processed in collections which correspond to the alleles observed on autoradiographs after hybridizing probes to the strands of DNA (lanes) on specific gels/blots. This table records the identifying code of the source gel/blot, the marker-enzyme-system used, and the specific alleles observed for each of the 30 available lanes. For each lane, the “readability” of the typing is recorded in addition to a “Use” flag (Yes/No) indicating whether the typing

Fig. 2. Gel/blot data-entry screen. Numbers in lanes are the blood/DNA sample identification numbers.

Fig. 3. Autoradiograph data entry screen for a marker-probe hybridized to DNA on the gel/blot presented in Figure 1.

the markers database manages unique Marker-Enzyme-System combinations. “PCR” is used as the enzyme code for di/tri-nucleotide repeat allele systems run on polyacrylamide gels. This table has fields for chromosome, arm, position, locus, source, heterozygosity, polymorphism information content, number of alleles, CEPH reference alleles for two persons [Knowles et al., 1992], and the specific sizes (in kilobases) and population frequencies of up to 30 alleles. A field for laboratory identification (SpecID) is also provided. Figure 4 illustrates the contents of the database for the marker previously shown in Figure 3.

B. Data Integrity and Error Trapping

A major concern has been incorporating features which ensure that data are entered correctly and that error-checking is performed extensively. For example, study, family codes, markers, alleles, enzymes, blood sample IDs, blot/gel, and autoradiograph identifiers are verified, and invalid values cannot be entered. Internal consistency checks inform the user when, for example, an allele number is specified for a marker which exceeds the number of alleles entered into the markers database. As much as possible, the user is prompted for input selection from popups of permissible values. This approach minimizes the number of keystrokes required to enter data. Entering repetitive data such as multiple autoradiographs of a single gel/blot is facilitated by a “replication” feature which copies all duplicated fields into the new record. The system also includes routines which check the internal consistency of pedigree structures. Specific checks include those for missing or duplicate IDs, missing sex codes, whether parental IDs are either both missing or both present, sex of parent, parental ID different from self, and cross-check of parental IDs for twins.
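The pedigree-structure checks listed above can be illustrated with a short sketch. This is not LABMAN's code (LABMAN is written in FoxPro, and its routines are not given in the text); the `Person` record, its field names, and the "M"/"F" sex codes are assumptions made for illustration, and the twin cross-check is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """One pedigree record. Field names are hypothetical, not LABMAN's schema."""
    pid: str
    sex: Optional[str]            # "M", "F", or None when missing (assumed coding)
    father: Optional[str] = None  # ID of biological father, None if unknown
    mother: Optional[str] = None  # ID of biological mother, None if unknown


def check_pedigree(people):
    """Return a list of readable problem descriptions for one family."""
    problems = []
    by_id = {}
    for p in people:
        if not p.pid:
            problems.append("missing ID")
            continue
        if p.pid in by_id:
            problems.append(f"duplicate ID {p.pid}")
        by_id[p.pid] = p

    for p in people:
        if not p.pid:
            continue
        if p.sex not in ("M", "F"):
            problems.append(f"{p.pid}: missing sex code")
        # Parental IDs must be both missing or both present.
        if (p.father is None) != (p.mother is None):
            problems.append(f"{p.pid}: only one parent specified")
        # A person cannot be his or her own parent.
        if p.pid in (p.father, p.mother):
            problems.append(f"{p.pid}: listed as own parent")
        # Fathers must be male and mothers female, when the parent is on file.
        for parent_id, expected in ((p.father, "M"), (p.mother, "F")):
            parent = by_id.get(parent_id) if parent_id else None
            if parent and parent.sex and parent.sex != expected:
                problems.append(
                    f"{p.pid}: parent {parent_id} has sex {parent.sex}, "
                    f"expected {expected}"
                )
    return problems
```

Returning a list of messages rather than stopping at the first error mirrors the report-style output described in the text, where all problems in a family can be reviewed together.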

Fig. 4. Marker-enzyme-system data entry screen.


C. Reports and Other Output

A wide variety of reports are available, ranging from simple lists of bloods, markers, etc. to checks for mendelization errors within sibships and across parent-offspring typings. Separate algorithms are used for mendelization checks of autosomal and X-linked markers. Flexibility in selecting families/persons and markers has been a high priority in report design. For example, all reports can be requested for either selected subsets or all families and for all, or subsets of, markers. Marker selection can be specific (e.g., marker MFDSO) or dynamic (e.g., all markers on chromosome 3 with polymorphism information contents greater than 0.70). In addition, up to ten within-family subsets of persons can be defined, and output can be restricted to only those persons within the family. Simple lists of records from any LABMAN database can be output by user-defined criteria.

A summary study report prints a cross-tabulation of families by markers run within a study, including the number of persons, gels/blots, and autoradiographs run. A family summary report can be requested for subsets of families and/or markers. It counts the number of persons within each family having available blood DNA, the number of persons typed for specific markers, and the number of those typings appropriate for analysis. Several types of marker-allele reports can be generated. These include checks for allele numbers which exceed maximum values in the markers database, graphic displays showing the number of observed alleles within families, lists of multiple typings, lists and checks for consistency of parent-offspring alleles, and lists of persons with blood who have not been typed for specific markers. ASCII file output includes pedigree files in input format for MAKEPED [Lathrop et al., 1984], and marker data files in input format for inclusion in LINKAGE parameter files. Linkage analysis output files are constrained by an upper limit of 250 marker-enzyme-systems. This limit is high enough to pose no practical limitation. Plots of pedigrees are not produced by LABMAN. However, LABMAN will create ASCII files formatted specifically for three plotting programs: FTREE, KINDRED, and PEDRAW [Curtis, 1990].
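A parent-offspring mendelization check of the kind mentioned above can be sketched for an autosomal codominant marker. The function below is an illustration under stated assumptions, not LABMAN's algorithm, which the text does not spell out: genotypes are taken to be pairs of allele numbers, and an untyped parent is passed as `None` and treated as compatible with any allele. X-linked markers would need a different rule and are not covered.

```python
def child_consistent(child, father=None, mother=None):
    """True if an autosomal child genotype could be inherited from the parents.

    Each genotype is a 2-tuple of allele numbers; None marks an untyped parent.
    The child must receive one allele from the father and the other from the
    mother, in either assignment.
    """
    a, b = child

    def can_give(parent, allele):
        # An untyped parent cannot be used to rule anything out.
        return parent is None or allele in parent

    return (can_give(father, a) and can_give(mother, b)) or (
        can_give(father, b) and can_give(mother, a)
    )
```

For example, a child typed (1, 1) with parents typed (1, 2) and (2, 3) is flagged as inconsistent, because the mother carries no allele 1.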

D. Special Features

Allele compression. Large allele systems may exceed the limited computer memory available to perform a linkage analysis. Allele compression produces a smaller system with fewer alleles which requires less memory for analysis. When creating output files for linkage analysis, LABMAN can compress allele typings by lowering the value (number) of observed alleles while preserving the relations of marker typings within and across families. LABMAN does not alter, combine, or in any way assign or estimate allele-typings for any person in the pedigree, nor is the source data altered in any way. Linkage analysis of compressed allele systems produces identical results to analysis of the same data without compression. Operationally, for a given marker, the program cross-tabulates all typings by all family(ies) members and notes which alleles are not present. The program then reduces the values of all typings such that there are no missing alleles in a smaller allele system. In addition, the specific allele frequencies of the compressed alleles are preserved and sent to the output parameter file. In this manner a 30-allele system might be reduced to a 10-allele system. LABMAN will, optionally, print an Allele Distribution Report displaying the observed alleles in all families both before and after compression.
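The renumbering step of allele compression can be illustrated as follows. This is a minimal sketch, not LABMAN's implementation: the dictionary layout, the function name, and the use of 0 as the code for an untyped allele are assumptions. The frequencies of the retained alleles are carried over unchanged, as the text states; how any remaining frequency mass is handled is not described in the text, so the sketch does nothing with it.

```python
def compress_alleles(typings, frequencies):
    """Renumber the observed alleles of one marker to 1..k.

    typings:     dict person_id -> (allele, allele); 0 means untyped (assumed).
    frequencies: dict original allele number -> population frequency.
    Returns (new_typings, new_frequencies, mapping from old to new numbers).
    The input dictionaries are left untouched.
    """
    # Collect every allele actually observed across all selected families.
    observed = sorted({a for geno in typings.values() for a in geno if a != 0})
    # Order-preserving renumbering: smallest observed allele becomes 1, etc.
    mapping = {old: new for new, old in enumerate(observed, start=1)}
    mapping[0] = 0  # the untyped code is never renumbered
    new_typings = {
        pid: tuple(mapping[a] for a in geno) for pid, geno in typings.items()
    }
    new_frequencies = {mapping[old]: frequencies[old] for old in observed}
    return new_typings, new_frequencies, mapping
```

Because the mapping is one-to-one on the observed alleles, no two distinct alleles are merged, which is consistent with the statement that typings are renumbered but never combined or estimated.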


Collapsing allele typings across enzymes. When creating output files for linkage, LABMAN allows the user to combine and analyze data for the same marker run with different enzymes by creating a new “collapsed-marker.” This feature is useful for RFLP markers run with different enzymes across families. The collapsed-marker typings combine typings made from multiple marker-enzyme combinations. The number of alleles for the collapsed-marker is the highest number of alleles among the component marker-enzyme-systems. Since allele frequencies cannot be meaningfully collapsed, allele frequencies for collapsed typings are set to 1/(total number of alleles).
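The allele count and frequencies assigned to a collapsed-marker follow directly from the rule just stated. The short sketch below shows the arithmetic only; the function name and the enzyme-keyed dictionary are hypothetical.

```python
def collapse_marker(allele_counts):
    """Allele count and equal frequencies for a collapsed-marker.

    allele_counts: dict enzyme code -> number of alleles in that
    marker-enzyme-system (hypothetical input layout).
    """
    n = max(allele_counts.values())  # highest count among the components
    return n, [1.0 / n] * n          # each allele gets frequency 1/n
```

A marker run with one enzyme revealing 2 alleles and another revealing 4 would thus be collapsed into a 4-allele system with each frequency set to 0.25.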

LINKAGE DATA MANAGEMENT SYSTEM (LINKMAN)

The lod score database system (LINKMAN) is a computer software program which manages lod scores, prints formatted lod score reports, and produces ASCII files of lod scores for use by other programs. LINKMAN associates output from the linkage analysis program MLINK [Lathrop et al., 1984] with family identifiers and marker names, and calculates lod scores from the likelihoods produced by MLINK. LINKMAN takes all its input from three files: the parameter and pedigree files used as input for MLINK and the associated MLINK output file. Flexibility in selecting and printing lod scores for studies/families/markers run under different assumptions has been a high priority throughout. LINKMAN is fully integrated with the program HOMOG [Ott, 1991] and will both construct input files and read and print output files produced by HOMOG.
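The calculation of a lod score from likelihoods can be made concrete with the standard definition, lod(θ) = log10[L(θ)/L(0.5)], applied family by family and then summed across families. The sketch below assumes the natural-log likelihoods have already been extracted into a dictionary keyed by recombination fraction; it does not parse MLINK output, and the function names are illustrative rather than LINKMAN's.

```python
import math


def lod_scores(ln_like_by_theta):
    """Lod scores for one family from natural-log likelihoods.

    ln_like_by_theta: dict theta -> ln L(theta); must include theta = 0.5,
    the no-linkage reference value.
    """
    base = ln_like_by_theta[0.5]
    # log10(L(theta) / L(0.5)) = (ln L(theta) - ln L(0.5)) / ln 10
    return {
        theta: (ll - base) / math.log(10)
        for theta, ll in ln_like_by_theta.items()
    }


def sum_lods(per_family):
    """Sum family lod scores at each theta.

    per_family: dict family ID -> {theta: lod}.
    """
    totals = {}
    for lods in per_family.values():
        for theta, lod in lods.items():
            totals[theta] = totals.get(theta, 0.0) + lod
    return totals
```

Working with log likelihoods and subtracting, rather than dividing raw likelihoods, avoids numerical underflow for large pedigrees.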

The program is now described in terms of A) data structures and B) reports.

A. Data Tables

LINKMAN uses six primary data tables: studies, disease models, pedigree models, allele models, lod scores, and markers. For some of these tables we describe specific features pertinent to molecular genetic analyses.

Disease model. LINKMAN reads the disease model directly from the input parameter file. Specifically, the disease model includes the frequency of the disease allele, all penetrance/liability classes used in the analysis, and sex difference and interference parameters. LINKMAN stores these data with a unique disease model number. All input files with identical parameter files (excepting marker data) will be assigned the same disease model number. The disease model number enables LINKMAN to identify lod scores which, although analyzed at different times, can potentially be summed.

Pedigree model. For each family in an analysis, the exact family structure including disease phenotype and liability class assignments is read from the pedigree (input) file, stored, and assigned a unique number. All input files with identical pedigree structures will be assigned the same pedigree model number. Since lod scores can be calculated from varying pedigree structures (composition, affection status, liability classes), the pedigree model number is used by the program to select only lod scores generated by a specific structure for each family. When loading data into LINKMAN, the user is prompted for a TYPE code which will be stored with all the pedigrees read from the pedigree file. Since these families were analyzed together, it is assumed that they all share some factor in common (perhaps the same diagnostic rules were used to make phenotype assignments across all families) which justifies summing their lod scores. The TYPE variable is used by the program to identify family structures (models) whose associated lod scores can be summed.


Allele model. For each marker analyzed, the number of alleles and their population frequency is specified in the parameter (input) file. Alleles can be coded as either numbers or binary codes. These data are stored in the Allele models database and assigned a unique number. The same marker may be analyzed with varying number(s) of alleles (e.g., due to compression) or with different population frequencies. The allele model number is used by the program when summing lod scores: for a given marker, only scores generated with the same allele model can be summed.

Lod scores. Lod scores are uniquely identified by seven variables: study, family, marker, recombination fraction (θ), and disease, allele, and pedigree models. The values for all these variables are read from the three input files.

Markers. The Markers database is structurally identical in both LABMAN and LINKMAN, and a utility is provided to update the LINKMAN markers table with all data from LABMAN. An important idiosyncrasy is shared by both programs: marker names are combinations of Marker-Enzyme-System names/codes (because unique markers may be run with more than a single enzyme and may reveal more than one system of alleles). The Markers database is useful for dynamic selection of lod score output based on marker attributes. For example, the user can request output on all PCR markers on chromosome 11, q arm, with band number greater than 21.0. Selection of lod scores by marker attributes requires that the marker names match perfectly across the lod score and markers databases. (This correspondence will always exist when the linkage parameter file is one that was created by the LABMAN program).
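One way to picture the seven identifying variables, and the rules governing which scores may be summed, is the sketch below. It is an illustration only: the record layout and names are hypothetical, and the `ped_type` field stands in for the TYPE code that the text says is stored with each pedigree model. Scores are added across families only when study, marker, θ, disease model, allele model, and TYPE all agree.

```python
from collections import defaultdict
from typing import NamedTuple


class LodRecord(NamedTuple):
    """One stored lod score; the first seven fields identify it uniquely."""
    study: str
    family: str
    marker: str
    theta: float
    disease_model: int
    allele_model: int
    pedigree_model: int
    ped_type: str  # TYPE code attached to the pedigree model (assumed field)
    lod: float


def summed_lods(records):
    """Sum lod scores across families within each compatible group."""
    totals = defaultdict(float)
    for r in records:
        key = (r.study, r.marker, r.theta,
               r.disease_model, r.allele_model, r.ped_type)
        totals[key] += r.lod
    return dict(totals)
```

The sketch assumes each family contributes at most one record per group; selecting a single pedigree model per family, as the text describes, would be a separate step before summing.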

B. Reports and Other Output

Two output types are available: printed reports and ASCII files. When requesting output, the user must specify the study, family(ies), marker(s), disease model(s), and pedigree type. Selection is facilitated by use of popups of available/permissible values. Output can be ordered either alphabetically by marker or by chromosome-arm-region. Finally, the user can request that family lod scores for all values of estimated recombination fraction (θ) be output (with their sum) or, alternatively, that only the sums be output.

When printing lod scores the user can also specify a lod score threshold which, optionally, can be applied to either the sum of the lods across families or to individual family lod scores. If a positive threshold is entered, only markers with lod scores equal to or exceeding the threshold will be output. If a negative threshold is entered, only markers with lod scores equal to or less than the threshold will be output. For example, the user can request a report on all markers for which the sum of the family lod scores exceeds 1.7, etc. LINKMAN is also integrated with the program HOMOG [Ott, 1991] to test for homogeneity and print HOMOG output, with no additional file manipulation required from the user. A sample report is presented in Figure 5.
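The threshold rule can be sketched as a simple filter over summed lod scores. Two details here are assumptions, since the text does not specify them: a marker qualifies if its score at any value of θ meets the threshold, and a threshold of zero is treated as no filtering. The function name and input layout are likewise hypothetical.

```python
def select_markers(lods_by_marker, threshold):
    """Return the markers whose lod scores pass the threshold.

    lods_by_marker: dict marker name -> list of lod scores, one per theta.
    threshold > 0 keeps markers with some lod >= threshold;
    threshold < 0 keeps markers with some lod <= threshold;
    threshold == 0 keeps everything (assumed behavior).
    """
    selected = []
    for marker, lods in lods_by_marker.items():
        if threshold > 0:
            keep = max(lods) >= threshold
        elif threshold < 0:
            keep = min(lods) <= threshold
        else:
            keep = True
        if keep:
            selected.append(marker)
    return selected
```

A positive threshold therefore surfaces markers with evidence for linkage, while a negative one surfaces markers for which linkage can be excluded at some θ.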

DISCUSSION

LABMAN and LINKMAN together provide an integrated environment for managing marker-allele typings in the laboratory, reading linkage analysis output files, and creating output files for reports, display, or analysis. The programs facilitate the construction of input files for several other programs (dGENE, FTREE, HOMOG, LINKAGE, MENDEL, PEDRAW, and KINDRED) and integrate output files, making it easier to use these programs. When used together, LABMAN and LINKMAN facilitate the analysis of the large numbers of marker typings and associated lod scores which can be generated while conducting a genome search for linkage to a disease phenotype. The separation of laboratory data management from data analysis, ensuring that allele typings are assigned without knowledge of disease phenotype status, can be achieved by transferring cleaned data, ready for analysis, from the laboratory to those persons responsible for analysis to whom the disease phenotypes are known. Since the data are maintained in a commonly used format (xBase dbf files) it is relatively easy to import/export into other formats. Also, users with their own distribution version of FoxPro have full access to all the features of that package, including the ability to write programs which access their data.

Fig. 5. Example of LINKMAN lod score report including homogeneity testing using HOMOG. Sum Lods is the sum of the lod scores under homogeneity. Het Lods is the lod score maximized over the recombination fraction and alpha. AlphaMax is the best estimate for alpha. [Report body not reproduced: a summary lod score report on study SAMPLE DATA, dated 11/18/92, listing lod scores by family across values of theta, followed by HOMOG estimates.]

Although each program is completely self-contained and can be run independently of the other, several benefits accrue when they are used together. First, the markers database in LABMAN can be transferred directly to LINKMAN, so that data entry is more efficient, marker-enzyme-system names correspond, and dynamic selection of markers is facilitated. Second, linkage analysis files created by LABMAN can be read more efficiently by LINKMAN. When reading such files, LINKMAN can characterize lod scores more fully by recording, for each marker analyzed, both the total number of persons in the family and the number of persons typed. This feature enables LINKMAN to distinguish between families that are untyped and those that were typed but are uninformative for linkage. Recording the number of persons typed for each lod score also discriminates more finely than knowing only that analyses were performed with identical family structures (pedigree models). For example, given a fixed family structure of, say, 30 persons, a given marker might be analyzed in one instance with 21 typings and in another with 25. Knowing how many typings were included makes it possible to identify lod scores based on older, less complete typing data, which might be re-analyzed with more complete data.
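The value of storing typing counts with each lod score can be sketched as follows. This is an illustration only, not LINKMAN's schema or logic: the `LodRecord` fields, both function names, and the rule that a typed family with lod scores of zero at every recombination fraction counts as uninformative are assumptions, and the data are invented (the 30-person family with 21 and 25 typings mirrors the example in the text).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LodRecord:
    family: str
    marker: str
    n_persons: int      # persons in the family structure
    n_typed: int        # persons typed for this marker
    lods: List[float]   # lod scores at the analyzed recombination fractions

def typing_status(rec, eps=1e-6):
    """Separate untyped families from typed-but-uninformative ones."""
    if rec.n_typed == 0:
        return "untyped"
    if all(abs(z) < eps for z in rec.lods):
        return "typed, uninformative"
    return "informative"

def outdated(records):
    """Records superseded by one with more typings for the same family and marker."""
    most = {}
    for r in records:
        key = (r.family, r.marker)
        most[key] = max(most.get(key, 0), r.n_typed)
    return [r for r in records if r.n_typed < most[(r.family, r.marker)]]

# Hypothetical records for one marker.
records = [
    LodRecord("F01", "M1", 30, 21, [0.8, 0.6, 0.3]),
    LodRecord("F01", "M1", 30, 25, [1.1, 0.9, 0.4]),
    LodRecord("F02", "M1", 12, 0, [0.0, 0.0, 0.0]),
    LodRecord("F03", "M1", 18, 9, [0.0, 0.0, 0.0]),
]
for r in records:
    print(r.family, r.n_typed, "of", r.n_persons, typing_status(r))
print("outdated:", [(r.family, r.n_typed) for r in outdated(records)])
```

Without the typed count, families F02 and F03 would look identical, and the two F01 analyses could not be ordered by completeness.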

These programs were written to facilitate and improve the quality of marker-allele data in two ongoing whole-genome searches for linkage to psychiatric disorders. They replaced a system of laboratory notebooks and manual entry of observed alleles directly into ASCII files for analysis. This previous system contained no internal verification or error-checking components, and the process of updating data was error-prone and time-consuming. Development of LABMAN and LINKMAN has included full data management of two independent studies, each with separate personnel, over a period of 4 years. To date these studies include a total of 45 families with DNA on 1,443 persons. The number of markers typed has been 381 and 229 for the two studies, producing over 91,000 marker typings. Verification of entered data has been accomplished by a two-person procedure in which entered data are verbally checked by a co-worker. While transcription errors do occur, our experience has been that almost all errors arise from misreading of allelic bands. Mendelization error-checks identify some of these errors. At this time, LABMAN contains no auditing component to record the occurrence of data entry errors and corrections. Although we have not collected data on the cost effectiveness of these programs, the researchers using them are enthusiastic.
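The idea behind a mendelization check can be sketched for the simplest case, a father-mother-child trio typed at a codominant marker. This is not LABMAN's algorithm; the function names and genotypes are hypothetical. It assumes each genotype is a pair of allele labels, treats an untyped parent as compatible with anything, and ignores the multi-generation inference a full pedigree check would need.

```python
def parent_can_transmit(parent, allele):
    """An untyped parent (None) is compatible with any allele."""
    return parent is None or allele in parent

def trio_is_consistent(father, mother, child):
    """True if the child's two alleles can come one from each parent."""
    if child is None:
        return True
    a, b = child
    return (
        (parent_can_transmit(father, a) and parent_can_transmit(mother, b))
        or (parent_can_transmit(father, b) and parent_can_transmit(mother, a))
    )

def check_family(genotypes, trios):
    """Return the child IDs of trios that fail the check."""
    return [
        child_id
        for father_id, mother_id, child_id in trios
        if not trio_is_consistent(
            genotypes.get(father_id), genotypes.get(mother_id), genotypes.get(child_id)
        )
    ]

# Hypothetical typings: allele labels are arbitrary integers.
genotypes = {
    "dad": (1, 2),
    "mom": (3, 4),
    "kid1": (1, 3),   # consistent
    "kid2": (1, 1),   # mother cannot transmit allele 1
    "kid3": (2, 5),   # allele 5 is absent from both parents
}
trios = [("dad", "mom", "kid1"), ("dad", "mom", "kid2"), ("dad", "mom", "kid3")]
print("incompatible children:", check_family(genotypes, trios))
```

A check of this kind catches only misreadings that produce an impossible genotype; a misread band that still yields a Mendelian-consistent genotype passes, which is why such checks identify some but not all errors.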

While these two programs are useful, they are not without limitations. First, at this time both LABMAN and LINKMAN are available only for the DOS and Windows operating systems. However, the programs should become even more widely portable when Microsoft releases FoxPro 2.5 for the Unix and the Macintosh operating systems. No OS/2 version of FoxPro is planned; however, the programs can be run in a DOS window under OS/2. Second, although FoxPro provides many network features automatically, neither program has been designed or tested as a network program. Therefore, contention and concurrency problems may arise when the programs are used in a multi-user setting. Third, LINKMAN does not yet manage multipoint linkage analyses or handle differential recombination with respect to sex.

Nevertheless, the LABMAN-LINKMAN package does provide a unified, well-tested, user-friendly data management system, expressly designed for the specific needs of workers in linkage analysis, especially those conducting whole-genome searches for complex diseases of unknown mode of inheritance.

LABMAN and LINKMAN are available from the author at no cost to interested researchers.

ACKNOWLEDGMENTS

These programs were developed with support from NIMH grants MH28274 and MH43878. The author thanks N. Freimer, L. Brzustowicz, V. Vieland, R. Straub, T. Lehner, and J. Knowles for suggestions, comments, and alpha and beta testing of the programs and G. Heiman for graphic design. Thanks also to S.E. Hodge for critical reading and development of the manuscript and to T.C. Gilliam and M. Weissman for providing the research sample and laboratory data.

REFERENCES

Botstein D, White RL, Skolnick M, Davis RW (1980): Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

Codd EF (1990): “The Relational Model for Database Management: Version 2.” Reading, MA: Addison-Wesley.

Curtis D (1990): A program to draw pedigrees using LINKAGE or LINKSYS data files. Ann Hum Genet 54:365-367.

Elston RC, Stewart J (1971): A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Knowles JA, Vieland VJ, Gilliam TC (1992): Perils of gene mapping with microsatellite markers. Am J Hum Genet 51:905-909.

Kompanek AJ, Kauffman ER, Blaschak J, Chakravarti A (1992): GEMS: a comprehensive database for genetic epidemiological studies. Am Soc Hum Genet Abstracts 51:A152.

Lange K, Weeks D, Boehnke M (1988): Programs for pedigree analysis: MENDEL, FISHER and dGENE. Genet Epidemiol 5:471-472.

Lathrop GM, Lalouel JM, Julier C, Ott J (1984): Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci USA 81:3443-3446.

Ott J (1974): Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am J Hum Genet 26:588-597.

Ott J (1991): “Analysis of Human Genetic Linkage.” Revised edition. London: The Johns Hopkins University Press.

Weber JL, May PE (1989): Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 44:388-396.