Copy the folder… Faculty/Sarah/Tues_merlin to the C Drive C:/Tues_merlin

Upload: lynette-howard

Post on 11-Jan-2016


TRANSCRIPT

  • Copy the folder Faculty/Sarah/Tues_merlin to the C Drive:

    C:/Tues_merlin

  • MERLIN (and other Abecasis products). Sarah Medland & Kate Morley, Boulder 2009

  • MERLIN software. Programs: GRR, MERLIN, MinX, MERLIN-regress, Pedstats, Pedwipe, Pedmerge

  • We will be using Cygwin, a Unix emulator for Windows

    Open by double clicking
    Migrate to this session's working directory: cd C:/tues_merlin
    Check to see the files in the directory: ls

  • Data Input Files: getting your data into Merlin

  • Input File Types

    Pedigree File: family relationships, phenotype data, genotype data
    Data File: describes contents of pedigree file
    Map File: records location of genetic markers

  • Example Pedigree File

  • Data File Field Codes

    Code   Description
    M      Marker Genotype
    A      Affection Status
    T      Quantitative Trait
    C      Covariate
    Z      Zygosity
    S[n]   Skip n columns
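The field codes above determine how many columns each row of the pedigree file should have. A minimal sketch of that bookkeeping follows. It assumes five leading columns (family, individual, father, mother, sex), two allele columns per marker, and that a bare "S" skips one column; the item names in the example (LDL, age, unused, marker1, marker2) are hypothetical.

```python
import re

# Columns contributed by each data-file code. Assumes each marker genotype
# occupies two allele columns in the pedigree file (an assumption of this sketch).
COLUMNS_PER_CODE = {"M": 2, "A": 1, "T": 1, "C": 1, "Z": 1}
FIXED_COLUMNS = 5  # family, individual, father, mother, sex


def expected_ped_columns(dat_lines):
    """Return the number of columns each pedigree-file row should contain."""
    total = FIXED_COLUMNS
    for line in dat_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        code = fields[0].upper()
        skip = re.fullmatch(r"S(\d*)", code)
        if skip:
            total += int(skip.group(1) or 1)  # bare "S" treated as skipping one column
        elif code in COLUMNS_PER_CODE:
            total += COLUMNS_PER_CODE[code]
        else:
            raise ValueError(f"unknown field code: {fields[0]}")
    return total


# Hypothetical data file: one trait, one covariate, two skipped columns, two markers
example_dat = ["T LDL", "C age", "S2 unused", "M marker1", "M marker2"]
print(expected_ped_columns(example_dat))  # 13
```

Comparing this count with the number of fields in each pedigree row is a quick way to spot a data file that does not match its pedigree file before running Pedstats.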

  • First step: check relationships with GRR

  • GRR - www.sph.umich.edu/csg/abecasis/GRR

    Graphs mean IBS against sd IBS
    Either within families or across everyone in the sample
    Ideally 200+ markers genotyped in common for each pair
    If you want to try this later: Sample.ped (1300 individuals from 200 families, genotyped on 320 markers across the genome)
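The two quantities GRR plots can be illustrated with a short sketch that computes the mean and standard deviation of identity-by-state (IBS) sharing for one pair of people. This is not GRR's own code: the function names are made up for illustration, genotypes are given as allele pairs with None for missing, and the population standard deviation is used as an assumption.

```python
from statistics import mean, pstdev


def ibs(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) two unordered genotypes share by state."""
    (a1, a2), (b1, b2) = genotype_a, genotype_b
    straight = (a1 == b1) + (a2 == b2)
    crossed = (a1 == b2) + (a2 == b1)
    return max(straight, crossed)


def ibs_summary(person_a, person_b):
    """Mean and SD of IBS over markers genotyped in both people (None = missing)."""
    shared = [ibs(ga, gb) for ga, gb in zip(person_a, person_b)
              if ga is not None and gb is not None]
    return mean(shared), pstdev(shared)


# Hypothetical genotypes at three markers
twin = [(1, 2), (3, 3), (2, 4)]
print(ibs_summary(twin, twin))  # (2, 0.0)
```

A pair with mean IBS of 2 and no spread, as in the usage line, is what duplicates or MZ twins look like on the GRR plot.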

  • Load grr.ped. Tick all pairs

  • GRR is good for finding:

    MZ pairs labeled as sib-pairs
    Duplicates
    Dads that aren't dads
    Full sibs who are half-sibs

  • Manipulating Data Files: Pedmerge

  • Manipulating Data Files: Pedmerge

    Combine multiple data files
    Remove columns from a ped file
    Recode the dat file so unwanted columns are skipped
    Assumes ped and dat files have the same prefix: example.ped, example.dat

  • Type pedmerge

  • Checking for genotype error: Pedstats

  • Usage: pedstats.exe -p pedstats.ped -d pedstats.dat

  • Summarizes pedigree

  • Trait summary

  • Pedstats will crash if there are Mendelian errors

    Draw a diagram for this family

    fam  id  dad  mum  sex  A1  A2
    1    1   0    0    m    3   2
    1    2   0    0    f    2   1
    1    3   1    2    m    2   3
    1    4   1    2    f    3   3
    1    5   1    2    f    3   3

  • 3/2  2/1  2/3  3/3  3/3

  • Mendelian errors

    Try to localize the error
    Short term solution: delete the bad genotypes
    Long term solution: retype the family at this marker
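To help localize an error, each child's genotype can be checked against both parents. The sketch below does this for the example family from the Pedstats slide. It is an illustration only, not how Pedstats or Merlin detect errors, and it reports the children where an inconsistency shows up, which does not tell you whether the child or a parent was mistyped.

```python
FAMILY = [
    # (id, dad, mum, sex, allele 1, allele 2); dad/mum of 0 marks a founder
    (1, 0, 0, "m", 3, 2),
    (2, 0, 0, "f", 2, 1),
    (3, 1, 2, "m", 2, 3),
    (4, 1, 2, "f", 3, 3),
    (5, 1, 2, "f", 3, 3),
]


def is_consistent(child, dad, mum):
    """True if one child allele can come from dad and the other from mum."""
    c1, c2 = child
    return (c1 in dad and c2 in mum) or (c2 in dad and c1 in mum)


def mendelian_errors(family):
    """Return ids of non-founders whose genotype conflicts with their parents."""
    genotypes = {pid: (a1, a2) for pid, _, _, _, a1, a2 in family}
    return [pid for pid, dad, mum, _, a1, a2 in family
            if dad and mum
            and not is_consistent((a1, a2), genotypes[dad], genotypes[mum])]


print(mendelian_errors(FAMILY))  # [4, 5]
```

Individuals 4 and 5 are flagged because a 3/3 child needs a 3 allele from each parent, and the mother is 2/1.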

  • After fixing the problems

  • Merlin

  • MERLIN

    Automates simple linkage tests (black box)
    Uses fast multipoint calculations to generate IBD and kinship matrices
    Key options are --vc (variance components analysis) and --useCovariates (user-specified covariates)
    Means model: can incorporate user-specified covariates
    Variance components model

  • Merlin's Standard Variance Components Model - AQE

    Environmental component (E): non-shared, uses identity matrix
    Additive polygenic component (A): shared among relatives, according to kinship matrix
    QTL component (Q): shared when individuals are IBD, kinship matrix at marker

  • What is a Kinship Coefficient?

    Kinship coefficient (φ): probability that two alleles sampled at random, one from each individual, are identical by descent
    2 × φij = expected proportion of alleles IBD across the genome for individuals i and j, but this will vary at each locus
    For MZ twins φ = .5
    For full sibs φ = .25
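The full-sib value can be reproduced from the pedigree alone with the standard recursive definition of kinship. The sketch below is a generic textbook recursion, not Merlin's implementation; the pedigree and the individual names in it are hypothetical, and it assumes parents are listed before their children.

```python
from functools import lru_cache

# id: (father, mother); None marks a founder. Parents must precede children.
PEDIGREE = {
    "dad": (None, None),
    "mum": (None, None),
    "sib1": ("dad", "mum"),
    "sib2": ("dad", "mum"),
}
ORDER = {pid: n for n, pid in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Probability that one allele drawn from i and one from j are IBD."""
    if i is None or j is None:
        return 0.0  # unknown parents of founders are treated as unrelated
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    if ORDER[i] > ORDER[j]:
        i, j = j, i  # recurse through the later-listed person, never an ancestor of the other
    father, mother = PEDIGREE[j]
    return 0.5 * (kinship(i, father) + kinship(i, mother))


print(kinship("sib1", "sib2"))  # 0.25
print(kinship("sib1", "sib1"))  # 0.5
```

A pedigree recursion like this cannot produce the MZ twin value on its own, since a pedigree file records MZ twins as ordinary siblings; that is what the zygosity field is for.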

  • General covariance model
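One common way to write the expected covariance between relatives i and j under an AQE model is additive variance times 2 × kinship, plus QTL variance times the estimated IBD proportion at the marker, plus environmental variance on the diagonal only. The sketch below builds that matrix for one family. The function name and the variance values in the example are hypothetical, and the formula is the standard textbook form rather than something taken from the slide.

```python
def covariance_matrix(kinship, pihat, var_a, var_q, var_e):
    """Expected trait covariance matrix for one family under an AQE model.

    kinship: matrix of kinship coefficients (0.5 on the diagonal if not inbred)
    pihat:   matrix of estimated IBD proportions at the marker (1 on the diagonal)
    """
    n = len(kinship)
    return [
        [
            var_a * 2 * kinship[i][j]          # additive polygenic component
            + var_q * pihat[i][j]              # QTL component
            + (var_e if i == j else 0.0)       # non-shared environment
            for j in range(n)
        ]
        for i in range(n)
    ]


# A sib pair sharing half their alleles IBD at the marker (hypothetical variances)
sib_kinship = [[0.5, 0.25], [0.25, 0.5]]
sib_pihat = [[1.0, 0.5], [0.5, 1.0]]
omega = covariance_matrix(sib_kinship, sib_pihat, var_a=0.5, var_q=0.2, var_e=0.3)
print(omega)
```

With these values the diagonal is the total variance (about 1.0) and the sib covariance is about 0.35; the off-diagonal term rises as the pair shares more alleles IBD at the marker, which is the signal the linkage test picks up.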

  • Practical overview

    Using the LDL data from chromosome 19 (yesterday afternoon's practical)
    Data cleaning
    Merging phenotype and genotype data
    Checking your data with pedstats
    VC analysis in MERLIN
    MERLIN-regress analysis
    Comparison of MERLIN vs Mx

  • Step #1: combining phenotypes and genotypes

    Start with four files:
    pheno.ped + pheno.dat (phenotype data)
    geno.ped + geno.dat (genotype data)

    Combine .ped files and combine .dat files using pedmerge to create 1 pedigree file and 1 .dat file

  • Practical #1: commands

    Have a look at your files: head
    Combine your pedigree files and dat files: pedmerge pheno geno linkage
    Check your file using the head command

    pedmerge      Calls up the programme
    pheno geno    Names of the two sets of files to be combined (N.B. the matching .ped and .dat files must have the same name)
    linkage       Name of the newly created .ped and .dat files

  • linkage.ped

  • Step #2: checking your data with pedstats

    Pedstats provides preliminary data checks:
    Initial check of input files
    Pedigree consistency
    Information on genetic marker data
    Marker heterozygosity
    Proportion of individuals genotyped
    Tests of Hardy Weinberg equilibrium

  • Prac #2: commands

    ./pedstats -x-9999.000 -d linkage.dat -p linkage.ped > prac2.out

    pedstats          Calls up the programme
    -x-9999.000       Specifies the missing value
    -d linkage.dat    Identify the .dat file
    -p linkage.ped    Identify the .ped file
    > prac2.out       Send the output to a text file

  • Step #3: running VC linkage

    ./merlin --vc -x -9999.000 -p linkage.ped -d linkage.dat -m linkage.map > linkage.out

    merlin                                          Calls up the programme
    --vc -x -9999.000                               Specifies VC linkage and the missing value
    -p linkage.ped -d linkage.dat -m linkage.map    Identify the .ped, .dat, and .map files
    > linkage.out                                   Send the output to a text file

  • So why would we run Mx?

    Merlin cannot analyse ordinal data
    Limited correction for ascertainment
    Limited multivariate linkage: repeated measures using the mean and TRT correlation
    Only runs an AE model, no C or D

  • A 86% E 14%

    Chart 1 (AE model): change in -2LL (chi-square) by position (cM), MERLIN vs Mx

    cM   MERLIN chi-square   Mx chi-square   merlin lod   mx lod   regress lod
    0    9.59                9.59            2.05         2.05     1.537
    5    10.12               10.12           2.17         2.17     1.199
    10   12.46               12.46           2.67         2.67     0.725
    15   5.56                5.56            1.19         1.19     0.387
    20   3.24                3.24            0.69         0.69     0.097
    25   4.75                4.75            1.02         1.02     0.177
    30   1.79                1.79            0.38         0.38     0.279
    35   1.46                1.46            0.31         0.31     0.439
    40   1.98                1.98            0.42         0.42     0.44
    45   1.34                1.34            0.29         0.29     0.488
    50   1.30                1.30            0.28         0.28     0.441

  • A 60% C 30% E 10%

    Chart 2 (ACE model): change in -2LL (chi-square) by position (cM), MERLIN vs Mx

    cM   MERLIN chi-square   Mx chi-square   merlin lod   mx lod
    0    20.20               22.91           4.33         4.91
    5    16.60               20.95           3.55         4.49
    10   10.72               14.21           2.30         3.04
    15   6.14                7.95            1.31         1.70
    20   2.44                4.06            0.52         0.87
    25   2.33                3.30            0.50         0.71
    30   2.66                3.24            0.57         0.69
    35   3.45                4.05            0.74         0.87
    40   3.36                3.46            0.72         0.74
    45   3.44                3.43            0.74         0.73
    50   3.16                3.23            0.68         0.69

  • Merlin Regress

  • Aim: to develop a regression-based method that

    Has same power as maximum likelihood variance components, for sib pair data
    Will generalise to general pedigrees
    Is computationally efficient

  • Multivariate Regression Model

    Weighted Least Squares Estimation
    Weight matrix based on IBD information
    Dependent variables = IBD
    Independent variables = Trait

  • General approach

    Standard regression-based methods model the trait (D², S²) in terms of estimated IBD status: Y = α + βπ̂ + e
    Instead, the IBD estimate is regressed on the trait value: π̂ = α + βY + e

  • Extend to general pedigrees: π̂ = α + βY + e

  • Dependent Variables: estimated IBD sharing of all pairs of relatives. Example:

  • Independent Variables: squares and cross-products (equivalent to non-redundant squared sums and differences). Example:
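The equivalence between squares and cross-products on one hand and squared sums and differences on the other follows from expanding (x + y)² and (x − y)² for a pair of trait values. A small sketch with made-up, mean-centred trait values (the function names are illustrative only):

```python
def sums_and_differences(x, y):
    """Return (S^2, D^2): the squared sum and squared difference of two traits."""
    return (x + y) ** 2, (x - y) ** 2


def squares_and_cross_product(s2, d2):
    """Recover (x^2 + y^2, x*y) from the squared sum and squared difference."""
    return (s2 + d2) / 2, (s2 - d2) / 4


# Hypothetical mean-centred trait values for one pair of relatives
s2, d2 = sums_and_differences(3, -1)
print(squares_and_cross_product(s2, d2))  # (10.0, -3.0)
```

Here 3² + (−1)² = 10 and 3 × (−1) = −3, so either set of variables carries the same information.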

  • Estimation: for a family, regression model is

    Estimate Q by weighted least squares, and obtain sampling variance, family by family
    Combine estimates across families, inversely weighted by their variance, to give overall estimate, and its sampling variance
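The last step, combining family-by-family estimates with weights inversely proportional to their variances, can be sketched as follows. This shows only the generic inverse-variance pooling step, not the Merlin-regress code, and the per-family numbers are invented for illustration.

```python
def combine_estimates(estimates, variances):
    """Inverse-variance weighted mean of per-family estimates and its variance."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * q for w, q in zip(weights, estimates)) / total_weight
    return pooled, 1.0 / total_weight


# Hypothetical per-family estimates of Q and their sampling variances
pooled_q, pooled_var = combine_estimates([1.0, 3.0], [1.0, 1.0])
print(pooled_q, pooled_var)  # 2.0 0.5
```

Families with small sampling variance (more informative families) pull the overall estimate towards their own value, and the pooled variance shrinks as families are added.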

  • Why is that better? Regression methods assume that the dependent variable (left hand side) is normally distributed

  • Distribution of pi-hat

  • Why is that better?

    But central limit theorem works well when data are symmetric with mode in the centre
    In a general pedigree, sib-pairs provide the most information on linkage
    IBD under null hypothesis (with complete inheritance information):

    IBD   Probability
    0     25%
    0.5   50%
    1     25%
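The 25% / 50% / 25% distribution for sib pairs can be checked by enumerating which of the two alleles each parent passes to each sib. A minimal sketch, assuming no linkage and equal transmission probabilities:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution():
    """Distribution of the proportion of alleles shared IBD by a sib pair."""
    counts = Counter()
    # Each sib receives allele 0 or 1 from dad and allele 0 or 1 from mum.
    for dad1, mum1, dad2, mum2 in product((0, 1), repeat=4):
        shared = (dad1 == dad2) + (mum1 == mum2)  # alleles shared IBD: 0, 1 or 2
        counts[Fraction(shared, 2)] += 1
    total = sum(counts.values())
    return {share: Fraction(n, total) for share, n in sorted(counts.items())}


for share, prob in sib_pair_ibd_distribution().items():
    print(float(share), float(prob))
```

The result is symmetric with its mode in the centre, which is the property the slide relies on.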

  • Selected Samples: Merlin-regress is particularly suited to the analysis of selected samples

    Ordinary variance component analysis (e.g. using Merlin) gives biased QTL estimates
    Merlin-regress is designed to be robust to data selection

  • Example Data BMI 10000 pairs

  • Selected Sample 500 pairs

  • Results VC

  • Results Merlin-Regress

  • Practical #4: running regress

    ./merlin-regress -x -9999.000 -p linkage.ped -d linkage.dat -m linkage.map --mean ? --variance ? --heritability ? > linkage2.out

    merlin-regress                                  Calls up the programme
    -x -9999.000                                    Specifies the missing value
    -p linkage.ped -d linkage.dat -m linkage.map    Identify the .ped, .dat, and .map files
    --mean ? --variance ? --heritability ?          Specify the mean, variance, and heritability from the whole population (Pedstats)
    > linkage2.out                                  Send the output to a text file
