
Prevalence, phenotype and architecture of developmental disorders caused by de novo mutation

The Deciphering Developmental Disorders Study

Abbreviations
PTV: Protein-Truncating Variant
DNM: De Novo Mutation
DD: Developmental Disorder
DDD: Deciphering Developmental Disorders study

Key Words
De novo mutation; Developmental Disease; Seizures; Intellectual Disability; PhenIcons; Average Faces; ANKRD11; ARID1B; KMT2A; DDX3X; ADNP; MED13L; DYRK1A; EP300; SCN2A; SETD5; KCNQ2; MECP2; SYNGAP1; ASXL3; SATB2; TCF4; CDK13; CREBBP; DYNC1H1; FOXP1; PPP2R5D; PURA; CTNNB1; KAT6A; SMARCA2; STXBP1; EHMT1; ITPR1; KAT6B; NSD1; SMC1A; TBL1XR1; CASK; CHD2; CHD4; HDAC8; USP9X; WDR45; AHDC1; CSNK2A1; GNAI1; GNAO1; HNRNPU; KANSL1; KIF1A; MEF2C; PACS1; SLC6A1; CNOT3; CTCF; EEF1A2; FOXG1; GATAD2B; GRIN2B; IQSEC2; POGZ; PUF60; SCN8A; TCF20; BCL11A; BRAF; CDKL5; NFIX; PTPN11; AUTS2; CHAMP1; CNKSR2; DNM1; KCNH1; NAA10; PPM1D; ZBTB18; ZMYND11; ASXL1; COL4A3BP; KCNQ3; MSL3; MYT1L; PDHA1; PPP2R1A; SMAD4; TRIO; WAC; CHD8; GABRB3; KDM5B; PTEN; QRICH1; SET; ZC4H2; ALG13; SCN1A; SUV420H1; SLC35A2

bioRxiv preprint first posted online Apr. 20, 2016; doi: http://dx.doi.org/10.1101/049056. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Abstract

Individuals with severe, undiagnosed developmental disorders (DDs) are enriched for damaging de novo mutations (DNMs) in developmentally important genes. We exome sequenced 4,293 families with individuals with DDs, and meta-analysed these data with published data on 3,287 individuals with similar disorders. We show that the most significant factors influencing the diagnostic yield of de novo mutations are the sex of the affected individual, the relatedness of their parents and the age of both father and mother. We identified 94 genes enriched for damaging de novo mutation at genome-wide significance (P < 7×10⁻⁷), including 14 genes for which compelling data for causation was previously lacking. We have characterised the phenotypic diversity among these genetic disorders. We demonstrate that, at current cost differentials, exome sequencing has much greater power than genome sequencing for novel gene discovery in genetically heterogeneous disorders. We estimate that 42% of our cohort carry pathogenic DNMs (single nucleotide variants and indels) in coding sequences, with approximately half operating by a loss-of-function mechanism, and the remainder resulting in altered-function (e.g. activating, dominant negative). We established that most haploinsufficient developmental disorders have already been identified, but that many altered-function disorders remain to be discovered. Extrapolating from the DDD cohort to the general population, we estimate that developmental disorders caused by DNMs have an average birth prevalence of 1 in 213 to 1 in 448 (0.22-0.47% of live births), depending on parental age.

Main text

Approximately 2-5% of children are born with major congenital malformations and/or manifest severe neurodevelopmental disorders during childhood1,2. While diverse mechanisms can cause such developmental disorders, including gestational infection and maternal alcohol consumption, damaging genetic variation in developmentally important genes makes a major contribution. Several recent studies have identified a substantial causal role for DNMs not present in either parent3-15. Despite the identification of many developmental disorders caused by DNMs, it is generally accepted that many more such disorders await discovery15, and the overall contribution of DNMs to developmental disorders is not known. Moreover, some pathogenic DNMs completely ablate the function of the encoded protein, whereas others alter the function of the encoded protein16; the relative contributions of these two mechanistic classes are also not known.

We recruited 4,293 individuals to the Deciphering Developmental Disorders (DDD) study15. Each of these individuals was referred with a severe undiagnosed developmental disorder, and most were the only affected family member. We systematically phenotyped these individuals and sequenced their exomes and those of their parents. Analyses of 1,133 of these trios were described previously15,17. We generated a high-sensitivity set of 8,361 candidate DNMs in coding or splicing sequence (mean of 1.95 DNMs per proband), while removing systematic erroneous calls (Supplementary Table 1). 1,624 genes contained two or more DNMs in unrelated individuals. Twenty-three percent of individuals had likely pathogenic protein-truncating or missense DNMs within the clinically curated set of genes robustly associated with dominant developmental disorders17.

We investigated factors associated with whether an individual
had a likely pathogenic DNM in these curated genes (Figure 1A,B). We observed that males had a lower chance of carrying a likely pathogenic DNM (P = 1.8×10⁻⁴; OR 0.75, 95% CI 0.65-0.87), as has also been observed in autism18. We also observed an increased likelihood of having a pathogenic DNM with increasing extent of speech delay (P = 0.00123), but not with other indicators of severity relative to the rest of the cohort. Furthermore, the total genomic extent of autozygosity (due to parental relatedness) was negatively correlated with the likelihood of having a pathogenic DNM (P = 1.7×10⁻⁷): for every tenfold (one log10 unit) increase in autozygous length, the probability of having a pathogenic DNM dropped by 7.5%, likely due to an increasing burden of recessive causation (Figure 1C). Nonetheless, 6% of individuals with autozygosity equivalent to a first cousin union or greater had a plausibly pathogenic DNM, underscoring the importance of considering de novo causation in all families.

Paternal age has been shown to be the primary factor influencing the number of DNMs in a child19,20, and thus is expected to be a risk factor for pathogenic DNMs. Paternal age was only weakly associated with the likelihood of having a pathogenic DNM (P = 0.016). However, focusing on the minority of DNMs that were truncating and missense variants in known DD-associated genes limits our power to detect such an effect. Analysing all 8,409 high confidence exonic and intronic autosomal DNMs confirmed a strong paternal age effect (P = 1.4×10⁻¹⁰; 1.53 DNMs/year, 95% CI 1.07-2.01), as well as highlighting a weaker, independent, maternal age effect (P = 0.0019; 0.86 DNMs/year, 95% CI 0.32-1.40; Figure 1D,E), as has recently been described in whole genome analyses21.

We identified genes significantly enriched for damaging DNMs by comparing the observed gene-wise DNM count to that expected under a null mutation model22, as described previously15. We combined this analysis with 4,224 published DNMs in 3,287 affected individuals from thirteen exome or genome sequencing studies (Supplementary Table 2)3-14 that exhibited a similar excess of DNMs in our curated set of DD-associated genes (Supplementary Figure 1). We found 93 genes with genome-wide significance (P < 7×10⁻⁷, Figure 2), 80 of which had prior evidence of DD association (Supplementary Table 3). We have developed visual summaries of the phenotypes associated with each gene to facilitate clinical use. In addition, we created anonymised average face images from individuals with DNMs in genome-wide significant genes (Figure 2). These images highlight facial dysmorphologies specific to certain genes.

To assess any increase in power to detect novel DD-associated genes, we excluded individuals with likely pathogenic variants in known DD-associated genes15, leaving 3,158 probands from our cohort, along with 2,955 probands from the meta-analysis studies. In this subset, fourteen genes for which no statistically compelling prior evidence for DD causation was available achieved genome-wide significance: CDK13, CHD4, CNOT3, CSNK2A1, GNAI1, KCNQ3, MSL3, PPM1D, PUF60, QRICH1, SET, SUV420H1, TCF20 and ZBTB18 (P < 7×10⁻⁷, Table 1, Supplementary Figure 4). The clinical features associated with these newly confirmed disorders are summarised in Figure 3, Supplementary Figure 2 and Supplementary Figure 3. QRICH1 would not achieve genome-wide significance without excluding individuals with likely pathogenic variants in DD-associated genes.

In addition to discovering novel DD-associated genes, we identified several new disorders linked to known DD-associated genes, but with different modes of inheritance or molecular mechanisms. We found that USP9X and ZC4H2 had a genome-wide significant excess of DNMs in female probands, indicating that these genes have X-linked dominant modes of inheritance in addition to the previously reported X-linked recessive mode
of inheritance in males23,24. In addition, we found that truncating mutations in SMC1A were strongly associated with a novel seizure disorder (P = 6.5×10⁻¹⁹), whereas in-frame/missense mutations in SMC1A with dominant negative effects25 are a known cause of Cornelia de Lange Syndrome (CdLS). Individuals with truncating mutations in SMC1A lacked the characteristic facial dysmorphology of CdLS.

We then explored two approaches for integrating phenotypic data into disease gene association: statistical assessment of Human Phenotype Ontology (HPO) term similarity between individuals sharing candidate DNMs in the same gene (as we described previously26), and phenotypic stratification based on specific clinical characteristics. Combining genetic evidence and HPO term similarity increased the significance of some known DD-associated genes. However, significance decreased for a larger number of genes causing severe DD but associated with nondiscriminatory HPO terms (Supplementary Figure 5A). Although we did not incorporate categorical phenotypic similarity into the gene discovery analyses described above, the systematic acquisition of phenotypic data on affected individuals within DDD enabled aggregate representations to be created for each gene achieving genome-wide significance. We present these in the form of icon-based summaries of growth and developmental milestones (PhenIcons), heatmaps of the recurrently coded HPO terms and, where sufficient face images were available, an anonymised average facial representation (Supplementary Figure 3).

Twenty percent of individuals had HPO terms indicating seizures and/or epilepsy. We compared analysis within this phenotypically stratified group with gene-wise analyses of the entire cohort, to see whether stratification increased power to detect known seizure-associated genes (Supplementary Figure 5B). Fifteen seizure-associated genes were genome-wide significant in both the seizure-only and the entire-cohort analyses. Nine seizure-associated genes were genome-wide significant in the entire cohort but not in the seizure subset. Of the 285 individuals with truncating or missense DNMs in known seizure-associated genes, 56% had no coded terms related to seizures/epilepsy. These findings suggest that, because individuals with and without epilepsy in our cohort share a genetic etiology, the gain in power from a larger sample size far outweighs any gain from restricting analysis to a phenotypically specific subgroup.

The large number of genome-wide significant genes identified in the analyses above allows us to empirically compare different experimental strategies for novel gene discovery in a genetically heterogeneous cohort. We compared the power of exome and genome sequencing to detect genome-wide significant genes, assuming that budget and not samples are limiting, under different scenarios of cost ratios and sensitivity ratios (Figure 4). At current cost ratios (an exome costs 30-40% of a genome) and with a plausible sensitivity differential (a genome detects 5% more exonic variants than an exome27), exome sequencing detects more than twice as many genome-wide significant genes. These empirical estimates were consistent with power simulations for identifying dominant loss-of-function genes (Supplementary Figure 6). In summary, while genome sequencing gives the greatest sensitivity to detect pathogenic variation in a single individual (or outside of the coding region), exome sequencing is more powerful for novel disease gene discovery (and, analogously, likely delivers a lower cost per diagnosis).
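
The budget argument above comes down to arithmetic on effective sample size. The sketch below takes the Methods' figure of 1,000 USD per individual for a genome, which means 3,000 USD per trio. It treats the exome cost as a fraction of the genome cost and the genome's sensitivity gain as a multiplier on the number of genome trios. It converts a budget into trio counts only. Turning trios into genome-wide significant genes needs the empirical subsampling described in the Methods, so these ratios are not gene counts. The function names are illustrative.

```python
"""Budget-limited comparison of exome vs genome trios (illustrative sketch)."""

GENOME_COST_PER_INDIVIDUAL_USD = 1_000  # assumption taken from the Methods
INDIVIDUALS_PER_TRIO = 3


def trios_per_budget(budget_usd: float, cost_per_trio_usd: float) -> int:
    """Number of whole trios a fixed budget can sequence."""
    return int(budget_usd // cost_per_trio_usd)


def effective_trio_ratio(exome_cost_fraction: float,
                         genome_sensitivity_gain: float = 1.05) -> float:
    """Exome:genome ratio of effective trios for coding DNM discovery.

    A fixed budget buys 1 / exome_cost_fraction times as many exome trios,
    while each genome trio is credited with `genome_sensitivity_gain` times
    as many detected coding DNMs.
    """
    return (1.0 / exome_cost_fraction) / genome_sensitivity_gain


if __name__ == "__main__":
    budget = 1_000_000  # hypothetical budget in USD
    genome_trio = GENOME_COST_PER_INDIVIDUAL_USD * INDIVIDUALS_PER_TRIO
    for fraction in (0.30, 0.35, 0.40):
        exome_trio = genome_trio * fraction
        print(f"exome at {fraction:.0%} of genome cost: "
              f"{trios_per_budget(budget, exome_trio)} exome vs "
              f"{trios_per_budget(budget, genome_trio)} genome trios; "
              f"effective ratio {effective_trio_ratio(fraction):.2f}")
```

Across the 30-40% cost range, exomes give roughly 2.4 to 3.2 times as many effective trios. Power to reach genome-wide significance grows non-linearly with sample size, so the ratio of significant genes need not equal this sample-size ratio.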

Our previous simulations suggested that analysis of a cohort of 4,293 DDD families ought to be able to detect approximately half of all haploinsufficient DD-associated genes at genome-wide significance15. Empirically, we have identified 47% (50/107) of haploinsufficient genes previously robustly associated with neurodevelopmental disorders17. We hypothesised that genetic testing prior to recruitment into our study may have depleted the cohort of the most clinically recognisable disorders. Indeed, we observed that the genes associated with the most clinically recognisable disorders had a significant, three-fold lower enrichment of truncating DNMs than other DD-associated genes (~40-fold vs ~120-fold enrichment, Figure 5A). After removing these most recognisable disorders from the analysis, we identified 55% (42/76) of the remaining haploinsufficient DD-associated genes. The known DD-associated haploinsufficient genes that did not reach genome-wide significance were clearly enriched for genes with lower mutability, which we would expect to reduce our power to detect them. We also identified DD-associated genes (e.g. NRXN2) with high mutability and low clinical recognisability, yet no signal of enrichment for DNMs in our cohort (Supplementary Figure 7). Our analyses call into question whether these genes really are associated with haploinsufficient neurodevelopmental disorders, and highlight the potential for well-powered gene discovery analyses to refute prior assumptions about disease gene associations.

We estimated the likely prevalence of pathogenic missense and truncating DNMs within our cohort by increasing the stringency of called DNMs until the observed number of synonymous DNMs equalled that expected under the null mutation model (Supplementary Figure 8A), and then quantifying the excess of observed missense and truncating DNMs across all genes (Figure 5B). We observed an excess of 576 truncating and 1,220 missense mutations, suggesting that 41.8% (1,796/4,293) of the cohort has a pathogenic DNM. This estimate of the number of excess missense and truncating DNMs in our cohort is robust to varying the stringency of DNM calling (Supplementary Figure 8B). The vast majority of synonymous DNMs are likely to be benign, as evidenced by their uniform distribution (Figure 5C) among genes irrespective of the genes' tolerance of truncating variation in the general population (as quantified by the pLI metric, the probability of being loss-of-function intolerant28). By contrast, missense and truncating DNMs are significantly enriched in genes with the highest probabilities of being intolerant of truncating variation (Figure 5D). Only 51% (923/1,796) of these excess missense and truncating DNMs are located in DD-associated dominant genes, with the remainder likely to affect genes not yet associated with DDs. A much higher proportion of the excess truncating DNMs (71%) than missense DNMs (42%) affected known DD-associated genes. This suggests that whereas most haploinsufficient DD-associated genes have already been identified, many DD-associated genes characterised by pathogenic missense DNMs remain to be discovered.

Understanding the mechanism of action of a monogenic disorder is an important prerequisite for designing therapeutic strategies29. We sought to estimate the relative proportion of altered-function and loss-of-function mechanisms among the excess DNMs in our cohort. We assumed that the vast majority of truncating mutations operate by a loss-of-function mechanism, and used two independent approaches to estimate the relative contribution of the two mechanisms among the excess missense DNMs (Methods). First, we used the observed ratio of truncating and missense DNMs within haploinsufficient DD-associated genes to estimate the proportion of the excess missense DNMs that likely act by
loss-of-function (Figure 5C). This approach estimated that 47% (95% CI 42-51%) of excess missense and truncating DNMs operate by loss-of-function, and 53% by altered-function mechanisms. Second, we took advantage of the different population genetic characteristics of known altered-function and loss-of-function DD-associated genes. Specifically, we observed that these two classes of DD-associated genes are differentially depleted of truncating variation in individuals without overt developmental disorders (pLI metric28). We modelled the observed pLI distribution of excess missense DNMs as a mixture of the pLI distributions of known altered-function and loss-of-function DD-associated genes (Figure 5E,F), and estimated that 63% (95% CI 50-76%) of excess missense DNMs likely act by altered-function mechanisms. Incorporating the truncating DNMs operating by a loss-of-function mechanism, this approach estimated that 57% (95% CI 48-66%) of excess missense and truncating DNMs operate by loss-of-function and 43% by altered-function mechanisms.

We estimated the birth prevalence of monoallelic developmental disorders in three steps. First, we used the germline mutation model to calculate the expected cumulative germline mutation rate of truncating DNMs in haploinsufficient DD-associated genes. Second, we scaled this upwards based on the composition of excess DNMs in the DDD cohort described above (see Methods). Third, we corrected for disorders that are under-represented in our cohort as a result of prior genetic testing (e.g. clinically recognisable disorders and large pathogenic CNVs identified by prior chromosomal microarray analysis). This gives a mean prevalence estimate of 0.34% (95% CI 0.31-0.37%), or 1 in 295 births. By factoring in the paternal and maternal age effects on the mutation rate (Figure 1), we modelled age-specific estimates of birth prevalence (Figure 6) that range from 1 in 448 (mother and father both aged 20) to 1 in 213 (mother and father both aged 45).

In summary, we have shown that de novo mutations account for approximately half of the genetic architecture of severe developmental disorders, and are split roughly equally between loss-of-function and altered-function mechanisms. Whereas most haploinsufficient DD-associated genes have already been identified, many activating and dominant negative DD-associated genes have so far eluded discovery. This elusiveness likely results from these disorders being individually rarer, each being caused by a relatively small number of missense mutations within a gene. Discovery of the remaining dominant developmental disorders requires larger studies and novel, more powerful analytical strategies for disease-gene association that leverage gene-specific patterns of population variation, specifically the observed depletion of damaging variation. The integration of accurate and complete quantitative and categorical phenotypic data into the analysis will improve the power to identify ultra-rare DDs with distinctive clinical presentations. We have estimated the mean birth prevalence of dominant monogenic developmental disorders to be around 1 in 295. This is greater than the combined impact of trisomies 13, 18 and 21 (ref. 30), and highlights the cumulative population morbidity and mortality imposed by these individually rare disorders.
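
Several headline figures above follow from simple arithmetic on the reported counts. The sketch below reproduces them using only numbers stated in the text:

- the excess DNM counts and the cohort size;
- the rounded percentages of excess DNMs that fall in known DD-associated genes;
- conversions between a prevalence expressed as a percentage and as "1 in N" births.

It does not reimplement the mutation-model-based prevalence estimate itself. The helper names are illustrative.

```python
"""Consistency checks on the cohort-level estimates (illustrative sketch)."""

COHORT_SIZE = 4_293
EXCESS_TRUNCATING = 576
EXCESS_MISSENSE = 1_220


def one_in(prevalence: float) -> float:
    """Convert a proportion (e.g. 0.0034) to N in '1 in N'."""
    return 1.0 / prevalence


def percent_from_one_in(n: float) -> float:
    """Convert '1 in N' to a percentage."""
    return 100.0 / n


excess_total = EXCESS_TRUNCATING + EXCESS_MISSENSE
fraction_with_pathogenic_dnm = excess_total / COHORT_SIZE

# 71% of excess truncating and 42% of excess missense DNMs fall in known
# DD-associated genes; with rounded percentages this recovers ~921 of the
# 923 reported.
in_known_genes = 0.71 * EXCESS_TRUNCATING + 0.42 * EXCESS_MISSENSE

print(f"excess DNMs: {excess_total}, i.e. {fraction_with_pathogenic_dnm:.1%} of the cohort")
print(f"excess DNMs in known DD genes (from rounded %): {in_known_genes:.0f}")
print(f"0.34% -> 1 in {one_in(0.0034):.0f}; 1 in 295 -> {percent_from_one_in(295):.3f}%")
print(f"1 in 448 -> {percent_from_one_in(448):.2f}%; 1 in 213 -> {percent_from_one_in(213):.2f}%")
```

The gap between "1 in 294" computed from 0.34% and the reported "1 in 295" is only rounding: 1 in 295 corresponds to 0.339%, which rounds to 0.34%.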

Methods

Family recruitment
At 24 clinical genetics centers within the United Kingdom (UK) National Health Service and the Republic of Ireland, 4,293 patients with severe, undiagnosed developmental disorders and their parents (4,125 families) were recruited and systematically phenotyped. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). Families gave informed consent for participation. Clinical data (growth measurements, family history, developmental milestones, etc.) were collected using a standard restricted-term questionnaire within DECIPHER31, and detailed developmental phenotypes for the individuals were entered using Human Phenotype Ontology (HPO) terms32. Saliva samples for the whole family and blood-extracted DNA samples for the probands were collected, processed and quality controlled as previously described15.

Exome sequencing
Genomic DNA (approximately 1 μg) was fragmented to an average size of 150 base-pairs (bp) and subjected to DNA library creation using established Illumina paired-end protocols. Adaptor-ligated libraries were amplified and indexed via polymerase chain reaction (PCR). A portion of each library was used to create an equimolar pool comprising eight indexed libraries. Each pool was hybridized to SureSelect ribonucleic acid (RNA) baits (Agilent Human All-Exon V3 Plus with custom ELID C0338371 and Agilent Human All-Exon V5 Plus with custom ELID C0338371) and sequence targets were captured and amplified in accordance with the manufacturer's recommendations. Enriched libraries were subjected to 75-base paired-end sequencing (Illumina HiSeq) following the manufacturer's instructions.
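
Pooling libraries in equimolar amounts requires converting each library's mass concentration to molarity, which depends on fragment length. The sketch below uses the common approximation of 660 g/mol per double-stranded base pair. The library names, concentrations and sizes in the example are hypothetical. The study's own pooling procedure is not described beyond the sentence above.

```python
"""Equimolar pooling of indexed libraries (illustrative sketch)."""
from __future__ import annotations

GRAMS_PER_MOL_PER_BP = 660.0  # approximate mass of one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA library concentration in ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (GRAMS_PER_MOL_PER_BP * mean_size_bp)


def equimolar_volumes_ul(libraries: dict[str, tuple[float, float]],
                         fmol_each: float) -> dict[str, float]:
    """Volume (uL) of each library needed to contribute `fmol_each` femtomoles.

    `libraries` maps a library name to (concentration ng/uL, mean size bp).
    Because 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    return {name: fmol_each / library_molarity_nm(conc, size)
            for name, (conc, size) in libraries.items()}


if __name__ == "__main__":
    # Eight hypothetical indexed libraries: (ng/uL, mean size in bp incl. adaptors)
    concentrations = [12.0, 9.5, 15.2, 8.1, 11.0, 10.4, 13.3, 7.7]
    pool = {f"lib{i}": (conc, 300.0) for i, conc in enumerate(concentrations, start=1)}
    for name, vol in equimolar_volumes_ul(pool, fmol_each=500.0).items():
        print(f"{name}: {vol:.2f} uL")
```

The fragment size to use is that of the final adaptor-ligated library, which is longer than the 150 bp sheared insert.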

Alignment and calling single nucleotide variants, insertions and deletions
Mapping of short-read sequences for each sequencing lanelet was carried out using the Burrows-Wheeler Aligner (BWA; version 0.59)33 backtrack algorithm with the GRCh37 1000 Genomes Project phase 2 reference (also known as hs37d5). Sample-level BAM improvement was carried out using the Genome Analysis Toolkit (GATK; version 3.1.1)34 and SAMtools (version 0.1.19)35. This consisted of a realignment of reads around known and discovered indels followed by base quality score recalibration (BQSR), with both steps performed using GATK. Lastly, SAMtools calmd was applied and indexes were created. Known indels for realignment were taken from the Mills Devine and 1000 Genomes Project Gold set and the 1000 Genomes Project phase low-coverage set, both part of the GATK resource bundle (version 2.2). Known variants for BQSR were taken from dbSNP 137, also part of the GATK resource bundle. Finally, single nucleotide variants (SNVs) and indels were called using the GATK HaplotypeCaller (version 3.2.2); this was run in multisample calling mode using the complete dataset. GATK Variant Quality Score Recalibration (VQSR) was then computed on the whole dataset and applied to the individual-sample variant call format (VCF) files. DeNovoGear (version 0.54)36 was used to detect SNV, insertion and deletion de novo mutations (DNMs) from child and parental exome data (BAM files).
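
The order of the sample-level steps can be made concrete as a list of command lines. The sketch below builds, but does not run, argument lists for:

- GATK 3-style indel realignment (RealignerTargetCreator, IndelRealigner);
- BQSR (BaseRecalibrator, then PrintReads with -BQSR);
- SAMtools calmd and indexing;
- a single multi-sample HaplotypeCaller call.

File names and the GATK jar path are hypothetical, and any further options the study used are not reported here. The lanelet-level BWA mapping, VQSR and DeNovoGear steps are omitted.

```python
"""Command plan for sample-level BAM improvement and joint calling (sketch)."""
import subprocess
from pathlib import Path

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # hypothetical jar path


def bam_improvement_plan(bam, ref, known_indels, dbsnp):
    """Ordered (argv, stdout_path) steps: realign, recalibrate, calmd, index."""
    stem = str(Path(bam).with_suffix(""))
    intervals, realigned = f"{stem}.intervals", f"{stem}.realigned.bam"
    table, recal, final = f"{stem}.recal.table", f"{stem}.recal.bam", f"{stem}.final.bam"
    known = [arg for vcf in known_indels for arg in ("-known", vcf)]
    return [
        (GATK + ["-T", "RealignerTargetCreator", "-R", ref, "-I", bam,
                 *known, "-o", intervals], None),
        (GATK + ["-T", "IndelRealigner", "-R", ref, "-I", bam,
                 "-targetIntervals", intervals, *known, "-o", realigned], None),
        (GATK + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned,
                 "-knownSites", dbsnp, "-o", table], None),
        (GATK + ["-T", "PrintReads", "-R", ref, "-I", realigned,
                 "-BQSR", table, "-o", recal], None),
        # samtools calmd writes the recomputed BAM to stdout.
        (["samtools", "calmd", "-b", recal, ref], final),
        (["samtools", "index", final], None),
    ]


def joint_calling_argv(bams, ref, out_vcf):
    """Single multi-sample HaplotypeCaller run over the complete set of BAMs."""
    argv = GATK + ["-T", "HaplotypeCaller", "-R", ref]
    for bam in bams:
        argv += ["-I", bam]
    return argv + ["-o", out_vcf]


def run(plan):
    """Execute a plan in order; requires the tools and input files to exist."""
    for argv, stdout_path in plan:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Hypothetical file names, printed as a dry run.
    plan = bam_improvement_plan("proband.bam", "hs37d5.fa",
                                ["mills_1000g_gold.vcf", "1000g_indels.vcf"], "dbsnp_137.vcf")
    for argv, stdout_path in plan:
        print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Keeping the plan as data separates the step order, which the text specifies, from execution details, which it does not.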

Variant annotation
Variants in the VCF were annotated with minor allele frequency (MAF) data from a variety of different sources. The MAF annotations used included data from four different populations of the 1000 Genomes Project37 (AMR, ASN, AFR and EUR), the UK10K cohort, the NHLBI GO Exome Sequencing Project (ESP), the Non-Finnish European (NFE) subset of the Exome Aggregation Consortium (ExAC) and an internal allele frequency generated using unaffected parents from the cohort. Variants in the VCF were annotated with the Ensembl Variant Effect Predictor (VEP)38 based on Ensembl gene build 76. The transcript with the most severe consequence was selected and all associated VEP annotations were based on the predicted effect of the variant on that particular transcript; where multiple transcripts shared the same most severe consequence, the canonical or longest was selected. We included an additional consequence for variants at the last base of an exon before an intron, where the final base is a guanine, since these variants appear to be as damaging as a splice donor variant26. We categorized variants into three classes by VEP consequence:

1. protein-truncating variants (PTV): splice donor, splice acceptor, stop gained, frameshift, initiator codon, and conserved exon terminus variant.

2. missense variants: missense, stop lost, inframe deletion, inframe insertion, coding sequence, and protein altering variant.

3. silent variants: synonymous.

De novo mutation filtering
We filtered candidate DNM calls to reduce the false positive rate while maximizing sensitivity, based on prior results from experimental validation by capillary sequencing of candidate DNMs15. Candidate DNMs were excluded if they were not called by GATK in the child, were called in either parent, or had a maximum MAF greater than 0.01. Candidate DNMs were excluded when the forward and reverse coverage differed between reference and alternative alleles, defined as P < 10⁻³ from a Fisher's exact test of coverage from orientation by allele summed across the child and parents. Candidate DNMs were also excluded if they met two of the following three criteria: 1) an excess of parental alternative alleles within the cohort at the DNM's position, defined as P < 10⁻³ under a one-sided binomial test given an expected error rate of 0.002 and the cumulative parental depth; 2) an excess of alternative alleles within the cohort in DNMs in the same gene, defined as P < 10⁻³ under a one-sided binomial test given an expected error rate of 0.002 and the cumulative depth; or 3) both parents had one or more reads supporting the alternative allele. If, after filtering, more than one variant was observed in a given gene for a particular trio, only the variant with the highest predicted functional impact was kept (protein truncating > missense > silent). Source code for filtering candidate DNMs can be found here: https://github.com/jeremymcrae/denovoFilter
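
The exclusion rules above can be sketched as a small filter. This is not the study's denovoFilter code linked above; it is a self-contained illustration. It assumes each candidate has already been annotated with the relevant read counts, and the field names are hypothetical. The sketch does four things:

1. Computes the two-sided Fisher's exact test for orientation bias.
2. Runs the one-sided binomial tests directly.
3. Applies the "two of three" rule.
4. Keeps the most damaging variant per gene per trio.

The consequence-to-class table follows the three classes listed above using Sequence Ontology term names. The label for the study's extra last-base-of-exon consequence is an assumption.

```python
"""Candidate DNM filtering and per-gene deduplication (illustrative sketch)."""
import math
from dataclasses import dataclass

ALPHA, ERROR_RATE, MAX_MAF = 1e-3, 0.002, 0.01
PTV = ["splice_donor_variant", "splice_acceptor_variant", "stop_gained",
       "frameshift_variant", "initiator_codon_variant",
       "conserved_exon_terminus_variant"]  # last label is a placeholder name
MISSENSE = ["missense_variant", "stop_lost", "inframe_deletion", "inframe_insertion",
            "coding_sequence_variant", "protein_altering_variant"]
CLASS_RANK = {**dict.fromkeys(PTV, 3), **dict.fromkeys(MISSENSE, 2),
              "synonymous_variant": 1}  # PTV > missense > silent


def binom_log_pmf(i, n, p):
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p), in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if k <= n * p:  # large tail: the complement is numerically fine
        return max(0.0, 1.0 - sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k)))
    total = 0.0
    for i in range(k, n + 1):  # terms shrink beyond the mode
        term = math.exp(binom_log_pmf(i, n, p))
        total += term
        if term < total * 1e-16:
            break
    return min(total, 1.0)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_hyper(x):  # log P(top-left cell == x) with fixed margins
        return (math.lgamma(row1 + 1) - math.lgamma(x + 1) - math.lgamma(row1 - x + 1)
                + math.lgamma(row2 + 1) - math.lgamma(col1 - x + 1)
                - math.lgamma(row2 - col1 + x + 1) - math.lgamma(n + 1)
                + math.lgamma(col1 + 1) + math.lgamma(n - col1 + 1))

    observed = log_hyper(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(math.exp(log_hyper(x)) for x in support
                        if log_hyper(x) <= observed + 1e-7))


@dataclass
class Candidate:  # hypothetical pre-annotated candidate DNM
    trio: str
    gene: str
    consequence: str
    called_in_child: bool
    called_in_parent: bool
    max_maf: float
    strand: tuple         # (ref_fwd, ref_rev, alt_fwd, alt_rev), summed over trio
    site_parental: tuple  # (alt reads, depth) in all cohort parents at this site
    gene_cohort: tuple    # (alt reads, depth) across the cohort in this gene's DNMs
    parent_alt: tuple     # (mother alt reads, father alt reads)


def passes_filters(c: Candidate) -> bool:
    if not c.called_in_child or c.called_in_parent or c.max_maf > MAX_MAF:
        return False
    if fisher_exact_two_sided(*c.strand) < ALPHA:  # orientation bias by allele
        return False
    flags = [
        binom_upper_tail(*c.site_parental, ERROR_RATE) < ALPHA,
        binom_upper_tail(*c.gene_cohort, ERROR_RATE) < ALPHA,
        all(reads > 0 for reads in c.parent_alt),
    ]
    return sum(flags) < 2  # exclude if two or more criteria are met


def most_damaging_per_gene(candidates):
    """Keep one variant per (trio, gene): PTV > missense > silent."""
    best = {}
    for c in candidates:
        key = (c.trio, c.gene)
        if key not in best or CLASS_RANK[c.consequence] > CLASS_RANK[best[key].consequence]:
            best[key] = c
    return list(best.values())
```

The binomial tail is summed in log space because cumulative cohort depths can run to many thousands of reads. At those depths, direct binomial coefficients would overflow a float.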

De novo mutation validation
For candidate DNMs of interest, primers were designed to amplify 150-250 bp products centered around the site of interest. Default Primer3 design settings were used with the following adjustments: GC clamp = 1, human mispriming library used. Site-specific primers were tailed with Illumina adapter sequences. PCR products were generated with JumpStart AccuTaq LA DNA polymerase (Sigma Aldrich), using 40 ng genomic DNA as template. Amplicons were tagged with Illumina PCR primers along with unique barcodes enabling multiplexing of 96 samples. Barcodes were incorporated using Kapa HiFi master mix (Kapa Biosystems). Samples were pooled and sequenced on one lane of the Illumina MiSeq, using 250 bp paired-end reads. An in-house analysis pipeline extracted the read count per site and classified the inheritance status of each variant using a maximum likelihood approach.

Individuals with likely pathogenic variants
We previously screened 1,133 individuals for variants that contribute to their disorder15,17. All candidate variants in the 1,133 individuals were reviewed by consultant clinical geneticists for relevance to the individuals' phenotypes. Most diagnosable pathogenic variants occurred de novo in dominant genes, but a small proportion also occurred in recessive genes or under other inheritance modes. DNMs within dominant DD-associated genes were very likely to be classified as the pathogenic variant for the individuals' disorder. Due to the time required to review individuals and their candidate variants, we did not conduct a similar review in the remainder of the 4,293 individuals. Instead, we defined likely pathogenic variants as candidate DNMs found in autosomal and X-linked dominant DD-associated genes, or candidate DNMs found in hemizygous DD-associated genes in males. Of the 4,293 individuals in the cohort, 1,136 either had variants previously classified as pathogenic15,17 or had a likely pathogenic DNM.

Gene-wise assessment of DNM significance
Gene-specific germline mutation rates for different functional classes were computed15,22 for the longest transcript in the union of transcripts overlapping the observed DNMs in that gene. We evaluated the gene-specific enrichment of PTV and missense DNMs by computing their statistical significance under a null hypothesis of the expected number of DNMs given the gene-specific mutation rate and the number of considered chromosomes22. We also assessed clustering of missense DNMs within genes15, as expected for DNMs operating by activating or dominant negative mechanisms. We did this by calculating simulated dispersions of the observed number of DNMs within the gene. The probability of simulating a DNM at a specific codon was weighted by the trinucleotide sequence context15,22. This allowed us to estimate the probability of the observed degree of clustering given the null model of random mutations. Fisher's method was used to combine the significance testing of missense + PTV DNM enrichment and missense DNM clustering. We defined a gene as significantly enriched for DNMs if the PTV enrichment P-value or the combined missense P-value was less than 7×10⁻⁷, which represents a Bonferroni corrected P-value of 0.05 adjusted for 4 × 18,500 tests (2 × consequence classes tested × protein coding genes).
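
The gene-wise test can be sketched with the Python standard library. The sketch makes these choices:

- a Poisson null for the DNM count, with expected count equal to the gene-specific mutation rate times the number of chromosomes considered;
- PTV and missense counts pooled for the enrichment test;
- missense clustering measured by a geometric mean of pairwise codon distances, a dispersion statistic chosen for illustration, against simulations weighted by per-codon mutation rates;
- the two P-values combined with Fisher's method and compared with the Bonferroni threshold stated above.

Function names and the exact clustering statistic are assumptions rather than the study's implementation.

```python
"""Gene-wise DNM enrichment and clustering test (illustrative sketch)."""
import itertools
import math
import random

THRESHOLD = 0.05 / (4 * 18_500)  # about 6.8e-7, quoted as 7e-7 in the text


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    if observed <= expected:  # large tail: the complement is accurate
        return max(0.0, 1.0 - sum(
            math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
            for i in range(observed)))
    total, k = 0.0, observed
    term = math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
    while term > 0 and term >= total * 1e-16:  # terms shrink once k > expected
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k d.f. under the null."""
    half_x = -sum(math.log(max(p, 1e-300)) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom.
    term = total = 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def dispersion(codons):
    """Geometric mean of pairwise codon distances (+1 so identical codons count)."""
    logs = [math.log(abs(a - b) + 1) for a, b in itertools.combinations(codons, 2)]
    return math.exp(sum(logs) / len(logs))


def clustering_p_value(codons, codon_rates, n_sim=10_000, seed=0):
    """Share of rate-weighted random placements at least as clustered as observed."""
    if len(codons) < 2:
        return 1.0
    rng = random.Random(seed)
    cum_weights = list(itertools.accumulate(codon_rates))
    positions = range(len(codon_rates))
    observed = dispersion(codons)
    hits = sum(
        dispersion(rng.choices(positions, cum_weights=cum_weights, k=len(codons))) <= observed
        for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


def gene_is_significant(ptv_obs, ptv_exp, mis_obs, mis_exp, mis_codons, codon_rates):
    """PTV enrichment alone, or missense+PTV enrichment combined with clustering."""
    ptv_p = poisson_upper_tail(ptv_obs, ptv_exp)
    enrichment_p = poisson_upper_tail(ptv_obs + mis_obs, ptv_exp + mis_exp)
    combined_p = fisher_combine([enrichment_p, clustering_p_value(mis_codons, codon_rates)])
    return min(ptv_p, combined_p) < THRESHOLD
```

The upper Poisson tail is summed directly rather than taken as one minus the CDF. At the P-values that matter here, often far below 10⁻⁷, the subtraction would lose precision.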

Composite face generation
Families were given the option to have photographs of the affected individual(s) uploaded within DECIPHER31. Using images of individuals with DNMs in the same gene, we generated de-identified realistic average faces (composite faces). Faces were detected using a discriminatively trained deformable part model detector39. The annotation algorithm identified a set of 36 landmarks per detected face40 and was trained on a manually annotated dataset of 3,100 images41. The average face mesh was created by the Delaunay triangulation of the average constellation of facial landmarks for all patients with a shared genetic disorder. The averaging algorithm is sensitive to left-right facial asymmetries across multiple patients. To address this, we used a template constellation of landmarks based on the average constellations of 2,000 healthy individuals41. For each patient, we aligned the constellation of landmarks to the template with respect to the points along the middle of the face, and computed the Euclidean distances between each landmark and its corresponding pair on the template. The faces were mirrored such that the half of the face with the greater difference was always on the same side. The dataset used for this work may contain multiple photos for one patient. To avoid biasing the average face mesh towards these individuals, we computed an average face for each patient and used these personal averages to compute the final average face. Finally, to prevent any single image from dominating the composite because of differences in illumination between images, we normalised pixel intensities within each face to the average value across all faces in that composite. The composite faces were examined manually to confirm successful ablation of any individually identifiable features.

Assessing power of incorporating phenotypic information
We previously described a method to assess phenotypic similarity by HPO terms among groups of individuals sharing genetic defects in the same gene26. We examined whether incorporating this statistical test improved our ability to identify dominant genes at genome-wide significance. For each gene, we tested the phenotypic similarity of individuals with DNMs in that gene. We combined the phenotypic similarity P-value with the genotypic P-value per gene (the minimum of the DDD-only and
meta-analysis P-values) using Fisher's method. We examined the distribution of differences in P-value between tests without the phenotypic similarity P-value and tests that incorporated it.

Many (854, 20%) of the DDD cohort experience seizures. We investigated whether testing within the subset of individuals with seizures improved our ability to find associations for seizure-specific genes. A list of 102 seizure-associated genes was curated from three sources: a gene panel for Ohtahara syndrome, a currently used clinical gene panel for epilepsy and a panel derived from DD-associated genes17. The P-values from the seizure subset were compared to P-values from the complete cohort.

Assessing power of exome vs genome sequencing
We compared the expected power of exome sequencing versus genome sequencing to identify disease genes. Within the DDD cohort, 55 dominant DD-associated genes achieve genome-wide significance when testing for enrichment of DNMs within genes. We did not
incorporatemissenseDNMclusteringduetothelargecomputationalrequirementsforassessingclusteringinmanyreplicates.Weassumedacostof1,000USDperindividualforgenomesequencing.Weallowedthecostofexomesequencingtovaryrelativetogenomesequencing,from10-100%.Wecalculatedthenumberoftriosthatcouldbesequencedunderthesescenarios.EstimatesoftheimprovedpowerofgenomesequencingtodetectDNMsinthecodingsequencearearound1.05-fold27andweincreasedthenumberoftriosby1.0–1.2-foldtoallowthis.Wesampledasmanyindividualsfromourcohortasthenumberoftriosandcountedwhichofthe55DD-associatedgenesstillachievedgenome-widesignificanceforDNMenrichment.Weran1000simulationsofeachconditionandobtainedthemeannumberofgenome-widesignificantgenesforeachcondition.AssociationswithpresenceoflikelypathogenicdenovomutationsWetestedwhetherphenotypeswereassociatedwiththelikelihoodofhavingalikelypathogenicDNM.Categoricalphenotypes(e.g.sexcodedasmaleorfemale)weretestedbyFisher’sexacttestwhilequantitativephenotypes(e.g.durationofgestationcodedinweeks)weretestedwithlogisticregression,usingsexasacovariate.WeinvestigatedwhetherhavingautozygousregionsaffectedthelikelihoodofhavingadiagnosticDNM.Autozygousregionsweredeterminedfromgenotypesineveryindividual,toobtainthetotallengthperindividual.WefittedalogisticregressionforthetotallengthofautozygousregionsonwhetherindividualshadalikelypathogenicDNM.ToillustratetherelationshipbetweenlengthofautozygosityandtheoccurrenceofalikelypathogenicDNM,wegroupedtheindividualsbylengthandplottedtheproportionofindividualsineachgroupwithaDNMagainstthemedianlengthofthegroup.TheeffectsofparentalageonthenumberofDNMswereassessedusing8,409highconfidence(posteriorprobabilityofDNM>0.5)unphasedcodingandnoncodingDNMsin4,293individuals.APoissonmultipleregressionwasfitonthenumberofDNMsineachindividualwithbothmaternalandpaternalageatthechild’sbirthascovariates.Themodelwasfitwiththeidentitylinkandallowedforoverdispersion.Thismodelusedexome-basedDNMs,andtheanalysiswasscaledtothewholegenomebymultiplyingthecoefficients
byafactorof50,basedon~2%ofthegenomebeingwellcoveredinourdata(exons+introns).ExcessofdenovomutationsbyconsequenceWeidentifiedthethresholdforposteriorprobabilityofDNMatwhichthenumberofobservedcandidatesynonymousDNMsequalledthenumberofexpectedsynonymousDNMs.CandidateDNMswithscoresbelowthisthresholdwereexcluded.WealsoexaminedthelikelysensitivityandspecificityofthisthresholdbasedonvalidationresultsforDNMswithinapreviouspublication15inwhichcomprehensiveexperimentalvalidationwasperformedon1,133triosthatcompriseasubsetofthefamiliesanalysedhere.ThenumbersofexpectedDNMspergenewerecalculatedperconsequencefromexpectedmutationratespergeneandthe2,407maleand1,886femalesinthecohort.Wecalculated

.CC-BY-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint. http://dx.doi.org/10.1101/049056doi: bioRxiv preprint first posted online Apr. 20, 2016;

12

theexcessofDNMsformissenseandPTVsastheratioofnumbersofobservedDNMsversusexpectedDNMs,aswellasthedifferenceofobservedDNMsminusexpectedDNMs.AscertainmentbiaswithindominantneurodevelopmentalgenesWeidentified150autosomaldominanthaploinsufficientgenesthataffectedneurodevelopmentwithinourcurateddevelopmentaldisordergeneset.Genesaffectingneurodevelopmentwereidentifiedwheretheaffectedorgansincludedthebrain,orwhereHPOphenotypeslinkedtodefectsinthegeneincludedeitheranabnormalityofbrainmorphology(HP:0012443)orcognitiveimpairment(HP:0100543)term.The150geneswereclassifiedforeaseofclinicalrecognitionofthesyndromefromgenedefectsbytwoconsultantclinicalgeneticists.Geneswereratedfrom1(leastrecognisable)to5(mostrecognisable).Categories1and2contained5and22genesrespectively,andsowerecombinedinlateranalyses.Theremainingcategorieshadmorethan33genespercategory.Theratioofobservedloss-of-functionDNMstoexpectedloss-of-functionDNMswascalculatedforeachrecognisabilitycategory,alongwith95%confidenceintervalsfromaPoissondistributiongivenobservedcounts.Proportionofdenovomutationswithloss-of-functionmechanismTheobservedexcessofmissense/inframeindelDNMsiscomposedofamixtureofDNMswithloss-of-functionmechanismsandDNMswithaltered-functionmechanisms.WefoundthattheexcessofPTVDNMswithindominanthaploinsufficientDD-associatedgeneshadagreaterskewtowardsgeneswithhighintoleranceforloss-of-functionvariantsthantheexcessofmissenseDNMsindominantnon-haploinsufficientgenes.Webinnedgenesbytheprobabilityofbeingloss-of-functionintolerant28constraintdecileandcalculatedtheobservedexcessofmissenseDNMsineachbin.Wemodelledthisbinneddistributionasatwo-componentmixturewiththecomponentsrepresentingDNMswithaloss-of-functionorfunction-alteringmechanism.Weidentifiedtheoptimalmixingproportionfortheloss-of-functionandaltered-functionDNMsfromthelowestgoodness-of-fit(fromasplinefittedtothesum-of-squaresofthedifferencesperdecile)tomissense/inframeindelsinallgenesacrossarangeofmixtures.TheexcessofDNMswithaloss-of-functionmechanismwascalc
ulatedastheexcessofDNMswithaVEPloss-of-functionconsequence,plustheproportionoftheexcessofmissenseDNMsattheoptimalmixingproportion.Weindependentlyestimatedtheproportionsofloss-of-functionandaltered-function.WecountedPTVandmissense/inframeindelDNMswithindominanthaploinsufficientgenestoestimatetheproportionofexcessDNMswithaloss-of-functionmechanism,butwhichwereclassifiedasmissense/inframeindel.WeestimatedtheproportionofexcessDNMswithaloss-of-functionmechanismasthePTVexcessplusthePTVexcessmultipliedbytheproportionofloss-of-functionclassifiedasmissense.PrevalenceofdevelopmentaldisordersfromdominantdenovomutationsWeestimatedthebirthprevalenceofmonoallelicdevelopmentaldisordersbyusingthegermlinemutationmodel.Wecalculatedtheexpectedcumulativegermlinemutationrate

.CC-BY-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint. http://dx.doi.org/10.1101/049056doi: bioRxiv preprint first posted online Apr. 20, 2016;

13

oftruncatingDNMsin238haploinsufficientDD-associatedgenes.WescaledthisupwardsbasedonthecompositionofexcessDNMsintheDDDcohortusingtheratioofexcessDNMs(n=1816)toDNMswithindominanthaploinsufficientDD-associatedgenes(n=412).Around10%ofDDsarecausedbydenovoCNVs42,43,whichareunderrepresentedinourcohortasaresultofpriorgenetictesting.Ifincluded,theexcessDNMinourcohortwouldincreaseby21%,thereforewescaledtheprevalenceestimateupwardsbythisfactor.Mothersaged29.9andfathersaged29.5havechildrenwith77DNMspergenomeonaverage20.WecalculatedthemeannumberofDNMsexpectedunderdifferentcombinationsofparentalages,givenourestimatesoftheextraDNMsperyearfromoldermothersandfathers.Wescaledtheprevalencetodifferentcombinationsofparentalagesusingtheratioofexpectedmutationsatagivenagecombinationtothenumberexpectedatthemeancohortparentalages.
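The sketch below covers two steps from Composite face generation: averaging per patient before averaging across patients, and mirroring each face so that its more asymmetric half is on a fixed side. It is a minimal illustration, not the study's pipeline. It assumes the landmarks are already detected and aligned to the template, with the facial midline on x = 0. The left/right index sets and partner pairs are hypothetical inputs, and only landmark coordinates are averaged. Triangulation, texture warping and intensity normalisation are not covered.

```python
from collections import defaultdict
from math import dist


def mean_constellation(shapes):
    """Landmark-wise mean of equally long lists of (x, y) landmarks."""
    n = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(len(shapes[0]))
    ]


def composite_constellation(photos):
    """Two-stage average of (patient_id, landmarks) pairs.

    Photos are first averaged within each patient, then the personal averages
    are averaged, so a patient with many photos still contributes once.
    """
    by_patient = defaultdict(list)
    for patient_id, landmarks in photos:
        by_patient[patient_id].append(landmarks)
    personal = [mean_constellation(shapes) for shapes in by_patient.values()]
    return mean_constellation(personal)


def mirror_to_canonical_side(landmarks, template, left_idx, right_idx, lr_pairs):
    """Mirror a face so that its half deviating more from the template is on the left.

    Assumes the landmarks are already aligned to the template with the facial
    midline on x = 0. left_idx, right_idx and lr_pairs (left/right partner
    landmarks) are hypothetical index sets describing the landmark scheme.
    """
    left_dev = sum(dist(landmarks[i], template[i]) for i in left_idx)
    right_dev = sum(dist(landmarks[i], template[i]) for i in right_idx)
    if left_dev >= right_dev:
        return list(landmarks)
    flipped = [(-x, y) for x, y in landmarks]
    # After reflecting x, swap partner landmarks so index i keeps its anatomical meaning.
    out = list(flipped)
    for a, b in lr_pairs:
        out[a], out[b] = flipped[b], flipped[a]
    return out
```

Take two photos of one patient with a landmark at x = 0 and x = 2, and one photo of a second patient at x = 4. The composite landmark sits at x = 2.5. A naive photo-level mean would give 2.0, which is the bias the two-stage average avoids.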
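Assessing power of incorporating phenotypic information combines each gene's phenotypic-similarity P-value with its genotypic P-value using Fisher's method. The statistic -2 Σ ln p follows a chi-squared distribution with 2k degrees of freedom, and its tail probability has a closed form. The sketch below is a generic implementation of the standard method, and the function name is ours. Like the method itself, it assumes the combined P-values are independent.

```python
import math


def fisher_combine(p_values):
    """Combine P-values with Fisher's method.

    X = -2 * sum(ln p) is chi-squared with 2k degrees of freedom under the null.
    For even degrees of freedom the upper tail is exp(-t) * sum_{i<k} t**i / i!,
    where t = X / 2, so no special functions are needed.
    """
    k = len(p_values)
    t = -sum(math.log(p) for p in p_values)  # t = X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= t / i  # t**i / i!, built incrementally
        total += term
    return math.exp(-t) * total
```

For example, a genotypic P-value of 1e-6 combined with a phenotypic-similarity P-value of 0.01 gives roughly 1.9e-7. Combining two P-values of 1 returns 1.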
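The sketch below covers the cost scenarios and subsampling in Assessing power of exome vs genome sequencing. The following come from the text: the 1,000 USD genome cost, the 10-100% exome cost range, the 1.0-1.2-fold genome adjustment and the 1,000 simulations. Everything else is a hypothetical placeholder. This includes the budget and `count_significant`, a callable that would rerun the DNM enrichment test on a subsample and count how many of the 55 genes still pass. Capping the sample at the cohort size is also our assumption.

```python
import random


def trios_affordable(budget_usd, cost_per_individual_usd):
    """Number of parent-child trios (three individuals each) a budget can sequence."""
    return int(budget_usd // (3 * cost_per_individual_usd))


def scenario_trios(budget_usd, exome_fraction, genome_boost, genome_cost_usd=1_000):
    """Trios affordable by exome vs genome sequencing for one cost scenario.

    exome_fraction: exome cost as a fraction of genome cost (0.1-1.0 in the text).
    genome_boost: 1.0-1.2-fold inflation of the genome trio count, standing in
    for the higher sensitivity of genome sequencing in coding sequence.
    """
    exome_trios = trios_affordable(budget_usd, exome_fraction * genome_cost_usd)
    genome_trios = round(trios_affordable(budget_usd, genome_cost_usd) * genome_boost)
    return exome_trios, genome_trios


def mean_significant_genes(cohort, n_trios, count_significant, n_sims=1000, seed=0):
    """Mean number of genome-wide significant genes over random subsamples.

    count_significant is a hypothetical callable that reruns the DNM enrichment
    test on a list of probands and returns how many genes pass.
    """
    rng = random.Random(seed)
    n = min(n_trios, len(cohort))  # cannot sample more probands than the cohort holds
    total = sum(count_significant(rng.sample(cohort, n)) for _ in range(n_sims))
    return total / n_sims
```

Sampling without replacement mirrors drawing a smaller cohort of distinct probands for each simulated study.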
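Associations with presence of likely pathogenic de novo mutations tests categorical phenotypes with a 2×2 Fisher's exact test, and Figure 1A reports the resulting odds ratios and P-values. The sketch below is a self-contained version of the standard two-sided test. It assumes rows are phenotype present/absent and columns are likely pathogenic DNM yes/no. In practice a library routine such as SciPy's `fisher_exact` would normally be used. The logistic regressions for quantitative phenotypes and autozygosity length are not covered here.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (odds_ratio, p_value). All tables with the observed margins are
    enumerated; the P-value sums every table that is no more probable than the
    observed one. Integer hypergeometric weights keep the comparison exact.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):  # proportional to P(top-left cell == x) given the margins
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    p_value = extreme / comb(n, col1)
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, p_value
```

For the table [[8, 2], [1, 5]] it returns an odds ratio of 20 and a two-sided P-value of 400/11440, about 0.035.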
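The sketch below covers the calibration in Excess of de novo mutations by consequence. It picks the posterior-probability cutoff at which observed candidate synonymous DNMs match the expected count, then computes the two excess measures. The function names are ours. Ranking candidates by posterior probability is our assumption about how the threshold was located; ties at the cutoff can retain slightly more sites than expected.

```python
def calibrate_threshold(synonymous_posteriors, expected_synonymous):
    """Posterior cutoff at which retained candidate synonymous DNMs match expectation.

    Keeping candidates with posterior >= cutoff retains about
    round(expected_synonymous) sites (more if posteriors are tied at the cutoff).
    """
    ranked = sorted(synonymous_posteriors, reverse=True)
    k = max(1, min(round(expected_synonymous), len(ranked)))
    return ranked[k - 1]


def excess(observed, expected):
    """Excess of DNMs as (observed / expected ratio, observed - expected difference)."""
    return observed / expected, observed - expected
```

Consider five candidates with posteriors 0.9, 0.8, 0.5, 0.2 and 0.1, and three expected synonymous DNMs. The cutoff is 0.5. Separately, 30 observed DNMs against 20 expected gives an excess ratio of 1.5 and an excess of 10.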
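Ascertainment bias within dominant neurodevelopmental genes reports an observed/expected ratio with a 95% interval for each recognisability category. One way to compute this is an exact Poisson interval on the observed count, treating the expected count as fixed. The sketch below uses the standard Garwood (exact) interval, solved by bisection. The text does not say which Poisson interval was used, so this choice is an assumption.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if lam == 0:
        return 1.0
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect_decreasing(f, target, lo, hi, iters=200):
    """Solve f(lam) == target on [lo, hi] for f decreasing in lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k, alpha=0.05):
    """Exact (Garwood) confidence interval for a Poisson mean given observed count k."""
    hi = k + 20 * math.sqrt(k + 1) + 20  # comfortably above the upper limit
    # Upper limit: P(X <= k | lam) = alpha / 2.
    upper = _bisect_decreasing(lambda lam: poisson_cdf(k, lam), alpha / 2, 0.0, hi)
    # Lower limit: P(X >= k | lam) = alpha / 2, i.e. P(X <= k - 1 | lam) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda lam: poisson_cdf(k - 1, lam), 1 - alpha / 2, 0.0, hi)
    return lower, upper


def obs_exp_ratio_ci(observed, expected, alpha=0.05):
    """Observed/expected ratio with an exact Poisson interval on the observed count."""
    lower, upper = exact_poisson_ci(observed, alpha)
    return observed / expected, lower / expected, upper / expected
```

For 10 observed DNMs, the exact 95% interval on the count is about 4.80 to 18.39. With 5 expected DNMs, the ratio is 2.0 with an interval of about 0.96 to 3.68.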
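Proportion of de novo mutations with loss-of-function mechanism fits a two-component mixture over constraint deciles. The sketch below treats this as a one-parameter fit. It is simplified in two ways. It normalises each per-decile excess profile to proportions, so it assumes positive totals. It also takes the grid point with the lowest sum of squares directly, rather than fitting a spline to the sum-of-squares curve as the text describes. The choice of reference profiles is our reading of the text: PTV excess in haploinsufficient genes for loss of function, and missense excess in non-haploinsufficient genes for altered function.

```python
def _normalise(values):
    total = sum(values)
    return [v / total for v in values]


def best_mixing_proportion(observed, lof_profile, altered_profile, step=0.01):
    """Grid-search the loss-of-function share of a two-component mixture.

    Each argument is an excess-DNM count per constraint decile: observed is the
    missense excess across all genes; lof_profile and altered_profile are the
    reference shapes for the two mechanisms. Returns the proportion whose mixture
    minimises the sum of squared differences across deciles.
    """
    obs, lof, alt = (_normalise(v) for v in (observed, lof_profile, altered_profile))
    n_steps = round(1 / step)
    best_p, best_sse = 0.0, float("inf")
    for i in range(n_steps + 1):
        p = i / n_steps
        sse = sum((o - (p * l + (1 - p) * a)) ** 2 for o, l, a in zip(obs, lof, alt))
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p


def lof_mechanism_excess(ptv_excess, missense_excess, lof_proportion):
    """Excess DNMs acting through loss of function: the PTV excess plus the
    loss-of-function share of the missense/inframe indel excess."""
    return ptv_excess + lof_proportion * missense_excess
```

Suppose the observed proportions are exactly a 30:70 blend of the two reference profiles. The fit then recovers a loss-of-function share of 0.3. A PTV excess of 100 and a missense excess of 200 would give 160 excess DNMs with a loss-of-function mechanism.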
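Prevalence of developmental disorders from dominant de novo mutations combines several multiplicative factors, sketched below. The following come from the text: the 1,816/412 ratio, the 21% CNV adjustment, the 77 DNMs at maternal and paternal ages of 29.9 and 29.5, and the ×50 exome-to-genome scaling. Three inputs are hypothetical because this section does not give them: the cumulative truncating mutation rate, the per-year parental-age coefficients and the cohort mean parental ages. A linear age effect is assumed, matching the identity-link Poisson model described under the parental-age analysis.

```python
REF_MATERNAL_AGE = 29.9   # years; reference point for the age model (ref. 20)
REF_PATERNAL_AGE = 29.5   # years (ref. 20)
REF_DNMS_PER_GENOME = 77  # mean DNMs per genome at these ages (ref. 20)
EXOME_TO_GENOME = 50      # ~2% of the genome well covered by the exome data


def expected_dnms(maternal_age, paternal_age, maternal_coef_exome, paternal_coef_exome):
    """Expected DNMs per genome under a linear (identity-link) parental-age model.

    The coefficients are exome-based extra DNMs per year of parental age and
    are scaled to the whole genome by EXOME_TO_GENOME.
    """
    return (REF_DNMS_PER_GENOME
            + EXOME_TO_GENOME * maternal_coef_exome * (maternal_age - REF_MATERNAL_AGE)
            + EXOME_TO_GENOME * paternal_coef_exome * (paternal_age - REF_PATERNAL_AGE))


def birth_prevalence(truncating_rate, maternal_age, paternal_age,
                     maternal_coef_exome, paternal_coef_exome,
                     cohort_maternal_age, cohort_paternal_age,
                     excess_all=1816, hi_gene_dnms=412, cnv_factor=1.21):
    """Scale a cumulative truncating-DNM rate in haploinsufficient DD genes to a
    birth prevalence for one combination of parental ages (hypothetical inputs)."""
    # All excess DNMs relative to those in dominant haploinsufficient genes,
    # then the upward adjustment for under-represented de novo CNVs.
    at_cohort_ages = truncating_rate * (excess_all / hi_gene_dnms) * cnv_factor
    # Ratio of mutations expected at the chosen ages vs the cohort mean ages.
    age_ratio = (expected_dnms(maternal_age, paternal_age,
                               maternal_coef_exome, paternal_coef_exome)
                 / expected_dnms(cohort_maternal_age, cohort_paternal_age,
                                 maternal_coef_exome, paternal_coef_exome))
    return at_cohort_ages * age_ratio
```

When the chosen parental ages equal the cohort means, the age ratio is 1. The estimate then reduces to the mutation rate multiplied by 1,816/412 and by 1.21.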

References

1. Sheridan, E. et al. Risk factors for congenital anomaly in a multiethnic birth cohort: an analysis of the Born in Bradford study. Lancet 382, 1350-9 (2013).

2. Ropers, H.H. Genetics of early onset cognitive impairment. Annu Rev Genomics Hum Genet 11, 161-87 (2010).

3. de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 367, 1921-9 (2012).

4. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209-215 (2014).

5. Epi4K Consortium & Epilepsy Phenome/Genome Project. De novo mutations in epileptic encephalopathies. Nature 501, 217-21 (2013).

6. EuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project & Epi4K Consortium. De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am J Hum Genet 95, 360-70 (2014).

7. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179-184 (2014).

8. Gilissen, C. et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344-7 (2014).

9. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216-221 (2014).

10. Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285-299 (2012).

11. O'Roak, B.J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 1-7 (2012).

12. Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674-82 (2012).

13. Sanders, S.J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237-41 (2012).

14. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220-3 (2013).

15. The Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223-228 (2015).

16. Wilkie, A.O. The molecular basis of genetic dominance. J Med Genet 31, 89-98 (1994).

17. Wright, C.F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet (2014).

18. Jacquemont, S. et al. A higher mutational burden in females supports a "female protective model" in neurodevelopmental disorders. Am J Hum Genet 94, 415-25 (2014).

19. Kong, A. et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 488, 471-5 (2012).

20. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat Genet 48, 126-33 (2016).

21. Wong, W.S. et al. New observations on maternal age effect on germline de novo mutations. Nat Commun 7, 10486 (2016).

22. Samocha, K.E. et al. A framework for the interpretation of de novo variation in human disease. Nat Genet 46, 944-950 (2014).

23. Hirata, H. et al. ZC4H2 mutations are associated with arthrogryposis multiplex congenita and intellectual disability through impairment of central and peripheral synaptic plasticity. Am J Hum Genet 92, 681-95 (2013).

24. Homan, C.C. et al. Mutations in USP9X are associated with X-linked intellectual disability and disrupt neuronal cell migration and growth. Am J Hum Genet 94, 470-8 (2014).

25. Liu, J. et al. SMC1A expression and mechanism of pathogenicity in probands with X-linked Cornelia de Lange syndrome. Hum Mutat 30, 1535-42 (2009).

26. Akawi, N. et al. Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat Genet 47, 1363-1369 (2015).

27. Meynert, A.M., Ansari, M., FitzPatrick, D.R. & Taylor, M.S. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 15, 247 (2014).

28. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv X, XX-XX (2015).

29. Boycott, K.M., Vanstone, M.R., Bulman, D.E. & Mackenzie, A.E. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet 14, 681-91 (2013).

30. Springett, A. et al. Congenital Anomaly Statistics 2011: England and Wales. (2013).

31. Bragin, E. et al. DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res 42, D993-D1000 (2014).

32. Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet 85, 457-464 (2009).

33. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).

34. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297-303 (2010).

35. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).

36. Ramu, A. et al. DeNovoGear: de novo indel and point mutation discovery and phasing. Nat Methods 10, 985-7 (2013).

37. Abecasis, G.R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012).

38. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069-70 (2010).

39. Felzenszwalb, P.F., Girshick, R.B., McAllester, D. & Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32, 1627-45 (2010).

40. Xiong, X. & De la Torre, F. Supervised Descent Method and its applications to face alignment. in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 532-539 (IEEE, Portland, OR, 2013).

41. Ferry, Q. et al. Diagnostically relevant facial gestalt information from ordinary photos. eLife 3, e02020 (2014).

42. Cooper, G.M. et al. A copy number variation morbidity map of developmental delay. Nat Genet 43, 838-46 (2011).

43. Sagoo, G.S. et al. Array CGH in patients with learning disability (mental retardation) and congenital anomalies: updated systematic review and meta-analysis of 19 studies and 13,926 subjects. Genet Med 11, 139-46 (2009).

Acknowledgments
We thank the families for their participation and patience. We are grateful to the Exome Aggregation Consortium for making their data available. The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant WT098051). The views expressed in this publication are those of the author(s) and not necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). The research team acknowledges the support of the National Institute for Health Research, through the Comprehensive Clinical Research Network. The authors wish to thank the Sanger Human Genome Informatics team, the Sample Management team, the Illumina High-Throughput team, the New Pipeline Group team, the DNA pipelines team and the Core Sequencing team for their support in generating and processing the data. D.R.F. is funded through an MRC Human Genetics Unit program grant to the University of Edinburgh. Finally, we gratefully acknowledge the contribution of two esteemed DDD clinical collaborators, John Tolmie and Louise Brueton, who died in the course of the study.

Author Contributions
Jeremy F McRae1, Stephen Clayton1, Tomas W Fitzgerald1, Joanna Kaplanis1, Elena Prigmore1, Diana Rajan1, Alejandro Sifrim1, Stuart Aitken2, Nadia Akawi1, Mohsan Alvi3, Kirsty Ambridge1, Daniel M Barrett1, Tanya Bayzetinova1, Philip Jones1, Wendy D Jones1, Daniel King1, Netravathi Krishnappa1, Laura E Mason1, Tarjinder Singh1, Adrian R Tivey1, Munaza Ahmed4, Uruj Anjum5, Hayley Archer6, Ruth Armstrong7, Jana Awada1, Meena Balasubramanian8, Siddharth Banka9, Diana Baralle4, Angela Barnicoat10, Paul Batstone11, David Baty12, Chris Bennett13, Jonathan Berg12, Birgitta Bernhard14, A Paul Bevan1, Maria Bitner-Glindzicz10, Edward Blair15, Moira Blyth13, David Bohanna16, Louise Bourdon14, David Bourn17, Lisa Bradley18, Angela Brady14, Simon Brent1, Carole Brewer19, Kate Brunstrom10, David J Bunyan4, John Burn17, Natalie Canham14, Bruce Castle19, Kate Chandler9, Elena Chatzimichali1, Deirdre Cilliers15, Angus Clarke6, Susan Clasper15, Jill Clayton-Smith9, Virginia Clowes14, Andrea Coates13, Trevor Cole16, Irina Colgiu1, Amanda Collins4, Morag N Collinson4, Fiona Connell20, Nicola Cooper16, Helen Cox16, Lara Cresswell21, Gareth Cross22, Yanick Crow9, Mariella D'Alessandro11, Tabib Dabir18, Rosemarie Davidson23, Sally Davies6, Dylan de Vries1, John Dean11, Charu Deshpande20, Gemma Devlin19, Abhijit Dixit22, Angus Dobbie13, Alan Donaldson24, Dian Donnai9, Deirdre Donnelly18, Carina Donnelly9, Angela Douglas25, Sofia Douzgou9, Alexis Duncan23, Jacqueline Eason22, Sian Ellard19, Ian Ellis25, Frances Elmslie5, Karenza Evans6, Sarah Everest19, Tina Fendick20, Richard Fisher17, Frances Flinter20, Nicola Foulds4, Andrew Fry6, Alan Fryer25, Carol Gardiner23, Lorraine Gaunt9, Neeti Ghali14, Richard Gibbons15, Harinder Gill26, Judith Goodship17, David Goudie12, Emma Gray1, Andrew Green26, Philip Greene2, Lynn Greenhalgh25, Susan Gribble1, Rachel Harrison22, Lucy Harrison4, Victoria Harrison4, Rose Hawkins24, Liu He1, Stephen Hellens17, Alex Henderson17, Sarah Hewitt13, Lucy Hildyard1, Emma Hobson13, Simon Holden7, Muriel Holder14, Susan Holder14, Georgina Hollingsworth10, Tessa Homfray5, Mervyn Humphreys18, Jane Hurst10, Ben Hutton1, Stuart Ingram8, Melita Irving20, Lily Islam16, Andrew Jackson2, Joanna Jarvis16, Lucy Jenkins10, Diana Johnson8, Elizabeth Jones9, Dragana Josifova20, Shelagh Joss23, Beckie Kaemba21, Sandra Kazembe21, Rosemary Kelsell1, Bronwyn Kerr9, Helen Kingston9, Usha Kini15, Esther Kinning23, Gail Kirby16, Claire Kirk18, Emma Kivuva19, Alison Kraus13, Dhavendra Kumar6, V.K. Ajith Kumar10, Katherine Lachlan4, Wayne Lam2, Anne Lampe2, Caroline Langman20, Melissa Lees10, Derek Lim16, Cheryl Longman23, Gordon Lowther23, Sally A Lynch26, Alex Magee18, Eddy Maher2, Alison Male10, Sahar Mansour5, Karen Marks5, Katherine Martin22, Una Maye25, Emma McCann27, Vivienne McConnell18, Meriel McEntagart5, Ruth McGowan11, Kirsten McKay16, Shane McKee18, Dominic J McMullan16, Susan McNerlan18, Catherine McWilliam11, Sarju Mehta7, Kay Metcalfe9, Anna Middleton1, Zosia Miedzybrodzka11, Emma Miles9, Shehla Mohammed20, Tara Montgomery17, David Moore2, Sian Morgan6, Jenny Morton16, Hood Mugalaasi6, Victoria Murday23, Helen Murphy9, Swati Naik16, Andrea Nemeth15, Louise Nevitt8, Ruth Newbury-Ecob24, Andrew Norman16, Rosie O'Shea26, Caroline Ogilvie20, Kai-Ren Ong16, Soo-Mi Park7, Michael J Parker8, Chirag Patel16, Joan Paterson7, Stewart Payne14, Daniel Perrett1, Julie Phipps15, Daniela T Pilz23, Martin Pollard1, Caroline Pottinger27, Joanna Poulton15, Norman Pratt12, Katrina Prescott13, Sue Price15, Abigail Pridham15, Annie Procter6, Hellen Purnell15, Oliver Quarrell8, Nicola Ragge16, Raheleh Rahbari1, Josh Randall1, Julia Rankin19, Lucy Raymond7, Debbie Rice12, Leema Robert20, Eileen Roberts24, Jonathan Roberts7, Paul Roberts13, Gillian Roberts25, Alison Ross11, Elisabeth Rosser10, Anand Saggar5, Shalaka Samant11, Julian Sampson6, Richard Sandford7, Ajoy Sarkar22, Susann Schweiger12, Richard Scott10, Ingrid Scurr24, Ann Selby22, Anneke Seller15, Cheryl Sequeira14, Nora Shannon22, Saba Sharif16, Charles Shaw-Smith19, Emma Shearing8, Debbie Shears15, Eamonn Sheridan13, Ingrid Simonic7, Roldan Singzon14, Zara Skitt9, Audrey Smith13, Kath Smith8, Sarah Smithson24, Linda Sneddon17, Miranda Splitt17, Miranda Squires13, Fiona Stewart18, Helen Stewart15, Volker Straub17, Mohnish Suri22, Vivienne Sutton25, Ganesh Jawahar Swaminathan1, Elizabeth Sweeney25, Kate Tatton-Brown5, Cat Taylor8, Rohan Taylor5, Mark Tein16, I Karen Temple4, Jenny Thomson13, Marc Tischkowitz7, Susan Tomkins24, Audrey Torokwa4, Becky Treacy7, Claire Turner19, Peter Turnpenny19, Carolyn Tysoe19, Anthony Vandersteen14, Vinod Varghese6, Pradeep Vasudevan21, Parthiban Vijayarangakannan1, Julie Vogt16, Emma Wakeling14, Sarah Wallwark7, Jonathon Waters10, Astrid Weber25, Diana Wellesley4, Margo Whiteford23, Sara Widaa1, Sarah Wilcox7, Emily Wilkinson1, Denise Williams16, Nicola Williams23, Louise Wilson10, Geoff Woods7, Christopher Wragg24, Michael Wright17, Laura Yates17, Michael Yau20, Chris Nellåker28,29,30, Michael J Parker31, Helen V Firth1,7,32, Caroline F Wright1,32, David R FitzPatrick1,2,32, Jeffrey C Barrett1,32, Matthew E Hurles1,32

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

2 MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK

3 Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK

4 Wessex Clinical Genetics Service, University Hospital Southampton, Princess Anne Hospital, Coxford Road, Southampton, SO16 5YA, UK and Wessex Regional Genetics Laboratory, Salisbury NHS Foundation Trust, Salisbury District Hospital, Odstock Road, Salisbury, Wiltshire, SP2 8BJ, UK and Faculty of Medicine, University of Southampton, Building 85, Life Sciences Building, Highfield Campus, Southampton, SO17 1BJ, UK

5 South West Thames Regional Genetics Centre, St George's Healthcare NHS Trust, St George's, University of London, Cranmer Terrace, London, SW17 0RE, UK

6 Institute of Medical Genetics, University Hospital of Wales, Heath Park, Cardiff, CF14 4XW, UK and Department of Clinical Genetics, Block 12, Glan Clwyd Hospital, Rhyl, Denbighshire, LL18 5UJ, UK

7 East Anglian Medical Genetics Service, Box 134, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK

8 Sheffield Regional Genetics Services, Sheffield Children's NHS Trust, Western Bank, Sheffield, S10 2TH, UK

9 Manchester Centre for Genomic Medicine, St Mary's Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK

10 North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street Hospital, Great Ormond Street, London, WC1N 3JH, UK

11 North of Scotland Regional Genetics Service, NHS Grampian, Department of Medical Genetics Medical School, Foresterhill, Aberdeen, AB25 2ZD, UK

12 East of Scotland Regional Genetics Service, Human Genetics Unit, Pathology Department, NHS Tayside, Ninewells Hospital, Dundee, DD1 9SY, UK

13 Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Department of Clinical Genetics, Chapel Allerton Hospital, Chapeltown Road, Leeds, LS7 4SA, UK

14 North West Thames Regional Genetics Centre, North West London Hospitals NHS Trust, The Kennedy Galton Centre, Northwick Park and St Mark's NHS Trust, Watford Road, Harrow, HA1 3UJ, UK

15 Oxford Regional Genetics Service, Oxford Radcliffe Hospitals NHS Trust, The Churchill, Old Road, Oxford, OX3 7LJ, UK

16 West Midlands Regional Genetics Service, Birmingham Women's NHS Foundation Trust, Birmingham Women's Hospital, Edgbaston, Birmingham, B15 2TG, UK

17 Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Foundation Trust, Institute of Human Genetics, International Centre for Life, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK

18 Northern Ireland Regional Genetics Centre, Belfast Health and Social Care Trust, Belfast City Hospital, Lisburn Road, Belfast, BT9 7AB, UK

19 Peninsula Clinical Genetics Service, Royal Devon and Exeter NHS Foundation Trust, Clinical Genetics Department, Royal Devon & Exeter Hospital (Heavitree), Gladstone Road, Exeter, EX1 2ED, UK

20 South East Thames Regional Genetics Centre, Guy's and St Thomas' NHS Foundation Trust, Guy's Hospital, Great Maze Pond, London, SE1 9RT, UK

21 Leicestershire Genetics Centre, University Hospitals of Leicester NHS Trust, Leicester Royal Infirmary (NHS Trust), Leicester, LE1 5WW, UK

22 Nottingham Regional Genetics Service, City Hospital Campus, Nottingham University Hospitals NHS Trust, The Gables, Hucknall Road, Nottingham NG5 1PB, UK

23 West of Scotland Regional Genetics Service, NHS Greater Glasgow and Clyde, Institute of Medical Genetics, Yorkhill Hospital, Glasgow, G3 8SJ, UK

24 Bristol Genetics Service (Avon, Somerset, Gloucs and West Wilts), University Hospitals Bristol NHS Foundation Trust, St Michael's Hospital, St Michael's Hill, Bristol, BS2 8DT, UK

25 Merseyside and Cheshire Genetics Service, Liverpool Women's NHS Foundation Trust, Department of Clinical Genetics, Royal Liverpool Children's Hospital Alder Hey, Eaton Road, Liverpool, L12 2AP, UK

26 National Centre for Medical Genetics, Our Lady's Children's Hospital, Crumlin, Dublin 12, Ireland

27 Department of Clinical Genetics, Block 12, Glan Clwyd Hospital, Rhyl, Denbighshire, Wales, LL18 5UJ, UK

28 Nuffield Department of Obstetrics & Gynaecology, University of Oxford, Level 3, Women's Centre, John Radcliffe Hospital, Oxford, OX3 9DU, UK

29 Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK

30 Big Data Institute, University of Oxford, Roosevelt Drive, Oxford, OX3 7LF, UK

31 The Ethox Centre, Nuffield Department of Population Health, University of Oxford, Old Road Campus, Oxford, OX3 7LF, UK

32 These authors jointly supervised this work.

Patient recruitment and phenotyping: M. Ahmed, U.A., H.A., R.A., M. Balasubramanian, S. Banka, D. Baralle, A. Barnicoat, P.B., D. Baty, C. Bennett, J. Berg, B.B., M.B-G., E.B., M. Blyth, D. Bohanna, L. Bourdon, D. Bourn, L. Bradley, A. Brady, C. Brewer, K.B., D.J.B., J. Burn, N. Canham, B.C., K.C., D.C., A. Clarke, S. Clasper, J.C-S., V.C., A. Coates, T.C., A. Collins, M.N.C., F.C., N. Cooper, H.C., L.C., G.C., Y.C., M.D., T.D., R.D., S. Davies, J.D., C. Deshpande, G.D., A. Dixit, A. Dobbie, A. Donaldson, D. Donnai, D. Donnelly, C. Donnelly, A. Douglas, S. Douzgou, A. Duncan, J.E., S. Ellard, I.E., F.E., K.E., S. Everest, T.F., R.F., F.F., N.F., A. Fry, A. Fryer, C.G., L. Gaunt, N.G., R.G., H.G., J.G., D.G., A.G., P.G., L. Greenhalgh, R. Harrison, L. Harrison, V.H., R. Hawkins, S. Hellens, A.H., S. Hewitt, E.H., S. Holden, M. Holder, S. Holder, G.H., T.H., M. Humphreys, J.H., S.I., M.I., L.I., A.J., J.J., L.J., D. Johnson, E.J., D. Josifova, S.J., B. Kaemba, S.K., B. Kerr, H.K., U.K., E. Kinning, G.K., C.K., E. Kivuva, A.K., D. Kumar, V.A.K., K.L., W.L., A.L., C. Langman, M.L., D.L., C. Longman, G.L., S.A.L., A. Magee, E. Maher, A. Male, S. Mansour, K. Marks, K. Martin, U.M., E. McCann, V. McConnell, M.M., R.M., K. McKay, S. McKee, D.J.M., S. McNerlan, C.M., S. Mehta, K. Metcalfe, Z.M., E. Miles, S. Mohammed, T.M., D.M., S. Morgan, J.M., H. Mugalaasi, V. Murday, H. Murphy, S.N., A. Nemeth, L.N., R.N-E., A. Norman, R.O., C.O., K-R.O., S-M.P., M.J. Parker, C. Patel, J. Paterson, S. Payne, J. Phipps, D.T.P., C. Pottinger, J. Poulton, N.P., K.P., S. Price, A. Pridham, A. Procter, H.P., O.Q., N.R., J. Rankin, L. Raymond, D. Rice, L. Robert, E. Roberts, J. Roberts, P.R., G.R., A.R., E. Rosser, A. Saggar, S. Samant, J.S., R. Sandford, A. Sarkar, S. Schweiger, R. Scott, I. Scurr, A. Selby, A. Seller, C.S., N.S., S. Sharif, C.S-S., E. Shearing, D.S., E. Sheridan, I. Simonic, R. Singzon, Z.S., A. Smith, K.S., S. Smithson, L.S., M. Splitt, M. Squires, F.S., H.S., V. Straub, M. Suri, V. Sutton, E. Sweeney, K.T-B., C. Taylor, R.T., M. Tein, I.K.T., J.T., M. Tischkowitz, S.T., A.T., B.T., C. Turner, P.T., C. Tysoe, A.V., V.V., P. Vasudevan, J.V., E. Wakeling, S. Wallwark, J.W., A.W., D. Wellesley, M. Whiteford, S. Wilcox, D. Williams, N.W., L.W., G.W., C.W., M. Wright, L.Y., M.Y., H.V.F., D.R.F.

Sample and data processing: S. Clayton, T.W.F., E.P., D. Rajan, K.A., D.M.B., T.B., P.J., N.K., L.E.M., A.R.T., A.P.B., S. Brent, E.C., I.C., E.G., S.G., L. Hildyard, B.H., R.K., D.P., M.P., J. Randall, G.J.S., S. Widaa, E. Wilkinson

Validation experiments: J.F.M., E.P., D. Rajan, A. Sifrim, N.K., C.F.W.

Study design: M.J. Parker, H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.

Method development and data analysis: J.F.M., S. Clayton, T.W.F., J.K., E.P., D. Rajan, A. Sifrim, S.A., N.A., M. Alvi, P.J., W.D.J., D. King, T.S., J.A., D.d.V., L. He, R.R., G.J.S., P. Vijayarangakannan, C.N., H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.

Data interpretation: J.F.M., H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.

Writing: J.F.M., C.F.W., D.R.F., M.E.H.

Experimental and analytical supervision: M.J. Parker, H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.

Project supervision: M.E.H.

Author Information
Exome sequencing data are accessible via the European Genome-phenome Archive (EGA) under accession EGAS00001000775. Details of DD-associated genes are available at www.ebi.ac.uk/gene2phenotype. M.E.H. is a co-founder of, and holds shares in, Congenica Ltd, a genetics diagnostic company. Correspondence and requests for materials should be addressed to M.E.H. ([email protected]).

Tables
Table 1: Genes achieving genome-wide significant statistical evidence without previous compelling evidence for being developmental disorder genes. The numbers of unrelated individuals with independent de novo mutations (DNMs) are given for protein-truncating variants (PTVs) and missense variants. If any additional individuals were in other cohorts, that number is given in brackets. The P-value reported is the minimum P-value from testing of the DDD dataset or the meta-analysis dataset; the subset providing the P-value is also listed. Mutations are considered clustered if the P-value for proximity clustering of DNMs is less than 0.01.

Gene | Missense | PTV | P-value | Test | Clustering
CDK13 | 10 | 1 | 3.2×10^-19 | DDD | Yes
GNAI1 | 7 (1) | 1 | 2.1×10^-13 | DDD | No
CSNK2A1 | 7 | 0 | 1.4×10^-12 | DDD | Yes
PPM1D | 0 | 5 (1) | 6.3×10^-12 | Meta | No
CNOT3 | 5 | 2 (1) | 5.2×10^-11 | DDD | Yes
MSL3 | 0 | 4 | 2.2×10^-10 | DDD | No
KCNQ3 | 4 (3) | 0 | 3.4×10^-10 | Meta | Yes
ZBTB18 | 1 (1) | 4 | 1.4×10^-9 | DDD | No
PUF60 | 4 (1) | 3 | 2.6×10^-9 | DDD | No
TCF20 | 1 | 5 | 2.7×10^-9 | DDD | No
SUV420H1 | 0 (2) | 2 (3) | 2.9×10^-9 | Meta | No
CHD4 | 8 (1) | 1 | 7.6×10^-9 | DDD | No
SET | 0 | 3 | 1.2×10^-7 | DDD | No
QRICH1 | 0 | 3 (1) | 3.6×10^-7 | Meta | No

Supplementary Tables
Table provided in external spreadsheet.
Supplementary Table 1: Table of de novo mutations (DNMs) in the 4,293 DDD individuals. The table includes sex, chromosome, position, reference and alternate alleles, HGNC symbol, VEP consequence, posterior probability of DNM and validation status where available. Individual IDs are available on request. This list excludes the sites that failed validation, but includes sites that passed validation (confirmed), sites that were uncertain (uncertain), and sites that were not tested by secondary validation (NA). Genome positions are given as GRCh37 coordinates.

Supplementary Table 2: Details of cohorts used in meta-analyses. This includes numbers of individuals by sex and publication details.

Phenotype | Year | Male | Female | Note | Citation
Intellectual disability | 2012 | 47 | 53 | | de Ligt et al.3
Autism spectrum disorder | 2012 | 314 | 29 | Subset of Iossifov et al.9 | Iossifov et al.10
Autism spectrum disorder | 2012 | 151 | 58 | Subset of Iossifov et al.10 | O'Roak et al.11
Intellectual disability | 2012 | 19 | 32 | | Rauch et al.12
Autism spectrum disorder | 2012 | 157 | 68 | Subset of Iossifov et al.9 | Sanders et al.13
Seizures | 2013 | 156 | 108 | Subset of EuroEPINOMICS-RES Consortium et al.6 | Epi4K Consortium and Epilepsy Phenome/Genome Project5
Congenital heart disease | 2013 | 220 | 142 | | Zaidi et al.14
Seizures | 2014 | 54 | 38 | | EuroEPINOMICS-RES Consortium et al.6
Schizophrenia | 2014 | 308 | 317 | | Fromer et al.7
Intellectual disability | 2014 | 0 | 0 | Subset of de Ligt et al.3 | Gilissen et al.8
Autism spectrum disorder (normal IQ) | 2014 | 1,099 | 74 | Counts are for individuals with IQ ≥ 70. | Iossifov et al.9
Autism spectrum disorder | 2014 | 446 | 112 | Probands with IQ < 70. | Iossifov et al.9
Autism spectrum disorder | 2014 | 1,192 | 253 | Counts are extrapolated from the sex ratio of individuals with de novo mutations. | De Rubeis et al.4

Supplementary Table 3: Genes with genome-wide significant statistical evidence of being developmental disorder genes. The numbers of unrelated individuals with independent de novo mutations (DNMs) are given for protein-truncating variants (PTV) and missense variants. If any additional individuals were in other cohorts, that number is given in brackets. The P-value reported is the minimum P-value from testing the DDD dataset or the meta-analysis dataset; the subset providing the P-value is also listed. Mutations are considered clustered if the P-value for proximity clustering of DNMs is less than 0.01. (Table provided in external spreadsheet.)
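The caption calls mutations clustered when the proximity-clustering P-value is below 0.01 but does not spell out the test here. One plausible form, sketched below under stated assumptions, compares the geometric mean distance between a gene's DNMs with the same statistic for randomly placed DNMs. The uniform placement, the +1 added to each distance, the coding-sequence coordinates and the function names are all assumptions of this sketch, not the study's implementation; a simulation weighted by site-specific mutation rates would be more faithful.

```python
import math
import random
from itertools import combinations


def geometric_mean_distance(positions):
    """Geometric mean of pairwise distances between DNM positions.

    One is added to each distance so that recurrent DNMs at the same
    position (distance 0) do not make the logarithm undefined.
    """
    logs = [math.log(abs(a - b) + 1) for a, b in combinations(positions, 2)]
    return math.exp(sum(logs) / len(logs))


def clustering_p_value(positions, cds_length, n_sims=999, seed=1):
    """Monte Carlo P-value for proximity clustering of DNMs in one gene.

    Simplification: simulated DNMs fall uniformly along the coding
    sequence instead of being weighted by site-specific mutation rates.
    """
    if len(positions) < 2:
        return 1.0  # no pairwise distances to test
    rng = random.Random(seed)
    observed = geometric_mean_distance(positions)
    as_close = sum(
        geometric_mean_distance([rng.randrange(cds_length) for _ in positions]) <= observed
        for _ in range(n_sims)
    )
    # Count the observed data as one of the simulations so P is never zero.
    return (as_close + 1) / (n_sims + 1)


clustered = clustering_p_value([100, 101, 102, 103], cds_length=1000)
spread = clustering_p_value([0, 333, 666, 999], cds_length=1000)
print(f"clustered: P = {clustered:.3f}; spread: P = {spread:.3f}")
```

Four DNMs within four bases of each other should reach the smallest attainable P-value (about 0.001 with 999 simulations), well under the 0.01 cut-off, whereas evenly spaced DNMs should not.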


Figures

Figure 1: Association of phenotypes with the presence of likely pathogenic de novo mutations (DNMs). A) Odds ratios and 95% confidence intervals (CI) for binary phenotypes. Odds ratios greater than 1 indicate an increased risk of a pathogenic DNM when the phenotype is present. P-values are given for Fisher’s exact test. B) Beta coefficients and 95% CI from logistic regression of quantitative phenotypes versus the presence of a pathogenic DNM. All phenotypes aside from length of autozygous regions were corrected for sex as a covariate. The developmental milestones (age to achieve first words, walk independently, sit independently and social smile) were log-scaled before regression. The growth parameters (height, birthweight and occipitofrontal circumference (OFC)) were evaluated as absolute distance from the median. C) Relationship between the length of autozygous regions and the chance of having a pathogenic DNM. The regression line is plotted in dark grey, and the 95% confidence interval for the regression is shaded grey. The autozygosity lengths expected under different degrees of consanguineous union are shown as vertical dashed lines. n, number of individuals in each autozygosity group. D) Relationship between the age of the father at the child’s birth and the number of high-confidence DNMs. E) Relationship between the age of the mother at the child’s birth and the number of high-confidence DNMs. In D and E, n is the number of high-confidence DNMs.

[Figure 1 image: A) odds ratios for pre-natal and post-natal binary phenotypes (assisted reproduction, abnormal scan, bleeding, maternal illness, male sex, feeding problems, neonatal intensive care); B) beta coefficients grouped into age, growth and developmental milestone variables (autozygosity length, mother’s and father’s age, gestation, age at assessment, OFC, birthweight, height, number of phenotypic terms, social smile, sat independently, walked independently, first words); C) proportion with a pathogenic de novo mutation versus summed length of autozygosity, in groups of >0-10 Mb (n = 3,165), 10-20 Mb (n = 745), 20-100 Mb (n = 129) and 100-1000 Mb (n = 203), with 1st, 2nd and 3rd cousin expectations marked; D, E) number of high-confidence mutations versus father’s and mother’s age (years).]
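Panel A of Figure 1 reports Fisher’s exact test P-values for binary phenotypes against the presence of a pathogenic DNM. As a minimal sketch of that calculation (not the authors’ code), the two-sided test can be written with the Python standard library; the 2x2 counts in the example are hypothetical. The two-sided P-value sums the probabilities of every table with the same margins that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: phenotype present / absent. Columns: pathogenic DNM / no pathogenic DNM.
    Returns (sample odds ratio, P-value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Add every table no more probable than the observed one; the small
    # relative tolerance absorbs floating-point ties.
    p_value = sum(table_prob(x) for x in range(lo, hi + 1)
                  if table_prob(x) <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts for illustration, not taken from the DDD data.
odds_ratio, p = fisher_exact_two_sided(3, 1, 1, 3)
print(odds_ratio, round(p, 4))
```

In practice `scipy.stats.fisher_exact` computes the same two-sided P-value and also reports the sample odds ratio (R’s `fisher.test` instead reports a conditional maximum-likelihood estimate).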


Figure 2: Genes exceeding genome-wide significance. Manhattan plot of combined P-values across all tested genes. The red dashed line indicates the threshold for genome-wide significance (P < 7 × 10⁻⁷). Genes exceeding this threshold are labelled with their HGNC symbols. Composite facial images of individuals with DNMs are shown for the six most significantly associated genes.

[Figure 2 image: Manhattan plot of -log10(P) (0 to 70) by chromosome (1-22, X), with genes exceeding genome-wide significance labelled and average faces shown for SYNGAP1, ARID1B, KMT2A, DDX3X, ANKRD11 and ADNP.]
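Figure 2 plots combined P-values against a genome-wide threshold of P < 7 × 10⁻⁷. The combined test is not described in this caption; as a simpler illustration of a genotype-only component, the sketch below computes a one-sided Poisson P-value for the number of DNMs in one gene, with an expected count of 2 × (number of trios) × (per-gene mutation rate), and converts it to the -log10(P) scale of the Manhattan plot. The mutation rate is a hypothetical input; the cohort size of 4,293 comes from Supplementary Table 1.

```python
import math

GENOME_WIDE_THRESHOLD = 7e-7  # from the Figure 2 caption


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing upper-tail terms in log space
    so that very small P-values keep their precision."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    upper = int(max(k, lam) + 20 * math.sqrt(lam) + 50)
    log_lam = math.log(lam)
    return min(1.0, math.fsum(
        math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k, upper + 1)))


def gene_p_value(observed, n_trios, rate):
    """Upper-tail P-value for `observed` DNMs in one gene.

    `rate` is a hypothetical per-haploid-genome, per-generation mutation rate
    for the gene; each child inherits two parental genomes, hence the factor 2.
    """
    return poisson_sf(observed, 2 * n_trios * rate)


p_many = gene_p_value(10, n_trios=4293, rate=5e-6)
p_one = gene_p_value(1, n_trios=4293, rate=5e-6)
for p in (p_many, p_one):
    print(f"P = {p:.2e}, -log10(P) = {-math.log10(p):.1f}, "
          f"significant: {p < GENOME_WIDE_THRESHOLD}")
```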


Figure 3: Phenotypic summary of genes without previous compelling evidence. Phenotypes are grouped by type. The first group indicates counts of individuals with DNMs per gene by sex (m: male, f: female) and by functional consequence (nsv: nonsynonymous variant, PTV: protein-truncating variant). The second group indicates mean values for growth parameters: birthweight (bw), height (ht), weight (wt) and occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC data. The third group indicates the mean age for achieving developmental milestones: age of first social smile, age of first sitting unassisted, age of first walking unassisted and age of first speaking. Values are given in months. The final group summarises Human Phenotype Ontology (HPO)-coded phenotypes per gene, as counts of HPO terms within different clinical categories.

[Figure 3 image: heatmap for CDK13, CHD4, CSNK2A1, GNAI1, CNOT3, PUF60, TCF20, PPM1D, ZBTB18, KCNQ3, MSL3, QRICH1, SET and SUV420H1, with column groups for mutations (probands (n), m, f, nsv, PTV), growth Z-scores (bw, ht, wt, OFC), delayed development as milestone ages in months (smile, sit, walk, speak; shaded mild, moderate, severe) and clinical features as numbers of HPO terms by category (face, heart, skel, skin/hair/teeth, neurodev, eye, abdo).]
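The growth columns in Figure 3 are z-scores, that is, standard deviations from a reference population mean. A minimal sketch of that conversion and of the per-gene mean follows; the reference mean and SD are hypothetical placeholders, since real ALSPAC-derived references vary with age and sex.

```python
from statistics import mean

# Hypothetical reference (mean, SD) for head circumference in cm; the real
# ALSPAC-derived references depend on age and sex.
REFERENCE = {"ofc_cm": (50.0, 1.5)}


def z_score(value, ref_mean, ref_sd):
    """Number of standard deviations a measurement lies from the reference mean."""
    return (value - ref_mean) / ref_sd


def gene_mean_z(measurements, trait):
    """Mean z-score across individuals with DNMs in one gene."""
    ref_mean, ref_sd = REFERENCE[trait]
    return mean(z_score(x, ref_mean, ref_sd) for x in measurements)


print(gene_mean_z([47.0, 48.5, 50.0], "ofc_cm"))
```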


Figure 4: Power of genome versus exome sequencing to discover dominant genes associated with developmental disorders. The power was estimated at three different fixed budgets (1 million (M) USD, 2M and 3M) and a range of relative sensitivities of genomes versus exomes to detect de novo mutations. The number of genes identifiable by exome sequencing is shaded blue, whereas the number of genes identifiable by genome sequencing is shaded green. The regions where exome sequencing costs 30-40% of genome sequencing, which corresponds to the price differential in 2016, are shaded with a grey background.

[Figure 4 image: number of dominant genes at genome-wide significance (0 to 40) versus relative cost of exome to genome (0.2 to 1.0), in panels for $1M, $2M and $3M budgets, for exomes and for genome sensitivities of 1.00 to 1.20.]
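Figure 4 trades cost against sensitivity at a fixed budget. The first step of such a comparison is plain arithmetic: how many trios each design can afford, and how many “exome-equivalent” trios the genome design yields once its higher sensitivity is counted. The sketch below assumes a hypothetical genome cost per trio and treats relative sensitivity as a simple multiplier on sample size; the study’s power estimates go further, translating sample size into numbers of significant genes.

```python
def trios_within_budget(budget, cost_per_trio):
    """Number of whole trios that fit in a fixed budget."""
    return int(budget // cost_per_trio)


def effective_trios(budget, genome_cost_per_trio, exome_relative_cost, genome_sensitivity):
    """Compare exome and genome designs under one budget.

    exome_relative_cost: exome cost as a fraction of genome cost (x axis of Figure 4).
    genome_sensitivity: DNM detection sensitivity of genomes relative to exomes.
    Simplification: extra sensitivity is treated as proportionally more trios.
    """
    exome = trios_within_budget(budget, genome_cost_per_trio * exome_relative_cost)
    genome = trios_within_budget(budget, genome_cost_per_trio) * genome_sensitivity
    return exome, genome


# $1M budget, hypothetical genome cost of $3,000 per trio, exomes at 35% of that cost,
# genomes 10% more sensitive.
exome, genome = effective_trios(1_000_000, 3_000, 0.35, 1.10)
print(exome, round(genome, 1))
```

With exomes at 35% of genome cost (inside the shaded 30-40% band) and genomes 10% more sensitive, the exome design in this toy calculation still affords about 2.6 times as many effective trios.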


Figure 5: Excess of de novo mutations (DNMs). A) Enrichment ratios of observed to expected loss-of-function DNMs by clinical recognisability for dominant haploinsufficient neurodevelopmental genes, as judged by two consultant clinical geneticists. B) Enrichment of DNMs by consequence, normalised relative to the number of synonymous DNMs. C) Proportion of excess DNMs with loss-of-function or altered-function mechanisms. Proportions are derived from numbers of excess DNMs by consequence, and numbers of excess truncating and missense DNMs in dominant haploinsufficient genes. D) Enrichment ratios of observed to expected DNMs by pLI constraint quantile for loss-of-function, missense and synonymous DNMs. Counts of DNMs in the lower and upper halves of the quantiles are provided. E) Normalised excess of observed over expected DNMs by pLI constraint quantile. This includes missense DNMs within all genes, loss-of-function (including missense) DNMs in dominant haploinsufficient genes, and missense DNMs in dominant non-haploinsufficient genes (genes with dominant negative or activating mechanisms). F) Proportion of excess missense DNMs with a loss-of-function mechanism. The red dashed line indicates the proportion in observed excess DNMs at the optimal goodness-of-fit. The histogram shows the frequencies of estimated proportions from 1000 permutations, assuming the observed proportion is correct.

[Figure 5 image: A) enrichment (observed/expected) by clinical recognisability (low, mild, moderate, high; cryptic to distinctive); B) enrichment by consequence: loss-of-function n = 968 (excess 576), missense n = 3,853 (excess 1,220), synonymous n = 1,236 (excess -5); C) inferred mechanism (loss-of-function or altered function) of excess DNMs and of DNMs in haploinsufficient genes; D) enrichment by constraint quantile (low to high) for loss-of-function, missense and synonymous DNMs; E) normalised enrichment (observed - expected) by constraint quantile; F) frequency of the estimated proportion of missense DNMs acting as loss-of-function.]
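Panel B of Figure 5 labels each consequence class with its observed count (n) and its excess over expectation. Back-calculating the expected count as n minus the excess (an assumption about how the labels relate) gives enrichment ratios directly, and dividing by the synonymous ratio applies the normalisation described in the caption. The sketch below is illustrative only.

```python
# Counts read from the Figure 5B labels: consequence -> (observed n, excess).
# Expected counts are back-calculated as n - excess.
PANEL_B = {
    "synonymous": (1236, -5),
    "missense": (3853, 1220),
    "loss-of-function": (968, 576),
}


def enrichment(observed, expected):
    """Ratio of observed to expected DNMs."""
    return observed / expected


ratios = {c: enrichment(n, n - excess) for c, (n, excess) in PANEL_B.items()}
# Normalise so that synonymous DNMs, which should not be enriched, sit at 1.
normalised = {c: r / ratios["synonymous"] for c, r in ratios.items()}
for consequence, value in normalised.items():
    print(f"{consequence}: {value:.2f}")
```

This gives normalised enrichments of about 2.48 for loss-of-function and 1.47 for missense DNMs, with synonymous DNMs at 1 by construction.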


Figure 6: Prevalence of live births with developmental disorders caused by dominant de novo mutations (DNMs). The prevalence within the general population is given as a percentage for combinations of parental ages, extrapolated from the maternal and paternal rates of DNMs. Distributions of parental ages within the DDD cohort and the UK population are shown along the matching parental axis.

[Figure 6 image: heatmap of prevalence (%) by paternal age (x axis, 20 to 45 years) and maternal age (y axis, 20 to 45 years), with values from 0.24% to 0.47% that increase with both parental ages; density plots of parental ages in the DDD cohort and the UK population are shown along each axis.]
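Figure 6 extrapolates prevalence from maternal and paternal DNM rates. Assuming the expected number of DNMs per child rises linearly with each parent’s age, and that the chance of a pathogenic dominant DNM is proportional to that number, prevalence at any pair of ages is a reference prevalence scaled by the ratio of expected DNM counts. Every numeric parameter below is hypothetical and chosen only for illustration; none is the paper’s estimate.

```python
# Hypothetical parameters of a linear model of DNMs per child (illustration only).
INTERCEPT_DNMS = 10.0   # intercept of the linear model
PATERNAL_SLOPE = 1.5    # extra DNMs per year of paternal age
MATERNAL_SLOPE = 0.4    # extra DNMs per year of maternal age
REF_AGES = (30, 30)     # (paternal, maternal) reference ages in years
REF_PREVALENCE = 0.30   # prevalence (%) assumed at the reference ages


def expected_dnms(paternal_age, maternal_age):
    """Expected DNMs per child under the assumed linear model."""
    return INTERCEPT_DNMS + PATERNAL_SLOPE * paternal_age + MATERNAL_SLOPE * maternal_age


def prevalence(paternal_age, maternal_age):
    """Scale the reference prevalence by the relative DNM burden.

    Assumes the chance of a pathogenic dominant DNM is proportional to
    the total number of DNMs a child carries.
    """
    ratio = expected_dnms(paternal_age, maternal_age) / expected_dnms(*REF_AGES)
    return REF_PREVALENCE * ratio


print(round(prevalence(30, 30), 3), round(prevalence(40, 35), 3))
```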


Supplementary Figures

Supplementary Figure 1: Proportion of individuals with a de novo mutation (DNM) likely to be pathogenic. These include only individuals with protein-altering or protein-truncating DNMs in dominant or X-linked dominant developmental disorder (DD)-associated genes, or males with DNMs in hemizygous DD-associated genes. The proportions given are for those individuals with any DNMs rather than for the total number of individuals in each subset. Cohorts included in the DNM meta-analyses are shaded blue.

[Supplementary Figure 1 image: proportion with a likely pathogenic de novo mutation (0 to 0.45) for the intellectual disability, DDD, epilepsy, autism spectrum disorder (normal IQ), autism spectrum disorder, schizophrenia and congenital heart disease cohorts.]
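The inclusion rule in the Supplementary Figure 1 caption, and its denominator of individuals with any DNM, can be written as a small filter. The gene names, inheritance-mode labels and consequence set below are hypothetical (the consequence names are Sequence Ontology terms as used by VEP); applying the protein-changing requirement to males with DNMs in hemizygous genes is an assumption of this sketch, not something the caption states explicitly.

```python
from dataclasses import dataclass

# Assumed set of protein-altering or protein-truncating consequences.
PROTEIN_CHANGING = {
    "missense_variant", "inframe_deletion", "inframe_insertion", "stop_gained",
    "frameshift_variant", "splice_donor_variant", "splice_acceptor_variant",
}


@dataclass
class DNM:
    gene: str
    consequence: str


def likely_pathogenic(dnm, sex, gene_modes):
    """Apply the Supplementary Figure 1 inclusion rule to one DNM.

    gene_modes maps DD-associated genes to 'dominant', 'x_dominant' or
    'hemizygous' (hypothetical labels).
    """
    if dnm.consequence not in PROTEIN_CHANGING:
        return False
    mode = gene_modes.get(dnm.gene)
    if mode in ("dominant", "x_dominant"):
        return True
    return mode == "hemizygous" and sex == "male"


def proportion_likely_pathogenic(individuals, gene_modes):
    """Fraction of individuals with any DNM who carry a likely pathogenic one.

    individuals: list of (sex, [DNM, ...]); those without DNMs are excluded
    from the denominator, as in the caption.
    """
    with_dnm = [(sex, dnms) for sex, dnms in individuals if dnms]
    if not with_dnm:
        return 0.0
    hits = sum(any(likely_pathogenic(d, sex, gene_modes) for d in dnms)
               for sex, dnms in with_dnm)
    return hits / len(with_dnm)


modes = {"GENE_A": "dominant", "GENE_B": "x_dominant", "GENE_C": "hemizygous"}
cohort = [
    ("female", [DNM("GENE_A", "missense_variant")]),  # dominant gene: counts
    ("male", [DNM("GENE_C", "stop_gained")]),         # hemizygous gene, male: counts
    ("female", [DNM("GENE_C", "stop_gained")]),       # hemizygous gene, female: does not count
    ("male", [DNM("GENE_A", "synonymous_variant")]),  # not protein-changing: does not count
    ("female", []),                                   # no DNMs: not in the denominator
]
print(proportion_likely_pathogenic(cohort, modes))
```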


Supplementary Figure 2: Phenotypic summary of individuals with de novo mutations in genes achieving genome-wide significance. Phenotypes are grouped by type. The first group indicates counts of individuals with DNMs per gene by sex (m: male, f: female) and by functional consequence (nsv: nonsynonymous variant, PTV: protein-truncating variant). The second group indicates mean values for growth parameters: birthweight (bw), height (ht), weight (wt) and occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC data. The third group indicates the mean age for achieving developmental milestones: age of first social smile, age of first sitting unassisted, age of first walking unassisted and age of first speaking. Values are given in months. The final group summarises Human Phenotype Ontology (HPO)-coded phenotypes per gene, as counts of HPO terms within different clinical categories.

[Supplementary Figure 2 image: heatmap in the same layout as Figure 3 (mutations, growth, development and clinical features) for all genes achieving genome-wide significance, from ANKRD11 to SLC35A2.]


Supplementary Figure 3: Example of an icon-, heatmap- and image-based summary of the quantitative and categorical phenotypes and the average face for each of the genes exceeding genome-wide significance. This uses data on the 17 individuals with de novo mutations (DNMs) in EP300. A separate PDF file containing these “phenicons” for all genes is provided. Each has up to three parts. The left-hand half of each page provides visual representations of the gene name, the number of individuals with de novo mutations in that gene, sex ratio, gestation (in weeks), anthropometric data (z-scores for birthweight, height, weight and occipitofrontal head circumference (OFC)) and developmental milestones (in months for attainment of social smile, sitting unaided, walking unaided and first clear words) from individuals with DNMs in the gene. The scaled cartoon figure shows height, weight and OFC, with the colour of the head, trunk and height graded: grey represents a z-score of 0, red increasingly negative scores and green increasingly positive scores. For each metric, a scatter plot above the indicator bar shows the measurement for each individual. Where more than four values are available, two density plots are given below the bar: the grey plot represents the data for all individuals in the 94-gene set and the coloured plot the gene in question. In EP300, the OFC measurements are shifted significantly to the left compared with the whole group. Mean values are given above the bar for the z-score data, and median values for the developmental data. The top panel on the left-hand side of the page summarises the key Human Phenotype Ontology (HPO) terms for each gene. The HPO terms recorded in the individuals were selected, including their ancestral terms. Terms that are rarer in the 4,293 individuals rank higher, adjusted by the number of individuals with DNMs who had the term. The heatmaps are shaded by the number of individuals with each term. The heatmaps exclude terms that rank lower than a descendant term (excluding more general terms if a more specific term occurred first), terms that fewer than 25% of individuals had and, in genes with fewer than 8 individuals, terms that fewer than two individuals had. The bottom panel on the right-hand half of the page summarises the facial photographs from individuals with DNMs in each gene. The averaged face images are only available for selected genes, based on the availability of sufficient high-quality facial photographs of individuals for each gene. The whole image was generated using a custom R script employing grid-based graphics.
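The heatmap rules in the Supplementary Figure 3 caption (rank rarer terms higher, drop ancestors outranked by a descendant, and drop terms seen in fewer than 25% of individuals or, for genes with fewer than 8 individuals, in fewer than two) can be sketched as a filter. The scoring function, count × -ln(cohort frequency), is a hypothetical stand-in for the unspecified adjustment, and the term names are placeholders rather than real HPO identifiers.

```python
import math


def select_heatmap_terms(term_counts, cohort_freq, n_individuals, descendants):
    """Rank and filter HPO terms for one gene's heatmap.

    term_counts: term -> individuals with DNMs in the gene who have it
                 (ancestral terms already propagated).
    cohort_freq: term -> fraction of the whole cohort with the term.
    descendants: term -> set of its descendant terms.
    """
    # Hypothetical score: affected individuals weighted by rarity in the cohort.
    score = {t: count * -math.log(cohort_freq[t]) for t, count in term_counts.items()}
    ranked = sorted(term_counts, key=lambda t: score[t], reverse=True)

    kept = []
    for i, term in enumerate(ranked):
        count = term_counts[term]
        if count < 0.25 * n_individuals:
            continue  # fewer than 25% of individuals have the term
        if n_individuals < 8 and count < 2:
            continue  # small genes: require at least two individuals
        if descendants.get(term, set()) & set(ranked[:i]):
            continue  # a more specific descendant term already ranks higher
        kept.append(term)
    return kept


counts = {"term_a": 10, "term_b": 6, "term_c": 2, "term_d": 5}
freq = {"term_a": 0.5, "term_b": 0.01, "term_c": 0.001, "term_d": 0.02}
result = select_heatmap_terms(counts, freq, 12, {"term_a": {"term_b"}})
print(result)
```

For genes with fewer than 8 individuals, 25% of the count is below 2, so the two-individual rule is the binding one; applying both checks to every gene is therefore equivalent to the caption’s wording.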


Supplementary Figure 4: Dispersion of de novo mutations and domains for each novel gene. A) CDK13, B) CHD4, C) CNOT3, D) CSNK2A1, E) GNAI1, F) KCNQ3, G) MSL3, H) PPM1D, I) PUF60, J) QRICH1, K) SET, L) SUV420H1, M) TCF20 and N) ZBTB18.

[Supplementary Figure 4 image: protein schematics with domains and de novo mutations, as labelled in the figure:
A) CDK13 (1512 aa). Domains: protein kinase (with active site and ATP binding site). DNMs: G714R, G717R (×2), V719G, K734R, R751Q, E792V, N842S (×3), R860Q, V874L, c.2898-1G>A.
B) CHD4 (1940 aa). Domains: PHD-finger (×2), chromo/chromo shadow (×2), CHDNT (NUC034), CHDCT2 (NUC038), SNF2 family N-terminal, helicase C-terminal. DNMs: R341S, C467Y, R645W, Q715*, S851Y, I871T, M954V, L1009ins, R1068H, R1127Q, V1636I, K1752K.
C) CNOT3 (753 aa). Domains: NOT2/NOT3/NOT5, CCR4-Not N-terminal. DNMs: E20Q, L48V, R188C, R188H (×2), P244fs, V660fs, Q694*.
D) CSNK2A1 (391 aa). Domains: protein kinase (with active site and ATP binding site). DNMs: R80H, I174M, R191Q, F197I, K198R (×3), R312W.
E) GNAI1 (354 aa). Domains: alpha G protein (transducin) signature, G-protein alpha subunit group I. DNMs: Q52P, Q172del (×2), E186del, Q204R, K270R (×2), I319T, V332E.
F) KCNQ3 (872 aa). Domains: KCNQ voltage-gated potassium channel (extracellular, intramembrane and cytoplasmic segments), ankyrin-G binding site. DNMs: R227Q, R230C (×3), R236C, A356T, G553R.
G) MSL3 (521 aa). Domains: chromo/chromo shadow, RNA binding activity-knot of a chromodomain, MRG. DNMs: Y189fs, L314fs, A340fs, F460fs.
H) PPM1D (605 aa). Domain: PPM-type phosphatase. DNMs: D397fs, P418fs, E424fs (×2), W427* (×2).
I) PUF60 (559 aa). Domains: RNA recognition motif (×3), poly-U binding splicing factor, half-pint. DNMs: D159N, E176ins, E181K, c.604-2A>G, R298W, T311fs, G491E, G491R, H526fs.
J) QRICH1 (776 aa). Domain: protein of unknown function DUF3504. DNMs: Q46fs, Q47*, R652*, R652fs.
K) SET (290 aa). Domain: nucleosome assembly protein (NAP). DNMs: R57fs (×2), K154fs.
L) SUV420H1 (885 aa). Domains: histone-lysine N-methyltransferase, Suvar4-20; SET. DNMs: A74fs, R143C, Y185fs, R187*, W264S, c.977+0G>A, A513V, R783*.
M) TCF20 (1960 aa). Domain: PHD-like zinc-binding. DNMs: G199fs, Y1009*, Q1127fs, K1173Q, L1838fs, R1907*.
N) ZBTB18 (531 aa). Domains: BTB/POZ, zinc finger (×4). DNMs: G208*, P212fs, Q271*, E350fs, R464H, H475H, R495G.]


Supplementary Figure 5: Effect of clustering by phenotype on the ability to identify genome-wide significant genes. A) Comparison of P-values derived from genotypic information alone versus P-values that incorporate both genotypic information and phenotypic similarity. B) Comparison of P-values from tests in the complete DDD cohort versus tests in the subset with seizures. Genes that were previously linked to seizures are shaded blue.

[Supplementary Figure 5 image: A) density of delta P (combined minus genotypic), from -4 to 4; B) -log10(P) in seizure probands versus -log10(P) in all probands (both 0 to 60), distinguishing all genes from known seizure genes.]
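Panel A of Supplementary Figure 5 compares genotype-only P-values with P-values that also incorporate phenotypic similarity. The caption does not say how the two are combined; one standard option is Fisher’s method, sketched below as an assumption rather than as the study’s procedure. For two independent P-values the statistic has a chi-squared distribution with 4 degrees of freedom, whose survival function has a closed form, so no statistics library is needed.

```python
import math


def fisher_combine(p_genotype, p_phenotype):
    """Combine two independent P-values (each in (0, 1]) with Fisher's method.

    X = -2 * (ln p1 + ln p2) is chi-squared with 4 degrees of freedom under
    the null, and its survival function is exp(-X/2) * (1 + X/2).
    """
    x = -2.0 * (math.log(p_genotype) + math.log(p_phenotype))
    return math.exp(-x / 2) * (1 + x / 2)


print(fisher_combine(1e-6, 0.01))
```

For more than two P-values, `scipy.stats.combine_pvalues(pvalues, method="fisher")` implements the general case.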


Supplementary Figure 6: Simulated estimates of the power to detect loss-of-function genes in the genome at different cohort sizes, given fixed budgets.

[Supplementary Figure 6 image: power (0 to 0.5) versus relative cost of exome to genome (0.2 to 1.0), in panels for $1M, $2M and $3M budgets, for exomes and for genome sensitivities of 1.00 to 1.20.]
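Supplementary Figure 6 shows simulated power at different cohort sizes under fixed budgets. An analytical approximation for a single gene, not the authors’ simulation, uses a Poisson model: find the smallest DNM count whose null P-value falls below the 7 × 10⁻⁷ genome-wide threshold, then compute the probability of reaching that count when the gene’s DNM rate is enriched. The mutation rate and the 50-fold enrichment below are hypothetical.

```python
import math

ALPHA = 7e-7  # genome-wide threshold from the Figure 2 caption


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing upper-tail terms in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    upper = int(max(k, lam) + 20 * math.sqrt(lam) + 50)
    log_lam = math.log(lam)
    return min(1.0, math.fsum(
        math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k, upper + 1)))


def critical_count(lam0, alpha=ALPHA):
    """Smallest DNM count whose null P-value is below alpha."""
    k = 0
    while poisson_sf(k, lam0) >= alpha:
        k += 1
    return k


def power(n_trios, rate, enrichment, alpha=ALPHA):
    """Probability that a gene with an enriched DNM rate reaches significance."""
    lam0 = 2 * n_trios * rate  # expected DNMs under the null
    return poisson_sf(critical_count(lam0, alpha), enrichment * lam0)


for n in (2000, 8000):
    print(n, round(power(n, rate=5e-6, enrichment=50), 3))
```

In this simple model the critical count rises only from 4 to 5 when the cohort quadruples, so power rises from about 0.02 to about 0.37.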


Supplementary Figure 7: Gene-wise significance versus the expected number of mutations per gene for neurodevelopmental genes classified by clinical recognisability. Points are shaded by recognisability category. Genes are separated into two plots: one with genes for cryptic disorders with low, mild or moderate clinical recognisability, and one with genes for distinctive disorders with high clinical recognisability.

[Supplementary Figure 7 image: -log10(P) (0 to 60) versus expected loss-of-function mutations (0 to 0.12), in two panels: cryptic disorders and distinctive disorders.]


Supplementary Figure 8: Stringency of de novo mutation (DNM) filtering. A) Sensitivity and specificity of DNM validations within sets filtered at varying thresholds of DNM quality (posterior probability of DNM). The analysed DNMs were restricted to sites identified within the earlier 1,133 trios¹⁵, where all candidate DNMs underwent validation experiments. The labelled value is the quality threshold at which the number of candidate synonymous DNMs equals the number of expected synonymous mutations under a null germline mutation rate. B) Excess of missense and loss-of-function DNMs at varying DNM quality thresholds. The DNM excess is adjusted for the sensitivity and specificity at each threshold.

[Supplementary Figure 8 image: A) true positive rate (0.5 to 1.0) versus positive predictive value (0.86 to 0.98), with the labelled threshold of 0.00781; B) excess of de novo mutations (0 to 2,500) versus quality threshold (posterior probability of DNM, 0 to 1).]
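Two calculations behind Supplementary Figure 8 can be sketched directly: the quality threshold at which the number of synonymous candidates passing the filter equals the expected synonymous count, and the true positive rate and positive predictive value at a threshold in a fully validated set. The inputs below are hypothetical, the expected count is rounded to a whole number, and the function names are illustrative; the caption’s own labelled threshold was 0.00781.

```python
def synonymous_threshold(syn_posteriors, expected_synonymous):
    """Posterior-probability threshold that keeps as many synonymous candidates
    as are expected under the null germline mutation rate."""
    k = round(expected_synonymous)
    if k < 1:
        raise ValueError("expected_synonymous must round to at least 1")
    ranked = sorted(syn_posteriors, reverse=True)
    if k > len(ranked):
        return 0.0  # even keeping every candidate falls short of the expectation
    # Keeping candidates with posterior >= this value retains k of them
    # (more if there are ties at the cut-off).
    return ranked[k - 1]


def tpr_ppv(candidates, threshold):
    """candidates: (posterior, validated_true) pairs from fully validated trios."""
    passed = [ok for pp, ok in candidates if pp >= threshold]
    tp = sum(passed)
    all_true = sum(ok for _, ok in candidates)
    tpr = tp / all_true if all_true else float("nan")
    ppv = tp / len(passed) if passed else float("nan")
    return tpr, ppv


syn = [0.001, 0.005, 0.01, 0.2, 0.5, 0.9, 0.95, 0.99]
threshold = synonymous_threshold(syn, 5)
candidates = [(0.99, True), (0.9, True), (0.5, False), (0.2, True), (0.01, False), (0.001, True)]
print(threshold, tpr_ppv(candidates, threshold))
```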
