integration of machine learning improves the prediction ... · 26/06/2020 · substrate...

Integration of Machine Learning Improves the Prediction Accuracy of Molecular Modelling for M. jannaschii Tyrosyl-tRNA Synthetase

Substrate Specificity BingyaDuan

SchoolofElectronic,ElectricalandCommunicationEngineering

UniversityofChineseAcademyofSciencesBeijingChina

[email protected]

YingfeiSun*SchoolofElectronic,ElectricalandCommunication

EngineeringUniversityofChineseAcademyofSciences

[email protected]

AbstractDesignofenzymebindingpockettoaccommodatesubstrateswith different chemical structure is a great challenge.Traditionally, thousandsevenmillionsofmutantshave tobescreenedinwet-labexperimenttofindaligand-specificmutantand large amount of time and resources is consumed. Toaccelerate the screening process, here we propose a novelworkflowthroughintegrationofmolecularmodelinganddata-drivenmachinelearningmethodtogeneratemutantlibrarieswithhighenrichmentratioforrecognitionofspecificsubstrate.M.jannaschiityrosyl-tRNAsynthetase(Mj.TyrRS)isusedasanexamplesystemtogiveaproofofconceptsincethesequenceandstructureofmanyunnaturalaminoacidspecificMj.TyrRSmutantshavebeenreported.BasedonthecrystalstructuresofdifferentMj.TyrRSmutantsandRosettamodeling result,wefind D158G/P is the critical residue which influences thebackbonedisruptionofhelixwithresidue158-163.Ourresultsshowthatcomparedwithrandommutation,Rosettamodelingandscorefunctioncalculationcanelevatetheenrichmentratioof desired mutants by 2-fold in a test library having 687mutants, while after calibration by machine learning modeltrainedusingknowndataofMj.TyrRSmutantsandligand,theenrichment ratio can be elevated by 11-fold. Thismolecularmodeling and machine learning-integrated workflow isanticipated to significantly benefit to the Mj. tyrRS mutantscreeningandsubstantiallyreducethetimeandcostofweb-labexperiment. Besides, this novel process will have broadapplicationinthefieldofcomputationalproteindesign.

CCSConcepts• Applied computing • Life and medical sciences •Computationalbiology•Molecularstructuralbiology

Keywords

Tyrosyl-tRNA synthetase, Genetic code expansion, Enzymesubstrate speciWicity, Rosetta, Molecular modelling, Machinelearning

1Introduction

Geneticcodeexpansion technology1hasbeenwidelyused inbiologyresearchwhichcanbeappliedonmonitoringproteinconformationalchange2causedbyPTM3asbiophysicalprobe4,improvingenzymeactivity5anddesigningproteinswithnovelcatalytic functions6-8. Through this technology, we canincorporate artiWiciallydesignedunnatural aminoacid (UAA)into almost any speciWic site of target protein. Usually anorthogonal tRNA and amino acyl-tRNA synthetase (aaRS)mutantpair is necessary for recognitionof speciWicUAAandsubsequentlyacylligationtotRNA.Thoughtherearemorethan100UAAsand their correspondingorthogonal aaRSmutantshave been reported, UAAs with novel chemical structure,biophysicalandbiologicalfunctionwillsigniWicantlybeneWittothe biology research andprotein therapeutic9. Pursue of theUAA’scorrespondingaaRSbyresearchershasneverstopped.Traditionally, an aaRS mutant recognizing speciWic UAA isselectedbyseveralroundsofpositiveandnegativescreeningfrom large and diverse aaRS mutant libraries rationallydesignedfromwildtypeaaRS1.Thescreeningprocessisverytedious, manpower-costing and error-prone. As a result, amore rapidandefWicientmethod to identify aaRSmutant forspeciWicUAAisinurgentneed.Fromtheperspectiveofcomputationalchemistry,WindingaaRSmutant for speciWic UAA is a protein design problem. ThesequencespaceofwildtypeaaRSasreceptorisexploredwiththe goal of Winding the mutant having lowest binding freeenergy to the UAA ligand. In recent years, computationalchemistrybased-molecularmodellingmethod,suchasproteinhomology modelling10, protein structure prediction11,molecular docking12 and protein design13 has been rapidly

.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted June 28, 2020. . https://doi.org/10.1101/2020.06.26.174524doi: bioRxiv preprint

https://doi.org/10.1101/2020.06.26.174524

http://creativecommons.org/licenses/by-nc-nd/4.0/

developed.Largenumberofsuccessfulcasesonenzymedesignforsubstrateselectivity14arereported.Recently there have been great advances on artiWicialintelligence(AI)technology,suchasmachinelearninganddeeplearning. AI has been used in computer vision, speechrecognition,machinetranslation,small-moleculedrugdesign15,protein engineering16, antibody structure prediction17 andantibody design18. In general AI can extract representativefeaturesofproteinsequenceandstructurefromlargeamountofchemicalmoleculeandproteindata,learninternalpatternswhich can’t explicitly spotted by human and utilizeexperimentalvalidatedpropertiessuchasbindingafWinitydata,IC50andenzymeactivityaslabelstotrainamodelwhichcanbeusedtopredictwhichsamplehastargetpropertyofinterestfromhugenumberofmoleculesanddiversiWiedproteinmutantlibraries. Forexample, improved score functionofmoleculardocking software19, improved protein design performance20and accurate prediction of thermostability for proteinmutants21havebeenachievedthroughmachinelearninganddeeplearning.Combinedwithcomputationalchemistry-basedmethod,data-drivenAImethodisgivingsatisfactorysolutionandaccuratepredictiononmanybiologicalproblems.In thiswork,we collected all theM. jannaschii tyrosyl-tRNAsynthetase (Mj. TyrRS)mutants reported in the literature tocompareandanalyzethesequenceandstructuralfeatureanddifferencebetweenmutantsandwildtypeMj.TyrRS.Thenweuse RosettaEnzymeDesignmethod22 tomodel thestructureandpredictbindingposeofUAA-Mj.TyrRSmutantpair.Finall,machine learning model is integrated to calibrate the scorefunctionforaaRSsubstrateselectivitypredictionandmutantdesign to achieve better prediction accuracy. We think theimprovedMj.TyrRS-UAAselectivitypredictionmodelwillbeextremely useful for in silico screening of UAA-speciWic aaRSmutants.

2Results

2.1CrystalstructurescomparisonbetweenMj.TyrRSmutantsandWTFirstly,wecollectedtheinformationofallUAAandcorrespondingMj.TyrRSmutantsofwhichhigh-resolutionX-raycrystalstructurehasbeenreleased,asshowninScheme1,

Table1andSupplementaryWileS1.Intotalthereare52UAAsand132TyrRSmutants.TheX-raycrystalstructureof18TyrRSmutantshasbeenreported(Table1).ThediversityofUAAs(Scheme1)islarge.Bothofsmallandlarge,polarandnon-polar,hydrophilicandhydrophobicsubstitutiongroupsofp-hydroxyofTyrareincludedintheseUAAs.IntheWTofMj.TyrRS,hydrogenbondnetworkbetweentyrosinehydroxygroupandpocketresiduesstabilizesthetyrosineligandandlowersthebindingfreeenergy(Figure1).InordertorecognizeotherUAAsinscheme1,mutationswithdifferentaminoacidtypesandphysicochemicalpropertieshavetobeintroducedtoaccommodatedifferentchemicalproperties.

Scheme1.ChemicalstructureoftheUAAsrecognizedbyMj.TyrRSmutantsofwhichX-raycrystalstructurehasbeensolved.

Figure1.CrystalstructureoftheWTMj.TyrRSandtyrosinecomplex.PDBID:1J1U.ThefigureisrenderedbyPymol23.(A)Mj.TyrRSisshownascartoon.Tyrosineligandisshownassticks.Surroundingresiduesareshownaslines.(B)Thebindingpocketofligandtyrosine.TyrosineformshydrogenbondnetworkwithTyr32,Asp158andGln155,whichstabilizesthehydroxygroupoftyrosine.

Table1.UAAindexandname,aaRSmutation,PDBIDandX-raycrystalstructurebackboneRMSDofUAAbindingpocketresidues.Crystalstructuresofmutantswiththesamesequenceisonlyshownonce.

UAA Index

Full name of UAA Abbrevation of UAA

Mj. TyrRS mutation PDB IDa Average backbone RMSD of pocket

residues between mutant and WT

Large mutant backbone fluctuation compared with WT?

Reference

000 L-tyrosine Tyr Wild type 1J1U 0 No 24 001 p-Methoxy-L-

phenylalanine pMeF Y32Q D158A E107T L162P 1U7X 1.02 No 25


https://doi.org/10.1101/2020.06.26.174524


003 3-(2-Naphthyl)-L-alanine H-2Nal-OH

Y32L D158P I159A L162Q A167V 1ZH0 2.20 Yes 26

004 3-Iodo-L-tyrosine 3IY H70A D158T 2ZP1 0.36 No 005 p-Benzoyl-L-phenylalanine pBpa Y32G E107P D158G I159T 2HGZ 0.38 No 27 009 p-Acetyl-L-phenylalanine pAcF Y32L D158G I159C L162R 1ZH6 1.27 Yes 28 015 p-Cyano-L-phenylalanine pCNF Y32L L65V F108W Q109M D158G

I159A 3QE4 0.68 No 29

018 p-(3-trifluoromethyl-3H-diazirin-3-yl)-phenylalanine

TfmdPhe Y32I H70F E107S Q109M D158P I159L L162E

3D6U 2.15 Yes 30

020 bipyridylalanine BpyAla Y32G L65Y H70A Q155E D158G I159W L162S

2PXH 0.40 No 31

035 3-Nitro-L-tyrosine 3NT Y32H H70C D158S I159A L162R 4NDA 0.39 No 32 042 p-Bromo-L-phenylalanine pBrF Y32L E107S D158P I159L L162E 2AG6 2.12 Yes 26 049 p-(2-tetrazole)-L-

phenylalanine p-Tpa Y32L L65I Q109M D158G L162V

V164G 3N2Y 0.33 No 33

053 3,5-difluoro tyrosine F2Y Y32R L65Y H70G F108N Q109C D158N L162S

4HJX 0.84 No 3

058 3-o-methyl tyrosine OMeTyr Y32E L65S H70G Q109G D158N L162V

4HPW 0.88 No

062 3,5-dichloride tyrosine Cl2Y Y32L L65I H70G F108I Q109L Y114G D158S L162M

4NX2 0.76 No 34

065 4-(2-bromoisobutyramido)-

phenylalanine

BibaF Y32G L65E F108W Q109M D158S L162K

4PBR 0.25 No 35

066 4-trans-cyclooctene-amidopheylalanine

Tco-amF Y32G L65E F108W Q109M D158S L162K

4PBT 0.57 No 35

068 O-phosphotyrosine pTyr Y32S L65A F108K Q109H D158G L162K

5U36 2.33 Yes 36

Forareliablemodelofthemutantstructure,thebackbonefluctuationintroducedbymutationshouldbesmallcomparedwiththatofWT.Ithasbeenreportedthatsomemutationscanchangethebackboneconformation36,sonextwealignedthecrystalstructuresofMj.TyrRSmutantsandWTandthencalculatedtheRMSDofbackboneN,CA,C,OofbindingpocketresiduesbetweenmutantandWTtocomparethestructuraldifference.AsshowninTable1andFigure2A,structureswithPDBID1ZHO,1ZH6,2AG6,3D6U,3D6VhaveaveragebackboneRMSDlargerthan1.2,ofwhichbackboneconformationischangedbymutationcomparedwithWT.Othermutantshavesmallbackbonefluctuationwhichcanbeignoredwhenbuildingthehomologymodel.4ofthe16mutantstructureshavelargebackbonefluctuation(Table1).InFigure2B,backboneRMSDofdifferentbindingpocketresiduesarecompared.Wecanseeresidues158~163havelargerRMSDthanotherresidues,whichareonthealphahelixnearUAAligand.4representativemutantstructuresarechosenandsuperimposedonWTstructure,asshowninFigure3.Inthestructureof1ZH0and2AG6thebackbonechangeofalphahelixislargewhilein2HGZand4PBRthechangeissmall.ThisimpliesitmaybedifficulttoobtainaccurateUAAandpocketresiduebindingposepredictionresultformutantbearinglargebackboneconformationchangeinhomologymodel.

Figure2.CrystalstructurebackboneheavyatomRMSDofUAAbindingpocketresidues(12intotal)betweenmutants(21intotal)andWT.(A)Foreachmutant,backboneRMSDofpocketresiduesiscalculatedandplottedusingboxenplotshowingdifferentquantiles.(B)Foreachpocketresidue,backboneRMSDfromallmutantsandWTiscalculatedandshown.


https://doi.org/10.1101/2020.06.26.174524


Figure3.Comparisonofthealphahelixcontainingresidues155-166between4representativeMj.TyrRSmutantsandWTX-raycrystalstructures.WTstructureiscoloredinmagenta,lightpink,green,magentafor(A),(B),(C)and(D),respectively.Howdifferentmutationsinducethebackboneconformationchangeremainsunclear.ItwasreportedthatinpTyrMj.TyrRSD158GmutationactedasahelixbreakerandcausedtherearrangeofthehelixandopenedupthepockettoaccommodatethebulkyUAA36.ThoughpCNFMj.TyrRShasthesameD158Gmutation,noobviousbackboneconformationchangeisobserved(Table1).Tofurtherinvestigatethecriticalresiduesaffectsthehelixconformation,wecalculatedAAindex37(531typesofnumericalindicesrepresentingvariousphysicochemicalandbiochemicalpropertiesofaminoacids)ofresidues158~163forthe17Mj.TyrRSinTable1.Totallythereare531*6=3186featuresforeachTyrRSmutant.Weperformedanalysisofvariance(ANOVA)bysklearn38tofindwhichresidueandAAindexpropertywillcontributemosttothediscriminationofTyrRShelixbackbonedisruption.Thetoplargest10f_valuearefromresidue158andhavep_value<0.001.The10AAindextypesarerelatedtointrinsicsecondarystructurepropensitiesoftheaminoacids(Table2),whichimpliesthatresidue158mayinfluencesthesecondarystructure.AAindextypeandvalueversusbackbonedisruptionornotfor17pdbsandresidue158isshowninFigure4.WecanseethatthedistributionoftheAAindexvalueisdifferentforTyrRSwithdifferenthelixbackbone.Eachofthe10AAindexcanbeusedtoseparatetheTyrRSmutantwithhelixbackbonedisruptionfromothers.ThisindicatesthatD158P/Gmutationisahelixbreakerandcandisruptthebackboneconformation,whichshouldbetakenintoconsiderationwhenbuildinghomologymodelfortheTyrRSmutantstoobtainaccuratestructuremodel.Table2.Residuesiteandtop10AAindexcontributesmosttothediscriminationofTyrRShelixbackbonedisruption.

Position AAindex type f_value p_value

158 MUNV940104 27.74547 0.000063

158 ROBB760104 27.262113 0.000069

158 MUNV940105 26.026208 0.000089

158 BLAM930101 25.948507 0.00009

158 BUNA790103 25.905193 0.000091

158 QIAN880134 25.607837 0.000097

158 MUNV940101 24.323819 0.000126

158 ONEK900101 23.726363 0.000144

158 RACS820114 23.355244 0.000156

158 QIAN880112 22.67982 0.000181

Figure4.BoxplotofAAindextypeandvalueversusbackbonedisruptionornotfor17pdbsandresidue158.EachsubplothasadifferentAAindextype.Thereare17datapointsineachplot,whichistheAAindexvalueofresidue158from17pdbs.1isforbackbonedisruptionwhile0isnot.

2.2UAA-Mj.TyrRSmutantcomplexmodellingusingRosettaTofacilitatetheTyrRSwet-labscreeningprocessandreducethetimeandresourcecost,molecularmodellingiscarriedouttopredictwhichTyrRSmutantcanrecognizethetargetUAA.FirstweuseRosettaEnzymeDesign22tomodelthebackbonechangeandAAsidechainpackingtoaccommodatespecificUAAligandforeachofthe6864UAA-mutantpaircomplex(52UAAstimes132TyrRSmutants).ThecrystalstructureofWTMj.TyrRS(PDBID1J1U)isusedasinputtemplate.RepresentativemodelingresultsareshowninFigure5.FormutantswithPDBID2HGZand4PBR,noobviousbackbonedisruptionisobservedincrystalstructure(Table1).ThepredictedbindingposeinFigure5A2HGZisaccurate,whileinFigure5B4PBRthepredictedorientationofUAAliganddeviatesfromthetruepositionincrystalstructureandleadstoinaccuratesidechainpackingofresidueLys162andSer158,whichmaybecausedbywrongselectionofBibaFconformation.FormutantswithPDBID1ZH0and2AG6,largebackbonedisruptionisobservedincrystalstructure(Table1).ThoughtheUAAligandpositionandconformation(Figure5Cand5D)isaccuratelymodeled,thesidechainofAAsurroundingthebindingpocketdeviatesalotfromthecrystalstructuremainlyduetoinaccuratemodelingofthealphahelixwithresidues158-163.Theresultsindicatethatformutantswithlittledisruptiononhelix158-163,Rosettamodelisaccurateenoughtopredictthebindingposebutformutantswithlargedisruption,theoppositeistrue.


https://doi.org/10.1101/2020.06.26.174524


Figure5.Comparisonofthebindingposeandbindingpocketresiduesof4Mj.aaRSmutantsfromX-raycrystalstructureandRosettamodel.CorrespondingUAAandbindingsiteresiduesareshownassticks.(A)PDBID:2HGZ.Crystal:green.Model:magenta.(B)PDBOD:4PBR.Crystal:orange.Model:lightpink.(C)PDBID:1ZH0.Crystal:white.Model:green.(D)PDBID:2AG6.Crystal:yellow.Model:magenta.Totestwhetherthecorrectbackboneconformationofresidues158-163helixhavingD158G/Pmutationcanbepredicted,thissegmentisdenovoremodeledusingKICloopmodelingmethod39inRosetta.1000modelsaregeneratedforeachofthe4mutantswithPDBID1ZH0,1ZH6,2AG6and3D6Uusinghomologymodelinpreviousstepasinput.ThebackboneRMSDtocrystalstructurearecalculatedandplottedagainsttotalenergy(Figure6).Scatterplotinfunnelshapecanbeobservedforall4mutants,whichindicatestheamountofbackboneconformationsamplingisenoughtofindlocalminimumenergypoint.Forthelowestenergystructure,theRMSDtocrystalstructureis3.65,2.41,3.74and4.39Åinthe4mutants.ThelowestRMSDtocrystalstructurein1000modelsofthe4mutantsis0.94,0.47,0.49and0.48,respectively.Thermsd_to_crystalforthetop10modelsrankedbytotal_energy_scoreandtotal_scorerankforthetop10modelsrankedbyrmsd_to_crystalon4TyrRSmutantsareshowninFigure7Aand7B,respectively.Tofurtheranalyzethemodeledstructure,crystalstructure,inputRosettamodel,modelwithlowestRMSDtocrystal,modelwithlowestenergyaresuperimposed.Wecanseethatmodelfor1ZH6hasthemostidealfunnelplot(Figure6B)andmodelwiththelowestenergyhasleastdeviationfromcrystalstructure(Figure7D),though2.41ÅRMSDmaystillnotbeaccurateenoughforUAA-TyrRSbindingaffinityprediction.Fortheremaining3

mutants,modelstructureclosetothenativecrystalstructurehasbeengeneratedbutwecan’tpickitupduetotheerrorofRosettaenergyfunction(Figure7).

Figure6.Scatterplotforrmsd_to_crystalvstotal_energy_scorein1000modelsgeneratedbyRosettadenovoloopmodellingforTyrRSmutantswithPDBID1ZH0,1ZH6,2AG6and3D6U.

Figure7.StructureanalysisofRosettaloopdenovomodellingresultsonTyrRSmutantswithPDBID1ZH0,1ZH6,2AG6and3D6U,respectively.(A)Boxplotofrmsd_to_crystalforthetop10modelsrankedbytotal_energy_scoreon4TyrRSmutants.(B)Boxplotoftotal_scorerankforthetop10modelsrankedbyrmsd_to_crystalon4TyrRSmutants.(C,D,E,F)Structuresuperimpositionofcrystalstructure(ingreen),originalRosettamodelusingWTcrystalstructureastemplate(inblue),remodeledmodelwiththelowestRMSDtocrystalstructurewithinthe1000models(inmagenta)andremodeledmodelwiththelowestenergywithinthe1000models(inyellow)on4TyrRSmutants.

2.3MachinelearningmodelandperformanceanalysisInthepreviouspart,wehavemodeled6864UAA-TyrRSmutantpairsusingRosettaandfoundthattheenergyfunctionisnotaccurateenoughtodiscriminatethetrueUAAbinder


https://doi.org/10.1101/2020.06.26.174524


fromthefalse.Tofurtherimprovetheenergyfunction,weusemachinelearningtotrainamodeltolearnimportantenergytermandprotein-ligandinteractionsthatcontributemosttothebindingenergy.TheflowchartofthemachinelearningmodelandapplicationisshowninScheme2.Nexteachpartoftheflowchartwillbedescribed.

Scheme2.Flowchartofthedatapreparation,MLmodeltraining,predictionandapplicationworkflow.Figurelegendisonthetop.2.3.1Datapreparation

Wecollectedthechemicalstructureof52UAAs,sequenceandX-raycrystalstructureof132TyrRSmutantsfrompreviousliterature.EachUAAispairedtoeachTyrRSmutanttogeneratedatasetforMLmodeltraining.Thereare6864UAA-TyrRSmutantpairsandcomplexmodelisbuiltintheprevioussection.ForthespecificUAA-mutantpair,iftheUAAcanberecognizedbythemutant,thispairislabeledaspositiveandalltheremainingpairsaretakenasnegativesampleundertheassumptionthatTyrRSmutanthashighspecificityandorthogonalityfordifferentUAAsubstrate.Inthefinaldataset,wehave132positivesamplesand6732negativesamples.Thepositivetonegativeratiois1:51.

2.3.2Featureextraction

TheUAA-TyrRSmutantpaircanbedescribedfrom3aspects:UAAligand,mutantproteinandUAA-mutantcomplex.ThepresentationandfeaturevectorizationmethodarelistedinTable3.ThefeatureisextractedfromthechemicalstructureofUAA,sequenceofTyrRSmutantand3DstructureofUAA-mutantcomplexmodelbuiltusingRosetta.Thedimensionofallfeaturesis2644.

2.3.3MLModeltraining

10%ofdataisusedastestset(687samples).Theremaining90%data(6177samples)isusedforMLmodeltraining,whichisrandomlysplittedintotrainingsetandvalidationsetwitharatioof4:1for5-foldcrossvalidation(CV).Featureengineering,modelselectionandhyperparametertuningarecarriedoutthroughpycaret40,alow-codemachinelearninglibrary.15machinelearningalgorithmsareusedtotrainthemodel:LightGradientBoostingMachine(lightGBM),CatBoostClassifier,ExtraTreesClassifier,ExtremeGradientBoosting,LogisticRegression,RidgeClassifier,RandomForestClassifier,QuadraticDiscriminantAnalysis,KNeighborsClassifier,AdaBoostClassifier,GradientBoostingClassifier,LinearDiscriminantAnalysis,DecisionTreeClassifier,SVM-LinearKernel,NaiveBayesClassifier.Differentmetricsareusedtoevaluatethebinaryclassificationmodelperformance:accuracy,AUC,recall,precisionandF1.

2.3.4MLModelevaluationandexplanation

Differentfeaturecombinationsareexploredinthemodeltrainingprocess.Thefeature_type_index,valuesofbestmetricsandcorrespondingmodelnamehavingthebestperformancearelistedinTable4.Fordifferentmetrics,thealgorithmhavingbestperformanceisdifferent.Ingeneral,theperformanceofmodelusingallfeaturesisbetterthanusingsingletypeoffeature(Figure8).WhenusingtheUAAdpocketdescriptoronly,QuadraticDiscriminantAnalysisandExtraTreesClassifiermodelshavebestrecallandprecision,respectively,whiletheirF1isnotthehighest.LightGBMmodelhasthebestaccuracy,AUCandprecisioninsomefeaturecombinations.

Table3.FeaturerepresentationmethodforUAAsmallmoleculeligandandTyrRSproteinreceptorusedinthisstudy

Method Description Dimension ReferencePROFEAT-ligand Smallmolecule1Dand2Ddescriptors. 406 41PROFEAT-receptor Proteinsequencedescriptorsusingaminoacidbiophysicalpropertiesand

compositions.1437 41

Rosettaenergyscoreanddecomposition

Rosettainterface_Etotalscoreandenergyterm(fa_atr,fa_rep,fa_sol,fa_elec,fa_pair,hbond_sr_bb,hbond_lr_bb,hbond_bb_sc,hbond_sc)decomposedintoeachbindingpocketresidue.

6+540 22,42

dpocket Dpocket(describingpocket)extractsseveraldescriptorsusingatom,aminoacid,alphasphereandvolumeinformationfromtheligandbindingpocket.

35 43

rfscore Amachinelearning-basedscorefunctionforprotein-ligandcomplex. 216 44


https://doi.org/10.1101/2020.06.26.174524


Table4.Comparisonof5-foldCVperformanceondifferentMLmodelsanddifferentperformancemetricsforUAAspecificityprediction

Features and combination

Feature_type_index

Model_with_best_accuracy

Best_accuracy

Model_with_best_AUC

Best_AUC

Model_with_best_recall

Best_recall

Model_with_best_precision

Best_precision

Model_with_best_F1

Best_F1

Rosetta_6_score_terms 1 SVM - Linear Kerne

0.979 CatBoost Classifier

0.7369 Decision Tree Classifier

0.0615 Naive Bayes 0.176 Quadratic Discriminant

Analysis

0.0777

Rosetta_energy_decomposed

2 Extra Trees Classifier

0.9796 Light Gradient Boosting Machine

0.7874 Naive Bayes 0.8077 Random Forest Classifier

0.4000 Decision Tree Classifier

0.1465

UAA_pocket_descripotr 3 Extra Trees Classifier


0.8042 Quadratic Discriminant

Analysis

1.0000 Extra Trees Classifier

0.7167 Extra Trees Classifier

0.1612

Rosetta_6_score_terms+energy_decomposed

4 Light Gradient Boosting Machine


0.8193 Linear Discriminant

Analysis

0.1846 K Neighbors Classifier


Analysis

0.1836

All_features_above 5 Light Gradient Boosting Machine


0.8222 Naive Bayes 0.4769 Light Gradient Boosting Machine


Analysis

0.1654

Figure8.Barplotofthebestmodelperformanceondifferentfeaturecombinationsanddifferentmetrics.Feature_type_indexisthesameasthatinTable4.WechooselightGBM45modelforfurtheranalysissinceit’satree-basedensemblemodelwhichcanavoidoverfittingandhasbeenwidelyusedinothermachinelearningmodelapplications.Featureimportanceofthemodelcanbeeasilyexplained.TheperformanceoflightGBMmodelon5-foldCVandtestsetiscompared(Table5).Wecanseethemetricsarebetterontestsetthanthaton5-foldCV.

Table5.PerformanceoflightGBMmodelon5-foldcrossvalidationsetandtestsetusingallfeatures(feature_type_index5).lightGBM Accuracy AUC Recall Precision F1

5-foldCV 0.9794 0.7593 0.0385 0.35 0.0686

testset 0.9796 0.8549 0.0667 1 0.125

SHAPmethod46isusedtocalculatethefeatureimportanceanditscontributiontothebinaryclasslabel(Figure9).Residue_70_fa_solandresidue_34_fa_solareconsideredasthemostimportantfeaturesbythelightGBMmodelwhichindicatesthatthesolvationenergyofresidue34and70maybeimportant.Rosetta_interface_Eandresidue_158_fa_elecare

alsoimportantwhichisinaccordancewithourdomainknowledgethatscorefunctionisusefultodiscriminatethepositivesampletosomeextent.

Figure9.SHAPvaluesoftheimportantfeaturesforthelightGBMmodel.TheaverageSHAPvalueofspecificfeatureinallsamplesareshown.Theimpactonnegativelabelisinblueandtheimpactonpositivelabelisinred.TocomparethepredictionaccuracybetweenlightGBMmodelandRosettascore,ROCandPRcurveontestset(Figure10Aand10B)areplottedforboth2methods.BetterROCandPRcurveareobservedforlightGBMMLmodel.AUCoflightGBMmodelis0.84,higherthan0.77ofRosettascore.Asasimulationtestforrealwetlabexperiment,wecalculatewhenthenumberofmutantskforweb-labexpisgivenandfixed,howmanytruepositivesamplesexistinthekmutants(Figure10Cand10D).Forexample,ifwewanttotest50mutantsinwetlabexperimentoutof687mutants,therewillbe1,2and11truepositivemutantsforrandomsample,RosettascorepredictionandlightGBMmodelprediction,respectively.Thesuccessrateis2%,4%and22%,respectively,whichmeansRosettascorepredictionhas2-foldelevationonenrichmentratiooftruepositivemutants,whilelightGBMmodelhas11-


https://doi.org/10.1101/2020.06.26.174524


foldelevation.TheTyrRSmutantscreeningwillsignificantlybenefitfromtheimprovementofpredictionaccuracyusingmachinelearningmodel.

Figure10.(A)Receiveroperatingcharacteristiccurve,(B)Precision-recallcurveand(C,D)Numberoftruepositivesamplesvsnumberofmutantsselectedforwet-labexperimentoflightGBMmodelandRosettainterface_Escoreonlypredictionontestset.Thescaleofxaxisislimitedto0-100infigureD.

3Discussion

3.1SigniJicanceoftheworkforgeneticcodeexpansionandcomputationalproteindesignIntheWieldofcomputationalproteindesign,itisagreatchallengetointroducemutationsintoproteinstochangethesubstratespeciWicityfordifferentligandsinaparticularligand-receptorcomplexsystem.Therehavebeenmanyreports,suchasusingRosetta47andOSPREY48todesignproteinstoswitchthesubstrateandaccommodatespeciWicligands.Asamodelsystem,Mj.TyrRShasbemutatedanddesignedtobindUAAwithdifferentchemicalstructuresasreportedbefore49.Thissystemcanbeusedasabenchmarkforsubstrate-speciWicproteindesign,asalargenumberofmutantshavebeenreported.ComprehensivestudyofthissystemwillbeofgreatsigniWicancetotheWieldofproteindesign.ThisworkfocusesonUAA-Mj.TyrRScomplexsystemwhichhasbeenwidelyusedingeneticcodeexpansionresearchWield.Throughtheeffortsofmanyresearchgroupsinthepastover15years,alargenumberofUAAanditscorrespondingTyrRSmutantshasbeenidentiWiedafterspendingalotofhumanandmaterialresources,butsofar,thereisnosystematicstudytointegratethesedataforthedevelopmentandtestofUAAvirtualscreeningmethod.Thispapersummarizesallthe

reportedUAAandMj.TyrRSmutants,systematicallyanalyzesthem,andattemptstoestablisharelativelyaccuratemodeltopredicttheUAA-speciWicrecognitionofdifferentmutants,establishingthefoundationforlarge-scale,high-throughputvirtualscreening.Theexistingmethodsforpredictingproteinligandinteractionareprobablynotsuitableforthissystem.Thereasonisthattheadaptabilityandaccuracyofscorefunctionsfordifferentligand-receptorsystemsaredifferent.Forexample,theimportanceofhydrogenbondinonesystemmaybehigherthananother.MethoddevelopedforafWinitypredictionofdrug-targetproteininteractioncannotbeappliedontheMj.TyrRSsystemwithoutmodiWication.Inthispaper,wecalibratetheRosettascorefunctionforthisspeciWicsystemtogetbetterpredictionperformance.BesidesRosettamolecularmodeling,inthefutureothermethodssuchasmoleculardocking,MM-PBSA50,TI51,andFEP52canalsobeintegratedwithMLinthesimilarwaytogivemoreaccuratepredictionresults.

3.2PreviousworkforTyrRSmutantsubstratespeciJicitypredictionandcomputationallibrarydesignSeveralmethodsaimingtoacceleratethescreeningofaaRSmutantsordesignmorefocusinglibraryusingcomputationalchemistrymethodsuchasmolecularmodelingandMM-PBSA53-55havebeenreported.TheseresultsshowthatmolecularmodelingcanbeusedtopredictTyrRSsubstratespeciWicity,buttheamountofUAAandTyrRSmutantsusedinpreviousstudiesistoosmalltogiveconvincingconclusion.Inthiswork,weconsideredallUAAandTyrRSmutantsreportedbefore.WiththousandsofUAA-mutantpairsastrainingandvalidationdata,amoresolidconclusioncanbemadethatmolecularmodelingintegratedwithMLcanbeusefulinthevirtualscreeningofTyrRSmutant.

3.3DiscussiononthepredictionresultThoughthebackbonedisruptionofhelix158-163can’tbepreciselymodeled,wethinkitsinWluenceonthepredictionaccuracymaynotbelargebecausethereareonly58D158Gmutantsand13D158Pmutantswithafractionof42%intotal171mutants(SupplementaryWileS1).Besides,descriptorsextractedfromproteinsequenceisnotaffectedbythebackbonedisruptionofstructure.KnowledgeandinformationlearnedfrommutantsequencebyMLmodelcouldcorrecttheerrorinthehomologymodelandincreasethepredictionaccuracy.

3.4LimitationoftheworkLimitationofthecurrentmolecularmodelingmethodFirstly,forthehomologymodelingofMj.TyrRSmutants,thepositionofaminoacidsidechainandrotamerinthebindingpocketcanberecoveredwell,butforthosewithlarge


https://doi.org/10.1101/2020.06.26.174524


backbonechanges,itishardtomodelthebackboneconformationalignedwelltocrystalstructure.ThoughthecloseconformationcanbesampledbyRosettaloopremodeling,thescorefunctionisnotpreciseenoughtopickitout.TheinaccuracyofstructurepredictiondecreasesthepredictionaccuracyofUAA-speciWicmutant.Secondly,therearewatermoleculesintheUAAbindingpocketofMj.TyrRS.TheinWluenceofwaterisnotconsideredbycurrentmolecularmodelingmethod.LimitationofthemachinelearningmethodThegeneralizationabilityofMLmodelmaybelowbecausethenumberofreportedUAAandmutantpairissmallcomparedtothehugechemicalspaceofUAAandsequencespaceofMj.TyrRSmutants.CurrentlyitisdifWiculttocoverthediversiWiedchemicalstructureandproteinsequence.Oneofthesolutionsisdeepmutationalscanning56,whichusesnextgenerationsequencingtogetphenotypeofover1millionmutantsinasingleexperimentandgeneratesenoughdataformachinelearninganddeeplearningmodeltrainingandprediction.Stillwet-labexperimentsareneededtovalidatethemodelforpracticalusage.

4ConclusionTogetfurtherknowledgeoftheMj.TyrRSstructureandacceleratethetime-costlyscreeningprocessofmutantforspeciWicUAA,wecollectedalltheUAAsandMj.TyrRSreportedbefore,analyzedthestructureandsequencedifferencebetweenthemutantsandfoundthatsomemutantshavealternativebackboneconformationonalphahelixresidue158-163withD158G/Pmutation,whichmakesaccuratemutantmodellingmoredifWicult.RosettamodelingandmachinelearningareintegratedtogivemoreaccuratepredictionresultsformutantselectivitytowardsdifferentUAAs.Differentfeaturecombinationsandmachinelearningalgorithmsaretestedforhighermodelperformance.Lightgbmmodelischosenandthefeatureimportanceandcontributiontothebinaryclasslabeliscalculatedtoexplaintheknowledgelearnedbythemodel.AfterthecalibrationofRosettascorefunctionusinglightGBMmodel,theenrichmentratiooftargetmutantiselevatedby11-foldcomparedwithrandommutation.Wet-labexperimentisinprogresstovalidatethemodel.Weanticipatethatthisproof-of-conceptworkWlowwillbeofgreathelpinthescreeningofMj.TyrRSandproteindesignWield.

5 Methods

5.1MolecularmodellingusingRosettaPreparationoftheUAAligandPreparationoftheUAAligandwasperformedasdescribedintheRosettatutorialforligandpreparation57.BrieWlyspeaking,

thechemicalstructureofUAAisdrawnusingMarvinJS58andconvertedtosmilesformat.CheminformaticstoolRDKit59isusedtogeneratelow-energyconformationsandperformenergyminimizationbasedonthesmilesofUAA.TheligandparameterWileusedinRosettaismadebymolWile_to_params.py57.UAA-Mj.TyrRSmutantcomplexmodelbuildingTheUAA-Mj.TyrRSmutantcomplexmodelisbuiltusingRosettaEnzymedesign22applicationwithdefaultparametersasdescribedinRosettatutorial60.BrieWlyspeaking,theaminoacidmutationinthemutantsiswritteninresWileandpassedintotheapplicationalongwiththeUAAligandparameterWile.CatalyticresiduesoftheaaRSformattinghydrogenbondwithUAAisWixedusingconstraintWile.Foreachcomplex,nstruct=10isusedandthestructurewiththelowesttotalscoreiskeptforfurtheranalysis.DenovoloopmodelingusingKICmethodTheresidues158-163segmentofMj.TyrRSisdenovomodeledusingloopmodelingKICmethodasdescribedinRosettatutorial61withdefaultparameters.1000modelsaregeneratedforeachinputstructure.TheRMSDbetweenmodeledstructureandcrystalstructureiscalculatedbyin-housescriptwrittenusingbiopython62.

5.2 MachinelearningDescriptorextractionPROFEAT,dpocketandrfscoredescriptorsofUAAligand,Mj.TyrRSmutantproteinandUAA-Mj.TyrRSmutantcomplexiscalculatedonhttp://www.descriptordb.com/withdefaultparameters.FeatureengineeringFeatureengineering,modelselectionandhyperparameteroptimizationareperformedbyPyCaret40.90%ofthefulldataisusedin5-foldCVandtherest10%ofdataisusedastestset.Thefeaturespaceistransformedusing‘zscore’method.‘ignore_low_variance’optionissettoTrueandallcategoricalfeatureswithstatisticallyinsignificantvariancesareremovedfromthedataset.Featureselectionisusedand‘feature_selection_threshold’issetto0.8.

Dataavailability

Alldatasetsandparametersgeneratedoranalyzedduringthestudyareavailablefromthecorrespondingauthoronreasonablerequest.

FundingThis work was supported by National Natural ScienceFoundationofChina[61431017].

Authorcontributions


https://doi.org/10.1101/2020.06.26.174524


YingfeiSunconceivedthestudyandeditedthepaper.BingyaDuan conducted the experiments and analyzed the results.Bingya Duan and Yingfei Sun co-wrote the manuscript. Allauthorsreviewedthemanuscript.

CorrespondingauthorCorrespondencetoYingfeiSun.

CompetinginterestsTheauthorsdeclarenocompetinginterests.

References1. J.W.Chin,Expandingandreprogrammingthegeneticcodeofcells

andanimals.AnnuRevBiochem83,379-408(2014).2. F. Yang et al., Phospho-selective mechanisms of arrestin

conformations and functions revealed by unnatural amino acidincorporationand(19)F-NMR.NatCommun6,8202(2015).

3. F. Li et al., A genetically encoded 19F NMR probe for tyrosinephosphorylation.AngewChemIntEdEngl52,3958-3962(2013).

4. K. Yokoyama, U. Uhlin, J. Stubbe, Site-Specific Incorporation of 3-Nitrotyrosine as a Probe of pK(a) Perturbation of Redox-ActiveTyrosines inRibonucleotideReductase. JAmChemSoc132, 8385-8397(2010).

5. I.N.Ugwumbaetal., ImprovingaNaturalEnzymeActivitythroughIncorporationofUnnaturalAminoAcids. JAmChemSoc133,326-333(2011).

6. I. Drienovska, G. Roelfes, Expanding the enzyme universe withgenetically encoded unnatural amino acids. Nat Catal 3, 193-202(2020).

7. X. H. Liu et al., A genetically encoded photosensitizer proteinfacilitates the rational design of a miniature photocatalytic CO2-reducingenzyme.NatChem10,1201-1206(2018).

8. I.Drienovskaetal.,Designofanenantioselectiveartificialmetallo-hydrataseenzymecontaininganunnaturalmetal-bindingaminoacid.ChemSci8,7228-7235(2017).

9. Q.Lietal.,DevelopingCovalentProteinDrugsviaProximity-EnabledReactiveTherapeutics.Cell,(2020).

10. V.K.Vyas,R.D.Ukawala,M.Ghate,C.Chintha,HomologyModelingaFastToolforDrugDiscovery:CurrentPerspectives.IndianJPharmSci74,1-17(2012).

11. A. W. Senior et al., Improved protein structure prediction usingpotentialsfromdeeplearning.Nature577,706-+(2020).

12. X. Y. Meng, H. X. Zhang, M. Mezei, M. Cui, Molecular Docking: APowerful Approach for Structure-Based Drug Discovery. CurrComput-AidDrug7,146-157(2011).

13. B.Kuhlman,P.Bradley,Advancesinproteinstructurepredictionanddesign.NatRevMolCellBio20,681-697(2019).

14. R. F. Li et al., Computational redesign of enzymes for regio- andenantioselectivehydroamination.NatChemBiol14,664-+(2018).

15. A.Zhavoronkovetal.,DeeplearningenablesrapididentificationofpotentDDR1kinaseinhibitors.NatBiotechnol37,1038-+(2019).

16. Z.Wu,S.B.J.Kan,R.D.Lewis,B.J.Wittmann,F.H.Arnold,Machinelearning-assisted directed protein evolution with combinatoriallibraries(vol116,pg8852,2019).PNatlAcadSciUSA117,788-789(2020).

17. J.A.Ruffolo,C.Guerra,S.P.Mahajan,J.Sulam,J.J.J.b.Gray,GeometricPotentialsfromDeepLearningImprovePredictionofCDRH3LoopStructures.(2020).

18. J.Gravesetal.,AReviewofDeepLearningMethodsforAntibodies.Antibodies(Basel)9,(2020).

19. C.Wang,Y.Zhang, Improvingscoring-docking-screeningpowersofprotein-ligandscoringfunctionsusingrandomforest.JComputChem38,169-177(2017).

20. J.X.Wang,H.L.Cao, J.Z.H.Zhang,Y.F.Qi,ComputationalProteinDesignwithDeepLearningNeuralNetworks.SciRep-Uk8,(2018).

21. H. Cao, J.Wang, L. He, Y. Qi, J. Z. Zhang, DeepDDG: Predicting theStabilityChangeofProteinPointMutationsUsingNeuralNetworks.JChemInfModel59,1508-1514(2019).

22. F. Richter, A. Leaver-Fay, S. D. Khare, S. Bjelic, D. Baker, De novoenzymedesignusingRosetta3.PLoSOne6,e19230(2011).

23. Schrodinger,LLC.(2015).

24. T.Kobayashietal.,StructuralbasisfororthogonaltRNAspecificitiesof tyrosyl-tRNAsynthetases forgenetic codeexpansion.NatStructBiol10,425-432(2003).

25. Y.Zhang,L.Wang,P.G.Schultz,I.A.Wilson,Crystalstructuresofapowild-type M. jannaschii tyrosyl-tRNA synthetase (TyrRS) and anengineered TyrRS specific for O-methyl-L-tyrosine.Protein Sci14,1340-1349(2005).

26. J. M. Turner, J. Graziano, G. Spraggon, P. G. Schultz, Structuralplasticityofanaminoacyl-tRNAsynthetaseactivesite.ProcNatlAcadSciUSA103,6483-6488(2006).

27. W.Liu,L.Alfonta,A.V.Mack,P.G.Schultz,Structuralbasis for therecognitionofpara-benzoyl-L-phenylalaninebyevolvedaminoacyl-tRNAsynthetases.AngewChemIntEdEngl46,6073-6075(2007).

28. J. M. Turner, J. Graziano, G. Spraggon, P. G. Schultz, Structuralcharacterization of a p-acetylphenylalanyl aminoacyl-tRNAsynthetase.JAmChemSoc127,14976-14977(2005).

29. D. D. Young et al., An evolved aminoacyl-tRNA synthetase withatypical polysubstrate specificity. Biochemistry 50, 1894-1900(2011).

30. E.M. Tippmann,W. Liu, D. Summerer, A. V.Mack, P. G. Schultz, Agenetically encoded diazirine photocrosslinker in Escherichia coli.Chembiochem8,2210-2214(2007).

31. J.Xie,W.Liu,P.G.Schultz,Ageneticallyencodedbidentate,metal-bindingaminoacid.AngewChemIntEdEngl46,9239-9242(2007).

32. R.B.Cooleyetal.,Structuralbasisofimprovedsecond-generation3-nitro-tyrosinetRNAsynthetases.Biochemistry53,1916-1924(2014).

33. J. Wang et al., A biosynthetic route to photoclick chemistry onproteins.JAmChemSoc132,14812-14818(2010).

34. X. Liu et al., Significant expansion of fluorescent protein sensingabilitythroughthegeneticincorporationofsuperiorphoto-inducedelectron-transfer quenchers. J Am Chem Soc 136, 13094-13097(2014).

35. R.B.Cooley,P.A.Karplus,R.A.Mehl,Gleaningunexpectedfruitsfromhard-won synthetases: probing principles of permissivity in non-canonical amino acid-tRNA synthetases. Chembiochem 15, 1810-1819(2014).

36. X. Luo et al., Genetically encoding phosphotyrosine and itsnonhydrolyzable analog in bacteria. Nat Chem Biol 13, 845-849(2017).

37. S.Kawashima etal.,AAindex:aminoacid indexdatabase,progressreport2008.NucleicAcidsRes36,D202-205(2008).

38. F. Pedregosa et al., Scikit-learn: Machine Learning in Python. 12,2825-2830(2011).

39. A. Stein, T. Kortemme, Improvements to robotics-inspiredconformationalsamplinginrosetta.PLoSOne8,e63090(2013).

40. M.Ali,PyCaret:Anopensource,low-codemachinelearninglibraryinPython.(2020).

41. Z.R.Lietal.,PROFEAT:awebserverforcomputingstructuralandphysicochemicalfeaturesofproteinsandpeptidesfromaminoacidsequence.NucleicAcidsRes34,W32-37(2006).

42. R. F. Alford et al., The Rosetta All-Atom Energy Function forMacromolecular Modeling and Design. J Chem Theory Comput 13,3031-3048(2017).

43. P.Schmidtke,V.LeGuilloux, J.Maupetit,P.Tuffery, fpocket:onlinetools for protein ensemble pocket detection and tracking.NucleicAcidsRes38,W582-589(2010).

44. P. J. Ballester, J. B. Mitchell, A machine learning approach topredicting protein-ligand binding affinity with applications tomoleculardocking.Bioinformatics26,1169-1175(2010).

45. G. Ke et al., in neural information processing systems. (2017), pp.3149-3157.

46. S.Lundbergetal.,FromlocalexplanationstoglobalunderstandingwithexplainableAIfortrees.2,56-67(2020).

47. R.Moretti,B.J.Bender,B.Allison,J.Meiler,RosettaandtheDesignofLigandBindingSites.MethodsMolBiol1414,47-62(2016).

48. M.A.Hallenetal.,OSPREY3.0:Open-sourceproteinredesignforyou,withpowerfulnewfeatures.JComputChem39,2494-2507(2018).

49. J.W.Chin,Expandingandreprogrammingthegeneticcode.Nature550,53-60(2017).

50. S. Genheden, U. Ryde, The MM/PBSA and MM/GBSA methods toestimateligand-bindingaffinities.ExpertOpinDrugDis10,449-461(2015).

51. T. S.Lee,Y.Hu,B. Sherborne,Z.Guo,D.M.York,TowardFastandAccurate Binding Affinity Predictionwith pmemdGTI: An EfficientImplementation of GPU-Accelerated Thermodynamic Integration. JChemTheoryComput13,3077-3084(2017).


https://doi.org/10.1101/2020.06.26.174524


52. Z. Cournia, B. Allen, W. Sherman, Relative Binding Free EnergyCalculations in Drug Discovery: Recent Advances and PracticalConsiderations. Journal of Chemical Information and Modeling 57,2911-2937(2017).

53. W.Ren,T.M.Truong,H.W.Ai,StudyoftheBindingEnergiesbetweenUnnatural Amino Acids and Engineered Orthogonal Tyrosyl-tRNASynthetases.SciRep-Uk5,(2015).

54. V. Opuu, G. Nigro, E. Schmitt, Y.Mechulam, T. Simonson, Adaptivelandscape flattening allows the design of both enzyme:substratebindingandcatalyticpower.771824(2019).

55. T. Baumann et al., Computational Aminoacyl-tRNA SynthetaseLibraryDesignforPhotocagedTyrosine.IntJMolSci20,(2019).

56. D. M. Fowler, S. Fields, Deep mutational scanning: a new style ofproteinscience.NatMethods11,801-807(2014).

57. P. Hosseinzadeh, Preparing Ligands,https://www.rosettacommons.org/demos/latest/tutorials/prepare_ligand/prepare_ligand_tutorial.(2016).

58. https://marvinjs-demo.chemaxon.com/latest/.59. G. Landrum, RDKit: Open-source cheminformatics,

http://www.rdkit.org.(2020).60. F. Richter, Enzyme design application,

https://www.rosettacommons.org/docs/latest/application_documentation/design/enzyme-design.(2010).

61. A. Stein, Next-generation kinematic loop modeling and torsion-restricted sampling,https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/loop_modeling/next-generation-KIC.(2015).

62. P. J. Cock et al., Biopython: freely available Python tools forcomputationalmolecularbiologyandbioinformatics.Bioinformatics25,1422-1423(2009).


https://doi.org/10.1101/2020.06.26.174524


integration of machine learning improves the prediction ... · 26/06/2020 · substrate...

Documents