Integration of Machine Learning Improves the Prediction Accuracy of Molecular Modelling for M. jannaschii Tyrosyl-tRNA Synthetase Substrate Specificity
Bingya Duan
School of Electronic, Electrical and Communication Engineering
University of Chinese Academy of Sciences, Beijing, China
Yingfei Sun*
School of Electronic, Electrical and Communication Engineering
University of Chinese Academy of Sciences
Abstract
Designing an enzyme binding pocket to accommodate substrates with different chemical structures is a great challenge. Traditionally, thousands or even millions of mutants have to be screened in wet-lab experiments to find a ligand-specific mutant, which consumes a large amount of time and resources. To accelerate the screening process, we propose a novel workflow that integrates molecular modeling with a data-driven machine learning method to generate mutant libraries with a high enrichment ratio for recognition of a specific substrate. M. jannaschii tyrosyl-tRNA synthetase (Mj. TyrRS) is used as an example system to give a proof of concept, since the sequences and structures of many unnatural amino acid-specific Mj. TyrRS mutants have been reported. Based on the crystal structures of different Mj. TyrRS mutants and Rosetta modeling results, we find that D158G/P is the critical mutation that influences the backbone disruption of the helix formed by residues 158-163. Our results show that, compared with random mutation, Rosetta modeling and score function calculation elevate the enrichment ratio of desired mutants by 2-fold in a test library of 687 mutants, while after calibration by a machine learning model trained on known data of Mj. TyrRS mutants and ligands, the enrichment ratio is elevated by 11-fold. This workflow integrating molecular modeling and machine learning is anticipated to significantly benefit Mj. TyrRS mutant screening and to substantially reduce the time and cost of wet-lab experiments. In addition, this process should have broad application in the field of computational protein design.
CCS Concepts
• Applied computing • Life and medical sciences • Computational biology • Molecular structural biology
Keywords
Tyrosyl-tRNA synthetase, Genetic code expansion, Enzyme substrate specificity, Rosetta, Molecular modelling, Machine learning
bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.174524; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
1 Introduction
Genetic code expansion technology [1] has been widely used in biology research. It can be applied to monitoring protein conformational changes [2] caused by post-translational modification (PTM) [3] as a biophysical probe [4], improving enzyme activity [5] and designing proteins with novel catalytic functions [6-8]. Through this technology, an artificially designed unnatural amino acid (UAA) can be incorporated into almost any specific site of a target protein. Usually, an orthogonal pair of tRNA and aminoacyl-tRNA synthetase (aaRS) mutant is necessary for recognition of a specific UAA and its subsequent acyl ligation to the tRNA. Although more than 100 UAAs and their corresponding orthogonal aaRS mutants have been reported, UAAs with novel chemical structures and biophysical and biological functions will significantly benefit biology research and protein therapeutics [9]. The pursuit of aaRS mutants for new UAAs has never stopped. Traditionally, an aaRS mutant recognizing a specific UAA is selected by several rounds of positive and negative screening from large and diverse aaRS mutant libraries rationally designed from the wild-type aaRS [1]. The screening process is tedious, labor-intensive and error-prone. As a result, a more rapid and efficient method to identify aaRS mutants for a specific UAA is urgently needed.
From the perspective of computational chemistry, finding an aaRS mutant for a specific UAA is a protein design problem. The sequence space of the wild-type aaRS as receptor is explored with the goal of finding the mutant with the lowest binding free energy to the UAA ligand. In recent years, computational chemistry-based molecular modelling methods, such as protein homology modelling [10], protein structure prediction [11], molecular docking [12] and protein design [13], have been rapidly developed. A large number of successful cases of enzyme design for substrate selectivity [14] have been reported.
Recently there have been great advances in artificial intelligence (AI) technology, such as machine learning and deep learning. AI has been used in computer vision, speech recognition, machine translation, small-molecule drug design [15], protein engineering [16], antibody structure prediction [17] and antibody design [18]. In general, AI can extract representative features of protein sequence and structure from large amounts of chemical molecule and protein data, learn internal patterns that cannot be explicitly spotted by humans, and use experimentally validated properties such as binding affinity data, IC50 and enzyme activity as labels to train a model. The model can then predict which samples have the target property of interest among huge numbers of molecules and diversified protein mutant libraries. For example, improved score functions for molecular docking software [19], improved protein design performance [20] and accurate prediction of thermostability for protein mutants [21] have been achieved through machine learning and deep learning. Combined with computational chemistry-based methods, data-driven AI methods are giving satisfactory solutions and accurate predictions for many biological problems.
In this work, we collected all the M. jannaschii tyrosyl-tRNA synthetase (Mj. TyrRS) mutants reported in the literature to compare and analyze the sequence and structural features and the differences between the mutants and wild-type Mj. TyrRS. We then use the Rosetta Enzyme Design method [22] to model the structure and predict the binding pose of each UAA-Mj. TyrRS mutant pair. Finally, a machine learning model is integrated to calibrate the score function for aaRS substrate selectivity prediction and mutant design to achieve better prediction accuracy. We think the improved Mj. TyrRS-UAA selectivity prediction model will be extremely useful for in silico screening of UAA-specific aaRS mutants.
2 Results
2.1 Crystal structures comparison between Mj. TyrRS mutants and WT
First, we collected information on all UAAs and the corresponding Mj. TyrRS mutants for which high-resolution X-ray crystal structures have been released, as shown in Scheme 1, Table 1 and Supplementary file S1. In total there are 52 UAAs and 132 TyrRS mutants. X-ray crystal structures of 18 TyrRS mutants have been reported (Table 1). The diversity of the UAAs (Scheme 1) is large: small and large, polar and non-polar, hydrophilic and hydrophobic groups substituting the p-hydroxy group of Tyr are all included. In WT Mj. TyrRS, a hydrogen bond network between the tyrosine hydroxy group and the pocket residues stabilizes the tyrosine ligand and lowers the binding free energy (Figure 1). To recognize the other UAAs in Scheme 1, mutations with different amino acid types and physicochemical properties have to be introduced to accommodate the different chemical properties.
Scheme 1. Chemical structures of the UAAs recognized by Mj. TyrRS mutants for which an X-ray crystal structure has been solved.
Figure 1. Crystal structure of the WT Mj. TyrRS and tyrosine complex. PDB ID: 1J1U. The figure is rendered by PyMOL [23]. (A) Mj. TyrRS is shown as cartoon. The tyrosine ligand is shown as sticks. Surrounding residues are shown as lines. (B) The binding pocket of the tyrosine ligand. Tyrosine forms a hydrogen bond network with Tyr32, Asp158 and Gln155, which stabilizes the hydroxy group of tyrosine.
Table 1. UAA index and name, aaRS mutation, PDB ID and X-ray crystal structure backbone RMSD of UAA binding pocket residues. Crystal structures of mutants with the same sequence are shown only once.
| UAA Index | Full name of UAA | Abbreviation of UAA | Mj. TyrRS mutation | PDB ID (a) | Average backbone RMSD of pocket residues between mutant and WT | Large mutant backbone fluctuation compared with WT? | Reference |
|---|---|---|---|---|---|---|---|
| 000 | L-tyrosine | Tyr | Wild type | 1J1U | 0 | No | 24 |
| 001 | p-Methoxy-L-phenylalanine | pMeF | Y32Q D158A E107T L162P | 1U7X | 1.02 | No | 25 |
| 003 | 3-(2-Naphthyl)-L-alanine | H-2Nal-OH | Y32L D158P I159A L162Q A167V | 1ZH0 | 2.20 | Yes | 26 |
| 004 | 3-Iodo-L-tyrosine | 3IY | H70A D158T | 2ZP1 | 0.36 | No | |
| 005 | p-Benzoyl-L-phenylalanine | pBpa | Y32G E107P D158G I159T | 2HGZ | 0.38 | No | 27 |
| 009 | p-Acetyl-L-phenylalanine | pAcF | Y32L D158G I159C L162R | 1ZH6 | 1.27 | Yes | 28 |
| 015 | p-Cyano-L-phenylalanine | pCNF | Y32L L65V F108W Q109M D158G I159A | 3QE4 | 0.68 | No | 29 |
| 018 | p-(3-trifluoromethyl-3H-diazirin-3-yl)-phenylalanine | TfmdPhe | Y32I H70F E107S Q109M D158P I159L L162E | 3D6U | 2.15 | Yes | 30 |
| 020 | bipyridylalanine | BpyAla | Y32G L65Y H70A Q155E D158G I159W L162S | 2PXH | 0.40 | No | 31 |
| 035 | 3-Nitro-L-tyrosine | 3NT | Y32H H70C D158S I159A L162R | 4NDA | 0.39 | No | 32 |
| 042 | p-Bromo-L-phenylalanine | pBrF | Y32L E107S D158P I159L L162E | 2AG6 | 2.12 | Yes | 26 |
| 049 | p-(2-tetrazole)-L-phenylalanine | p-Tpa | Y32L L65I Q109M D158G L162V V164G | 3N2Y | 0.33 | No | 33 |
| 053 | 3,5-difluoro tyrosine | F2Y | Y32R L65Y H70G F108N Q109C D158N L162S | 4HJX | 0.84 | No | 3 |
| 058 | 3-o-methyl tyrosine | OMeTyr | Y32E L65S H70G Q109G D158N L162V | 4HPW | 0.88 | No | |
| 062 | 3,5-dichloride tyrosine | Cl2Y | Y32L L65I H70G F108I Q109L Y114G D158S L162M | 4NX2 | 0.76 | No | 34 |
| 065 | 4-(2-bromoisobutyramido)-phenylalanine | BibaF | Y32G L65E F108W Q109M D158S L162K | 4PBR | 0.25 | No | 35 |
| 066 | 4-trans-cyclooctene-amidophenylalanine | Tco-amF | Y32G L65E F108W Q109M D158S L162K | 4PBT | 0.57 | No | 35 |
| 068 | O-phosphotyrosine | pTyr | Y32S L65A F108K Q109H D158G L162K | 5U36 | 2.33 | Yes | 36 |
For a reliable model of the mutant structure, the backbone fluctuation introduced by mutation should be small compared with the WT. It has been reported that some mutations can change the backbone conformation [36], so we next aligned the crystal structures of the Mj. TyrRS mutants and WT and calculated the RMSD of the backbone atoms N, CA, C and O of the binding pocket residues between each mutant and WT to compare the structural difference. As shown in Table 1 and Figure 2A, the structures with PDB ID 1ZH0, 1ZH6, 2AG6, 3D6U and 3D6V have an average backbone RMSD larger than 1.2 Å, indicating that their backbone conformation is changed by mutation compared with WT. The other mutants have small backbone fluctuation, which can be ignored when building the homology model. 4 of the 16 mutant structures have large backbone fluctuation (Table 1). In Figure 2B, the backbone RMSD of the different binding pocket residues is compared. Residues 158-163, which lie on the alpha helix near the UAA ligand, have larger RMSD than the other residues. Four representative mutant structures are chosen and superimposed on the WT structure, as shown in Figure 3. In the structures of 1ZH0 and 2AG6 the backbone change of the alpha helix is large, while in 2HGZ and 4PBR the change is small. This implies that it may be difficult to obtain accurate predictions of the UAA and pocket residue binding pose from a homology model for mutants bearing a large backbone conformation change.
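The backbone RMSD described above can be sketched in a few lines. This is a minimal illustration and not the authors' script: it assumes the two structures are already superimposed, and the coordinate dictionaries, the function name backbone_rmsd and the toy coordinates are all hypothetical.

```python
import math

# Backbone atoms used in the comparison, as named in the text.
BACKBONE_ATOMS = ("N", "CA", "C", "O")

def backbone_rmsd(mutant, wild_type, pocket_residues):
    """RMSD over the backbone atoms of the given residues.

    `mutant` and `wild_type` map (residue_number, atom_name) -> (x, y, z).
    Both structures are assumed to be superimposed already.
    """
    squared_distances = []
    for resnum in pocket_residues:
        for atom in BACKBONE_ATOMS:
            a = mutant[(resnum, atom)]
            b = wild_type[(resnum, atom)]
            squared_distances.append(sum((p - q) ** 2 for p, q in zip(a, b)))
    return math.sqrt(sum(squared_distances) / len(squared_distances))

# Toy coordinates (hypothetical, not taken from any PDB entry).
wt = {(158, "N"): (0.0, 0.0, 0.0), (158, "CA"): (1.5, 0.0, 0.0),
      (158, "C"): (2.0, 1.4, 0.0), (158, "O"): (1.2, 2.3, 0.0)}
# Shift every atom by 1 Angstrom along x to mimic a displaced backbone.
mut = {key: (x + 1.0, y, z) for key, (x, y, z) in wt.items()}
rmsd = backbone_rmsd(mut, wt, [158])
```

Averaging this value over the pocket residues of one mutant would give a per-mutant number comparable to the RMSD column of Table 1, under the assumptions stated above.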
Figure 2. Crystal structure backbone heavy atom RMSD of UAA binding pocket residues (12 in total) between mutants (21 in total) and WT. (A) For each mutant, the backbone RMSD of the pocket residues is calculated and plotted using a boxen plot showing different quantiles. (B) For each pocket residue, the backbone RMSD between all mutants and WT is calculated and shown.
Figure 3. Comparison of the alpha helix containing residues 155-166 between 4 representative Mj. TyrRS mutants and WT X-ray crystal structures. WT structure is colored in magenta, light pink, green, magenta for (A), (B), (C) and (D), respectively.
How different mutations induce the backbone conformation change remains unclear. It was reported that in pTyr Mj. TyrRS the D158G mutation acted as a helix breaker, caused rearrangement of the helix and opened up the pocket to accommodate the bulky UAA [36]. Although pCNF Mj. TyrRS has the same D158G mutation, no obvious backbone conformation change is observed (Table 1). To further investigate the critical residues affecting the helix conformation, we calculated the AAindex [37] (531 types of numerical indices representing various physicochemical and biochemical properties of amino acids) of residues 158-163 for the 17 Mj. TyrRS mutants in Table 1. In total there are 531 × 6 = 3186 features for each TyrRS mutant. We performed analysis of variance (ANOVA) with sklearn [38] to find which residue and AAindex property contribute most to the discrimination of TyrRS helix backbone disruption. The 10 largest f_values all come from residue 158 and have p_value < 0.001. The 10 AAindex types are related to intrinsic secondary structure propensities of the amino acids (Table 2), which implies that residue 158 may influence the secondary structure. AAindex type and value versus backbone disruption for the 17 PDB structures and residue 158 are shown in Figure 4. The distribution of AAindex values differs between TyrRS mutants with and without helix backbone disruption. Each of the 10 AAindex types can be used to separate the TyrRS mutants with helix backbone disruption from the others. This indicates that the D158P/G mutation is a helix breaker and can disrupt the backbone conformation, which should be taken into consideration when building homology models of the TyrRS mutants to obtain accurate structure models.
Table 2. Residue site and top 10 AAindex types contributing most to the discrimination of TyrRS helix backbone disruption.
Position AAindex type f_value p_value
158 MUNV940104 27.74547 0.000063
158 ROBB760104 27.262113 0.000069
158 MUNV940105 26.026208 0.000089
158 BLAM930101 25.948507 0.00009
158 BUNA790103 25.905193 0.000091
158 QIAN880134 25.607837 0.000097
158 MUNV940101 24.323819 0.000126
158 ONEK900101 23.726363 0.000144
158 RACS820114 23.355244 0.000156
158 QIAN880112 22.67982 0.000181
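The f_value column in Table 2 is a one-way ANOVA F statistic computed per feature. The sketch below computes that statistic in plain Python and ranks features by it. It is an illustration only: the source states that sklearn was used but not which function, and the feature names and values here are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def feature_f(values, labels):
    """F statistic of one feature, grouping samples by class label 0 or 1."""
    groups = [[v for v, lab in zip(values, labels) if lab == c] for c in (0, 1)]
    return anova_f(groups)

# Hypothetical data: 1 = helix backbone disrupted, 0 = not disrupted.
labels = [1, 1, 0, 0, 0]
features = {
    "res158_index_A": [0.2, 0.3, 1.1, 1.0, 1.2],
    "res159_index_A": [0.5, 0.9, 0.6, 1.0, 0.7],
}
ranked = sorted(features, key=lambda name: feature_f(features[name], labels),
                reverse=True)
```

A feature whose values separate the two classes cleanly, like the hypothetical residue 158 index here, gets a large F value and moves to the top of the ranking.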
Figure 4. Box plot of AAindex type and value versus backbone disruption for the 17 PDB structures and residue 158. Each subplot shows a different AAindex type. There are 17 data points in each plot, which are the AAindex values of residue 158 from the 17 PDB structures. 1 indicates backbone disruption and 0 indicates none.
2.2 UAA-Mj. TyrRS mutant complex modelling using Rosetta
To facilitate the TyrRS wet-lab screening process and reduce the time and resource cost, molecular modelling is carried out to predict which TyrRS mutants can recognize the target UAA. First, we use Rosetta Enzyme Design [22] to model the backbone change and amino acid side chain packing needed to accommodate the specific UAA ligand for each of the 6864 UAA-mutant pair complexes (52 UAAs times 132 TyrRS mutants). The crystal structure of WT Mj. TyrRS (PDB ID 1J1U) is used as the input template. Representative modeling results are shown in Figure 5. For the mutants with PDB ID 2HGZ and 4PBR, no obvious backbone disruption is observed in the crystal structure (Table 1). The predicted binding pose for 2HGZ (Figure 5A) is accurate, while for 4PBR (Figure 5B) the predicted orientation of the UAA ligand deviates from the true position in the crystal structure and leads to inaccurate side chain packing of residues Lys162 and Ser158, which may be caused by selection of the wrong BibaF conformation. For the mutants with PDB ID 1ZH0 and 2AG6, large backbone disruption is observed in the crystal structure (Table 1). Although the UAA ligand position and conformation (Figure 5C and 5D) are accurately modeled, the side chains of the amino acids surrounding the binding pocket deviate considerably from the crystal structure, mainly due to inaccurate modeling of the alpha helix formed by residues 158-163. These results indicate that for mutants with little disruption of helix 158-163 the Rosetta model is accurate enough to predict the binding pose, whereas for mutants with large disruption it is not.
Figure 5. Comparison of the binding pose and binding pocket residues of 4 Mj. aaRS mutants from the X-ray crystal structure and the Rosetta model. The corresponding UAA and binding site residues are shown as sticks. (A) PDB ID: 2HGZ. Crystal: green. Model: magenta. (B) PDB ID: 4PBR. Crystal: orange. Model: light pink. (C) PDB ID: 1ZH0. Crystal: white. Model: green. (D) PDB ID: 2AG6. Crystal: yellow. Model: magenta.
To test whether the correct backbone conformation of the residue 158-163 helix carrying the D158G/P mutation can be predicted, this segment is de novo remodeled using the KIC loop modeling method [39] in Rosetta. 1000 models are generated for each of the 4 mutants with PDB ID 1ZH0, 1ZH6, 2AG6 and 3D6U, using the homology model from the previous step as input. The backbone RMSD to the crystal structure is calculated and plotted against total energy (Figure 6). A funnel-shaped scatter plot is observed for all 4 mutants, which indicates that the amount of backbone conformation sampling is enough to find a local energy minimum. For the lowest energy structure, the RMSD to the crystal structure is 3.65, 2.41, 3.74 and 4.39 Å in the 4 mutants. The lowest RMSD to the crystal structure among the 1000 models of the 4 mutants is 0.94, 0.47, 0.49 and 0.48 Å, respectively. The rmsd_to_crystal for the top 10 models ranked by total_energy_score and the total_score rank for the top 10 models ranked by rmsd_to_crystal on the 4 TyrRS mutants are shown in Figure 7A and 7B, respectively. To further analyze the modeled structures, the crystal structure, the input Rosetta model, the model with the lowest RMSD to the crystal and the model with the lowest energy are superimposed. The model for 1ZH6 has the most ideal funnel plot (Figure 6B) and its lowest energy model has the least deviation from the crystal structure (Figure 7D), though an RMSD of 2.41 Å may still not be accurate enough for UAA-TyrRS binding affinity prediction. For the remaining 3 mutants, model structures close to the native crystal structure have been generated but cannot be picked out because of the error of the Rosetta energy function (Figure 7).
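The gap between the lowest energy model and the lowest RMSD model can be made concrete with a small sketch. It assumes each decoy is summarized as a (total_energy, rmsd_to_crystal) pair. The function name and the four toy decoys are hypothetical and are not values from the study.

```python
def summarize_decoys(decoys):
    """Compare energy-based selection with the best achievable RMSD.

    `decoys` is a list of (total_energy, rmsd_to_crystal) tuples.
    """
    by_energy = sorted(decoys, key=lambda d: d[0])   # most negative energy first
    lowest_energy = by_energy[0]
    best_rmsd = min(decoys, key=lambda d: d[1])      # closest to the crystal
    return {
        "rmsd_of_lowest_energy": lowest_energy[1],
        "lowest_rmsd": best_rmsd[1],
        # 1-based position of the near-native decoy in the energy ranking.
        "energy_rank_of_lowest_rmsd": by_energy.index(best_rmsd) + 1,
    }

# Toy decoys (hypothetical energies and RMSD values).
toy = [(-310.0, 3.6), (-305.2, 0.9), (-308.1, 2.4), (-299.0, 4.1)]
summary = summarize_decoys(toy)
```

When the energy rank of the lowest RMSD decoy is far from 1, sampling has found a near-native conformation but the score function cannot identify it, which is the situation described for three of the four mutants.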
Figure 6. Scatter plot of rmsd_to_crystal vs total_energy_score for the 1000 models generated by Rosetta de novo loop modelling for the TyrRS mutants with PDB ID 1ZH0, 1ZH6, 2AG6 and 3D6U.
Figure 7. Structure analysis of Rosetta de novo loop modelling results on the TyrRS mutants with PDB ID 1ZH0, 1ZH6, 2AG6 and 3D6U, respectively. (A) Box plot of rmsd_to_crystal for the top 10 models ranked by total_energy_score on the 4 TyrRS mutants. (B) Box plot of total_score rank for the top 10 models ranked by rmsd_to_crystal on the 4 TyrRS mutants. (C, D, E, F) Structure superimposition of the crystal structure (in green), the original Rosetta model using the WT crystal structure as template (in blue), the remodeled model with the lowest RMSD to the crystal structure within the 1000 models (in magenta) and the remodeled model with the lowest energy within the 1000 models (in yellow) on the 4 TyrRS mutants.
2.3 Machine learning model and performance analysis
In the previous part, we modeled 6864 UAA-TyrRS mutant pairs using Rosetta and found that the energy function is not accurate enough to discriminate the true UAA binders from the false ones. To further improve the energy function, we use machine learning to train a model that learns the important energy terms and protein-ligand interactions that contribute most to the binding energy. The flowchart of the machine learning model and its application is shown in Scheme 2. Each part of the flowchart is described in the following subsections.
Scheme 2. Flowchart of the data preparation, ML model training, prediction and application workflow. The figure legend is at the top.
2.3.1 Data preparation
We collected the chemical structures of 52 UAAs and the sequences and X-ray crystal structures of 132 TyrRS mutants from previous literature. Each UAA is paired with each TyrRS mutant to generate the dataset for ML model training. There are 6864 UAA-TyrRS mutant pairs, whose complex models were built in the previous section. For a specific UAA-mutant pair, if the UAA can be recognized by the mutant, the pair is labeled as positive; all remaining pairs are taken as negative samples under the assumption that TyrRS mutants have high specificity and orthogonality for different UAA substrates. The final dataset has 132 positive samples and 6732 negative samples, a positive-to-negative ratio of 1:51.
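The pairing and labeling scheme can be sketched as follows. The identifiers are placeholders, and the recognition map assumes that each mutant recognizes exactly one UAA. That assumption reproduces the reported count of 132 positives but is not stated explicitly in the source.

```python
from itertools import product

def build_pairs(uaas, mutants, recognized):
    """Label every UAA-mutant pair: 1 if reported as recognized, else 0."""
    return [(u, m, 1 if (u, m) in recognized else 0)
            for u, m in product(uaas, mutants)]

# Placeholder identifiers (hypothetical names, real counts from the text).
uaas = [f"UAA_{i:03d}" for i in range(52)]
mutants = [f"mutant_{j:03d}" for j in range(132)]
# Assumption: each mutant recognizes exactly one UAA.
recognized = {(uaas[j % 52], m) for j, m in enumerate(mutants)}

pairs = build_pairs(uaas, mutants, recognized)
n_positive = sum(label for _, _, label in pairs)
n_negative = len(pairs) - n_positive
```

With roughly one positive for every 51 negatives the classes are strongly imbalanced, which is worth keeping in mind when reading accuracy values later in this section.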
2.3.2 Feature extraction
The UAA-TyrRS mutant pair can be described from 3 aspects: the UAA ligand, the mutant protein and the UAA-mutant complex. The representation and feature vectorization methods are listed in Table 3. The features are extracted from the chemical structure of the UAA, the sequence of the TyrRS mutant and the 3D structure of the UAA-mutant complex model built using Rosetta. The total feature dimension is 2644.
2.3.3 ML model training
10% of the data is used as the test set (687 samples). The remaining 90% of the data (6177 samples) is used for ML model training and is randomly split into a training set and a validation set at a ratio of 4:1 for 5-fold cross validation (CV). Feature engineering, model selection and hyperparameter tuning are carried out with PyCaret [40], a low-code machine learning library. 15 machine learning algorithms are used to train the model: Light Gradient Boosting Machine (lightGBM), CatBoost Classifier, Extra Trees Classifier, Extreme Gradient Boosting, Logistic Regression, Ridge Classifier, Random Forest Classifier, Quadratic Discriminant Analysis, K Neighbors Classifier, Ada Boost Classifier, Gradient Boosting Classifier, Linear Discriminant Analysis, Decision Tree Classifier, SVM - Linear Kernel and Naive Bayes Classifier. Several metrics are used to evaluate the binary classification model performance: accuracy, AUC, recall, precision and F1.
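The sample counts above follow from a 10% hold-out with the test size rounded up, followed by a 5-way partition of the remainder. The sketch below reproduces that arithmetic in plain Python. It is not the PyCaret implementation: the function name and seed are hypothetical, and the split is not stratified by label.

```python
import math
import random

def split_indices(n_samples, test_fraction=0.1, n_folds=5, seed=0):
    """Hold out a test set, then partition the rest into CV folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)  # 10% of 6864, rounded up
    test, rest = indices[:n_test], indices[n_test:]
    # Each fold serves once as validation while the other four train (4:1).
    folds = [rest[k::n_folds] for k in range(n_folds)]
    return test, folds

test_idx, cv_folds = split_indices(6864)
fold_sizes = [len(fold) for fold in cv_folds]
```

With only 132 positives in 6864 samples, a non-stratified split like this one can leave some folds with very few positives, so a stratified split would be a reasonable refinement.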
2.3.4 ML model evaluation and explanation
Different feature combinations are explored in the model training process. The feature_type_index, the best value of each metric and the name of the model achieving it are listed in Table 4. The best-performing algorithm differs from metric to metric. In general, the performance of models using all features is better than that of models using a single type of feature (Figure 8). When using the UAA dpocket descriptor only, the Quadratic Discriminant Analysis and Extra Trees Classifier models have the best recall and precision, respectively, while their F1 is not the highest. The LightGBM model has the best accuracy, AUC and precision in some feature combinations.
Table 3. Feature representation methods for the UAA small molecule ligand and the TyrRS protein receptor used in this study
| Method | Description | Dimension | Reference |
|---|---|---|---|
| PROFEAT-ligand | Small molecule 1D and 2D descriptors. | 406 | 41 |
| PROFEAT-receptor | Protein sequence descriptors using amino acid biophysical properties and compositions. | 1437 | 41 |
| Rosetta energy score and decomposition | Rosetta interface_E total score and energy terms (fa_atr, fa_rep, fa_sol, fa_elec, fa_pair, hbond_sr_bb, hbond_lr_bb, hbond_bb_sc, hbond_sc) decomposed into each binding pocket residue. | 6+540 | 22, 42 |
| dpocket | Dpocket (describing pocket) extracts several descriptors using atom, amino acid, alpha sphere and volume information from the ligand binding pocket. | 35 | 43 |
| rfscore | A machine learning-based score function for protein-ligand complex. | 216 | 44 |
Table 4. Comparison of 5-fold CV performance on different ML models and different performance metrics for UAA specificity prediction
| Features and combination | Feature_type_index | Model_with_best_accuracy | Best_accuracy | Model_with_best_AUC | Best_AUC | Model_with_best_recall | Best_recall | Model_with_best_precision | Best_precision | Model_with_best_F1 | Best_F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rosetta_6_score_terms | 1 | SVM - Linear Kernel | 0.979 | CatBoost Classifier | 0.7369 | Decision Tree Classifier | 0.0615 | Naive Bayes | 0.176 | Quadratic Discriminant Analysis | 0.0777 |
| Rosetta_energy_decomposed | 2 | Extra Trees Classifier | 0.9796 | Light Gradient Boosting Machine | 0.7874 | Naive Bayes | 0.8077 | Random Forest Classifier | 0.4000 | Decision Tree Classifier | 0.1465 |
| UAA_pocket_descriptor | 3 | Extra Trees Classifier | 0.9799 | CatBoost Classifier | 0.8042 | Quadratic Discriminant Analysis | 1.0000 | Extra Trees Classifier | 0.7167 | Extra Trees Classifier | 0.1612 |
| Rosetta_6_score_terms+energy_decomposed | 4 | Light Gradient Boosting Machine | 0.9801 | CatBoost Classifier | 0.8193 | Linear Discriminant Analysis | 0.1846 | K Neighbors Classifier | 0.6500 | Linear Discriminant Analysis | 0.1836 |
| All_features_above | 5 | Light Gradient Boosting Machine | 0.9794 | CatBoost Classifier | 0.8222 | Naive Bayes | 0.4769 | Light Gradient Boosting Machine | 0.4000 | Linear Discriminant Analysis | 0.1654 |
Figure 8. Bar plot of the best model performance on different feature combinations and different metrics. Feature_type_index is the same as that in Table 4.
We choose the lightGBM [45] model for further analysis since it is a tree-based ensemble model, which helps limit overfitting, and it has been widely used in other machine learning applications. The feature importance of the model can also be easily explained. The performance of the lightGBM model on 5-fold CV and on the test set is compared in Table 5. The metrics are better on the test set than on 5-fold CV.
Table 5. Performance of the lightGBM model on the 5-fold cross validation set and the test set using all features (feature_type_index 5).
| lightGBM | Accuracy | AUC | Recall | Precision | F1 |
|---|---|---|---|---|---|
| 5-fold CV | 0.9794 | 0.7593 | 0.0385 | 0.35 | 0.0686 |
| test set | 0.9796 | 0.8549 | 0.0667 | 1 | 0.125 |
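The test-set row of Table 5 shows how accuracy can stay near 0.98 while recall is very low on an imbalanced dataset. The sketch below computes the four threshold-based metrics from confusion-matrix counts. The counts used (1 true positive, 0 false positives, 14 false negatives, 672 true negatives) are an assumption: they are one combination consistent with the reported test-set values and the 687 test samples, and the source does not give the confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Assumed counts, chosen to be consistent with the test-set row of Table 5.
metrics = classification_metrics(tp=1, fp=0, fn=14, tn=672)
```

Under these assumed counts, a classifier predicting every sample as negative would reach an accuracy of 672/687, about 0.978, so accuracy alone says little here and the ranking-based evaluation (AUC, enrichment) is more informative.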
The SHAP method [46] is used to calculate the feature importance and each feature's contribution to the binary class label (Figure 9). Residue_70_fa_sol and residue_34_fa_sol are ranked as the most important features by the lightGBM model, which indicates that the solvation energy of residues 34 and 70 may be important. Rosetta_interface_E and residue_158_fa_elec are also important, in accordance with our domain knowledge that the score function is useful for discriminating positive samples to some extent.
Figure 9. SHAP values of the important features for the lightGBM model. The average SHAP value of each feature over all samples is shown. The impact on the negative label is in blue and the impact on the positive label is in red.
To compare the prediction accuracy of the lightGBM model and the Rosetta score, ROC and PR curves on the test set (Figure 10A and 10B) are plotted for both methods. Better ROC and PR curves are observed for the lightGBM ML model. The AUC of the lightGBM model is 0.84, higher than the 0.77 of the Rosetta score. As a simulation of a real wet-lab experiment, we calculate how many true positive samples are among the k selected mutants when the number of mutants k for the wet-lab experiment is given and fixed (Figure 10C and 10D). For example, if we test 50 of the 687 mutants in a wet-lab experiment, there will be 1, 2 and 11 true positive mutants for random sampling, Rosetta score prediction and lightGBM model prediction, respectively. The success rate is 2%, 4% and 22%, respectively, which means that Rosetta score prediction gives a 2-fold elevation of the enrichment ratio of true positive mutants, while the lightGBM model gives an 11-fold elevation. TyrRS mutant screening will benefit significantly from the improvement in prediction accuracy provided by the machine learning model.
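The success rates and fold enrichments quoted above follow directly from the hit counts at k = 50. The sketch below reproduces that arithmetic and adds a helper that counts hits among the k best-scored samples. The function names and the four-sample toy ranking are hypothetical, while the hit counts 1, 2 and 11 are the reported ones.

```python
def hits_in_top_k(scores, labels, k):
    """Count true positives among the k highest-scoring samples.

    Higher score is taken to mean "more likely a binder"; Rosetta energies,
    where lower is better, would need to be negated first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k])

def success_rate(hits, k):
    """Fraction of the k tested mutants that are true positives."""
    return hits / k

def enrichment_over_random(hits, random_hits):
    """Fold enrichment relative to the hits expected from random picking."""
    return hits / random_hits

k = 50
reported_hits = {"random": 1, "rosetta": 2, "lightgbm": 11}
rates = {name: success_rate(h, k) for name, h in reported_hits.items()}
folds = {name: enrichment_over_random(h, reported_hits["random"])
         for name, h in reported_hits.items()}

# Toy ranking: the two best-scored samples contain one true positive.
toy_hits = hits_in_top_k([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], k=2)
```

Sweeping k from 1 to the library size and plotting the hit count for each method would give curves of the kind shown in Figure 10C and 10D.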
Figure 10. (A) Receiver operating characteristic curve, (B) precision-recall curve and (C, D) number of true positive samples vs number of mutants selected for wet-lab experiment, for the lightGBM model and Rosetta interface_E score-only prediction on the test set. The scale of the x axis is limited to 0-100 in panel D.
3 Discussion
3.1 Significance of the work for genetic code expansion and computational protein design
In the field of computational protein design, it is a great challenge to introduce mutations into proteins to change the substrate specificity for different ligands in a particular ligand-receptor complex system. There have been many reports of protein design, for example with Rosetta [47] and OSPREY [48], to switch the substrate and accommodate specific ligands. As a model system, Mj. TyrRS has been mutated and designed to bind UAAs with different chemical structures, as reported before [49]. This system can be used as a benchmark for substrate-specific protein design, as a large number of mutants have been reported. A comprehensive study of this system will be of great significance to the field of protein design.
This work focuses on the UAA-Mj. TyrRS complex system, which has been widely used in the genetic code expansion research field. Through the efforts of many research groups over the past 15 years and more, a large number of UAAs and their corresponding TyrRS mutants have been identified at great cost in human and material resources, but so far there is no systematic study that integrates these data for the development and testing of UAA virtual screening methods. This paper summarizes all the reported UAAs and Mj. TyrRS mutants, systematically analyzes them, and attempts to establish a relatively accurate model to predict the UAA-specific recognition of different mutants, establishing the foundation for large-scale, high-throughput virtual screening.
The existing methods for predicting protein-ligand interaction are probably not suitable for this system, because the adaptability and accuracy of score functions differ between ligand-receptor systems. For example, the importance of hydrogen bonds may be higher in one system than in another. Methods developed for affinity prediction of drug-target protein interactions cannot be applied to the Mj. TyrRS system without modification. In this paper, we calibrate the Rosetta score function for this specific system to get better prediction performance. Besides Rosetta molecular modeling, other methods such as molecular docking, MM-PBSA [50], TI [51] and FEP [52] could in the future be integrated with ML in a similar way to give more accurate prediction results.
3.2 Previous work for TyrRS mutant substrate specificity prediction and computational library design
Several methods aiming to accelerate the screening of aaRS mutants or to design more focused libraries using computational chemistry methods such as molecular modeling and MM-PBSA [53-55] have been reported. These results show that molecular modeling can be used to predict TyrRS substrate specificity, but the number of UAAs and TyrRS mutants used in previous studies is too small to support a convincing conclusion. In this work, we considered all UAAs and TyrRS mutants reported before. With thousands of UAA-mutant pairs as training and validation data, a more solid conclusion can be drawn that molecular modeling integrated with ML can be useful in the virtual screening of TyrRS mutants.
3.3 Discussion on the prediction result
Although the backbone disruption of helix 158-163 cannot be precisely modeled, we think its influence on the prediction accuracy may not be large, because there are only 58 D158G mutants and 13 D158P mutants, a fraction of 42% of the 171 mutants in total (Supplementary file S1). Moreover, descriptors extracted from the protein sequence are not affected by the backbone disruption of the structure. Knowledge and information learned from the mutant sequences by the ML model could correct the error in the homology model and increase the prediction accuracy.
3.4 Limitation of the work
Limitation of the current molecular modeling method
First, in the homology modeling of Mj. TyrRS mutants, the positions and rotamers of the amino acid side chains in the binding pocket can be recovered well, but for mutants with large backbone changes it is hard to model a backbone conformation that aligns well with the crystal structure. Although a close conformation can be sampled by Rosetta loop remodeling, the score function is not precise enough to pick it out. The inaccuracy of structure prediction decreases the prediction accuracy for UAA-specific mutants. Second, there are water molecules in the UAA binding pocket of Mj. TyrRS, and the influence of water is not considered by the current molecular modeling method.
Limitation of the machine learning method
The generalization ability of the ML model may be low because the number of reported UAA-mutant pairs is small compared with the huge chemical space of UAAs and sequence space of Mj. TyrRS mutants. Currently it is difficult to cover the diversified chemical structures and protein sequences. One solution is deep mutational scanning [56], which uses next-generation sequencing to obtain the phenotype of over 1 million mutants in a single experiment and generates enough data for training machine learning and deep learning models. Wet-lab experiments are still needed to validate the model for practical usage.
4 Conclusion
To gain further knowledge of the Mj. TyrRS structure and accelerate the time-consuming screening of mutants for specific UAAs, we collected all the UAAs and Mj. TyrRS mutants reported before, analyzed the structure and sequence differences between the mutants and found that some mutants with the D158G/P mutation have an alternative backbone conformation on the alpha helix formed by residues 158-163, which makes accurate mutant modelling more difficult. Rosetta modeling and machine learning are integrated to give more accurate predictions of mutant selectivity towards different UAAs. Different feature combinations and machine learning algorithms are tested for higher model performance. The lightGBM model is chosen, and the feature importance and contribution to the binary class label are calculated to explain the knowledge learned by the model. After calibration of the Rosetta score function using the lightGBM model, the enrichment ratio of target mutants is elevated by 11-fold compared with random mutation. Wet-lab experiments are in progress to validate the model. We anticipate that this proof-of-concept workflow will be of great help in the screening of Mj. TyrRS mutants and in the protein design field.
5 Methods
5.1 Molecular modelling using Rosetta
Preparation of the UAA ligand
Preparation of the UAA ligand was performed as described in the Rosetta tutorial for ligand preparation [57]. Briefly, the chemical structure of the UAA is drawn using MarvinJS [58] and converted to SMILES format. The cheminformatics tool RDKit [59] is used to generate low-energy conformations and perform energy minimization based on the SMILES of the UAA. The ligand parameter file used in Rosetta is made by molfile_to_params.py [57].
UAA-Mj. TyrRS mutant complex model building
The UAA-Mj. TyrRS mutant complex model is built using the Rosetta Enzyme Design [22] application with default parameters as described in the Rosetta tutorial [60]. Briefly, the amino acid mutations of each mutant are written in a resfile and passed into the application along with the UAA ligand parameter file. Catalytic residues of the aaRS forming hydrogen bonds with the UAA are fixed using a constraint file. For each complex, nstruct=10 is used and the structure with the lowest total score is kept for further analysis.
De novo loop modeling using KIC method
The residue 158-163 segment of Mj. TyrRS is de novo modeled using the loop modeling KIC method as described in the Rosetta tutorial [61] with default parameters. 1000 models are generated for each input structure. The RMSD between the modeled structure and the crystal structure is calculated by an in-house script written using Biopython [62].
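Writing the mutations of one mutant into a resfile can be sketched as below, using the mutation notation of Table 1. This is an illustration and not the authors' script. The chain identifier "A" and the NATAA default (repack all other residues without changing their identity) are assumptions, since the source does not state which resfile defaults were used.

```python
def mutations_to_resfile(mutations, chain="A"):
    """Convert a string such as "Y32Q D158A" into Rosetta resfile text.

    Each mutation is written as <wild type><position><new residue>.
    The chain ID and the NATAA default are assumptions for illustration.
    """
    lines = ["NATAA", "start"]
    for mutation in mutations.split():
        position = int(mutation[1:-1])   # digits between the two letters
        new_residue = mutation[-1]       # one-letter code of the mutant residue
        # PIKAA restricts the position to the listed amino acid.
        lines.append(f"{position} {chain} PIKAA {new_residue}")
    return "\n".join(lines)

# Mutations of the pMeF-specific mutant listed in Table 1 (PDB ID 1U7X).
resfile_text = mutations_to_resfile("Y32Q D158A E107T L162P")
```

Generating one such file per mutant keeps the 132 mutant definitions in a single table of mutation strings rather than in hand-edited resfiles.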
5.2 Machine learning
Descriptor extraction
PROFEAT, dpocket and rfscore descriptors of the UAA ligand, the Mj. TyrRS mutant protein and the UAA-Mj. TyrRS mutant complex are calculated on http://www.descriptordb.com/ with default parameters.
Feature engineering
Feature engineering, model selection and hyperparameter optimization are performed with PyCaret [40]. 90% of the full data is used in 5-fold CV and the remaining 10% is used as the test set. The feature space is transformed using the ‘zscore’ method. The ‘ignore_low_variance’ option is set to True, and all categorical features with statistically insignificant variances are removed from the dataset. Feature selection is used and ‘feature_selection_threshold’ is set to 0.8.
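The data handling described above (a 10% hold-out test set, 5-fold CV on the remaining 90%, z-score scaling) can be sketched with the Python standard library alone. This is an illustrative sketch and not PyCaret's implementation: all function names are hypothetical, and the shuffling seed, the use of the population standard deviation and the way rows are assigned to folds are assumptions.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.1, seed=0):
    """Shuffle row indices and hold out a fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_zscore(rows):
    """Per-column mean and standard deviation, computed on training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Scale each column to zero mean and unit variance; constant columns map to 0."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation
```

Fitting the scaling statistics on the training rows only, and then applying them unchanged to the validation and test rows, keeps information from the held-out data out of the model.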
Data availability
All datasets and parameters generated or analyzed during the study are available from the corresponding author on reasonable request.
Funding
This work was supported by National Natural Science Foundation of China [61431017].
Author contributions
Yingfei Sun conceived the study and edited the paper. Bingya Duan conducted the experiments and analyzed the results. Bingya Duan and Yingfei Sun co-wrote the manuscript. All authors reviewed the manuscript.
bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.174524; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Corresponding author
Correspondence to Yingfei Sun.
Competing interests
The authors declare no competing interests.
References
1. J. W. Chin, Expanding and reprogramming the genetic code of cells and animals. Annu Rev Biochem 83, 379-408 (2014).
2. F. Yang et al., Phospho-selective mechanisms of arrestin conformations and functions revealed by unnatural amino acid incorporation and (19)F-NMR. Nat Commun 6, 8202 (2015).
3. F. Li et al., A genetically encoded 19F NMR probe for tyrosine phosphorylation. Angew Chem Int Ed Engl 52, 3958-3962 (2013).
4. K. Yokoyama, U. Uhlin, J. Stubbe, Site-Specific Incorporation of 3-Nitrotyrosine as a Probe of pK(a) Perturbation of Redox-Active Tyrosines in Ribonucleotide Reductase. J Am Chem Soc 132, 8385-8397 (2010).
5. I. N. Ugwumba et al., Improving a Natural Enzyme Activity through Incorporation of Unnatural Amino Acids. J Am Chem Soc 133, 326-333 (2011).
6. I. Drienovska, G. Roelfes, Expanding the enzyme universe with genetically encoded unnatural amino acids. Nat Catal 3, 193-202 (2020).
7. X. H. Liu et al., A genetically encoded photosensitizer protein facilitates the rational design of a miniature photocatalytic CO2-reducing enzyme. Nat Chem 10, 1201-1206 (2018).
8. I. Drienovska et al., Design of an enantioselective artificial metallo-hydratase enzyme containing an unnatural metal-binding amino acid. Chem Sci 8, 7228-7235 (2017).
9. Q. Li et al., Developing Covalent Protein Drugs via Proximity-Enabled Reactive Therapeutics. Cell, (2020).
10. V. K. Vyas, R. D. Ukawala, M. Ghate, C. Chintha, Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives. Indian J Pharm Sci 74, 1-17 (2012).
11. A. W. Senior et al., Improved protein structure prediction using potentials from deep learning. Nature 577, 706-+ (2020).
12. X. Y. Meng, H. X. Zhang, M. Mezei, M. Cui, Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery. Curr Comput-Aid Drug 7, 146-157 (2011).
13. B. Kuhlman, P. Bradley, Advances in protein structure prediction and design. Nat Rev Mol Cell Bio 20, 681-697 (2019).
14. R. F. Li et al., Computational redesign of enzymes for regio- and enantioselective hydroamination. Nat Chem Biol 14, 664-+ (2018).
15. A. Zhavoronkov et al., Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 37, 1038-+ (2019).
16. Z. Wu, S. B. J. Kan, R. D. Lewis, B. J. Wittmann, F. H. Arnold, Machine learning-assisted directed protein evolution with combinatorial libraries (vol 116, pg 8852, 2019). P Natl Acad Sci USA 117, 788-789 (2020).
17. J. A. Ruffolo, C. Guerra, S. P. Mahajan, J. Sulam, J. J. Gray, Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures. (2020).
18. J. Graves et al., A Review of Deep Learning Methods for Antibodies. Antibodies (Basel) 9, (2020).
19. C. Wang, Y. Zhang, Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem 38, 169-177 (2017).
20. J. X. Wang, H. L. Cao, J. Z. H. Zhang, Y. F. Qi, Computational Protein Design with Deep Learning Neural Networks. Sci Rep-Uk 8, (2018).
21. H. Cao, J. Wang, L. He, Y. Qi, J. Z. Zhang, DeepDDG: Predicting the Stability Change of Protein Point Mutations Using Neural Networks. J Chem Inf Model 59, 1508-1514 (2019).
22. F. Richter, A. Leaver-Fay, S. D. Khare, S. Bjelic, D. Baker, De novo enzyme design using Rosetta3. PLoS One 6, e19230 (2011).
23. Schrodinger, LLC. (2015).
24. T. Kobayashi et al., Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion. Nat Struct Biol 10, 425-432 (2003).
25. Y. Zhang, L. Wang, P. G. Schultz, I. A. Wilson, Crystal structures of apo wild-type M. jannaschii tyrosyl-tRNA synthetase (TyrRS) and an engineered TyrRS specific for O-methyl-L-tyrosine. Protein Sci 14, 1340-1349 (2005).
26. J. M. Turner, J. Graziano, G. Spraggon, P. G. Schultz, Structural plasticity of an aminoacyl-tRNA synthetase active site. Proc Natl Acad Sci USA 103, 6483-6488 (2006).
27. W. Liu, L. Alfonta, A. V. Mack, P. G. Schultz, Structural basis for the recognition of para-benzoyl-L-phenylalanine by evolved aminoacyl-tRNA synthetases. Angew Chem Int Ed Engl 46, 6073-6075 (2007).
28. J. M. Turner, J. Graziano, G. Spraggon, P. G. Schultz, Structural characterization of a p-acetylphenylalanyl aminoacyl-tRNA synthetase. J Am Chem Soc 127, 14976-14977 (2005).
29. D. D. Young et al., An evolved aminoacyl-tRNA synthetase with atypical polysubstrate specificity. Biochemistry 50, 1894-1900 (2011).
30. E. M. Tippmann, W. Liu, D. Summerer, A. V. Mack, P. G. Schultz, A genetically encoded diazirine photocrosslinker in Escherichia coli. Chembiochem 8, 2210-2214 (2007).
31. J. Xie, W. Liu, P. G. Schultz, A genetically encoded bidentate, metal-binding amino acid. Angew Chem Int Ed Engl 46, 9239-9242 (2007).
32. R. B. Cooley et al., Structural basis of improved second-generation 3-nitro-tyrosine tRNA synthetases. Biochemistry 53, 1916-1924 (2014).
33. J. Wang et al., A biosynthetic route to photoclick chemistry on proteins. J Am Chem Soc 132, 14812-14818 (2010).
34. X. Liu et al., Significant expansion of fluorescent protein sensing ability through the genetic incorporation of superior photo-induced electron-transfer quenchers. J Am Chem Soc 136, 13094-13097 (2014).
35. R. B. Cooley, P. A. Karplus, R. A. Mehl, Gleaning unexpected fruits from hard-won synthetases: probing principles of permissivity in non-canonical amino acid-tRNA synthetases. Chembiochem 15, 1810-1819 (2014).
36. X. Luo et al., Genetically encoding phosphotyrosine and its nonhydrolyzable analog in bacteria. Nat Chem Biol 13, 845-849 (2017).
37. S. Kawashima et al., AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 36, D202-205 (2008).
38. F. Pedregosa et al., Scikit-learn: Machine Learning in Python. 12, 2825-2830 (2011).
39. A. Stein, T. Kortemme, Improvements to robotics-inspired conformational sampling in rosetta. PLoS One 8, e63090 (2013).
40. M. Ali, PyCaret: An open source, low-code machine learning library in Python. (2020).
41. Z. R. Li et al., PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 34, W32-37 (2006).
42. R. F. Alford et al., The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput 13, 3031-3048 (2017).
43. P. Schmidtke, V. Le Guilloux, J. Maupetit, P. Tuffery, fpocket: online tools for protein ensemble pocket detection and tracking. Nucleic Acids Res 38, W582-589 (2010).
44. P. J. Ballester, J. B. Mitchell, A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26, 1169-1175 (2010).
45. G. Ke et al., in neural information processing systems. (2017), pp. 3149-3157.
46. S. Lundberg et al., From local explanations to global understanding with explainable AI for trees. 2, 56-67 (2020).
47. R. Moretti, B. J. Bender, B. Allison, J. Meiler, Rosetta and the Design of Ligand Binding Sites. Methods Mol Biol 1414, 47-62 (2016).
48. M. A. Hallen et al., OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J Comput Chem 39, 2494-2507 (2018).
49. J. W. Chin, Expanding and reprogramming the genetic code. Nature 550, 53-60 (2017).
50. S. Genheden, U. Ryde, The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Dis 10, 449-461 (2015).
51. T. S. Lee, Y. Hu, B. Sherborne, Z. Guo, D. M. York, Toward Fast and Accurate Binding Affinity Prediction with pmemdGTI: An Efficient Implementation of GPU-Accelerated Thermodynamic Integration. J Chem Theory Comput 13, 3077-3084 (2017).
52. Z. Cournia, B. Allen, W. Sherman, Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J Chem Inf Model 57, 2911-2937 (2017).
53. W. Ren, T. M. Truong, H. W. Ai, Study of the Binding Energies between Unnatural Amino Acids and Engineered Orthogonal Tyrosyl-tRNA Synthetases. Sci Rep-Uk 5, (2015).
54. V. Opuu, G. Nigro, E. Schmitt, Y. Mechulam, T. Simonson, Adaptive landscape flattening allows the design of both enzyme:substrate binding and catalytic power. 771824 (2019).
55. T. Baumann et al., Computational Aminoacyl-tRNA Synthetase Library Design for Photocaged Tyrosine. Int J Mol Sci 20, (2019).
56. D. M. Fowler, S. Fields, Deep mutational scanning: a new style of protein science. Nat Methods 11, 801-807 (2014).
57. P. Hosseinzadeh, Preparing Ligands, https://www.rosettacommons.org/demos/latest/tutorials/prepare_ligand/prepare_ligand_tutorial. (2016).
58. https://marvinjs-demo.chemaxon.com/latest/.
59. G. Landrum, RDKit: Open-source cheminformatics, http://www.rdkit.org. (2020).
60. F. Richter, Enzyme design application, https://www.rosettacommons.org/docs/latest/application_documentation/design/enzyme-design. (2010).
61. A. Stein, Next-generation kinematic loop modeling and torsion-restricted sampling, https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/loop_modeling/next-generation-KIC. (2015).
62. P. J. Cock et al., Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422-1423 (2009).