computational-statistical tradeoffs in robust estimation · 2020-01-03 ·...
TRANSCRIPT
Computational-Statistical Tradeoffs in Robust Estimation
Ilias Diakonikolas (USC)
(based on joint work with D. Kane and A. Stewart)
Can we develop learning/estimation algorithms that are robust to a constant fraction of corruptions in the data?
ROBUST HIGH-DIMENSIONAL ESTIMATION
Contamination Model: Let F be a family of high-dimensional distributions. We say that a distribution P is ε-corrupted with respect to F if there exists F ∈ F such that d_TV(P, F) ≤ ε.
THE UNSUPERVISED LEARNING PROBLEM
• Input: a sample generated by the model with unknown parameters θ*.
• Goal: estimate the parameters, i.e., output θ̂ such that θ̂ ≈ θ*.
Main performance criteria:
• Sample size
• Running time
• Robustness
Question 1: Is there an efficient learning algorithm?
Question 2: Are there tradeoffs between these criteria?
ROBUSTLY LEARNING A GAUSSIAN – PRIOR WORK
Basic Problem: Given an ε-corrupted version of an unknown d-dimensional unknown-mean Gaussian N(μ, I), efficiently compute a hypothesis distribution N(μ̂, I) such that d_TV(N(μ̂, I), N(μ, I)) is small.
• Extensively studied in robust statistics since the 1960s. Until recently, known efficient estimators could only guarantee error O(ε·√d).
• Recent Algorithmic Progress:
  -- error O(ε·√(log d)) [Lai-Rao-Vempala'16]
  -- error O(ε·√(log(1/ε))) [D-Kamath-Kane-Li-Moitra-Stewart'16]
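The gap between naive and robust estimators is easy to see even in one dimension. A minimal sketch (the parameters eps, M, n below are hypothetical, chosen for illustration; this is not any of the algorithms above):

```python
import random
import statistics

# Toy 1-D illustration: placing an eps-fraction of adversarial outliers
# at distance M drags the empirical mean by roughly eps * M, while the
# median moves by only O(eps).
rng = random.Random(0)
eps, M, n = 0.1, 100.0, 10000
clean = [rng.gauss(0.0, 1.0) for _ in range(int((1 - eps) * n))]
corrupted = clean + [M] * (n - len(clean))
mean_est = sum(corrupted) / n               # dragged toward eps * M = 10
median_est = statistics.median(corrupted)   # stays near the true mean 0
```

In high dimensions, applying the median coordinate-wise incurs an extra dimension-dependent factor, which is exactly the barrier the recent algorithms overcome.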
ROBUSTLY LEARNING A GAUSSIAN
Basic Problem: Given an ε-corrupted version of an unknown d-dimensional unknown-mean Gaussian N(μ, I), efficiently compute a hypothesis distribution N(μ̂, I) such that d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε·√(log(1/ε))).
Θ(ε) error is the information-theoretically best possible.
ROBUST LEARNING – OPEN QUESTION
Summary of Prior Work: There is a poly(d/ε)-time algorithm for robustly learning N(μ, I) within error O(ε·√(log(1/ε))).
Open Question: Is there a poly(d/ε)-time algorithm for robustly learning N(μ, I) within error O(ε)? How about o(ε·√(log(1/ε)))?
OUTLINE
Part I: Introduction
• Unsupervised Learning in High Dimension
• Statistical Query (SQ) Learning Model
• Our Results
Part II: Computational SQ Lower Bounds
• Generic SQ Lower Bound Technique
• Two Applications: Learning GMMs, Robustly Learning a Gaussian
Part III: Extensions
Part IV: Summary and Conclusions
STATISTICAL QUERIES [KEARNS'93]
Classical model: the algorithm sees samples x₁, x₂, …, x_m ∼ D over X.
SQ model: an SQ algorithm instead interacts with a STAT_D(τ) oracle. It asks queries φᵢ: X → [−1, 1] and receives answers vᵢ satisfying
  |vᵢ − E_{x∼D}[φᵢ(x)]| ≤ τ.
Here τ is the tolerance of the query; τ = 1/√m corresponds to m samples.
Problem P ∈ SQCompl(q, m): if there exists an SQ algorithm that solves P using q queries to STAT_D(τ = 1/√m).
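The STAT oracle is easy to simulate with samples, which is why τ = 1/√m is the natural tolerance. A minimal sketch (the function names are illustrative, not from the talk):

```python
import math
import random

def make_stat_oracle(sampler, m, rng=random.Random(0)):
    """STAT_D(tau) oracle with tolerance tau = 1/sqrt(m): answers a
    query phi: X -> [-1, 1] with the empirical mean of phi over m
    samples, which is within tau of E_{x~D}[phi(x)] with high
    probability by Hoeffding's inequality."""
    tau = 1.0 / math.sqrt(m)
    def oracle(phi):
        est = sum(phi(sampler(rng)) for _ in range(m)) / m
        return est, tau
    return oracle

# Demo: D = N(0, 1); query phi = tanh (odd function, so E[phi(x)] = 0).
oracle = make_stat_oracle(lambda rng: rng.gauss(0.0, 1.0), m=10000)
value, tau = oracle(math.tanh)   # value is within ~tau = 0.01 of 0
```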
POWER OF SQ ALGORITHMS (?)
Restricted Model: Hope to prove unconditional computational lower bounds.
Powerful Model: A wide range of algorithmic techniques in ML are implementable using SQs*:
• PAC Learning: AC0, decision trees, linear separators, boosting.
• Unsupervised Learning: stochastic convex optimization, moment-based methods, k-means clustering, EM, … [Feldman-Grigorescu-Reyzin-Vempala-Xiao/JACM'17]
*Only known exception: Gaussian elimination over finite fields (e.g., learning parities).
For all problems in this talk, the strongest known algorithms are SQ.
METHODOLOGY FOR SQ LOWER BOUNDS
Statistical Query Dimension:
• Fixed-distribution PAC Learning [Blum-Furst-Jackson-Kearns-Mansour-Rudich'95; …]
• General Statistical Problems [Feldman-Grigorescu-Reyzin-Vempala-Xiao'13, …, Feldman'16]
Pairwise correlation between D₁ and D₂ with respect to D:
  χ_D(D₁, D₂) := ∫_X D₁(x)·D₂(x)/D(x) dx − 1.
Fact: It suffices to construct a large set of distributions that are nearly uncorrelated.
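The pairwise correlation χ_D(D₁, D₂) = ∫ D₁D₂/D − 1 can be checked numerically for simple instances. A sketch using the trapezoid rule (for D₁ = N(μ₁, 1), D₂ = N(μ₂, 1), D = N(0, 1) a direct calculation gives χ_D(D₁, D₂) = exp(μ₁·μ₂) − 1, which the quadrature reproduces):

```python
import math

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def pairwise_correlation(p1, p2, p, lo=-12.0, hi=12.0, n=20001):
    """chi_D(D1, D2) = integral of D1(x) * D2(x) / D(x) dx - 1,
    computed by trapezoid quadrature on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * p1(x) * p2(x) / p(x)
    return total * h - 1.0

# Example: mu1 = 0.5, mu2 = -0.3, so chi = exp(-0.15) - 1.
chi = pairwise_correlation(lambda x: normal_pdf(x, 0.5),
                           lambda x: normal_pdf(x, -0.3),
                           lambda x: normal_pdf(x, 0.0))
```

Note how χ decays to 0 as μ₁·μ₂ → 0: nearly "orthogonal" perturbations are nearly uncorrelated, which is exactly what the lower-bound machinery exploits.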
STATISTICAL QUERY LOWER BOUND FOR ROBUSTLY LEARNING A GAUSSIAN
Theorem (under a mild condition on ε and d): Any SQ algorithm that learns an ε-corrupted Gaussian N(μ, I) within statistical distance error o(ε·√(log(1/ε))) requires either:
• SQ queries of accuracy d^{−Ω(log(1/ε))}, or
• at least 2^{d^{Ω(1)}} many SQ queries.
Take-away: Any asymptotic improvement in the error guarantee over prior work requires super-polynomial time.
GENERAL LOWER BOUND CONSTRUCTION
General Technique for SQ Lower Bounds: leads to tight lower bounds for a range of high-dimensional estimation tasks.
Concrete Applications of our Technique:
• Robustly Learning the Mean and Covariance
• Learning Gaussian Mixture Models (GMMs)
• Statistical-Computational Tradeoffs
• Robustly Testing a Gaussian
APPLICATIONS: CONCRETE SQ LOWER BOUNDS
Unified technique yielding a range of applications.

Learning Problem | Upper Bound | SQ Lower Bound
Robust Gaussian Mean Estimation | Error: O(ε·√(log(1/ε))) [DKKLMS'16] | Runtime lower bound: d^{Ω(log M)} for a factor-M improvement in error.
Robust Gaussian Covariance Estimation | Error: O(ε·log(1/ε)) [DKKLMS'16] | Runtime lower bound: d^{Ω(log M)} for a factor-M improvement in error.
Learning k-GMMs (without noise) | Runtime: poly(d), exponential in k [MV'10, BS'10] | Runtime lower bound: d^{Ω(k)}.
Robust k-Sparse Mean Estimation | Sample size: O(k²·log d / ε²) [Li'17, DBS'17] | If the sample size is o(k²), runtime lower bound is super-polynomial.
Robust Covariance Estimation in Spectral Norm | Sample size: Õ(d²) [DKKLMS'16] | If the sample size is o(d²), runtime lower bound is super-polynomial.
GAUSSIAN MIXTURE MODEL (GMM)
• GMM: Distribution on R^d with probability density function F(x) = Σᵢ wᵢ·N(μᵢ, Σᵢ)(x), where the weights satisfy wᵢ ≥ 0 and Σᵢ wᵢ = 1.
• Extensively studied in statistics and TCS.
Karl Pearson (1894)
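Sampling from a GMM makes the model concrete: pick a component with probability wᵢ, then draw from that Gaussian. A minimal 1-D sketch (the talk's GMMs live in R^d; the mixture parameters below are arbitrary illustrative values):

```python
import random

def sample_gmm(weights, mus, sigmas, rng):
    """One draw from the mixture F(x) = sum_i w_i * N(mu_i, sigma_i^2)(x):
    choose component i with probability w_i, then sample that Gaussian."""
    u, acc = rng.random(), 0.0
    for w, mu, sigma in zip(weights, mus, sigmas):
        acc += w
        if u <= acc:
            return rng.gauss(mu, sigma)
    return rng.gauss(mus[-1], sigmas[-1])   # guard against rounding

rng = random.Random(0)
xs = [sample_gmm([0.3, 0.7], [-4.0, 2.0], [1.0, 1.0], rng)
      for _ in range(10000)]
mean = sum(xs) / len(xs)   # mixture mean is 0.3*(-4) + 0.7*2 = 0.2
```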
LEARNING GMMS – PRIOR WORK (I)
Two Related Learning Problems.
Parameter Estimation: Recover the model parameters.
• Separation Assumptions: Clustering-based techniques [Dasgupta'99, Dasgupta-Schulman'00, Arora-Kannan'01, Vempala-Wang'02, Achlioptas-McSherry'05, Brubaker-Vempala'08]
  Sample Complexity and (Best Known) Runtime: polynomial in d and k.
• No Separation: Moment method [Kalai-Moitra-Valiant'10, Moitra-Valiant'10, Belkin-Sinha'10, Hardt-Price'15]
  Sample Complexity and (Best Known) Runtime: polynomial in d, exponential in k.
SEPARATION ASSUMPTIONS
• Clustering is possible only when the components have very little overlap.
• Formally, we want the total variation distance between components to be close to 1.
• Algorithms for learning spherical GMMs work under this assumption.
• For non-spherical GMMs, known algorithms require stronger assumptions.
LEARNING GMMS – PRIOR WORK (II)
Density Estimation: Recover the underlying distribution (within statistical distance ε).
[Feldman-O'Donnell-Servedio'05, Moitra-Valiant'10, Suresh-Orlitsky-Acharya-Jafarpour'14, Hardt-Price'15, Li-Schmidt'15]
Sample Complexity: polynomial in d, k, and 1/ε.
(Best Known) Runtime: exponential in k.
Fact: For separated GMMs, density estimation and parameter estimation are equivalent.
LEARNING GMMS – OPEN QUESTION
Summary: The sample complexity of density estimation for k-GMMs is polynomial in d, k, and 1/ε, and so is the sample complexity of parameter estimation for separated k-GMMs.
Question: Is there a poly(d, k, 1/ε)-time learning algorithm?
STATISTICAL QUERY LOWER BOUND FOR LEARNING GMMS
Theorem (under a mild condition relating k and d): Any SQ algorithm that learns separated k-GMMs over R^d to constant error requires either:
• SQ queries of accuracy d^{−Ω(k)}, or
• at least 2^{d^{Ω(1)}} many SQ queries.
Take-away: The computational complexity of learning GMMs is inherently exponential in the number of components.
GENERAL RECIPE FOR (SQ) LOWER BOUNDS
Our generic technique for proving SQ lower bounds:
• Step #1: Construct a distribution P_v that is standard Gaussian in all directions except a hidden direction v.
• Step #2: Construct the univariate projection A in the direction v so that it matches the first m moments of N(0, 1).
• Step #3: Consider the family of instances {P_v : v a unit vector}.
Non-Gaussian Component Analysis [Blanchard et al. 2006]
HIDDEN DIRECTION DISTRIBUTION
Definition: For a unit vector v and a univariate distribution with density A, consider the high-dimensional distribution
  P_{A,v}(x) = A(⟨v, x⟩) · φ_⊥(x − ⟨v, x⟩·v),
where φ_⊥ is the density of a standard Gaussian on the orthogonal complement of v. That is, P_{A,v} is distributed according to A along the direction v and is standard Gaussian in every direction orthogonal to v.
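Sampling from P_{A,v} is straightforward, which makes the construction easy to experiment with. A sketch (sample_A and the example parameters are hypothetical; the talk's A is a carefully chosen moment-matching distribution):

```python
import random

def sample_hidden_direction(v, sample_A, rng):
    """Draw one sample from P_{A,v}: the projection onto the unit
    vector v has law A, and the distribution is standard Gaussian
    on the orthogonal complement of v."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(v))]   # N(0, I_d)
    proj = sum(vi * zi for vi, zi in zip(v, z))        # <v, z>
    s = sample_A(rng)                                  # coordinate along v
    # Replace the v-component of z with a draw from A.
    return [zi - proj * vi + s * vi for vi, zi in zip(v, z)]

# Example: A = N(3, 1) hidden in direction v = e_1 of R^5.
rng = random.Random(1)
v = [1.0, 0.0, 0.0, 0.0, 0.0]
xs = [sample_hidden_direction(v, lambda r: r.gauss(3.0, 1.0), rng)
      for _ in range(4000)]
mean_along_v = sum(x[0] for x in xs) / len(xs)   # near 3 (the law A)
mean_orth = sum(x[1] for x in xs) / len(xs)      # near 0 (standard Gaussian)
```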
GENERIC SQ LOWER BOUND
Recall: for a unit vector v and a univariate distribution with density A, P_{A,v} denotes the high-dimensional distribution that follows A along v and is standard Gaussian on the orthogonal complement.
Proposition: Suppose that:
• A matches the first m moments of N(0, 1), and
• the pairwise correlation χ_{N(0,I)}(P_{A,v}, P_{A,v'}) is small as long as v, v' are nearly orthogonal.
Then any SQ algorithm that learns an unknown P_{A,v} within small error requires either queries of accuracy d^{−Ω(m)} or 2^{d^{Ω(1)}} many queries.
WHY IS FINDING A HIDDEN DIRECTION HARD?
Observation: Low-degree moments do not help.
• A matches the first m moments of N(0, 1).
• Hence the first m moments of P_{A,v} are identical to those of the standard Gaussian N(0, I).
• The degree-(m+1) moment tensor has d^{m+1} entries.
Claim: Random projections do not help.
• To distinguish between P_{A,v} and N(0, I), one would need exponentially many random projections.
ONE-DIMENSIONAL PROJECTIONS ARE ALMOST GAUSSIAN
Key Lemma: Let Q be the distribution of ⟨v', X⟩, where X ∼ P_{A,v}. Then we have that
  χ²(Q, N(0, 1)) ≤ ⟨v, v'⟩^{2(m+1)} · χ²(A, N(0, 1)),
so Q is nearly standard Gaussian whenever v and v' are nearly orthogonal.
PROOF OF KEY LEMMA
Write t = ⟨v, v'⟩. Since ⟨v', X⟩ = t·s + √(1 − t²)·z with s ∼ A and z ∼ N(0, 1) independent, we get Q = U_t A, where U_t is the Gaussian noise (Ornstein-Uhlenbeck) operator over R:
  (U_t f)(x) = E_{z∼N(0,1)}[ f(t·x + √(1 − t²)·z) ].
EIGENFUNCTIONS OF THE ORNSTEIN-UHLENBECK OPERATOR
Linear operator U_t acting on functions f: R → R.
Fact (Mehler 1866): U_t H_i = tⁱ · H_i.
• H_i denotes the degree-i Hermite polynomial.
• Note that the (normalized) H_i are orthonormal with respect to the inner product ⟨f, g⟩ = E_{x∼N(0,1)}[f(x)·g(x)].
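The Mehler eigenrelation U_t H_i = tⁱH_i can be verified numerically. A sketch using probabilists' Hermite polynomials and trapezoid quadrature against the Gaussian density (function names are illustrative):

```python
import math

def he(i, x):
    """Probabilists' Hermite polynomial He_i(x), via the recurrence
    He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)."""
    h0, h1 = 1.0, x
    if i == 0:
        return h0
    for n in range(1, i):
        h0, h1 = h1, x * h1 - n * h0
    return h1

def ou_apply(f, t, x, n=4001, lo=-10.0, hi=10.0):
    """(U_t f)(x) = E_{z~N(0,1)}[ f(t*x + sqrt(1-t^2)*z) ],
    computed by trapezoid quadrature in z."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for j in range(n):
        z = lo + j * h
        w = 0.5 if j in (0, n - 1) else 1.0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += w * phi * f(t * x + math.sqrt(1 - t * t) * z)
    return total * h

# Mehler: U_t He_i = t^i * He_i.  Check i = 3 at x = 1.7, t = 0.4.
t, x, i = 0.4, 1.7, 3
lhs = ou_apply(lambda y: he(i, y), t, x)
rhs = t ** i * he(i, x)
```

Since the Hermite coefficient of degree i is damped by tⁱ, every coefficient of A beyond degree m shrinks by at least t^{m+1} under U_t, which is exactly where the ⟨v, v'⟩^{2(m+1)} factor in the key lemma comes from.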
APPLICATION: SQ LOWER BOUND FOR GMMS (I)
Theorem: Any SQ algorithm that learns separated k-GMMs over R^d to constant error requires either SQ queries of accuracy d^{−Ω(k)} or at least 2^{d^{Ω(1)}} many SQ queries.
Want to show: the theorem follows by applying our generic proposition with m = 2k − 1.

Proposition: Suppose that:
• A matches the first m moments of N(0, 1), and
• the pairwise correlation χ_{N(0,I)}(P_{A,v}, P_{A,v'}) is small whenever v, v' are nearly orthogonal.
Then any SQ algorithm that learns an unknown P_{A,v} within small error requires either queries of accuracy d^{−Ω(m)} or 2^{d^{Ω(1)}} many queries.

APPLICATION: SQ LOWER BOUND FOR GMMS (II)
Lemma: There exists a univariate distribution A that is a k-GMM with components A_i such that:
• A agrees with N(0, 1) on the first 2k − 1 moments.
• Each pair of components is separated.
• χ_{N(0,I)}(P_{A,v}, P_{A,v'}) is small whenever v and v' are nearly orthogonal.
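A sketch of the moment-matching idea for k = 2 (the symmetric mixture below is an illustrative choice, not the talk's exact construction): the mixture 0.5·N(−μ, s²) + 0.5·N(μ, s²) with μ² + s² = 1 matches the first 2k − 1 = 3 moments of N(0, 1), since odd moments vanish by symmetry and the variance is μ² + s².

```python
import math

def gauss_moment(i):
    """E[Z^i] for Z ~ N(0, 1): zero for odd i, (i-1)!! for even i."""
    return 0.0 if i % 2 else float(math.prod(range(1, i, 2)))

def mixture_moment(j, mu, s):
    """j-th raw moment of the 2-GMM 0.5*N(-mu, s^2) + 0.5*N(mu, s^2),
    via the binomial expansion of E[(m + s*Z)^j]."""
    total = 0.0
    for m in (-mu, mu):
        total += 0.5 * sum(math.comb(j, i) * m ** (j - i) * s ** i
                           * gauss_moment(i) for i in range(j + 1))
    return total

mu = 0.8
s = math.sqrt(1.0 - mu ** 2)
# First three moments match N(0, 1): 0, 1, 0.  The 4th moment differs
# from the Gaussian value 3, so matching 2k - 1 moments is tight here.
m1, m2, m3, m4 = (mixture_moment(j, mu, s) for j in (1, 2, 3, 4))
```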
APPLICATION: SQ LOWER BOUND FOR GMMS (III)
The resulting high-dimensional distributions look like "parallel pancakes": thin Gaussian slabs stacked along the hidden direction v, and standard Gaussian in all orthogonal directions.
Efficiently learnable for k = 2. [Brubaker-Vempala'08]
SUMMARY AND FUTURE DIRECTIONS
• General technique to prove SQ lower bounds.
• Robustness can make high-dimensional estimation harder, both computationally and information-theoretically.
Future Directions:
• Further applications of our framework:
  - List-Decodable Mean Estimation [D-Kane-Stewart'18]
  - Discrete Product Distributions [D-Kane-Stewart'18]
  - Robust Regression [D-Kong-Stewart'18]
  - Adversarial Examples [Bubeck-Price-Razenshteyn'18]
• Alternative evidence of computational hardness?
Thanks! Any Questions?