EFFICIENCY VS ROBUSTNESS IN HIGH-DIMENSIONAL STATISTICS
Ilias Diakonikolas (STOC 2017 workshop)
TRANSCRIPT
Can we develop learning algorithms that are robust to a constant fraction of corruptions in the data?
CONTEXT
Contamination Model: Let 𝓕 be a family of high-dimensional distributions. We say that a set of N samples is ε-corrupted from F if it is generated as follows:
• N samples are drawn from an unknown F ∈ 𝓕.
• An omniscient adversary inspects these samples and changes an ε-fraction of them arbitrarily.
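As a concrete illustration of the contamination model, here is a minimal NumPy sketch; the helper `eps_corrupt` and the `adversary` callback are illustrative names for this demo, not from the talk. The adversary gets to see all N clean samples before choosing which ε-fraction to replace.

```python
import numpy as np

def eps_corrupt(samples, eps, adversary, rng):
    """Return an eps-corrupted copy of `samples`: an omniscient adversary
    inspects all points, then replaces an eps-fraction with arbitrary ones."""
    n = len(samples)
    out = samples.copy()
    idx = rng.choice(n, size=int(eps * n), replace=False)
    out[idx] = adversary(samples, idx)   # adversary sees everything
    return out

rng = np.random.default_rng(0)
clean = rng.standard_normal((1000, 2))                 # N samples from a 2-D Gaussian
far = lambda s, idx: np.full((len(idx), 2), 50.0)      # plant distant outliers
data = eps_corrupt(clean, 0.1, far, rng)               # 10%-corrupted data set
```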
SURVEY OF TWO RECENT WORKS:
• Robust Estimators in High Dimensions without the Computational Intractability (D., Kamath, Kane, Li, Moitra, Stewart, FOCS 2016)
• Agnostic Estimation of Mean and Covariance (Lai, Rao, Vempala, FOCS 2016)
PARAMETER ESTIMATION
Given samples from an unknown distribution, e.g., a 1-D Gaussian N(μ, σ²), how do we accurately estimate its parameters?
empirical mean: μ̂ = (1/N)·Σᵢ Xᵢ        empirical variance: σ̂² = (1/N)·Σᵢ (Xᵢ − μ̂)²
The maximum likelihood estimator is asymptotically efficient (1910-1920). [R. A. Fisher]
What about errors in the model itself? (1960) [J. W. Tukey]
ROBUST PARAMETER ESTIMATION
Given corrupted samples from a 1-D Gaussian (observed model = ideal model + noise): can we accurately estimate its parameters?
Do the empirical mean and empirical variance work? No! A single corrupted sample can arbitrarily corrupt the estimates.
But the median and median absolute deviation do work.
Fact [Folklore]: Given an ε-corrupted set of N samples from a 1-D Gaussian N(μ, σ²), with high constant probability we have that:
|μ̂ − μ| ≤ O(ε + 1/√N)·σ   and   |σ̂ − σ| ≤ O(ε + 1/√N)·σ,
where μ̂ is the sample median and σ̂ is the median absolute deviation (rescaled by 1/Φ⁻¹(3/4)).
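A quick numerical check of this fact; a sketch, where `robust_1d` is an illustrative helper and the constant 0.67449 ≈ Φ⁻¹(3/4) rescales the MAD so that it estimates σ on Gaussian data. Even after a 5% fraction of the samples is replaced by wild outliers, the robust estimates stay near (μ, σ) while the empirical mean is destroyed.

```python
import numpy as np

def robust_1d(x):
    """Robust 1-D location/scale: the sample median, and the median
    absolute deviation rescaled by 1/0.67449 so that it estimates
    sigma for Gaussian data."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return med, mad / 0.67449

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=10_000)   # clean samples from N(5, 2^2)
x[:500] = 1e6                           # corrupt a 5% fraction arbitrarily
mu_hat, sigma_hat = robust_1d(x)        # stays close to (5, 2)
```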
What about robust estimation in high dimensions?
PRIOR WORK (1960-2016)
• Vast literature on robust estimators in the statistics community.
• Sample Complexity vs Robustness in High Dimensions: no tradeoff (e.g., for robust learning of a Gaussian).
• Computational Efficiency vs Robustness?
All known estimators are either hard to compute or lose polynomial factors in the dimension.
OUTLINE
Part I: Introduction
• Case Study: Robust Mean Estimation
• New Algorithmic Results
Part II: Agnostically Learning a Gaussian
• Comparison between Two Approaches
• Recursive Dimension-Reduction [LRV'16]
• Filtering Technique [DKKLMS'16]
• More Recent Developments
Part III: Summary and Conclusions
Basic Problem: Given an ε-corrupted set of samples from a d-dimensional Gaussian N(μ, I) with unknown mean, efficiently compute a parameter μ̂ such that ‖μ − μ̂‖₂ is small.
PREVIOUS APPROACHES FOR ROBUST ESTIMATION (unknown mean)

Estimator           Error Guarantee     Running Time
Tukey Median        O(ε)                NP-Hard
Geometric Median    O(ε·√d)             polynomial
Tournament          O(ε)                exponential in d
Pruning             O(ε·√d)             polynomial
MAIN RESULT FOR THIS TALK
Theorem: There are algorithms with the following behavior: Given ε > 0 and a set of ε-corrupted samples from a d-dimensional Gaussian N(μ, Σ), the algorithms run in polynomial time and find parameters (μ̂, Σ̂) such that, with high probability:
• [LRV'16]: d_TV(N(μ, Σ), N(μ̂, Σ̂)) ≤ Õ(ε·√log d), in the additive adversary model.
• [DK+'16]: d_TV(N(μ, Σ), N(μ̂, Σ̂)) ≤ Õ(ε), in the strong adversary model.
FURTHER ALGORITHMIC RESULTS
[DKKLMS'16] Efficient robust learning algorithms with dimension-independent error guarantees:
• Mean and Covariance Estimation under Bounded Moment Assumptions
• Mixtures of Product Distributions / Spherical Gaussians
• Parameter Estimation in Graphical Models [D-Kane-Stewart'16]
[LRV'16] Mild dimension-dependent error:
• Mean and Covariance Estimation under Bounded Moment Assumptions
• Independent Component Analysis, SVD
COMPARISON OF TWO APPROACHES
Commonalities:
• Spectral Algorithms: Look at the spectrum of the empirical covariance to robustly estimate the mean.
• Certificate for Robustness of the Empirical Estimator: the spectral norm of the empirical covariance is small.
Exploiting the certificate:
• [LRV'16]: Find a "good" large subspace.
• [DK+'16]: Check the condition on the entire space. If violated, filter outliers.
CERTIFICATE FOR ROBUSTNESS OF EMPIRICAL ESTIMATOR
Detect when the empirical estimator may be compromised.
(Figure: uncorrupted vs. corrupted points; the corruptions create a direction of large (>1) variance.)
Key Lemma: If X1, X2, …, XN is an ε-corrupted set of samples from N(μ, I) and N is sufficiently large, then with high probability the empirical mean μ̂ and empirical covariance Σ̂ satisfy
‖μ̂ − μ‖₂ ≤ O(√(ε·‖Σ̂ − I‖₂)) + O(ε·√log(1/ε)),
(1) in the additive adversary model [LRV'16], and (2) in the strong adversary model [DK+'16].
Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
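The take-away suggests a simple computational certificate, sketched below under the N(μ, I) assumption: compute the top eigenvalue of the empirical covariance and flag the data set if it is far from 1. The function name and the threshold 1.5 are illustrative choices for this demo, not constants from the papers.

```python
import numpy as np

def certificate_holds(samples, threshold=1.5):
    """Robustness certificate for N(mu, I) data: if the top eigenvalue of
    the empirical covariance is close to 1, the empirical mean is safe;
    a large eigenvalue flags possible corruption."""
    cov = np.cov(samples, rowvar=False)
    top = np.linalg.eigvalsh(cov)[-1]   # eigenvalues in ascending order
    return top <= threshold

rng = np.random.default_rng(2)
d = 20
clean = rng.standard_normal((5000, d))            # samples from N(0, I)
corrupted = clean.copy()
corrupted[:250] = 3.0 * np.ones(d)                # 5% identical outliers

ok_clean = certificate_holds(clean)               # certificate holds
ok_corrupted = certificate_holds(corrupted)       # corruption is flagged
```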
RECURSIVE APPROACH [LRV'16]
Two-Step Procedure:
Step #1: Find a large subspace where the empirical mean works.
Step #2: Recurse on the complement. (If the dimension is 1, use the empirical median.)
Combine the results.
Can reduce the dimension by a factor of 2 in each recursive step.
FINDING A GOOD SUBSPACE (I)
• Good subspace G: one where the empirical mean works. By the Key Lemma, a sufficient condition is: the projection of the empirical covariance on G has no large eigenvalues.
• Also want G to be "high-dimensional".
How do we find such a subspace?
FINDING A GOOD SUBSPACE (II)
Good Subspace Lemma: Let X1, X2, …, XN be an additively ε-corrupted set of samples from N(μ, I). After weak outlier removal, the empirical covariance Σ̂ satisfies tr(Σ̂ − I) ≤ Õ(ε)·d with high probability.
Corollary: Let W be the span of the eigenvectors corresponding to the bottom d/2 eigenvalues of Σ̂. Then W is a good subspace (by averaging, each of the bottom d/2 eigenvalues of Σ̂ − I is at most Õ(ε)).
RECURSIVE DIMENSION-REDUCTION ALGORITHM
The algorithm works as follows:
• Remove gross outliers (e.g., by pruning).
• Let W, V be the span of the bottom d/2 and top d/2 eigenvectors of the empirical covariance, respectively.
• Use the empirical mean on W.
• Recurse on V. (If the dimension is 1, use the median.)
O(log d) levels of the recursion ⇒ final error of Õ(ε·√log d).
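The steps above can be sketched as follows; this is a simplified illustration of the recursive idea that omits the gross-outlier pruning step, and `recursive_mean` is an illustrative name for this demo.

```python
import numpy as np

def recursive_mean(samples):
    """Recursive dimension-reduction (simplified; pruning omitted):
    trust the empirical mean on the bottom-d/2 eigenspace of the
    empirical covariance, recurse on the top-d/2 eigenspace, and use
    the median in dimension 1."""
    n, d = samples.shape
    if d == 1:
        return np.array([np.median(samples[:, 0])])   # 1-D base case: median
    cov = np.cov(samples, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)                  # ascending eigenvalues
    W, V = vecs[:, : d // 2], vecs[:, d // 2:]        # bottom / top eigenspaces
    mu_W = (samples @ W).mean(axis=0)                 # empirical mean works on W
    mu_V = recursive_mean(samples @ V)                # recurse on the complement
    return W @ mu_W + V @ mu_V                        # combine the two estimates

rng = np.random.default_rng(3)
d, n = 8, 20000
mu = np.linspace(0.0, 1.0, d)
x = rng.standard_normal((n, d)) + mu
x[: n // 50] = mu + 50.0 * np.eye(d)[0]               # 2% far outliers along e1
est = recursive_mean(x)                               # robust estimate of mu
```

The outlier direction always lands in the top eigenspace, so it keeps being pushed down the recursion until a 1-D median absorbs it; the naive empirical mean, by contrast, is shifted by roughly 0.02·50 = 1 along that direction.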
FILTERING APPROACH [DKKLMS'16]
Two-Step Procedure:
Step #1: Detect if the empirical estimator may be compromised.
Step #2: If it is, filter out outliers. Iterate on the new data set.
A general recipe that works in fairly general settings. We will show how it works for the unknown-mean case.
FILTERING ALGORITHM
Either output the empirical mean, or remove many outliers.
Filtering Approach: Suppose that the spectral norm of the empirical covariance is large (the certificate is violated). Let v be the direction of maximum variance [Klivans-Long-Servedio'09].
• Project all the points on the direction of v.
• Find a threshold T such that
  Pr_{x∼_u S}[|v·x − median(v·x)| > T] ≥ 3·e^{−T²/2}.
• Throw away all points x such that |v·x − median(v·x)| > T.
• Iterate on the new data set.
Eventually the empirical mean works.
We filter out more corrupted than good points. After a number of iterations, we have removed all corrupted points.
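A minimal sketch of the filtering loop for the unknown-mean, identity-covariance case; the eigenvalue threshold, the tail constant, and the grid of candidate thresholds T are illustrative choices for this demo, not the constants from [DKKLMS'16].

```python
import numpy as np

def filter_mean(samples, eig_threshold=2.0, tail_c=8.0):
    """Iterative filtering for robust mean estimation (unknown mean,
    identity covariance): either certify the empirical mean via the
    top eigenvalue of the empirical covariance, or filter tail points
    along the direction of maximum variance and iterate."""
    x = samples
    while True:
        cov = np.cov(x, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)        # ascending eigenvalues
        if vals[-1] <= eig_threshold:           # certificate holds:
            return x.mean(axis=0)               # empirical mean is safe
        v = vecs[:, -1]                         # direction of maximum variance
        dev = np.abs(x @ v - np.median(x @ v))  # deviations along v
        # find a T where the empirical tail exceeds the Gaussian tail bound
        for T in np.arange(2.0, 20.0, 0.5):
            if np.mean(dev > T) > tail_c * np.exp(-T * T / 2):
                break
        keep = dev <= T
        if keep.all():                          # nothing to filter; stop
            return x.mean(axis=0)
        x = x[keep]                             # iterate on the new data set

rng = np.random.default_rng(4)
d, n = 10, 5000
mu = np.full(d, 2.0)
x = rng.standard_normal((n, d)) + mu
x[: n // 20] = mu + 8.0 * np.eye(d)[0]          # 5% outliers along e1
est = filter_mean(x)
```

On this example the first pass flags the large variance along e1, throws away the planted outliers (plus a tiny Gaussian tail), and the second pass certifies the cleaned empirical mean.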
GENERALITY OF FILTERING APPROACH
• The focus of the initial version was on specific distribution families (e.g., Gaussian, discrete product distributions). Error guarantee: ‖μ − μ̂‖₂ = O(ε·√log(1/ε)).
• The filter approach works under weaker concentration assumptions with appropriate (tight) guarantees. E.g.:
  - Under a 2nd moment assumption: O(√ε)
  - Under a 4th moment assumption: O(ε^{3/4})
  - Under a sub-gaussian assumption: O(ε·√log(1/ε))
• Sample complexity is near-optimal for all these cases.
SUBSEQUENT WORK: ROBUST MEAN ESTIMATION
Summary so far: The filtering algorithm robustly estimates the mean of N(μ, I) up to error O(ε·√log(1/ε)) in the strong adversary model.
Question: Can we efficiently achieve error O(ε)?
[D-Kane-Stewart'16]: Algorithm with runtime d^{O(log(1/ε))} in the strong adversary model.
[D-Kane-Stewart'16]: No Statistical Query (SQ) algorithm can do better. Specifically, error O(ε) requires super-polynomial time in 1/ε.
[DKKLMS'17]: Polynomial-time algorithm with O(ε) error in the additive adversary model.
OPEN PROBLEMS
Subsequent Work:
• [D-Kane-Stewart'16] Known-Structure Bayes Nets
• [Li / Du-Balakrishnan-Singh'17] Sparse models (e.g., sparse PCA)
• [Charikar-Steinhardt-Valiant'17] List-decodable learning
• [D-Kane-Stewart'17] Robust Classification

• Pick your favorite high-dimensional learning problem for which a (non-robust) efficient algorithm is known.
• Make it robust!