HDLSS Discrimination


Upload: palani

Post on 05-Jan-2016



HDLSS Discrimination
FLD in Increasing Dimensions:
Low dimensions (d = 2-9):
  Visually good separation
  Small angle between FLD and Optimal
  Good generalizability
Medium Dimensions (d = 10-26):
  Visual separation too good?!?
  Larger angle between FLD and Optimal
  Worse generalizability
  Feel effect of sampling noise

HDLSS Discrimination
FLD in Increasing Dimensions:
High Dimensions (d = 27-37):
  Much worse angle
  Very poor generalizability
  But very small within class variation
  Poor separation between classes
  Large separation / variation ratio

HDLSS Discrimination
FLD in Increasing Dimensions:
At HDLSS Boundary (d = 38):
  38 = degrees of freedom
  (need to estimate 2 class means)
  Within class variation = 0 ?!?
  Data pile up, on just two points
  Perfect separation / variation ratio?
  But only feels microscopic noise aspects
  So likely not generalizable
  Angle to optimal very large

HDLSS Discrimination
FLD in Increasing Dimensions:
Just beyond HDLSS boundary (d = 39-70):
  Improves with higher dimension?!?
  Angle gets better
  Improving generalizability?
  More noise helps classification?!?

HDLSS Discrimination
FLD in Increasing Dimensions:
Far beyond HDLSS boundary (d = 70-1000):
  Quality degrades
  Projections look terrible
  (populations overlap)
  And generalizability falls apart, as well
  Maths worked out by Bickel & Levina (2004)
  Problem is estimation of d x d covariance matrix
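The sweep over dimensions described on the FLD slides can be imitated with a small simulation. This is only a sketch under stated assumptions, not the data behind the slides: two spherical Gaussian classes whose means differ only in the first coordinate (so the optimal direction is the first axis), 20 points per class (inferred from the 38 degrees of freedom, i.e. 40 points minus 2 class means), a mean shift of 2.2, and a pseudo-inverse of the within-class scatter once it becomes singular. The function names are hypothetical.

```python
import numpy as np


def fld_direction(X1, X2, tol=1e-10):
    """Unit FLD direction, using a pseudo-inverse of the within-class scatter."""
    delta = X1.mean(axis=0) - X2.mean(axis=0)
    # Center each class at its own mean; the pooled rows have rank <= n1 + n2 - 2.
    Xc = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    W = Xc.T @ Xc  # d x d within-class scatter: the hard-to-estimate object
    vals, vecs = np.linalg.eigh(W)
    keep = vals > tol * vals.max()  # drop numerically null eigen-directions
    # Apply the pseudo-inverse of W to the mean difference.
    w = vecs[:, keep] @ ((vecs[:, keep].T @ delta) / vals[keep])
    return w / np.linalg.norm(w)


def angle_to_optimal(w, optimal):
    """Angle in degrees between two directions, ignoring sign."""
    cos = abs(w @ optimal) / (np.linalg.norm(w) * np.linalg.norm(optimal))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


def mean_fld_angle(d, n_per_class=20, shift=2.2, reps=20, seed=0):
    """Average FLD angle to the optimal direction over simulated data sets.

    Assumed setup (not from the slides): identity covariance, class means
    at +shift and -shift on the first axis.
    """
    rng = np.random.default_rng(seed)
    optimal = np.zeros(d)
    optimal[0] = 1.0
    angles = []
    for _ in range(reps):
        X1 = rng.standard_normal((n_per_class, d))
        X1[:, 0] += shift
        X2 = rng.standard_normal((n_per_class, d))
        X2[:, 0] -= shift
        angles.append(angle_to_optimal(fld_direction(X1, X2), optimal))
    return float(np.mean(angles))


if __name__ == "__main__":
    for d in (2, 10, 30, 38, 50, 200):
        print(d, round(mean_fld_angle(d), 1))
```

Averaging over several simulated data sets is a choice made here to smooth out the sampling noise that a single movie frame would show.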

HDLSS Discrimination
Simple Solution:
Mean Difference (Centroid) Method
  Recall not classically recommended
  Usually no better than FLD
  Sometimes worse
  But avoids estimation of covariance
  Means are very stable
  Don't feel HDLSS problem
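A minimal sketch of the mean difference (centroid) rule follows, to show that it needs only the two class means and no covariance estimate. The function names and the convention of returning labels 1 and 2 are illustrative assumptions.

```python
import numpy as np


def fit_mean_difference(X1, X2):
    """Return the discriminant direction and the midpoint between class means."""
    m1 = X1.mean(axis=0)
    m2 = X2.mean(axis=0)
    direction = m1 - m2           # no d x d covariance matrix is estimated
    midpoint = 0.5 * (m1 + m2)    # the decision boundary passes through here
    return direction, midpoint


def classify(x, direction, midpoint):
    """Assign x to class 1 if it lies on the class-1 side of the midpoint."""
    score = (np.asarray(x, dtype=float) - midpoint) @ direction
    return 1 if score > 0 else 2
```

This is equivalent to assigning a point to the class whose sample mean is nearer in Euclidean distance.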

HDLSS Discrimination
Mean Difference (Centroid) Method
  Same data, movie over dimensions
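The movie itself is not part of this transcript. As a rough stand-in, the sketch below tracks two summaries of the mean difference direction as the dimension grows: its angle to the optimal direction and the distance between the projected class means. The setup is assumed rather than taken from the slides: spherical Gaussian classes, 20 points per class, means at +2.2 and -2.2 on the first axis. The function name is hypothetical.

```python
import numpy as np


def mean_difference_summary(d, n_per_class=20, shift=2.2, reps=20, seed=0):
    """Average (angle in degrees, gap between projected means) in dimension d."""
    rng = np.random.default_rng(seed)
    angles, gaps = [], []
    for _ in range(reps):
        X1 = rng.standard_normal((n_per_class, d))
        X1[:, 0] += shift
        X2 = rng.standard_normal((n_per_class, d))
        X2[:, 0] -= shift
        delta = X1.mean(axis=0) - X2.mean(axis=0)
        norm = np.linalg.norm(delta)
        # Under the assumed setup the optimal direction is the first axis.
        cos = min(abs(delta[0]) / norm, 1.0)
        angles.append(np.degrees(np.arccos(cos)))
        # Projecting both sample means onto delta / norm separates them by norm.
        gaps.append(norm)
    return float(np.mean(angles)), float(np.mean(gaps))


if __name__ == "__main__":
    for d in (2, 10, 38, 100, 1000):
        angle, gap = mean_difference_summary(d)
        print(d, round(angle, 1), round(gap, 2))
```

Each added noise coordinate contributes a small random component to the estimated mean difference, which is one way to see both effects in this setup.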

HDLSS Discrimination
Mean Difference (Centroid) Method
  Far more stable over dimensions
  Because it is the likelihood ratio solution
  (for known variance - Gaussians)
  Doesn't feel HDLSS boundary
  Eventually becomes too good?!?
  Widening gap between clusters?!?
  Careful: angle to optimal grows
  So lose generalizability (since noise increases)
HDLSS data present some odd effects
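The likelihood ratio claim can be checked in a few lines. The derivation below assumes both classes are Gaussian with a common, known spherical covariance, written here as sigma^2 I; the symbols mu_1, mu_2 and sigma are notation introduced for this sketch.

```latex
% Assumption: x ~ N(\mu_1, \sigma^2 I) under class 1 and
%             x ~ N(\mu_2, \sigma^2 I) under class 2, with \sigma^2 known.
\begin{align*}
\log \frac{f_1(x)}{f_2(x)}
  &= -\frac{1}{2\sigma^2}\left( \|x - \mu_1\|^2 - \|x - \mu_2\|^2 \right) \\
  &= \frac{1}{2\sigma^2}\left( 2\,x^\top(\mu_1 - \mu_2)
       - \|\mu_1\|^2 + \|\mu_2\|^2 \right) \\
  &= \frac{1}{\sigma^2}\,(\mu_1 - \mu_2)^\top
       \left( x - \frac{\mu_1 + \mu_2}{2} \right).
\end{align*}
```

The sign of the last line depends on x only through its projection onto mu_1 - mu_2, measured from the midpoint of the two means. Replacing mu_1 and mu_2 by the class sample means gives the mean difference rule.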