Upload: aubrie-perry

Post on 18-Dec-2015

SigClust Gaussian null distribution - Simulation

Now simulate from null distribution using:

$X_i = (X_{i1}, \ldots, X_{id})$, where $X_{ij} \sim N(0, \hat{\lambda}_j)$ (indep.)

Again rotation invariance makes this work

(and location invariance)
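This null simulation can be sketched numerically (a minimal sketch, assuming the SigClust cluster index CI = within-class sum of squares from 2-means divided by total sum of squares; `two_means`, `cluster_index`, and `sigclust_null_cis` are illustrative names, not the authors' implementation):

```python
import numpy as np

def cluster_index(X, labels):
    """CI = within-cluster sum of squares / total sum of squares."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return within / total

def two_means(X, n_iter=50, seed=0):
    """Minimal 2-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # Keep the old center if a cluster goes empty.
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in (0, 1)])
    return labels

def sigclust_null_cis(eigvals, n, n_sim=100, seed=1):
    """Simulate CI under the Gaussian null: X_ij ~ N(0, lambda_j), indep."""
    rng = np.random.default_rng(seed)
    cis = []
    for _ in range(n_sim):
        X = rng.standard_normal((n, len(eigvals))) * np.sqrt(eigvals)
        cis.append(cluster_index(X, two_means(X)))
    return np.array(cis)
```

A data set's observed CI would then be compared against these null CIs, e.g. a p-value as the fraction of null CIs at or below the observed one.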

SigClust Gaussian null distribution - Simulation

Then compare data CI with simulated null population CIs

• Spirit similar to DiProPerm
• But now significance happens for smaller values of CI

An example (details to follow):

P-val = 0.0045

SigClust Real Data Results

Summary of Perou 500 SigClust Results:

Lum & Norm vs. Her2 & Basal: p-val = 10^-19

Luminal A vs. B: p-val = 0.0045

Her2 vs. Basal: p-val = 10^-10

Split Luminal A: p-val = 10^-7

Split Luminal B: p-val = 0.058

Split Her2: p-val = 0.10

Split Basal: p-val = 0.005

HDLSS Asymptotics

Modern Mathematical Statistics: Based on asymptotic analysis, i.e. uses limiting operations such as $\lim_{n \to \infty}$ (almost always)

Occasional misconceptions:

• Indicates behavior for large samples
• Thus only makes sense for "large" samples
• Models phenomenon of "increasing data"
• So other flavors are useless

HDLSS Asymptotics

Modern Mathematical Statistics: Based on asymptotic analysis

Real Reasons:

• Approximation provides insights
• Can find simple underlying structure in complex situations

Thus various flavors are fine:

$\lim_{n \to \infty}$, $\lim_{d \to \infty}$, $\lim_{d \to \infty} \lim_{n \to \infty}$, $\lim_{n \to \infty} \lim_{d \to \infty}$, $\lim_{d, n \to \infty}$

Even desirable (find additional insights)

HDLSS Asymptotics Simple Paradoxes

For $d$-dim'al Standard Normal dist'n:

$Z = (Z_1, \ldots, Z_d)^T \sim N_d(0, I_d)$

Where are the Data?

Near Peak of Density?

Thanks to psycnet.apa.org

HDLSS Asymptotics Simple Paradoxes

As $d \to \infty$:

- Data lie roughly on surface of sphere with radius $\sqrt{d}$:  $\|Z\| = \sqrt{d} + O_P(1)$

- Yet origin is point of highest density

- Paradox resolved by:

density w.r.t. Lebesgue Measure
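This concentration is easy to see numerically (a quick sketch): norms of standard normal vectors cluster tightly around $\sqrt{d}$, with spread that stays bounded as $d$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (10, 100, 10000):
    Z = rng.standard_normal((1000, d))
    norms = np.linalg.norm(Z, axis=1)
    # Mean norm tracks sqrt(d); the O_P(1) spread stays bounded as d grows.
    print(d, round(norms.mean() / np.sqrt(d), 3), round(norms.std(), 3))
```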

HDLSS Asymptotics Simple Paradoxes

- Paradox resolved by: density w.r.t. Lebesgue Measure

Lebesgue Measure Pushes Mass Out

Density Pulls Data In

$\sqrt{d}$ Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As $d \to \infty$:  $\|Z\| = \sqrt{d} + O_P(1)$

Important Philosophical Consequence: there are no "Average People"

Parents' Lament: Why Can't I Have Average Children?

Theorem: Impossible (over many factors)

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant:

$\|Z_1 - Z_2\| = \sqrt{2d} + O_P(1)$

• Factor $\sqrt{2}$ since $\mathrm{sd}(X_1 - X_2)^2 = \mathrm{sd}(X_1)^2 + \mathrm{sd}(X_2)^2$

Can extend to $Z_1, \ldots, Z_n$
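A numerical check of the $\sqrt{2d}$ distance (a quick sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10000
Z1, Z2 = rng.standard_normal((2, d))
# Distance between two independent standard normal points is sqrt(2d) + O_P(1).
print(round(np.linalg.norm(Z1 - Z2) / np.sqrt(2 * d), 4))
```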

HDLSS Asymptotics Simple Paradoxes

For $d$-dim'al Standard Normal dist'n:

$Z_2 \sim N_d(0, I_d)$, indep. of $Z_1$

High dim'al Angles (as $d \to \infty$):

$\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_P(d^{-1/2})$

- Everything is orthogonal
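The near-orthogonality is again easy to check numerically (a quick sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10000
Z1, Z2 = rng.standard_normal((2, d))
cos = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
angle = np.degrees(np.arccos(cos))
# Angle is 90 degrees up to O_P(d^{-1/2}), here a fraction of a degree.
print(round(angle, 2))
```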

HDLSS Asy's Geometrical Represent'n

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Subspace Generated by Data:

• Hyperplane through 0, of dimension $n$
• Points are "nearly equidistant to 0", & dist $\approx \sqrt{d}$
• Within plane, can "rotate towards Unit Simplex"
• All Gaussian data sets are "near Unit Simplex Vertices"
• "Randomness" appears only in rotation of simplex

Hall, Marron & Neeman (2005)

HDLSS Asy's Geometrical Represent'n

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Hyperplane Generated by Data:

• $n - 1$ dimensional hyperplane
• Points are pairwise equidistant, dist $\approx \sqrt{2d}$
• Points lie at vertices of "regular $n$-hedron"
• Again, "randomness in data" is only in rotation
• Surprisingly rigid structure in random data
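This rigidity can be checked directly (a quick sketch): all pairwise distances among independent Gaussian points agree to within a vanishing relative error, as for a regular simplex.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 100000
Z = rng.standard_normal((n, d))
# All pairwise distances are close to sqrt(2d): points sit near the vertices
# of a regular simplex, with randomness only in its orientation.
dists = [np.linalg.norm(Z[i] - Z[j]) for i in range(n) for j in range(i + 1, n)]
print(round(min(dists) / np.sqrt(2 * d), 3), round(max(dists) / np.sqrt(2 * d), 3))
```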

HDLSS Asy's Geometrical Represen'tion

Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors

HDLSS Asy's Geometrical Represen'tion

Simulation View Shows "Rigidity after Rotation"

HDLSS Asy's Geometrical Represen'tion

Now Recall HDLSS Simulation Results,

Comparing DWD, SVM & Others, from 10/21/14

HDLSS Discrim'n Simulations

Main idea:

Comparison of:

• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, aka Centroid)

Linear versions, across dimensions

HDLSS Discrim'n Simulations

Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$

HDLSS Discrim'n Simulations

Spherical Gaussians

HDLSS Discrim'n Simulations

Outlier Mixture

HDLSS Discrim'n Simulations

Wobble Mixture

HDLSS Discrim'n Simulations

Nested Spheres

HDLSS Discrim'n Simulations

…

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)


HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"


HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"


2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in sense:

For eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, assume

$$\varepsilon_d = \frac{\sum_{j=1}^d \lambda_j^2}{\left(\sum_{j=1}^d \lambda_j\right)^2} = o(1), \quad \text{as } d \to \infty$$

($\tfrac{1}{d}$ is min possible)

(much weaker than previous mixing conditions…)


2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

$$\varepsilon_d = \frac{\sum_{j=1}^d \lambda_j^2}{\left(\sum_{j=1}^d \lambda_j\right)^2}$$

is called the "epsilon statistic",

and is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"
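The two extreme cases of this statistic (spherical, and one dominant eigenvalue) can be computed directly (a small sketch; `epsilon_stat` is an illustrative name):

```python
import numpy as np

def epsilon_stat(eigvals):
    """Epsilon statistic: sum(lambda_j^2) / (sum(lambda_j))^2."""
    eigvals = np.asarray(eigvals, dtype=float)
    return (eigvals ** 2).sum() / eigvals.sum() ** 2

d = 1000
print(epsilon_stat(np.ones(d)))             # spherical case: equals 1/d
print(epsilon_stat([d] + [0.0] * (d - 1)))  # one dominant eigenvalue: equals 1
```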


2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies $\frac{1}{d} \leq \varepsilon_d \leq 1$:

• For spherical Normal, $\varepsilon_d = \frac{1}{d}$
• Single extreme eigenvalue gives $\varepsilon_d \approx 1$
• So assumption $\varepsilon_d = o(1)$ is very mild
• Much weaker than mixing conditions


2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

assume no eigenvalues too large.

Then $X_i' X_j = O_P(\sqrt{d})$

Not so strong as before: $\|Z_1 - Z_2\| = \sqrt{2d} + O_P(1)$


2nd Paper on HDLSS Asymptotics

Can we improve on $X_i' X_j = O_P(\sqrt{d})$?

John Kent example: Normal scale mixture

$X_i \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$, i.i.d.

Won't get $X_i' X_j = C \sqrt{d}\,(1 + o_P(1))$
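A numerical look at Kent's mixture (a quick sketch): squared norms scale like $d$ or $100d$ depending on the mixture draw, so normalized quantities have a random, not deterministic, limit.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 100000, 8
# Kent's normal scale mixture: each vector is N(0, I_d) or N(0, 100 I_d),
# with probability 1/2 each.
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]
# ||X_i||^2 / d concentrates near 1 or 100, depending on the mixture draw.
print(np.round((X ** 2).sum(axis=1) / d, 2))
```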

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)


2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, $X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$:

• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence


0 Covariance is not independence

Simple Example:

• Random Variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
• (Note: Not Using Multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given $c > 0$, define:

$$Y = \begin{cases} X, & |X| \leq c \\ -X, & |X| > c \end{cases}$$


0 Covariance is not independence

Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$

• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$


0 Covariance is not independence

Result:

• Joint distribution of $X$ and $Y$:
  – Has Gaussian marginals
  – Has $\mathrm{cov}(X, Y) = 0$
  – Yet strong dependence of $X$ and $Y$
  – Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
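The construction can be verified numerically (a sketch; the critical $c$ is found by bisection rather than in closed form, following the construction $Y = X$ for $|X| \le c$, $Y = -X$ otherwise):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal(1_000_000)

def cov_xy(c):
    # Y = X where |X| <= c, Y = -X where |X| > c; both marginals are N(0,1).
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y)  # cov(X, Y), since both means are 0

# cov < 0 for small c, cov > 0 for large c: bisect for the root.
lo, hi = 0.1, 5.0
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c = (lo + hi) / 2
Y = np.where(np.abs(X) <= c, X, -X)
# Covariance is ~0, yet |Y| = |X| exactly: strong dependence.
print(round(c, 3), round(float(np.mean(X * Y)), 4))
```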


HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions; reflects intuitive feel for sampling variation; something like mean vs. median), Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]


HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007). For Eigenvalues:

$$\lambda_1 = d^\alpha, \quad \lambda_2 = \cdots = \lambda_d = 1$$

Note Critical Parameter: $\alpha$

1st Eigenvector: $u_1$ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions, $\hat{\lambda}_1, \ldots, \hat{\lambda}_d, \hat{u}_1$, as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for $\alpha > 1$,

$$\mathrm{Angle}(\hat{u}_1, u_1) \to 0$$

Strong Inconsistency (spike not big enough): for $\alpha < 1$,

$$\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$$
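A simulation sketch of this dichotomy (assuming, for illustration, the spike direction is the first coordinate axis; `pc1_angle` is an illustrative name):

```python
import numpy as np

def pc1_angle(alpha, d, n, seed=0):
    """Angle between empirical and true first eigenvector in the spike
    model: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)        # spike along the first coordinate
    # Leading eigenvector of the sample covariance via SVD of centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    u1_hat = Vt[0]
    cos = abs(u1_hat[0])                  # |<u1_hat, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

# Consistency (alpha > 1) vs. strong inconsistency (alpha < 1):
for alpha in (1.5, 0.5):
    print(alpha, [round(pc1_angle(alpha, d, n=20), 1) for d in (100, 1000, 10000)])
```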

HDLSS Math Stat of PCA

Intuition: Random Noise $\sim d^{1/2}$

For $\alpha > 1$ (Recall $d^\alpha$ is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$:

Spike Contained in Pure Noise Sphere


HDLSS Math Stat of PCA

Consistency of eigenvalues?

Eigenvalues Inconsistent:

$$\frac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{\,L\,} \frac{\chi^2_n}{n}, \quad \text{as } d \to \infty$$

But Known Distribution

Consistent when $n \to \infty$ as Well
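A simulation sketch of this limit law (assuming the same single-spike setup as above; the $\chi^2_n / n$ limit is checked only through its first two moments):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, d, n, reps = 2.0, 2000, 5, 400
lam1 = float(d) ** alpha
ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)  # spike along first coordinate
    # Top eigenvalue of the sample covariance X'X/n equals that of X X'/n.
    top = np.linalg.eigvalsh(X @ X.T / n).max()
    ratios.append(top / lam1)
ratios = np.array(ratios)
# chi^2_n / n has mean 1 and variance 2/n.
print(round(ratios.mean(), 2), round(ratios.var(), 2))
```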


Conditions for Geo Rep'n & PCA Consist

John Kent example: $X \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$

Can only say: $\|X\| = \sqrt{d}\,(1 + o_P(1))$ w.p. $\tfrac{1}{2}$, $\;= 10\sqrt{d}\,(1 + o_P(1))$ w.p. $\tfrac{1}{2}$; not deterministic

PCA Conditions Same, since Noise Still $O_P(d^{1/2})$

But for Geo Rep'n, need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA


Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (usually ignored),

e.g. Independent and Ident. Dist'd

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions


Mixing Conditions:

• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References?

Mixing Conditions


Mixing Condition Used Here: Rho-Mixing

For Random Variables $\{X_j\}$, define

$$\rho(k) = \sup\left\{ \mathrm{corr}(f, g) : f \in L_2(\mathcal{F}_1^m),\ g \in L_2(\mathcal{F}_{m+k}^\infty) \right\}$$

where $\mathcal{F}_a^b$ is the sigma-field generated by $X_a, \ldots, X_b$ (note gap of lag $k$)

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X = (X_1, X_2, \ldots, X_d)$

Are $\rho$-mixing

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): entries $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA


Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$ (Note: Not Gaussian)

Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume $\exists$ a permutation, so that $Z_d$ is $\rho$-mixing

HDLSS Math Stat of PCA


Careful look at PCA Consistency ($\alpha > 1$ spike)

(Reality Check, Suggested by Reviewer)

Independent of Sample Size,

So true for n = 1 (?!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data, From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis


Recall Theoretical Separation:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA


An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$

HDLSS Math Stat of PCA


An Interesting Objection: Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ (what we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$,

Can Show (Random): $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent:

So how can PCA find Useful Signals in Data?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA


PC Scores (i.e. projections) Not Consistent:

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$, Same Realization for All $i$

• Axes have Inconsistent Scales
• But Relationships are Still Useful

HDLSS Math Stat of PCA


In PCA Consistency:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

Result: $\exists$ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Deep Open Problem


Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods


Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods


Interesting Question: Behavior in Very High Dimension

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust

Mathematics behind this?


SigClust Gaussian null distribution - Simulation

Then compare data CI

With simulated null population CIs

bull Spirit similar to DiProPermbull But now significance happens for

smaller values of CI

An example (details to follow)

P-val = 00045

SigClust Real Data Results

Summary of Perou 500 SigClust ResultsLum amp Norm vs Her2 amp Basal p-val = 10-19

Luminal A vs B p-val = 00045Her 2 vs Basal p-val = 10-10

Split Luminal A p-val = 10-7

Split Luminal B p-val = 0058Split Her 2 p-val = 010Split Basal p-val = 0005

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Ie Uses limiting operations Almost always Occasional misconceptions

Indicates behavior for large samples Thus only makes sense for ldquolargerdquo samples Models phenomenon of ldquoincreasing datardquo So other flavors are useless

nlim

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Real Reasons

Approximation provides insightsCan find simple underlying structureIn complex situations

Thus various flavors are fine

Even desirable (find additional insights)

0limlimlimlim dndn

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

Where are Data

Near Peak of Density

Thanks to psycnetapaorg

d

dd

d

IN

Z

Z

Z 0~1

HDLSS Asymptotics Simple Paradoxes

As

-Data lie roughly on surface of sphere

with radius

- Yet origin is point of highest density

- Paradox resolved by

density w r t Lebesgue Measure

d

)1(pOdZ

d

HDLSS Asymptotics Simple Paradoxes

- Paradox resolved by

density w r t Lebesgue Measure

Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As

Important Philosophical Consequence

ldquoAverage Peoplerdquo

Parents Lament

Why Canrsquot I Have Average Children

Theorem Impossible (over many factors)

d )1(pOdZ

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant

bullFactor since

Can extend to

)1(221 pOdZZ

nZZ

1

222

121 XsdXsdXXsd 2

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

indep of

High dimrsquoal Angles (as )

- Everything is orthogonal

d

d

dd INZ 0~2

)(90 2121

dOZZAngle p

1Z

HDLSS Asy's Geometrical Represent'n

Assume Z_1, ..., Z_n ~ N_d(0, I_d), let d -> infinity

Study Subspace Generated by Data:

- Hyperplane through 0, of dimension n

- Points are "nearly equidistant to 0", & dist approx sqrt(d)

- Within plane, can "rotate towards Unit Simplex"

- All Gaussian data sets are "near Unit Simplex Vertices"

- "Randomness" appears only in rotation of simplex

Hall, Marron & Neeman (2005)

HDLSS Asy's Geometrical Represent'n

Assume Z_1, ..., Z_n ~ N_d(0, I_d), let d -> infinity

Study Hyperplane Generated by Data:

- (n - 1)-dimensional hyperplane

- Points are pairwise equidistant, dist approx sqrt(2d)

- Points lie at vertices of a "regular n-hedron"

- Again, "randomness in data" is only in rotation

- Surprisingly rigid structure in random data!

HDLSS Asy's Geometrical Represen'tion

Simulation View: study "rigidity after rotation"
- Simple 3-point data sets
- In dimensions d = 2, 20, 200, 20000
- Generate hyperplane of dimension 2
- Rotate that to plane of screen
- Rotate within plane to make "comparable"
- Repeat 10 times, use different colors
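The rigidity itself (without the rotation-to-screen step) can be sketched numerically, assuming the simple 3-point Gaussian data sets described above; names are mine:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
for d in (2, 20, 200, 20000):
    Z = rng.standard_normal((3, d))     # simple 3-point data set
    dists = [np.linalg.norm(Z[i] - Z[j]) for i, j in combinations(range(3), 2)]
    # ratio of longest to shortest side -> 1: the triangle becomes equilateral
    ratio = max(dists) / min(dists)
    print(d, round(ratio, 3))
```

At d = 2 the triangle shape is essentially arbitrary; by d = 20000 all three sides are nearly equal, the "regular n-hedron" of the previous slide.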

HDLSS Asy's Geometrical Represen'tion

Simulation View: Shows "Rigidity after Rotation"

HDLSS Asy's Geometrical Represen'tion

Now Recall HDLSS Simulation Results, Comparing DWD, SVM & Others, from 10/21/14

HDLSS Discrim'n Simulations

Main idea: Comparison of

- SVM (Support Vector Machine)

- DWD (Distance Weighted Discrimination)

- MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions

HDLSS Discrim'n Simulations

Overall Approach:
- Study different known phenomena
  - Spherical Gaussians
  - Outliers
  - Polynomial Embedding

- Common Sample Sizes: n+ = n- = 25

- But wide range of dimensions: d = 10, 40, 100, 400, 1600

HDLSS Discrim'n Simulations

Spherical Gaussians:

HDLSS Discrim'n Simulations

Outlier Mixture:

HDLSS Discrim'n Simulations

Wobble Mixture:

HDLSS Discrim'n Simulations

Nested Spheres:

HDLSS Discrim'n Simulations

Interesting Phenomenon: All methods come together in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about "All methods come together in very high dimensions"?

Mathematical Statistical Question: Mathematics behind this? (Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

- 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
- All are same distance from the other class
- i.e. everything is a support vector
- i.e. all sensible directions show "data piling"
- so "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

- non-Gaussian data: only need moments

- non-independent: use "mixing conditions"

- Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For eigenvalues lambda_1 >= lambda_2 >= ... >= lambda_d >= 0, assume

  (sum_{j=1}^d lambda_j^2) / (sum_{j=1}^d lambda_j)^2 = o(1)

i.e. epsilon = (sum_j lambda_j)^2 / (d * sum_j lambda_j^2) is much larger than 1/d (the min. possible)

(much weaker than previous mixing conditions...)

2nd Paper on HDLSS Asymptotics

Background:

In classical multivariate analysis, the statistic

  epsilon = (sum_{j=1}^d lambda_j)^2 / (d * sum_{j=1}^d lambda_j^2)

is called the "epsilon statistic", and is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies:

  1/d <= epsilon <= 1

- For spherical Normal: epsilon = 1

- Single extreme eigenvalue gives: epsilon approx 1/d

- So the assumption epsilon much larger than 1/d is very mild

- Much weaker than mixing conditions
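The two extreme cases of the bound are quick to verify numerically. A sketch (the function name and the d^2 spike size are my own choices):

```python
import numpy as np

def epsilon(eigvals):
    """Classical epsilon (sphericity) statistic: (sum lambda)^2 / (d * sum lambda^2)."""
    lam = np.asarray(eigvals, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
print(epsilon(np.ones(d)))               # spherical: exactly 1

spike = np.ones(d)
spike[0] = d ** 2                        # one hugely dominant eigenvalue
print(epsilon(spike))                    # close to 1/d
```

So the eigenvalue condition only rules out covariances dominated by a handful of extreme eigenvalues.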

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large: epsilon much larger than 1/d

Then:

  ||X_i - X_j|| = O_P(sqrt(d))

Not so strong as before:

  ||Z_1 - Z_2|| = sqrt(2d) + O_P(1)

2nd Paper on HDLSS Asymptotics

Can we improve on ||X_i - X_j|| = O_P(sqrt(d))?

John Kent example: Normal scale mixture

  X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Won't get:

  ||X_i - X_j|| = C sqrt(d) + o_P(sqrt(d))
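A small simulation makes Kent's point concrete, assuming the 1-vs-100 variance mixture above (function name mine): the scaled pairwise distances do not settle on a single constant.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 20000

def kent_draw():
    # scale mixture: sigma = 1 or sigma = 10, each with probability 1/2
    sigma = 1.0 if rng.random() < 0.5 else 10.0
    return sigma * rng.standard_normal(d)

vals = [np.linalg.norm(kent_draw() - kent_draw()) / np.sqrt(d) for _ in range(50)]
# values cluster near sqrt(2), sqrt(101) and sqrt(200): no single constant C
print(min(vals), max(vals))
```

Each scaled distance is nearly deterministic given the pair of mixture labels, but the label pair stays random however large d gets.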

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

- 4th Moment Assumption

- Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

  X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

- Data vectors are indep'dent of each other

- But entries of each have strong depend'ce

- However, can show entries have cov = 0

- Recall statistical folklore: Covariance = 0 does NOT imply Independence

0 Covariance is not independence

Simple Example:

- Random Variables X and Y

- Make both Gaussian: X, Y ~ N(0, 1)

  (Note: Not Using Multivariate Gaussian)

- With strong dependence, yet 0 covariance

Given c > 0, define:

  Y = X    if |X| <= c
  Y = -X   if |X| > c

Choose c to make cov(X, Y) = 0:

- Distribution is degenerate

- Supported on diagonal lines

- Not abs. cont. w.r.t. 2-d Lebesgue meas.

- For small c, have cov(X, Y) < 0

- For large c, have cov(X, Y) > 0

- By continuity, there exists c with cov(X, Y) = 0

Result:

- Joint distribution of X and Y:

  - Has Gaussian marginals

  - Has cov(X, Y) = 0

  - Yet strong dependence of X and Y

  - Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
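The continuity argument can be carried out numerically. A sketch, assuming the construction Y = X for |X| <= c and Y = -X otherwise (which is how I read the slide's definition); it bisects on c until the empirical covariance vanishes:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal(1_000_000)

def cov_xy(c):
    Y = np.where(np.abs(X) <= c, X, -X)   # Y is still N(0, 1) by symmetry
    return np.mean(X * Y)                  # covariance, since both means are 0

# cov < 0 for small c, cov > 0 for large c; bisect for the root
lo, hi = 0.1, 3.0
for _ in range(30):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_zero = (lo + hi) / 2
print(round(c_zero, 2))   # about 1.5
```

At that c both marginals are N(0, 1) and the covariance is 0, yet Y is a deterministic function of X: maximal dependence with zero correlation.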

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea: feeling sampling variation) (something like mean vs. median) - Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified - Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version) - Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues:  lambda_1(d) = d^alpha,  lambda_2(d) = ... = lambda_d(d) = 1

Note Critical Parameter: alpha

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions lambda_1-hat, ..., lambda_d-hat, u_1-hat as Estimates?

Consistency (big enough spike): For alpha > 1,

  Angle(u_1-hat, u_1) -> 0

Strong Inconsistency (spike not big enough): For alpha < 1,

  Angle(u_1-hat, u_1) -> 90 degrees
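Both regimes are visible in a small simulation, assuming the spike model above with u_1 the first coordinate axis (function name and n = 20 are my own choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def angle_to_u1(d, alpha, n=20):
    # spike model: lambda_1 = d**alpha, all other eigenvalues 1, u_1 = e_1
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt: empirical PC directions
    return np.degrees(np.arccos(min(abs(Vt[0, 0]), 1.0)))

ang_consistent = angle_to_u1(20000, 1.5)    # alpha > 1: small angle
ang_inconsistent = angle_to_u1(20000, 0.5)  # alpha < 1: near 90 degrees
print(ang_consistent, ang_inconsistent)
```

With alpha = 1.5 the empirical direction locks onto u_1; with alpha = 0.5 the spike is swallowed by the noise sphere and the estimate is essentially orthogonal to the truth.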

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For alpha > 1 (Recall lambda_1 = d^alpha, on Scale of Variance):

  Spike Pops Out of Pure Noise Sphere

For alpha < 1:

  Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

  lambda_1-hat / lambda_1  ->_L  chi-squared_n / n

- Eigenvalues Inconsistent

- But Known Distribution

- Consistent when n -> infinity as Well
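The chi-squared_n / n limit can be checked by Monte Carlo, again under the spike model with alpha > 1 (setup and names are mine); the ratio should have mean about 1 and variance about 2/n:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 20000, 10
lam1 = float(d) ** 1.5                   # spike with alpha = 1.5 > 1

ratios = []
for _ in range(100):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    S_dual = X @ X.T / n                 # n x n dual matrix, same nonzero eigenvalues as X'X / n
    ratios.append(np.linalg.eigvalsh(S_dual)[-1] / lam1)
ratios = np.array(ratios)
# lambda_1-hat / lambda_1 behaves like chi-squared_n / n: mean ~ 1, variance ~ 2/n
print(ratios.mean(), ratios.var())
```

The nondegenerate variance for fixed n is the inconsistency; letting n grow shrinks 2/n to zero, giving consistency.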

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n & PCA Consist.?

John Kent example:

  X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say:

  ||X|| = d^{1/2} (1 + o_P(1)) w.p. 1/2,  = 10 d^{1/2} (1 + o_P(1)) w.p. 1/2

not deterministic.

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo. Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n -> infinity:

- Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

- Central Limit Theorem

Both have Technical Assumptions (Usually Ignore?), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

- A Whole Area in Probability Theory

- A Large Literature

- A Comprehensive Reference: Bradley (2005, update of 1986 version)

- Better Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, ..., Define

  rho(t) = sup { |corr(f, g)| : f in L^2(sigma(X_1, ..., X_j)), g in L^2(sigma(X_{j+t}, X_{j+t+1}, ...)), j >= 1 }

Where sigma(...) are the Sigma-Fields Generated by the indicated variables (Note: Gap of Lag t)

Assume:  rho(t) -> 0  as  t -> infinity

Idea: Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo. Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, ..., X_d)^T are rho-mixing

Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:

Series of Technical Improvements:

- Ahn, Marron, Muller & Chi (2007)

- Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:

Condition from Jung & Marron (2009):

  X_d ~ (0_d, Sigma_d),  where  Sigma_d = U_d Lambda_d U_d^T

(Note: Not Gaussian)

Define Standardized Version:

  Z_d = Lambda_d^{-1/2} U_d^T X_d

Assume there exists a permutation of the entries of Z_d that is rho-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - alpha > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size?!? So true for n = 1 (???)

Reviewer's Conclusion: Absurd, shows assumption alpha > 1 too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

- Strong Inconsistency: alpha < 1 spike

- Consistency: alpha > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (alpha > 1):

  Angle(u_1-hat, u_1) -> 0

For Strong Inconsistency (alpha < 1):

  Angle(u_1-hat, u_1) -> 90 degrees

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA, Because PC Scores (i.e. projections) Not Consistent

For Scores s-hat_{i,j} = P_{u_j-hat} x_i (What we study in PCA scatterplots) and s_{i,j} = P_{u_j} x_i

Can Show:

  s-hat_{i,j} / s_{i,j} -> R_j  (Random!)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

  s-hat_{i,j} / s_{i,j} -> R_j,  Same Realization for i = 1, ..., n

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

- Strong Inconsistency: alpha < 1 spike

- Consistency: alpha > 1 spike

What happens at the boundary (alpha = 1)?

Result: there exist interesting Limit Dist'ns, Jung, Sen & Marron (2012)

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

- In Random Matrix Limit

- Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Implications for DWD: Recall Main Advantage is for High d, So not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

- Key is sizes of biological subtypes

- Differing ratio trips up mean, But DWD more robust

Mathematics behind this?


An example (details to follow)

P-val = 00045

SigClust Real Data Results

Summary of Perou 500 SigClust ResultsLum amp Norm vs Her2 amp Basal p-val = 10-19

Luminal A vs B p-val = 00045Her 2 vs Basal p-val = 10-10

Split Luminal A p-val = 10-7

Split Luminal B p-val = 0058Split Her 2 p-val = 010Split Basal p-val = 0005

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Ie Uses limiting operations Almost always Occasional misconceptions

Indicates behavior for large samples Thus only makes sense for ldquolargerdquo samples Models phenomenon of ldquoincreasing datardquo So other flavors are useless

nlim

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Real Reasons

Approximation provides insightsCan find simple underlying structureIn complex situations

Thus various flavors are fine

Even desirable (find additional insights)

0limlimlimlim dndn

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

Where are Data

Near Peak of Density

Thanks to psycnetapaorg

d

dd

d

IN

Z

Z

Z 0~1

HDLSS Asymptotics Simple Paradoxes

As

-Data lie roughly on surface of sphere

with radius

- Yet origin is point of highest density

- Paradox resolved by

density w r t Lebesgue Measure

d

)1(pOdZ

d

HDLSS Asymptotics Simple Paradoxes

- Paradox resolved by

density w r t Lebesgue Measure

Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As

Important Philosophical Consequence

ldquoAverage Peoplerdquo

Parents Lament

Why Canrsquot I Have Average Children

Theorem Impossible (over many factors)

d )1(pOdZ

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant

bullFactor since

Can extend to

)1(221 pOdZZ

nZZ

1

222

121 XsdXsdXXsd 2

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

indep of

High dimrsquoal Angles (as )

- Everything is orthogonal

d

d

dd INZ 0~2

)(90 2121

dOZZAngle p

1Z

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Subspace Generated by Data

Hyperplane through 0

of dimension

Points are ldquonearly equidistant to 0rdquo

amp dist

Within plane can

ldquorotate towards Unit Simplexrdquo

All Gaussian data sets are

ldquonear Unit Simplex Verticesrdquo

ldquoRandomnessrdquo appears

only in rotation of simplex

n

d ddn INZZ 0~1

d

d

Hall Marron amp Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Hyperplane Generated by Data

dimensional hyperplane

Points are pairwise equidistant dist

Points lie at vertices of

ldquoregular hedronrdquo

Again ldquorandomness in datardquo is only in rotation

Surprisingly rigid structure in random data

1n

d ddn INZZ 0~1

d2d2~

n

>

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows ldquoRigidity after Rotationrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

bull Common Sample Sizes

bull But wide range of dimensions25 nn

16004001004010d

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represent'n

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
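The rigid-simplex picture is easy to check numerically; a small sketch (the dimension and number of points are arbitrary choices): pairwise distances between iid standard Gaussian vectors concentrate near √(2d), and pairwise angles concentrate near 90°.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20000, 5
Z = rng.standard_normal((n, d))            # n iid N(0, I_d) vectors

# Pairwise distances, relative to the predicted sqrt(2d)
dists = [np.linalg.norm(Z[i] - Z[j])
         for i in range(n) for j in range(i + 1, n)]
dist_ratios = np.array(dists) / np.sqrt(2 * d)

# Pairwise angles in degrees (predicted: near 90)
angles = [np.degrees(np.arccos(
              Z[i] @ Z[j] / (np.linalg.norm(Z[i]) * np.linalg.norm(Z[j]))))
          for i in range(n) for j in range(i + 1, n)]
```

Every pair is (nearly) equidistant and (nearly) orthogonal, so the point cloud is close to a regular simplex, with randomness only in its rotation.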

HDLSS Asy's Geometrical Represent'n

Straightforward Generalizations:

• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Covariance (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d ≥ 0, assume

Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1), as d → ∞

(1/d is the min possible value of this ratio)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the dist'n, i.e. "are all covariance eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε = 1/d
• So the assumption is very mild
• Much weaker than mixing conditions
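Both extreme cases of the epsilon statistic can be checked directly; a small sketch (the dimension and the spike size are arbitrary illustrative choices):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity epsilon statistic of an eigenvalue vector:
    (sum lam)^2 / (d * sum lam^2)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))   # all eigenvalues equal -> 1
spike = np.zeros(d)
spike[0] = 7.0                             # single extreme eigenvalue
eps_spike = epsilon_stat(spike)            # -> 1/d
```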

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, and no eigenvalues too large. Then, for i ≠ j:

⟨X_i, X_j⟩ = o_p(d)

Not so strong as before: ‖Z_1 − Z_2‖ = √(2d) + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on ⟨X_i, X_j⟩ = o_p(d)?

John Kent example: Normal scale mixture

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get: ⟨X_i, X_j⟩ = C · d^{1/2} · O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data Vectors are independent of each other
• But entries of each have strong dependence
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 ⟹ Independence?
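A quick numerical check on Kent's mixture (the sample size, and looking at just two coordinates, are illustrative choices): entry covariances are near 0, yet the squared entries are clearly correlated, since a whole vector is drawn at the small or the large scale together.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200000, 2                    # 2 coordinates suffice to see the effect
# each vector is sd 1 or sd 10 (= sqrt(100)) as a whole, w.p. 1/2 each
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]   # 0.5 N(0,I) + 0.5 N(0,100I)

cov_entries = np.cov(X[:, 0], X[:, 1])[0, 1]                   # ~ 0
corr_squares = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]   # clearly > 0
```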

0 Covariance is not independence

Simple Example:

• Random Variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: Not Using Multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define:

Y = X when |X| ≤ c,  Y = −X when |X| > c

Choose c to make cov(X, Y) = 0:

• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

Result: Joint distribution of X and Y:

– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
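The construction is easy to verify numerically; here c is found by a crude grid search (the sample size and search range are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200000)        # X ~ N(0, 1)

def make_y(x, c):
    # Y = X where |X| <= c, Y = -X where |X| > c (flips the tails)
    return np.where(np.abs(x) <= c, x, -x)

# cov(X, Y) = E[X^2; |X| <= c] - E[X^2; |X| > c] increases with c,
# so search a grid for the zero crossing
grid = np.linspace(0.5, 2.5, 201)
covs = np.array([np.mean(x * make_y(x, c)) for c in grid])
c0 = grid[np.argmin(np.abs(covs))]     # empirical cov ~ 0 near c0
y = make_y(x, c0)
cov_xy = np.mean(x * y)
```

The covariance is (near) zero, yet |Y| = |X| exactly, so the dependence is total: this is zero covariance without independence.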

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling sampling variation) (something like mean vs. median): Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version): Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007). For Eigenvalues:

λ_1 = d^α,  λ_2 = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?

Consistency (big enough spike): For α > 1,  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1,  Angle(û_1, u_1) → 90°
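The α > 1 vs. α < 1 dichotomy already shows up at modest dimension; a rough sketch (d, n, and the two α values are illustrative choices, not a proof of the limit):

```python
import numpy as np

def pc1_angle(d, alpha, n=20, seed=4):
    """Angle (degrees) between the sample and true first eigenvectors
    under the spike model: lambda_1 = d^alpha, other eigenvalues 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)        # spike variance along u_1 = e_1
    # top eigenvector of the (uncentered) sample covariance via SVD
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(Vt[0][0])                   # |<u1_hat, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

angle_big = pc1_angle(d=5000, alpha=1.5)    # consistent regime
angle_small = pc1_angle(d=5000, alpha=0.5)  # strongly inconsistent regime
```

For α = 1.5 the estimated direction is nearly on top of u_1; for α = 0.5 the spike is swamped by the d^{1/2}-scale noise and the angle is far from 0.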

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall: α is on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n

• Eigenvalues Inconsistent
• But Known Distribution
• Consistent when n → ∞ as Well
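The limiting law λ̂_1 / λ_1 →_L χ²_n / n can be eyeballed by simulation (a rough sketch; d, n, α, and the number of replicates are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha, reps = 2000, 10, 1.5, 300
lam1 = d ** alpha                        # spike eigenvalue

ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)             # spike along e_1
    # eigenvalues of the sample cov via the n x n Gram matrix (HDLSS trick)
    lam_hat = np.linalg.eigvalsh(X @ X.T / n).max()
    ratios.append(lam_hat / lam1)
ratios = np.array(ratios)

# chi^2_n / n has mean 1 and variance 2/n
mean_ratio, var_ratio = ratios.mean(), ratios.var()
```

The ratio is not concentrating at 1 (inconsistent for fixed n), but its mean and variance match the χ²_n / n law; letting n grow shrinks the variance, giving consistency.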

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist:

John Kent example: X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X‖ = O_p(d^{1/2}), not deterministic:

‖X‖ ≈ d^{1/2} (w.p. 1/2),  ‖X‖ ≈ 10 d^{1/2} (w.p. 1/2)

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond'n

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

• A Whole Area in Probability Theory: a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(σ(X_1, …, X_j)), g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)) }

where σ(·) is the Sigma-Field Generated by the indicated variables (Note: Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
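For intuition, an AR(1) sequence is the textbook ρ-mixing example: correlations across a gap of lag k decay geometrically. A small sketch (φ and the series length are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
phi, n = 0.7, 100000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):                  # AR(1): x_t = phi * x_{t-1} + eps_t
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    """Empirical correlation across a gap of lag k."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = {k: lag_corr(x, k) for k in (1, 5, 20)}   # theory: phi**k
```

The lag-k correlation tracks φ^k, so dependence is strong nearby but vanishes at far lags, which is exactly the "uncorrelated at far lags" idea.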

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X(1), …, X(d))^t are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Series of Technical Improvements

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Tricky Point

Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. for Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t  (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{−1/2} U_d^t X_d

Assume Ǝ a permutation, so that the entries of the permuted Z_d are ρ-mixing
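The standardization in this condition can be sketched directly (the covariance Σ below is an arbitrary illustrative choice): after Z = Λ^{−1/2} U^t X, the entries have (approximately) identity covariance, and the mixing assumption is then placed on those entries.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 30, 20000

A = rng.standard_normal((d, 3))            # arbitrary illustrative covariance
Sigma = A @ A.T + np.eye(d)
lam, U = np.linalg.eigh(Sigma)             # Sigma = U diag(lam) U^t

W = rng.standard_normal((n, d))
X = W @ (U * np.sqrt(lam)).T               # rows ~ mean 0, covariance Sigma

Z = (X @ U) / np.sqrt(lam)                 # standardized: Lambda^{-1/2} U^t x
C = np.cov(Z, rowvar=False)                # should be ~ identity
```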

HDLSS Math Stat of PCA

Careful look at: PCA Consistency, α > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size, so true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) are Not Consistent:

For Scores ŝ_{i,j} = P_{û_j} x_i (What we study in PCA scatterplots) and s_{i,j} = P_{u_j} x_i

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j, Same Realization of R_j for all i

Axes have Inconsistent Scales, But Relationships are Still Useful
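The "proportional errors" point can be illustrated directly (the scores matrix and scale factors below are synthetic): rescaling each score axis by its own factor R_j, the same for every observation, changes the axis scales but not within-plot relationships such as correlations.

```python
import numpy as np

rng = np.random.default_rng(8)
n_obs, n_comp = 500, 3
s = rng.standard_normal((n_obs, n_comp))     # "true" PC scores (synthetic)
R = rng.uniform(0.2, 5.0, size=n_comp)       # one random factor per component
s_hat = s * R                                # same R_j for every observation i

# correlation structure of the scatterplots is unchanged
corr_true = np.corrcoef(s, rowvar=False)
corr_hat = np.corrcoef(s_hat, rowvar=False)
```

So even with inconsistent per-axis scales, the scatterplot relationships that PCA is used for survive.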

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes. Differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this?


SigClust Real Data Results

Summary of Perou 500 SigClust ResultsLum amp Norm vs Her2 amp Basal p-val = 10-19

Luminal A vs B p-val = 00045Her 2 vs Basal p-val = 10-10

Split Luminal A p-val = 10-7

Split Luminal B p-val = 0058Split Her 2 p-val = 010Split Basal p-val = 0005

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Ie Uses limiting operations Almost always Occasional misconceptions

Indicates behavior for large samples Thus only makes sense for ldquolargerdquo samples Models phenomenon of ldquoincreasing datardquo So other flavors are useless

nlim

HDLSS Asymptotics

Modern Mathematical Statistics Based on asymptotic analysis Real Reasons

Approximation provides insightsCan find simple underlying structureIn complex situations

Thus various flavors are fine

Even desirable (find additional insights)

0limlimlimlim dndn

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

Where are Data

Near Peak of Density

Thanks to psycnetapaorg

d

dd

d

IN

Z

Z

Z 0~1

HDLSS Asymptotics Simple Paradoxes

As

-Data lie roughly on surface of sphere

with radius

- Yet origin is point of highest density

- Paradox resolved by

density w r t Lebesgue Measure

d

)1(pOdZ

d

HDLSS Asymptotics Simple Paradoxes

- Paradox resolved by

density w r t Lebesgue Measure

Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As

Important Philosophical Consequence

ldquoAverage Peoplerdquo

Parents Lament

Why Canrsquot I Have Average Children

Theorem Impossible (over many factors)

d )1(pOdZ

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant

bullFactor since

Can extend to

)1(221 pOdZZ

nZZ

1

222

121 XsdXsdXXsd 2

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

indep of

High dimrsquoal Angles (as )

- Everything is orthogonal

d

d

dd INZ 0~2

)(90 2121

dOZZAngle p

1Z

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Subspace Generated by Data

Hyperplane through 0

of dimension

Points are ldquonearly equidistant to 0rdquo

amp dist

Within plane can

ldquorotate towards Unit Simplexrdquo

All Gaussian data sets are

ldquonear Unit Simplex Verticesrdquo

ldquoRandomnessrdquo appears

only in rotation of simplex

n

d ddn INZZ 0~1

d

d

Hall Marron amp Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Hyperplane Generated by Data

dimensional hyperplane

Points are pairwise equidistant dist

Points lie at vertices of

ldquoregular hedronrdquo

Again ldquorandomness in datardquo is only in rotation

Surprisingly rigid structure in random data

1n

d ddn INZZ 0~1

d2d2~

n

>

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows ldquoRigidity after Rotationrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

bull Common Sample Sizes

bull But wide range of dimensions25 nn

16004001004010d

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖ = O_p(d^{1/2})?

John Kent example: Normal scale mixture

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Won't get: ‖X_i − X_j‖ = C d^{1/2} + O_p(1)
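The failure of a single deterministic scaling constant in Kent's example can be seen directly by simulation: the scaled pairwise distances cluster at three different values depending on which mixture components the pair came from. A small sketch (dimension and seed are arbitrary choices):

```python
import numpy as np

# Kent's scale mixture: each vector is N(0, I_d) w.p. 1/2 and N(0, 100 I_d) w.p. 1/2.
# Scaled distances ||X_i - X_j|| / sqrt(d) then concentrate near sqrt(2), sqrt(101)
# or sqrt(200) depending on the pair -- no single constant C works.
rng = np.random.default_rng(0)
d = 20000
z1, z2, z3, z4 = rng.standard_normal((4, d))

x_small1, x_small2 = z1, z2          # both drawn from the N(0, I_d) component
x_big1, x_big2 = 10 * z3, 10 * z4    # both drawn from the N(0, 100 I_d) component

r_small = np.linalg.norm(x_small1 - x_small2) / np.sqrt(d)  # near sqrt(2)
r_big   = np.linalg.norm(x_big1 - x_big2) / np.sqrt(d)      # near sqrt(200)
r_mixed = np.linalg.norm(x_small1 - x_big2) / np.sqrt(d)    # near sqrt(101)
```

Each distance is deterministic at the sqrt(d) scale given the components, but the components themselves are random, so the limit of ‖X_i − X_j‖/√d is a random variable.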

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get the geometrical representation using

• 4th moment assumption

• Stronger covariance matrix (only) assumption

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

• Data vectors are independent of each other

• But the entries of each have strong dependence

• However, can show the entries have cov = 0

• Recall the statistical folklore: Covariance = 0 ⇒ Independence?

0 Covariance is not independence

Simple example:

• Random variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)

• With strong dependence, yet 0 covariance

Given c > 0, define

Y = X when |X| ≤ c,  Y = −X when |X| > c

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue measure

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, ∃ c with cov(X, Y) = 0
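The sign change in cov(X, Y) as c grows is quick to verify by Monte Carlo; a minimal sketch (the function name, sample size, and the two test values of c are our choices):

```python
import numpy as np

def cov_xy(c, n=400_000, seed=1):
    """Monte Carlo estimate of cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = np.where(np.abs(x) <= c, x, -x)   # Y is also N(0, 1) by symmetry
    return np.mean(x * y)                 # estimates E[XY]; both means are 0

cov_small_c = cov_xy(0.5)   # Y = -X most of the time, so negative
cov_large_c = cov_xy(2.0)   # Y = X most of the time, so positive
```

Since the estimated covariance moves continuously from negative to positive in c, some intermediate c gives exactly 0, as claimed above.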

0 Covariance is not independence

Result: The joint distribution of X and Y

- Has Gaussian marginals

- Has cov(X, Y) = 0

- Yet has strong dependence of X and Y

- Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian marginals

HDLSS Asymptotics: Geometrical Representation

Further consequences of the geometric representation:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects the intuitive feeling of sampling variation) (something like mean vs. median). Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified. Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version). Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & strong inconsistency

(Study properties of PCA in estimating eigen-directions & -values)

[Assume data are mean centered]

HDLSS Math Stat of PCA

Consistency & strong inconsistency

Spike covariance model, Paul (2007). For eigenvalues:

λ_1 = d^α,  λ_2 = … = λ_d = 1

Note the critical parameter: α

1st eigenvector: u_1  (turns out: direction doesn't matter)

How good are the empirical versions λ̂_1, …, λ̂_d, û_1 as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For α > 1,

Angle(û_1, u_1) → 0

Strong inconsistency (spike not big enough): For α < 1,

Angle(û_1, u_1) → 90°
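The α > 1 vs. α < 1 dichotomy is already visible at moderate dimension; a rough simulation sketch of the spike model (the sizes, seed, and the convention u_1 = e_1 are our choices):

```python
import numpy as np

def pc1_angle_deg(d, n, alpha, seed=0):
    """Angle between the sample and true first PC direction in the spike model
    lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1, with true u_1 = e_1."""
    rng = np.random.default_rng(seed)
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)                 # per-coordinate standard deviations
    X = rng.standard_normal((n, d)) * sd       # n samples from N(0, diag(sd**2))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)              # |<u1_hat, e_1>|
    return np.degrees(np.arccos(cos))

angle_big_spike = pc1_angle_deg(d=20000, n=20, alpha=1.5)    # consistent: small angle
angle_small_spike = pc1_angle_deg(d=20000, n=20, alpha=0.2)  # strongly inconsistent: large angle
```

With d = 20000 and n = 20, the big-spike angle is already close to 0° while the small-spike angle approaches 90° (the full limits are asymptotic in d).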

HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}

For α > 1 (recall λ_1 = d^α is on the scale of variance):

the spike pops out of the pure-noise sphere

For α < 1:

the spike is contained in the pure-noise sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

λ̂_1 / λ_1 →_L χ²_n / n   (as d → ∞)

• Eigenvalues inconsistent

• But known distribution

• Consistent when n → ∞ as well
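The scaled chi-squared limit for the top eigenvalue (the χ²_n / n law, per Jung & Marron 2009, for a big enough spike) can be checked by simulation; a sketch (all sizes and the seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha = 5000, 20, 1.5
lam1 = d ** alpha                                # spike eigenvalue, alpha > 1

ratios = []
for _ in range(300):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                     # put the spike in the first coordinate
    gram_eigs = np.linalg.eigvalsh(X @ X.T / n)  # nonzero eigenvalues of Sigma_hat
    ratios.append(gram_eigs[-1] / lam1)          # lambda_1_hat / lambda_1

ratios = np.array(ratios)
# chi^2_n / n has mean 1 and variance 2/n, so lambda_1_hat is biased-free on average
# but does NOT concentrate: its spread stays of order sqrt(2/n) however large d is.
```

This matches the slide: the eigenvalue estimate is inconsistent for fixed n, but its limiting distribution is known, and it becomes consistent if n grows as well.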

HDLSS Math Stat of PCA

Conditions for the geometric representation & PCA consistency:

John Kent example: X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say ‖X‖ = O_p(d^{1/2}), not deterministic:

‖X‖ / d^{1/2} → 1 w.p. 1/2,  10 w.p. 1/2

PCA conditions are the same, since the noise is still O_p(d^{1/2})

But for the geometric representation, need some mixing condition

Conclude: need some mixing condition

Mixing Conditions

Idea from probability theory:

Recall the standard asymptotic results, as n → ∞:

Law of Large Numbers ("weak" = in prob., "strong" = a.s.)

Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed

Mixing conditions: explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem

• A whole area in probability theory

• A large literature

• A comprehensive reference: Bradley (2005, update of 1986 version)

• Better newer references

Mixing condition used here: rho-mixing

For random variables X_1, X_2, …, define

ρ(k) = sup |corr(f, g)|,

where f ∈ L₂(σ(X_1, …, X_j)) and g ∈ L₂(σ(X_{j+k}, X_{j+k+1}, …)),

for sigma-fields generated by the entries (note the gap of lag k)

Assume ρ(k) → 0 as k → ∞

Idea: uncorrelated at far lags
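For intuition about "uncorrelated at far lags": in a Gaussian AR(1) process the lag-k correlation decays geometrically (and for Gaussian sequences maximal correlation reduces to ordinary correlation). A small illustration, not from the slides; the parameter values are arbitrary:

```python
import numpy as np

# Gaussian AR(1): x_t = phi * x_{t-1} + eps_t.  Its lag-k autocorrelation is phi**k,
# so dependence between "past" and "future at lag k" dies out as k grows.
rng = np.random.default_rng(6)
phi, n = 0.7, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def acf(series, k):
    """Sample autocorrelation at lag k."""
    s = series - series.mean()
    return float(np.dot(s[:-k], s[k:]) / np.dot(s, s))

lag1 = acf(x, 1)    # near phi = 0.7
lag10 = acf(x, 10)  # near phi**10, already close to 0
```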

HDLSS Math Stat of PCA

Conditions for the geometric representation:

Hall, Marron and Neeman (2005): Assume the entries X_1, …, X_d of the data vectors are ρ-mixing

Drawback: strong assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for the geometric representation: a series of technical improvements

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for the geometric representation:

Tricky point: classical mixing conditions require a notion of time ordering

Not always clear, e.g. microarrays

HDLSS Math Stat of PCA

Conditions for the geometric representation:

Condition from Jung & Marron (2009):

X ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t   (Note: not Gaussian)

Define the standardized version: Z_d = Λ_d^{−1/2} U_d^t X_d

Assume ∃ a permutation of the entries so that the entries of Z_d are ρ-mixing
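The standardization Z_d = Λ_d^{−1/2} U_d^t X_d is just a whitening transform: by construction Z_d has identity covariance. A small numerical sketch (the dimension, sample size, and covariance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 5, 50_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)            # a positive definite covariance Sigma_d
lam, U = np.linalg.eigh(Sigma)         # eigendecomposition: Sigma = U diag(lam) U^T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # rows are data vectors
Z = (X @ U) / np.sqrt(lam)             # each row is Z = Lam^{-1/2} U^T x

C = np.cov(Z, rowvar=False)            # sample covariance of Z: near the identity
```

Whitening removes all covariance structure; the remaining (possibly non-Gaussian) dependence among the entries of Z_d is exactly what the ρ-mixing assumption controls.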

HDLSS Math Stat of PCA

Careful look at PCA consistency (α > 1 spike):

(Reality check suggested by a reviewer)

The result is independent of sample size,

so it is true even for n = 1 (?!?)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise

Recall the RNAseq data from 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually brushed clusters: clear alternate splicing, not noise

HDLSS Math Stat of PCA

Recall the theoretical separation:

Strong inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA

Recall, for consistency (α > 1): Angle(û_1, u_1) → 0

For strong inconsistency (α < 1): Angle(û_1, u_1) → 90°

The objection: because the PC scores (i.e. projections) are not consistent

For the scores ŝ_{i,j} = P_{û_j} x_i (what we study in PCA scatterplots)

and s_{i,j} = P_{u_j} x_i,

can show: ŝ_{i,j} / s_{i,j} → R_j   (random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent

So how can PCA find useful signals in data?

(Recall: HDLSS PCA often finds signal, not pure noise)

Key is "proportional errors": ŝ_{i,j} / s_{i,j} → R_j,

with the same realization of R_j for all i = 1, …, n

Axes have inconsistent scales, but relationships are still useful
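The "proportional errors" point can be seen in simulation: the empirical PC1 scores track the true scores closely, up to a common (random) scale factor. A rough sketch in the spike model (sizes, seed, and the convention u_1 = e_1 are our choices):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, alpha = 20000, 20, 1.2
sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)            # spike model with true u_1 = e_1
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0]                        # empirical first PC direction

s_true = X[:, 0]                      # true scores      s_{i,1} = <x_i, e_1>
s_hat = X @ u1_hat                    # empirical scores ŝ_{i,1} = <x_i, u1_hat>

# One common multiplicative factor for all i: a regression slope estimates R_1,
# and the near-perfect correlation shows relationships among cases are preserved.
R1_hat = np.mean(s_hat * s_true) / np.mean(s_true ** 2)
corr = abs(np.corrcoef(s_true, s_hat)[0, 1])
```

The scores are off by the factor R1_hat (an inconsistent scale), but since the factor is shared across all observations, the scatterplot geometry, and hence cluster structure, survives.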

HDLSS Deep Open Problem

In PCA consistency:

Strong inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting limit distributions, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer: El Karoui (2010)

• In the random matrix limit

• Kernel embedded classifiers ~ linear classifiers
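El Karoui's point can be previewed numerically: because pairwise distances concentrate in high dimension, the off-diagonal entries of a Gaussian kernel matrix barely vary, and a first-order (linear) expansion around the concentration point reproduces the kernel almost exactly. A rough sketch (the choices of d, n, and bandwidth are ours):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 5000, 40
X = rng.standard_normal((n, d))

G = X @ X.T / d                                          # scaled inner products
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # ||x_i - x_j||^2 / d
K = np.exp(-sq / 2.0)                                    # Gaussian kernel, bandwidth^2 = d

mask = ~np.eye(n, dtype=bool)
spread = np.std(K[mask])                 # off-diagonal entries concentrate near exp(-1)

# First-order (linear) expansion around the concentration point sq = 2:
K_lin = np.exp(-1.0) * (1.0 - (sq - 2.0) / 2.0)
max_err = np.max(np.abs(K - K_lin)[mask])   # linearization error is negligible
```

Since the kernel matrix is, to first order, an affine function of the Gram matrix, classifiers built on it behave like linear classifiers in this regime, which is the content of the random matrix result above.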

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Implications for DWD: recall its main advantage is for high d

So it is not clear that embedding helps

Thus not yet implemented in DWD

HDLSS Additional Results

Batch adjustment: Xuxin Liu

Recall the intuition from above: the key is the sizes of the biological subtypes

A differing ratio trips up the mean, but DWD is more robust

Mathematics behind this:



2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n

1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive idea of sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified
Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)
Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,
In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ_1 = d^α,  λ_2 = … = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u_1

Turns out: Direction Doesn't Matter

How Good are Empirical Versions,
λ̂_1, …, λ̂_d, û_1,
as Estimates?

Consistency (big enough spike):

For α > 1,  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1,  Angle(û_1, u_1) → 90°

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall λ_1 = d^α on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

Consistency of eigenvalues?

λ̂_1 / λ_1  →_L  χ²_n / n,  as d → ∞

Eigenvalues Inconsistent

But Known Distribution

Consistent when n → ∞ as Well
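The consistency / strong inconsistency split is easy to see in simulation. A sketch (the dimension, sample size, and the two α values are my choices; the spike is placed along the first coordinate, which is without loss of generality since direction doesn't matter):

```python
import numpy as np

def pc1_angle(d, n, alpha, rng):
    """Angle (degrees) between true and empirical first eigenvector
    in the spike model: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    lam1 = d ** alpha
    u1 = np.zeros(d)
    u1[0] = 1.0                            # true eigendirection
    X = rng.standard_normal((n, d))        # unit-variance noise in every direction
    X[:, 0] *= np.sqrt(lam1)               # inflate the spike direction
    # empirical PC1 = top right singular vector of the (mean-zero) data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(float(Vt[0] @ u1))
    return float(np.degrees(np.arccos(min(cos, 1.0))))

rng = np.random.default_rng(1)
ang_big   = pc1_angle(d=2000, n=20, alpha=1.5, rng=rng)  # alpha > 1: consistency
ang_small = pc1_angle(d=2000, n=20, alpha=0.3, rng=rng)  # alpha < 1: strong inconsistency
print(ang_big, ang_small)
```

With α = 1.5 the empirical PC1 lands within a few degrees of u_1; with α = 0.3 it is nearly orthogonal, because the spike energy n·d^α is dwarfed by the d^{1/2}-scale noise sphere.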

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say:  ||X|| = O_p(d^{1/2}),  not deterministic:

||X|| / d^{1/2} → 1  w.p. 1/2,
||X|| / d^{1/2} → 10  w.p. 1/2

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition
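A quick simulation of the Kent-style scale mixture (a sketch; the component variances 1 and 100 are as reconstructed above) showing that the scaled length has no deterministic limit — it lands near 1 or near 10, each with probability 1/2:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5000, 400
# each data vector is N_d(0, I_d) w.p. 1/2 and N_d(0, 100 I_d) w.p. 1/2
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]

r = np.linalg.norm(X, axis=1) / np.sqrt(d)     # scaled lengths ||X|| / d^(1/2)
near1 = float(np.mean(np.abs(r - 1.0) < 0.1))
near10 = float(np.mean(np.abs(r - 10.0) < 1.0))
print(near1, near10)   # each ~0.5: the limit of ||X||/sqrt(d) is random, 1 or 10
```

This is exactly why the deterministic "near unit simplex" picture fails without some extra condition.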

Mixing Conditions

Idea: From Probability Theory

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers
("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions
(Usually Ignored)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers

Central Limit Theorem

• A Whole Area in Probability Theory
• a Large Literature
• A Comprehensive Reference:
Bradley (2005, update of 1986 version)
• Better Newer References?

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X_1, X_2, …, Define

ρ(k) = sup { |corr(f, g)| : f ∈ L_2(F_1^j), g ∈ L_2(F_{j+k}^∞), j ≥ 1 }

Where F_a^b is the Sigma-Field Generated by X_a, …, X_b

• Note Gap of Lag k

Assume:  ρ(k) → 0,  as k → ∞

Idea: Uncorrelated at Far Lags
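As a concrete standard example (my illustration, not from the slides): a Gaussian AR(1) process is ρ-mixing, and its lag correlations die off geometrically, so far-apart entries are nearly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(5)
T, phi = 100_000, 0.7
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]   # AR(1): corr(X_0, X_k) = phi**k -> 0

def acf(series, k):
    """Empirical autocorrelation at lag k."""
    s = series - series.mean()
    return float(np.corrcoef(s[:-k], s[k:])[0, 1])

print(acf(x, 1), acf(x, 5), acf(x, 20))   # ~0.7, ~0.17, ~0
```

The "gap of lag" in the ρ-mixing definition formalizes exactly this: correlation of anything measurable on the past with anything measurable k steps ahead vanishes as k grows.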

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors,
X_1, X_2, …, X_d,
Are ρ-mixing

Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions
Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian!

Define Standardized Version:

Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation π_d,
So that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

(Reality Check Suggested by Reviewer)

Independent of Sample Size,

So true for n = 1 (??)

Reviewer's Conclusion: Absurd, shows
assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180

Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency:
For α > 1,  Angle(û_1, u_1) → 0

For Strong Inconsistency:
For α < 1,  Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections)
Not Consistent

For Scores  ŝ_{j,i} = P_{û_j} x_i  and  s_{j,i} = P_{u_j} x_i
(What we study in PCA scatterplots)

Can Show:  ŝ_{j,i} / s_{j,i} → R_j  (Random!)

Thanks to Dan Shen

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors":

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales,
But Relationships are Still Useful
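A sketch of "proportional errors" in simulation (parameters and the spike-along-first-coordinate setup are my choices): the empirical PC1 scores are close to a common multiple of the true scores, so the scatterplot structure survives even though the scale factor is random:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha = 2000, 20, 1.5
lam1 = d ** alpha

# spike model with true first eigenvector u_1 = e_1
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)

true_scores = X[:, 0]                    # projections onto the true u_1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])       # empirical PC1, sign fixed for comparison
emp_scores = X @ u1_hat                  # projections onto empirical PC1

corr = float(np.corrcoef(true_scores, emp_scores)[0, 1])
slope = float(emp_scores @ true_scores / (true_scores @ true_scores))
rel_resid = float(np.std(emp_scores - slope * true_scores) / np.std(emp_scores))
print(corr, slope, rel_resid)   # corr near 1: emp_scores ~ (common factor) * true_scores
```

The empirical scores are essentially `slope * true_scores` with a small residual: inconsistent in scale, but the between-sample relationships that PCA scatterplots display are preserved.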

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns
Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit:
• Kernel Embedded Classifiers ~
~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD
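The mechanism behind El Karoui's result can be seen directly (a sketch, with simulated data; the bandwidth and sizes are my choices): in high dimension the pairwise squared distances concentrate around their typical value, so a Gaussian kernel matrix is well approximated by an affine function of the Gram matrix of inner products, which is why kernel classifiers behave like linear ones:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, sigma2 = 5000, 40, 1.0
X = rng.standard_normal((n, d)) / np.sqrt(d)   # rows have squared norm ~ 1

G = X @ X.T                                    # Gram (inner product) matrix
nrm = np.diag(G).copy()
sq = nrm[:, None] + nrm[None, :] - 2.0 * G     # pairwise squared distances
K = np.exp(-sq / (2.0 * sigma2))               # Gaussian (RBF) kernel matrix

# Squared distances concentrate near 2, so expand K to first order there:
# exp(-(2 + delta)/(2 sigma2)) ~ exp(-1/sigma2) * (1 - delta/(2 sigma2)),
# an affine function of the inner products x_i . x_j
delta = sq - 2.0
K_lin = np.exp(-1.0 / sigma2) * (1.0 - delta / (2.0 * sigma2))

off = ~np.eye(n, dtype=bool)                   # diagonal is exactly 1, exclude it
err = float(np.abs(K - K_lin)[off].max())
print(err)   # small: off-diagonal kernel entries are nearly linear in the Gram matrix
```

Since the off-diagonal kernel entries are, up to O(1/d) terms, a fixed affine function of the inner products, any classifier built from K is essentially a classifier built from the linear Gram matrix.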

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored...), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, ..., Define

ρ(k) = sup |corr(f, g)|, Where the sup is over f ∈ L²(σ(X_1, ..., X_j)), g ∈ L²(σ(X_(j+k), X_(j+k+1), ...)), j ≥ 1

For Sigma-Fields σ(•) Generated by the Indicated Variables; Note Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags
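Rho-mixing involves a supremum over all square-integrable functions, which is hard to compute directly; as a weaker, purely illustrative proxy, the sketch below shows plain lag correlations decaying to zero for an AR(1) sequence (the coefficient 0.7 and series length are arbitrary choices, not from the slides):

```python
# Illustrative "uncorrelated at far lags" behavior for an AR(1) sequence,
# where corr(X_j, X_{j+k}) = phi^k.
import numpy as np

rng = np.random.default_rng(3)

phi, d = 0.7, 50000
eps = rng.standard_normal(d)
x = np.empty(d)
x[0] = eps[0] / np.sqrt(1.0 - phi ** 2)   # start in the stationary distribution
for j in range(1, d):
    x[j] = phi * x[j - 1] + eps[j]

def lag_corr(x, k):
    """Sample correlation between the series and itself shifted by lag k."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = {k: lag_corr(x, k) for k in (1, 5, 20)}
print(corrs)  # decays roughly like 0.7, 0.7**5, 0.7**20
```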

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, X_2, ..., X_d)^t Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(-1/2) U_d^t X_d

Assume ∃ a permutation of the d entries, So that the permuted Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency, α > 1 spike

(Reality Check Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency, α < 1 spike

Consistency, α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_(j,i) = P_(v̂_j) x_i and s_(j,i) = P_(v_j) x_i

(What we study in PCA scatterplots)

Can Show: ŝ_(j,i) / s_(j,i) → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent:

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": ŝ_(j,i) / s_(j,i) → R_j, with the Same Realization of R_j for i = 1, ..., n

Axes have Inconsistent Scales, But Relationships are Still Useful
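The "proportional errors" point shows up in a small simulation (illustrative choices throughout: a single spike with α = 0.8, n = 20, d = 2000, at which finite d the spike still dominates the noise). The empirical score vector is nearly proportional to the true one, with a common inflation factor bounded away from 1:

```python
# Illustrative sketch: empirical PC1 scores vs true PC1 scores.
import numpy as np

rng = np.random.default_rng(4)

d, n, alpha = 2000, 20, 0.8
sd = np.ones(d)
sd[0] = d ** (alpha / 2)
X = rng.standard_normal((n, d)) * sd

u1 = np.zeros(d)
u1[0] = 1.0                      # true first eigenvector
s_true = X @ u1                  # true PC1 scores
v1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
s_hat = X @ v1_hat               # empirical PC1 scores
if s_hat @ s_true < 0:           # eigenvectors are defined up to sign
    s_hat = -s_hat

# Near-perfect proportionality (cosine similarity close to 1),
# but with a common scale factor that is not 1.
cos_sim = (s_hat @ s_true) / (np.linalg.norm(s_hat) * np.linalg.norm(s_true))
scale = np.linalg.norm(s_hat) / np.linalg.norm(s_true)
print(cos_sim, scale)
```

So a PCA scatterplot built from the empirical scores is essentially the true scatterplot with rescaled axes, which is why the relationships it shows remain useful.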

HDLSS Deep Open Problem: Result

In PCA Consistency:

Strong Inconsistency, α < 1 spike

Consistency, α > 1 spike

What happens at boundary (α = 1)?

∃ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD
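A rough numerical illustration of this phenomenon (a sketch under simplifying assumptions not taken from the slides: standard Gaussian data normalized to the sphere of radius d^(1/2), Gaussian kernel with squared bandwidth on the scale of d). The off-diagonal kernel entries are then essentially an affine function of the inner-product (linear) kernel, so a classifier built on them behaves like a linear one:

```python
# Illustrative sketch: in high dimension, a Gaussian kernel matrix is
# nearly an affine function of the linear (inner-product) kernel.
import numpy as np

rng = np.random.default_rng(5)

d, n = 2000, 60
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # sphere of radius sqrt(d)

G = X @ X.T                       # linear kernel
D2 = 2.0 * d - 2.0 * G            # squared distances on the sphere
K = np.exp(-D2 / (2.0 * d))       # Gaussian kernel, bandwidth^2 on the scale of d

# Off-diagonal entries: K_ij = exp(-1 + G_ij / d) ~ e^{-1} (1 + G_ij / d)
mask = ~np.eye(n, dtype=bool)
corr = np.corrcoef(K[mask], G[mask])[0, 1]
max_affine_err = np.max(np.abs(K[mask] - np.exp(-1.0) * (1.0 + G[mask] / d)))
print(corr, max_affine_err)
```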

HDLSS Additional Results

Batch Adjustment (Xuxin Liu):

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

Where are Data

Near Peak of Density

Thanks to psycnetapaorg

d

dd

d

IN

Z

Z

Z 0~1

HDLSS Asymptotics Simple Paradoxes

As

-Data lie roughly on surface of sphere

with radius

- Yet origin is point of highest density

- Paradox resolved by

density w r t Lebesgue Measure

d

)1(pOdZ

d

HDLSS Asymptotics Simple Paradoxes

- Paradox resolved by

density w r t Lebesgue Measure

Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As

Important Philosophical Consequence

ldquoAverage Peoplerdquo

Parents Lament

Why Canrsquot I Have Average Children

Theorem Impossible (over many factors)

d )1(pOdZ

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant

bullFactor since

Can extend to

)1(221 pOdZZ

nZZ

1

222

121 XsdXsdXXsd 2

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

indep of

High dimrsquoal Angles (as )

- Everything is orthogonal

d

d

dd INZ 0~2

)(90 2121

dOZZAngle p

1Z

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Subspace Generated by Data

Hyperplane through 0

of dimension

Points are ldquonearly equidistant to 0rdquo

amp dist

Within plane can

ldquorotate towards Unit Simplexrdquo

All Gaussian data sets are

ldquonear Unit Simplex Verticesrdquo

ldquoRandomnessrdquo appears

only in rotation of simplex

n

d ddn INZZ 0~1

d

d

Hall Marron amp Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Hyperplane Generated by Data

dimensional hyperplane

Points are pairwise equidistant dist

Points lie at vertices of

ldquoregular hedronrdquo

Again ldquorandomness in datardquo is only in rotation

Surprisingly rigid structure in random data

1n

d ddn INZZ 0~1

d2d2~

n

>

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows ldquoRigidity after Rotationrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

bull Common Sample Sizes

bull But wide range of dimensions25 nn

16004001004010d

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency (α > 1):

Angle(û_1, u_1) → 0

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
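The α > 1 versus α < 1 dichotomy can be checked numerically in the single-spike model (eigenvalues d^α, 1, ..., 1). A hedged numpy sketch; the dimension, sample size and α values below are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def leading_angle(d, alpha, n=20):
    """Angle in degrees between the sample and true first eigenvector
    in the single-spike model with eigenvalues (d**alpha, 1, ..., 1)."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)          # spike along e1, so u1 = e1
    X = rng.standard_normal((n, d)) * sd
    # top eigenvector of the sample covariance, via an SVD of the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)
    return np.degrees(np.arccos(cos))

consistent   = leading_angle(d=2000, alpha=1.5)   # alpha > 1: small angle
inconsistent = leading_angle(d=2000, alpha=0.2)   # alpha < 1: near 90 degrees
print(consistent, inconsistent)
```

At finite d the transition is gradual, but the two regimes are already clearly separated at d = 2000.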

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores: ŝ_{ij} = P_{v̂_j} x_i

(What we study in PCA scatterplots)

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores ŝ_{ij} = P_{v̂_j} x_i and s_{ij} = P_{v_j} x_i

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores ŝ_{ij} = P_{v̂_j} x_i and s_{ij} = P_{v_j} x_i

Can Show: ŝ_{ij} / s_{ij} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j

Same Realization of R_j for all i

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA
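The "proportional errors" point can be seen in a small simulation: the empirical PC1 scores are nearly perfectly correlated with the true ones, even though their common scale factor R is far from 1. A hedged numpy sketch (dimension, sample size and spike strength are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

d, n, alpha = 5000, 20, 0.6
lam = d ** alpha                          # spike eigenvalue; true PC1 is e1

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam)                   # inject the spike along e1

s_true = X[:, 0]                          # scores on the true direction u1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0] * np.sign(Vt[0, 0])            # sample PC1 direction, sign-aligned
s_hat = X @ v1                            # empirical PC1 scores

corr = np.corrcoef(s_hat, s_true)[0, 1]           # near 1: shape preserved
slope = (s_hat @ s_true) / (s_true @ s_true)      # common inflation factor "R"
print(corr, slope)
```

The scatterplot of ŝ against s is nearly a straight line through the origin: the scale (the slope R) is wrong, but the relative positions of the points, which is what PCA scatterplots display, are preserved.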

In PCA Consistency

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

∃ interesting Limit Dist'ns

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods


Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
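A rough numpy illustration of the random-matrix effect behind El Karoui's result (a sketch of the intuition, not his derivation): in very high dimension, pairwise distances concentrate, so a Gaussian kernel matrix is nearly constant off the diagonal, and its small fluctuations track the inner products, i.e. the kernel behaves like a (shifted, scaled) linear kernel. The sizes and bandwidth below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

d, n = 5000, 40
X = rng.standard_normal((n, d))

G = X @ X.T                               # inner products
sq = np.diag(G)
D2 = sq[:, None] + sq[None, :] - 2 * G    # squared pairwise distances
off = ~np.eye(n, dtype=bool)

# Distances concentrate: relative spread of ||x_i - x_j||^2 is tiny
rel_spread = D2[off].std() / D2[off].mean()

# Gaussian kernel with dimension-scaled bandwidth: off-diagonal entries
# are nearly constant, and their variation correlates with inner products
K = np.exp(-D2 / (2 * d))
kernel_spread = K[off].std()
lin_corr = np.corrcoef(K[off], G[off])[0, 1]
print(rel_spread, kernel_spread, lin_corr)
```

So in this regime a kernel classifier has little nonlinearity left to exploit, matching the "kernel embedded classifiers ~ linear classifiers" conclusion.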

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this
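The "differing ratio trips up mean" intuition can be checked in one dimension. A toy sketch (the subtype means, proportions and batch sizes are made up; DWD itself is not implemented here, this only shows the failure mode of mean adjustment):

```python
import numpy as np

rng = np.random.default_rng(4)

mu = 3.0   # separation of biological subtypes: A centered at +mu, B at -mu

def batch(n_a, n_b):
    """One batch containing n_a subtype-A and n_b subtype-B samples."""
    x = np.concatenate([rng.normal(+mu, 1, n_a), rng.normal(-mu, 1, n_b)])
    is_a = np.arange(n_a + n_b) < n_a
    return x, is_a

x1, a1 = batch(80, 20)    # batch 1: 80% subtype A
x2, a2 = batch(20, 80)    # batch 2: only 20% subtype A

# Mean adjustment: subtract each batch's own mean
x1c = x1 - x1.mean()
x2c = x2 - x2.mean()

# The SAME subtype now sits in different places in the two batches,
# because each batch mean was dominated by a different subtype mix
gap_A = x1c[a1].mean() - x2c[a2].mean()   # approximately -2 * mu * 0.6 here
print(gap_A)
```

With equal subtype proportions the gap would be near 0; the uneven ratios are what trip up the mean adjustment, which is the setting where a DWD-based adjustment is claimed to be more robust.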



Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asymptotics Simple Paradoxes

Paradox resolved by viewing the

density w.r.t. Lebesgue Measure:

Lebesgue Measure Pushes Mass Out, Density Pulls Data In, Radius $\sqrt{d}$ Is The Balance Point

HDLSS Asymptotics Simple Paradoxes

As $d \to \infty$:   $\|Z\| = \sqrt{d} + O_p(1)$

Important Philosophical Consequence:

"Average People"

Parents' Lament:

Why Can't I Have Average Children?

Theorem: Impossible (over many factors)
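A minimal numerical sketch (not from the slides, just standard NumPy) of this concentration: draws of $Z \sim N(0, I_d)$ have norms within $O_p(1)$ of $\sqrt{d}$, so no draw is near the "average" point 0.

```python
import numpy as np

# Sketch: standard Gaussian vectors concentrate at radius sqrt(d),
# i.e. ||Z|| = sqrt(d) + O_p(1).
rng = np.random.default_rng(0)

for d in [10, 1000, 100000]:
    Z = rng.standard_normal((200, d))          # 200 draws of Z ~ N(0, I_d)
    norms = np.linalg.norm(Z, axis=1)
    # the O_p(1) fluctuation stays bounded while sqrt(d) grows
    print(d, round(float(np.mean(norms - np.sqrt(d))), 2),
             round(float(np.std(norms)), 2))
```

The fluctuation around $\sqrt{d}$ stays roughly the same size for every $d$, while $\sqrt{d}$ itself grows without bound.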

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant:

$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

• Factor $\sqrt{2}$, since $sd^2(X_1 - X_2) = sd^2(X_1) + sd^2(X_2) = 2$

• Can extend to $Z_1, \dots, Z_n$

HDLSS Asymptotics Simple Paradoxes

For $d$-dim'al Standard Normal dist'n:

$Z_2 \sim N(0, I_d)$, indep. of $Z_1$

High dim'al Angles (as $d \to \infty$):

$Angle(Z_1, Z_2) = 90^{\circ} + O_p(d^{-1/2})$

- Everything is orthogonal
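A small sketch (standard NumPy, not from the slides) of the orthogonality paradox: angles between independent Gaussian vectors concentrate at $90^{\circ}$.

```python
import numpy as np

# Sketch: angles between independent N(0, I_d) vectors concentrate at
# 90 degrees, with O_p(d^{-1/2}) fluctuations.
rng = np.random.default_rng(1)

def angle_deg(u, v):
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

for d in [10, 1000, 100000]:
    angles = [angle_deg(rng.standard_normal(d), rng.standard_normal(d))
              for _ in range(100)]
    print(d, round(float(np.mean(angles)), 1), round(float(np.std(angles)), 1))
```

The spread of the angles shrinks like $d^{-1/2}$, matching the rate above.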

HDLSS Asyrsquos Geometrical Representrsquon

Assume $n$ fixed, let $d \to \infty$:   $Z_1, \dots, Z_n \sim N(0, I_d)$

Study Subspace Generated by Data:

Hyperplane through 0,

of dimension $n$

Points are "nearly equidistant to 0",

& dist $\approx \sqrt{d}$

Within plane, can

"rotate towards $\sqrt{d} \times$ Unit Simplex"

All Gaussian data sets are

"near Unit Simplex Vertices"

"Randomness" appears

only in rotation of simplex

Hall, Marron & Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume $n$ fixed, let $d \to \infty$:   $Z_1, \dots, Z_n \sim N(0, I_d)$

Study Hyperplane Generated by Data:

$n - 1$ dimensional hyperplane

Points are pairwise equidistant,   dist $\approx \sqrt{2d}$

Points lie at vertices of

$\sqrt{2d} \times$ "regular $n$-hedron"

Again, "randomness in data" is only in rotation

Surprisingly rigid structure in random data
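A quick sketch of that rigidity (standard NumPy, not from the slides): for $n$ fixed and $d$ large, all pairwise distances concentrate at $\sqrt{2d}$, so the data are nearly a rigid regular simplex.

```python
import numpy as np

# Sketch: for fixed n and large d, Gaussian data form a nearly rigid
# regular simplex -- all pairwise distances concentrate at sqrt(2d).
rng = np.random.default_rng(2)
n, d = 3, 20000
Z = rng.standard_normal((n, d))

dists = [np.linalg.norm(Z[i] - Z[j]) for i in range(n) for j in range(i + 1, n)]
print([round(float(x / np.sqrt(2 * d)), 3) for x in dists])  # all near 1
```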

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows "Rigidity after Rotation"

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD, SVM & Others, from 10/21/14

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding

• Common Sample Sizes:   $n_+ = n_- = 25$

• But wide range of dimensions:   $d = 10, 40, 100, 400, 1600$

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

…

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represent'n

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represent'n

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For $\epsilon = \dfrac{\left(\sum_{j=1}^{d} \lambda_j / d\right)^2}{\sum_{j=1}^{d} \lambda_j^2 / d}$, assume $\epsilon^{-1} = o(d)$, i.e. $\epsilon \gg \tfrac{1}{d}$

($\tfrac{1}{d}$ = min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background:

In classical multivariate analysis, the statistic

$\epsilon = \dfrac{\left(\sum_{j=1}^{d} \lambda_j / d\right)^2}{\sum_{j=1}^{d} \lambda_j^2 / d}$

is called the "epsilon statistic",

and is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

satisfies:   $\tfrac{1}{d} \le \epsilon \le 1$

• For spherical Normal:   $\epsilon = 1$

• Single extreme eigenvalue gives:   $\epsilon = \tfrac{1}{d}$

• So assumption $\epsilon \gg \tfrac{1}{d}$ is very mild

• Much weaker than mixing conditions
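A hedged sketch of the epsilon statistic in code (the two extreme cases above), using the sphericity formula as reconstructed here:

```python
import numpy as np

# Sketch: sphericity "epsilon statistic"
# eps = (sum(lam)/d)^2 / (sum(lam^2)/d), which satisfies 1/d <= eps <= 1.
def epsilon_stat(lam):
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return (lam.sum() / d) ** 2 / (np.square(lam).sum() / d)

d = 100
print(epsilon_stat(np.ones(d)))                 # spherical: eps = 1
spike = np.zeros(d); spike[0] = 5.0
print(epsilon_stat(spike))                      # single eigenvalue: eps = 1/d
```

Any eigenvalue sequence between these extremes gives an $\epsilon$ strictly between $1/d$ and $1$, so requiring $\epsilon \gg 1/d$ rules out only the most extreme spikes.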

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large

Then, for $i \ne j$:   $X_i' X_j = O_p(\sqrt{d})$

Not so strong as before:   $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

2nd Paper on HDLSS Asymptotics

Can we improve on $X_i' X_j = O_p(\sqrt{d})$?

John Kent example, Normal scale mixture:

$X \sim \tfrac{1}{2} N(0, I_d) + \tfrac{1}{2} N(0, 100\, I_d)$

Won't get:   $X_i' X_j = C \sqrt{d} + O_p(1)$
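A simulation sketch of Kent's example (standard NumPy, not from the slides): each vector's length settles near one of two scales, $\sqrt{d}$ or $10\sqrt{d}$, depending on which mixture component generated it, so no single deterministic limit exists.

```python
import numpy as np

# Sketch: Kent's normal scale mixture X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d).
# Each draw lands near radius sqrt(d) or 10*sqrt(d) (one scale per vector),
# so ||X|| / sqrt(d) does not converge to a constant.
rng = np.random.default_rng(3)
d, n = 50000, 20

scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # mixture component per vector
X = scales[:, None] * rng.standard_normal((n, d))
ratios = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(ratios, 2))   # values cluster near 1 and near 10
```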

3rd Paper on HDLSS Asymptotics

Get Geometrical Represent'n using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

$X \sim \tfrac{1}{2} N(0, I_d) + \tfrac{1}{2} N(0, 100\, I_d)$

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 does NOT imply Independence

0 Covariance is not independence

Simple Example:

• Random Variables $X$ and $Y$

• Make both Gaussian:   $X, Y \sim N(0, 1)$

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given $c > 0$, define:

$Y = \begin{cases} X, & |X| \le c \\ -X, & |X| > c \end{cases}$

0 Covariance is not independence

Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small $c$, have $\mathrm{cov}(X, Y) < 0$

• For large $c$, have $\mathrm{cov}(X, Y) > 0$

• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$

0 Covariance is not independence

Result:

• Joint distribution of $X$ and $Y$:
  – Has Gaussian marginals
  – Has $\mathrm{cov}(X, Y) = 0$
  – Yet strong dependence of $X$ and $Y$
  – Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals
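A Monte Carlo sketch of this construction (standard NumPy, not from the slides): the covariance is negative for small $c$ and positive for large $c$, so by continuity an intermediate $c$ gives covariance exactly 0, even though $Y$ is a deterministic function of $X$.

```python
import numpy as np

# Sketch: X ~ N(0,1); Y = X when |X| <= c, Y = -X when |X| > c.
# Y is again N(0,1) by symmetry, and cov(X, Y) crosses 0 as c grows.
rng = np.random.default_rng(4)
X = rng.standard_normal(1_000_000)

def cov_xy(c):
    Y = np.where(np.abs(X) <= c, X, -X)
    return float(np.mean(X * Y))     # E[XY]; both means are 0

print(round(cov_xy(0.5), 3))         # negative: mostly Y = -X
print(round(cov_xy(3.0), 3))         # positive: mostly Y = X
```

A root-finder on `cov_xy` would locate the zero-covariance $c$, yet $X$ and $Y$ remain perfectly dependent.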

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n:

1. DWD more stable than SVM
(based on deeper limiting distributions)
(reflects intuitive idea: feeling sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified
Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size
(motivates weighted version)
Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues:   $\lambda_1 = d^{\alpha}$,   $\lambda_2 = \dots = \lambda_d = 1$

Note: Critical Parameter $\alpha$

1st Eigenvector:   $u_1$

Turns out: Direction Doesn't Matter

How Good are Empirical Versions,

$\hat\lambda_1, \dots, \hat\lambda_d, \hat u_1$, as Estimates?

Consistency (big enough spike):

For $\alpha > 1$:   $Angle(\hat u_1, u_1) \to 0$

Strong Inconsistency (spike not big enough):

For $\alpha < 1$:   $Angle(\hat u_1, u_1) \to 90^{\circ}$
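A finite-$d$ sketch of this dichotomy (standard NumPy, not from the slides; the spike is placed along the first coordinate, which is harmless since direction doesn't matter):

```python
import numpy as np

# Sketch of the spike model lambda_1 = d^alpha, rest = 1: the empirical
# first eigenvector finds the spike for alpha > 1, and is nearly
# orthogonal to it (strong inconsistency) for alpha well below 1.
rng = np.random.default_rng(5)
n, d = 20, 2000

def angle_to_spike(alpha):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)                    # spike along e1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # PCA via SVD (mean ~ 0)
    cos = min(abs(float(Vt[0, 0])), 1.0)              # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

print(round(angle_to_spike(1.5), 1))   # small angle: consistency
print(round(angle_to_spike(0.2), 1))   # near 90 degrees: strong inconsistency
```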

HDLSS Math Stat of PCA

Intuition: Random Noise ~ $d^{1/2}$

For $\alpha > 1$ (Recall $d^{\alpha}$ on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\dfrac{\hat\lambda_1}{\lambda_1} \xrightarrow{\;L\;} \dfrac{\chi^2_n}{n}$, as $d \to \infty$

Eigenvalues Inconsistent

But Known Distribution

Consistent when $n \to \infty$ as Well
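A quick check of this limit law by simulation (standard NumPy, not from the slides): for an $\alpha > 1$ spike the ratio $\hat\lambda_1/\lambda_1$ stays random for fixed $n$, but its mean and variance match the $\chi^2_n/n$ limit (mean 1, variance $2/n$).

```python
import numpy as np

# Sketch: top sample eigenvalue in the spike model with alpha > 1.
# lambda1_hat / lambda1 ->_L chi^2_n / n: inconsistent for fixed n,
# but with a known limit distribution.
rng = np.random.default_rng(6)
n, d, alpha = 5, 4000, 1.5
lam1 = d ** alpha

ratios = []
for _ in range(300):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    s = np.linalg.svd(X, compute_uv=False)
    ratios.append(s[0] ** 2 / n / lam1)    # top eigenvalue of X'X/n over lam1

print(round(float(np.mean(ratios)), 2),    # limit mean: 1
      round(float(np.var(ratios)), 2))     # limit variance: 2/n
```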

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:   $X \sim \tfrac{1}{2} N(0, I_d) + \tfrac{1}{2} N(0, 100\, I_d)$

Can only say:   $\dfrac{\|X\|}{d^{1/2}} \to 1$ or $10$, w.p. $\tfrac{1}{2}$ each,

not deterministic

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore…)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better & Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables $X_1, X_2, \dots$, Define:

$\rho(k) = \sup\, |\mathrm{corr}(f, g)|$

Where the sup is over square-integrable $f$, $g$:
• $f$ measurable w.r.t. $\sigma(X_1, \dots, X_j)$
• $g$ measurable w.r.t. $\sigma(X_{j+k}, X_{j+k+1}, \dots)$
• Note: Gap of Lag $k$

Assume: $\rho(k) \to 0$, as $k \to \infty$

Idea: Uncorrelated at Far Lags
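A simple illustration of the "uncorrelated at far lags" idea (standard NumPy, not from the slides): an AR(1) sequence has $\mathrm{corr}(X_j, X_{j+k}) = \phi^k$, which decays to 0 with the lag. (The actual $\rho(k)$ takes a sup over $L^2$ functions of past and future, so this is only the linear-correlation part of the story.)

```python
import numpy as np

# Sketch: AR(1) sequence X_t = phi * X_{t-1} + e_t has lag-k
# autocorrelation phi^k -- decaying dependence at far lags.
rng = np.random.default_rng(8)
phi, T = 0.7, 100_000

e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

for k in [1, 5, 20]:
    c = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(float(c), 2))   # roughly phi ** k
```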

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X = (X_1, \dots, X_d)$

Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since

Biometrika Refused…)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based,

No Mixing)

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

Note: Not Gaussian

Define Standardized Version:

$Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume ∃ a permutation,

So that the entries of $Z_d$ are $\rho$-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency, for the $\alpha > 1$ spike

(Reality Check, Suggested by Reviewer)

Result is Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12:   d ~ 1700,   n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$):   $Angle(\hat u_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$):   $Angle(\hat u_1, u_1) \to 90^{\circ}$

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores $\hat s_{ij} = P_{\hat v_j} x_i$

(What we study in PCA scatterplots)

and $s_{ij} = P_{v_j} x_i$,

Can Show:   $\dfrac{\hat s_{ij}}{s_{ij}} \to R_j$   (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

$\dfrac{\hat s_{ij}}{s_{ij}} \to R_j$

Same Realization $R_j$, for $i = 1, \dots, n$

Axes have Inconsistent Scales,

But Relationships are Still Useful
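A small sketch of the "proportional errors" idea (standard NumPy, not from the slides; a strong spike is used so the effect is easy to see): the empirical PC1 scores differ from the true scores by a factor that is nearly common across observations $i$, so relative positions in the scatterplot survive.

```python
import numpy as np

# Sketch: in the spike model, empirical PC1 scores relate to true scores
# by a (nearly) common factor across observations -- scales may be off,
# but relationships between points remain useful.
rng = np.random.default_rng(7)
n, d, alpha = 20, 2000, 1.5

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)              # spike direction e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] * np.sign(Vt[0, 0])          # fix sign for comparison
s_hat = X @ v1_hat                          # empirical PC1 scores
s_true = X[:, 0]                            # true scores: projection on e1

ratios = s_hat / s_true
# coefficient of variation of the ratios: near 0 means a common factor
print(round(float(np.std(ratios) / abs(np.mean(ratios))), 3))
```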

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

∃ interesting Limit Dist'ns

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit:

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps…

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes;

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this?


HDLSS Asymptotics: Simple Paradoxes

As $d \to \infty$: $\|Z\| = \sqrt{d} + O_p(1)$

Important Philosophical Consequence:

"Average People"

Parents' Lament:

Why Can't I Have Average Children?

Theorem: Impossible (over many factors)

HDLSS Asymptotics: Simple Paradoxes

Distance tends to non-random constant:

$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

• Factor $\sqrt{2}$, since $sd^2(X_1 - X_2) = sd^2(X_1) + sd^2(X_2) = 2$

Can extend to $Z_1, \ldots, Z_n$

HDLSS Asymptotics: Simple Paradoxes

For $d$-dim'l Standard Normal dist'n:

$Z_2 \sim N_d(0, I_d)$, indep. of $Z_1$

High dim'l Angles (as $d \to \infty$):

$\mathrm{Angle}(Z_1, Z_2) = 90° + O_p(d^{-1/2})$

- Everything is orthogonal
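These limits are easy to see numerically. Not part of the slides: a small NumPy check of the three paradoxes above (the dimension $d = 20000$ is my illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20000  # high dimension

Z1 = rng.standard_normal(d)
Z2 = rng.standard_normal(d)

# ||Z|| concentrates near sqrt(d)
print(np.linalg.norm(Z1) / np.sqrt(d))           # close to 1

# ||Z1 - Z2|| concentrates near sqrt(2d)
print(np.linalg.norm(Z1 - Z2) / np.sqrt(2 * d))  # close to 1

# angle between independent Gaussian vectors concentrates near 90 degrees
cos = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
print(np.degrees(np.arccos(cos)))                # close to 90
```

The $O_p(1)$ and $O_p(d^{-1/2})$ fluctuations shrink relative to the leading terms, which is exactly why the ratios above sit near 1 and the angle near 90°.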

HDLSS Asy's Geometrical Represent'n

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Subspace Generated by Data

Hyperplane through 0, of dimension $n$

Points are "nearly equidistant to 0", & dist $\approx \sqrt{d}$

Within plane, can "rotate towards Unit Simplex"

All Gaussian data sets are "near Unit Simplex Vertices"

"Randomness" appears only in rotation of simplex

Hall, Marron & Neeman (2005)

HDLSS Asy's Geometrical Represent'n

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Hyperplane Generated by Data

$n - 1$ dimensional hyperplane

Points are pairwise equidistant, dist $\approx \sqrt{2d}$

Points lie at vertices of "regular $n$-hedron"

Again, "randomness in data" is only in rotation

Surprisingly rigid structure in random data!

HDLSS Asy's Geometrical Represen'tion

Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane, to make "comparable"
• Repeat 10 times, use different colors

HDLSS Asy's Geometrical Represen'tion

Simulation View: Shows "Rigidity after Rotation"

HDLSS Asy's Geometrical Represen'tion

Now Recall HDLSS Simulation Results

Comparing DWD, SVM & Others, from 10/21/14

HDLSS Discrim'n Simulations

Main idea:

Comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, aka Centroid)

Linear versions, across dimensions

HDLSS Discrim'n Simulations

Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$

HDLSS Discrim'n Simulations

Spherical Gaussians

HDLSS Discrim'n Simulations

Outlier Mixture

HDLSS Discrim'n Simulations

Wobble Mixture

HDLSS Discrim'n Simulations

Nested Spheres

HDLSS Discrim'n Simulations

…

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in sense:

For $\lambda_1 \ge \cdots \ge \lambda_d \ge 0$, assume $\frac{\sum_{j=1}^{d} \lambda_j^2}{\left(\sum_{j=1}^{d} \lambda_j\right)^2} = o(1)$,

i.e. the epsilon statistic $\epsilon = \frac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2} \gg \frac{1}{d}$ (min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

$\epsilon = \frac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$

is called the "epsilon statistic",

and is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies: $\frac{1}{d} \le \epsilon \le 1$

• For spherical Normal, $\epsilon = 1$

• Single extreme eigenvalue gives $\epsilon \approx \frac{1}{d}$

• So assumption $\epsilon \gg \frac{1}{d}$ is very mild

• Much weaker than mixing conditions
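The epsilon statistic is easy to compute directly. A quick sketch (the function name `epsilon_stat` is mine, not from the slides) verifying the bounds and the two special cases above:

```python
import numpy as np

def epsilon_stat(lams):
    """Epsilon statistic (sum lam)^2 / (d * sum lam^2); satisfies 1/d <= eps <= 1."""
    lams = np.asarray(lams, dtype=float)
    d = lams.size
    return lams.sum() ** 2 / (d * (lams ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))        # spherical: eps = 1

spike = np.r_[1e6, np.ones(d - 1)]     # one dominant eigenvalue
print(epsilon_stat(spike))             # close to 1/d = 0.001
```

By Cauchy-Schwarz, $(\sum \lambda_j)^2 \le d \sum \lambda_j^2$, giving the upper bound 1; one dominant eigenvalue pushes the statistic down to its minimum $1/d$.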

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large: $\epsilon \gg \frac{1}{d}$

Then: $\|X_i - X_j\| = \sqrt{d}\, O_p(1)$

Not so strong as before: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

2nd Paper on HDLSS Asymptotics

Can we improve on $\|X_i - X_j\| = \sqrt{d}\, O_p(1)$?

John Kent example: Normal scale mixture

$X_i \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$ (i.i.d.)

Won't get $\|X_i - X_j\| = C \sqrt{d} + O_p(1)$

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

$X_i \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 ⟹ Independence?

0 Covariance is not independence

Simple Example:

• Random Variables $X$ and $Y$

• Make both Gaussian: $X, Y \sim N(0, 1)$

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given $c > 0$, define: $Y = \begin{cases} X & |X| \le c \\ -X & |X| > c \end{cases}$

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small $c$, have $\mathrm{cov}(X, Y) < 0$

• For large $c$, have $\mathrm{cov}(X, Y) > 0$

• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$
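The continuity argument above can be checked by Monte Carlo. A sketch (sample size, bracketing interval, and bisection count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(2_000_000)

def cov_xy(c):
    # Y = X where |X| <= c, Y = -X where |X| > c  (Y is still N(0,1))
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y)   # estimates E[XY]; both means are 0

print(cov_xy(0.1))   # small c: negative
print(cov_xy(3.0))   # large c: positive

# bisection for the c giving cov = 0; dependence remains, since |Y| == |X|
lo, hi = 0.1, 3.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
print(mid)  # roughly 1.54 for the standard normal
```

The root is where $E[X^2; |X| \le c] = E[X^2; |X| > c] = \tfrac{1}{2}$, yet $|Y| = |X|$ always, so $X$ and $Y$ stay strongly dependent.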

0 Covariance is not independence

Result:

• Joint distribution of $X$ and $Y$:

– Has Gaussian marginals

– Has $\mathrm{cov}(X, Y) = 0$

– Yet strong dependence of $X$ and $Y$

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more than Gaussian Marginals

HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of sampling variation)
(something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For Eigenvalues: $\lambda_1 = d^{\alpha}$, $\lambda_2 = \cdots = \lambda_d = 1$

Note: Critical Parameter $\alpha$

1st Eigenvector: $u_1$ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions $\hat{\lambda}_1, \ldots, \hat{\lambda}_d$, $\hat{u}_1$ as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For $\alpha > 1$: $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

Strong Inconsistency (spike not big enough):

For $\alpha < 1$: $\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$
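A small simulation sketch of the consistency / strong inconsistency split (my own illustration, with $n = 20$ and true $u_1 = e_1$; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle(d, alpha, n=20):
    """Angle (degrees) between true u1 = e1 and the sample PC1, under the
    spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)          # sqrt of the spike eigenvalue
    X = rng.standard_normal((n, d)) * sd
    # leading right singular vector of X = sample PC1 direction
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(Vt[0, 0])               # |<u1_hat, e1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

for d in (100, 1000, 10000):
    print(d, pc1_angle(d, alpha=1.5), pc1_angle(d, alpha=0.5))
```

As $d$ grows, the $\alpha = 1.5$ angles head toward 0 while the $\alpha = 0.5$ angles drift toward 90°, matching the theoretical split at $\alpha = 1$.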

HDLSS Math Stat of PCA

Intuition: Random Noise ~ $d^{1/2}$

For $\alpha > 1$ (Recall, on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\frac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{\,L\,} \frac{\chi^2_n}{n}$ as $d \to \infty$ ($n$ fixed)

Eigenvalues Inconsistent

But Known Distribution

Consistent when $n \to \infty$ as Well
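The limiting $\chi^2_n / n$ law can be eyeballed by simulation. A sketch under the spike model with $\alpha = 1.5 > 1$ (the constants $d$, $n$, and the replication count are my choices):

```python
import numpy as np

rng = np.random.default_rng(3)

d, n, alpha = 2000, 10, 1.5
lam1 = d ** alpha
ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)          # spike in the first coordinate
    s = np.linalg.svd(X, compute_uv=False)
    lam1_hat = s[0] ** 2 / n          # top eigenvalue of (1/n) X^T X
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)

# chi2_n / n has mean 1 and variance 2/n = 0.2
print(ratios.mean(), ratios.var())
```

The ratio does not concentrate at 1 (inconsistency), but its spread matches the known $\chi^2_n/n$ variance $2/n$, and shrinks as $n$ grows.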

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example: $X \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$

Can only say $\|X\| = \sqrt{d}\, O_p(1)$, with $\frac{\|X\|}{\sqrt{d}} \to 1$ w.p. $\tfrac{1}{2}$, $\to 10$ w.p. $\tfrac{1}{2}$: not deterministic

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$
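A quick numerical look at Kent's example (my sketch): the scaled norms cluster near the two values 1 and 10, so no single deterministic limit exists.

```python
import numpy as np

rng = np.random.default_rng(5)

d, n = 20000, 12
# Kent's scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]

ratios = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(ratios, 3))   # each entry close to 1 or 10
```

Each individual norm still concentrates (at a value determined by its mixture component), which is why the PCA conditions are unaffected, but the geometric representation's single constant fails.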

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example (as above):

But for Geo Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results (as $n \to \infty$):

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored),

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference:

Bradley (2005 update of 1986 version)

• Better, Newer References:

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

Mixing Conditions

Mixing Conditions

Mixing Condition Used Here: Rho – Mixing

For Random Variables $X_1, X_2, \ldots$ define

$\rho(k) = \sup_{j} \sup \left\{ |\mathrm{corr}(f, g)| : f \in L^2(\mathcal{F}_1^{j}),\ g \in L^2(\mathcal{F}_{j+k}^{\infty}) \right\}$

where $\mathcal{F}_a^b$ is the Sigma-Field Generated by:
• $X_a, \ldots, X_b$
• Note: Gap of Lag $k$

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags
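For intuition (my example, not from the slides): a stationary AR(1) sequence has correlations decaying geometrically in the lag, the simplest instance of the "uncorrelated at far lags" idea:

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1): X_j = phi * X_{j-1} + noise; corr(X_j, X_{j+k}) = phi**k -> 0
phi, d = 0.7, 200_000
eps = rng.standard_normal(d)
X = np.zeros(d)
for j in range(1, d):
    X[j] = phi * X[j - 1] + eps[j]

for k in (1, 5, 20):
    r = np.corrcoef(X[:-k], X[k:])[0, 1]
    print(k, r, phi ** k)   # empirical vs. theoretical lag-k correlation
```

For a Gaussian AR(1) this lag-$k$ correlation essentially controls $\rho(k)$, so the geometric decay makes the sequence $\rho$-mixing.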

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X_1, X_2, \ldots, X_d$

Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: Not Gaussian)

Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume Ǝ a permutation,

So that the entries of $Z_d$ are ρ-mixing

HDLSS Math Stat of PCA

Careful look at: PCA Consistency - $\alpha > 1$ spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency: for $\alpha > 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency: for $\alpha < 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ and $s_{ij} = P_{v_j} x_i$

(What we study in PCA scatterplots)

Can Show: $\hat{s}_{ij} / s_{ij} \to R_j$ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": $\hat{s}_{ij} / s_{ij} \to R_j$

Same Realization of $R_j$ for $i = 1, \ldots, n$

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Asymptotics Simple Paradoxes

Distance tends to non-random constant

bullFactor since

Can extend to

)1(221 pOdZZ

nZZ

1

222

121 XsdXsdXXsd 2

HDLSS Asymptotics Simple Paradoxes

For dimrsquoal Standard Normal distrsquon

indep of

High dimrsquoal Angles (as )

- Everything is orthogonal

d

d

dd INZ 0~2

)(90 2121

dOZZAngle p

1Z

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Subspace Generated by Data

Hyperplane through 0

of dimension

Points are ldquonearly equidistant to 0rdquo

amp dist

Within plane can

ldquorotate towards Unit Simplexrdquo

All Gaussian data sets are

ldquonear Unit Simplex Verticesrdquo

ldquoRandomnessrdquo appears

only in rotation of simplex

n

d ddn INZZ 0~1

d

d

Hall Marron amp Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume let

Study Hyperplane Generated by Data

dimensional hyperplane

Points are pairwise equidistant dist

Points lie at vertices of

ldquoregular hedronrdquo

Again ldquorandomness in datardquo is only in rotation

Surprisingly rigid structure in random data

1n

d ddn INZZ 0~1

d2d2~

n

>

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows ldquoRigidity after Rotationrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

bull Common Sample Sizes

bull But wide range of dimensions25 nn

16004001004010d

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For eigenvalues λ₁, …, λ_d, assume

  Σ_{j=1}^d λ_j² / ( Σ_{j=1}^d λ_j )²  =  o(1)   as   d → ∞

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For eigenvalues λ₁, …, λ_d, assume

  Σ_{j=1}^d λ_j² / ( Σ_{j=1}^d λ_j )²  =  o(1)   as   d → ∞

(min possible value of this ratio is 1/d)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

is called the "epsilon statistic".

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

is called the "epsilon statistic",

and is used to test "sphericity" of the dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε = 1/d

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε = 1/d

• So the assumption (ε well above its minimum 1/d) is very mild

• Much weaker than mixing conditions
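These bounds are easy to check numerically. A minimal NumPy sketch (the function name `epsilon_stat` is just illustrative):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity 'epsilon statistic' from the slides:
    eps = (sum lam_j)**2 / (d * sum lam_j**2), always in [1/d, 1]."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                    # all eigenvalues equal -> 1
eps_one_spike = epsilon_stat(np.r_[1e6, np.zeros(d - 1)])   # one extreme eigenvalue -> 1/d
```

Spherical eigenvalues give the maximum ε = 1, and a single dominant eigenvalue gives the minimum ε = 1/d, matching the bullets above.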

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then, for λ₁, …, λ_d:

  ‖X_i − X_j‖ = O_p(√d)

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

  ‖X_i − X_j‖ = O_p(√d)

Not so strong as before, where  ‖Z₁ − Z₂‖ = (2d)^{1/2} + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

John Kent example: Normal scale mixture

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

John Kent example: Normal scale mixture

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

Won't get

  ‖X_i − X_j‖ = C · √d · (1 + o_p(1))
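Kent's counterexample is easy to see in simulation. A hedged NumPy sketch (dimension and sample size chosen only for illustration): scaled pairwise distances concentrate near √(s_i² + s_j²) for the two mixture scales, so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20_000, 6

# Kent's normal scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2 each
scales = rng.choice([1.0, 10.0], size=n)
X = rng.standard_normal((n, d)) * scales[:, None]

# scaled pairwise distances ||X_i - X_j|| / sqrt(d) concentrate near
# sqrt(s_i**2 + s_j**2), which depends on the pair: no single constant C
dist_over_sqrt_d = np.array([np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
                             for i in range(n) for j in range(i + 1, n)])
```

Depending on whether a pair has scales (1,1), (1,10) or (10,10), the ratio settles near √2, √101 or √200.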

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0  ⟹  Independence

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X, Y ~ N(0, 1)

• With strong dependence

• Yet 0 covariance

Given c > 0, define

  Y = X    when |X| ≤ c
  Y = −X   when |X| > c

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, ∃ c with cov(X, Y) = 0
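The continuity argument can be carried out numerically. A Monte Carlo sketch (sample size and bisection settings are arbitrary choices; the same seed is reused so bisection acts on one fixed, monotone empirical function):

```python
import numpy as np

def cov_xy(c, m=400_000, seed=1):
    """Monte Carlo cov(X, Y) for the construction above:
    X ~ N(0, 1);  Y = X when |X| <= c,  Y = -X when |X| > c."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(m)
    y = np.where(np.abs(x) <= c, x, -x)
    return float(np.mean(x * y))        # E[XY], since both means are 0

cov_small_c, cov_large_c = cov_xy(0.1), cov_xy(3.0)   # negative vs positive

# by continuity there is a c in between with cov = 0: bisection on the fixed sample
a, b = 0.1, 3.0
for _ in range(40):
    mid = 0.5 * (a + b)
    if cov_xy(mid) < 0:
        a = mid
    else:
        b = mid
c_star = 0.5 * (a + b)                  # root is near 1.5
```

The sign flips from negative (Y ≈ −X) to positive (Y ≈ X) as c grows, so a zero-covariance c exists even though Y is a deterministic function of X.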

0 Covariance is not independence

Result

• Joint distribution of X and Y:
  – Has Gaussian marginals
  – Has cov(X, Y) = 0

0 Covariance is not independence

Result

• Joint distribution of X and Y:
  – Has Gaussian marginals
  – Has cov(X, Y) = 0
  – Yet strong dependence of X and Y
  – Thus not multivariate Gaussian

0 Covariance is not independence

Result

• Joint distribution of X and Y:
  – Has Gaussian marginals
  – Has cov(X, Y) = 0
  – Yet strong dependence of X and Y
  – Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues  λ₁ = d^α,  λ₂ = … = λ_d = 1

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues  λ₁ = d^α,  λ₂ = … = λ_d = 1

Note Critical Parameter: α

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues  λ₁ = d^α,  λ₂ = … = λ_d = 1

1st Eigenvector: u₁

Turns out: Direction Doesn't Matter

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues  λ₁ = d^α,  λ₂ = … = λ_d = 1

1st Eigenvector: u₁

How Good are Empirical Versions  λ̂₁, …, λ̂_d, û₁

as Estimates?

Consistency (big enough spike)

For α > 1:

  Angle(û₁, u₁) → 0

HDLSS Math Stat of PCA

Consistency (big enough spike)

For α > 1:

  Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough)

For α < 1:

  Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA
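The α > 1 / α < 1 dichotomy shows up clearly in simulation. A NumPy sketch of the spike model (dimension, sample size and α values are illustrative choices, not from the slides):

```python
import numpy as np

def first_pc_angle(d, alpha, n=20, seed=0):
    """Angle (degrees) between sample PC1 and the true spike direction e_1,
    under the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)              # put the spike on coordinate 1
    X = X - X.mean(axis=0)                             # mean-center
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]   # sample PC1 direction
    cos = min(abs(v1[0]), 1.0)                         # |<u1_hat, e_1>|
    return np.degrees(np.arccos(cos))

angle_consistent   = first_pc_angle(d=5000, alpha=1.5)   # alpha > 1: angle near 0
angle_inconsistent = first_pc_angle(d=5000, alpha=0.3)   # alpha < 1: angle near 90
```

With a big spike the empirical PC1 locks onto the true direction; with a weak spike it is swamped by the d^{1/2}-scale noise.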

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall λ₁ = d^α is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall λ₁ = d^α is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

  λ̂₁ / λ₁  →_L  χ²_n / n   as   d → ∞

HDLSS Math Stat of PCA

Consistency of eigenvalues?

  λ̂₁ / λ₁  →_L  χ²_n / n   as   d → ∞

Eigenvalues Inconsistent

HDLSS Math Stat of PCA

Consistency of eigenvalues?

  λ̂₁ / λ₁  →_L  χ²_n / n   as   d → ∞

Eigenvalues Inconsistent

But Known Distribution

HDLSS Math Stat of PCA

Consistency of eigenvalues?

  λ̂₁ / λ₁  →_L  χ²_n / n   as   d → ∞

Eigenvalues Inconsistent

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
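This limit law can also be sketched by simulation (all parameters below are illustrative; no mean-centering since the model mean is known to be 0):

```python
import numpy as np

# Monte Carlo sketch of lambda1_hat / lambda1 ~ chi^2_n / n (n fixed, d large),
# for the spike model lambda_1 = d**alpha with alpha > 1
rng = np.random.default_rng(2)
d, n, alpha, reps = 2000, 10, 1.5, 200
lam1 = float(d) ** alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                   # spike coordinate
    s = np.linalg.svd(X, compute_uv=False)     # singular values of the data matrix
    ratios[r] = (s[0] ** 2 / n) / lam1         # lambda1_hat / lambda1
# mean near 1, variance near 2/n: inconsistent for fixed n, but a known distribution
```

The ratio fluctuates like χ²_n/n (mean 1, variance 2/n), which is why the eigenvalue estimate is inconsistent for fixed n yet becomes consistent as n grows too.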

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

  ‖X‖ = O_p(d^{1/2}),  with  ‖X‖ / d^{1/2} → 1 (w.p. ½)  or  10 (w.p. ½)

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

  ‖X‖ = O_p(d^{1/2}),  with  ‖X‖ / d^{1/2} → 1 (w.p. ½)  or  10 (w.p. ½)

  (noise still O_p(d^{1/2}))

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

  ‖X‖ = O_p(d^{1/2}),  with  ‖X‖ / d^{1/2} → 1 (w.p. ½)  or  10 (w.p. ½)

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables X₁, X₂, …, Define

  ρ(k) = sup { |corr(f, g)| : f ∈ L₂(σ(X₁, …, X_t)),  g ∈ L₂(σ(X_{t+k}, X_{t+k+1}, …)) }

Where the Sigma-Fields are Generated by
• the entries up to time t
• the entries from time t + k on
• Note: Gap of Lag k

Assume  ρ(k) → 0  as  k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
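The "uncorrelated at far lags" idea can be illustrated with a Gaussian AR(1) sequence, a standard example of a ρ-mixing process (for Gaussian sequences the maximal correlation across a gap of lag k is |φ|^k; the code below only checks the plain lag-k correlation, as a sketch):

```python
import numpy as np

# Gaussian AR(1): X_j = phi * X_{j-1} + eps_j; lag-k correlation is phi**k -> 0
rng = np.random.default_rng(3)
phi, m = 0.7, 200_000
eps = rng.standard_normal(m)
x = np.empty(m)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)   # start in stationarity
for j in range(1, m):
    x[j] = phi * x[j - 1] + eps[j]

def lag_corr(x, k):
    """Empirical correlation of (X_j, X_{j+k})."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corr_lag1, corr_lag20 = lag_corr(x, 1), lag_corr(x, 20)   # ~ 0.7  vs  ~ 0
```

Nearby entries are strongly correlated, but across a gap of 20 lags the correlation has essentially vanished, which is the mixing intuition.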

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

  X = ( X(1), X(2), …, X(d) )ᵗ

Are ρ-mixing

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

  X = ( X(1), X(2), …, X(d) )ᵗ

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

  X_d ~ (0, Σ_d),   Σ_d = U_d Λ_d U_dᵗ

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

  X_d ~ (0, Σ_d),   Σ_d = U_d Λ_d U_dᵗ

  Z_d = Λ_d^{−1/2} U_dᵗ X_d

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

  X_d ~ (0, Σ_d),   Σ_d = U_d Λ_d U_dᵗ,   Z_d = Λ_d^{−1/2} U_dᵗ X_d

  (entries of Z_d, after the permutation, are ρ-mixing)

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

  ŝ_{j,i} = P_{û_j} x_i

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

  ŝ_{j,i} = P_{û_j} x_i   and   s_{j,i} = P_{u_j} x_i

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

  ŝ_{j,i} = P_{û_j} x_i ,   s_{j,i} = P_{u_j} x_i

  ŝ_{j,i} / s_{j,i} → R_j   (random)

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

  ŝ_{j,i} / s_{j,i} → R_j

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization of R_j for All Cases i

HDLSS Math Stat of PCA

  ŝ_{j,i} / s_{j,i} → R_j

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

  ŝ_{j,i} / s_{j,i} → R_j

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asymptotics Simple Paradoxes

For d-dim'al Standard Normal dist'n:

  Z₁, Z₂ ~ N_d(0, I_d),   Z₂ indep. of Z₁

High dim'al Angles (as d → ∞):

  Angle(Z₁, Z₂) = 90° + O_p(d^{−1/2})

- Everything is orthogonal
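This near-orthogonality is easy to verify numerically (dimension chosen only for illustration):

```python
import numpy as np

# Two independent N_d(0, I_d) vectors are nearly orthogonal in high dimensions:
# the angle is 90 degrees + O_p(d**-0.5)
rng = np.random.default_rng(4)
d = 100_000
Z1, Z2 = rng.standard_normal(d), rng.standard_normal(d)
cos = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))   # very close to 90
```

The cosine behaves like N(0, 1/d), so the deviation from 90° shrinks at rate d^{−1/2}.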

HDLSS Asyrsquos Geometrical Representrsquon

Assume n fixed, let d → ∞, with data

  Z₁, …, Z_n ~ N_d(0, I_d)

Study Subspace Generated by Data

Hyperplane through 0

of dimension n

Points are "nearly equidistant to 0"

& dist ~ √d

Within plane can

"rotate towards Unit Simplex"

All Gaussian data sets are

"near Unit Simplex Vertices"

"Randomness" appears

only in rotation of simplex

Hall, Marron & Neeman (2005)

HDLSS Asyrsquos Geometrical Representrsquon

Assume n fixed, let d → ∞, with data

  Z₁, …, Z_n ~ N_d(0, I_d)

Study Hyperplane Generated by Data

(n − 1)-dimensional hyperplane

Points are pairwise equidistant,  dist ~ √(2d)

Points lie at vertices of

"regular n-hedron"

Again "randomness in data" is only in rotation

Surprisingly rigid structure in random data
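A quick numerical check of this rigid structure (dimension and sample size illustrative): norms concentrate near √d and pairwise distances near √(2d).

```python
import numpy as np

# Geometric representation check: for Z_1, ..., Z_n ~ N_d(0, I_d) with d large,
# each norm is ~ sqrt(d) and each pairwise distance is ~ sqrt(2d)
rng = np.random.default_rng(5)
d, n = 100_000, 5
Z = rng.standard_normal((n, d))
norm_ratios = np.linalg.norm(Z, axis=1) / np.sqrt(d)            # each ~ 1
dist_ratios = [np.linalg.norm(Z[i] - Z[j]) / np.sqrt(2 * d)     # each ~ 1
               for i in range(n) for j in range(i + 1, n)]
```

All n points sit (after rotation) essentially at the vertices of a regular simplex, as the slide states.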

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors

HDLSS Asyrsquos Geometrical Represenrsquotion

Simulation View Shows ldquoRigidity after Rotationrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

• Common Sample Sizes:  n₊ = n₋ = 25

• But wide range of dimensions:  d = 10, 40, 100, 400, 1600

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For eigenvalues λ₁, …, λ_d, assume

  Σ_{j=1}^d λ_j² / ( Σ_{j=1}^d λ_j )²  =  o(1)   as   d → ∞

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For eigenvalues λ₁, …, λ_d, assume

  Σ_{j=1}^d λ_j² / ( Σ_{j=1}^d λ_j )²  =  o(1)   as   d → ∞

(min possible value of this ratio is 1/d)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

is called the "epsilon statistic".

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

is called the "epsilon statistic",

and is used to test "sphericity" of the dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε = 1/d

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = ( Σ_{j=1}^d λ_j )² / ( d Σ_{j=1}^d λ_j² )

satisfies   1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε = 1/d

• So the assumption (ε well above its minimum 1/d) is very mild

• Much weaker than mixing conditions

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then, for λ₁, …, λ_d:

  ‖X_i − X_j‖ = O_p(√d)

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

  ‖X_i − X_j‖ = O_p(√d)

Not so strong as before, where  ‖Z₁ − Z₂‖ = (2d)^{1/2} + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

John Kent example: Normal scale mixture

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Can we improve on

  ‖X_i − X_j‖ = O_p(√d) ?

John Kent example: Normal scale mixture

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

Won't get

  ‖X_i − X_j‖ = C · √d · (1 + o_p(1))

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0  ⟹  Independence

  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X, Y ~ N(0, 1)

• With strong dependence

• Yet 0 covariance

Given c > 0, define

  Y = X    when |X| ≤ c
  Y = −X   when |X| > c

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not absolutely continuous w.r.t. 2-d Lebesgue measure

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, ∃ c with cov(X, Y) = 0


0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
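The construction above is easy to verify by simulation; a minimal sketch (the sample size and the zero-crossing threshold c ≈ 1.54 are numerical choices, not from the slides):

```python
import numpy as np

# X ~ N(0,1); Y = X when |X| <= c, Y = -X when |X| > c.
# Y has a N(0,1) marginal by symmetry, yet |Y| = |X| always (strong dependence).
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)

def cov_xy(c):
    y = np.where(np.abs(x) <= c, x, -x)
    return float(np.mean(x * y))  # E[XY]; both means are 0

# cov(X,Y) = E[X^2; |X|<=c] - E[X^2; |X|>c]:
# negative for small c, positive for large c, crossing 0 near c ~ 1.54
print(cov_xy(0.5), cov_xy(1.54), cov_xy(3.0))
```

Since |Y| = |X| always, the pair stays strongly dependent even at the c where the covariance vanishes.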


HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of sampling variation; something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version), Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]


HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁ = d^α, λ₂ = ⋯ = λ_d = 1

(Note Critical Parameter: α)

1st Eigenvector: u₁

(Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁, û₁ as Estimates of λ₁, u₁?


HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1: Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):

For α < 1: Angle(û₁, u₁) → 90°
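The dichotomy is easy to see numerically; a minimal sketch (the sample size n = 20 and dimension d = 5000 are arbitrary choices, not from the slides) of the spike model Σ = diag(d^α, 1, …, 1), true u₁ = e₁:

```python
import numpy as np

def pc1_angle_deg(d, alpha, n=20, seed=1):
    # Spike model: eigenvalues (d^alpha, 1, ..., 1), true u1 = e1
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x[:, 0] *= d ** (alpha / 2)            # give coordinate 1 variance d^alpha
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cos = min(abs(float(vt[0, 0])), 1.0)   # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

# alpha > 1: angle near 0; alpha < 1: angle pushed toward 90 degrees
print(pc1_angle_deg(5000, 1.5), pc1_angle_deg(5000, 0.5))
```

With α = 1.5 the angle should be a few degrees; with α = 0.5 it should be well past 45°, heading toward 90° as d grows.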


HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall d^α is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere


HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂₁ / λ₁ →_L χ²_n / n, as d → ∞ (n fixed)

Eigenvalues Inconsistent,

But Known Distribution,

Consistent when n → ∞ as Well
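The known limit can be checked by Monte Carlo; a sketch (n, d, α and the replication count are arbitrary choices) comparing λ̂₁/λ₁ with the χ²_n/n limit, which has mean 1 and variance 2/n:

```python
import numpy as np

# Monte Carlo check of lambda1_hat / lambda1 -> chi^2_n / n (alpha > 1 spike)
rng = np.random.default_rng(2)
n, d, alpha, reps = 10, 10_000, 1.5, 300
lam1 = d ** alpha

ratios = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(lam1)              # the spike coordinate
    # largest eigenvalue of the sample covariance, via the n x n dual matrix
    ratios[r] = np.linalg.eigvalsh(x @ x.T / n)[-1] / lam1

print(ratios.mean(), ratios.var())   # chi^2_n / n: mean 1, variance 2/n = 0.2
```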


HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist:

John Kent example:

X ~ 0.5 N_d(0, Σ_d) + 0.5 N_d(0, 100 Σ_d), with a d² spike in Σ_d

Can only say ‖X‖ = O_p(d^(1/2)):

‖X‖ / d^(1/2) → 1 w.p. 1/2, → 10 w.p. 1/2, i.e. not deterministic

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n need some Mixing Cond
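The two-valued norm behavior shows up immediately in simulation; a sketch with the pure scale mixture (no spike; dimension and sample size are arbitrary choices):

```python
import numpy as np

# Kent's scale mixture: X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
rng = np.random.default_rng(3)
d, n = 100_000, 8

scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # component chosen per vector
x = rng.standard_normal((n, d)) * scale[:, None]
norms = np.linalg.norm(x, axis=1) / np.sqrt(d)
print(norms)   # each value lands near 1 or near 10
```

So ‖X‖/√d has no deterministic limit, which is exactly why the geometric representation fails here.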

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!),

E.g. Independent and Identically Distributed

Mixing Conditions: Explore Weaker Assumptions that Still Give the

Law of Large Numbers &

Central Limit Theorem


Mixing Conditions

• A Whole Area in Probability Theory

• ∃ a Large Literature

• A Comprehensive Reference:

Bradley (2005 update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions
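ρ-mixing proper takes a sup over all square-integrable functions of the past and future σ-fields; as a toy illustration of the "uncorrelated at far lags" idea only, here are lagged correlations of an AR(1) sequence, a standard ρ-mixing example (φ and the length are arbitrary choices):

```python
import numpy as np

# AR(1): a standard rho-mixing sequence; corr(X_i, X_{i+k}) = phi^k -> 0
rng = np.random.default_rng(4)
phi, d = 0.7, 200_000
z = rng.standard_normal(d)
x = np.empty(d)
x[0] = z[0]
for i in range(1, d):
    x[i] = phi * x[i - 1] + np.sqrt(1 - phi ** 2) * z[i]   # stationary, unit variance

def lag_corr(k):
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

print([round(lag_corr(k), 3) for k in (1, 5, 20)])   # decays toward 0 with the lag
```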

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries X₁, X₂, …, X_d of Data Vectors Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

(Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(-1/2) U_dᵗ X_d

Assume ∃ a permutation π_d

So that the permuted entries of Z_d are ρ-mixing


HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike):

Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis


HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong


HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°


HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝᵢⱼ = P_{v̂ⱼ} xᵢ and sᵢⱼ = P_{vⱼ} xᵢ

(What we study in PCA scatterplots)

Can Show: ŝᵢⱼ / sᵢⱼ → Rⱼ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝᵢⱼ / sᵢⱼ → Rⱼ,

Same Realization of Rⱼ for all i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful


HDLSS Deep Open Problem, Result:

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)


HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea


HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers
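A rough intuition for El Karoui's result can be simulated: for standard Gaussian data with the usual bandwidth² on the scale of d, pairwise squared distances concentrate at 2d, so the Gaussian-kernel Gram matrix is nearly constant off the diagonal and its residual variation is essentially linear in the inner products (a sketch; n, d, and the bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 50, 20_000
x = rng.standard_normal((n, d))

g = x @ x.T                                              # inner products x_i . x_j
sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2 * g   # squared distances
k = np.exp(-sq / (2 * d))                                # Gaussian kernel, bandwidth^2 = d

mask = ~np.eye(n, dtype=bool)
off, lin = k[mask], g[mask]
print(off.mean(), off.std())            # off-diagonal entries concentrate near exp(-1)
print(np.corrcoef(off, lin)[0, 1])      # kernel entries ~ linear in the inner products
```

So in this regime the kernel Gram matrix carries little beyond "constant + identity + a linear term", which is the heart of why kernel classifiers behave like linear ones.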


HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Implications for DWD: Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes,

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this?



bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea from probability theory.

Mixing Conditions

Recall the standard asymptotic results as n → ∞:

• Law of Large Numbers: X̄_n → μ ("weak" = in prob., "strong" = a.s.)

• Central Limit Theorem: √n (X̄_n − μ) →_L N(0, σ²)

Both have technical assumptions (usually ignored!), e.g. independent and ident. dist'd.

Mixing conditions: explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory, with a large literature.

• A comprehensive reference: Bradley (2005, update of 1986 version)

• Better, newer references exist as well.

Mixing condition used here: ρ-mixing.

For random variables X₁, X₂, …, define

ρ(k) = sup_j sup |corr(f, g)|,

where the sup is over f ∈ L²(F₁^j) and g ∈ L²(F_{j+k}^∞), for the sigma-fields F₁^j and F_{j+k}^∞ generated by X₁, …, X_j and by X_{j+k}, X_{j+k+1}, … (note the gap of lag k).

Assume ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.
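To make "uncorrelated at far lags" concrete, here is a small illustration (not from the slides): an AR(1) sequence, a standard example of a ρ-mixing process, whose lag-k correlations decay like φ^k. This checks only plain lag correlations, not the full sup over sigma-fields in the ρ-mixing definition.

```python
import numpy as np

# AR(1): x_t = phi * x_{t-1} + noise; correlations die off geometrically in the lag.
rng = np.random.default_rng(1)
phi, T = 0.7, 200000
x = np.empty(T)
x[0] = rng.standard_normal()
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, lag_corr(x, k), phi ** k)  # empirical values track phi^k, shrinking at far lags
```

The far-lag correlations are essentially zero, which is the behavior the mixing condition formalizes.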

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume the entries X(1), X(2), …, X(d) of the data vectors are ρ-mixing.

Drawback: strong assumption.

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ.  (Note: not assumed Gaussian.)

Define the standardized version Z_d = Λ_d^(−1/2) U_dᵗ X_d.

Assume Ǝ a permutation of the d entries of Z_d so that the permuted sequence is ρ-mixing.

HDLSS Math Stat of PCA

Careful look at PCA consistency (the α > 1 spike):

(Reality check suggested by a reviewer.)

The condition α > 1 is independent of sample size, so consistency holds even for n = 1 (!?)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data from 8/23/12: d ~ 1700, n = 180.

Manually brushed clusters show clear alternate splicing: not noise.

Functional Data Analysis

HDLSS Math Stat of PCA

Recall the theoretical separation:

• Strong inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.

Recall: for consistency (α > 1), Angle(û₁, u₁) → 0; for strong inconsistency (α < 1), Angle(û₁, u₁) → 90°.

The objection: because PC scores (i.e. projections) are not consistent.

For the scores ŝ_{j,i} = P_{û_j} x_i (what we study in PCA scatterplots) and s_{j,i} = P_{u_j} x_i,

can show ŝ_{j,i} / s_{j,i} → R_j (random).

(Thanks to Dan Shen.)

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data? (Recall: HDLSS PCA often finds signal, not pure noise.)

Key is "proportional errors": ŝ_{j,i} / s_{j,i} → R_j, with the same realization of R_j for all cases i = 1, …, n.

The axes have inconsistent scales, but the relationships are still useful.
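The "proportional errors" point can be illustrated with a toy sketch (names and numbers are illustrative, not from the slides): multiplying each score axis by its own shared factor R_j changes the scales but not the relationships, e.g. correlations between axes are untouched.

```python
import numpy as np

rng = np.random.default_rng(4)
scores = rng.standard_normal((50, 3))      # stand-in for true scores s_{j,i}: 50 cases, 3 PCs
R = rng.uniform(0.2, 5.0, size=3)          # one (positive) random factor R_j per axis
scores_hat = scores * R                    # "inconsistent" empirical scores, axis-by-axis rescaled

corr = np.corrcoef(scores, rowvar=False)
corr_hat = np.corrcoef(scores_hat, rowvar=False)
print(np.allclose(corr, corr_hat))         # correlations survive the rescaling
```

This is why PCA scatterplots remain interpretable even though the individual score values are not consistent.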

HDLSS Deep Open Problem & Result

In PCA consistency:

• Strong inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Ǝ interesting limit dist'ns: Jung, Sen & Marron (2012).

HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer: El Karoui (2010):

• In the random matrix limit,

• kernel embedded classifiers ~ linear classifiers.

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps; thus not yet implemented in DWD.

HDLSS Additional Results

Batch adjustment (Xuxin Liu). Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this:


HDLSS Asy's Geometrical Represent'n

Assume Z₁, …, Z_n ~ N_d(0, I_d) (independent); let d → ∞ for fixed n.

Study the hyperplane generated by the data:

• an (n − 1)-dimensional hyperplane

• points are pairwise equidistant, with distance ~ (2d)^(1/2)

• points lie at the vertices of a "regular n-hedron" (regular simplex)

Again, the "randomness in the data" is only in the rotation.

Surprisingly rigid structure in random data!
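The pairwise-equidistance claim is easy to check numerically. A minimal sketch (not the slides' own simulation code): draw n Gaussian vectors in dimension d and compare all pairwise distances to (2d)^(1/2); the ratios tighten around 1 as d grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_dists(n, d):
    """All pairwise Euclidean distances among n draws from N_d(0, I_d)."""
    Z = rng.standard_normal((n, d))
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    return D[np.triu_indices(n, k=1)]

for d in (10, 1000, 100000):
    ratios = pairwise_dists(5, d) / np.sqrt(2 * d)
    print(d, ratios)   # ratios concentrate near 1 as d grows: a near-regular simplex
```

This is the "rigid structure" of the geometric representation: for large d, random Gaussian point clouds look like a fixed simplex, randomly rotated.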

HDLSS Asy's Geometrical Represen'tion

Simulation view: study "rigidity after rotation":

• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to the plane of the screen
• Rotate within the plane to make "comparable"
• Repeat 10 times, use different colors

The simulation view shows "rigidity after rotation".

Now recall the HDLSS simulation results, comparing DWD, SVM & others from 10/21/14.

HDLSS Discrim'n Simulations

Main idea: comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions.

HDLSS Discrim'n Simulations

Overall approach:

• Study different known phenomena:
 – spherical Gaussians
 – outliers
 – polynomial embedding

• Common sample sizes: n₊ = n₋ = 25

• But a wide range of dimensions: d = 10, 40, 100, 400, 1600

HDLSS Discrim'n Simulations

[Result figures: spherical Gaussians; outlier mixture; wobble mixture; nested spheres]

Interesting phenomenon: all methods come together in very high dimensions.

Can we say more about "all methods come together in very high dimensions"?

A mathematical statistical question: what is the mathematics behind this?

(Use the geometric representation.)

HDLSS Asy's Geometrical Represen'tion

Explanation of observed (simulation) behavior: "everything similar for very high d":

• the 2 pop'ns are 2 simplices (i.e. regular n-hedrons)

• all points are the same distance from the other class

• i.e. everything is a support vector

• i.e. all sensible directions show "data piling"

• so "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represen'tion

Straightforward generalizations:

• non-Gaussian data: only need moments

• non-independent: use "mixing conditions"

• mild eigenvalue condition on the theoretical covariance (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers".

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments, and assume no eigenvalues are too large, in the sense that for

ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

we assume 1/(d ε) = o(1), i.e. d ε → ∞.

(Since 1/d ≤ ε ≤ 1, ε = 1/d is the min possible.)

(Much weaker than the previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis the statistic

ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the dist'n, i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1:

• for the spherical Normal, ε = 1

• a single extreme eigenvalue gives ε ≈ 1/d

So the assumption d ε → ∞ is very mild, much weaker than mixing conditions.
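The two extreme cases of the epsilon statistic can be sketched in a few lines (an illustration under the formula above, not code from the paper): a spherical covariance gives ε = 1, while one hugely dominant eigenvalue pushes ε toward the minimum 1/d.

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic eps = (sum lam_j)^2 / (d * sum lam_j^2); satisfies 1/d <= eps <= 1."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))   # spherical covariance: eps = 1 exactly

spiked = np.ones(d)
spiked[0] = d ** 2                # one overwhelmingly dominant eigenvalue
print(epsilon_stat(spiked))       # close to the minimum possible value 1/d
```

The Ahn et al. (2007) condition d·ε → ∞ thus rules out only covariances close to the one-dominant-eigenvalue extreme.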

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assuming 2nd moments and no eigenvalues too large, then

X_iᵗ X_j = o_p(d).

Not so strong as before: get ‖Z₁ − Z₂‖² = 2d (1 + o_p(1)).

2nd Paper on HDLSS Asymptotics

Can we improve on X_iᵗ X_j = o_p(d)?

John Kent example: the normal scale mixture

X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d).

Won't get ‖X_i‖² = C d (1 + o_p(1)) for a deterministic C: the squared norm is ≈ d w.p. 1/2 and ≈ 100 d w.p. 1/2.
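A quick simulation of Kent's scale mixture shows the failure directly (a sketch, with arbitrary sizes d and n chosen for illustration): the normalized squared norms ‖X_i‖²/d pile up near 1 and near 100, so there is no single deterministic limit and the geometric representation breaks.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20000, 100
# each vector is N(0, I_d) w.p. 1/2 and N(0, 100 I_d) w.p. 1/2 (sd 1 or 10)
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = scales[:, None] * rng.standard_normal((n, d))

ratios = (X ** 2).sum(axis=1) / d     # ||X_i||^2 / d for each vector
print(ratios.min(), ratios.max())     # two clusters, near 1 and near 100
```

Each half of the mixture concentrates tightly (chi-squared concentration), but the mixture as a whole stays random: O_p(d), not C·d.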

3rd Paper on HDLSS Asymptotics

Get the Geometrical Representation using:

• a 4th moment assumption

• a stronger covariance matrix (only) assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's normal scale mixture X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d):

• the data vectors are independent of each other

• but the entries of each have strong dependence

• however, can show the entries have cov = 0

• recall the statistical folklore: Covariance = 0 does NOT imply Independence

0 Covariance is not independence

Simple example:

• Random variables X and Y; make both Gaussian: X, Y ~ N(0, 1)

(Note: not using a multivariate Gaussian.)

• With strong dependence, yet 0 covariance.

Given c > 0, define

Y = X when |X| ≤ c,  Y = −X when |X| > c.

• The distribution is degenerate: supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

Result: the joint distribution of X and Y

– has Gaussian marginals

– has cov(X, Y) = 0

– yet strong dependence of X and Y

– thus is not multivariate Gaussian.

Shows: multivariate Gaussian means more than Gaussian marginals.
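The continuity argument above can be carried out explicitly. A sketch (the closed-form covariance below is my own derivation for this construction, not from the slides): for Y = X on |X| ≤ c and Y = −X otherwise, cov(X, Y) = 1 − 4(c·φ(c) + 1 − Φ(c)), which is negative for small c and positive for large c, so bisection finds the c* with zero covariance.

```python
import math

def cov_xy(c):
    """cov(X, Y) for X ~ N(0,1), Y = X on |X| <= c, Y = -X on |X| > c."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)   # standard normal density
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))          # standard normal cdf
    return 1 - 4 * (c * phi + 1 - Phi)

lo, hi = 0.1, 3.0          # cov_xy is increasing in c, so bisect for the root
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2
print(c_star, cov_xy(c_star))   # zero covariance, yet |Y| = |X| always: strongly dependent
```

Both marginals are N(0, 1) by symmetry, the covariance at c* is zero, and yet Y is a deterministic function of X: exactly the advertised counterexample.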

HDLSS Asy's Geometrical Represen'tion

Further consequences of the geometric represen'tion:

1. DWD is more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median): Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version): Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values.)

[Assume data are mean centered.]

Spike covariance model, Paul (2007). Eigenvalues:

λ_{1,d} = d^α,  λ_{2,d} = ⋯ = λ_{d,d} = 1

Note the critical parameter: α.

1st eigenvector: u₁ (turns out the direction doesn't matter).

How good are the empirical versions λ̂_{1,d}, û₁ as estimates?

Consistency (big enough spike): for α > 1, Angle(û₁, u₁) → 0.

Strong inconsistency (spike not big enough): for α < 1, Angle(û₁, u₁) → 90°.
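The consistency/strong-inconsistency dichotomy can be seen in a small simulation of the spike model (a sketch with illustrative choices of n, d, and α; finite-d values only suggest the limits): for α > 1 the empirical angle to u₁ = e₁ shrinks as d grows, while for α < 1 it drifts toward 90°.

```python
import numpy as np

rng = np.random.default_rng(3)

def spike_angle(d, alpha, n=20):
    """Angle (degrees) between the top sample eigenvector and u1 = e1,
    for data with eigenvalues lambda_1 = d^alpha, lambda_2 = ... = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)   # inject the spike into coordinate 1
    # top right singular vector of X = top eigenvector of the sample covariance
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(1.0, abs(u1_hat[0]))))

for d in (100, 2000, 20000):
    print(d, spike_angle(d, alpha=1.5), spike_angle(d, alpha=0.5))
    # alpha = 1.5 > 1: angle heading to 0; alpha = 0.5 < 1: angle heading to 90
```

With n fixed at 20, only the growth of d separates the two regimes, matching the theory above.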

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike


HDLSS Asy's Geometrical Represen'tion

Simulation view: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors

The simulation view shows "rigidity after rotation".

Now recall the HDLSS simulation results comparing DWD, SVM & others, from 10/21/14.

HDLSS Discrim'n Simulations

Main idea: comparison of

• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions.

Overall approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common sample sizes: n₊ = n₋ = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600

Simulation settings (plots omitted): Spherical Gaussians, Outlier Mixture, Wobble Mixture, Nested Spheres, …
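As a rough sketch of this simulation setup (the class separation is illustrative, and only the Mean Difference rule is implemented, since SVM and DWD need specialized solvers), the dimension effect can be reproduced with NumPy:

```python
import numpy as np

rng = np.random.default_rng(8)

def md_test_error(d, n=25, shift=2.2, n_test=400):
    # Two spherical Gaussians whose means differ by `shift` along e1;
    # every further dimension is pure noise.
    mu = np.zeros(d)
    mu[0] = shift / 2
    Xp = rng.standard_normal((n, d)) + mu      # class +1 training sample
    Xm = rng.standard_normal((n, d)) - mu      # class -1 training sample
    w = Xp.mean(axis=0) - Xm.mean(axis=0)      # Mean Difference direction
    b = (Xp.mean(axis=0) + Xm.mean(axis=0)) @ w / 2   # midpoint threshold
    Tp = rng.standard_normal((n_test, d)) + mu
    Tm = rng.standard_normal((n_test, d)) - mu
    return (np.mean(Tp @ w < b) + np.mean(Tm @ w > b)) / 2

for d in [10, 40, 100, 400, 1600]:
    print(d, md_test_error(d))
```

In this toy version the useful signal lives in a single coordinate, so added dimensions are pure noise and the test error of MD drifts upward with d.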

Interesting phenomenon: all methods come together in very high dimensions!

Can we say more about "all methods come together in very high dimensions"? A mathematical statistical question.

Mathematics behind this: use the geometric representation.

HDLSS Asy's Geometrical Represen'tion

Explanation of observed (simulation) behavior: "everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
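The "rigid simplex" geometry behind these bullets is easy to check numerically (a sketch, with illustrative n and d): pairwise distances between standard Gaussian vectors concentrate at √(2d), so all points become nearly equidistant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# Pairwise distances among n standard Gaussian vectors in R^d concentrate
# near sqrt(2d): for large d the points form a near-regular simplex.
for d in [10, 1000, 100000]:
    X = rng.standard_normal((n, d))
    dists = [np.linalg.norm(X[i] - X[j]) / np.sqrt(2 * d)
             for i in range(n) for j in range(i + 1, n)]
    print(d, round(min(dists), 3), round(max(dists), 3))
```

The printed min/max ratios squeeze toward 1 as d grows, which is exactly the "all same distance from the other class" picture.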

HDLSS Asy's Geometrical Represen'tion

Straightforward generalizations:

• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on the theoretical covariance (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers".

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments.

Assume no eigenvalues too large, in the sense: for eigenvalues λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_d, assume

(Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)   as d → ∞,   i.e.   ε ≫ 1/d   (min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic

ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the distribution, i.e. "are all covariance eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1

• For the spherical Normal: ε = 1
• A single extreme eigenvalue gives: ε ≈ 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
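As a numerical sanity check of these bounds (the eigenvalue choices below are illustrative, not from the slides):

```python
import numpy as np

def epsilon_stat(lam):
    # epsilon = (sum lam_j)^2 / (d * sum lam_j^2); always lies in [1/d, 1]
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))        # spherical case: epsilon = 1

lam_spike = np.ones(d)
lam_spike[0] = d ** 2                  # one dominant eigenvalue
print(epsilon_stat(lam_spike))         # close to the minimum 1/d
```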

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments, and no eigenvalues too large. Then:

‖X_i − X_j‖ = O_p(d^(1/2))

Not so strong as before (Gaussian case):  ‖Z₁ − Z₂‖ = (2d)^(1/2) + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖ = O_p(d^(1/2))?

John Kent example, a normal scale mixture:

X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)   (i.i.d.)

Won't get:  ‖X_i − X_j‖ = C d^(1/2) (1 + o_p(1))
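A quick simulation of Kent's mixture (d and the number of pairs are illustrative) shows why: ‖X_i − X_j‖/√d settles near √2, √101 or √200 depending on which mixture components the pair drew, so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 200_000

def draw():
    # Kent's mixture: N(0, I_d) w.p. 1/2, N(0, 100 I_d) w.p. 1/2
    scale = 10.0 if rng.random() < 0.5 else 1.0
    return scale * rng.standard_normal(d)

# ||X_i - X_j|| / sqrt(d) lands near sqrt(2), sqrt(101) or sqrt(200),
# depending on the (random) scales of the pair: no deterministic constant C.
vals = sorted({round(float(np.linalg.norm(draw() - draw()) / np.sqrt(d)), 1)
               for _ in range(20)})
print(vals)
```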

3rd Paper on HDLSS Asymptotics

Get the Geometrical Representation using:

• 4th moment assumption
• Stronger covariance matrix (only) assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture,  X ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d):

• Data vectors are independent of each other
• But entries of each have strong dependence
• However, can show the entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

• Random variables X and Y
• Make both Gaussian:  X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define:

Y = X    if |X| ≤ c
Y = −X   if |X| > c

Choose c to make cov(X, Y) = 0:

• The distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: multivariate Gaussian means more than Gaussian marginals.
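A small numerical illustration of this construction (a sketch; sample size and root-finding tolerance are arbitrary): cov(X, Y) = E[X²·1(|X| ≤ c)] − E[X²·1(|X| > c)] crosses zero as c grows, yet X and Y stay perfectly dependent (|Y| = |X|).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(2_000_000)

def cov_xy(c):
    # Y = X on {|X| <= c}, Y = -X on {|X| > c}; both marginals are N(0,1)
    Y = np.where(np.abs(X) <= c, X, -X)
    return float(np.mean(X * Y))       # cov(X, Y), since E[X] = E[Y] = 0

print(cov_xy(0.1))                     # small c: negative
print(cov_xy(5.0))                     # large c: positive

lo, hi = 0.1, 5.0                      # bisect for the zero-covariance c
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
print(round(lo, 2))                    # near the theoretical root c ~ 1.54
```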

HDLSS Asy's Geometrical Represen'tion
Further Consequences of the Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects the intuitive idea of feeling sampling variation, something like mean vs. median)
   Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version): Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007). For eigenvalues:

λ_{d,1} = d^α,   λ_{d,2} = ⋯ = λ_{d,d} = 1

Note the critical parameter: α

1st eigenvector: u₁ (turns out its direction doesn't matter)

How good are the empirical versions λ̂_{d,1}, û₁ as estimates?

Consistency (big enough spike): for α > 1,

Angle(û₁, u₁) → 0

Strong inconsistency (spike not big enough): for α < 1,

Angle(û₁, u₁) → 90°
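A small simulation of the spike model illustrates both regimes (a sketch; n, d and the α values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def first_pc_angle(d, alpha, n=20):
    # Spike model: u1 = e1, lambda_1 = d^alpha, remaining eigenvalues 1
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                   # standard deviations
    X = rng.standard_normal((n, d)) * sd
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)              # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

print(first_pc_angle(20000, alpha=1.5))        # alpha > 1: angle near 0
print(first_pc_angle(20000, alpha=0.5))        # alpha < 1: angle near 90
```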

HDLSS Math Stat of PCA

Intuition: random noise ~ d^(1/2)

For α > 1 (recall α is on the scale of variance): the spike pops out of the pure noise sphere.

For α < 1: the spike is contained in the pure noise sphere.

Consistency of eigenvalues?

Eigenvalues are inconsistent, but have a known limit distribution:

λ̂₁ / λ₁ →_L χ²_n / n   (as d → ∞, n fixed)

Consistent when n → ∞ as well.
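The limit λ̂₁/λ₁ →_L χ²_n/n can be checked by Monte Carlo (a sketch; all parameter values are illustrative): χ²_n/n has mean 1 and variance 2/n.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, reps = 5, 20000, 300
lam1 = float(d) ** 2                     # alpha = 2 spike, so noise is negligible

ratios = []
for _ in range(reps):
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    # largest eigenvalue of the (known-mean) sample covariance, computed via
    # the small n x n dual matrix X X^T / n
    lam1_hat = float(np.linalg.eigvalsh(X @ X.T / n).max())
    ratios.append(lam1_hat / lam1)

print(np.mean(ratios), np.var(ratios))   # chi^2_n / n has mean 1, variance 2/n
```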

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency?

John Kent example:  X_d ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

Can only say:  ‖X‖ = O_p(d^(1/2)),  with  ‖X‖ ≈ d^(1/2) w.p. ½  and  ‖X‖ ≈ 10 d^(1/2) w.p. ½:  not deterministic.

The PCA conditions are the same, since the noise is still O_p(d^(1/2)).

But for the Geo Rep'n, need some mixing condition.

Conclude: need some mixing condition.

HDLSS Math Stat of PCA

Mixing Conditions

Idea from probability theory: recall the standard asymptotic results as n → ∞:

• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions: explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory
• ∃ a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better: newer references

Mixing condition used here: ρ-mixing.

For random variables X₁, X₂, …, define

ρ(k) = sup |corr(f, g)|

where the sup runs over j, over f ∈ L²(sigma-field generated by X₁, …, X_j), and over g ∈ L²(sigma-field generated by X_{j+k}, X_{j+k+1}, …). Note the gap of lag k.

Assume:  ρ(k) → 0  as  k → ∞

Idea: uncorrelated at far lags.
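As a concrete instance (not from the slides): a Gaussian AR(1) sequence is ρ-mixing, and its lag-k autocorrelation φ^k decays geometrically, matching the "uncorrelated at far lags" idea.

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.8, 200_000

# Stationary Gaussian AR(1): X_t = phi X_{t-1} + eps_t; it is rho-mixing,
# and corr(X_t, X_{t+k}) = phi^k -> 0 as the lag k grows.
eps = rng.standard_normal(n) * np.sqrt(1 - phi ** 2)
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

for k in [1, 5, 10]:
    corr = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(float(corr), 3), round(phi ** k, 3))
```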

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): assume the entries of the data vectors X = (X₁, X₂, …, X_d)ᵗ are ρ-mixing.

Drawback: a strong assumption. (In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n: a series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),   Σ_d = U_d Λ_d U_dᵗ   (Note: not necessarily Gaussian)

Define the standardized version:  Z_d = Λ_d^(−1/2) U_dᵗ X_d

Assume ∃ a permutation π_d so that the entries of Z_{d,π_d} are ρ-mixing.

HDLSS Math Stat of PCA

Careful look at PCA consistency, the α > 1 spike
(reality check suggested by a reviewer):

The condition α > 1 is independent of sample size, so consistency holds even for n = 1 (!?)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise!

HDLSS Math Stat of PCA

Recall the RNAseq data from 8/23/12: d ≈ 1700, n = 180.

Manually brushed clusters show clear alternate splicing: not noise.

Functional Data Analysis

Recall the theoretical separation:

• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An Interesting Objection: should not study angles in PCA.

Recall, for consistency (α > 1):  Angle(û₁, u₁) → 0
For strong inconsistency (α < 1):  Angle(û₁, u₁) → 90°

Because the PC scores (i.e. projections), what we study in PCA scatterplots, are not consistent:

For scores  s_{i,j} = P_{v_j} x_i  and  ŝ_{i,j} = P_{v̂_j} x_i,

can show:  ŝ_{i,j} / s_{i,j} → R_j   (random!)

Thanks to Dan Shen.

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data?

(Recall: HDLSS PCA often finds signal, not pure noise.)

Key is "proportional errors":  ŝ_{i,j} / s_{i,j} → R_j,  with the same realization of R_j for all i = 1, …, n.

Axes have inconsistent scales, but relationships are still useful.

HDLSS Math Stat of PCA
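A simulation sketch of "proportional errors" (spike strength and sample sizes are illustrative): the empirical PC1 scores are all off by roughly one common random factor, so the scatterplot relationships survive.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 10, 20000

# Spike model at the boundary: u1 = e1, lambda_1 = d, other eigenvalues 1
sd = np.ones(d)
sd[0] = np.sqrt(d)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = np.sign(Vt[0, 0]) * Vt[0]        # empirical PC1, sign-matched to e1

s_true = X[:, 0]                          # true PC1 scores (projection on u1 = e1)
s_hat = X @ u1_hat                        # empirical PC1 scores
ratio = s_hat / s_true
print(np.round(ratio, 3))                 # roughly one common (random) factor > 1
print(np.corrcoef(s_hat, s_true)[0, 1])   # relationships preserved
```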

HDLSS Deep Open Problem

In PCA consistency:

• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting limit distributions, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d, so it is not clear embedding helps; thus not yet implemented in DWD.

HDLSS Asymptotics & Kernel Methods
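The flavor of the El Karoui (2010) result can be seen numerically (a sketch using an RBF kernel, with the bandwidth scaling chosen here for illustration): since pairwise distances concentrate, a first-order expansion of the kernel, which is linear in the inner products, is already essentially exact.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 50, 50000
X = rng.standard_normal((n, d))

G = X @ X.T                                             # Gram matrix (inner products)
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G  # squared distances
K = np.exp(-sq / (2.0 * d))                             # RBF kernel, bandwidth^2 = d

# Off-diagonal distances concentrate near 2d, so the 1st-order (linear in G)
# expansion of the kernel around that value is nearly exact:
K_lin = np.exp(-1.0) * (1.0 - (sq - 2 * d) / (2 * d))
iu = np.triu_indices(n, k=1)
print(np.max(np.abs(K[iu] - K_lin[iu])))                # tiny approximation error
```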

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asy's Geometrical Represen'tion

Simulation View Shows "Rigidity after Rotation"

HDLSS Asy's Geometrical Represen'tion

Now Recall HDLSS Simulation Results

Comparing DWD, SVM & Others, from 10/21/14

HDLSS Discrim'n Simulations

Main idea:

Comparison of:

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions

HDLSS Discrim'n Simulations

Overall Approach:

• Study different known phenomena
– Spherical Gaussians
– Outliers
– Polynomial Embedding

• Common Sample Sizes: n+ = n− = 25

• But wide range of dimensions: d = 10, 40, 100, 400, 1600

HDLSS Discrim'n Simulations

Spherical Gaussians

HDLSS Discrim'n Simulations

Outlier Mixture

HDLSS Discrim'n Simulations

Wobble Mixture

HDLSS Discrim'n Simulations

Nested Spheres

HDLSS Discrim'n Simulations

…

Interesting Phenomenon:

All methods come together

in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about:

All methods come together

in very high dimensions?

Mathematical Statistical Question:

Mathematics behind this?

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)

• All are same distance from the other class

• i.e. everything is a support vector

• i.e. all sensible directions show "data piling"

• so "sensible methods are all nearly the same"
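This "rigidity" is easy to check numerically. A small sketch (dimension, sample size and seed are my own illustrative choices, not from the slides): for i.i.d. N(0, I_d) data, all pairwise distances concentrate at sqrt(2d), so the n points sit near the vertices of a regular simplex.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d = 25, 40000   # illustrative: small sample, very high dimension

X = rng.standard_normal((n, d))

# All pairwise distances, scaled by sqrt(2d): concentration near 1 means
# the n points sit near the vertices of a regular simplex.
ratios = [np.linalg.norm(X[i] - X[j]) / np.sqrt(2 * d)
          for i, j in combinations(range(n), 2)]
print(round(min(ratios), 3), round(max(ratios), 3))   # both close to 1
```

With every pairwise distance nearly equal, every point is equally far from the other class, which is the "everything is a support vector" picture above.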

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large, in sense:

For ε = (Σ_j λ_j / d)² / (Σ_j λ_j² / d),

assume 1/ε = o(d), i.e. ε ≫ 1/d (min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background:

In classical multivariate analysis the statistic

ε = (Σ_j λ_j / d)² / (Σ_j λ_j² / d)

is called the "epsilon statistic",

And is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

ε = (Σ_j λ_j / d)² / (Σ_j λ_j² / d)

satisfies: 1/d ≤ ε ≤ 1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε = 1/d

• So assumption 1/ε = o(d) is very mild

• Much weaker than mixing conditions
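The two extreme cases above can be verified in a few lines (a sketch; `epsilon_stat` is my own helper name, not from the slides):

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic: (average eigenvalue)^2 / (average squared eigenvalue)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return (lam.sum() / d) ** 2 / ((lam ** 2).sum() / d)

d = 1000
print(epsilon_stat(np.ones(d)))          # spherical case: epsilon = 1.0
spike = np.zeros(d)
spike[0] = d                             # single extreme eigenvalue
print(epsilon_stat(spike))               # epsilon = 1/d = 0.001
```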

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large

Then, as d → ∞:

‖X_i − X_j‖² = O_p(1)·d

Not so strong as before: ‖Z₁ − Z₂‖² = 2d(1 + o_p(1))

2nd Paper on HDLSS Asymptotics

Can we improve on

‖X_i − X_j‖² = O_p(1)·d ?

John Kent example, Normal scale mixture:

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d), i.i.d.

Won't get ‖X_i − X_j‖ = C·d^(1/2)·(1 + o_p(1))
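A small simulation sketch of Kent's example (mixture labels are fixed half-and-half here purely for illustration): scaled distances ‖X_i − X_j‖ / sqrt(d) still concentrate, but around sqrt(2), sqrt(101) or sqrt(200) depending on which components i and j came from, so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20000
labels = np.array([True] * 4 + [False] * 4)      # half "big", half "small", for illustration
X = rng.standard_normal((8, d)) * np.where(labels, 10.0, 1.0)[:, None]

# Scaled distances concentrate, but the limit depends on the pair:
# small-small -> sqrt(2) ~ 1.41, mixed -> sqrt(101) ~ 10.05, big-big -> sqrt(200) ~ 14.14
vals = sorted(np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
              for i in range(8) for j in range(i + 1, 8))
print(round(vals[0], 2), round(vals[-1], 2))
```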

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d), i.i.d.

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 ⇒ Independence???

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0,1)

(Note: Not Using Multivariate Gaussian)

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0,1)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X, when |X| ≤ c;  Y = −X, when |X| > c

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example: choose c to make cov(X,Y) = 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X,Y) < 0

• For large c, have cov(X,Y) > 0

• By continuity, ∃ c with cov(X,Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals
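The construction is easy to verify numerically. A sketch (the cutoff value c ≈ 1.54 used below was found numerically and is only approximate): the covariance changes sign in c, so some c in between gives cov = 0, yet X and Y remain strongly dependent since |Y| = |X| always.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)

def make_y(x, c):
    # Y = X when |X| <= c, Y = -X when |X| > c; marginal of Y is still N(0,1)
    return np.where(np.abs(x) <= c, x, -x)

cov_small = np.cov(x, make_y(x, 0.1))[0, 1]   # mostly Y = -X: negative
cov_large = np.cov(x, make_y(x, 3.0))[0, 1]   # mostly Y =  X: positive
print(round(cov_small, 3), round(cov_large, 3))

y = make_y(x, 1.54)                           # c near the root of cov(X,Y) = 0
print(round(np.cov(x, y)[0, 1], 3), bool(np.allclose(np.abs(y), np.abs(x))))
```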

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation)
(something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao, et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁ = d^α, λ₂ = ⋯ = λ_d = 1

Note: Critical Parameter α

1st Eigenvector: u₁

Turns out: Direction Doesn't Matter

How Good are Empirical Versions

λ̂₁, …, λ̂_d, û₁

as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1,

Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):

For α < 1,

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA
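Both regimes show up in a small simulation (a sketch; dimension, n, seed and the use of an SVD for the sample PC are my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

def first_pc_angle(d, alpha, n=20):
    """Angle (degrees) between true u1 = e1 and sample PC1; spike lambda1 = d**alpha."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                 # sd of coord 1, so its variance is d**alpha
    X = rng.standard_normal((n, d)) * sd     # mean-zero sample from N(0, diag cov)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)            # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

a_big = first_pc_angle(5000, alpha=1.5)      # alpha > 1: angle near 0   (consistency)
a_small = first_pc_angle(5000, alpha=0.5)    # alpha < 1: angle large    (strong inconsistency)
print(round(a_big, 1), round(a_small, 1))
```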

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall: on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂₁ / λ₁ →_L χ²_n / n, as d → ∞

Eigenvalues Inconsistent,

But Known Distribution,

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
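A Monte Carlo sketch of this limit law (sizes and seed are illustrative choices of mine): for a d^α spike with α > 1, the ratio λ̂₁ / λ₁ behaves like χ²_n / n, so it has mean ≈ 1 and variance ≈ 2/n.

```python
import numpy as np

rng = np.random.default_rng(4)

def top_eig_ratio(d, n, alpha):
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                            # spike lambda1 = d**alpha
    X = rng.standard_normal((n, d)) * sd
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n  # top sample eigenvalue
    return lam1_hat / d ** alpha                        # lambda1_hat / lambda1

n = 10
ratios = np.array([top_eig_ratio(d=4000, n=n, alpha=1.5) for _ in range(200)])
print(round(ratios.mean(), 2), round(ratios.var(), 2))  # near 1 and near 2/n = 0.2
```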

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X_d ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say: ‖X‖ / d^(1/2) → 1 or 10, each with probability 1/2;

not deterministic

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n, need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory:

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore???)

E.g. Independent and Ident. Dist'd

Mixing Conditions

Idea From Probability Theory:

Explore Weaker Assumptions, to Still Get:

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory

• ∃ a Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better Newer References???

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing

For Random Variables Z₁, Z₂, …, define

ρ(k) = sup |corr(f, g)|,

where the sup is over f ∈ L²(A_i), g ∈ L²(B_{i+k}),

for Sigma-Fields Generated by:

• A_i = σ(Z₁, …, Z_i)

• B_{i+k} = σ(Z_{i+k}, Z_{i+k+1}, …)

• Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
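The "uncorrelated at far lags" idea can be illustrated with the simplest mixing example, a stationary AR(1). (Note: the ρ-mixing coefficient is a sup over all L² functions of past and future; this sketch only shows plain lagged correlations decaying, with parameters chosen by me.)

```python
import numpy as np

rng = np.random.default_rng(5)

# AR(1): Z_t = phi * Z_{t-1} + eps_t, so corr(Z_t, Z_{t+k}) = phi**k -> 0
phi, T = 0.8, 300_000
eps = rng.standard_normal(T)
z = np.empty(T)
z[0] = eps[0]
for t in range(1, T):
    z[t] = phi * z[t - 1] + eps[t]

# Empirical lagged correlations: roughly phi, phi**5, phi**20
corrs = {k: np.corrcoef(z[:-k], z[k:])[0, 1] for k in (1, 5, 20)}
print({k: round(v, 3) for k, v in corrs.items()})
```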

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors

X = (X₁, X₂, …, X_d)ᵗ

are ρ-mixing

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Drawback: Strong Assumption

(In JRSS-B, since

Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based,

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

Note: Not Gaussian

Define Standardized Version:

Z_d = Λ_d^(−1/2) U_dᵗ X_d

Assume Ǝ a permutation π_d,

So that the permuted entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA
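The standardization Z_d = Λ_d^(−1/2) U_dᵗ X_d can be checked in a few lines. A sketch in small dimension (the permutation / ρ-mixing part of the condition is not shown, and Gaussian data are used here only for convenience, although the condition itself does not require Gaussianity):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 5, 200_000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)                  # a generic covariance Sigma = U Lam U^t
lam, U = np.linalg.eigh(Sigma)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # d x n sample

Z = (U.T @ X) / np.sqrt(lam)[:, None]        # standardized version Lam^(-1/2) U^t X
C = np.cov(Z)
print(np.round(C, 2))                        # approximately the identity I_d
```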

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency – α > 1 spike

(Reality Check, Suggested by Reviewer)

Condition α > 1 is Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters Show Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency – spike α < 1

Consistency – spike α > 1

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency – spike α < 1

Consistency – spike α > 1

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1):

Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1):

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_ij = P_{û_j} x_i and s_ij = P_{u_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_ij / s_ij → R_j (Random!)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_ij / s_ij → R_j

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency – spike α < 1

Consistency – spike α > 1

What happens at boundary (α = 1)?

HDLSS Deep Open Problem

In PCA Consistency:

What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns,

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
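A rough numerical illustration of this El Karoui-type phenomenon (sizes, bandwidth and seed are my own choices; points are normalized to the radius-sqrt(d) sphere so that only inner products vary): with bandwidth on the natural sqrt(d) scale, off-diagonal Gaussian-kernel entries are nearly an affine function of the linear-kernel entries, so the embedding carries essentially linear information.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 40, 20000
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # put points on radius-sqrt(d) sphere

G = X @ X.T                                  # linear (inner-product) kernel
sq = 2.0 * d - 2.0 * G                       # squared distances on the sphere
K = np.exp(-sq / (2.0 * d))                  # Gaussian kernel, bandwidth scale sqrt(d)

# Off diagonal: K_ij = e^-1 * exp(G_ij / d), and G_ij / d is O_p(d^(-1/2)), tiny,
# so K_ij is nearly affine in G_ij.
iu = np.triu_indices(n, k=1)
r = np.corrcoef(K[iu], G[iu])[0, 1]
print(round(r, 4))
```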

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asyrsquos Geometrical Represenrsquotion

Now Recall HDLSS Simulation Results

Comparing DWD SVM amp Others from 102114

HDLSS Discrimrsquon Simulations

Main idea

Comparison of

bull SVM (Support Vector Machine)

bull DWD (Distance Weighted Discrimination)

bull MD (Mean Difference aka Centroid)

Linear versions across dimensions

HDLSS Discrimrsquon Simulations

Overall Approachbull Study different known phenomena

ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding

bull Common Sample Sizes

bull But wide range of dimensions25 nn

16004001004010d

HDLSS Discrimrsquon Simulations

Spherical Gaussians

HDLSS Discrimrsquon Simulations

Outlier Mixture

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large ($\lambda_1 \ge \cdots \ge \lambda_d$, as above)

Then: $\langle X_i, X_j \rangle = o_p(d)$

Not so strong as before: $\langle Z_1, Z_2 \rangle = d^{1/2}\, O_p(1)$

2nd Paper on HDLSS Asymptotics

Can we improve on $\langle X_i, X_j \rangle = o_p(d)$?

John Kent example: Normal scale mixture

$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$

Won't get: $\langle X_i, X_j \rangle = C\, d^{1/2}\, O_p(1)$
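The failure is visible in simulation: under the scale mixture, the scale of $\langle X_i, X_j \rangle / d^{1/2}$ depends on which mixture components the pair came from, so no single constant $C$ works. A hedged sketch (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 500, 2000

def inner_over_sqrt_d(sigma_i, sigma_j):
    # X = sigma * Z with Z ~ N(0, I_d), so <X_i, X_j> = sigma_i * sigma_j * <Z_i, Z_j>,
    # and <Z_i, Z_j> / sqrt(d) is asymptotically N(0, 1).
    z1 = rng.standard_normal((reps, d))
    z2 = rng.standard_normal((reps, d))
    return sigma_i * sigma_j * np.einsum("ij,ij->i", z1, z2) / np.sqrt(d)

s_small = inner_over_sqrt_d(1, 1).std()    # both vectors from N(0, I_d): scale ~ 1
s_big = inner_over_sqrt_d(10, 10).std()    # both from N(0, 100 I_d): scale ~ 100
print(s_small, s_big)
```

The two conditional scales differ by a factor of 100, so $\langle X_i, X_j \rangle / d^{1/2}$ is itself a scale mixture, not $C \cdot O_p(1)$ for any fixed $C$.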

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$

• Data vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 ⟹ Independence (refuted next)

0 Covariance is not independence

Simple Example:

• Random variables $X$ and $Y$

• Make both Gaussian: $X, Y \sim N(0, 1)$

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given $c > 0$, define

$Y = \begin{cases} X & |X| \le c \\ -X & |X| > c \end{cases}$

0 Covariance is not independence

Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$

• Distribution is degenerate

• Supported on diagonal lines $y = \pm x$

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small $c$, have $\mathrm{cov}(X, Y) < 0$

• For large $c$, have $\mathrm{cov}(X, Y) > 0$

• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$

0 Covariance is not independence

Result:

• Joint distribution of $X$ and $Y$:

– Has Gaussian marginals

– Has $\mathrm{cov}(X, Y) = 0$

– Yet strong dependence of $X$ and $Y$

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals
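The whole construction can be verified by Monte Carlo: bisect on $c$ until the covariance crosses zero, then check the marginal and the dependence. A sketch (the particular threshold found is illustrative; by a direct computation it sits near $c \approx 1.54$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)

def cov_xy(c):
    # Y = X on |X| <= c, Y = -X on |X| > c; both X, Y have mean 0,
    # so cov(X, Y) = E[X^2; |X| <= c] - E[X^2; |X| > c].
    y = np.where(np.abs(x) <= c, x, -x)
    return np.mean(x * y)

lo, hi = 0.0, 5.0            # cov < 0 for small c, cov > 0 for large c
for _ in range(60):          # bisection for the root cov(X, Y) = 0
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c = 0.5 * (lo + hi)
y = np.where(np.abs(x) <= c, x, -x)

print(c)                     # threshold giving cov ~ 0
print(np.mean(x * y))        # ~ 0: zero covariance
print(np.std(y))             # ~ 1: Y is still N(0, 1), by symmetry of the flip
print(np.allclose(np.abs(y), np.abs(x)))  # True: |Y| = |X|, so X and Y are strongly dependent
```

The last line is the punchline: $|Y| = |X|$ exactly, so $X$ and $Y$ are as dependent as can be despite zero covariance.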

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea: feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: $\lambda_1 = d^{\alpha}$, $\lambda_2 = \cdots = \lambda_d = 1$

Note: Critical Parameter $\alpha$

1st Eigenvector: $u_1$

(Turns out: Direction Doesn't Matter)

How good are empirical versions

$\hat\lambda_1, \dots, \hat\lambda_d,\ \hat u_1$

as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For $\alpha > 1$: $\mathrm{Angle}(\hat u_1, u_1) \to 0$

Strong Inconsistency (spike not big enough):

For $\alpha < 1$: $\mathrm{Angle}(\hat u_1, u_1) \to 90^\circ$

Intuition: Random Noise $\sim d^{1/2}$

For $\alpha > 1$ (recall $d^{\alpha}$ is on scale of variance):

Spike pops out of pure noise sphere

For $\alpha < 1$:

Spike contained in pure noise sphere
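The dichotomy shows up clearly in a small simulation of the spike model (sizes and $\alpha$ values are illustrative, chosen well away from the $\alpha = 1$ boundary):

```python
import numpy as np

rng = np.random.default_rng(2)

def angle_to_spike(alpha, d=5000, n=20):
    # Spike model: lambda_1 = d^alpha along e_1, all other eigenvalues 1.
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)
    # Top right singular vector of X = leading empirical eigenvector u_1-hat.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(vt[0, 0])                    # |<u_1-hat, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

a_hi = angle_to_spike(1.5)   # alpha > 1: angle near 0 degrees (consistency)
a_lo = angle_to_spike(0.3)   # alpha < 1: angle near 90 degrees (strong inconsistency)
print(a_hi, a_lo)
```

With $\alpha = 1.5$ the spike variance $d^{1.5}$ dwarfs the total noise energy $\approx d$, so $\hat u_1$ locks onto $u_1$; with $\alpha = 0.3$ the noise sphere swallows the spike and $\hat u_1$ is essentially a random noise direction.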

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\hat\lambda_1 / \lambda_1 \to_L \chi^2_n / n$, as $d \to \infty$

Eigenvalues inconsistent (for fixed $n$)

But known distribution

Consistent when $n \to \infty$ as well
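The claimed $\chi^2_n / n$ limit for $\hat\lambda_1 / \lambda_1$ is easy to check by simulation (a sketch; parameters illustrative, mean known to be 0 so the sample covariance uses divisor $n$):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 1000, 8, 2.0, 300
lam1 = float(d) ** alpha     # spike eigenvalue, alpha > 1

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    # Top eigenvalue of the sample covariance X'X/n, via singular values of X.
    s = np.linalg.svd(X, compute_uv=False)
    ratios[r] = (s[0] ** 2 / n) / lam1

# Limit is chi^2_n / n: mean 1, variance 2/n (= 0.25 here) -- not degenerate,
# so the eigenvalue estimate is inconsistent for fixed n.
print(ratios.mean(), ratios.var())
```

The mean near 1 with variance near $2/n$ illustrates both halves of the slide: the distribution is known, but it does not concentrate unless $n \to \infty$ as well.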

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

$X \sim 0.5\, N_d(0, 100\, I_d) + 0.5\, N_d(0, I_d)$

Can only say: $\|X\| / d^{1/2} \to 10$ w.p. $1/2$, $\to 1$ w.p. $1/2$,

i.e. $\|X\| = d^{1/2}\, O_p(1)$, not deterministic

PCA conditions same, since noise still $O_p(d^{1/2})$

But for Geo Rep'n, need some mixing cond'n

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall standard asymptotic results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have technical assumptions (usually ignore?!?),

e.g. Independent and Ident. Dist'd

Mixing Conditions: explore weaker assumptions, to still get

Law of Large Numbers & Central Limit Theorem

• A whole area in Probability Theory

• a large literature

• A comprehensive reference:

Bradley (2005, update of 1986 version)

• Better Newer References

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For random variables $X_1, X_2, \dots$, define

$\rho(t) = \sup\ \mathrm{corr}(f, g)$

where $f \in L^2\big(\sigma(X_1, \dots, X_i)\big)$, $g \in L^2\big(\sigma(X_{i+t}, X_{i+t+1}, \dots)\big)$,

for sigma-fields generated by the indicated variables

(note: gap of lag $t$)

Assume: $\rho(t) \to 0$ as $t \to \infty$

Idea: Uncorrelated at far lags

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume entries of data vectors $X_1, X_2, \dots, X_d$

are $\rho$-mixing

Drawback: Strong assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully covariance based, no mixing)

Tricky point: classical mixing conditions

require a notion of time ordering,

not always clear, e.g. microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

$X \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: Not Gaussian)

Define standardized version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume ∃ a permutation $\pi_d$,

so that the entries of $Z_d$ are $\rho$-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency ($\alpha > 1$ spike)

(Reality Check Suggested by Reviewer)

Independent of sample size,

so true for n = 1 (?!?)

Reviewer's conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise

Recall RNAseq data from 8/23/12: d ~ 1700, n = 180

Manually brushed clusters: clear alternate splicing, not noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not study angles in PCA

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 90^\circ$

Because PC scores (i.e. projections) are not consistent:

For scores $\hat s_{ij} = P_{\hat u_j} x_i$ and $s_{ij} = P_{u_j} x_i$

(what we study in PCA scatterplots),

can show $\hat s_{ij} / s_{ij} \to R_j$ (random!)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent

So how can PCA find useful signals in data?

(Recall: HDLSS PCA often finds signal, not pure noise)

Key is "proportional errors": $\hat s_{ij} / s_{ij} \to R_j$,

same realization of $R_j$ for $i = 1, \dots, n$

Axes have inconsistent scales,

but relationships are still useful

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: $\alpha < 1$ spike

Consistency: $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

Result: ∃ interesting limit dist'ns,

Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall flexibility from kernel embedding idea

Interesting Question: behavior in very high dimension?

Answer: El Karoui (2010):

• In random matrix limit,

• Kernel embedded classifiers ~ linear classifiers

Implications for DWD:

Recall main advantage is for high d,

so not clear embedding helps;

thus not yet implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall intuition from above:

Key is sizes of biological subtypes:

differing ratio trips up mean,

but DWD more robust

Mathematics behind this?


HDLSS Discrim'n Simulations

Main idea: Comparison of

• SVM (Support Vector Machine)

• DWD (Distance Weighted Discrimination)

• MD (Mean Difference, a.k.a. Centroid)

Linear versions, across dimensions

Overall Approach: study different known phenomena:

– Spherical Gaussians

– Outliers

– Polynomial Embedding

• Common sample sizes: $n_+ = n_- = 25$

• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$
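Of the three methods, only MD is simple enough to sketch here without an SVM/DWD solver; even so, the effect of growing $d$ with fixed $n = 25$ per class is already visible. A hedged sketch (not the authors' simulation code; the mean shift of 2.2 in one coordinate is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def md_test_error(d, n=25, n_test=500, shift=2.2):
    # Spherical Gaussians: class 0 at the origin, class 1 shifted in coordinate 0.
    mu = np.zeros(d); mu[0] = shift
    x0 = rng.standard_normal((n, d))
    x1 = rng.standard_normal((n, d)) + mu
    w = x1.mean(axis=0) - x0.mean(axis=0)             # MD (centroid) direction
    b = 0.5 * (x1.mean(axis=0) + x0.mean(axis=0)) @ w # midpoint threshold
    t0 = rng.standard_normal((n_test, d))
    t1 = rng.standard_normal((n_test, d)) + mu
    return (np.mean(t0 @ w > b) + np.mean(t1 @ w <= b)) / 2

errors = {d: md_test_error(d) for d in (10, 40, 100, 400, 1600)}
print(errors)   # error grows with d: estimation noise accumulates in the direction
```

With the signal fixed and $n$ fixed, the estimated direction picks up noise in every added coordinate, so test error climbs toward 0.5 as $d$ grows.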

HDLSS Discrim'n Simulations

Simulation settings (figures): Spherical Gaussians; Outlier Mixture; Wobble Mixture; Nested Spheres; …

Interesting Phenomenon:

All methods come together in very high dimensions

Can we say more about: all methods come together in very high dimensions?

Mathematical Statistical Question: mathematics behind this?

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)

• All are same distance from the other class

• i.e. everything is a support vector

• i.e. all sensible directions show "data piling"

• so "sensible methods are all nearly the same"
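The simplex picture rests on concentration: for standard Gaussian data in high $d$, every vector has length $\approx d^{1/2}$ and every pair of points is $\approx (2d)^{1/2}$ apart, so $n$ points form a near-regular simplex. A quick numerical check (sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 20000, 10
Z = rng.standard_normal((n, d))

# Each norm / sqrt(d) ~ 1: points lie near a sphere of radius sqrt(d).
norms = np.linalg.norm(Z, axis=1) / np.sqrt(d)
# Each pairwise distance / sqrt(2d) ~ 1: all pairs roughly equidistant.
iu = np.triu_indices(n, k=1)
dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)[iu] / np.sqrt(2 * d)
print(norms.min(), norms.max(), dists.min(), dists.max())
```

Equal norms plus equal pairwise distances is exactly the regular $n$-hedron of the slide, which is why every point is equidistant from the other class and every sensible direction shows data piling.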

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}), not deterministic:

‖X_d‖ ≈ d^{1/2} w.p. 1/2,   ‖X_d‖ ≈ 10 d^{1/2} w.p. 1/2

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
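The two-scale behavior of ‖X_d‖ is easy to see by simulation (a sketch, not from the slides; d = 10000 and 200 draws are arbitrary choices):

```python
import numpy as np

def kent_norms(d=10_000, n_draws=200, seed=2):
    """Draws of ||X|| / d^(1/2) for X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)."""
    rng = np.random.default_rng(seed)
    scale = np.where(rng.random(n_draws) < 0.5, 1.0, 10.0)  # mixture component per draw
    Z = rng.standard_normal((n_draws, d))
    return scale * np.linalg.norm(Z, axis=1) / np.sqrt(d)

r = kent_norms()
```

The normalized norms concentrate near 1 or near 10, each with probability 1/2: ‖X_d‖/d^{1/2} is O_p(1) but has no deterministic limit.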

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have Technical Assumptions (usually ignored?), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better: Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, define

ρ(k) = sup_j sup { |Corr(f, g)| : f ∈ L_2(F_1^j), g ∈ L_2(F_{j+k}^∞) }

where F_a^b = Sigma-Field Generated by X_a, …, X_b   (Note: Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
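As an illustration of "uncorrelated at far lags" (not from the slides; an AR(1) process with φ = 0.8 is a standard example of a ρ-mixing sequence, and the series length is arbitrary):

```python
import numpy as np

def lag_corr(x, k):
    """Sample correlation between x_t and x_{t+k}."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

rng = np.random.default_rng(3)
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):        # AR(1): x_t = phi * x_{t-1} + eps_t
    x[t] = phi * x[t - 1] + eps[t]
```

Here corr(x_t, x_{t+k}) = φ^k, so correlations at far lags vanish; ρ-mixing strengthens this to correlations of arbitrary square-integrable functions of the past and future blocks.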

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors

X = (X_1, X_2, …, X_d)^t

are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),   where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version:

Z_d = Λ_d^{−1/2} U_d^t X_d

Assume Ǝ a permutation of the d entries, so that Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency: α > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size, so true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency: Angle(û_1, u_1) → 0

For Strong Inconsistency: Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_{i,j} = P_{û_j} x_i   (what we study in PCA scatterplots)

and s_{i,j} = P_{u_j} x_i,

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j,

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales, but Relationships are Still Useful

HDLSS Math Stat of PCA

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes;

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this:


HDLSS Discrim'n Simulations

Overall Approach:

• Study different known phenomena:
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding

• Common Sample Sizes: n_+ = n_− = 25

• But wide range of dimensions: d = 10, 40, 100, 400, 1600

HDLSS Discrim'n Simulations

Settings (simulation result plots):

• Spherical Gaussians
• Outlier Mixture
• Wobble Mixture
• Nested Spheres

Interesting Phenomenon:

All methods come together in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about "All methods come together in very high dimensions"?

A Mathematical Statistical Question: Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior, "everything similar for very high d":

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
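The "same distance" phenomenon is visible in a quick simulation (a sketch, not the slides' own code; d = 20000, n = 10 per class and a mean shift of 50 in the first coordinate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 20_000, 10
mu = np.zeros(d)
mu[0] = 50.0                              # mean shift separating the two classes
A = rng.standard_normal((n, d))           # class 1
B = rng.standard_normal((n, d)) + mu      # class 2

within = np.array([np.linalg.norm(A[i] - A[j])
                   for i in range(n) for j in range(i + 1, n)])
between = np.array([np.linalg.norm(a - b) for a in A for b in B])
```

All within-class distances are nearly equal (≈ √(2d), the simplex picture), and all between-class distances are nearly equal too, so every point sits at essentially the same distance from the other class — the data-piling picture.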

Straightforward Generalizations:

• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

HDLSS Asy's Geometrical Represen'tion

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in sense: for λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d ≥ 0, assume

(Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)   (min possible value is 1/d)

(much weaker than previous mixing conditions…)

Background: In classical multivariate analysis, the statistic

ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)

is called the "epsilon statistic",

and is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

Can show epsilon statistic satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε ≈ 1/d

So assumption (equivalently, d · ε → ∞) is very mild,

Much weaker than mixing conditions
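The epsilon statistic and its two extremes are simple to compute (an illustrative sketch; d = 100 and the extreme eigenvalue 10^6 are arbitrary choices):

```python
import numpy as np

def epsilon_stat(eigenvalues):
    """Sphericity 'epsilon': (sum lam)^2 / (d * sum lam^2), always in [1/d, 1]."""
    lam = np.asarray(eigenvalues, dtype=float)
    d = lam.size
    return float(lam.sum() ** 2 / (d * (lam ** 2).sum()))

d = 100
eps_sphere = epsilon_stat(np.ones(d))                 # all eigenvalues equal
eps_spike = epsilon_stat(np.r_[1e6, np.ones(d - 1)])  # one extreme eigenvalue
```

The spherical case attains the maximum ε = 1, and a single dominant eigenvalue drives ε down to its minimum 1/d.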

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large

Then, for i ≠ j:

X_i^t X_j = o_p(d)

Not so strong as before:

‖Z_1 − Z_2‖² / (2d) = 1 + o_p(1)
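A quick numerical check of both statements for pure standard Gaussian noise (an illustrative sketch; d = 10000 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 10_000
X1 = rng.standard_normal(d)
X2 = rng.standard_normal(d)

# inner product of independent vectors is o_p(d): this quantity ~ d^{-1/2}
inner_over_d = float(X1 @ X2) / d
# squared distance concentrates at 2d: this ratio ~ 1 + o_p(1)
sq_dist_over_2d = float(np.sum((X1 - X2) ** 2)) / (2 * d)
```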

2nd Paper on HDLSS Asymptotics

Can we improve on X_i^t X_j = o_p(d)?

John Kent example: Normal scale mixture,

X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d), indep.

Won't get: X_i^t X_j = C · d^{1/2} · O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d):

• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

• Random Variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: Not Using Multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define:

Y = X,    if |X| ≤ c
Y = −X,   if |X| > c

Choose c to make cov(X, Y) = 0:

• Distribution is degenerate: supported on diagonal lines,
  not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0

Result: Joint distribution of X and Y:

– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals!
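The construction can be carried out numerically (a sketch, not the slides' own computation; the bisection bracket and Monte Carlo size are arbitrary choices):

```python
import math
import numpy as np

def cov_xy(c):
    """cov(X, Y) for Y = X on {|X| <= c}, Y = -X on {|X| > c}, X ~ N(0, 1).
    cov = E[X^2; |X| <= c] - E[X^2; |X| > c] = 2 * E[X^2; |X| <= c] - 1."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)   # standard normal density
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))          # standard normal cdf
    ex2_inside = (2 * Phi - 1) - 2 * c * phi              # E[X^2 1{|X| <= c}]
    return 2 * ex2_inside - 1

# bisection for the cutoff c* with cov(X, Y) = 0 (cov_xy is increasing in c)
lo, hi = 0.1, 5.0
for _ in range(80):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2

# Monte Carlo check: covariance ~ 0, yet Y is a deterministic function of X
rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c_star, x, -x)
emp_cov = float(x @ y) / x.size
```

The cutoff comes out near c* ≈ 1.54; the empirical covariance is ≈ 0 even though Y = ±X is completely determined by X, and Y is still N(0, 1) by symmetry.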

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea: feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version),
   Qiao et al. (2010)


But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Discrim'n Simulations

Spherical Gaussians

HDLSS Discrim'n Simulations

Outlier Mixture

HDLSS Discrim'n Simulations

Wobble Mixture

HDLSS Discrim'n Simulations

Nested Spheres

HDLSS Discrim'n Simulations

…

Interesting Phenomenon:

All methods come together in very high dimensions

HDLSS Discrim'n Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
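The distance concentration behind this picture is easy to check numerically. A minimal sketch (pure-noise Gaussian classes; the dimension, sample sizes, and seed are our own choices, not the simulation settings of these slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes of n pure-noise Gaussian vectors in dimension d.
d, n = 20000, 10
class1 = rng.standard_normal((n, d))
class2 = rng.standard_normal((n, d))

# All cross-class distances, scaled by sqrt(d).
scaled = np.array([[np.linalg.norm(x - y) / np.sqrt(d) for y in class2]
                   for x in class1])

# Geometric representation: every scaled distance is close to sqrt(2), so
# each point is nearly equidistant from the entire other class -- the
# "everything is a support vector" / data-piling picture.
print(scaled.min(), scaled.max())
```

All 100 cross-class distances agree to a few percent, which is the sense in which each class looks like a rigid simplex from the other class's viewpoint.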

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in sense:

For ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²), assume d·ε → ∞, i.e. ε ≫ 1/d (min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic

ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"
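The epsilon statistic is a one-liner on a spectrum of eigenvalues. A minimal sketch (the function name `epsilon_stat` and the two example spectra are ours):

```python
import numpy as np

def epsilon_stat(eigvals):
    """Epsilon (sphericity) statistic of a covariance spectrum:
    (sum lambda_j)^2 / (d * sum lambda_j^2)."""
    lam = np.asarray(eigvals, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                      # all eigenvalues equal
eps_spiked = epsilon_stat([float(d) ** 2] + [1.0] * (d - 1))  # one huge eigenvalue
print(eps_spherical, eps_spiked)
```

By Cauchy-Schwarz, 1/d ≤ ε ≤ 1: the spherical spectrum hits the upper bound exactly, while a single dominant eigenvalue drives ε down to roughly 1/d.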

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε ≈ 1/d
• So assumption d·ε → ∞ is very mild
• Much weaker than mixing conditions

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, and no eigenvalues too large

Then, for i ≠ j: X_iᵗ X_j = o_p(d), i.e. ‖X_i − X_j‖² = 2d + o_p(d)

Not so strong as before (convergence in probability, not almost sure)

2nd Paper on HDLSS Asymptotics

Can we improve on X_iᵗ X_j = o_p(d)?

John Kent example: Normal scale mixture

X ~ 0.5·N_d(0, I_d) + 0.5·N_d(0, 100·I_d)

Won't get ‖X_i − X_j‖² = C·d·(1 + o_p(1)) for any single constant C
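Kent's example is easy to see in simulation. A hedged sketch (our own sizes and seed): the scaled squared distances cluster near 2, 101, or 200 depending on which mixture components the two vectors came from, so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5000, 40

# Kent's normal scale mixture: each vector is N(0, I_d) or N(0, 100 I_d),
# with probability 1/2 each (variance 100 means standard deviation 10).
sd = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * sd[:, None]

# Scaled squared distances over all distinct pairs.
vals = np.array([np.sum((X[i] - X[j]) ** 2) / d
                 for i in range(n) for j in range(i + 1, n)])

# Each value is near sd_i^2 + sd_j^2, i.e. near 2, 101 or 200 -- three
# different limits, so the geometric representation has no single scale.
print(vals.min(), vals.max())
```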

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ 0.5·N_d(0, I_d) + 0.5·N_d(0, 100·I_d):

• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 ⇒ Independence (false, as the next slides show)

0 Covariance is not independence

Simple Example:

• Random Variables X and Y
• Make both Gaussian: X, Y ~ N(0,1)
(Note: Not Using Multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define:

Y = X,   when |X| ≤ c
Y = −X,  when |X| > c
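The cutoff c can be found explicitly for X ~ N(0,1): cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c], which vanishes exactly when E[X²; |X| ≤ c] = 1/2. A minimal numerical sketch (bisection plus a Monte Carlo check; the helper name `trunc_second_moment` is ours):

```python
import math
import numpy as np

# For X ~ N(0,1):  E[X^2; |X| <= c] = (2*Phi(c) - 1) - 2*c*phi(c),
# by integration by parts; cov(X, Y) = 0 exactly when this equals 1/2.
def trunc_second_moment(c):
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return (2 * Phi - 1) - 2 * c * phi

lo, hi = 0.1, 5.0                 # bisection: the moment increases in c
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if trunc_second_moment(mid) < 0.5:
        lo = mid
    else:
        hi = mid
c = 0.5 * (lo + hi)

# Monte Carlo check: covariance ~ 0, yet Y is a deterministic function of X.
rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)
Y = np.where(np.abs(X) <= c, X, -X)
print(c, np.cov(X, Y)[0, 1])
```

The solution is c ≈ 1.54; the simulated covariance is near zero even though |Y| = |X| always, i.e. the dependence is as strong as possible.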

0 Covariance is not independence

Simple Example: choose c to make cov(X, Y) = 0

• Distribution is degenerate
• Supported on diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

0 Covariance is not independence

Result: Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more than Gaussian Marginals

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling sampling variation) (something like mean vs. median), Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁ = d^α, λ₂ = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u₁ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁, …, λ̂_d, û₁ as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^{1/2} (Recall on Scale of Variance)

For α > 1: Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
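The consistency / strong-inconsistency dichotomy shows up clearly in simulation. A minimal sketch of the spike model (our own d, n, and seed; a moderate d can only suggest the d → ∞ limits):

```python
import numpy as np

rng = np.random.default_rng(3)

def first_pc_angle(alpha, d=10000, n=20):
    """Angle (degrees) between sample PC1 and the true spike direction e_1
    in the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)          # give axis 1 variance d**alpha
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cosine = min(abs(Vt[0, 0]), 1.0)     # |<u_hat_1, e_1>|
    return np.degrees(np.arccos(cosine))

angle_consistent = first_pc_angle(alpha=2.0)    # alpha > 1: angle near 0
angle_inconsistent = first_pc_angle(alpha=0.5)  # alpha < 1: angle near 90
print(angle_consistent, angle_inconsistent)
```

With α above the critical value the sample eigenvector locks onto the spike; below it, the noise sphere swallows the spike and the angle drifts toward 90°.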

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂₁ / λ₁ →_L χ²_n / n, as d → ∞

Eigenvalues Inconsistent (for fixed n)

But Known Distribution

Consistent when n → ∞ as Well
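The χ²_n / n limit can be checked by simulation. A sketch under our own settings (α = 1.5, small n, moderate d), comparing the mean and variance of λ̂₁/λ₁ with E[χ²_n/n] = 1 and Var[χ²_n/n] = 2/n:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 2000, 10, 1.5
lam1 = float(d) ** alpha
reps = 300

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                     # spike on the first axis
    # Largest eigenvalue of the sample covariance (1/n) X^T X, computed
    # from the n x n Gram matrix (same nonzero spectrum).
    ratios[r] = np.linalg.eigvalsh(X @ X.T / n).max() / lam1

# For alpha > 1 and n fixed, the ratios should look like chi^2_n / n draws:
# mean ~ 1 (so no systematic bias), variance ~ 2/n (so no consistency).
print(ratios.mean(), ratios.var())
```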

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example: X ~ 0.5·N_d(0, I_d) + 0.5·N_d(0, 100·I_d)

Can only say: ‖X‖ / d^{1/2} → 1 (w.p. 1/2) or 10 (w.p. 1/2), not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored…), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference: Bradley (2005 update of 1986 version)
• Better: Newer References

Mixing Conditions

Mixing Condition Used Here: Rho – Mixing

For Random Variables X₁, X₂, X₃, …, define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_{−∞}^{j}), g ∈ L²(F_{j+k}^{∞}) }

where F_{−∞}^{j} and F_{j+k}^{∞} are the Sigma-Fields Generated by:
• …, X_{j−1}, X_j
• X_{j+k}, X_{j+k+1}, …
• Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags
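A standard ρ-mixing example is a Gaussian AR(1) sequence, whose mixing coefficient decays geometrically in the lag. A minimal sketch (lag correlations only, which for a Gaussian sequence match the maximal correlations; the parameters and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(5)

# Gaussian AR(1): X_t = a*X_{t-1} + e_t.  This sequence is rho-mixing,
# with rho(k) decaying geometrically in the lag k (like |a|**k).
a, T = 0.7, 50_000
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = a * x[t - 1] + e[t]

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 20)]
print(corrs)   # decaying toward 0 at far lags
```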

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X₁, X₂, …, X_d)ᵗ are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{−1/2} U_dᵗ X_d

Assume Ǝ a permutation of the entries of Z_d, so that the permuted Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at: PCA Consistency – α > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption α > 1 is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall:

RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency – α < 1 spike

Consistency – α > 1 spike

HDLSS Math Stat of PCA

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

HDLSS PCA Often Finds Signal, Not Pure Noise

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful
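The "proportional errors" point can be illustrated numerically. A hedged sketch in the spike model (our own sizes and seed; here the true scores are just the first coordinates, since u₁ = e₁): empirical and true PC1 scores are nearly proportional across the sample, so the scatterplot shape survives even though the scale need not.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 20000, 30
lam1 = float(d)                      # a strong spike, for illustration only

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)             # true PC1 direction is u_1 = e_1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])   # empirical PC1, sign-matched to e_1

s_true = X[:, 0]                     # true PC1 scores (projections on u_1)
s_hat = X @ u1_hat                   # empirical PC1 scores

# Proportional errors: s_hat is close to a common multiple of s_true, so
# the *relationships* among scores survive even if their scale does not.
corr = np.corrcoef(s_true, s_hat)[0, 1]
print(corr)
```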

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency – α < 1 spike

Consistency – α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer, El Karoui (2010):

• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps, Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Discrim'n Simulations

Outlier Mixture

HDLSS Discrim'n Simulations

Wobble Mixture

HDLSS Discrim'n Simulations

Nested Spheres

HDLSS Discrim'n Simulations

…

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"
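
These simple Laws of Large Numbers can be watched directly in a small simulation (an illustrative sketch, not from the slides; the choices d = 20000 and n = 5 are arbitrary): for Z ~ N(0, I_d), squared entries average to 1, so lengths concentrate near d^{1/2}, pairwise distances near (2d)^{1/2}, and pairwise angles near 90°.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20000, 5
Z = rng.standard_normal((n, d))   # n draws from N(0, I_d)

# Lengths: ||Z_i|| / sqrt(d) -> 1, by the LLN applied to the mean of Z_ij^2
print(np.linalg.norm(Z, axis=1) / np.sqrt(d))

# Pairwise distances: ||Z_i - Z_j|| / sqrt(2 d) -> 1
i, j = np.triu_indices(n, k=1)
print(np.linalg.norm(Z[i] - Z[j], axis=1) / np.sqrt(2 * d))

# Pairwise angles: cos(angle) = O_p(d^{-1/2}), i.e. nearly perpendicular
unit = Z / np.linalg.norm(Z, axis=1, keepdims=True)
print(np.degrees(np.arccos((unit[i] * unit[j]).sum(axis=1))))
```

So the n points sit, to first order, at the vertices of a regular simplex on a sphere of radius d^{1/2}, which is the geometric representation described above.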

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For λ_1 ≥ λ_2 ≥ … ≥ λ_d, assume

(λ_1^2 + … + λ_d^2) / (λ_1 + … + λ_d)^2 = o(1)

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For λ_1 ≥ λ_2 ≥ … ≥ λ_d, assume

(λ_1^2 + … + λ_d^2) / (λ_1 + … + λ_d)^2 = o(1)

(1/d is the min possible value of this ratio)

(much weaker than the previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

ε = (λ_1 + … + λ_d)^2 / (d (λ_1^2 + … + λ_d^2))

is called the "epsilon statistic"

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

ε = (λ_1 + … + λ_d)^2 / (d (λ_1^2 + … + λ_d^2))

is called the "epsilon statistic",

and is used to test "sphericity" of the dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

ε = (λ_1 + … + λ_d)^2 / (d (λ_1^2 + … + λ_d^2))

satisfies:  1/d ≤ ε ≤ 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1

• For spherical Normal:  ε = 1

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1

• For spherical Normal:  ε = 1

• Single extreme eigenvalue gives:  ε ≈ 1/d

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1

• For spherical Normal:  ε = 1

• Single extreme eigenvalue gives:  ε ≈ 1/d

• So the assumption is very mild

• Much weaker than mixing conditions
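
These facts about ε are easy to check numerically (an illustrative sketch; the helper name `epsilon_stat` is mine, not from the paper):

```python
import numpy as np

def epsilon_stat(eigvals):
    """Classical sphericity measure: eps = (sum lam)^2 / (d * sum lam^2), 1/d <= eps <= 1."""
    lam = np.asarray(eigvals, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))                   # spherical case: exactly 1
print(epsilon_stat(np.r_[1e9, np.ones(d - 1)]))   # one dominant eigenvalue: close to 1/d
```

A spectrum needs to be nearly as degenerate as the second case to violate the assumption, which is why it is so mild.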

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large

Then, for i ≠ j:

‖X_i − X_j‖ = O_p(d^{1/2})

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large

Then:  ‖X_i − X_j‖ = O_p(d^{1/2})

Not so strong as before:  ‖Z_1 − Z_2‖ = (2d)^{1/2} + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on  ‖X_i − X_j‖ = O_p(d^{1/2})?

2nd Paper on HDLSS Asymptotics

Can we improve on  ‖X_i − X_j‖ = O_p(d^{1/2})?

John Kent example, Normal scale mixture:

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Can we improve on  ‖X_i − X_j‖ = O_p(d^{1/2})?

John Kent example, Normal scale mixture:

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Won't get:  ‖X_i − X_j‖ = C d^{1/2} + O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0  ⟹  Independence  (??)

X_i i.i.d. ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X, Y ~ N(0,1)

(Note: Not Using the Multivariate Gaussian)

0 Covariance is not independence

Simple Example

• Random Variables X and Y

• Make both Gaussian:  X ~ N(0,1)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X on {|X| ≤ c},   Y = −X on {|X| > c}

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example:  choose c to make cov(X, Y) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

• For small c, have  cov(X, Y) < 0

• For large c, have  cov(X, Y) > 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

• For small c, have  cov(X, Y) < 0

• For large c, have  cov(X, Y) > 0

• By continuity, Ǝ c with  cov(X, Y) = 0

0 Covariance is not independence

Result

• Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0

0 Covariance is not independence

Result

• Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

0 Covariance is not independence

Result

• Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

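
The construction is easy to check by Monte Carlo (a sketch; the crossing point near c ≈ 1.54 is found numerically here, it is not quoted from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)   # X ~ N(0,1)

def flip(x, c):
    """Y = X on {|X| <= c}, Y = -X on {|X| > c}; Y is again N(0,1) by symmetry."""
    return np.where(np.abs(x) <= c, x, -x)

# cov(X,Y) = E[X^2; |X| <= c] - E[X^2; |X| > c] runs from -1 (c = 0) up to +1 (c -> inf)
for c in (0.5, 1.54, 3.0):
    print(c, round(float(np.mean(x * flip(x, c))), 3))
# near c ~ 1.54 the covariance crosses 0, yet Y = +/- X is a deterministic function of X
```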

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of sampling variation) (something like mean vs. median)

Hall Marron Neeman (2005)

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of sampling variation) (something like mean vs. median)

Hall Marron Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall Marron Neeman (2005)

HDLSS Asy's Geometrical Represen'tion: Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of sampling variation) (something like mean vs. median)

Hall Marron Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall Marron Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ_1 = d^α,  λ_2 = … = λ_d = 1

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ_1 = d^α,  λ_2 = … = λ_d = 1

Note: Critical Parameter α

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ_1 = d^α,  λ_2 = … = λ_d = 1

1st Eigenvector: u_1

Turns out: Direction Doesn't Matter

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ_1 = d^α,  λ_2 = … = λ_d = 1

1st Eigenvector: u_1

How Good are the Empirical Versions λ̂_1, û_1 as Estimates of λ_1, u_1?

Consistency (big enough spike):

For α > 1:  Angle(û_1, u_1) → 0

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1:  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1:  Angle(û_1, u_1) → 90°
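
This dichotomy can be watched in a small simulation (a sketch under the spike model above, with u_1 = e_1; n = 20 and d = 100000 are arbitrary choices, and `pc1_angle_deg` is an illustrative helper of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20

def pc1_angle_deg(d, alpha):
    """Spike model: lam_1 = d^alpha, lam_2 = ... = lam_d = 1, u_1 = e_1.
    Returns the angle (degrees) between the first empirical PC direction and e_1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)          # sd of first coordinate is d^{alpha/2}
    Vt = np.linalg.svd(X, full_matrices=False)[2]
    c = min(abs(Vt[0, 0]), 1.0)            # |<u1_hat, e_1>|, sign-invariant
    return float(np.degrees(np.arccos(c)))

print(pc1_angle_deg(100_000, 1.5))   # alpha > 1: small angle (consistency)
print(pc1_angle_deg(100_000, 0.5))   # alpha < 1: large angle (tending to 90 degrees)
```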

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (recall: λ is on the scale of variance):

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (recall: λ is on the scale of variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n   as d → ∞, n fixed

HDLSS Math Stat of PCA

Consistency of eigenvalues:

Eigenvalues Inconsistent

λ̂_1 / λ_1 →_L χ²_n / n

HDLSS Math Stat of PCA

Consistency of eigenvalues:

Eigenvalues Inconsistent

But Known Distribution:

λ̂_1 / λ_1 →_L χ²_n / n

HDLSS Math Stat of PCA

Consistency of eigenvalues:

Eigenvalues Inconsistent

But Known Distribution:

λ̂_1 / λ_1 →_L χ²_n / n

Consistent when n → ∞ as well

HDLSS Math Stat of PCA
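
The χ²_n / n limit is also easy to see in simulation (a sketch; n, d, α and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, alpha, reps = 10, 20000, 1.5, 300
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                    # spike along e_1
    s1 = np.linalg.svd(X, compute_uv=False)[0]  # top singular value
    ratios[r] = (s1 ** 2 / n) / lam1            # lam1_hat / lam1

# Fluctuates like chi^2_n / n: mean ~ 1, variance ~ 2/n (= 0.2 here),
# so the eigenvalue is NOT consistent for fixed n, but its limit law is known
print(round(float(ratios.mean()), 2), round(float(ratios.var()), 2))
```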

Conditions for Geo Rep'n & PCA Consist.

John Kent example:  X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:  X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say:  ‖X_d‖ ≈ d^{1/2} w.p. 1/2,  ‖X_d‖ ≈ 10 d^{1/2} w.p. 1/2

so ‖X_d‖ = O_p(d^{1/2}), not deterministic

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:  X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say:  ‖X_d‖ = O_p(d^{1/2}), not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:  X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say:  ‖X_d‖ = O_p(d^{1/2}), not deterministic

But for Geo Rep'n, need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

E.g.: Independent and Identically Dist'd

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

Mixing Conditions

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005 update of 1986 version)

• Better Newer References:

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

For the Sigma-Fields Generated by:
• the "past", X_1, …, X_j

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

For the Sigma-Fields Generated by:
• the "past", X_1, …, X_j
• the "future", X_{j+k}, X_{j+k+1}, …

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

For the Sigma-Fields Generated by:
• the "past", X_1, …, X_j
• the "future", X_{j+k}, X_{j+k+1}, …
• Note: Gap of Lag k

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Assume:  ρ(k) → 0  as  k → ∞

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j) , σ(X_{j+k}, X_{j+k+1}, …) )

where  ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Assume:  ρ(k) → 0  as  k → ∞

Idea Uncorrelated at Far Lags

Mixing Conditions
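
A standard concrete instance (a sketch, not from the slides) is a Gaussian AR(1) sequence: for a Gaussian process the maximal correlation between past and future coincides with the ordinary lag correlation, here φ^k, which dies off at far lags, so the sequence is ρ-mixing.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.8, 200_000

# AR(1): X_t = phi * X_{t-1} + e_t, a classic rho-mixing sequence
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

# corr(X_t, X_{t+k}) = phi^k -> 0: "uncorrelated at far lags"
for k in (1, 5, 20):
    print(k, round(float(np.corrcoef(x[:-k], x[k:])[0, 1]), 3))
```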

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume the Entries of the Data Vectors

X = (X_(1), X_(2), …, X_(d))ᵗ

are ρ-mixing

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

X = (X_(1), X_(2), …, X_(d))ᵗ

Conditions for Geo Reprsquon

Series of Technical Improvements

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010); Yata & Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ  (eigen-decomposition)

Note: Not (necessarily) Gaussian

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ

Define the Standardized Version:  Z_d = Λ_d^{−1/2} U_dᵗ X_d

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ

Define:  Z_d = Λ_d^{−1/2} U_dᵗ X_d

Assume Ǝ a permutation of the entries of Z_d so that they are ρ-mixing

HDLSS Math Stat of PCA

Careful look at

PCA Consistency: α > 1 spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

Careful look at

PCA Consistency: α > 1 spike

Independent of Sample Size

So true for n = 1 (!)

HDLSS Math Stat of PCA

Careful look at

PCA Consistency: α > 1 spike

Independent of Sample Size

So true for n = 1 (!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12:  d ~ 1700,  n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters show Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

HDLSS Math Stat of PCA

Recall Theoretical Separation

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall, for Consistency (α > 1):  Angle(û_1, u_1) → 0

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall, for Consistency (α > 1):  Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1):  Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores  ŝ_ij = P_{v̂_j} x_i   (projection of x_i onto v̂_j)

(What we study in PCA scatterplots)

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores  ŝ_ij = P_{v̂_j} x_i  and  s_ij = P_{v_j} x_i

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores  ŝ_ij = P_{v̂_j} x_i  and  s_ij = P_{v_j} x_i

Can Show:  ŝ_ij / s_ij → R_j   (R_j Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

ŝ_ij / s_ij → R_j

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization of R_j for all i = 1, …, n

HDLSS Math Stat of PCA

ŝ_ij / s_ij → R_j

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

ŝ_ij / s_ij → R_j

In PCA Consistency

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Ǝ interesting Limit Distn's

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

• In Random Matrix Limit

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Discrimrsquon Simulations

Wobble Mixture

HDLSS Discrimrsquon Simulations

Nested Spheres

HDLSS Discrimrsquon Simulations

hellip

Interesting Phenomenon

All methods come together

in very high dimensions

HDLSS Discrimrsquon Simulations

Can we say more about

All methods come together

in very high dimensions

Mathematical Statistical Question

Mathematics behind this

(Use Geometric Representation)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small $c$, have $\mathrm{cov}(X, Y) > 0$

• For large $c$, have $\mathrm{cov}(X, Y) < 0$

• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$

0 Covariance is not independence

Result:

• Joint distribution of $X$ and $Y$:
  – Has Gaussian marginals
  – Has $\mathrm{cov}(X, Y) = 0$
  – Yet strong dependence of $X$ and $Y$
  – Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals
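A minimal numerical sketch of this construction (the sign-flip form of $Y$ matches the definition above; the bisection search for the zero-covariance threshold is my own implementation choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)   # X ~ N(0,1)

def cov_xy(c):
    # Y = X on {|X| > c}, Y = -X on {|X| <= c}; by symmetry Y is still N(0,1)
    y = np.where(np.abs(x) > c, x, -x)
    return float(np.mean(x * y))     # = E[XY]; both means are 0

# cov = +1 at c = 0 and tends to -1 as c grows; bisect for the zero crossing
lo, hi = 0.0, 5.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) > 0 else (lo, mid)
c_star = 0.5 * (lo + hi)

y = np.where(np.abs(x) > c_star, x, -x)
print(c_star, np.mean(x * y))            # cov(X, Y) ~ 0 at c_star
print(np.corrcoef(x ** 2, y ** 2)[0, 1]) # = 1: |Y| = |X|, total dependence
```

Both marginals are standard Gaussian, the covariance vanishes at the fitted threshold, yet $|Y| = |X|$ always, so the pair is as dependent as possible: 0 covariance without independence.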

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive feeling about sampling variation)
   (something like mean vs. median)
   Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified
   Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)
   Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model: Paul (2007)

For Eigenvalues: $\lambda_1 = d^{\alpha}$, $\quad \lambda_2 = \cdots = \lambda_d = 1$, $\quad \alpha > 0$

Note Critical Parameter: $\alpha$

1st Eigenvector: $u_1$

Turns out: Direction Doesn't Matter

How Good are Empirical Versions,

$\hat{\lambda}_1, \ldots, \hat{\lambda}_d, \; \hat{u}_1$,

as Estimates?

Consistency (big enough spike):

For $\alpha > 1$: $\quad \mathrm{Angle}(\hat{u}_1, u_1) \to 0$

Strong Inconsistency (spike not big enough):

For $\alpha < 1$: $\quad \mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$

HDLSS Math Stat of PCA

Intuition: Random Noise ~ $d^{1/2}$

For $\alpha > 1$ (Recall $d^{\alpha}$ is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$:

Spike Contained in Pure Noise Sphere
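The consistency / strong inconsistency dichotomy shows up already in a quick simulation. This sketch is my own illustration of the spike model above (sample size $n = 20$, dimension $d = 20{,}000$, and the particular $\alpha$ values are arbitrary choices; the spike direction is taken to be the first coordinate axis, which the slides note does not matter):

```python
import numpy as np

rng = np.random.default_rng(2)

def angle_to_truth(alpha, d, n=20):
    # Spike model: lambda_1 = d**alpha along e1, lambda_2 = ... = lambda_d = 1
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)            # coordinate 1 gets variance d**alpha
    # PCA via SVD of the (mean-zero) data matrix; rows are observations
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos_ang = min(abs(Vt[0, 0]), 1.0)        # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos_ang)))

ang_big = angle_to_truth(alpha=1.5, d=20_000)    # alpha > 1: consistency
ang_small = angle_to_truth(alpha=0.3, d=20_000)  # alpha < 1: strong inconsistency
print(ang_big, ang_small)
```

For $\alpha > 1$ the empirical direction lands within a few degrees of the truth; for $\alpha < 1$ it is nearly orthogonal, matching the "spike contained in the pure noise sphere" intuition.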

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\dfrac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{\;\mathcal{L}\;} \dfrac{\chi^2_n}{n}$, as $d \to \infty$

Eigenvalues Inconsistent

But Known Distribution

Consistent when $n \to \infty$ as Well
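The "inconsistent but with a known distribution" point can be checked by Monte Carlo. A sketch under my own illustrative settings ($n = 5$, $d = 5000$, $\alpha = 2$, 400 replicates): the ratio $\hat{\lambda}_1/\lambda_1$ should behave like $\chi^2_n/n$, i.e. have mean near 1 and variance near $2/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, alpha, reps = 5, 5000, 2.0, 400
lam1 = float(d) ** alpha                        # spike eigenvalue d**alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)               # spike along e1
    s = np.linalg.svd(X, compute_uv=False)      # singular values, descending
    ratios[r] = (s[0] ** 2 / n) / lam1          # lam1_hat / lam1

print(ratios.mean(), ratios.var())              # compare with 1 and 2/n = 0.4
```

The ratio does not concentrate at 1 (inconsistency for fixed $n$), but its spread matches the $\chi^2_n/n$ law, and it tightens as $n$ grows.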

Conditions for Geo Rep'n & PCA Consist:

John Kent example:

$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$

Can only say: $\|X\| = \sqrt{d}\; O_p(1)$,

i.e. $\|X\| \approx \sqrt{d}$ w.p. $\tfrac{1}{2}$, $\quad \|X\| \approx 10 \sqrt{d}$ w.p. $\tfrac{1}{2}$,

not deterministic

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$

But for Geo Rep'n, need some Mixing Cond'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored!)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better Newer References?

Mixing Condition Used Here:

Rho-Mixing

For Random Variables $\{X_t\}$, Define:

$\rho(k) = \sup \left| \mathrm{corr}(f, g) \right|$

Where the sup is over square-integrable $f$ and $g$ measurable w.r.t.

Sigma-Fields Generated by:
• the past, $\{X_s : s \le t\}$
• the future, $\{X_s : s \ge t + k\}$
• Note: Gap of Lag $k$

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags
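As a minimal illustration of "uncorrelated at far lags", consider a Gaussian AR(1) sequence (my own choice of example; it is not on the slides, but it is a textbook mixing process, and for a Gaussian AR(1) the ρ-mixing coefficient works out to $|\phi|^k$, so plain lag correlations already tell the story):

```python
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.7, 100_000

# Gaussian AR(1): Z_t = phi * Z_{t-1} + e_t, a classic mixing sequence
z = np.empty(T)
z[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)   # start in stationarity
eps = rng.standard_normal(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + eps[t]

def lag_corr(k):
    # plain correlation across a gap of lag k; theory: phi**k
    return float(np.corrcoef(z[:-k], z[k:])[0, 1])

print([round(lag_corr(k), 3) for k in (1, 5, 10, 20)])
```

The correlation decays geometrically in the lag, so dependence across a wide gap is negligible, which is exactly what the ρ-mixing assumption buys in the HDLSS limit theorems.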

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X = (X_1, X_2, \ldots, X_d)^t$

Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since

Biometrika Refused...)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based,

No Mixing)

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

Note: Not Gaussian

Define Standardized Version:

$Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume $\exists$ a permutation $\pi_d$,

So that the permuted $Z_d$ is $\rho$-mixing

HDLSS Math Stat of PCA

Careful look at

PCA Consistency - $\alpha > 1$ spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

[Figure: Recall RNAseq Data From 8/23/12, d ~ 1700, n = 180]

[Figure: Manually Brushed Clusters, Clear Alternate Splicing, Not Noise]

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$

Because PC Scores (i.e. projections)

Not Consistent

For Scores $\hat{s}_{ij} = P_{\hat{u}_j} x_i$ and $s_{ij} = P_{u_j} x_i$

(What we study in PCA scatterplots)

Can Show: $\hat{s}_{ij} = R_j \, s_{ij}$, with $R_j$ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": $\hat{s}_{ij} = R_j \, s_{ij}$

Same Realization of $R_j$ for $i = 1, \ldots, n$

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

Result: $\exists$ interesting Limit Dist'ns

Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~

~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this?


HDLSS Discrim'n Simulations

Nested Spheres

…

Interesting Phenomenon:

All methods come together

in very high dimensions?!?

Can we say more about this?

Mathematical Statistical Question:

Mathematics behind this?

(Use Geometric Representation)

HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"
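The simplex picture behind "everything similar for very high d" can be seen directly: for standard Gaussian vectors in high dimension, every pairwise distance concentrates near $\sqrt{2d}$, so the sample looks like a regular n-hedron. A quick sketch (sample size and dimension are my own arbitrary choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, d = 10, 100_000
X = rng.standard_normal((n, d))

# all pairwise distances, on the sqrt(d) scale
dists = np.array([np.linalg.norm(X[i] - X[j])
                  for i, j in combinations(range(n), 2)])
scaled = dists / np.sqrt(d)

print(scaled.min(), scaled.max())  # both close to sqrt(2): a regular n-hedron
```

Since all interpoint distances are nearly equal, every sensible direction shows data piling, which is the Law-of-Large-Numbers mechanism the slides point to.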

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large, in sense:

For $\epsilon = \dfrac{\left( \sum_{j=1}^{d} \lambda_j \right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$,

assume $\epsilon \gg \tfrac{1}{d}$, i.e. $\tfrac{1}{\epsilon} = o(d)$

($\tfrac{1}{d}$ = min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

$\epsilon = \dfrac{\left( \sum_{j=1}^{d} \lambda_j \right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$

is called the "epsilon statistic",

And is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

Can show epsilon statistic satisfies: $\tfrac{1}{d} \le \epsilon \le 1$

• For spherical Normal, $\epsilon = 1$

• Single extreme eigenvalue gives $\epsilon \approx \tfrac{1}{d}$

• So the assumption $\epsilon \gg \tfrac{1}{d}$ is very mild

• Much weaker than mixing conditions
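The two extreme cases of the epsilon statistic are easy to verify numerically (the dimension and the size of the extreme eigenvalue below are my own arbitrary choices):

```python
import numpy as np

def epsilon_stat(lams):
    # epsilon = (sum lam_j)^2 / (d * sum lam_j^2); always 1/d <= epsilon <= 1
    lams = np.asarray(lams, dtype=float)
    d = lams.size
    return float(lams.sum() ** 2 / (d * (lams ** 2).sum()))

d = 1000
spike = np.r_[1_000_000.0, np.ones(d - 1)]   # one extreme eigenvalue

print(epsilon_stat(np.ones(d)))   # spherical: epsilon = 1
print(epsilon_stat(spike))        # near the minimum, 1/d
```

A spherical spectrum attains the maximum $\epsilon = 1$, while a single dominating eigenvalue drives $\epsilon$ down to roughly $1/d$, so requiring $\epsilon \gg 1/d$ only rules out spectra that are essentially one-dimensional.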

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large ($\tfrac{1}{\epsilon} = o(d)$)

Then: $\|X_i - X_j\| = \sqrt{d}\; O_p(1)$

Not so strong as before, where $\|Z_1 - Z_2\|^2 = 2d + O_p(1)$

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Conditions

Mixing Condition Used Here: Rho – Mixing

For Random Variables X₁, X₂, …, Define

ρ(k) = sup_j ρ( σ(X₁, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )

Where, for Sigma-Fields A, B (Generated by the indicated variables; Note Gap of Lag k):

ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Assume ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
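The "uncorrelated at far lags" idea can be seen in a textbook ρ-mixing sequence; a sketch (my illustration, not from the slides) using an AR(1) process, whose lag-k correlations decay geometrically like φᵏ:

```python
import numpy as np

rng = np.random.default_rng(1)

# AR(1) process X_t = phi * X_{t-1} + e_t, a standard example of a
# rho-mixing sequence: dependence dies off as the lag grows
phi, T = 0.6, 200_000
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 20)]
print(corrs)  # roughly phi**k: about 0.6, 0.08, then near 0
```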

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume Entries X₁, X₂, …, X_d of Data Vectors Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

(Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(−1/2) U_dᵗ X_d

Assume Ǝ a permutation of the d entries, So that the permuted Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency – α > 1 spike

(Reality Check Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency – α < 1 spike

Consistency – α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝᵢⱼ = P_{ûⱼ} xᵢ (What we study in PCA scatterplots) and sᵢⱼ = P_{uⱼ} xᵢ

Can Show: ŝᵢⱼ ≈ Rⱼ sᵢⱼ, with Rⱼ Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝᵢⱼ ≈ Rⱼ sᵢⱼ, Same Realization of Rⱼ for i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful
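The "proportional errors" point can be illustrated in a simulated spiked model (my sketch, with an assumed strong spike along the first coordinate, not the slides' own data): empirical PC1 scores track the true scores up to a common factor.

```python
import numpy as np

rng = np.random.default_rng(3)

# Spiked model with a strong (alpha > 1) spike: empirical PC1 scores are
# nearly a common multiple of the true scores, so scatterplots stay useful
d, n = 500, 40
lam1 = d ** 1.5                       # spike eigenvalue, alpha = 1.5
u1 = np.zeros(d); u1[0] = 1.0         # true first eigenvector
X = rng.standard_normal((n, d))       # unit-variance noise in all directions
X[:, 0] *= np.sqrt(lam1)              # inject the spike along u1
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1_hat = Vt[0]                        # empirical first eigenvector
s_true = Xc @ u1                      # true PC1 scores
s_hat = Xc @ u1_hat                   # empirical PC1 scores
r = np.corrcoef(s_true, s_hat)[0, 1]
print(abs(r))  # near 1: score errors are (nearly) proportional
```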

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency – α < 1 spike

Consistency – α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps, Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?


HDLSS Discrim'n Simulations

…

Interesting Phenomenon: All methods come together in very high dimensions

Can we say more about "All methods come together in very high dimensions"?

Mathematical Statistical Question: Mathematics behind this? (Use Geometric Representation)

HDLSS Asy's Geometrical Represent'n

Explanation of Observed (Simulation) Behavior: "everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
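The simplex claim is easy to check numerically; a minimal sketch (my illustration): standard normal points in huge dimension d are nearly equidistant, i.e. vertices of an approximate regular simplex.

```python
import numpy as np

rng = np.random.default_rng(4)

# Pairwise distances among n Gaussian points in huge dimension d
# concentrate at sqrt(2d): an approximate regular simplex
d, n = 50_000, 6
X = rng.standard_normal((n, d))
dists = [np.linalg.norm(X[i] - X[j]) for i in range(n) for j in range(i + 1, n)]
dists = np.array(dists) / np.sqrt(2 * d)   # scaled: all entries near 1
print(dists.round(3))
```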

HDLSS Asy's Geometrical Represent'n

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in sense:

For ε = ( Σⱼ₌₁ᵈ λⱼ )² / ( d · Σⱼ₌₁ᵈ λⱼ² ), assume 1/ε = o(d), i.e. ε ≫ 1/d (min possible)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic

ε = ( Σⱼ₌₁ᵈ λⱼ )² / ( d · Σⱼ₌₁ᵈ λⱼ² )

is called the "epsilon statistic", and is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies 1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1

• Single extreme eigenvalue gives ε ≈ 1/d

• So assumption ε ≫ 1/d is very mild

• Much weaker than mixing conditions
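The two extreme cases of the epsilon statistic can be computed directly; a short sketch (my illustration of the formula above):

```python
import numpy as np

# Epsilon (sphericity) statistic: eps = (sum lam)^2 / (d * sum lam^2),
# always in [1/d, 1]
def epsilon(lam):
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon(np.ones(d))              # spherical: eps = 1 exactly
lam_spike = np.ones(d); lam_spike[0] = 1e6    # one extreme eigenvalue
eps_spike = epsilon(lam_spike)                # close to the minimum 1/d
print(eps_sphere, eps_spike, 1 / d)
```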

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, Assume no eigenvalues too large (ε ≫ 1/d)

Then, for i ≠ j: Xᵢᵗ Xⱼ / d = o_p(1)

So ‖Z₁ − Z₂‖² = 2d · (1 + o_p(1))

Not so strong as before
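The distance concentration behind the geometric representation shows up immediately in simulation; a one-line check (my illustration) for independent standardized Gaussian vectors:

```python
import numpy as np

rng = np.random.default_rng(5)

# Squared distance between two standardized d-vectors concentrates at 2d
d = 100_000
z1, z2 = rng.standard_normal(d), rng.standard_normal(d)
ratio = np.sum((z1 - z2) ** 2) / (2 * d)   # should be 1 + o_p(1)
print(ratio)
```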

2nd Paper on HDLSS Asymptotics

Can we improve on Xᵢᵗ Xⱼ / d = o_p(1)?

John Kent example: Normal scale mixture

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get ‖Xᵢ − Xⱼ‖ = C · d^(1/2) + O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture: X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

• With strong dependence, yet 0 covariance

Given c > 0, define

Y = X, if |X| > c
Y = −X, if |X| ≤ c

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) > 0

• For large c, have cov(X, Y) < 0

• By continuity, Ǝ c with cov(X, Y) = 0

Result:

• Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows Multivariate Gaussian means more than Gaussian Marginals
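The sign-flip construction above can be carried out numerically, using bisection in c to locate zero covariance (a sketch of the slides' example; the bracket [0, 3] is my assumption):

```python
import numpy as np

rng = np.random.default_rng(6)

# Y = X for |X| > c, Y = -X for |X| <= c; cov(X, Y) decreases
# continuously in c, so bisection finds c with cov(X, Y) = 0
x = rng.standard_normal(2_000_000)

def cov_xy(c):
    y = np.where(np.abs(x) > c, x, -x)
    return np.mean(x * y)          # both means are 0, so this is the covariance

lo, hi = 0.0, 3.0                  # cov > 0 at c = 0, cov < 0 at c = 3
for _ in range(40):
    mid = (lo + hi) / 2
    if cov_xy(mid) > 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2
y = np.where(np.abs(x) > c_star, x, -x)
print(c_star, np.mean(x * y))      # covariance near 0, yet |Y| = |X| exactly
```

Both marginals stay standard Gaussian, and the perfect dependence |Y| = |X| survives, so the pair is far from bivariate normal.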

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling sampling variation) (something like mean vs. median) – Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified – Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version) – Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007): For Eigenvalues

λ₁,d = d^α, λ₂,d = ⋯ = λd,d = 1

Note Critical Parameter: α

1st Eigenvector: u₁ (Turns out Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁,d, …, λ̂d,d, û₁ as Estimates?

Consistency (big enough spike): For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall λ on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
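The consistency / strong inconsistency split can be seen in a small simulation (my sketch; the specific d, n, α values are assumptions chosen to sit clearly on each side of the α = 1 boundary):

```python
import numpy as np

rng = np.random.default_rng(7)

# Angle between empirical and true PC1 in the spike model
# lam_1 = d^alpha, other eigenvalues 1: small angle for alpha > 1,
# approaching 90 degrees for alpha < 1
def pc1_angle_deg(d, n, alpha, rng):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)     # spike along first coordinate
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)             # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

a_strong = pc1_angle_deg(d=20_000, n=20, alpha=1.5, rng=rng)
a_weak = pc1_angle_deg(d=20_000, n=20, alpha=0.5, rng=rng)
print(a_strong, a_weak)   # a few degrees vs. much closer to 90
```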

HDLSS Math Stat of PCA

Consistency of eigenvalues?

λ̂₁,d / λ₁,d →_L χ²ₙ / n

Eigenvalues Inconsistent

But Known Distribution

Consistent when n → ∞ as Well
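The χ²ₙ/n behavior of the eigenvalue ratio can be checked by simulation (my sketch, using the n × n Gram-matrix trick and an assumed very strong spike so the noise term is negligible):

```python
import numpy as np

rng = np.random.default_rng(8)

# For a strong spike, lam1_hat / lam1 behaves like chi^2_n / n:
# inconsistent for fixed n (variance about 2/n), but mean 1
d, n, alpha, reps = 2000, 10, 2.0, 300
lam1 = float(d) ** alpha
ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                 # spike along first coordinate
    G = X @ X.T / n                          # n x n Gram matrix (mean is 0)
    ratios.append(np.linalg.eigvalsh(G)[-1] / lam1)
ratios = np.array(ratios)
print(ratios.mean(), ratios.var())           # near 1 and near 2/n = 0.2
```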

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example: X_d ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^(1/2)), not deterministic:

‖X_d‖ / d^(1/2) → 1 w.p. 1/2, → 10 w.p. 1/2

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition
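Kent's example is easy to visualize numerically; a minimal sketch (my illustration): scaled norms land near 1 or near 10 depending on the mixture component, so there is no single deterministic limit.

```python
import numpy as np

rng = np.random.default_rng(9)

# Kent's scale mixture: ||X|| / sqrt(d) concentrates at 1 or at 10
# (each with probability 1/2), so O_p(d^{1/2}) but not deterministic
d, n = 100_000, 12
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)     # mixture component per vector
X = rng.standard_normal((n, d)) * scale[:, None]
r = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(r, 3))   # each entry near 1 or near 10
```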

  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Discrim'n Simulations

Can we say more about:

All methods come together

in very high dimensions?

Mathematical Statistical Question:

Mathematics behind this

(Use Geometric Representation)


HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)

• All are same distance from the other class

• i.e. everything is a support vector

• i.e. all sensible directions show "data piling"

• So "sensible methods are all nearly the same"
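The simplex picture is easy to see numerically; a minimal sketch (dimension, sample size, and seed are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 10  # illustrative dimension and sample size

# One population: n independent standard Gaussian vectors in R^d
X = rng.standard_normal((n, d))

# All pairwise distances concentrate near sqrt(2d), so the sample
# looks like a regular simplex (n-hedron) for large d
dists = np.array([np.linalg.norm(X[i] - X[j])
                  for i in range(n) for j in range(i + 1, n)])
ratio = dists / np.sqrt(2 * d)          # all entries close to 1

# Pairwise angles at the origin concentrate near 90 degrees
cos01 = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
angle = np.degrees(np.arccos(cos01))    # close to 90

print(ratio.min(), ratio.max(), angle)
```

Every distance lands within a fraction of a percent of √(2d), and every direction is nearly orthogonal to every other, which is exactly the rigid geometry that makes "all sensible methods nearly the same".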


HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild eigenvalue condition on theoretical cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"


2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments

Assume no eigenvalues too large, in sense:

For λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d, assume 1/ε = o(d), i.e. d·ε → ∞, where

ε = (Σ_j λ_j)² / (d · Σ_j λ_j²)   (sums over j = 1, …, d)

(min possible value of ε is 1/d)

(much weaker than previous mixing conditions…)


2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

ε = (Σ_j λ_j)² / (d · Σ_j λ_j²)   (sums over j = 1, …, d)

is called the "epsilon statistic",

and is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"


2nd Paper on HDLSS Asymptotics

Can show epsilon statistic ε = (Σ_j λ_j)² / (d · Σ_j λ_j²) satisfies:

1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1

• Single extreme eigenvalue gives: ε = 1/d

• So assumption is very mild

• Much weaker than mixing conditions
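The two extreme cases above can be checked directly; a small sketch (the formula for ε is as reconstructed on the slide, and the function name is illustrative):

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic for a vector of covariance eigenvalues lam."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000

# Spherical case: all eigenvalues equal -> epsilon hits its maximum, 1
eps_sphere = epsilon_stat(np.ones(d))

# Single extreme eigenvalue, all others negligible -> epsilon hits its minimum, 1/d
spike = np.zeros(d)
spike[0] = 1.0
eps_spike = epsilon_stat(spike)

print(eps_sphere, eps_spike, 1 / d)
```

So any eigenvalue pattern short of a single totally dominant direction keeps d·ε well away from its minimum, which is why the assumption is so mild.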


2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments

Assume no eigenvalues too large

Then:

||X_i − X_j|| = O_p(1) · √d

Not so strong as before:

||Z_1 − Z_2|| = √(2d) + O_p(1)


2nd Paper on HDLSS Asymptotics

Can we improve on ||X_i − X_j|| = O_p(1) · √d ?

John Kent example: Normal scale mixture

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d), i.i.d.

Won't get: ||X_i − X_j|| = C √d + O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)


2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 does not imply Independence
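A quick simulation of Kent's mixture (parameters as on the slide; sample sizes and seed are illustrative) shows why no single constant C can work: the vector norms live on two well-separated scales.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20_000, 200

# Kent's normal scale mixture: each data vector is N(0, I_d) or
# N(0, 100 I_d), each with probability 1/2
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]

# Norms concentrate at two different values, sqrt(d) and 10*sqrt(d),
# so ||X_i - X_j|| / sqrt(d) cannot converge to a single constant C
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(sorted(set(np.round(norms).tolist())))
```

Each individual norm still concentrates sharply (the entries are uncorrelated, so laws of large numbers apply coordinate-wise), but the limit is random: the geometric representation holds only conditionally on the mixture component.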


0 Covariance is not independence

Simple Example:

• Random variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: not using the multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X, when |X| ≤ c;  Y = −X, when |X| > c

0 Covariance is not independence

Simple Example: [scatterplots, with c chosen to make cov(X, Y) = 0]


0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, ∃ c with cov(X, Y) = 0


0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: multivariate Gaussian means more than Gaussian marginals
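The construction is easy to verify numerically; a sketch, assuming the version where Y = X on {|X| ≤ c} and Y = −X on {|X| > c} (the root c ≈ 1.54 solving cov(X, Y) = 0 is found by bisection; sample size and seed are illustrative):

```python
import math
import numpy as np

# cov(X, Y) = E[X^2; |X|<=c] - E[X^2; |X|>c] rises continuously
# from -1 (c = 0) to +1 (c -> infinity), so it crosses 0 once.
def cov_xy(c):
    # E[X^2; |X|<=c] = erf(c/sqrt(2)) - 2*c*phi(c), by integration by parts
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    inner = math.erf(c / math.sqrt(2)) - 2 * c * phi_c
    return inner - (1 - inner)          # inner minus outer part of E[X^2] = 1

lo, hi = 0.0, 5.0
for _ in range(60):                     # bisection for the root
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c_star = (lo + hi) / 2                  # about 1.54

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)
y = np.where(np.abs(x) <= c_star, x, -x)    # y is again N(0,1) by symmetry

print(c_star)
print(np.cov(x, y)[0, 1])                           # near 0: uncorrelated
print(np.corrcoef(np.abs(x), np.abs(y))[0, 1])      # 1: totally dependent
```

The last line makes the dependence vivid: |Y| = |X| exactly, so Y is a deterministic (sign-flipped) function of X even though the covariance vanishes.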


HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive feeling of sampling variation) (something like mean vs. median) — Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified — Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version) — Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA, in estimating eigen-directions & -values)

[Assume data are mean centered]


HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1

(Note critical parameter: α)

1st eigenvector: u_1 (turns out direction doesn't matter)

How good are empirical versions λ̂_1, …, λ̂_d, û_1 as estimates?


HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1:  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1:  Angle(û_1, u_1) → 90°


HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}

For α > 1 (recall λ_1 = d^α is on the scale of variance):

spike pops out of pure noise sphere

For α < 1:

spike contained in pure noise sphere
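A small simulation of the spike model makes the split visible (d, n, α values and seed are illustrative, chosen only to sit clearly on each side of the boundary):

```python
import numpy as np

def pc1_angle(alpha, d=2000, n=20, seed=0):
    """Angle (degrees) between the true and empirical first eigenvectors
    in the spike model: lambda_1 = d**alpha, all other eigenvalues 1."""
    rng = np.random.default_rng(seed)
    u1 = np.zeros(d)
    u1[0] = 1.0                                  # true first eigendirection
    z = rng.standard_normal((n, 1))              # spike scores
    X = np.sqrt(d ** alpha) * z * u1 + rng.standard_normal((n, d))
    # Top right singular vector of centered data = empirical PC1 direction
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return np.degrees(np.arccos(abs(Vt[0] @ u1)))

print(pc1_angle(1.5))   # alpha > 1: spike pops out, angle near 0
print(pc1_angle(0.4))   # alpha < 1: noise dominates, angle far from 0
```

For α > 1 the empirical direction locks onto u_1 even with tiny n, while for α < 1 it drifts toward orthogonality as d grows, exactly the consistency / strong-inconsistency dichotomy.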


HDLSS Math Stat of PCA

Consistency of eigenvalues?

λ̂_1 / λ_1 → χ²_n / n   (in distribution, as d → ∞)

Eigenvalues inconsistent

But known distribution

Consistent when n → ∞ as well


HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example: X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say: ||X|| = O_p(d^{1/2}), on one scale w.p. 1/2 and on the other w.p. 1/2 — not deterministic

PCA conditions same, since noise still O_p(d^{1/2})

But for Geo Rep'n, need some mixing cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Conclude: need some mixing condition


Mixing Conditions

Idea from probability theory:

Explore weaker assumptions, to still get:

Law of Large Numbers

Central Limit Theorem


Mixing Conditions

• A whole area in probability theory

• ∃ a large literature

• A comprehensive reference: Bradley (2005, update of 1986 version)

• Better, newer references?


Mixing Conditions

Mixing condition used here: rho-mixing

For random variables X_1, X_2, …, define

ρ(k) = sup |corr(f, g)|,

where f and g range over (square integrable) functions measurable w.r.t. the sigma-fields generated by:

• (X_1, …, X_j)

• (X_{j+k}, X_{j+k+1}, …)

• note gap of lag k

Assume: ρ(k) → 0 as k → ∞

Idea: uncorrelated at far lags
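As a toy instance, an AR(1) sequence has correlations decaying geometrically in the lag (a prototype of the mixing idea), and a law of large numbers still holds for its entries; a sketch (the AR coefficient, dimension, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, d = 0.5, 200_000

# AR(1) entries: X_j = phi * X_{j-1} + e_j, so corr(X_j, X_{j+k}) = phi**k,
# vanishing at far lags -- dependence, but only locally
x = np.empty(d)
x[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)   # stationary start
for j in range(1, d):
    x[j] = phi * x[j - 1] + rng.standard_normal()

# Law of large numbers still holds despite the dependence:
# ||X||^2 / d converges to the stationary variance 1 / (1 - phi^2)
print((x @ x) / d, 1 / (1 - phi ** 2))
```

This is the mechanism behind the geometric representation under mixing: norms and inner products are averages of weakly dependent terms, so they still concentrate.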


HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): assume entries of data vectors X = (X_1, X_2, …, X_d)ᵗ are ρ-mixing

Drawback: strong assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky point: classical mixing conditions require a notion of time ordering,

not always clear, e.g. microarrays


HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ   (Note: not Gaussian)

Define standardized version: Z_d = Λ_d^{−1/2} U_dᵗ X

Assume ∃ a permutation π_d, so that the permuted entries of Z_d are ρ-mixing


HDLSS Math Stat of PCA

Careful look at: PCA consistency — α > 1 spike

(Reality check suggested by reviewer:)

Condition is independent of sample size,

so true even for n = 1 (?!?)

Reviewer's conclusion: absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise

[Scatterplot: recall RNAseq data from 8/23/12, d ~ 1700, n = 180]

[Manually brushed clusters: clear alternate splicing, not noise]

Functional Data Analysis


HDLSS Math Stat of PCA

Recall theoretical separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!


HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA

Recall, for consistency (α > 1):  Angle(û_1, u_1) → 0

For strong inconsistency (α < 1):  Angle(û_1, u_1) → 90°


HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA,

because PC scores (i.e. projections) are not consistent

For scores ŝ_{ij} = P_{û_j} x_i (what we study in PCA scatterplots) and s_{ij} = P_{u_j} x_i,

can show: ŝ_{ij} / s_{ij} → R_j (random)

Thanks to Dan Shen


HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent

So how can PCA find useful signals in data?

Key is "proportional errors": ŝ_{ij} ≈ R_j s_{ij}, with the same realization of R_j for i = 1, …, n

Axes have inconsistent scales, but relationships are still useful
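The proportional-errors phenomenon can be seen in a simulation of the α < 1 spike model: empirical PC1 scores come out on an inflated scale, yet remain nearly a constant multiple of the true scores, so scatterplot relationships survive (all parameter choices and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 20_000, 50, 0.5            # spike too small for consistency

u1 = np.zeros(d)
u1[0] = 1.0                              # true first eigendirection
z = rng.standard_normal((n, 1))
X = np.sqrt(d ** alpha) * z * u1 + rng.standard_normal((n, d))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])       # empirical PC1, sign-matched to u1

s_true = X @ u1                          # true PC1 scores
s_hat = X @ u1_hat                       # empirical PC1 scores

corr = np.corrcoef(s_true, s_hat)[0, 1]
ratio = np.median(np.abs(s_hat) / np.abs(s_true))
print(corr, ratio)   # scores highly correlated, but on an inflated scale
```

The correlation stays close to 1 while the common scale factor sits well above 1: the errors are (nearly) proportional across the sample, which is why PCA scatterplots stay informative even in the inconsistent regime.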


HDLSS Deep Open Problem

In PCA consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

∃ interesting limit dist'ns: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from kernel embedding idea



HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer: El Karoui (2010)

• In random matrix limit,

• kernel embedded classifiers ~ linear classifiers


HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Implications for DWD: recall main advantage is for high d

So not clear embedding helps

Thus not yet implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall intuition from above: key is sizes of biological subtypes

Differing ratio trips up mean, but DWD more robust

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

ε = ( (1/d) Σ_{j=1}^d λ_j )² / ( (1/d) Σ_{j=1}^d λ_j² )

is called the "epsilon statistic",

and is used to test "sphericity" of distr'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal: ε = 1

• Single extreme eigenvalue gives: ε = 1/d

• So assumption ε⁻¹ = o(d) is very mild

• Much weaker than mixing conditions
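These epsilon-statistic facts can be verified directly. A quick numerical sketch (`epsilon_stat` is a helper named here for illustration, computing the normalized form above):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity measure: (mean eigenvalue)^2 / (mean squared eigenvalue)."""
    lam = np.asarray(lam, dtype=float)
    return lam.mean() ** 2 / (lam ** 2).mean()

d = 1000
spherical = np.ones(d)                  # all eigenvalues equal
one_spike = np.zeros(d)
one_spike[0] = 1.0                      # single extreme eigenvalue

print(epsilon_stat(spherical))          # 1.0  (max possible)
print(epsilon_stat(one_spike))          # 0.001 = 1/d  (min possible)
```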

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
and assume no eigenvalues too large. Then, for i ≠ j:

X_i' X_j = O_p(d^{1/2})

Not so strong as before: ‖Z_1 − Z_2‖ = (2d)^{1/2} + O_p(1)
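The O_p(d^{1/2}) inner-product bound shows up cleanly in simulation (a sketch assuming independent N(0, I_d) vectors; since Var(X_i' X_j) = d, the sample s.d. of the inner products should sit near d^{1/2}):

```python
import numpy as np

rng = np.random.default_rng(1)
for d in (100, 10_000):
    # inner products of 500 independent pairs of N(0, I_d) vectors
    U = rng.standard_normal((500, d))
    V = rng.standard_normal((500, d))
    ips = np.einsum('ij,ij->i', U, V)      # 500 draws of X_i' X_j
    print(d, ips.std() / np.sqrt(d))       # ratio near 1: X_i'X_j = O_p(d^{1/2})
```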

2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = O_p(d^{1/2})?

John Kent example: Normal scale mixture

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get: X_i' X_j = C d^{1/2} (1 + o_p(1)) for any single constant C

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture,
X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:
  Covariance = 0 does not imply Independence
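A small simulation sketch of these notes (sample size chosen for illustration): two entries of the Kent mixture have empirical covariance near 0, while their squares are clearly correlated, exposing the strong dependence.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
s = np.where(rng.random(n) < 0.5, 1.0, 10.0)     # scale 1 or 10, w.p. 1/2 each
X = s[:, None] * rng.standard_normal((n, 2))     # two entries of the mixture

cov12 = np.mean(X[:, 0] * X[:, 1])               # covariance of the two entries
dep = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]   # dependence via squares
print(cov12, dep)      # covariance near 0, yet squares clearly correlated
```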

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0,1)
  (Note: Not Using Multivariate Gaussian)

• With strong dependence, yet 0 covariance

Given c > 0, define:
  Y = X,  when |X| ≤ c
  Y = −X, when |X| > c

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c > 0, have cov(X,Y) < 0

• For large c, have cov(X,Y) > 0

• By continuity, ∃ c with cov(X,Y) = 0

Result: Joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X,Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
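The zero-covariance threshold c can be located numerically. A minimal Monte Carlo sketch (the sample of X is held fixed, so the empirical covariance is a deterministic increasing function of c and plain bisection applies):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(1_000_000)     # fixed sample of X ~ N(0,1)

def cov_xy(c):
    # Y = X when |X| <= c, Y = -X when |X| > c; |Y| = |X|, so Y is still N(0,1)
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y)              # empirical cov (both means are 0)

# On the fixed sample, cov_xy is increasing in c: negative for small c,
# positive for large c, so bisection finds the zero-covariance threshold.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c0 = 0.5 * (lo + hi)
print(c0, cov_xy(c0))                  # c0 roughly 1.5; covariance ~ 0
```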

HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For α > 1,
Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1,
Angle(û_1, u_1) → 90°

Intuition: Random Noise ~ d^{1/2}
(Recall λ_1 = d^α is on the Scale of Variance)

For α > 1: Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
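The α > 1 vs. α < 1 dichotomy can be seen already at moderate dimension. A simulation sketch (the dimensions, sample size and α values below are illustrative choices; the angle only approaches 90° slowly as d grows):

```python
import numpy as np

def top_angle(alpha, d=2000, n=20, seed=0):
    """Angle (degrees) between true and empirical first eigenvectors in the
    spike model lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)          # spike along e_1, so true u_1 = e_1
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]   # empirical u_1-hat
    cos = min(abs(v1[0]), 1.0)
    return np.degrees(np.arccos(cos))

print(top_angle(1.5))                    # alpha > 1: small angle (consistency)
print(top_angle(0.3, d=20_000))          # alpha < 1: large angle, heading to 90
```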

HDLSS Math Stat of PCA

Consistency of eigenvalues?

λ̂_1 / λ_1 →_L χ²_n / n

• Eigenvalues Inconsistent

• But Known Distribution

• Consistent when n → ∞ as Well
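The χ²_n/n limit for λ̂_1/λ_1 can be eyeballed by simulation (a sketch with illustrative d, n, α; recall χ²_n/n has mean 1 and variance 2/n):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 5000, 10, 1.5
lam1 = d ** alpha                         # spike eigenvalue lambda_1 = d^alpha
ratios = []
for _ in range(300):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)              # spike along the first coordinate
    s_top = np.linalg.svd(X, compute_uv=False)[0]
    ratios.append(s_top ** 2 / n / lam1)  # = lambda_hat_1 / lambda_1
ratios = np.array(ratios)
print(ratios.mean(), ratios.var())        # roughly 1 and 2/n = 0.2
```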

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignore?)
E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get
Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References exist

Mixing Conditions

Mixing Condition Used Here: Rho – Mixing

For Random Variables X_1, X_2, …, Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_{≤ j}), g ∈ L²(F_{≥ j+k}) }

Where F_{≤ j}, F_{≥ j+k} are the Sigma-Fields Generated by
X_i, i ≤ j, and X_i, i ≥ j+k, respectively
• Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
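ρ-mixing proper takes a supremum over all L² functions of past and future; as a simpler hedged illustration of "uncorrelated at far lags", here is the plain lag-k correlation of an AR(1) series decaying geometrically (φ = 0.8 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):                     # AR(1): X_t = phi X_{t-1} + noise
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi ** 2) * eps[t]

# lag-k correlations: theory gives corr(X_0, X_k) = phi**k -> 0
rs = {k: np.corrcoef(x[:-k], x[k:])[0, 1] for k in (1, 5, 20)}
print(rs)
```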

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors
X = (X_1, X_2, …, X_d)' Are ρ-mixing

Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused…)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)
  (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t    (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{−1/2} U_d^t X_d

Assume ∃ a permutation of the d entries, so that Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at: PCA Consistency, α > 1 spike
(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size, so true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA,
Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{j,i} = P_{û_j} x_i and s_{j,i} = P_{u_j} x_i
(What we study in PCA scatterplots)

Can Show: ŝ_{j,i} ≈ R_j s_{j,i}, with R_j Random

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent.
So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{j,i} ≈ R_j s_{j,i},
with the Same Realization of R_j for all i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful
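The "proportional errors" point can be illustrated in the spike model (a sketch; the constants below are illustrative). Correlation near ±1 between empirical and true PC1 scores is what preserves the scatterplot's shape, even when the scale is off:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, alpha = 5000, 30, 1.2
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)             # spike along e_1, so true u_1 = e_1

v1 = np.linalg.svd(X, full_matrices=False)[2][0]   # empirical u_1-hat
s_true = X[:, 0]                        # true PC1 scores  s_{1,i}
s_hat = X @ v1                          # empirical PC1 scores  s-hat_{1,i}

r = np.corrcoef(s_true, s_hat)[0, 1]
print(abs(r))                           # near 1: errors essentially proportional
```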

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d,
So not Clear Embedding Helps;
Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is sizes of biological subtypes

• Differing ratio trips up mean

• But DWD more robust

Mathematics behind this?


HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example: X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say: ‖X‖ ≈ d^{1/2} w.p. 1/2, ≈ 10·d^{1/2} w.p. 1/2 (random, not deterministic)

PCA Conditions Same, since Noise is Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond'n

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, Define:

ρ(k) = sup |corr(f, g)|, over f ∈ L²(σ(…, X_{i−1}, X_i)) and g ∈ L²(σ(X_{i+k}, X_{i+k+1}, …))

(For Sigma-Fields Generated by the Past, and by the Future; Note: Gap of Lag k)

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags
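The "uncorrelated at far lags" idea can be illustrated with a Gaussian AR(1) sequence, for which the ρ-mixing coefficient at lag k is exactly |φ|^k (a standard fact, not from the slides; φ and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian AR(1): X_t = phi * X_{t-1} + noise; lag-k correlation = phi**k,
# so the sup-correlation rho(k) decays geometrically in the lag.
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, lag_corr(x, k))   # ~ phi**k: sizable at lag 1, negligible by lag 20
```

At lag 1 the correlation is near φ = 0.8; by lag 20 it is essentially 0, which is the mixing behavior the condition ρ(k) → 0 formalizes.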

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, X_2, …, X_d) Are ρ-mixing

Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{−1/2} U_d^t X_d

Assume ∃ a permutation of the d coordinates, So that the entries of Z_d are ρ-mixing
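The standardization step is mechanical: whatever Σ_d is, Z_d = Λ_d^{−1/2} U_d^t X_d has identity covariance, so the mixing assumption is placed on decorrelated coordinates. A small sketch (the covariance used here is an arbitrary positive-definite example):

```python
import numpy as np

rng = np.random.default_rng(3)

d, N = 4, 50_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)               # some positive-definite covariance
lam, U = np.linalg.eigh(Sigma)            # Sigma = U diag(lam) U^t
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N).T   # d x N data
Z = np.diag(lam ** -0.5) @ U.T @ X        # standardized version
C = np.cov(Z)
print(np.round(C, 2))                     # ~ identity matrix
```

The population covariance of Z is Λ^{−1/2} U^t Σ U Λ^{−1/2} = I exactly; the sample covariance above only deviates by Monte Carlo error.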

HDLSS Math Stat of PCA

Careful look at PCA Consistency, α > 1 spike

(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency: α > 1, Angle(û_1, u_1) → 0

For Strong Inconsistency: α < 1, Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_{i,j} = P_{v̂_j} x_i (What we study in PCA scatterplots) and s_{i,j} = P_{v_j} x_i

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random) (Thanks to Dan Shen)

PC Scores Not Consistent, So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From the Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d, So not Clear Embedding Helps, Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior: "everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)

• All are same distance from the other class

• i.e. everything is a support vector

• i.e. all sensible directions show "data piling"

• so "sensible methods are all nearly the same"

Straightforward Generalizations:

• non-Gaussian data: only need moments

• non-independent: use "mixing conditions"

• Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"
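The simplex picture can be checked directly: for n i.i.d. N_d(0, I_d) points with d ≫ n, every pairwise distance is close to sqrt(2d), so the point cloud is (approximately) a regular simplex. A small sketch (n and d are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# n standard Gaussian points in very high dimension d
n, d = 10, 50_000
X = rng.standard_normal((n, d))

# all pairwise distances, scaled by sqrt(d)
dists = [np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
         for i in range(n) for j in range(i + 1, n)]
print(min(dists), max(dists))   # both ~ sqrt(2): a near-regular simplex
```

All 45 scaled distances agree to within a couple of percent, which is the concentration driving the geometric representation.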

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For λ_1 ≥ ⋯ ≥ λ_d ≥ 0, assume Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1) (min possible value is 1/d)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)

Is called the "epsilon statistic", And is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"

Can show the epsilon statistic satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε ≈ 1/d

• So the assumption is very mild

• Much weaker than mixing conditions
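The two extreme cases of the epsilon statistic are easy to verify numerically (d and the spike size below are arbitrary illustrative choices):

```python
import numpy as np

# Sphericity ("epsilon") statistic from the slides:
#   eps = (sum lambda_j)^2 / (d * sum lambda_j^2),  with 1/d <= eps <= 1
def epsilon(lam):
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (len(lam) * (lam ** 2).sum())

d = 1000
spherical = np.ones(d)              # all eigenvalues equal
spike = np.zeros(d)
spike[0] = 1e6                      # one huge eigenvalue, rest zero

print(epsilon(spherical))           # = 1 (maximum)
print(epsilon(spike))               # = 1/d (minimum)
```

Intermediate spectra fall strictly between these bounds, and the geometric-representation condition amounts to ε staying well above its minimum 1/d.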

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, Assume no eigenvalues too large

Then: ‖X_i − X_j‖ = c·d^{1/2} + o_p(d^{1/2})

Not so strong as before: ‖Z_1 − Z_2‖² = 2d + O_p(d^{1/2})

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖ = c·d^{1/2} + o_p(d^{1/2})?

John Kent example, Normal scale mixture: X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d), i.i.d.

Won't get: ‖X_i − X_j‖ = C·d^{1/2} + o_p(d^{1/2}) for a single constant C

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get Geometrical Representation using

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d), i.i.d.:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'nce

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 does not imply Independence
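Kent's mixture shows directly why a single distance constant fails: the scaled squared distance ‖X_i − X_j‖²/d concentrates near 2, 101, or 200 depending on which mixture components the pair drew. A sketch (the mixture labels are fixed by hand here, rather than sampled, to make each case visible):

```python
import numpy as np

rng = np.random.default_rng(4)

# Kent's scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2.
# Fix two vectors of each type so all three pair-cases appear.
d = 50_000
sig = np.array([1.0, 1.0, 10.0, 10.0])          # standard deviations per vector
X = rng.standard_normal((4, d)) * sig[:, None]

def ndist2(i, j):                                # ||X_i - X_j||^2 / d
    return np.sum((X[i] - X[j]) ** 2) / d

print(ndist2(0, 1))   # ~ 1 + 1     = 2    (both sigma = 1)
print(ndist2(0, 2))   # ~ 1 + 100   = 101  (one of each)
print(ndist2(2, 3))   # ~ 100 + 100 = 200  (both sigma = 10)
```

Each case concentrates (at the O_p(d^{1/2}) scale), but on three different constants, so no deterministic C·d^{1/2} works for all pairs.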

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1) (Note: Not Using Multivariate Gaussian)

• With strong dependence, Yet 0 covariance

Given c > 0, define: Y = X when |X| ≤ c, Y = −X when |X| > c

Choose c to make cov(X, Y) = 0:

• For small c > 0, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, ∃ c with cov(X, Y) = 0

The joint distribution:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

Result: the Joint distribution of X and Y:

• Has Gaussian marginals

• Has cov(X, Y) = 0

• Yet strong dependence of X and Y

• Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
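The construction can be carried out numerically: cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c] has a closed form under the N(0,1) marginal, so the zero-covariance c can be found by bisection and then checked by simulation (a sketch under the sign-flip definition of Y given above; the root comes out near c ≈ 1.54):

```python
import math
import numpy as np

# cov(X, Y) = 2 * E[X^2; |X| <= c] - 1, with
# E[X^2; |X| <= c] = erf(c / sqrt(2)) - 2 c phi(c)  for X ~ N(0, 1)
def cov_xy(c):
    g = math.erf(c / math.sqrt(2)) - 2 * c * math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return 2 * g - 1

# bisection: cov is negative for small c, positive for large c
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c = (lo + hi) / 2

# Monte Carlo check: covariance ~ 0, yet Y is a function of X (|Y| = |X|)
rng = np.random.default_rng(5)
x = rng.standard_normal(500_000)
y = np.where(np.abs(x) <= c, x, -x)
print(c, np.mean(x * y), np.corrcoef(np.abs(x), np.abs(y))[0, 1])
```

The sample covariance is essentially 0, while |X| and |Y| are perfectly correlated, making the dependence (and the failure of joint Gaussianity) concrete.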

HDLSS Asy's Geometrical Represen'tion

Further Consequences of the Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea: feeling sampling variation) (something like mean vs. median), Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection


HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior:

"everything similar for very high d"

• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on theoretical covariance (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"
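The "Laws of Large Numbers" behind the geometric representation can be checked directly by simulation. A minimal sketch (not from the slides; the dimension and sample size are arbitrary illustrative choices): draw i.i.d. standard Gaussian vectors and verify that lengths concentrate near √d, pairwise distances near √(2d), and pairwise angles near 90°.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10000, 20                     # high dimension, small sample size
Z = rng.standard_normal((n, d))      # rows are i.i.d. N(0, I_d) data vectors

# Lengths: ||Z_i|| / sqrt(d) -> 1
lengths = np.linalg.norm(Z, axis=1) / np.sqrt(d)

# Pairwise distances: ||Z_i - Z_j|| / sqrt(2d) -> 1
i, j = np.triu_indices(n, k=1)
dists = np.linalg.norm(Z[i] - Z[j], axis=1) / np.sqrt(2 * d)

# Pairwise angles: cos(angle) -> 0, i.e. angles -> 90 degrees
cosines = np.sum(Z[i] * Z[j], axis=1) / (
    np.linalg.norm(Z[i], axis=1) * np.linalg.norm(Z[j], axis=1))
angles = np.degrees(np.arccos(cosines))

print(lengths.min(), lengths.max())   # both close to 1
print(dists.min(), dists.max())       # both close to 1
print(angles.min(), angles.max())     # both close to 90
```

With d = 10000 the random fluctuations are on the order of 1/√d, which is why the data lie close to the vertices of a regular simplex.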

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments

Assume no eigenvalues too large, in the sense:

For eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d, assume

    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)  ≫  1/d   (min possible value)

i.e.  ε⁻¹ = o(d)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1

• For spherical Normal:  ε = 1
• Single extreme eigenvalue gives:  ε = 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
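As a quick numerical check of these bounds (a sketch, not part of the slides; the helper name is hypothetical), the epsilon statistic can be computed directly from a vector of eigenvalues, giving ε = 1 in the spherical case and ε = 1/d for a single extreme eigenvalue.

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic: (sum lambda_j)^2 / (d * sum lambda_j^2)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 50
eps_sphere = epsilon_stat(np.ones(d))                   # all eigenvalues equal
eps_spike = epsilon_stat(np.r_[1e6, np.zeros(d - 1)])   # one extreme eigenvalue
print(eps_sphere, eps_spike)  # 1.0 and 1/50 = 0.02
```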

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments,
assume no eigenvalues too large. Then:

    ‖X_i − X_j‖ = O_p(1) √d

Not so strong as before, where for Gaussian Z:

    ‖Z_1 − Z_2‖ = √(2d) + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on  ‖X_i − X_j‖ = O_p(1) √d ?

John Kent example: Normal scale mixture

    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get:  ‖X_i − X_j‖ = C √d + O_p(1)
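A quick simulation (an illustrative sketch with arbitrary settings, not from the slides) makes Kent's point concrete: under the scale mixture, ‖X_i − X_j‖/√d settles near √2, √101, or √200 depending on which mixture components the pair came from, so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20000
# Fix five vectors from each mixture component: N(0, I_d) and N(0, 100 I_d)
scales = np.array([1.0] * 5 + [10.0] * 5)
X = rng.standard_normal((10, d)) * scales[:, None]

i, j = np.triu_indices(10, k=1)
ratios = np.linalg.norm(X[i] - X[j], axis=1) / np.sqrt(d)
# Each pairwise distance concentrates, but around a pair-dependent constant:
targets = np.sqrt(scales[i] ** 2 + scales[j] ** 2)  # sqrt(2), sqrt(101), sqrt(200)
print(sorted(set(np.round(targets, 2))))            # [1.41, 10.05, 14.14]
```

The distances still concentrate (within each pair type), but the limit is random across pairs, which is exactly why only the weaker O_p(1)·√d statement survives.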

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get geometrical representation using:

• 4th moment assumption
• Stronger covariance matrix (only) assum'n

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture,  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:  Covariance = 0  ⇏  Independence

0 Covariance is not independence

Simple Example:

• Random variables X and Y
• Make both Gaussian:  X, Y ~ N(0,1)
  (Note: Not using multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define

    Y = X · 1(|X| ≤ c) − X · 1(|X| > c)

and choose c to make cov(X,Y) = 0:

• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X,Y) < 0
• For large c, have cov(X,Y) > 0
• By continuity, Ǝ c with cov(X,Y) = 0

0 Covariance is not independence

Result:

Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X,Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian marginals
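The cutoff c can be found explicitly. Assuming the sign convention Y = X for |X| ≤ c and Y = −X for |X| > c (one convention consistent with the small-c / large-c signs above), E[X²] = 1 gives cov(X,Y) = 2A(c) − 1 with A(c) = E[X²·1(|X| ≤ c)] = erf(c/√2) − 2c·φ(c), so the root is where A(c) = 1/2. A short numerical sketch:

```python
import math

def A(c):
    """E[X^2 * 1(|X| <= c)] for X ~ N(0,1):  erf(c/sqrt(2)) - 2*c*phi(c)."""
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return math.erf(c / math.sqrt(2)) - 2 * c * phi_c

def cov_xy(c):
    """cov(X, Y) for Y = X*1(|X|<=c) - X*1(|X|>c); equals 2*A(c) - 1."""
    return 2 * A(c) - 1

# Small c: mostly Y = -X, so cov < 0.  Large c: mostly Y = X, so cov > 0.
assert cov_xy(0.5) < 0 and cov_xy(3.0) > 0

# Bisection for the root cov(X, Y) = 0 (cov is increasing in c)
lo, hi = 0.5, 3.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)
print(c_star)  # roughly 1.54
```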

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version): Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values)

[Assume data are mean centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For eigenvalues:  λ_1 = d^α,  λ_2 = ⋯ = λ_d = 1

Note critical parameter: α

1st eigenvector: u_1  (turns out direction doesn't matter)

How good are the empirical versions  λ̂_1, …, λ̂_d, û_1  as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):  for α > 1,  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):  for α < 1,  Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random noise ~ d^{1/2}

For α > 1 (recall α is on the scale of variance):
spike pops out of pure noise sphere

For α < 1: spike contained in pure noise sphere
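The two regimes show up clearly in a small simulation (an illustrative sketch; the dimension, sample size and exponents are arbitrary choices, not from the slides). Under the spike model λ_1 = d^α, λ_2 = ⋯ = λ_d = 1, the angle between û_1 and u_1 is near 0° for α > 1 and far from 0° for α < 1:

```python
import numpy as np

def angle_to_truth(alpha, d=2000, n=20, seed=0):
    """Angle (degrees) between empirical and true 1st eigenvector
    under the spike model lambda_1 = d**alpha, all others = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)   # true u_1 is the first coordinate axis
    # First right singular vector of X = first eigenvector of sample covariance
    u_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(abs(u_hat[0]), 1.0)))

big_spike = angle_to_truth(alpha=1.5)    # consistency regime: small angle
small_spike = angle_to_truth(alpha=0.5)  # strong inconsistency regime: large angle
print(big_spike, small_spike)
```

For α = 0.5 the top sample eigenvector is dominated by pure-noise directions (whose sample variance grows like d/n), matching the "spike contained in pure noise sphere" intuition.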

HDLSS Math Stat of PCA

Consistency of eigenvalues:  as d → ∞,

    λ̂_1 / λ_1  →_L  χ²_n / n

• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
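The χ²_n/n limit law (the eigenvalue result of Jung & Marron (2009)) can be checked by Monte Carlo; this is a sketch with arbitrary simulation settings, not from the slides. Over many replicates of the spike model with α > 1, λ̂_1/λ_1 should have mean ≈ 1 and variance ≈ 2/n.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, reps = 2000, 5, 1.5, 300
lam1 = d ** alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                   # spike along first axis
    s = np.linalg.svd(X, compute_uv=False)[0]  # largest singular value
    ratios[r] = (s ** 2 / n) / lam1            # hat(lambda)_1 / lambda_1

print(ratios.mean(), ratios.var())  # near 1 and near 2/n = 0.4
```

The spread does not shrink as d grows (inconsistency for fixed n), but it does shrink like 2/n, which is the "consistent when n → ∞ as well" statement.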

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency:

John Kent example:  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say  ‖X‖ = O_p(d^{1/2}), with  ‖X‖ / d^{1/2} → 1 or 10,
each w.p. 1/2  (not deterministic)

PCA conditions are the same, since the noise is still O_p(d^{1/2})

But for Geo Rep'n, need some mixing condition

Conclude: Need some mixing condition

Mixing Conditions

Idea from probability theory:

Recall standard asymptotic results, as n → ∞:

• Law of Large Numbers  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!),
e.g. independent and identically dist'd

Mixing conditions: explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem

Mixing Conditions

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better, newer references exist as well

Mixing Conditions

Mixing condition used here: Rho-mixing

For random variables X_1, X_2, …, define

    ρ(k) = sup { |corr(f, g)| : f ∈ L²(F_1^j),  g ∈ L²(F_{j+k}^∞),  j ≥ 1 }

where F_1^j and F_{j+k}^∞ are the sigma-fields generated by X_1, …, X_j and by
X_{j+k}, X_{j+k+1}, …  (note the gap of lag k)

Assume:  ρ(k) → 0  as  k → ∞

Idea: Uncorrelated at far lags
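For intuition (a sketch, not from the slides): a Gaussian AR(1) process X_t = φ X_{t−1} + noise is a standard example where correlations at far lags die off geometrically, so the ρ-mixing coefficients decay to 0. The simulation below checks the geometric decay of the lag-k sample autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, T = 0.6, 200000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):            # Gaussian AR(1): x_t = phi * x_{t-1} + eps_t
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = np.array([lag_corr(x, k) for k in (1, 2, 3, 5, 8)])
print(corrs)  # close to phi**k: geometric decay toward 0
```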

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume the entries X_1, X_2, …, X_d
of the data vectors are ρ-mixing

Drawback: Strong assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)  (fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky point: Classical mixing conditions require a notion of time ordering,
which is not always clear, e.g. for microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

    X ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ

(Note: Not necessarily Gaussian)

Define the standardized version

    Z_d = Λ_d^{−1/2} U_dᵗ X

Assume Ǝ a permutation of the entries of Z_d so that they are ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike):

(Reality check suggested by a reviewer)

Result is independent of sample size, so it is true even for n = 1 (!)

Reviewer's conclusion: Absurd, shows assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq data from 8/23/12:  d ~ 1700,  n = 180

Manually brushed clusters show clear alternate splicing: not noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall theoretical separation:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: Real data signals are this strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not study angles in PCA

Recall, for Consistency (α > 1):  Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1):  Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

An Interesting Objection: Should not study angles in PCA,
because PC scores (i.e. projections) are not consistent

For scores  ŝ_{i,j} = P_{û_j} x_i  (what we study in PCA scatterplots)
and  s_{i,j} = P_{u_j} x_i,  can show:

    ŝ_{i,j} / s_{i,j} → R_j   (random!)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent.
So how can PCA find useful signals in data?

(Recall: HDLSS PCA often finds signal, not pure noise)

Key is "Proportional Errors":  ŝ_{i,j} / s_{i,j} → R_j

Same realization of R_j for i = 1, …, n:
axes have inconsistent scales, but relationships are still useful
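The "proportional errors" phenomenon can be illustrated numerically (a rough sketch with arbitrary settings, not from the slides): in a spike model the ratios ŝ_{i,1}/s_{i,1} across observations i are close to a single common factor, so the empirical score plot is essentially a rescaling of the true one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5000, 10
lam1 = 25.0 * d                       # spike well above the noise level
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)              # true u_1 = first coordinate axis

u_hat = np.linalg.svd(X, full_matrices=False)[2][0]
s_hat = X @ u_hat                     # empirical PC1 scores
s_true = X[:, 0]                      # true PC1 scores

ratios = s_hat / s_true               # roughly one common factor across i
corr = np.corrcoef(s_hat, s_true)[0, 1]
print(ratios, abs(corr))
```

The common factor depends on the realized noise (the random R_1), but because it is shared across i, relationships between points in the score plot are preserved.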

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting limit dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility from Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in very high dimension?

Answer: El Karoui (2010)

• In random matrix limit
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: Recall the main advantage is for high d,
so it is not clear embedding helps; thus not yet implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall intuition from above:

• Key is sizes of biological subtypes
• Differing ratio trips up the mean, but DWD is more robust

Mathematics behind this?


HDLSS Asyrsquos Geometrical Represenrsquotion

Explanation of Observed (Simulation) Behavior

ldquoeverything similar for very high d rdquo

bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic

$\varepsilon = \frac{\left(\sum_{j=1}^d \lambda_j\right)^2}{d \sum_{j=1}^d \lambda_j^2}$

is called the "epsilon statistic", and is used to test "sphericity" of the distribution, i.e. "are all covariance eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies $\varepsilon \in \left[\tfrac{1}{d}, 1\right]$:

• For spherical Normal: $\varepsilon = 1$
• A single extreme eigenvalue gives: $\varepsilon \approx \tfrac{1}{d}$
• So the assumption $\varepsilon \gg \tfrac{1}{d}$ is very mild
• Much weaker than mixing conditions
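The two boundary cases above are easy to check numerically. A minimal sketch (the function name `epsilon_stat` is my own, not from the slides):

```python
import numpy as np

def epsilon_stat(eigenvalues):
    """Sphericity 'epsilon statistic': (sum lam)^2 / (d * sum lam^2).

    Equals 1 when all eigenvalues are equal (spherical case) and
    approaches 1/d when a single eigenvalue dominates.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                # all eigenvalues equal
eps_one_spike = epsilon_stat([1e6] + [1.0] * (d - 1))   # one extreme eigenvalue

print(eps_spherical)  # 1.0
print(eps_one_spike)  # close to 1/d = 0.001
```

By Cauchy–Schwarz, $(\sum \lambda_j)^2 \le d \sum \lambda_j^2$, so the statistic never exceeds 1, and for nonnegative eigenvalues it never falls below $1/d$.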

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments, and no eigenvalues too large (for $\lambda_1, \ldots, \lambda_d$).

Then: $X_i' X_j = o_p(d)$

Not so strong as before, where i.i.d. entries gave $Z_1' Z_2 = O_p(d^{1/2})$

2nd Paper on HDLSS Asymptotics

Can we improve on $X_i' X_j = o_p(d)$?

John Kent example: Normal scale mixture

$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$, i.i.d.

Won't get $\|X_i - X_j\| = C\, d^{1/2} + O_p(1)$ for a deterministic constant $C$

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): Get the geometrical representation using

• A 4th moment assumption
• A stronger covariance matrix (only) assumption

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture $X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$:

• Data vectors are independent of each other
• But the entries of each have strong dependence
• However, can show the entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply independence
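A quick simulation illustrates the Kent mixture's key feature. This is a sketch (sample sizes and the dependence check via squared entries are my choices): the shared random scale makes the entries of one vector uncorrelated yet strongly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 200_000  # two coordinates suffice to check pairwise (in)dependence

# Kent's Normal scale mixture: X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d),
# i.e. each vector draws one common sd (1 or 10) for all of its entries
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]

cov12 = np.mean(X[:, 0] * X[:, 1])  # covariance of the two entries (means are 0)
cov_sq = (np.mean(X[:, 0] ** 2 * X[:, 1] ** 2)
          - np.mean(X[:, 0] ** 2) * np.mean(X[:, 1] ** 2))

print(cov12)   # near 0: entries are uncorrelated
print(cov_sq)  # large and positive: entries are dependent (shared scale)
```

The theoretical value of the squared-entry covariance here is $E[\sigma^4] - (E[\sigma^2])^2 = 5000.5 - 50.5^2 \approx 2450$, which is what the simulation should roughly return.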

0 Covariance is not independence

Simple Example:

• Random variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
  (Note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given $c > 0$, define

$Y = \begin{cases} X, & |X| \le c \\ -X, & |X| > c \end{cases}$

0 Covariance is not independence

Simple Example (continued): choose $c$ to make cov(X, Y) = 0:

• The distribution is degenerate
• Supported on the diagonal lines $y = \pm x$
• Not absolutely continuous w.r.t. 2-d Lebesgue measure
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$
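The continuity argument can be carried out numerically. A sketch (the bisection bracket and Monte Carlo size are my choices; $|Y| = |X|$ by construction, which certifies the dependence):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(500_000)

def cov_xy(c):
    # Y = X on {|X| <= c}, Y = -X on {|X| > c}; both means are 0
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y)

# cov < 0 for small c, cov > 0 for large c, and cov is increasing in c:
# bisect for the sign change
lo, hi = 0.1, 3.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c0 = 0.5 * (lo + hi)

Y = np.where(np.abs(X) <= c0, X, -X)
print(c0)                              # crossing point, roughly 1.54 in theory
print(np.mean(X * Y))                  # approximately 0: zero covariance
print(np.corrcoef(X**2, Y**2)[0, 1])   # exactly 1: Y^2 = X^2, so X, Y dependent
```

Marginally $Y$ is still $N(0,1)$ (each piece is a reflection of a Gaussian), yet the perfect correlation of $X^2$ and $Y^2$ shows the pair is far from independent.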

0 Covariance is not independence

Result: The joint distribution of $X$ and $Y$

– Has Gaussian marginals
– Has $\mathrm{cov}(X, Y) = 0$
– Yet strong dependence of $X$ and $Y$
– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian marginals

HDLSS Asy's Geometrical Represen'tion

Further Consequences of the Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects the intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version), Qiao et al (2010)

HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values)

[Assume data are mean centered]

Spike Covariance Model, Paul (2007):

For eigenvalues: $\lambda_{1,d} = d^{\alpha}$, $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$

Note critical parameter: $\alpha$

1st eigenvector: $u_1$ (turns out: the direction doesn't matter)

How good are the empirical versions $\hat{\lambda}_{1,d}$, $\hat{u}_1$ as estimates?

HDLSS Math. Stat. of PCA

Consistency (big enough spike): for $\alpha > 1$,

$\mathrm{Angle}(\hat{u}_1, u_1) \rightarrow 0$

Strong Inconsistency (spike not big enough): for $\alpha < 1$,

$\mathrm{Angle}(\hat{u}_1, u_1) \rightarrow 90^{\circ}$

Intuition: random noise $\sim d^{1/2}$ (recall $d^{\alpha}$ is on the scale of variance):

• For $\alpha > 1$: spike pops out of the pure noise sphere
• For $\alpha < 1$: spike contained in the pure noise sphere
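Both regimes show up in a small simulation. This is a sketch (the dimension, sample size, and the two α values are my choices, not from the slides):

```python
import numpy as np

def pc1_angle_deg(alpha, d=2000, n=20, seed=0):
    """Angle between the empirical and true first eigenvector in the spike
    model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    u1 = np.zeros(d)
    u1[0] = 1.0                           # true first eigenvector
    X = rng.standard_normal((n, d))       # noise: variance 1 per coordinate
    X[:, 0] *= np.sqrt(d ** alpha)        # spike along u1
    # first right singular vector of the data = empirical first eigenvector
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    cos = abs(u1_hat @ u1)
    return np.degrees(np.arccos(min(cos, 1.0)))

print(pc1_angle_deg(alpha=1.5))  # small angle: consistency regime
print(pc1_angle_deg(alpha=0.5))  # large angle: tending toward 90 degrees
```

With these sizes the $\alpha = 1.5$ angle is a few degrees, while the $\alpha = 0.5$ angle is already past 45 degrees and grows toward 90 as $d$ increases.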

HDLSS Math. Stat. of PCA

Consistency of eigenvalues:

$\frac{\hat{\lambda}_{1,d}}{\lambda_{1,d}} \xrightarrow{\;L\;} \frac{\chi^2_n}{n}$

• Eigenvalues are inconsistent (for fixed $n$)
• But have a known limit distribution
• Consistent when $n \rightarrow \infty$ as well
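The limit law can be checked by Monte Carlo. A sketch (replicate count, model sizes, and α are my choices; for large α the noise contribution to $\hat{\lambda}_1$ is negligible, so the ratio should behave like $\chi^2_n / n$):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 2000, 5, 2.0
lam1 = d ** alpha                         # dominant spike eigenvalue

ratios = []
for _ in range(300):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)              # spike along the first coordinate
    # top eigenvalue of the sample covariance X'X/n via the small n x n dual
    lam1_hat = np.linalg.eigvalsh(X @ X.T / n).max()
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)

# chi^2_n / n has mean 1 and variance 2/n
print(ratios.mean())  # near 1
print(ratios.var())   # near 2/n = 0.4
```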

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n & PCA Consistency

John Kent example: $X \sim 0.5\, N_d(0, 100\, I_d) + 0.5\, N_d(0, I_d)$

Can only say

$\|X\| = \begin{cases} 10\, d^{1/2}\,(1 + o_p(1)) & \text{w.p. } 1/2 \\ d^{1/2}\,(1 + o_p(1)) & \text{w.p. } 1/2 \end{cases}$

not deterministic.

• PCA conditions: the same, since the noise is still $O_p(d^{1/2})$
• But for the Geo Rep'n, need some mixing condition

Conclude: need some mixing condition

Mixing Conditions

Idea from probability theory: recall the standard asymptotic results as $n \rightarrow \infty$:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

Mixing Conditions

• A whole area in probability theory
• A large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better: newer references

Mixing Conditions

Mixing condition used here: rho-mixing.

For random variables $X_1, X_2, \ldots$, define

$\rho(k) = \sup_{j}\, \sup \left\{ |\mathrm{corr}(f, g)| : f \in L^2\!\left(\sigma(X_1, \ldots, X_j)\right),\; g \in L^2\!\left(\sigma(X_{j+k}, X_{j+k+1}, \ldots)\right) \right\}$

where the sigma-fields are generated by the indicated variables (note the gap of lag $k$).

Assume: $\rho(k) \rightarrow 0$ as $k \rightarrow \infty$

Idea: uncorrelated at far lags
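For intuition about "uncorrelated at far lags", a stationary AR(1) sequence is the classical example of a mixing sequence. A sketch (the AR coefficient and lengths are my choices; this checks plain lag correlations, not the full ρ-mixing supremum over $L^2$ functions):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n = 0.8, 200_000

# AR(1): X_t = phi * X_{t-1} + e_t, with corr(X_j, X_{j+k}) = phi**k
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)   # start in stationarity
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, lag_corr(x, k))  # decays geometrically: 0.8, 0.8**5, 0.8**20
```

The geometric decay of the lag correlations is what makes the far-lag behavior "nearly independent", which is exactly the effect the ρ-mixing assumption formalizes.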

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):

Assume the entries $X_1, X_2, \ldots, X_d$ of the data vectors are $\rho$-mixing.

Drawback: a strong assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n: a series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

Condition from Jung & Marron (2009): $X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: not Gaussian)

Define the standardized version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume ∃ a permutation $\pi_d$ so that the entries of $Z_d$ are $\rho$-mixing.

HDLSS Math. Stat. of PCA

Careful look at PCA Consistency ($\alpha > 1$ spike):

(Reality check suggested by a reviewer)

The result is independent of sample size, so it is true even for n = 1 (?!?)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.

HDLSS Math. Stat. of PCA

[Figure: HDLSS PCA often finds signal, not pure noise]

[Figure: recall RNAseq data from 8/23/12, d ~ 1700, n = 180]

[Figure: manually brushed clusters show clear alternate splicing, not noise]

Functional Data Analysis

HDLSS Math. Stat. of PCA

Recall the theoretical separation:

• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike

Mathematically driven conclusion: real data signals are this strong.

HDLSS Math. Stat. of PCA

An interesting objection: should not study angles in PCA.

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \rightarrow 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \rightarrow 90^{\circ}$

Because PC scores (i.e. projections) are not consistent:

For scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ and $s_{ij} = P_{v_j} x_i$

(what we study in PCA scatterplots)

Can show $\hat{s}_{ij} \approx R_j\, s_{ij}$, with $R_j$ random

(Thanks to Dan Shen)

HDLSS Math. Stat. of PCA

PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data?

Key is "proportional errors": $\hat{s}_{ij} \approx R_j\, s_{ij}$, with the same realization of $R_j$ for all $i = 1, \ldots, n$.

So the axes have inconsistent scales, but the relationships between points are still useful.
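The proportional-errors point can be illustrated in the spike model: the empirical PC1 scores are close to a common multiple of the true scores, so the scatterplot structure survives even where scales do not. A sketch (all parameter choices are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 2000, 20, 1.5
u1 = np.zeros(d)
u1[0] = 1.0                               # true first eigenvector

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)            # spike along u1

u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
u1_hat *= np.sign(u1_hat @ u1)            # fix the arbitrary sign of the PC

s_true = X @ u1                           # scores on the true direction
s_hat = X @ u1_hat                        # empirical PC1 scores

R_hat = (s_true @ s_hat) / (s_true @ s_true)   # best common multiplier
rel_resid = np.linalg.norm(s_hat - R_hat * s_true) / np.linalg.norm(s_hat)
print(R_hat, rel_resid)                   # small residual: s_hat ~ R_hat * s_true
print(np.corrcoef(s_true, s_hat)[0, 1])   # near 1: relationships preserved
```

A single multiplier explains almost all of the estimated scores, which is why PCA scatterplots remain interpretable even though the score scale itself is not consistently estimated.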

HDLSS Deep Open Problem

In PCA consistency:

• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike

What happens at the boundary ($\alpha = 1$)?

Result: ∃ interesting limit distributions, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps. Thus not yet implemented in DWD.

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the intuition from above:

• Key is the sizes of the biological subtypes
• A differing ratio trips up the mean
• But DWD is more robust

Mathematics behind this:



Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For $\epsilon = \dfrac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$, assume $d\,\epsilon \to \infty$, i.e. $\dfrac{\sum_{j=1}^{d} \lambda_j^2}{\left(\sum_{j=1}^{d} \lambda_j\right)^2} = o(1)$

(min possible value of $\epsilon$ is $\tfrac{1}{d}$)

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic

$\epsilon = \dfrac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$

is called the "epsilon statistic", and is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"

Can show the epsilon statistic satisfies: $\tfrac{1}{d} \le \epsilon \le 1$

bull For spherical Normal: $\epsilon = 1$

bull Single extreme eigenvalue gives $\epsilon \approx \tfrac{1}{d}$

bull So the assumption $d\,\epsilon \to \infty$ is very mild

bull Much weaker than mixing conditions
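The bounds on the epsilon statistic are easy to check numerically. A minimal sketch (the function name `epsilon_stat` is ours, not from the slides): spherical covariance attains the upper bound 1, while a single dominant eigenvalue drives the statistic down to about 1/d.

```python
import numpy as np

def epsilon_stat(lam):
    """Classical sphericity 'epsilon statistic' for eigenvalues lam:
    (sum lam)^2 / (d * sum lam^2), which always lies in [1/d, 1]."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                  # all eigenvalues equal
eps_one_spike = epsilon_stat(np.r_[1e6, np.ones(d - 1)])  # one extreme eigenvalue
```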

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, and no eigenvalues too large

Then: $X_i' X_j = o_p(d)$, and $\|Z_1 - Z_2\|^2 = 2d\,(1 + o_p(1))$

Not so strong as before

Can we improve on $X_i' X_j = o_p(d)$?

John Kent example, Normal scale mixture: $X_i \sim 0.5\,N_d(0, 100\,I_d) + 0.5\,N_d(0, I_d)$ (i.i.d.)

Won't get: $\|X_i - X_j\|^2 = C\,d\,(1 + o_p(1))$ for a deterministic constant $C$

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture: $X_i \sim 0.5\,N_d(0, 100\,I_d) + 0.5\,N_d(0, I_d)$

bull Data Vectors are indep'dent of each other

bull But entries of each have strong depend'ce

bull However, can show entries have cov = 0

bull Recall statistical folklore: Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

bull Random Variables $X$ and $Y$

bull Make both Gaussian: $X, Y \sim N(0,1)$

(Note: Not Using Multivariate Gaussian)

bull With strong dependence, yet 0 covariance

Given $c > 0$, define $Y = X$ when $|X| \le c$, and $Y = -X$ when $|X| > c$

bull Distribution is degenerate: supported on diagonal lines

bull Not abs. cont. w.r.t. 2-d Lebesgue meas.

bull For small $c$, have $\mathrm{cov}(X,Y) < 0$

bull For large $c$, have $\mathrm{cov}(X,Y) > 0$

bull By continuity, Ǝ $c$ with $\mathrm{cov}(X,Y) = 0$

Result: the joint distribution of $X$ and $Y$

ndash Has Gaussian marginals

ndash Has $\mathrm{cov}(X,Y) = 0$

ndash Yet strong dependence of $X$ and $Y$

ndash Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
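This construction is easy to verify by simulation. A minimal sketch (finding the zero-covariance threshold by bisection is our choice of method, not from the slides): the empirical covariance crosses zero as c grows, yet |Y| = |X| exactly, so the variables remain totally dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)

def y_of(c):
    # Y = X when |X| <= c, Y = -X when |X| > c: marginally still N(0,1)
    return np.where(np.abs(x) <= c, x, -x)

def emp_cov(c):
    return np.mean(x * y_of(c))  # means are ~0, so this approximates cov(X, Y)

# cov is increasing in c: negative for small c, positive for large c
lo, hi = 0.1, 5.0
for _ in range(60):              # bisection for the root cov(X, Y) = 0
    mid = 0.5 * (lo + hi)
    if emp_cov(mid) < 0:
        lo = mid
    else:
        hi = mid
c0 = 0.5 * (lo + hi)
y = y_of(c0)
cov_xy = np.mean(x * y)
# dependence is total: |Y| = |X| exactly, so X and Y are far from independent
same_abs = np.allclose(np.abs(x), np.abs(y))
```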

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007). For Eigenvalues:

$\lambda_{1,d} = d^{\alpha}, \quad \lambda_{2,d} = \cdots = \lambda_{d,d} = 1$

Note: Critical Parameter $\alpha$

1st Eigenvector: $u_1$ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions $\hat\lambda_{1,d},\ \hat u_{1,d}$ as Estimates?

Consistency (big enough spike): For $\alpha > 1$, $\mathrm{Angle}(\hat u_1, u_1) \to 0$

Strong Inconsistency (spike not big enough): For $\alpha < 1$, $\mathrm{Angle}(\hat u_1, u_1) \to 90^{\circ}$

HDLSS Math Stat of PCA

Intuition: Random Noise $\sim d^{1/2}$

For $\alpha > 1$ (Recall $d^{\alpha}$ is on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$: Spike Contained in Pure Noise Sphere
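A quick simulation illustrates the dichotomy. This is a sketch under our own choices of d, n, and alpha (not values from the slides): for a big spike the sample PC1 locks onto the true spike direction, while for a small spike it is dominated by noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def pc1_angle_deg(d, n, alpha):
    """Angle (degrees) between sample PC1 and the true spike direction e_1,
    in the spike model lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1."""
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(d ** alpha)                    # inject the spike in coordinate 1
    v1 = np.linalg.svd(x, full_matrices=False)[2][0]  # first right singular vector
    return np.degrees(np.arccos(min(1.0, abs(v1[0]))))

angle_big_spike = pc1_angle_deg(d=10_000, n=10, alpha=1.5)    # alpha > 1: consistent
angle_small_spike = pc1_angle_deg(d=10_000, n=10, alpha=0.3)  # alpha < 1: inconsistent
```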

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\dfrac{\hat\lambda_1}{\lambda_1} \xrightarrow{\ L\ } \dfrac{\chi^2_n}{n}$ as $d \to \infty$

bull Eigenvalues Inconsistent

bull But Known Distribution

bull Consistent when $n \to \infty$ as Well
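The limit law can be seen in simulation. A sketch with our own parameter choices (d, n, alpha, and number of repetitions are ours): repeated draws of the top sample eigenvalue, scaled by the true spike eigenvalue, behave like chi-squared with n degrees of freedom divided by n, which has mean 1 and variance 2/n.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 5_000, 5, 1.5
lam1 = d ** alpha
ratios = []
for _ in range(300):
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(lam1)                   # spike model with alpha > 1
    s = np.linalg.svd(x, compute_uv=False)     # singular values of the data matrix
    lam1_hat = s[0] ** 2 / n                   # top eigenvalue of (1/n) X'X
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)
# limit law chi^2_n / n has mean 1 and variance 2/n = 0.4 here
mean_ratio, var_ratio = ratios.mean(), ratios.var()
```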

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example: $X_d \sim \tfrac{1}{2} N_d(0, 100\,I_d) + \tfrac{1}{2} N_d(0, I_d)$

Can only say: $\dfrac{\|X_d\|}{d^{1/2}} \to 10$ w.p. $\tfrac12$, $\to 1$ w.p. $\tfrac12$; not deterministic

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition
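The Kent example is easy to visualize by simulation. A sketch with our own d and sample count: each draw's scaled norm lands near 10 or near 1, at random, so the geometric representation's deterministic radius fails.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10_000, 50
# Kent's normal scale mixture: X ~ 0.5 N(0, 100 I_d) + 0.5 N(0, I_d)
scales = np.where(rng.random(n) < 0.5, 10.0, 1.0)   # sd 10 or sd 1, w.p. 1/2 each
x = scales[:, None] * rng.standard_normal((n, d))
norm_ratio = np.linalg.norm(x, axis=1) / np.sqrt(d)
# each ||X|| / d^(1/2) concentrates near 10 or near 1: random, not deterministic
near_ten = np.abs(norm_ratio - 10.0) < 0.5
near_one = np.abs(norm_ratio - 1.0) < 0.5
```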

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignore?), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions to Still Get Law of Large Numbers & Central Limit Theorem

bull A Whole Area in Probability Theory: a Large Literature

bull A Comprehensive Reference: Bradley (2005, update of 1986 version)

bull Better, Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables $\{X_t\}$, define, for the sigma-fields $\mathcal{F}_1^{j} = \sigma(X_1, \dots, X_j)$ and $\mathcal{F}_{j+k}^{\infty} = \sigma(X_{j+k}, X_{j+k+1}, \dots)$ (Note: Gap of Lag $k$):

$\rho(k) = \sup_j \sup \left\{ |\mathrm{corr}(f, g)| : f \in L^2(\mathcal{F}_1^{j}),\ g \in L^2(\mathcal{F}_{j+k}^{\infty}) \right\}$

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags
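For intuition, a Gaussian AR(1) sequence is a standard example of a rho-mixing process (its rho(k) decays like |phi|^k). A small sketch, with our own phi and sequence length, showing the "uncorrelated at far lags" idea via plain lag correlations:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, n = 0.7, 200_000
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):                  # AR(1): X_t = phi * X_{t-1} + noise
    x[t] = phi * x[t - 1] + rng.standard_normal()

def lag_corr(k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corr_lag1 = lag_corr(1)    # close to phi
corr_lag20 = lag_corr(20)  # close to phi^20, essentially 0
```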

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X = (X_1, X_2, \dots, X_d)'$ are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:

bull Ahn, Marron, Muller & Chi (2007)

bull Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering

Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$ where $\Sigma_d = U_d \Lambda_d U_d^{t}$ (Note: Not Gaussian)

Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^{t} X_d$

Assume Ǝ a permutation of the entries, so that $Z_d$ is ρ-mixing
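The standardization step can be sanity-checked numerically. A sketch with an arbitrary covariance of our own choosing: rotating by the eigenvectors and rescaling by inverse root eigenvalues leaves entries uncorrelated with unit variance, i.e. cov(Z) is approximately the identity.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 5, 100_000
# build an arbitrary positive definite covariance Sigma = U Lambda U^t
a = rng.standard_normal((d, d))
sigma = a @ a.T + d * np.eye(d)
lam, u = np.linalg.eigh(sigma)
# sample X ~ N_d(0, Sigma), then standardize: Z = Lambda^(-1/2) U^t X
x = rng.multivariate_normal(np.zeros(d), sigma, size=n)
z = (np.diag(lam ** -0.5) @ u.T @ x.T).T
cov_z = np.cov(z, rowvar=False)
# entries of Z are uncorrelated with unit variance: cov(Z) ~ I_d
max_err = np.abs(cov_z - np.eye(d)).max()
```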

HDLSS Math Stat of PCA

Careful look at PCA Consistency, $\alpha > 1$ spike

(Reality Check Suggested by Reviewer)

Condition is Independent of Sample Size, so true for n = 1 (?!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

[Scatterplot of Manually Brushed Clusters: Clear Alternate Splicing, Not Noise (Functional Data Analysis)]

HDLSS Math Stat of PCA

Recall Theoretical Separation:

bull Strong Inconsistency: $\alpha < 1$ spike

bull Consistency: $\alpha > 1$ spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 90^{\circ}$

Because PC Scores (i.e. projections) Not Consistent:

For Scores $\hat s_{ij} = P_{\hat v_j} x_i$ (What we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$,

Can Show: $\hat s_{ij} / s_{ij} \to R_j$ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": $\hat s_{ij} / s_{ij} \to R_j$, with the Same Realization of $R_j$ for $i = 1, \dots, n$

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

bull Strong Inconsistency: $\alpha < 1$ spike

bull Consistency: $\alpha > 1$ spike

What happens at the boundary ($\alpha = 1$)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer, El Karoui (2010):

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi, 2007)

HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use "mixing conditions"

Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi, 2007)

All based on simple "Laws of Large Numbers"

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For eigenvalues λ_1, …, λ_d, assume ε >> 1/d, i.e. Σ_j λ_j² / (Σ_j λ_j)² = o(1), where

ε = [ (1/d) Σ_{j=1}^d λ_j ]² / [ (1/d) Σ_{j=1}^d λ_j² ]

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large, in the sense:

For eigenvalues λ_1, …, λ_d, assume ε >> 1/d (min possible), i.e. Σ_j λ_j² / (Σ_j λ_j)² = o(1), where

ε = [ (1/d) Σ_{j=1}^d λ_j ]² / [ (1/d) Σ_{j=1}^d λ_j² ]

(much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background:

In classical multivariate analysis, the statistic

ε = [ (1/d) Σ_{j=1}^d λ_j ]² / [ (1/d) Σ_{j=1}^d λ_j² ]

is called the "epsilon statistic"

2nd Paper on HDLSS Asymptotics

Background:

In classical multivariate analysis, the statistic

ε = [ (1/d) Σ_{j=1}^d λ_j ]² / [ (1/d) Σ_{j=1}^d λ_j² ]

is called the "epsilon statistic",

And is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies: 1/d ≤ ε ≤ 1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε ≈ 1/d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε ≈ 1/d

• So assumption ε >> 1/d is very mild

• Much weaker than mixing conditions
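These bounds are easy to check numerically. A minimal Python sketch (the dimension d = 1000 and the two covariance spectra are illustrative choices, not from the slides):

```python
import numpy as np

def epsilon_stat(eigenvalues):
    # Classical sphericity ("epsilon") statistic:
    #   eps = ((1/d) sum lam_j)^2 / ((1/d) sum lam_j^2),
    # which always lies in [1/d, 1].
    lam = np.asarray(eigenvalues, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                   # all eigenvalues equal: eps = 1
eps_one_spike = epsilon_stat(np.r_[1.0, np.zeros(d - 1)])  # single extreme eigenvalue: eps = 1/d
```

The spherical spectrum attains the upper bound exactly, and a single dominant eigenvalue attains the lower bound 1/d, matching the two extremes on the slide.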

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large: ε >> 1/d

Then: X_i' X_j = o_p(d)

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large: ε >> 1/d

Then: X_i' X_j = o_p(d)

Not so strong as before: Z_1' Z_2 = d^(1/2) O_p(1)
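The Gaussian d^(1/2) scaling of pairwise inner products can be seen directly by simulation (the dimensions and replication count below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_inner(d, reps=2000):
    # Average |Z1' Z2| over many independent pairs Z1, Z2 ~ N(0, I_d)
    z1 = rng.standard_normal((reps, d))
    z2 = rng.standard_normal((reps, d))
    return np.abs((z1 * z2).sum(axis=1)).mean()

a = mean_abs_inner(400)   # grows like sqrt(d)
b = mean_abs_inner(1600)  # quadrupling d should roughly double the typical size
```

Compare with the squared norms ||Z||², which grow like d itself: the inner products are an order of magnitude smaller, which is what drives the geometric representation.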

2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = o_p(d)?

2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = o_p(d)?

John Kent example: Normal scale mixture

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = o_p(d)?

John Kent example: Normal scale mixture

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Won't get: X_i' X_j = C d^(1/2) O_p(1)
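A quick simulation of Kent's scale mixture (illustrative d and sample size) shows the two-valued behavior of the scaled norms ||X|| / d^(1/2):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2000, 400

# Each data vector is N(0, I_d) or N(0, 100 I_d), with probability 1/2 each
sd = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * sd[:, None]

scaled_radii = np.linalg.norm(X, axis=1) / np.sqrt(d)
frac_near_1 = np.mean(np.abs(scaled_radii - 1.0) < 0.5)
frac_near_10 = np.mean(np.abs(scaled_radii - 10.0) < 0.5)
```

Every vector lands near one of two spheres (radius ~1 or ~10 after scaling), so there is no single deterministic limit for the radius, even though each coordinate has mean 0 and the coordinates are uncorrelated.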

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 ⟹ Independence (?)

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0,1)

(Note: Not Using Multivariate Gaussian)

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0,1)

• With strong dependence

• Yet 0 covariance

Given c > 0, define: Y = X when |X| ≤ c, Y = −X when |X| > c

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example: choose c to make cov(X,Y) = 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X,Y) < 0

• For large c, have cov(X,Y) > 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X,Y) < 0

• For large c, have cov(X,Y) > 0

• By continuity, ∃ c with cov(X,Y) = 0
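The whole construction can be simulated. A minimal sketch (sample size and bisection bracket are illustrative), which finds a c with covariance near 0 while |Y| = |X| always:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)   # X ~ N(0, 1)

def y_of(c):
    # Y = X on {|X| <= c}, Y = -X on {|X| > c}; either way Y ~ N(0, 1)
    return np.where(np.abs(x) <= c, x, -x)

def cov_xy(c):
    return np.mean(x * y_of(c))    # both means are 0, so this is the covariance

cov_small_c = cov_xy(0.1)   # mostly Y = -X, so covariance near -1
cov_large_c = cov_xy(5.0)   # mostly Y = +X, so covariance near +1

# cov_xy is nondecreasing in c, so bisect for the zero crossing
lo, hi = 0.1, 5.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c0 = 0.5 * (lo + hi)
cov_zero = cov_xy(c0)
# X and Y are still completely dependent: |Y| = |X| always
```

At the crossing c0, the covariance is essentially 0, yet Y is a deterministic function of X up to sign, so the pair is as far from independent as possible.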

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals

HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive feeling of sampling variation) (something like mean vs. median)

Hall, Marron & Neeman (2005)

HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive feeling of sampling variation) (something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive feeling of sampling variation) (something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = … = λ_{d,d} = 1

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = … = λ_{d,d} = 1

Note: Critical Parameter α

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = … = λ_{d,d} = 1

1st Eigenvector: u_1

Turns out: Direction Doesn't Matter

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = … = λ_{d,d} = 1

1st Eigenvector: u_1

How Good are Empirical Versions λ̂_{1,d}, …, λ̂_{d,d}, û_1 as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1: Angle(û_1, u_1) → 0

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1: Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1: Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall d^α on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall d^α on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
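A small simulation of the spike model illustrates the two regimes (d, n and the two α values are illustrative; at finite d the α < 1 angle is well away from 0 but approaches 90° only slowly):

```python
import numpy as np

rng = np.random.default_rng(3)

def angle_deg(d, alpha, n=20):
    # Single-spike model: lambda_1 = d**alpha, all other eigenvalues 1,
    # true first eigenvector u_1 = e_1
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)
    # top eigenvector of X'X via the cheap n x n Gram matrix
    vals, vecs = np.linalg.eigh(X @ X.T)
    u_hat = X.T @ vecs[:, -1]
    u_hat /= np.linalg.norm(u_hat)
    return np.degrees(np.arccos(min(abs(u_hat[0]), 1.0)))

angle_consistent = angle_deg(d=4000, alpha=1.5)    # alpha > 1: small angle
angle_inconsistent = angle_deg(d=4000, alpha=0.5)  # alpha < 1: angle far from 0
```

The Gram-matrix trick (eigenvectors of X X' mapped back through X') keeps the computation at n x n cost, which is the natural approach whenever d >> n.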

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n, as d → ∞ (n fixed)

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n

Eigenvalues Inconsistent

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n

Eigenvalues Inconsistent,

But Known Distribution

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n

Eigenvalues Inconsistent,

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
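This limit can be checked by Monte Carlo under the spike model (parameters illustrative; the Gram-matrix trick keeps each replication cheap):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha, reps = 4000, 5, 1.5, 300
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)           # spike along the first coordinate
    # largest eigenvalue of the sample covariance X'X / n, via the Gram matrix
    lam1_hat = np.linalg.eigvalsh(X @ X.T / n)[-1]
    ratios[r] = lam1_hat / lam1

mean_ratio = ratios.mean()   # chi2_n / n has mean 1
var_ratio = ratios.var()     # chi2_n / n has variance 2/n
```

With n fixed, the ratio does not concentrate at 1 (the estimate is inconsistent), but its distribution matches the known χ²_n / n limit, so the mean is near 1 and the variance near 2/n = 0.4.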

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say: X = O_p(d^(1/2)), with

||X|| ≈ d^(1/2) (w.p. 1/2), ||X|| ≈ 10 d^(1/2) (w.p. 1/2),

not deterministic

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say: X = O_p(d^(1/2)), not deterministic

PCA Conditions Same, since Noise Still O_p(d^(1/2))

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say: X = O_p(d^(1/2)), not deterministic

But for Geo Rep'n, need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory:

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored…)

Mixing Conditions

Idea From Probability Theory:

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

E.g. Independent and Ident. Dist'd

Mixing Conditions

Idea From Probability Theory:

Mixing Conditions:

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory

• A Large Literature

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

For Random Variables X_1, X_2, …, Define:

ρ(k) = sup_j sup |corr(Y, Z)|,

Where Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

ρ(k) = sup_j sup |corr(Y, Z)|, Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

For Sigma-Fields Generated by:

• F_{≤ j} = σ(X_i : i ≤ j)

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

ρ(k) = sup_j sup |corr(Y, Z)|, Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

For Sigma-Fields Generated by:

• F_{≤ j} = σ(X_i : i ≤ j)

• F_{≥ j+k} = σ(X_i : i ≥ j+k)

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

ρ(k) = sup_j sup |corr(Y, Z)|, Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

For Sigma-Fields Generated by:

• F_{≤ j} = σ(X_i : i ≤ j)

• F_{≥ j+k} = σ(X_i : i ≥ j+k)

• Note: Gap of Lag k

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

ρ(k) = sup_j sup |corr(Y, Z)|, Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

Assume: ρ(k) → 0, as k → ∞

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing:

ρ(k) = sup_j sup |corr(Y, Z)|, Y ∈ L²(F_{≤ j}), Z ∈ L²(F_{≥ j+k})

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
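The "uncorrelated at far lags" idea can be illustrated with an AR(1) sequence, whose lag-k correlation is φ^k (note the ρ-mixing coefficient is a sup over all L² functions of past and future, so this only illustrates the lag-decay idea, not the full coefficient; the parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, T = 0.8, 100_000

# AR(1): X_t = phi * X_{t-1} + e_t, so corr(X_t, X_{t+k}) = phi**k -> 0
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corr_lag_1 = lag_corr(1)    # near phi = 0.8
corr_lag_25 = lag_corr(25)  # near phi**25, essentially 0
```

Nearby observations are strongly correlated, but the correlation is negligible 25 lags apart, which is the qualitative picture behind mixing assumptions.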

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t

Are ρ-mixing

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Define Standardized Version: Z_d = Λ_d^(-1/2) U_d^t X_d

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Define: Z_d = Λ_d^(-1/2) U_d^t X_d

Assume ∃ a permutation, So that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

(Reality Check, Suggested by Reviewer)

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

Independent of Sample Size,

So true for n = 1 (!)

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

Independent of Sample Size,

So true for n = 1 (!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{ij} = P_{û_j} x_i,

What we study in PCA scatterplots

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{ij} = P_{û_j} x_i and s_{ij} = P_{u_j} x_i

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{ij} = P_{û_j} x_i and s_{ij} = P_{u_j} x_i,

Can Show: ŝ_{ij} ≈ R_j s_{ij} (R_j Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{ij} ≈ R_j s_{ij}

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{ij} ≈ R_j s_{ij}

Same Realization of R_j for i = 1, …, n

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
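A simulation of the spike model shows the proportional-error phenomenon: the ratios ŝ_i / s_i are nearly the same for every i, even though they are not near 1 (parameters illustrative; indices with tiny true scores are masked to avoid near-zero denominators):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, alpha = 4000, 20, 0.8
lam1 = float(d) ** alpha

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)              # spike along u_1 = e_1

s_true = X[:, 0]                      # true PC1 scores
vals, vecs = np.linalg.eigh(X @ X.T)  # Gram-matrix trick again
u_hat = X.T @ vecs[:, -1]
u_hat /= np.linalg.norm(u_hat)
s_hat = X @ u_hat                     # empirical PC1 scores
if s_hat @ s_true < 0:                # fix eigenvector sign ambiguity
    s_hat = -s_hat

ratios = s_hat / s_true
keep = np.abs(s_true) > np.median(np.abs(s_true))
rel_spread = ratios[keep].std() / ratios[keep].mean()
```

The common factor is above 1 (the noise inflates the empirical scores), but since it is shared across all i, scatterplots of scores preserve the relationships among observations, only with inconsistent axis scales.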

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)?

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)?

∃ interesting Limit Dist'ns,

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods


Interesting Question:

Behavior in Very High Dimension?

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

Recall Main Advantage is for High d

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this


HDLSS Asyrsquos Geometrical Represenrsquotion

Straightforward Generalizations

non-Gaussian data only need moments

non-independent use ldquomixing conditionsrdquo

Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)

All based on simple ldquoLaws of Large Numbersrdquo

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result: the joint distribution of X and Y
– has Gaussian marginals
– has cov(X,Y) = 0
– yet strong dependence of X and Y
– thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
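A quick numerical sketch of this construction (the cutoff value c below is found by bisection and is illustrative, not taken from the slides): X ~ N(0,1), with Y = X on {|X| ≤ c} and Y = −X on {|X| > c}, so cov(X,Y) = 2 E[X² 1{|X|≤c}] − 1, which vanishes at the right c; meanwhile Y² = X² exactly, so the pair is strongly dependent.

```python
import math
import numpy as np

def trunc_second_moment(c):
    # E[X^2 1{|X|<=c}] = (2*Phi(c) - 1) - 2*c*phi(c) for X ~ N(0,1)
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return math.erf(c / math.sqrt(2)) - 2 * c * phi

# Bisection for the c with E[X^2 1{|X|<=c}] = 1/2, i.e. cov(X,Y) = 0
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if trunc_second_moment(mid) < 0.5:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

# Monte Carlo check: cov(X,Y) ~ 0, yet X and Y are clearly dependent
rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)
y = np.where(np.abs(x) <= c, x, -x)

cov_xy = np.mean(x * y)  # ~ 0 by the choice of c
# Y^2 = X^2 exactly, so cov(X^2, Y^2) = Var(X^2) = 2: strong dependence
cov_x2y2 = np.mean(x**2 * y**2) - np.mean(x**2) * np.mean(y**2)
print(c, cov_xy, cov_x2y2)
```

The marginals stay N(0,1) because each piece of the construction only reflects X, which preserves the standard normal law.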

HDLSS Asy's Geometrical Represen'tion

Further Consequences of the Geometric Representation:

1. DWD more stable than SVM (based on deeper limiting distributions); reflects the intuitive feeling about sampling variation (something like mean vs. median): Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version): Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007), for eigenvalues:

λ₁ = d^α,  λ₂ = … = λ_d = 1

Note: critical parameter α

1st eigenvector u₁: turns out the direction doesn't matter

How good are the empirical versions λ̂₁, …, λ̂_d, û₁ as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,
Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): for α < 1,
Angle(û₁, u₁) → 90°

Intuition: random noise is of size ~ d^(1/2)
• For α > 1 (recall d^α is on the scale of variance), the spike pops out of the pure-noise sphere
• For α < 1, the spike is contained in the pure-noise sphere
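A small simulation sketch of this dichotomy in the spike model λ₁ = d^α, λ₂ = … = λ_d = 1 (the sample size, dimension, and α values below are illustrative choices, not from the slides):

```python
import numpy as np

def spike_angle(d, alpha, n=20, seed=0):
    """Angle (degrees) between u_hat_1 and u_1 = e_1 in the spike model
    with eigenvalues lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x[:, 0] *= d ** (alpha / 2)  # spike along the first coordinate
    # top right-singular vector of X = top eigenvector of sample covariance
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cos = abs(vt[0][0])          # |<u_hat_1, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

big = spike_angle(d=10_000, alpha=1.5)    # alpha > 1: consistency
small = spike_angle(d=10_000, alpha=0.2)  # alpha < 1: strong inconsistency
print(big, small)
```

For α > 1 the empirical direction locks onto u₁ (angle near 0°), while for α < 1 it is dragged toward the noise sphere (angle toward 90°), matching the consistency / strong inconsistency split above.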

HDLSS Math Stat of PCA

Consistency of eigenvalues: as d → ∞ (n fixed),

λ̂₁ / λ₁ →_L χ²_n / n

• Eigenvalues inconsistent
• But known distribution
• Consistent when n → ∞ as well
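A sketch checking the stated λ̂₁/λ₁ ≈ χ²_n/n behavior by a moment comparison (sizes, α, and the number of replicates are illustrative assumptions): χ²_n/n has mean 1 and variance 2/n, so for a strong spike the simulated ratios should match those moments.

```python
import numpy as np

def top_eigvalue_ratio(d=2000, alpha=2.0, n=10, reps=300, seed=1):
    """Simulate lambda_hat_1 / lambda_1 in the spike model, n fixed, d large."""
    rng = np.random.default_rng(seed)
    lam1 = d ** alpha
    ratios = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal((n, d))
        x[:, 0] *= d ** (alpha / 2)
        s = np.linalg.svd(x, compute_uv=False)
        ratios[r] = s[0] ** 2 / n / lam1  # lambda_hat_1 / lambda_1
    return ratios

ratios = top_eigvalue_ratio()
# chi^2_n / n has mean 1 and variance 2/n (= 0.2 for n = 10)
print(ratios.mean(), ratios.var())
```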

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example: X_d ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)

• Can only say ‖X_d‖ = d^(1/2) O_p(1), with
  ‖X_d‖ / d^(1/2) → 1 w.p. 1/2 and → 10 w.p. 1/2
  (not deterministic)
• PCA conditions: the same, since the noise is still O_p(d^(1/2))
• But for the Geo Rep'n, need some mixing condition

Conclude: need some mixing condition
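A quick simulation sketch of the two-scale behavior of ‖X_d‖/√d in the Kent normal scale mixture (taken here as 0.5 N(0, I_d) + 0.5 N(0, 100 I_d); the dimension and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10_000, 200

# Each vector is drawn from N(0, I_d) or N(0, 100 I_d), each with prob. 1/2
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
x = rng.standard_normal((n, d)) * scales[:, None]

r = np.linalg.norm(x, axis=1) / np.sqrt(d)
# Norms concentrate near 1 or near 10: a random, not deterministic, limit
near_1 = int(np.sum(np.abs(r - 1) < 0.1))
near_10 = int(np.sum(np.abs(r - 10) < 1.0))
print(near_1, near_10)
```

Every sampled vector lands near one of the two radii, but which radius is random, which is exactly why only ‖X_d‖ = d^(1/2) O_p(1) can be claimed.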

Mixing Conditions

Idea from probability theory: recall the standard asymptotic results as n → ∞:

• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

Mixing Conditions

• A whole area in probability theory
• A large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better, newer references exist

Mixing Conditions

Mixing condition used here: rho-mixing.

For random variables X₁, X₂, …, define

ρ(k) = sup_i sup { |corr(f, g)| : f ∈ L₂(σ(X₁, …, X_i)), g ∈ L₂(σ(X_{i+k}, X_{i+k+1}, …)) }

where σ(…) denotes the sigma-field generated by the indicated variables (note the gap of lag k).

Assume ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.
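As an illustration only (plain lag-k autocorrelation, which lower-bounds ρ(k); the AR(1) coefficient and series length are illustrative assumptions): for an AR(1) sequence the lag-k correlation decays geometrically, which is the "uncorrelated at far lags" behavior that rho-mixing formalizes.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n = 0.7, 200_000

# AR(1): X_t = phi * X_{t-1} + e_t, a standard rho-mixing example
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    # sample correlation between the series and its lag-k shift
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 10, 20)]
print(corrs)  # roughly phi**k: decays toward 0
```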

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): assume the entries X₁, …, X_d of the data vectors are ρ-mixing.

Drawback: strong assumption. (In JRSS-B, since Biometrika refused!)

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (note: not necessarily Gaussian).

Define the standardized version Z_d = Λ_d^(−1/2) U_dᵗ X_d.

Assume ∃ a permutation of the d entries so that the entries of Z_d are ρ-mixing.

HDLSS Math Stat of PCA

Careful look at PCA consistency, α > 1 spike
(reality check suggested by a reviewer):

• The condition is independent of sample size
• So it is true even for n = 1 (!)
• Reviewer's conclusion: absurd, shows the assumption is too strong for practice

HDLSS Math Stat of PCA

Response: HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data from 8/23/12, d ~ 1700, n = 180: manually brushed clusters show clear alternate splicing, not noise.

Functional Data Analysis

HDLSS Math Stat of PCA

Recall the theoretical separation:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.

Recall: for consistency (α > 1), Angle(û₁, u₁) → 0; for strong inconsistency (α < 1), Angle(û₁, u₁) → 90°.

The objection: PC scores (i.e. projections) are not consistent.

For the scores ŝ_ij (projection of x_i onto û_j, what we study in PCA scatterplots) and s_ij (projection onto u_j), can show

ŝ_ij / s_ij → R_j  (random)

Thanks to Dan Shen.

HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent. So how can PCA find useful signals in data?

Key is "proportional errors":
• The same realization R_j appears for every i
• So the axes have inconsistent scales
• But the relationships are still useful

(Recall: HDLSS PCA often finds signal, not pure noise.)
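A small sketch of this point: even with d ≫ n, the PC1 scores of two-cluster data recover the cluster structure, relationships survive even though the score scale is not consistent (the cluster shift, dimension, and sample sizes below are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_per = 5000, 15

# Two clusters separated by a mean shift in one coordinate, plus unit noise
shift = np.zeros(d)
shift[0] = 35.0
x = np.vstack([
    rng.standard_normal((n_per, d)) + shift,  # class A
    rng.standard_normal((n_per, d)) - shift,  # class B
])
labels = np.array([0] * n_per + [1] * n_per)

xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
scores = xc @ vt[0]  # PC1 scores, as in a PCA scatterplot

# The score axis is noisy, but the two classes still separate cleanly
a, b = scores[labels == 0], scores[labels == 1]
separated = bool(a.max() < b.min() or b.max() < a.min())
print(separated)
```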

HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting limit distributions, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010): in the random matrix limit, kernel embedded classifiers ~ linear classifiers.

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps. Thus not yet implemented in DWD.

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the intuition from above:
• Key is the sizes of the biological subtypes
• A differing ratio trips up the mean
• But DWD is more robust

Mathematics behind this:


2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments.

Assume no eigenvalues too large, in the sense: for covariance eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λ_d, assume

(Σ_j λ_j²) / (Σ_j λ_j)² → 0

i.e. the epsilon statistic (defined below) satisfies ε ≫ 1/d, its minimum possible value.

(Much weaker than the previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis the statistic

ε = ( (1/d) Σ_j λ_j )² / ( (1/d) Σ_j λ_j² )

is called the "epsilon statistic", and is used to test "sphericity" of the distribution, i.e. "are all covariance eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1:

• For the spherical Normal, ε = 1
• A single extreme eigenvalue gives ε ≈ 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
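A direct check of these two extreme cases, using the epsilon statistic as defined above (mean eigenvalue squared over mean squared eigenvalue; the eigenvalue choices are illustrative):

```python
import numpy as np

def epsilon_stat(eigvals):
    """Epsilon statistic: (mean of eigenvalues)^2 / (mean of squared eigenvalues).
    Ranges over [1/d, 1]; equals 1 iff all eigenvalues are equal."""
    lam = np.asarray(eigvals, dtype=float)
    return lam.mean() ** 2 / (lam ** 2).mean()

d = 1000
spherical = np.ones(d)                         # all eigenvalues equal
one_spike = np.r_[d * 100.0, np.ones(d - 1)]   # one dominant eigenvalue

eps_sph = epsilon_stat(spherical)   # 1.0
eps_spk = epsilon_stat(one_spike)   # close to 1/d
print(eps_sph, eps_spk)
```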

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments, and assume no eigenvalues too large. Then

X_iᵗ X_j = o_p(d)

Not so strong as before: now only

‖Z₁ − Z₂‖² = 2d (1 + o_p(1))

2nd Paper on HDLSS Asymptotics

Can we improve on X_iᵗ X_j = o_p(d)?

John Kent example, normal scale mixture:

X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)

Won't get X_iᵗ X_j = C d^(1/2) O_p(1)

3rd Paper on HDLSS Asymptotics

Get the Geometrical Representation using:
• a 4th moment assumption
• a stronger covariance matrix (only) assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting limit dist'ns; Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall the main advantage is for high d, so not clear embedding helps; thus not yet implemented in DWD.

HDLSS Additional Results

Batch adjustment (Xuxin Liu). Recall intuition from above: key is the sizes of the biological subtypes; a differing ratio trips up the mean, but DWD is more robust. Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics: Simple Paradoxes
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Discrim'n Simulations
  • 2nd Paper on HDLSS Asymptotics
  • 3rd Paper on HDLSS Asymptotics
  • 0 Covariance is not independence
  • HDLSS Math Stat of PCA
  • Mixing Conditions
  • Functional Data Analysis
  • HDLSS Deep Open Problem
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments.

Assume no eigenvalues too large, in the sense: for

    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

assume 1/ε = o(d), i.e. ε ≫ 1/d (min possible).

(Much weaker than previous mixing conditions…)

2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis the statistic

    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the dist'n, i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1:
• For spherical Normal, ε = 1
• Single extreme eigenvalue gives ε ≈ 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
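The bounds above are easy to check numerically. A minimal numpy sketch (the helper name `epsilon_stat` is mine, not from the slides):

```python
import numpy as np

def epsilon_stat(eigvals):
    """Epsilon (sphericity) statistic: (sum lam_j)^2 / (d * sum lam_j^2).

    By Cauchy-Schwarz it lies in [1/d, 1]: equals 1 for a spherical
    covariance, and is near 1/d when one eigenvalue dominates."""
    lam = np.asarray(eigvals, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon_stat(np.ones(d))    # spherical: epsilon = 1

spike = np.ones(d)
spike[0] = d ** 2                        # one extreme eigenvalue
eps_spike = epsilon_stat(spike)          # close to the minimum 1/d
```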

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments, and no eigenvalues too large. Then, for i ≠ j,

    ‖X_i − X_j‖² = d · O_p(1)

Not so strong as before, where (for standardized Z_i) ‖Z_1 − Z_2‖² = 2d (1 + o_p(1)).
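The pure-noise baseline behind these statements can be seen directly: for spherical Gaussian data, squared pairwise distances concentrate near 2d while inner products are only of order √d (so pairwise angles approach 90°). A small seeded simulation (dimension and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20000, 10
Z = rng.standard_normal((n, d))      # rows = data vectors, spherical Gaussian noise

i, j = np.triu_indices(n, k=1)       # all pairs i < j

# squared pairwise distances concentrate near 2d ...
sqdist = ((Z[i] - Z[j]) ** 2).sum(axis=1)
ratio = sqdist / (2 * d)             # -> 1 as d grows

# ... while inner products are only O_p(sqrt(d))
inner = (Z[i] * Z[j]).sum(axis=1) / np.sqrt(d)   # approx N(0, 1) per pair
```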

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖² = d · O_p(1)?

John Kent example, a normal scale mixture:

    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get ‖X_i − X_j‖² = C · d · (1 + o_p(1)).

3rd Paper on HDLSS Asymptotics

Get geometrical representation using:
• 4th moment assumption
• Stronger covariance matrix (only) assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's normal scale mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 ⇏ Independence
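These bullets can be checked numerically. A seeded sketch of Kent's mixture (sample size and the two coordinates used are illustrative): each vector draws one mixture label, so its entries are uncorrelated yet strongly dependent, which shows up as clear correlation between their squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50000, 2
# Kent's mixture: each VECTOR picks sd 1 or 10 (variance 1 or 100)
# with probability 1/2, and all of its entries share that scale.
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]

# entries are uncorrelated ...
corr = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

# ... yet dependent: the shared mixture label correlates their squares
corr_sq = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]
```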

0 Covariance is not independence

Simple example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define

    Y = X,   if |X| ≤ c
    Y = −X,  if |X| > c

Choose c to make cov(X, Y) = 0.

0 Covariance is not independence

Simple example:
• Distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian

Shows multivariate Gaussian means more than Gaussian marginals.
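The construction can be carried out explicitly. A sketch (helper names are mine) using the closed form cov(X, Y) = 2 E[X²; |X| ≤ c] − 1 with E[X²; |X| ≤ c] = (2Φ(c) − 1) − 2cφ(c) for standard normal X; it solves for the c that zeroes the covariance, then checks by Monte Carlo that Y still looks N(0, 1):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def cov_xy(c):
    """cov(X, Y) for Y = X on {|X| <= c}, Y = -X on {|X| > c}, X ~ N(0,1)."""
    trunc_second_moment = (2 * norm.cdf(c) - 1) - 2 * c * norm.pdf(c)
    return 2 * trunc_second_moment - 1

c_star = brentq(cov_xy, 0.1, 3.0)   # sign change brackets the root

# Monte Carlo check: Y is a deterministic function of X (strong
# dependence), yet has N(0,1) marginal and ~0 covariance with X
rng = np.random.default_rng(2)
X = rng.standard_normal(200000)
Y = np.where(np.abs(X) <= c_star, X, -X)
```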

HDLSS Asy's Geometrical Represen'tion

Further consequences of the geometric represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions; reflects an intuitive feeling for sampling variation, something like mean vs. median). Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified. Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version). Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & strong inconsistency: study properties of PCA in estimating eigen-directions & -values. [Assume data are mean centered.]

Spike covariance model, Paul (2007), for eigenvalues:

    λ_1 = d^α,  λ_2 = ⋯ = λ_d = 1

Note critical parameter: α.

1st eigenvector: u_1 (turns out direction doesn't matter).

How good are the empirical versions λ̂_1, …, λ̂_d, û_1 as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,

    Angle(û_1, u_1) → 0

Strong inconsistency (spike not big enough): for α < 1,

    Angle(û_1, u_1) → 90°

Intuition: random noise ~ d^{1/2} (recall α is on the scale of variance). For α > 1 the spike pops out of the pure noise sphere; for α < 1 the spike is contained in the pure noise sphere.
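The α > 1 vs. α < 1 dichotomy shows up clearly even in a small simulation of the single-spike model (the function `pc1_angle_deg` and all parameters are mine; finite d only approximates the limit):

```python
import numpy as np

def pc1_angle_deg(alpha, d=200000, n=10, seed=0):
    """Angle between sample PC1 and the true spike direction e_1,
    under the single-spike model lambda_1 = d**alpha, others = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))       # noise entries, variance 1
    X[:, 0] *= np.sqrt(d ** alpha)        # spike along e_1
    # sample eigenvectors via the n x n Gram matrix (cheap for d >> n):
    # X X'/n shares its nonzero eigenvalues with X'X/n
    G = X @ X.T / n
    w, V = np.linalg.eigh(G)              # ascending eigenvalues
    u1_hat = X.T @ V[:, -1]               # top sample eigenvector, up to scale
    u1_hat /= np.linalg.norm(u1_hat)
    cos = abs(u1_hat[0])                  # |<u1_hat, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

angle_consistent = pc1_angle_deg(alpha=1.5)    # spike pops out: small angle
angle_inconsistent = pc1_angle_deg(alpha=0.5)  # noise wins: near 90 degrees
```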

HDLSS Math Stat of PCA

Consistency of eigenvalues:

    λ̂_1 / λ_1 →_L χ²_n / n   as d → ∞

Eigenvalues inconsistent, but with a known distribution; consistent when n → ∞ as well.
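The eigenvalue behavior discussed here, λ̂_1/λ_1 behaving like χ²_n/n for a large spike with n fixed, can be checked by simulation. A seeded sketch under the single-spike model (all parameters are illustrative; finite d gives only an approximation, so the test checks mean ~ 1 and variance ~ 2/n loosely):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha = 50000, 5, 1.5
lam1 = d ** alpha
reps = 200

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))       # unit-variance noise
    X[:, 0] *= np.sqrt(lam1)              # spike along e_1
    G = X @ X.T / n                       # Gram matrix: same nonzero eigenvalues
    ratios[r] = np.linalg.eigvalsh(G)[-1] / lam1   # lambda1_hat / lambda1

# if the limit law is chi^2_n / n: mean ~ 1, variance ~ 2/n
```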

HDLSS Math Stat of PCA

Conditions for geo rep'n & PCA consist.

John Kent example: X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X‖ = O_p(d^{1/2}): here ‖X‖ ≈ d^{1/2} w.p. 1/2 and ‖X‖ ≈ 10 d^{1/2} w.p. 1/2, not deterministic.

PCA conditions same, since noise is still O_p(d^{1/2}). But for the geo rep'n, need some mixing cond'n.

Conclude: need some mixing condition.

Mixing Conditions

Idea from probability theory: recall standard asymptotic results as n → ∞,
• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and ident. dist'd.

Mixing conditions: explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory
• A large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references

Mixing condition used here: rho-mixing. For random variables X_1, X_2, …, define

    ρ(k) = sup { |corr(f, g)| : f ∈ L_2(σ(X_1, …, X_j)), g ∈ L_2(σ(X_{j+k}, X_{j+k+1}, …)) }

where σ(•) is the sigma-field generated by •; note the gap of lag k.

Assume ρ(k) → 0 as k → ∞. Idea: uncorrelated at far lags.

HDLSS Math Stat of PCA

Conditions for geo rep'n: Hall, Marron and Neeman (2005) assume the entries X(1), X(2), …, X(d) of the data vectors are ρ-mixing.

Drawback: strong assumption. (In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for geo rep'n, series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA

Conditions for geo rep'n, condition from Jung & Marron (2009):

    X ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

(note: not Gaussian). Define the standardized version

    Z_d = Λ_d^{−1/2} U_dᵗ X_d

Assume Ǝ a permutation of the entries of Z_d so that the result is ρ-mixing.

HDLSS Math Stat of PCA

Careful look at PCA consistency (α > 1 spike); reality check suggested by a reviewer.

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large in sense

For assume ie

(min possible)

(much weaker than previous mixing conditionshellip)

d

jj

d

jj

d1

2

2

1

)(1 do 1 d

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on $\frac{X_i' X_j}{d} = o_p(1)$?

John Kent example: Normal scale mixture

$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$

Won't get: $X_i' X_j = C\, d^{1/2}\, O_p(1)$

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture: $X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$

• Data Vectors are independent of each other

• But entries of each have strong dependence

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 ⇏ Independence

0 Covariance is not independence

Simple Example:

• Random Variables $X$ and $Y$

• Make both Gaussian: $X, Y \sim N(0, 1)$

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given $c > 0$, define: $Y = \begin{cases} X, & |X| \le c \\ -X, & |X| > c \end{cases}$

0 Covariance is not independence

Simple Example: choose $c$ to make cov(X, Y) = 0

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small $c$, have $\mathrm{cov}(X, Y) < 0$

• For large $c$, have $\mathrm{cov}(X, Y) > 0$

• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$

0 Covariance is not independence

Result: Joint distribution of $X$ and $Y$:

– Has Gaussian marginals

– Has $\mathrm{cov}(X, Y) = 0$

– Yet strong dependence of $X$ and $Y$

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
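The continuity argument can be made concrete. This short numeric sketch (our own) solves for the $c$ that zeroes the covariance, using $E[X^2; X > c] = c\,\varphi(c) + 1 - \Phi(c)$ for the standard normal, then confirms by simulation that the sample covariance is near 0 while $|Y| = |X|$ exactly (complete dependence).

```python
import math
import numpy as np

def phi(x):   # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# cov(X, Y) = E[X^2; |X| <= c] - E[X^2; |X| > c] = 1 - 4*(c*phi(c) + 1 - Phi(c))
def cov_xy(c):
    return 1 - 4 * (c * phi(c) + 1 - Phi(c))

# bisection for the c with cov = 0 (cov_xy is increasing: -1 at c=0, -> 1 as c -> inf)
lo, hi = 0.0, 5.0
for _ in range(80):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
print(round(c, 3))   # ~ 1.538

# check by simulation: cov ~ 0, yet |Y| = |X| exactly
rng = np.random.default_rng(0)
X = rng.standard_normal(200_000)
Y = np.where(np.abs(X) <= c, X, -X)
print(abs(np.cov(X, Y)[0, 1]) < 0.02, np.array_equal(np.abs(X), np.abs(Y)))
```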

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive idea of feeling sampling variation)
(something like mean vs. median)
Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)
Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007). For Eigenvalues:

$\lambda_1^{(d)} = d^{\alpha}, \quad \lambda_2^{(d)} = \cdots = \lambda_d^{(d)} = 1$

Note Critical Parameter: $\alpha$

1st Eigenvector: $u_1$ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions $\hat{\lambda}_1^{(d)}, \ldots, \hat{\lambda}_d^{(d)}, \hat{u}_1$ as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For $\alpha > 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

Strong Inconsistency (spike not big enough): For $\alpha < 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$

Intuition: Random Noise $\sim d^{1/2}$

For $\alpha > 1$ (Recall: on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$: Spike Contained in Pure Noise Sphere
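The phase transition is visible in a small simulation sketch (our own; spike model with true first eigenvector $e_1$, parameter choices illustrative): for $\alpha > 1$ the empirical first eigenvector is nearly aligned with $u_1$, for $\alpha < 1$ it is nearly orthogonal.

```python
import numpy as np

def pc1_angle_deg(d, alpha, n=20, seed=2):
    # spike model: eigenvalues (d^alpha, 1, ..., 1); true first eigenvector e1
    rng = np.random.default_rng(seed)
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)
    X = rng.standard_normal((n, d)) * sd
    Xc = X - X.mean(axis=0)                  # mean centered, as on the slide
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)            # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

d = 2000
big_spike = pc1_angle_deg(d, alpha=1.5)    # alpha > 1: angle near 0
weak_spike = pc1_angle_deg(d, alpha=0.5)   # alpha < 1: angle near 90
print(round(big_spike, 1), round(weak_spike, 1))
```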

HDLSS Math Stat of PCA

Consistency of eigenvalues?

$\frac{\hat{\lambda}_1^{(d)}}{\lambda_1^{(d)}} \xrightarrow{L} \frac{\chi_n^2}{n}, \quad \text{as } d \to \infty$

Eigenvalues Inconsistent

But Known Distribution

Consistent when $n \to \infty$ as Well
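The limiting $\chi_n^2 / n$ law can be checked by Monte Carlo (our own sketch; uncentered sample covariance, and $d$, $n$, $\alpha$, and the number of replications are illustrative choices): the ratio $\hat{\lambda}_1 / \lambda_1$ should have mean $\approx 1$ and variance $\approx 2/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha = 5000, 5, 1.5
lam1 = float(d) ** alpha
ratios = []
for _ in range(400):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)            # spike in the first coordinate
    G = X @ X.T / n                     # n x n Gram form of the sample covariance
    ratios.append(np.linalg.eigvalsh(G)[-1] / lam1)   # largest eigenvalue / lam1
ratios = np.array(ratios)
print(round(ratios.mean(), 2), round(ratios.var(), 2))  # chi2_n/n: mean 1, var 2/n = 0.4
```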

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA
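The two-point limit is easy to see numerically (our own sketch of the Kent mixture; $d$ and the number of draws are illustrative): every realization of $\|X_d\| / d^{1/2}$ lands near 1 or near 10, with roughly equal frequency.

```python
import numpy as np

rng = np.random.default_rng(4)
d, reps = 100_000, 200
vals = np.empty(reps)
for i in range(reps):
    # 0.5 N(0, I_d) + 0.5 N(0, 100 I_d): flip a fair coin for the scale
    scale = 1.0 if rng.random() < 0.5 else 10.0
    X = scale * rng.standard_normal(d)
    vals[i] = np.linalg.norm(X) / np.sqrt(d)
near1 = np.abs(vals - 1) < 0.1
near10 = np.abs(vals - 10) < 0.5
print(int(near1.sum()), int(near10.sum()), reps)
```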

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignore?)

E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better & Newer References?

Mixing Conditions

Mixing Condition Used Here: Rho – Mixing

For Random Variables $X_1, X_2, \ldots$, Define:

$\rho(k) = \sup_j \left\{ |\mathrm{corr}(f, g)| : f \in L_2(\mathcal{F}_1^j),\ g \in L_2(\mathcal{F}_{j+k}^{\infty}) \right\}$

Where $\mathcal{F}_1^j$, $\mathcal{F}_{j+k}^{\infty}$ are the Sigma-Fields Generated by $X_1, \ldots, X_j$ and $X_{j+k}, X_{j+k+1}, \ldots$

(Note: Gap of Lag $k$)

Assume: $\rho(k) \to 0$, as $k \to \infty$

Idea: Uncorrelated at Far Lags
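A standard concrete instance (our own sketch, not from the slides): for a stationary Gaussian AR(1), maximal correlation across a gap of lag $k$ reduces to the ordinary lag-$k$ correlation $|\varphi|^k$, so the process is $\rho$-mixing with geometrically decaying $\rho(k)$. The simulation estimates the lag correlations and compares them to $\varphi^k$.

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.8, 100_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)      # start in stationarity
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]         # AR(1) recursion

# empirical correlation at lag k vs the theoretical value phi^k
lag_corr = {k: float(np.corrcoef(x[:-k], x[k:])[0, 1]) for k in (1, 5, 20)}
for k, r in lag_corr.items():
    print(k, round(r, 3), round(phi ** k, 3))
```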

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors $X^{(1)}, X^{(2)}, \ldots, X^{(d)}$ Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: Not Gaussian)

Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume $\exists$ a permutation of the entries of $Z_d$, so that the permuted $Z_d$ is $\rho$-mixing

HDLSS Math Stat of PCA

Careful look at: PCA Consistency - $\alpha > 1$ spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

[Figure: Recall RNAseq Data From 8/23/12, d ~ 1700, n = 180]

Functional Data Analysis

[Figure: Manually Brushed Clusters, Clear Alternate Splicing, Not Noise]

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency: $\alpha > 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency: $\alpha < 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$

Because PC Scores (i.e. projections) Not Consistent:

For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ (What we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$,

Can Show: $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent: $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": Same Realization of $R_j$ for $i = 1, \ldots, n$

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - $\alpha < 1$ spike

Consistency - $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

Result: $\exists$ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps; Thus not yet Implemented in DWD
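The mechanism behind El Karoui's result can be seen numerically (our own sketch; bandwidth$^2$ = d is an illustrative choice): for high-dimensional Gaussian data the pairwise squared distances concentrate around $2d$, so a Gaussian kernel Gram matrix is, to first order, an affine function of the distances (equivalently, of the inner products) — the regime where kernel classifiers behave like linear ones.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 50, 20_000
X = rng.standard_normal((n, d))
G = X @ X.T
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # pairwise squared distances
K = np.exp(-sq / (2.0 * d))          # Gaussian kernel, bandwidth^2 = d

iu = np.triu_indices(n, 1)           # off-diagonal entries only
r = float(np.corrcoef(K[iu], sq[iu])[0, 1])      # ~ -1: K is locally affine in sq
spread = float(sq[iu].std() / sq[iu].mean())     # distances concentrate: tiny spread
print(round(r, 4), round(spread, 4))
```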

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?


2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Background

In classical multivariate analysis the statistic

Is called the ldquoepsilon statisticrdquo

And is used to test ldquosphericityrdquo of distrsquon

ie ldquoare all covrsquonce eigenvalues the samerdquo

d

jj

d

jj

d1

2

2

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

d

jj

d

jj

d1

2

2

1

11d

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the d entries,

So that Z_d is ρ-mixing

HDLSS Math Stat of PCA
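A small sketch of the standardization Z_d = Λ_d^{-1/2} U_d^t X_d (illustrative, and Gaussian only for simulation convenience; the Jung & Marron condition itself is not restricted to Gaussian data): it whitens X_d, so the entries of Z_d are uncorrelated with unit variance, and the ρ-mixing assumption is then placed on these standardized entries.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw X_d with a known covariance Sigma_d = U diag(lam) U^t, then form
# Z_d = Lam^{-1/2} U^t X_d, whose covariance is the identity.
d, n = 5, 200_000
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)          # some positive definite covariance
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # rows are draws of X_d
Z = X @ U @ np.diag(lam ** -0.5)         # row-wise Z_d = Lam^{-1/2} U^t X_d

C = np.cov(Z, rowvar=False)
print(np.round(C, 2))                    # approximately the identity matrix
```
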

Careful look at PCA Consistency - α > 1 spike

(Reality Check Suggested by Reviewer)

Condition α > 1 is Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
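A hedged simulation sketch of the two regimes (sample size, dimension and α values are arbitrary choices, not from the slides): with the spike λ_1 = d^α, the empirical first eigenvector nearly recovers u_1 when α > 1, while for α < 1 its angle to u_1 stays bounded well away from 0, tending to 90° as d grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle(d, alpha, n=20):
    """Angle (degrees) between empirical and true first eigenvector in the
    spike model: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    # data: spike along the first coordinate axis plus isotropic noise
    X = rng.normal(size=(n, d))
    X[:, 0] *= d ** (alpha / 2)             # sd of first coord = d^(alpha/2)
    # first eigenvector of the sample second-moment matrix, via SVD
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u1_hat = Vt[0]
    cos = abs(u1_hat[0])                    # |<u1_hat, e1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

# alpha > 1: spike pops out of the pure-noise sphere -> angle near 0
# alpha < 1: spike swamped by noise -> angle far from 0 (toward 90 degrees)
a_big   = pc1_angle(d=5000, alpha=1.5)
a_small = pc1_angle(d=5000, alpha=0.5)
print(a_big, a_small)
```
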

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_ij = P_{û_j} x_i

(What we study in PCA scatterplots)

and s_ij = P_{u_j} x_i

Can Show: ŝ_ij / s_ij → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_ij / s_ij → R_j,

with the Same Realization of R_j for i = 1, ..., n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Ǝ interesting Limit Distn's: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic

ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

Is called the "epsilon statistic",

And is used to test "sphericity" of dist'n,

i.e. "are all cov'nce eigenvalues the same?"

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies: 1/d ≤ ε ≤ 1

• For spherical Normal, ε = 1

• Single extreme eigenvalue gives ε ≈ 1/d

• So assumption is very mild

• Much weaker than mixing conditions
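A quick numerical check of these two extremes (a sketch; the function simply implements the epsilon statistic above):

```python
import numpy as np

# Epsilon sphericity statistic: eps = (sum lam)^2 / (d * sum lam^2),
# which lies between 1/d (one dominant eigenvalue) and 1 (spherical).
def epsilon(lam):
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (len(lam) * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon(np.ones(d))               # all eigenvalues equal -> 1
eps_spike = epsilon([1e9] + [1.0] * (d - 1))   # one huge eigenvalue -> ~ 1/d
print(eps_sphere, eps_spike, 1 / d)
```
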

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large, i.e. (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² → 0

Then, for i ≠ j: ⟨X_i, X_j⟩ = o_p(d)

and ‖Z_1 − Z_2‖² = 2d (1 + o_p(1))

Not so strong as before
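The 2d-concentration of pairwise squared distances can be checked directly in the simplest case of independent standard entries (a sketch under that idealized assumption; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# For independent standard entries, ||Z1 - Z2||^2 = 2d (1 + o_p(1)):
# the squared distance between two data vectors concentrates at 2d.
ratios = {}
for d in (100, 10_000, 1_000_000):
    Z1, Z2 = rng.normal(size=d), rng.normal(size=d)
    ratios[d] = np.sum((Z1 - Z2) ** 2) / (2 * d)
print(ratios)   # ratios approach 1 as d grows
```
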

2nd Paper on HDLSS Asymptotics

Can we improve on ⟨X_i, X_j⟩ = o_p(d)?

John Kent example: Normal scale mixture

X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Won't get ⟨X_i, X_j⟩ = C d + O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 does not imply Independence
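A simulation sketch of these notes (sample sizes are illustrative): entries within a vector share one random scale, so they are uncorrelated yet strongly dependent, and ‖X‖/√d clusters near 1 or 10 rather than concentrating at a single deterministic value.

```python
import numpy as np

rng = np.random.default_rng(4)

# Kent's normal scale mixture: X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d).
d, n = 20_000, 200
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # one random sd per vector
X = scales[:, None] * rng.normal(size=(n, d))

# Entries are uncorrelated across coordinates ...
c12 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

# ... yet the shared scale makes ||X|| / sqrt(d) land near 1 or near 10.
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(round(c12, 3))
```
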

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X,  if |X| ≤ c

Y = −X, if |X| > c

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. wrt 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result:

Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals
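The continuity argument can be carried out explicitly (a sketch: for X ~ N(0,1), cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c] = 1 − 4(cφ(c) + 1 − Φ(c)), using ∫_c^∞ x²φ(x)dx = cφ(c) + 1 − Φ(c)), and a bisection locates the root:

```python
import math

# X ~ N(0,1), Y = X when |X| <= c, Y = -X otherwise; find c with cov = 0.
def phi(x):                      # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):                      # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cov_xy(c):
    # cov(X, Y) = E[X^2; |X|<=c] - E[X^2; |X|>c]
    return 1 - 4 * (c * phi(c) + 1 - Phi(c))

# cov < 0 for small c (mostly Y = -X), cov > 0 for large c (mostly Y = X),
# so bisection finds the sign change.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2
print(round(c_star, 3))          # near 1.54
```
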

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Represent'n:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea: feeling sampling variation)

(something like mean vs median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u_1

Turns out: Direction Doesn't Matter

How Good are Empirical Versions

λ̂_1, ..., λ̂_d, û_1 as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1, Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1, Angle(û_1, u_1) → 90°

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall α on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

Consistency of eigenvalues?

Can show: λ̂_1 / λ_1 →_L χ²_n / n

Eigenvalues Inconsistent (for fixed n),

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
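A simulation sketch consistent with this limit (model sizes are arbitrary choices; λ̂_1 is computed cheaply from the n×n Gram matrix): for a strong spike, the ratio λ̂_1/λ_1 stays random for fixed n, with mean near 1 and variance near 2/n.

```python
import numpy as np

rng = np.random.default_rng(5)

# Spike model with alpha > 1: lambda_1 = d**alpha, remaining eigenvalues 1.
# Then lambda_1-hat / lambda_1 is approximately chi^2_n / n.
d, n, alpha, reps = 2000, 10, 1.5, 300
lam1 = d ** alpha
ratios = []
for _ in range(reps):
    X = rng.normal(size=(n, d))
    X[:, 0] *= np.sqrt(lam1)
    # top eigenvalue of (1/n) X^t X equals top eigenvalue of (1/n) X X^t
    top = np.linalg.eigvalsh(X @ X.T / n)[-1]
    ratios.append(top / lam1)
ratios = np.array(ratios)
print(ratios.mean(), ratios.var())   # roughly 1 and roughly 2/n = 0.2
```
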

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Can only say: ‖X‖ / d^{1/2} → 1 or 10, w.p. 1/2 each;

not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored!)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better Newer References?


  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

ε_d = (Σ_{j=1..d} λ_j^2) / (Σ_{j=1..d} λ_j)^2

Satisfies ε_d → 0 as d → ∞:

• For the spherical Normal, ε_d = 1/d → 0

• A single extreme eigenvalue, e.g. λ_1 of size d, keeps ε_d bounded away from 0

• So the assumption ε_d → 0 is very mild

• Much weaker than mixing conditions
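
A quick numerical check of the epsilon (sphericity) statistic above; a hedged sketch, with the eigenvalue profiles chosen purely for illustration:

```python
import numpy as np

def epsilon(lam):
    """Sphericity statistic: eps = sum(lam_j^2) / (sum(lam_j))^2."""
    lam = np.asarray(lam, dtype=float)
    return (lam ** 2).sum() / lam.sum() ** 2

d = 1000
print(epsilon(np.ones(d)))   # spherical Normal: eps = 1/d, tends to 0

spiked = np.ones(d)
spiked[0] = d                # one extreme eigenvalue, of size d
print(epsilon(spiked))       # stays bounded away from 0 (near 1/4 here)
```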

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments

Assume no eigenvalues too large: ε_d → 0 as d → ∞

Then pairwise distances satisfy ‖X_i − X_j‖ = d^{1/2} · O_p(1)

Not so strong as before, where ‖Z_1 − Z_2‖ = (2d)^{1/2} + O_p(1)

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖ = d^{1/2} · O_p(1)?

John Kent example: Normal scale mixture

X ~ 0.5 · N_d(0, I_d) + 0.5 · N_d(0, 100 I_d), i.i.d.

Won't get ‖X_i − X_j‖ = C · d^{1/2} · (1 + o_p(1)) for a single deterministic constant C
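
The Kent mixture claim is easy to see in a small simulation (an illustrative sketch; d, n and the seed are arbitrary choices). Each pair has ‖X_i − X_j‖ / d^{1/2} near sqrt(σ_i² + σ_j²), i.e. near √2, √101 or √200 depending on the random scales, so no single constant C works:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20000, 6
# Kent's scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2 each
sigma = rng.choice([1.0, 10.0], size=n)
X = rng.standard_normal((n, d)) * sigma[:, None]

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
for i, j in pairs:
    scaled = np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
    limit = np.sqrt(sigma[i] ** 2 + sigma[j] ** 2)  # random limit, pair by pair
    print((i, j), round(scaled, 2), "vs", round(limit, 2))
```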

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture X ~ 0.5 · N_d(0, I_d) + 0.5 · N_d(0, 100 I_d):

• Data Vectors are independent of each other

• But entries of each have strong dependence

• However, can show entries have cov = 0

• Recall statistical folklore: Covariance = 0 ⟹ Independence???

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using the Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X when |X| ≤ c, and Y = −X when |X| > c

(choose c to make cov(X, Y) = 0)

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue measure

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
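
The continuity argument can be made concrete. A sketch, using the standard Gaussian identity E[X² · 1{|X| ≤ c}] = (2Φ(c) − 1) − 2cφ(c) and a bisection on c:

```python
import math

def m(c):
    """E[X^2 * 1{|X| <= c}] for X ~ N(0,1): (2*Phi(c) - 1) - 2*c*phi(c)."""
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    return (2.0 * Phi - 1.0) - 2.0 * c * phi

def cov_xy(c):
    """cov(X, Y) with Y = X on {|X| <= c}, Y = -X on {|X| > c}."""
    return 2.0 * m(c) - 1.0

# bisection: cov < 0 for small c, cov > 0 for large c
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0.0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)
print(round(c_star, 4))  # about 1.54
```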

HDLSS Asy's Geometrical Represent'n

Further Consequences of Geometric Representation:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive feeling about sampling variation) (something like mean vs. median), Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How good are the empirical versions λ̂_1, …, λ̂_d, û_1 as estimates of λ_1, …, λ_d, u_1?

Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall λ_1 = d^α is on the Scale of Variance):
Spike Pops Out of Pure Noise Sphere

For α < 1:
Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
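
The α = 1 threshold shows up clearly in simulation (a hedged sketch; d, n and the seed are arbitrary choices, and the sample eigenvector is taken from the SVD of the data matrix):

```python
import numpy as np

def pc1_angle(alpha, d=2000, n=20, seed=0):
    """Angle (degrees) between sample and true first eigenvectors
    in the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)   # true u_1 is the first coordinate axis
    # leading right singular vector of X = leading sample eigenvector
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(1.0, abs(v1[0]))))

print(pc1_angle(1.5))   # alpha > 1: angle roughly 0 (consistency)
print(pc1_angle(0.5))   # alpha < 1: angle roughly 90 (strong inconsistency)
```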

Consistency of eigenvalues?

λ̂_1 / λ_1 →_L χ²_n / n as d → ∞

• Eigenvalues Inconsistent (for fixed n)

• But Known Distribution

• Consistent when n → ∞ as well

HDLSS Math Stat of PCA
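
The fixed-n limit law can be checked numerically (an illustrative sketch; the strong spike α = 2 and the other settings are arbitrary choices that make the limit clean):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 300, 5, 2.0
lam1 = d ** alpha
ratios = []
for _ in range(400):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)   # spike along the first coordinate axis
    # top eigenvalue of the sample second-moment matrix X'X / n
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)

# limit law: chi^2_n / n, which has mean 1 and variance 2/n
print(round(ratios.mean(), 3), round(ratios.var(), 3))
```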

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Identically Distributed

Mixing Conditions: Explore Weaker Assumptions that Still Give the Law of Large Numbers & Central Limit Theorem

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory, with a Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where F_1^j and F_{j+k}^∞ are the Sigma-Fields Generated by X_1, …, X_j and by X_{j+k}, X_{j+k+1}, … (Note the Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
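
As a toy illustration of "uncorrelated at far lags" (a sketch only: the empirical autocorrelation of an AR(1) sequence is a crude stand-in for the ρ-mixing coefficient, which takes a supremum over all L² functions of past and future):

```python
import numpy as np

rng = np.random.default_rng(2)
# AR(1) sequence x_t = 0.8 * x_{t-1} + e_t, a textbook mixing example
T, phi = 200000, 0.8
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, round(lag_corr(x, k), 3))   # decays roughly like phi**k
```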

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume the Entries X_1, X_2, …, X_d of the Data Vectors Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian (only the covariance structure is assumed)

Define the Standardized Version: Z_d = Λ_d^{−1/2} U_d^t X_d

Assume Ǝ a permutation π_d so that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA
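
The standardization Z_d = Λ_d^{−1/2} U_d^t X_d is just whitening in the covariance eigenbasis; a minimal sketch, with a small, arbitrarily chosen Σ:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 50000
# an arbitrary illustrative covariance Sigma = U Lambda U^t
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
lam, U = np.linalg.eigh(Sigma)

# draw X ~ (0, Sigma), then standardize: Z = Lambda^{-1/2} U^t X
X = rng.standard_normal((n, d)) @ np.linalg.cholesky(Sigma).T
Z = (X @ U) / np.sqrt(lam)

# entries of Z are uncorrelated with unit variance
print(np.round(np.cov(Z.T), 2))
```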

Careful look at PCA Consistency for the α > 1 spike

(Reality Check Suggested by a Reviewer)

The condition α > 1 is independent of sample size, so consistency holds even for n = 1 (?!)

Reviewer's Conclusion: Absurd; shows the assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall the Theoretical Separation:

Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) are Not Consistent:

For Scores ŝ_{ij} = P_{û_j} x_i and s_{ij} = P_{u_j} x_i (what we study in PCA scatterplots)

Can Show: ŝ_{ij} = R_j · s_{ij}, with R_j Random (Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{ij} = R_j · s_{ij}, with the Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
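
The practical upshot (relationships among scores survive, even though scales need not) can be sketched in a two-spike simulation; the spike strengths, d, n and the seed are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5000, 30
lam = np.array([d ** 1.5, d ** 1.3])  # two strong spikes, noise variance 1
U = np.eye(d)[:, :2]                  # true eigenvectors: first two axes

X = rng.standard_normal((n, d))
X[:, :2] *= np.sqrt(lam)              # inject the spikes

true_scores = X @ U                            # s_{ij} = u_j' x_i
V = np.linalg.svd(X, full_matrices=False)[2]   # rows: sample eigenvectors
emp_scores = X @ V[:2].T                       # s_hat_{ij} = u_hat_j' x_i

for j in range(2):
    r = np.corrcoef(true_scores[:, j], emp_scores[:, j])[0, 1]
    print(j + 1, round(abs(r), 3))    # near 1: scatterplot structure is kept
```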

In PCA Consistency:

Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Deep Open Problem

Recall: Flexibility From the Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In the Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall the Main Advantage is for High d, So it is not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is the sizes of the biological subtypes

• A differing ratio trips up the mean

• But DWD is more robust

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

d

jj

d

jj

d1

2

2

1

11d1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

d

jj

d

jj

d1

2

2

1

11d1

d

1

2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

1 d

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Ahn Marron Muller amp Chi (2007) Assume 2nd Moments

Assume no eigenvalues too large

Then

Not so strong as before

1 d

dOXX pji )1(

)1(221 pOdZZ

2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define the Standardized Version:

Z_d = Λ_d^(-1/2) U_d^t X_d

Assume Ǝ a permutation of the d entries,

so that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful Look at

PCA Consistency - α > 1 Spike

(Reality Check Suggested by Reviewer):

Consistency is Independent of Sample Size,

So True Even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency - α < 1 Spike

Consistency - α > 1 Spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should Not Study Angles in PCA

Recall, for Consistency (α > 1):

Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1):

Angle(û_1, u_1) → 90°

Because PC Scores (i.e. Projections)

Are Not Consistent:

For Scores s_ij = P_(v_j) x_i and ŝ_ij = P_(v̂_j) x_i

(What we study in PCA scatterplots),

Can Show: ŝ_ij = R_j s_ij, with R_j Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. Projections) Not Consistent:

So How Can PCA Find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors":

ŝ_ij = R_j s_ij, Same Realization of R_j for i = 1, ..., n

Axes Have Inconsistent Scales,

But Relationships Are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency - α < 1 Spike

Consistency - α > 1 Spike

What Happens at the Boundary (α = 1)?

Ǝ Interesting Limit Dist'ns:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

• Recall Main Advantage is for High d

• So Not Clear Embedding Helps

• Thus Not Yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from Above:

Key is sizes of biological subtypes;

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

ε = (Σ_(j=1)^d λ_j²) / (Σ_(j=1)^d λ_j)²

satisfies:

• For spherical Normal: ε = 1/d → 0

• A single extreme eigenvalue (λ_1 = d, λ_2 = ... = λ_d = 1) gives ε → 1/4

• So assumption is very mild

• Much weaker than mixing conditions
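As a quick check of the two bullet-point cases, the statistic ε = Σλ_j² / (Σλ_j)² can be computed for both spectra. A sketch of mine; the spike choice λ_1 = d is my reading of the garbled slide.

```python
# epsilon statistic from the slide: eps = sum(lam^2) / (sum(lam))^2.
def eps(lams):
    s = sum(lams)
    return sum(l * l for l in lams) / (s * s)

d = 10_000
spherical = [1.0] * d                     # spherical Normal
one_spike = [float(d)] + [1.0] * (d - 1)  # one extreme eigenvalue (= d)

print(eps(spherical))   # exactly 1/d, so -> 0
print(eps(one_spike))   # roughly 1/4 for large d, bounded away from 0
```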

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,

Assume no eigenvalues too large: ε → 0

Then:

X_i^t X_j = o_p(1) · d,   i.e.   ||Z_1 − Z_2||² = 2d · (1 + o_p(1))

Not so strong as before
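A small simulation (my sketch, not from the slides) illustrates the resulting geometric representation in the simplest spherical case: inner products are of smaller order than d, and squared pairwise distances concentrate near 2d.

```python
import numpy as np

rng = np.random.default_rng(2)

# For independent standard Gaussian vectors in high dimension d:
#   X_i^t X_j = o_p(1) * d   and   ||X_i - X_j||^2 ~ 2d.
d, n = 20_000, 10
X = rng.standard_normal((n, d))

G = X @ X.T                                   # Gram matrix
off = G[~np.eye(n, dtype=bool)]               # inner products X_i^t X_j, i != j
sq = np.array([[np.sum((X[i] - X[j]) ** 2) for j in range(n)]
               for i in range(n)])
pair = sq[~np.eye(n, dtype=bool)]             # squared pairwise distances

print(round(float(np.abs(off).max()) / d, 3))   # near 0
print(round(float(pair.mean()) / (2 * d), 3))   # near 1
```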

2nd Paper on HDLSS Asymptotics

Can we improve on X_i^t X_j = o_p(1) · d ?

John Kent example: Normal scale mixture

X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)

Won't get X_i^t X_j = C √d · O_p(1)

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assump'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d):

• Data vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X, if |X| ≤ c;   Y = −X, if |X| > c

0 Covariance is not independence

Simple Example: choose c to make cov(X, Y) = 0

• Distribution is degenerate

• Supported on diagonal lines y = ±x

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0
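The continuity argument above can be made concrete numerically. A sketch of mine: the covariance has the closed form cov(X, Y) = 1 − 4(cφ(c) + 1 − Φ(c)), using the standard normal identity E[X²; X > c] = cφ(c) + 1 − Φ(c), and a bisection locates the cutoff c with covariance exactly 0.

```python
import math

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cov_xy(c):
    # cov(X, Y) = E[X^2; |X| <= c] - E[X^2; |X| > c] = 1 - 2*E[X^2; |X| > c]
    # and E[X^2; |X| > c] = 2*(c*phi(c) + 1 - Phi(c)) by symmetry.
    return 1.0 - 4.0 * (c * phi(c) + 1.0 - Phi(c))

# cov_xy(0) = -1 and cov_xy(c) -> +1 as c grows, and it is increasing,
# so bisection finds the unique root.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0.0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)

print(round(c_star, 3))
```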

0 Covariance is not independence

Result: The joint distribution of X and Y

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation; something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version):

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA

in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007)

For Eigenvalues: λ_1,d = d^α,   λ_2,d = ... = λ_d,d = 1

Note Critical Parameter: α

1st Eigenvector: u_1

(Turns out Direction Doesn't Matter)

How Good are Empirical Versions

λ̂_1,d, ..., λ̂_d,d, û_1 as Estimates?

Consistency (big enough spike): For α > 1,

Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1,

Angle(û_1, u_1) → 90°

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall α is on the Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
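A quick simulation (my sketch of the spike model; the particular d, n and α values are arbitrary choices) shows the two regimes: a small angle for α > 1 and a near-90° angle for α < 1.

```python
import numpy as np

rng = np.random.default_rng(3)

# Spike model: eigenvalues (d^alpha, 1, ..., 1), true eigenvector u1 = e1.
def angle_deg(d, alpha, n=20, rng=rng):
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)        # spike variance d^alpha on axis 1
    # Leading eigenvector of the sample covariance, via SVD of centered data.
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    cos = abs(Vt[0][0])                  # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(min(cos, 1.0))))

print(round(angle_deg(2000, 1.5), 1))   # consistency: small angle
print(round(angle_deg(2000, 0.5), 1))   # strong inconsistency: near 90
```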

Consistency of Eigenvalues: For α > 1,

λ̂_1,d / d^α →_L χ²_n / n

• Eigenvalues Inconsistent (limit is random for fixed n)

• But Known Distribution

• Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
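This random limit can be checked by simulation. A sketch of mine: χ²_n / n has mean 1 and variance 2/n, so for small fixed n the ratio fluctuates (inconsistency), while the fluctuation dies as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)

# Spike model with alpha > 1: the ratio lambda1_hat / d^alpha behaves
# like chi^2_n / n -- random for fixed n, -> 1 only as n grows.
def ratio(d, alpha, n):
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n
    return lam1_hat / d ** alpha

n = 5
rs = np.array([ratio(2000, 1.5, n) for _ in range(400)])
print(round(float(rs.mean()), 2), round(float(rs.var()), 2))
```

The printed mean and variance should be close to 1 and 2/n = 0.4, the moments of χ²_5 / 5.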

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)

Can only say ||X|| = O_p(d^(1/2)), with

||X|| / d^(1/2) → 1 or 10 (w.p. 1/2 each),

not deterministic

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n, need some Mixing Cond.
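A tiny simulation (my sketch; the parameter choices are arbitrary) shows the random, non-deterministic limit of the scaled norms in the Kent-style mixture.

```python
import numpy as np

rng = np.random.default_rng(5)

# Each vector is N(0, I_d) or N(0, 100 I_d) with probability 1/2, so
# ||X|| / sqrt(d) settles near 1 or near 10: O_p(sqrt(d)) norms with a
# random, not deterministic, limit.
d, n = 50_000, 12
big = rng.random(n) < 0.5
X = rng.standard_normal((n, d)) * np.where(big, 10.0, 1.0)[:, None]
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)

print(np.round(norms, 2))
```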

Conditions for Geo Rep'n:

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored!),

E.g. Independent and Ident. Dist'd

Explore Weaker Assumptions to Still Get:

Law of Large Numbers

Central Limit Theorem

Mixing Conditions



2nd Paper on HDLSS Asymptotics

Can show epsilon statistic

Satisfies

bull For spherical Normal

bull Single extreme eigenvalue gives

bull So assumption is very mild

bull Much weaker than mixing conditions

d

jj

d

jj

d1

2

2

1

11d

1 d

1

d

1

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments, and assume no eigenvalues too large (ε_d → 0). Then, for i ≠ j,

‖X_i − X_j‖ = O_p(d^{1/2})

Not so strong as before, where for standardized (e.g. spherical Gaussian) vectors Z_i,

‖Z_i − Z_j‖ = (2d)^{1/2} + O_p(1)
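The distance concentration above is easy to check numerically; a minimal sketch (not from the slides, assuming numpy; d, n, and the seed are illustrative choices):

```python
# Sketch (not from the slides): check that for standard Gaussian data
# the pairwise distances concentrate, ||X_i - X_j|| ~ (2 d)^(1/2).
import numpy as np

rng = np.random.default_rng(0)
d, n = 100_000, 5
X = rng.standard_normal((n, d))

dists = np.array([np.linalg.norm(X[i] - X[j])
                  for i in range(n) for j in range(i + 1, n)])
ratios = dists / np.sqrt(2 * d)
print(ratios.round(3))  # all entries very close to 1
```

The fluctuation of each ratio around 1 is of order d^{−1/2}, which is the O_p(1) error on the distance scale.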

2nd Paper on HDLSS Asymptotics

Can we improve on ‖X_i − X_j‖ = O_p(d^{1/2})?

John Kent example, a Normal scale mixture:

X_i ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d), i.i.d.

Won't get ‖X_i − X_j‖ = C d^{1/2} + o_p(d^{1/2}) for a deterministic constant C.
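A small simulation of the Kent mixture (a hedged sketch with illustrative d and n) shows why: the scaled squared distances cluster around 2, 101, or 200, depending on which mixture components the pair happened to draw, rather than around one constant:

```python
# Sketch: under Kent's scale mixture 0.5 N(0, I_d) + 0.5 N(0, 100 I_d),
# ||X_i - X_j||^2 / d converges to a *random* limit (2, 101, or 200,
# depending on the two mixture components drawn), not one constant C^2.
import numpy as np

rng = np.random.default_rng(1)
d, n = 100_000, 6
scales = rng.choice([1.0, 10.0], size=n)      # sd 1 or 10, prob 1/2 each
X = scales[:, None] * rng.standard_normal((n, d))

sq = np.array([np.sum((X[i] - X[j]) ** 2) / d
               for i in range(n) for j in range(i + 1, n)])
print(np.sort(sq).round(1))  # values near 2, 101, or 200
```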

3rd Paper on HDLSS Asymptotics

Get the Geometrical Representation using:

• a 4th moment assumption
• a stronger covariance matrix (only) assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data vectors are independent of each other
• But the entries of each vector have strong dependence
• However, can show the entries have covariance = 0
• Recall the statistical folklore: Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple example:

• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define

Y = X when |X| ≤ c,  Y = −X when |X| > c

and choose c to make cov(X, Y) = 0:

• The distribution is degenerate: supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0

Result: the joint distribution of X and Y
– has Gaussian marginals
– has cov(X, Y) = 0
– yet strong dependence of X and Y
– thus is not multivariate Gaussian

Shows: multivariate Gaussian means more than Gaussian marginals.

HDLSS Asymptotics: Geometrical Representation

Further consequences of the Geometric Representation:

1. DWD is more stable than SVM (based on deeper limiting distributions; reflects the intuitive feeling of sampling variation, something like mean vs. median). Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified. Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version). Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values.)

[Assume data are mean centered.]

Spike Covariance Model, Paul (2007). For eigenvalues

λ_{d,1} = d^α,  λ_{d,2} = … = λ_{d,d} = 1

Note the critical parameter α.

1st eigenvector: u₁ (turns out the direction doesn't matter).

How good are the empirical versions λ̂_{d,1}, …, λ̂_{d,d}, û₁ as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,

Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): for α < 1,

Angle(û₁, u₁) → 90°

Intuition: random noise ~ d^{1/2} (recall d^α is on the scale of variance):

• For α > 1, the spike pops out of the pure noise sphere
• For α < 1, the spike is contained in the pure noise sphere
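A minimal simulation sketch of this dichotomy (not from the slides; d, n, and the seed are illustrative, and the spike is placed along the first coordinate so that u₁ = e₁):

```python
# Sketch: empirical Angle(u1_hat, u1) in the spike model
# lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1, with n fixed.
import numpy as np

def angle_deg(alpha, d, n, seed):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)      # spike along e_1, so u1 = e_1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(1.0, abs(float(Vt[0, 0])))
    return float(np.degrees(np.arccos(cos)))

print(angle_deg(1.5, 20_000, 20, 0))  # alpha > 1: angle near 0
print(angle_deg(0.5, 20_000, 20, 0))  # alpha < 1: angle near 90
```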

HDLSS Math Stat of PCA

Consistency of eigenvalues: for the α > 1 spike, with n fixed,

λ̂_{d,1} / λ_{d,1} →_L χ²_n / n as d → ∞

• Eigenvalues are inconsistent (for fixed n)
• But with a known limiting distribution
• Consistent when n → ∞ as well
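A quick Monte Carlo sketch of the χ²_n / n limit (assuming numpy; n, d, α, and the replication count are illustrative choices):

```python
# Sketch: for the alpha > 1 spike, lambda1_hat / lambda1 approaches the
# chi^2_n / n law as d -> infinity with n fixed (mean 1, variance 2/n).
import numpy as np

rng = np.random.default_rng(3)
n, d, alpha, reps = 5, 10_000, 1.5, 400
lam1 = d ** alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)
    s = np.linalg.svd(X, compute_uv=False)
    ratios[r] = (s[0] ** 2 / n) / lam1   # top eigenvalue of X'X / n, scaled

print(round(float(ratios.mean()), 2), round(float(ratios.var()), 2))
# compare with chi^2_5 / 5: mean 1, variance 2/5 = 0.4
```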

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency

John Kent example:

X_d ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}): the scaled norm d^{−1/2} ‖X_d‖ converges to a random limit (1 or 10, w.p. 1/2 each), not a deterministic one.

• PCA conditions are the same, since the noise is still O_p(d^{1/2})
• But for the Geometric Representation, need some mixing condition

Conclude: need some mixing condition.

Mixing Conditions

Idea from probability theory. Recall the standard asymptotic results as n → ∞:

• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem:

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better: newer references

Mixing condition used here: ρ-mixing. For random variables X₁, X₂, …, define

ρ(k) = sup |corr(f, g)|

where the sup is over f ∈ L²(σ(X₁, …, X_i)) and g ∈ L²(σ(X_{i+k}, X_{i+k+1}, …)), i.e. over the sigma-fields generated by the entries, with a gap of lag k. Assume ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.
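The "uncorrelated at far lags" idea can be illustrated with a Gaussian AR(1), for which the correlation across a gap of lag k is |φ|^k → 0 (a sketch with illustrative parameters, assuming numpy):

```python
# Sketch: Gaussian AR(1), Z_t = phi * Z_{t-1} + e_t; correlation across
# a gap of lag k is phi^k -> 0: "uncorrelated at far lags".
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.7, 400_000
e = rng.standard_normal(T)
Z = np.empty(T)
Z[0] = e[0] / np.sqrt(1 - phi ** 2)  # start in stationarity
for t in range(1, T):
    Z[t] = phi * Z[t - 1] + e[t]

for k in (1, 5, 20):
    r = float(np.corrcoef(Z[:-k], Z[k:])[0, 1])
    print(k, round(r, 3), round(phi ** k, 3))  # empirical corr vs phi^k
```

For a Gaussian process the ρ-mixing coefficient reduces to this maximal correlation, which is what makes the AR(1) a clean illustration.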

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): assume the entries X₁, …, X_d of the data vectors are ρ-mixing.

Drawback: a strong assumption. (In JRSS-B, since Biometrika refused!)

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

Condition from Jung & Marron (2009):

X_d ~ (0_d, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (note: not Gaussian)

Define the standardized version Z_d = Λ_d^{−1/2} U_dᵗ X_d. Assume Ǝ a permutation π_d so that the permuted entries of Z_d are ρ-mixing.

HDLSS Math Stat of PCA

Careful look at PCA consistency for the α > 1 spike (reality check suggested by a reviewer):

The condition is independent of sample size, so it is true even for n = 1 (?!).

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

[Scatterplot: HDLSS PCA often finds signal, not pure noise.]

[Figure: recall the RNAseq data from 8/23/12, d ~ 1700, n = 180.]

Functional Data Analysis

[Figure: manually brushed clusters show clear alternate splicing, not noise.]

HDLSS Math Stat of PCA

Recall the theoretical separation:

• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.

Recall: for consistency (α > 1), Angle(û₁, u₁) → 0; for strong inconsistency (α < 1), Angle(û₁, u₁) → 90°.

The objection: the PC scores (i.e. projections) are not consistent. For the scores

ŝ_{j,i} = P_{û_j} x_i (what we study in PCA scatterplots) and s_{j,i} = P_{u_j} x_i,

can show

ŝ_{j,i} / s_{j,i} → R_j (random)

(thanks to Dan Shen).

PC scores are not consistent, so how can PCA find useful signals in data? [Recall figure: HDLSS PCA often finds signal, not pure noise.]

Key is "proportional errors": the same realization of R_j appears for all i. The axes have inconsistent scales, but the relationships are still useful.

HDLSS Deep Open Problem

In PCA consistency:

• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)? Ǝ interesting limit distributions. Result of Jung, Sen & Marron (2012).

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps. Thus not yet implemented in DWD.
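A rough numerical illustration of the kernel-becomes-linear phenomenon (a sketch, not El Karoui's argument: data are scaled to the noise sphere as in the geometric representation, the RBF bandwidth is put on the d scale, and all parameters are illustrative):

```python
# Sketch: with data on the noise sphere (radius d^(1/2)) and RBF
# bandwidth^2 = d, off-diagonal kernel entries are nearly an affine
# function of the inner products, so the kernel acts like a linear one.
import numpy as np

rng = np.random.default_rng(5)
d, n = 50_000, 40
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

G = X @ X.T                                  # inner products, diag = d
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
K = np.exp(-sq / (2 * d))                    # RBF kernel matrix

mask = ~np.eye(n, dtype=bool)
r = float(np.corrcoef(K[mask], G[mask])[0, 1])
print(round(r, 3))  # close to 1
```

Off the diagonal, K_ij = e^{−1} exp(G_ij / d) with G_ij / d = O_p(d^{−1/2}) small, so the kernel entries are essentially linear in the inner products.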

HDLSS Additional Results

Batch adjustment (Xuxin Liu). Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust. There is mathematics behind this.


Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments

Assume no eigenvalues too large: λ₁ = o(d)

Then: ‖Xᵢ − Xⱼ‖ = O_p(d^{1/2}), i.e. ‖Z₁ − Z₂‖² = 2d(1 + o_p(1))

Not so strong as before

2nd Paper on HDLSS Asymptotics

Can we improve on ‖Xᵢ − Xⱼ‖ = O_p(d^{1/2})?

John Kent example: Normal scale mixture

X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Won't get ‖Xᵢ − Xⱼ‖ = C d^{1/2} + o_p(1)
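Kent's point is easy to see numerically. Below is a small simulation sketch (numpy; the values of d and n, and the fixed half/half group split, are illustrative choices, not from the slides): for N_d(0, I_d) data the scaled distances ‖Xᵢ − Xⱼ‖/d^{1/2} concentrate near √2, while for the scale mixture they fall into widely separated clusters, so no single constant C can work.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 30

# Pure Gaussian N_d(0, I_d): scaled pairwise distances concentrate near sqrt(2)
X = rng.standard_normal((n, d))

# Kent's scale mixture 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
# (mixture membership fixed half/half here, so the effect is deterministic)
scales = np.repeat([1.0, 10.0], n // 2)
Y = rng.standard_normal((n, d)) * scales[:, None]

def scaled_pairwise(Z):
    # all pairwise distances, divided by sqrt(d)
    iu = np.triu_indices(len(Z), k=1)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D[iu] / np.sqrt(d)

g = scaled_pairwise(X)  # tight cluster near sqrt(2)
m = scaled_pairwise(Y)  # clusters near sqrt(2), sqrt(101), sqrt(200)

print(round(float(g.min()), 2), round(float(g.max()), 2))
print(round(float(m.min()), 2), round(float(m.max()), 2))
```

The mixture's scaled distances are still O_p(1) after dividing by d^{1/2}, but they converge to different constants depending on which mixture components the pair came from, which is exactly why the stronger "C d^{1/2} + o_p(1)" form fails.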

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th moment assumption

• Stronger covariance-matrix (only) assumption

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):

• Data vectors are independent of each other

• But the entries of each vector have strong dependence

• However, can show the entries have cov = 0

• Recall statistical folklore: Covariance = 0 ⇒ Independence (?)

0 Covariance is not independence

Simple Example:

• Random variables X, Y ~ N(0, 1)

• Make both Gaussian (Note: not using the multivariate Gaussian)

• With strong dependence, yet 0 covariance

• Given c > 0, define Y = X on {|X| ≤ c} and Y = −X on {|X| > c}

• Choose c to make cov(X, Y) = 0

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue measure

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: multivariate Gaussian means more than Gaussian marginals
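The construction above can be checked numerically. A sketch with numpy (the sample size and bisection range are arbitrary choices): flip the sign of X outside a cutoff c, bisect on c until the empirical covariance crosses zero, and note that |Y| = |X| exactly, so X and Y stay completely dependent.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)  # X ~ N(0, 1)

def y_of(c):
    # Y = X on {|X| <= c}, Y = -X on {|X| > c}; Y is still N(0, 1) by symmetry
    return np.where(np.abs(x) <= c, x, -x)

def cov_xy(c):
    return float(np.mean(x * y_of(c)))  # both variables have mean ~0

# cov(X, Y) is -1 at c = 0 and tends to +1 as c grows; bisect for the root
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2  # analytically near 1.54 for N(0, 1)

y = y_of(c_star)
print(abs(float(np.mean(x * y))))                         # ~0: zero covariance
print(float(np.corrcoef(np.abs(x), np.abs(y))[0, 1]))     # 1.0: |Y| = |X|
```

Despite the (near-)zero covariance, knowing X determines Y exactly, which is the slide's point: zero covariance plus Gaussian marginals does not give a multivariate Gaussian, let alone independence.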

HDLSS Asy's Geometrical Representation: Further Consequences

1. DWD more stable than SVM (based on deeper limiting distributions; reflects an intuitive feeling of sampling variation, something like mean vs. median): Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version): Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values)

[Assume data are mean centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike covariance model, Paul (2007). For eigenvalues: λ₁ = dᵅ, λ₂ = ⋯ = λ_d = 1

Note critical parameter: α

1st eigenvector: u₁ (turns out the direction doesn't matter)

How good are the empirical versions λ̂₁, …, λ̂_d, û₁ as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1, Angle(û₁, u₁) → 0

Strong inconsistency (spike not big enough): for α < 1, Angle(û₁, u₁) → 90°

Intuition: random noise lives on the scale d^{1/2} (recall α is on the scale of variance)

For α > 1: spike pops out of the pure-noise sphere

For α < 1: spike is contained in the pure-noise sphere
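A simulation sketch of the dichotomy (numpy; the choices α = 2, α = 0.3, d = 2000 and n = 20 are illustrative, and at finite d the α < 1 angle is large but not yet at its 90° limit): generate spiked data with λ₁ = dᵅ, compute the leading sample eigenvector via the n × n dual eigendecomposition, and measure its angle to the true direction e₁.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2000, 20

def angle_to_spike(alpha):
    # Spike covariance: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1,
    # with true first eigenvector u1 = e1
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)
    # leading sample eigenvector via the n x n dual matrix (since d >> n)
    w, V = np.linalg.eigh((X @ X.T) / n)   # eigh sorts ascending
    u1_hat = X.T @ V[:, -1]
    u1_hat /= np.linalg.norm(u1_hat)
    cos = min(abs(float(u1_hat[0])), 1.0)  # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

a_big = angle_to_spike(2.0)    # alpha > 1: small angle (consistency)
a_small = angle_to_spike(0.3)  # alpha < 1: large angle (inconsistency)
print(round(a_big, 1), round(a_small, 1))
```

The dual trick (eigendecomposing X Xᵗ/n instead of the d × d sample covariance) is the standard way to get sample PCs in the HDLSS setting, since the two share nonzero eigenvalues.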

HDLSS Math Stat of PCA

Consistency of eigenvalues: λ̂₁ / λ₁ →_L χ²ₙ / n

• Eigenvalues inconsistent (for fixed n)

• But known distribution

• Consistent when n → ∞ as well
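The limiting distribution can be probed numerically. A sketch (numpy; the replication count and sizes are arbitrary choices): for a strong spike (α = 2), the ratio λ̂₁/λ₁ across replications has mean near 1 and variance near 2/n, matching χ²ₙ/n, even though each individual λ̂₁ is inconsistent for fixed n.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 2000, 10, 2.0, 300
lam1 = d ** alpha  # spike eigenvalue lambda_1 = d**alpha

ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    # top sample eigenvalue via the n x n dual matrix
    top = np.linalg.eigvalsh((X @ X.T) / n)[-1]
    ratios.append(top / lam1)
ratios = np.array(ratios)

print(round(float(ratios.mean()), 2))  # near 1, like E[chi2_n / n]
print(round(float(ratios.var()), 2))   # near 2/n = 0.2
```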

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency

John Kent example: X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X‖ = O_p(d^{1/2}), with ‖X‖/d^{1/2} → 1 w.p. 1/2 and → 10 w.p. 1/2: not deterministic

PCA conditions are the same, since the noise is still O_p(d^{1/2})

But for the Geo Rep'n, need some mixing condition

Conclude: need some mixing condition

Mixing Conditions

Idea from probability theory:

Recall standard asymptotic results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed

Mixing Conditions

Explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem:

• A whole area in probability theory, with a large literature

• A comprehensive reference: Bradley (2005, update of 1986 version)

• Better newer references?

Mixing condition used here: Rho-mixing

For random variables X₁, X₂, …, define

ρ(k) = sup |corr(f, g)|

where f and g range over square-integrable functions measurable w.r.t. the sigma-fields generated by X₁, …, Xⱼ and by X_{j+k}, X_{j+k+1}, … respectively (note the gap of lag k)

Assume ρ(k) → 0 as k → ∞

Idea: uncorrelated at far lags

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume the entries of the data vectors X = (X₁, X₂, …, X_d)ᵗ are ρ-mixing

Drawback: strong assumption (in JRSS-B, since Biometrika refused!)

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (Note: not necessarily Gaussian)

Define the standardized version Z_d = Λ_d^{−1/2} U_dᵗ X_d

Assume Ǝ a permutation of the entries of Z_d so that Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA consistency, α > 1 spike (reality check suggested by a reviewer):

The condition α > 1 is independent of sample size, so consistency holds even for n = 1 (!?)

Reviewer's conclusion: absurd, shows the assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise

Recall RNAseq data from 8/23/12: d ~ 1700, n = 180

Manually brushed clusters: clear alternate splicing, not noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall theoretical separation:

• Strong inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA

Recall: for consistency (α > 1), Angle(û₁, u₁) → 0; for strong inconsistency (α < 1), Angle(û₁, u₁) → 90°

Objection: because PC scores (i.e. projections) are not consistent

For scores ŝᵢⱼ = P_{v̂ⱼ} xᵢ (what we study in PCA scatterplots) and sᵢⱼ = P_{vⱼ} xᵢ,

can show ŝᵢⱼ / sᵢⱼ → Rⱼ (random) (Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent. So how can PCA find useful signals in data?

Key is "proportional errors": ŝᵢⱼ / sᵢⱼ → Rⱼ, with the same realization of Rⱼ for all i = 1, …, n

Axes have inconsistent scales, but relationships are still useful

HDLSS Deep Open Problem

In PCA consistency:

• Strong inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting limit distributions: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,

• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps; thus not yet implemented in DWD

HDLSS Additional Results

Batch adjustment (Xuxin Liu)

Recall intuition from above: the key is the sizes of the biological subtypes; a differing ratio trips up the mean, but DWD is more robust

Mathematics behind this?


2nd Paper on HDLSS Asymptotics

Can we improve on

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example: choose c to make cov(X,Y) = 0

• Distribution is degenerate

• Supported on diagonal lines y = x and y = -x

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X,Y) > 0

• For large c, have cov(X,Y) < 0

• By continuity, Ǝ c with cov(X,Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

– Yet strong dependence of X and Y

– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals
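A sketch verifying the construction above (the sign convention Y = X for |X| > c, Y = -X otherwise, is reconstructed from the garbled slide, but either convention works): cov(X,Y) = 1 - 2 E[X² 1{|X| <= c}] is found to cross zero in c, and at that c the pair has zero covariance, Gaussian marginals, yet |Y| = |X| always, so total dependence.

```python
import math
import numpy as np

def cov_xy(c):
    """Exact cov(X,Y) = E[XY] = E[X^2; |X|>c] - E[X^2; |X|<=c]."""
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    phi = math.exp(-c * c / 2.0) / math.sqrt(2.0 * math.pi)
    inner = (2.0 * Phi - 1.0) - 2.0 * c * phi   # E[X^2 1{|X|<=c}]
    return 1.0 - 2.0 * inner

# Bisection: cov_xy is 1 at c = 0 and decreases to -1 as c -> infinity.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) > 0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)

# Monte Carlo check: Y still N(0,1), cov ~ 0, yet |Y| = |X| (dependence).
rng = np.random.default_rng(1)
X = rng.standard_normal(200_000)
Y = np.where(np.abs(X) > c_star, X, -X)
mc_cov = np.mean(X * Y)
```

The zero-covariance cutoff lands near c ≈ 1.5, and the joint distribution is supported on the two diagonal lines, so it is clearly not bivariate Gaussian.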

HDLSS Asy's Geometrical Representation

Further Consequences of Geometric Representation:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive feeling of sampling variation, something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_1 = d^α, λ_2 = ... = λ_d = 1

Note: Critical Parameter α

1st Eigenvector: u_1

Turns out: Direction Doesn't Matter

How Good are Empirical Versions,

λ̂_1, ..., λ̂_d, û_1,

as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1,

Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1,

Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall λ_1 = d^α is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere
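A minimal numerical sketch of this dichotomy (d, n, α and the diagonal spike model are illustrative choices, not from the slides): with Σ = diag(d^α, 1, ..., 1), the empirical first eigenvector is nearly aligned with u_1 = e_1 when α > 1, and nearly orthogonal to it when α < 1.

```python
import numpy as np

def angle_to_e1(d, alpha, n, seed):
    """Angle (degrees) between empirical PC1 direction and true u1 = e1."""
    rng = np.random.default_rng(seed)
    lam1 = float(d) ** alpha
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)          # Sigma = diag(d^alpha, 1, ..., 1)
    # First right singular vector of the data matrix = empirical u1-hat.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)     # |<u1-hat, e1>|
    return float(np.degrees(np.arccos(cos)))

angle_big = angle_to_e1(d=20_000, alpha=1.5, n=50, seed=2)    # consistency
angle_small = angle_to_e1(d=20_000, alpha=0.3, n=50, seed=2)  # strong inconsistency
```

The α > 1 spike "pops out" (angle near 0°), while the α < 1 spike is swamped by the noise sphere (angle far from 0°).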

HDLSS Math Stat of PCA

Consistency of eigenvalues?

Eigenvalues Inconsistent: for fixed n, as d → ∞,

λ̂_1 / λ_1 →_L χ²_n / n

But Known Distribution

Consistent when n → ∞ as Well
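A sketch of the eigenvalue result as reconstructed from the garbled slide (the limit law λ̂_1/λ_1 → χ²_n/n is the claim; d, n, α here are illustrative): over replicates, the ratio has mean near 1 but variance near 2/n, so it stays random for fixed n.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 2000, 5, 1.5, 200
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)          # alpha > 1 spike along e1
    # Largest eigenvalue of the sample covariance, via the n x n Gram matrix.
    gram_eigs = np.linalg.eigvalsh(X @ X.T / n)
    ratios[r] = gram_eigs[-1] / lam1

# chi^2_n / n has mean 1 and variance 2/n = 0.4 here.
mean_ratio, var_ratio = ratios.mean(), ratios.var()
```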

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:

X_d ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)

Can only say ||X_d|| = O_p(d^{1/2}),

i.e. ||X_d|| ≈ d^{1/2} w.p. 1/2 and ≈ 10 d^{1/2} w.p. 1/2,

not deterministic

PCA Conditions Same, since Noise is Still O_p(d^{1/2})

But for Geo Rep'n need some Mixing Condition

Conclude: Need some Mixing Condition
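A sketch of the "not deterministic" radius claim (the mixture constants are reconstructed from the slide; n and d are illustrative): the normalized radius ||X||/d^{1/2} lands near 1 or near 10, each with probability 1/2, so there is no single deterministic limit.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 5000

# Kent mixture: scale 1 or scale 10 per vector, with prob 1/2 each.
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]

radii = np.linalg.norm(X, axis=1) / np.sqrt(d)
near_1 = np.abs(radii - 1.0) < 0.2    # vectors on the inner shell
near_10 = np.abs(radii - 10.0) < 2.0  # vectors on the outer shell
```

Every normalized radius concentrates on one of the two shells, roughly half on each.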

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored...)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better, Newer References?

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X_1, X_2, ..., Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L_2(σ(X_1, ..., X_j)), g ∈ L_2(σ(X_{j+k}, X_{j+k+1}, ...)) }

For the Sigma-Fields Generated by the Variables

Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags
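A small illustration of the "uncorrelated at far lags" idea (the AR(1) process and its parameters are illustrative, not from the slides): for a Gaussian AR(1) with coefficient φ, the ρ-mixing coefficient is φ^k, so correlation at far lags is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.8, 100_000

# Stationary Gaussian AR(1): x_t = phi * x_{t-1} + eps_t.
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi**2)   # start in stationarity
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(series, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(series[:-k], series[k:])[0, 1]

corr_1 = lag_corr(x, 1)     # near phi = 0.8: strong nearby dependence
corr_30 = lag_corr(x, 30)   # near phi^30 ~ 0.001: uncorrelated at far lags
```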

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X_1, X_2, ..., X_d)^t

Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused...)

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the entries,

So that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size,

So true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Objection: PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_ij = P_û_j x_i and s_ij = P_u_j x_i

(What we study in PCA scatterplots)

Can Show: ŝ_ij / s_ij → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_ij / s_ij → R_j,

with the Same Realization of R_j for all i

Axes have Inconsistent Scales,

But Relationships are Still Useful
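A sketch of the "proportional errors" point (the spike model and all parameters are illustrative choices): the empirical PC1 scores differ from the true scores essentially by one common multiplicative factor per realization, so the score scatterplot is a rescaled, still useful, version of the truth.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 20_000, 10
lam1 = float(d) ** 0.9
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)              # spike along true u1 = e1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0]

s_true = X[:, 0]                      # true PC1 scores (projections on e1)
s_hat = X @ u1_hat                    # empirical PC1 scores
if np.sum(s_hat * s_true) < 0:        # resolve the sign ambiguity of u1-hat
    s_hat = -s_hat

# Least-squares common factor R and the relative size of what it misses.
R_hat = np.sum(s_hat * s_true) / np.sum(s_true**2)
rel_resid = np.linalg.norm(s_hat - R_hat * s_true) / np.linalg.norm(s_hat)
```

The empirical scores are inflated by a common factor R > 1 (so the score axis has an inconsistent scale), yet the residual around that single factor is tiny: relationships among observations are preserved.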

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Distn's

Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

ddddiININX 10050050~

dOXX pji )1(

2nd Paper on HDLSS Asymptotics

Can we improve on

John Kent example Normal scale mixture

Wonrsquot get

ddddiININX 10050050~

dOXX pji )1(

)1(pjiOdCXX

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA


  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Can we improve on the assumptions?

John Kent example: Normal scale mixture

X_i ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d), i.i.d.

Won't get: ||X_i - X_j|| = c d^(1/2) (1 + o_p(1)) for a single constant c

Instead: ||X_i - X_j|| = C_ij d^(1/2) (1 + o_p(1)), with C_ij depending on the (random) mixture components
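Aside: the failure of a single distance constant is easy to see numerically. A minimal sketch (not from the slides; the dimension d = 2000, the sample count, and the 1.2 / 3.0 spread thresholds are illustrative choices, and mixture membership is fixed half-and-half just so both components appear): for i.i.d. N(0, I_d) data every scaled pairwise distance concentrates near one constant, while under the 50/50 scale mixture the scaled distances split into several distinct levels.

```python
import math
import random

def pairwise_scaled_dists(vectors, d):
    # ||X_i - X_j|| / sqrt(2d); for i.i.d. N(0, I_d) data these all
    # concentrate near the single constant 1
    out = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sq = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            out.append(math.sqrt(sq / (2 * d)))
    return out

rng = random.Random(0)
d, n = 2000, 20

# pure case: all vectors i.i.d. N(0, I_d)
pure = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Kent mixture: half the vectors from N(0, I_d), half from N(0, 100 I_d)
# (membership fixed here just to guarantee both components appear)
flags = [i < n // 2 for i in range(n)]
mix = [[rng.gauss(0.0, 1.0 if f else 10.0) for _ in range(d)] for f in flags]

r_pure = pairwise_scaled_dists(pure, d)
r_mix = pairwise_scaled_dists(mix, d)

spread_pure = max(r_pure) / min(r_pure)  # near 1: a single constant
spread_mix = max(r_mix) / min(r_mix)     # large: several distinct levels
```

The mixture's scaled distances land near roughly 1, 7, or 10 depending on which components the pair came from, so no single constant works.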

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:

• 4th Moment Assumption

• Stronger Covariance Matrix (only) Assum'n

Yata & Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

X_i ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

• Data vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 does not imply Independence

0 Covariance is not independence

Simple Example:

• Random variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

• With strong dependence, yet 0 covariance

Given c > 0, define:

Y = X, when |X| <= c

Y = -X, when |X| > c

(choose c to make cov(X, Y) = 0)

• Distribution is degenerate

• Supported on diagonal lines y = ± x

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result: the joint distribution of X and Y

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
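Aside: a quick Monte Carlo check of this construction (a sketch, not from the slides; the sample size and the example values c = 0.2 and c = 3.0 are arbitrary choices). By symmetry Y = ±X is exactly N(0, 1), and the empirical covariance changes sign as c grows, so continuity gives an intermediate c with cov(X, Y) = 0 even though Y is a deterministic function of X.

```python
import random

def cov_xy(c, n=200_000, seed=1):
    # X ~ N(0,1); Y = X when |X| <= c, else Y = -X (Y is also N(0,1)).
    # Both means are 0, so cov(X, Y) is estimated by the average of X*Y.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x if abs(x) <= c else -x
        total += x * y
    return total / n

cov_small = cov_xy(0.2)  # small c: mostly Y = -X, covariance negative
cov_large = cov_xy(3.0)  # large c: mostly Y = X, covariance positive
```

A root-finder between the two c values would locate the exact zero-covariance threshold.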

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive idea: feeling sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007)

For Eigenvalues: λ_1(d) = d^α, λ_2(d) = ... = λ_d(d) = 1

Note: Critical Parameter α

1st Eigenvector: u_1

(Turns out: Direction Doesn't Matter)

How good are the empirical versions λ̂_1, ..., λ̂_d, û_1 as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,

Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): for α < 1,

Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

(Recall α is on the Scale of Variance)

For α > 1: Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
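Aside: the dichotomy shows up already at moderate dimension. A pure-Python sketch (not from the slides; the values d = 1000, n = 10, the two α values, and the angle thresholds are illustrative, and power iteration stands in for a full eigendecomposition): simulate the spike model with true first eigenvector u_1 = e_1 and measure the angle to the leading sample eigenvector.

```python
import math
import random

def top_angle_deg(alpha, d=1000, n=10, iters=100, seed=2):
    # Spike model: coordinate 1 has variance d**alpha, all others variance 1,
    # so the true first eigenvector is u_1 = e_1.
    rng = random.Random(seed)
    s1 = d ** (alpha / 2.0)
    X = [[rng.gauss(0.0, s1 if j == 0 else 1.0) for j in range(d)]
         for _ in range(n)]
    # Power iteration for the top eigenvector of X^T X, applied as
    # v -> X^T (X v) so the d x d matrix is never formed.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        Xv = [sum(xj * vj for xj, vj in zip(x, v)) for x in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(d)]
        nrm = math.sqrt(sum(c * c for c in w))
        v = [c / nrm for c in w]
    # angle between the sample eigenvector and e_1, in degrees
    return math.degrees(math.acos(min(1.0, abs(v[0]))))

angle_sup = top_angle_deg(alpha=2.0)  # alpha > 1: consistent, angle near 0
angle_sub = top_angle_deg(alpha=0.3)  # alpha < 1: angle near 90 degrees
```

For α = 2 the spike "pops out" and the angle is tiny; for α = 0.3 the top sample direction is noise-dominated and nearly orthogonal to e_1.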

HDLSS Math Stat of PCA

Consistency of Eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n

• Eigenvalues Inconsistent (for fixed n)

• But Known Distribution

• Consistent when n → ∞ as Well
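Aside: a small Monte Carlo check of the χ²_n / n limit (a sketch; the choices d = 300, n = 8, α = 2, and the tolerances are mine). For a strong spike the top sample eigenvalue, rescaled by λ_1 = d^α, should average near 1 but keep the wide fixed-n spread of χ²_n / n, i.e. it is not consistent for fixed n.

```python
import random

def top_eig_ratio(d=300, n=8, alpha=2.0, iters=30, seed=0):
    # Spike model with lam1 = d**alpha; data have known mean 0, so the
    # sample covariance is S = (1/n) X^T X (no centering step needed).
    rng = random.Random(seed)
    lam1 = d ** alpha
    X = [[rng.gauss(0.0, lam1 ** 0.5 if j == 0 else 1.0) for j in range(d)]
         for _ in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    lam_hat = 0.0
    for _ in range(iters):  # power iteration, applying S via X
        Xv = [sum(xj * vj for xj, vj in zip(x, v)) for x in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) / n for j in range(d)]
        lam_hat = sum(c * c for c in w) ** 0.5  # ||S v|| -> top eigenvalue
        v = [c / lam_hat for c in w]
    return lam_hat / lam1

ratios = [top_eig_ratio(seed=s) for s in range(25)]
mean_ratio = sum(ratios) / len(ratios)  # approx E[chi2_n / n] = 1
spread = max(ratios) - min(ratios)      # stays wide: sd ~ sqrt(2/n)
```

The spread does not shrink with d, only with n, which is the "inconsistent but known distribution" point.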

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say ||X|| = O_p(d^(1/2)), not deterministic:

||X|| / d^(1/2) → 1 w.p. 1/2, → 10 w.p. 1/2

PCA Conditions same, since noise still O_p(d^(1/2))

But for Geo Rep'n, need some mixing cond.

Conclude: Need some Mixing Condition
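Aside: the "O_p(d^(1/2)) but not deterministic" point is easy to see by simulation (a sketch; d = 2000, the 60 draws, and the window widths are arbitrary choices). Each draw's scaled norm lands near either 1 or 10, depending on which mixture component it came from, so the limit is random rather than a single constant.

```python
import math
import random

rng = random.Random(3)
d = 2000

def scaled_norm():
    # Kent mixture: N(0, I_d) w.p. 1/2, N(0, 100 I_d) w.p. 1/2
    sd = 1.0 if rng.random() < 0.5 else 10.0
    return math.sqrt(sum(rng.gauss(0.0, sd) ** 2 for _ in range(d)) / d)

norms = [scaled_norm() for _ in range(60)]
near_1 = sum(1 for r in norms if abs(r - 1.0) < 0.3)    # component 1 draws
near_10 = sum(1 for r in norms if abs(r - 10.0) < 3.0)  # component 2 draws
```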

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (usually ignored!),

e.g. Independent and Ident. Distr'd

Mixing Conditions: explore weaker assumptions that still give the

Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory, with a Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here: ρ-Mixing

For Random Variables X_1, X_2, ..., define

ρ(k) = sup_m sup { |corr(f, g)| : f ∈ L²(F_1^m), g ∈ L²(F_(m+k)^∞) }

where F_1^m and F_(m+k)^∞ are the sigma-fields generated by

X_1, ..., X_m and by X_(m+k), X_(m+k+1), ...

(Note: Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
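Aside: for intuition, a stationary Gaussian AR(1) sequence is a standard example of a ρ-mixing process (for Gaussian processes the ρ-mixing coefficient reduces to the maximal correlation across the gap, which decays like φ^k here). A sketch with arbitrary φ and sample size: sample autocorrelations fade toward 0 as the lag grows.

```python
import random

def ar1_autocorrs(phi, lags, n=100_000, seed=4):
    # X_t = phi * X_{t-1} + e_t with a stationary start, so that
    # corr(X_t, X_{t+k}) = phi**k ("uncorrelated at far lags")
    rng = random.Random(seed)
    sd_stationary = (1.0 / (1.0 - phi * phi)) ** 0.5
    xs = [rng.gauss(0.0, sd_stationary)]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    var = sum(x * x for x in xs) / n  # process mean is 0 by construction
    return {k: sum(xs[i] * xs[i + k] for i in range(n - k)) / ((n - k) * var)
            for k in lags}

r = ar1_autocorrs(0.8, lags=(1, 5, 10))
# correlations decay roughly like 0.8**k as the lag k grows
```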

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume entries of the data vectors X = (X_1, X_2, ..., X_d)^t are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions

Require a Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(-1/2) U_d^t X_d

Assume Ǝ a permutation,

so that the permuted entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)

(Reality Check Suggested by Reviewer)

The condition is independent of sample size,

so consistency holds even for n = 1 (?!?)

Reviewer's Conclusion: Absurd; shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: "Should not study angles in PCA,

because PC scores (i.e. projections) are not consistent."

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

For Scores ŝ_(j,i) = P_(v̂_j) x_i and s_(j,i) = P_(v_j) x_i

(what we study in PCA scatterplots),

Can Show: ŝ_(j,i) / s_(j,i) → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

(HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors":

ŝ_(j,i) / s_(j,i) → R_j, with the same realization of R_j for all i = 1, ..., n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

• Recall Main Advantage is for High d

• So not Clear Embedding Helps

• Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is sizes of biological subtypes

• Differing ratio trips up mean, but DWD more robust

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using

bull 4th Moment Assumption

bull Stronger Covariance Matrix (only) Assumrsquon

Yata amp Aoshima (2012)

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}

For α > 1 (recall d^α is on the scale of variance, so the spike is on the scale d^{α/2}):
    the spike pops out of the pure noise sphere.

For α < 1:
    the spike is contained in the pure noise sphere.
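The "pure noise sphere" intuition can be checked directly (a small sketch of my own, assuming numpy): for Z ~ N(0, I_d), the scaled norm ‖Z‖ / d^{1/2} concentrates at 1 as d grows.

```python
import numpy as np

rng = np.random.default_rng(7)
ratios = []
for d in (100, 10_000, 1_000_000):
    z = rng.standard_normal(d)                      # one draw of N(0, I_d) noise
    ratios.append(np.linalg.norm(z) / np.sqrt(d))   # concentrates at 1 as d grows
```

The fluctuation of the ratio is of order 1 / (2d)^{1/2}, so the largest dimension gives a ratio within about a tenth of a percent of 1.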

Consistency of eigenvalues (in the α > 1 case, for fixed n):

    λ̂₁ / λ₁  →_L  χ²_n / n   as d → ∞

Eigenvalues are inconsistent (for fixed n), but with a known limiting distribution, and consistent when n → ∞ as well.
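The χ²_n / n limit can be checked by simulation (my sketch; d, n and the number of replications are arbitrary choices, and numpy is assumed): χ²_n / n has mean 1 and variance 2/n, so for n = 5 the ratio λ̂₁ / λ₁ should have mean near 1 and variance near 0.4.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha, reps = 5000, 5, 1.5, 200
lam1 = d ** alpha
sd = np.ones(d)
sd[0] = np.sqrt(lam1)                       # spike standard deviation
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d)) * sd    # n draws from the spike model
    s = np.linalg.svd(X, compute_uv=False)  # singular values of the data matrix
    lam1_hat = s[0] ** 2 / n                # top eigenvalue of X^T X / n
    ratios[r] = lam1_hat / lam1
# chi^2_n / n has mean 1 and variance 2/n = 0.4
```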

HDLSS Math Stat of PCA

Conditions for Geometric Representation & PCA Consistency

John Kent example:

    X_d ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}), not deterministic:

    ‖X_d‖ / d^{1/2} → 1 or 10, each with probability 1/2

The PCA conditions are the same, since the noise is still O_p(d^{1/2}). Conclude: for the geometric representation, some mixing condition is needed.
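The Kent example is easy to simulate (my own sketch, assuming numpy): the scaled norms ‖X_d‖ / d^{1/2} split into two clusters, near 1 and near 10, rather than converging to a single constant.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10_000, 400
# Kent's normal scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]
scaled_norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
near_1 = np.abs(scaled_norms - 1) < 0.2     # cluster from the N(0, I_d) component
near_10 = np.abs(scaled_norms - 10) < 2     # cluster from the N(0, 100 I_d) component
```

Every scaled norm lands in one of the two clusters, and both clusters are populated, so there is no deterministic limit.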

Mixing Conditions

Idea from probability theory. Recall standard asymptotic results as n → ∞:

• Law of Large Numbers ("weak" = in probability, "strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better, newer references also exist

Mixing condition used here: ρ-mixing.

For random variables X₁, X₂, …, define

    ρ(k) = sup |corr(f, g)|,

where the sup is over f ∈ L²(σ(X₁, …, X_j)) and g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)), for the sigma-fields generated by the variables. Note the gap of lag k between the two blocks.

Assume ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.
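A textbook ρ-mixing example is a stationary AR(1) sequence, where correlations decay geometrically in the lag. A minimal numeric illustration of "uncorrelated at far lags" (my own sketch, assuming numpy; the length and the AR coefficient are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
T, phi = 200_000, 0.7
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]          # AR(1): corr at lag k is phi^k

def lag_corr(series, k):
    """Sample correlation between the series and itself shifted by lag k."""
    return np.corrcoef(series[:-k], series[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 20)]  # approx 0.7, 0.17, 0.0008
```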

HDLSS Math Stat of PCA

Conditions for Geometric Representation

Hall, Marron and Neeman (2005): assume the entries X₁, X₂, …, X_d of the data vectors are ρ-mixing.

Drawback: strong assumption. (In JRSS-B, since Biometrika refused!)

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

Condition from Jung & Marron (2009):

    X_d ~ (0, Σ_d),   where Σ_d = U_d Λ_d U_dᵗ   (note: not necessarily Gaussian)

Define the standardized (sphered) version

    Z_d = Λ_d^{-1/2} U_dᵗ X_d

Assume ∃ a permutation of the entries of Z_d so that the permuted sequence is ρ-mixing.
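The standardization Z_d = Λ_d^{-1/2} U_dᵗ X_d can be sketched numerically (my own illustration, assuming numpy; the spike exponent and sizes are arbitrary): whatever the spike in Σ_d, the sphered vectors Z_d have identity covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 50, 2000
lam = np.ones(d)
lam[0] = d ** 1.2                                  # spike eigenvalue d^alpha, alpha = 1.2
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal eigenvector matrix U_d
X = (U * np.sqrt(lam)) @ rng.standard_normal((d, n))  # columns ~ (0, U Lam U^T)
Z = (U.T @ X) / np.sqrt(lam)[:, None]              # Z = Lam^{-1/2} U^T X, sphered
C = Z @ Z.T / n                                    # sample covariance of Z, approx I_d
```

The mixing assumption is then placed on the entries of Z_d (after a permutation), not on the raw coordinates of X_d.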

HDLSS Math Stat of PCA

Careful look at PCA consistency in the α > 1 spike case
(reality check suggested by a reviewer):

The result is independent of sample size, so it is true even for n = 1 (!?!)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

But HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data (from 8/23/12): d ~ 1700, n = 180.

Functional Data Analysis

Manually brushed clusters show clear alternate splicing: not noise.

HDLSS Math Stat of PCA

Recall the theoretical separation:

    Strong inconsistency: α < 1 spike
    Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.

Recall, for consistency (α > 1): Angle(û₁, u₁) → 0.
For strong inconsistency (α < 1): Angle(û₁, u₁) → 90°.

The objection: because the PC scores (i.e. projections) are not consistent.

For the scores ŝ_{j,i} = û_jᵗ x_i (what we study in PCA scatterplots) and s_{j,i} = u_jᵗ x_i, can show

    ŝ_{j,i} / s_{j,i} → R_j   (random)

(Thanks to Dan Shen.)

PC scores are not consistent, so how can PCA find useful signals in data? Yet HDLSS PCA often does find signal, not pure noise.

Key is "proportional errors": the same realization of R_j for every i = 1, …, n. The axes have inconsistent scales, but the relationships are still useful.
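A small sketch of the "proportional errors" idea (my own illustration, assuming numpy; done in the consistency regime α > 1, where the common factor happens to be near ±1, but the point is that whatever the factor is, it is shared across all i, so the scatterplot shape is preserved):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha = 20_000, 10, 1.5
sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)
X = rng.standard_normal((n, d)) * sd       # spike model, true u1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]                          # empirical PC1 scores  s_hat_{1,i}
s_true = X[:, 0]                           # true PC1 scores       s_{1,i}
ratios = s_hat / s_true                    # roughly the same value for every i
spread = ratios.std() / abs(ratios.mean()) # relative spread of the ratios
```

The ratio ŝ_{1,i} / s_{1,i} is essentially constant in i (up to an overall sign ambiguity of the eigenvector), which is why relative positions in PC scatterplots remain meaningful even when the scores themselves are not consistent.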

HDLSS Deep Open Problem

In PCA consistency:

    Strong inconsistency: α < 1 spike
    Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting limit distributions, Jung, Sen & Marron (2012).

HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers.
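A rough numeric sketch of why kernels "linearize" in high dimension (my own illustration, assuming numpy, with data normalized to the noise sphere of radius d^{1/2} so norms are exactly comparable): pairwise inner products are O(d^{1/2}), so the Gaussian kernel with bandwidth² = d is evaluated in a tiny neighborhood where it is essentially an affine function of the linear Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 20_000, 40
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # put points on the sphere of radius sqrt(d)
G = X @ X.T                              # linear (inner product) Gram matrix
sq = 2.0 * d - 2.0 * G                   # squared distances (all squared norms equal d)
K = np.exp(-sq / (2.0 * d))              # Gaussian kernel matrix, bandwidth^2 = d
iu = np.triu_indices(n, 1)               # off-diagonal entries
corr = np.corrcoef(K[iu], G[iu])[0, 1]   # kernel entries vs. linear Gram entries
lin = np.exp(-1.0) * (1.0 + G[iu] / d)   # first-order (linear-in-G) approximation
err = np.max(np.abs(K[iu] - lin))        # worst-case deviation from the affine map
```

Off the diagonal, K is nearly a perfect affine transform of G, which is the heart of the "kernel classifiers ~ linear classifiers" phenomenon.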

Implications for DWD: recall its main advantage is for high d, so it is not clear that kernel embedding helps. Thus kernel embedding is not yet implemented in DWD.

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust. There is mathematics behind this.


2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

    X_i ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d),   i.i.d.

• Data vectors are independent of each other
• But the entries of each have strong dependence
• However, can show the entries have covariance = 0
• Recall the statistical folklore: covariance = 0 does not imply independence

0 Covariance is not independence

Simple example:
• Random variables X and Y, make both Gaussian: X, Y ~ N(0,1)
  (note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance.

Given c > 0, define

    Y = X    if |X| ≤ c
    Y = −X   if |X| > c

Choose c to make cov(X, Y) = 0:
• The distribution is degenerate, supported on the diagonal lines y = ±x
• Not absolutely continuous w.r.t. 2-d Lebesgue measure
• For small c, cov(X, Y) < 0
• For large c, cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian.

Shows: multivariate Gaussian means more than Gaussian marginals!
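This construction is easy to verify numerically (my own sketch, assuming numpy; the value c ≈ 1.538 is my numerically located root of E[X² 1{|X| ≤ c}] = 1/2 for the standard normal):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.standard_normal(n)
c = 1.538                             # approx. root of E[X^2 1{|X| <= c}] = 1/2
y = np.where(np.abs(x) <= c, x, -x)   # flip the tails; Y is still N(0,1) by symmetry
cov_xy = np.mean(x * y)               # both have mean 0, so this estimates cov(X, Y)
dep = np.corrcoef(x**2, y**2)[0, 1]   # Y^2 = X^2 exactly, so this is 1: total dependence
```

The sample covariance is essentially 0, while X² and Y² are perfectly correlated, so X and Y are strongly dependent.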

  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

    X_i ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d),  i.i.d.

• Data Vectors are Independent of Each Other

• But Entries of Each have Strong Dependence

• However, can Show Entries have cov = 0

• Recall Statistical Folklore:  Covariance = 0  ⇏  Independence
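The folklore failure can be checked numerically. A minimal simulation sketch, assuming the mixture form X_i ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d); the dimension and sample size are illustrative choices, not from the slides:

```python
import numpy as np

# Kent's normal scale mixture: each data vector is N(0, I_d) or N(0, 100 I_d),
# each with probability 1/2.  All entries of a vector share the random scale,
# so they depend strongly -- yet pairwise covariances are 0.
rng = np.random.default_rng(0)
d, n = 20, 100_000
sd = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # common std dev per vector
X = rng.standard_normal((n, d)) * sd[:, None]

cov12 = np.cov(X[:, 0], X[:, 1])[0, 1]          # entry covariance: near 0
# dependence shows up in the magnitudes, which move together with the scale
dep = np.corrcoef(np.abs(X[:, 0]), np.abs(X[:, 1]))[0, 1]
```

Here `cov12` is close to 0 while `dep` is clearly positive, illustrating strong dependence despite zero covariance.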

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make Both Gaussian:  X, Y ~ N(0, 1)

  (Note: Not Using the Multivariate Gaussian)

• With Strong Dependence

• Yet 0 Covariance

Given c > 0, Define

    Y = X  when |X| > c,    Y = -X  when |X| ≤ c

Choose c to make cov(X, Y) = 0:

• Distribution is Degenerate

• Supported on Diagonal Lines

• Not Abs. Cont. w.r.t. 2-d Lebesgue Measure

• For Small c, have  cov(X, Y) > 0

• For Large c, have  cov(X, Y) < 0

• By Continuity, Ǝ c with  cov(X, Y) = 0

Result: The Joint Distribution of X and Y

– Has Gaussian Marginals

– Has cov(X, Y) = 0

– Yet Strong Dependence of X and Y

– Thus is Not Multivariate Gaussian

Shows: Multivariate Gaussian Means More than Gaussian Marginals
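This construction can be verified numerically. A minimal sketch; the sign-flip form of Y is my reading of the garbled slide formula, and the critical c is found by solving E[X² 1{|X| ≤ c}] = 1/2 (which makes cov(X, Y) = 1 - 2 E[X² 1{|X| ≤ c}] vanish):

```python
import math
import numpy as np

def trunc_second_moment(c):
    """E[X^2 1{|X| <= c}] for X ~ N(0,1), via the closed form (2*Phi(c)-1) - 2*c*phi(c)."""
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return (2 * Phi - 1) - 2 * c * phi

# bisect for the c with E[X^2 1{|X| <= c}] = 1/2, i.e. cov(X, Y) = 0
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if trunc_second_moment(mid) < 0.5 else (lo, mid)
c = lo

rng = np.random.default_rng(1)
X = rng.standard_normal(200_000)
Y = np.where(np.abs(X) > c, X, -X)   # sign flip inside [-c, c]
cov = np.mean(X * Y)                  # ~ 0 by the choice of c
```

Y inherits the N(0, 1) marginal (negating a symmetric variable on a symmetric set preserves it), `cov` is near 0, yet |Y| = |X| exactly, so X and Y are strongly dependent.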

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD More Stable than SVM (Based on Deeper Limiting Distributions)

   (Reflects the Intuitive Feeling about Sampling Variation; Something like Mean vs. Median)

   Hall, Marron & Neeman (2005)

2. 1-NN Rule Inefficiency is Quantified

   Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for Uneven Sample Sizes (Motivates Weighted Version)

   Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007)

For Eigenvalues:  λ₁ = d^α,  λ₂ = ⋯ = λ_d = 1,  α > 0

Note: Critical Parameter α

1st Eigenvector u₁:  Turns out, Direction Doesn't Matter

How Good are Empirical Versions  λ̂₁, ..., λ̂_d, û₁  as Estimates?

Consistency (big enough spike): For α > 1,

    Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1,

    Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^(1/2)  (Recall α is on the Scale of Variance)

For α > 1:  Spike Pops Out of Pure Noise Sphere

For α < 1:  Spike Contained in Pure Noise Sphere
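The α > 1 vs. α < 1 dichotomy shows up in a small simulation. A minimal sketch; the dimension, sample size, and the choice u₁ = e₁ (which the slides note does not matter) are illustrative assumptions:

```python
import numpy as np

def pc1_angle(d, n, alpha, rng):
    """Angle (degrees) between sample PC1 and the true spike direction e_1,
    under the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)              # spike along the first coordinate
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]   # top right-singular vector
    return np.degrees(np.arccos(min(1.0, abs(u1_hat[0]))))

rng = np.random.default_rng(2)
strong = pc1_angle(d=2000, n=20, alpha=1.5, rng=rng)   # alpha > 1: angle near 0
weak = pc1_angle(d=2000, n=20, alpha=0.3, rng=rng)     # alpha < 1: angle near 90
```

With α = 1.5 the spike dominates the d/n noise scale and the angle is small; with α = 0.3 the top sample eigenvector is essentially a noise direction, nearly orthogonal to u₁.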

HDLSS Math Stat of PCA

Consistency of Eigenvalues?

• Eigenvalues are Inconsistent:  λ̂₁ / λ₁ →_L χ²_n / n,  as d → ∞

• But the Limiting Distribution is Known

• Consistent when n → ∞ as Well
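The known limit λ̂₁ / λ₁ ≈ χ²_n / n (for the α > 1 spike, as d → ∞) can be checked by simulation. A minimal sketch with illustrative sizes; the sample covariance uses the known-mean normalization 1/n:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 2000, 10, 1.5, 200
lam1 = d ** alpha                              # spike eigenvalue lambda_1 = d**alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                   # spike along e_1
    s1 = np.linalg.svd(X, compute_uv=False)[0]
    ratios[r] = (s1 ** 2 / n) / lam1           # lambda_hat_1 / lambda_1

# chi^2_n / n has mean 1 and variance 2/n = 0.2, which the ratios should match
```

The empirical mean of `ratios` is near 1 and the empirical variance near 2/n, so λ̂₁ fluctuates around λ₁ with a nondegenerate (hence inconsistent, for fixed n) limit.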

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

• Can Only Say  ‖X‖ = O_p(d^(1/2)):

  ‖X‖ / d^(1/2) → 1 or 10, Each w.p. 1/2,  Not Deterministic

• PCA Conditions: Same, since Noise is Still O_p(d^(1/2))

• But for Geo Rep'n, Need some Mixing Condition

Conclude: Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers  ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!),

e.g. Independent and Identically Dist'd

Mixing Conditions: Explore Weaker Assumptions that Still Give

• Law of Large Numbers

• Central Limit Theorem

Mixing Conditions:

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables X₁, X₂, ..., Define

    ρ(k) = sup_j ρ( σ(X₁, ..., X_j), σ(X_{j+k}, X_{j+k+1}, ...) )

Where, for Sigma-Fields A and B (Generated by the Variables Before and After a Gap of Lag k):

    ρ(A, B) = sup { |corr(f, g)| : f ∈ L₂(A), g ∈ L₂(B) }

Assume:  ρ(k) → 0  as  k → ∞

Idea: Uncorrelated at Far Lags
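The "uncorrelated at far lags" idea can be illustrated with AR(1) entries, where the lag-k correlation decays geometrically. A minimal sketch; the AR(1) choice and its parameters are illustrative assumptions, not from the slides:

```python
import numpy as np

# AR(1) sequence x_j = phi * x_{j-1} + eps_j: correlation at lag k is phi**k,
# so nearby entries are strongly correlated, far-apart entries essentially not.
rng = np.random.default_rng(4)
phi, d = 0.7, 100_000
eps = rng.standard_normal(d)
x = np.empty(d)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)          # stationary initialization
for j in range(1, d):
    x[j] = phi * x[j - 1] + eps[j]

def autocorr(x, k):
    """Empirical correlation between the sequence and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]
```

Here `autocorr(x, 1)` is near phi = 0.7 while `autocorr(x, 30)` is near phi**30 ≈ 0, i.e. correlation has vanished at far lags, which is the behavior the mixing condition formalizes.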

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):

Assume the Entries X₁, X₂, ..., X_d of the Data Vectors are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)  (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,

Not Always Clear, e.g. for Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

    X_d ~ (0, Σ_d),  Where  Σ_d = U_d Λ_d U_dᵗ

(Note: Not Necessarily Gaussian)

Define the Standardized Version:

    Z_d = Λ_d^(-1/2) U_dᵗ X_d

Assume Ǝ a Permutation π_d, So that the Permuted Entries of Z_d are ρ-mixing
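The standardized version can be computed directly from an eigen-decomposition of Σ_d. A minimal sketch with a hypothetical Σ_d; Gaussian sampling is used only for convenience, since the condition itself does not require Gaussianity:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)                # a hypothetical covariance Sigma_d
lam, U = np.linalg.eigh(Sigma)                 # Sigma = U diag(lam) U^T

n = 100_000
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # d x n data matrix
Z = np.diag(lam ** -0.5) @ U.T @ X             # Z_d = Lambda^(-1/2) U^T X_d
C = np.cov(Z)                                  # should be ~ identity
```

The sample covariance of Z is close to I_d: the transformation removes the covariance structure, so mixing can then be imposed on the (permuted) standardized entries.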

HDLSS Math Stat of PCA

Careful Look at PCA Consistency:  α > 1 spike

(Reality Check Suggested by Reviewer)

• The Condition α > 1 is Independent of Sample Size,

  So Consistency is True even for n = 1 (!?)

• Reviewer's Conclusion: Absurd; Shows the Assumption is Too Strong for Practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12:  d ~ 1700,  n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency:  α < 1 spike

• Consistency:  α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

• Recall, for Consistency (α > 1):  Angle(û₁, u₁) → 0

  For Strong Inconsistency (α < 1):  Angle(û₁, u₁) → 90°

• Objection: Because PC Scores (i.e. Projections) are Not Consistent

For Scores  ŝ_{i,j} = P_{û_j} x_i  and  s_{i,j} = P_{u_j} x_i

(What we Study in PCA Scatterplots)

Can Show:  ŝ_{i,j} / s_{i,j} → R_j  (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. Projections) are Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":  ŝ_{i,j} / s_{i,j} → R_j,

the Same Realization of R_j for All i

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency:  α < 1 spike

• Consistency:  α > 1 spike

What Happens at the Boundary (α = 1)?

Result: Ǝ Interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From the Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In the Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

• Recall Main Advantage is for High d

• So Not Clear that Embedding Helps

• Thus Not Yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from Above:

• Key is Sizes of Biological Subtypes

• Differing Ratios Trip Up the Mean

• But DWD is More Robust

Mathematics Behind This?

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

ddddiININX 10050050~

2nd Paper on HDLSS Asymptotics

Notes on Kentrsquos Normal Scale Mixture

bull Data Vectors are indeprsquodent of each other

bull But entries of each have strong dependrsquoce

bull However can show entries have cov = 0

bull Recall statistical folklore

Covariance = 0 Independence

ddddiININX 10050050~

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at:

PCA Consistency - α > 1 spike

(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size,

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters:

Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1):

Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1):

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝᵢⱼ = P_{v̂ⱼ} xᵢ

(What we study in PCA scatterplots)

and sᵢⱼ = P_{vⱼ} xᵢ,

Can Show ŝᵢⱼ / sᵢⱼ → Rⱼ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

Same Realization of Rⱼ for ŝᵢⱼ / sᵢⱼ, i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this:


2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:

Xᵢ ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

• Data Vectors are indep'dent of each other

• But entries of each have strong depend'ce

• However, can show entries have cov = 0

• Recall statistical folklore:

Covariance = 0 ⇒ Independence?

0 Covariance is not independence

Simple Example:

• Random Variables X and Y

• Make both Gaussian: X, Y ~ N(0, 1)

(Note: Not Using Multivariate Gaussian)

• With strong dependence

• Yet 0 covariance

Given c > 0, define:

Y = X, for |X| ≤ c

Y = −X, for |X| > c

0 Covariance is not independence

Simple Example: Ǝ c to make cov(X, Y) = 0

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals
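A numerical sketch of one standard version of this construction (illustrative, not from the slides: Y equals X near 0 and −X in the tails, with the cutoff c chosen to zero the covariance):

```python
import math
import numpy as np

# For X ~ N(0,1) and Y = X on {|X| <= c}, Y = -X on {|X| > c}:
# cov(X,Y) = E[X^2; |X| <= c] - E[X^2; |X| > c] = 2*g(c) - 1,
# where g(c) = E[X^2; |X| <= c] = (2*Phi(c) - 1) - 2*c*phi(c).
def g(c):
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2)))
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return (2 * Phi - 1) - 2 * c * phi

# g is increasing from 0 to 1, so bisection finds c with g(c) = 1/2,
# i.e. the cutoff making cov(X, Y) = 0.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g(mid) < 0.5:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

# Monte Carlo check: Y is N(0,1) by symmetry, cov is ~0,
# yet |Y| = |X| always, so X and Y are strongly dependent.
rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c, x, -x)
print(round(float(np.mean(x * y)), 2))   # covariance, near 0
print(round(float(np.var(y)), 2))        # variance, near 1
```

The simulation confirms all three slide claims at once: Gaussian marginal, zero covariance, and perfect dependence through |Y| = |X|.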

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation)

(something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao, et al. (2010)
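The "mean vs. median" stability analogy can be illustrated numerically (a sketch of the analogy only, not of DWD or SVM themselves): for Gaussian samples, the mean varies less from replication to replication than the median.

```python
import numpy as np

rng = np.random.default_rng(3)

# Repeatedly draw Gaussian samples; compare replication-to-replication
# variability of the mean vs. the median. The mean is the more stable
# summary here, analogous to the claimed DWD-vs-SVM stability.
reps, n = 5000, 100
samples = rng.standard_normal((reps, n))
sd_mean = samples.mean(axis=1).std()
sd_median = np.median(samples, axis=1).std()
print(sd_mean < sd_median)  # True
```

For N(0,1) data the asymptotic standard deviations are 1/√n for the mean versus roughly 1.25/√n for the median, which is what the simulation reflects.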

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁(d) = d^α, λ₂(d) = ⋯ = λ_d(d) = 1

Note Critical Parameter: α

1st Eigenvector: u₁

(Turns out: Direction Doesn't Matter)

How Good are Empirical Versions

λ̂₁(d), …, λ̂_d(d), û₁(d)

as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):

For α < 1, Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall α on Scale of Variance),

Spike Pops Out of Pure Noise Sphere

For α < 1,

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
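A quick simulation of the spike model shows the two regimes (an illustrative sketch; n, d, and the α values are arbitrary choices, and u₁ = e₁ without loss of generality since direction does not matter):

```python
import numpy as np

rng = np.random.default_rng(4)

def angle_deg(alpha, d, n=20):
    """Sample from the spike model (lambda_1 = d^alpha, rest = 1) and return
    the angle in degrees between the top empirical eigenvector and u_1 = e_1."""
    X = rng.standard_normal((d, n))      # noise: variance 1 in every direction
    X[0] *= d ** (alpha / 2)             # spike: variance d^alpha along e_1
    # Economy SVD of X gives the eigenvectors of the sample covariance X X^t / n.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u_hat = U[:, 0]                      # top empirical eigenvector
    cos = min(abs(u_hat[0]), 1.0)        # |<u_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

a_big = angle_deg(alpha=1.5, d=2000)     # alpha > 1: consistency
a_small = angle_deg(alpha=0.5, d=2000)   # alpha < 1: strong inconsistency
print(round(a_big, 1), round(a_small, 1))  # small angle vs. near 90 degrees
```

With α = 1.5 the spike variance d^α dwarfs the accumulated noise (of order d/n), so û₁ locks onto u₁; with α = 0.5 the spike is buried inside the noise sphere and û₁ is essentially a random noise direction.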

Consistency of eigenvalues?

λ̂₁ / λ₁ →_L χ²_n / n, as d → ∞ (n fixed)

Eigenvalues Inconsistent,

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
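The χ²_n / n limit is easy to probe by simulation (a sketch with arbitrarily chosen n, d, α; the n × n Gram matrix is used only as a computational shortcut, since it shares the nonzero eigenvalues of the d × d sample covariance):

```python
import numpy as np

rng = np.random.default_rng(5)

n, d, alpha = 5, 20_000, 1.5
lam1 = d ** alpha
reps = 400

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((d, n))
    X[0] *= d ** (alpha / 2)              # spike along e_1
    G = X.T @ X / n                       # n x n Gram matrix: same nonzero
    lam1_hat = np.linalg.eigvalsh(G)[-1]  # eigenvalues as the sample covariance
    ratios[r] = lam1_hat / lam1

# lam1_hat / lam1 should behave like chi^2_n / n: mean 1, variance 2/n.
print(round(float(ratios.mean()), 1))   # near 1
print(round(float(ratios.var()), 2))    # near 2/n = 0.4
```

The ratio hovers around 1 but with variance 2/n that does not shrink as d grows, which is exactly the "inconsistent, but with a known distribution" message; letting n → ∞ as well drives the variance to 0.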

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)

Can only say ‖X_d‖ = d^(1/2) · O_p(1), not deterministic:

‖X_d‖ / d^(1/2) → 1 w.p. ½, → 10 w.p. ½

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
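A simulation of Kent's mixture (an illustrative sketch; d and the number of vectors are arbitrary) shows that ‖X_d‖ / d^(1/2) does not settle on a single constant: each realization lands near 1 or near 10, depending on which mixture component was drawn.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 100_000, 12

# Kent's normal scale mixture: each whole vector is N(0, I_d) or N(0, 100 I_d)
# with probability 1/2 each (all entries of a vector share one scale draw).
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scales[:, None]

r = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(np.sort(r), 1))   # values cluster near 1.0 and near 10.0
```

Each individual norm still concentrates sharply (relative fluctuation of order d^(-1/2)), but around a random one of two radii, so the deterministic-sphere part of the geometric representation fails without a mixing condition.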

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored?),

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

Mixing Conditions:

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better & Newer References?

Mixing Conditions
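As a tiny illustration of the point (a sketch, not from the slides; the AR(1) model and parameters are arbitrary choices): the Law of Large Numbers survives for a dependent but mixing sequence, where the i.i.d. assumption fails.

```python
import numpy as np

rng = np.random.default_rng(7)

# AR(1): dependent but rho-mixing, so the Law of Large Numbers still applies.
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for j in range(1, n):
    x[j] = phi * x[j - 1] + eps[j]

# True mean is 0; the sample mean of the dependent sequence is still close.
print(round(float(x.mean()), 2))  # near 0
```

The dependence only inflates the variance of the sample mean (by a long-run variance factor); it does not break the convergence, which is the kind of conclusion mixing conditions are designed to deliver.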

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

(Note Not Using Multivariate Gaussian)

YX

10~ NYX

0 Covariance is not independence

Simple Example

bull Random Variables and

bull Make both Gaussian

bull With strong dependence

bull Yet 0 covariance

Given define

YX

10~ NYX

0c

cXX

cXXY

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling sampling variation) (something like mean vs. median), Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]
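A quick sketch of the setup assumed on this slide: eigen-directions and eigen-values estimated from mean-centered data. The data-generating matrix and sizes are arbitrary choices of mine; the point is only that the covariance eigen-decomposition matches an SVD of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 6
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))  # correlated columns

Xc = X - X.mean(axis=0)                  # mean centering, as assumed on the slide
evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))           # ascending order
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the sample covariance = squared singular values / (n - 1);
# the rows of Vt are the corresponding estimated eigen-directions.
print(np.allclose(np.sort(s**2 / (n - 1)), evals))
```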

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ_1(d) = d^α, λ_2(d) = ⋯ = λ_d(d) = 1

Note: Critical Parameter α

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?

Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall α is on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
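A small simulation sketch of the consistency / strong-inconsistency dichotomy above. The dimensions, sample size, and the SVD shortcut are my choices, not from the slides; data are drawn from the single-spike model N(0, diag(d^α, 1, …, 1)).

```python
import numpy as np

rng = np.random.default_rng(1)

def top_pc_angle_deg(alpha, d=20_000, n=10):
    """Angle (degrees) between the top sample PC direction and the true
    spike direction e_1, for n draws from N(0, diag(d^alpha, 1, ..., 1))."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)      # first coordinate gets variance d^alpha
    # top right singular vector of X = top eigenvector of the sample covariance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.degrees(np.arccos(min(1.0, abs(Vt[0, 0]))))

print("alpha = 1.5:", round(top_pc_angle_deg(1.5), 1), "deg (consistent regime)")
print("alpha = 0.5:", round(top_pc_angle_deg(0.5), 1), "deg (inconsistent regime, -> 90 as d grows)")
```

With α above 1 the empirical direction lands close to the spike; with α below 1 the angle is large and grows toward 90° with d, matching the noise-sphere intuition.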

Consistency of eigenvalues?

λ̂_1 / d^α →_L χ²_n / n

• Eigenvalues Inconsistent

• But Known Distribution

• Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
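A simulation check of this slide, assuming my reading of the garbled limit is right: for a big spike λ_1 = d^α, the scaled top sample eigenvalue λ̂_1 / λ_1 behaves like χ²_n / n (mean 1, variance 2/n) with n fixed. The specific d, n, α, and the n×n dual-matrix trick are my choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha, reps = 2000, 8, 2.0, 300
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)     # one d^alpha spike, remaining variances = 1
    # n x n dual matrix (mean known = 0, divisor n): same nonzero eigenvalues
    # as the d x d sample covariance, but much cheaper
    S = X @ X.T / n
    ratios[r] = np.linalg.eigvalsh(S)[-1] / lam1

# chi^2_n / n has mean 1 and variance 2/n = 0.25 here
print(f"mean of lambda1_hat/lambda1 ~ {ratios.mean():.3f}")
print(f"var  of lambda1_hat/lambda1 ~ {ratios.var():.3f}")
```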

Conditions for Geo Rep'n & PCA Consist.?

John Kent example: X_d ~ ½ N_d(0, 100·I_d) + ½ N_d(0, I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}):

‖X_d‖ ≈ 10·d^{1/2} w.p. ½, ≈ d^{1/2} w.p. ½, not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignore?), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions
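A toy illustration of the point above: the Law of Large Numbers survives dependence when the dependence decays. The AR(1) process and all parameters here are my choices; none appear on the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n = 0.8, 200_000

# AR(1): x_t = phi * x_{t-1} + e_t. Every pair of observations is dependent,
# but the dependence decays in the lag, so the sample mean still converges.
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)    # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

print(f"sample mean ~ {x.mean():.4f} (LLN target: 0)")
```

Note the dependence does show up in the rate: the variance of the sample mean is inflated by the long-run variance 1/(1 - φ)² relative to the i.i.d. case.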

Mixing Conditions

• A Whole Area in Probability Theory

• ∃ a Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better & Newer References?

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X_1, X_2, …, Define:

ρ(t) = sup_j sup { |corr(f, g)| : f ∈ L_2(F_1^j), g ∈ L_2(F_{j+t}^∞) }

Where F_a^b is the Sigma-Field Generated by X_a, …, X_b (Note: Gap of Lag t)

Assume: ρ(t) → 0, as t → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
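A sketch of the "uncorrelated at far lags" idea, using an AR(1) process (a textbook ρ-mixing example; my choice, not from the slides). Plain lag correlations only illustrate the idea; the actual ρ(t) takes a supremum over all L₂ functions of the past and future, not just linear ones.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, n = 0.8, 500_000

# AR(1): corr(x_s, x_{s+t}) = phi^t, decaying geometrically in the lag t
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)    # stationary start
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, t):
    """Empirical correlation at lag t."""
    return np.corrcoef(x[:-t], x[t:])[0, 1]

for t in (1, 5, 20):
    print(f"lag {t:2d}: corr ~ {lag_corr(x, t):+.3f}   (phi^t = {phi**t:.3f})")
```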

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)ᵗ Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^{-1/2} U_dᵗ X_d

Assume Ǝ a permutation of the d entries, So that Z_d is ρ-mixing

HDLSS Math Stat of PCA
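A small sketch of the standardization on this slide: Z = Λ^{-1/2} Uᵗ X has identity covariance whatever the (possibly non-Gaussian) distribution of X. The particular U, Λ, and Laplace-distributed draws are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 5, 200_000

# A covariance with the slide's structure: Sigma = U Lambda U^t
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])

# Non-Gaussian draws with covariance Sigma (Laplace entries scaled to unit variance)
raw = rng.laplace(size=(n, d)) / np.sqrt(2.0)
X = raw @ (U @ np.diag(np.sqrt(lam))).T     # each row has covariance Sigma

# Standardized version Z = Lambda^{-1/2} U^t X, applied row-wise
Z = X @ (U @ np.diag(lam ** -0.5))
print(np.round(np.cov(Z, rowvar=False), 2))  # ~ identity matrix
```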

Careful look at: PCA Consistency, α > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size? So true for n = 1 (?!)

Reviewer's Conclusion: Absurd, shows assumption α > 1 too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

• Strong Inconsistency, α < 1 spike

• Consistency, α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong?!?

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j, with the Same Realization R_j for All i

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

• Strong Inconsistency, α < 1 spike

• Consistency, α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Deep Open Problem

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example:

• Random Variables X and Y
• Make both Gaussian
• With strong dependence
• Yet 0 covariance

Given c > 0, define

Y = X when |X| ≤ c,  Y = -X when |X| > c,  where X ~ N(0,1)

Choose c to make Cov(X,Y) = 0:

• Distribution is degenerate
• Supported on diagonal lines y = x and y = -x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have Cov(X,Y) < 0
• For large c, have Cov(X,Y) > 0
• By continuity, ∃ c with Cov(X,Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:
– Has Gaussian marginals
– Has Cov(X,Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
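This construction is easy to check numerically. A minimal sketch (the bisection search for the zero-covariance cutoff c0 is my addition, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # X ~ N(0,1)

def flip_cov(c):
    """Cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c (both means are 0)."""
    y = np.where(np.abs(x) <= c, x, -x)
    return float(np.mean(x * y))

cov_small = flip_cov(0.5)   # mostly Y = -X, so covariance is negative
cov_large = flip_cov(3.0)   # mostly Y = X, so covariance is positive

# bisection for the c with Cov(X, Y) = 0 promised by continuity
lo, hi = 0.5, 3.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if flip_cov(mid) < 0 else (lo, mid)
c0 = 0.5 * (lo + hi)

# dependence survives: |Y| = |X| exactly, so X and Y are far from independent
y0 = np.where(np.abs(x) <= c0, x, -x)
dependent = bool(np.all(np.abs(y0) == np.abs(x)))
```

The marginal of Y is N(0,1) by the symmetry of the construction, so this is exactly the "Gaussian marginals, zero covariance, strong dependence" example.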

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive feeling about sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA
in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues:

λ_1(d) = d^α,  λ_2(d) = … = λ_d(d) = 1

Note: Critical Parameter α

1st Eigenvector u_1

(Turns out: Direction Doesn't Matter)

How Good are Empirical Versions

λ̂_1(d), …, λ̂_d(d), û_1

as Estimates?

Consistency (big enough spike):

For α > 1:  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

For α < 1:  Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
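These two regimes show up clearly in a small simulation (a hedged sketch; the dimension, sample size, and α values below are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2_000, 20

def pc1_angle(alpha):
    """Angle (degrees) between first empirical eigenvector and the true u1 = e1."""
    x = rng.standard_normal((n, d))           # unit-variance noise everywhere
    x[:, 0] *= np.sqrt(d ** alpha)            # spike eigenvalue d^alpha along e1
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return float(np.degrees(np.arccos(abs(vt[0, 0]))))

angle_consistent = pc1_angle(1.5)    # alpha > 1: angle near 0
angle_inconsistent = pc1_angle(0.3)  # alpha < 1: angle pushed toward 90 degrees
```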

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall d^α on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
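The "noise sphere" intuition is directly checkable: a minimal sketch showing that N_d(0, I_d) draws concentrate at radius d^(1/2), ever more tightly as d grows.

```python
import numpy as np

rng = np.random.default_rng(2)

radii = {}
for d in (100, 10_000):
    z = rng.standard_normal((200, d))               # 200 draws of Z ~ N_d(0, I_d)
    radii[d] = np.linalg.norm(z, axis=1) / np.sqrt(d)  # radii relative to sqrt(d)

# essentially all mass near the sphere of radius sqrt(d),
# even though the origin remains the peak of the density
spread_low_d = float(radii[100].std())
spread_high_d = float(radii[10_000].std())
```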

Consistency of eigenvalues?

λ̂_1 / λ_1 →_L χ²_n / n  as d → ∞

• Eigenvalues Inconsistent
• But Known Distribution
• Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
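A hedged numerical check of this limit (sizes chosen for speed; the n × n Gram matrix gives the top sample eigenvalue cheaply):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 2_000, 10, 1.5, 200
lam1 = d ** alpha                      # spike eigenvalue, alpha > 1

ratios = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal((n, d))    # noise, variance 1 in every direction
    x[:, 0] *= np.sqrt(lam1)           # plant the spike along coordinate 1
    # largest eigenvalue of the sample covariance, via the n x n Gram form
    lam1_hat = np.linalg.eigvalsh(x @ x.T / n).max()
    ratios[r] = lam1_hat / lam1

# lam1_hat / lam1 behaves like chi^2_n / n: mean about 1, variance about 2/n
mean_ratio = float(ratios.mean())
var_ratio = float(ratios.var())
```

So the eigenvalue is inconsistent for fixed n (the χ²_n/n fluctuation never vanishes), but the fluctuation has a known distribution and dies out as n → ∞, matching the slide.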

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)

Can only say X_d = O_p(d^(1/2)), not deterministic:

||X_d|| ≈ 10 d^(1/2) w.p. 1/2,  ||X_d|| ≈ d^(1/2) w.p. 1/2

• PCA Conditions Same, since Noise Still O_p(d^(1/2))
• But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
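A minimal sketch of a Kent-type mixture (the variance constants 100 and 1 are the ones reconstructed above and should be read as illustrative): the norm is O_p(d^(1/2)), but the normalized norm stays random, landing near 10 or near 1, each with probability 1/2.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 10_000, 400

# Kent-type mixture: half the draws from N_d(0, 100 I_d), half from N_d(0, I_d)
wide = rng.random(n) < 0.5
scale = np.where(wide, 10.0, 1.0)
x = scale[:, None] * rng.standard_normal((n, d))

r = np.linalg.norm(x, axis=1) / np.sqrt(d)   # ||X|| / d^(1/2)
# O_p(d^(1/2)) holds, but the normalized norm is random, not deterministic
near_10 = np.abs(r - 10.0) < 1.0
near_1 = np.abs(r - 1.0) < 0.5
```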

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers
("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have Technical Assumptions
(Usually Ignored!)
E.g. Independent and Ident. Dist'd

Mixing Conditions:
Explore Weaker Assumptions, to Still Get
Law of Large Numbers & Central Limit Theorem
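As a sketch of why weaker-than-i.i.d. assumptions can suffice: sample means of a Gaussian AR(1) (a ρ-mixing sequence) still obey a CLT, with the long-run variance (1+φ)/(1−φ) replacing the i.i.d. variance. The constants below are standard AR(1) facts, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n, reps = 0.6, 500, 2_000

# reps independent AR(1) paths, stationary with marginal variance 1
innov = np.sqrt(1 - phi**2) * rng.standard_normal((reps, n))
x = np.empty((reps, n))
x[:, 0] = rng.standard_normal(reps)
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + innov[:, t]

long_run_var = (1 + phi) / (1 - phi)             # = 4 for phi = 0.6
z = x.mean(axis=1) / np.sqrt(long_run_var / n)   # standardized sample means
# z should look approximately N(0,1) despite the dependence
```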

Mixing Conditions

• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference:
Bradley (2005 update of 1986 version)
• Better, Newer References

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )

where, for the Sigma-Fields A, B generated by the indicated variables,

ρ(A, B) = sup { |Corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Note: Gap of Lag k

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
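A concrete ρ-mixing sketch: for jointly Gaussian σ-fields the maximal correlation reduces to the largest canonical correlation, so for a Gaussian AR(1) the ρ-mixing coefficient is φ^k. Here plain lag-k correlations stand in for the (harder to compute) maximal correlation; tolerances are loose Monte Carlo bounds.

```python
import numpy as np

rng = np.random.default_rng(6)
phi, n = 0.8, 200_000

# Gaussian AR(1): X_t = phi * X_{t-1} + innovation; stationary variance 1
x = np.empty(n)
x[0] = rng.standard_normal()
innov = np.sqrt(1 - phi**2) * rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + innov[t]

def lag_corr(k):
    """Empirical correlation across a gap of lag k."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

# correlation across a gap of lag k dies off geometrically: "uncorrelated at far lags"
c1, c5, c20 = lag_corr(1), lag_corr(5), lag_corr(20)
```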

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions
Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0_d, Σ_d),  where  Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define Standardized Version:

Z_d = Λ_d^(-1/2) U_d^t X_d

Assume ∃ a permutation of the d entries,
So that Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at

PCA Consistency – α > 1 spike

(Reality Check Suggested by Reviewer)

Independent of Sample Size,
So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows
assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis
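The "finds signal" claim can be sketched numerically (hypothetical two-cluster data, not the RNAseq data from the slides): even with substantial estimation error in the direction, PC1 scores still separate the clusters.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n_per = 5_000, 20

# two biological clusters shifted along coordinate 1, buried in HDLSS noise
shift = np.zeros(d)
shift[0] = 15.0
x = np.vstack([rng.standard_normal((n_per, d)) - shift,
               rng.standard_normal((n_per, d)) + shift])

xc = x - x.mean(axis=0)                 # mean center
_, _, vt = np.linalg.svd(xc, full_matrices=False)
scores = xc @ vt[0]                     # PC1 scores, as in a PCA scatterplot

# the two clusters split along PC1 despite the noisy direction estimate
gap = scores[n_per:].mean() - scores[:n_per].mean()
separation = abs(float(gap))
```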

Recall Theoretical Separation:

• Strong Inconsistency – α < 1 spike
• Consistency – α > 1 spike

Mathematically Driven Conclusion:
Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1):  Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1):  Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,
Because PC Scores (i.e. projections) Not Consistent

For Scores  ŝ_{i,j} = P_{v̂_j} x_i
(What we study in PCA scatterplots)
and  s_{i,j} = P_{v_j} x_i

Can Show:  ŝ_{i,j} / s_{i,j} → R_j  (Random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

ŝ_{i,j} / s_{i,j} → R_j,  Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales,
But Relationships are Still Useful

HDLSS Math Stat of PCA

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency – α < 1 spike
• Consistency – α > 1 spike

What happens at boundary (α = 1)?

Result: ∃ interesting Limit Dist'ns,
Jung, Sen & Marron (2012)

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,
So not Clear Embedding Helps,
Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
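A sketch of the intuition behind that result (my illustration, not El Karoui's argument): in high d, pairwise distances concentrate, so a Gaussian kernel matrix becomes nearly an affine function of the inner products, which is why kernel classifiers behave like linear ones. Points are normalized to length d^(1/2), where the norms concentrate anyway.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 2_000, 60

x = rng.standard_normal((n, d))
x *= np.sqrt(d) / np.linalg.norm(x, axis=1, keepdims=True)  # place on noise sphere

g = x @ x.T                                              # inner products
sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2 * g   # squared distances
k = np.exp(-sq / (2.0 * d))                              # Gaussian kernel, bandwidth^2 = d

# off-diagonal entries: distances concentrate, so the kernel matrix is
# essentially a monotone near-linear transform of the inner product matrix
mask = ~np.eye(n, dtype=bool)
corr = float(np.corrcoef(k[mask], g[mask])[0, 1])
```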

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust

Mathematics behind this
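A sketch of the "differing ratio trips up mean" point (hypothetical two-batch data; DWD itself is not implemented here, this only shows the failure mode that motivates it):

```python
import numpy as np

rng = np.random.default_rng(9)
d = 1_000

# one expression direction separates subtypes A and B (gap = 10 units)
def batch(n_a, n_b):
    x = rng.standard_normal((n_a + n_b, d))
    x[:n_a, 0] += 10.0          # subtype A elevated on coordinate 1
    return x

b1 = batch(80, 20)    # batch 1: 80% subtype A
b2 = batch(20, 80)    # batch 2: 20% subtype A

# naive batch adjustment: subtract each batch's own mean
b1c = b1 - b1.mean(axis=0)
b2c = b2 - b2.mean(axis=0)

# subtype A now sits at different locations in the two batches, because the
# differing A:B ratio shifted the two batch means by different amounts
a_gap = abs(float(b1c[:80, 0].mean() - b2c[:20, 0].mean()))
```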

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example

0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_{i,j} = P_{v̂_j} x_i and s_{i,j} = P_{v_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

ŝ_{i,j} / s_{i,j} → R_j, with the Same Realization of R_j for all i

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
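The "proportional errors" point can be illustrated in a single-spike simulation (sizes, spike exponent, and seed are illustrative assumptions): the empirical scores differ from the true scores by a factor that is nearly the same for every observation, so relative positions in a PCA scatterplot survive.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 20_000, 20, 1.2
lam1 = d ** alpha

# Single-spike data: true first eigenvector u1 = e1
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)

# True scores s_i = <u1, x_i>; empirical scores s_hat_i = <u1_hat, x_i>
s_true = X[:, 0]
_, _, Vt = np.linalg.svd(X, full_matrices=False)   # Vt[0] = empirical u1_hat
s_hat = X @ Vt[0]

ratio = s_hat / s_true   # roughly one common value across all i (up to sign)
```

The ratios cluster tightly around a single (data-dependent) value, which is the common random factor R_1: the scale is off, but the score configuration is preserved.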

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

∃ interesting Limit Dist'ns:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example: X ~ N(0,1), Y = X for |X| ≤ c, Y = −X for |X| > c;

choose c to make cov(X,Y) = 0

• Distribution is degenerate:

• Supported on diagonal lines y = ±x

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X,Y) < 0

• For large c, have cov(X,Y) > 0

• By continuity, ∃ c with cov(X,Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X,Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more

than Gaussian Marginals
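The slides' example tunes a constant c; a minimal sketch of the same phenomenon uses an independent random sign instead (an assumed variant, chosen so no tuning is needed). It gives Gaussian marginals, zero covariance, and support on the diagonal lines y = ±x, yet Y is completely determined by X up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)   # independent random sign
y = s * x                              # supported on the lines y = +/- x

# cov(X,Y) = E[S X^2] = E[S] E[X^2] = 0, since S is independent of X
cov = np.mean(x * y)
```

Here Y is marginally N(0,1) and uncorrelated with X, yet Y² = X² exactly, so (X, Y) is strongly dependent and cannot be bivariate Gaussian.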

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive idea: feeling sampling variation)
(something like mean vs. median)
Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified: Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version):
Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = ⋯ = λ_{d,d} = 1

Note Critical Parameter: α

1st Eigenvector: u₁

(Turns out: Direction Doesn't Matter)

How Good are the Empirical Versions

λ̂_{1,d}, …, λ̂_{d,d}, û₁

as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For α > 1,

Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1,

Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall α is on the Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
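The consistency / strong-inconsistency dichotomy is easy to see in simulation. A sketch under assumed sizes (d = 2000, n = 20, exponents 1.5 and 0.25 chosen to sit well inside each regime): the angle between the empirical and true first eigenvector is small for a strong spike and near 90° for a weak one.

```python
import numpy as np

def leading_angle_deg(d, n, alpha, seed=0):
    """Angle between the empirical and true first eigenvector in the
    spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)                  # spike along u1 = e1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)                # |<u1_hat, u1>|
    return np.degrees(np.arccos(cos))
```

For example, leading_angle_deg(2000, 20, 1.5) comes out as a few degrees, while leading_angle_deg(2000, 20, 0.25) is close to 90°: the weak spike is swallowed by the pure-noise sphere.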

Consistency of eigenvalues?

λ̂_{1,d} / λ_{1,d} →_L χ²_n / n, as d → ∞

Eigenvalues Inconsistent,

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
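The χ²_n / n limit can be checked by Monte Carlo. A sketch under assumed settings (d = 5000, α = 1.6, n = 20, 200 replications): the ratios of empirical to true top eigenvalue should have mean near 1 and variance near 2/n, matching χ²_n / n.

```python
import numpy as np

def eigenvalue_ratios(d=5000, n=20, alpha=1.6, reps=200, seed=1):
    """Simulate lam1_hat / lam1 in the spike model; the d -> infinity
    limit distribution is chi-squared_n / n."""
    rng = np.random.default_rng(seed)
    lam1 = d ** alpha
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, d))
        X[:, 0] *= np.sqrt(lam1)                 # spike along first coordinate
        # top eigenvalue of the sample covariance X^t X / n, computed via
        # the small n x n Gram matrix (same nonzero spectrum)
        top = np.linalg.eigvalsh(X @ X.T / n)[-1]
        out[r] = top / lam1
    return out
```

The sample mean of the ratios sits near E[χ²_n / n] = 1 and the sample variance near 2/n = 0.1, so the eigenvalue is inconsistent for fixed n but its error distribution is known, and it concentrates as n grows.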

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)

Can only say X_d = O_p(d^{1/2}):

‖X_d‖ / d^{1/2} → 10 w.p. 1/2, → 1 w.p. 1/2,

not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
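Kent's mixture is simple to simulate. A sketch under assumed sizes (d = 10000, n = 8): the scaled lengths ‖X‖ / d^{1/2} concentrate, but on two different values (10 or 1) at random, so there is no single deterministic limit and no geometric representation without further conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 8

# Kent-style mixture: N(0, 100 I_d) with prob 1/2, N(0, I_d) with prob 1/2
big = rng.random(n) < 0.5
X = rng.standard_normal((n, d))
X[big] *= 10.0

# Each scaled length is near 10 or near 1 -- random, not deterministic
scaled = np.linalg.norm(X, axis=1) / np.sqrt(d)
```

Each observation individually shows the usual HDLSS length concentration; the failure is that which of the two radii occurs is random, which is exactly what a mixing condition on the coordinates rules out.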

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored …)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

Mixing Conditions:

• A Whole Area in Probability Theory

• ∃ a Large Literature

• A Comprehensive Reference:

Bradley (2005 update of 1986 version)

• Better: Newer References

Mixing Condition Used Here: Rho – Mixing

For Random Variables X₁, X₂, …, Define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L₂(σ(X₁, …, X_j)), g ∈ L₂(σ(X_{j+k}, X_{j+k+1}, …)) }

Where σ(•) are the Sigma-Fields Generated by the indicated variables

(Note Gap of Lag k)

Assume ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
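A toy illustration of "uncorrelated at far lags" (the AR(1) model, its coefficient, and the sample size are assumptions for the demo, not from the slides): for a stationary Gaussian AR(1) sequence the lag-k correlation is φ^k, which dies out geometrically, and for Gaussian sequences the ρ-mixing coefficient is governed by exactly these correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.6, 200_000

# Gaussian AR(1): X_t = phi * X_{t-1} + e_t, started in stationarity
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]
```

The sample lag correlations track φ^k closely: sizeable at lag 1, tiny by lag 10, indistinguishable from zero by lag 30, which is the qualitative behavior a ρ-mixing assumption asks of the coordinates.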

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume the Entries of the Data Vectors X = (X₁, X₂, …, X_d)ᵗ

Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused …)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA



0 Covariance is not independence

Simple Example c to make cov(XY) = 0

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example: X_d ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^(1/2)), not deterministic:

w.p. 1/2, ‖X_d‖ ≈ d^(1/2), and w.p. 1/2, ‖X_d‖ ≈ 10 d^(1/2)

PCA Conditions Same, since Noise Still O_p(d^(1/2))

But for Geo Rep'n, need some Mixing Cond'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
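A toy check of the Kent-type example: under the two-component mixture, the scaled norm ‖X_d‖/d^(1/2) concentrates near one of two values rather than a single deterministic constant. This is a sketch under the reconstructed mixture ½N(0, I_d) + ½N(0, 100 I_d); the dimension and replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10000
scaled_norms = []
for _ in range(100):
    # w.p. 1/2 draw from N(0, I_d); w.p. 1/2 from N(0, 100 I_d)
    sd = 1.0 if rng.random() < 0.5 else 10.0
    x = sd * rng.standard_normal(d)
    scaled_norms.append(np.linalg.norm(x) / np.sqrt(d))
scaled_norms = np.array(scaled_norms)
# each ||X_d|| / d^(1/2) lands near 1 or near 10: a random scale, not deterministic
```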

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignore…), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory, with a Large Literature

• A Comprehensive Reference: Bradley (2005 update of 1986 version)

• Better Newer References?

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For Random Variables X₁, X₂, …, Define:

ρ(i) = sup { corr(f, g) : f ∈ L²(F₁^k), g ∈ L²(F_{k+i}^∞), k ≥ 1 }

Where F_j^l is the Sigma-Field Generated by X_j, …, X_l (Note: Gap of Lag i)

Assume: ρ(i) → 0 as i → ∞

Idea: Uncorrelated at Far Lags
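The "uncorrelated at far lags" idea can be illustrated with an AR(1) process, a standard example of a ρ-mixing sequence. Caveat: the plain lag correlation computed below is weaker than the ρ-mixing coefficient (which takes a supremum over all L² functions of past and future), so this is only an illustrative proxy; the series length and coefficient are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T, phi = 200_000, 0.7
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):          # AR(1): correlation decays like phi**lag
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, lag):
    """Empirical correlation between the series and its lagged copy."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

near, far = lag_corr(x, 1), lag_corr(x, 20)   # ~ 0.7 and ~ 0
```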

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume Entries of the Data Vectors X = (X₁, X₂, …, X_d)ᵗ Are ρ-mixing

Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009): X_d ~ (0_d, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(−1/2) U_dᵗ X_d

Assume Ǝ a permutation of the d entries, so that Z_d is ρ-mixing
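The standardization Z_d = Λ_d^(−1/2) U_dᵗ X_d is just a whitening step: whatever the covariance Σ_d, the transformed vector has identity covariance, so the mixing assumption is placed on uncorrelated coordinates. A minimal numerical check (the covariance built here is an arbitrary example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 5, 200_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)        # an arbitrary positive-definite covariance
lam, U = np.linalg.eigh(Sigma)         # Sigma = U diag(lam) U^t
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = X @ U / np.sqrt(lam)               # rowwise Lambda^(-1/2) U^t x
C = np.cov(Z, rowvar=False)            # should be close to the identity
```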

HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)

(Reality Check Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong?

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency: α > 1, Angle(û₁, u₁) → 0

For Strong Inconsistency: α < 1, Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_{i,j} = P_{v̂_j} x_i and s_{i,j} = P_{v_j} x_i (What we study in PCA scatterplots)

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful
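The "proportional errors" point can be seen in a small simulation: scores on the empirical first PC direction track the scores on the true direction up to a common scale factor, so the scatterplot relationships survive. This sketch uses illustrative sizes and again places the spike along the first coordinate so the true scores are known.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 2000, 40
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** 1.2)                   # spike along u1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])             # fix the eigenvector sign ambiguity
s_true = X[:, 0]                               # scores on the true direction
s_hat = X @ u1_hat                             # empirical PC1 scores
score_corr = np.corrcoef(s_true, s_hat)[0, 1]  # relationships preserved
```

The correlation of empirical with true scores stays near 1 even though the common scale factor is random across realizations.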

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps; Thus not yet Implemented in DWD
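A toy illustration of why kernel embeddings degenerate toward linear methods in high dimension: for high-dimensional data, off-diagonal Gaussian-kernel entries are nearly an affine function of the linear-kernel entries, since the inner products x_iᵗx_j/d are uniformly small. This is only a sketch of the phenomenon, not El Karoui's argument; points are projected onto a sphere here to isolate the inner-product term, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 5000, 60
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)   # radius sqrt(d) sphere
G = X @ X.T / d                           # (scaled) linear kernel
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G       # ||xi - xj||^2 / d
K = np.exp(-sq / 2)                       # Gaussian (RBF) kernel, bandwidth sqrt(d)
iu = np.triu_indices(n, k=1)              # off-diagonal entries only
rbf_vs_linear = np.corrcoef(K[iu], G[iu])[0, 1]   # near 1: kernel ~ linear
```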

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

• Key is sizes of biological subtypes

• Differing ratio trips up mean

• But DWD more robust

Mathematics behind this?


0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) > 0

• For large c, have cov(X, Y) < 0

• By continuity, Ǝ c with cov(X, Y) = 0

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
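A minimal variant of the diagonal-lines idea (not the slides' exact c-parameterized construction): take Y = S·X with S = ±1 equally likely and independent of X ~ N(0,1). The pair is supported on the two lines y = x and y = −x, has Gaussian marginals and zero covariance, yet |Y| = |X|, so X and Y are strongly dependent and the joint law is not bivariate Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
X = rng.standard_normal(N)
S = rng.choice([-1.0, 1.0], size=N)    # support on the lines y = x and y = -x
Y = S * X                              # marginal of Y is still N(0, 1)
cov_xy = np.cov(X, Y)[0, 1]            # ~ 0, despite strong dependence
dep = np.corrcoef(X**2, Y**2)[0, 1]    # = 1, since |Y| = |X|
```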

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions; reflects intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue measure

• For small c, have cov(X, Y) > 0

• For large c, have cov(X, Y) < 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
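A minimal numerical sketch of a distribution of this type (not the slides' exact mixture construction; here the standard sign-flip example X ~ N(0,1), Y = S·X with S an independent random sign, which is likewise degenerate and supported on the diagonal lines y = ±x):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X standard normal; S an independent fair random sign
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)
y = s * x  # Y is also N(0,1), supported on the lines y = +/- x

cov_xy = np.cov(x, y)[0, 1]          # approximately 0
dep = np.corrcoef(x**2, y**2)[0, 1]  # approximately 1: strong dependence

print(f"cov(X,Y) ~ {cov_xy:.3f}")
print(f"corr(X^2,Y^2) ~ {dep:.3f}")
```

Both marginals are Gaussian and the covariance vanishes, yet X² determines Y² exactly, so the pair is dependent and hence not bivariate Gaussian.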

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive feeling of sampling variation) (something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version)

Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = ⋯ = λ_{d,d} = 1

Note Critical Parameter: α

1st Eigenvector: u₁

Turns out: Direction Doesn't Matter

How Good are Empirical Versions,

λ̂_{1,d}, …, λ̂_{d,d}, û₁, as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike):

For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):

For α < 1, Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall d^α is on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

λ̂₁ / λ₁ → χ²_n / n (in law), as d → ∞

• Eigenvalues Inconsistent

• But Known Distribution

• Consistent when n → ∞ as Well
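A sketch checking this limit law numerically (assuming a strong spike, α = 1.5, so the d → ∞ regime is well approximated; χ²_n/n has mean 1 and variance 2/n):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, lam1 = 5000, 10, 5000.0**1.5   # strong spike so the limit applies

ratios = []
for _ in range(300):
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    s = np.linalg.svd(X, compute_uv=False)
    lam1_hat = s[0]**2 / n           # top sample-covariance eigenvalue
    ratios.append(lam1_hat / lam1)

# chi^2_n / n has mean 1 and variance 2/n = 0.2 here
print(np.mean(ratios), np.var(ratios))
```

The ratio does not concentrate at 1 for fixed n (inconsistency), but its spread matches the known χ²_n/n law, and it shrinks only as n grows.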

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:

X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)

Can only say:

‖X_d‖ ≈ 10 d^{1/2} w.p. 1/2, ‖X_d‖ ≈ d^{1/2} w.p. 1/2

not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conditions for Geo Rep'n:

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
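A sketch of the Kent example (assuming the scale-mixture form reconstructed above): the scaled norm ‖X_d‖/d^{1/2} lands near 10 or near 1 depending on the mixture component, so it has no deterministic limit.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 20000, 12

# scale mixture: each vector is N(0, 100 I_d) or N(0, I_d), w.p. 1/2 each
scale = np.where(rng.random(n) < 0.5, 10.0, 1.0)
X = rng.standard_normal((n, d)) * scale[:, None]

radii = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(radii, 2))   # each entry near 10 or near 1, not one constant
```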

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored...)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get:

• Law of Large Numbers

• Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• Ǝ a Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better Newer References?

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X₁, X₂, …, Define:

ρ(k) = sup_j ρ( σ(X₁, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )

Where, For Sigma-Fields A, B:

ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

• For Sigma-Fields Generated by: Past & Future Segments

• Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
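The "uncorrelated at far lags" idea can be sketched with an AR(1) sequence, a classic ρ-mixing example (the coefficient φ = 0.7 and series length are my own illustrative choices): lag-k correlations decay like φ^k toward 0.

```python
import numpy as np

rng = np.random.default_rng(4)
T, phi = 200_000, 0.7

# AR(1): X_t = phi * X_{t-1} + e_t  (a classic rho-mixing sequence)
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 20)]
print(np.round(corrs, 3))   # roughly phi**k: decays toward 0
```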

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X₁, X₂, …, X_d)^t

Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused...)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define Standardized Version:

Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation π_d,

So that (Z_{π_d(1)}, …, Z_{π_d(d)}) is ρ-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency - α > 1 spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size??

So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12:

d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters:

Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

• Strong Inconsistency - α < 1 spike

• Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1):

Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1):

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores s_{ij} = P_{v_j} x_i and ŝ_{ij} = P_{v̂_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{ij} ≈ R_j s_{ij}, with R_j Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Recall: HDLSS PCA Often Finds Signal, Not Pure Noise

Key is "Proportional Errors":

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
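A loose numerical illustration of "proportional errors" (not the formal R_j result; d, n and α here are my own choices): the empirical PC-1 scores are close to one common multiple of the true scores across observations, so scatterplot relationships survive even though the scale is random.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha = 3000, 25, 1.3

# spike model data: first coordinate carries the signal
sd = np.ones(d)
sd[0] = np.sqrt(d**alpha)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0]

s_true = X[:, 0]          # true PC-1 scores: projection onto e_1
s_hat = X @ u1_hat        # empirical PC-1 scores

ratio = s_hat / s_true    # roughly one common factor for all i
spread = np.std(ratio) / abs(np.mean(ratio))
corr = abs(np.corrcoef(s_true, s_hat)[0, 1])
print(spread, corr)
```

The per-observation ratios cluster tightly around a single value, and the empirical scores are almost perfectly correlated with the true ones: inconsistent axis scale, preserved relationships.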

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency - α < 1 spike

• Consistency - α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns

Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

0cov YXc

c 0cov YX

0 Covariance is not independence

Simple Example

bull Distribution is degenerate

bull Supported on diagonal lines

bull Not abs cont wrt 2-d Lebesgue meas

bull For small have

bull For large have

bull By continuity with

0cov YXc

c

c 0cov YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

0 Covariance is not independence

Simple Example:

• Distribution is degenerate

• Supported on diagonal lines

• Not abs. cont. w.r.t. 2-d Lebesgue meas.

• For small c, have cov(X, Y) < 0

• For large c, have cov(X, Y) > 0

• By continuity, Ǝ c with cov(X, Y) = 0

0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
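A quick numerical sketch of this construction (assuming the standard version of the example: Y = X when |X| ≤ c and Y = -X when |X| > c, so the support is the two diagonal lines y = ±x; the sample size and bisection bracket below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)  # X ~ N(0,1)

def cov_xy(c):
    # Y = X on |X| <= c, Y = -X on |X| > c; by symmetry Y is still N(0,1)
    y = np.where(np.abs(x) <= c, x, -x)
    return float(np.mean(x * y))  # empirical covariance (both means are 0)

# cov(X,Y) is increasing in c: negative for small c, positive for large c.
# Bisect for the crossing point where the covariance vanishes.
lo, hi = 0.5, 3.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c0 = 0.5 * (lo + hi)

y = np.where(np.abs(x) <= c0, x, -x)
print(cov_xy(c0))                     # ~ 0: essentially zero covariance
print(np.corrcoef(x**2, y**2)[0, 1])  # 1.0 exactly: Y**2 == X**2, so strong dependence
```

Both marginals are N(0,1), the covariance at the bisected c is numerically zero, yet Y² equals X², so the pair is far from independent (and hence not bivariate Gaussian).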

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive feel for sampling variation; something like mean vs. median), Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version), Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007)

For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1, for α > 0

Note Critical Parameter: α

1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)

How Good are the Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°
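This dichotomy is easy to see in simulation (a sketch of the spike model above; the choices d = 100,000, n = 25, and the two α values are illustrative, not from the slides):

```python
import numpy as np

def pca_angle(alpha, d=100_000, n=25, seed=1):
    """Angle (degrees) between the true first eigenvector u1 = e1 and its
    empirical version, in the spike model lambda_1 = d**alpha, others = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))                                  # unit-variance noise
    X[:, 0] += np.sqrt(float(d) ** alpha) * rng.standard_normal(n)   # spike along e1
    # first right singular vector of centered data = first empirical eigenvector
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    cos = min(abs(float(Vt[0, 0])), 1.0)                             # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

print(pca_angle(1.5))  # alpha > 1: small angle (consistency)
print(pca_angle(0.5))  # alpha < 1: angle well away from 0 (strong inconsistency)
```

With α = 1.5 the spike dominates the d^(1/2)-scale noise and the angle is near 0; with α = 0.5 the empirical direction is mostly noise and the angle approaches 90° as d grows.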

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall λ_1 = d^α is on the Scale of Variance): Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues:

λ̂_1 / λ_1 →_L χ²_n / n, as d → ∞

Eigenvalues Inconsistent

But Known Distribution

Consistent when n → ∞ as Well
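This limiting distribution can be checked by simulation (a sketch, assuming the spike model with α > 1; the choices d = 4000, n = 10, α = 1.5, 200 replications, and the divisor-n sample covariance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 4000, 10, 1.5
lam1 = float(d) ** alpha

ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))                      # noise, variance 1
    X[:, 0] += np.sqrt(lam1) * rng.standard_normal(n)    # spike along e1
    s = np.linalg.svd(X, compute_uv=False)               # mean is 0, so no centering
    ratios.append(s[0] ** 2 / n / lam1)                  # lambda1_hat / lambda1
ratios = np.array(ratios)

print(ratios.mean())  # ~ E[chi2_n / n] = 1
print(ratios.var())   # ~ Var[chi2_n / n] = 2 / n = 0.2
```

Even with d huge, the ratio does not concentrate at 1 for fixed n (inconsistency), but its fluctuations match the known χ²_n / n law, and they vanish as n grows.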

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist:

John Kent example: X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)

Can only say ‖X_d‖ = O_p(d^(1/2)): ‖X_d‖ / d^(1/2) → 10 or 1, each w.p. 1/2, not deterministic

PCA Conditions Same, since Noise is Still O_p(d^(1/2))

But for Geo Rep'n, need some Mixing Cond'n

Conclude: Need some Mixing Condition
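The point of the Kent example is visible directly (a sketch of the mixture as reconstructed above; d = 10,000 and the number of draws are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10_000, 200
# Kent mixture: w.p. 1/2 draw from N_d(0, 100 I_d), w.p. 1/2 from N_d(0, I_d)
big = rng.integers(0, 2, size=n).astype(bool)
sd = np.where(big, 10.0, 1.0)
X = sd[:, None] * rng.standard_normal((n, d))

scaled = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(scaled[:8], 2))  # entries concentrate near 10 or near 1
```

Each scaled norm concentrates tightly, but at a random one of two values, so the limit of ‖X_d‖ / d^(1/2) is O_p(1) yet not a deterministic constant, which is exactly what blocks the deterministic geometric representation without further conditions.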

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions that Still Give the Law of Large Numbers & Central Limit Theorem

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(σ(X_1, …, X_j)), g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)) }

where σ(·) denotes the Sigma-Field Generated by the indicated variables (Note: Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
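As a toy illustration of "uncorrelated at far lags" (a sketch using an AR(1) sequence, for which corr(X_j, X_{j+k}) = φ^k decays geometrically; this shows only pairwise lag correlations, not the full sup-over-sigma-fields coefficient ρ(k), and the parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.7, 200_000
eps = rng.standard_normal(T)

x = np.empty(T)
x[0] = eps[0] / np.sqrt(1.0 - phi**2)  # start at the stationary distribution
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]     # AR(1): corr(X_j, X_{j+k}) = phi**k

def lag_corr(k):
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

for k in (1, 5, 10, 20):
    print(k, round(lag_corr(k), 3))    # decays geometrically toward 0
```

Dependence is strong at lag 1 but essentially gone by lag 20, the qualitative behavior the ρ-mixing assumption asks for.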

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume the Entries X(1), …, X(d) of the Data Vectors Are ρ-mixing

Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear (e.g. Microarrays)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^(-1/2) U_d^t X_d

Assume Ǝ a permutation of the d entries, So that Z_d is ρ-mixing
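A numerical sketch of this standardization (the covariance Σ_d, its spectrum, and the sample size below are illustrative assumptions; the point is that Z_d = Λ_d^(-1/2) U_d^t X_d has unit-variance, uncorrelated entries, to which the mixing assumption is then applied):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 6, 50_000

# an illustrative covariance Sigma_d = U Lam U^t with a decaying spectrum
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal U
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])
Sigma = Q @ np.diag(lam) @ Q.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows ~ (0, Sigma_d)
Z = X @ Q @ np.diag(lam ** -0.5)                         # Z = Lam^{-1/2} U^t X, row-wise
print(np.round(np.cov(Z.T), 2))                          # ~ identity matrix
```

The sample covariance of Z is numerically the identity, confirming the whitening; in the theorem the assumption is that some ordering of these standardized coordinates is ρ-mixing.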

HDLSS Math Stat of PCA

Careful look at PCA Consistency - spike α > 1 (Reality Check Suggested by a Reviewer):

Condition is Independent of Sample Size, So true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency - spike α < 1

Consistency - spike α > 1

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency: α > 1, Angle(û_1, u_1) → 0

For Strong Inconsistency: α < 1, Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_ij = P_{û_j} x_i (What we study in PCA scatterplots) and s_ij = P_{u_j} x_i

Can Show: ŝ_ij / s_ij → R_j (Random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_ij / s_ij → R_j, with the Same Realization of R_j for all i

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - spike α < 1

Consistency - spike α > 1

What happens at the boundary (α = 1)?

Result: Ǝ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer, El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this


0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

YX

0cov YX

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

YX

0cov YX

X Y

0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝᵢⱼ = P_v̂ⱼ xᵢ and sᵢⱼ = P_vⱼ xᵢ

(What we study in PCA scatterplots)

Can Show: ŝᵢⱼ ≈ Rⱼ sᵢⱼ, with Rⱼ Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝᵢⱼ ≈ Rⱼ sᵢⱼ

Same Realization of Rⱼ for i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency - spike: d^α, α < 1

Consistency - spike: d^α, α > 1

What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem & Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
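El Karoui's point can be seen directly: for high-dimensional data with bandwidth on the natural scale σ² ≍ d, the Gaussian-kernel Gram matrix is, entrywise, nearly an affine function of the linear (inner-product) Gram matrix, so kernel classifiers behave like linear ones. A numpy sketch of this degeneracy (illustrative scaling, not the full random-matrix statement):

```python
import numpy as np

# In high dimension, ||x_i||^2 / d ~ 1 and x_i . x_j / d = O(d^{-1/2}), so
# exp(-||x_i - x_j||^2 / (2d)) ~ e^{-1} * (1 + x_i . x_j / d):
# the RBF Gram matrix collapses to a shift-and-scale of the linear one.
rng = np.random.default_rng(5)
n, d = 20, 5000
X = rng.standard_normal((n, d))

G = X @ X.T                                   # linear Gram matrix
sq = np.diag(G)
D2 = sq[:, None] + sq[None, :] - 2 * G        # squared pairwise distances
K_rbf = np.exp(-D2 / (2 * d))                 # Gaussian kernel, sigma^2 = d
K_lin = np.exp(-1.0) * (1 + G / d)            # first-order linear surrogate

off = ~np.eye(n, dtype=bool)
max_err = np.max(np.abs(K_rbf - K_lin)[off])  # small: kernel ~ linear
```

The entrywise error shrinks as d grows, which is the sense in which kernel embedding buys little in the HDLSS limit.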

Interesting Question:

Behavior in Very High Dimension?

Implications for DWD:

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this


0 Covariance is not independence

Result:

• Joint distribution of X and Y:

– Has Gaussian marginals

– Has cov(X, Y) = 0

– Yet strong dependence of X and Y

– Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals
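A standard construction with exactly these properties (one of several; the slide does not fix a particular one) is X ~ N(0, 1), S = ±1 a fair coin independent of X, and Y = S·X. Then Y is N(0, 1) by symmetry, cov(X, Y) = E[S]·E[X²] = 0, yet |Y| = |X|, so X and Y are strongly dependent. A quick numpy check:

```python
import numpy as np

# X ~ N(0,1), S an independent random sign, Y = S * X:
# Gaussian marginals, zero covariance, but Y^2 = X^2 exactly.
rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)
S = rng.choice([-1.0, 1.0], size=n)
Y = S * X

emp_cov = np.mean(X * Y)                    # ~ 0: zero covariance
mean_Y, var_Y = Y.mean(), Y.var()           # ~ 0 and ~ 1: N(0,1) marginal
dependence = np.corrcoef(X**2, Y**2)[0, 1]  # = 1, since Y^2 = X^2
```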

HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample size (motivates weighted version)

Qiao et al (2010)
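The geometric representation behind these consequences is easy to check directly: for Z ~ N_d(0, I_d) with d large, vector lengths concentrate near d^{1/2}, pairwise distances near (2d)^{1/2}, and pairwise angles near 90°, so n such points look like a rigid regular simplex. A minimal numpy sketch (an illustration, not the theorem):

```python
import numpy as np

# n standard Gaussian vectors in dimension d: the HDLSS geometric
# representation says all norms, distances, and angles concentrate.
rng = np.random.default_rng(1)
n, d = 20, 10_000
Z = rng.standard_normal((n, d))

norms = np.linalg.norm(Z, axis=1) / np.sqrt(d)        # each ~ 1
dists = np.array([np.linalg.norm(Z[i] - Z[j])
                  for i in range(n) for j in range(i)])
dists /= np.sqrt(2 * d)                               # each ~ 1
G = Z @ Z.T
cosines = np.array([G[i, j] / (np.linalg.norm(Z[i]) * np.linalg.norm(Z[j]))
                    for i in range(n) for j in range(i)])  # each ~ 0 (90 deg)
```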

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA,

In Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁ = d^α, λ₂ = ⋯ = λ_d = 1

Note Critical Parameter: α

1st Eigenvector: u₁

(Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁, …, λ̂_d, û₁ as Estimates?

Consistency (big enough spike):

For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):

For α < 1, Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}

For α > 1 (Recall on Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For α < 1:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA
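The dichotomy is easy to see in simulation under the spike model Σ = diag(d^α, 1, …, 1) with u₁ = e₁. A sketch, with fixed n and moderate d standing in for the d → ∞ limit, and illustrative α values on each side of 1:

```python
import numpy as np

def pc1_angle_deg(alpha, d=2000, n=20, seed=2):
    """Angle (degrees) between the true u1 = e1 and the first empirical
    eigenvector, under the spike model Sigma = diag(d^alpha, 1, ..., 1)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)       # inject the spike along e1
    # first right singular vector of X = first sample-covariance eigenvector
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(1.0, abs(v1[0]))))

ang_big_spike = pc1_angle_deg(alpha=1.5)    # alpha > 1: angle near 0
ang_small_spike = pc1_angle_deg(alpha=0.2)  # alpha < 1: angle near 90
```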

Consistency of eigenvalues:

λ̂₁ / λ₁ →_L χ²ₙ / n, as d → ∞

Eigenvalues Inconsistent,

But Known Distribution

Consistent when n → ∞ as Well

HDLSS Math Stat of PCA
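This fixed-n limit law (the ratio behaving like χ²ₙ/n, so the top eigenvalue is right on average but never concentrates for fixed n) can be checked by Monte Carlo. A sketch under the same spike model, matching the χ²ₙ/n mean of 1 and variance of 2/n:

```python
import numpy as np

# Monte Carlo check: with alpha > 1 and n fixed, the ratio
# (top sample eigenvalue) / lambda_1 has mean ~ 1 but variance ~ 2/n,
# i.e. no concentration no matter how large d gets.
rng = np.random.default_rng(3)
n, d, alpha, reps = 10, 500, 1.5, 200
lam1 = d ** alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    top_sv = np.linalg.svd(X, compute_uv=False)[0]
    ratios[r] = (top_sv ** 2 / n) / lam1    # hat-lambda_1 / lambda_1

mean_ratio = ratios.mean()   # ~ E[chi^2_n / n] = 1
var_ratio = ratios.var()     # ~ Var[chi^2_n / n] = 2/n = 0.2
```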

Conditions for Geo Rep'n & PCA Consist.:

John Kent example:

X_d ~ ½ N(0, 100 I_d) + ½ N(0, I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}):

(100 d)^{1/2} w.p. ½, d^{1/2} w.p. ½,

not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n need some Mixing Cond.

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignored…)

E.g. Independent and Ident. Dist'd

Mixing Conditions:

Explore Weaker Assumptions, to Still Get

Law of Large Numbers &

Central Limit Theorem

• A Whole Area in Probability Theory

• a Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better: Newer References

Mixing Condition Used Here:

Rho – Mixing

For Random Variables X₁, X₂, …, Define

ρ(k) = sup_t ρ( σ(Xᵢ : i ≤ t), σ(Xⱼ : j ≥ t + k) )

Where, For Sigma-Fields 𝒜, ℬ Generated by the Xᵢ:

ρ(𝒜, ℬ) = sup { |corr(f, g)| : f ∈ L²(𝒜), g ∈ L²(ℬ) }

Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
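For a concrete instance of "uncorrelated at far lags": a Gaussian AR(1) sequence has lag-k autocorrelation φ^k, which decays geometrically to 0. A numpy sketch illustrating the decay (an illustration only; ρ-mixing proper takes a sup over functions of past and future sigma-fields, not just the raw lag-k correlation):

```python
import numpy as np

# AR(1): X_t = phi * X_{t-1} + eps_t.  Lag-k autocorrelation is phi^k,
# decaying to 0 -- the far-lag decorrelation that rho-mixing formalizes.
rng = np.random.default_rng(6)
phi, T = 0.8, 200_000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def acf(series, k):
    """Empirical lag-k autocorrelation."""
    return np.corrcoef(series[:-k], series[k:])[0, 1]

acf1, acf5, acf20 = acf(x, 1), acf(x, 5), acf(x, 20)
# theoretical values: 0.8, 0.8**5 ~ 0.328, 0.8**20 ~ 0.012
```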

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors, X = (X₁, X₂, …, X_d)ᵗ,

Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

(Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{-1/2} U_dᵗ X_d

Assume Ǝ a permutation of the entries of Z_d,

So that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA



0 Covariance is not independence

Result

bull Joint distribution of and ndash Has Gaussian marginals

ndash Has

ndash Yet strong dependence of and

ndash Thus not multivariate Gaussian

Shows Multivariate Gaussian means more

than Gaussian Marginals

YX

0cov YX

X Y

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion

1 DWD more stable than SVM(based on deeper limiting distributions)

(reflects intuitive idea feeling sampling variation)(something like mean vs median)

Hall Marron Neeman (2005)

2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)

3 Inefficiency of DWD for uneven sample size(motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005)

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,

Not Always Clear, e.g. for Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Necessarily Gaussian)

Define the Standardized Version:

Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the d entries of Z_d,

So that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA

Careful look at:

PCA Consistency, α > 1 spike

(Reality Check Suggested by Reviewer)

The Condition α > 1 is Independent of Sample Size,

So true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd; shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters show Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should Not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°

Because PC Scores (i.e. Projections) are Not Consistent

For Scores ŝ_ij = P_{û_j} x_i (What we study in PCA scatterplots)

and s_ij = P_{u_j} x_i,

Can Show: ŝ_ij ≈ R_j s_ij, with R_j Random

(Thanks to Dan Shen)

PC Scores Not Consistent,

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors":

Same Realization of R_j for i = 1, ..., n

Axes have Inconsistent Scales,

But Relationships are Still Useful
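A sketch of the "proportional errors" point, under an assumed illustrative spike setup (the values of d, n, α below are arbitrary choices, not from the slides): the empirical PC1 scores track the true scores up to a common factor, so scatterplot relationships survive even when scales do not:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative spike model (assumed parameters): lam1 = d**alpha, other
# eigenvalues 1.  Empirical PC1 scores s_hat are roughly a common multiple
# of the true scores s_true, so their correlation is near +/- 1.
d, n, alpha = 20_000, 20, 1.2
u1 = np.zeros(d); u1[0] = 1.0                     # true first eigenvector
z = rng.standard_normal(n)
X = np.sqrt(d**alpha) * np.outer(z, u1) + rng.standard_normal((n, d))

s_true = X @ u1                                   # true PC1 scores
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
s_hat = X @ Vt[0]                                 # empirical PC1 scores

print(abs(np.corrcoef(s_true, s_hat)[0, 1]))      # near 1
```

The absolute value handles the arbitrary sign of the estimated eigenvector; the near-unit correlation is the sense in which relationships, not scales, are preserved.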

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike

• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,

So it is not Clear that Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion

1. DWD more stable than SVM (based on deeper limiting distributions)

(reflects intuitive idea of feeling sampling variation) (something like mean vs. median)

Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version)

Qiao et al (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study Properties of PCA, in Estimating Eigen-Directions & -Values)

[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁(d) = d^α, λ₂(d) = ... = λ_d(d) = 1

Note Critical Parameter: α

1st Eigenvector: u₁

(Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁(d), ..., λ̂_d(d), û₁ as Estimates?

Consistency (big enough spike): For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1
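The consistency / strong inconsistency dichotomy can be seen in a small simulation sketch (the choices d = 20,000, n = 25 and the two α values are arbitrary illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Spike covariance model: lam1 = d**alpha, remaining eigenvalues 1.
def pc1_angle_deg(d, alpha, n=25):
    u1 = np.zeros(d); u1[0] = 1.0                      # true first eigenvector
    z = rng.standard_normal(n)
    X = np.sqrt(d**alpha) * np.outer(z, u1) + rng.standard_normal((n, d))
    X = X - X.mean(axis=0)                             # mean centered, as assumed
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0] @ u1), 1.0)                    # |cos| handles sign flip
    return float(np.degrees(np.arccos(cos)))

print(pc1_angle_deg(20_000, alpha=1.5))   # alpha > 1: small angle (consistency)
print(pc1_angle_deg(20_000, alpha=0.3))   # alpha < 1: large angle (strong inconsistency)
```

For α > 1 the spike pops out of the noise sphere and the angle is small; for α < 1 the empirical eigenvector is dominated by noise and the angle drifts toward 90° as d grows.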

HDLSS Math Stat of PCA

Consistency of Eigenvalues?

λ̂₁(d) / λ₁(d) →_L χ²_n / n, as d → ∞ (n fixed)

• Eigenvalues are Inconsistent

• But have a Known Distribution

• Consistent when n → ∞ as well
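The eigenvalue behavior can be checked by simulation; the parameters below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch: for alpha > 1 and fixed n, the ratio lam1_hat / lam1 stays random
# (inconsistent) as d grows, behaving like chi-squared_n / n:
# mean near 1, standard deviation near sqrt(2/n).
d, n, alpha = 50_000, 10, 2.0
lam1 = d**alpha
ratios = []
for _ in range(30):
    z = rng.standard_normal(n)
    X = np.sqrt(lam1) * np.outer(z, np.r_[1.0, np.zeros(d - 1)]) \
        + rng.standard_normal((n, d))
    s = np.linalg.svd(X, compute_uv=False)
    ratios.append(s[0]**2 / n / lam1)          # lam1_hat / lam1
ratios = np.array(ratios)
print(ratios.mean(), ratios.std())
```

The ratio does not settle to 1 for fixed n (inconsistency), but its spread matches the known χ²_n / n distribution, and it concentrates at 1 once n grows as well.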

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)

Can only say: ||X_d|| = d^{1/2} (1 + o_p(1)) w.p. 1/2, and 10 d^{1/2} (1 + o_p(1)) w.p. 1/2,

not deterministic

PCA Conditions are the Same, since the Noise is still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition
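A quick sketch of the Kent example (d and n below are illustrative choices, not from the slides): the scaled norms land near 1 or 10 at random, so no single deterministic radius works, which is exactly why a geometric representation fails without a mixing condition:

```python
import numpy as np

rng = np.random.default_rng(5)

# Kent-style mixture: each data vector is N(0, I_d) with prob. 1/2
# and N(0, 100 I_d) with prob. 1/2.
d, n = 20_000, 100
big = rng.random(n) < 0.5
X = rng.standard_normal((n, d)) * np.where(big, 10.0, 1.0)[:, None]

radii = np.linalg.norm(X, axis=1) / np.sqrt(d)    # ||X_d|| / d**0.5
print(sorted(set(np.round(radii).astype(int).tolist())))   # [1, 10]
```

Each individual norm concentrates (the noise is still O_p(d^{1/2})), but the limit radius is a coin flip between 1 and 10 rather than a constant.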




Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this


HDLSS Asy's Geometrical Represen'tion

Further Consequences of the Geometric Represen'tion

1. DWD is more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median). Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified. Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates a weighted version). Qiao et al. (2010)

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

(Study properties of PCA in estimating eigen-directions & -values)

[Assume data are mean centered]

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007). For eigenvalues:
$$\lambda_{1,d} = d^{\alpha}, \qquad \lambda_{2,d} = \cdots = \lambda_{d,d} = 1.$$

Note the critical parameter: $\alpha$.

1st eigenvector $u_1$: it turns out the direction doesn't matter.

How good are the empirical versions $\hat{\lambda}_{1,d}$, $\hat{u}_1$ as estimates?

HDLSS Math Stat of PCA

Consistency (big enough spike): for $\alpha > 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$.

Strong Inconsistency (spike not big enough): for $\alpha < 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$.

HDLSS Math Stat of PCA

Intuition: random noise $\sim d^{1/2}$.

For $\alpha > 1$ (recall $d^{\alpha}$ is on the scale of variance): the spike pops out of the pure noise sphere.

For $\alpha < 1$: the spike is contained in the pure noise sphere.
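The consistency / strong inconsistency dichotomy can be checked numerically. The following is a minimal simulation sketch (an illustration, not from the slides), using the spike model with the true first eigenvector taken as $u_1 = e_1$:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_pc_angle(d, n, alpha):
    """Angle (degrees) between empirical and true first eigenvector in the
    spike model: lambda_1 = d**alpha, all other eigenvalues 1, u1 = e1."""
    sd = np.ones(d)
    sd[0] = np.sqrt(d ** alpha)
    X = rng.standard_normal((n, d)) * sd           # rows are N(0, diag) draws
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)                  # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

# alpha > 1: consistency (angle near 0); alpha < 1: strong inconsistency (near 90)
print(first_pc_angle(d=2000, n=20, alpha=1.5))
print(first_pc_angle(d=2000, n=20, alpha=0.5))
```

Even with only $n = 20$ samples, the empirical direction locks onto the spike when $\alpha > 1$, and is essentially orthogonal to it when $\alpha < 1$.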

HDLSS Math Stat of PCA

Consistency of eigenvalues?

The eigenvalues are inconsistent for fixed $n$: as $d \to \infty$,
$$\frac{\hat{\lambda}_{1,d}}{\lambda_{1,d}} \xrightarrow{\;L\;} \frac{\chi^2_n}{n}.$$

But the limiting distribution is known, so they are consistent when $n \to \infty$ as well.
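The $\chi^2_n / n$ limit above can be seen in simulation. A sketch (illustration only), taking $\alpha$ large so the noise contribution to the top eigenvalue is negligible:

```python
import numpy as np

rng = np.random.default_rng(1)

def lam1_ratio(d=1000, n=10, alpha=2.0):
    """One draw of lambda1_hat / lambda1 in the spike model."""
    lam1 = d ** alpha
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    s = np.linalg.svd(X, compute_uv=False)
    return (s[0] ** 2 / n) / lam1    # top sample-covariance eigenvalue / lambda1

ratios = np.array([lam1_ratio() for _ in range(200)])
# chi^2_n / n has mean 1 and variance 2/n = 0.2
print(ratios.mean(), ratios.var())
```

The simulated mean and variance of $\hat{\lambda}_{1,d}/\lambda_{1,d}$ match those of $\chi^2_n/n$: the estimate is random (inconsistent for fixed $n$), but with a known distribution that concentrates as $n$ grows.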

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency

John Kent example:
$$X_d \sim \tfrac{1}{2}\, N_d(0,\, 100\, I_d) + \tfrac{1}{2}\, N_d(0,\, I_d).$$

Can only say $\|X_d\| = O_p(d^{1/2})$, not deterministic:
$$\|X_d\| \approx \begin{cases} 10\, d^{1/2} & \text{w.p. } 1/2 \\ d^{1/2} & \text{w.p. } 1/2. \end{cases}$$

The PCA conditions are the same, since the noise is still $O_p(d^{1/2})$.

But for the Geo Rep'n, this is not enough. Conclude: need some mixing condition.
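The failure of a deterministic length limit in a Kent-style mixture shows up directly in simulation. A sketch, with component variances 100 and 1 assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Kent-style mixture (variances 100 and 1 assumed for illustration):
# each data vector is N(0, 100 I_d) or N(0, I_d), with probability 1/2 each.
d, n = 10_000, 500
big = rng.random(n) < 0.5
scale = np.where(big, 10.0, 1.0)
X = rng.standard_normal((n, d)) * scale[:, None]

norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
# Every norm is O_p(d^{1/2}), but the scaled norms concentrate near two
# different values (10 or 1), so there is no single deterministic limit.
print(norms[big].mean(), norms[~big].mean())
```

Each mixture component on its own concentrates on a sphere, but the mixture does not: the geometric representation (a single deterministic simplex) fails without further conditions.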

Mixing Conditions

Idea from probability theory: recall the standard asymptotic results as $n \to \infty$:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored...), e.g. independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better, newer references also exist

Mixing Conditions

Mixing condition used here: ρ-mixing.

For random variables $X_1, X_2, \dots$, define
$$\rho(k) = \sup_{j}\, \sup\left\{\, |\mathrm{corr}(f, g)| \;:\; f \in L^2(\mathcal{F}_1^{\,j}),\; g \in L^2(\mathcal{F}_{j+k}^{\,\infty}) \,\right\},$$
where $\mathcal{F}_a^b$ is the sigma-field generated by $X_a, \dots, X_b$ (note the gap of lag $k$).

Assume $\rho(k) \to 0$ as $k \to \infty$. Idea: uncorrelated at far lags.
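A standard example of a ρ-mixing sequence is a Gaussian AR(1) process, whose correlation across a gap of lag $k$ decays geometrically. A minimal sketch (illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# AR(1) sequence X_j = phi * X_{j-1} + eps_j: the correlation across a
# gap of lag k decays like phi**k -> 0, i.e. uncorrelated at far lags.
phi, m = 0.8, 200_000
eps = rng.standard_normal(m)
x = np.empty(m)
x[0] = eps[0]
for j in range(1, m):
    x[j] = phi * x[j - 1] + eps[j]

for k in (1, 5, 20):
    r = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(r, 3), round(phi ** k, 3))   # empirical vs. theoretical phi**k
```

The empirical lag-$k$ correlations track $\varphi^k$ closely, so the "uncorrelated at far lags" idea holds while the entries remain strongly dependent at short lags.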

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): assume the entries $X_1, X_2, \dots, X_d$ of the data vectors are ρ-mixing.

Drawback: this is a strong assumption. (In JRSS-B, since Biometrika refused.)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):
$$X_d \sim (0, \Sigma_d), \qquad \Sigma_d = U_d \Lambda_d U_d^t$$
(note: not necessarily Gaussian).

Define the standardized version
$$Z_d = \Lambda_d^{-1/2}\, U_d^t\, X_d.$$

Assume ∃ a permutation of the $d$ entries so that the entries of $Z_d$ are ρ-mixing.
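The standardization $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ whitens the data: its entries are uncorrelated with unit variance, so a mixing condition on them is coordinate-system-free up to the assumed permutation. A small sketch (illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)

d, n = 5, 100_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)             # some covariance matrix Sigma_d
lam, U = np.linalg.eigh(Sigma)          # Sigma_d = U_d Lambda_d U_d^t
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # columns are draws

# Standardized version Z_d = Lambda_d^{-1/2} U_d^t X_d:
# entries are uncorrelated with unit variance (identity covariance).
Z = (U.T @ X) / np.sqrt(lam)[:, None]
print(np.round(np.cov(Z), 2))           # approximately the identity matrix
```

The Gaussian case is used here only for convenience of sampling; the Jung & Marron condition itself does not require Gaussianity.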

HDLSS Math Stat of PCA

Careful look at PCA consistency ($\alpha > 1$ spike); a reality check suggested by a reviewer.

The condition $\alpha > 1$ is independent of the sample size, so consistency holds even for $n = 1$ (!?).

Reviewer's conclusion: absurd; this shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise: recall the RNAseq data from 8/23/12 ($d \approx 1700$, $n = 180$).

Functional Data Analysis

The manually brushed clusters show clear alternate splicing, not noise.

HDLSS Math Stat of PCA

Recall the theoretical separation:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike

Mathematically driven conclusion: real data signals are this strong.

HDLSS Math Stat of PCA

An interesting objection: one should not study angles in PCA.

Recall: for consistency ($\alpha > 1$), $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$; for strong inconsistency ($\alpha < 1$), $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^{\circ}$.

The objection: PC scores (i.e. projections), which are what we study in PCA scatterplots, are not consistent. For the scores $\hat{s}_{j,i} = P_{\hat{v}_j} x_i$ and $s_{j,i} = P_{v_j} x_i$, one can show
$$\frac{\hat{s}_{j,i}}{s_{j,i}} \to R_j \quad \text{(random)}.$$
Thanks to Dan Shen.

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent, so how can PCA find useful signals in data?

The key is "proportional errors": the same realization of $R_j$ applies for all $i = 1, \dots, n$. The axes have inconsistent scales, but the relationships between points are still useful.
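The "proportional errors" phenomenon can be seen in a single spike-model draw: the ratio of empirical to true scores is nearly the same value for every observation. A sketch (illustration, not from the slides), again taking $u_1 = e_1$:

```python
import numpy as np

rng = np.random.default_rng(5)

# One draw from the spike model: lambda_1 = d**1.5, true u1 = e1
d, n, alpha = 2000, 20, 1.5
sd = np.ones(d)
sd[0] = np.sqrt(d ** alpha)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])     # align sign with the true u1 = e1

s_true = X[:, 0]                       # scores on the true direction
s_hat = X @ u1_hat                     # scores on the empirical direction
ratios = s_hat / s_true
# The ratios are nearly one common value for every i: "proportional errors"
print(ratios.min(), ratios.max())
```

Because all $n$ score ratios share (essentially) one realization, the scatterplot geometry of the scores is preserved up to a common rescaling of each axis.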

HDLSS Deep Open Problem, Result

In PCA consistency: strong inconsistency for the $\alpha < 1$ spike, consistency for the $\alpha > 1$ spike. What happens at the boundary ($\alpha = 1$)?

Result: ∃ interesting limit distn's: Jung, Sen & Marron (2012).

HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.

Implications for DWD: recall that its main advantage is for high $d$, so it is not clear that embedding helps; thus it is not yet implemented in DWD.

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust. Mathematics behind this?


HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

(Study Properties of PCA

In Estimating Eigen-Directions amp -Values)

[Assume Data are Mean Centered]

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues 11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers  ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions  (Usually Ignored ...)

E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get:

Law of Large Numbers

Central Limit Theorem

• A Whole Area in Probability Theory, with a Large Literature

• A Comprehensive Reference: Bradley (2005 update of 1986 version)

• Better, Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables X₁, X₂, ..., Define:

ρ(k) = sup_j sup |corr( f(X₁, ..., X_j), g(X_{j+k}, X_{j+k+1}, ...) )|

Where the sup is over f, g with finite variance, measurable w.r.t. the Sigma-Fields Generated by X₁, ..., X_j and by X_{j+k}, X_{j+k+1}, ...  (Note Gap of Lag k)

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags
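For intuition, a sketch with a Gaussian AR(1) sequence, a standard example of a ρ-mixing process (the example and parameter φ are not from the slides):

```python
# Sketch: a Gaussian AR(1) sequence x_t = phi * x_{t-1} + eps_t is rho-mixing;
# correlations (here, plain lag correlations) die off geometrically in the lag k,
# matching the "uncorrelated at far lags" idea.
import numpy as np

rng = np.random.default_rng(0)
phi, T = 0.7, 200_000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, lag_corr(x, k))   # approximately phi**k: geometric decay toward 0
```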

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X_{(1)}, X_{(2)}, ..., X_{(d)} Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

Conditions for Geo Rep'n:

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

Note: Not Gaussian

Define Standardized Version:  Z_d = Λ_d^{-1/2} U_dᵗ X_d

Assume Ǝ a permutation π_d, So that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)

(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12:  d ~ 1700,  n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1):  Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1):  Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Because PC Scores (i.e. projections) Not Consistent

For Scores  ŝ_{j,i} = P_{v̂_j} x_i  and  s_{j,i} = P_{v_j} x_i

(What we study in PCA scatterplots)

Can Show:  ŝ_{j,i} / s_{j,i} → R_j  (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":  Same Realization of R_j for  ŝ_{j,1}, ..., ŝ_{j,n}

Axes have Inconsistent Scales, But Relationships are Still Useful
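The "proportional errors" point can be sketched numerically (hypothetical setup; the spike is placed along e₁ so the population scores are just the first coordinates): the empirical scores are close to a single common random multiple of the population scores.

```python
# Sketch: empirical PC1 scores vs population PC1 scores in a spike model.
# They differ, but by (roughly) one shared factor R_1 across all cases --
# so relationships in PCA scatterplots are preserved.
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 2000, 20, 1.2
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(float(d) ** alpha)     # spike along e1, so v_1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] * np.sign(Vt[0, 0])        # align sign with e1
s_hat = X @ v1_hat                        # empirical scores s_hat_{1,i}
s_pop = X[:, 0]                           # population scores s_{1,i}
R = (s_hat @ s_pop) / (s_pop @ s_pop)     # best common factor
rel_resid = np.linalg.norm(s_hat - R * s_pop) / np.linalg.norm(s_hat)
print(R, rel_resid)   # residual near 0: errors are (nearly) proportional
```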

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD
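A sketch of the mechanism behind this (hypothetical illustration, not El Karoui's argument itself): in high dimension pairwise distances concentrate, so a radial kernel evaluated on ‖x_i − x_k‖²/d is well approximated by a first-order expansion that is linear in the inner product x_i · x_k.

```python
# Sketch: pairwise squared distances ||x_i - x_k||^2 / d concentrate as d grows,
# so a radial kernel of those distances is nearly an affine function of the
# inner products -- one reason kernel classifiers can behave like linear ones.
import numpy as np

rng = np.random.default_rng(0)
n = 50
spreads = []
for d in (10, 100, 10_000):
    X = rng.standard_normal((n, d))
    G = X @ X.T                                    # Gram matrix of inner products
    sq = np.diag(G)
    D2 = (sq[:, None] + sq[None, :] - 2 * G) / d   # ||x_i - x_k||^2 / d
    off = D2[~np.eye(n, dtype=bool)]
    spreads.append(off.std())
    print(d, off.std())   # spread around the common value 2 shrinks with d
```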

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD is more robust

Mathematics behind this:




  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

Note Critical Parameter

11 21 dddd d

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

Turns out Direction Doesnrsquot Matter

11 21 dddd d

1u

HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored…), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get:

Law of Large Numbers

Central Limit Theorem

bull A Whole Area in Probability Theory, with a Large Literature

bull A Comprehensive Reference: Bradley (2005 update of 1986 version)

bull Better, Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables X₁, X₂, …, define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L₂(σ(X₁, …, X_j)), g ∈ L₂(σ(X_{j+k}, X_{j+k+1}, …)) }

for the sigma-fields generated by the indicated variables (note the gap of lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
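The "uncorrelated at far lags" idea can be illustrated with an AR(1) sequence (an illustration only: ρ-mixing proper takes a sup over all L₂ functions of past and future, so a plain lag correlation is just a lower bound). For AR(1), corr(X_j, X_{j+k}) = φ^k decays geometrically in the lag.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.7, 200_000

# AR(1): X_j = phi * X_{j-1} + eps_j, started from its stationary distribution
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi**2)
for j in range(1, n):
    x[j] = phi * x[j - 1] + eps[j]

def lag_corr(x, k):
    """Empirical correlation between the series and itself shifted by k."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, lag_corr(x, k))  # approx phi**k, so near 0 at far lags
```

This is the kind of dependence under which LLN- and CLT-type results can still be proved.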

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X₍₁₎, X₍₂₎, …, X₍d₎ Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

Conditions for Geo Rep'n:

Series of Technical Improvements:

bull Ahn, Marron, Muller & Chi (2007)

bull Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering, Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^(-1/2) U_dᵗ X_d

Assume Ǝ a permutation of the entries, so that Z_d is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency: α > 1 spike

(Reality Check Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

HDLSS Math Stat of PCA

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1): Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections) Not Consistent:

For Scores ŝ_ij = P_{v̂_j} x_i (What we study in PCA scatterplots) and s_ij = P_{v_j} x_i

Can Show: ŝ_ij = R_j s_ij, with R_j Random (Thanks to Dan Shen)

PC Scores Not Consistent, So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": Same Realization of R_j for all cases i

Axes have Inconsistent Scales, But Relationships are Still Useful
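A sketch of the "proportional errors" point (my own illustration; the boundary spike α = 1 and all sizes are arbitrary choices): empirical PC1 scores remain highly correlated with the true PC1 scores, so scatterplot relationships survive, even though the overall scale factor is random and need not be near 1.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 2000, 20, 1.0
lam1 = d ** alpha

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)            # true u1 = e1

_, svals, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]                   # empirical PC1 scores, <x_i, u1_hat>
s_true = X[:, 0]                    # true PC1 scores, <x_i, u1>

r = np.corrcoef(s_hat, s_true)[0, 1]
print(abs(r))                       # near 1: relationships preserved
print(svals[0] ** 2 / (n * lam1))   # random scale factor, need not be near 1
```

The axes are on an inconsistent scale, but the configuration of the cases (their relationships) is what a PCA scatterplot actually displays.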

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Result: Ǝ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

bull In Random Matrix Limit,

bull Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps, Thus not yet Implemented in DWD
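El Karoui's point can be previewed numerically (a sketch under assumed standard normal data and a Gaussian kernel with bandwidth² = d, both my choices): in high dimension the pairwise distances concentrate, so off-diagonal kernel entries are essentially an affine function of the inner products x_iᵗx_j / d, i.e. the kernel matrix degenerates to a linear one.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 20_000

X = rng.standard_normal((n, d))
G = X @ X.T / d                        # scaled Gram matrix of inner products

# Gaussian kernel, bandwidth^2 = d:  K_ij = exp(-||x_i - x_j||^2 / (2 d))
sq = np.add.outer(np.diag(G), np.diag(G)) - 2 * G   # ||x_i - x_j||^2 / d
K = np.exp(-sq / 2)

# High-d linearization: ||x_i - x_j||^2 / d ~ 2 - 2 G_ij, so off the diagonal
# K_ij ~ exp(-1) * (1 + G_ij), an affine function of the inner products
K_lin = np.exp(-1.0) * (1 + G)
off = ~np.eye(n, dtype=bool)
print(np.max(np.abs(K - K_lin)[off]))  # small: kernel matrix is nearly linear
```

If the kernel matrix carries no more information than the Gram matrix, nonlinear embedding buys little, which is the stated reason it was not implemented in DWD.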

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:

Spike Covariance Model, Paul (2007):

For Eigenvalues: λ₁,d = d^α, λ₂,d = ⋯ = λd,d = 1

1st Eigenvector: u₁ (Turns out: Direction Doesn't Matter)

How Good are Empirical Versions λ̂₁, …, λ̂d, û₁ as Estimates?

Consistency (big enough spike): For α > 1, Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough): For α < 1, Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall on Scale of Variance): Spike Pops Out of Pure Noise Sphere

For α < 1: Spike Contained in Pure Noise Sphere
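The α > 1 vs. α < 1 dichotomy already shows up at modest scale (a numerical sketch; the sizes d = 2000, n = 20 and the seeds are arbitrary): with spike d^1.5 the empirical first eigenvector nearly recovers u₁, while with spike d^0.5 it is nearly orthogonal to it.

```python
import numpy as np

def angle_deg(d, n, alpha, seed=0):
    """Angle between true u1 = e1 and the empirical first eigenvector."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)      # spike lambda_1 = d^alpha along e1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u1_hat = Vt[0]                      # first right singular vector
    c = abs(u1_hat[0])                  # |<u1_hat, e1>|
    return np.degrees(np.arccos(min(c, 1.0)))

print(angle_deg(2000, 20, 1.5))  # alpha > 1: small angle (consistency)
print(angle_deg(2000, 20, 0.5))  # alpha < 1: near 90 deg (strong inconsistency)
```

The intuition from the slide matches: the noise sphere has radius ~ d^(1/2), so a spike of size d^(α/2) pops out of it exactly when α > 1.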



HDLSS Math Stat of PCA

Consistency amp Strong Inconsistency

Spike Covariance Model Paul (2007)

For Eigenvalues

1st Eigenvector

How Good are Empirical Versions

as Estimates

11 21 dddd d

1u

11 ˆˆˆ uddd

Consistency (big enough spike)

For 1

0ˆ 11 uuAngle

HDLSS Math Stat of PCA

Consistency (big enough spike)

For

Strong Inconsistency (spike not big enough)

For

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

HDLSS Math Stat of PCA

1

Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here: ρ-Mixing

For random variables $X_1, X_2, \ldots$ define

$\rho(k) = \sup_i \sup \left\{ |\operatorname{corr}(f, g)| : f \in L^2(\mathcal{F}_1^i),\ g \in L^2(\mathcal{F}_{i+k}^{\infty}) \right\}$

where $\mathcal{F}_a^b$ is the sigma-field generated by $X_a, \ldots, X_b$ (note the gap of lag $k$)

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags
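The "uncorrelated at far lags" idea is easy to see numerically. A minimal sketch (illustrative, not from the slides) using an AR(1) sequence, whose lag-$k$ correlation decays like $\varphi^k$ and which is a standard example of ρ-mixing behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) sequence X_t = phi * X_{t-1} + noise: correlation at lag k is
# phi**k, so far-apart terms are nearly uncorrelated (rho-mixing flavor).
phi, n = 0.7, 200_000
noise = rng.standard_normal(n)
x = np.empty(n)
x[0] = noise[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

def lag_corr(x, k):
    """Sample correlation between X_t and X_{t+k}."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 10, 20)]
```

The lag-1 correlation comes out near 0.7, while by lag 20 it is essentially zero.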

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume the entries of the data vectors $X = (X_1, X_2, \ldots, X_d)^t$ are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA
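The geometric representation itself is easy to reproduce in simulation. A small sketch with iid N(0, 1) entries (the simplest case satisfying the mixing assumption): each vector has length roughly $d^{1/2}$ and each pair of vectors is roughly $(2d)^{1/2}$ apart, so the sample looks like the vertices of a regular simplex.

```python
import numpy as np

rng = np.random.default_rng(1)

# Geometric representation with iid N(0,1) entries: scaled norms
# ||X_i|| / d**0.5 concentrate near 1, and scaled pairwise distances
# ||X_i - X_j|| / d**0.5 concentrate near sqrt(2).
d, n = 100_000, 10
X = rng.standard_normal((n, d))
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
dists = np.array([np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
                  for i in range(n) for j in range(i + 1, n)])
```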

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky Point: Classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: not necessarily Gaussian)

Define the standardized version $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume ∃ a permutation of the entries of $Z_d$ so that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA
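The standardization can be checked numerically. A sketch (illustrative, with an arbitrary covariance matrix of my own choosing) verifying that $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ has identity covariance, whatever the covariance of $X_d$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sanity check of the standardization Z_d = Lambda^(-1/2) U^t X_d:
# whatever covariance X_d has, Z_d has (approximately) identity covariance.
d, n = 5, 200_000
A = rng.standard_normal((d, d))
sigma = A @ A.T + d * np.eye(d)            # an arbitrary covariance matrix
lam, U = np.linalg.eigh(sigma)             # sigma = U diag(lam) U^t
X = rng.multivariate_normal(np.zeros(d), sigma, size=n)
Z = X @ U @ np.diag(lam ** -0.5)           # row-wise standardized version
cov_Z = np.cov(Z, rowvar=False)
```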

Careful look at PCA Consistency ($\alpha > 1$ spike)

(Reality check suggested by a reviewer)

The condition is independent of sample size, so it holds even for n = 1 (!?!)

Reviewer's conclusion: absurd, shows the assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise

HDLSS Math Stat of PCA

Recall RNAseq data from 8/23/12: d ~ 1700, n = 180

Manually brushed clusters show clear alternate splicing, not noise

Functional Data Analysis

Recall Theoretical Separation:

• Strong Inconsistency: $\alpha < 1$ spike

• Consistency: $\alpha > 1$ spike

Mathematically driven conclusion: real data signals are this strong!

HDLSS Math Stat of PCA

An Interesting Objection: should not study angles in PCA

Recall, for consistency ($\alpha > 1$): $\operatorname{Angle}(\hat{u}_1, u_1) \to 0$

For strong inconsistency ($\alpha < 1$): $\operatorname{Angle}(\hat{u}_1, u_1) \to 90^\circ$

Because PC scores (i.e. projections) are not consistent:

For scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ (what we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$,

can show $\hat{s}_{ij} / s_{ij} \to R_j$ (random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data?

Key is "Proportional Errors": $\hat{s}_{ij} / s_{ij} \to R_j$, with the same realization of $R_j$ for all $i$

Axes have inconsistent scales, but relationships are still useful

HDLSS Math Stat of PCA

In PCA Consistency:

• Strong Inconsistency: $\alpha < 1$ spike

• Consistency: $\alpha > 1$ spike

What happens at the boundary ($\alpha = 1$)?

∃ interesting limit dist'ns: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall flexibility from the kernel embedding idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?

Answer: El Karoui (2010)

• In the random matrix limit,

• Kernel embedded classifiers ~ linear classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps. Thus not yet implemented in DWD.

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall intuition from above: key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this:


Consistency (big enough spike): for $\alpha > 1$, $\operatorname{Angle}(\hat{u}_1, u_1) \to 0$

Strong Inconsistency (spike not big enough): for $\alpha < 1$, $\operatorname{Angle}(\hat{u}_1, u_1) \to 90^\circ$

HDLSS Math Stat of PCA

Intuition: random noise ~ $d^{1/2}$

For $\alpha > 1$ (recall $d^\alpha$ is on the scale of variance), the spike pops out of the pure noise sphere

For $\alpha < 1$, the spike is contained in the pure noise sphere

HDLSS Math Stat of PCA
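The consistency / strong inconsistency split shows up clearly in a small simulation. A sketch (assuming a single-spike model $\Sigma = \operatorname{diag}(d^\alpha, 1, \ldots, 1)$ with fixed $n$, which matches the setup discussed here) comparing the angle between the estimated and true first PC:

```python
import numpy as np

rng = np.random.default_rng(3)

def pc1_angle_deg(d, alpha, n=20):
    """Angle (degrees) between estimated and true first PC under a
    single spike Sigma = diag(d**alpha, 1, ..., 1), sample size n."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)               # spike on the sd scale
    X = rng.standard_normal((n, d)) * sd
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)          # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

big = pc1_angle_deg(d=5000, alpha=1.5)     # alpha > 1: consistent
small = pc1_angle_deg(d=5000, alpha=0.3)   # alpha < 1: strongly inconsistent
```

With $\alpha = 1.5$ the angle is close to $0^\circ$; with $\alpha = 0.3$ it is much closer to $90^\circ$.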

Consistency of eigenvalues:

$\frac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{\ \mathcal{L}\ } \frac{\chi^2_n}{n}$ as $d \to \infty$

Eigenvalues are inconsistent, but have a known distribution

Consistent when $n \to \infty$ as well

HDLSS Math Stat of PCA
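A quick simulation consistent with this picture (a sketch under my own choices: a single strong spike, Gaussian entries, and the uncentered sample covariance): for fixed $n$ the ratio $\hat{\lambda}_1 / \lambda_1$ stays random as $d$ grows, hovering around 1 with spread about $\sqrt{2/n}$, as a $\chi^2_n / n$ variable would.

```python
import numpy as np

rng = np.random.default_rng(4)

# Single strong spike, n fixed: the top sample eigenvalue is a random
# multiple of the true one, so lambda1_hat / lambda1 does not converge
# as d grows (mean ~ 1, sd ~ sqrt(2/n)).
d, n, alpha, reps = 20_000, 10, 2.0, 200
lam1 = float(d) ** alpha
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)            # spike in the first coordinate
    S = X @ X.T / n                        # n x n dual of the covariance
    ratios[r] = np.linalg.eigvalsh(S)[-1] / lam1
```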

Conditions for Geo Rep'n & PCA Consistency

John Kent example:

$X_d \sim \tfrac{1}{2} N_d(0, 100\, I_d) + \tfrac{1}{2} N_d(0, I_d)$

Can only say $\|X_d\| = 10\, d^{1/2}(1 + o_p(1))$ w.p. $\tfrac{1}{2}$ and $\|X_d\| = d^{1/2}(1 + o_p(1))$ w.p. $\tfrac{1}{2}$, i.e. $O_p(d^{1/2})$, not deterministic

PCA conditions are the same, since the noise is still $O_p(d^{1/2})$

But for the Geo Rep'n, need some mixing condition

HDLSS Math Stat of PCA
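A minimal simulation of a Kent-style two-component mixture $\tfrac{1}{2} N_d(0, 100 I_d) + \tfrac{1}{2} N_d(0, I_d)$, showing that the scaled radius $\|X_d\| / d^{1/2}$ lands near 10 or near 1 at random rather than on a single deterministic value:

```python
import numpy as np

rng = np.random.default_rng(5)

# Kent-style mixture: with prob 1/2 draw from N(0, 100 I_d), else N(0, I_d).
# Each vector has norm O_p(d**0.5), but the scaled radius is random:
# it concentrates near 10 for one component and near 1 for the other.
d, n = 20_000, 200
big_comp = rng.random(n) < 0.5             # mixture component indicator
scales = np.where(big_comp, 10.0, 1.0)
X = rng.standard_normal((n, d)) * scales[:, None]
radii = np.linalg.norm(X, axis=1) / np.sqrt(d)
```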


Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Intuition: Random Noise ~ $d^{1/2}$

For $\alpha > 1$ (Recall $\lambda_1 = d^\alpha$ is on the Scale of Variance):

Spike Pops Out of Pure Noise Sphere

For $\alpha < 1$:

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

Consistency of eigenvalues?

Eigenvalues Inconsistent:

$\frac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{\mathcal{L}} \frac{\chi^2_n}{n}$, as $d \to \infty$

But Known Distribution

Consistent when $n \to \infty$ as Well

HDLSS Math Stat of PCA
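The limit law above can be checked numerically. A minimal numpy sketch (my own illustrative setup, not the talk's code), assuming the spike model with $\lambda_1 = d^\alpha$, $\alpha > 1$, placed along the first coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 5, 2000, 1.5
lam1 = d ** alpha          # spike eigenvalue, on the scale of variance

ratios = []
for _ in range(200):
    # spiked model: one strong direction (first coordinate) plus unit noise
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    # n x n Gram trick: same nonzero eigenvalues as the d x d sample covariance
    gram = X @ X.T / n
    lam1_hat = np.linalg.eigvalsh(gram)[-1]
    ratios.append(lam1_hat / lam1)

ratios = np.array(ratios)
print(ratios.mean(), ratios.var())   # theory: mean ~ 1, variance ~ 2/n = 0.4
```

So the ratio does not converge to 1 for fixed $n$ (inconsistency), but its distribution matches $\chi^2_n / n$.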

Conditions for Geo Rep'n & PCA Consist.

John Kent example:

$X_d \sim \frac{1}{2} N_d(0, 100\, I_d) + \frac{1}{2} N_d(0, I_d)$

Can only say $\|X_d\| = O_p(d^{1/2})$, not deterministic:

$\|X_d\| \approx 10\, d^{1/2}$ w.p. $\frac{1}{2}$, $\approx d^{1/2}$ w.p. $\frac{1}{2}$

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$

But for Geo Rep'n need some Mixing Cond.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
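The Kent example is easy to simulate. A short sketch (my own illustration, assuming the mixture as written above): the scaled norm $\|X_d\| / d^{1/2}$ concentrates near 10 or near 1, each with probability $\frac{1}{2}$, so there is no single deterministic radius and the geometric representation fails:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10000
norms = []
for _ in range(400):
    # Kent-style mixture: variance 100 w.p. 1/2, variance 1 w.p. 1/2
    sigma = 10.0 if rng.random() < 0.5 else 1.0
    x = sigma * rng.standard_normal(d)
    norms.append(np.linalg.norm(x) / np.sqrt(d))
norms = np.array(norms)
# scaled norms are bimodal (near 10 or near 1), never in between
print(np.sort(norms)[[0, -1]])
```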

Idea From Probability Theory

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions that Still Give Law of Large Numbers & Central Limit Theorem

Mixing Conditions:

• A Whole Area in Probability Theory

• Ǝ a Large Literature

• A Comprehensive Reference: Bradley (2005 update of 1986 version)

• Better Newer References exist

Mixing Conditions

Mixing Condition Used Here:

Rho – Mixing

For Random Variables $(X_i)$, Define:

$\rho(k) = \sup |\mathrm{Corr}(f, g)|$

Where the sup is over $f$ and $g$ measurable with respect to the Sigma-Fields Generated by $(\ldots, X_{i-1}, X_i)$ and $(X_{i+k}, X_{i+k+1}, \ldots)$

• Note: Gap of Lag $k$

Assume: $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags

Mixing Conditions
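The "uncorrelated at far lags" idea can be seen in a Gaussian AR(1) series, a standard ρ-mixing example (my own illustration; for jointly Gaussian sequences the maximal correlation in the sup reduces to the ordinary correlation, so the lag-$k$ correlation $\phi^k$ tracks $\rho(k)$):

```python
import numpy as np

rng = np.random.default_rng(2)
# AR(1) series x_t = phi * x_{t-1} + eps_t: lag-k correlation decays like phi**k
phi, N = 0.8, 200000
eps = rng.standard_normal(N)
x = np.zeros(N)
for i in range(1, N):
    x[i] = phi * x[i - 1] + eps[i]

corrs = {k: np.corrcoef(x[:-k], x[k:])[0, 1] for k in (1, 5, 10, 20)}
print(corrs)   # roughly 0.8, 0.8**5, 0.8**10, 0.8**20
```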

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries $X_1, X_2, \ldots, X_d$ of Data Vectors Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Müller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

Note: Not Necessarily Gaussian

Define Standardized Version:

$Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume Ǝ a permutation of the $d$ entries of $Z_d$, So that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency – $\alpha$-spike, $\alpha > 1$

(Reality Check, Suggested by Reviewer)

Condition is Independent of Sample Size, So true for n = 1 (!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice?

HDLSS Math Stat of PCA
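The $n = 1$ case can in fact be simulated. A minimal sketch (my own setup, assuming the spike $\lambda_1 = d^\alpha$ along the first coordinate): with one observation, the "estimated" PC direction is just $x / \|x\|$, and its angle to the spike direction is still small when $\alpha > 1$, while it is near $90^\circ$ when $\alpha < 1$:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 2000

def angle_to_spike(alpha, reps=200):
    # n = 1: the estimated PC direction is just x / ||x||
    lam1 = d ** alpha
    angs = []
    for _ in range(reps):
        x = rng.standard_normal(d)
        x[0] *= np.sqrt(lam1)            # spike along first coordinate (u1 = e1)
        cos = abs(x[0]) / np.linalg.norm(x)
        angs.append(np.degrees(np.arccos(cos)))
    return np.median(angs)

a_hi = angle_to_spike(1.5)   # alpha > 1: angle near 0
a_lo = angle_to_spike(0.5)   # alpha < 1: angle near 90
print(a_hi, a_lo)
```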

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency – $\alpha$-spike, $\alpha < 1$

Consistency – $\alpha$-spike, $\alpha > 1$

Mathematically Driven Conclusion: Real Data Signals Are This Strong!

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ (What we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$:

Can Show $\hat{s}_{ij} / s_{ij} \to R_j$ (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": Same Realization of $R_j$ for $s_{ij}$, $i = 1, \ldots, n$

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
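The proportional-errors point can be seen numerically. A sketch under my own assumptions (spike $\lambda_1 = d^\alpha$ along the first coordinate, so the true first-PC scores are just the first coordinates): the estimated scores agree with the true scores up to one shared multiplier $R$, common across all $i$:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 2000, 10, 1.5
lam1 = d ** alpha

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)                  # spike along e1, so true scores are X[:, 0]

# top sample-PC direction via the n x n Gram trick
gram = X @ X.T
w, V = np.linalg.eigh(gram)
u1_hat = X.T @ V[:, -1]
u1_hat /= np.linalg.norm(u1_hat)

s_true = X[:, 0]                          # s_i1 = P_{v_1} x_i
s_hat = X @ u1_hat                        # estimated scores
s_hat *= np.sign(s_hat @ s_true)          # fix arbitrary eigenvector sign

R = (s_hat @ s_true) / (s_true @ s_true)  # best common multiplier
rel_resid = np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat)
print(R, rel_resid)                       # one shared factor, tiny residual
```

With $\alpha$ this far above 1 the factor $R$ is close to 1; nearer the boundary $R$ stays random, but it is still the same realization for every score, so the scatterplot geometry survives.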

In PCA Consistency:

Strong Inconsistency – $\alpha$-spike, $\alpha < 1$

Consistency – $\alpha$-spike, $\alpha > 1$

What happens at boundary ($\alpha = 1$)? (!)

Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
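The mechanism behind the kernel-becomes-linear phenomenon is concentration of pairwise distances: $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i \cdot x_j$, and in high dimension the exponential in an RBF kernel only sees small fluctuations, so it linearizes. A sketch (my own illustration with standard Gaussian data and bandwidth $\sqrt{d}$, not El Karoui's exact setting) fitting kernel entries by an affine function of inner products and norms:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 5000
X = rng.standard_normal((n, d))

G = X @ X.T
norms = np.diag(G)
sq = norms[:, None] + norms[None, :] - 2 * G
K = np.exp(-sq / (2 * d))            # RBF kernel with bandwidth sqrt(d)

mask = ~np.eye(n, dtype=bool)
# regress off-diagonal kernel entries on an affine function of
# inner products and norms; in high d the fit is nearly exact
A = np.column_stack([np.ones(mask.sum()), G[mask],
                     (norms[:, None] + norms[None, :])[mask]])
coef, *_ = np.linalg.lstsq(A, K[mask], rcond=None)
resid = K[mask] - A @ coef
r2 = 1 - resid.var() / K[mask].var()
print(r2)    # close to 1: the kernel carries essentially linear information
```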

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?


Intuition Random Noise ~ d12

For (Recall on Scale of Variance)

Spike Pops Out of Pure Noise Sphere

For

Spike Contained in Pure Noise Sphere

HDLSS Math Stat of PCA

1

1

Consistency of eigenvalues

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent. So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": ŝ_ij / s_ij → R_j, with the Same Realization of R_j for all i = 1, …, n

Axes have Inconsistent Scales, But Relationships are Still Useful
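A small numerical sketch of "proportional errors" (my own illustration, not from the slides): in the spike model Σ = diag(d^α, 1, …, 1) with α < 1, the empirical PC1 scores are inflated by a common random factor, so they remain nearly proportional (hence still useful) even though their scale is wrong. Parameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 20000, 10, 0.75        # alpha < 1: strong inconsistency regime
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)          # spike along the true PC1 direction e1

S = X @ X.T / n                      # dual n x n matrix, cheap eigendecomposition
V = np.linalg.eigh(S)[1]
u1_hat = X.T @ V[:, -1]              # empirical PC1 direction
u1_hat /= np.linalg.norm(u1_hat)

s_true = X[:, 0]                     # true scores s_i1 (projections on e1)
s_hat = X @ u1_hat                   # empirical scores s-hat_i1

corr = abs(np.corrcoef(s_hat, s_true)[0, 1])
scale = np.linalg.norm(s_hat) / np.linalg.norm(s_true)
print(corr, scale)                   # corr near 1, scale well above 1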

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: spike α < 1

Consistency: spike α > 1

What happens at the boundary (α = 1)?

Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d, So not Clear Embedding Helps. Thus not yet Implemented in DWD.

HDLSS Additional Results

Batch Adjustment (Xuxin Liu):

Recall Intuition from above: Key is sizes of biological subtypes. Differing ratio trips up mean, But DWD more robust.

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Math Stat of PCA

Consistency of eigenvalues?

Eigenvalues Inconsistent, But with Known Distribution: λ̂1 / λ1 → χ²_n / n, as d → ∞ (n fixed)

Consistent when n → ∞ as Well
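The known-distribution claim can be checked by simulation (a minimal sketch with my own illustrative parameters): in the spike model with α > 1 and fixed n, the ratio λ̂1/λ1 behaves like χ²_n/n, which has mean 1:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 2000, 5, 1.5
lam1 = d ** alpha                          # spike eigenvalue lambda_1 = d^alpha

ratios = []
for _ in range(200):
    # n samples from N(0, diag(d^alpha, 1, ..., 1))
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)
    S = X @ X.T / n                        # dual form: same nonzero eigenvalues
    ratios.append(np.linalg.eigvalsh(S)[-1] / lam1)

print(sum(ratios) / len(ratios))           # close to E[chi^2_n / n] = 1
```

The ratio fluctuates like χ²_n/n across replications (inconsistent for fixed n), but its average is near 1, and the fluctuation shrinks as n grows.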

HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example: X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)

Can only say X_d = O_p(d^{1/2}), not deterministic:

||X_d|| / d^{1/2} → 10 w.p. 1/2, 1 w.p. 1/2

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition
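Kent's mixture example is easy to reproduce numerically (my own sketch; dimension and replication counts are illustrative): the scaled norm ||X_d||/d^{1/2} concentrates near 10 or near 1, each with probability 1/2, rather than at a single deterministic value:

```python
import numpy as np

rng = np.random.default_rng(1)
d, reps = 5000, 400
vals = []
for _ in range(reps):
    if rng.random() < 0.5:               # mixture component N(0, 100 I_d)
        x = 10.0 * rng.standard_normal(d)
    else:                                # mixture component N(0, I_d)
        x = rng.standard_normal(d)
    vals.append(np.linalg.norm(x) / np.sqrt(d))

vals = np.array(vals)
near10 = float(np.mean(np.abs(vals - 10) < 0.5))
near1 = float(np.mean(np.abs(vals - 1) < 0.5))
print(near10, near1)                     # each about 0.5
```

So the data lie on one of two spheres, which is why no single geometric representation holds without a further condition.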

Mixing Conditions

Idea From Probability Theory:

Recall Standard Asymptotic Results, as n → ∞:

• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

• Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!), E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005, update of 1986 version)

• Better, Newer References?

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, …, Define

ρ(k) = sup_i sup { |corr(f, g)| : f ∈ L²(σ_i), g ∈ L²(σ'_{i+k}) }

Where σ_i and σ'_{i+k} are the Sigma-Fields Generated by {X_j : j ≤ i} and {X_j : j ≥ i + k}

(Note: Gap of Lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
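For intuition, a Gaussian AR(1) sequence with |φ| < 1 is a standard example of a ρ-mixing process: corr(X_t, X_{t+k}) = φ^k decays geometrically in the lag k. A minimal numerical check (my own sketch, with illustrative φ and series length):

```python
import numpy as np

# AR(1): X_t = phi * X_{t-1} + eps_t, so corr(X_t, X_{t+k}) = phi^k -> 0,
# the "uncorrelated at far lags" behavior that rho-mixing formalizes.
rng = np.random.default_rng(2)
phi, T = 0.7, 100_000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)    # start in the stationary distribution
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

emp = {}
for k in (1, 5, 20):
    emp[k] = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(emp[k], 3), round(phi ** k, 3))   # empirical vs. phi^k
```

The empirical lag-k correlations track φ^k, which is essentially zero by lag 20.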

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):

Assume Entries of the Data Vectors X = (X_1, X_2, …, X_d)^t Are ρ-mixing

Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume ∃ a permutation π_d, So that Z_{π_d} is ρ-mixing
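The standardization step can be sanity-checked numerically. This is a minimal sketch (my own, with an arbitrary positive-definite Σ_d): Z = Λ^{-1/2} U^t X has identity covariance, so any dependence left in Z is exactly what the ρ-mixing assumption constrains:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # arbitrary covariance, Sigma = U Lam U^t
lam, U = np.linalg.eigh(Sigma)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=200_000)
Z = X @ U / np.sqrt(lam)                 # Z = Lam^{-1/2} U^t X, the standardized version
C = np.cov(Z.T)
print(np.round(C, 2))                    # approximately the identity matrix
```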

HDLSS Math Stat of PCA

Careful look at: PCA Consistency, spike α > 1 (Reality Check Suggested by Reviewer)

The Condition α > 1 is Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

Yet: HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Functional Data Analysis: Manually Brushed Clusters show Clear Alternate Splicing, Not Noise

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Consistency of eigenvalues

Eigenvalues Inconsistent

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

nn

dL

d

2

11

HDLSS Math Stat of PCA

Consistency of eigenvalues

Eigenvalues Inconsistent

But Known Distribution

Consistent when as Well

nn

dL

d

2

11

n

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Consistency of eigenvalues:

Eigenvalues Inconsistent, But Known Distribution:

$\hat{\lambda}_1 / \lambda_1 \xrightarrow{\ \mathcal{L}\ } \chi^2_n / n$ as $d \to \infty$

Consistent when $n \to \infty$ as Well

HDLSS Math Stat of PCA
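The eigenvalue behavior above can be checked numerically. A minimal sketch (my own illustration, not from the deck; the parameters `n`, `d`, `alpha` are arbitrary choices) of the fixed-$n$, growing-$d$ spike model, where the top sample eigenvalue over the true spike behaves like a random $\chi^2_n / n$ draw rather than converging to 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 20, 20000, 1.5          # HDLSS regime: d >> n, spike lambda_1 = d^alpha
lam1 = d ** alpha

# n observations from N(0, diag(lam1, 1, ..., 1)), built without forming a d x d Sigma
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)

# Top eigenvalue of the sample covariance, via the n x n dual (Gram) matrix
lam1_hat = np.linalg.eigvalsh(X @ X.T / n).max()

ratio = lam1_hat / lam1               # behaves like a chi^2_n / n draw, not -> 1
print(ratio)
```

Averaging `ratio` over many replications gives a value near 1, reflecting the "consistent when $n \to \infty$ as well" point, since $\chi^2_n / n \to 1$.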

Conditions for Geo Rep'n & PCA Consist:

John Kent example:

$X_d \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\,I_d)$

Can only say $\|X_d\| = O_p(d^{1/2})$, not deterministic:

$\|X_d\| / d^{1/2} \to 1$ w.p. $\tfrac{1}{2}$, and $\to 10$ w.p. $\tfrac{1}{2}$

PCA Conditions Same, since Noise Still $O_p(d^{1/2})$

But for Geo Rep'n need some Mixing Cond'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA
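The Kent example can be made concrete with a small simulation (my sketch; the mixture constants follow the reconstruction above, and `d`, `n` are illustrative): the scaled radius $\|X_d\| / \sqrt{d}$ concentrates near two different values, so it is stochastically bounded but not deterministic.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20000, 100

# Scale mixture: w.p. 1/2 a row is N_d(0, I_d), w.p. 1/2 it is N_d(0, 100 I_d)
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # standard deviation 1 or 10
X = rng.standard_normal((n, d)) * scales[:, None]

r = np.linalg.norm(X, axis=1) / np.sqrt(d)          # scaled radii
print(r.min(), r.max())                             # two clusters, near 1 and near 10
```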

Idea From Probability Theory:

Recall Standard Asymptotic Results, as $n \to \infty$:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored!),

E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get

Law of Large Numbers & Central Limit Theorem

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference: Bradley (2005 update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here:

Rho-Mixing

For Random Variables $X_1, X_2, \dots$, Define

$\rho(k) = \sup \left| \mathrm{corr}(f, g) \right|$

Where the sup is over $j$ and over $f \in L^2(\mathcal{F}_1^{j})$, $g \in L^2(\mathcal{F}_{j+k}^{\infty})$,

For the Sigma-Fields $\mathcal{F}_a^{b}$ Generated by $X_a, \dots, X_b$ (Note: Gap of Lag $k$)

Assume $\rho(k) \to 0$ as $k \to \infty$

Idea: Uncorrelated at Far Lags

Mixing Conditions
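The "uncorrelated at far lags" idea can be illustrated with a Gaussian AR(1) sequence (my sketch, not from the deck; `phi` and `n` are arbitrary): its lag-$k$ correlations decay geometrically, which is the behavior a ρ-mixing assumption captures.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.6, 100000

# Gaussian AR(1): X_t = phi * X_{t-1} + e_t, with corr(X_t, X_{t+k}) = phi^k,
# so dependence dies off geometrically at far lags
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

corrs = [np.corrcoef(x[:-k], x[k:])[0, 1] for k in (1, 5, 10)]
print(corrs)   # roughly phi, phi**5, phi**10
```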

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors $X = (X_1, X_2, \dots, X_d)^t$ Are $\rho$-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):

$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$

(Note: Not Gaussian)

Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume ∃ a permutation, So that the entries of $Z_d$ are ρ-mixing

HDLSS Math Stat of PCA
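The standardization $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ is ordinary sphering; a quick numerical check (my sketch, with an arbitrary small `d` and covariance) that it leaves the entries uncorrelated with unit variance:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 100000

# A covariance with eigendecomposition Sigma = U Lambda U^t
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
lam, U = np.linalg.eigh(Sigma)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Standardized version Z = Lambda^{-1/2} U^t X (applied row-wise here)
Z = (X @ U) / np.sqrt(lam)
print(np.round(np.cov(Z.T), 2))   # close to the d x d identity
```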

Careful look at:

PCA Consistency, $\alpha > 1$ spike

(Reality Check, Suggested by Reviewer)

Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency, $\alpha < 1$ spike

Consistency, $\alpha > 1$ spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$ spike): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$

For Strong Inconsistency ($\alpha < 1$ spike): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$

HDLSS Math Stat of PCA
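Both angle limits show up clearly in simulation. A rough sketch (mine; the function name `pc1_angle` and the parameters are illustrative) of the $d^\alpha$ spike model, comparing a spike above and below the consistency boundary:

```python
import numpy as np

rng = np.random.default_rng(4)

def pc1_angle(alpha, d, n=25):
    """Angle (degrees) between sample and true first PC in the d^alpha spike model."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)            # spike: lambda_1 = d^alpha along e_1
    _, V = np.linalg.eigh(X @ X.T / n)     # dual (n x n) eigenproblem
    u1 = X.T @ V[:, -1]                    # back-map top eigenvector to d-space
    u1 /= np.linalg.norm(u1)
    return np.degrees(np.arccos(min(1.0, abs(u1[0]))))

print(pc1_angle(1.5, 20000))   # alpha > 1: small angle (consistency)
print(pc1_angle(0.5, 20000))   # alpha < 1: large angle (strong inconsistency)
```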

An Interesting Objection:

Should not Study Angles in PCA, Because PC Scores (i.e. projections) Not Consistent

For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ (What we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$,

Can Show: $\hat{s}_{ij} / s_{ij} \to R_j$ (Random)

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent:

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors":

Same Realization of $R_j$ for $i = 1, \dots, n$, so $\hat{s}_{ij} \approx R_j\, s_{ij}$

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
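The "proportional errors" point can be illustrated numerically (my sketch; parameters and the factor name `R` are illustrative): in a spike simulation, the empirical PC1 scores are very nearly a common multiple of the true-direction scores, so the scatterplot geometry survives even where the scale is off.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, alpha = 25, 20000, 1.2
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)              # spike lambda_1 = d^alpha along e_1

_, V = np.linalg.eigh(X @ X.T / n)       # dual (n x n) PCA
u1 = X.T @ V[:, -1]
u1 /= np.linalg.norm(u1)

s_hat = X @ u1                           # sample PC1 scores
s_true = X[:, 0]                         # scores on the true direction
R = (s_hat @ s_true) / (s_true @ s_true) # shared proportionality factor
rel_resid = np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat)
print(R, rel_resid)                      # rel_resid is small
```

(The sign of `R` is arbitrary, since a sample eigenvector is only defined up to sign.)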

In PCA Consistency:

Strong Inconsistency, $\alpha < 1$ spike

Consistency, $\alpha > 1$ spike

What happens at boundary ($\alpha = 1$)?

∃ interesting Limit Distn's: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall: Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Implications for DWD: Recall, Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment (Xuxin Liu):

Recall Intuition from above:

Key is sizes of biological subtypes;

Differing ratio trips up mean, But DWD more robust

Mathematics behind this:



  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Rep'n & PCA Consist

John Kent example: X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say: ||X_d|| = d^{1/2} O_p(1) w.p. 1/2, and = 10 d^{1/2} O_p(1) w.p. 1/2

not deterministic

PCA Conditions Same, since Noise Still O_p(d^{1/2})

But for Geo Rep'n, need some Mixing Cond'n

HDLSS Math Stat of PCA
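This two-radius behavior is easy to see numerically. A minimal simulation sketch of the Kent mixture (the dimension and number of draws below are arbitrary choices for illustration):

```python
import numpy as np

# John Kent example (as on the slide): X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d).
# Then ||X_d|| is roughly d^(1/2) or 10 d^(1/2), each w.p. 1/2:
# O_p(d^(1/2)) holds, but the radius is random, not deterministic.

rng = np.random.default_rng(0)
d, n = 10_000, 200

scales = rng.choice([1.0, 10.0], size=n)           # which mixture component
X = scales[:, None] * rng.standard_normal((n, d))  # each row is one draw of X_d

radii = np.linalg.norm(X, axis=1) / np.sqrt(d)     # ||X_d|| / d^(1/2)

near_one = int(np.sum(np.abs(radii - 1.0) < 0.5))
near_ten = int(np.sum(np.abs(radii - 10.0) < 2.0))
print(near_one, near_ten)  # every draw lands near radius 1 or radius 10
```

Each draw concentrates tightly at one of the two radii, so no single deterministic radius describes the data: exactly the failure of the geometric representation.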

Conditions for Geo Rep'n

Conclude: Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Recall Standard Asymptotic Results, as n → ∞:

Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)

Central Limit Theorem

Both have Technical Assumptions (Usually Ignored), e.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions to Still Get Law of Large Numbers & Central Limit Theorem

Mixing Conditions

Mixing Conditions:

• A Whole Area in Probability Theory, with a Large Literature

• A Comprehensive Reference: Bradley (2005 update of 1986 version)

• Better, Newer References also available

Mixing Conditions

Mixing Condition Used Here: Rho-Mixing

For Random Variables X_1, X_2, ..., Define

ρ(t) = sup_i sup { |Corr(f, g)| : f ∈ L²(σ(X_1, ..., X_i)), g ∈ L²(σ(X_{i+t}, X_{i+t+1}, ...)) }

Where σ(·) is the Sigma-Field Generated by the indicated Random Variables (Note Gap of Lag t)

Assume: ρ(t) → 0 as t → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
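The "uncorrelated at far lags" idea can be illustrated with the simplest dependent sequence. A sketch using a Gaussian AR(1) (the coefficient phi = 0.7 is an arbitrary illustration; for jointly Gaussian pairs the maximal correlation in the definition above reduces to the ordinary correlation, so the lag correlation phi**t tells the whole story here):

```python
import numpy as np

# Gaussian AR(1): X_{i+1} = phi * X_i + e_i, stationary with variance 1.
# Its lag-t correlation is phi**t -> 0, illustrating rho(t) -> 0.

rng = np.random.default_rng(1)
phi, n = 0.7, 50_000

x = np.empty(n)
x[0] = rng.standard_normal()
eps = np.sqrt(1 - phi**2) * rng.standard_normal(n)  # keeps marginal variance 1
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]

def lag_corr(t):
    # empirical correlation between the series and its lag-t shift
    return float(np.corrcoef(x[:-t], x[t:])[0, 1])

corrs = [lag_corr(t) for t in (1, 5, 20)]
print(corrs)  # decays toward 0, tracking phi**t
```

The empirical correlations track 0.7, 0.7^5 ≈ 0.17, and ≈ 0 at lag 20, so far-apart blocks of the sequence are nearly uncorrelated.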

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, X_2, ..., X_d)^t Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA
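The geometric representation these conditions deliver can be checked in the trivially ρ-mixing case of i.i.d. entries: vectors lie near a sphere of radius d^{1/2}, and pairwise distances concentrate at (2d)^{1/2}. A quick sketch (sizes are arbitrary illustrations):

```python
import numpy as np

# Geometric representation in the simplest (i.i.d., hence rho-mixing) case:
# radii ~ d^(1/2) and pairwise distances ~ (2d)^(1/2), both nearly constant.

rng = np.random.default_rng(2)
d, n = 100_000, 10
X = rng.standard_normal((n, d))

radii = np.linalg.norm(X, axis=1) / np.sqrt(d)                # should be ~ 1
i, j = np.triu_indices(n, k=1)
dists = np.linalg.norm(X[i] - X[j], axis=1) / np.sqrt(2 * d)  # should be ~ 1

print(float(np.abs(radii - 1).max()), float(np.abs(dists - 1).max()))
```

Both maxima are tiny: the sample looks like a rigid simplex of points on a sphere, which is the geometric representation.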

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. for Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not assumed Gaussian)

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation π_d So that the entries of Z_{π_d} are ρ-mixing

HDLSS Math Stat of PCA
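The standardization step can be sketched directly: whitening by Λ_d^{-1/2} U_d^t produces identity covariance, and it is on these decorrelated entries that the mixing assumption is placed (the particular Σ_d below is an arbitrary illustration):

```python
import numpy as np

# Jung & Marron (2009) standardization: with Cov(X_d) = U_d Lambda_d U_d^t,
# Z_d = Lambda_d^{-1/2} U_d^t X_d has identity covariance.

rng = np.random.default_rng(3)
d, n = 40, 50_000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)     # an arbitrary positive definite covariance
lam, U = np.linalg.eigh(Sigma)      # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = (X @ U) / np.sqrt(lam)          # each row: Z = Lambda^{-1/2} U^t X

C = np.cov(Z, rowvar=False)
max_dev = float(np.abs(C - np.eye(d)).max())
print(max_dev)  # near 0: Z empirically has identity covariance
```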

Careful look at: PCA Consistency, α > 1 spike

(Reality Check Suggested by Reviewer)

Condition α > 1 is Independent of Sample Size, So true even for n = 1 (!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters show Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1 spike): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1 spike): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
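The angle dichotomy is visible in a small simulation of a single d^α spike (the sizes d, n and the exponents below are arbitrary illustrations of the two regimes):

```python
import numpy as np

# Single-spike model: u_1 = e_1 with eigenvalue d^alpha, all others 1.
# With n fixed and d large, Angle(u1_hat, u_1) is small for alpha > 1
# and near 90 degrees for alpha < 1.

rng = np.random.default_rng(4)

def pc1_angle_deg(d, alpha, n=20):
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)               # spike variance d^alpha on e_1
    X = rng.standard_normal((n, d)) * sd     # rows ~ N(0, diag(d^alpha, 1, ..., 1))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(float(Vt[0, 0])), 1.0)     # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

angle_cons = pc1_angle_deg(d=20_000, alpha=1.5)   # consistent regime: small angle
angle_incons = pc1_angle_deg(d=20_000, alpha=0.2) # strong inconsistency: near 90
print(round(angle_cons, 1), round(angle_incons, 1))
```

For α = 1.5 the estimated direction hugs u_1; for α = 0.2 the noise swamps the spike and the sample direction is nearly orthogonal to it.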

An Interesting Objection: Should not Study Angles in PCA, Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{i,j} = P_{û_j} x_i (What we study in PCA scatterplots) and s_{i,j} = P_{u_j} x_i

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} ≈ R_j s_{i,j}, with the Same Realization of R_j for i = 1, ..., n

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
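The "inconsistent scale, useful relationships" picture can be sketched in the spiked model: sample PC1 scores track the true scores closely but live on an inflated axis (the particular d, n, α below are arbitrary illustrations):

```python
import numpy as np

# Spiked model: true PC direction u_1 = e_1 with variance d^alpha.
# Sample scores s_hat correlate strongly with true scores s (relationships
# preserved), but on an inflated scale (axis not consistent).

rng = np.random.default_rng(5)
d, n, alpha = 20_000, 20, 0.8

sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])        # fix sign to agree with e_1

s = X[:, 0]                               # true scores  s_{i,1} = <x_i, u_1>
s_hat = X @ u1_hat                        # sample scores <x_i, u1_hat>

corr = float(np.corrcoef(s, s_hat)[0, 1])
# RMS inflation factor; >= 1 by construction, since u1_hat maximizes
# the mean squared projection over all unit directions.
scale = float(np.sqrt(np.mean(s_hat**2) / np.mean(s**2)))
print(round(corr, 2), round(scale, 2))
```

The scores are strongly correlated (scatterplot structure survives) even though their scale is inflated by a common factor, matching the proportional-errors picture.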

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)? Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem / Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
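A sketch of the phenomenon: in high dimension the pairwise distances entering an RBF kernel concentrate, so the kernel matrix is entrywise close to an affine function of inner products and norms, i.e. of the information a linear classifier uses (the bandwidth and sizes below are arbitrary illustrations):

```python
import numpy as np

# With ||x_i|| ~ 1 in high dimension, squared distances concentrate near 2,
# so exp(-||x_i - x_j||^2 / 2) linearizes around that value: the RBF kernel
# matrix is approximately affine in inner products G_ij and squared norms.

rng = np.random.default_rng(6)
d, n = 5_000, 60
X = rng.standard_normal((n, d)) / np.sqrt(d)   # scaled so ||x_i|| ~ 1

G = X @ X.T                                    # inner products
sq = np.diag(G)                                # squared norms
D2 = sq[:, None] + sq[None, :] - 2 * G         # squared distances, ~ 2 off-diag
K = np.exp(-D2 / 2.0)                          # RBF kernel, unit bandwidth

# First-order expansion around D2 = 2, off the diagonal:
# K_ij ~ e^{-1} * (1 + G_ij - (sq_i - 1)/2 - (sq_j - 1)/2)
i, j = np.triu_indices(n, k=1)
lin = np.exp(-1.0) * (1.0 + G[i, j] - (sq[i] - 1) / 2 - (sq[j] - 1) / 2)
rel_err = float(np.max(np.abs(K[i, j] - lin)) / np.exp(-1.0))
print(rel_err)  # small: kernel entries ~ affine in linear statistics
```

Since an affine distortion of the Gram matrix carries no information beyond what a linear method sees, kernel-embedded classifiers behave like linear classifiers in this regime.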

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this




Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

PCA Conditions Same since Noise Still

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

21dOp

Conditions for Geo Reprsquon amp PCA Consist

John Kent example

Can only say

not deterministic

But for Geo Reprsquon need some Mixing Cond

HDLSS Math Stat of PCA

dddddd ININX 10002

10

2

1~

21212121

21

10

)(

pwd

pwddOX p

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here:

Rho-Mixing

For Random Variables X_1, X_2, ..., Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_{-∞}^j), g ∈ L²(F_{j+k}^∞) }

Where F_{-∞}^j, F_{j+k}^∞ are the Sigma-Fields Generated by {X_i : i ≤ j} and {X_i : i ≥ j+k}

• Note: Gap of Lag k

Assume: ρ(k) → 0, as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, X_2, ..., X_d)^t Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused!)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the d entries of Z_d, So that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA

Careful look at PCA Consistency: α > 1 spike

(Reality Check Suggested by Reviewer)

Condition is Independent of Sample Size, So true for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice?

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters Show Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA
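The two spike regimes can be seen numerically in a quick simulation. This is our sketch: the spiked model with first eigenvalue d^α and unit noise eigenvalues follows the slides' setup, but the dimension and sample-size choices are assumptions. The angle between the sample and true first PC direction is small for α > 1 and large for α < 1:

```python
import numpy as np

# Sketch of the spike dichotomy (parameters are our assumptions):
# covariance has eigenvalue d**alpha on the first axis, 1 elsewhere.
rng = np.random.default_rng(1)

def pc1_angle_deg(d, n, alpha):
    """Angle (degrees) between sample PC1 and the true PC1 = e_1."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)              # spike eigenvalue d**alpha
    X = rng.standard_normal((n, d)) * sd    # rows: n observations in R^d
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)           # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

angle_consistent = pc1_angle_deg(d=2000, n=20, alpha=1.5)    # alpha > 1
angle_inconsistent = pc1_angle_deg(d=2000, n=20, alpha=0.5)  # alpha < 1
print(angle_consistent < angle_inconsistent)
```

With α = 1.5 the spike dominates the accumulated noise variance and the angle is a few degrees; with α = 0.5 the noise variance d swamps the spike d^(1/2) and the angle drifts toward 90° as d grows.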

An Interesting Objection: Should not Study Angles in PCA

Recall for Consistency (α > 1 spike): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1 spike): Angle(û_1, u_1) → 90°

Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_{j,i} = P_{û_j} x_i (What we study in PCA scatterplots) and s_{j,i} = P_{u_j} x_i

Can Show: ŝ_{j,i} ≈ R_j s_{j,i} (R_j Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": ŝ_{j,i} ≈ R_j s_{j,i}, with the Same Realization of R_j for all i = 1, ..., n

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
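The proportional-errors idea can be checked in the same sketch model (d, n, and α are our assumed parameters, and the shared factor R below is estimated by least squares, purely for illustration): sample PC1 scores track the true scores up to one shared random multiple, so their correlation stays near 1 even when the absolute scale is off.

```python
import numpy as np

# Sketch (model parameters are our assumptions): spiked covariance,
# compare sample PC1 scores s_hat[i] with true PC1 scores s_true[i].
rng = np.random.default_rng(2)

d, n, alpha = 2000, 25, 1.2
sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)                  # spike on first coordinate
X = rng.standard_normal((n, d)) * sd
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])          # align sign with true u1 = e_1

s_true = Xc[:, 0]                           # true scores: projection on e_1
s_hat = Xc @ u1_hat                         # sample scores: projection on u1_hat

R = float((s_true @ s_hat) / (s_true @ s_true))  # one shared factor, fit by LS
corr = float(np.corrcoef(s_true, s_hat)[0, 1])
print(corr > 0.9)                           # relative structure preserved
```

The near-perfect correlation is the point: each axis of a PC scatterplot may be rescaled by an inconsistent random factor, but relationships between points survive.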

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem & Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps, Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this?



Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Reprsquon

Conclude Need some Mixing Condition

HDLSS Math Stat of PCA

Idea From Probability Theory

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,

• Kernel embedded classifiers ~ linear classifiers.

Implications for DWD: recall that its main advantage is for high d. So it is not clear that embedding helps; thus not yet implemented in DWD.

HDLSS Asymptotics & Kernel Methods
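The kernel-becomes-linear phenomenon can be sketched numerically. For unit-scaled high-dimensional vectors, the Gaussian kernel entries exp(−‖x_i − x_j‖² / (2d)) = exp(−1 + x_iᵀx_j / d) vary over a tiny range, so the kernel Gram matrix is essentially an affine function of the linear Gram matrix. A minimal sketch (the dimensions, bandwidth² = d, and sphere normalization are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 2000

# Data normalized to the sphere of radius sqrt(d), the natural HDLSS scale
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

G = X @ X.T                                   # linear kernel Gram matrix
sq = np.sum(X ** 2, axis=1)                   # squared norms (all equal to d here)
D2 = sq[:, None] + sq[None, :] - 2 * G        # pairwise squared distances
K = np.exp(-D2 / (2 * d))                     # Gaussian kernel Gram matrix

off = ~np.eye(n, dtype=bool)                  # compare off-diagonal entries
corr = np.corrcoef(K[off], G[off])[0, 1]
print(corr)                                   # nearly 1: kernel ~ affine in linear Gram
```

Since classifiers built from the two Gram matrices see essentially the same pairwise information, the kernel machine behaves like a linear one in this regime.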

HDLSS Additional Results

Batch adjustment (Xuxin Liu): recall the intuition from above. The key is the sizes of the biological subtypes: a differing ratio trips up the mean, but DWD is more robust. Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Mixing Conditions

Idea from probability theory: recall standard asymptotic results as n → ∞:

• Law of Large Numbers ("weak" = in probability, "strong" = a.s.)

• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.

Mixing conditions: explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.
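The Law of Large Numbers and Central Limit Theorem above can be checked by simulation. A minimal sketch (the Exponential(1) choice, sample sizes, and seed are illustrative): sample means settle at the true mean, and standardized means look standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Law of Large Numbers: sample mean of Exp(1) draws approaches mu = 1
xbar = rng.exponential(1.0, size=100_000).mean()

# Central Limit Theorem: standardized means of n = 500 draws (mu = sigma = 1)
n, reps = 500, 2000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) * np.sqrt(n)        # should be approximately N(0, 1)

print(xbar, z.mean(), z.std())
```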

Mixing Conditions:

• A whole area in probability theory, with a large literature

• A comprehensive reference: Bradley (2005 update of 1986 version)

• Better, newer references

Mixing condition used here: ρ-mixing.

For random variables X_1, X_2, …, define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(σ(X_1, …, X_j)), g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)) },

where σ(·) denotes the sigma-field generated by the indicated variables. Note the gap of lag k between the two blocks.

Assume ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.

Mixing Conditions
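A concrete example of "uncorrelated at far lags": a stationary AR(1) process is ρ-mixing, with lag-k correlation decaying geometrically. A minimal sketch (the AR coefficient 0.8, series length, and seed are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.8, 200_000

# AR(1): x_t = phi * x_{t-1} + eps_t  (rho-mixing; corr at lag k is about phi**k)
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

c1, c5, c20 = lag_corr(x, 1), lag_corr(x, 5), lag_corr(x, 20)
print(c1, c5, c20)    # decaying toward 0: uncorrelated at far lags
```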

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005): assume the entries of the data vectors X = (X_1, X_2, …, X_d)^t are ρ-mixing.

Drawback: strong assumption. (In JRSS-B, since Biometrika refused.)

Series of technical improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t. Note: not necessarily Gaussian.

Define the standardized version Z_d = Λ_d^{-1/2} U_d^t X_d.

Assume ∃ a permutation of the entries so that Z_d is ρ-mixing as d → ∞.

HDLSS Math Stat of PCA
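The standardization can be sketched in code: with Σ_d = U_d Λ_d U_d^t, the vector Z_d = Λ_d^{-1/2} U_d^t X_d has identity covariance, Gaussian or not. A minimal sketch (the dimension, eigenvalues, and seed are illustrative; uniform draws are used to emphasize that Gaussianity is not assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200_000

# Build Sigma = U Lam U^t from a random orthogonal U and chosen eigenvalues
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.array([25.0, 16.0, 9.0, 4.0, 1.0])

# Non-Gaussian data with covariance Sigma: X = U Lam^{1/2} W, W uniform with variance 1
W = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(d, n))
X = U @ (np.sqrt(lam)[:, None] * W)

# Standardized version Z = Lam^{-1/2} U^t X: empirical covariance is near identity
Z = (1 / np.sqrt(lam))[:, None] * (U.T @ X)
print(np.round(np.cov(Z), 2))
```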

Careful look at PCA consistency (d^α spike, α > 1); reality check suggested by a reviewer:

The condition α > 1 is independent of the sample size, so consistency holds even for n = 1 (!?).

Reviewer's conclusion: absurd; this shows the assumption is too strong for practice.

HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data from 8/23/12: d ~ 1700, n = 180.

Manually brushed clusters show clear alternate splicing; not noise.

Functional Data Analysis

Recall the theoretical separation:

• Strong inconsistency: d^α spike, α < 1

• Consistency: d^α spike, α > 1

Mathematically driven conclusion: real data signals are this strong.

HDLSS Math Stat of PCA

An Interesting Objection: should not study angles in PCA.

Recall that for consistency (d^α spike, α > 1), Angle(û_1, u_1) → 0, while for strong inconsistency (α < 1), Angle(û_1, u_1) → 90°.

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Idea From Probability Theory

Recall Standard Asymptotic Results as

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

Eg Independent and Ident Distrsquod

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

("Weak" = in prob., "Strong" = a.s.)

Mixing Conditions

Idea From Probability Theory

Recall Standard Asymptotic Results as

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

(Usually Ignore )

Mixing Conditions

Idea From Probability Theory

Law of Large Numbers

Central Limit Theorem

Both have Technical Assumptions

E.g. Independent and Ident. Dist'd

Mixing Conditions

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where F_1^j and F_{j+k}^∞ denote Sigma-Fields

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where:

For Sigma-Fields Generated by: • F_1^j = σ(X_1, …, X_j)

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where:

For Sigma-Fields Generated by: • F_1^j = σ(X_1, …, X_j) • F_{j+k}^∞ = σ(X_{j+k}, X_{j+k+1}, …)

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where:

For Sigma-Fields Generated by: • F_1^j = σ(X_1, …, X_j) • F_{j+k}^∞ = σ(X_{j+k}, X_{j+k+1}, …) • Note: Gap of Lag k

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where F_1^j = σ(X_1, …, X_j) and F_{j+k}^∞ = σ(X_{j+k}, X_{j+k+1}, …)

Assume: ρ(k) → 0 as k → ∞

Mixing Conditions

Mixing Condition Used Here

Rho – Mixing

For Random Variables X_1, X_2, … Define:

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

Where F_1^j = σ(X_1, …, X_j) and F_{j+k}^∞ = σ(X_{j+k}, X_{j+k+1}, …)

Assume: ρ(k) → 0 as k → ∞

Idea Uncorrelated at Far Lags

Mixing Conditions
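The build above boils the assumption down to "uncorrelated at far lags." The ρ-mixing coefficient itself takes a supremum over all square-integrable functions of past and future, so it is not directly computable from data; as a rough stand-in (my own sketch, not from the slides), the plain lag-k autocorrelation of an AR(1) sequence shows the kind of decay the condition requires:

```python
import random

# Sketch only: the true rho-mixing coefficient sups over all L^2
# functions; here we track the ordinary lag-k autocorrelation of an
# AR(1) process X_j = phi * X_{j-1} + e_j, which decays like phi**k.

def ar1(n, phi, seed=0):
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

def lag_corr(xs, k):
    # Sample correlation between the series and its lag-k shift.
    n = len(xs) - k
    ma = sum(xs[:n]) / n
    mb = sum(xs[k:]) / n
    cov = sum((xs[i] - ma) * (xs[i + k] - mb) for i in range(n)) / n
    va = sum((v - ma) ** 2 for v in xs[:n]) / n
    vb = sum((v - mb) ** 2 for v in xs[k:]) / n
    return cov / (va * vb) ** 0.5

xs = ar1(200_000, phi=0.8)
r1, r20 = lag_corr(xs, 1), lag_corr(xs, 20)
# Nearby entries are strongly correlated; far lags are nearly uncorrelated.
```

Here `phi`, the sample size, and the lags are all illustrative choices.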

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors

X = (X_1, X_2, …, X_d)^t (entries X_j)

Are ρ-mixing

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Series of Technical Improvements

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010); Yata & Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Define Standardized Version:

Z_d = Λ_d^{-1/2} U_d^t X_d

HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Define Z_d = Λ_d^{-1/2} U_d^t X_d

Assume ∃ a permutation of the d entries,

So that the permuted Z_d is ρ-mixing

HDLSS Math. Stat. of PCA

Careful look at:

PCA Consistency (α > 1 spike)

(Reality Check, Suggested by Reviewer)

HDLSS Math. Stat. of PCA

Careful look at:

PCA Consistency (α > 1 spike)

Independent of Sample Size,

So true for n = 1 (!)

HDLSS Math. Stat. of PCA

Careful look at:

PCA Consistency (α > 1 spike)

Independent of Sample Size,

So true for n = 1 (!)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math. Stat. of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math. Stat. of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math. Stat. of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

HDLSS Math. Stat. of PCA

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math. Stat. of PCA
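The consistency / strong-inconsistency split above can be watched directly in a small simulation (my own sketch, assuming the usual single-spike model Σ_d = diag(d^α, 1, …, 1) with true PC direction u_1 = e_1):

```python
import numpy as np

# For alpha > 1 the sample PC direction lines up with u1 = e1
# (consistency); for alpha < 1 it drifts toward perpendicular
# (strong inconsistency).

def pc_angle(alpha, d=2000, n=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, n))          # noise, variance 1 per entry
    X[0, :] *= np.sqrt(d ** alpha)           # inject the d^alpha spike on e1
    u_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    cos = min(abs(u_hat[0]), 1.0)            # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

angle_hi = pc_angle(alpha=1.5)   # alpha > 1: angle near 0 degrees
angle_lo = pc_angle(alpha=0.3)   # alpha < 1: angle near 90 degrees
```

The dimensions, sample size, and the two α values are illustrative choices; at finite d the contrast is already stark.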

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency:

Angle(û_1, u_1) → 0

HDLSS Math. Stat. of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall for Consistency:

Angle(û_1, u_1) → 0

For Strong Inconsistency:

Angle(û_1, u_1) → 90°

HDLSS Math. Stat. of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores: ŝ_{i,j} = P_{û_j} x_i

(What we study in PCA scatterplots)

HDLSS Math. Stat. of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i

HDLSS Math. Stat. of PCA

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

(Thanks to Dan Shen)

HDLSS Math. Stat. of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math. Stat. of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j

HDLSS Math. Stat. of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j

Same Realization of R_j for i = 1, …, n

HDLSS Math. Stat. of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math. Stat. of PCA
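The "proportional errors" point can be checked numerically (my own sketch, again assuming a single-spike model with u_1 = e_1): even though the estimated scores are not consistent in scale, they stay nearly proportional to the true scores across the sample, so scatterplot relationships survive.

```python
import numpy as np

# Estimated PC 1 scores s_hat_i = <u1_hat, x_i> versus true scores
# s_i = <e1, x_i>: near-perfect linear relationship across i.
rng = np.random.default_rng(1)
d, n, alpha = 2000, 20, 1.5
X = rng.standard_normal((d, n))
X[0, :] *= np.sqrt(d ** alpha)               # d^alpha spike along u1 = e1

u_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
s_hat = u_hat @ X                            # estimated PC 1 scores
s_true = X[0, :]                             # true scores: <e1, x_i>

r = np.corrcoef(s_hat, s_true)[0, 1]         # |r| near 1: proportional
# (sign of u_hat is arbitrary, so r may be near -1 instead of +1)
```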

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

∃ interesting Limit Dist'ns:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods


Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

• In Random Matrix Limit

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ≈ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
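El Karoui's point can be seen in a small experiment (my own sketch of the random-matrix-limit effect, with illustrative dimensions and a bandwidth on the scale of d; vectors are normalized to length √d so squared distances reduce to 2d − 2⟨x_i, x_j⟩): the off-diagonal entries of a Gaussian-kernel matrix become nearly an affine function of plain inner products, i.e. the kernel behaves linearly.

```python
import numpy as np

# High-dimensional data, each row scaled to norm sqrt(d).
rng = np.random.default_rng(2)
d, n = 5000, 40
X = rng.standard_normal((n, d))
X = X * (np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True))

G = X @ X.T                         # Gram matrix of inner products
sq = 2.0 * d - 2.0 * G              # ||x_i - x_j||^2 on the sphere of radius sqrt(d)
K = np.exp(-sq / (2.0 * d))         # RBF kernel with bandwidth^2 = d

iu = np.triu_indices(n, k=1)        # off-diagonal pairs only
r = np.corrcoef(K[iu], G[iu])[0, 1] # near 1: kernel ~ affine in <x_i, x_j>
```

Since ⟨x_i, x_j⟩/d concentrates near 0 for large d, exp(−1 + ⟨x_i, x_j⟩/d) is essentially linear over the range that occurs, which is the heart of the "kernel ≈ linear" phenomenon.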

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics & Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this
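The "differing ratio trips up mean" intuition can be made concrete with toy numbers (my own illustration, not from the slides): two batches contain the same two biological subtypes at the same locations, and only the mixing ratio changes, yet mean-centering each batch pulls the shared subtype apart.

```python
# Subtype A sits at 0 and subtype B at 10 in both batches; only the
# subtype proportions differ between the batches.
batch1 = [0.0] * 90 + [10.0] * 10     # 90% subtype A, 10% subtype B
batch2 = [0.0] * 50 + [10.0] * 50     # 50% subtype A, 50% subtype B

m1 = sum(batch1) / len(batch1)        # batch 1 mean = 1.0
m2 = sum(batch2) / len(batch2)        # batch 2 mean = 5.0

adj1 = [x - m1 for x in batch1]       # mean-center each batch
adj2 = [x - m2 for x in batch2]

# Subtype A was identical across batches, but after mean adjustment its
# two copies land 4 units apart: an artifact of the differing ratios,
# and the kind of error a more robust direction (e.g. DWD) avoids.
gap_A = abs(adj1[0] - adj2[0])        # |(-1.0) - (-5.0)| = 4.0
```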


~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics: Simple Paradoxes
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Discrim'n Simulations
  • 2nd Paper on HDLSS Asymptotics
  • 3rd Paper on HDLSS Asymptotics
  • 0 Covariance is not independence
  • HDLSS Math Stat of PCA
  • Mixing Conditions
  • Functional Data Analysis
  • HDLSS Deep Open Problem
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Additional Results

Mixing Conditions

Idea From Probability Theory:
• Law of Large Numbers
• Central Limit Theorem
Both have Technical Assumptions
(E.g. Independent and Identically Distributed)

Idea of Mixing Conditions: Explore Weaker Assumptions that Still Give
• Law of Large Numbers
• Central Limit Theorem

Mixing Conditions:
• A Whole Area in Probability Theory, with a Large Literature
• A Comprehensive Reference: Bradley (2005 update of 1986 version)
• Better, Newer References also exist

Mixing Condition Used Here: Rho-Mixing

For random variables X₁, X₂, …, define

    ρ(k) = sup_j ρ( σ(X₁, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )

where, for sigma-fields A and B generated by the observations before and after a gap of lag k,

    ρ(A, B) = sup { |Corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Assume:  ρ(k) → 0  as  k → ∞

Idea: Uncorrelated at Far Lags
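As a rough numerical illustration of the "uncorrelated at far lags" idea (only a proxy: the true ρ-mixing coefficient takes a supremum over all L² functions of the past and future sigma-fields, not just lagged correlations of the series itself), an AR(1) sequence has lag-k correlation decaying geometrically:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi):
    """Generate a stationary AR(1) series x_t = phi * x_{t-1} + noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

x = ar1(100_000, phi=0.5)
r1 = lag_corr(x, 1)    # near phi = 0.5
r10 = lag_corr(x, 10)  # near phi**10, i.e. essentially 0
print(r1, r10)
```

The correlation at lag 10 is already negligible, matching the "gap of lag" picture above.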

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors X = (X₁, X₂, …, X_d)ᵗ Are ρ-mixing

Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. for Microarrays

Condition from Jung & Marron (2009):

    X_d ~ (0_d, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ

(Note: Not Gaussian)

Define the Standardized Version

    Z_d = Λ_d^(-1/2) U_dᵗ X_d

Assume ∃ a permutation so that the entries of Z_d are ρ-mixing
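As a concrete sketch of this standardization (assuming only NumPy; the covariance below is an arbitrary illustrative stand-in for Σ_d): rotating by the eigenvectors and rescaling by root eigenvalues yields uncorrelated, unit-variance entries.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 200_000

# Build a covariance Sigma = U diag(lam) U^t with a known spectrum.
A = rng.standard_normal((d, d))
Sigma = A @ A.T
lam, U = np.linalg.eigh(Sigma)

# Draw mean-zero data X (d x n) and form Z = Lambda^{-1/2} U^t X.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T
Z = np.diag(lam ** -0.5) @ U.T @ X

# Entries of Z are uncorrelated with unit variance (up to sampling error).
C = np.cov(Z)
print(np.round(C, 2))
```

Whether ρ-mixing then holds is a statement about the dependence structure of these standardized entries, not something this sketch checks.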

Careful Look at PCA Consistency (single spike, α > 1)
(Reality Check Suggested by Reviewer)

The condition α > 1 is Independent of Sample Size,
So consistency holds even for n = 1 (!?!)

Reviewer's Conclusion: Absurd; shows assumption too strong for practice

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:
• Strong Inconsistency: spike α < 1
• Consistency: spike α > 1

Mathematically Driven Conclusion: Real Data Signals Are This Strong
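This separation shows up in a small simulation of the single-spike model Σ = diag(d^α, 1, …, 1) (a sketch under that assumed model; the function and parameter names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle(d, alpha, n=20):
    """Angle (degrees) between sample PC1 and the true spike direction e_1."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)            # spike eigenvalue d^alpha
    X = rng.standard_normal((n, d)) * sd
    Xc = X - X.mean(axis=0)
    # Leading right singular vector of centered data = sample PC1.
    u1 = np.linalg.svd(Xc, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(1.0, abs(u1[0]))))

a_consistent = pc1_angle(2000, alpha=1.5)    # small angle (consistency)
a_inconsistent = pc1_angle(2000, alpha=0.5)  # large angle, toward 90 as d grows
print(a_consistent, a_inconsistent)
```

With α above 1 the sample PC1 locks onto the spike even at n = 20; with α below 1 it drifts toward orthogonality as d increases.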

An Interesting Objection: Should Not Study Angles in PCA

Recall, for Consistency:

    Angle(û₁, u₁) → 0

For Strong Inconsistency:

    Angle(û₁, u₁) → 90°

An Interesting Objection: Should Not Study Angles in PCA,
Because PC Scores (i.e. projections) Are Not Consistent

For the Scores (what we study in PCA scatterplots)

    ŝ_{i,j} = P_{v̂_j} x_i   and   s_{i,j} = P_{v_j} x_i

Can Show:  ŝ_{i,j} ≈ R_j s_{i,j},  with R_j Random
(Thanks to Dan Shen)

PC Scores (i.e. projections) Not Consistent,
So How Can PCA Find Useful Signals in Data?

Key is "Proportional Errors":  ŝ_{i,j} ≈ R_j s_{i,j},
with the Same Realization of R_j for All i

Axes Have Inconsistent Scales,
But Relationships Are Still Useful
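A toy sketch of the "proportional errors" point (all numbers hypothetical): multiplying each score axis by one shared factor R_j changes the scales but leaves the configuration of points, and hence any visible structure, untouched.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
true_scores = rng.standard_normal((n, 2))   # idealized PC1/PC2 scores

R = np.array([1.7, 0.4])                    # hypothetical per-axis factors R_j
empirical = true_scores * R                 # same realization for every i

# Relationships survive the rescaling: per-axis correlation is exactly 1.
rs = [np.corrcoef(true_scores[:, j], empirical[:, j])[0, 1] for j in range(2)]
print(rs)
```

So a scatterplot of the empirical scores is just an axis-rescaled copy of the ideal one, which is why clusters and trends remain visible despite inconsistency.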

HDLSS Deep Open Problem

In PCA Consistency:
• Strong Inconsistency: spike α < 1
• Consistency: spike α > 1
What Happens at the Boundary (α = 1)?

Result: ∃ Interesting Limit Dist'ns, Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
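A numerical sketch of that phenomenon (an illustrative setup, not El Karoui's exact framework): for high-dimensional data on a sphere, ||x_i − x_j||²/d = 2 − 2 x_iᵗx_j/d concentrates, so each Gaussian kernel entry is nearly an affine function of the inner product, i.e. the kernel matrix behaves like a shifted, scaled linear kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 20_000
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # ||x_i||^2 = d

G = X @ X.T / d                 # scaled linear kernel, diag(G) = 1
K = np.exp(-(2 - 2 * G) / 2)    # Gaussian kernel with bandwidth sigma^2 = 1

iu = np.triu_indices(n, 1)      # compare off-diagonal entries only
r = np.corrcoef(K[iu], G[iu])[0, 1]
print(round(r, 4))
```

The off-diagonal Gaussian-kernel entries are almost perfectly linearly related to the linear-kernel entries, so the embedding adds little beyond a linear method here.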

Implications for DWD:
Recall Main Advantage Is for High d,
So Not Clear Embedding Helps,
Thus Not Yet Implemented in DWD

HDLSS Additional Results

Batch Adjustment (Xuxin Liu)
Recall Intuition from Above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this:
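The intuition can be made concrete with a one-dimensional toy example (entirely hypothetical numbers; DWD itself is not implemented here, the sketch only shows why naive mean adjustment is tripped up when subtype proportions differ across batches):

```python
import numpy as np

def make_batch(n_a, n_b):
    """One 1-d batch: subtype A centered at 0, subtype B centered at 5."""
    return np.concatenate([np.zeros(n_a), np.full(n_b, 5.0)]), n_a

batch1, n1a = make_batch(80, 20)   # 80% subtype A
batch2, n2a = make_batch(20, 80)   # 20% subtype A

# Naive per-batch mean centering: the batch mean absorbs the subtype mixture.
c1 = batch1 - batch1.mean()
c2 = batch2 - batch2.mean()

# Subtype A now sits at different locations in the two batches.
print(c1[:n1a].mean(), c2[:n2a].mean())
```

The same subtype lands at -1.0 in one batch and -4.0 in the other, so mean adjustment has created a spurious shift; a direction-based adjustment such as DWD is less sensitive to the mixture proportions.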

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Idea From Probability Theory

Mixing Conditions

Explore Weaker Assumptions to Still Get

Law of Large Numbers

Central Limit Theorem

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Conditions

• A Whole Area in Probability Theory

• A Large Literature

• A Comprehensive Reference:

Bradley (2005, update of 1986 version)

• Better, Newer References

Mixing Conditions

Mixing Condition Used Here:

Rho-Mixing

For Random Variables X_1, X_2, ..., Define

ρ(k) = sup |Corr(f, g)|

Where f and g range over square-integrable functions measurable w.r.t. the

Sigma-Fields Generated by X_1, ..., X_j  •  and by X_{j+k}, X_{j+k+1}, ...  •  Note Gap of Lag k

Assume ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
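A minimal simulation sketch of the "uncorrelated at far lags" idea, using an AR(1) process, a standard example of a ρ-mixing sequence (the process and its parameters here are illustrative assumptions, not part of the slides):

```python
import numpy as np

def lagged_corr(x, lag):
    """Sample correlation between the series and its lag-shifted copy."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

rng = np.random.default_rng(0)

# AR(1) process X_t = phi * X_{t-1} + noise: dependence decays
# geometrically with the lag, so far-apart entries are nearly uncorrelated.
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Correlation at lag k is roughly phi**k: large at short lags, near 0 far out.
print(lagged_corr(x, 1), lagged_corr(x, 20))
```

Mixing conditions like ρ-mixing make this intuition precise for arbitrary functions of the past and the future, not just the raw lagged correlations shown here.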

Mixing Conditions

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume Entries X_j of Data Vectors

X = (X_1, X_2, ..., X_d)

Are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions

Require Notion of Time Ordering,

Not Always Clear, e.g., Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version:

Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the entries of Z_d,

So that Z_d is ρ-mixing

HDLSS Math Stat of PCA
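The standardization Z_d = Λ_d^{-1/2} U_d^t X_d can be sketched numerically. A minimal example with a small synthetic covariance (for simplicity only, the draws here are Gaussian, which the condition itself does not require):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: draw X ~ (0, Sigma) for a known covariance, then form
# the standardized version Z = Lambda^{-1/2} U^T X from the
# eigendecomposition Sigma = U Lambda U^T.
d, n = 5, 100_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # a positive-definite covariance
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # d x n data
Z = np.diag(lam ** -0.5) @ U.T @ X                          # standardize

# Entries of Z are uncorrelated with unit variance, so the sample
# covariance of Z is close to the identity.
print(np.cov(Z).round(2))
```

The mixing assumption is then placed on the entries of Z_d (after some permutation), rather than on the raw entries of X_d, which sidesteps the time-ordering issue for the original coordinates.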

Careful look at:

PCA Consistency, α > 1 spike

(Reality Check Suggested by Reviewer)

Condition is Independent of Sample Size,

So true for n = 1 (!?)

Reviewer's Conclusion: Absurd, shows

assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12:

d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters:

Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1 spike):

Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1 spike):

Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
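A simulation sketch of the consistency / strong inconsistency dichotomy, under the assumed single-spike model Σ = diag(d^α, 1, ..., 1) with n fixed (the dimensions, sample size, and α values here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle_deg(d, alpha, n=20):
    """Angle (degrees) between sample PC1 and the true spike direction e_1,
    under the illustrative single-spike model Sigma = diag(d**alpha, 1, ..., 1)."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)               # spike std dev: eigenvalue d**alpha
    X = rng.standard_normal((n, d)) * sd     # rows are samples from N(0, Sigma)
    Xc = X - X.mean(axis=0)                  # center
    u1_hat = np.linalg.svd(Xc, full_matrices=False)[2][0]  # sample PC1 direction
    cos = min(abs(u1_hat[0]), 1.0)           # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

# alpha > 1: consistency, angle near 0 degrees;
# alpha < 1: strong inconsistency, angle drifting toward 90 as d grows.
print(pc1_angle_deg(50_000, 1.5), pc1_angle_deg(50_000, 0.5))
```

Even at moderate d the two regimes are visibly different, which is the numerical face of the theoretical separation recalled above.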

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_{j,i} = P_{û_j} x_i and s_{j,i} = P_{u_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{j,i} / s_{j,i} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

Same Realization of R_j for i = 1, ..., n,

So Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
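The "proportional errors" point can be sketched directly: if estimated scores equal true scores times one common random factor per component, scatterplot relationships survive intact. The score vectors and factors R_j below are synthetic illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)

# If each estimated score vector equals the true one times a single random
# factor R_j (the SAME realization for every observation i), then a
# scatterplot of estimated scores is an axis-rescaled copy of the truth.
n = 500
s1, s2 = rng.standard_normal(n), rng.standard_normal(n)   # true PC scores
R1, R2 = rng.uniform(0.2, 5.0, size=2)                    # common random factors
s1_hat, s2_hat = R1 * s1, R2 * s2                         # "inconsistent" scores

# Scales differ, but relationships survive: correlation with the truth is
# exact, and the correlation *between* components is unchanged.
print(np.corrcoef(s1_hat, s1)[0, 1],
      np.corrcoef(s1_hat, s2_hat)[0, 1] - np.corrcoef(s1, s2)[0, 1])
```

This is why cluster structure and relative positions in PCA scatterplots remain interpretable even though the individual score values do not converge.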

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall Main Advantage is for High d,

So not Clear Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall Intuition from above:

Key is sizes of biological subtypes;

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this?

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

Mixing Conditions

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Conditions

bull A Whole Area in Probability Theory

bull a Large Literature

bull A Comprehensive Reference

Bradley (2005 update of 1986 version)

bull Better Newer References

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA


  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Mixing Conditions

Mixing Condition Used Here: ρ-Mixing

For random variables X_1, X_2, …, define

ρ(k) = sup_j ρ( σ(X_1, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )

where, for sigma-fields A and B (generated by the indicated variables),

ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) }

Note the gap of lag k between the two blocks of variables.

Assume: ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags
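The "uncorrelated at far lags" idea can be seen in a small simulation. This is a minimal sketch, not the full ρ-mixing coefficient (which takes a sup over all L² functions of past and future sigma-fields); it only checks plain lag-k correlations of an AR(1) sequence, a standard example of a ρ-mixing process. The parameter choices (phi, n, the lags) are illustrative.

```python
import numpy as np

# Illustration of the "uncorrelated at far lags" idea behind rho-mixing.
# We only look at plain lag-k correlations, a much weaker check than the
# sup over all L^2 functions of the past/future sigma-fields.
rng = np.random.default_rng(0)

# AR(1): X_t = phi * X_{t-1} + e_t, which is rho-mixing with rho(k) ~ phi^k
phi, n = 0.8, 200_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Sample correlation between X_t and X_{t+k}."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

print(lag_corr(x, 1))    # close to phi = 0.8
print(lag_corr(x, 50))   # phi^50 ~ 1.4e-5, essentially zero
```

For this process the decorrelation is geometric in the lag, so the mixing condition ρ(k) → 0 holds comfortably.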

HDLSS Math Stat of PCA - Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume the entries of the data vectors

X = (X_1, X_2, …, X_d)^t

are ρ-mixing (across the index j = 1, …, d).

Drawback: Strong Assumption

(In JRSS-B, since Biometrika refused)
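As a numerical reality check of the geometric representation these conditions are after (data near the surface of a sphere of radius ~√d, pairs ~√(2d) apart and nearly perpendicular), here is a minimal sketch using i.i.d. N(0,1) entries, the simplest case covered by the ρ-mixing assumption; the dimension and sample size are illustrative.

```python
import numpy as np

# Numerical check of the HDLSS geometric representation for i.i.d. N(0,1)
# entries: each vector has norm ~ sqrt(d); pairs are ~ sqrt(2d) apart and
# ~ perpendicular.
rng = np.random.default_rng(1)
d, n = 100_000, 5
X = rng.standard_normal((n, d))

norms = np.linalg.norm(X, axis=1) / np.sqrt(d)         # each ~ 1
dist01 = np.linalg.norm(X[0] - X[1]) / np.sqrt(2 * d)  # ~ 1
cos01 = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))  # ~ 0

print(norms)    # all close to 1
print(dist01)   # close to 1
print(cos01)    # close to 0
```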

HDLSS Math Stat of PCA - Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)

Tricky Point: classical mixing conditions require a notion of time ordering,

which is not always clear, e.g. for microarrays.

HDLSS Math Stat of PCA - Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),   where   Σ_d = U_d Λ_d U_d^t

(Note: not necessarily Gaussian)

Define the standardized version

Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the d entries so that the entries of Z_d are ρ-mixing.
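A minimal sketch of this standardization, under assumed ingredients (a random orthogonal U_d, a single α = 1.5 spike in Λ_d, and random-sign entries for Z_d to emphasize that Gaussianity is not required): build X_d = U_d Λ_d^{1/2} Z_d, then recover Z_d by sphering.

```python
import numpy as np

# Sketch of the Jung & Marron standardization Z_d = Lambda^{-1/2} U^t X_d.
# Build X_d = U Lambda^{1/2} Z_d with known (U, Lambda) and non-Gaussian,
# uncorrelated entries of Z_d (random signs), then recover Z_d by sphering.
rng = np.random.default_rng(2)
d = 200

# Random orthogonal U (QR of a Gaussian matrix) and a spiked Lambda
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.concatenate([[d ** 1.5], np.ones(d - 1)])   # alpha = 1.5 spike

Z = rng.choice([-1.0, 1.0], size=d)      # mean 0, variance 1, not Gaussian
X = U @ (np.sqrt(lam) * Z)               # one vector with Cov = U diag(lam) U^t

Z_rec = (U.T @ X) / np.sqrt(lam)         # the standardized version
print(np.max(np.abs(Z_rec - Z)))         # ~ 0 (exact up to rounding)
```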

HDLSS Math Stat of PCA - Careful Look at PCA Consistency (α > 1 spike)

(Reality check suggested by a reviewer)

The condition is independent of sample size, so it holds even for n = 1 (!?)

Reviewer's conclusion: absurd; shows the assumption is too strong for practice.
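The n = 1 point can be checked directly: with one observation x from the α-spike model Σ = diag(d^α, 1, …, 1), the only available "sample PC direction" is x/||x||, and its angle to the true first eigenvector u_1 = e_1 is already small when α > 1, while it is near 90° when α < 1. A minimal sketch; d, the values of α, and the seed are illustrative choices, not from the source.

```python
import numpy as np

# n = 1 reality check for the alpha-spike model Sigma = diag(d^alpha, 1, ..., 1).
# With a single observation x, the only available "PC direction" is x/||x||;
# compute its angle to the true first eigenvector u1 = e1.
rng = np.random.default_rng(3)
d = 100_000

def angle_to_u1(alpha, z):
    x = z.copy()
    x[0] *= d ** (alpha / 2)          # spike the first coordinate
    cos = abs(x[0]) / np.linalg.norm(x)
    return np.degrees(np.arccos(cos))

z = rng.standard_normal(d)
print(angle_to_u1(1.5, z))   # alpha > 1: aligns with u1 (well below the next case)
print(angle_to_u1(0.5, z))   # alpha < 1: near 90 degrees (strong inconsistency)
```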

HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data from 8/23/12:  d ≈ 1700,  n = 180

Functional Data Analysis

Manually brushed clusters show clear alternate splicing, i.e. real structure, not noise.

HDLSS Math Stat of PCA

Recall the theoretical separation:

Strong Inconsistency:  α < 1 spike
Consistency:           α > 1 spike

Mathematically driven conclusion: real data signals are this strong.

HDLSS Math Stat of PCA - An Interesting Objection

"Should not study angles in PCA."

Recall, for consistency (α > 1 spike):

Angle(û_1, u_1) → 0

and for strong inconsistency (α < 1 spike):

Angle(û_1, u_1) → 90°

The objection: PC scores (i.e. projections) are not consistent.

For the scores

ŝ_{i,j} = P_{v̂_j} x_i   (what we study in PCA scatterplots)

and

s_{i,j} = P_{v_j} x_i

one can show

ŝ_{i,j} / s_{i,j} → R_j   (a random variable)

Thanks to Dan Shen.

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent.

So how can PCA find useful signals in data?

Key is "Proportional Errors":

ŝ_{i,j} / s_{i,j} → R_j,  with the same realization of R_j for all observations i.

Axes have inconsistent scales, but relationships are still useful.
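The "same realization" point can be illustrated in a small single-spike simulation (a sketch under assumed parameters d, n, α, not the exact setting of Shen's result): the empirical-to-true score ratios come out nearly equal across observations i, so the scatterplot shape survives even though the scale is off.

```python
import numpy as np

# "Proportional errors" sketch: in a single-spike model, the score ratios
# s_hat_{i,1} / s_{i,1} are nearly one common number for every observation i,
# so PCA scatterplots keep their shape even though scales are inconsistent.
rng = np.random.default_rng(4)
d, n, alpha = 2000, 20, 1.5
lam = d ** alpha                              # spike size

u1 = np.zeros(d); u1[0] = 1.0                 # true first eigenvector
z = rng.standard_normal(n)
X = np.sqrt(lam) * np.outer(z, u1) + rng.standard_normal((n, d))

# empirical first PC direction via SVD of the data matrix
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0]

s_hat = X @ v1_hat                            # empirical scores
s_true = X @ u1                               # true scores
ratios = s_hat / s_true
print(ratios.round(3))                        # roughly one common value for all i
```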

HDLSS Deep Open Problem

In PCA consistency:

Strong Inconsistency:  α < 1 spike
Consistency:           α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting limit distn's: Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall the flexibility gained from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,

• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d,

so it is not clear that embedding helps; thus not yet implemented in DWD.
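A one-picture sketch of the mechanism behind El Karoui's result (with an illustrative bandwidth σ² = d and standard Gaussian data, both my choices): in high dimension the pairwise distances concentrate, so the Gaussian kernel matrix is nearly constant off the diagonal, and only a small, essentially linear, term distinguishes pairs of points. That is why kernel-embedded classifiers behave like linear ones in this limit.

```python
import numpy as np

# Distance concentration behind "kernel ~ linear" in the random matrix limit:
# for high-dimensional Gaussian data, ||x_i - x_j||^2 ~ 2d for all pairs, so
# the Gaussian kernel matrix is nearly constant off the diagonal.
rng = np.random.default_rng(5)
d, n = 5000, 50
X = rng.standard_normal((n, d))

nrm = (X ** 2).sum(1)
sq = nrm[:, None] + nrm[None, :] - 2 * X @ X.T   # pairwise squared distances
K = np.exp(-sq / (2 * d))                        # bandwidth^2 = d (illustrative)

off = K[~np.eye(n, dtype=bool)]                  # off-diagonal kernel entries
print(off.mean())                 # ~ exp(-1)
print(off.std() / off.mean())     # small: kernel nearly constant off-diagonal
```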

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the intuition from above: the key is the sizes of the biological subtypes.

A differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this:

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at PCA Consistency, α > 1 spike

(Reality Check Suggested by Reviewer)

The Condition is Independent of Sample Size,

So it is true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows the assumption is too strong for practice

HDLSS Math. Stat. of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math. Stat. of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math. Stat. of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math. Stat. of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1 spike): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1 spike): Angle(û_1, u_1) → 90°

HDLSS Math. Stat. of PCA
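The α > 1 / α < 1 dichotomy can be seen directly in simulation; a sketch (the dimension, sample size, and spike exponents are illustrative choices, with the spike placed along u_1 = e_1):

```python
import numpy as np

# Simulated check of the spike dichotomy: covariance with lambda_1 = d^alpha
# along u_1 = e_1 and all other eigenvalues 1. For fixed n and large d,
# Angle(u1_hat, u_1) is small when alpha > 1, and near 90 deg when alpha < 1.
rng = np.random.default_rng(3)

def pc1_angle(d, alpha, n=5):
    scales = np.ones(d)
    scales[0] = np.sqrt(float(d) ** alpha)     # sd of the spiked coordinate
    x = rng.standard_normal((n, d)) * scales
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cos = min(abs(vt[0, 0]), 1.0)              # |<u1_hat, e_1>|
    return np.degrees(np.arccos(cos))

print(pc1_angle(40_000, 1.5) < 15)   # True: consistent regime (alpha > 1)
print(pc1_angle(40_000, 0.4) > 70)   # True: strongly inconsistent regime
```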

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) are Not Consistent

For Scores ŝ_{i,j} = P_{û_j} x_i (what we study in PCA scatterplots)

and s_{i,j} = P_{u_j} x_i,

Can Show: ŝ_{i,j} = R_j s_{i,j}, with R_j Random

(Thanks to Dan Shen)

HDLSS Math. Stat. of PCA

PC Scores (i.e. projections) are Not Consistent.

So how can PCA find Useful Signals in Data?

Recall: HDLSS PCA Often Finds Signal, Not Pure Noise

Key is "Proportional Errors": ŝ_{i,j} = R_j s_{i,j},

with the Same Realization of R_j for all observations i

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math. Stat. of PCA
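A sketch of the proportional-errors phenomenon (spike strength and sizes are illustrative): the sample scores track the true scores up to a common scale, which is why PCA scatterplots remain interpretable even when the scores themselves are not consistent.

```python
import numpy as np

# Sketch of "proportional errors": in a spiked model the sample PC1 scores
# are close to R * s_i for one random scale R shared across all observations,
# so relationships between points survive even though the axis scale does not.
rng = np.random.default_rng(4)
d, n = 20_000, 50
scales = np.ones(d)
scales[0] = float(d) ** 0.75              # strong spike along u_1 = e_1
x = rng.standard_normal((n, d)) * scales

_, _, vt = np.linalg.svd(x, full_matrices=False)
s_hat = x @ vt[0]                         # sample PC1 scores
s_true = x[:, 0]                          # true PC1 scores: projection on e_1

r = np.corrcoef(s_hat, s_true)[0, 1]
print(abs(r) > 0.99)   # True: scores agree up to a common scale factor
```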

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Ǝ interesting Limit Dist'ns: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From the Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In the Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall its Main Advantage is for High d,

So it is not Clear that Embedding Helps,

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
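A numerical sketch of the El Karoui-style phenomenon (the bandwidth and sizes here are illustrative choices): for high-dimensional standardized data, pairwise distances concentrate, so a Gaussian kernel matrix is essentially an affine function of the linear kernel, and classifiers built from it behave like linear ones.

```python
import numpy as np

# Why kernel methods look linear in very high dimension: with x_i ~ N(0, I_d),
# ||x_i - x_j||^2 / d concentrates near 2, so exp(-||x_i - x_j||^2 / (2d))
# linearizes around that constant and becomes ~ affine in <x_i, x_j> / d.
rng = np.random.default_rng(5)
n, d = 40, 20_000
x = rng.standard_normal((n, d))

lin = x @ x.T / d                            # linear kernel (entries O(1/sqrt(d)))
sq = np.add.outer(np.diag(lin), np.diag(lin)) - 2 * lin   # ||x_i - x_j||^2 / d
rbf = np.exp(-sq / 2)                        # Gaussian kernel, bandwidth^2 = d

approx = np.exp(-1) * (1 + lin)              # first-order (linear) approximation
off = ~np.eye(n, dtype=bool)
err = np.max(np.abs(rbf - approx)[off])
print(err < 0.05)   # True: off-diagonal kernel entries ~ linear in <x_i, x_j>
```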

HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall Intuition from above: Key is the sizes of the biological subtypes:

A differing ratio trips up the mean, But DWD is more robust

Mathematics behind this?

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

For Sigma-Fields Generated bybull bull bull Note Gap of Lag

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Mixing Conditions

Mixing Condition Used Here

Rho ndash Mixing

For Random Variables Define

Where

Assume

Idea Uncorrelated at Far Lags

Mixing Conditions

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Assume Entries of Data Vectors

Are -mixing

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Hall Marron and Neeman (2005)

Drawback Strong Assumption

(In JRSS-B since

Biometrika Refused)

HDLSS Math Stat of PCA

d

j

X

X

X

X

2

1

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear eg Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Note Not Gaussian

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Mixing Condition Used Here: Rho – Mixing

For random variables X_1, X_2, …, define the maximal correlation at lag k (the formula on the slide was lost; this is the standard ρ-mixing coefficient):

ρ(k) = sup_i sup { |corr(f, g)| : f ∈ L²(X_1, …, X_i), g ∈ L²(X_{i+k}, X_{i+k+1}, …) }

Assume ρ(k) → 0 as k → ∞

Idea: Uncorrelated at Far Lags

Mixing Conditions
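The "uncorrelated at far lags" idea can be seen numerically. A minimal sketch (my own illustration, assuming Python with numpy) using an AR(1) sequence, a textbook example of a ρ-mixing process whose correlations vanish geometrically with lag:

```python
import numpy as np

# A stationary AR(1) process is a classical rho-mixing example:
# its correlations decay geometrically, so the sequence is (nearly)
# uncorrelated at far lags -- the intuition behind the condition.
rng = np.random.default_rng(0)
phi, n = 0.5, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(series, k):
    """Empirical correlation between the series and its k-lagged copy."""
    return np.corrcoef(series[:-k], series[k:])[0, 1]

print(lag_corr(x, 1))    # close to phi = 0.5
print(lag_corr(x, 20))   # close to 0: "uncorrelated at far lags"
```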

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): Assume the entries X_1, X_2, …, X_d of the data vectors X = (X_1, …, X_d)^t are ρ-mixing

Drawback: Strong Assumption

(In JRSS-B, since Biometrika refused)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Series of Technical Improvements

bull Ahn Marron Muller amp Chi (2007)

bull Aoshima (2010) Yata amp Aoshima (2012)

(Fully Covariance Based

No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Reprsquon

Tricky Point Classical Mixing Conditions

Require Notion of Time Ordering

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define the Standardized Version Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation of the entries so that Z_d is ρ-mixing

HDLSS Math Stat of PCA
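What the standardization does can be checked directly. A minimal sketch (my own illustration in Python/numpy; Gaussian draws are used only for convenience here, the condition itself is not Gaussian): whatever the covariance U_d Λ_d U_d^t of X_d, the transformed Z_d = Λ_d^{-1/2} U_d^t X_d has identity covariance, so a mixing assumption can be placed on its entries.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 100_000
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal U_d
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])          # eigenvalues (Lambda_d)

# Draw X with covariance U diag(lam) U^t
X = U @ (np.sqrt(lam)[:, None] * rng.standard_normal((d, n)))

# Standardized version: Z = Lambda^{-1/2} U^t X
Z = (1 / np.sqrt(lam))[:, None] * (U.T @ X)

print(np.allclose(np.cov(Z), np.eye(d), atol=0.05))   # True
```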

Careful look at PCA Consistency: α > 1 spike

(Reality Check Suggested by Reviewer)

The condition is independent of sample size, so true even for n = 1 (!)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1 spike): Angle(û_1, u_1) → 0

For Strong Inconsistency (α < 1 spike): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
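A quick numerical check of the two regimes (my own sketch in Python/numpy; the spiked model with λ_1 = d^α along u_1 = e_1 is an assumed toy setup):

```python
import numpy as np

# Toy spiked model: one large eigenvalue lambda_1 = d**alpha along u1 = e1,
# remaining eigenvalues 1. Compare the angle between the sample PC1 and u1
# for a spike above (alpha > 1) and below (alpha < 1) the boundary.
rng = np.random.default_rng(2)

def pc1_angle_deg(d, n, alpha):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)         # inject the spike along e1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)          # |<sample PC1, e1>|
    return np.degrees(np.arccos(cos))

angle_hi = pc1_angle_deg(d=2000, n=20, alpha=1.5)   # consistency regime
angle_lo = pc1_angle_deg(d=2000, n=20, alpha=0.2)   # strong inconsistency
print(angle_hi, angle_lo)   # small angle vs. angle near 90 degrees
```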

An Interesting Objection: Should not Study Angles in PCA, Because PC Scores (i.e. projections) Not Consistent

For Scores ŝ_ij = P_{v̂_j} x_i (what we study in PCA scatterplots) and s_ij = P_{v_j} x_i

Can Show: ŝ_ij = R_j s_ij (R_j Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent. So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_ij = R_j s_ij, with the Same Realization of R_j for all cases i

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
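Why inconsistent scores can still give useful scatterplots: a hedged sketch (my own toy model, Python/numpy, not the talk's exact setup) of the proportional-errors phenomenon. The empirical scores are off in scale, but by a roughly common factor, so the relationship between empirical and true scores survives:

```python
import numpy as np

# Toy spiked model with true PC direction v1 = e1 and lambda_1 = d.
rng = np.random.default_rng(3)
d, n = 10_000, 10
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d)                        # spike along e1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] * np.sign(Vt[0, 0])           # sample PC1, sign-aligned
s_hat = X @ v1_hat                           # empirical PC1 scores
s_true = X[:, 0]                             # true PC1 scores (onto e1)

corr = np.corrcoef(s_hat, s_true)[0, 1]
print(corr)   # near 1: scores wrong in scale, right in relationships
```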

HDLSS Deep Open Problem

In PCA Consistency:

• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer, El Karoui (2010):

• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d, So not Clear Embedding Helps; Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
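A numerical illustration of the kernel-to-linear collapse (my own sketch in Python/numpy; the scaling and unit bandwidth are assumptions for illustration): in very high dimension, pairwise distances concentrate, so an RBF kernel Gram matrix is nearly an affine function of the plain linear Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 20_000, 40
X = rng.standard_normal((n, d)) / np.sqrt(d)    # scaled so ||x_i|| ~ 1
G = X @ X.T                                     # linear kernel Gram matrix
n2 = np.diag(G)
sq = n2[:, None] + n2[None, :] - 2 * G          # squared pairwise distances
K_rbf = np.exp(-sq)                             # RBF kernel, unit bandwidth

# Distances concentrate near 2, so a first-order expansion gives an
# affine-in-G approximation of the RBF Gram matrix off the diagonal:
approx = np.exp(-2.0) * (1 + 2 * G)
off = ~np.eye(n, dtype=bool)
max_err = np.max(np.abs(K_rbf[off] - approx[off]))
print(max_err)   # small: RBF Gram ~ affine in the linear Gram
```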

HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall Intuition from above: Key is sizes of biological subtypes; Differing ratio trips up the mean, But DWD is more robust

Mathematics behind this

  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005)

Drawback: Strong Assumption (entries of X = (X_1, X_2, …, X_d)^t are ρ-mixing)

(In JRSS-B, since Biometrika Refused)

HDLSS Math Stat of PCA
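The "geometric representation" these conditions deliver is easy to check by simulation. A minimal sketch (standard Gaussian data for simplicity; numpy assumed, not code from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
for d in (100, 10000, 1000000):
    X = rng.standard_normal((3, d))                      # three iid N(0, I_d) points
    norms = np.linalg.norm(X, axis=1) / np.sqrt(d)       # radius / sqrt(d)
    dist = np.linalg.norm(X[0] - X[1]) / np.sqrt(2 * d)  # distance / sqrt(2 d)
    print(d, np.round(norms, 3), round(dist, 3))
# Both ratios tend to 1 as d grows: points sit near a sphere of radius
# sqrt(d), mutually ~ sqrt(2 d) apart -- the rigid geometric representation
```

As d grows, the printed ratios settle near 1, i.e. the data lie essentially on a fixed simplex on the sphere.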

Conditions for Geo Rep'n

Series of Technical Improvements:

• Ahn, Marron, Muller & Chi (2007)

• Aoshima (2010), Yata & Aoshima (2012)

(Fully Covariance Based, No Mixing)

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,

Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t

Note: Not Gaussian

Define Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d

Assume Ǝ a permutation π_d, so that the entries of Z_d are ρ-mixing

HDLSS Math Stat of PCA
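The standardized version Z_d = Λ_d^{-1/2} U_d^t X_d can be sketched numerically. A minimal illustration (the orthogonal U_d and the eigenvalue sequence below are made-up assumptions for the demo; the draw is Gaussian only for convenience, which the condition does not require):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50

# Covariance Sigma_d = U_d Lambda_d U_d^t (assumed spectrum, for illustration)
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal U_d
lam = 1.0 / np.arange(1, d + 1)                    # eigenvalues Lambda_d

# One draw X_d ~ (0, Sigma_d)
X = U @ (np.sqrt(lam) * rng.standard_normal(d))

# Standardized version  Z_d = Lambda_d^{-1/2} U_d^t X_d
Z = (U.T @ X) / np.sqrt(lam)
print(Z.shape)   # entries of Z_d have mean 0 and variance 1 by construction
```

The mixing assumption is then placed on the entries of Z_d (after some permutation), not on the raw coordinates of X_d.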

Careful look at PCA Consistency (spike λ_1 = d^α, α > 1)

(Reality Check Suggested by Reviewer)

Independent of Sample Size, so true for n = 1 (!?)

Reviewer's Conclusion: Absurd, shows assumption too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: spike d^α, α < 1

Consistency: spike d^α, α > 1

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (spike α > 1): Angle(û_1, u_1) → 0

For Strong Inconsistency (spike α < 1): Angle(û_1, u_1) → 90°

HDLSS Math Stat of PCA
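Both limiting regimes for Angle(û_1, u_1) show up readily in a small simulation. A sketch of the spike model (u_1 = e_1, λ_1 = d^α, all other eigenvalues 1, n fixed; an illustration, not code from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def pc1_angle(d, alpha, n=20):
    """Angle (degrees) between sample PC1 and true u1 = e1
    under the spike model lambda_1 = d**alpha, other eigenvalues 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)       # inject the spike along e1
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    c = min(abs(Vt[0][0]), 1.0)          # |<u1_hat, e1>|
    return np.degrees(np.arccos(c))

for d in (100, 1000, 10000):
    print(d, round(pc1_angle(d, alpha=1.5), 1), round(pc1_angle(d, alpha=0.5), 1))
# alpha > 1: angle heads toward 0; alpha < 1: toward 90 degrees as d grows
```

The same code with α near 1 illustrates why the boundary case, discussed below as the deep open problem, is delicate.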

An Interesting Objection: Should not Study Angles in PCA,

Because PC Scores (i.e. projections) Not Consistent

For Scores: ŝ_{j,i} = P_{v̂_j} x_i (What we study in PCA scatterplots) and s_{j,i} = P_{v_j} x_i

Can Show: ŝ_{j,i} / s_{j,i} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": Same Realization of R_j for all scores

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA

In PCA Consistency:

Strong Inconsistency: spike d^α, α < 1

Consistency: spike d^α, α > 1

What happens at boundary (α = 1)?

Ǝ interesting Limit Distn's: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit

• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
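A rough intuition for why kernel classifiers degenerate toward linear ones in high dimension (a sketch of the distance-concentration effect, not El Karoui's random-matrix argument; points are normalized to the unit sphere purely to keep the demo clean):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 20000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm points (for clarity)

G = X @ X.T                                       # linear kernel (Gram matrix)
sq = 2.0 - 2.0 * G                                # pairwise squared distances
K = np.exp(-sq)                                   # Gaussian (RBF) kernel, bandwidth 1

iu = np.triu_indices(n, k=1)
print(round(sq[iu].std() / sq[iu].mean(), 4))     # distances concentrate (small)
print(round(np.corrcoef(K[iu], G[iu])[0, 1], 4))  # RBF ~ affine in linear kernel
```

Since pairwise distances concentrate around one value, the RBF kernel is a smooth function evaluated in a tiny neighborhood, hence approximately an affine function of the linear kernel, and the embedded classifier behaves like a linear one.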

Interesting Question: Behavior in Very High Dimension?

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps, Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?

  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d),   where Σ_d = U_d Λ_d U_d^t

(Note: Not Gaussian)

Define the standardized version:

Z_d = Λ_d^(-1/2) U_d^t X_d

Assume ∃ a permutation of the entries of Z_d, so that the permuted sequence is ρ-mixing

HDLSS Math Stat of PCA
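The standardization above can be checked numerically. This is a minimal NumPy sketch (Gaussian data used only for illustration; the condition itself does not require Gaussianity, and the covariance below is an arbitrary illustrative choice): given Σ_d = U_d Λ_d U_d^t, the transformed vector Z_d = Λ_d^(-1/2) U_d^t X_d has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100_000

# Build an arbitrary covariance Sigma = U Lambda U^t (illustrative values).
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # symmetric positive definite
lam, U = np.linalg.eigh(Sigma)       # Lambda (eigenvalues), U (eigenvectors)

# Draw X ~ (0, Sigma); Gaussian here purely for convenience.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # shape (d, n)

# Standardized version: Z = Lambda^{-1/2} U^t X.
Z = np.diag(lam ** -0.5) @ U.T @ X

# Cov(Z) should be close to the identity matrix.
cov_Z = Z @ Z.T / n
print(np.round(cov_Z, 2))
```

The ρ-mixing assumption then concerns the dependence structure of the (permuted) entries of Z_d, which this sphering step makes uncorrelated but not necessarily independent.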

Careful look at:

PCA Consistency (α > 1 spike)

(Reality Check Suggested by Reviewer)

Condition is Independent of Sample Size

So true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows assumption is too strong for practice

HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

Mathematically Driven Conclusion: Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA

Recall, for Consistency (α > 1 spike):

Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1 spike):

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA
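The angle dichotomy is easy to see in a simulation. This is an illustrative sketch (not from the slides) of the single-spike model Σ_d = diag(d^α, 1, …, 1) with fixed n: for α > 1 the sample PC1 direction lines up with u₁ = e₁, while for α < 1 it becomes nearly orthogonal.

```python
import numpy as np

def pc1_angle(d, alpha, n=20, seed=0):
    """Angle (degrees) between sample PC1 and true PC1 = e1,
    under the single-spike model Sigma_d = diag(d**alpha, 1, ..., 1)."""
    rng = np.random.default_rng(seed)
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)               # spike: lambda_1 = d**alpha
    X = rng.standard_normal((n, d)) * sd     # rows ~ (0, Sigma_d)
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]  # top right singular vector
    cos = min(abs(u1_hat[0]), 1.0)           # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

angle_consistent = pc1_angle(d=10_000, alpha=1.5)    # alpha > 1: small angle
angle_inconsistent = pc1_angle(d=10_000, alpha=0.5)  # alpha < 1: large angle
print(angle_consistent, angle_inconsistent)
```

The dimensions and exponents here are arbitrary illustrative choices; the qualitative contrast is what matters.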

An Interesting Objection: Should not Study Angles in PCA, Because PC Scores (i.e. projections) Not Consistent

For Scores (what we study in PCA scatterplots):

ŝ_ij = P_(v̂_j) x_i   and   s_ij = P_(v_j) x_i

Can Show: ŝ_ij ≈ R_j · s_ij, with R_j Random

(Thanks to Dan Shen)

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent

So how can PCA find Useful Signals in Data?

(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)

Key is "Proportional Errors": ŝ_ij ≈ R_j · s_ij

Same Realization of R_j across all observations i

Axes have Inconsistent Scales, But Relationships are Still Useful

HDLSS Math Stat of PCA
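The "proportional errors" idea can be illustrated numerically. This is an assumed toy setup (single spike, illustrative constants), not the slides' exact construction: the estimated scores ŝ_i = ⟨x_i, û₁⟩ track the true scores s_i = ⟨x_i, u₁⟩ up to a common scale factor, so the scatterplot geometry survives.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 5000
lam1 = 2000.0                              # illustrative spike eigenvalue
sd = np.ones(d)
sd[0] = np.sqrt(lam1)
X = rng.standard_normal((n, d)) * sd       # rows ~ (0, diag(lam1, 1, ..., 1))

u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
u1_hat = u1_hat * np.sign(u1_hat[0])       # fix the sign ambiguity

s_true = X[:, 0]                           # true PC1 scores (u1 = e1)
s_hat = X @ u1_hat                         # empirical PC1 scores

ratio = s_hat / s_true                     # roughly one common factor R_1
corr = float(np.corrcoef(s_hat, s_true)[0, 1])
print(float(np.median(ratio)), corr)
```

The per-observation ratios cluster around a single (random, data-dependent) value slightly above 1, while the correlation between estimated and true scores stays near 1: inconsistent scale, preserved relationships.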

In PCA Consistency:

Strong Inconsistency: α < 1 spike

Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting Limit Distn's, Jung, Sen & Marron (2012)

HDLSS Deep Open Problem

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer, El Karoui (2010):

• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
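A rough numerical intuition for the El Karoui phenomenon (an assumed sketch, not his argument): pairwise scaled squared distances ‖x_i − x_j‖²/d concentrate as d grows, so a radial kernel k(x, y) = f(‖x − y‖²/d) is evaluated in a shrinking neighborhood of one point and behaves like an affine function of the inner product, i.e. effectively a linear classifier.

```python
import numpy as np

rng = np.random.default_rng(2)
rel_spread = {}
for d in (10, 10_000):
    X = rng.standard_normal((100, d))
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared distances via the Gram matrix (memory-friendly).
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    off = D2[~np.eye(100, dtype=bool)] / d      # scaled off-diagonal entries
    rel_spread[d] = float(off.std() / off.mean())
print(rel_spread)   # relative spread shrinks sharply as d grows
```

With d = 10 the scaled distances vary by tens of percent around their mean; with d = 10,000 they are nearly constant, which is the concentration that flattens a radial kernel into a linear one.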

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean, But DWD more robust

Mathematics behind this?
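The "differing ratio trips up mean" intuition can be sketched in one dimension. This is a toy caricature (assumed setup, not Liu's analysis): two batches contain the same two subtypes at different proportions, so centering each batch at its own mean shifts the *same* subtype to different locations, confounding batch with biology.

```python
import numpy as np

rng = np.random.default_rng(3)

def batch(n_a, n_b):
    # 1-D caricature: subtype A centered at +2, subtype B at -2, unit noise.
    a = rng.normal(+2.0, 1.0, n_a)
    b = rng.normal(-2.0, 1.0, n_b)
    x = np.concatenate([a, b])
    labels = np.array(["A"] * n_a + ["B"] * n_b)
    return x, labels

x1, l1 = batch(90, 10)    # batch 1: mostly subtype A
x2, l2 = batch(10, 90)    # batch 2: mostly subtype B

# Mean adjustment: center each batch at its own overall mean.
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()

# After "adjustment", subtype A sits in different places in the two batches,
# even though its biological signal was identical before adjustment.
shift = x1c[l1 == "A"].mean() - x2c[l2 == "A"].mean()
print(shift)
```

A direction-based adjustment (the DWD idea) targets the batch direction rather than the batch mean, which is why it is less sensitive to the subtype proportions.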

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Standardized

Version

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Conditions for Geo Reprsquon

Condition from Jung amp Marron (2009)

where

Define

Assume Ǝ a permutation

So that is ρ-mixing

HDLSS Math Stat of PCA

ddX 0~ tdddd UU

dtddd XUZ 21

d

ddZ

Careful look at

PCA Consistency - spike

(Reality Check Suggested by Reviewer)

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Careful look at

PCA Consistency - α > 1 spike

(Reality Check Suggested by Reviewer)

Result is Independent of Sample Size

So true even for n = 1 (!)

Reviewer's Conclusion: Absurd; shows
assumption too strong for practice

HDLSS Math Stat of PCA
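The n = 1 claim can be checked numerically. Below is a minimal sketch of my own (not from the slides), assuming a single-spike covariance with top eigenvalue d**alpha, alpha > 1; for n = 1 the "sample PC direction" is just x / ||x||.

```python
# Minimal sketch (assumption: single-spike model, lambda_1 = d**alpha).
# With alpha > 1, even the n = 1 "PC direction" x / ||x|| aligns with the
# true eigenvector u1 as the dimension d grows.
import numpy as np

rng = np.random.default_rng(0)

def angle_n1(d, alpha):
    """Angle (degrees) between one sample and u1 under a d**alpha spike."""
    lam = d ** alpha
    u1 = np.zeros(d)
    u1[0] = 1.0
    x = np.sqrt(lam) * rng.normal() * u1 + rng.normal(size=d)  # one sample
    cos = abs(x @ u1) / np.linalg.norm(x)
    return np.degrees(np.arccos(min(cos, 1.0)))

for d in (100, 1000, 10000):
    print(d, round(angle_n1(d, alpha=1.5), 1))  # angle typically shrinks with d
```

Averaging over repeated draws makes the trend clearer; the rate at which the angle goes to 0 depends on alpha.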

HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180

HDLSS Math Stat of PCA

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Functional Data Analysis

Recall Theoretical Separation:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency (α > 1 spike):

Angle(û₁, u₁) → 0

For Strong Inconsistency (α < 1 spike):

Angle(û₁, u₁) → 90°

HDLSS Math Stat of PCA
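The angle dichotomy is easy to simulate. This is my own sketch (not from the slides), assuming a single-spike model with n fixed, using the top right singular vector of the data matrix as the sample PC1 direction.

```python
# Sketch of the angle dichotomy (assumed single-spike model, n fixed, d large):
# alpha > 1: Angle(u1_hat, u1) -> 0;  alpha < 1: Angle(u1_hat, u1) -> 90 deg.
import numpy as np

rng = np.random.default_rng(1)

def pc1_angle(d, alpha, n=20):
    """Angle (degrees) between sample PC1 and true u1, spike lambda = d**alpha."""
    lam = d ** alpha
    u1 = np.zeros(d)
    u1[0] = 1.0
    z = rng.normal(size=(n, 1))                            # spike scores
    X = np.sqrt(lam) * z * u1 + rng.normal(size=(n, d))    # n x d data matrix
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]       # sample PC1 direction
    return np.degrees(np.arccos(min(abs(v1 @ u1), 1.0)))

print(pc1_angle(5000, alpha=1.5))   # consistency regime: small angle
print(pc1_angle(5000, alpha=0.5))   # strong inconsistency: near 90 degrees
```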

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections)

Not Consistent

For Scores ŝ_{i,j} = P_{v̂_j} x_i and s_{i,j} = P_{v_j} x_i

(What we study in PCA scatterplots)

Can Show: ŝ_{i,j} / s_{i,j} → R_j (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j

Same Realization of R_j for i = 1, …, n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
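A hedged simulation sketch of the "proportional errors" phenomenon (my own construction, a single spike at the α = 1 boundary, not Shen's actual derivation): the estimated scores are off by a factor that is essentially the same for every observation, so the score scatterplot keeps its shape up to axis rescaling.

```python
# Sketch (assumed model): one spike with lambda = d, the alpha = 1 boundary.
# Sample PC1 scores s_hat are inconsistent, but s_hat ~= R * s_true with one
# shared random factor R -- so relationships between points survive.
import numpy as np

rng = np.random.default_rng(2)

d, n = 40000, 5
lam = d                                    # boundary spike, lambda = d**1
u1 = np.zeros(d)
u1[0] = 1.0
z = rng.normal(size=(n, 1))
X = np.sqrt(lam) * z * u1 + rng.normal(size=(n, d))

v1 = np.linalg.svd(X, full_matrices=False)[2][0]
v1 *= np.sign(v1 @ u1)                     # fix PC sign ambiguity
s_hat = X @ v1                             # estimated PC1 scores
s_true = X @ u1                            # true PC1 scores

ratio = s_hat / s_true
corr = np.corrcoef(s_hat, s_true)[0, 1]
print(np.round(ratio, 3), round(corr, 4))  # near-constant ratio, corr near 1
```

In this construction the common factor exceeds 1, reflecting noise accumulated along the estimated direction; the point is that it is one factor, not n different ones.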

In PCA Consistency:

Strong Inconsistency - α < 1 spike

Consistency - α > 1 spike

What happens at boundary (α = 1)?

Ǝ interesting Limit Distn's:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall, Main Advantage is for High d,

So not Clear Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
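The El Karoui-style intuition can be illustrated directly. In my sketch below (a simplifying assumption: points rescaled to the sphere of radius √d, Gaussian kernel with bandwidth² = d), pairwise kernel values are essentially an affine function of inner products, which is why kernel-embedded classifiers degenerate to linear ones in the random-matrix limit.

```python
# Sketch: in high d, a Gaussian kernel is ~ affine in the inner product,
#   k(x, y) = exp(-||x - y||^2 / (2d)) ~= e^-1 * (1 + x.y / d),
# so a kernel-embedded classifier behaves like a linear classifier.
# (Points are rescaled to norm sqrt(d), a simplifying assumption.)
import numpy as np

rng = np.random.default_rng(3)

n, d = 50, 2000
X = rng.normal(size=(n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)   # norms = sqrt(d)

G = X @ X.T                                              # inner products
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # ||x - y||^2
K = np.exp(-sq / (2 * d))                                # Gaussian kernel matrix
K_lin = np.exp(-1) * (1 + G / d)                         # affine proxy

off = ~np.eye(n, dtype=bool)
print(np.max(np.abs(K - K_lin)[off]))                    # tiny off-diagonal gap
```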

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is sizes of biological subtypes;

Differing ratio trips up mean,

But DWD more robust

Mathematics behind this
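The "differing subtype ratio trips up the mean" intuition is easy to see in a toy sketch (my own construction, not Liu's analysis): per-batch mean-centering puts the same biological subtype at different positions when the batches mix subtypes in different proportions.

```python
# Toy sketch of why per-batch mean adjustment fails with unequal subtype
# mixes (assumed 1-d gene, subtype B offset by delta from subtype A):
# after centering a batch at its own mean, subtype A sits near -p*delta,
# so batches with different B-fractions p disagree about where A is.
import numpy as np

delta = 10.0                      # expression offset of subtype B
p1, p2 = 0.2, 0.8                 # fraction of subtype B in batch 1 / batch 2
n = 100000
rng = np.random.default_rng(4)

def centered_subtype_A_mean(p):
    labels = rng.random(n) < p                           # True = subtype B
    x = np.where(labels, delta, 0.0) + 0.1 * rng.normal(size=n)
    x -= x.mean()                                        # naive mean adjustment
    return x[~labels].mean()                             # where subtype A lands

gap = centered_subtype_A_mean(p1) - centered_subtype_A_mean(p2)
print(round(gap, 2))              # ~ (p2 - p1) * delta, not 0
```

A mean-based adjustment can only match batches exactly when the subtype proportions agree; the slides' point is that DWD-based adjustment is more robust to this mismatch.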

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

HDLSS Math Stat of PCA

1

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Careful look at

PCA Consistency - spike

Independent of Sample Size

So true for n = 1 ()

Reviewers Conclusion Absurd shows

assumption too strong for practice

HDLSS Math Stat of PCA

1

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA, because PC scores (i.e. projections) are not consistent.

For the scores ŝ_{i,j} = P_{v̂_j} x_i (what we study in PCA scatterplots) and s_{i,j} = P_{v_j} x_i, can show (thanks to Dan Shen):

ŝ_{i,j} ≈ R_j s_{i,j}, with R_j random

HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data?

Key is "Proportional Errors": ŝ_{i,j} ≈ R_j s_{i,j}, with the same realization of R_j for i = 1, ..., n.

Axes have inconsistent scales, but relationships are still useful.
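A hedged numerical sketch of the proportionality (illustrative parameters, same single-spike model as above, not taken from the slides): even with a spike in the inconsistent zone, the estimated PC1 scores track the true scores up to one scale factor shared by all observations.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 2000, 25, 0.8                  # spike in the inconsistent zone
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)               # true PC1 direction is e_1

vals, vecs = np.linalg.eigh(X.T @ X / n)
u1_hat = vecs[:, -1]                         # estimated PC1 direction

s_true = X[:, 0]                             # true PC1 scores
s_hat = X @ u1_hat                           # estimated PC1 scores
R = (s_hat @ s_true) / (s_true @ s_true)     # one shared scale factor
rel_err = np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat)
print(R, rel_err)   # s_hat is close to R * s_true across all observations
```

The small relative error is the "proportional errors" phenomenon: the scatterplot shape survives even though the individual scores do not converge.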

HDLSS Deep Open Problem

In PCA consistency (spike eigenvalue d^α):
Strong inconsistency: α < 1
Consistency: α > 1
What happens at the boundary (α = 1)?

Result: ∃ interesting limit dist'ns, Jung, Sen & Marron (2012).

HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall the main advantage is for high d, so it is not clear embedding helps. Thus not yet implemented in DWD.
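A hedged sketch of the phenomenon (my illustration of the concentration effect, not El Karoui's argument): when norms and pairwise distances concentrate in high dimension, a Gaussian kernel matrix is entrywise well approximated by an affine function of the squared distances, and hence of the linear (inner product) kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5000, 40
X = rng.standard_normal((n, d)) / np.sqrt(d)   # norms concentrate near 1

G = X @ X.T                                    # linear kernel (inner products)
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # squared distances
K = np.exp(-sq / 2)                            # Gaussian kernel, sigma = 1

# Squared distances concentrate near 2, so expand exp(-sq/2) to first order:
K_affine = np.exp(-1.0) * (1 - (sq - 2) / 2)   # affine in sq, hence in G

off = ~np.eye(n, dtype=bool)                   # diagonal is exactly 1, skip it
max_err = np.abs(K - K_affine)[off].max()
print(max_err)   # tiny: the kernel matrix is nearly affine in the linear kernel
```

Since classifiers built from an affine image of the linear kernel matrix behave like linear classifiers, this is consistent with the slide's claim about the random matrix limit.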

HDLSS Additional Results

Batch adjustment (Xuxin Liu). Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust. Mathematics behind this.
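A hedged one-dimensional toy of the "differing ratio trips up the mean" point (my illustration, not Liu's mathematics): two batches share subtypes near -1 and +1 but in different proportions, so mean-centering each batch pulls the same subtype to different locations across batches.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_batch(n_a, n_b, noise=0.05):
    """One 1-D batch: n_a samples of subtype A near -1, n_b of B near +1."""
    vals = np.r_[np.full(n_a, -1.0), np.full(n_b, 1.0)]
    labels = np.r_[np.zeros(n_a, dtype=int), np.ones(n_b, dtype=int)]
    return vals + noise * rng.standard_normal(vals.size), labels

x1, g1 = make_batch(80, 20)          # batch 1: mostly subtype A
x2, g2 = make_batch(20, 80)          # batch 2: mostly subtype B

x1c, x2c = x1 - x1.mean(), x2 - x2.mean()   # mean-based batch adjustment
shift_before = abs(x1[g1 == 0].mean() - x2[g2 == 0].mean())
shift_after = abs(x1c[g1 == 0].mean() - x2c[g2 == 0].mean())
print(shift_before, shift_after)     # centering pushes subtype A apart
```

A DWD-based adjustment instead removes the component along a direction separating the batches rather than the subtypes, which is the sense in which it is described here as more robust.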

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Recall

RNAseq

Data From

82312

d ~ 1700

n = 180

HDLSS Math Stat of PCA

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Manually

Brushed

Clusters

Clear

Alternate

Splicing

Not

Noise

Functional Data Analysis

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

HDLSS Math Stat of PCA

1

1

Recall Theoretical Separation

Strong Inconsistency - spike

Consistency - spike

Mathematically Driven Conclusion

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

1

1

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Recall Theoretical Separation:

Strong Inconsistency - spike index α < 1

Consistency - spike index α > 1

Mathematically Driven Conclusion:

Real Data Signals Are This Strong

HDLSS Math Stat of PCA

An Interesting Objection:

Should not Study Angles in PCA

Recall, for Consistency:

Angle(û1, u1) → 0

For Strong Inconsistency:

Angle(û1, u1) → 90°
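This dichotomy can be checked numerically. Below is a minimal numpy sketch, not from the original slides (the single-spike model, sample sizes, and thresholds are illustrative assumptions): with leading population eigenvalue d^α, the sample PC1 direction is nearly aligned with the truth for α > 1 but nearly orthogonal for α < 1.

```python
import numpy as np

def pc1_angle(d, alpha, n=20, seed=0):
    """Angle (degrees) between sample and true PC1 under a single d^alpha spike."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))    # isotropic noise, variance 1
    X[:, 0] *= np.sqrt(d ** alpha)     # spike: variance d^alpha along e_1
    Xc = X - X.mean(axis=0)            # center, as in sample PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    cos = abs(Vt[0, 0])                # |cos| of angle to the true direction e_1
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

angle_consistent = pc1_angle(d=2000, alpha=1.5)    # alpha > 1: small angle
angle_inconsistent = pc1_angle(d=2000, alpha=0.5)  # alpha < 1: large angle
print(angle_consistent, angle_inconsistent)
```

Even at the moderate dimension d = 2000 the separation is already visible; the strong-inconsistency angle approaches 90° only slowly as d grows.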

An Interesting Objection:

Should not Study Angles in PCA,

Because PC Scores (i.e. projections) are Not Consistent

For Scores (what we study in PCA scatterplots):

ŝ_ij = P_v̂j x_i   and   s_ij = P_vj x_i

Can Show: ŝ_ij = R_j s_ij, with R_j Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

HDLSS Math Stat of PCA

PC Scores (i.e. projections) are Not Consistent.

So how can PCA find Useful Signals in Data?

In HDLSS settings, PCA Often Finds Signal, Not Pure Noise.

The Key is "Proportional Errors": ŝ_ij = R_j s_ij,

with the Same Realization of R_j for every observation i.

The Axes have Inconsistent Scales,

But the Relationships are Still Useful.
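The proportional-errors point, that sample scores are a common random multiple of the true scores, so scatterplot shapes survive even though scales do not, can be illustrated with a short numpy sketch (the dimension, spike strength, and tolerances are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 2000, 40, 0.6
X = rng.standard_normal((n, d))        # isotropic noise
X[:, 0] *= np.sqrt(d ** alpha)         # single spike along e_1 (mean-zero model)

s_true = X[:, 0]                       # true PC1 scores: projections onto e_1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])     # sample PC1 direction, sign-aligned with e_1
s_hat = X @ u1_hat                     # sample PC1 scores

corr = np.corrcoef(s_hat, s_true)[0, 1]       # shape of the scores is preserved...
scale = (s_hat @ s_true) / (s_true @ s_true)  # ...but the scale is inflated (> 1)
print(corr, scale)
```

The inflation factor comes from the noise energy that the sample eigenvector absorbs; it is random but shared across all observations, which is why relative positions in PC scatterplots remain meaningful.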

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - spike index α < 1

Consistency - spike index α > 1

What happens at the boundary (α = 1)?

Result: ∃ interesting Limit Distn's

Jung, Sen & Marron (2012)

HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From the Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In the Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers
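El Karoui's point can be seen directly: in high dimension, pairwise squared distances concentrate, so a Gaussian kernel matrix is essentially an affine function of the inner-product (Gram) matrix, which is why kernel classifiers behave like linear ones. A minimal numpy sketch (the dimension, bandwidth choice, and error bound are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 50
X = rng.standard_normal((n, d))          # standard Gaussian data, x_i in R^d

G = X @ X.T                              # Gram matrix of inner products
sq = np.diag(G)
D2 = sq[:, None] + sq[None, :] - 2.0 * G # squared pairwise distances
K = np.exp(-D2 / (2.0 * d))              # Gaussian kernel, bandwidth sigma^2 = d

# Distances concentrate: D2 = 2d + O(sqrt(d)), so expand exp to first order there.
K_lin = np.exp(-1.0) * (1.0 - (D2 - 2.0 * d) / (2.0 * d))

off = ~np.eye(n, dtype=bool)
rel_err = np.max(np.abs(K - K_lin)[off] / K[off])
print(rel_err)                           # tiny: K is essentially affine in G
```

Any classifier built from K therefore sees (to first order) only the inner products it would have seen without the embedding.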

HDLSS Asymptotics & Kernel Methods

Implications for DWD:

Recall that the Main Advantage is for High d,

So it is not Clear that Embedding Helps.

Thus Kernel Embedding is not yet Implemented in DWD.

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall the Intuition from above:

The Key is the sizes of the biological subtypes.

A differing ratio trips up the mean,

But DWD is more robust.

Mathematics behind this:
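The slide's intuition, that mean-based batch adjustment fails when the biological subtype mix differs across batches, is easy to see in a toy numpy example (the subtype means, proportions, and noise level are illustrative assumptions; DWD itself is not implemented here):

```python
import numpy as np

rng = np.random.default_rng(0)

def batch(n_a, n_b, noise=0.5):
    """One batch: subtype A centered at -1, subtype B at +1, on a single gene."""
    a = -1.0 + noise * rng.standard_normal(n_a)
    b = +1.0 + noise * rng.standard_normal(n_b)
    return a, b

a1, b1 = batch(500, 500)    # batch 1: balanced subtype mix
a2, b2 = batch(900, 100)    # batch 2: 90/10 mix, same true subtype means

# Mean adjustment: center each batch at its own overall mean.
m1 = np.concatenate([a1, b1]).mean()
m2 = np.concatenate([a2, b2]).mean()
a1c, a2c = a1 - m1, a2 - m2

# After adjustment, subtype A no longer agrees across batches:
gap = abs(a1c.mean() - a2c.mean())   # driven purely by the mix imbalance
print(gap)
```

The batches had identical subtype means before adjustment; mean centering itself introduces the discrepancy, because the batch mean confounds the batch effect with the subtype proportions.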

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

Recall for Consistency

For Strong Inconsistency

HDLSS Math Stat of PCA

1

0ˆ 11 uuAngle

1

011 90ˆ uuAngle

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

In HDLSS settings, PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

Same Realization of R_j in ŝ_{j,i} / s_{j,i} → R_j, for every observation i

Axes have Inconsistent Scales,
But Relationships are Still Useful

HDLSS Math Stat of PCA
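The proportional-errors phenomenon can be checked numerically. This sketch is not from the slides: the one-spike model and the parameters (d, n, α) are illustrative assumptions. It compares sample PC1 scores against true PC1 scores, showing a common inflation factor with small relative spread, while the two score vectors stay highly correlated.

```python
import numpy as np

rng = np.random.default_rng(1)

# One-spike HDLSS model (illustrative parameters)
d, n, alpha = 10000, 50, 0.6
v1 = np.zeros(d)
v1[0] = 1.0                            # true first PC direction
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)         # spike eigenvalue d**alpha

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] * np.sign(Vt[0, 0])     # sample PC1 direction, sign-aligned

s = X @ v1                             # true PC1 scores
s_hat = X @ v1_hat                     # sample PC1 scores

ratios = s_hat / s
corr = np.corrcoef(s, s_hat)[0, 1]

# Scores are inflated by a roughly common random factor: median ratio well
# above 1, small robust relative spread, and near-perfect correlation --
# so the scatterplot geometry survives up to a rescaling of the axes.
med = np.median(ratios)
spread = np.median(np.abs(ratios - med)) / med
print(round(med, 2), round(spread, 3), round(corr, 3))
```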

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency: spike index α < 1
Consistency: spike index α > 1

What happens at the boundary (α = 1)?

Ǝ interesting Limit Distn's: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem Result

Recall: Flexibility From the Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In the Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

HDLSS Asymptotics & Kernel Methods
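The mechanism behind this can be sketched with a concentration argument, not from the slides: in very high dimension pairwise squared distances concentrate, so a Gaussian kernel matrix is, to first order, an explicit function that is affine in the inner products (times row/column norm factors). The bandwidth choice √d and the parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d = 40, 20000                     # tiny sample, very high dimension
X = rng.standard_normal((n, d))

G = X @ X.T                          # inner-product (linear kernel) matrix
norms = np.diag(G)
sq = norms[:, None] + norms[None, :] - 2 * G
K = np.exp(-sq / (2.0 * d))          # Gaussian kernel, bandwidth sqrt(d)

# First-order expansion: since inner products are O(sqrt(d)) while norms
# concentrate near d, the kernel is nearly affine in G, so the embedded
# classifier behaves like a linear one in this regime.
K_approx = np.exp(-(norms[:, None] + norms[None, :]) / (2.0 * d)) * (1.0 + G / d)
np.fill_diagonal(K_approx, 1.0)      # diagonal of K is exactly 1

max_err = float(np.max(np.abs(K - K_approx)))
print(max_err)
```

The maximum entrywise gap between the true kernel matrix and its linearization is tiny at this dimension.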

Implications for DWD:

Recall, the Main Advantage is for High d,

So it is not Clear that Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is the sizes of the biological subtypes:
A differing ratio trips up the mean,
But DWD is more robust

Mathematics behind this
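The mean-adjustment pitfall can be sketched in one dimension. This is an illustrative assumption, not from the slides: the subtype means (0 and 5) and batch compositions are made up, and the DWD side of the comparison is not shown here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two biological subtypes with different means, and NO true batch effect.
def make_batch(n_a, n_b):
    a = rng.normal(0.0, 1.0, n_a)    # subtype A expression
    b = rng.normal(5.0, 1.0, n_b)    # subtype B expression
    return np.concatenate([a, b])

batch1 = make_batch(80, 20)          # batch 1: 80% subtype A
batch2 = make_batch(20, 80)          # batch 2: 20% subtype A

# Mean-centering each batch would "remove" a difference that is really the
# subtype-proportion difference, artificially shifting the subtypes apart.
spurious_shift = batch1.mean() - batch2.mean()
print(round(spurious_shift, 2))
```

The batch means differ by roughly the subtype-mean gap times the proportion gap, even though no batch effect exists, which is exactly what trips up mean adjustment.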

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

HDLSS Math Stat of PCA

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores

What we study in PCA scatterplots

HDLSS Math Stat of PCA

ivji xPsjˆˆ

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

An Interesting Objection

Should not Study Angles in PCA

Because PC Scores (ie projections)

Not Consistent

For Scores and

Can Show (Random)

Thanks to Dan Shen

HDLSS Math Stat of PCA

ivji xPsjˆˆ ivji xPs

j

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

An Interesting Objection:

Should one not Study Angles in PCA,

Because PC Scores (i.e. projections) are

Not Consistent?

For Scores s_{i,j} (projection of x_i onto the sample eigendirection)
and s~_{i,j} (projection of x_i onto the population eigendirection),

Can Show: s_{i,j} / s~_{i,j} --> R_j, where R_j is Random

Thanks to Dan Shen

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

In HDLSS settings, PCA Often Finds Signal, Not Pure Noise

Key is "Proportional Errors":

s_{i,j} ≈ R_j · s~_{i,j}, with the Same Realization of R_j for all i = 1, ..., n

Axes have Inconsistent Scales,

But Relationships are Still Useful

HDLSS Math Stat of PCA
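The "proportional errors" idea can be illustrated with a small simulation (a minimal sketch under an assumed single-spike model, not the exact setup of the slides; all names are illustrative): sample PC scores come out on the wrong scale, yet they are nearly proportional to the population scores, so relationships among observations survive.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 40            # HDLSS: dimension d >> sample size n
tau = 30.0                 # spike: first eigenvalue tau^2, the rest 1

# Single-spike model: signal along e_1 plus isotropic noise
z = rng.standard_normal(n)
X = rng.standard_normal((n, d))
X[:, 0] += tau * z

true_scores = X[:, 0]      # projections onto population PC1 (= e_1)

# Sample PC1 direction and scores via SVD (data already mean ~ 0)
v1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
sample_scores = X @ v1_hat
if np.dot(sample_scores, true_scores) < 0:   # fix sign ambiguity
    sample_scores = -sample_scores

# Scores are off in scale, but nearly proportional across observations
corr = np.corrcoef(sample_scores, true_scores)[0, 1]
ratio = np.median(sample_scores / true_scores)
```

Here `corr` comes out close to 1 even though each individual score is an inconsistent estimate: the error is (approximately) a common multiplicative factor, not independent per-observation noise.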

In PCA Consistency:

Strong Inconsistency: spike index α < 1

Consistency: spike index α > 1

What happens at the boundary (α = 1)?

∃ interesting Limit Distn's:

Jung, Sen & Marron (2012)

HDLSS Deep Open Problem: Result
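The consistency / strong-inconsistency dichotomy can be probed numerically. This is a hedged sketch assuming the usual spiked-covariance setup with spike eigenvalue d**alpha and n fixed (the function name and parameter values are illustrative):

```python
import numpy as np

def pc1_angle(d, alpha, n=20, seed=0):
    """Angle (degrees) between sample and population first eigenvectors
    in a single-spike model with spike eigenvalue d**alpha."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))                          # isotropic noise
    X[:, 0] += (d ** (alpha / 2)) * rng.standard_normal(n)   # spike on e_1
    v1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    c = min(abs(v1_hat[0]), 1.0)                             # |cos(angle to e_1)|
    return np.degrees(np.arccos(c))

# alpha > 1: angle -> 0 as d grows (consistency)
# alpha < 1: angle -> 90 degrees (strong inconsistency)
```

For large d with n fixed, a spike index above 1 gives a sample eigenvector nearly aligned with the truth, while an index below 1 gives one nearly orthogonal to it; the boundary α = 1 is where the nontrivial limit distributions of Jung, Sen & Marron (2012) appear.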

Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In the Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:

Recall its Main Advantage is for High d,

So it is not Clear that Embedding Helps;

Thus not yet Implemented in DWD

HDLSS Asymptotics & Kernel Methods
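A glimpse of why kernel classifiers behave like linear ones in this regime: in high dimension, entries of a Gaussian kernel matrix are, to first order, an affine function of inner products and squared norms, so the kernel matrix carries essentially linear information. A minimal numerical sketch (the bandwidth choice and regression design are illustrative assumptions, not El Karoui's exact construction):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10000, 40
X = rng.standard_normal((n, d))

G = X @ X.T                               # linear (inner-product) Gram matrix
sq = np.diag(G)                           # squared norms ||x_i||^2
D2 = sq[:, None] + sq[None, :] - 2 * G    # pairwise squared distances
K = np.exp(-D2 / (2 * d))                 # Gaussian kernel, bandwidth^2 = d

# Regress off-diagonal kernel entries on {1, inner products, norms}
iu = np.triu_indices(n, k=1)
y = K[iu]
A = np.column_stack([np.ones(len(y)),
                     G[iu] / d,
                     (sq[:, None] + sq[None, :])[iu] / d])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
```

The fit `r2` is extremely close to 1: the kernel matrix is nearly an affine transform of linear statistics of the data, which is the intuition behind kernel-embedded classifiers degenerating to linear classifiers in the random matrix limit.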

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:

Key is the sizes of the biological subtypes.

A Differing ratio trips up the mean,

But DWD is more robust.

Mathematics behind this:
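The "differing ratio trips up the mean" intuition can be sketched in two dimensions (a hypothetical toy model: subtype separation along one coordinate, batch effect along another; the DWD alternative itself is not implemented here):

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 10.0          # subtype separation along the first coordinate
n = 200               # observations per batch

def make_batch(p_b, shift):
    """Batch with fraction p_b of subtype B and a batch effect `shift`."""
    labels = rng.random(n) < p_b          # True = subtype B
    X = rng.standard_normal((n, 2))
    X[labels, 0] += delta                 # subtype B offset
    X[:, 1] += shift                      # batch effect on coordinate 2
    return X, labels

X1, lab1 = make_batch(0.2, shift=5.0)     # batch 1: 20% subtype B
X2, lab2 = make_batch(0.8, shift=-5.0)    # batch 2: 80% subtype B

# Naive adjustment: subtract each batch's own mean vector
X1c = X1 - X1.mean(axis=0)
X2c = X2 - X2.mean(axis=0)

# Subtype A should line up across batches after adjustment -- it does not:
gap = abs(X1c[~lab1, 0].mean() - X2c[~lab2, 0].mean())
# gap is roughly delta * (0.8 - 0.2) = 6, not 0
```

Because the two batches contain the subtypes in different proportions, the batch means absorb part of the biological signal, and mean-centering shifts the same subtype to different locations; a direction-based adjustment such as DWD is argued on the slides to be more robust to this.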

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

HDLSS Math Stat of PCA

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

HDLSS

PCA

Often

Finds

Signal

Not Pure

Noise

HDLSS Math Stat of PCA

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Same Realization for

HDLSS Math Stat of PCA

jji

ji Rs

s

PC Scores (ie projections)

Not Consistent

So how can PCA find Useful Signals in Data

Key is ldquoProportional Errorsrdquo

Axes have Inconsistent Scales

But Relationships are Still Useful

HDLSS Math Stat of PCA

jji

ji Rs

s

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

HDLSS Deep Open Problem

In PCA Consistency

Strong Inconsistency - spike

Consistency - spike

What happens at boundary ()

Ǝ interesting Limit Distnrsquos

Jung Sen amp Marron (2012)

HDLSS Deep Open Problem Result

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Math Stat of PCA

PC Scores (i.e. projections)

Not Consistent

So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":

Same Realization of R_j for each i, i.e. the score ratios satisfy (roughly) ŝ_ij / s_ij ≈ R_j

Axes have Inconsistent Scales

But Relationships are Still Useful
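
The "proportional errors" point can be checked numerically. Below is a minimal numpy sketch (the single-spike model, the parameter values n = 20, d = 50,000, λ = 10,000, and all variable names are my own illustrative assumptions, not from the slides): the sample PC1 scores track the true scores almost perfectly up to a single random scale factor, so relationships among scores survive even though the axis scale is inflated.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 20, 50_000, 10_000.0   # HDLSS: few samples, huge dimension, one spike

# Single-spike model: X_i = sqrt(lam) * z_i * v + noise; true PC1 scores are sqrt(lam)*z_i
v = np.zeros(d)
v[0] = 1.0
z = rng.standard_normal(n)
X = np.sqrt(lam) * np.outer(z, v) + rng.standard_normal((n, d))

# Sample PC1 via the cheap n x n Gram matrix (standard HDLSS trick)
w, U = np.linalg.eigh(X @ X.T)
u1 = U[:, -1] * np.sign(U[:, -1] @ z)   # eigh sorts ascending; fix the arbitrary sign
s_hat = np.sqrt(w[-1]) * u1             # sample PC1 scores
s_true = np.sqrt(lam) * z

ratio = s_hat / s_true
print(np.corrcoef(s_hat, s_true)[0, 1])  # near 1: relationships preserved
print(np.median(ratio))                  # above 1: a common scale inflation factor
```

Here the ratio ŝ_i / s_i is roughly the same number for every i (the common realization), which is why PCA scatterplots stay interpretable despite the inconsistent scale.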

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - spike index α < 1

Consistency - spike index α > 1

What happens at the boundary (α = 1)?

∃ interesting Limit Distn's

Jung, Sen & Marron (2012)

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010)

• In Random Matrix Limit:

• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD
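
The mechanism behind El Karoui's random-matrix result is that in high dimension pairwise distances concentrate, so a Gaussian kernel matrix becomes nearly an affine function of the inner-product (linear kernel) matrix. A minimal numpy sketch under assumed settings (unit-norm rows, bandwidth 1; not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 5_000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)         # unit-norm rows

G = X @ X.T                                           # linear (inner product) kernel
sq = np.add.outer(np.diag(G), np.diag(G)) - 2.0 * G   # squared pairwise distances
K = np.exp(-sq / 2.0)                                 # Gaussian kernel, bandwidth 1

# Off-diagonal inner products are O(1/sqrt(d)), so K_ij = exp(-1)*exp(G_ij)
# is almost exactly affine in G_ij: the embedding adds ~no new information.
iu = np.triu_indices(n, 1)
corr = np.corrcoef(K[iu], G[iu])[0, 1]
print(corr)   # essentially 1
```

So in this regime a classifier built on K can do little more than one built on G, which is the sense in which kernel embedded classifiers behave like linear ones.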

HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above: Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this?
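
The batch-adjustment intuition (differing subtype ratios trip up mean centering) can be sketched in a few lines. Everything below (a one-feature model, subtype means 0 and 4, the 80/20 vs 20/80 mixes) is an illustrative assumption, and DWD itself is not implemented here:

```python
import numpy as np

rng = np.random.default_rng(2)

def batch(n_a, n_b):
    """One gene, two biological subtypes: A near 0, B near 4."""
    return np.concatenate([rng.normal(0.0, 0.5, n_a), rng.normal(4.0, 0.5, n_b)])

b1 = batch(80, 20)   # batch 1: mostly subtype A
b2 = batch(20, 80)   # batch 2: mostly subtype B

# Naive batch adjustment: subtract each batch's own mean
b1c = b1 - b1.mean()
b2c = b2 - b2.mean()

# Subtype A now sits at different locations in the two "adjusted" batches:
# the differing subtype ratios moved the batch means, not any batch effect.
gap_A = abs(b1c[:80].mean() - b2c[:20].mean())
print(gap_A)   # a large within-subtype gap remains after mean adjustment
```

A direction found by a classifier that is less dominated by the subtype mix (the slides' argument for DWD) would not be misled this way.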

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.
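To illustrate the "flexibility from kernel embedding" point numerically, here is a minimal hand-rolled kernel perceptron (a hypothetical toy, not a method from these slides) on two concentric rings: no linear rule through the origin separates them, but a Gaussian kernel embedding does.

```python
import numpy as np

rng = np.random.default_rng(2)

def ring(n, radius):
    # n noisy points on a circle of the given radius
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    pts = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
    return pts + rng.normal(0.0, 0.05, (n, 2))

# two concentric rings: not separable by any linear rule through the origin
X = np.vstack([ring(100, 1.0), ring(100, 3.0)])
y = np.r_[np.ones(100), -np.ones(100)]

def kernel_perceptron(K, y, epochs=50):
    # alpha[i] counts mistakes on point i; decision is sign(K @ (alpha * y))
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            if y[i] * (K[i] @ (alpha * y)) <= 0:
                alpha[i] += 1
    return alpha

K_lin = X @ X.T                                        # linear kernel
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-D2 / 2.0)                              # Gaussian kernel

# training accuracy under each kernel
acc_lin = np.mean(np.sign(K_lin @ (kernel_perceptron(K_lin, y) * y)) == y)
acc_rbf = np.mean(np.sign(K_rbf @ (kernel_perceptron(K_rbf, y) * y)) == y)
print(acc_lin, acc_rbf)   # linear stays near chance; RBF fits the rings
```

The ring radii, noise level, and bandwidth are illustrative choices; the point is only that the same learning rule becomes far more flexible once the data are kernel embedded.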

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):

• In the random matrix limit,

• Kernel embedded classifiers ~ linear classifiers.

Implications for DWD:

Recall: the main advantage of DWD is for high d.

So it is not clear that kernel embedding helps.

Thus kernel embedding is not yet implemented in DWD.
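A small numeric sketch of the flavor of the El Karoui (2010) result: for high-dimensional Gaussian data, a Gaussian kernel matrix with bandwidth scaled to the dimension is nearly an affine function of the Gram matrix, so classifiers built on it behave like linear ones. The dimensions, bandwidth, and tolerance below are illustrative choices, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 10000
X = rng.standard_normal((n, d))     # n points from N(0, I_d)

# Gaussian kernel with bandwidth scaled to dimension: K_ij = exp(-||xi-xj||^2 / (2d))
sq = (X ** 2).sum(axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
K = np.exp(-D2 / (2.0 * d))

# In high d, t = ||xi-xj||^2 / d concentrates near 2, so the first-order
# Taylor expansion exp(-t/2) ~ e^(-1) * (1 - (t - 2)/2), which is affine in
# the (scaled) Gram matrix entries, nearly reproduces K off the diagonal.
t = D2 / d
K_affine = np.exp(-1.0) * (1.0 - (t - 2.0) / 2.0)

off = ~np.eye(n, dtype=bool)
err = np.abs(K - K_affine)[off].max()
print(err)   # small: K is close to an affine function of X @ X.T / d
```

Since the kernel matrix carries essentially no information beyond the Gram matrix in this limit, a classifier run on it can do little more than a linear one, which is the intuition behind the bullet above.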

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Recall

Flexibility

From

Kernel

Embedding

Idea

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Interesting Question

Behavior in Very High Dimension

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asyrsquos Geometrical Representrsquon
  • HDLSS Asyrsquos Geometrical Representrsquon (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion
  • HDLSS Asyrsquos Geometrical Represenrsquotion (2)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (3)
  • HDLSS Discrimrsquon Simulations
  • HDLSS Discrimrsquon Simulations (2)
  • HDLSS Discrimrsquon Simulations (3)
  • HDLSS Discrimrsquon Simulations (4)
  • HDLSS Discrimrsquon Simulations (5)
  • HDLSS Discrimrsquon Simulations (6)
  • HDLSS Discrimrsquon Simulations (7)
  • HDLSS Discrimrsquon Simulations (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (4)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (5)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (6)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (7)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (8)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (9)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (10)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (11)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (13)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (14)
  • HDLSS Asyrsquos Geometrical Represenrsquotion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics amp Kernel Methods
  • HDLSS Asymptotics amp Kernel Methods (2)
  • HDLSS Asymptotics amp Kernel Methods (3)
  • HDLSS Asymptotics amp Kernel Methods (4)
  • HDLSS Asymptotics amp Kernel Methods (5)
  • HDLSS Asymptotics amp Kernel Methods (6)
  • HDLSS Asymptotics amp Kernel Methods (7)
  • HDLSS Asymptotics amp Kernel Methods (8)
  • HDLSS Asymptotics amp Kernel Methods (9)
  • HDLSS Additional Results

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Answer El Karoui (2010)

bull In Random Matrix Limit

bull Kernel Embedded Classifiers ~

~ Linear Classifiers

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

HDLSS Asymptotics amp Kernel Methods

Interesting Question

Behavior in Very High Dimension

Implications for DWD

Recall Main Advantage is for High d

So not Clear Embedding Helps

Thus not yet Implemented in DWD

HDLSS Asymptotics amp Kernel Methods

HDLSS Additional Results

Batch Adjustment Xuxin Liu

Recall Intuition from above

Key is sizes of biological subtypes

Differing ratio trips up mean

But DWD more robust

Mathematics behind this

  • SigClust Gaussian null distribution - Simulation
  • SigClust Gaussian null distribution - Simulation (2)
  • An example (details to follow)
  • SigClust Real Data Results
  • HDLSS Asymptotics
  • HDLSS Asymptotics (2)
  • HDLSS Asymptotics Simple Paradoxes
  • HDLSS Asymptotics Simple Paradoxes (2)
  • HDLSS Asymptotics Simple Paradoxes (3)
  • HDLSS Asymptotics Simple Paradoxes (4)
  • HDLSS Asymptotics Simple Paradoxes (5)
  • HDLSS Asymptotics Simple Paradoxes (6)
  • HDLSS Asy's Geometrical Represent'n
  • HDLSS Asy's Geometrical Represent'n (2)
  • HDLSS Asy's Geometrical Represen'tion
  • HDLSS Asy's Geometrical Represen'tion (2)
  • HDLSS Asy's Geometrical Represen'tion (3)
  • HDLSS Discrim'n Simulations
  • HDLSS Discrim'n Simulations (2)
  • HDLSS Discrim'n Simulations (3)
  • HDLSS Discrim'n Simulations (4)
  • HDLSS Discrim'n Simulations (5)
  • HDLSS Discrim'n Simulations (6)
  • HDLSS Discrim'n Simulations (7)
  • HDLSS Discrim'n Simulations (8)
  • HDLSS Asy's Geometrical Represen'tion (4)
  • HDLSS Asy's Geometrical Represen'tion (5)
  • HDLSS Asy's Geometrical Represen'tion (6)
  • HDLSS Asy's Geometrical Represen'tion (7)
  • HDLSS Asy's Geometrical Represen'tion (8)
  • HDLSS Asy's Geometrical Represen'tion (9)
  • HDLSS Asy's Geometrical Represen'tion (10)
  • HDLSS Asy's Geometrical Represen'tion (11)
  • HDLSS Asy's Geometrical Represen'tion (12)
  • 2nd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (2)
  • 2nd Paper on HDLSS Asymptotics (3)
  • 2nd Paper on HDLSS Asymptotics (4)
  • 2nd Paper on HDLSS Asymptotics (5)
  • 2nd Paper on HDLSS Asymptotics (6)
  • 2nd Paper on HDLSS Asymptotics (7)
  • 2nd Paper on HDLSS Asymptotics (8)
  • 2nd Paper on HDLSS Asymptotics (9)
  • 2nd Paper on HDLSS Asymptotics (10)
  • 2nd Paper on HDLSS Asymptotics (11)
  • 2nd Paper on HDLSS Asymptotics (12)
  • 2nd Paper on HDLSS Asymptotics (13)
  • 2nd Paper on HDLSS Asymptotics (14)
  • 3rd Paper on HDLSS Asymptotics
  • 2nd Paper on HDLSS Asymptotics (15)
  • 2nd Paper on HDLSS Asymptotics (16)
  • 2nd Paper on HDLSS Asymptotics (17)
  • 2nd Paper on HDLSS Asymptotics (18)
  • 0 Covariance is not independence
  • 0 Covariance is not independence (2)
  • 0 Covariance is not independence (3)
  • 0 Covariance is not independence (4)
  • 0 Covariance is not independence (5)
  • 0 Covariance is not independence (6)
  • 0 Covariance is not independence (7)
  • 0 Covariance is not independence (8)
  • 0 Covariance is not independence (9)
  • 0 Covariance is not independence (10)
  • 0 Covariance is not independence (11)
  • 0 Covariance is not independence (12)
  • 0 Covariance is not independence (13)
  • HDLSS Asy's Geometrical Represen'tion (13)
  • HDLSS Asy's Geometrical Represen'tion (14)
  • HDLSS Asy's Geometrical Represen'tion (15)
  • HDLSS Math Stat of PCA
  • HDLSS Math Stat of PCA (2)
  • HDLSS Math Stat of PCA (3)
  • HDLSS Math Stat of PCA (4)
  • HDLSS Math Stat of PCA (5)
  • HDLSS Math Stat of PCA (6)
  • HDLSS Math Stat of PCA (7)
  • HDLSS Math Stat of PCA (8)
  • HDLSS Math Stat of PCA (9)
  • HDLSS Math Stat of PCA (10)
  • HDLSS Math Stat of PCA (11)
  • HDLSS Math Stat of PCA (12)
  • HDLSS Math Stat of PCA (13)
  • HDLSS Math Stat of PCA (14)
  • HDLSS Math Stat of PCA (15)
  • HDLSS Math Stat of PCA (16)
  • HDLSS Math Stat of PCA (17)
  • HDLSS Math Stat of PCA (18)
  • Mixing Conditions
  • Mixing Conditions (2)
  • Mixing Conditions (3)
  • Mixing Conditions (4)
  • Mixing Conditions (5)
  • Mixing Conditions (6)
  • Mixing Conditions (7)
  • Mixing Conditions (8)
  • Mixing Conditions (9)
  • Mixing Conditions (10)
  • Mixing Conditions (11)
  • Mixing Conditions (12)
  • Mixing Conditions (13)
  • Mixing Conditions (14)
  • Mixing Conditions (15)
  • Mixing Conditions (16)
  • Mixing Conditions (17)
  • HDLSS Math Stat of PCA (19)
  • HDLSS Math Stat of PCA (20)
  • HDLSS Math Stat of PCA (21)
  • HDLSS Math Stat of PCA (22)
  • HDLSS Math Stat of PCA (23)
  • HDLSS Math Stat of PCA (24)
  • HDLSS Math Stat of PCA (25)
  • HDLSS Math Stat of PCA (26)
  • HDLSS Math Stat of PCA (27)
  • HDLSS Math Stat of PCA (28)
  • HDLSS Math Stat of PCA (29)
  • HDLSS Math Stat of PCA (30)
  • Functional Data Analysis
  • HDLSS Math Stat of PCA (31)
  • HDLSS Math Stat of PCA (32)
  • HDLSS Math Stat of PCA (33)
  • HDLSS Math Stat of PCA (34)
  • HDLSS Math Stat of PCA (35)
  • HDLSS Math Stat of PCA (36)
  • HDLSS Math Stat of PCA (37)
  • HDLSS Math Stat of PCA (38)
  • HDLSS Math Stat of PCA (39)
  • HDLSS Math Stat of PCA (40)
  • HDLSS Math Stat of PCA (41)
  • HDLSS Math Stat of PCA (42)
  • HDLSS Math Stat of PCA (43)
  • HDLSS Math Stat of PCA (44)
  • HDLSS Deep Open Problem
  • HDLSS Deep Open Problem (2)
  • HDLSS Asymptotics & Kernel Methods
  • HDLSS Asymptotics & Kernel Methods (2)
  • HDLSS Asymptotics & Kernel Methods (3)
  • HDLSS Asymptotics & Kernel Methods (4)
  • HDLSS Asymptotics & Kernel Methods (5)
  • HDLSS Asymptotics & Kernel Methods (6)
  • HDLSS Asymptotics & Kernel Methods (7)
  • HDLSS Asymptotics & Kernel Methods (8)
  • HDLSS Asymptotics & Kernel Methods (9)
  • HDLSS Additional Results

Interesting Question:

Behavior in Very High Dimension?

Answer: El Karoui (2010):

• In Random Matrix Limit,

• Kernel Embedded Classifiers ~ Linear Classifiers
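The concentration driving this can be seen in a small simulation (an illustrative sketch, not El Karoui's random-matrix argument; the bandwidth choice σ² = d and all sample sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def upper_sq_dists(X):
    # squared Euclidean distances for all pairs of rows (upper triangle)
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    i, j = np.triu_indices(len(X), k=1)
    return np.maximum(D2[i, j], 0.0)

n = 200
spreads = {}
for d in (2, 2000):
    X = rng.standard_normal((n, d))
    dist = np.sqrt(upper_sq_dists(X))
    spreads[d] = dist.std() / dist.mean()   # relative spread of distances

# In high dimension pairwise distances concentrate, so Gaussian kernel
# values K = exp(-||x - y||^2 / (2 sigma^2)), here with sigma^2 = d,
# are nearly constant; expanding to first order, K is affine in the
# inner products, which is one way to see why kernel-embedded
# classifiers behave like linear ones in this limit.
X = rng.standard_normal((n, 2000))
K = np.exp(-upper_sq_dists(X) / (2 * 2000))
print(spreads, K.std() / K.mean())
```

For d = 2 the relative spread of distances is large (order 1/2), while for d = 2000 it drops to a few percent, and the kernel values are correspondingly nearly constant.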



