New Statistical Algorithms for Analyzing Multi-Batch CSF Data with Systematic Variations
Hao Zhou
Statistics
CPCP-talk
Joint work with Sathya N. Ravi, Vamsi K. Ithapu, Sterling C. Johnson, Grace Wahba, Vikas Singh
CSF: Same participant’s values may change across batches
Dataset: a quick introduction
12 CSF protein levels of 701 subjects were collected in two different batches (measured at two different time points): 413 subjects in batch 1 and 288 in batch 2.
A subset of 85 individuals have data from both batches (measured as different values); the others are available in only one batch.
Domain Adaptation and variability across CSF batches
Domain Adaptation (DA): in many real-world datasets, training/testing (or source/target) samples may come from different “domains”.
Domain adaptation ideas can be applied to the CSF problem.
Inputs/features in the source and target domains are denoted by x_s and x_t; outputs/labels are denoted by y_s and y_t.
Transform the source and target domains so that the feature/covariate distributions match across domains.
A simple example: grades in a class
A binary setup where Pr(x_s) ≠ Pr(x_t) and Pr(y_s | x_s) ≠ Pr(y_t | x_t). (Hint: solved by the transformation x_s ↦ 100 − x_s; a toy sketch follows the figure.)
Figure: Grade density functions for the source and target domains (left), and the conditional probability of studying well given the grade for each domain (right).
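As a toy illustration of the hint above (a minimal sketch with made-up grade distributions, not data from the talk), flipping the source grades via x_s ↦ 100 − x_s aligns both the feature distribution and the conditional label rule with the target domain:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: source grades cluster high, target grades cluster low,
    # and the "studies well" label rule is mirrored between the two domains.
    xs = rng.normal(75, 10, 5000)           # source grades
    xt = rng.normal(25, 10, 5000)           # target grades
    ys = (xs > 70).astype(int)              # studies well in the source domain
    yt = (xt < 30).astype(int)              # mirrored rule in the target domain

    xs_flipped = 100 - xs                   # the hint: x_s -> 100 - x_s

    print("mean(x_t) =", xt.mean(), " mean(100 - x_s) =", xs_flipped.mean())
    # After the flip the label rules also coincide: x_s > 70 <=> 100 - x_s < 30,
    # so Pr(y | transformed x_s) matches Pr(y | x_t).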
Use MMD as a distance measure between distributions
Maximum Mean Discrepancy (MMD) (Gretton et al., 2012): a statistic that measures the distance between two distributions.
\[
\mathrm{MMD}(x_s, x_t) = \Big\| \frac{1}{m}\sum_{i=1}^{m} K(x_t^i, \cdot) - \frac{1}{n}\sum_{i=1}^{n} K(x_s^i, \cdot) \Big\|_{\mathcal{H}} \tag{1}
\]
The objective function of our estimation problem (minimal MMD).
\[
\min_{\lambda \in \Omega_\lambda} \min_{\beta \in \Omega_\beta} \Big\| \frac{1}{m}\sum_{i=1}^{m} K(g(x_t^i, \beta), \cdot) - \frac{1}{n}\sum_{i=1}^{n} K(h(x_s^i, \lambda), \cdot) \Big\|_{\mathcal{H}} \tag{2}
\]
Our hypothesis and how it differs from MMD
H0: there exist λ and β such that Pr(g(x_t, β)) = Pr(h(x_s, λ)).
HA: no λ and β exist such that Pr(g(x_t, β)) = Pr(h(x_s, λ)).
Figure: Density functions of two distributions, N(0,1) and N(1,1). MMD rejects this case, but ours does not.
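To see why, a small continuation of the sketch above (reusing the mmd helper and the x, y samples; the location-shift model is an illustrative choice of g): the raw MMD between N(0,1) and N(1,1) samples is large, but minimizing over a shift β in g(x_t, β) = x_t + β drives it to roughly zero, so H0 is not rejected.

    betas = np.linspace(-3, 3, 121)
    scores = [mmd(y + b, x) for b in betas]    # shift the target toward the source
    best = betas[int(np.argmin(scores))]
    print("raw MMD:", mmd(y, x))
    print("best shift:", best, "-> minimized MMD:", min(scores))  # ~0 near beta = -1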
CSF: Same participant’s values may change across batches
Dataset details
12 CSF protein levels of 701 subjects were collected in two different batches (measured at two different time points): 413 subjects in batch 1 and 288 in batch 2.
The panel includes sAppα, sAppβ, 1-38-Tr, 1-40-Tr, 1-42-Tr, MCP-1, YKL40, NFL, Ab-42, hTau, PTau, and Neurogranin.
A subset of 85 individuals have data from both batches; the others are available in only one batch.
A linear standardization transformation between the two batches serves as a ‘gold’ standard.
Our algorithm does not use information about corresponding samples; it compares the two batches’ distributions directly.
Calculate the difference between batch 2 and transformed batch 1
We transform the batch 1 data of those individuals who have data in both batches.
We then calculate the ℓ1 relative error between the transformed batch 1 data and the batch 2 data of those individuals (a minimal sketch of this metric follows the figure).
Figure: Mean relative ℓ1 error for each of the 12 proteins under four methods: All (ours), Subset (ours), None, and the gold standard.
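The metric itself is simple; a minimal sketch (the array names and the per-protein linear map are hypothetical placeholders, not the talk's fitted transformations):

    import numpy as np

    def mean_relative_l1_error(batch1_transformed, batch2):
        # Per-protein mean of |transformed batch-1 value - batch-2 value| / |batch-2 value|
        rel = np.abs(batch1_transformed - batch2) / np.abs(batch2)
        return rel.mean(axis=0)                 # one value per protein column

    # Hypothetical paired data: 85 subjects x 12 proteins.
    rng = np.random.default_rng(0)
    batch2 = rng.lognormal(6, 0.5, (85, 12))
    slope, intercept = 1.1, 5.0                 # stand-in linear map, same for all proteins
    batch1 = (batch2 - intercept) / slope + rng.normal(0, 5, (85, 12))
    print(mean_relative_l1_error(slope * batch1 + intercept, batch2))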
Predict Hippocampal Volumes based on transformed CSF
We used the ‘transformed’ CSF data from the two batches and performed a multiple regression to predict the left/right Hippocampal Volume.
Performance is measured by the correlation between the predicted and actual Hippocampal Volume; 10-fold cross-validation is used to form the training and testing datasets.
Model           Left          Right
gold standard   0.46 ± 0.15   0.37 ± 0.16
Subset (ours)   0.48 ± 0.15   0.39 ± 0.15
All (ours)      0.48 ± 0.15   0.40 ± 0.15
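A minimal sketch of this evaluation loop (with synthetic stand-in data; in the talk the features are the 12 harmonized protein levels):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(701, 12))                        # stand-in harmonized CSF features
    y = X @ rng.normal(size=12) + rng.normal(0, 3, 701)   # stand-in hippocampal volume

    corrs = []
    for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        pred = LinearRegression().fit(X[train], y[train]).predict(X[test])
        corrs.append(np.corrcoef(pred, y[test])[0, 1])
    print(f"{np.mean(corrs):.2f} +/- {np.std(corrs):.2f}")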
Plot of transformed batch 1 & 2 data for all participants
Figure: Batch 2 values plotted against transformed batch 1 values for each of the 12 proteins (sAppα, sAppβ, 1-38-Tr, 1-40-Tr, 1-42-Tr, MCP-1, YKL40, NFL, Ab1-42, hTau, PTau, Neurogranin), with ‘Paired’ and ‘All’ participants marked separately.
Plot of transformed batch 1 & 2 data for participants present in both batches
Figure: 1-38-Tr, batch 2 vs. batch 1 values for the paired participants, under the ‘all’ transformation (left) and ‘none’ (right).
Plot of transformed batch 1 & 2 data for participants present in both batches
Figure: MCP-1, batch 2 vs. batch 1 values for the paired participants, under the ‘all’ transformation (left) and ‘none’ (right).
Plot of transformed batch 1 & 2 data for participants present in both batches
Figure: NFL, batch 2 vs. batch 1 values for the paired participants, under the ‘all’ transformation (left) and ‘none’ (right).
Recap of our method
The objective function of our estimation problem (minimal MMD).
\[
\mathcal{M}(\lambda, \beta) = \min_{\lambda \in \Omega_\lambda} \min_{\beta \in \Omega_\beta} \Big\| \frac{1}{m}\sum_{i=1}^{m} K(g(x_t^i, \beta), \cdot) - \frac{1}{n}\sum_{i=1}^{n} K(h(x_s^i, \lambda), \cdot) \Big\|_{\mathcal{H}} \tag{3}
\]
The hypothesis test:
H0: there exist λ and β such that Pr(g(x_t, β)) = Pr(h(x_s, λ)).
HA: no λ and β exist such that Pr(g(x_t, β)) = Pr(h(x_s, λ)).
Assumptions
(A1) \|K(h(x_s, \lambda_1), \cdot) - K(h(x_s, \lambda_2), \cdot)\| \le L_h \, d(\lambda_1, \lambda_2)^{r_h} \quad \forall x_s;\ \lambda_1, \lambda_2 \in \Omega_\lambda
(A2) \|K(g(x_t, \beta_1), \cdot) - K(g(x_t, \beta_2), \cdot)\| \le L_g \, d(\beta_1, \beta_2)^{r_g} \quad \forall x_t;\ \beta_1, \beta_2 \in \Omega_\beta
Hypothesis Testing Consistency
Theorem (Hypothesis Testing)
(a) Whenever H0 is true, with probability at least 1 − α,
\[
0 \le \mathcal{M}(\lambda, \beta) \le \sqrt{\frac{2K(m+n)\log \alpha^{-1}}{mn}} + \frac{2\sqrt{K}}{\sqrt{n}} + \frac{2\sqrt{K}}{\sqrt{m}} \tag{4}
\]
(b) Whenever HA is true, with probability at least 1 − ε,
\[
-\frac{\sqrt{K}}{\sqrt{n}}\Big(4 + \sqrt{C(h,\varepsilon)} + \frac{d_\lambda}{2 r_h}\log n\Big) - \frac{\sqrt{K}}{\sqrt{m}}\Big(4 + \sqrt{C(g,\varepsilon)} + \frac{d_\beta}{2 r_g}\log m\Big) \le \mathcal{M}(\lambda, \beta) - \mathcal{M}^*(\lambda_A, \beta_A) \le \sqrt{\frac{2K(m+n)\log \varepsilon^{-1}}{mn}} + \frac{2\sqrt{K}}{\sqrt{n}} + \frac{2\sqrt{K}}{\sqrt{m}} \tag{5}
\]
where C(h, \varepsilon) = \log(2|\Omega_\lambda|) + \log \varepsilon^{-1} + \frac{d_\lambda}{r_h}\log\frac{L_h}{\sqrt{K}}, and C(g, \varepsilon) is defined analogously.
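Part (a) gives a concrete decision rule; a minimal sketch (K here is an assumed upper bound on the kernel values, e.g. K = 1 for a Gaussian kernel): accept H0 at level α when the minimized MMD falls below the right-hand side of (4).

    import numpy as np

    def h0_threshold(m, n, alpha, K=1.0):
        # Right-hand side of bound (4): the acceptance threshold under H0.
        return (np.sqrt(2 * K * (m + n) * np.log(1 / alpha) / (m * n))
                + 2 * np.sqrt(K) / np.sqrt(n) + 2 * np.sqrt(K) / np.sqrt(m))

    # e.g., accept H0 if the minimized MMD M(lambda, beta) <= h0_threshold(288, 413, 0.05)
    print(h0_threshold(288, 413, 0.05))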
Convergence Consistency
Theorem (MMD Convergence)
Under H0
\[
\big\| \mathbb{E}_{x_s} K(h(x_s, \lambda), \cdot) - \mathbb{E}_{x_t} K(g(x_t, \beta), \cdot) \big\|_{\mathcal{H}} \to 0
\]
at rate \min\big(\sqrt{\log n}/\sqrt{n},\ \sqrt{\log m}/\sqrt{m}\big).
Theorem (Consistency)
Under H0, the estimators λ and β are consistent.
Simulation for test power
Left panel: x_s is indicated by the legend, x_t ∼ N(10, 4), and the model is x_t = λ_1 x_s + λ_2.
Right panel: x_s ∼ N(0, 1), x_t ∼ N(10, 4), and the model is indicated by the legend (a sketch of one simulation cell follows the figure).
Figure: Acceptance rate vs. sample size (log2 scale). Left, ‘Normal target vs. different sources’: Normal(0,1), Laplace(0,1), Exponential(1). Right, ‘Models linear in parameters’: a·x² + b·x + c and a·log(|x|) + b.
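A compact sketch of one cell of this experiment (grid search over (λ1, λ2) is an illustrative stand-in for the talk's optimizer; reuses the mmd helper and the h0_threshold function from earlier sketches):

    # One power-simulation cell: does the test accept H0 for a correctly
    # specified linear model? (N(10, 4) is read as variance 4, i.e. sd 2.)
    rng = np.random.default_rng(1)
    n = 256
    xs = rng.normal(0, 1, (n, 1))
    xt = rng.normal(10, 2, (n, 1))

    # Fit x_t = l1 * x_s + l2 by grid search on the empirical MMD.
    grid1 = np.linspace(0.5, 4, 36)      # includes the true slope 2
    grid2 = np.linspace(5, 15, 41)       # includes the true intercept 10
    best = min((mmd(l1 * xs + l2, xt), l1, l2) for l1 in grid1 for l2 in grid2)
    print("min MMD:", best[0], "at lambda1, lambda2 =", best[1], best[2])
    # The acceptance rate is the fraction of replications whose minimized MMD
    # falls below the bound-(4) threshold, e.g. h0_threshold(n, n, 0.05).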
Simulation for estimation error
x_s ∼ N(0, 1), x_t ∼ N(10, 4), and the model is x_t = λ_1 × x_s + λ_2.
Matching the two distributions forces λ_1² = 4 (variances) and λ_2 = 10 (means), so the true parameters are λ_1 = 2 and λ_2 = 10; the ℓ1 error is |λ_1 − 2| for the slope curve and |λ_2 − 10| for the intercept curve.
Figure: ℓ1 estimation error vs. sample size (log2 scale) for the slope and intercept estimates (normal vs. normal).
An Ellipsoid Constraint
Theorem (Linear transformation)
Under H0, with g(·) the identity and h(x_s, λ) = φ(x_s)^T λ, define
\[
\Omega_\lambda := \Big\{ \lambda : \frac{1}{n}\sum_{i=1}^{n} \big\| x_t^i - \phi(x_s^i)^T \lambda \big\|^2 \le 3 \sum_{k=1}^{p} \mathrm{Var}(x_{t,k}) + \epsilon \Big\}.
\]
For any ε, α > 0 and sufficiently large sample size, a neighborhood of λ_0 is contained in Ω_λ with probability at least 1 − α.
Here the subscript k in x_{t,k} denotes the k-th dimensional feature of x_t.
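A minimal sketch of checking membership in this constraint set (the feature map φ and the data are illustrative placeholders):

    import numpy as np

    def in_ellipsoid(lam, xt, phi_xs, eps=0.1):
        # lam: (q, p) coefficients; phi_xs: (n, q) features; xt: (n, p) targets.
        resid = ((xt - phi_xs @ lam) ** 2).sum(axis=1).mean()
        bound = 3 * xt.var(axis=0).sum() + eps
        return resid <= bound

    rng = np.random.default_rng(0)
    xs = rng.normal(0, 1, (100, 1))
    phi_xs = np.hstack([xs, np.ones((100, 1))])     # phi(x) = (x, 1): an affine map
    xt = phi_xs @ np.array([[2.0], [10.0]]) + rng.normal(0, 0.3, (100, 1))
    print(in_ellipsoid(np.array([[2.0], [10.0]]), xt, phi_xs))   # True: near lambda_0
    print(in_ellipsoid(np.array([[0.0], [0.0]]), xt, phi_xs))    # False: residual too large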
Signomial Geometric Programming (SGP)
Monomial: \exp(a^T y + b)
Posynomial: \sum_{k=1}^{K_0} \exp(a_{0k}^T y + b_{0k})
Signomial Geometric Programming:
\[
\min_y \ \sum_{k=1}^{K_0} \exp(a_{0k}^T y + b_{0k}) - \sum_{l=1}^{L_0} \exp(c_{0l}^T y + d_{0l}) \tag{6}
\]
\[
\text{s.t.} \ \sum_{k=1}^{K_i} \exp(a_{ik}^T y + b_{ik}) - \sum_{l=1}^{L_i} \exp(c_{il}^T y + d_{il}) \le 0 \tag{7}
\]
Idea
min f(x) ⇔ sup γ s.t. f(x) − γ ≥ 0 for all x; relax the resulting “nonnegative signomial” constraint.
This yields a series of convex problems that give increasingly tight bounds (a toy instance follows).
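As a toy instance of the SGP form (6)-(7), a minimal sketch: a 1-D signomial solved with a local SLSQP solver, which is an off-the-shelf stand-in rather than the convex-relaxation series described above.

    import numpy as np
    from scipy.optimize import minimize

    # Objective (6): exp(y) + exp(-y) - exp(0.5 y), a posynomial minus a monomial.
    f = lambda y: np.exp(y[0]) + np.exp(-y[0]) - np.exp(0.5 * y[0])
    # Constraint (7): exp(2y) - exp(y + 1) <= 0, i.e. y <= 1.
    g = lambda y: np.exp(y[0] + 1) - np.exp(2 * y[0])   # SLSQP expects g(y) >= 0

    res = minimize(f, x0=[0.0], method="SLSQP",
                   constraints=[{"type": "ineq", "fun": g}])
    print(res.x, res.fun)   # a local minimizer; certified bounds need the relaxations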
Conclusions
A statistical framework to harmonize CSF measurements across batches/sites
Assumption: the same “concept” is captured across sites
Constructions for hypothesis tests
Participants don’t need to be represented twice in different batches for calibration
The End, Thank You