
Source: pages.stat.wisc.edu/~newton/talks/cdc.pdf (2014-08-28)

Putting lots of things in order: r-values for ranking in large-scale inference

Michael Newton, Nick Henderson

Statistics Day, CDC


a general, unresolved statistics problem

[Figure: units on a scale from −4 to 8; error-free measurement]

[Figure: units on a scale from −4 to 8; measurement]

[Figure: units on a scale from −4 to 8; measurement, estimate +/- 2 SE]

[Figure: multivariate; scale from −4 to 8]

[Figure: multivariate; units in order of increasing parameter: what we want]

[Figure: multivariate; units in order of increasing estimate: what we get]

[Figure: large scale; units in order of increasing estimate]

• regression effect
• variance effect

[Figure: standard error versus rank of point estimate (from bottom), ranks 950 to 1000]

outline

• large-scale
• not sparse
• ranking/sorting/prioritizing
• variance artifacts
• agreement
• empirical Bayes
• r-values

Example: Type 2 Diabetes (T2D) GWAS (Morris et al. 2012, Nat Gen)

• case/control (22,669 / 58,119)
• lots of T2D-associated loci, but of small effect (3371 SNPs shown)
• how to rank order?

[Figure: log odds ratio (0.05 to 0.35) versus standard error (0.01 to 0.10, log scale) for 3371 SNPs]

log odds ratio $= \log \dfrac{\mathrm{odds}(\mathrm{T2D} \mid A)}{\mathrm{odds}(\mathrm{T2D} \mid A^c)}$
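As a small illustration of this definition, the sketch below computes an unadjusted log odds ratio and its Woolf (delta-method) standard error from a 2×2 table of case status by presence of A. The counts and the name `log_odds_ratio` are hypothetical; GWAS estimates such as those plotted are typically obtained from covariate-adjusted logistic regression, so this is only the simplest version of the quantity.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and Woolf standard error from a 2x2 table.

    Hypothetical layout (an assumption for illustration):
        a = cases with A,     b = controls with A,
        c = cases without A,  d = controls without A.
    """
    # odds(T2D | A) / odds(T2D | A^c) = (a / b) / (c / d) = (a * d) / (b * c)
    est = math.log((a * d) / (b * c))
    # Woolf standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, se


# Hypothetical counts, not data from Morris et al.
est, se = log_odds_ratio(120, 100, 80, 100)
print(f"log OR = {est:.3f}, SE = {se:.3f}")  # prints: log OR = 0.405, SE = 0.202
```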


Example: gene-set enrichment, RNAi (Hao et al. 2013, PLoS Comp Bio)

• 984 human genes linked to influenza-virus replication
• functional content measured against Gene Ontology (5719 sets)
• how to rank order?

[Figure: proportion of set detected by RNAi versus set size N (10 to 1000)]

Example

• 461 NBA players (2013-2014)
• free throw percentage
• how to rank order?

[Figure: free throw percentage versus number of free throw attempts (1 to 1000)]
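To see why ranking by the raw percentage is problematic, here is a minimal sketch with hypothetical players (not the 2013-2014 data). The free throw percentage is the MLE of a player's success probability, and its binomial standard error depends strongly on the number of attempts; `ft_estimate` is an illustrative helper, not code from the talk.

```python
import math


def ft_estimate(made: int, attempts: int) -> tuple[float, float]:
    """Free throw percentage (MLE of the success probability) and its
    plug-in binomial standard error sqrt(p_hat * (1 - p_hat) / attempts).

    Note: the plug-in SE is 0 when p_hat is 0 or 1, which badly understates
    the uncertainty for players with very few attempts.
    """
    p_hat = made / attempts
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / attempts)


# Hypothetical players as (made, attempts); not the 2013-2014 NBA data
players = {"A": (1, 1), "B": (45, 50), "C": (410, 460)}

# Ranking by the MLE alone puts the 1-for-1 player first
by_mle = sorted(players, key=lambda name: ft_estimate(*players[name])[0], reverse=True)
for name in by_mle:
    p_hat, se = ft_estimate(*players[name])
    print(f"{name}: {p_hat:.3f} (SE {se:.3f}, {players[name][1]} attempts)")
```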

simulation

units $i = 1, 2, \ldots, B$

• signals: $\theta_i \sim f = N(0, 1)$
• noise levels: $\sigma_i^2 \sim g = \mathrm{Gam}(a, b)$
• measured signals: $X_i \mid \theta_i, \sigma_i^2 \sim N(\theta_i, \sigma_i^2)$
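A minimal NumPy sketch of this simulation model. Two choices are assumptions, not taken from the slides: Gam(a, b) is read as shape a and rate b (so the mean noise level is a/b), and the values a = b = 2 and B = 10,000 are arbitrary illustrative settings.

```python
import numpy as np


def simulate(B: int, a: float, b: float, seed: int = 0):
    """Draw (theta_i, sigma2_i, X_i) for units i = 1, ..., B.

    Assumption: Gam(a, b) means shape a and rate b; NumPy's gamma sampler
    takes a scale, so we pass scale = 1 / b.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=B)                # signals, f = N(0, 1)
    sigma2 = rng.gamma(shape=a, scale=1.0 / b, size=B)  # noise levels, g = Gam(a, b)
    x = rng.normal(theta, np.sqrt(sigma2))              # X_i ~ N(theta_i, sigma2_i)
    return theta, sigma2, x


theta, sigma2, x = simulate(B=10_000, a=2.0, b=2.0)
```

Marginally, $X_i$ has variance $1 + E(\sigma_i^2)$, which is about 2 under these settings; the test below checks that.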

simulation

data pairs: $(X_1, \sigma_1^2), (X_2, \sigma_2^2), \ldots, (X_B, \sigma_B^2)$

ranking statistic: $R_1, R_2, \ldots, R_B$

aim: highly rank units with largest signals
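One simple way to score this aim is the overlap between the top-α set under a ranking statistic $R$ and the top-α set under the true signals $\theta$. The helper `top_set_agreement` below is hypothetical and only illustrates the idea; the agreement criterion used in the r-value work may be defined differently.

```python
import numpy as np


def top_set_agreement(R, theta, alpha: float) -> float:
    """Fraction of the top-alpha units by R that are also top-alpha by theta.

    Illustrative score for how well a ranking statistic R_1, ..., R_B
    places the units with the largest signals theta_i at the top.
    """
    R = np.asarray(R)
    theta = np.asarray(theta)
    k = max(1, int(round(alpha * len(R))))
    top_by_R = set(np.argsort(-R)[:k].tolist())        # indices of the k largest R
    top_by_theta = set(np.argsort(-theta)[:k].tolist())
    return len(top_by_R & top_by_theta) / k
```

A perfect ranking scores 1 and a fully reversed one scores 0; with simulated data, any statistic can be plugged in as `R` and compared against the simulated `theta`.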


p-value

Lead units by p-value are enriched for those with small variance.

[Figure: density of $\sigma_i$ (0 to 3): $p(\sigma_i \mid \text{p-value}_i \le p_{0.1})$ compared with $p(\sigma_i)$]

Same for q-value!
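This enrichment can be checked by simulation under the model on the simulation slide. The sketch assumes one-sided p-values for $H_0: \theta_i = 0$, which order units exactly as $z_i = X_i/\sigma_i$ does, so no p-value function is needed; the gamma settings for the noise levels are illustrative assumptions. Because q-values are monotone in p-values, the lead units are essentially the same under q-values.

```python
import numpy as np

rng = np.random.default_rng(1)
B = 20_000
theta = rng.normal(0.0, 1.0, B)                    # signals, f = N(0, 1)
sigma2 = rng.gamma(shape=2.0, scale=0.5, size=B)   # noise levels; Gam(2, rate 2) is an assumption
sigma = np.sqrt(sigma2)
x = rng.normal(theta, sigma)                       # measured signals

# The one-sided p-value 1 - Phi(X_i / sigma_i) decreases in z_i = X_i / sigma_i,
# so the smallest p-values belong to the largest z_i.
z = x / sigma

# Lead units: the 10% of units with the smallest p-values (largest z)
lead = z >= np.quantile(z, 0.90)

print(f"mean sigma, all units:  {sigma.mean():.3f}")
print(f"mean sigma, lead units: {sigma[lead].mean():.3f}")
```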

other approaches

rank by "local" maximum likelihood estimate:

• estimated log odds ratio
• proportion of gene set on gene list
• free throw percentage

local MLE

Lead units by MLE are enriched for those with large variance.

[Figure: density of $\sigma_i$ (0 to 3): $p(\sigma_i \mid X_i \ge x_{0.1})$ compared with $p(\sigma_i)$]

local PM

Lead units by posterior mean are enriched for those with small variance.

[Figure: density of $\sigma_i$ (0 to 3): $p\{\sigma_i \mid E(\theta_i \mid X_i, \sigma_i) \ge e_{0.1}\}$ compared with $p(\sigma_i)$]
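A simulation under the model shows both enrichments, ranking by the local MLE $X_i$ and by the posterior mean $X_i/(\sigma_i^2 + 1)$, which is the posterior mean of $\theta_i$ when $f = N(0, 1)$. The gamma settings are illustrative assumptions. The intuition: $\mathrm{Var}(X_i \mid \sigma_i) = 1 + \sigma_i^2$ grows with the noise level, while the posterior mean has variance $1/(1 + \sigma_i^2)$, which shrinks with it.

```python
import numpy as np

rng = np.random.default_rng(2)
B = 20_000
theta = rng.normal(0.0, 1.0, B)                    # signals, f = N(0, 1)
sigma2 = rng.gamma(shape=2.0, scale=0.5, size=B)   # noise levels (assumed Gam(2, rate 2))
sigma = np.sqrt(sigma2)
x = rng.normal(theta, sigma)                       # measured signals


def lead_mean_sigma(stat, frac=0.10):
    """Mean sigma among the top `frac` of units when ranked by `stat`."""
    lead = stat >= np.quantile(stat, 1.0 - frac)
    return sigma[lead].mean()


mle = x                    # local MLE of theta_i from a single N(theta_i, sigma2_i) draw
pm = x / (sigma2 + 1.0)    # posterior mean E(theta_i | X_i, sigma2_i) when f = N(0, 1)

print(f"mean sigma, all units:   {sigma.mean():.3f}")
print(f"mean sigma, MLE leaders: {lead_mean_sigma(mle):.3f}")
print(f"mean sigma, PM leaders:  {lead_mean_sigma(pm):.3f}")
```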

Problems and solutions

We've found a generic empirical Bayes ranking/selection method.

Improved ranking and selection 13

Hall, P. and H. Miller (2010). Modeling the variability of rankings. The Annals of Statistics 38 (5), 2652–2677.

Hao, L., Q. He, Z. Wang, M. Craven, M. A. Newton, and P. Ahlquist (2013). Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Computational Biology 9 (9), e1003235.

Jost, J. and X. Li-Jost (1998). Calculus of Variations, Volume 64. Cambridge University Press.

Kass, R. E. and A. E. Raftery (1995). Bayes factors. Journal of the American Statistical Association 90 (430), 773–795.

Kendziorski, C., M. Newton, H. Lan, and M. Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22 (24), 3899–3914.

Laird, N. M. and T. A. Louis (1989). Empirical Bayes ranking methods. Journal of Educational and Behavioral Statistics 14 (1), 29–46.

Lehmann, E. (1986). Testing Statistical Hypotheses (2nd ed.). Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley.

Leng, N., J. A. Dawson, J. A. Thomson, V. Ruotti, A. I. Rissman, B. M. Smits, J. D. Haag, M. N. Gould, R. M. Stewart, and C. Kendziorski (2013). EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29 (8), 1035–1043.

Lin, R., T. A. Louis, S. M. Paddock, and G. Ridgeway (2006). Loss function based ranking in two-stage, hierarchical models. Bayesian Analysis 1 (4), 915–946.

McCarthy, D. J. and G. K. Smyth (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25 (6), 765–771.

Morris, A. P., B. F. Voight, T. M. Teslovich, T. Ferreira, A. V. Segre, V. Steinthorsdottir, R. J. Strawbridge, H. Khan, H. Grallert, A. Mahajan, et al. (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 44 (9), 981–990.

Niemi, J. (2010). Evaluating individual player contributions in basketball. In JSM Proceedings, Statistical Computing Section, Alexandria, VA, pp. 4914–4923. American Statistical Association.

Noma, H., S. Matsui, T. Omori, and T. Sato (2010). Bayesian ranking and selection methods using hierarchical mixture models in microarray studies. Biostatistics 11 (2), 281–289.

Normand, S.-L. T., M. E. Glickman, and C. A. Gatsonis (1997). Statistical methods for profiling providers of medical care: issues and applications. Journal of the American Statistical Association 92 (439), 803–814.

Paddock, S. M. and T. A. Louis (2011). Percentile-based empirical distribution function estimates for performance evaluation of healthcare providers. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60 (4), 575–589.

12 Henderson and Newton

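5.3. Theorem 3

By continuity, $V_\alpha\{t^*_\alpha(\sigma^2), \sigma^2\} = \lambda_\alpha$. WLOG assume that $\alpha_1 < \alpha_2$, and note that by definition as cumulative probabilities, $V_{\alpha_1}(x, \sigma^2) \le V_{\alpha_2}(x, \sigma^2)$ for any arguments. If, contrary to the assertion of the theorem, $t^*_{\alpha_1}(\sigma^2) = t^*_{\alpha_2}(\sigma^2) = t$ at some $\sigma^2$, then

$$V_{\alpha_2}(t, \sigma^2) - V_{\alpha_1}(t, \sigma^2) = \lambda_{\alpha_2} - \lambda_{\alpha_1}.$$

However, from the assumed condition, we have

$$V_{\alpha_2}(t, \sigma^2) - V_{\alpha_1}(t, \sigma^2) = \int_{\alpha_1}^{\alpha_2} \frac{\partial V_\alpha(t, \sigma^2)}{\partial \alpha}\, d\alpha > \int_{\alpha_1}^{\alpha_2} \frac{d\lambda_\alpha}{d\alpha}\, d\alpha = \lambda_{\alpha_2} - \lambda_{\alpha_1},$$

and thus a contradiction.

Remark: Since by definition $\partial V_\alpha(x, \sigma^2)/\partial \alpha > 0$, the condition $\partial V_\alpha(x, \sigma^2)/\partial \alpha > d\lambda_\alpha/d\alpha$ is automatically satisfied whenever $\lambda_\alpha$ is decreasing in $\alpha$.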

Acknowledgements

This research was supported in part by two grants from the US National Institutes of Health: R21 HG006568 and T32 GM074904. The authors thank Christina Kendziorski for critical comments on an earlier draft. Additional details on data analyses, threshold functions, and computation are provided in a Supplementary Material document. The R package rvalues is available at http://www.stat.wisc.edu/~newton/.

References

Berger, J. O. and J. Deely (1988). A Bayesian approach to ranking and selection of related means with alternatives to analysis-of-variance methodology. Journal of the American Statistical Association 83 (402), 364–373.

Brijs, T., D. Karlis, F. Van den Bossche, and G. Wets (2007). A Bayesian model for ranking hazardous road sites. Journal of the Royal Statistical Society: Series A (Statistics in Society) 170 (4), 1001–1017.

Coelho, C. A. and J. T. Mexia (2007). On the distribution of the product and ratio of independent generalized gamma-ratio random variables. Sankhya: The Indian Journal of Statistics, 221–255.

de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. Calus (2013). Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193 (2), 327–345.

Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Volume 1. Cambridge University Press.

Gelman, A., P. N. Price, et al. (1999). All maps of parameter estimates are misleading. Statistics in Medicine 18 (23), 3221–3234.

Gibbons, J. D., I. Olkin, and M. Sobel (1979). An introduction to ranking and selection. The American Statistician 33 (4), 185–195.


helpful observation

• a ranking method corresponds to a family of threshold functions: $T = \{t_\alpha : \alpha \in (0, 1)\}$
• each one is a function $t_\alpha(\sigma^2)$ [figure: a threshold curve in the ($\sigma^2$, $X$) plane]
• size constraint: $P\{X_i \ge t_\alpha(\sigma_i^2)\} = \alpha$ (marginal!!)
• unit $i$ ranked in top $\alpha$ if $X_i \ge t_\alpha(\sigma_i^2)$
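For threshold families of the form $t_\alpha(\sigma^2) = u_\alpha\, s(\sigma^2)$ with $s > 0$ (MLE, p-value and posterior mean in Table 1 have this form), the marginal size constraint can be solved empirically: $X_i \ge u\, s(\sigma_i^2)$ exactly when $X_i / s(\sigma_i^2) \ge u$, so $u_\alpha$ is the upper-α quantile of $X_i / s(\sigma_i^2)$ taken over all units, not separately at each $\sigma^2$. The helper name `solve_u_alpha` and the simulation settings are assumptions for illustration.

```python
import numpy as np


def solve_u_alpha(x, sigma2, scale_fn, alpha):
    """Empirical solution of the marginal size constraint for t_alpha = u * s(sigma^2).

    Since s(sigma^2) > 0, X_i >= u * s(sigma2_i) exactly when X_i / s(sigma2_i) >= u,
    so u_alpha is the upper-alpha quantile of X_i / s(sigma2_i) over all units.
    """
    return np.quantile(x / scale_fn(sigma2), 1.0 - alpha)


# Illustrative data from the normal/normal model (parameter values are assumptions)
rng = np.random.default_rng(3)
B = 10_000
theta = rng.normal(0.0, 1.0, B)
sigma2 = rng.gamma(shape=2.0, scale=0.5, size=B)
x = rng.normal(theta, np.sqrt(sigma2))

alpha = 0.10
families = {
    "MLE": lambda s2: np.ones_like(s2),      # t_alpha = u_alpha
    "p-value": np.sqrt,                      # t_alpha = u_alpha * sigma
    "posterior mean": lambda s2: s2 + 1.0,   # t_alpha = u_alpha * (sigma^2 + 1)
}
for name, s in families.items():
    u = solve_u_alpha(x, sigma2, s, alpha)
    in_top = x >= u * s(sigma2)              # unit i ranked in the top alpha
    print(f"{name:15s} u_alpha = {u:6.3f}  fraction in top = {in_top.mean():.3f}")
```

Each family selects the same number of units, but different ones, which is why the lead sets differ in their noise levels.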

Page 29: Putting lots of things in order: r-values for ranking in large-scale …pages.stat.wisc.edu/~newton/talks/cdc.pdf · 2014-08-28 · Whole-genome regression and prediction methods

T2D example: rank by sweeping through the family

[Figure, four panels: a. MLE, b. p-value, c. posterior mean, d. maximal agreement; each plots X against σ²]

Page 30

f=N(0,1)

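14 Henderson and Newton

Table 1. Threshold functions associated with various ranking criteria, normal/normal model

| criterion | ranking variable | threshold function $t_\alpha(\sigma^2)$ |
|---|---|---|
| MLE | $X_i$ | $u_\alpha$ |
| PV, $H_0: \theta_i = 0$ | $X_i/\sigma_i$ | $u_\alpha \sigma$ |
| PV, $H_0: \theta_i = c$ | $(X_i - c)/\sigma_i$ | $c + u_\alpha \sigma$ |
| PM | $X_i/(\sigma_i^2 + 1)$ | $u_\alpha(\sigma^2 + 1)$ |
| PER | $P(\theta_i \ge \theta \mid X_i, \sigma_i^2)$ | $u_\alpha\sqrt{(\sigma^2 + 1)(2\sigma^2 + 1)}$ |
| BF | $1(X_i > 0)\,\dfrac{P(X_i \mid \sigma_i^2, \theta_i \neq 0)}{P(X_i \mid \sigma_i^2, \theta_i = 0)}$ | $\sqrt{\sigma^2(\sigma^2 + 1)\left[u_\alpha + \log\dfrac{\sigma^2 + 1}{\sigma^2}\right]}$ |
| max agreement | r-value | $\theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(\sigma^2 + 1)}$ |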

Pyeon, D., M. A. Newton, P. F. Lambert, J. A. Den Boon, S. Sengupta, C. J. Marsit, C. D. Woodworth, J. P. Connor, T. H. Haugen, E. M. Smith, et al. (2007). Fundamental differences in cell cycle deregulation in human papillomavirus–positive and human papillomavirus–negative head/neck and cervical cancers. Cancer Research 67 (10), 4605–4619.

Shen, W. and T. A. Louis (1998). Triple-goal estimates in two-stage hierarchical models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 (2), 455–471.

Smyth, G. K. et al. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3 (1), 3.

Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. The Annals of Statistics 31 (6), 2013–2035.

Wright, D. L., H. S. Stern, and N. Cressie (2003). Loss functions for estimation of extrema with an application to disease mapping. Canadian Journal of Statistics 31 (3), 251–266.

Xie, M., K. Singh, and C.-H. Zhang (2009). Confidence intervals for population ranks in the presence of ties and near ties. Journal of the American Statistical Association 104 (486), 775–788.

Page 31

nota bene

Even though the unit-level parameters are unobserved, their distribution may be well estimated.

Kiefer and Wolfowitz, 1956

We can estimate $\theta_\alpha$ such that:

$$P(\theta_i \ge \theta_\alpha) = \int_{\theta_\alpha}^{\infty} f(\theta)\, d\theta = \alpha$$
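One way to estimate f, and hence θ_α, from noisy unit-level data is the Kiefer-Wolfowitz nonparametric MLE. The sketch below is an illustration under stated assumptions, not the talk's implementation: it fixes a grid for θ, treats each σ_i² as known, runs a fixed number of EM iterations for the mixing weights, and reads off the upper α quantile. The data are simulated (θ ~ N(0,1), σ² = 0.05 + Exp(1)); the 0.05 offset is a hypothetical choice that keeps every likelihood away from numerical underflow. The function names are illustrative.

```python
import random
from statistics import NormalDist

def npmle_grid(x, sigma2, grid, n_iter=100):
    """EM approximation to the Kiefer-Wolfowitz NPMLE of the prior f on a fixed grid.

    Assumed model: X_i | theta_i ~ Normal(theta_i, sigma_i^2), theta_i ~ f,
    with sigma_i^2 treated as known. Returns mixing weights on `grid`.
    """
    m = len(grid)
    w = [1.0 / m] * m
    # lik[i][k]: density of X_i if theta_i were at grid point k
    lik = [[NormalDist(g, s2 ** 0.5).pdf(xi) for g in grid]
           for xi, s2 in zip(x, sigma2)]
    for _ in range(n_iter):
        acc = [0.0] * m
        for row in lik:
            denom = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(m):
                acc[k] += w[k] * row[k] / denom  # posterior mass of unit i at point k
        w = [a / len(x) for a in acc]
    return w

def upper_quantile(grid, w, alpha):
    """Largest grid point t with estimated P(theta_i >= t) >= alpha."""
    tail = 0.0
    for g, wk in sorted(zip(grid, w), reverse=True):
        tail += wk
        if tail >= alpha:
            return g
    return min(grid)

# Hypothetical data: theta ~ N(0, 1), sigma^2 = 0.05 + Exp(1), X ~ N(theta, sigma^2)
random.seed(1)
n = 300
theta = [random.gauss(0.0, 1.0) for _ in range(n)]
sigma2 = [0.05 + random.expovariate(1.0) for _ in range(n)]
x = [random.gauss(t, s2 ** 0.5) for t, s2 in zip(theta, sigma2)]

grid = [-4 + 0.2 * k for k in range(41)]
w = npmle_grid(x, sigma2, grid)
theta_10 = upper_quantile(grid, w, 0.10)
print(f"estimated theta_0.10 = {theta_10:.2f}; N(0,1) value = {NormalDist().inv_cdf(0.9):.2f}")
```

EM on a fixed grid is a common, simple approximation to the NPMLE; dedicated convex solvers converge faster but are beyond a short sketch.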

Page 32

Agreement

$$P\big(\underbrace{X_i \ge t_\alpha(\sigma_i^2)}_{\text{reported in top fraction}},\ \underbrace{\theta_i \ge \theta_\alpha}_{\text{truly in top fraction}}\big)$$
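Agreement can be estimated by simulation. A minimal sketch, assuming a N(0,1) prior for θ_i, σ_i² ~ Exp(1) and X_i ~ N(θ_i, σ_i²) (hypothetical choices): each ranking variable reports its top α fraction of units, and we count the units that are also truly above θ_α. A perfect ranking would attain agreement α. The helper name `agreement` is illustrative.

```python
import random
from statistics import NormalDist

def agreement(score, theta, alpha, theta_alpha):
    """Empirical P(unit in reported top-alpha list AND theta_i >= theta_alpha).

    The reported list is the top round(alpha * n) units by `score`, a stand-in
    for a threshold family that satisfies the size constraint.
    """
    n = len(theta)
    k = int(round(alpha * n))
    reported = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    return sum(theta[i] >= theta_alpha for i in reported) / n

# Hypothetical simulation: theta ~ N(0, 1), sigma^2 ~ Exp(1), X ~ N(theta, sigma^2)
random.seed(7)
n, alpha = 20000, 0.10
theta = [random.gauss(0.0, 1.0) for _ in range(n)]
sigma2 = [random.expovariate(1.0) for _ in range(n)]
x = [random.gauss(t, s2 ** 0.5) for t, s2 in zip(theta, sigma2)]
theta_alpha = NormalDist().inv_cdf(1 - alpha)   # true upper-alpha quantile of the prior

agree = {
    "MLE": agreement(x, theta, alpha, theta_alpha),
    "PM": agreement([xi / (s2 + 1) for xi, s2 in zip(x, sigma2)], theta, alpha, theta_alpha),
}
print(agree)   # each value is at most alpha = 0.10, the agreement of a perfect ranking
```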

Page 33

maximal agreement

Improved ranking and selection 3

$g(\sigma^2)$. The empirical Bayesian uses the full data set to estimate the prior distributions $f(\theta)$ and $g(\sigma^2)$; we ignore the estimation error at this level and focus on ranking units within the estimated population.

Relative to a single unit i, $X_i$ might be the maximum likelihood estimator of $\theta_i$, and $\sigma_i$ that estimator's standard error. The independence assumption may be reasonable if some care has been taken in this local analysis, for example, by a variance-stabilizing transformation. Typically, the variance $\sigma_i^2$ is estimated rather than known exactly, and we examine this case in Section 5. There are a number of important examples where $X_i$ is either discrete or multivariate, and we also take up these extensions in Section 5. We consider first the continuous model, involving prior distributions and sampling distributions all having densities with respect to Lebesgue measure. **some regularity** The canonical sampling model within this class has $X_i \mid \theta_i, \sigma_i^2 \sim \text{Normal}(\theta_i, \sigma_i^2)$.

We make some headway by associating each ranking/selection procedure with a family $\mathcal{T}$ of thresholding functions, $\mathcal{T} = \{t_\alpha : \alpha \in (0, 1)\}$. Each $t_\alpha$ is a function $t_\alpha(\sigma^2)$ having the interpretation that unit i is reported to be in the top α fraction of units if and only if $X_i \ge t_\alpha(\sigma_i^2)$. This interpretation is supported by the size constraint, namely that, marginal to all parameters and data,

$$P\big(X_i \ge t_\alpha(\sigma_i^2)\big) = \alpha \quad \text{for all } \alpha \in (0, 1). \qquad (1)$$

Table 1 reports threshold functions associated with a variety of ranking methods in the normal observation model, under the extra condition that the prior $f(\theta)$ is Normal$(\mu, \tau^2)$. Figure 2 illustrates four of these families in the T2D case study. Notionally, the linear ranking of units is obtained by sweeping through the family $\mathcal{T}$, beginning with the smallest α at the top of the graph. Distinct families of threshold functions can produce distinct rankings of the units; the family's shape reveals how it trades off observed signal $X_i$ against measurement variance $\sigma_i^2$ to prioritize the leading units.
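To see how the families in Table 1 trade signal against variance, rank three hypothetical units by the MLE, p-value ($H_0: \theta_i = 0$) and posterior-mean ranking variables on the standardized scale (prior variance 1). The unit labels and values are made up for illustration; each criterion produces a different order.

```python
from math import sqrt

# Three hypothetical units: (X_i, sigma_i^2)
units = {"A": (3.0, 4.0), "B": (2.0, 0.25), "C": (1.5, 0.01)}

# Ranking variables from Table 1 (standardized normal/normal model)
ranking_vars = {
    "MLE": lambda x, s2: x,
    "PV": lambda x, s2: x / sqrt(s2),     # z-statistic for H0: theta_i = 0
    "PM": lambda x, s2: x / (s2 + 1.0),   # posterior mean under a N(0, 1) prior
}

# Larger ranking variable = higher rank
orders = {
    name: sorted(units, key=lambda u: -f(*units[u]))
    for name, f in ranking_vars.items()
}
for name, order in orders.items():
    print(name, order)
# MLE ['A', 'B', 'C']
# PV ['C', 'B', 'A']
# PM ['B', 'C', 'A']
```

The MLE favors the large but noisy estimate A, the z-statistic favors the very precise unit C, and the posterior mean, which shrinks $X_i$ by the factor $1/(\sigma_i^2 + 1)$, puts B first.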

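2.2. Thresholds via direct optimization

Table 1 and Figure 2 introduce a family $\mathcal{T}^* = \{t^*_\alpha\}$ that is optimal in the continuous model in the sense that, for all α ∈ (0, 1),

$$P\big(X_i \ge t^*_\alpha(\sigma_i^2),\ \theta_i \ge \theta_\alpha\big) \ \ge\ P\big(X_i \ge t_\alpha(\sigma_i^2),\ \theta_i \ge \theta_\alpha\big) \qquad (2)$$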

for any other family $\mathcal{T} = \{t_\alpha\}$ that also satisfies the size constraint (1). Here $\theta_\alpha$ is the α upper quantile of the prior; that is, $P(\theta_i \ge \theta_\alpha) = \alpha$. In other words, $\mathcal{T}^*$ maximizes agreement: the joint probability that unit i is placed in the top α fraction and its driving parameter $\theta_i$ is in the top α fraction of the population, for all α. We emphasize that the probabilities in (2) cover the joint distribution of $(X_i, \sigma_i^2, \theta_i)$, which respects both the sampling distribution of data local to unit i and the fluctuations of unit-specific parameters. A calculus-of-variations argument provides direct optimization of the joint probability in (2), subject to the size constraint, model regularity, and smoothness of the threshold functions.

Theorem 1. In the continuous model, a necessary condition for the function $t^*_\alpha$ to be optimal as in (2), within the class of continuously differentiable threshold functions, is that it satisfies

$$P\big(\theta_i \ge \theta_\alpha \mid X_i = t^*_\alpha(\sigma^2),\ \sigma_i^2 = \sigma^2\big) = c_\alpha \quad \text{for all } \sigma^2. \qquad (3)$$

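Theorem 1: Under certain smoothness conditions, $\{t^*_\alpha\}$ is optimal only if, for all σ², $P\big(\theta_i \ge \theta_\alpha \mid X_i = t^*_\alpha(\sigma^2), \sigma_i^2 = \sigma^2\big) = c_\alpha$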

Page 34

maximal agreement

can solve directly if f=N(0,1)

14 Henderson and NewtonTable 1. Threshold functions associated with various ranking criteria, nor-mal/normal modelcriteria ranking variable threshold function t!(!

2)MLE Xi u!

PV H0 : "i = 0 Xi/!i u!!PV H0 : "i = c (Xi ! c)/!i c+ u!!PM Xi/(!

2i + 1) u!(!

2 + 1)

PER P ("i " "|Xi,!2i ) u!

!

(!2 + 1)(2!2 + 1)

BF 1(Xi > 0)P (Xi|"

2

i,#i !=0)

P (Xi|"2

i,#i=0)

"

!2(!2 + 1)#

u! + log ("2+1)"2

$

max agreement r-value "!(!2 + 1)! u!

!

!2(!2 + 1)

Pyeon, D., M. A. Newton, P. F. Lambert, J. A. Den Boon, S. Sengupta, C. J. Marsit,C. D. Woodworth, J. P. Connor, T. H. Haugen, E. M. Smith, et al. (2007). Fundamen-tal di!erences in cell cycle deregulation in human papillomavirus–positive and humanpapillomavirus–negative head/neck and cervical cancers. Cancer research 67 (10), 4605–4619.

Shen, W. and T. A. Louis (1998). Triple-goal estimates in two-stage hierarchical models.Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 (2), 455–471.

Smyth, G. K. et al. (2004). Linear models and empirical bayes methods for assessingdi!erential expression in microarray experiments. Stat Appl Genet Mol Biol 3 (1), 3.

Storey, J. D. (2003). The positive false discovery rate: A bayesian interpretation and theq-value. The Annals of Statistics 31 (6), pp. 2013–2035.

Wright, D. L., H. S. Stern, and N. Cressie (2003). Loss functions for estimation of extremawith an application to disease mapping. Canadian Journal of Statistics 31 (3), 251–266.

Xie, M., K. Singh, and C.-H. Zhang (2009). Confidence intervals for population ranks in thepresence of ties and near ties. Journal of the American Statistical Association 104 (486),775–788.

Page 35

maximal agreement

[Figure, four panels: a. MLE, b. p-value, c. posterior mean, d. maximal agreement; each plots X against σ²]

Page 36

maximal agreement

[Figure: densities over σ (0 to 3) of $p(\sigma_i)$ and $p\big(\sigma_i \mid X_i \ge t^*_{0.1}(\sigma_i^2)\big)$]

pretty close, considering that we're targeting agreement, not the artifact

Page 37

local tail probability $V_\alpha(X_i, \sigma_i^2) = P(\theta_i \ge \theta_\alpha \mid X_i, \sigma_i^2)$

4 Henderson and Newton

All observations coincident with the graph of a given optimal threshold curve thus have a common posterior probability $c_\alpha$ that their unit-specific parameters exceed the quantile $\theta_\alpha$ associated with that curve. In the normal model for $X_i$ with the normal prior $f(\theta)$, the optimal threshold function (Figure 2d) is readily extracted from (3). Working on a standardized scale without loss of generality ($\mu = 0$ and $\tau^2 = 1$), the local posterior for $\theta_i$ is normal with mean $X_i/(\sigma_i^2 + 1)$ and variance $\sigma_i^2/(\sigma_i^2 + 1)$. Thus,

$$t^*_\alpha(\sigma^2) = \theta_\alpha(\sigma^2 + 1) - u_\alpha\sqrt{\sigma^2(\sigma^2 + 1)}, \qquad (4)$$
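Setting the local posterior tail probability in (3) to a constant is what produces (4). As a sketch, assuming the standardized N(0,1) prior and an illustrative $u_\alpha = 0.5$ (in practice $u_\alpha$ is fixed by the size constraint), the code below evaluates (4) at several σ² and checks that the local posterior tail probability is the same, $1 - \Phi(u_\alpha)$, at every point of the curve, as Theorem 1 requires. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal

def t_star(sigma2, theta_alpha, u_alpha):
    """Optimal threshold (4): theta_alpha (sigma^2 + 1) - u_alpha sqrt(sigma^2 (sigma^2 + 1))."""
    return theta_alpha * (sigma2 + 1) - u_alpha * sqrt(sigma2 * (sigma2 + 1))

def local_tail_prob(x, sigma2, theta_alpha):
    """P(theta_i >= theta_alpha | X_i = x, sigma_i^2 = sigma2) with a N(0, 1) prior:
    the posterior is N(x / (sigma2 + 1), sigma2 / (sigma2 + 1))."""
    mean = x / (sigma2 + 1)
    sd = sqrt(sigma2 / (sigma2 + 1))
    return 1 - STD.cdf((theta_alpha - mean) / sd)

alpha = 0.10
theta_alpha = STD.inv_cdf(1 - alpha)   # upper alpha quantile of the N(0, 1) prior
u_alpha = 0.5                          # illustrative; in practice fixed by the size constraint (5)
probs = [local_tail_prob(t_star(s2, theta_alpha, u_alpha), s2, theta_alpha)
         for s2 in (0.1, 0.5, 1.0, 2.0, 5.0)]
print([round(p, 4) for p in probs])    # the same value, 1 - Phi(0.5), at every sigma^2
```

Algebraically, substituting (4) into the posterior gives a standardized distance of exactly $u_\alpha$ between $\theta_\alpha$ and the posterior mean, which is why $c_\alpha = 1 - \Phi(u_\alpha)$ does not depend on σ².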

where $u_\alpha$ is determined by the size constraint (1). Indeed, $u_\alpha$ is affected by the distribution $g(\sigma^2)$, since it is defined implicitly through the constraint-induced equation

$$1 - \alpha = \int_0^\infty \Phi\!\left(\sqrt{1 + \sigma^2}\left[\theta_\alpha - u_\alpha\sqrt{\frac{\sigma^2}{1 + \sigma^2}}\right]\right) g(\sigma^2)\, d\sigma^2, \qquad (5)$$

where Φ is the standard normal cumulative distribution function. **hard to discover explicit thresholds this way** **put in the list-conditional variance story and a second plot series (maybe supplementary) showing how the optimal thresholds are least sigma biased** Some comments on the threshold functions in Table 1 are warranted... **put in story about posterior expected ranks as approximate threshold function**
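Equation (5) generally has no closed-form solution, but its right side is decreasing in $u_\alpha$, so a one-dimensional root-finder suffices. The sketch below assumes g is replaced by the empirical distribution of a hypothetical sample of σ² values and solves for $u_\alpha$ by bisection; `size_lhs` and `solve_u` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def size_lhs(u, theta_alpha, sigma2_sample):
    """Right-hand side of (5), with g replaced by the empirical distribution of sigma2_sample."""
    total = 0.0
    for s2 in sigma2_sample:
        total += STD.cdf(sqrt(1 + s2) * (theta_alpha - u * sqrt(s2 / (1 + s2))))
    return total / len(sigma2_sample)

def solve_u(alpha, sigma2_sample, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection for u_alpha in (5); size_lhs is decreasing in u when some s2 > 0."""
    theta_alpha = STD.inv_cdf(1 - alpha)   # upper alpha quantile of the N(0, 1) prior
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size_lhs(mid, theta_alpha, sigma2_sample) > 1 - alpha:
            lo = mid   # threshold too high (too few units above it): raise u, lowering t*
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [0.25, 0.5, 1.0, 2.0, 4.0]   # hypothetical draws standing in for g(sigma^2)
u10 = solve_u(0.10, sample)
theta10 = STD.inv_cdf(0.90)
print(round(u10, 4), round(size_lhs(u10, theta10, sample), 6))   # second value is about 0.9
```

A convenient check: if every unit has the same σ², then (5) reduces to $\theta_\alpha\sqrt{1+\sigma^2} - u_\alpha\sigma = \theta_\alpha$, so $u_\alpha = \theta_\alpha(\sqrt{1+\sigma^2} - 1)/\sigma$.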

2.3. Posterior tail probabilities and ranking variables

Insight into the structure of optimal thresholds comes from further examining their relationship to local posterior tail probabilities: $V_\alpha(X_i, \sigma_i^2) = P(\theta_i \ge \theta_\alpha \mid X_i, \sigma_i^2)$.

Theorem 2. Suppose that for every α ∈ (0, 1) there exists $\lambda_\alpha$ such that

$$P\big(V_\alpha(X_i, \sigma_i^2) \ge \lambda_\alpha\big) = \alpha, \qquad (6)$$

and furthermore suppose that $V_\alpha(x, \sigma^2)$ is right-continuous and non-decreasing in x for every fixed α and σ². Then the family of thresholds

$$t^*_\alpha(\sigma^2) = \inf\{x : V_\alpha(x, \sigma^2) \ge \lambda_\alpha\} \qquad (7)$$

satisfies the size constraint (1) and is optimal in the sense of (2).
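Theorem 2 makes the optimal family computable: estimate $\lambda_\alpha$ as the upper α quantile of the observed $V_\alpha$ values, then invert $V_\alpha$ in x for each σ². A sketch in the standardized normal/normal model with hypothetical simulated data (σ² = 0.05 + Exp(1)); in this model the inverted curve should coincide with the closed form (4) with $u_\alpha = \Phi^{-1}(1 - \lambda_\alpha)$. The names `V`, `lambda_hat` and `threshold` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def V(alpha, x, sigma2):
    """Local tail probability V_alpha(x, sigma^2), standardized normal/normal model."""
    theta_alpha = STD.inv_cdf(1 - alpha)            # upper alpha quantile of N(0, 1)
    mean, sd = x / (sigma2 + 1), sqrt(sigma2 / (sigma2 + 1))
    return 1 - STD.cdf((theta_alpha - mean) / sd)

def lambda_hat(alpha, data):
    """Empirical stand-in for lambda_alpha in (6): the upper-alpha quantile of V_alpha over units."""
    vals = sorted((V(alpha, x, s2) for x, s2 in data), reverse=True)
    k = max(1, int(round(alpha * len(vals))))
    return vals[k - 1]

def threshold(alpha, sigma2, lam, lo=-50.0, hi=50.0, tol=1e-10):
    """t*_alpha(sigma^2) = inf{x : V_alpha(x, sigma^2) >= lam} from (7), by bisection;
    valid because V_alpha is increasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(alpha, mid, sigma2) >= lam:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical data: theta ~ N(0, 1), sigma^2 = 0.05 + Exp(1), X ~ N(theta, sigma^2)
random.seed(2)
data = []
for _ in range(5000):
    s2 = 0.05 + random.expovariate(1.0)
    data.append((random.gauss(random.gauss(0.0, 1.0), sqrt(s2)), s2))

lam = lambda_hat(0.10, data)
for s2 in (0.25, 1.0, 4.0):
    print(s2, round(threshold(0.10, s2, lam), 3))
```

By construction, about 10% of the simulated units lie on or above their own curve, which is the empirical analogue of the size constraint (1).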

A family of threshold functions is a device for thinking about converting observations into rankings (i.e., by sweeping through the family). Indeed, the index α associated with the threshold curve on which the data point $(X_i, \sigma_i^2)$ lands may be viewed as a ranking variable. Computation of the ranking variable amounts to solving the inversion $X_i = t_\alpha(\sigma_i^2)$ for α, which is well defined under suitable conditions. **maybe refer to optimal normal model proof** For the thresholds (7) that are optimal for agreement between the inferred top list and the true top list, we have

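Theorem 3. Suppose that $V_\alpha(x, \sigma^2)$ is continuous in x for every α and σ², differentiable in α for every x and σ², and further that $\lambda_\alpha$ is differentiable in α. If $\partial V_\alpha(x, \sigma^2)/\partial\alpha > d\lambda_\alpha/d\alpha$ for every $(\alpha, x, \sigma^2)$, then for any $\alpha_1 \neq \alpha_2$ it holds that, for all σ², $t^*_{\alpha_1}(\sigma^2) \neq t^*_{\alpha_2}(\sigma^2)$.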

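Theorem 2: Under certain conditions, the optimal family $\{t^*_\alpha\}$ satisfies $t^*_\alpha(\sigma^2) = \inf\{x : V_\alpha(x, \sigma^2) \ge \lambda_\alpha\}$,

where

$$P\big(V_\alpha(X_i, \sigma_i^2) \ge \lambda_\alpha\big) = \alpha$$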

Page 38

next step

from thresholds back to ranking variables

Page 39

ranking variables $r_i(X_i, \sigma_i^2) = \inf\{\alpha : X_i \ge t_\alpha(\sigma_i^2)\}$

[Figure, four panels: a. MLE, b. p-value, c. posterior mean, d. maximal agreement]

Page 40


Improved ranking and selection 5

Remarks... As for all the families shown in Figure 2 and Table 1, the optimal thresholds do not touch or cross under the conditions of Theorem 3, and they conform to our intuition about how ranking procedures might be constructed from threshold functions. Thus, we may reasonably introduce a special ranking variable that inverts the optimal threshold. For the ith unit, we define the r-value:

$$r_i(X_i, \sigma_i^2) = \inf\{\alpha : V_\alpha(X_i, \sigma_i^2) \ge \lambda_\alpha\}. \qquad (8)$$

Essentially, unit i is placed by its r-value at position r (a percentile, measured from the top) if, when ranking the units by $P(\theta_i \ge \theta_r \mid X_i, \sigma_i^2)$, it also happens to land at position r. Further, the top α fraction of units by r-value has higher overlap with the true top α fraction of units than could be obtained by any other ranking procedure, in the sense of (2).
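Definition (8) suggests a direct computation: on a grid of α values, compute $V_\alpha$ for every unit, take $\lambda_\alpha$ as the empirical upper α quantile of those values, and record for each unit the first α at which it clears $\lambda_\alpha$. A sketch under the standardized normal/normal model; the grid, the simulated data (σ² = 0.05 + Exp(1)), and the function names are hypothetical, and the grid spacing limits how finely r-values are resolved.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def tail_prob(theta_alpha, x, sigma2):
    """V_alpha(x, sigma^2) = P(theta_i >= theta_alpha | X_i = x, sigma_i^2 = sigma2);
    posterior N(x / (sigma2 + 1), sigma2 / (sigma2 + 1)) under a N(0, 1) prior."""
    mean, sd = x / (sigma2 + 1), sqrt(sigma2 / (sigma2 + 1))
    return 1 - STD.cdf((theta_alpha - mean) / sd)

def r_values(x, sigma2, alpha_grid):
    """Grid approximation to (8): r_i is the smallest grid alpha with
    V_alpha(X_i, sigma_i^2) >= lambda_alpha, where lambda_alpha is the
    empirical upper-alpha quantile of V_alpha over the observed units."""
    n = len(x)
    r = [None] * n
    for alpha in sorted(alpha_grid):
        theta_alpha = STD.inv_cdf(1 - alpha)
        v = [tail_prob(theta_alpha, xi, s2) for xi, s2 in zip(x, sigma2)]
        k = max(1, int(round(alpha * n)))
        lam = sorted(v, reverse=True)[k - 1]
        for i in range(n):
            if r[i] is None and v[i] >= lam:
                r[i] = alpha
    # units never selected on the grid go to the bottom of the list
    return [1.0 if ri is None else ri for ri in r]

# Hypothetical data from the standardized model
random.seed(3)
n = 2000
sigma2 = [0.05 + random.expovariate(1.0) for _ in range(n)]
x = [random.gauss(random.gauss(0.0, 1.0), sqrt(s2)) for s2 in sigma2]
grid = [k / 50 for k in range(1, 50)]
r = r_values(x, sigma2, grid)
top = sorted(range(n), key=lambda i: r[i])[:5]
print([(round(x[i], 2), round(sigma2[i], 2), r[i]) for i in top])
```

When the per-α top lists are nested, as Theorem 3 guarantees in the population, the units with r-value at most α are exactly the top α fraction at that grid point.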

It is worth recognizing that these findings go beyond what is already known about the use of $V_\alpha(X_i, \sigma_i^2)$ to optimally rank units. **Louis/Lehmann**; also you get different results when ranking by different α... **example of where you're in the top 5**

3. Connections

3.1. Connection to Bayes rule

The proposed ranking procedure is a kind of Bayes rule for a percentile when considering multiple loss functions and a distributional constraint. To see how, introduce a collection of loss functions

$$L_\alpha(a, \theta_i) = 1 - 1(a \le \alpha,\ \theta_i \ge \theta_\alpha),$$

where the action a is a percentile value in (0, 1), α ∈ (0, 1) indexes the collection, and again $\theta_\alpha = F^{-1}(1 - \alpha)$ is a quantile in the population of interest. Specifically, no α-loss occurs if the inferred upper percentile a and the actual upper percentile $1 - F(\theta_i)$ both do not exceed α. The marginal (pre-posterior) Bayes risk of rule $\delta(X_i, \sigma_i^2)$ is

i ) is

risk! = 1" P!

%(Xi,!2

i ) # " , $ ! $!"

, (9)

which is one minus the agreement (2). In the absence of other considerations, the Bayes rule for loss $L_\alpha$ degenerates to $\delta(X_i, \sigma_i^2) = 0$. Degeneration is avoided if we require the estimated percentile to share with the true percentile $1 - F(\theta_i)$ the property of being uniformly distributed over the population of units. Such a constrained Bayes rule then minimizes the modified objective function

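$$\text{risk}_\alpha + \kappa_\alpha\, P\big(\delta(X_i, \sigma_i^2) \le \alpha\big),$$

where $\kappa_\alpha$ is chosen to enforce the size constraint $P\big(\delta(X_i, \sigma_i^2) \le \alpha\big) = \alpha$. The constrained Bayes rule is computed conditionally, per observed $(X_i, \sigma_i^2)$, by minimizing the (modified) posterior expected loss (PEL)

$$
\begin{aligned}
\text{PEL}_\alpha &= 1 - P\big(\delta(X_i, \sigma_i^2) \le \alpha,\ \theta_i \ge \theta_\alpha \mid X_i, \sigma_i^2\big) + \kappa_\alpha\, 1\big(\delta(X_i, \sigma_i^2) \le \alpha\big) \qquad (10)\\
&= \begin{cases} 1 - V_\alpha(X_i, \sigma_i^2) + \kappa_\alpha & \text{if } \delta(X_i, \sigma_i^2) \le \alpha, \\ 1 & \text{if } \delta(X_i, \sigma_i^2) > \alpha. \end{cases}
\end{aligned}
$$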
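Minimizing the posterior expected loss (10) unit by unit connects the constrained Bayes rule back to the r-value. A short derivation, writing $\kappa_\alpha$ for the multiplier in the modified objective and assuming (as under Theorem 3) that the decisions for different α are nested:

$$
\begin{aligned}
\text{report } \delta(X_i,\sigma_i^2) \le \alpha
  &\iff 1 - V_\alpha(X_i,\sigma_i^2) + \kappa_\alpha \le 1 \\
  &\iff V_\alpha(X_i,\sigma_i^2) \ge \kappa_\alpha .
\end{aligned}
$$

The size constraint $P(\delta(X_i,\sigma_i^2) \le \alpha) = \alpha$ then becomes $P(V_\alpha(X_i,\sigma_i^2) \ge \kappa_\alpha) = \alpha$, so $\kappa_\alpha$ plays the role of $\lambda_\alpha$ in (6), and the smallest α at which the rule reports $\delta \le \alpha$ is $\inf\{\alpha : V_\alpha(X_i,\sigma_i^2) \ge \lambda_\alpha\}$, the r-value (8).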

r-value

Page 41

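r-value

smallest α such that unit i is in the top α when ranking by $V_\alpha(X_i, \sigma_i^2) = P(\theta_i \ge \theta_\alpha \mid X_i, \sigma_i^2)$, i.e., such that $V_\alpha(X_i, \sigma_i^2) \ge \lambda_\alpha$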

Page 42

general form

6 Henderson and Newton

without extra conditions there could be distinct $\alpha_1 \neq \alpha_2$ such that $t_{\alpha_1}(\sigma_i^2) = t_{\alpha_2}(\sigma_i^2)$ for some $\sigma_i^2$. For the thresholds (7) that are optimal for agreement between the inferred top list and the true top list, we find the following.

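Theorem 3. Suppose that $V_\alpha(x, \sigma^2)$ is continuous in x for every α and σ², differentiable in α for every x and σ², and further that $\lambda_\alpha$ is differentiable in α. If $\partial V_\alpha(x, \sigma^2)/\partial\alpha > d\lambda_\alpha/d\alpha$ for every $(\alpha, x, \sigma^2)$, then for any $\alpha_1 \neq \alpha_2$ it holds that, for all σ², $t^*_{\alpha_1}(\sigma^2) \neq t^*_{\alpha_2}(\sigma^2)$.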

As for all the families shown in Figure 3 and Table 1, the optimal thresholds do not touch or cross under the conditions of Theorem 3, and they conform to our intuition about how ranking procedures might be constructed from threshold functions. Thus, we introduce a special ranking variable that inverts the optimal threshold. For the ith unit, we define the r-value:

$$r(X_i, \sigma_i^2) = \inf\{\alpha : V_\alpha(X_i, \sigma_i^2) \ge \lambda_\alpha\}. \qquad (8)$$

Essentially, unit i is placed by its r-value at position α (a relative rank, measured from the top) if, when ranking the units by $P(\theta_i \ge \theta_\alpha \mid X_i, \sigma_i^2)$, it also happens to land at position α. Further, the top α fraction of units by r-value has higher overlap with the true top α fraction of units than could be obtained by any other ranking procedure, in the sense of (2).

It is worth recognizing that these findings go beyond what has been reported about the use of the conditional tail probability $V_\alpha(X_i, \sigma_i^2)$ to rank units. Classical theory on optimal selection establishes the role of this conditional tail probability in maximizing an exceedance probability within the selected sample (e.g., Lehmann, 1986, pages 117-118). Also, the conditional tail probability has been used for ranking (e.g., Normand et al., 1997; Niemi, 2010), and is closely related to a Bayes optimal ranking under a certain loss function (Lin et al., 2006). The critical difference with the proposed ranking is in the role of the index α. Conceptually, we imagine ranking the units by $V_\alpha(X_i, \sigma_i^2)$ separately for all possible indices α (not just a pre-specified one); then the r-value for unit i is the smallest index α such that unit i is placed in the top α fraction by that ranking. By aiming to maximize agreement at all list sizes, the proposed method does not require a pre-specified exceedance level to generate its ranking.

2.4. More generality

The r-value concept makes sense in various elaborations of the measurement model from Section 2.1. We retain univariate parameters of interest $\{\theta_i\}$ varying according to a distribution F, but we allow data $D_i$ on each unit to take more general forms than the $(X_i, \sigma_i^2)$ pair structure. We also retain the assumption of mutual independence among units, though extensions could be developed in cases where posterior computation is feasible. In seeking units with largest $\theta_i$, the critical quantity is the local exceedance probability

$$V_\alpha(D_i) = P(\theta_i \ge \theta_\alpha \mid D_i)$$

for α ∈ (0, 1) and for upper quantiles $\theta_\alpha$ of the marginal distribution F, i.e., $\theta_\alpha = F^{-1}(1 - \alpha)$. Induced by the marginal distribution of $D_i$, the tail probability $V_\alpha(D_i)$ has cumulative distribution function $H_\alpha(v)$, and from it we obtain the upper quantile $\lambda_\alpha = H_\alpha^{-1}(1 - \alpha)$. Then, by analogy to (8), the r-value is defined:

$$r(D_i) = \inf\{\alpha : V_\alpha(D_i) \ge \lambda_\alpha\}.$$
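For general data $D_i$ the recipe is unchanged; only $V_\alpha(D_i)$ comes from a different posterior. A sketch for binomial free-throw data with a conjugate Beta prior, echoing the talk's NBA example: the Beta(15, 5) prior is a hypothetical choice, not the prior behind the talk's table, and with only the eleven players listed on the slides the empirical $\lambda_\alpha$ is very coarse, so these r-values will not reproduce the table. Beta tail probabilities use Simpson's rule to keep the sketch dependency-free; all function names are illustrative.

```python
from math import exp, lgamma, log

def beta_sf(t, a, b, n=1000):
    """P(p >= t) for p ~ Beta(a, b), by composite Simpson's rule on [t, 1].
    Assumes b > 1 so the density vanishes at p = 1; n must be even."""
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    c = lgamma(a + b) - lgamma(a) - lgamma(b)   # log normalizing constant

    def pdf(p):
        return 0.0 if p >= 1.0 else exp(c + (a - 1) * log(p) + (b - 1) * log(1 - p))

    h = (1.0 - t) / n
    s = pdf(t) + pdf(1.0)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * pdf(t + j * h)
    return s * h / 3

def prior_upper_quantile(alpha, a, b, iters=40):
    """theta_alpha with P(p >= theta_alpha) = alpha under the Beta(a, b) prior, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_sf(mid, a, b) > alpha:
            lo = mid   # too much prior mass above mid: quantile lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_values(data, a, b):
    """r(D_i) = inf{alpha : V_alpha(D_i) >= lambda_alpha} on the grid alpha = k/n, with
    V_alpha(D_i) from the Beta(a + made, b + missed) posterior and lambda_alpha the
    empirical upper-alpha quantile of V_alpha over the units."""
    n = len(data)
    r = {}
    for k in range(1, n):
        alpha = k / n
        theta_alpha = prior_upper_quantile(alpha, a, b)
        v = {nm: beta_sf(theta_alpha, a + y, b + m - y) for nm, (y, m) in data.items()}
        lam = sorted(v.values(), reverse=True)[k - 1]
        for nm in data:
            if nm not in r and v[nm] >= lam:
                r[nm] = alpha
    return {nm: r.get(nm, 1.0) for nm in data}

# (made, attempted) free throws as listed on the slides; a tiny subset of the league
players = {
    "Brian Roberts": (125, 133), "Ryan Anderson": (59, 62), "Danny Granger": (63, 67),
    "Kyle Korver": (87, 94), "Mike Harris": (26, 27), "J.J. Redick": (97, 106),
    "Ray Allen": (105, 116), "Mike Muscala": (14, 14), "Dirk Nowitzki": (338, 376),
    "Trey Burke": (102, 113), "LeBron James": (439, 585),
}
A, B = 15.0, 5.0   # hypothetical Beta prior (mean 0.75)
r = r_values(players, A, B)
for name in sorted(r, key=r.get):
    print(f"{name:15s} {r[name]:.3f}")
```

With a full league of players and a fitted prior, the same functions apply unchanged; only the grid of α values becomes finer.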


data for unit i: $D_i$

local posterior tail probability

marginal upper quantile

r-value

Page 43

next step

how to calculate r-values

Page 44

NBA: Binomial likelihood; e.g., Beta prior/posteriors

[Figure: density against Free Throw Ability (0.2 to 1.0)]

Page 45

NBA: Binomial likelihood; e.g., Beta prior/posteriors

[Figure: density against Free Throw Ability (0.2 to 1.0), with the prior upper quantile $\theta_\alpha$ marked]

Page 46

e.g., NBA

[Figure: exceedance probability $P(\theta_i \ge \theta_\alpha \mid D_i)$ against α (0.002 to 1.0, log scale) for two examples, with the empirical quantile $\lambda_\alpha$ and the resulting r-value marked]

$D_{\text{Ray Allen}} = 105/116$

$D_{\text{LeBron James}} = 439/585$

Page 47

Top 10

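| player | free throws | FT % | r-value | post. mean | qual. rank | FTP rank | PM rank | RV rank |
|---|---|---|---|---|---|---|---|---|
| Brian Roberts | 125/133 | 94.0 | 0.002 | 91.3 | 1 | 17 | 1 | 1 |
| Ryan Anderson | 59/62 | 95.2 | 0.003 | 89.8 | | 15 | 2 | 2 |
| Danny Granger | 63/67 | 94.0 | 0.005 | 89.3 | | 16 | 3 | 3 |
| Kyle Korver | 87/94 | 92.6 | 0.008 | 89.2 | | 19 | 4 | 4 |
| Mike Harris | 26/27 | 96.3 | 0.010 | 86.6 | | 14 | 15 | 5 |
| J.J. Redick | 97/106 | 91.5 | 0.011 | 88.6 | | 22 | 6 | 6 |
| Ray Allen | 105/116 | 90.5 | 0.016 | 88.0 | | 25 | 8 | 7 |
| Mike Muscala | 14/14 | 100.0 | 0.017 | 84.4 | | 7 | 34 | 8 |
| Dirk Nowitzki | 338/376 | 89.9 | 0.018 | 89.1 | 2 | 30 | 5 | 9 |
| Trey Burke | 102/113 | 90.3 | 0.018 | 87.7 | | 28 | 9 | 10 |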

Page 48

Predictive accuracy

[Figure: E[similarity_t{Ranks(theta), Ranks.hat[midseason]} | complete season] (0.00 to 0.30) against t = rank from top (5 to 25); series: r-value, posterior mean, MLE]

Page 49

T2D

Improved ranking and selection 17

[Figure, four panels: a. MLE, b. p-value, c. posterior mean, d. maximal agreement]

Fig. 3. Threshold functions, T2D example, data and axes as in Fig. 1. Calculations use an inverse-gamma model for σ². Forty-two threshold functions are shown, ranging in α from a small positive value (red), just including the first data point, up to α = 0.10 (blue). (Most data points are truncated by the plot, as in Fig. 1; also, the grid is uniform on the scale of $\log_2[-\log_2(\alpha)]$.) Units associated with a smaller α (i.e., more red) are ranked more highly by the given ranking method. Two units landing on the same curve would be ranked in the same position.

Page 50

[Figure: two panels plotting x (0 to 6) against σ² (0 to 5)]

$\sigma^2 \sim \text{Exp}(1)$

kick-up

Page 51

[Figure: scatter plot with axes N[k] and MLE; the plotted points are not recoverable from the extraction]

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

OO

O

O

OOOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOOO OO

O

O

O

O

OO

OO

O

O

O

OO

O

OO

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

OO

O

O

O

O

OO

O OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

OO

O

O

O

OO

O

O

OO

O

O

O

OO

O O

O

OO

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

OOO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

OOOOO

O

O

O

O

O

OO

O OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

O

O

OO

O

O O

O

O

O

O

O

OOO

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

OO

O

O

O

O

OOO

O

O

OO OO

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

OOOO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

OO

OOO O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO O

O

OOO O

O

O

OO

O

O

O

OO O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

OOO

O

O

OO

OOOOOOO

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O O

O O

OO

OO

O OOO

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O OO

O

OO

OO

O

O

OOOO

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O O

O

O

O

O

O

O

O O

O

OO

O

O

O

O

O

OO O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

OOO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O O

O

O

O

OOO

O

O

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OOO

O

O

OO

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

O O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

O

O

O

OO

OO

OO

O

O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O OO

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

OO O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

OO

O

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

OOO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

OO

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O O

OOO

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

OOOOOOO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

O OO

O

O

OO

O

OO

O

OO

O

OOO

O

O

O

O

O

O

O

O

OO

O O

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

O

O

O

O

O

O

MLEN[ k]

MLE

O

O

O

O

O

O

O

O O

O

O

OO

O

O

O O

O

OO

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

OO

OO

OO

O

O

O O

O

O

O

OOO

O

OO

O

OOO

O OO

O

O

O OOO

O

O

OOO

O

O

O

O

OO

O

O

O

OO

OO O

OO

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO O

O

OO

O

OO

O

O

O

O

OO

OO

OO

OO

O

OO O

O

OO

OO

O

O

O

O

O

O

OOO

OO

O

OO

OO

O

O

O

OO

O

OO O

OO

O

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

OO

O

O

O

OO

OOO

O O

O

O

O

OOOO

OO

O

OO

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO

O

O

O

O

O

O

O O

OOO

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

OOO

O

OO

O

O

OOO

O

O O

O

O

O

O

O

OO

O

O

O O

O

OOOO

O

O

O O

OO

O

O

OO

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O O

OOO

O

OO

O

O

OO

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

O

O

OO

O

O

OO

O

O O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

OOO

O

O

OOO

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O O

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O O

OO

O

OO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O OO

O

OOO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O OO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

OO

OO

O

O

O

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O OO

O

O

O

OO

O

O

O

OO

O

O O

O

OO

O

O

O

OO

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O O

O

O

O

OO

O

O

O

O

O

O OO

O

O

OOO

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO O

O

O

O O

OO

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

O O

O

OO

O

O

O

O

O

O

OO

O

OOO

O

O

O

O

O

OO

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOOO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O O

O

OO

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OOO

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

OO

OO

OOO

O

O

O

OO

O

OO

O

O

OO

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O OO

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O O

OO

O

O

O

O

O

O

O

O

O O

O

O

O

OO

OO

O

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

OO

O

O

OOOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOOO OO

O

O

O

O

OO

OO

O

O

O

OO

O

OO

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

OO

O

O

O

O

OO

O OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

OO

O

O

O

OO

O

O

OO

O

O

O

OO

O O

O

OO

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

OOO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

OOOOO

O

O

O

O

O

OO

O OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

O

O

OO

O

O O

O

O

O

O

O

OOO

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

OO

O

O

O

O

OOO

O

O

OO OO

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

OOOO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

OO

OOO O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO O

O

OOO O

O

O

OO

O

O

O

OO O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

OOO

O

O

OO

OOOOOOO

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O O

O O

OO

OO

O OOO

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O OO

O

OO

OO

O

O

OOOO

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O O

O

O

O

O

O

O

O O

O

OO

O

O

O

O

O

OO O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

OOO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O O

O

O

O

OOO

O

O

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OOO

O

O

OO

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

O O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

O

O

O

OO

OO

OO

O

O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O OO

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

OO O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

OO

O

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

OOO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

OO

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O O

OOO

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

OOOOOOO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

O OO

O

O

OO

O

OO

O

OO

O

OOO

O

O

O

O

O

O

O

O

OO

O O

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

O

O

O

O

O

O

p−value

N[ k]

MLE

O

O

O

O

O

O

O

O O

O

O

OO

O

O

O O

O

OO

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

OO

OO

OO

O

O

O O

O

O

O

OOO

O

OO

O

OOO

O OO

O

O

O OOO

O

O

OOO

O

O

O

O

OO

O

O

O

OO

OO O

OO

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO O

O

OO

O

OO

O

O

O

O

OO

OO

OO

OO

O

OO O

O

OO

OO

O

O

O

O

O

O

OOO

OO

O

OO

OO

O

O

O

OO

O

OO O

OO

O

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

OO

O

O

O

OO

OOO

O O

O

O

O

OOOO

OO

O

OO

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO

O

O

O

O

O

O

O O

OOO

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

OOO

O

OO

O

O

OOO

O

O O

O

O

O

O

O

OO

O

O

O O

O

OOOO

O

O

O O

OO

O

O

OO

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O O

OOO

O

OO

O

O

OO

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

O

O

OO

O

O

OO

O

O O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

OOO

O

O

OOO

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O O

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O O

OO

O

OO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O OO

O

OOO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O OO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

OO

OO

O

O

O

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O OO

O

O

O

OO

O

O

O

OO

O

O O

O

OO

O

O

O

OO

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O O

O

O

O

OO

O

O

O

O

O

O OO

O

O

OOO

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO O

O

O

O O

OO

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

O O

O

OO

O

O

O

O

O

O

OO

O

OOO

O

O

O

O

O

OO

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OOOO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O O

O

OO

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OOO

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

OO

OO

OOO

O

O

O

OO

O

OO

O

O

OO

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O OO

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O O

OO

O

O

O

O

O

O

O

O

O O

O

O

O

OO

OO

O

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

OO

O

O

OOOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOOO OO

O

O

O

O

OO

OO

O

O

O

OO

O

OO

O

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OO

OO

O

O

O

O

OO

O OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

OO

O

O

O

OO

O

O

OO

O

O

O

OO

O O

O

OO

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

OOO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

OOOOO

O

O

O

O

O

OO

O OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

O

O

OO

O

O O

O

O

O

O

O

OOO

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

OO

O

O

O

O

OOO

O

O

OO OO

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

OOOO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

OO

OOO O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO O

O

OOO O

O

O

OO

O

O

O

OO O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

O

OO

OO

O

O

O

O

O

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OOO

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

OOO

O

O

O

O

OOO

O

O

OO

OOOOOOO

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

OO

OO

O

O

OO

O

O

O

O

OO

O

OO

O

OO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O O

O O

OO

OO

O OOO

O

O

O

O

OO

O

OOO

OO

OO

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O OO

O

OO

OO

O

O

OOOO

OO

O

O

OO

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O O

O

O

O

O

O

O

O O

O

OO

O

O

O

O

O

OO O

O

O

O

O

O

OO

O

O

OO

O

O

O

OO

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

OOO

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

OO

O

O

O

OO

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O O

O

O

O

OOO

O

O

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

O

OO

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

OOO

O

O

OO

OO

O

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

OO

O

OO

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

OO

O

O

OO

OO

O

O

O

O

O

O

O

O O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

OO

O

O

O O

OO

O

O

O

O

OO

OO

OO

O

O

O

O

OO

OO

O

O

O

O

O

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O OO

O

O O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OOO

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

OO

O

OO

O

O

OO

O

O

O

O

O

O

O

O

OO O

O

O

O

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OOO

OO

O

OO

O

O

O

OO

O

O

O

O

O

O

OO

O

OO

O

O

OOO

OO

OO

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

OO

O

O

O

O

O

OO

O

O

O

OO

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

O

O

O

O

O

OO

O

O

OO

O

O

OO

OO

O

O

O

OO

O

O

OO

OOO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O

O

O

O

OO

O

O O

OOO

O

O

O

O

OO

O

OO

O

O

O

O

OO

O

OOOOOOO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

O

O OO

O

O

O

OO

O

O

O

OO

O

O

O

O

O

O

O

O

O

O

O

O

O

OO

O

O

O

OO

O O

O

O

O

O

O

O

O

O

O

O

OO

O

O

OOO

O

O

OO

O

O

O OO

O

O

OO

O

OO

O

OO

O

OOO

O

O

O

O

O

O

O

O

OO

O O

O

O

O

O

O

O

O

O

OO

O O

O

O

OO

O

O

O

O

O

O

O

O

O

O

posterior mean

1

1/4

1/16

0

−1/16

−1/4

−1

(X−R)/(X+R)

RNAiexample

Page 52: Putting lots of things in order: r-values for ranking in large-scale …pages.stat.wisc.edu/~newton/talks/cdc.pdf · 2014-08-28 · Whole-genome regression and prediction methods

Bayes?

Improved ranking and selection 7

Figure 5 compares r-value rankings with three other methods in the RNAi example from Figure 2. In this example, $D_i = (n_i, y_i)$ holds binomial information (set size $n_i$ and number $y_i$ of genes in set $i$ that were identified by RNAi). The target parameters $\theta_i$ are treated as draws from a Beta$(a, b)$ distribution, with shape parameters estimated by marginal maximum likelihood, and the conditional tail probability $V_\alpha(D_i)$ becomes the probability that a Beta$(a + y_i,\ b + n_i - y_i)$ variable exceeds $\theta_\alpha$. R-value computation (see Section 4) requires the sampling distribution of these tail probabilities, which we approximated using the data from all 5719 sets under study. The methods compared in Figure 5 agree to some extent on the ranking of the most interesting sets, but systematic differences are apparent. Compared to the r-value ranking, ranking by $y_i/n_i$ over-ranks small sets; ranking by p-value over-ranks large sets; and ranking by posterior mean $(y_i + a)/(n_i + a + b)$ also over-ranks large sets, though to a lesser degree.
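A minimal sketch of the beta-binomial tail probability described above, assuming the population distribution of $\theta_i$ is the fitted Beta$(a, b)$, so that $\theta_\alpha$ is its $(1 - \alpha)$ quantile. It uses only the Python standard library: the incomplete beta function is evaluated by a standard continued-fraction method (swapped in here for a library routine), and all function names (`beta_cdf`, `theta_alpha`, `tail_prob`) are hypothetical. The usage lines also print the two competing scores from the text, $y_i/n_i$ and the posterior mean, for hypothetical data.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the fraction directly where it converges fast, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def theta_alpha(alpha, a, b, tol=1e-12):
    """Population quantile F^{-1}(1 - alpha) of Beta(a, b), found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_prob(n, y, a, b, theta_a):
    """V_alpha(D_i): P(theta_i > theta_a | n, y) under the Beta(a+y, b+n-y) posterior."""
    return 1.0 - beta_cdf(theta_a, a + y, b + n - y)


if __name__ == "__main__":
    a, b, alpha = 2.0, 18.0, 0.1  # hypothetical fitted shapes and alpha
    t = theta_alpha(alpha, a, b)
    for n, y in [(3, 2), (60, 15), (400, 60)]:  # hypothetical (set size, hits)
        mle = y / n
        post_mean = (y + a) / (n + a + b)
        print(n, y, round(mle, 3), round(post_mean, 3), round(tail_prob(n, y, a, b, t), 3))
```

Ranking the sets by `tail_prob` at a single $\alpha$ is not yet the r-value ranking; the r-value additionally compares each $V_\alpha(D_i)$ against the distribution of tail probabilities across all sets, over a range of $\alpha$.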

R-values may be computed in all sorts of hierarchical modeling efforts, including semi-parametric models and cases where Markov chain Monte Carlo (MCMC) is used to approximate the marginal posterior distribution of each $\theta_i$ given available data. Figure 6 compares the r-value ranking with other rankings in an example from gene-expression analysis, where evidence suggested that the expression of a large fraction of the human genome was associated with the status of a certain viral infection (Pyeon et al., 2007). A multi-level model involving both null and non-null genes as well as $t$-distributed non-null effects $\theta_i$ exhibited good fit to the data, but did not admit a closed form for $V_\alpha(D_i)$. R-values, computed using MCMC output, again reveal systematic ranking differences from other approaches.
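When $V_\alpha(D_i)$ has no closed form, it can be estimated from MCMC output as the fraction of posterior draws of $\theta_i$ at or above $\theta_\alpha$. The sketch below does that on a grid of $\alpha$ values and then forms grid-approximate r-values. It assumes the r-value definition from the paper's Section 4 (not reproduced here): unit $i$'s r-value is the smallest $\alpha$ at which $V_\alpha(D_i)$ reaches a threshold $\lambda_\alpha$ that admits a fraction $\alpha$ of all units. The thresholds $\theta_\alpha$ are assumed to be supplied, e.g. from a fitted population distribution; function names are hypothetical.

```python
import math


def tail_probs_from_draws(draws, theta_a):
    """Monte Carlo V_alpha(D_i): fraction of posterior draws of theta_i >= theta_a."""
    return [sum(t >= theta_a for t in unit) / len(unit) for unit in draws]


def r_values(draws, alphas, theta_alphas):
    """Grid approximation to r-values from MCMC output.

    draws[i]        -- posterior draws of theta_i for unit i
    alphas          -- increasing grid of alpha values, ending at 1.0
    theta_alphas[j] -- population quantile F^{-1}(1 - alphas[j])
    """
    n_units = len(draws)
    r = [1.0] * n_units
    done = [False] * n_units
    for alpha, theta_a in zip(alphas, theta_alphas):
        v = tail_probs_from_draws(draws, theta_a)
        # lambda_alpha: cutoff admitting the top ceil(alpha * N) tail probabilities
        # (the small offset guards against floating-point round-up).
        k = max(1, math.ceil(alpha * n_units - 1e-9))
        lam = sorted(v, reverse=True)[k - 1]
        for i in range(n_units):
            if not done[i] and v[i] >= lam:
                r[i], done[i] = alpha, True  # first (smallest) alpha that admits unit i
    return r


if __name__ == "__main__":
    # Hypothetical posterior draws for three units.
    draws = [[0.9, 0.8, 0.95, 0.85], [0.5, 0.6, 0.4, 0.55], [0.1, 0.2, 0.15, 0.05]]
    print(r_values(draws, [1 / 3, 2 / 3, 1.0], [0.7, 0.3, 0.0]))
```

A finer $\alpha$ grid gives finer r-values at proportionally higher cost, since every grid point requires one pass over all draws.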

Multi-level models drive statistical inference and software in a variety of genomic domains: for example, limma (Smyth, 2004), EBarrays (Kendziorski et al., 2003), and EBSeq (Leng et al., 2013), among others. Since these models happen to specify distributional forms for parameters of interest, the associated code could be augmented to compute posterior tail probabilities $V_\alpha(D_i)$ and thus r-values for ranking. The limma system uses a conjugate normal, inverse-gamma model, and so $V_\alpha(D_i)$ involves the tail probability of a non-central $t$ distribution. The EBSeq system entails a conjugate beta, negative-binomial model, and so $V_\alpha(D_i)$ for differential expression involves tail probabilities in a certain ratio distribution (Coelho and Mexia, 2007). One expects the benefits of r-value computation to show especially in cases involving many non-null units and relatively high variation among units in their variance parameters (e.g., sequence read depth).

3. Connections

3.1. Connection to Bayes rule

The proposed ranking procedure is a kind of Bayes rule for a population relative rank when considering multiple loss functions and a distributional constraint. To see how, introduce a collection of loss functions

$$L_\alpha(a, \theta_i) = 1 - \mathbf{1}(a \le \alpha,\ \theta_i \ge \theta_\alpha)$$

where action $a$ is a relative rank value in $(0, 1)$, $\alpha \in (0, 1)$ indexes the collection, and again $\theta_\alpha = F^{-1}(1 - \alpha)$ is a quantile in the population of interest. Specifically, no $\alpha$-loss occurs if the inferred relative rank $a$ and the actual relative rank $1 - F(\theta_i)$ both are less than $\alpha$. The marginal (pre-posterior) Bayes risk of rule $\delta(D_i)$ is

$$\text{risk}_\alpha = 1 - P\{\delta(D_i) \le \alpha,\ \theta_i \ge \theta_\alpha\}, \qquad (9)$$

Improved ranking and selection 7

Figure 5 compares r-value rankings with three other methods in the RNAi example fromFigure 2. In this example, Di = (ni, yi) holds binomial information (set size ni and numberyi of genes in set i that were identified by RNAi). The target parameters !i are treated asdraws from a Beta(a, b) distribution, with shape parameters estimated by marginal maxi-mum likelihood, and the conditional tail probability V!(Di) becomes the probability that aBeta(a+ yi, b + ni ! yi) variable exceeds !!. R-value computation (see Section 4) requiresthe sampling distribution of these tail probabilities, which we approximated using the datafrom all 5719 sets under study. The methods compared in Figure 5 agree to some extent onthe ranking of the most interesting sets, but systematic di!erences are apparent. Rankingby yi/ni over-ranks small sets; ranking by p-value over-ranks large sets; and ranking byposterior mean (yi + a)/(ni+ a+ b) also over-ranks large sets, though to a lesser degree, allcompared to the r-value ranking.

R-values may be computed in all sorts of hierarchical modeling e!orts, including semi-parametric models and cases where Markov chain Monte Carlo (MCMC) is used to approx-imate the marginal posterior distribution of each !i given available data. Figure 6 comparesthe r-value ranking with other rankings in an example from gene-expression analysis, whereevidence suggested that the expression of a large fraction of the human genome was associ-ated with the status of a certain viral infection (Pyeon, et al., 2007). A multi-level modelinvolving both null and non-null genes as well as t!distributed non-null e!ects !i exhibitedgood fit to the data, but did not admit a closed form for V!(Di). R-values, computed usingMCMC output, again reveal systematic ranking di!erences from other approaches.

Multi-level models drive statistical inference and software in a variety of genomic domains: for example, limma (Smyth, 2004), EBarrays (Kendziorski et al., 2003), and EBSeq (Leng et al., 2013), among others. Since these models specify distributional forms for parameters of interest, the associated code could be augmented to compute posterior tail probabilities $V_\alpha(D_i)$ and thus r-values for ranking. The limma system uses a conjugate normal, inverse-gamma model, so $V_\alpha(D_i)$ involves the tail probability of a non-central $t$ distribution. The EBSeq system entails a conjugate beta, negative-binomial model, so $V_\alpha(D_i)$ for differential expression involves tail probabilities in a certain ratio distribution (Coelho and Mexia, 2007). One expects the benefits of r-value computation to show especially in cases involving many non-null units and relatively high variation among units in their variance parameters (e.g., sequence read depth).

3. Connections

3.1. Connection to Bayes rule

The proposed ranking procedure is a kind of Bayes rule for a population relative rank when considering multiple loss functions and a distributional constraint. To see how, introduce a collection of loss functions

$$L_\alpha(a, \theta_i) = 1 - \mathbf{1}(a \le \alpha,\ \theta_i \ge \theta_\alpha)$$

where action $a$ is a relative rank value in $(0, 1)$, $\alpha \in (0, 1)$ indexes the collection, and again $\theta_\alpha = F^{-1}(1 - \alpha)$ is a quantile in the population of interest. Specifically, no $\alpha$-loss occurs if the inferred relative rank $a$ and the actual relative rank $1 - F(\theta_i)$ both are less than $\alpha$. The marginal (pre-posterior) Bayes risk of rule $\delta(D_i)$ is

$$\text{risk}_\alpha = 1 - P\{\delta(D_i) \le \alpha,\ \theta_i \ge \theta_\alpha\}, \qquad (9)$$

8 Henderson and Newton

which is one minus the agreement (2). In the absence of other considerations, the Bayes rule for loss $L_\alpha$ degenerates to $\delta(D_i) = 0$. Degeneration is avoided if we require the estimated rank to share with the true rank $1 - F(\theta_i)$ the property of being uniformly distributed over the population of units. Such a constrained Bayes rule then minimizes the modified objective function:

$$\text{risk}_\alpha + \lambda_\alpha P\{\delta(D_i) \le \alpha\}$$

where $\lambda_\alpha$ is chosen to enforce the size constraint $P\{\delta(D_i) \le \alpha\} = \alpha$.

The constrained Bayes rule is computed conditionally, per observed $D_i$, by minimizing the (modified) posterior expected loss (PEL)

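$$\begin{aligned}
\text{PEL}_\alpha &= 1 - P\{\delta(D_i) \le \alpha,\ \theta_i \ge \theta_\alpha \mid D_i\} + \lambda_\alpha \mathbf{1}\{\delta(D_i) \le \alpha\} \qquad (10)\\
&= \begin{cases} 1 - V_\alpha(D_i) + \lambda_\alpha & \text{if } \delta(D_i) \le \alpha,\\ 1 & \text{if } \delta(D_i) > \alpha,\end{cases}
\end{aligned}$$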

where $V_\alpha(D_i)$ is the upper posterior probability $P(\theta_i \ge \theta_\alpha \mid D_i)$ appearing in Section 2. Curiously, the rule minimizing $\text{PEL}_\alpha$ is not determined at a single $\alpha$, since minimization in (10) requires only that

$$\delta(D_i) \le \alpha \iff V_\alpha(D_i) \ge \lambda_\alpha. \qquad (11)$$
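Comparing the two branches of (10) directly gives (11), explains the degeneration noted above, and anticipates how the uniformity constraint pins down $\lambda_\alpha$. This is a short worked derivation under the assumption that $H_\alpha$, the marginal distribution of $V_\alpha(D_i)$, is continuous:

```latex
\begin{aligned}
&1 - V_\alpha(D_i) + \lambda_\alpha \le 1
  \iff V_\alpha(D_i) \ge \lambda_\alpha, \\
&\lambda_\alpha = 0 \implies V_\alpha(D_i) \ge \lambda_\alpha \text{ for every } \alpha
  \implies \delta(D_i) \le \alpha \ \ \forall \alpha \in (0,1)
  \implies \delta(D_i) = 0, \\
&P\{\delta(D_i) \le \alpha\} = P\{V_\alpha(D_i) \ge \lambda_\alpha\}
  = 1 - H_\alpha(\lambda_\alpha) = \alpha
  \iff \lambda_\alpha = H_\alpha^{-1}(1 - \alpha).
\end{aligned}
```

The second line is the unconstrained case: with no penalty, placing $D_i$ in the top $\alpha$ fraction never costs anything, so every unit is assigned rank 0.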

However, taking all losses together does fix a procedure:

$$\delta(D_i) = \inf\{\alpha : V_\alpha(D_i) \ge \lambda_\alpha\}. \qquad (12)$$

The thresholds $\lambda_\alpha$ are determined by the uniformity constraint, and we have $\lambda_\alpha = H_\alpha^{-1}(1 - \alpha)$, where $H_\alpha$ is the marginal distribution of $V_\alpha(D_i)$, counting all sources of variation, so $\lambda_\alpha$ coincides with the threshold used to define the r-value in the previous section. In other words, the procedure obtained by this constrained, multi-loss Bayes calculation is equivalent to the r-value introduced in Section 2.
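Rule (12), with $\lambda_\alpha = H_\alpha^{-1}(1 - \alpha)$, translates into a short computation once $V_\alpha(D_i)$ is available for every unit on a grid of $\alpha$ values. The following is a minimal sketch under stated assumptions: $H_\alpha$ is replaced by the empirical distribution of $V_\alpha$ across units (so $\lambda_\alpha$ is an empirical $(1-\alpha)$ quantile), the infimum is taken over a finite increasing grid, and a unit that never crosses a threshold gets $r = 1$. These choices, the function names, and the normal-normal toy model are ours and may differ from how the rvalues package handles grids, ties, and interpolation.

```python
import math
from statistics import NormalDist


def thresholds(V, alphas):
    """Empirical lambda_alpha per grid value: the m-th largest V_alpha across units,
    m = ceil(alpha * n), so about a fraction alpha of units have V >= lambda."""
    n = len(V)
    lams = []
    for k, alpha in enumerate(alphas):
        column = sorted((row[k] for row in V), reverse=True)
        m = max(1, math.ceil(alpha * n - 1e-9))  # slack guards float round-off
        lams.append(column[m - 1])
    return lams


def r_values(V, alphas):
    """r(D_i) = smallest grid alpha with V_alpha(D_i) >= lambda_alpha, as in (12).
    V[i][k] holds V_{alphas[k]}(D_i); alphas must be sorted increasingly.
    Units that never cross a threshold on the grid get 1.0 (our convention)."""
    lams = thresholds(V, alphas)
    out = []
    for row in V:
        r = 1.0
        for k, alpha in enumerate(alphas):
            if row[k] >= lams[k]:
                r = alpha
                break
        out.append(r)
    return out


def normal_normal_V(x, sigma, alphas):
    """Hypothetical toy model: theta_i ~ N(0, 1), x_i | theta_i ~ N(theta_i, sigma_i^2),
    so theta_i | x_i ~ N(x_i / (1 + sigma_i^2), sigma_i^2 / (1 + sigma_i^2))."""
    population = NormalDist(0.0, 1.0)
    theta_alpha = [population.inv_cdf(1.0 - a) for a in alphas]
    V = []
    for xi, si in zip(x, sigma):
        shrink = 1.0 / (1.0 + si * si)
        posterior = NormalDist(xi * shrink, math.sqrt(si * si * shrink))
        V.append([1.0 - posterior.cdf(t) for t in theta_alpha])
    return V


if __name__ == "__main__":
    alphas = [k / 100 for k in range(1, 100)]
    x = [2.0, 2.0, 1.0, 0.0, -1.0]
    sigma = [0.3, 2.0, 0.3, 1.0, 1.0]
    for xi, si, r in zip(x, sigma, r_values(normal_normal_V(x, sigma, alphas), alphas)):
        print(f"x={xi:+.1f} sigma={si:.1f} r={r:.2f}")
```

With only a handful of units the empirical thresholds are coarse; the same code applies unchanged to thousands of units, where the empirical quantiles approximate $H_\alpha^{-1}(1 - \alpha)$ more closely.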

3.2. Beyond p’s and q’s

In testing a single hypothesis $H_0$, the sample space may be structured as a nested sequence of subsets, $\{\Gamma_\alpha : \alpha \in (0, 1)\}$, say, such that rejection of a size-$\alpha$ test is equivalent to data $D$ landing in the set (i.e., rejection region) $\Gamma_\alpha$. Then the p-value of the test is

$$p(D) = \inf\{\alpha : D \in \Gamma_\alpha\}.$$
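The infimum definition can be checked numerically in the simplest case. This sketch assumes a one-sided $z$-test, whose size-$\alpha$ rejection regions $\Gamma_\alpha = \{z \ge z_{1-\alpha}\}$ are nested, and finds $\inf\{\alpha : z \in \Gamma_\alpha\}$ by bisection; the result should match the familiar closed form $1 - \Phi(z)$. The test and the function names are illustrative choices, not taken from the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def in_rejection_region(z, alpha):
    """Gamma_alpha for a one-sided size-alpha z-test: {z : z >= z_(1 - alpha)}.
    The regions are nested: alpha < alpha' implies Gamma_alpha is inside Gamma_alpha'."""
    return z >= STD_NORMAL.inv_cdf(1.0 - alpha)


def p_value_by_inf(z, tol=1e-12):
    """inf{alpha : z in Gamma_alpha}, found by bisection; this is valid because
    nesting makes membership in Gamma_alpha monotone in alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_rejection_region(z, mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for z in (0.0, 1.0, 1.645, 2.5):
        print(z, p_value_by_inf(z), 1.0 - STD_NORMAL.cdf(z))
```

The bisection relies only on nestedness, so the same search applies to any indexed family, including those used below for q-values and r-values, once that family is specified.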

Storey (2003) extended this idea to multiple testing and the positive false discovery rate with the introduction of the q-value. Specifically, with another nested sequence $\{\Gamma_\alpha : \alpha \in (0, 1)\}$ indexed such that

$$P(H_0 \mid D \in \Gamma_\alpha) = \alpha,$$

the q-value is $q(D) = \inf\{\alpha : D \in \Gamma_\alpha\}$. Whereas p-values refer to the distribution of $D$ under $H_0$, and q-values to the conditional probability of $H_0$ given sample information, the proposed r-values refer to a marginal probability. The size constraint (1) corresponds to another sequence of subsets, $\{\Gamma_\alpha\}$, say, for which the marginal constraint holds: $P(D \in \Gamma_\alpha) = \alpha$. Analogously, the r-value is $r(D) = \inf\{\alpha : D \in \Gamma_\alpha\}$. In principle an r-value could be defined for any indexed ranking method, though we have reserved the definition for the method that maximizes agreement (2).
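To make the marginal constraint concrete, consider a hypothetical toy model ($\theta \sim N(0,1)$, $x \mid \theta \sim N(\theta, \sigma^2)$ with a common, known $\sigma$) and the simple indexed ranking $\Gamma_\alpha = \{x \ge c_\alpha\}$, with $c_\alpha$ chosen so that $P(x \ge c_\alpha) = \alpha$ under the marginal $N(0, 1 + \sigma^2)$. Then $r(x) = \inf\{\alpha : x \in \Gamma_\alpha\}$ is simply the marginal upper tail probability, and a simulation should show $P\{r(D) \le \alpha\} \approx \alpha$. The model, names, and simulation settings are assumptions for illustration only.

```python
import math
import random
from statistics import NormalDist


def marginal_r_value(x, sigma):
    """r(x) = inf{alpha : x >= c_alpha}, where P(X >= c_alpha) = alpha under the
    marginal X ~ N(0, 1 + sigma^2) of the hypothetical toy model
    theta ~ N(0, 1), x | theta ~ N(theta, sigma^2). The infimum equals P(X >= x)."""
    marginal = NormalDist(0.0, math.sqrt(1.0 + sigma * sigma))
    return 1.0 - marginal.cdf(x)


def simulate_fraction_below(alpha, sigma=1.0, n=50000, seed=1):
    """Monte Carlo check of the marginal size constraint P{r(D) <= alpha} = alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        x = rng.gauss(theta, sigma)
        if marginal_r_value(x, sigma) <= alpha:
            hits += 1
    return hits / n


if __name__ == "__main__":
    for a in (0.01, 0.1, 0.5):
        print(a, simulate_fraction_below(a))
```

With a common $\sigma$, $V_\alpha$ is increasing in $x$ for every $\alpha$, so this simple ranking coincides with the agreement-maximizing r-value; the two can differ once precisions vary across units.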

multi-loss

marginally constrained

Page 53

• ranking to maximize agreement
• large-scale, non-sparse settings
• empirical Bayes inference

• R-package at CRAN: rvalues
• manuscript at arXiv: 1312.5776

summary

Page 54

context

Page 55

Karen Montgomery

David Schwartz

David Schwartz

Christina Kendziorski

Vijay Setaluri

Rich Halberg

Mark Albertini

Bill Dove

Paul Ahlquist

Audrey Gasch

interdisciplinary biostatistics

Page 56

genome technology: optical mapping

Page 57

cancer biology: polyclonality

Page 58

melanoma: T cell diversity

Page 59

virology: host/virus interactions

Page 60

genome biology: stress response

Page 61

stem cells: iPS vs hES

Page 62

colon cancer: APC modifiers

Page 63

melanoma: transcript regulation

Page 64

genomics: gene set analysis

Page 65