10 throws - arbeitxcelab.net/rm/wp-content/uploads/2010/03/week9.pdf · 2 3 4 5 6 7 8 9 10 11 12...


Post on 12-May-2020


[Figure: histograms of dice totals (x-axis: dice total, 2 to 12; y-axis: frequency), built up one panel per slide for 10, 100, 1000, and 1e+05 throws. Each slide reports the observed proportion of sevens for the newest panel.]

10 throws:

    > length( p[p==7] ) / throws
    [1] 0.1

100 throws:

    > length( p[p==7] ) / throws
    [1] 0.14

1000 throws:

    > length( p[p==7] ) / throws
    [1] 0.195

1e+05 throws:

    > length( p[p==7] ) / throws
    [1] 0.17865
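The experiment above can be repeated with a few lines of R. This is a sketch that assumes two fair six-sided dice drawn with sample(); the slides do not show how p was generated, so the proportions need not match the ones printed above. For fair dice the exact chance of a total of 7 is 6/36.

```r
# Assumption: two fair six-sided dice (the slides do not show how p was built).
set.seed(9)
prop_sevens <- function( throws ) {
    # total of two dice for each throw
    p <- sample( 1:6 , throws , replace=TRUE ) + sample( 1:6 , throws , replace=TRUE )
    # observed proportion of sevens
    length( p[p==7] ) / throws
}
n_throws <- c( 10 , 100 , 1000 , 1e5 )
props <- sapply( n_throws , prop_sevens )
# compare each estimate with the exact value 6/36
print( cbind( throws=n_throws , proportion=props , exact=6/36 ) )
```

The estimate wanders for small numbers of throws and settles near the exact value as the number of throws grows.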

resampling & empirical likelihood estimation

• most statistical estimation premised on repeated “experiments”:

• if data generated many times, what’s the expected outcome?

• instead of actually repeating, write formula that computes the expectation (and likelihood of observed data)

resampling & empirical likelihood estimation

• what if we can’t write the formula?

• sometimes impossible

• often very hard

• can simulate repeat sampling and compute the expectation EMPIRICALLY (sometimes called “monte carlo” likelihood estimation)
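As a minimal illustration of computing an expectation empirically, the sketch below compares the formula for the mean of a binomial count, size*prob, with the average over simulated repeats. The values size=5 and prob=0.4 are borrowed from the tadpole example later in the slides; the variable names are illustrative.

```r
set.seed(1)
size <- 5
prob <- 0.4
# analytical expectation of a binomial count
expected_formula <- size * prob
# empirical expectation: simulate the "experiment" many times and average
sims <- rbinom( 1e5 , size=size , prob=prob )
expected_sim <- mean( sims )
print( c( formula=expected_formula , simulated=expected_sim ) )
```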

binomial example

• frequencies of dead tadpoles (again) in pools of 5

• what is chance of death?

• easy problem, but suppose we can’t write the formula...

[Figure: histogram of the number of dead tadpoles per pool (0 to 5).]

binomial example

• even when can’t write likelihood expression, can usually simulate data, conditional on parameters

• strategy

• (1) generate a datum, conditional on parameters

• (2) do (1) a bunch of times

• (3) observe freq of real data in distribution from (2). this is the likelihood estimate.

binomial example

prob = 0.4

10 000 replicates

[Figure: histogram of the simulated counts (0 to 5), shown twice; the second panel is labelled “Likelihood of 1”.]

    > length( k[k==1] )/10000
    [1] 0.2626
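Steps (1) to (3) can be written out directly. The sketch below repeats the calculation on this slide and, because the binomial formula is in fact available, checks the simulated value against dbinom(). The seed is arbitrary, so the simulated number will differ slightly from the 0.2626 shown above.

```r
set.seed(10)
R <- 10000
# (1) and (2): generate R data points, conditional on prob = 0.4 and pools of 5
k <- rbinom( R , size=5 , prob=0.4 )
# (3): frequency of the observed value (1 dead tadpole) among the simulations
lik_sim <- length( k[k==1] ) / R
# analytical likelihood, for comparison
lik_exact <- dbinom( 1 , size=5 , prob=0.4 )
print( c( simulated=lik_sim , exact=lik_exact ) )
```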

empirical likelihood estimation

• at each set of parameter values, need to simulate the distribution

• may need many replicates to get a smooth picture of likelihood surface

• careful of returning zero (0) likelihoods. NO EVENT should ever have zero chance of happening.
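One way to avoid zero likelihoods is to add a small pseudo-count to every possible outcome before taking frequencies. The sketch below is an illustration only: the function name dsimbinom_smooth and the add-one rule are assumptions made here, not something given in the slides.

```r
# Hypothetical helper: simulated binomial log-likelihood with add-one smoothing.
dsimbinom_smooth <- function( x , prob , size , log=TRUE , R=99 ) {
    # simulate R outcomes at the current parameter values
    e <- rbinom( R , prob=prob , size=size )
    # count each possible outcome 0..size, then add one pseudo-observation to each
    counts <- tabulate( e + 1 , nbins=size + 1 ) + 1
    p <- counts[ x + 1 ] / ( R + size + 1 )
    if ( log ) log(p) else p
}

set.seed(3)
# with prob = 0.05, five deaths out of five almost never appears in 99 draws,
# yet its estimated log-likelihood is still finite
out <- dsimbinom_smooth( 0:5 , prob=0.05 , size=5 )
print( out )
```

The smoothing biases the estimate slightly toward a uniform distribution; the bias shrinks as R grows.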

empirical likelihood estimation

• This function does the same thing as dbinom(), but it does it via simulation.

    dsimbinom <- function( x , prob , size , log=TRUE , R=99 ) {
        # simulate R binomial outcomes at the current parameter values
        e <- rbinom( R , prob=prob , size=size )
        # frequency of each observed value among the simulated outcomes
        p <- sapply( x , function(y) length(e[e==y])/R )
        # return log-frequencies unless log=FALSE
        if ( log ) log(p) else p
    }

empirical likelihood estimation

[Figure: three panels of -logLik against prob (0.2 to 0.6), for 99 replicates, 999 replicates, and 9999 replicates. Red curve is real analytical likelihood function.]
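A comparison like the one in the figure can be sketched by evaluating the simulated and the analytical negative log-likelihood on a grid of prob values. The data k, the grid, and the helper nll_sim are assumptions made for this illustration; simulated frequencies of zero give an infinite value, which shows up as a gap in the plotted line.

```r
set.seed(13)
# assumed data: 100 pools of 5 tadpoles with true prob = 0.4
k <- rbinom( 100 , size=5 , prob=0.4 )
probs <- seq( 0.2 , 0.6 , by=0.01 )

# negative log-likelihood of k, estimated from R simulated outcomes
nll_sim <- function( prob , R ) {
    e <- rbinom( R , size=5 , prob=prob )
    -sum( log( sapply( k , function(y) length( e[e==y] ) / R ) ) )
}
# analytical negative log-likelihood
nll_exact <- sapply( probs , function(p) -sum( dbinom( k , size=5 , prob=p , log=TRUE ) ) )

nll_99   <- sapply( probs , nll_sim , R=99 )
nll_9999 <- sapply( probs , nll_sim , R=9999 )

# red: analytical curve; gray: 99 replicates; black: 9999 replicates
plot( probs , nll_exact , type="l" , col="red" , xlab="prob" , ylab="-logLik" )
lines( probs , nll_99 , col="gray" )
lines( probs , nll_9999 )
```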

empirical likelihood estimation

• “jaggies” bad. helps to use SIMULATED ANNEALING (SA) (method="SANN")

• SA hill-climbs, like most algorithms, but also climbs DOWN, with slowly decreasing probability (as it “cools”)

    library(bbmle)  # provides mle2()

    m.prob <- mle2( k ~ dbinom( prob=1/(1+exp(z)) , size=5 ) , start=list(z=0) )

    m.sim <- mle2( k ~ dsimbinom( prob=1/(1+exp(z)) , size=5 , R=999 ) , start=list(z=0) , method="SANN" )

[Figure: -logLik against prob (0.2 to 0.6).]
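The slides pass method="SANN" through mle2(). To keep the example in base R, the sketch below calls optim() directly with the same method on a simulated negative log-likelihood. The data, the add-one smoothing that keeps the objective finite, and the iteration budget are all assumptions of this sketch.

```r
set.seed(21)
# assumed data: 100 pools of 5 tadpoles, true prob = 0.4
k <- rbinom( 100 , size=5 , prob=0.4 )

# simulated negative log-likelihood, parameterized as in the slides: prob = 1/(1+exp(z))
nll_sim <- function( z , R=999 ) {
    prob <- 1/(1+exp(z))
    e <- rbinom( R , size=5 , prob=prob )
    # add-one smoothing so that no outcome gets zero frequency
    freq <- ( tabulate( e + 1 , nbins=6 ) + 1 ) / ( R + 6 )
    -sum( log( freq[ k + 1 ] ) )
}

# simulated annealing does not need a smooth surface
fit <- optim( par=0 , fn=nll_sim , method="SANN" , control=list( maxit=3000 ) )
p_hat <- 1/(1+exp(fit$par))
print( c( simulated_annealing=p_hat , closed_form=sum(k)/500 ) )
```

Because the objective is re-simulated on every call, the reported optimum is itself noisy; a larger R reduces that noise at the cost of run time.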

empirical likelihood estimation

    k <- rbinom( 100 , size=5 , prob=0.4 )

    > sum(k)/500
    [1] 0.388

    m.prob <- mle2( k ~ dbinom( prob=1/(1+exp(z)) , size=5 ) , start=list(z=0) )
    m.sim <- mle2( k ~ dsimbinom( prob=1/(1+exp(z)) , size=5 , R=999 ) , start=list(z=0) , method="SANN" )

    # logit() is not defined on the slides; the outputs imply a helper that
    # maps z back to a probability, 1/(1+exp(z))
    > logit(coef(m.prob))
        z
    0.388
    > logit(coef(m.sim))
           z
    0.390865

more complex example

• beta-binomial distribution:

• binomial probabilities sampled from beta distribution

• has an analytical solution, but we’ll do it empirically now

[Figure: a beta density (x-axis: probability of death, 0 to 1; y-axis: probability of probability of death). Three pools draw p1 = 0.4, p2 = 0.65, p3 = 0.12, shown as splits of 40%/60%, 65%/35%, and 12%/88%.]

beta distributed chances of mortality

binomial trials determine actual deaths in each pool

[Figure: binomial versus beta-binomial, p = 0.5. Two histograms of count of dead tadpoles in pool (0 to 10) against number of pools, with the beta density of probability of death (0 to 1).]

binomial:

    rbinom( 1000 , prob=0.5 , size=10 )

beta-binomial:

    rbetabinom( 1000 , shape1=0.9 , shape2=0.9 , size=10 )

[Figure: binomial versus beta-binomial, p = 0.5. Two histograms of count of dead tadpoles in pool (0 to 10) against number of pools, with the beta density of probability of death (0 to 1).]

binomial:

    rbinom( 1000 , prob=0.5 , size=10 )

beta-binomial:

    rbetabinom( 1000 , shape1=2 , shape2=2 , size=10 )
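The two-stage process behind the beta-binomial can be simulated with base R alone: draw one probability of death per pool from a beta distribution, then draw the deaths from a binomial with that probability. The slides use rbetabinom() for this; the helper rbetabinom_sketch below is a hypothetical stand-in written for illustration.

```r
set.seed(18)
# hypothetical stand-in for rbetabinom(): beta-distributed probabilities, then binomial trials
rbetabinom_sketch <- function( n , shape1 , shape2 , size ) {
    p <- rbeta( n , shape1=shape1 , shape2=shape2 )  # one chance of death per pool
    rbinom( n , size=size , prob=p )                  # actual deaths in each pool
}

b   <- rbinom( 1000 , prob=0.5 , size=10 )
bb1 <- rbetabinom_sketch( 1000 , shape1=0.9 , shape2=0.9 , size=10 )
bb2 <- rbetabinom_sketch( 1000 , shape1=2 , shape2=2 , size=10 )

# all three have a mean near 5, but the beta-binomial counts are more spread out
print( rbind( mean=c( binomial=mean(b) , bb_0.9=mean(bb1) , bb_2=mean(bb2) ) ,
              var=c( var(b) , var(bb1) , var(bb2) ) ) )
```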

heterogeneous tadpoles

[Figure: four beta densities on x from 0 to 1, with a = 2, b = 2; a = 0.7, b = 0.7; a = 1, b = 2; a = 1, b = 0.7. x-axis: probability has rot; y-axis: probability of probability has rot.]

empirical beta-binomial

• out of 5 tadpoles, how many dead?

• assume that mortality correlated WITHIN pools

[Figure: histogram of dead tadpoles (0 to 5) against frequency.]

empirical beta-binomial

    dsimbetabinom <- function( x , shape1 , shape2 , size , log=TRUE , R=99 ) {
        # sample R probabilities from the beta distribution
        p <- rbeta( R , shape1=shape1 , shape2=shape2 )
        # sample each event from p
        e <- rbinom( R , size=size , prob=p )
        # observe freq of each x in distribution of e
        f <- sapply( x , function(y) length(e[e==y])/R )
        # return log-frequencies unless log=FALSE
        if ( log ) log(f) else f
    }
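Because the beta-binomial does have an analytical form, the simulated frequencies can be checked against it. The sketch below uses the standard beta-binomial probability mass function written with base R's choose() and beta(); the shape values and the variable names are assumptions for illustration.

```r
set.seed(22)
size <- 5 ; a <- 2 ; b <- 2 ; R <- 1e4

# simulate: beta-distributed probabilities, then binomial deaths
p <- rbeta( R , shape1=a , shape2=b )
e <- rbinom( R , size=size , prob=p )
freq_sim <- sapply( 0:size , function(y) length( e[e==y] ) / R )

# analytical beta-binomial probabilities for 0..size deaths
freq_exact <- choose( size , 0:size ) * beta( 0:size + a , size - 0:size + b ) / beta( a , b )

print( round( rbind( simulated=freq_sim , exact=freq_exact ) , 3 ) )
```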

empirical beta-binomial

• The analytical way:

    > library(emdbook)
    > m.prob <- mle2( k ~ dbetabinom( size=5 , shape1=exp(s1) , shape2=exp(s2) ) , start=list( s1=1,s2=1 ) )

    > exp(coef(m.prob))
          s1       s2
    1.967890 2.010719

empirical beta-binomial

• The empirical way:

    > m.sim <- mle2( k ~ dsimbetabinom( shape1=exp(s1) , shape2=exp(s2) , size=5 , R=999 ) , start=list( s1=1 , s2=1 ) , method="SANN" )

    > exp(coef(m.sim))
          s1       s2
    2.013872 1.964869

empirical likelihood estimation

• Problems that require empirical likelihood methods

• complex phylogenetic models

• complex population structure models

• almost all Bayesian analyses

• almost all network models

• many “mixed effects” models

• many time series models

bootstrapping

• a special kind of resampling aimed at estimating variance of an estimate (confidence intervals)

• suppose we can’t estimate confidence from likelihood surface (can’t write a formula, perhaps)

• can treat sample like a population, and take many samples of same size from it

• theory tells us that as sample size increases, variance in resampled estimates converges to true variance

bootstrapping

• (1) sample n data from original size n sample, WITH REPLACEMENT

• (2) do (1) many times

• (3) as n increases, histogram from (2) approaches true likelihood surface

• (4) find values of parameter in histogram that mark different confidence limits
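For the binomial case these steps fit in a few lines of base R. The sketch uses the closed-form estimate sum(k)/(5*n) in place of refitting a model on each resample, which is a simplification chosen here; the data are simulated as an assumption.

```r
set.seed(27)
# assumed original sample: 100 pools of 5 tadpoles
k <- rbinom( 100 , size=5 , prob=0.3 )
n <- length(k)
# closed-form estimate of the chance of death
estimate <- function( x ) sum(x) / ( 5 * length(x) )

# (1) and (2): resample n data with replacement, many times, re-estimating each time
boot_est <- replicate( 999 , estimate( sample( k , n , replace=TRUE ) ) )

# (4): read the 95% limits off the resampled estimates
ci <- quantile( boot_est , probs=c(0.025,0.975) )
print( estimate(k) )
print( ci )
```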

bootstrap estimates

• simplest confidence intervals are just read from the histogram

• e.g. 95% intervals
    low: value just above 2.5% of the values
    high: value just above 97.5% of the values

[Figure: Histogram of 1/(1 + exp(b$t)); x-axis: 1/(1 + exp(b$t)), 0.25 to 0.35; y-axis: Frequency.]

bootstrapping

• Original data:

    k <- rbinom( 100 , size=5 , prob=0.3 )

• Estimate parameter:

    m <- mle2( k ~ dbinom( prob=logit(z) , size=5 ) , start=list(z=0) )

bootstrapping

• Resample 999 sets of data from original data, and re-estimate mle for each:

    plist <- replicate( 999 ,
        coef( mle2( sample(k,100,TRUE) ~ dbinom( prob=logit(z) , size=5 ) ,
            start=list(z=coef(m)[1]) , method="Nelder-Mead" ) ) )

    logit( quantile( plist , probs=c(0.025,0.975) ) )

bootstrapping

• Can find 95% confidence interval just by cutting off lower and upper 2.5%

• Here: 0.251, 0.335

• confint() gives: 0.259, 0.339

[Figure: Histogram of logit(plist); x-axis: logit(plist), 0.25 to 0.35; y-axis: Frequency.]

    logit( quantile( plist , probs=c(0.025,0.975) ) )

bootstrapping

• More complicated models are easier to do with the boot library.

• Consider modeling log body mass against log brain mass, for various species (at right).

[Figure: scatterplot of log brain mass against log body mass, with a group of points labelled “Dinosaurs”.]

bootstrapping

[Figure: scatterplot of log brain mass against log body mass, drawn by the code below.]

    plot( log(d$brain) ~ log(d$body) , xlab="log body mass" , ylab="log brain mass" )

    abline( lm( log(d$brain) ~ log(d$body) ) , col="red" )

bootstrapping

• Now write a function that accepts the original data and a collection of row numbers as parameters:

    f.coef <- function( d , i ) {
        # make a new data frame that contains the resampled rows in i
        nd <- d[i,]
        # fit our model to the resampled data
        m <- lm( log(brain) ~ log(body) , data=nd )
        # return coefficients
        coef(m)
    }

bootstrapping

• Then tell the boot library to resample and collect coefficients from that function:

    library(boot)
    boot.animals <- boot( d , f.coef , R=9999 )

    boot.object <- boot( ORIGINAL.DATA , YOUR.FUNCTION , R=NUM.RESAMPLES )

bootstrapping

    plot( boot.animals , index=2 )

[Figure, first panel: Histogram of t (x-axis: t*; y-axis: Density). Histogram of the resampled beta coefficients.]

[Figure, second panel: t* against Quantiles of Standard Normal. Comparison of resampled distribution to normal.]

bootstrapping

• Convenient function to extract confidence intervals:

    > boot.ci( boot.animals , type="perc" , index=2 )

    BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    Based on 9999 bootstrap replicates

    CALL :
    boot.ci(boot.out = boot.animals, type = "perc", index = 2)

    Intervals :
    Level     Percentile
    95%   ( 0.2905,  0.7491 )
    Calculations and Intervals on Original Scale

    > confint( lm( log(d$brain) ~ log(d$body) ) )
                    2.5 %    97.5 %
    (Intercept) 1.7056829 3.4041133
    log(d$body) 0.3353152 0.6566742
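The whole boot workflow can be tried end to end on made-up data. In this sketch the data frame d is synthetic (the slides do not show where their d comes from), with a true slope of 0.5 on the log scale chosen arbitrarily; boot.sim, body_mass, and brain_mass are illustrative names.

```r
library(boot)
set.seed(37)
# synthetic stand-in for the animal data: log brain mass rises with log body mass
body_mass  <- exp( rnorm( 50 , mean=3 , sd=2 ) )
brain_mass <- exp( 2 + 0.5 * log(body_mass) + rnorm( 50 , sd=0.7 ) )
d <- data.frame( body=body_mass , brain=brain_mass )

# statistic: refit the regression on the resampled rows i and return coefficients
f.coef <- function( d , i ) {
    nd <- d[i,]
    coef( lm( log(brain) ~ log(body) , data=nd ) )
}

boot.sim <- boot( d , f.coef , R=999 )
# percentile interval for the slope (second coefficient)
ci <- boot.ci( boot.sim , type="perc" , index=2 )
print( ci )
# interval from the usual linear-model theory, for comparison
print( confint( lm( log(brain) ~ log(body) , data=d ) ) )
```

With well-behaved synthetic noise the two intervals should be similar; they can diverge when a few unusual points dominate the fit, as the intervals on this slide suggest.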