Introduction to Generalized Additive Models
R. Harald Baayen
Seminar für Sprachwissenschaft, Universität Tübingen & Department of Linguistics, University of Alberta
October 17, 2013 / NWAV Pittsburgh
1 | R. H. Baayen GAMs Seminar für Sprachwissenschaft Universität Tübingen & Department of Linguistics University of Alberta
linear regression: Galton
but . . . how linear were his data?
wiggly lines: regression splines
- restricted cubic splines
- thin plate regression splines
restricted cubic splines (pupil dilation curve)
[Figure, two panels: pupil dilation (0.001 mm) against time (ms, 0 to 2500)]
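A restricted cubic spline, also called a natural cubic spline, is piecewise cubic between its knots and constrained to be linear beyond the outermost knots. The sketch below is not the code behind the pupil-dilation figure: it fits such a spline to a simulated, pupil-like curve with `splines::ns()` and compares it with a straight line. The simulated data and the choice of `df = 5` are assumptions for illustration.

```r
library(splines)

set.seed(1)
# Hypothetical pupil-like curve: a rise and fall over 0 to 2500 ms, plus noise
time_ms <- seq(0, 2500, by = 25)
dilation <- 400 * sin(pi * time_ms / 2500) + rnorm(length(time_ms), sd = 20)

# Restricted (natural) cubic spline with 5 degrees of freedom
fit_rcs <- lm(dilation ~ ns(time_ms, df = 5))
# Straight line for comparison
fit_lin <- lm(dilation ~ time_ms)

# Residual sums of squares: the spline can follow the curvature, the line cannot
c(spline = deviance(fit_rcs), linear = deviance(fit_lin))
```

Because linear functions are themselves natural cubic splines, the spline fit can never do worse than the straight line on the same data.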
thin plate regression splines
[Figure, four panels against x (0 to 100): f1(x), f2(x), f3(x), and the weighted sum f1(x) + 2 * f2(x) + 3/5 * f3(x)]
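The weighted-sum panel captures the core idea: a regression spline is a weighted sum of basis functions. In mgcv this can be checked directly, since `predict(..., type = "lpmatrix")` returns the basis functions evaluated at the data, and multiplying that matrix by the estimated coefficients reproduces the fitted curve. A minimal sketch; the simulated signal and `k = 10` are assumptions.

```r
library(mgcv)

set.seed(2)
x <- seq(0, 100, length.out = 200)
y <- sin(x / 15) + rnorm(200, sd = 0.3)  # hypothetical wiggly signal

# Thin plate regression spline (bs = "tp" is mgcv's default basis for s())
m <- gam(y ~ s(x, k = 10))

# Columns of X: the intercept plus the 9 constrained basis functions, evaluated at x
X <- predict(m, type = "lpmatrix")
dim(X)  # 200 10

# The fitted curve is the coefficient-weighted sum of those columns
fit_by_hand <- as.numeric(X %*% coef(m))
all.equal(fit_by_hand, as.numeric(fitted(m)))  # TRUE
```

The wiggliness is then controlled by a penalty on these weights, not by the number of basis functions alone.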
stress on triconstituent compounds in English
branching         stress   example
left-branching    left     háy fever treatment
left-branching    right    science fíction book
right-branching   left     business crédit card
right-branching   right    family Christmas dínner
[Figure, three panels: pitch (Hz, 100 to 300) against time (s, 0 to 2.996) for the utterance "she read about a gene therapy technology last night"]
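library(mgcv)
pitch.gam = bam(PitchSemiTone ~ Sex + BranchingCondition +
    s(NormalizedTime, by = BranchingCondition) +     # one pitch contour per branching condition
    s(NormalizedTime, Speaker, bs = "fs", m = 1) +   # by-speaker factor smooths (first-derivative penalty)
    s(NormalizedTime, Compound, bs = "fs", m = 1) +  # by-compound factor smooths
    s(Compound, Sex, bs = "re"),                     # random intercepts per Compound-by-Sex combination
  data = pitch,
  rho = 0.825, AR.start = pitch$NewTimeSeries)       # AR(1) errors within each time series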
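The `rho` and `AR.start` arguments give the residuals an AR(1) structure within each time series. `AR.start` must be `TRUE` at the first observation of every series, and `rho` is often set from the lag-1 autocorrelation of the residuals of a model fitted without the AR term. A sketch of both steps, assuming the data are sorted by series and then by time; the simulated data and the column names `Series`, `Time` and `Pitch` are hypothetical, not those of the pitch study.

```r
library(mgcv)

set.seed(3)
n_series <- 20; n_time <- 50
dat <- data.frame(Series = factor(rep(1:n_series, each = n_time)),
                  Time   = rep(seq(0, 100, length.out = n_time), n_series))
# Hypothetical AR(1) errors, simulated separately for each series
err <- unlist(lapply(1:n_series, function(i)
  as.numeric(arima.sim(list(ar = 0.8), n = n_time, sd = 0.5))))
dat$Pitch <- sin(dat$Time / 20) + err

# TRUE at the first row of each series (data sorted by Series, then Time)
dat$NewTimeSeries <- !duplicated(dat$Series)

# Step 1: fit without AR errors; estimate the lag-1 autocorrelation of the residuals
m0 <- bam(Pitch ~ s(Time), data = dat)
rho_hat <- acf(resid(m0), plot = FALSE)$acf[2]

# Step 2: refit with AR(1) residuals, restarting the process at each new series
m1 <- bam(Pitch ~ s(Time), data = dat, rho = rho_hat, AR.start = dat$NewTimeSeries)
rho_hat
```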
> round(summary(pitch.gam)$p.table, 2)
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)              85.34       1.60   53.22        0
Sexm                     -9.92       1.61   -6.15        0
BranchingConditionLN2     5.54       1.40    3.97        0
BranchingConditionR       4.11       1.18    3.48        0
> round(summary(pitch.gam)$s.table, 2)[,1:2]
                                            edf Ref.df
s(NormalizedTime):BranchingConditionLN1    6.27   6.59
s(NormalizedTime):BranchingConditionLN2    7.14   7.42
s(NormalizedTime):BranchingConditionR      8.40   8.58
s(NormalizedTime,Speaker)                 96.04 106.00
s(NormalizedTime,Compound)               304.18 349.00
s(Compound,Sex)                           50.33  76.00
> round(summary(pitch.gam)$s.table, 2)[,3:4]
                                              F p-value
s(NormalizedTime):BranchingConditionLN1    2.70    0.01
s(NormalizedTime):BranchingConditionLN2    5.40    0.00
s(NormalizedTime):BranchingConditionR     15.34    0.00
s(NormalizedTime,Speaker)               1298.17    0.00
s(NormalizedTime,Compound)                44.11    0.00
s(Compound,Sex)                           18.82    0.00
[Figure: five panels of partial effects (in semitones) against normalized time (0 to 100), and a quantile plot of the s(Compound,Sex,50.33) effects against Gaussian quantiles]
[Figure: by-word random effects for males (y) against by-word random effects for females (x), one labelled point per compound:]
adult jogging suit
baby lemon tea
business credit card
celebrity golf tournament
city hall restoration
coffee table designer
company internet page
conference time sheet
cotton candy maker
cream cheese recipe
day care center
diamond ring exhibition
family christmas dinner
family planning clinic
field hockey player
gene therapy technology
hay fever treatment
kidney stone removal
lung cancer surgery
maple syrup production
money market fund
passenger test flight
piano sheet music
pilot leather jacket
pizza home delivery
prisoner community service
restaurant tourist guide
science fiction book
security guard service
sign language class
silicon chip manufacturer
silver jubilee gift
student season ticket
student string orchestra
team locker room
tennis grass court
tennis group practice
visitor name tag
weather station data
woman fruit cocktail
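The by-word scatterplots place, for each compound, its random-effect adjustment for males against its adjustment for females. One hedged way to obtain such values from a fitted model is to evaluate only the `s(Compound, Sex, bs = "re")` term with `predict(..., type = "terms")` on a grid of all compounds and sexes. The small simulated data set, with hypothetical compounds `w1` to `w10`, is an assumption and far simpler than the pitch model above.

```r
library(mgcv)

set.seed(4)
# Hypothetical data: 10 compounds, 2 sexes, 20 observations per cell
dat <- expand.grid(Compound = factor(paste0("w", 1:10)),
                   Sex      = factor(c("f", "m")),
                   rep      = 1:20)
eff_f <- rnorm(10, sd = 0.5)  # simulated by-word effects for females
eff_m <- rnorm(10, sd = 0.5)  # simulated by-word effects for males
dat$y <- ifelse(dat$Sex == "f", 85 + eff_f[dat$Compound], 75 + eff_m[dat$Compound]) +
  rnorm(nrow(dat), sd = 1)

m <- gam(y ~ Sex + s(Compound, Sex, bs = "re"), data = dat, method = "REML")

# Evaluate only the random-effect term for every Compound-by-Sex cell
grid <- expand.grid(Compound = factor(levels(dat$Compound), levels = levels(dat$Compound)),
                    Sex      = factor(levels(dat$Sex), levels = levels(dat$Sex)))
grid$re <- predict(m, newdata = grid, type = "terms")[, "s(Compound,Sex)"]

# One row per compound: its estimated random effect for females and for males
by_word <- data.frame(Compound = levels(dat$Compound),
                      females  = grid$re[grid$Sex == "f"],
                      males    = grid$re[grid$Sex == "m"])
plot(males ~ females, data = by_word)
```

Selecting rows of `by_word` with `males > 0` or `females > 0` then gives lists like the "words that send pitch up" slides.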
words that send males’ pitch up
[Figure: by-word random effects for males (y, about 0.2 to 0.6) against by-word random effects for females (x), for these compounds:]
baby lemon tea
coffee table designer
company internet page
cotton candy maker
family planning clinic
maple syrup production
money market fund
passenger test flight
piano sheet music
pilot leather jacket
tennis group practice
weather station data
words that send females’ pitch up
[Figure: by-word random effects for males (y, about -1.2 to -0.2) against by-word random effects for females (x), for these compounds:]
adult jogging suit
business credit card
cream cheese recipe
day care center
lung cancer surgery
restaurant tourist guide
science fiction book
student season ticket
team locker room
visitor name tag
woman fruit cocktail
- the standard linear model: multiplicative interaction
- Y ∼ X1 + X2 + X1 · X2
[Figure: linear predictor surface over x1 and x2]
[Figure: contour plot of the linear predictor over x1 and x2 (both -0.4 to 0.4), contour levels from -0.2 to 0.2]
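Writing the model as Y = b0 + b1·X1 + b2·X2 + b3·X1·X2, the slope of X1 is b1 + b3·X2, so it changes linearly with X2. The surface is a twisted plane, and for b3 ≠ 0 its contour lines are hyperbolas; no other shape of interaction is possible. The sketch below evaluates such a linear predictor on a grid; the coefficient values are made up for illustration.

```r
# Hypothetical coefficients for Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0 <- 0; b1 <- 0.1; b2 <- -0.2; b3 <- 1

lin_pred <- function(x1, x2) b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1 <- seq(-0.4, 0.4, length.out = 41)
x2 <- seq(-0.4, 0.4, length.out = 41)
surface <- outer(x1, x2, lin_pred)  # 41 x 41 grid of linear predictor values

# Slope in x1 at a given x2: equals b1 + b3 * x2
slope_at <- function(x2_val) lin_pred(1, x2_val) - lin_pred(0, x2_val)
slope_at(-0.4)  # 0.1 + 1 * (-0.4) = -0.3
slope_at(0.4)   # 0.1 + 1 * 0.4 = 0.5

contour(x1, x2, surface, xlab = "x1", ylab = "x2")
```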
wiggly surfaces
- thin plate regression splines: for isometric predictors (measured on the same scale)
- tensor products: for non-isometric predictors (measured on different scales)
thin plate regression splines
tensor product smooths
beware of TPRS!
[Figure, four contour-plot panels of surfaces over X (about 600 to 900) and Y (about 4 to 12)]
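Thin plate regression splines are isotropic: a single penalty treats a unit of X and a unit of Y as equally wiggly, which only makes sense when the predictors share a scale. Tensor products penalize each margin separately, so their fit does not depend on the units of the predictors. The sketch below checks this by refitting after rescaling X; the simulated surface and the variable `X_rescaled` are assumptions, not the data behind the figure.

```r
library(mgcv)

set.seed(5)
n <- 400
# Hypothetical predictors on very different scales, loosely mimicking the axes above
dat <- data.frame(X = runif(n, 600, 900), Y = runif(n, 4, 12))
dat$Z <- sin(dat$X / 50) * cos(dat$Y / 2) + rnorm(n, sd = 0.2)
dat$X_rescaled <- dat$X / 1000  # same information, different units

# Isotropic thin plate spline: one penalty for both directions
tp1 <- gam(Z ~ s(X, Y), data = dat)
tp2 <- gam(Z ~ s(X_rescaled, Y), data = dat)

# Tensor product: a separate penalty (and smoothing parameter) per margin
te1 <- gam(Z ~ te(X, Y), data = dat)
te2 <- gam(Z ~ te(X_rescaled, Y), data = dat)

# How much does changing the units of X change each fit?
c(tprs   = max(abs(fitted(tp1) - fitted(tp2))),
  tensor = max(abs(fitted(te1) - fitted(te2))))
```

The two tensor-product fits should agree up to numerical tolerance, whereas the two thin plate fits generally will not.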
interactions with factors
- multiple surfaces, one for each factor level
- for binary factors: a difference surface
two surfaces, and their difference surface
[Figure, three contour-plot panels over X and Y: surface A, surface B, and the difference surface B−A]
two surfaces, and their difference surface
# two surfaces
sinus.gam = bam(Z ~ Condition +
                te(X, Y, by = Condition), data = sinus)

# a difference surface
sinus$ConditionNum = ifelse(sinus$Condition == "A", 0, 1)
sinus.diff.gam = bam(Z ~ te(X, Y) +
                     te(X, Y, by = ConditionNum), data = sinus)
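Why the numeric 0/1 `by` variable yields a difference surface: the term `te(X, Y, by = ConditionNum)` is multiplied by `ConditionNum`, so it vanishes for condition A and adds a smooth surface for condition B. That surface is therefore the estimated B minus A difference. A sketch with simulated data standing in for `sinus` (the generating functions are assumptions), checking that the by-term equals predicted B minus predicted A:

```r
library(mgcv)

set.seed(6)
n <- 600
# Hypothetical two-condition data; condition B adds a tilted wave to A's surface
sinus <- data.frame(X = runif(n, 0, 10), Y = runif(n, 0, 10),
                    Condition = factor(sample(c("A", "B"), n, replace = TRUE)))
sinus$Z <- sin(sinus$X) * cos(sinus$Y) +
  ifelse(sinus$Condition == "B", 0.5 * sin(sinus$X + sinus$Y), 0) +
  rnorm(n, sd = 0.3)

sinus$ConditionNum <- ifelse(sinus$Condition == "A", 0, 1)
sinus.diff.gam <- bam(Z ~ te(X, Y) + te(X, Y, by = ConditionNum), data = sinus)

# The by-term, evaluated with ConditionNum = 1, is the estimated B - A surface
grid <- expand.grid(X = seq(0, 10, length.out = 20), Y = seq(0, 10, length.out = 20))
tt <- predict(sinus.diff.gam, newdata = transform(grid, ConditionNum = 1), type = "terms")
diff_term <- tt[, grep("ConditionNum", colnames(tt))]

# The same surface from full predictions: predicted B minus predicted A
diff_pred <- predict(sinus.diff.gam, transform(grid, ConditionNum = 1)) -
             predict(sinus.diff.gam, transform(grid, ConditionNum = 0))
all.equal(as.numeric(diff_term), as.numeric(diff_pred))
```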
decompositional models
sinus2.gam = gam(Zsin ~ ti(X) + ti(Y) + Condition +
                 ti(X, Y, by = Condition), data = sinus2)
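`ti()` terms are tensor-product interactions with the main effects left out, so `ti(X) + ti(Y) + ti(X, Y)` splits a surface into two main effects plus a pure interaction. The sketch below, with simulated data standing in for `sinus2` (the generating functions are assumptions), checks that the separately predicted components, together with the intercept, add up to the fitted values.

```r
library(mgcv)

set.seed(7)
n <- 600
# Hypothetical data: main effects of X and Y, plus an interaction only in condition B
sinus2 <- data.frame(X = runif(n, 600, 900), Y = runif(n, 4, 12),
                     Condition = factor(sample(c("A", "B"), n, replace = TRUE)))
sinus2$Zsin <- sin(sinus2$X / 50) + cos(sinus2$Y / 2) +
  ifelse(sinus2$Condition == "B", sin(sinus2$X / 50) * cos(sinus2$Y / 2), 0) +
  rnorm(n, sd = 0.2)

sinus2.gam <- gam(Zsin ~ ti(X) + ti(Y) + Condition +
                    ti(X, Y, by = Condition), data = sinus2)

# One column per component of the decomposition; with the intercept
# (the "constant" attribute) they add up to the fitted values
tt <- predict(sinus2.gam, type = "terms")
colnames(tt)
all.equal(as.numeric(rowSums(tt) + attr(tt, "constant")),
          as.numeric(fitted(sinus2.gam)))
```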
decompositional models
[Figure: partial effects on Zsin of X (600 to 900) and of Y (4 to 12), followed by contour plots over X and Y for conditions A and B, shown twice]
higher-dimensional interactions
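# full three-way tensor product interaction
m.gam = gam(Response ~ te(X, Y, Z), data = dfr)
# X interacts with Y and with Z, but Y and Z do not interact
m.gam = gam(Response ~ te(X, Y) + te(X, Z), data = dfr)
# the same structure, split into main effects and pure interactions
m.gam = gam(Response ~ ti(X) + ti(Y) + ti(Z) +
              ti(X, Y) + ti(X, Z), data = dfr)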
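These formulas trade flexibility for interpretability: `te(X, Y, Z)` allows any three-way interaction, while the `ti()` version keeps only the main effects and the X-by-Y and X-by-Z interactions, each of which gets its own row in the summary. One hedged way to choose between them is to compare fits on the same data, for instance with AIC. The simulated data, in which Y and Z never interact, are an assumption made for illustration.

```r
library(mgcv)

set.seed(8)
n <- 800
dfr <- data.frame(X = runif(n), Y = runif(n), Z = runif(n))
# Hypothetical truth: X interacts with Y and with Z, but Y and Z do not interact
dfr$Response <- sin(3 * dfr$X) * dfr$Y + cos(3 * dfr$X) * dfr$Z + rnorm(n, sd = 0.2)

m.full <- gam(Response ~ te(X, Y, Z), data = dfr)
m.ti   <- gam(Response ~ ti(X) + ti(Y) + ti(Z) + ti(X, Y) + ti(X, Z), data = dfr)

AIC(m.full, m.ti)
summary(m.ti)$s.table  # one row per main effect or two-way interaction
```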