LightSIDE
Carolyn Penstein Rosé
Language Technologies Institute, Human-Computer Interaction Institute, School of Computer Science
With funding from the National Science Foundation and the Office of Naval Research
lightsidelabs.com/research/
Click here to load a file
Select Heteroglossia as the predicted category
Make sure the text field is selected as the field to extract text features from
Feature Space Customizations
Punctuation can be a “stand-in” for mood: “you think the answer is 9?” versus “you think the answer is 9.”
Bigrams capture simple lexical patterns: “common denominator” versus “common multiple”
Trigrams (just like bigrams, but with 3 words next to each other): “Carnegie Mellon University”
POS bigrams capture syntactic or stylistic information: “the answer which is …” versus “which is the answer”
Line length can be a proxy for explanation depth
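
The surface features on this slide can be sketched in a few lines of Python. This is an illustration of the idea only, not LightSIDE's own extractor: the function names, the tokenizer, and the choice to measure line length in words are all assumptions made for this example. POS bigrams are left out because they would need a part-of-speech tagger.

```python
import re

def tokenize(text):
    # Lowercase the text; keep each punctuation mark as its own token.
    return re.findall(r"[a-z0-9']+|[^\w\s]", text.lower())

def ngrams(tokens, n):
    # Join each run of n adjacent tokens into a single feature name.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    # Hypothetical helper: binary n-gram features plus a line-length feature.
    tokens = tokenize(text)
    binary = set(tokens)               # unigrams, including punctuation
    binary.update(ngrams(tokens, 2))   # bigrams
    binary.update(ngrams(tokens, 3))   # trigrams
    # Assumption: line length is counted in whitespace-separated words.
    return {"binary": binary, "line_length": len(text.split())}

question = extract_features("you think the answer is 9?")
statement = extract_features("you think the answer is 9.")
print(sorted(question["binary"] - statement["binary"]))
```

The two example sentences differ only in the features that involve the final punctuation mark, which is what lets punctuation stand in for mood.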
Feature Space Customizations
Contains non-stop word can be a predictor of whether a conversational contribution is contentful: “ok sure” versus “the common denominator”
Remove stop words removes some distracting features
Stemming allows some generalization: multiple, multiply, multiplication
Removing rare features is a cheap form of feature selection
Features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space
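
A rough sketch of these customizations follows. The stop-word list is a tiny illustrative one chosen so that “ok sure” counts as contentless, the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's, and the threshold of 3 is an assumption that encodes “once or twice”. None of the names come from LightSIDE.

```python
from collections import Counter

# Illustrative stop-word list only (assumption); real lists are much longer.
STOP_WORDS = {"ok", "sure", "the", "a", "is", "of", "and"}

def contains_non_stop_word(tokens):
    # True if at least one token carries content.
    return any(t not in STOP_WORDS for t in tokens)

def crude_stem(word):
    # Toy suffix stripping, enough to merge the slide's three examples.
    for suffix in ("ication", "e", "y"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def drop_rare(feature_sets, min_count=3):
    # Keep only features that appear in at least min_count instances.
    counts = Counter(f for features in feature_sets for f in features)
    keep = {f for f, c in counts.items() if c >= min_count}
    return [features & keep for features in feature_sets]

print(contains_non_stop_word(["ok", "sure"]))
print({crude_stem(w) for w in ("multiple", "multiply", "multiplication")})
```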
Feature Space Customizations
Think like a computer!
Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful
Look for approximations
If you want to find questions, you don’t need to do a complete syntactic analysis
Look for question marks
Look for wh-terms that occur immediately before an auxiliary verb
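
The two question cues above can be approximated with a regular expression. The word lists below are short, hand-picked assumptions for illustration, not an exhaustive inventory of wh-terms or auxiliaries.

```python
import re

# Hand-picked, non-exhaustive word lists (assumptions for this sketch).
WH_TERMS = r"(?:who|what|when|where|why|which|how)"
AUXILIARIES = r"(?:is|are|was|were|do|does|did|can|could|will|would|should|has|have|had)"

# A wh-term immediately followed by an auxiliary verb.
WH_BEFORE_AUX = re.compile(rf"\b{WH_TERMS}\s+{AUXILIARIES}\b", re.IGNORECASE)

def looks_like_question(text):
    # Cheap approximation: a question mark, or a wh-term right before an auxiliary.
    return "?" in text or WH_BEFORE_AUX.search(text) is not None

print(looks_like_question("what is the common denominator"))
print(looks_like_question("you think the answer is 9."))
```

Because this is an approximation and not a syntactic analysis, it also fires on relative clauses such as “the answer which is nine”. Whether that noise matters is an empirical question for the learner.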
Click to extract text features
Select Logistic Regression as the Learner
Evaluate the result by cross-validation over sessions
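
Cross-validating over sessions means that every instance from a given session lands in the same fold, so the model is never tested on a session it has partly seen during training. The sketch below shows one way to build such folds. The slides do not say how LightSIDE assigns sessions to folds, so the round-robin assignment and the function name are assumptions.

```python
def session_folds(session_ids, n_folds):
    # Assign each distinct session to one fold, round-robin, so that no
    # session is split between training and test data.
    sessions = sorted(set(session_ids))
    fold_of = {s: i % n_folds for i, s in enumerate(sessions)}
    for fold in range(n_folds):
        test = [i for i, s in enumerate(session_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(session_ids) if fold_of[s] != fold]
        yield train, test

# Hypothetical session labels, one per instance.
ids = ["s1", "s1", "s2", "s3", "s3", "s2"]
for train, test in session_folds(ids, n_folds=3):
    print("train:", train, "test:", test)
```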
Run the experiment
Stretchy Patterns (Gianfortoni, Adamson, & Rosé, 2011)
A sequence of 1 to 6 categories
May include GAPs
A GAP can cover any symbol
GAP+ may cover any number of symbols
Must not begin or end with a GAP
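
The constraints above can be turned into a small pattern matcher. This is a sketch under stated assumptions, not the implementation from the cited paper: it treats the input as an already-categorized list of symbols, counts gaps toward the length limit of 6, lets GAP cover exactly one symbol, and lets GAP+ cover one or more symbols.

```python
GAPS = ("GAP", "GAP+")

def is_valid(pattern):
    # 1 to 6 elements (assumption: gaps count), no gap at either end.
    return (1 <= len(pattern) <= 6
            and pattern[0] not in GAPS
            and pattern[-1] not in GAPS)

def matches_at(pattern, symbols, start):
    # True if the pattern matches the symbols beginning exactly at index start.
    if not pattern:
        return True
    if start >= len(symbols):
        return False
    head, rest = pattern[0], pattern[1:]
    if head == "GAP":    # assumption: covers exactly one symbol
        return matches_at(rest, symbols, start + 1)
    if head == "GAP+":   # assumption: covers one or more symbols
        return any(matches_at(rest, symbols, nxt)
                   for nxt in range(start + 1, len(symbols) + 1))
    return symbols[start] == head and matches_at(rest, symbols, start + 1)

def contains(pattern, symbols):
    # True if the pattern occurs anywhere in the symbol sequence.
    if not is_valid(pattern):
        raise ValueError("pattern violates the stretchy-pattern constraints")
    return any(matches_at(pattern, symbols, i) for i in range(len(symbols)))

symbols = "would have been more interesting".split()
print(contains(["would", "GAP+", "interesting"], symbols))
print(contains(["would", "GAP", "interesting"], symbols))
```

In the example, the GAP+ pattern stretches over “have been more”, while the single GAP version cannot reach from “would” to “interesting”.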
Now it’s your turn!
We’ll explore some advanced features and error analysis after the break!
Error Analysis Process
Identify large error cells
Make comparisons
Ask yourself how a misclassified instance is similar to the instances that were correctly classified with the class it was assigned (vertical comparison)
Ask yourself how it is different from the correctly classified instances of the class it should have been assigned (horizontal comparison)
Class labels in the example: Positive, Negative
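
The process can be sketched as follows, assuming a confusion matrix whose rows are actual labels and whose columns are predicted labels, so that a vertical comparison stays within one predicted class and a horizontal comparison stays within one actual class. The function names and the toy labels are made up for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    # Count instances per (actual label, predicted label) cell.
    return Counter(zip(actual, predicted))

def largest_error_cell(matrix):
    # The off-diagonal cell with the most instances, or None if no errors.
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def comparison_sets(actual, predicted, cell):
    # Indices of the errors in the cell, plus the two groups to compare against.
    true_label, guessed = cell
    pairs = list(zip(actual, predicted))
    in_cell = [i for i, p in enumerate(pairs) if p == (true_label, guessed)]
    vertical = [i for i, p in enumerate(pairs) if p == (guessed, guessed)]
    horizontal = [i for i, p in enumerate(pairs) if p == (true_label, true_label)]
    return in_cell, vertical, horizontal

# Toy labels for illustration only.
actual = ["pos", "pos", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos", "pos"]
cell = largest_error_cell(confusion_matrix(actual, predicted))
print(cell, comparison_sets(actual, predicted, cell))
```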
Error Analysis on Development Set
Positive: is interesting, an interesting scene
Negative: would have been more interesting, potentially interesting, etc.
What’s different?
* Note that in this case we get no benefit if we use feature selection over the original feature space.
Feature Splitting (Daumé III, 2007)
[Figure: feature space split into General, Domain A, and Domain B copies]
Why is this nonlinear?
It represents the interaction between each feature and the Domain variable
Now that the feature space represents the nonlinearity, the algorithm to train the weights can be linear.
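
A minimal sketch of feature splitting: every feature is copied into a general version and a version tagged with the instance's domain. The naming scheme with a colon and the function name are assumptions for this example.

```python
def split_features(features, domain):
    # Duplicate each feature: one general copy shared by all domains and one
    # copy that is only ever active for instances from this domain.
    augmented = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented

print(split_features({"interesting": 1}, "A"))
print(split_features({"interesting": 1}, "B"))
```

The domain-tagged copy equals the original feature multiplied by an indicator for the domain, which is the feature-by-domain interaction described above. A linear learner can then give “interesting” one weight overall and adjust it separately for each domain.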
Healthcare Bill Dataset