Hedge Detection with Latent Features
SU Qi (sukia@pku.edu.cn)
CLSW2013, Zhengzhou, Henan
May 12, 2013

TRANSCRIPT
1. Introduction
• The importance of information credibility
• Hedge
– Hedges are "words whose job is to make things fuzzier or less fuzzy" [Lakoff, 1972].
– They weaken or intensify the speaker's commitment to a proposition.
– Some linguists narrow the term down to cover only its use as a detensifier.
• CoNLL-2010 shared task of hedge detection
– Detecting hedges and their scopes
1. Introduction
– Examples
<sentence id="S3.7" certainty="uncertain">It is <ccue>possible</ccue> that false allegations may be over-represented, because many true victims of child sexual abuse never tell anyone at all about what happened.</sentence>
<sentence id="S3.11" certainty="uncertain"><ccue>Some studies</ccue> break down the level of false allegations by the age of the child.</sentence>
<sentence id="S3.19" certainty="uncertain"><ccue>It is suggested</ccue> that parents have consistently underestimated the seriousness of their child's distress when compared to accounts of their own children.</sentence>
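For sequence labeling, annotated sentences like the ones above are typically converted into token-level BIO labels. The helper below is a hypothetical illustration of that conversion for the `<ccue>` markup shown here; it is not part of the original work.

```python
import re

def to_bio(annotated):
    """Convert a <ccue>-annotated sentence into (token, BIO-label) pairs.

    Illustrative sketch: cue tokens get B-CUE/I-CUE, all others get O.
    """
    pairs = []
    # re.split with a capture group keeps the cue spans in the output.
    for part in re.split(r"(<ccue>.*?</ccue>)", annotated):
        if part.startswith("<ccue>"):
            tokens = part[len("<ccue>"):-len("</ccue>")].split()
            for i, tok in enumerate(tokens):
                pairs.append((tok, "B-CUE" if i == 0 else "I-CUE"))
        else:
            for tok in part.split():
                pairs.append((tok, "O"))
    return pairs

sent = "It is <ccue>possible</ccue> that false allegations may be over-represented."
print(to_bio(sent)[:4])
# → [('It', 'O'), ('is', 'O'), ('possible', 'B-CUE'), ('that', 'O')]
```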
1. Introduction
– sequence labeling models, e.g., conditional random fields and SVM-HMM
– binary classification
– shallow features (e.g., word, lemma, POS tags)

Hedge detection is complicated by the fact that the same word types occasionally have different, non-hedging uses:
auxiliaries (may, might), hedging verbs (suggest, question), adjectives (probable, possible), adverbs (likely), conjunctions (or, and, either…or), nouns (speculation), etc.

Such shallow features can only marginally improve the accuracy of a bag-of-words representation.
2. The Main Points in This Paper
• Basic assumption:
– high-level (latent) features work better for sequence labeling
– projecting words to a lower-dimensional latent space improves generalizability to unseen items and helps disambiguate some ambiguous items
3. Our Work
• We perform LDA training and inference by Gibbs sampling, then train the CRF model with topic IDs added as external features.
• As an unsupervised model, LDA allows us to train and infer on an unlabeled dataset, thus relaxing the restriction to the labeled dataset used for CRF training.
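The LDA-by-Gibbs-sampling step described above can be sketched with a toy collapsed Gibbs sampler. This is a minimal illustration on a tiny hand-made corpus, not the actual implementation used in the paper; a real experiment would run a full LDA toolkit over a large unlabeled corpus.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns a topic ID per token."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    # z[d][i]: current topic of token i in doc d, plus count tables.
    z = [[rng.randrange(K) for _ in d] for d in docs]
    ndk = [[0] * K for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
    nk = [0] * K                                # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # Sample a new topic from the collapsed conditional.
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + V * beta) for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return z

docs = [["may", "suggest", "possible"], ["gene", "protein", "cell"],
        ["might", "likely", "suggest"], ["cell", "protein", "dna"]]
topics = lda_gibbs(docs, K=2)
```

Each token's sampled topic ID would then be appended as an extra feature column in the CRF training data.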
4. Corpus and Experiments
• biological scientific articles
• three different levels of feature sets
– Level 1: the token; whether the token is a potential hedge cue (occurring in the pre-extracted hedge cue list) or part of a hedge cue; its context within the window [-2, 2]
– Level 2: lemma; part-of-speech tag; whether the token belongs to a chunk; whether it is a named entity (from the GENIA tagger)
– Level 3: topic ID (inferred by the LDA model)
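The three feature levels above can be sketched as a per-token feature function. The names below (`cue_list`, `topic_ids`) are hypothetical inputs standing in for the pre-extracted hedge cue list and the per-token LDA topic IDs; the Level 2 lemma/POS/chunk/NE features would come from a tagger such as the GENIA tagger and are omitted here.

```python
def token_features(tokens, i, cue_list, topic_ids):
    """Sketch of Level 1 and Level 3 features for token i."""
    feats = {
        "token": tokens[i],                       # Level 1: the token itself
        "is_cue": tokens[i].lower() in cue_list,  # Level 1: potential hedge cue
        "topic": topic_ids[i],                    # Level 3: LDA topic ID
    }
    # Level 1: context window [-2, 2], padded at sentence boundaries.
    for off in (-2, -1, 1, 2):
        j = i + off
        feats[f"tok[{off}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats

toks = ["It", "is", "possible", "that"]
print(token_features(toks, 2, {"possible", "may"}, [1, 1, 0, 1]))
```

Each such feature dictionary would form one column group in the CRF's input, with topic IDs as the external Level 3 features.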
4. Corpus and Experiments
5. Analysis and Conclusion
• Hedge cues form a relatively "closed" set.
• A significant improvement is found between the baselines and all the other experimental settings.
• Sequence labeling significantly outperforms both naïve methods.
• The topics generated by LDA are effective.
• Our work suggests a potential research direction: incorporating topical information into hedge detection.