parti:qualityphraseminingink-ron.usc.edu/xiangren/ · •textrank [mihalceaet al.’04] •tf-idf 8...
TRANSCRIPT
![Page 1: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/1.jpg)
Constructing Structured Information Networks from Massive Text Corpora
Part I: Quality Phrase Mining
![Page 2: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/2.jpg)
Effort-Light StructMine: Methodology
2
Data-driven textsegmentation
(SIGMOD’15, WWW’16)
Entity names& context units
Partially-labeledcorpus
Learning Corpus-specific Model(KDD’15, KDD’16,
EMNLP’16, WWW’17)
Structures fromthe remainingunlabeled data
Knowledgebases
Textcorpus
![Page 3: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/3.jpg)
Quality Phrase Mining• Quality phrase mining seeks to extract a
ranked list of phrases with decreasing quality from a large collection of documents• Examples:
3
ScientificPapers
NewsArticles
Expected Results
USPresidentAndersonCooperBarack Obama…Obama administration…atown…
Expected Results
data miningmachinelearninginformationretrieval…support vectormachine…the paper…
![Page 4: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/4.jpg)
Why Phrase Mining?• Phrase: Minimal, unambiguous semantic unit; basic building
block for information networks and knowledge bases• Unigrams vs. phrases
• Unigrams (singlewords)areambiguous• E.g., “United”: United States? United Airline? United Parcel Service?
• Phrase:Anatural,meaningful,unambiguous semanticunit• E.g., “United States” vs. “United Airline”
• Mining semantically meaningful phrases• Transformtextdatafromwordgranularity tophrasegranularity
• Enhancethepowerandefficiencyatmanipulatingunstructureddatausingdatabasetechnology
4
![Page 5: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/5.jpg)
Application Scenarios
• Natural Language Processing (NLP)• Documentanalysis
• Information Retrieval (IR)• Indexinginsearchengine
• Text Mining• Keyphrases fortopicmodeling
5
![Page 6: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/6.jpg)
What Kind of Phrases Are of “High Quality”?• Popularity
• “informationretrieval”>“cross-languageinformationretrieval”
• Concordance• “strongtea”>“powerfultea”• “activelearning”> “learningclassification”
• Informativeness• “thispaper”(frequentbutnotdiscriminative,notinformative)
• Completeness• “supportvectormachine” >“vectormachine”
6
![Page 7: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/7.jpg)
Three Families of Methods
Supervised(linguisticanalyzers)
Unsupervised(statistical signals)
Weakly/DistantlySupervised
7
![Page 8: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/8.jpg)
Supervised Phrase Mining• Phrase mining was originated from the NLP
community• How to use linguistic analyzers to extract phrases?
• Parsing(e.g.,stanford NLPparsers)• NounPhrase(NP)Chunking
• How to rank extracted phrases?• C-value[Frantzi etal.’00]• TextRank [Mihalcea etal.’04]
• TF-IDF
8
![Page 9: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/9.jpg)
• Minimal Grammatical Segments ó Phrases
• Phrases: “the chef”, “the soup”
Linguistic Analyzer – Parsing
9
Rawtextsentence(string)
Fullparsetree(grammaticalanalysis)
Thechefcooksthesoup.
Full-textParsing
![Page 10: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/10.jpg)
Inefficiencies of Parsing
• Difficult to directly apply pre-trained to new domains (e.g. twitter, biomedical, yelp)• Unlesssophisticated,manuallycurated,domain-specifictrainingdataareprovided
• Computationally slow.• Cannotbeappliedonweb-scaledatatosupportemergingapplications
• We need “shallow” phrase mining techniques
10
![Page 11: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/11.jpg)
Linguistic Analyzer – Chunking
• Noun phrase chunking is a light version of parsing
1. Apply tokenization and part-of-speech (POS) tagging to each sentence
2. Search for noun phrase chunks
11
![Page 12: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/12.jpg)
Drawbacks of NP Chunking
• Pre-trained models may not be transferable to new domains• Scientificdomains,querylogs,socialmedia(e.g.,Yelp,Twitter)
• Lack of the usage of corpora-level information• NPsometimescan’tmeettherequirementsofqualityphrases
12
![Page 13: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/13.jpg)
Ranking – C-value• Given a set of phrases, for a given phrase 𝑝• 𝑓(𝑝) istherawfrequency• |𝑝| isthenumberoftokensin𝑝
• If there is no phrase contains 𝑝 as a substring• C-value(𝑝)=log) |𝑝| ⋅ 𝑓(𝑝)
• Else• C-value(𝑝)=log) |𝑝| ⋅ 𝑓 𝑝 − avg.012345267𝑓 𝑞
• Prefers “maximal” phrases• Popularity & Completeness
13
![Page 14: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/14.jpg)
Ranking – TextRank
• Construct a network of phrases & unigrams• Compute the importance of vertices• SimilartoPageRank
• Popularity & Informativeness
14
Compatibilityofsystemsoflinearconstraintsoverthesetofnaturalnumbers.CriteriaofcompatibilityofasystemoflinearDiophantineequations,strictinequations,and
nonstrict inequations areconsidered.…..
![Page 15: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/15.jpg)
Ranking – TF-IDF
• Term Frequency• E.g.,rawfrequency• Rewardsfrequentphrases
• Inverse Document Frequency• E.g.,log((#ofalldocuments)/(#ofoccurreddocuments))• Rewards“rare”phrases
• Popularity & Informativeness
15
![Page 16: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/16.jpg)
Three Families of Methods
Supervised(linguisticanalyzers)
Unsupervised(statistical signals)
Weakly/DistantlySupervised
16
![Page 17: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/17.jpg)
Unsupervised Phrase Mining
• Statistics based on massive text corpora• Popularity• Rawfrequency• FrequencydistributionbasedonZipfian ranks[Deane’05]
• Concordance• Significancescore[Churchetal.’91][El-Kishky etal.’14]
• Completeness• Comparisontosuper/sub-sequences[Parameswaran etal.’10]
17
![Page 18: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/18.jpg)
Raw Frequency
• Frequent contiguous pattern mining• If“AB” isfrequent,likely“AB” couldbeaphrase
• It prefers• “Stopphrases”• Shorterphrases
• E.g., freq(vector machine) ≥ freq(support vector machine)
• Raw frequency could NOT reflect the quality of phrases
18
![Page 19: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/19.jpg)
Raw Frequency (improved)
• Combine with topic modeling• Mergeadjacentunigramsofthesametopic[Blei &Lafferty’09]• Frequentpatternminingwithinthesametopic[Danilevsky etal.’14]
• Limitations• Tokensinthesamephrasemaybeassignedtodifferenttopics• E.g.knowledge discovery usingleastsquaressupportvector machineclassifiers…
19
![Page 20: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/20.jpg)
Frequency Distribution• Idea: ranks in a Zipfian frequency distribution is
more reliable than raw frequency• Heuristic: Actual Rank / Expected Rank• Example:• Givenaphraselike“eastend”• ActualRank:rank“eastend”amongalloccurrencesof“east”(e.g.,“east end”,“east side”,“theeast”,“towardstheeast”,etc.)• ExpectedRank:rank“__end”amongallcontextsof“east”(e.g.,“__end”,“__side”,“the__”,“towardsthe__”,etc.)
20
![Page 21: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/21.jpg)
Significance score • Significance score [Church et al.’91]• A.k.a.Zscore
• ToPMine [El-Kishky et al.’15]• Ifaphrasecanbedecomposedintotwoparts
• P = P1 ● P2• α(P1,P2)≈(f(P1●P2)̶µ0(P1,P2))/√f(P1●P2)
21
Qualityphrases
![Page 22: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/22.jpg)
Significance score (cont’d)• Merge adjacent unigrams greedily if their
significance score is above the threshold.
22
![Page 23: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/23.jpg)
Comparison to super/sub-sequences• Frequency ratio between an n-gram phrase
and its two (n-1)-gram phrases• Example
• Pre-confidence ofSanAntonio:2385/14585• Post-confidence ofSanAntonio:2385/2855
• Expand / Terminate based on thresholds
23
Phrase Rawfrequency
San 14585
Antonio 2855
SanAntonio 2385
![Page 24: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/24.jpg)
Comparison to super/sub-sequences (cont’d)• Assumption
• Anti-example• “relationaldatabasesystem”isaqualityphrase.• Both“relationaldatabase”and“databasesystem”canbequalityphrases.
24
Ann-gramqualityphrase
Two(n-1)-gramsub-phrases
Atleastoneofthemisnotaqualityphrase.
![Page 25: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/25.jpg)
Limitations of Statistical Signals
• The thresholds should be carefully chosen.• Only consider a subset of quality phrase
requirements.• Combining different signals in an
unsupervised manner is difficult.• Introducesomesupervisionmayhelp!
25
![Page 26: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/26.jpg)
Three Families of Methods
Supervised(linguisticanalyzers)
Unsupervised(statistical signals)
Weakly/DistantlySupervised
26
![Page 27: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/27.jpg)
Weakly / Distantly Supervised Phrase Mining Methods• SegPhrase [Liu et al.’15]• Weaklysupervised
• AutoPhrase [Shang et al.’17]• Distantlysupervised
27
![Page 28: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/28.jpg)
SegPhrase
28
Document 1Citationrecommendationisaninterestingbutchallengingresearchproblemindataminingarea.
Document 2Inthisstudy,weinvestigatetheprobleminthecontextofheterogeneousinformationnetworksusingdataminingtechnique.
Phrase Mining
Document 3PrincipalComponentAnalysisisalineardimensionalityreduction technique commonly usedin machine learning applications.
Quality Phrases
PhrasalSegmentation
RawCorpus SegmentedCorpus
InputRawCorpus Quality Phrases SegmentedCorpus
• Outperform all above methods on domain-specific corpus (e.g., Yelp reviews)
![Page 29: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/29.jpg)
Quality Estimation• Weakly Supervised
• Labels:Whetheraphraseisaqualityoneornot• “support vector machine”: 1• “the experiment shows”: 0
• For~1GBcorpus,only300labels
• Pros• Binaryannotationsareeasy
• Cons• Theselectionofhundredsofvarying-qualityphrasesfrommillionsofcandidatesshouldbecareful.
29
![Page 30: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/30.jpg)
Phrasal Segmentation• Phrasal segmentation can tell which phrase is
more appropriate• Ex:Astandard⌈featurevector⌋ ⌈machinelearning⌋ setupisusedtodescribe...
• Effects on quality re-estimation (real data)• nphardinthestrongsense• nphardinthestrong• databasemanagementsystem
30
Notcountedtowardstherectifiedfrequency
![Page 31: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/31.jpg)
From the Titles and Abstracts of SIGMOD
31
Query SIGMOD
Method SegPhrase Chunking(TF-IDF&C-Value)
1 database data base
2 databasesystem database system
3 relationaldatabase queryprocessing
4 queryoptimization queryoptimization
5 queryprocessing relationaldatabase
… … …
51 sql server databasetechnology
52 relationaldata databaseserver
53 datastructure largevolume
54 joinquery performancestudy
55 webservice webservice
… … …
201 highdimensionaldata efficientimplementation
202 location basedservice sensornetwork
203 xmlschema largecollection
204 twophaselocking importantissue
205 deepweb frequentitemset
… … …
OnlyinSegPhrase OnlyinChunking
![Page 32: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/32.jpg)
From the Titles and Abstracts of SIGKDD
32
Query SIGKDD
Method SegPhrase Chunking(TF-IDF&C-Value)
1 datamining datamining
2 dataset association rule
3 association rule knowledge discovery
4 knowledgediscovery frequentitemset
5 timeseries decisiontree
… … …
51 associationrulemining searchspace
52 ruleset domain knowledge
53 conceptdrift importnant problem
54 knowledgeacquisition concurrencycontrol
55 geneexpressiondata conceptualgraph
… … …
201 web content optimalsolution
202 frequentsubgraph semanticrelationship
203 intrusiondetection effectiveway
204 categoricalattribute spacecomplexity
205 userpreference smallset
… … …
OnlyinSegPhrase OnlyinChunking
![Page 33: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/33.jpg)
Reported by TripAdvisor(Find “Interesting” Collections of Hotels)
33
![Page 34: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/34.jpg)
AutoPhrase• No label selection and annotation effort• Smoothly support multiple languages
34
![Page 35: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/35.jpg)
AutoPhrase vs. Previous Work
35
Differentdomains
Differentlanguages
![Page 36: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/36.jpg)
AutoPhrase’s Example Results
36
![Page 37: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/37.jpg)
ReferencesDeane, P., 2005, June. A nonparametric method for extraction of candidate phrasal terms. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 605-613). Association for Computational Linguistics.
Koo, T., Carreras Pérez, X. and Collins, M., 2008. Simple semi-supervised dependency parsing. In 46th Annual Meeting of the Association for Computational Linguistics (pp. 595-603).
Xun, E., Huang, C. and Zhou, M., 2000, October. A unified statistical model for the identification of English baseNP. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (pp. 109-116). Association for Computational Linguistics.
Zhang, Z., Iria, J., Brewster, C. and Ciravegna, F., 2008, May. A comparative evaluation of term recognition algorithms. In LREC.
Park, Y., Byrd, R.J. and Boguraev, B.K., 2002, August. Automatic glossary extraction: beyond terminology identification. In Proceedings of the 19th international conference on Computational linguistics-Volume 1 (pp. 1-7). Association for Computational Linguistics.
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C. and Nevill-Manning, C.G., 1999, August. KEA: Practical automatic keyphrase extraction. In Proceedings of the fourth ACM conference on Digital libraries (pp. 254-255). ACM.
Liu, Z., Chen, X., Zheng, Y. and Sun, M., 2011, June. Automatic keyphrase extraction by bridging vocabulary gap. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (pp. 135-144). Association for Computational Linguistics.
Evans, D.A. and Zhai, C., 1996, June. Noun-phrase analysis in unrestricted text for information retrieval. In Proceedings of the 34th annual meeting on Association for Computational Linguistics (pp. 17-24). Association for Computational Linguistics.
37
![Page 38: PartI:QualityPhraseMiningink-ron.usc.edu/xiangren/ · •TextRank [Mihalceaet al.’04] •TF-IDF 8 •Minimal Grammatical Segments óPhrases ... •If a phrase can be decomposed](https://reader036.vdocuments.net/reader036/viewer/2022071109/5fe39b126238836b22096673/html5/thumbnails/38.jpg)
ReferencesFrantzi, K., Ananiadou, S. and Mima, H., 2000. Automatic recognition of multi-word terms:. the c-value/nc-value method. International Journal on Digital Libraries, 3(2), pp.115-130.
Mihalcea, R. and Tarau, P., 2004, July. TextRank: Bringing order into texts. Association for Computational Linguistics.
Blei, D.M. and Lafferty, J.D., 2009. Topic models. Text mining: classification, clustering, and applications, 10(71), p.34.
Danilevsky, M., Wang, C., Desai, N., Ren, X., Guo, J. and Han, J., 2014, April. Automatic construction and ranking of topical keyphrases on collections of short documents. In Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 398-406). Society for Industrial and Applied Mathematics.
Church, K., Gale, W., Hanks, P. and Hindle, D., 1991. Using statistics in lexical analysis. Lexical acquisition: exploiting on-line resources to build a lexicon, 115, p.164.
El-Kishky, A., Song, Y., Wang, C., Voss, C.R. and Han, J., 2014. Scalable topical phrase mining from text corpora. Proceedings of the VLDB Endowment, 8(3), pp.305-316.
Parameswaran, A., Garcia-Molina, H. and Rajaraman, A., 2010. Towards the web of concepts: Extracting concepts from large datasets. Proceedings of the VLDB Endowment, 3(1-2), pp.566-577.
Liu, J., Shang, J., Wang, C., Ren, X. and Han, J., 2015, May. Mining quality phrases from massive text corpora. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 1729-1744). ACM.
Shang, J., Liu, J., Jiang, M., Ren, X., Voss, C.R. and Han, J., 2017. Automated Phrase Mining from Massive Text Corpora. arXiv preprint arXiv:1702.04457.
38