Ontology-Based Event Modeling for Semantic
Understanding of Chinese News Story
Wang Wei, Zhao Dongyan
Institute of Computer Science & Technology
NLP&CC 2012 – Beijing, China
Outline
Introduction
Related Work
  Event definitions
  Existing event models
News Ontology Event Model
  The Design of NOEM
  Main Concepts and Properties in NOEM
Evaluation
Conclusion
NLP&CC, Beijing, China
Introduction: “News Information Overload”
Numerous online news service providers
Explosive increase in online news users
[Figure: Numbers of online news users (in units of ten thousand persons) and the time they spend browsing news]
Introduction
Classification and summarization are widely used in the online news domain, but these document-oriented techniques, based on the traditional bag-of-words (BOW) model, cannot provide sufficient event semantic information.
Users need intelligent, event-level semantic news services that
push events, not documents, to users, and
employ entities and relations to provide semantic navigation, e.g., Microsoft's Renlifang and Tencent's Soso Waltz.
Web of Documents → Web of Data → Web of Entities and Relations
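The limitation of document-oriented BOW models can be made concrete with a toy example. The minimal Python sketch below (the sentences are invented for illustration and are not from the paper) shows that two sentences with swapped agent and patient produce identical bag-of-words counts, while an ordered subject-predicate-object reading keeps the two events apart.

```python
from collections import Counter

# Two toy, pre-tokenized sentences: same words, opposite roles.
s1 = ["police", "arrested", "suspect"]  # agent: police, patient: suspect
s2 = ["suspect", "arrested", "police"]  # agent: suspect, patient: police


def bag_of_words(tokens: list[str]) -> Counter:
    """Count tokens and discard their order, as a BOW model does."""
    return Counter(tokens)


def svo(tokens: list[str]) -> tuple[str, str, str]:
    """Read a three-token subject-verb-object sentence as an ordered triple."""
    subject, predicate, obj = tokens
    return subject, predicate, obj


print(bag_of_words(s1) == bag_of_words(s2))  # True: identical bags
print(svo(s1) == svo(s2))                    # False: different events
```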
Introduction
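Keyword-based analysis easily leads to “semantic” errors:
the event took place in Ludao (“Egret Island”, i.e., Xiamen), not in Hong Kong;
the Shanghai concert is Faye Wong's event and has nothing to do with Andy Lau.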
How can multi-dimensional semantic navigation be provided?
5W1H: Who, When, Where, What, Why, How
Introduction
Our research aims at semantic understanding of Chinese news by
extracting the entities and relations involved in the key event of a news story, and
building a news event knowledge base as well as a semantic retrieval engine to support event-level semantic applications.
We implemented a novel framework that addresses the whole 5W1H list in three steps:
key event identification;
event semantic elements extraction;
ontology-based event knowledge base construction.
This paper discusses Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story.
Methodology
[Figure: Chinese online news → key event identification in one news story → 5W1H event semantic-elements extraction → event semantic modeling and ontology population → event knowledge base]
Related Work: Event Definitions
WordNet: “something that happens at a given place and time.”
Cognitive psychologists: events are “happenings in the outside world”; people observe and understand the world through events.
Linguists (Chung and Timberlake, 1985): “an event can be defined in terms of three components: a predicate; an interval of time on which the predicate occurs and a situation or set of conditions under which the predicate occurs.”
TimeML: “a cover term for situations that happen or occur. Events can be punctual or last for a period of time.”
ACE (Automatic Content Extraction): “an event involving zero or more ACE entities, values and time expressions”.
Event-based summarization: atomic events link the major constituent parts of events (participants, locations, times) through verbs or action nouns that label the event itself.
Related Work
An event is represented as <S, P, O, T, L>, where S, P, O (subject, predicate, object) are core elements and T, L (time, location) are subordinate elements.
We define an event as “a specific occurrence that involves some participants”. It has three components:
a predicate;
core participants, i.e., agents and patients;
auxiliary participants, i.e., the time and location of the event.
The participants are usually named entities; together with the predicate, they correspond to the what, who, whom, when and where elements of an event.
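As an illustration, the event definition above can be held in a small record type. This is a minimal Python sketch; the class `Event`, its field names and the `to_5w` helper are hypothetical and are not identifiers from NOEM. The example values are the Chinese elements of the case study presented later in these slides (抵达 “arrive”, 胡锦涛 Hu Jintao, 8日 “the 8th”, 渥太华 Ottawa).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Event tuple <S, P, O, T, L>: S, P, O are core, T and L are subordinate."""

    subject: Optional[str]          # S: agent, answers "who"
    predicate: str                  # P: trigger verb or action noun, answers "what"
    obj: Optional[str]              # O: patient, answers "whom"
    time: Optional[str] = None      # T: answers "when"
    location: Optional[str] = None  # L: answers "where"

    def to_5w(self) -> dict[str, Optional[str]]:
        """Map each tuple component to the question word it answers."""
        return {
            "who": self.subject,
            "what": self.predicate,
            "whom": self.obj,
            "when": self.time,
            "where": self.location,
        }


# Example values taken from the case study in these slides (no patient here).
visit = Event(subject="胡锦涛", predicate="抵达", obj=None, time="8日", location="渥太华")
print(visit.to_5w())
```

Keeping T and L optional reflects their subordinate status: an event still has a predicate and core participants even when time or place is not reported.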
Related Work
Existing Event Models (model: originating field)
Script Theory, Event Domain Cognitive Model: cognitive linguistics
Probabilistic Event Model: TDT (Topic Detection and Tracking)
Atomic Event Model: event-based automatic summarization
Structural Event Model: MUC & ACE
Generic Event Model: event-centric multimedia data management
Ontology Event Models: ABC, PROTON, EO (Event Ontology), Event-Model-F
News Ontology Event Model
NOEM models (1) event information, (2) event relations and (3) event media.
News Ontology Event Model
[Figure: NOEM main concepts; NOEM relations]
Evaluation
Janez Brank et al. classify ontology evaluation methods into four categories:
(1) comparing the ontology to a “golden standard”;
(2) using the ontology in an application and evaluating the results;
(3) comparing the ontology with a source of data about the domain it is meant to cover;
(4) evaluation by humans, who assess how well the ontology meets a set of predefined criteria, standards and requirements.
[Table: Comparison between NOEM and existing event models]
Manual labeling: four postgraduate students labeled 6,000+ Chinese news stories from Xinhua News Agency, covering 23 top classes and 2,082 subclasses of CNML (Chinese News Markup Language).
In 85% of the stories, we found a topic sentence that contains the key event of the news story and four or five of the 5Ws; such key events can be described appropriately by NOEM.
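Code | Category name | Subclasses
1 | Politics | 85
2 | Law, Justice | 76
3 | Foreign Relations, International Relations | 72
4 | Military | 129
5 | Society, Labor, Disasters and Accidents | 105
11 | Economy | 132
12 | Economic Theory Research | 132
13 | Capital Construction, Construction Industry, Real Estate | 47
14 | Agriculture, Rural Areas | 99
15 | Mining, Industry | 239
16 | Energy, Water Services, Water Conservancy | 69
17 | Information Industry | 72
18 | Transportation, Postal Services, Logistics | 65
19 | Commerce, Foreign Trade, Customs | 55
21 | Service Industry, Tourism | 84
22 | Environment, Meteorology | 43
31 | Education | 63
33 | Science and Technology | 70
35 | Culture, Entertainment and Leisure | 98
36 | Literature, Art | 130
37 | Media Industry | 61
38 | Medicine, Health | 88
39 | Sports | 68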
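As a consistency check, the subclass counts in the table add up to the coverage stated above (23 top classes and 2,082 subclasses). The following Python lines, with the counts transcribed from the table, confirm the arithmetic:

```python
# Subclass counts per CNML top class, transcribed from the table (code: count).
subclasses = {
    1: 85, 2: 76, 3: 72, 4: 129, 5: 105,
    11: 132, 12: 132, 13: 47, 14: 99, 15: 239, 16: 69, 17: 72, 18: 65, 19: 55,
    21: 84, 22: 43,
    31: 63, 33: 70, 35: 98, 36: 130, 37: 61, 38: 88, 39: 68,
}

print(len(subclasses))           # 23 top classes
print(sum(subclasses.values()))  # 2082 subclasses
```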
Evaluation: A Case Study
Key event: “Chinese President Hu Jintao arrived in Canada for a state visit”
Result of 5W1H extraction of the key event:
<抵达 (arrive), isTypeof, Movement/Transport>,
<胡锦涛 (Hu Jintao), isTypeof, Person>,
<8日 (the 8th), isTypeof, Time>,
<渥太华 (Ottawa), isTypeof, Place>
……
[Figure: 5W1H extraction]
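The typed triples of the case study can be produced from the extracted 5W slots with a simple lookup. The sketch below is a hypothetical Python illustration: `SLOT_CLASSES`, `TRIGGER_CLASSES` and the fallback class `Event` are assumptions, and the one-entry trigger lexicon merely stands in for the verb-driven and SVM-based event classifier described in these slides.

```python
from typing import Optional

# Hypothetical slot-to-class table, read off the case-study triples.
SLOT_CLASSES = {"who": "Person", "when": "Time", "where": "Place"}

# Hypothetical trigger lexicon standing in for the real event classifier.
TRIGGER_CLASSES = {"抵达": "Movement/Transport"}


def to_type_triples(elements: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn 5W slot fillers into <entity, isTypeof, class> triples."""
    triples: list[tuple[str, str, str]] = []
    trigger: Optional[str] = elements.get("what")
    if trigger is not None:
        # Unknown triggers fall back to a generic "Event" class (assumption).
        triples.append((trigger, "isTypeof", TRIGGER_CLASSES.get(trigger, "Event")))
    for slot, cls in SLOT_CLASSES.items():
        value = elements.get(slot)
        if value is not None:
            triples.append((value, "isTypeof", cls))
    return triples


key_event = {"what": "抵达", "who": "胡锦涛", "when": "8日", "where": "渥太华"}
for triple in to_type_triples(key_event):
    print(triple)
```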
Evaluation: Population of NOEM
[Figure: An automatically generated OWL file (ontology population) for the key event “Chinese President Hu Jintao arrived in Canada for a state visit”]
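The slides show an automatically generated OWL file but not its contents. The following minimal Python sketch, using only the standard library, illustrates what populating one event could look like when serialized as Turtle (an RDF syntax that can carry OWL data) instead of RDF/XML. The namespace `http://example.org/noem#` and the properties `hasAgent`, `hasTime`, `hasPlace` are hypothetical placeholders, not NOEM's published vocabulary; the classes follow the case-study triples, with `Movement/Transport` renamed `MovementTransport` because an unescaped slash is not allowed in a Turtle prefixed name.

```python
# Hypothetical namespace; the slides do not give NOEM's real IRI.
NOEM = "http://example.org/noem#"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def populate(event_id: str, trigger: str, event_class: str,
             who: str, when: str, where: str) -> str:
    """Return Turtle for one event individual and its typed participants.

    Only individuals are emitted; the classes and properties are assumed
    to be declared in the ontology itself.
    """
    lines = [
        f"@prefix noem: <{NOEM}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        f"noem:{event_id} a noem:{event_class} ;",
        f"    rdfs:label {turtle_literal(trigger)} ;",
        f"    noem:hasAgent noem:{event_id}_who ;",
        f"    noem:hasTime noem:{event_id}_when ;",
        f"    noem:hasPlace noem:{event_id}_where .",
    ]
    for suffix, cls, label in (("who", "Person", who),
                               ("when", "Time", when),
                               ("where", "Place", where)):
        lines.append(f"noem:{event_id}_{suffix} a noem:{cls} ; "
                     f"rdfs:label {turtle_literal(label)} .")
    return "\n".join(lines)


print(populate("e1", "抵达", "MovementTransport", "胡锦涛", "8日", "渥太华"))
```

Emitting text keeps the sketch dependency-free; a real system would more likely build the graph with an RDF or OWL library and let it handle serialization.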
Conclusion
Main contributions:
an extensive investigation of “event” and “event modeling”;
the use of the 5W1H semantic elements concept in the Chinese news domain;
the design of an ontology-based event model, NOEM, which defines concepts of entities (time, person, location, organization, etc.), events and relationships to capture the temporal, spatial, informational, experiential, structural and causal aspects of an event, e.g., its 5W1H.
Future work: building a news event knowledge base and a semantic retrieval engine on NOEM to support event-level semantic applications.
The End
Thank you for your patience!
Q&A
Framework: a pipeline of three steps and six sub-tasks
(1) title classification and (2) topic sentence extraction for key event identification;
(3) semantic role labeling and (4) 5W1H element identification for event semantic elements extraction;
(5) NOEM definition and (6) ontology population for event knowledge base construction.
Publications
Please see our previous work for more details.
Key event extraction:
Wang, W., Zhao, D., Zhao, W.: Identification of topic sentence about key event in Chinese news. Acta Scientiarum Naturalium Universitatis Pekinensis 47(5), 789–796 (2011)
5Ws extraction:
Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W.: Extracting 5W1H event semantic elements from Chinese online news. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM 2010. LNCS, vol. 6184, pp. 644–655. Springer, Heidelberg (2010)
Wang, W., Zhao, D., Wang, D.: Chinese news event 5W1H elements extraction using semantic role labeling. In: 3rd ISIP, pp. 484–489 (2010)
Framework:
Wang, W., Zhao, D.: Chinese news event 5W1H semantic elements extraction for event ontology population. In: WWW 2012 PhD Symposium, Lyon, France (2012)
Title-Based Key Event Extraction
Input: news document
Output: topic sentences
Begin
  NLP-based preprocessing;
  Title classification; // classify the title as informative or non-informative
  Topic word extraction; // 1) TF-IDF; 2) PageRank on the word co-occurrence graph
  Title & topic word co-occurrence analysis; // (1)
  For each sentence do
    Term frequency scoring; // (2)
    Sentence location scoring; // (3)
    Sentence length scoring; // (4)
    Named entity scoring; // (5)
    Sentence-title similarity scoring; // (6)
  End do
  Sentence weighting & ranking; // (8)
End
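The slides do not give the scoring equations (1) to (8), so the following is only a minimal Python sketch of the idea under stated assumptions: topic words come from PageRank on a word co-occurrence graph (the TF-IDF alternative is not shown), and each sentence gets a weighted sum of topic-word density, position, length, named-entity density and Jaccard similarity to the title. The feature definitions, the weights in `rank_sentences` and the toy sentences are hypothetical; title classification and the title/topic-word co-occurrence analysis are omitted.

```python
from collections import defaultdict
from typing import Optional


def pagerank_topic_words(sentences: list[list[str]], window: int = 2,
                         d: float = 0.85, iters: int = 50, top_k: int = 5) -> list[str]:
    """Rank words by PageRank on an undirected word co-occurrence graph.

    Two words are linked when they occur within `window` tokens of each
    other in the same sentence.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    nodes = sorted(graph)
    if not nodes:
        return []
    n = len(nodes)
    score = {w: 1.0 / n for w in nodes}
    for _ in range(iters):
        # Every node in `graph` has at least one neighbor, so no division by zero.
        score = {w: (1 - d) / n + d * sum(score[u] / len(graph[u]) for u in graph[w])
                 for w in nodes}
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]


def rank_sentences(title: list[str], sentences: list[list[str]],
                   topic_words: list[str], entities: set[str],
                   weights: Optional[dict[str, float]] = None) -> list[int]:
    """Score sentences by a weighted feature sum; return indices, best first."""
    w = weights or {"tf": 1.0, "loc": 1.0, "len": 0.5, "ne": 1.0, "sim": 2.0}
    topic, title_set, n = set(topic_words), set(title), len(sentences)
    scored = []
    for i, tokens in enumerate(sentences):
        toks = set(tokens)
        features = {
            "tf": sum(t in topic for t in tokens) / max(len(tokens), 1),   # topic-word density
            "loc": 1.0 - i / n,                                            # earlier is better
            "len": min(len(tokens), 20) / 20,                              # penalize fragments
            "ne": len(toks & entities) / max(len(toks), 1),                # named-entity density
            "sim": len(toks & title_set) / max(len(toks | title_set), 1),  # Jaccard with title
        }
        scored.append((sum(w[k] * v for k, v in features.items()), i))
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy, pre-segmented example (invented sentences, hypothetical entity list).
title = ["胡锦涛", "抵达", "渥太华"]
sentences = [
    ["国家主席", "胡锦涛", "8日", "抵达", "渥太华", "开始", "国事访问"],
    ["加拿大", "是", "重要", "合作", "伙伴"],
    ["双方", "将", "讨论", "经贸", "合作"],
]
topic = pagerank_topic_words([title] + sentences)
print(rank_sentences(title, sentences, topic, {"胡锦涛", "渥太华", "加拿大"}))
```

The top-ranked index plays the role of the topic sentence; a linear combination is just one simple way to merge the per-sentence scores before ranking.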
Chinese News Semantic Elements Extraction
Input: topic sentences
Output: <Subject, Predicate, Object, Time, Location> and How of the news
Begin
  For each topic sentence do
    1) NE recognition; // HMM-based NER tool
    2) NP recognition; // CRF-based NP tagger
    3) Verb-driven and SVM-based event identification and classification; // What
    4) Syntactic-semantic rule-based <Subject, Predicate, Object> recognition; // Who did what to whom
    5) Time expression identification and normalization; // When
    6) Location identification; // Where
    7) Topic sentences as a short summary; // How
  End do
End
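Steps 4 to 6 can be illustrated with semantic role labeling output, which the framework slide lists as sub-task (3). The following minimal Python sketch assumes PropBank-style role labels (`ARG0`, `ARG1`, `ARGM-TMP`, `ARGM-LOC`) and a hypothetical role-to-slot mapping; the actual syntactic-semantic rules are not given in the slides. It also normalizes bare day expressions such as 8日 by assuming they fall in the publication month. The How element, which the slides derive from the topic sentence, is not covered.

```python
import re
from datetime import date
from typing import Optional

# Assumed mapping from PropBank-style semantic roles to 5W slots.
ROLE_TO_SLOT = {
    "ARG0": "who",        # agent
    "ARG1": "whom",       # patient
    "ARGM-TMP": "when",   # temporal adjunct
    "ARGM-LOC": "where",  # locative adjunct
}

# A bare day-of-month expression such as "8日" ("the 8th").
DAY_OF_MONTH = re.compile(r"(\d{1,2})日")


def normalize_day(expr: str, published: date) -> Optional[str]:
    """Resolve "N日" to an ISO date, assuming it falls in the publication month."""
    match = DAY_OF_MONTH.fullmatch(expr)
    if match is None:
        return None
    try:
        return date(published.year, published.month, int(match.group(1))).isoformat()
    except ValueError:  # e.g. "31日" in a 30-day month
        return None


def frame_to_5w(predicate: str, args: dict[str, str],
                published: date) -> dict[str, Optional[str]]:
    """Fill the what/who/whom/when/where slots from one SRL frame."""
    slots: dict[str, Optional[str]] = dict.fromkeys(("who", "whom", "when", "where"))
    slots["what"] = predicate
    for label, text in args.items():
        slot = ROLE_TO_SLOT.get(label)
        if slot is not None:
            slots[slot] = text
    when = slots["when"]
    if when is not None:
        # Keep the raw expression when it cannot be normalized.
        slots["when"] = normalize_day(when, published) or when
    return slots


# Hypothetical SRL output for the case-study sentence and a hypothetical
# publication date.
frame = {"ARG0": "胡锦涛", "ARGM-TMP": "8日", "ARGM-LOC": "渥太华"}
print(frame_to_5w("抵达", frame, date(2012, 5, 20)))
```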