
Page 1

Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story

Wang Wei, Zhao Dongyan

Institute of Computer Science & Technology

wjwangwei@pku.edu.cn

NLP&CC 2012 – Beijing, China

Page 2

Outline

Introduction
Related Work
  Event Definitions
  Existing Event Models
News Ontology Event Model
  The Design of NOEM
  Main Concepts and Properties in NOEM
Evaluation
Conclusion

Page 3

Introduction

"News Information Overload"
  Numerous online news service providers
  Explosive increase of online news users

[Figure: Numbers of online news users (in units of ten thousand persons) and the time they spend browsing news]

Page 4

Introduction

Classification and summarization are widely used in the online news domain
  Document-oriented techniques based on traditional "BOW" (bag-of-words) models cannot provide sufficient event-level semantic information

Users need intelligent event-level semantic news services
  to push events, rather than documents, to users
  employing entities and relations to provide semantic navigation, e.g., Renlifang from Microsoft, SOSO Waltz from Tencent

Web of Documents; Web of Data; Web of Entities and Relations

Page 5

Introduction

Keyword-based analysis easily leads to "semantic" errors
  The event took place in Ludao (鹭岛), not Hong Kong
  The Shanghai concert is an event of Faye Wong (王菲) and has nothing to do with Andy Lau (刘德华)

How to provide multi-dimensional semantic navigation?
  5W1H: Who, When, Where, What, Why, How

Page 6

Introduction

Our research aim is the semantic understanding of Chinese news by
  extracting the entities and relations involved in the key event of a news story
  building a news event knowledge base as well as a semantic retrieval engine to support event-level semantic applications

We implemented a novel framework to address the whole list of 5W1H
  key event identification
  event semantic elements extraction
  ontology-based event knowledge base construction

This paper discusses ontology-based event modeling for semantic understanding of Chinese news stories.

Page 7

Methodology

[Diagram: Chinese online news; key event identification in one news story; 5W1H event semantic-elements extraction; event semantic modeling and ontology population; event knowledge base]


Page 9

Related Work

Event Definitions
  WordNet: "something that happens at a given place and time."
  Cognitive psychologists: "happenings in the outside world"; people observe and understand the world through events.
  Linguists (Chung and Timberlake, 1985): "an event can be defined in terms of three components: a predicate; an interval of time on which the predicate occurs and a situation or set of conditions under which the predicate occurs."
  TimeML: "a cover term for situations that happen or occur. Events can be punctual or last for a period of time."
  ACE (Automatic Content Extraction): "an event involving zero or more ACE entities, values and time expressions"
  Event-based summarization: atomic events link the major constituent parts (participants, locations, times) of events through verbs or action nouns labeling the event itself.

Page 10

Related Work

We define an event as "a specific occurrence which involves some participants". It has three components:
  a predicate;
  core participants, i.e., agents and patients;
  auxiliary participants, i.e., the time and location of the event.

These participants are usually named entities, which correspond to the what, who, whom, when and where elements of an event.

<S, P, O, T, L>, where S, P, O are core elements and T, L are subordinates.
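The five-element event tuple defined on this slide can be pictured as a small record type. The following is only an illustrative sketch: the class name `Event`, its field names and the helper methods are assumptions made here, not part of NOEM, and the sample values are taken from the case study later in the deck.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical container for an <S, P, O, T, L> event tuple."""
    subject: str                    # S: agent (who)
    predicate: str                  # P: what happened
    obj: Optional[str] = None       # O: patient (whom), may be absent
    time: Optional[str] = None      # T: auxiliary participant (when)
    location: Optional[str] = None  # L: auxiliary participant (where)

    def core(self) -> Tuple[str, str, Optional[str]]:
        # S, P, O are the core elements of the event
        return (self.subject, self.predicate, self.obj)

    def five_w(self) -> dict:
        # Map the tuple onto the who/what/whom/when/where elements
        return {
            "who": self.subject,
            "what": self.predicate,
            "whom": self.obj,
            "when": self.time,
            "where": self.location,
        }


# Sample values from the case study: Hu Jintao arrived in Ottawa on the 8th
visit = Event(subject="胡锦涛", predicate="抵达", time="8 日", location="渥太华")
```

Keeping T and L optional mirrors the slide's distinction between core elements and subordinates: an event record is still meaningful when time or place could not be extracted.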

Page 11

Related Work

Existing Event Models
  Script Theory, Event Domain Cognitive Model: cognitive linguistics
  Probabilistic Event Model: TDT
  Atomic Event Model: event-based automatic summarization
  Structural Event Model: MUC & ACE
  Generic Event Model: event-centric multimedia data management
  Ontology Event Models: ABC, PROTON, EO (Event Ontology), Event-Model-F


Page 13

News Ontology Event Model

Modeling (1) event information, (2) event relations, (3) event media

Page 14

News Ontology Event Model

[Figure: main concepts and relations in NOEM]


Page 16

Evaluation

Janez Brank et al. classified ontology evaluation methods into four categories:
  (1) comparing the ontology to a "golden standard";
  (2) using the ontology in an application and evaluating the results;
  (3) comparing with a source of data about the domain to be covered by the ontology;
  (4) evaluation by humans who assess how well the ontology meets a set of predefined criteria, standards and requirements.

Page 17

Evaluation

[Table: Comparison between NOEM and existing event models]

Page 18

Evaluation

Manual labeling
  4 postgraduates
  6000+ Chinese news stories from Xinhua News Agency
  Covers 23 top classes and 2082 subclasses of CNML

In 85% of them, we found a topic sentence which contains the key event of the news
  4/5 Ws in the topic sentence, which can be described by NOEM appropriately

Category code   Category name                                             Subclasses
1               Politics                                                  85
2               Law and justice                                           76
3               Foreign relations and international relations             72
4               Military                                                  129
5               Society, labor, disasters and accidents                   105
11              Economy                                                   132
12              Economic theory research                                  132
13              Capital construction, construction industry, real estate  47
14              Agriculture and rural areas                               99
15              Mining and industry                                       239
16              Energy, water services, water conservancy                 69
17              Information industry                                      72
18              Transportation, postal services, logistics                65
19              Commerce, foreign trade, customs                          55
21              Service industry and tourism                              84
22              Environment and meteorology                               43
31              Education                                                 63
33              Science and technology                                    70
35              Culture, entertainment and leisure                        98
36              Literature and art                                        130
37              Media industry                                            61
38              Medicine and health                                       88
39              Sports                                                    68

Page 19

Evaluation: A Case Study

Chinese President Hu Jintao arrived in Canada for a state visit

Result of 5W1H extraction of the key event
  <抵达 (arrive), isTypeof, Movement/Transport>,
  <胡锦涛 (Hu Jintao), isTypeof, Person>,
  <8 日 (the 8th), isTypeof, Time>,
  <渥太华 (Ottawa), isTypeof, Place>
  ……
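As a rough illustration of how typed triples like the ones in this case study could be grouped into 5W slots, here is a minimal sketch. The regular expression, the `TYPE_TO_SLOT` mapping and both function names are assumptions made for this example; the slides do not show the actual extraction code or output format.

```python
import re

# Matches one "<value, relation, type>" triple; whitespace around commas is tolerated
TRIPLE_RE = re.compile(r"<\s*(.+?)\s*,\s*(\w+)\s*,\s*(.+?)\s*>")

# Hypothetical mapping from the type labels on the slide to 5W slots
TYPE_TO_SLOT = {
    "Movement/Transport": "what",
    "Person": "who",
    "Time": "when",
    "Place": "where",
}


def parse_triples(text):
    """Return a list of (value, relation, type_label) tuples found in text."""
    return [m.groups() for m in TRIPLE_RE.finditer(text)]


def to_slots(triples):
    """Group values by 5W slot, using only isTypeof triples with a known type."""
    slots = {}
    for value, relation, type_label in triples:
        if relation != "isTypeof":
            continue
        slot = TYPE_TO_SLOT.get(type_label)
        if slot is not None:
            slots.setdefault(slot, []).append(value)
    return slots


raw = ("<抵达 , isTypeof, Movement/Transport>, <胡锦涛 , isTypeof, Person>, "
       "<8 日 , isTypeof, Time>, <渥太华 , isTypeof, Place>")
slots = to_slots(parse_triples(raw))
```

Each slot holds a list because a sentence may mention several persons or places; choosing among them is a separate step that this sketch does not attempt.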

Page 20

Evaluation: Population of NOEM

Chinese President Hu Jintao arrived in Canada for a state visit

Ontology population: an automatically generated OWL file
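The generated OWL file itself is not reproduced in the transcript. The sketch below only suggests what populating an ontology with one event might look like in RDF/XML, using Python's standard library. The `http://example.org/noem#` namespace, the class names and the properties `hasAgent`, `hasTime` and `hasPlace` are all hypothetical placeholders, not NOEM's real vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
NOEM = "http://example.org/noem#"  # hypothetical namespace, not the authors' IRI

ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)
ET.register_namespace("noem", NOEM)


def individual(root, local_id, class_name):
    """Add an owl:NamedIndividual typed with a (hypothetical) NOEM class."""
    node = ET.SubElement(root, f"{{{OWL}}}NamedIndividual",
                         {f"{{{RDF}}}about": NOEM + local_id})
    ET.SubElement(node, f"{{{RDF}}}type",
                  {f"{{{RDF}}}resource": NOEM + class_name})
    return node


def populate(event_id, slots):
    """Serialize one event and its participants as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    event = individual(root, event_id, "Event")
    for prop, (local_id, class_name) in slots.items():
        individual(root, local_id, class_name)
        # Link the event to the participant through an object property
        ET.SubElement(event, f"{{{NOEM}}}{prop}",
                      {f"{{{RDF}}}resource": NOEM + local_id})
    return ET.tostring(root, encoding="unicode")


owl_xml = populate("event1", {
    "hasAgent": ("HuJintao", "Person"),
    "hasTime": ("day8", "Time"),
    "hasPlace": ("Ottawa", "Place"),
})
```

A dedicated RDF library would normally be preferable for real ontology work; plain XML is used here only to keep the example dependency-free.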


Page 22

Conclusion

Main contributions
  an extensive investigation of "event" and "event modeling"
  the use of the concept of 5W1H semantic elements in the Chinese news domain
  the design of an ontology-based event model, NOEM
    defining concepts of entities (time, person, location, organization, etc.), events and relationships to capture the temporal, spatial, informational, experiential, structural and causal aspects, e.g. the 5W1H, of an event

Future work
  building a news event knowledge base and a semantic retrieval engine on NOEM to support event-level semantic applications

Page 23

The End

Thank you for your patience!

Q&A

Page 24

Framework

A pipeline of three steps and six sub-tasks
  (1) Title classification and (2) topic sentence extraction for key event identification;
  (3) Semantic role labeling and (4) 5W1H element identification for event semantic elements extraction;
  (5) NOEM definition and (6) ontology population for event knowledge base construction.

Page 25

Publications

Please see our previous work for more details.

Key Event Extraction
  Wang, W., Zhao, D., Zhao, W.: Identification of topic sentence about key event in Chinese News. Acta Scientiarum Naturalium Universitatis Pekinensis 47(5), 789–796 (2011)

5Ws Extraction
  Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W.: Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM 2010. LNCS, vol. 6184, pp. 644–655. Springer, Heidelberg (2010)
  Wang, W., Zhao, D., Wang, D.: Chinese news event 5W1H elements extraction using semantic role labeling. In: the 3rd ISIP, pp. 484–489 (2010)

Framework
  Wang, W., Zhao, D.: Chinese News Event 5W1H Semantic Elements Extraction for Event Ontology Population. WWW 2012 PhD Symposium, Lyon, France (2012)


Page 27

Title Based Key Event Extraction

Input: News document
Output: Topic sentences
Begin
  NLP-based preprocessing:
    Title classification;  // classify the title as informative or non-informative
    Topic words extraction;  // 1) TF-IDF; 2) PageRank in the word co-occurrence graph
    Title & topic words co-occurrence analysis;  // (1)
  For each sentence do:
    Term frequency scoring;  // (2)
    Sentence location scoring;  // (3)
    Sentence length scoring;  // (4)
    Named entity scoring;  // (5)
    Sentence and title similarity scoring;  // (6)
    Sentence weighting & ranking;  // (8)
  End do
End
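The per-sentence scoring loop in this pseudocode can be sketched as a weighted sum of simple features. Everything specific below is an assumption for illustration: the weights, the feature formulas (for example, 1/(position + 1) for location and Jaccard overlap for title similarity) and the function names are not taken from the slides, and title classification and topic-word extraction are left out. Input is assumed to be already word-segmented.

```python
from collections import Counter

# Assumed feature weights; the slides do not give the actual values
WEIGHTS = {"tf": 0.3, "location": 0.2, "length": 0.1, "entity": 0.2, "title": 0.2}


def jaccard(a, b):
    """Token-set overlap between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_sentences(title, sentences, entities=frozenset(), ideal_len=20):
    """Return sentence indices ordered from most to least likely topic sentence.

    title and each sentence are lists of tokens; entities is a set of tokens
    already recognized as named entities.
    """
    tf = Counter(tok for sent in sentences for tok in sent)
    max_tf = max(tf.values(), default=1)
    scored = []
    for i, sent in enumerate(sentences):
        if not sent:
            continue
        feats = {
            # average document-level frequency of the sentence's terms
            "tf": sum(tf[t] for t in sent) / (len(sent) * max_tf),
            # earlier sentences score higher
            "location": 1.0 / (i + 1),
            # closeness of the sentence length to an assumed ideal length
            "length": min(len(sent), ideal_len) / max(len(sent), ideal_len),
            # share of tokens that are named entities
            "entity": sum(t in entities for t in sent) / len(sent),
            # similarity between sentence and title
            "title": jaccard(title, sent),
        }
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        scored.append((score, i))
    # highest score first; ties broken by original position
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


title = ["Hu", "Jintao", "arrives", "in", "Canada"]
sentences = [
    ["Hu", "Jintao", "arrived", "in", "Ottawa", "on", "the", "8th"],
    ["The", "weather", "was", "mild"],
    ["Canada", "welcomed", "the", "visit"],
]
ranking = rank_sentences(title, sentences, entities={"Hu", "Jintao", "Ottawa", "Canada"})
```

In this toy input the first sentence ranks highest because it is early, entity-rich and overlaps with the title, which matches the intuition behind the scoring steps listed in the pseudocode.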

Page 28

Chinese News Semantic Elements Extraction

Input: Topic sentences
Output: <Subject, Predicate, Object, Time, Location> & How of news
Begin
  For each topic sentence do
    1) NE recognition;
    2) NP recognition;
    3) Event identification and classification, verb-driven & SVM;
    4) Syntactic-semantic rule-based <Subject, Predicate, Object> recognition;
    5) Time expression identification and normalization;
    6) Location identification;
    7) Topic sentences as short summarization;
  End do
End

Slide callouts: What; Who did what to whom; When; Where; How; HMM-based NER tool; CRF-based NP tagger