
Post on 13-Apr-2017


Relation-wise Automatic Domain-Range Information Management for

Knowledge Entries

Md-Mizanur Rahoman & Ryutaro Ichise

The Graduate University for Advanced Studies, Tokyo, Japan

National Institute of Informatics, Tokyo, Japan

Begum Rokeya University, Rangpur, Bangladesh

Outline

• Background

• Problem & Possible Solution

• Proposed Framework

• Experiment

• Conclusion

30-Jan-2017 Relation-wise Automatic Domain-Range Information Management for Knowledge Entries I Rahoman & Ichise 2

Background

• knowledge-base (KB) construction and management have gained interest

• relations play a major role in a KB

• construction – generation of knowledge entries <Subject, relation, Object>

• e.g., <Obama, born_in, Hawaii>

• management – validation of knowledge entries

• e.g., domain(born_in) = Person, range(born_in) = Place

• not all knowledge bases maintain domain-range validation for relations, e.g., Freebase
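The validation idea above can be sketched in a few lines. This is only an illustration: the `ENTITY_TYPE` lookup and the `SCHEMA` table are hypothetical hand-written stand-ins, not part of the framework presented in these slides (which learns domain and range instead of listing them by hand).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A knowledge entry <Subject, relation, Object>."""
    subject: str
    relation: str
    obj: str


# Hypothetical entity types and schema, written by hand for illustration.
ENTITY_TYPE = {"Obama": "Person", "Hawaii": "Place", "14": "Number"}
SCHEMA = {"born_in": ("Person", "Place")}  # relation -> (domain, range)


def is_valid(triple: Triple) -> bool:
    """True when the subject type equals the domain and the object type equals the range."""
    if triple.relation not in SCHEMA:
        return False  # no domain-range information recorded for this relation
    domain, range_ = SCHEMA[triple.relation]
    return (ENTITY_TYPE.get(triple.subject) == domain
            and ENTITY_TYPE.get(triple.obj) == range_)
```

With these toy tables, `is_valid(Triple("Obama", "born_in", "Hawaii"))` holds, while an entry whose object is not a Place is rejected.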


Problem

• existence of wrong entries – e.g., in current

• costly maintenance – domain-range selection is not automatic

• manual checking is time-consuming

• requires domain-level expertise


Subject          | Relation | Object
Paprika          | type     | Book
Paprika          | author   | Yasutaka Tsutsui
Freedom in Exile | type     | Book
Freedom in Exile | author   | 14

Possible Solution

• Intuition

• Subjects of a relation should be similar to one another

• extract features for Subject entities and generate a learning model, e.g.,

• Subject(born_in) complies only if it is a Person, i.e., belongs to the domain

• Objects of a relation should be similar to one another

• extract features for Object entities and generate a learning model, e.g.,

• Object(born_in) complies only if it is a Place, i.e., belongs to the range


Proposed Framework

• required resource

• language-specific relation - e.g., born_in, spouse, author etc.

• language-specific training example - e.g., entries

• language-specific large text corpus - e.g.,


Subject | Relation | Object
Obama   | born_in  | Hawaii
Trump   | born_in  | New York
Clinton | born_in  | Chicago
…       | …        | …

Proposed Framework

• process

• Word Vectorizer

• generate features for words from a large text corpus

• Model Generator

• generate supervised machine learning models for the extracted features


Word Vectorizer

• take large text corpus e.g.,

• use Word2Vec* implementation for word embedding

• generates feature vectors for the text vocabulary

• preserves the linguistic context of words in the corpus

• maps similar words to similar vectors


* https://code.google.com/p/word2vec/
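To make "similar words get similar vectors" concrete, the sketch below compares word vectors with cosine similarity. The three-dimensional vectors are invented for illustration; real Word2Vec embeddings are learned from the corpus and are much larger. Cosine similarity is a common way to compare embeddings and is an assumption here, since the slides do not name a similarity measure.

```python
import math

# Toy vectors, invented for illustration only (not Word2Vec output).
VECTORS = {
    "Hawaii":  [0.9, 0.1, 0.0],
    "Chicago": [0.8, 0.2, 0.1],
    "Obama":   [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


place_vs_place = cosine(VECTORS["Hawaii"], VECTORS["Chicago"])
place_vs_person = cosine(VECTORS["Hawaii"], VECTORS["Obama"])
```

In this toy setup the two place names score higher against each other than a place name scores against a person name, which is the property the model generator relies on.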

Model Generator (1/4)

• For each relation

• collect positive and negative training words

• collect feature vectors for training words

• generate two supervised machine learning models (a domain model and a range model) that classify

• whether a word element belongs to the domain or not

• whether a word element belongs to the range or not


Model Generator (2/4)

• positive features

• collected from existing knowledge entries

• divided into Subject element feature vectors and Object element feature vectors


Subject | Relation | Object
Obama   | born_in  | Hawaii
Trump   | born_in  | New York
Clinton | born_in  | Chicago
…       | …        | …
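Splitting the positive examples into Subject and Object elements might look like the sketch below. The function name `split_positive_words` is hypothetical; the entries are the ones shown on the slides. Looking up the feature vector for each word is left out, because the slides do not say how multi-word names such as "New York" are matched against the vocabulary.

```python
# Knowledge entries taken from the slides.
ENTRIES = [
    ("Obama", "born_in", "Hawaii"),
    ("Trump", "born_in", "New York"),
    ("Clinton", "born_in", "Chicago"),
    ("Paprika", "author", "Yasutaka Tsutsui"),
]


def split_positive_words(entries, relation):
    """Return (subject words, object words) of all entries for one relation."""
    subjects, objects = [], []
    for subject, rel, obj in entries:
        if rel == relation:
            subjects.append(subject)  # positive examples for the domain model
            objects.append(obj)       # positive examples for the range model
    return subjects, objects


subjects, objects = split_positive_words(ENTRIES, "born_in")
```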

Model Generator (3/4)

• negative features

• collected from random vocabulary words of the text corpus

• exclude words already used as positive word elements

• balanced so that the number of negative training examples equals the number of positive ones
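The negative sampling described above can be sketched as follows. The function name, the fixed seed, and the choice to count distinct positive words are assumptions made for this illustration.

```python
import random


def sample_negative_words(vocabulary, positive_words, seed=0):
    """Draw random vocabulary words that are not positive examples.

    Returns as many words as there are distinct positive words, or fewer
    if the vocabulary does not contain enough other words.
    """
    positives = set(positive_words)
    # Sort so that the result depends only on the seed, not on set ordering.
    candidates = sorted(w for w in set(vocabulary) if w not in positives)
    k = min(len(positives), len(candidates))
    return random.Random(seed).sample(candidates, k)


# Toy vocabulary, invented for illustration.
vocab = ["Obama", "Trump", "Clinton", "table", "run", "blue", "seven"]
negatives = sample_negative_words(vocab, ["Obama", "Trump", "Clinton"])
```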


Model Generator (4/4)

• models

• domain model

• generated from Subject element feature vectors and negative word feature vectors

• uses a decision tree-based learning model

• range model

• generated from Object element feature vectors and negative word feature vectors

• uses a decision tree-based learning model
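As a rough illustration of training a domain model, the sketch below fits a decision stump, i.e., a decision tree of depth one. This is a deliberate simplification: the slides only say that a decision tree-based model is used and do not specify the algorithm or its settings. The two-dimensional feature vectors are invented. A range model would be trained the same way from Object element vectors.

```python
def train_stump(vectors, labels):
    """Fit a depth-1 decision tree by exhaustive search.

    Tries every feature, every observed value as threshold, and both
    polarities, and keeps the split with the fewest training errors.
    """
    best = None  # (errors, feature index, threshold, label predicted above threshold)
    for feature in range(len(vectors[0])):
        for threshold in sorted({v[feature] for v in vectors}):
            for label_above in (True, False):
                errors = sum(
                    ((v[feature] > threshold) == label_above) != y
                    for v, y in zip(vectors, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, label_above)
    _, best_feature, best_threshold, best_label = best
    return lambda v: (v[best_feature] > best_threshold) == best_label


# Toy feature vectors, invented for illustration.
subject_vectors = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85]]   # positive words
negative_vectors = [[0.90, 0.10], [0.70, 0.30], [0.80, 0.20]]  # random words

domain_model = train_stump(
    subject_vectors + negative_vectors,
    [True] * len(subject_vectors) + [False] * len(negative_vectors),
)
```

Calling `domain_model(vector)` then returns `True` when the word is classified as belonging to the domain.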


Experiment

• resource

• relations – 32 frequent English relations (among the first 100)

• Cat-1 – range values are distributed over domain, e.g., candidate

• Cat-2 – range values are concentrated over domain, e.g., genre

• training example – entries for the relations

• Text corpus – English

• evaluation metrics - accuracy
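Accuracy here is taken in its usual sense, the fraction of predictions that match the gold labels. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(predictions, gold):
    """Fraction of predictions that equal the corresponding gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up labels: True means "entry judged correct".
score = accuracy([True, True, False, False], [True, False, False, False])
```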


Result

• purpose – show how accurately the models detect correct (pos) entries, incorrect (neg) entries, and a mix of both (i.e., pos + neg)

• finding – words of the same type have similar feature vectors, so the models generalize over words


Conclusion

• Observation

• a relation should hold the same type of elements as Subject and the same type of elements as Object

• generalization of Subject and Object can automatically generate domain and range for a relation; the experimental results support this assumption

• Future Work

• look for more sophisticated learning models than decision trees

• investigate word embeddings other than the default in word2vec
