[International Asian LOD Challenge Day 2012] LOD generation of social and mass media data: apply to...
Posted on 17-Oct-2014
I'll present this at Nara on 1st Dec, 2012.
LOD Generation of Social and Mass Media Data:
Apply to Media Comparisons
Presenter: Kenji Koshikawa
Co-researchers (advisers): T. Kawamura, H. Nakagawa, Y. Tahara, A. Ohsuga
Affiliation: Department of Social Intelligence and Informatics, Graduate School of Information Systems, The University of Electro-Communications
Ohsuga-Tahara Laboratory (大須賀・田原研究室)
International Asian LOD Challenge Day, 1st Dec, 2012
You can download these slides here: http://slidesha.re/TwzFrf
About Our Project

Project Abstract
We do just two things in this project:
1. Build semantic networks from media information.
2. Compare different media using those networks.
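Example 1: Representing event information using a semantic network (RDF) 1/2
Input sentence: 昨日太郎は秋葉原でiPhone5を購入したので、幸せそうだった。 (Yesterday, Taro bought an iPhone 5 at Akihabara, so he looked happy.) The first clause is Event 1 and the second is Event 2.
[Figure: representation as of JAWS (JAWS時点): 太郎 (Taro) linked via "when" to 昨日 (yesterday), via "what action" to iPhone5 入手 (obtained an iPhone 5), and via "status" to 幸せ (happy).]
[Figure: the sentence as two events linked by a Cause edge (Event 2 → Cause → Event 1). Event 1: Activity 購買 (Buying), Object iPhone 5, Location 秋葉原 (Akihabara), Time 昨日 (Yesterday, 2012-11-30). Event 2: Status 幸福 (Happiness), Time 昨日 (Yesterday). Both events involve 太郎 (Taro).]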
Converting natural language into semantic networks
Outputting Linked Data in RDF/XML format, e.g. for “Taro bought an iPhone 5 at Akihabara, so he looked happy.”
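To make the RDF/XML output step concrete, here is a minimal sketch that serializes Example 1 with Python's standard `xml.etree.ElementTree`. The vocabulary namespace (`http://example.org/event#`), the property names, and the assignment of Taro as the subject of both events are assumptions for illustration; the slides do not show the project's actual vocabulary or serializer.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/event#"  # hypothetical event vocabulary namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("ev", EX)


def events_to_rdfxml(events):
    """Serialize {event_id: {property: value}} into an RDF/XML string.

    Values that name another event id become rdf:resource links;
    every other value becomes a plain literal.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for event_id, props in events.items():
        desc = ET.SubElement(root, f"{{{RDF}}}Description",
                             {f"{{{RDF}}}about": EX + event_id})
        for prop, value in props.items():
            child = ET.SubElement(desc, f"{{{EX}}}{prop}")
            if value in events:
                child.set(f"{{{RDF}}}resource", EX + value)  # link to another event
            else:
                child.text = value  # literal value
    return ET.tostring(root, encoding="unicode")


# Example 1, with hypothetical property names.
example = {
    "Event1": {"subject": "Taro", "activity": "Buying", "object": "iPhone 5",
               "location": "Akihabara", "time": "2012-11-30"},
    "Event2": {"subject": "Taro", "status": "Happiness", "time": "2012-11-30",
               "cause": "Event1"},
}
print(events_to_rdfxml(example))
```

A value that names another event (here `Event1` as the cause of `Event2`) becomes an `rdf:resource` link, so the Cause edge stays a link between resources rather than a string literal.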
Representing event information using a semantic network (RDF) 2/2
Example 2 (from real media):
[Figure: a semantic network built from real news text, with nodes including “an accident”, “a fall accident”, “to occur”, “a poor maintenance”, “the state of Florida, U.S.”, “the southern state of Florida”, “June”, and “April”.]
A Case of Media Comparison
Topic: Introduction of the Osprey in Japan
[Figure: mass media vs. social media, with a photo of the Osprey]
About the dataset:
• Period: 1st April to 16th Aug, 2012
• Condition: the media text contains the word “オスプレイ” (Osprey)
• Social media dataset: Twitter, 3,084 tweets
• Mass media dataset:
  – Asahi digital newspaper: 116 articles
  – MSN Sankei News: 231 articles
  – Nippon News Network (NNN): 110 articles
  – Fuji News Network (FNN): 78 articles
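The dataset condition amounts to a keyword filter over a date range. The sketch below assumes each collected item is a `(source, published_date, text)` tuple; the record layout and the sample records are hypothetical, and the slides do not say how the items were actually collected.

```python
from collections import Counter
from datetime import date

KEYWORD = "オスプレイ"                            # dataset condition: the text mentions "Osprey"
START, END = date(2012, 4, 1), date(2012, 8, 16)  # collection period, both ends inclusive


def select(records):
    """Keep records published inside the period whose text contains the keyword.

    Each record is a hypothetical (source, published_date, text) tuple.
    """
    return [r for r in records if START <= r[1] <= END and KEYWORD in r[2]]


def count_by_source(records):
    """Count the selected records per media source."""
    return Counter(source for source, _, _ in select(records))


# Hypothetical sample records, not taken from the real dataset.
sample = [
    ("Twitter", date(2012, 7, 23), "a tweet mentioning オスプレイ"),
    ("Twitter", date(2012, 9, 1), "a later tweet mentioning オスプレイ"),  # outside the period
    ("Asahi", date(2012, 6, 12), "an article mentioning オスプレイ"),
    ("NNN", date(2012, 5, 2), "an article on another topic"),            # no keyword
]
print(count_by_source(sample))  # Counter({'Twitter': 1, 'Asahi': 1})
```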
Considerations from visualizing the network
• The difference in topic diversity between media
• Easy access to minority opinions
• The existence of two kinds of Osprey (introduced in this talk)
• The laterality of dependence on user location
Summary of the existence of two kinds of Osprey
The mass media carried NO information about the following:
• The existence of other variants of the Osprey
• The relation between the variants and the accident rate
• The fact that the accident rate of the variant to be deployed in Japan is lower than that of other rotorcraft ※
By visualizing the networks, we found the existence of two kinds of Osprey and the relation between the variants and the accident rate. This led us to suspect media bias in the mass media.
Suspected media bias: “The mass media rarely reported such information, apparently intentionally, and the press coverage tended to fuel opposition to the introduction of the Osprey in Japan.”
※ The V-22's accident rate is the lowest of any Marine rotorcraft [Ref 01].
Example of Consideration: the existence of two kinds of Osprey
This figure shows that, according to the network built from the social media dataset, there are two variants of the Osprey.
[Figure: looking around the “deploying” node, with the nodes “CV-22 Osprey” and “MV-22 Osprey” nearby. Node colors range from Mass to Social, with a middle color for a common concept.]
The color of a node indicates the concept's occurrence rate in each medium.
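One plausible reading of “look around a node” is inspecting the node's one-hop neighborhood (its ego network). The sketch below assumes the network is stored as `(source, relation, target)` triples; the edges and relation labels are hypothetical, not read from the figure.

```python
from collections import defaultdict


def build_index(triples):
    """Index (source, relation, target) edges by both endpoints."""
    index = defaultdict(list)
    for src, rel, dst in triples:
        index[src].append((rel, dst))
        index[dst].append((rel, src))
    return index


def look_around(index, node):
    """Return the one-hop neighborhood of `node` as sorted (relation, neighbor) pairs."""
    return sorted(index.get(node, []))  # .get avoids inserting unknown nodes


# Hypothetical edges; the relation labels are illustrative only.
edges = [
    ("deploying", "object", "MV-22 Osprey"),
    ("deploying", "object", "CV-22 Osprey"),
    ("MV-22 Osprey", "status", "for transport"),
]
idx = build_index(edges)
print(look_around(idx, "deploying"))
# [('object', 'CV-22 Osprey'), ('object', 'MV-22 Osprey')]
```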
[Figure: the “CV-22 Osprey” and “MV-22 Osprey” nodes and their neighbors, including “deploying”, “for transport, original requirement”, “Be nothing like”, “Harmful rumor”, and “Lower”.]
This figure shows that each variant of the Osprey is used differently, e.g. MV-22: for transport; CV-22: for ?
[Figure: looking around the “accident rate” node, with nearby nodes including “low”, “Copter”, and “Pilot error”.]
Next, we look around the “Accident rate of Osprey”, “1.93”, and “13.47” nodes.
[Figure: the neighborhoods of the “Accident rate of Osprey”, “1.93”, and “13.47” nodes, including “Accident rate of MV-22”, “Accident rate of CV-22”, “Low”, and “for the Special Operations Command”.]
The network built from the social media dataset reflects the relation between the variants and the accident rate.
Summary
• We introduced our project:
  – To generate LOD from media information
  – To compare different media using the Linked Data
• We are looking for solutions to the following:
  – Entity resolution (the instance matching problem)
  – Connecting to other Linked Data
• In future work, we will concentrate on improving LOD visualization for knowledge discovery.
• If you know an interesting topic for media comparison, please let me know.
References
[Ref 01] “V-22 Is The Safest, Most Survivable Rotorcraft The Marines Have.” LexingtonInstitute.org, February 2011. Retrieved 16 February 2011.
[Ref 02] (In Japanese) 越川 兼地, 川村 隆浩, 中川 博之, 田原 康之, 大須賀 昭彦: CRFを用いたメディア情報の抽出とLinkedData化: ソーシャルメディアとマスメディアの比較事例 (Extraction of media information using CRF and its conversion to Linked Data: a comparison case of social media and mass media), 合同エージェントワークショップ&シンポジウム (Joint Agent Workshops and Symposium, JAWS 2012), 2012. Slides (in Japanese): http://slidesha.re/11pf0qR
Appendix
Goal / Motivation
1. To generate Linked Data from media information
   – Motivation:
     • to organize the abundance of information
     • to help us recognize real events easily
2. To compare different media using the Linked Data we generated
   – Motivation:
     • to discover knowledge from the differences in information between media
     • to understand real events from multiple points of view
Our System Overview
Visualizing the Network
[Figure: the visualized network. Node colors range from Mass to Social, with a middle color for a common concept; edge colors distinguish the relationships subject, activity, object, location, time, target, status, cause, and quoted source.]
• Color of an edge: expresses the kind of relationship between two concepts.
• Color of a node: expresses the concept's occurrence rate in each medium, using five colors.
• Size of a node / thickness of an edge: calculated from frequency information.
※ We used the visualization application Gephi 0.8.1 beta.
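The slides do not give the formula behind the five node colors or the node sizes. A minimal sketch, under the assumption that a concept's count is normalized by each medium's total before comparing, bins the social share of the occurrence rate into five equal-width classes and scales node size linearly with frequency; the class names, bin edges, and size range are all hypothetical.

```python
# Five hypothetical color classes, from mass-media-only to social-media-only.
COLORS = ["mass", "mostly mass", "common", "mostly social", "social"]


def color_class(mass_count, social_count, mass_total, social_total):
    """Map a concept to one of five classes by its relative occurrence rate."""
    mass_rate = mass_count / mass_total        # normalize by each medium's size
    social_rate = social_count / social_total
    if mass_rate + social_rate == 0:
        raise ValueError("concept does not occur in either medium")
    share = social_rate / (mass_rate + social_rate)  # 0.0 = mass only, 1.0 = social only
    # Equal-width bins; share == 1.0 is clamped into the last class.
    return COLORS[min(int(share * len(COLORS)), len(COLORS) - 1)]


def node_size(freq, max_freq, min_size=10.0, max_size=50.0):
    """Scale node size linearly with frequency (hypothetical size range)."""
    return min_size + (max_size - min_size) * freq / max_freq


print(color_class(50, 50, 1000, 1000))  # common
print(node_size(25, 100))               # 20.0
```

Normalizing by each medium's total matters because the datasets differ in size: under this sketch, a concept seen 10 times in both media is still classed as “mass” if the mass media corpus is ten times smaller (`color_class(10, 10, 100, 1000)`).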
Future Work
• At this stage we only visualize the network, so users have to discover knowledge themselves.
  – We are developing tools to support knowledge discovery from the network.
• To estimate important nodes/sub-networks in the network.
• To evaluate our system and to try other topics.
• We are looking for solutions to entity resolution and instance matching.
• We will enter the LOD Challenge 2012 Japan.
  – But I'm not sure which section (Dataset, Idea, Application, or Visualization) is best for our project.
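Estimating important nodes is listed as future work without a chosen method. As one possible baseline (an assumption, not the authors' approach), nodes can be ranked by weighted degree, summing the same frequencies that drive edge thickness. The function name and edge data below are hypothetical.

```python
from collections import Counter


def rank_by_weighted_degree(edges, top_k=3):
    """Rank nodes by the summed frequency of their incident edges.

    `edges` is a list of hypothetical (node_a, node_b, frequency) tuples.
    """
    degree = Counter()
    for src, dst, freq in edges:
        degree[src] += freq
        degree[dst] += freq
    return degree.most_common(top_k)


# Hypothetical edge frequencies, not taken from the real network.
edges = [
    ("deploying", "MV-22 Osprey", 5),
    ("deploying", "CV-22 Osprey", 3),
    ("MV-22 Osprey", "for transport", 2),
]
print(rank_by_weighted_degree(edges))
# [('deploying', 8), ('MV-22 Osprey', 7), ('CV-22 Osprey', 3)]
```

Weighted degree only looks at single nodes; finding important sub-networks would need something more, such as community detection, which this sketch does not attempt.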
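Summary: the relation between the MV-22 / CV-22 Osprey model designations and their accident rates
Model | Use | Accident rate
MV-22 (deployed in Japan) | Transport | 1.93
Average of US Marine Corps aircraft | - | 2.45
CV-22 | Special operations (Air Force) | 13.47
The model (to be) deployed in Japan, the MV-22, has a low accident rate.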
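The table's point can be checked with two ratios. The slide does not state units, but the ratios are unit-free as long as all three rates share the same basis:

```latex
\frac{13.47}{1.93} \approx 6.98, \qquad \frac{1.93}{2.45} \approx 0.79
```

So the CV-22 rate is roughly seven times the MV-22 rate, while the MV-22 rate is about 79% of the Marine Corps aircraft average.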
How Events Are Represented
To represent event information, we extended the activity attributes of [Nguyen 12] and defined nine event description properties.
Event description property | Description
Subject | Subject of an event
Activity | Activity of an event
Object | Object of an activity
Target (new) | Against whom (e.g. people, country, …)
Status (new) | Status of a subject
Location | Location where an event occurred
Time | Time information on when an event occurred
Cause (new) | What caused an event
Quoted source (new) | Source of a quote
[Nguyen 12] The-Minh Nguyen, Takahiro Kawamura, Yasuyuki Tahara, and Akihiko Ohsuga: Self-Supervised Capturing of Users' Activities from Weblogs. International Journal of Intelligent Information and Database Systems, Vol. 6, No. 1, pp. 61-76, InderScience Publishers, 2012.
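The nine properties map naturally onto a record type. This is a minimal sketch, not the project's data model: the Python field names, the use of `None` for unfilled properties, and the flat `(event, property, value)` triple output are assumptions. Example 1's sentence is reused, with Taro assumed as the subject of both events.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Event:
    """One event described by the nine event description properties.

    Every property is optional because a sentence rarely fills all nine.
    """
    subject: Optional[str] = None
    activity: Optional[str] = None
    object: Optional[str] = None          # object of the activity
    target: Optional[str] = None          # new: against whom
    status: Optional[str] = None          # new: status of the subject
    location: Optional[str] = None
    time: Optional[str] = None
    cause: Optional[str] = None           # new: id of the causing event
    quoted_source: Optional[str] = None   # new: source of a quote

    def to_triples(self, event_id: str):
        """Emit (event_id, property, value) triples for filled properties only."""
        return [(event_id, f.name, getattr(self, f.name))
                for f in fields(self) if getattr(self, f.name) is not None]


e1 = Event(subject="Taro", activity="Buying", object="iPhone 5",
           location="Akihabara", time="2012-11-30")
e2 = Event(subject="Taro", status="Happiness", time="2012-11-30", cause="Event1")
print(e2.to_triples("Event2"))
# [('Event2', 'subject', 'Taro'), ('Event2', 'status', 'Happiness'),
#  ('Event2', 'time', '2012-11-30'), ('Event2', 'cause', 'Event1')]
```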
End