Entity-oriented Filtering of Large Streams TREC KBA 2013 John R. Frank [email protected] Ian Soboroff ian.soboroff@nist.gov Max Kleiman-Weiner [email protected] Dan A. Roberts [email protected] http://trec-kba.org


Page 1: Entity-oriented Filtering of Large Streams TREC KBA 2013 John R. Frank jrf@mit.edu Ian Soboroff ian.soboroff@nist.gov Max Kleiman-Weiner maxkw@mit.edu

Entity-oriented Filtering of Large Streams: TREC KBA 2013

John R. Frank, jrf@mit.edu

Ian Soboroff, ian.soboroff@nist.gov

Max Kleiman-Weiner, maxkw@mit.edu

Dan A. Roberts [email protected]

http://trec-kba.org

Page 2:

Two Tasks in KBA 2013

• CCR: Cumulative Citation Recommendation, same as 2012 with more entities. Find documents that mention the target entities and are worth citing in a KB like WP.

• SSF: Streaming Slot Filling, same target entities as CCR with a slot identified for each entity. Find the changes to the slot values for each hour in the corpus.
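The SSF task asks for the changes to a slot's values hour by hour. A minimal sketch of that bookkeeping follows, assuming slot values have already been extracted per hour; the hour keys, the slot values, and the function name `slot_changes` are hypothetical and are not part of the KBA task definition.

```python
from typing import Dict, List, Set, Tuple


def slot_changes(hourly_values: Dict[str, Set[str]]) -> List[Tuple[str, Set[str], Set[str]]]:
    """Return (hour, added, removed) for each hour whose slot values differ
    from the state left by the previous hour."""
    changes = []
    previous: Set[str] = set()
    # Zero-padded hour keys sort chronologically as plain strings.
    for hour in sorted(hourly_values):
        current = hourly_values[hour]
        added = current - previous
        removed = previous - current
        if added or removed:
            changes.append((hour, added, removed))
        previous = current
    return changes


# Hypothetical extracted values for one slot of one target entity.
observations = {
    "hour-0001": set(),
    "hour-0002": {"value-A"},
    "hour-0003": {"value-A"},             # repeated mention, no change
    "hour-0004": {"value-A", "value-B"},
}

for hour, added, removed in slot_changes(observations):
    print(hour, "added:", sorted(added), "removed:", sorted(removed))
```

Only the hours where the value set actually changes are reported, so later documents that merely repeat a known value produce no output.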

Page 3:

IR:
• User task centric
• Variation in interpretation
• Scores cascading lists
• Constructionist, emergence

NLP:
• Data parsing centric
• Universal annotation
• Scores probabilities
• Reductionist

TRECing the continental divide between NLP and IR

Page 4:

2012 Task: Filtering to Recommend Citations

1) Initialize with a target WP entity
• state of WP from Jan 2012

2) Iterate over stream of text items
• Oct-Dec 2011: train on labels

3) For each, output confidence between 0 and 1
• Jan-Apr 2012: labels hidden
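The three steps above can be sketched as a small filtering loop. This is only an illustration under stated assumptions: the scorer is a toy name-matching heuristic, not a method from the slides, and `name_match_confidence`, `filter_stream`, and the sample documents are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def name_match_confidence(text: str, surface_forms: List[str]) -> float:
    """Toy scorer (an assumption): fraction of the entity's surface forms
    that occur in the text, which always lies between 0 and 1."""
    if not surface_forms:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for form in surface_forms if form.lower() in lowered)
    return hits / len(surface_forms)


def filter_stream(
    stream: Iterable[Tuple[str, str]],
    surface_forms: List[str],
    cutoff: float = 0.0,
) -> Iterator[Tuple[str, float]]:
    """Step 2 and 3: walk the stream in order and emit (doc_id, confidence)
    for every item scoring above the cutoff."""
    for doc_id, text in stream:
        confidence = name_match_confidence(text, surface_forms)
        if confidence > cutoff:
            yield doc_id, confidence


# Step 1: initialize with a target entity (surface forms are hypothetical).
target_forms = ["James McCartney", "McCartney"]
sample_stream = [
    ("doc-1", "James McCartney talked to the BBC."),
    ("doc-2", "A story about something else entirely."),
]
for doc_id, confidence in filter_stream(sample_stream, target_forms):
    print(doc_id, confidence)
```

A real system would replace the scorer with a model trained on the labeled Oct-Dec 2011 portion of the stream; the loop structure stays the same.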

Content Stream
• 462M texts, 40% English
• 4,973 hourly chunks of ~10^5 docs/hour
• News, blogs, forums, and link shortening
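As a quick sanity check on the corpus figures above, the average chunk size follows from dividing the total text count by the number of hourly chunks. The snippet uses only the numbers quoted on the slide and treats "462M" as exactly 462,000,000, which is an approximation.

```python
# Figures quoted on the slide; "462M" is treated as exactly 462,000,000.
total_texts = 462_000_000
hourly_chunks = 4_973
english_fraction = 0.40

docs_per_hour = total_texts / hourly_chunks
english_texts = english_fraction * total_texts

print(f"average docs per hourly chunk: {docs_per_hour:,.0f}")
print(f"approximate English texts: {english_texts:,.0f}")
```

The average comes out at roughly 9.3 × 10^4 documents per hour, consistent with an order of magnitude of 10^5.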

Your KBA System

Entities in Wikipedia or another Knowledge Base

Automatically recommend new edits

Diffeo

Sponsors:

Page 5:

Page 6:

CCR rating is pre-hoc.

Published: March 31, 2012
Impact of Thoughts on Water
By Denis Gorce-Bourge

Water covers 70% of our Blue planet and our body is made of about 70% water.

Masaru Emoto is a Japanese Photographer and scientist. He is known over the world for his remarkable work on water and its deep connection with individual and collective consciousness.

For decades, Masaru took pictures of frozen crystals of water and tested the direct influence of the environment on the quality of those crystals.

Pollution has a direct impact on the beauty of a frozen crystal but as well words, music and thoughts. He tested the quality of water crystals by exposing it to various conditions: to written words like hate and violence and Love and gratitude. The results were just astonishing. The crystal exposed to Love and gratitude was beautiful and perfectly formed where the other one was severely degraded. He demonstrated as well the impact of Heavy Metal music versus Mozart or Beethoven and how the vibration of music impacts water.

The very shape of water crystals is modified by violence, aggression, and negative words.

Useful Vital

Vital = Citation Worthy

Page 7:

Page 8:

BBC: “What would you say to forming The Beatles - The Next Generation” … James: “… I'd be up for it.”

echo

Confidence of sensitive versus insensitive systems

SSF example: “FounderOf” slot on James McCartney

Page 9:

Examples of redundant texts that appear after the document that changed the slot.

COREF, EQUIV, REL, NOV: all of these are examples of detecting REDUNDANCY.

Doc 1) Nothing may be sacred after all: Sir Paul McCartney's son James is interested in starting a second-generation Beatles band with John Lennon's son Sean, George Harrison's son Dhani and Ringo Starr's son Zak.

Doc 2) UPDATE: James McCartney has clarified his comments on his Facebook page: Hi Everyone...well, looks like quite some attention being given to my BBC interview! Honestly, I was just thinking out loud about playing with Beatles family friends, nothing more. My band’s going to be on tour in the UK and US for most of this year, and the shows are going great! I'm so grateful…Lots of love to you all…!

Doc 3) It is 42 years since the world’s most famous band broke up, following an acrimonious split between former best friends Paul McCartney and John Lennon. Now, however, Sir Paul’s only son James has revealed a new group featuring the offspring of the Fab Four could become a reality. James – who was last night due to follow in The Beatles’ footsteps by playing the Cavern Club in Liverpool …

Page 10:

KBX

Cold Start queries focused on: nil entities related to target cluster and/or causality of event

Pool top-K filtered docs, or use each KBA run as separate KBP input.
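The pooling option mentioned above can be sketched as taking the top-K documents from each KBA run and merging them without duplicates. The run format (lists of document id and confidence pairs), the run names, and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Run = List[Tuple[str, float]]  # (doc_id, confidence) pairs from one KBA run


def top_k(run: Run, k: int) -> List[str]:
    """Document ids of the k highest-confidence items in a single run."""
    ranked = sorted(run, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def pool_runs(runs: Dict[str, Run], k: int) -> List[str]:
    """Union of each run's top-k documents, keeping first-seen order."""
    pooled: List[str] = []
    seen = set()
    for name in sorted(runs):  # fixed run order keeps the pool reproducible
        for doc_id in top_k(runs[name], k):
            if doc_id not in seen:
                seen.add(doc_id)
                pooled.append(doc_id)
    return pooled


# Hypothetical runs from two systems.
example_runs = {
    "run-a": [("d1", 0.9), ("d2", 0.2), ("d3", 0.5)],
    "run-b": [("d3", 0.8), ("d4", 0.7)],
}
print(pool_runs(example_runs, k=2))
```

The alternative named on the slide, using each KBA run as a separate KBP input, would skip `pool_runs` and pass each run's `top_k` output downstream on its own.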

(1000x filter)

Must coordinate choice of KBA target entities with desired content of KBs for Cold Start queries.

KBA Stream Corpus 2012 (or the new Stream Corpus 2013)
• 462M texts, 40% English
• 4,973 hourly chunks of ~10^5 docs/hour
• News, blogs, forums, and link shortening

Output KB

Clusters of related entities and/or event-type entities

KBP

KBA

Page 11:

TREC 2013 Filtering Tasks

[Figure: example topics arranged by Duration (Sudden Onset Phenomena, Longer-Lived Phenomena) and Notoriety (Mid-to-Long-Tail, Widely Known Event). Tasks shown: TREC TempSum, TREC KBA. Example topics: small plane crash; Japan Earthquake (36hrs); Hurricane Sandy (months); Iraq War; Life of James McCartney; Financial Activities of Russian Oligarchs.]