
Reporting on SMERP Data Challenge

Speaker: Saptarshi Ghosh

Department of CST, Indian Institute of Engineering Science and Technology Shibpur, India

Department of CSE, Indian Institute of Technology Kharagpur, India

Outline

1 Introduction and Motivation

2 The Test Collection

3 The Challenges

4 Task 1 : Text Retrieval

5 Task 2 : Summarization

Role of Microblogs during Disasters

A lot of useful situational information is posted on microblogging sites like Twitter during disaster events

Challenges in extracting the important information

Important information is obscured amongst a lot of sentiment, opinion, ...

Microblogs are very short and written informally

Large variation in the vocabulary of crowdsourced content

Motivation for the track

Develop a standard data collection for evaluating IR and summarization methodologies for microblog retrieval during disasters

Inspired by the TREC Microblog track (which does not consider the disaster scenario)

Outline

1 Introduction and Motivation

2 The Test Collection

3 The Challenges

4 Task 1 : Text Retrieval

5 Task 2 : Summarization

The Microblog dataset

Collected tweets posted during the two weeks after the devastating earthquake in Italy in August 2016

Used Twitter Search API with the keyword ‘italy’

About 180,000 tweets in English collected

Removed duplicates and near-duplicates based on presence of common words

Final dataset of 72,220 tweets
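A minimal sketch of the near-duplicate removal step, assuming a simple Jaccard word-overlap criterion; the 0.7 threshold and the tokenization below are illustrative choices, not parameters reported by the track.

```python
import re

def word_set(tweet):
    # Lowercase and keep alphanumeric tokens only
    return set(re.findall(r"[a-z0-9]+", tweet.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def remove_near_duplicates(tweets, threshold=0.7):
    """Keep a tweet only if its word overlap with every previously kept
    tweet stays below the threshold (0.7 is an illustrative value)."""
    kept, kept_sets = [], []
    for tweet in tweets:
        words = word_set(tweet)
        if all(jaccard(words, seen) < threshold for seen in kept_sets):
            kept.append(tweet)
            kept_sets.append(words)
    return kept
```

A pairwise scan over roughly 180,000 tweets would need blocking or MinHash-style approximation in practice; the sketch only illustrates the overlap criterion.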

Topics for retrieval

Consulted members of NGOs who work in disaster-affected regions – what are the typical information requirements during a disaster relief operation?

Identified 4 broad information requirements (topics)

SMERP-T1: What resources are available

SMERP-T2: What resources are required

SMERP-T3: What infrastructure damage, restoration and casualties are reported

SMERP-T4: What are the rescue activities of various NGOs / government organizations

Examples of relevant tweets

SMERP-T1: What resources are available

Long queues to donate blood after Earthquake strikes Central Italy. #PrayForItaly [url]

Earthquake in Central Italy, collection of food in the supermarkets of the Valle d'Aosta

SMERP-T2: What resources are required

#earthquake Avis Rieti hospital Avis ask for blood donors, all blood type

Italy Earthquake: ’No electricity or food’ - Sabarina lives 8 miles from the epicentre [url] via @audioBoom

SMERP-T3: What infrastructure damage, restoration and casualties are reported

A 6.1 quake hit Italy, damaged buildings near Rieti & ppl fleeing homes [url]

Death toll rises to 159, over 360 injured after deadly earthquake rocks central Italy, Army mobilized in rescue

Outline

1 Introduction and Motivation

2 The Test Collection

3 The Challenges

4 Task 1 : Text Retrieval

5 Task 2 : Summarization

The Challenges

Tasks

Task 1 : Text Retrieval

Task 2 : Summarization

The Challenges : Levels

Levels

Level 1 : The tweets collected during the first day (24 hours) – 52,469 tweets given

Level 2 : The tweets collected during the second day (24 hours) – 19,751 tweets and the gold standard of Level 1 given

The Challenges : Format

Participants given

The tweet-ids, and a Python script to download the tweets using the Twitter API (a download sketch is given below)

The four topics in the format conventionally used for TREC topics (number, title, description, narrative)
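A minimal sketch of how the distributed tweet-ids could be downloaded, assuming the Twitter v1.1 statuses/lookup endpoint and a bearer token; the actual script given to participants is not reproduced here, and BEARER_TOKEN is a placeholder.

```python
import requests

LOOKUP_URL = "https://api.twitter.com/1.1/statuses/lookup.json"
BEARER_TOKEN = "..."  # placeholder; obtained from a Twitter developer account

def hydrate(tweet_ids):
    """Fetch full tweet objects for the given tweet-ids, 100 per request
    (the documented batch limit of the v1.1 statuses/lookup endpoint)."""
    headers = {"Authorization": "Bearer " + BEARER_TOKEN}
    tweets = []
    for i in range(0, len(tweet_ids), 100):
        batch = ",".join(str(t) for t in tweet_ids[i:i + 100])
        resp = requests.get(LOOKUP_URL, headers=headers, params={"id": batch})
        resp.raise_for_status()
        tweets.extend(resp.json())
    return tweets
```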

The Challenges : Types of methodologies considered

Full-automatic – no manual intervention at any stage

Semi-automatic – manual intervention involved in the query formulation stage (but not in the retrieval/summarization stage)

Outline

1 Introduction and Motivation

2 The Test Collection

3 The Challenges

4 Task 1 : Text Retrieval

5 Task 2 : Summarization

Task 1 : Evaluation

5 teams participated

Runs submitted :

Level 1 : 10 runs; 2 automatic, 8 semi-automatic

Level 2 : 14 runs; 2 automatic, 12 semi-automatic

Primary evaluation measure – Bpref; ties broken by MAP.
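For reference, an illustrative sketch of Bpref following the usual trec_eval definition; the official scoring used the organizers' evaluation scripts, so this only sketches the measure, not the exact tooling.

```python
def bpref(ranked_ids, relevant, nonrelevant):
    """Bpref of a ranked run: each retrieved relevant document is penalized
    by the number of judged nonrelevant documents ranked above it (capped
    at R), normalized by min(R, N), where R and N are the numbers of judged
    relevant and judged nonrelevant documents."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    nonrel_above, total = 0, 0.0
    for doc in ranked_ids:
        if doc in nonrelevant:
            nonrel_above += 1
        elif doc in relevant:
            penalty = min(nonrel_above, R) / denom if denom else 0.0
            total += 1.0 - penalty
    return total / R
```

For example, with two judged relevant and two judged nonrelevant tweets, the run ['t1', 'n1', 't2'] (where 'n1' is nonrelevant) scores (1 + 0.5) / 2 = 0.75.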

Task 1 : Evaluation – Level 1

Team-id | Run-id | Run type | Bpref | MAP | Method summary
DCU | dcu ADAPT run2 | Full-automatic | 0.617 | 0.0517 | QE using WordNet, BM25
DCU | dcu ADAPT run1 | Full-automatic | 0.608 | 0.0572 | QE from FIRE 2016 dataset, BM25
DCU | dcu ADAPT run3 | Semi-automatic | 0.4407 | 0.0338 | Manual annotation, QE using WordNet
USI | USI 1 | Semi-automatic | 0.3286 | 0.1403 | QE from Nepal earthquake 2015, Boolean conjunction
DAIICT | daiict irlab 2 | Semi-automatic | 0.3171 | 0.0417 | POS tagging, QE using WordNet, BM25
RU | rel ru nl lang analy | Semi-automatic | 0.3153 | 0.0678 | Rule-based approach, no ranking of results
DAIICT | daiict irlab 1 | Semi-automatic | 0.3074 | 0.0391 | POS tagging, QE using WordNet, cosine similarity between tweets and expanded topic
CSPIT | charusat smerp17 1 | Semi-automatic | 0.2021 | 0.018 | Word2Vec, BM25
RU | rel ru nl ml | Semi-automatic | 0.1973 | 0.0375 | Relevancer without ranking
USI | USI 2 | Semi-automatic | 0.1803 | 0.0955 | QE from Nepal earthquake 2015, Boolean conjunction, POS tagging

Table: Level 1 results. The table is sorted according to the Bpref measure.
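The top Level 1 runs combine query expansion (QE) with WordNet and BM25 ranking. As a generic illustration of that expansion step, not the DCU team's actual system, a sketch using NLTK's WordNet interface:

```python
from nltk.corpus import wordnet  # requires nltk.download('wordnet')

def expand_query(terms, max_synonyms=3):
    """Append up to a few WordNet synonyms per query term; the limit of 3
    synonyms is an illustrative choice, not a value reported by any team."""
    expanded = list(terms)
    for term in terms:
        synonyms = []
        for synset in wordnet.synsets(term):
            for lemma in synset.lemma_names():
                word = lemma.replace("_", " ").lower()
                if word != term and word not in synonyms:
                    synonyms.append(word)
        expanded.extend(synonyms[:max_synonyms])
    return expanded

# e.g. expand_query(["food"]) adds WordNet lemmas such as "nutrient"
```

The expanded term list would then be passed to a BM25 retrieval model (e.g. in Terrier or Elasticsearch); that part is omitted here.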

Task 1 : Evaluation – Level 2

Team-id | Run-id | Run type | Bpref | MAP | Method summary
DCU | dcu ADAPT run2 | Full-automatic | 0.7767 | 0.06 | Same as Level 1
DCU | dcu ADAPT run1 | Full-automatic | 0.6861 | 0.0627 | Same as Level 1
RU | ru nl ml0 | Semi-automatic | 0.4724 | 0.1295 | Relevancer with ranking
RU | rel ru nl lang analy1 | Semi-automatic | 0.3846 | 0.1323 | Rule-based approach, results ranked
RU | rel ru nl lang analy0 | Semi-automatic | 0.3846 | 0.0853 | Rule-based approach, no ranking of results
DCU | dcu ADAPT run3 | Semi-automatic | 0.3821 | 0.0399 | Same as Level 1
RU | ru nl ml1 | Semi-automatic | 0.3097 | 0.1093 | Combined approach
USI | USI 2.1 | Semi-automatic | 0.3029 | 0.1549 | QE on Level 1 training data, Boolean conjunction
DAIICT | daiict irlab l2 2 | Semi-automatic | 0.2869 | 0.0635 | Same as Level 1
DAIICT | daiict irlab l2 1 | Semi-automatic | 0.2869 | 0.0571 | Same as Level 1
USI | USI 2.2 | Semi-automatic | 0.2425 | 0.1462 | QE on Level 1 training data, Boolean conjunction, Naive Bayes classifier
USI | USI 2.3 | Semi-automatic | 0.1828 | 0.1266 | QE on Level 1 training data, Boolean conjunction, POS tagging
DAIICT | daiict irlab l2 3 | Semi-automatic | 0.1204 | 0.0433 | POS tagging, QE using WordNet, language model
CSPIT | charusat smerp17 2 | Semi-automatic | 0.0218 | 0.0072 | Same as Level 1

Table: Level 2 results. The table is sorted according to the Bpref measure.

Outline

1 Introduction and Motivation

2 The Test Collection

3 The Challenges

4 Task 1 : Text Retrieval

5 Task 2 : Summarization

Task 2 : Evaluation

4 teams participated

Summaries submitted :

Level 1 : 7 summaries; 4 automatic, 3 semi-automatic

Level 2 : 4 summaries; all semi-automatic

Primary evaluation measure – ROUGE-L; ties broken by ROUGE-SU4.
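For reference, an illustrative sketch of the LCS-based ROUGE-L F-score at the word level; the official scores were produced with the standard ROUGE toolkit, and the beta value below is an illustrative choice rather than the track's configuration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score: F = (1 + beta^2) * P * R / (R + beta^2 * P),
    with P and R the LCS-based precision and recall over word tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```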

Task 2 : Evaluation – Level 1

Team-id | Summary-id | Summary type | ROUGE-L | ROUGE-SU4 | Method summary
USI | USI 2 1 | Full-automatic | 0.3029 | 0.1011 | Relevance, Novelty, Word2Vec, POS, linear interpolation with a specific parameter value
USI | USI 2 2 | Full-automatic | 0.2809 | 0.0903 | Relevance, Novelty using Word2Vec, POS, linear interpolation with a specific parameter value
USI | USI 1 1 | Full-automatic | 0.2806 | 0.0947 | Relevance, Novelty using Word2Vec, linear interpolation with a specific parameter value
USI | USI 1 2 | Full-automatic | 0.275 | 0.09 | Relevance, Novelty, Word2Vec, linear interpolation with a specific parameter value
IIEST | Kanav Mehra | Semi-automatic | 0.4885 | 0.2329 | SumBasic summarizer, Naive Bayes classifier
IIEST | Kanav Mehra | Semi-automatic | 0.4375 | 0.1983 | SumBasic summarizer, ensemble classifier
DAIICT | daiict irlab summ 1 | Semi-automatic | 0.3085 | 0.1055 | Relevance, Novelty using Jaccard-based word overlap

Table: Level 1 results. The table is sorted according to the ROUGE-L measure within each summary type.
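The strongest Level 1 summaries combine a SumBasic-style summarizer with a relevance classifier. As a generic illustration of a simplified SumBasic selection loop, not the IIEST team's actual pipeline:

```python
from collections import Counter

def sumbasic(sentences, word_budget=250):
    """Simplified SumBasic: repeatedly pick the sentence with the highest
    average word probability, then square the probabilities of its words
    to discourage redundancy. The 250-word budget is an illustrative choice."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for tokens in tokenized for w in tokens)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, length = [], 0
    remaining = [i for i, tokens in enumerate(tokenized) if tokens]
    while remaining and length < word_budget:
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]))
        summary.append(sentences[best])
        length += len(tokenized[best])
        for w in tokenized[best]:
            prob[w] **= 2  # down-weight words already covered in the summary
        remaining.remove(best)
    return summary
```

The classifier that filters tweets for topical relevance before summarization is omitted here.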

Task 2 : Evaluation – Level 2

Team-id | Summary-id | Summary type | ROUGE-L | ROUGE-SU4 | Method summary
IIEST | Kanav Mehra | Semi-automatic | 0.5142 | 0.2864 | Same as Level 1, but with feedback from Level 1 training data
IIEST | Kanav Mehra | Semi-automatic | 0.4796 | 0.2505 | Same as Level 1, but with feedback from Level 1 training data
DAIICT | daiict irlab summ 1 | Semi-automatic | 0.3254 | 0.1194 | Same as Level 1
CSPIT | Sindur Patel | Semi-automatic | 0.3233 | 0.122 | Cosine similarity, Jaccard similarity

Table: Level 2 results. The table is sorted according to the ROUGE-L measure.