Building the PoliMedia search system; data- and user-driven

Upload: maxkemman

Post on 29-Jun-2015

DESCRIPTION

Presentation at the eHumanities group at the Meertens Institute (Amsterdam) on Thursday 18 April 2013. Analysing media coverage across several types of media outlets is a challenging task for (media) historians. One specific line of media coverage research investigates the coverage of political debates and how the representation of topics and people changes over time. The PoliMedia project (http://www.polimedia.nl) aims to showcase the potential of cross-media analysis for research in the humanities by 1) curating automatically detected semantic links between four data sets of different media types, and 2) developing a demonstrator application that allows researchers to deploy such an interlinked collection for quantitative and qualitative analysis of media coverage of debates in the Dutch parliament. These two goals reflect the two perspectives on the development of a search system such as PoliMedia: data-driven and user-driven. In this presentation, Laura Hollink (VU) will present the data-driven perspective of linking different datasets and the research questions that arise in achieving this linkage: how can different types of datasets be combined, and what kind of research questions does the data make possible? Max Kemman (EUR) will present the user-driven perspective: how can scholars benefit from the linking of these datasets? What are the user requirements for the PoliMedia search system, and how was the system evaluated with scholars in an eye tracking study?

TRANSCRIPT

Page 1: Building the PoliMedia search system; data- and user-driven

www.polimedia.nl

Building the PoliMedia system; data- and user-driven

Page 2: Building the PoliMedia search system; data- and user-driven

eHumanities group - PoliMedia 2

Who are we?

Laura Hollink
• Assistant professor at VU
• Modeling, linking and enrichment of data
• Data-driven research
• @laurahollink

Max Kemman
• Junior researcher at EUR
• Human-Computer Interaction
• User-driven research
• @MaxJ_K

PoliMedia team
• Henri Beunders (EUR)
• Jaap Blom (NISV)
• Laura Hollink (VU)
• Geert-Jan Houben (TU Delft)
• Damir Juric (TU Delft)
• Max Kemman (EUR)
• Martijn Kleppe (EUR)
• Johan Oomen (NISV)

Funded by CLARIN-NL

Page 3: Building the PoliMedia search system; data- and user-driven

Linking Politics to Media

Page 4: Building the PoliMedia search system; data- and user-driven

The research questions

• How is a person, subject or process covered & visualised by the media?
• How do debates and arguments develop over a longer period of time?
• Analysing the changing ideas, arguments and presentation in different media

Page 5: Building the PoliMedia search system; data- and user-driven

Issues with current approach

Page 6: Building the PoliMedia search system; data- and user-driven

Issues with current approach

Page 7: Building the PoliMedia search system; data- and user-driven

Goal: explicit links to different media types in one system

Page 8: Building the PoliMedia search system; data- and user-driven

PoliMedia system

PoliMedia portal
- Browse: debate and date
- Search: debate and person

Linked collections
- Staten Generaal Digitaal (KB)
- Newspapers (KB)
- Radio (KB)
- Television (Sound and Vision)

Data-driven (Laura) & user-driven (Max)

Page 9: Building the PoliMedia search system; data- and user-driven

Data

Page 10: Building the PoliMedia search system; data- and user-driven

Debate data

Handelingen der Staten-Generaal, or Dutch Hansard, from 1945-1995

Some provenance:
1. Transcripts are made of the complete debates of the Dutch parliament.
2. Published online by the government on http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
3. The PoliticalMashup project has translated the government PDF and TXT files into XML, incl. URIs as identifiers; see http://politicalmashup.nl/
4. We build on that.

Page 11: Building the PoliMedia search system; data- and user-driven

Structure of the debate data

Including:
• who, when, what
• identifiers for subparts of the debate
• chronological order of speakers
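
A minimal sketch of how the structure listed on this slide could be held in memory. The class names, field names, URIs and the date are hypothetical and chosen for illustration only; the actual PoliticalMashup XML schema is not shown in the slides and may be organised differently.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Speech:
    uri: str      # identifier for this subpart of the debate
    speaker: str  # who
    text: str     # what
    order: int    # position in the chronological order of speakers


@dataclass
class Debate:
    uri: str
    held_on: date  # when
    topic: str
    speeches: List[Speech] = field(default_factory=list)

    def speakers_in_order(self) -> List[str]:
        """Return speaker names in the order in which they spoke."""
        ordered = sorted(self.speeches, key=lambda s: s.order)
        return [s.speaker for s in ordered]


# Invented example data; the URIs are placeholders, not real identifiers.
debate = Debate(
    uri="http://example.org/debate/1",
    held_on=date(1975, 3, 12),
    topic="Example topic",
    speeches=[
        Speech("http://example.org/debate/1/speech/2", "Speaker B", "...", 2),
        Speech("http://example.org/debate/1/speech/1", "Speaker A", "...", 1),
    ],
)
print(debate.speakers_in_order())
```

Keeping an explicit order field, rather than relying on list position, means the chronology survives filtering or merging of speeches.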

Page 12: Building the PoliMedia search system; data- and user-driven

Media data

• Newspaper articles
  – at the National Library of the Netherlands
  – Many newspapers 1950-1995
  – Text + images of newspaper layout
• Radio bulletins
  – Transcripts of ANP news
• Newscasts
  – in the Academia collection of the Netherlands Institute for Sound and Vision

Page 13: Building the PoliMedia search system; data- and user-driven

Semantic model

Page 14: Building the PoliMedia search system; data- and user-driven

Semantic model

Reuse of vocabularies:

Simple Event Model (SEM), Dublin Core, FOAF, links to ISOCAT data categories.
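
To make the vocabulary reuse concrete, here is a small sketch that writes one speech as N-Triples using SEM, Dublin Core terms and FOAF. The choice of properties (sem:subEventOf, sem:hasActor, sem:hasTimeStamp, dcterms:title, foaf:name) and all example.org URIs are assumptions for illustration; the slides do not spell out which properties the PoliMedia model actually uses.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<%s>" % value


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def describe_speech(speech, debate, speaker, speaker_name, title, day):
    """Return N-Triples lines describing one speech as a SEM event."""
    triples = [
        (uri(speech), uri(RDF_TYPE), uri(SEM + "Event")),
        (uri(speech), uri(SEM + "subEventOf"), uri(debate)),
        (uri(speech), uri(SEM + "hasActor"), uri(speaker)),
        (uri(speech), uri(SEM + "hasTimeStamp"), literal(day)),
        (uri(speech), uri(DCTERMS + "title"), literal(title)),
        (uri(speaker), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(speaker), uri(FOAF + "name"), literal(speaker_name)),
    ]
    return ["%s %s %s ." % t for t in triples]


# Placeholder identifiers and invented values.
lines = describe_speech(
    "http://example.org/speech/1",
    "http://example.org/debate/1",
    "http://example.org/person/a",
    "Speaker A",
    "Example speech",
    "1975-03-12",
)
print("\n".join(lines))
```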

Page 15: Building the PoliMedia search system; data- and user-driven

Linked Data

• Data openly accessible in a Semantic Web standard
• Easy to combine with other Semantic Web data
• E.g. DBpedia data on politicians and parties

Page 16: Building the PoliMedia search system; data- and user-driven

Linking Debates to Newspaper articles that cover them

• Challenges:
  – How to link documents that are so different in nature?
  – Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
  – How can we do this efficiently, using the access mechanisms of the archives?

Page 17: Building the PoliMedia search system; data- and user-driven

Linking approach

Page 18: Building the PoliMedia search system; data- and user-driven

Detect topics

The MALLET topic model package
• Unsupervised analysis of text
• “a Topic consists of a cluster of words that frequently occur together”
• [see http://mallet.cs.umass.edu/topics.php]
• Input:
  – Text
  – Number of iterations
  – Number of topics/clusters
• Output:
  – Words that cluster around one topic.
• Example:
  – Text: a speech in a debate from 1975
  – Number of iterations: 2000
  – Number of topics: 1
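
The example setting above (a single topic) has a simple special case that can be sketched without MALLET. With only one topic every token is assigned to it, so the smoothed topic-word estimate reduces to (count + beta) / (tokens + beta * vocabulary size), and the top topic words are simply the most frequent non-stop-words. The sketch below is plain Python, not a call into MALLET; the stop-word list, the beta value and the English toy text are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a full Dutch list.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "that", "for", "on", "we"}


def single_topic_words(text, top_n=5, beta=0.01):
    """Top words of a one-topic model: smoothed relative frequencies."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    # Denominator of the smoothed estimate: N + beta * V.
    total = len(tokens) + beta * len(counts)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, (count + beta) / total) for word, count in ranked[:top_n]]


# Invented toy text standing in for a speech.
speech = (
    "The housing budget and the housing shortage: "
    "the minister defends the budget for housing."
)
for word, weight in single_topic_words(speech, top_n=3):
    print(word, round(weight, 3))
```

With more than one topic this shortcut no longer holds, and an iterative sampler such as the one in MALLET is needed to separate the word clusters.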

Page 19: Building the PoliMedia search system; data- and user-driven

Create Queries

Page 20: Building the PoliMedia search system; data- and user-driven

Evaluation

• Experiment 1: NEs in speech
• Experiment 2: NEs + topics in speech
• Experiment 3: NEs + topics in speech and debate
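
The three experimental settings differ only in which terms feed the query. A rough sketch of that difference follows, assuming the query is a deduplicated keyword list; the slides do not say how terms were actually combined or weighted, and the function names and example terms here are hypothetical.

```python
def build_query(named_entities, speech_topics=(), debate_topics=()):
    """Merge term lists into one keyword list, dropping case-insensitive repeats."""
    seen, terms = set(), []
    for term in list(named_entities) + list(speech_topics) + list(debate_topics):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


def experiment_queries(nes, speech_topics, debate_topics):
    """Return the query terms used in each of the three settings."""
    return {
        1: build_query(nes),                                # NEs in speech
        2: build_query(nes, speech_topics),                 # + topics in speech
        3: build_query(nes, speech_topics, debate_topics),  # + topics in debate
    }


# Invented example terms.
queries = experiment_queries(
    nes=["Den Uyl"],
    speech_topics=["woningbouw", "huur"],
    debate_topics=["begroting", "huur"],
)
for number, terms in queries.items():
    print(number, terms)
```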

Page 21: Building the PoliMedia search system; data- and user-driven

Results

• A linked open data set of Dutch parliamentary debates.

• With links to URLs of newspaper articles and radio bulletins at the Royal Library.

• A system that supports researchers in finding the data to answer their questions.

Page 22: Building the PoliMedia search system; data- and user-driven

User-driven: What do scholars want?

• Why user research?
• Understanding the user [1, 2]
  – Acceptance
  – Performance
  – Capabilities
  – Weaknesses
• Goal
  – Creating a system that is intuitive and helpful to the users

[1] Y. Liu, A. Osvalder, and M. Karlsson, “Considering the importance of user profiles in interface design,” May 2010.
[2] J. Preece, Y. Rogers, and H. Sharp, “Interaction Design: Beyond Human-Computer Interaction,” 2002.

Page 23: Building the PoliMedia search system; data- and user-driven

User research in the development process

• Examine search behaviour of users
  – Survey regarding search strategies
  – Interviews
• User wishes → user requirements
• Wireframes → Prototype
• Evaluation → New prioritization of remaining user requirements
• Final version

Page 24: Building the PoliMedia search system; data- and user-driven

Survey: General search strategies

• N=294
• Popular search engines

[Chart: frequency of use (Very often, Often, Regularly, Sometimes, Never, Don’t know it) for Google, Google Images, Google Scholar, YouTube, JSTOR, KB, Flickr, EBSCO, Nationaal Archief, Web of Knowledge, Uitzending Gemist, Yahoo!, Bing, Academia.nl, Europeana, Scopus, Microsoft Academic Search, EUscreen, Arkyves]

Page 25: Building the PoliMedia search system; data- and user-driven

Survey: General search strategies

1. Keywords: 4.75
2. Advanced search: 3.36
3. Related terms: 2.52
4. Boolean: 2.42
5. Browsing subject categories: 2.29
6. Filters: 2.19
7. Thesaurus: 1.87
8. Visualization: 1.22

Page 26: Building the PoliMedia search system; data- and user-driven

Survey: Conclusions

• Google is the dominant search engine
• This has two consequences
  1. People compare other search systems to their experience with Google
  2. The search task is mainly performed by using keywords

Page 27: Building the PoliMedia search system; data- and user-driven

Interviews

• N=5
• Quantitative (n=2) as well as qualitative (n=4)
• Main themes
  – How do people search currently?
  – What could be improved about current search systems?
  – What should PoliMedia offer, given its goals?
• Results
  – 39 user wishes
  – Prioritized internally
    • 19 user wishes deemed out of scope
    • 20 user requirements

Page 28: Building the PoliMedia search system; data- and user-driven

Interviews: Findings

• Key issue is to provide a good overview of the data
  – Why are search results retrieved?
  – How are search results ranked?
• Assumptions of relevance
  – Does a higher frequency of keywords indicate higher relevance to the query?
  – Do longer segments (speeches and articles) indicate higher importance?
• Many more or less out-of-scope wishes to make current research easier
  – Sentiment metadata
  – Context metadata
  – Ability to export to own software

Page 29: Building the PoliMedia search system; data- and user-driven

Wireframes: Search interface

• Clear and immediate keyword search

• Support for Booleans and (some) Google search operators

• Separate advanced search

Page 30: Building the PoliMedia search system; data- and user-driven

Wireframes: Search results

• Keyword search remains prominent

• User-chosen ranking of results

• Keyword highlighting

• Overview of related media

• Support for filtering
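
Two of the wireframe features above, user-chosen ranking and keyword highlighting, can be sketched in a few lines. This is an illustrative sketch only: the result fields ("date", "text"), the two ranking options and the bracket markers are assumptions, and the slides do not describe how the PoliMedia prototype implements either feature.

```python
import re


def keyword_count(text, keywords):
    """Count case-insensitive occurrences of all keywords in a text."""
    return sum(len(re.findall(re.escape(k), text, re.IGNORECASE)) for k in keywords)


def rank(results, keywords, order="relevance"):
    """Sort results by the ordering the user picked."""
    if order == "date":
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        return sorted(results, key=lambda r: r["date"])
    if order == "relevance":
        return sorted(results, key=lambda r: -keyword_count(r["text"], keywords))
    raise ValueError("unknown order: %r" % order)


def highlight(text, keywords, open_mark="[", close_mark="]"):
    """Wrap every keyword occurrence in markers, keeping the original casing."""
    if not keywords:
        return text
    longest_first = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in longest_first), re.IGNORECASE)
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)


# Invented example results.
results = [
    {"date": "1974-01-05", "text": "Budget debate touching on housing"},
    {"date": "1975-03-12", "text": "Housing, housing and more housing"},
]
print(rank(results, ["housing"], order="relevance")[0]["date"])
print(highlight("Housing budget", ["housing"]))
```

Ranking by raw keyword frequency mirrors one of the relevance assumptions voiced in the interviews; it is shown here as an option the user selects, not as the recommended default.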

Page 31: Building the PoliMedia search system; data- and user-driven

Wireframes: Debate page

• Keyword search remains prominent

• Overview of people in debate

• Easy access to related material

Page 32: Building the PoliMedia search system; data- and user-driven

Prototype v1.0

Page 33: Building the PoliMedia search system; data- and user-driven

Evaluation

• Eye tracking evaluation of the search system
  – Search system was still in development
• N=24
  – History
  – Political communication
• Goals
  – Gain understanding of distribution of attention
  – Collect general feedback on interface

Page 34: Building the PoliMedia search system; data- and user-driven

Evaluation: Eye tracking

• Viewing duration:

  Tasks         Search bar   Facets   Search results   Page-search
  Known Item    17%          22%      60%              2%
  Exploratory   6%           12%      80%              2%

• Search bar received little attention after search results were displayed
• Facets received a lot of attention
• Page-search (CTRL+F) mainly received attention on debate page view
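
Percentages like those in the viewing-duration table can be derived by summing fixation time per area of interest and dividing by the total. The sketch below assumes the eye tracker exports (area, duration in ms) pairs; that export format and the durations are invented, chosen so that the shares reproduce the Exploratory row. The slides do not describe how the original figures were computed.

```python
from collections import defaultdict


def attention_share(fixations):
    """Return the rounded percentage of viewing time per area of interest."""
    totals = defaultdict(float)
    for area, duration_ms in fixations:
        totals[area] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {area: round(100 * t / grand_total) for area, t in totals.items()}


# Invented fixation log for one exploratory task.
fixations = [
    ("search bar", 600),
    ("facets", 700),
    ("search results", 5000),
    ("facets", 500),
    ("search results", 3000),
    ("page-search", 200),
]
shares = attention_share(fixations)
print(shares)
```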

Page 35: Building the PoliMedia search system; data- and user-driven

Evaluation: Usability feedback

• The ranking of search results was an issue for users
• The year-filter should be a slider
• The debate page should be greatly improved
  – Better identification for speaker, party, topic, relevance to query
  – Provide filters on debate-page as well

Page 36: Building the PoliMedia search system; data- and user-driven

Prototype v2.0

Page 37: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - query

Page 38: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 – filter speaker

Page 39: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - filter role

Page 40: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - debate

Page 41: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - highlight speech

Page 42: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - link newspaper

Page 43: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - newspaper

Page 44: Building the PoliMedia search system; data- and user-driven

Prototype v2.0 - link radio

Page 45: Building the PoliMedia search system; data- and user-driven

Conclusion

• PoliMedia: data- or user-driven?
• Continuous interplay
  – Users gave input on the usefulness of links
  – Data limits what features we can offer to users
• Collection quality and usability are both critical to users [3]

[3] Xie, I. (2006). Evaluation of digital libraries: Criteria and problems from users’ perspectives. Library & Information Science Research, 28(3), 433–452. doi:10.1016/j.lisr.2006.06.002