Building the PoliMedia search system; data- and user-driven

Post on 29-Jun-2015

DESCRIPTION

Presentation at the eHumanities group at the Meertens Institute (Amsterdam) on Thursday 18 April 2013. Analysing media coverage across several types of media outlets is a challenging task for (media) historians. One specific strand of media coverage research investigates the coverage of political debates and how the representation of topics and people changes over time. The PoliMedia project (http://www.polimedia.nl) aims to showcase the potential of cross-media analysis for research in the humanities by 1) curating automatically detected semantic links between four data sets of different media types, and 2) developing a demonstrator application that allows researchers to deploy such an interlinked collection for quantitative and qualitative analysis of media coverage of debates in the Dutch parliament. These two goals reflect the two perspectives on the development of a search system such as PoliMedia: data-driven and user-driven. In this presentation, Laura Hollink (VU) will present the data-driven perspective of linking different datasets and the research questions that arise in achieving this linkage: how can different types of datasets be combined, and what kind of research questions does the data make possible? Max Kemman (EUR) will present the user-driven perspective: how can scholars benefit from the linking of these datasets? What are the user requirements for the PoliMedia search system, and how was the system evaluated with scholars in an eye tracking study?

TRANSCRIPT

www.polimedia.nl

Building the PoliMedia system; data- and user-driven

eHumanities group - PoliMedia 2

Who are we?

Laura Hollink
• Assistant professor at VU
• Modeling, linking and enrichment of data
• Data-driven research
• @laurahollink

Max Kemman
• Junior researcher at EUR
• Human-Computer Interaction
• User-driven research
• @MaxJ_K

PoliMedia team
Henri Beunders (EUR)
Jaap Blom (NISV)
Laura Hollink (VU)
Geert-Jan Houben (TU Delft)
Damir Juric (TU Delft)
Max Kemman (EUR)
Martijn Kleppe (EUR)
Johan Oomen (NISV)

Funded by CLARIN-NL

Linking Politics to Media

The research questions
• How is a person, subject or process covered & visualised by the media?
• How do debates and arguments develop over a longer period of time?
• Analysing the changing ideas, arguments and presentation in different media

Issues with current approach

Goal: explicit links to different media types in one system

PoliMedia system

PoliMedia Portal
- Browse: debate and date
- Search: debate and person

Sources:
- Newspapers (KB)
- Television (Sound and Vision)
- Radio (KB)
- Staten Generaal Digitaal (KB)

Data-driven (Laura) & user-driven (Max)

Data

Debate data
Handelingen der Staten-Generaal, or Dutch Hansard, from 1945-1995

Some provenance:
1. Transcripts are made of the complete debates of the Dutch parliament.
2. Published online by the government on http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995)
3. The PoliticalMashup project has translated government PDF and TXT files into XML, incl. URIs as identifiers, see http://politicalmashup.nl/
4. We build on that.

Structure of the debate data

Including:
• who, when, what
• identifiers for subparts of the debate
• chronological order of speakers
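The structure listed above could be modelled roughly as follows. This is a minimal sketch, not the PoliticalMashup XML schema: the class names, field names and example.org URIs are hypothetical and only mirror the three properties named on the slide (who/when/what, identifiers for subparts, chronological order).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Speech:
    uri: str       # identifier for this subpart of the debate (hypothetical URI)
    speaker: str   # who
    text: str      # what
    position: int  # chronological order within the debate


@dataclass
class Debate:
    uri: str
    held_on: date  # when
    speeches: list[Speech] = field(default_factory=list)

    def in_order(self) -> list[Speech]:
        """Return the speeches in chronological order of speaking."""
        return sorted(self.speeches, key=lambda s: s.position)


# Hypothetical example data, added out of order on purpose
debate = Debate(uri="http://example.org/debate/1", held_on=date(1975, 3, 12))
debate.speeches.append(
    Speech("http://example.org/debate/1/speech/2", "Speaker B", "Reply ...", 2)
)
debate.speeches.append(
    Speech("http://example.org/debate/1/speech/1", "Speaker A", "Opening ...", 1)
)
speakers = [s.speaker for s in debate.in_order()]
```

Keeping an explicit position on each speech means the speaking order survives even when speeches are stored or retrieved out of order.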

Media data
• Newspaper articles
– at the National Library of the Netherlands
– Many newspapers 1950-1995
– Text + images of newspaper layout
• Radio bulletins
– Transcripts of ANP news
• Newscasts
– in the Academia collection of the Netherlands Institute for Sound and Vision

Semantic model

Reuse of vocabularies:

Simple Event Model (SEM), Dublin Core, FOAF, links to ISOCAT data categories.
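To illustrate what reusing these vocabularies might look like, the sketch below writes a few N-Triples for one speech. The SEM, FOAF and Dublin Core namespace URIs are the published ones. The example.org URIs, the literal values and the choice of which properties describe a speech are assumptions for illustration, not the actual PoliMedia model.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/polimedia/"  # hypothetical base URI


def uri_triple(s, p, o):
    """Format a triple whose object is a URI as one N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."


def literal_triple(s, p, value):
    """Format a triple whose object is a plain string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{s}> <{p}> "{escaped}" .'


debate = EX + "debate/1"
speech = EX + "debate/1/speech/1"
speaker = EX + "person/speaker-a"

lines = [
    uri_triple(speech, RDF_TYPE, SEM + "Event"),             # a speech modelled as an event
    uri_triple(speech, SEM + "subEventOf", debate),          # part of a debate
    uri_triple(speech, SEM + "hasActor", speaker),           # who spoke
    literal_triple(speech, SEM + "hasTimeStamp", "1975-03-12"),
    uri_triple(speaker, RDF_TYPE, FOAF + "Person"),
    literal_triple(speaker, FOAF + "name", "Speaker A"),
    literal_triple(debate, DCTERMS + "title", "Example debate"),
]
ntriples = "\n".join(lines)
```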

Linked Data
• Data openly accessible in a semantic Web standard
• Easy to combine with other semantic Web data
• E.g. DBpedia data on politicians and parties.

Linking Debates to Newspaper articles that cover them
• Challenges:
– How to link documents that are so different in nature?
– Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
– How can we do this efficiently, using the access mechanisms of the archives?

Linking approach

Detect topics
The MALLET topic model package
• Unsupervised analysis of text
• “a Topic consists of a cluster of words that frequently occur together”
• [see http://mallet.cs.umass.edu/topics.php]
• Input:
– Text
– Number of iterations
– Number of topics/clusters
• Output:
– Words that cluster around one topic.
• Example:
– Text: a speech in a debate from 1975
– Number of iterations: 2000
– Number of topics: 1
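As a rough illustration of the example settings above (2000 iterations, 1 topic), the sketch below builds the two MALLET command lines: one to import the text and one to train the topic model. It only constructs and prints the commands. The directory name, the intermediate and output file names, and the location of the mallet launcher are hypothetical, and the slides do not say which further options (for instance a Dutch stop-word list) the project used.

```python
import shlex


def mallet_commands(text_dir, num_topics=1, num_iterations=2000,
                    mallet_bin="bin/mallet"):
    """Build the MALLET invocations: import a directory of text, then train topics."""
    corpus = "speech.mallet"        # hypothetical intermediate file name
    keys = "speech_topic_keys.txt"  # hypothetical file for the top words per topic
    import_cmd = [
        mallet_bin, "import-dir",
        "--input", text_dir,
        "--output", corpus,
        "--keep-sequence",          # topic training needs token sequences
    ]
    train_cmd = [
        mallet_bin, "train-topics",
        "--input", corpus,
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--output-topic-keys", keys,
    ]
    return import_cmd, train_cmd


import_cmd, train_cmd = mallet_commands("speeches/1975")
print(shlex.join(import_cmd))
print(shlex.join(train_cmd))
```

To actually run the commands, each list could be passed to subprocess.run on a machine where MALLET is installed.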

Create Queries

Evaluation

• Experiment 1: named entities (NEs) in speech
• Experiment 2: NEs + topics in speech
• Experiment 3: NEs + topics in speech and debate
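The three settings differ only in which terms go into the query sent to the archive. A minimal sketch of that difference follows. It assumes that Experiment 3 adds the named entities and topic words of the whole debate to those of the speech, and that terms are combined with OR. Both are assumptions, as are the function names and the example terms.

```python
def query_terms(experiment, speech_nes, speech_topics,
                debate_nes=(), debate_topics=()):
    """Collect the query terms for one of the three experimental settings."""
    if experiment not in (1, 2, 3):
        raise ValueError("experiment must be 1, 2 or 3")
    terms = list(speech_nes)                # Experiment 1: NEs in the speech
    if experiment >= 2:
        terms += list(speech_topics)        # Experiment 2: + topic words of the speech
    if experiment == 3:
        terms += list(debate_nes)           # Experiment 3: + NEs of the whole debate
        terms += list(debate_topics)        #               + topic words of the debate
    return list(dict.fromkeys(terms))       # drop duplicates, keep first-seen order


def to_query(terms):
    """Join quoted terms with OR (a hypothetical query syntax)."""
    return " OR ".join(f'"{t}"' for t in terms)


# Hypothetical terms for one speech and its debate
speech_nes = ["Den Uyl", "Schiphol"]
speech_topics = ["luchtvaart", "geluidshinder"]
debate_nes = ["Schiphol", "KLM"]
debate_topics = ["luchtvaart"]

q1 = to_query(query_terms(1, speech_nes, speech_topics))
t3 = query_terms(3, speech_nes, speech_topics, debate_nes, debate_topics)
```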

Results

• A linked open data set of Dutch parliamentary debates.

• With links to URLs of newspaper articles and radio bulletins at the Royal Library.

• A system that supports researchers in finding the data to answer their questions.

User-driven - What do scholars want?
• Why user research?
• Understanding the user [1, 2]
– Acceptance
– Performance
– Capabilities
– Weaknesses
• Goal
– Creating a system that is intuitive and helpful to the users

[1] Y. Liu, A. Osvalder, and M. Karlsson, “Considering the importance of user profiles in interface design,” no. May, 2010
[2] J. Preece, Y. Rogers, and H. Sharp, “Interaction Design: Beyond Human-Computer Interaction,” Design, vol. 18, no. 1, pp. 68-68, 2002

User research in the development process
• Examine search behaviour of users
– Survey regarding search strategies
– Interviews
• User wishes → user requirements
• Wireframes → Prototype
• Evaluation → New prioritization of remaining user requirements
• Final version

Survey - General search strategies
• N=294
• Popular search engines

[Chart: frequency of use per search engine, on the scale Very often / Often / Regularly / Sometimes / Never / Don’t know it. Engines shown: Google, Google Images, Google Scholar, YouTube, JSTOR, KB, Flickr, EBSCO, Nationaal Archief, Web of Knowledge, Uitzending Gemist, Yahoo!, Bing, Academia.nl, Europeana, Scopus, Microsoft Academic Search, EUscreen, Arkyves]

Survey - General search strategies

1. Keywords 4.75
2. Advanced search 3.36
3. Related terms 2.52
4. Boolean 2.42
5. Browsing subject categories 2.29
6. Filters 2.19
7. Thesaurus 1.87
8. Visualization 1.22

Survey - Conclusions
• Google is the dominant search engine
• This has two consequences
1. People compare other search systems to their experience with Google
2. The search task is mainly performed by using keywords

Interviews
• N=5
• Quantitative (n=2) as well as qualitative (n=4)
• Main themes
– How do people search currently?
– What could be improved about current search systems?
– What should PoliMedia offer, given its goals?
• Results
– 39 user wishes
– Prioritized internally
• 19 user wishes deemed out of scope
• 20 user requirements

Interviews - Findings
• Key issue is to provide a good overview of data
– Why are search results retrieved
– How are search results ranked
• Assumptions of relevance
– Higher frequency of keywords indicates higher relevancy to query?
– Longer segments (speeches and articles) indicate higher importance?
• Many more or less out-of-scope wishes to make current research easier
– Sentiment metadata
– Context metadata
– Ability to export to own software

Wireframes - Search interface
• Clear and immediate keyword-search
• Support for Booleans and (some) Google-search operators
• Separate advanced-search

Wireframes - Search results

• Keyword search remains prominent

• User chosen ranking of results

• Keyword highlighting

• Overview of related media

• Support for filtering
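Two of the requirements above, user-chosen ranking and keyword highlighting, can be sketched in a few lines. This is an illustration only, not the PoliMedia implementation: the result records are hypothetical, the "relevance" order simply counts keyword occurrences (the assumption interviewees voiced about relevance), and the ** markers stand in for whatever markup an interface would use.

```python
import re

# Hypothetical search results
results = [
    {"title": "Debate on housing", "year": 1975,
     "text": "Housing policy and housing costs were discussed."},
    {"title": "Budget debate", "year": 1982,
     "text": "The budget touched on housing briefly."},
]


def keyword_count(result, keyword):
    """Count case-insensitive occurrences of the keyword in the result text."""
    return len(re.findall(re.escape(keyword), result["text"], flags=re.IGNORECASE))


def rank(results, keyword, order="relevance"):
    """Sort by the order the user picked: newest first, or most keyword hits first."""
    if order == "date":
        return sorted(results, key=lambda r: r["year"], reverse=True)
    return sorted(results, key=lambda r: keyword_count(r, keyword), reverse=True)


def highlight(text, keyword):
    """Wrap each occurrence of the keyword in ** markers, keeping its original case."""
    return re.sub(re.escape(keyword), lambda m: f"**{m.group(0)}**", text,
                  flags=re.IGNORECASE)


by_relevance = [r["title"] for r in rank(results, "housing")]
by_date = [r["title"] for r in rank(results, "housing", order="date")]
marked = highlight(results[1]["text"], "housing")
```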

Wireframes - Debate page

• Keyword search remains prominent

• Overview of people in debate

• Easy access to related material

Prototype v1.0

Evaluation
• Eye tracking evaluation of the search system
– Search system was still in development
• N=24
– History
– Political communication
• Goals
– Gain understanding of distribution of attention
– Collect general feedback on interface

Evaluation - Eye tracking
• Viewing duration
• Search bar received little attention after search results were displayed
• Facets received a lot of attention
• Page-search (CTRL+F) mainly received attention on debate page view

Tasks        Search bar  Facets  Search results  Page-search
Known Item   17%         22%     60%             2%
Exploratory  6%          12%     80%             2%
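Percentages like those in the table are shares of total viewing duration per area of interest. The sketch below shows that arithmetic on made-up fixation records. The data and the function name are hypothetical, and the slides do not describe how the study aggregated its eye tracking data. Because each share is rounded separately, a row of such figures may add up to slightly more or less than 100%.

```python
from collections import defaultdict

# Hypothetical fixations: (area of interest, duration in milliseconds)
fixations = [
    ("Search bar", 400),
    ("Facets", 600),
    ("Search results", 1800),
    ("Search results", 1000),
    ("Page-search", 200),
]


def viewing_shares(fixations):
    """Return each area's share of total viewing duration, as a rounded percentage."""
    totals = defaultdict(int)
    for area, ms in fixations:
        totals[area] += ms
    grand_total = sum(totals.values())
    return {area: round(100 * ms / grand_total) for area, ms in totals.items()}


shares = viewing_shares(fixations)
```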

Evaluation - Usability feedback
• The ranking of search results was an issue for users
• The year-filter should be a slider
• The debate page should be greatly improved
– Better identification for speaker, party, topic, relevance to query
– Provide filters on debate-page as well

Prototype v2.0

Prototype v2.0 - query

Prototype v2.0 - filter speaker

Prototype v2.0 - filter role

Prototype v2.0 - debate

Prototype v2.0 - highlight speech

Prototype v2.0 - link newspaper

Prototype v2.0 - newspaper

Prototype v2.0 - link radio

Conclusion

• PoliMedia; data- or user-driven?
• Continuous interplay
– Users gave input for usefulness of links
– Data limits what features we can offer to users

• Collection quality and usability are both critical to users [3]

[3] Xie, I. (2006). Evaluation of digital libraries: Criteria and problems from users’ perspectives. Library & Information Science Research, 28(3), 433–452. doi:10.1016/j.lisr.2006.06.002
