keynote exploring and exploiting official publications

PoliticalMashup 1

PoliticalMashup

Open Official Documents: Requirements and Opportunities

Maarten Marx

Universiteit van Amsterdam

Istanbul EEOP (LREC) 2012-05-27

Content

• Official Documents: zoom in on a specific official publications dataset

• Opportunities: what makes official publications data valuable?

• Requirements: what is needed to make official publications data reusable and interoperable?

Our Leading Research Question

What is the best data format for publishing both legacy and current parliamentary proceedings in a digitally sustainable manner? [Marx et al. 2010]

W3C recommendations on Open Government Data

• Make data both machine and human readable.

• Link data: make data linkable; provide permanent identifiers for each government object and data item.

• Provide metadata using common standards (e.g. Dublin Core).

• Make the data as easy to reuse (e.g. in mashups) as possible.

Goal of this talk: make this concrete.

Value of a large data corpus

• Consider a 200-year corpus of temperature and humidity readings in one location.

• Value is not in the individual “documents”.

• Value is not in the corpus as a whole.

• Value is in the relation between the “documents”.

Documents related by publication date

Google Books Ngram Viewer

Properties of our Parliamentary Proceedings Dataset

Longitudinal data

• weekly measurements for over 150 years

• very stable measurement procedure and data model

Data about human behaviour

Often rather boring

But sometimes full of drama and excitement

Loads of measurement points

24000 days, 450000 topics, 75 million speeches

Digitally available

About this collection

• very sparse available metadata

• very rich “metadata” sits hidden inside the raw data

• Rich data model
  • Meeting (1 day)
  • Topic
  • Stage direction
  • Scene
  • Speech
  • Paragraph

Very rich metadata for each word

For every word spoken in parliament, the following facts are known at the time of the speech act and can often be extracted from the written proceedings:

1) when it was said,

2) who said it,

3) in what function,

4) speaking on behalf of which party,

5) in which context, and

6) who was actively present during the speech act.
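As a rough illustration of how word-level facts can be inherited from the document structure, the following sketch flattens a nested meeting into one record per word. The class and field names are assumptions made for this example, not the project's actual schema, and stage directions, scenes and the list of members present (fact 6) are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterator, List


@dataclass
class Speech:
    speaker: str           # who said it
    function: str          # in what function, e.g. "MP" or "minister"
    party: str             # speaking on behalf of which party
    paragraphs: List[str]


@dataclass
class Topic:
    title: str             # the context of the speeches
    speeches: List[Speech] = field(default_factory=list)


@dataclass
class Meeting:
    day: date              # one meeting corresponds to one day
    topics: List[Topic] = field(default_factory=list)


@dataclass(frozen=True)
class WordFact:
    word: str
    when: date
    who: str
    function: str
    party: str
    context: str


def word_facts(meeting: Meeting) -> Iterator[WordFact]:
    """Yield one record per spoken word, inheriting metadata from its ancestors."""
    for topic in meeting.topics:
        for speech in topic.speeches:
            for paragraph in speech.paragraphs:
                for word in paragraph.split():
                    yield WordFact(word.lower(), meeting.day, speech.speaker,
                                   speech.function, speech.party, topic.title)


# Invented sample data, for illustration only.
meeting = Meeting(date(2011, 3, 15), [
    Topic("Budget", [Speech("A. Example", "MP", "Party X", ["We object", "Strongly"])]),
])
facts = list(word_facts(meeting))
```

The point of the design is that nothing is stored per word: every fact is attached once, at the level where it holds, and is derived for each word on demand.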

How to exploit the extra metadata and structure

• Let’s consider a simple killer app

Political n-gram viewer

• For every word we know both the date and the speaker

• Every speaker belongs to a political party

• 3D n-gram viewer: political spectrum vs. time vs. word count

• Use: topic ownership, agenda setting, framing
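The counting behind such a viewer could look roughly like the sketch below, which computes, per party and year, the share of words equal to a query term. It handles single words only (n = 1), and the tuple layout, function name and sample speeches are assumptions for illustration.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

# (date, party, text) triples; the sample values below are invented.
Speech = Tuple[date, str, str]


def relative_frequency(speeches: Iterable[Speech], term: str) -> Dict[Tuple[str, int], float]:
    """Return, per (party, year), the share of that party's words equal to `term`."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for day, party, text in speeches:
        words = text.lower().split()
        key = (party, day.year)
        totals[key] += len(words)
        hits[key] += words.count(term.lower())
    return {key: hits[key] / totals[key] for key in totals if totals[key]}


speeches = [
    (date(2010, 5, 1), "Party X", "climate policy matters"),
    (date(2010, 6, 1), "Party Y", "taxes taxes and climate"),
    (date(2011, 2, 1), "Party X", "climate climate"),
]
freq = relative_frequency(speeches, "climate")
```

Plotting `freq` with parties ordered along the political spectrum on one axis and years on the other gives the three dimensions named on the slide.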

Political n-gram viewer: requirements

Documents:

1. Metadata: date of the meeting

2. Document structure: for every spoken word, who said it

Linked Data: speakers’ names are disambiguated, normalized, and mapped to a database with temporal party information.

Completeness and correctness: few missing or wrong data, also for long ago.

Is Linked (Open) Data the solution?

• Link speaker’s name to Wikipedia/DBpedia page (named entity disambiguation and resolution). See also Google Knowledge Graph and [Spitkovsky, Chang LREC 2012].

• DBpedia extracts the link between person and party affiliation from the Wikipedia infobox.

• Timestamped triple:

  Geert Wilders is party member of VVD from 1998-08-25 until 2004-09-02
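A timestamped fact of this kind is easy to query once it is stored with explicit start and end dates. The sketch below is one possible representation; the `Membership` class, the `party_on` helper and the choice to treat both dates as inclusive are assumptions, and the only data row is the fact stated above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Membership:
    person: str
    party: str
    start: date   # inclusive (assumption)
    end: date     # inclusive (assumption)


def party_on(memberships: List[Membership], person: str, day: date) -> Optional[str]:
    """Return the party `person` belonged to on `day`, or None if unknown."""
    for m in memberships:
        if m.person == person and m.start <= day <= m.end:
            return m.party
    return None


# The single fact from the slide; a real table would hold many such rows.
memberships = [
    Membership("Geert Wilders", "VVD", date(1998, 8, 25), date(2004, 9, 2)),
]
```

Looking up the date of a speech with `party_on(memberships, "Geert Wilders", date(2000, 1, 1))` yields the party on whose behalf the words were spoken; a date outside every known interval yields `None` rather than a guess.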

DBpedia not yet reliable

• Data extraction is difficult, even from the infobox, even from complete data.

Wikipedia page of Geert Wilders

DBpedia information about Geert Wilders

Notice the values of the party and the office attributes.

Timestamped facts are difficult to extract and difficult to represent in RDF triples.

Lesson learned: requirement on metadata and relations

• One cannot rely on Linked Open Data for good-quality metadata.

• Official documents should be self-describing, also for facts which are obvious at publication time.

• Compare speaker’s data in the original (OCRed) data and in the XMLified and enriched version:

  • Original

  • Part of it in XML

  • And now for human consumption

A few more applications

Entity Profiling and Entity Search

• Users search for entities, not for documents [TREC Entity Track] [Balog et al. 2009]

• Main research questions:
  how to collect information on entities,
  how to model an entity,
  how to rank entities.

• (Parsimonious) language models work well as models [Balog et al. 2009] [Hiemstra et al. 2004]

• Entity profiling: http://www.politiekinzicht.com

• Entity search: http://ikkieswijzer.nl
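To give a feel for what a parsimonious language model does, here is a minimal EM-style sketch in the spirit of the models cited above: it shifts probability mass away from words that the background collection already explains. The function name, the mixing weight `lam`, the iteration count and the toy counts are all assumptions for illustration, not the settings used in the cited work.

```python
from collections import Counter
from typing import Dict


def parsimonious_lm(doc_tf: Counter, background: Dict[str, float],
                    lam: float = 0.1, iterations: int = 50) -> Dict[str, float]:
    """EM estimate of P(t|D) that explains only what the background model cannot."""
    total = sum(doc_tf.values())
    p_doc = {t: tf / total for t, tf in doc_tf.items()}  # start from maximum likelihood
    for _ in range(iterations):
        # E-step: expected number of occurrences of t generated by the document model.
        expected = {}
        for t, tf in doc_tf.items():
            doc_part = lam * p_doc[t]
            bg_part = (1 - lam) * background.get(t, 1e-9)
            expected[t] = tf * doc_part / (doc_part + bg_part)
        # M-step: renormalise the expected counts.
        norm = sum(expected.values())
        p_doc = {t: v / norm for t, v in expected.items()}
    return p_doc


# Invented counts: all speech of one hypothetical entity vs. collection-wide probabilities.
entity_tf = Counter({"the": 50, "of": 30, "pension": 10, "euro": 10})
background = {"the": 0.5, "of": 0.3, "pension": 0.001, "euro": 0.002}
profile = parsimonious_lm(entity_tf, background)
```

In this toy example the common words lose almost all their mass, so the resulting profile is dominated by the terms that distinguish the entity from the collection as a whole.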

Content and structure search

• The usual advanced search combines keyword search with metadata search.

• Extra fields are just extra filters on the returned documents.

• With structured documents we can search on content and structure.

• Most useful task: rank the best entry points in large documents.

• Compare two search systems on the same data:
  on flat text,
  on an XML representation.
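The difference between the two settings can be sketched with a toy document. The markup below is hypothetical (element and attribute names are invented for this example), and the structured search only filters and returns entry points; it does not rank them.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; element and attribute names are illustrative only.
XML = """
<meeting date="2011-03-15">
  <topic title="Budget">
    <speech speaker="A" party="X"><p>We need lower taxes.</p></speech>
    <speech speaker="B" party="Y"><p>Taxes fund schools.</p></speech>
  </topic>
  <topic title="Education">
    <speech speaker="A" party="X"><p>Schools need teachers.</p></speech>
  </topic>
</meeting>
"""

root = ET.fromstring(XML)


def flat_search(keyword: str) -> bool:
    """Flat text: we only learn whether the whole document mentions the keyword."""
    text = " ".join(root.itertext()).lower()
    return keyword.lower() in text


def structured_search(keyword: str, party: str):
    """XML: return (topic title, speaker) entry points for matching speeches of one party."""
    results = []
    for topic in root.iter("topic"):
        for speech in topic.iter("speech"):
            text = " ".join(speech.itertext()).lower()
            if speech.get("party") == party and keyword.lower() in text:
                results.append((topic.get("title"), speech.get("speaker")))
    return results
```

On flat text a query can at best return the whole meeting; on the XML representation the same query can point at the speech, topic and speaker where the match occurs.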

Lesson learned: requirement on structure

• Make semantically important structure of documents explicit in XML markup.

• Publish for machine readability.

• Publish generic data, not data prepared for one use case.

Application of structure: Interruption graph (Attackogram)

• MP A interrupts B ⇔ A speaks during the block of B

combined with entity profiling

http://debat.politiekinzicht.com
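Given the definition above, the edges of such a graph can be counted directly from the block structure. The sketch below assumes each block is given as its owner plus the ordered list of speakers inside it; this input layout, the exclusion of the chair and the invented names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each block belongs to one MP (its owner) and lists, in order, who spoke in it.
Block = Tuple[str, List[str]]


def interruption_graph(blocks: List[Block], ignore=("Chair",)) -> Dict[Tuple[str, str], int]:
    """Count edges (A, B): A spoke during a block owned by B."""
    edges: Counter = Counter()
    for owner, speakers in blocks:
        previous = None
        for speaker in speakers:
            # Count a new edge each time another MP takes over the floor.
            if speaker != owner and speaker not in ignore and speaker != previous:
                edges[(speaker, owner)] += 1
            previous = speaker
        # Consecutive turns by the same interrupter count as one interruption.
    return dict(edges)


# Invented sample data.
blocks = [
    ("B", ["B", "A", "B", "A", "A", "Chair", "B"]),
    ("C", ["C", "B", "C"]),
]
graph = interruption_graph(blocks)
```

The resulting weighted, directed edges can then be labelled with party or profile information for each MP.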

Exploring and exploiting official documents

• We saw what can be done with one well-curated collection.

• What are the key infrastructural and research questions?

In what direction, and how, to scale this up:

1. in time

2. in breadth

3. in links

Scale diachronically

• Stable data model and measurement procedure make this data very valuable for diachronic comparisons.

• Towards the past:
  • OCR
  • consistency in structure
  • more missing data to link to

• Towards the future:
  • remain up to date
  • legacy decisions

Scale in breadth: e.g. parliamentary proceedings of all European countries

• All describe the same “script”, so all fit in one schema.

• Main question: how to connect the data from different countries?

Common structure and annotation: use the same Relax NG schema.

Common values on certain attributes:

• Entities: normalize to Wikipedia concepts

• Controlled vocabulary keywords: normalize to Eurovoc

• Language: machine translate to English

• Events: normalize to EMM NewsExplorer query, Wikinews query

Scale in breadth: link to related datasets

• Link on time, entities, events, topics

• Other official publications

• News

• User-generated content

• (In our case) promises of political actors: election manifestos

Conclusions

• There are ample opportunities for exploiting Official Publications.

• Preprocessing and interlinking with other datasets is difficult and does not scale well.

• High precision and recall are needed for many applications.

• Many text analysis and data-mapping tasks [MUC, TAC].

• Every format needs its own transformer.

• Linked Open Data knowledge bases are not (yet) good enough: create special-purpose knowledge extractors.

• High investment, but if done in a general way, high return and impact.

Back to our research question

What is the best data format for publishing both legacy and current parliamentary proceedings in a digitally sustainable manner?

Lessons learned

• Common, open, standardized, self-describing, machine readable

• not tied to a single application

• linked, linked, linked

  • Not only shared attributes

  • but, more importantly, shared data values

• also store utterly obvious facts (10 years later they aren’t)

How we can help (ourselves)

Help improve input data at the source

• Push at the source (in the UK: open government data; in Holland all parliamentary data is now in XML).

• Help reduce dumb cut-and-paste annotation work, so annotators can concentrate on tasks which are hard for machines (e.g. text classification).

• Emphasize the importance of using shared standards.

Future researchers will love you

Last Question

Official Publications: are they

or


PoliticalMashup 2

Content

bull Official Documents Zoom in on a specific official publications

dataset

bull Opportunities What makes official publications data valuable

bull Requirements What is needed to make official publications data

reusable and interoperable

PoliticalMashup 3




al 2010]

PoliticalMashup 4








PoliticalMashup 5



in one location




PoliticalMashup 6



PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 3




al 2010]

PoliticalMashup 4








PoliticalMashup 5



in one location




PoliticalMashup 6



PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 4








PoliticalMashup 5



in one location




PoliticalMashup 6



PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 5



in one location




PoliticalMashup 6



PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 6



PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 7


PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 8

Longitudinal data



PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 9


PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 10

Often rather boring

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 11


PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 12



PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 13

Digitally available

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 14






bull Topic


bull Scene


bull Speech

bull Paragraph

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 15




written proceedings

1) when it was said

2) who said it

3) in what function




PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 16



PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 17






PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 18


documents






long time ago

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question

PoliticalMashup 19






Wikipedia infobox



from 1998-08-25 until 2004-09-02

PoliticalMashup 20



complete data






PoliticalMashup 21







bull Original



PoliticalMashup 22


PoliticalMashup 23



[Balog et al 2009]









PoliticalMashup 24



search



structure



on flat text


PoliticalMashup 25



XML markup



PoliticalMashup 26





PoliticalMashup 27





1 in time

2 in breadth

3 in links

PoliticalMashup 28





bull OCR






PoliticalMashup 29





schema





query

PoliticalMashup 30




bull News



PoliticalMashup 31

Conclusions



does not scale well







impact

PoliticalMashup 32




Lessons learned







PoliticalMashup 33










PoliticalMashup 34

Last Question


or


2 Content






8 Longitudinal data























31 Conclusions



34 Last Question
