Gildas Illien & Sébastien Peyrard, EuropeanaTech 2015

Post on 14-Jul-2015

DATA QUALITY IN THE AGE OF LINKED DATA

Trust me if you dare

http://gallica/ark:/12148/btv1b90519196

GILDAS ILLIEN, BNF

SÉBASTIEN PEYRARD, BNF

data.bnf.fr : one vision, three goals

Be reusable

Be visible

Be legible

https://www.flickr.com/photos/ramonbaile/2274662139

http://commons.wikimedia.org/wiki/File:Carte_M%C3%A9tro_de_Paris.jpg#mediaviewer/File:Carte_M%C3%A9tro_de_Paris.jpg

By humans

By machines

https://www.flickr.com/photos/bdesham/2432400623

DATA.BNF.FR: WHAT IT DOES

data.bnf.fr: a guided tour

DATA.BNF.FR: HOW IT WORKS

Any black Magic?

1xx (creator of)

0070 (author)

WORK

PUBLICATION

PERSON

INTERMARC

INTERMARC

INTERMARC

FRBNF11896956

FRBNF11967514

FRBNF37465618

Any black Magic?

dc:creator

WORK

PUBLICATION

PERSON

http://catalogue.bnf.fr/ark:/12148/cb118969563

http://catalogue.bnf.fr/ark:/12148/cb11967514v

http://catalogue.bnf.fr/ark:/12148/cb374656186

dc:creator

RDF triples

RDF triples

RDF triples rdarelationships:workManifested
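
The diagram above can be read as three RDF statements. The following is a minimal sketch in Python, not the actual data.bnf.fr export. It assumes that `dc:` expands to the Dublin Core Terms namespace and `rdarelationships:` to `http://rdvocab.info/RDARelationshipsWEMI/`. Which of the three ARKs is the work, the publication or the person is also an assumption made for illustration, since the slide does not spell it out.

```python
# Namespace expansions are assumptions; the slides only show prefixed names.
DC = "http://purl.org/dc/terms/"
RDAREL = "http://rdvocab.info/RDARelationshipsWEMI/"

# The role given to each ARK below is assumed for illustration only.
PERSON = "http://catalogue.bnf.fr/ark:/12148/cb118969563"
WORK = "http://catalogue.bnf.fr/ark:/12148/cb11967514v"
PUBLICATION = "http://catalogue.bnf.fr/ark:/12148/cb374656186"

triples = [
    (WORK, DC + "creator", PERSON),
    (PUBLICATION, DC + "creator", PERSON),
    (PUBLICATION, RDAREL + "workManifested", WORK),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Each printed line is one statement: subject, predicate and object are all URIs, which is what lets the catalogue's identifiers be reused outside the catalogue.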

STRUCTURED DATA

VINTAGE LINKED DATA

SINCE 1987

TRUSTED IDENTIFIERS

The true magic behind this is:

CATALOGERS AND CATALOGUE QA TEAM:

- INTELLIGENT DATA

- CONSISTENCY

- LINK CURATION

https://www.flickr.com/photos/bohman/4394901689

Work-manifestation links: machine calculated

dc:creator

WORK

PUBLICATION

PERSON

http://catalogue.bnf.fr/ark:/12148/cb118969563

http://catalogue.bnf.fr/ark:/12148/cb11967514v

http://catalogue.bnf.fr/ark:/12148/cb374656186

dc:creator

RDF triples

RDF triples

RDF triples rdarelationships:workManifested
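
The slides do not describe the matching algorithm itself. As a rough illustration of what "machine calculated" could mean, the sketch below links a publication to a work when both share a creator identifier and a normalized title. The matching key, the `normalize` and `link_candidates` helpers and the sample records are all hypothetical, not the BnF's method.

```python
import unicodedata


def normalize(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(kept.split())


def link_candidates(publication, works):
    """Return ids of works sharing creator and normalized title with the publication."""
    key = (publication["creator"], normalize(publication["title"]))
    return [w["id"] for w in works
            if (w["creator"], normalize(w["title"])) == key]


# Hypothetical sample records (not real BnF data).
works = [
    {"id": "work-1", "creator": "person-1", "title": "Les Misérables"},
    {"id": "work-2", "creator": "person-1", "title": "Notre-Dame de Paris"},
]
publication = {"id": "pub-1", "creator": "person-1", "title": "LES MISERABLES."}

print(link_candidates(publication, works))
```

A match of this kind only works because the creator link is already a trusted identifier curated by cataloguers; the title comparison alone would be far too ambiguous.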

DATA.BNF.FR : RETURN ON INVESTMENTS?

Why not give the data back to the source catalogue?

MARC catalogue

data.bnf.fr

structured data

enriched data

How to make it happen?

Start easy

evolution vs. revolution

one challenge at a time

Speak the language of the catalogue

channel the skills and tools of the QA experts

leverage the organization around the catalogue

What happened… and what will happen

Tests: improve the algorithm

discussion with experts

injection whenever the algorithm reaches a single decision

tolerance of a certain level of error

Use the algorithm to suggest

suggest multiple candidates to an expert when undecidable
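
The two modes above (automatic injection when there is one decision, suggestion to an expert when undecidable) can be sketched as a small routing function. This is an illustrative reading of the slide, not the BnF's implementation; the function name `route` and the action labels are hypothetical.

```python
def route(publication_id, candidates):
    """Decide what to do with the candidate works found for one publication."""
    if len(candidates) == 1:
        # One decision: the link can be injected into the catalogue.
        return ("inject", publication_id, candidates[0])
    if len(candidates) > 1:
        # Undecidable: hand every candidate to a cataloguing expert.
        return ("suggest", publication_id, list(candidates))
    # No candidate at all: leave the record untouched.
    return ("skip", publication_id, None)


print(route("pub-1", ["work-1"]))
print(route("pub-2", ["work-1", "work-2"]))
print(route("pub-3", []))
```

Accepting the "inject" branch without review is where the tolerance of a certain level of error comes in: the algorithm is trusted when it is unambiguous, and humans are kept for the ambiguous cases.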

Next steps: new algorithms

Aggregates (one manifestation, many works)

Create new works out of clusters

Deduplication of sparse records
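
As a hedged illustration of "create new works out of clusters", the sketch below groups publications that have no work link yet and proposes a new work for every group with at least two members. The clustering key (creator plus lowercased title), the `min_size` threshold and the sample records are assumptions for illustration; the slides do not say how clusters would be built.

```python
from collections import defaultdict


def propose_works(unlinked_publications, min_size=2):
    """Group unlinked publications and propose one new work per large enough cluster."""
    clusters = defaultdict(list)
    for pub in unlinked_publications:
        key = (pub["creator"], pub["title"].strip().lower())
        clusters[key].append(pub["id"])
    return [
        {"creator": creator, "title": title, "manifestations": ids}
        for (creator, title), ids in clusters.items()
        if len(ids) >= min_size
    ]


# Hypothetical sample records (not real BnF data).
unlinked = [
    {"id": "pub-1", "creator": "person-1", "title": "Germinal"},
    {"id": "pub-2", "creator": "person-1", "title": " germinal"},
    {"id": "pub-3", "creator": "person-2", "title": "Candide"},
]

print(propose_works(unlinked))
```

Here the two editions of the first title form a cluster and yield one proposed work, while the single remaining publication is left alone.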

MITTLER ZWISCHEN HIRN UND HÄNDEN MUSS DAS HERZ SEIN (The mediator between head and hands must be the heart)

(temporary) conclusion

to be continued…