Linked Data for Czech Legislation
Martin Nečaský, Ph.D.
Matematicko-fyzikální fakulta Univerzity Karlovy
http://www.xrg.cz
http://www.opendata.cz


Post on 22-Nov-2014


DESCRIPTION

The slides show what Linked Data is and how we experiment with Linked Data in the area of legislative documents (in the Czech Republic). Download the slides for detailed embedded comments.

TRANSCRIPT

Page 1: Linked Data for Czech Legislation

Linked Data for Czech Legislation

Martin Nečaský

Matematicko-fyzikální fakulta Univerzity Karlovy

http://www.xrg.cz

http://www.opendata.cz

Page 2: Linked Data for Czech Legislation

Our projects in a nutshell

The goal of our effort is to enable intelligent browsing and querying of a set of semi-structured documents from a given domain:
• legislative documents
• project documentation
• medical documentation
Basic prerequisite: the documents have some common characteristics.

The project consists of the following steps:
1. Extract useful structured data from semi-structured documents with NLP techniques.
2. Transform the extracted data to Linked Data so that the data can be easily (= quickly and cheaply) interconnected with other related data and with the original documents.
3. Provide tools for browsing and querying the resulting space of data and documents.

Page 3: Linked Data for Czech Legislation

Roles
ufal.mff.cuni.cz
ksi.mff.cuni.cz

Page 4: Linked Data for Czech Legislation

Outline

What is Linked Data?
• current Web
• publishing data on the Web
• Linked Data principles

Linked Data for legislative documents
• basic ideas
• what we have done and what we want to do
• sample data and queries

Page 5: Linked Data for Czech Legislation

What is Linked Data?

Page 6: Linked Data for Czech Legislation

Web of Documents

The current Web (of Documents) provides a lot of data about Prague.

Problems
• Data about Prague is encoded in documents distributed across the Web
• Documents are intended for humans, not for computers
• Documents about Prague or related things are not linked

Computers are not able to process data about Prague published on the Web.

Sources of data about Prague:
• http://monitor.statnipokladna.cz (Prague budget)
• http://registry.czso.cz (Basic info about Prague)
• http://www.praha.eu (Prague public contracts)
• http://www.czso.cz (Demography of Prague)
• http://www.risy.cz (EU funded projects in Prague)

Page 7: Linked Data for Czech Legislation

Web of Documents

Try to search for this information on the current Web:
• Top 100 suppliers of Prague with headquarters outside of the Prague region.
• Money spent in Prague on new public playgrounds in the last 5 years, per child.
• Public playgrounds in Prague funded by the EU.


Page 8: Linked Data for Czech Legislation

Architecture of Web of Documents

Unified global space of documents

Built on top of several simple principles:

1. HTML as a format for publishing documents

2. URLs as unique global identifiers of documents

3. HTTP for locating and accessing documents by their URLs

4. hyperlinks between documents

There are two kinds of applications working in this space of documents:
• web browsers (locating and browsing documents through hyperlinks)
• search engines (indexing and full-text searching of documents)

[Figure: Databases A, B, C and D each publish HTML documents; a web browser and a search engine access them over HTTP.]

Page 9: Linked Data for Czech Legislation

What about publishing data?

The next step should be publishing data instead of documents.
• Raw (open) data about things published on the Web which can be processed by machines (applications, domain-specific search engines)
• See public administration efforts in the area of publishing open data:
  • http://data.gov.uk
  • http://1.usa.gov/193lKN6

We can publish data on the current Web!
• basic way: data files with their own URLs in different formats (CSV, XLS, DBF, XML, etc.)
• advanced way: Application Programming Interfaces (APIs)

Page 10: Linked Data for Czech Legislation

Web can publish data! APIs

Different APIs provide machine-readable data for further processing in so-called mash-up applications.

Also built on several simple principles:
• XML/JSON as formats for publishing data
• HTTP URIs as globally unique identifiers of APIs and their operations
• HTTP protocol for transferring data between APIs and applications

[Figure: Databases A, B, C and D are each exposed through a proprietary data API (A, B, C, D); mash-up apps access these APIs over HTTP.]

Page 11: Linked Data for Czech Legislation

Problems with data on current Web

Current principles and technologies do not lead to a Web of Data! Publishing data about things is not based on the principles which have already been invented for documents.

Web of Documents | Current Web IS NOT Web of Data!
HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...)
URLs as unique global identifiers of documents | no unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing APIs (REST) [but not for locating things and accessing their data]
hyperlinks between documents | none of the current formats makes it possible to link related things

Page 12: Linked Data for Czech Legislation

Linked Data

Data published on the Web according to 4 simple principles (introduced by Sir Tim Berners-Lee):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
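Principles 2 and 3 can be illustrated from the client side. The following is only a minimal sketch: it builds, but does not send, an HTTP request that asks for an RDF representation of a thing. The contract URI is the illustrative one used in the examples of this deck; the helper name `build_lookup_request` and the choice of `text/turtle` as the preferred format are assumptions made for this example.

```python
from urllib.request import Request

# Illustrative URI of a thing (taken from the examples in this deck).
CONTRACT_URI = "http://www.praha.eu/contract/007302"


def build_lookup_request(thing_uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET asking for RDF data about a thing."""
    # The Accept header states which representation the client prefers.
    return Request(thing_uri, headers={"Accept": media_type}, method="GET")


request = build_lookup_request(CONTRACT_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; whether a given server answers with RDF depends on the publisher.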

Page 13: Linked Data for Czech Legislation

Linked Data vs. Documents

Web of Documents | Linked Data = Web of Data!
HTML as a format for publishing documents | RDF as a format for publishing data about things
URLs as unique global identifiers of documents | HTTP URIs (URLs) as unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing things by their HTTP URIs
hyperlinks between documents | links between related entities

Page 14: Linked Data for Czech Legislation

Things as first-class citizens

• Public contract OSM/MZ/044/09
• City of Prague
• Prague council
• Prague budget
• Prague demography
• EU funded project CZ.2.16/2.1.00/22189
• Public contract MAN/23/07/007316/2010
• Public contract DIL/23/07/007302/2010

Page 15: Linked Data for Czech Legislation

HTTP URIs for Things

czso.cz (Czech Statistical Office)
• http://registry.cszo.cz/prague
• http://www.czso.cz/prague
• http://www.czso.cz/prague/stats/demog

mfcr.cz (Ministry of Finance of CZ)
• http://www.mfcr.cz/prague
• http://www.mfcr.cz/prague/budget

praha.eu (Prague)
• http://www.praha.eu/city
• http://www.praha.eu/council
• http://www.praha.eu/contract/007316
• http://www.praha.eu/contract/006870
• http://www.praha.eu/contract/007302

risy.cz (Regional Information Service in CZ)
• http://www.risy.cz/location/prague
• http://www.risy.cz/project/412457
• http://www.risy.cz/contract/007302

Page 16: Linked Data for Czech Legislation

Data about Things in RDF

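A client sends an HTTP REQUEST for http://www.praha.eu/contract/007302.

Human-readable view of the contract:
Playground Revitalization
Authority: Prague
Delivery date: 31.8.2011
Price: 28 444 000 CZK
...

The same data as an RDF graph (subject, predicate, object):
http://www.praha.eu/contract/007302, dcterms:title, "Playground Revitalization"
http://www.praha.eu/contract/007302, pc:contractingAuthority, http://www.praha.eu/council
http://www.praha.eu/contract/007302, pc:estimatedEndDate, "31.8.2011"
http://www.praha.eu/contract/007302, pc:agreedPrice, http://www.praha.eu/contract/007302/price
http://www.praha.eu/contract/007302/price, gr:hasCurrency, "CZK"
http://www.praha.eu/contract/007302/price, gr:hasCurrencyValue, "28444000"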

Page 17: Linked Data for Czech Legislation

Data about Things in RDF

A client sends an HTTP REQUEST for http://www.praha.eu/contract/007302.

Human-readable view of the contract:
Playground Revitalization
Supplier: PKS INPOS
Delivery date: 31.8.2011
Price: 28 444 000 CZK
...

The same data in RDF:

<http://www.praha.eu/contract/007302>
    rdf:type pc:Contract ;
    dcterms:title "Playground Revitalization" ;
    pc:estimatedEndDate "31.8.2011" ;
    pc:agreedPrice <http://www.praha.eu/contract/007302/price> ;
    pc:contractingAuthority <http://www.praha.eu/council> .

<http://www.praha.eu/contract/007302/price>
    rdf:type gr:PriceSpecification ;
    gr:hasCurrency "CZK" ;
    gr:hasCurrencyValue "28444000" .
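The RDF on this slide amounts to eight triples. As a rough illustration of how a program might work with them, the sketch below stores the triples as plain Python tuples and follows the link from the contract to its price node. No RDF library is used; the helper `objects` is a hypothetical name introduced only for this example, and prefixed names are kept as plain strings.

```python
# Each triple is (subject, predicate, object); prefixed names stay as strings.
CONTRACT = "http://www.praha.eu/contract/007302"
PRICE = "http://www.praha.eu/contract/007302/price"

triples = [
    (CONTRACT, "rdf:type", "pc:Contract"),
    (CONTRACT, "dcterms:title", "Playground Revitalization"),
    (CONTRACT, "pc:estimatedEndDate", "31.8.2011"),
    (CONTRACT, "pc:agreedPrice", PRICE),
    (CONTRACT, "pc:contractingAuthority", "http://www.praha.eu/council"),
    (PRICE, "rdf:type", "gr:PriceSpecification"),
    (PRICE, "gr:hasCurrency", "CZK"),
    (PRICE, "gr:hasCurrencyValue", "28444000"),
]


def objects(subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the contract to its price node, then read the price data.
price_node = objects(CONTRACT, "pc:agreedPrice")[0]
amount = int(objects(price_node, "gr:hasCurrencyValue")[0])
currency = objects(price_node, "gr:hasCurrency")[0]
print(amount, currency)
```

The point of the sketch is that the price is itself a thing with its own URI, so reading it takes one extra hop through the graph.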

Page 18: Linked Data for Czech Legislation

Vocabularies
• Published RDF data would be hard to interpret if each publisher used proprietary types
  • types of properties (= predicates) and types of things (= classes)
• Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones
  • e.g. Dublin Core, Good Relations, FOAF, schema.org, ...
• Predicates are defined in so-called vocabularies (or ontologies)
  • note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of the scope of this lecture

Page 19: Linked Data for Czech Legislation

Vocabularies
• classes and predicates
• semantic relationships between classes and predicates in one vocabulary or across different vocabularies
  • subtyping (sub-class of, sub-property of)
  • semantic equivalence (equivalent class, equivalent property): when two different vocabularies define classes/properties with the same semantics
• vocabularies are expressed in RDF using the RDF Schema and OWL vocabularies
• each class and predicate has its own HTTP URI
  • the mechanism of XML namespaces and prefixes is usually used
• a class URI is used to denote the type of a thing:

<http://www.praha.eu/contract/007302> rdf:type pc:Contract .

• a predicate URI is used to denote the predicate in a triple:

<http://www.praha.eu/contract/007302> dcterms:title "..." .
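Since every class and predicate has its own HTTP URI, a prefixed name such as `dcterms:title` is only shorthand for a full URI. The sketch below shows that expansion. The `dcterms` namespace is the one used in the queries of this deck; the `rdf`, `gr` and `pc` namespace URIs are assumed here to be the commonly published ones and are not stated on the slides. The function name `expand` is likewise just an illustration.

```python
# Prefix-to-namespace map. Only the dcterms namespace appears in this deck;
# the rdf, gr and pc namespace URIs below are assumptions for this example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "gr": "http://purl.org/goodrelations/v1#",
    "pc": "http://purl.org/procurement/public-contracts#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name like 'dcterms:title' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


print(expand("dcterms:title"))
print(expand("rdf:type"))
```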

Page 20: Linked Data for Czech Legislation

Linking URIs of Related Things

[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher (risy.cz, czso.cz, mfcr.cz, praha.eu), now connected by links between related things. Link predicates shown: n1:budget, n2:demography, n3:beneficiary, n3:realizedBy.]

Page 21: Linked Data for Czech Legislation

Linking URIs of Same Things

[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher (czso.cz, mfcr.cz, praha.eu, risy.cz). owl:sameAs links connect URIs from different publishers that denote the same thing.]

Page 22: Linked Data for Czech Legislation

Related vs. Same Things

Situation: Publisher A publishes some data about a thing T under URI U.

• You want to publish something new about T: create your own URI V for T, publish the new data under V and link V to U with owl:sameAs.
• You want to say that your things are related to T, but you do not publish anything new about T: do not create your own HTTP URI for T and do not copy data about T from A; only link your things to U.

[Figure: two diagrams of "You" and publisher "A". In the first, your URI V is linked to A's URI U; in the second, your things link directly to U.]
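A consumer of such data has to treat URIs connected by owl:sameAs as names of one thing. The sketch below shows one possible way to do that with a small union-find structure: it groups the URIs and then merges the facts published under each of them. Which URIs are linked, and which facts hang off which URI, are assumptions made for this example (the slides show these links only graphically); the predicates n1:budget and n2:demography are borrowed from the deck.

```python
from collections import defaultdict

# owl:sameAs statements between URIs (the concrete pairs are assumed here).
same_as = [
    ("http://www.praha.eu/city", "http://www.czso.cz/prague"),
    ("http://www.czso.cz/prague", "http://www.mfcr.cz/prague"),
]

# Facts published by different publishers under their own URIs (assumed).
facts = [
    ("http://www.mfcr.cz/prague", "n1:budget", "http://www.mfcr.cz/prague/budget"),
    ("http://www.czso.cz/prague", "n2:demography", "http://www.czso.cz/prague/stats/demog"),
]

parent = {}


def find(uri):
    """Return the representative URI of the sameAs group that uri belongs to."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri


def union(a, b):
    """Record that a and b denote the same thing."""
    parent[find(a)] = find(b)


for a, b in same_as:
    union(a, b)

# Merge facts from all URIs of a group under the group's representative.
merged = defaultdict(set)
for subject, predicate, obj in facts:
    merged[find(subject)].add((predicate, obj))

prague = find("http://www.praha.eu/city")
print(sorted(merged[prague]))
```

After the merge, the budget published by one office and the demography published by another are both reachable from any of the three URIs for Prague.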

Page 23: Linked Data for Czech Legislation

Primary Data vs. Secondary Data

[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher: czso.cz (Czech Statistical Office), mfcr.cz (Ministry of Finance of CZ), praha.eu (Prague), risy.cz (Regional Information Service in CZ).]

Page 24: Linked Data for Czech Legislation

Linked Data for (Czech) Legislation

Page 25: Linked Data for Czech Legislation

Linked Data in Czech Legislation

[Figure: entities of the legislative domain and the relationships between them.]
Entities: Acts and Regulations; Acts and Regulations Proposals; Court Decisions; Public authorities; Agendas of Public Authorities; Rights and obligations; Life situations
Relationship labels: define, determine, regulate, concern, supports, execute, results from

Page 26: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

• structural parts of acts and regulations
• references between court decisions and parts of acts and regulations
• court decisions (legal case retrospection)
• amendments

What we have done
• vocabulary of legislative documents
• metadata and structure of acts, regulations and decrees represented as Linked Data
  • metadata about each version of each act, regulation and decree since 1945
  • structured content of versions of all acts, regulations and decrees valid in 2011, 2012
• extraction of references and retrospection (NLP)

Page 27: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Public Contracts Act
• Public Contracts Act Version 07/2006 (original): version of Public Contracts Act
• Public Contracts Act Version 06/2007: version of Public Contracts Act
• Public Contracts Act Version 07/2012 (actual): version of Public Contracts Act
• Public Contracts Act Version 01/2015 (last): version of Public Contracts Act

Similarly, we represent paragraphs, sections, etc. of each version of each law. However, obtaining consolidated documents is a problem.

[Figure also shows two court decisions, DECISION XYZ and DECISION ABC, each with a "refers" link.]
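With every version of an act represented as a separate thing, a typical question is which version was in force on a given day. The sketch below is one way to answer it. It assumes that the labels on the slide (07/2006, 06/2007, 07/2012, 01/2015) are the month and year in which each version becomes valid, that validity starts on the first day of the month, and that each version stays in force until the next one starts. None of this is stated on the slide, and the function name `version_in_force` is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Versions from the slide; the day of month (1st) is an assumption.
versions = [
    (date(2006, 7, 1), "Public Contracts Act Version 07/2006"),
    (date(2007, 6, 1), "Public Contracts Act Version 06/2007"),
    (date(2012, 7, 1), "Public Contracts Act Version 07/2012"),
    (date(2015, 1, 1), "Public Contracts Act Version 01/2015"),
]


def version_in_force(on: date):
    """Return the version valid on the given day, or None if none is valid yet."""
    starts = [start for start, _ in versions]
    # Index of the last version whose validity starts on or before the day.
    index = bisect_right(starts, on) - 1
    return versions[index][1] if index >= 0 else None


print(version_in_force(date(2013, 3, 15)))
```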

Page 28: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

[Figure: a chain of court cases and decisions linked by the relations "made for", "based on appeal against", "based on extraordinary appeal against", "overturns", "affirms" and "refers".]
Cases: CASE 6 C 135/2007; CASE 21 Co 472/2008; CASE 21 Co 458/2011; CASE 26 Cdo 2523/2012
Decisions: DECISION 6 C 135/2007-44; DECISION 21 Co 472/2008-62; DECISION 6 C 135/2007-141; DECISION 21 Co 458/2011-173; DECISION 26 Cdo 2523/2012
Courts (linked by "files" and "renders"): Metropolitan Court in Prague; District Court Prague 9; Supreme Court
Also shown as targets of "refers": Acts; Other Decisions

Page 29: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

browsing data
http://linked.opendata.cz/resource/legislation/cz/act/2006/137-2006
• instance of lex:Act representing the Public Procurement Act

Page 30: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Page 31: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

querying data (SPARQL) http://linked.opendata.cz/sparql

Page 32: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Which acts amended the Czech Republic's Act on political parties?

PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?amendmentTitle ?amendmentValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .

  ?change lex:changedOriginal ?version .

  ?amendment lex:definesChange ?change ;
             dcterms:title ?amendmentTitle ;
             dcterms:valid ?amendmentValidity .
}
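A query like this can also be sent by a program instead of being typed into a web form. The sketch below only builds the request URL and performs no network access. It assumes that the endpoint named earlier in the deck (http://linked.opendata.cz/sparql) accepts the standard SPARQL protocol `query` parameter over HTTP GET; the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://linked.opendata.cz/sparql"

QUERY = """\
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?amendmentTitle ?amendmentValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             dcterms:title ?amendmentTitle ;
             dcterms:valid ?amendmentValidity .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    # urlencode percent-encodes the query text so it is safe inside a URL.
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url[:60] + "...")
```

Fetching `url` with an `Accept` header such as `application/sparql-results+json` would be the next step; whether the endpoint is still available is not something this sketch can confirm.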

Page 33: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

One of the well-known hidden amendments. It increased the state's payments to political parties from 500k to 900k per member of parliament.

Page 34: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Which other acts were amended together with the Czech Republic's Act on political parties?

PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?anotherActTitle ?anotherVersionValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .

  ?change lex:changedOriginal ?version .

  ?amendment lex:definesChange ?change ;
             lex:definesChange ?anotherChange .
  FILTER (?change != ?anotherChange)

  ?anotherChange lex:changeResult ?anotherVersion .

  ?anotherVersion frbr:realizationOf ?anotherAct ;
                  dcterms:valid ?anotherVersionValidity .

  ?anotherAct dcterms:title ?anotherActTitle .
}

Page 35: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Page 36: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

How many changes have been made in Czech legislation per year?

PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT (COUNT(?amendment) AS ?changeCnt) ?year
WHERE {
  ?amendment lex:definesChange ?change ;
             dcterms:valid ?validity .
}
GROUP BY (YEAR(?validity) AS ?year)
ORDER BY DESC(?year)
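A client that runs this query typically receives the answer in the SPARQL JSON results format. The sketch below shows how such a response could be turned into a year-to-count mapping. The response text is a hypothetical sample: its numbers are made up purely to illustrate the format and are not real results. The function name `changes_per_year` is also an assumption; the variable names `changeCnt` and `year` come from the query.

```python
import json

# Hypothetical response in the SPARQL JSON results format.
# The numbers are made up for illustration and are NOT real results.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["changeCnt", "year"]},
  "results": {"bindings": [
    {"changeCnt": {"type": "literal", "value": "12"},
     "year": {"type": "literal", "value": "2012"}},
    {"changeCnt": {"type": "literal", "value": "7"},
     "year": {"type": "literal", "value": "2011"}}
  ]}
}
"""


def changes_per_year(response_text: str) -> dict:
    """Map each year to its change count from a SPARQL JSON result."""
    data = json.loads(response_text)
    # Every binding is one result row; values arrive as strings.
    return {
        int(row["year"]["value"]): int(row["changeCnt"]["value"])
        for row in data["results"]["bindings"]
    }


per_year = changes_per_year(SAMPLE_RESPONSE)
print(per_year)
```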

Page 37: Linked Data for Czech Legislation

Structural Layer of Legislative Documents

Page 38: Linked Data for Czech Legislation

Semantic Layer of Legislative Documents

• rights, obligations and subjects defined by legislation
• their occurrences in court decisions
• currently we are starting experiments with extracting these concepts and the relationships between them from the texts of acts
  • NLP based on syntactic parsing
• we do not have an RDF representation yet

Page 39: Linked Data for Czech Legislation

Semantic Layer of Legislative Documents

Page 40: Linked Data for Czech Legislation

Semantic Layer of Legislative Documents