Linked Data for Czech Legislation
DESCRIPTION
The slides show what Linked Data is and how we experiment with Linked Data in the area of legislative documents (in the Czech Republic). Download the slides for detailed embedded comments.
TRANSCRIPT
Linked Data for Czech Legislation
Martin Nečaský
Faculty of Mathematics and Physics, Charles University (Matematicko-fyzikální fakulta Univerzity Karlovy)
http://www.xrg.cz
http://www.opendata.cz
Our Projects in a Nutshell
• The goal of our effort is to enable intelligent browsing and querying of a set of semi-structured documents from a given domain:
  • legislative documents
  • project documentation
  • medical documentation
• Basic prerequisite: the documents share some common characteristics.
• The project consists of the following steps:
  • extract useful structured data from the semi-structured documents with NLP techniques
  • transform the extracted data to Linked Data so that the data can be easily (= quickly and cheaply) interconnected with other related data and with the original documents
  • provide tools for browsing and querying the resulting space of data and documents
Roles
• ufal.mff.cuni.cz
• ksi.mff.cuni.cz
Outline
• What is Linked Data?
  • current Web
  • publishing data on the Web
  • Linked Data principles
• Linked Data for legislative documents
  • basic ideas
  • what we have done and what we want to do
  • sample data and queries
What is Linked Data?
Web of Documents
The current Web (of Documents) provides a lot of data about Prague.
Problems
• Data about Prague is encoded in documents distributed across the Web.
• Documents are intended for humans, not for computers.
• Documents about Prague or related things are not linked.
Computers are not able to process data about Prague published on the Web.
Sources shown on the slide:
• http://monitor.statnipokladna.cz (Prague budget)
• http://registry.czso.cz (Basic info about Prague)
• http://www.praha.eu (Prague public contracts)
• http://www.czso.cz (Demography of Prague)
• http://www.risy.cz (EU funded projects in Prague)
Web of Documents
Try to search for this information on the current Web:
• Top 100 suppliers of Prague with headquarters outside of the Prague region.
• Money spent in Prague on new public playgrounds in the last 5 years, per child.
• Public playgrounds in Prague funded by the EU.
Architecture of Web of Documents
Unified global space of documents
Built on top of several simple principles:
1. HTML as a format for publishing documents
2. URLs as unique global identifiers of documents
3. HTTP for locating and accessing documents by their URLs
4. hyperlinks between documents
There are two kinds of applications working in this space of documents:
• web browsers (locating and browsing documents through hyperlinks)
• search engines (indexing and full-text searching of documents)
[Figure: databases A, B, C and D each publish HTML documents; a web browser and a search engine access the documents over HTTP.]
What about publishing data?
• The next step should be publishing data instead of documents.
  • Raw (open) data about things published on the Web which can be processed by machines (applications, domain-specific search engines)
• See public administration efforts in the area of publishing open data:
  • http://data.gov.uk
  • http://1.usa.gov/193lKN6
• We can publish data on the current Web!
  • basic way: data files with their own URLs in different formats (CSV, XLS, DBF, XML, etc.)
  • advanced way: Application Programming Interfaces (APIs)
Web can publish data! APIs
Different APIs provide machine-readable data for further processing in so-called mash-up applications.
Also built on several simple principles:
• XML/JSON as formats for publishing data
• HTTP URIs as globally unique identifiers of APIs and their operations
• HTTP protocol for transferring data between APIs and applications
[Figure: databases A, B, C and D are each exposed through a proprietary data API (A to D); mash-up applications call these APIs over HTTP.]
Current principles and technologies do not lead to a Web of Data!
• Publishing data about things is not based on the principles which have already been invented for documents.
Problems with data on current Web
Web of Documents | Current Web IS NOT Web of Data!
HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...)
URLs as unique global identifiers of documents | no unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing APIs (REST) [but not for locating things and accessing their data]
hyperlinks between documents | none of the current formats makes it possible to link related things
Linked Data
Data published on the Web according to 4 simple principles (introduced by Sir Tim Berners-Lee):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
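Principles 2 and 3 can be illustrated with a small Python sketch that prepares an HTTP lookup of a thing's URI and asks for an RDF serialization via the Accept header. The URI is the illustrative contract URI used later in these slides; the function name and the choice of Turtle (text/turtle) are assumptions made for this example, and the request is only built, not sent.

```python
from urllib.request import Request


def build_lookup_request(thing_uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request asking for RDF about a thing."""
    # The Accept header tells the server which representation the client prefers.
    return Request(thing_uri, headers={"Accept": media_type}, method="GET")


# Illustrative URI taken from the slides; it is not guaranteed to resolve.
request = build_lookup_request("http://www.praha.eu/contract/007302")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the built request to urllib.request.urlopen would perform the actual lookup, provided the publisher really serves RDF at that address.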
Linked Data vs. Documents
Web of Documents | Linked Data = Web of Data!
HTML as a format for publishing documents | RDF as a format for publishing data about things
URLs as unique global identifiers of documents | HTTP URIs (URLs) as unique global identifiers of things
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing things by their HTTP URIs
hyperlinks between documents | links between related entities
Things as first-class citizens
• City of Prague
• Prague council
• Prague budget
• Prague demography
• Public contract OSM/MZ/044/09
• Public contract MAN/23/07/007316/2010
• Public contract DIL/23/07/007302/2010
• EU funded project CZ.2.16/2.1.00/22189
HTTP URIs for Things
czso.cz (Czech Statistical Office)
• http://registry.czso.cz/prague
• http://www.czso.cz/prague
• http://www.czso.cz/prague/stats/demog
mfcr.cz (Ministry of Finance of CZ)
• http://www.mfcr.cz/prague
• http://www.mfcr.cz/prague/budget
praha.eu (Prague)
• http://www.praha.eu/city
• http://www.praha.eu/council
• http://www.praha.eu/contract/007316
• http://www.praha.eu/contract/006870
• http://www.praha.eu/contract/007302
risy.cz (Regional Information Service in CZ)
• http://www.risy.cz/location/prague
• http://www.risy.cz/project/412457
• http://www.risy.cz/contract/007302
Data about Things in RDF
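Client: HTTP REQUEST for http://www.praha.eu/contract/007302
Response as a document for humans:
• Playground Revitalization
• Authority: Prague
• Delivery date: 31.8.2011
• Price: 28 444 000 CZK
• ...
The same data as an RDF graph (subject, predicate, object):
• http://www.praha.eu/contract/007302, dcterms:title, "Playground Revitalization"
• http://www.praha.eu/contract/007302, pc:contractingAuthority, http://www.praha.eu/council
• http://www.praha.eu/contract/007302, pc:estimatedEndDate, "31.8.2011"
• http://www.praha.eu/contract/007302, pc:agreedPrice, http://www.praha.eu/contract/007302/price
• http://www.praha.eu/contract/007302/price, gr:hasCurrency, "CZK"
• http://www.praha.eu/contract/007302/price, gr:hasCurrencyValue, "28444000"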
Data about Things in RDF
Client: HTTP REQUEST for http://www.praha.eu/contract/007302
Response as a document for humans:
• Playground Revitalization
• Supplier: PKS INPOS
• Delivery date: 31.8.2011
• Price: 28 444 000 CZK
• ...
Response as RDF:
<http://www.praha.eu/contract/007302>
    rdf:type pc:Contract ;
    dcterms:title "Playground Revitalization" ;
    pc:estimatedEndDate "31.8.2011" ;
    pc:agreedPrice <http://www.praha.eu/contract/007302/price> ;
    pc:contractingAuthority <http://www.praha.eu/council> .
<http://www.praha.eu/contract/007302/price>
    rdf:type gr:PriceSpecification ;
    gr:hasCurrency "CZK" ;
    gr:hasCurrencyValue "28444000" .
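To show how an application might consume such data, the sketch below holds the triples from this slide as plain Python tuples and follows the pc:agreedPrice link from the contract to its price node. It uses no RDF library; the helper names objects and agreed_price are invented for this illustration, and prefixed names are kept as plain strings.

```python
CONTRACT = "http://www.praha.eu/contract/007302"
PRICE = CONTRACT + "/price"

# The triples from the slide as (subject, predicate, object) tuples.
triples = [
    (CONTRACT, "rdf:type", "pc:Contract"),
    (CONTRACT, "dcterms:title", "Playground Revitalization"),
    (CONTRACT, "pc:estimatedEndDate", "31.8.2011"),
    (CONTRACT, "pc:agreedPrice", PRICE),
    (CONTRACT, "pc:contractingAuthority", "http://www.praha.eu/council"),
    (PRICE, "rdf:type", "gr:PriceSpecification"),
    (PRICE, "gr:hasCurrency", "CZK"),
    (PRICE, "gr:hasCurrencyValue", "28444000"),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def agreed_price(contract):
    """Follow pc:agreedPrice to the price node and read amount and currency."""
    price_node = objects(contract, "pc:agreedPrice")[0]
    amount = int(objects(price_node, "gr:hasCurrencyValue")[0])
    currency = objects(price_node, "gr:hasCurrency")[0]
    return amount, currency


print(objects(CONTRACT, "dcterms:title"))
print(agreed_price(CONTRACT))
```

The price is a thing with its own URI, so reading it takes two hops through the graph rather than one attribute lookup.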
Vocabularies
• Published RDF data would be hard to interpret if each publisher used proprietary types:
  • types of properties (= predicates) and types of things (= classes)
• Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones.
  • e.g. Dublin Core, GoodRelations, FOAF, schema.org, ...
• Predicates are defined in so-called vocabularies (or ontologies).
  • Note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of the scope of this lecture.
Vocabularies
• classes and predicates
• semantic relationships between classes and predicates within one vocabulary or across different vocabularies
  • subtyping (sub-class of, sub-property of)
  • semantic equivalence (equivalent class, equivalent property): when two different vocabularies define classes/properties with the same semantics
• vocabularies are themselves expressed in RDF, using the RDF Schema and OWL vocabularies
• each class and predicate has its own HTTP URI
  • the mechanism of XML namespaces and prefixes is usually used
• a class URI is used to denote the type of a thing:
  <http://www.praha.eu/contract/007302> rdf:type pc:Contract .
• a predicate URI is used to denote the predicate in a triple:
  <http://www.praha.eu/contract/007302> dcterms:title "..." .
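The prefix mechanism can be sketched as a simple lookup that expands a prefixed name into the full HTTP URI of the class or predicate. The dcterms namespace appears in the queries later in these slides, and rdf and gr are the well-known RDF and GoodRelations namespaces. The slides do not spell out the pc namespace, so the value below is a placeholder assumption, and the function name expand is invented for this example.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "gr": "http://purl.org/goodrelations/v1#",
    # Assumption: placeholder only, the slides never state the pc: namespace.
    "pc": "http://example.org/public-contracts#",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using the PREFIXES table."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


print(expand("dcterms:title"))
print(expand("rdf:type"))
```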
Linking URIs of Related Things
[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher (czso.cz, mfcr.cz, praha.eu, risy.cz), with links between related things labelled n1:budget, n2:demography, n3:beneficiary and n3:realizedBy.]
Linking URIs of Same Things
[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher (czso.cz, mfcr.cz, praha.eu, risy.cz); URIs from different publishers that identify the same thing are connected with owl:sameAs.]
Related vs. Same Things
Situation: Publisher A publishes some data about a thing T under URI U.
• You want to publish something new about T:
  • create your own URI V for T, publish the new data under V, and link V to U with owl:sameAs
• You want to say that your things are related to T, but you do not publish anything new about T:
  • do not create your own HTTP URI for T and do not copy data about T from A; only link your things to U
[Figure: two diagrams of your data next to publisher A's data. In the first, your URI V is linked to A's URI U; in the second, your things link directly to U.]
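The two cases can be sketched as two small Python helpers that produce the triples you would publish. Everything here is illustrative: the helper names are invented, the ex:note fact is hypothetical, and pairing the praha.eu and risy.cz contract URIs as U and V, as well as using n3:realizedBy for the related link, is an assumption based on the example URIs in these slides.

```python
OWL_SAME_AS = "owl:sameAs"


def publish_new_data(own_uri, publisher_uri, new_facts):
    """Case 1: describe T under your own URI V and link V to U."""
    triples = [(own_uri, predicate, value) for predicate, value in new_facts]
    triples.append((own_uri, OWL_SAME_AS, publisher_uri))
    return triples


def link_related(own_thing_uri, predicate, publisher_uri):
    """Case 2: publish nothing new about T; only link your thing to U."""
    return [(own_thing_uri, predicate, publisher_uri)]


U = "http://www.praha.eu/contract/007302"  # publisher A's URI for T
V = "http://www.risy.cz/contract/007302"   # your own URI for the same T

# Hypothetical new fact, used only to make the example concrete.
same_thing = publish_new_data(V, U, [("ex:note", "hypothetical new fact")])
related = link_related("http://www.risy.cz/project/412457", "n3:realizedBy", U)

for triple in same_thing + related:
    print(triple)
```

In neither case does U appear as the subject of a triple you publish: the data about T under U stays with publisher A.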
Primary Data vs. Secondary Data
[Figure: the URIs from the "HTTP URIs for Things" slide, grouped by publisher (czso.cz, mfcr.cz, praha.eu, risy.cz), distinguishing primary data from secondary data.]
Linked Data for (Czech) Legislation
Linked Data in Czech Legislation
Entities:
• Acts and Regulations
• Acts and Regulations Proposals
• Court Decisions
• Public Authorities
• Agendas of Public Authorities
• Rights and Obligations
• Life Situations
Relationships shown between the entities: define, determine, regulate, concern (twice), supports, execute, results from.
Structural Layer of Legislative Documents
• structural parts of acts and regulations
• references between court decisions and parts of acts and regulations
• court decisions (legal case retrospection)
• amendments
What we have done:
• vocabulary of legislative documents
• metadata and structure of acts, regulations and decrees represented as Linked Data
  • metadata about each version of each act, regulation and decree since 1945
  • structured content of versions of all acts, regulations and decrees valid in 2011, 2012
• extraction of references and retrospection (NLP)
Structural Layer of Legislative Documents
[Figure: the Public Contracts Act and its versions 07/2006, 06/2007, 07/2012 and 01/2015. Each version is linked to the act by "version of"; the act is linked to its "original", "actual" and "last" version.]
Similarly, we represent paragraphs, sections, etc. of each version of each law. However, obtaining consolidated documents is a problem for us.
[Figure: DECISION XYZ and DECISION ABC, each with a "refers" link.]
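One thing the version model makes possible is asking which version of an act applied at a given time. The Python sketch below uses the four version labels of the Public Contracts Act shown on the slide and assumes, purely for illustration, that each label is the month in which that version took effect; the function name version_in_force is invented for this example.

```python
from bisect import bisect_right

# (year, month) labels of the versions shown on the slide.
# Assumption: each label marks the month the version took effect.
VERSIONS = [(2006, 7), (2007, 6), (2012, 7), (2015, 1)]


def version_in_force(versions, year, month):
    """Return the latest version whose label is not after (year, month), or None."""
    ordered = sorted(versions)
    index = bisect_right(ordered, (year, month))
    return ordered[index - 1] if index else None


print(version_in_force(VERSIONS, 2010, 3))
print(version_in_force(VERSIONS, 2005, 1))
```

A court decision dated in a given month could then be matched to the version of the act it most plausibly refers to.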
Structural Layer of Legislative Documents
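[Figure: a chain of court cases and decisions, in the order shown on the slide]
• CASE 6 C 135/2007
• DECISION 6 C 135/2007-44 (made for the case)
• CASE 21 Co 472/2008 (based on appeal against a decision)
• DECISION 21 Co 472/2008-62 (made for the case; overturns a decision)
• DECISION 6 C 135/2007-141 (made for the case)
• CASE 21 Co 458/2011 (based on appeal against a decision)
• DECISION 21 Co 458/2011-173 (made for the case; affirms a decision)
• CASE 26 Cdo 2523/2012 (based on extraordinary appeal against a decision)
• DECISION 26 Cdo 2523/2012 (made for the case; affirms a decision)
Decisions refer to Acts and to Other Decisions.
Courts shown: Metropolitan Court in Prague, District Court Prague 9, Supreme Court, with links labelled "files" (three times) and "renders".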
Structural Layer of Legislative Documents
• browsing data: http://linked.opendata.cz/resource/legislation/cz/act/2006/137-2006
  • instance of lex:Act representing the Public Procurement Act
Structural Layer of Legislative Documents
querying data (SPARQL) http://linked.opendata.cz/sparql
Structural Layer of Legislative Documents
Which acts amended the Act on political parties of the Czech Republic?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?amendmentTitle ?amendmentValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             dcterms:title ?amendmentTitle ;
             dcterms:valid ?amendmentValidity .
}
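A query like this is normally sent to a SPARQL endpoint as the query parameter of an HTTP GET request. The Python sketch below builds such a request for the endpoint named in these slides and asks for JSON results; it only constructs the request and does not send it, since the availability of the endpoint is not verified here. The function name build_sparql_request is invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://linked.opendata.cz/sparql"

QUERY = """
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?amendmentTitle ?amendmentValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             dcterms:title ?amendmentTitle ;
             dcterms:valid ?amendmentValidity .
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url[:60] + "...")
```

Passing the request to urllib.request.urlopen would execute the query if the endpoint is reachable and supports the requested result format.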
Structural Layer of Legislative Documents
One of the well-known hidden amendments. It increased the state's payments to political parties from 500k to 900k per member of parliament.
Structural Layer of Legislative Documents
Which other acts were amended together with the Act on political parties of the Czech Republic?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?anotherActTitle ?anotherVersionValidity
WHERE {
  ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
  ?change lex:changedOriginal ?version .
  ?amendment lex:definesChange ?change ;
             lex:definesChange ?anotherChange .
  FILTER (?change != ?anotherChange)
  ?anotherChange lex:changeResult ?anotherVersion .
  ?anotherVersion frbr:realizationOf ?anotherAct ;
                  dcterms:valid ?anotherVersionValidity .
  ?anotherAct dcterms:title ?anotherActTitle .
}
Structural Layer of Legislative Documents
How many changes were made to Czech legislation per year?
PREFIX lex: <http://purl.org/lex#>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT (COUNT(?amendment) AS ?changeCnt) (year(?validity) AS ?year)
WHERE {
  ?amendment lex:definesChange ?change ;
             dcterms:valid ?validity .
}
GROUP BY year(?validity)
ORDER BY DESC(year(?validity))
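An endpoint that returns results in the SPARQL JSON results format wraps each row in a "bindings" entry keyed by variable name. The Python sketch below turns such a response for this query into a year-to-count dictionary. The sample response is invented to show the shape of the format: its numbers are placeholders, not real counts from the dataset, and the function name changes_per_year is an assumption of this example.

```python
import json

# Made-up sample in the SPARQL JSON results format; the values are placeholders.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["changeCnt", "year"]},
  "results": {"bindings": [
    {"changeCnt": {"type": "literal", "value": "3"},
     "year": {"type": "literal", "value": "2012"}},
    {"changeCnt": {"type": "literal", "value": "5"},
     "year": {"type": "literal", "value": "2011"}}
  ]}
}
"""


def changes_per_year(response_text: str) -> dict:
    """Map each year to its change count from a SPARQL JSON results document."""
    document = json.loads(response_text)
    counts = {}
    for binding in document["results"]["bindings"]:
        # Every bound variable is an object whose "value" is a string.
        counts[int(binding["year"]["value"])] = int(binding["changeCnt"]["value"])
    return counts


print(changes_per_year(SAMPLE_RESPONSE))
```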
Semantic Layer of Legislative Documents
• rights, obligations and subjects defined by legislation
• their occurrences in court decisions
• we are currently starting experiments with extracting these concepts, and the relationships between them, from the texts of acts
  • NLP based on syntactic parsing
• we do not have an RDF representation yet