edf2012: the web of data and its five stars
The Web of Data and its Five Stars
Richard Cyganiak, DERI, NUI Galway, @cygri
6 June 2012
Realising and Exploiting the EU data cloud
European Data Forum, Copenhagen, Denmark
Generating insight from data
• Today, data is abundant
• New middlemen find new ways of getting data to the end user
• Supply and demand for data higher than ever
• Analyst's problem is no longer a lack of relevant data, but:
• Understanding data
• Assessing applicability
• Getting it into the right form for use
• Similar problems inside and outside of the firewall
From the Web to the Web of Data
Tim Berners-Lee’s 5-star plan for an open web of data
★ Make data available on the Web under an open license
★★ Make it available as structured data
★★★ Use a non-proprietary format
★★★★ Use URIs to identify things
★★★★★ Link your data to other people’s data to provide context
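The five criteria can be read as a checklist. The sketch below is only an illustration: the `Dataset` fields and the `star_rating` function are hypothetical names, and it assumes each star presupposes all the previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    on_web_open_license: bool      # 1 star
    structured: bool               # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_uris: bool                # 4 stars
    links_to_other_data: bool      # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars
```

Under this reading, a dataset that uses URIs but is published in a proprietary format still stops at two stars.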
The 0th star
• Data catalog with good metadata
• Make your data findable
Data on the Web, Open License
★
Open Data
Government data catalogs
Open vs. Closed
Data used to be closed by default.
In the future, it will be open by default.
Is open data just for governments?
Good reasons against opening data
• Privacy
• Competitive advantage
• Producing data and charging for it as business model
• Can't get license from upstream
Business models
Scott Brinker, http://www.chiefmartec.com/2010/01/7-business-models-for-linked-data.html
Data licenses
http://opendefinition.org/licenses/
Structured Data
★★
Enabling re-use
• Delivering data to end users in different forms
• Combining data with other data
• 3rd party analysis of data
Formats in government data
• Good for re-use: MS Excel, CSV, XML, JSON, Microdata
• Not so good for re-use: Pure websites, MS Word
• Bad for re-use: PDF
• Really bad for re-use: Only charts/maps without numbers
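A minimal sketch of why structured formats are good for re-use: with CSV, a few lines of standard-library Python turn the data into records that can be delivered in another form, here JSON. The sample rows and the `csv_to_records` helper are invented for illustration.

```python
import csv
import io
import json

# Invented sample data, standing in for a published CSV file.
CSV_TEXT = """region,year,population
North,2011,1200
South,2011,950
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


records = csv_to_records(CSV_TEXT)
as_json = json.dumps(records, indent=2)  # same data, re-delivered as JSON
```

Note that `csv` yields every value as a string; converting numbers is left to the consumer. Nothing comparable works on a PDF or a chart image without screenscraping.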
Symptom: Screenscraping
Non-Proprietary Formats
★★★
Specialist formats
• Specialist tools often have specialist formats
• Few people have the tools
• Expensive
• Difficult to re-use
• (Geospatial tools, statistics packages, etc.)
Non-proprietary formats, open standards
• CSV (dead simple)
• XML
• JSON
• RDF (good for 4+5 stars)
• OGC web services
• OAI-ORE web services
Use URIs as Identifiers
★★★★
http://www.bbc.co.uk/music/artists/79239441-bfd5-4981-a70c-55c3f15c1287
http://data.ordnancesurvey.co.uk/id/postcodeunit/HA99HD
http://opencorporates.com/companies/us_vt/F013910
Turning local identifiers into URIs: Why?
• Make them globally unique
• Clarify authority
• Make them resolvable
• Make them linkable
http://data.ordnancesurvey.co.uk/id/7000000000017765
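Turning a local identifier into a URI can be as simple as prefixing it with a namespace the publisher controls. The sketch below is hedged: the `data.example.org` base, the `mint_uri` helper and the whitespace-stripping rule are assumptions for illustration, not any publisher's actual scheme.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use a domain it controls,
# which is what clarifies authority over the identifiers.
BASE = "http://data.example.org/id/"


def mint_uri(kind, local_id):
    """Build a globally unique HTTP URI from a local identifier.

    Whitespace is removed and the remaining characters are percent-encoded
    so that the result is a single valid path segment.
    """
    compact = "".join(local_id.split())
    return f"{BASE}{quote(kind, safe='')}/{quote(compact, safe='')}"


uri = mint_uri("postcodeunit", "HA9 9HD")
```

Because the result is an HTTP URI, it can later be made resolvable and used as a link target by other datasets.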
The schema level
By using URIs, connections that existed only in people's minds can be put explicitly into the data model.
Include Links to Other Data
★★★★★
Hyperlinks are the soul of the Web.
The Web of Data is no different.
Central Contractor Registration (CCR)
Geonames
Data links
Linked Data Principles
1. Use URIs to name things (not only documents, but also people, locations, concepts, etc.)
2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
3. When someone looks up a URI, provide useful information (structured data in RDF, SPARQL).
4. Include links to other URIs allowing agents to discover more things
http://www.w3.org/DesignIssues/LinkedData.html
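A small sketch of principles 2 to 4 from the client and publisher side. It builds (but does not send) an HTTP request that asks for RDF, and serializes one outgoing link as an N-Triples line. The `data.example.org` URI, both helper names and the choice of `text/turtle` and `rdfs:seeAlso` are assumptions for illustration; whether a given server honours that Accept header depends on the server.

```python
from urllib.request import Request

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def rdf_request(uri, accept="text/turtle"):
    """Build an HTTP GET for a URI, asking for RDF rather than HTML (not sent here)."""
    return Request(uri, headers={"Accept": accept})


def link_triple(subject, predicate, obj):
    """Serialize one URI-to-URI link as a line of N-Triples."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical local resource linking out to a URI in someone else's dataset.
local = "http://data.example.org/id/company/123"
req = rdf_request(local)
line = link_triple(local, RDFS_SEE_ALSO,
                   "http://opencorporates.com/companies/us_vt/F013910")
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; the triple is what lets an agent discover more things from the data it gets back.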
Summary
• In the future, data will be open by default, unless there is good reason not to
• Emergence of a web of data
• “Five-star plan” for getting there, dataset by dataset
• 2 stars: re-usable data!
• 3 stars: open standards!
• 4+5 stars: connect the silos!