20111120 warsaw learning curve by b hyland notes

Climbing the Learning Curve with Linked Data. Open Government Data Camp, 20-Oct-2011. Bernadette Hyland, CEO, [email protected]. Twitter @BernHyland. Wednesday, October 19, 2011. Information overload, impatient society, change is the only constant. Software is not valued by its usefulness ... but by its expected future value.

Upload: bernadette-hyland

Posted on 05-Dec-2014


TRANSCRIPT

Page 1: 20111120 warsaw   learning curve by b hyland notes

Climbing the Learning Curve with Linked Data

Open Government Data Camp, 20-Oct-2011

Bernadette Hyland, CEO, [email protected]

Twitter @BernHyland

Wednesday, October 19, 2011

Information overload, impatient society, change is the only constant.

Software is not valued by its usefulness ... but by its expected future value.

Page 2

• Linked Data is about publishing and consuming data using international data standards

• Based on a 20-year-old idea

• The goal is to solve organizational issues related to data silos, the need for faster data integration, and reduced IT budgets


Why am I speaking on Linked Data and sharing? I’m here in my role as co-chair of the W3C Government Linked Data Working Group (GLD WG). I’m also a long-time entrepreneur in this space, having founded companies that led to several of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I’ve authored chapters in two of these peer-reviewed books, published by Springer and available in hardcopy or for free via the Web.

Page 3

There is a Process

Identify → Model → Name → Describe → Convert → Publish → Maintain


Identify the data, then model exemplar records: decide what you are going to carry forward and what you are going to leave behind. Name all of the NOUNs. Turn the records into URIs. Next, describe the RESOURCES with vocabularies. Write a script or process to convert from canonical form to RDF. Then publish, and maintain over time.
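One way to sketch the Name step is to mint one URI for each real-world noun, built from a stable key in the source records. The Python below is a minimal illustration and does not reproduce the author's tooling. The base URI `http://example.org/id/`, the record fields, and the `slugify`/`mint_uri` helpers are all hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI: pick a domain you can keep serving for the long term.
BASE = "http://example.org/id/"


def slugify(text: str) -> str:
    """Lower-case, drop accents, and turn runs of other characters into '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind: str, key: str) -> str:
    """Name one real-world noun by its type and a stable key from the source data."""
    return f"{BASE}{slugify(kind)}/{slugify(key)}"


# Hypothetical exemplar records.
records = [
    {"kind": "Substance", "key": "Benzene"},
    {"kind": "Facility", "key": "Plant #12 (Kraków)"},
]
for record in records:
    print(mint_uri(record["kind"], record["key"]))
```

ASCII folding of this kind loses information. For example, `ł` has no Unicode decomposition, so it is dropped entirely. A real project would need a rule for handling collisions before it relies on slugs.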

Page 4

Preparation
1. Leverage what exists

• Request a copy of the logical and physical model of the database(s)

• Obtain data extracts (e.g., databases and/or spreadsheets) or create data in a way that can be replicated.


Linked Data modelers typically model two or three exemplar objects to begin the process. We figure out the relationships and identify how each object relates to the real world, initially drawing on a large whiteboard or a collaborative wiki site.

Page 5

Model the data
2. Model data without context to allow for reuse and easier merging of data sets

• Traditional DBAs organize data for specific Web services or applications.

• With Linked Data (LD), application logic does not drive the data schema, concepts, etc.


LD domain experts model data without context, whereas traditional modelers typically organize data for specific Web services or applications. Application logic does not drive the data schema, which makes data reuse and merging data sets easier.
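When data is modeled without application context, merging two data sets can be as simple as taking the union of their triples, because statements about the same URI line up automatically. The Python sketch below uses hypothetical URIs under `http://example.org/id/`. Objects are plain strings, so literals and URIs are not distinguished. Plain set union is only a correct RDF merge when the graphs contain no blank nodes.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of triples.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/id/"  # hypothetical namespace
REGULATED_BY = EX + "vocab/regulatedBy"  # hypothetical property
BENZENE = EX + "substance/benzene"

air_data = {
    (BENZENE, RDFS_LABEL, "Benzene"),
    (BENZENE, REGULATED_BY, EX + "program/clean-air"),
}
water_data = {
    (BENZENE, RDFS_LABEL, "Benzene"),  # the same statement, made independently
    (BENZENE, REGULATED_BY, EX + "program/safe-drinking-water"),
}

# Merging is set union: the duplicate label collapses, the other facts combine.
merged = air_data | water_data


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(len(merged))  # 3
print(objects(merged, BENZENE, REGULATED_BY))
```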

Page 6

Model the data
3. Look for real-world objects of interest (e.g., people, places, things, locations, etc.) and model them.

• Investigate how others are already modeling similar or related data.

• Look for duplication and normalize the data

• Use common sense to decide whether or not to make a link


Linked Data modeling experts typically model two or three exemplar objects to begin the process. We figure out the relationships and identify how each object relates to the real world, initially drawing on a large whiteboard or a collaborative wiki site.
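This slide advises looking for duplication and normalizing the data. That step can be sketched as grouping repeated rows under a single entity. The Python below rests on assumptions: the rows, the field names, and the whitespace/case matching rule are all hypothetical. Deciding whether two near-matches really refer to the same thing remains the common-sense call the slide mentions.

```python
from collections import defaultdict

# Hypothetical rows from a flat extract: facility details repeated per substance.
rows = [
    {"facility": "Acme Plant ", "city": "Springfield", "substance": "Benzene"},
    {"facility": "ACME PLANT", "city": "Springfield", "substance": "Toluene"},
    {"facility": "Beta Works", "city": "Shelbyville", "substance": "Benzene"},
]


def norm(value: str) -> str:
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join(value.split()).casefold()


def normalize(rows):
    """Split repeated facility data into one entry per facility plus substance links."""
    facilities = {}
    substances = defaultdict(set)
    for row in rows:
        key = (norm(row["facility"]), norm(row["city"]))
        # Keep the first spelling seen as the display name.
        facilities.setdefault(
            key, {"name": " ".join(row["facility"].split()), "city": row["city"]}
        )
        substances[key].add(row["substance"])
    return facilities, substances


facilities, substances = normalize(rows)
print(len(facilities))  # 2
print(sorted(substances[("acme plant", "springfield")]))  # ['Benzene', 'Toluene']
```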

Page 7

Model the data ...

4. Connect data from different sources and authoritative vocabularies (see list of popular vocabularies below).

• Use URIs as names for your objects


During the modeling process, don’t think about how an application will use your data. Instead, focus on modeling the real-world things the data is known to describe and how they are related to other objects. Take the time to understand the data and how the objects it represents are related to each other.
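When you reuse published vocabularies, the predicates and classes are themselves URIs that other publishers already understand. The minimal Python sketch below writes N-Triples using the real RDF, RDFS and FOAF namespace URIs. The person URI is hypothetical. The escaping handles only backslash, double quote, newline and carriage return.

```python
from typing import Optional

# Published vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"


def iri(value: str) -> str:
    """Serialize a URI as an N-Triples IRI reference."""
    return f"<{value}>"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Serialize a string as an N-Triples literal, optionally language-tagged."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def ntriple(s: str, p: str, o: str) -> str:
    """Join already-serialized terms into one N-Triples statement."""
    return f"{s} {p} {o} ."


person = "http://example.org/id/person/jan-kowalski"  # hypothetical URI
lines = [
    ntriple(iri(person), iri(RDF + "type"), iri(FOAF + "Person")),
    ntriple(iri(person), iri(FOAF + "name"), literal("Jan Kowalski")),
    ntriple(iri(person), iri(RDFS + "label"), literal("Jan Kowalski", "pl")),
]
print("\n".join(lines))
```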

Page 8

Model the data ...

• Put aside immediate needs of any application

• Don’t think about how an application will use your data

• Do think about time and how the data will change over time.


Focus on modeling the real-world things the data is known to describe and how they are related to other objects. Take the time to understand the data and how the objects it represents are related to each other.

Page 9

Convert, Publish & Maintain
5. Write a script or process to convert the data set repeatedly

6. Publish to the Web and announce it! (more details shortly)

7. Maintenance strategy (more details in the social contract at the end)
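Step 5 calls for a conversion that can be rerun every time a new extract arrives. In Python, one way to sketch this is a pure function that turns CSV text into sorted N-Triples. Repeated runs then produce byte-identical output that is easy to diff. The column names (`id`, `name`, `cas`), the base URI and the `casNumber` property are hypothetical.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/id/substance/"  # hypothetical base URI
CAS_NUMBER = "http://example.org/vocab/casNumber"  # hypothetical property


def escape(value: str) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def convert(csv_text: str) -> str:
    """Convert a CSV extract to N-Triples; the same input always gives the same output."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id'].strip()}>"
        triples.add(f'{subject} <{RDFS_LABEL}> "{escape(row["name"].strip())}" .')
        cas = row["cas"].strip()
        if cas:  # skip empty cells rather than emitting empty literals
            triples.add(f'{subject} <{CAS_NUMBER}> "{escape(cas)}" .')
    return "\n".join(sorted(triples)) + "\n"


extract = "id,name,cas\n1,Benzene,71-43-2\n2,Mystery mix,\n"
print(convert(extract), end="")
```

Sorting a set of lines discards the original row order. That is acceptable because order carries no meaning in an RDF graph, and it keeps the output stable from run to run.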


1. Expect to be maintained in perpetuity
2. Do not encode the name of the department or agency currently defining and naming a concept, as that may be reassigned
3. Support a direct response, or redirect to department/agency servers
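These notes set out a persistence policy. URIs should outlive reorganizations, contain no agency name, and either answer directly or redirect to whichever department currently serves the data. The minimal PURL-style resolver below shows the idea in Python. The paths, host names and prefix table are hypothetical. The sketch answers with a 302 redirect; Linked Data deployments that name real-world things often use 303 See Other instead.

```python
from typing import Dict, Tuple

# Persistent paths carry no agency name (hypothetical paths and hosts).
# Only this table changes when responsibility for a concept moves.
REDIRECTS: Dict[str, str] = {
    "/id/substance/": "https://agency-a.example.gov/srs/substance/",
}
LOCAL_PREFIXES = ("/id/program/",)  # served directly by the resolver itself


def resolve(path: str) -> Tuple[int, str]:
    """Return (HTTP status, body or Location header) for a persistent path."""
    if path.startswith(LOCAL_PREFIXES):
        return 200, f"local description of {path}"
    for prefix, target in REDIRECTS.items():
        if path.startswith(prefix):
            return 302, target + path[len(prefix):]
    return 404, "not found"


print(resolve("/id/substance/benzene"))
# The concept is reassigned: update the table, and the public URI stays the same.
REDIRECTS["/id/substance/"] = "https://agency-b.example.gov/chemicals/"
print(resolve("/id/substance/benzene"))
```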

Page 10

Take the plunge ... Be forgiving

• Simplistic data models can still be useful

• It is better to make progress with something than to do nothing because we cannot be comprehensive and complete


Science still doesn’t have a good understanding of the gene: we have gene therapy, yet we haven’t agreed on a definition of a gene.

We capture vast quantities of topographical data (USGS), yet scientists still debate the meaning of topographical elements. From the time we are young children, we use monosyllabic words to navigate trees and roads. If our parents had said we could not do anything because we didn’t have a perfect model of the world, we could never have learned to navigate our homes as toddlers.

Page 11

Take an iterative approach
1. Review modeling decisions

2. Review vocabularies chosen and developed

3. Modify/update data conversion scripts

4. Do a maintenance walk-through with real use cases

5. Show how to explore data with SPARQL and visualizations

6. Discuss a persistent identifier strategy (think PURLs)


Iterate on this process in short sprints, two weeks at a time. Don’t be afraid to review modeling decisions with subject matter experts (SMEs). Review vocabulary choices. Do a maintenance walk-through with actual use cases and ensure the team can carry the work forward. Show people their OWN DATA in visualization tools like Callimachus.

Page 12

Reality ... We started with the usual CSV dump ... ugly, cumbersome data

Page 13

Page 14

Page 15

Page 16

We used two common RDF vocabulary description languages in our modeling for SRS: RDF Schema (RDFS) and the Simple Knowledge Organization System (SKOS). RDFS is used to give labels to objects, synonyms and substance lists. Human-readable comments were added using the rdfs:comment property.
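The Python sketch below shows one way a substance and a substance list might be described with RDFS and SKOS. The namespace `http://example.org/srs/` is hypothetical and is not the real SRS URI space. Using `skos:altLabel` for synonyms and `skos:Collection`/`skos:member` for lists is an assumption, since the notes do not say which properties were used. The comment text is placeholder data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/srs/"  # hypothetical namespace


class Lit(str):
    """Marks a string as an RDF literal rather than a URI."""


def describe_substance(key, name, synonyms, comment):
    subject = EX + "substance/" + key
    triples = [
        (subject, RDFS + "label", Lit(name)),
        (subject, RDFS + "comment", Lit(comment)),  # human-readable note
    ]
    # Assumption: synonyms recorded as skos:altLabel values.
    triples += [(subject, SKOS + "altLabel", Lit(s)) for s in synonyms]
    return subject, triples


def substance_list(key, label, members):
    collection = EX + "list/" + key
    triples = [
        (collection, RDF_TYPE, SKOS + "Collection"),
        (collection, RDFS + "label", Lit(label)),
    ]
    triples += [(collection, SKOS + "member", m) for m in members]
    return triples


def to_ntriples(triples):
    def term(value):
        if isinstance(value, Lit):
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return f"<{value}>"

    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


benzene, facts = describe_substance(
    "benzene", "Benzene", ["Benzol", "Benzole"], "Illustrative comment only."
)
facts += substance_list("example-list", "Example substance list", [benzene])
print(to_ntriples(facts))
```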

Page 17

Possible Solutions for Data Management

• Roll your own three-tier architecture

• Content Management System

• Wiki-based

• Linked Data Management System


A few possible solutions to the three challenges stated earlier.

Page 18

Content Management Systems

• WordPress

• Drupal

• Joomla!


The big downsides of a three-tier architecture are the upfront cost and getting people to agree on the schema up front. So we then looked at content management systems (CMSs). These systems can be up and running the same day; however, they are architected to work primarily with unstructured content.

Page 19

We have a strong heritage in FLOSS projects, starting with the first community-supported RDF database in 2003. We offered a commercial version, used primarily by the US defense community, and in 2004 open-sourced 80% of it as what became the Mulgara triple store, which is used by institutions all over the world. OpenRDF/Sesame was led by Aduna.

Page 20

Linked Data Management System

• Callimachus (kəlĭm'əkəs) is a framework for data-driven applications based on Linked Data principles.

• Callimachus allows Web authors to quickly and easily create semantically-enabled Web applications.


Wiki systems don't handle structured content well, nor do they propagate changes well. Web 2.0 developers creating DATA-RICH websites needed a tool, so we created Callimachus, a triples up & down solution (no MySQL under the covers), HIGHLY SCALABLE for real-world use. It is named for the father of bibliography, compiler of the Pinakes at the Great Library of Alexandria, who lived from c. 305 to c. 240 BCE. He could not categorize his own work using Aristotle's hierarchical system; he was the first person to define the use case for Linked Data.

Page 21

Callimachus uses RDFa as a query language. Templates are parsed to build SPARQL from RDFa markup, and the query result set is returned to the Web page for a human to read or a machine to parse. This is very valuable, and to our knowledge no other solution available as FLOSS or commercially compares to Callimachus at this time.
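A toy example helps illustrate how SPARQL can be built from RDFa markup. It collects the `property` attributes of a template and turns each one into an OPTIONAL pattern about the resource being viewed. This Python sketch is a heavy simplification and is not Callimachus's actual algorithm; real RDFa templates also involve `rel`, `rev`, `about` and nested resources. The template, prefix table and resource URI are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical prefix table; a real template would declare its own prefixes.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


class PropertyCollector(HTMLParser):
    """Collects the CURIEs found in `property` attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "property" and value and value not in self.properties:
                self.properties.append(value)


def template_to_sparql(template: str, resource: str) -> str:
    """Build a SELECT query with one OPTIONAL pattern per templated property."""
    collector = PropertyCollector()
    collector.feed(template)
    collector.close()
    props = collector.properties
    used = sorted({p.split(":", 1)[0] for p in props})
    lines = [f"PREFIX {pfx}: <{PREFIXES[pfx]}>" for pfx in used]
    variables = [f"?v{i}" for i in range(1, len(props) + 1)]
    lines.append(f"SELECT {' '.join(variables)} WHERE {{")
    for var, prop in zip(variables, props):
        lines.append(f"  OPTIONAL {{ <{resource}> {prop} {var} }}")
    lines.append("}")
    return "\n".join(lines)


TEMPLATE = """
<div>
  <h1 property="rdfs:label"></h1>
  <p property="rdfs:comment"></p>
  <ul><li property="skos:altLabel"></li></ul>
</div>
"""
RESOURCE = "http://example.org/srs/substance/benzene"  # hypothetical
print(template_to_sparql(TEMPLATE, RESOURCE))
```

Each property is wrapped in OPTIONAL, so a resource that lacks one field still renders the others. The sketch assumes the template uses at least one `property` attribute.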

Page 22

Once we had modeled the data and validated it with SMEs, we converted it and loaded it into Callimachus. We spent about one hour creating templates to view the data in Callimachus. So here is the power of LOD in action: within one hour, we could view the data, navigate through it and verify its contents without being a DBA or Java developer!

Page 23

Callimachus’ forms-driven interface allows authorized users to modify the underlying triples in the database: we are round-tripping create/modify/delete operations to a triple store via a Web page!
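At the triple level, a form edit like this can be expressed as deleting the statements that changed and inserting their replacements. The Python sketch below converts the before and after values of a form into a SPARQL 1.1 Update request using `DELETE DATA` and `INSERT DATA`. It is only an illustration and does not show how Callimachus implements editing. The resource URI and example values are hypothetical, and only literal values are handled.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape(text: str) -> str:
    """Escape a string for a SPARQL double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def form_edit_to_update(subject, before, after):
    """before/after: sets of (predicate URI, literal value) pairs shown in a form."""
    removed = sorted(before - after)
    added = sorted(after - before)

    def block(pairs):
        return " ".join(f'<{subject}> <{p}> "{escape(v)}" .' for p, v in pairs)

    operations = []
    if removed:
        operations.append(f"DELETE DATA {{ {block(removed)} }}")
    if added:
        operations.append(f"INSERT DATA {{ {block(added)} }}")
    return " ;\n".join(operations)  # ';' separates operations in one request


subject = "http://example.org/srs/substance/benzene"  # hypothetical
before = {(RDFS + "label", "Benzne")}
after = {(RDFS + "label", "Benzene"), (RDFS + "comment", "Corrected spelling.")}
print(form_edit_to_update(subject, before, after))
```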

Page 24

Page 25

Page 26

Page 27

Page 28

Note the fixed name and added comment.

Page 29

A history of changes is kept. Note the change to the name and the added comment, along with the time/date and name of the user who made the edit.
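Keeping such a history can be sketched as an append-only log. Each edit records the user, a UTC timestamp, and the triples removed and added. The Python below is a hypothetical illustration with made-up class names, users and URIs. It does not describe Callimachus's storage model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Change:
    user: str
    when: datetime
    removed: FrozenSet[Triple]
    added: FrozenSet[Triple]


@dataclass
class VersionedGraph:
    triples: Set[Triple] = field(default_factory=set)
    history: List[Change] = field(default_factory=list)

    def edit(self, user: str, removed: Iterable[Triple] = (),
             added: Iterable[Triple] = (), when: Optional[datetime] = None) -> None:
        """Apply an edit and append who/when/what to the history."""
        removed, added = frozenset(removed), frozenset(added)
        self.triples -= removed
        self.triples |= added
        self.history.append(
            Change(user, when or datetime.now(timezone.utc), removed, added))

    def history_of(self, subject: str) -> List[Change]:
        """All recorded changes that touched the given subject, oldest first."""
        return [c for c in self.history
                if any(t[0] == subject for t in c.removed | c.added)]


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
s = "http://example.org/srs/substance/benzene"  # hypothetical

g = VersionedGraph()
g.edit("alice", added={(s, LABEL, "Benzne")})
g.edit("bob", removed={(s, LABEL, "Benzne")},
       added={(s, LABEL, "Benzene"), (s, COMMENT, "Corrected spelling.")})
for change in g.history_of(s):
    print(change.when.isoformat(), change.user, len(change.removed), len(change.added))
```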

Page 30

A Callimachus view page of the SRS, created in less than an hour. Someone with HTML, CSS and RDFa/SPARQL skills can create this type of page; no understanding of semantics or deep RDF knowledge is required.

Page 31

Notice the wiki-like editing capabilities of a Callimachus page!

Page 32

Page 33

Page 34

• Web 2.0 developers can create data-driven applications with templates in hours

• Triples up & down (no MySQL under the covers)

• Wiki editing of content

• Access control

• Collaboration via the Web

• Change tracking (history)

• Page/form templates


Callimachus is a great way to collaboratively manage your Linked Data. MediaWiki is to free text what Callimachus is to Linked Data. Callimachus uses a straightforward access control list (ACL) for Linked Data.
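A straightforward ACL for Linked Data can be pictured as a set of rules that map URI path prefixes to the groups allowed to read or write. This Python sketch is hypothetical: the paths, groups, users and longest-prefix rule are all assumptions. It is not Callimachus's actual ACL model.

```python
from typing import Dict, Set

# Hypothetical rules: path prefix -> action -> groups allowed.
ACL: Dict[str, Dict[str, Set[str]]] = {
    "/srs/": {"read": {"public"}, "write": {"editors"}},
    "/srs/internal/": {"read": {"staff"}, "write": {"admins"}},
}
# Hypothetical group membership.
MEMBERS: Dict[str, Set[str]] = {"alice": {"editors"}, "carol": {"staff"}}


def allowed(user: str, action: str, path: str) -> bool:
    """Check access using the most specific (longest) matching prefix rule."""
    matches = [prefix for prefix in ACL if path.startswith(prefix)]
    if not matches:
        return False  # deny by default
    rule = ACL[max(matches, key=len)]
    groups = MEMBERS.get(user, set()) | {"public"}  # everyone is in 'public'
    return bool(rule.get(action, set()) & groups)


print(allowed("alice", "write", "/srs/substance/benzene"))  # True
print(allowed("bob", "write", "/srs/substance/benzene"))  # False
```

Denying by default and letting the most specific rule win keeps the rules easy to reason about. A real deployment would also need to authenticate users before applying such checks.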

Page 35

Join the Community

• Callimachus has benefited from 2+ years of corporate support

• We’re using it for real-world Web applications in environmental protection, finance and healthcare

• We’d love to work with the publishing industry

• Open Source project

• Visit callimachusproject.org

• Join the discussion

Page 36

Twitter: @BernHyland
Email: [email protected]

Page 37

WHY SHARE AND WHO BENEFITS?

Bernadette Hyland, co-chair W3C Government Linked Data Working Group

http://purl.org/net/bhyland/why-share-2011-10

Next talk today @ 14:00, Sala I: “Linked Open Government Data Workshop”
