peak cloud based data - linked data

London - New York - Dubai - Mumbai 2011 Dealing with the “new” data in the “Cloud” – Linked Data

Upload: wael-elrifai

Post on 25-Jan-2015

DESCRIPTION

A description of using linked data to create clear and unambiguous systems across the Internet or within your enterprise.

TRANSCRIPT

Page 1: Peak   cloud based data - linked data

London - New York - Dubai - Mumbai 2011

Dealing with the “new” data in the “Cloud” – Linked Data

Page 2: Peak   cloud based data - linked data

Table of Contents

Definitions 3

History 5

The Modigliani Test 11

Linked Data 13

Raw Data 23

Resource Description Framework 30

Linked Data Principles 42

Publishing Linked Data 57

Faceted Browsers 65

On-the-fly Mashups 67

SPARQL 73

What is a Linked Data Application 77

Characteristics of a Linked Data Application 78

Contact Us 81

Page 3: Peak   cloud based data - linked data

Definitions

RDF: The RDF data model is similar to classic conceptual modelling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the colour blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the colour", and an object denoting "blue". RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
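
As a minimal sketch of the triple idea, the example below models "The sky has the colour blue" as a Python tuple and writes it out as one N-Triples-style line. The `http://example.org/` namespace, the `Triple` type and the `to_ntriples` helper are illustrative assumptions and do not come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property or relationship
    obj: str        # either a URI or a plain literal value


EX = "http://example.org/"  # hypothetical namespace, for illustration only

sky_is_blue = Triple(EX + "sky", EX + "hasColour", "blue")


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples-style line."""

    def term(value: str) -> str:
        # URIs go in angle brackets; anything else is treated as a string literal.
        if value.startswith(("http://", "https://")):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


print(to_ntriples(sky_is_blue))
```

The same tuple could be written out in RDF/XML or Turtle instead; only the encoding changes, not the statement.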

Page 4: Peak   cloud based data - linked data

Definitions

SPARQL: SPARQL (SPARQL Protocol and RDF Query Language, pronounced "sparkle") is an RDF query language.

Linked Data: Linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies, such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.

Page 5: Peak   cloud based data - linked data

History

Linked Data Design Issues by Tim Berners-Lee July 2006

Linked Open Data Project WWW2007

First LOD Cloud May 2007

BBC publishes Linked Data 2008

NY Times announcement SemTech2009 - ISWC09

Data.gov.uk publishes Linked Data 2010

Page 6: Peak   cloud based data - linked data

May 2007

Page 7: Peak   cloud based data - linked data

Mar 2008

Page 8: Peak   cloud based data - linked data

Sept 2008

Page 9: Peak   cloud based data - linked data

Mar 2009

Page 10: Peak   cloud based data - linked data

July 2009

Page 11: Peak   cloud based data - linked data

The Modigliani Test

Show me all the locations of all the original paintings of Modigliani

Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia

Page 12: Peak   cloud based data - linked data
Page 13: Peak   cloud based data - linked data

So what is Linked Data?

Page 14: Peak   cloud based data - linked data

Do you SEARCH or do you FIND?

Page 15: Peak   cloud based data - linked data

Search for

Football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback

Page 16: Peak   cloud based data - linked data
Page 17: Peak   cloud based data - linked data

Why can’t we just FIND it…

Page 18: Peak   cloud based data - linked data
Page 19: Peak   cloud based data - linked data

Using the current Web (= internet + links + docs) is terribly inefficient

Page 20: Peak   cloud based data - linked data

So what is the problem?

We aren’t always interested in documents

• We are interested in THINGS

• These THINGS might be in documents

We can read an HTML document rendered in a browser and find what we are searching for

• This is hard for computers. It’s typically based on guesswork from some primitive NLP engine, or simple keyword search

Page 21: Peak   cloud based data - linked data

What do we need to do?

Make it easy for computers/software to find THINGS

Page 22: Peak   cloud based data - linked data

How can we do that?

• Besides publishing documents on the web

- which computers can’t understand easily

• Let’s publish something that computers can understand

Page 23: Peak   cloud based data - linked data

RAW DATA!

But don’t we already publish raw data in

RDBMS, XML, CSV, etc?

Page 24: Peak   cloud based data - linked data

Yes!

But it’s not in a consistent format, and very

difficult to integrate (or “link”).

Page 25: Peak   cloud based data - linked data

For example, how do I know that the

Wael Elrifai in Facebook is the same

as Wael Elrifai in Twitter

Page 26: Peak   cloud based data - linked data

Don’t we already have a standard

way of publishing on the web?

Page 27: Peak   cloud based data - linked data

We have a standardized way of

publishing documents on the web, right?

HTML

Page 28: Peak   cloud based data - linked data

Then why can’t we have a standard way

of publishing data on the Web?

Page 29: Peak   cloud based data - linked data

In fact, we do have one.

Page 30: Peak   cloud based data - linked data

Resource Description Framework (RDF)

A data model

• A way to model data

• e.g. relational databases use the relational data model

RDF is a triple data model

Labeled Graph

Subject, Predicate, Object

<Wael> <was born in> <Beirut>

<Beirut> <is part of> <the Lebanon>

<Wael> <likes> <the Semantic Web>
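
The three statements above form a small labeled graph. A minimal sketch, assuming the triples are held as plain Python tuples: the `match` helper below is a hypothetical wildcard lookup, not a real library API, and it is enough to walk the graph from one node to the next.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# The three statements from the slide, as (subject, predicate, object).
graph = [
    ("Wael", "was born in", "Beirut"),
    ("Beirut", "is part of", "the Lebanon"),
    ("Wael", "likes", "the Semantic Web"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if s in (None, ts) and p in (None, tp) and o in (None, to):
            yield (ts, tp, to)


# Everything that is stated about Wael.
about_wael = list(match(graph, s="Wael"))

# Two hops along the edges: which region contains Wael's birthplace?
regions = [
    region
    for _, _, city in match(graph, s="Wael", p="was born in")
    for _, _, region in match(graph, s=city, p="is part of")
]

print(about_wael)
print(regions)
```

Because the object of one triple can be the subject of another, following edges like this is how a graph query answers questions that span several statements.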

Page 31: Peak   cloud based data - linked data

RDF can be serialized in different ways

RDF/XML

RDFa (RDF in HTML)

N3

Turtle

JSON

Page 32: Peak   cloud based data - linked data

So does that mean that I have to

publish my data in RDF now?

Page 33: Peak   cloud based data - linked data

You don’t have to… but it sure

would be nice.

Page 34: Peak   cloud based data - linked data

Document on the Web

Page 35: Peak   cloud based data - linked data

Databases back up documents

Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …

This is a THING: A book titled “Programming the Semantic Web” by Toby Segaran, …

THINGS have PROPERTIES: A book has a title, an author, …

Page 36: Peak   cloud based data - linked data

Let’s represent the data in RDF

Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009

PublisherID | PublisherName
1 | O’Reilly Media

<book> <title> "Programming the Semantic Web"
<book> <isbn> "978-0-596-15381-6"
<book> <author> "Toby Segaran"
<book> <publisher> <Publisher>
<Publisher> <name> "O’Reilly"
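
One way to picture the step from rows to RDF is the sketch below, which turns the two tables into triples. The `http://example.org/` base URI, the URI patterns and the function names are assumptions made for illustration; the slides do not prescribe a mapping.

```python
BASE = "http://example.org/"  # hypothetical base URI, not from the slides

# The two relational tables from the slide, one dict per row.
books = [
    {
        "Isbn": "978-0-596-15381-6",
        "Title": "Programming the Semantic Web",
        "Author": "Toby Segaran",
        "PublisherID": 1,
        "ReleasedDate": "July 2009",
    }
]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def book_uri(isbn):
    return f"{BASE}book/{isbn}"


def publisher_uri(publisher_id):
    return f"{BASE}publisher/{publisher_id}"


def rows_to_triples(book_rows, publisher_rows):
    """Each row becomes a subject; each column becomes a predicate."""
    triples = []
    for row in book_rows:
        subject = book_uri(row["Isbn"])
        triples.append((subject, "title", row["Title"]))
        triples.append((subject, "isbn", row["Isbn"]))
        triples.append((subject, "author", row["Author"]))
        # The foreign key turns into a link to another resource.
        triples.append((subject, "publisher", publisher_uri(row["PublisherID"])))
    for row in publisher_rows:
        subject = publisher_uri(row["PublisherID"])
        triples.append((subject, "name", row["PublisherName"]))
    return triples


for triple in rows_to_triples(books, publishers):
    print(triple)
```

The primary key gives each thing its identifier, and the foreign key becomes an edge between two things rather than a number that needs a join.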

Page 37: Peak   cloud based data - linked data

Remember that we are on the web

Everything on the web is identified by a URL

Page 38: Peak   cloud based data - linked data

And now let’s link the data to other data

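<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"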

Page 39: Peak   cloud based data - linked data

And now consider the data from Revyu.com

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"

Page 40: Peak   cloud based data - linked data

Let’s start to link data

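<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"

<http://…/isbn978> <sameAs> <http://…/isbn978>

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"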
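
A minimal sketch of what a sameAs link makes possible: two datasets that use different URIs for the same book can be folded into one description. The `publisher.example` and `reviews.example` URIs are hypothetical stand-ins for the elided URIs on the slide, and the union-find merge is just one possible approach.

```python
# Hypothetical URIs standing in for the elided ones on the slide.
BOOK_A = "http://publisher.example/isbn978"
BOOK_B = "http://reviews.example/isbn978"
REVIEW = "http://reviews.example/review1"

publisher_data = [
    (BOOK_A, "title", "Programming the Semantic Web"),
    (BOOK_A, "author", "Toby Segaran"),
]
review_data = [
    (BOOK_B, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
]
links = [(BOOK_A, "sameAs", BOOK_B)]


def canonical_map(triples):
    """Map each URI in a sameAs link to one representative (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s
    return {node: find(node) for node in list(parent)}


def merge(*datasets):
    """Combine datasets, rewriting aliases to their representative URI."""
    all_triples = [t for dataset in datasets for t in dataset]
    canon = canonical_map(all_triples)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in all_triples
        if p != "sameAs"
    }


merged = merge(publisher_data, review_data, links)
about_book = sorted((p, o) for s, p, o in merged if s == BOOK_A)
print(about_book)
```

After the merge, the review sits on the same node as the title and author, even though it was published by a different source.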

Page 41: Peak   cloud based data - linked data

Data on the Web that is in RDF and

is linked to other RDF data is

LINKED DATA

Page 42: Peak   cloud based data - linked data

Linked Data Principles

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up (dereference) those names.

3. When someone looks up a URI, provide useful information.

4. Include links to other URIs so that they can discover more things.
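
Principles 2 and 3 can be sketched as an ordinary HTTP GET whose `Accept` header asks for RDF rather than HTML. This is a sketch using Python's standard `urllib.request`. It assumes the server supports content negotiation for `text/turtle`, and the helper names `build_lookup` and `dereference` are hypothetical.

```python
import urllib.request


def build_lookup(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET that asks for an RDF serialization instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def dereference(uri: str, timeout: float = 10.0) -> str:
    """Look up a URI and return whatever description the server provides.

    Needs network access, and assumes the server honours the Accept header.
    """
    with urllib.request.urlopen(build_lookup(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no network; only dereference() does.
request = build_lookup("http://dbpedia.org/resource/London")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `dereference` on a URI found in the returned data is how an application follows principle 4 and discovers more things.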

Page 43: Peak   cloud based data - linked data

Linked Data makes the web appear as a single global database!

The same can be done inside your company!

Page 44: Peak   cloud based data - linked data

What if you wanted to know your company’s

EBITDA for Catalonia in 2010?

You could have an EDW pre-aggregate and distribute the data, have an analyst calculate it on the spot, or…

Page 45: Peak   cloud based data - linked data

Linked data in your internal semantic web could relate all transactions to linked financial formulae!

You ask the question, tell your system where to look (as part of the question; this can be prebuilt) and voilà!

Page 46: Peak   cloud based data - linked data

I can query a database with SQL. Is

there a way to query Linked Data with a

query language?

Page 47: Peak   cloud based data - linked data

Yes! There is actually a standardized language for that

Page 48: Peak   cloud based data - linked data

FIND all the reviews on the book

“Programming the Semantic Web”

by people who live in London

Page 49: Peak   cloud based data - linked data

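<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"

<http://…/isbn978> <sameAs> <http://…/isbn978>

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"

<http://…/reviewer> <sameAs> <http://waelworldwide.com>

<http://waelworldwide.com> <name> "Wael Elrifai"
<http://waelworldwide.com> <livesIn> <http://dbpedia.org/London>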
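
The FIND request can be answered by walking this graph: from the title to the book, across sameAs to its reviews, then across sameAs again to where the reviewer lives. The sketch below does this in plain Python as an illustration of the join, not as a SPARQL engine. The `publisher.example` and `reviews.example` URIs are hypothetical replacements for the elided ones, while `http://waelworldwide.com` and `http://dbpedia.org/London` are taken from the slide.

```python
# Hypothetical URIs replace the elided ones; the last two come from the slide.
BOOK = "http://publisher.example/isbn978"
BOOK_REVIEWED = "http://reviews.example/isbn978"
REVIEW = "http://reviews.example/review1"
REVIEWER = "http://reviews.example/reviewer"
HOMEPAGE = "http://waelworldwide.com"
LONDON = "http://dbpedia.org/London"

triples = {
    (BOOK, "title", "Programming the Semantic Web"),
    (BOOK, "sameAs", BOOK_REVIEWED),
    (BOOK_REVIEWED, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
    (REVIEW, "hasReviewer", REVIEWER),
    (REVIEWER, "name", "Wael Elrifai"),
    (REVIEWER, "sameAs", HOMEPAGE),
    (HOMEPAGE, "livesIn", LONDON),
}


def objects(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}


def aliases(uri):
    """The URI plus anything directly linked to it by sameAs, either way round."""
    return {uri} | objects(uri, "sameAs") | subjects("sameAs", uri)


def reviews_by_residents(title, city):
    """Descriptions of reviews of `title` written by people who live in `city`."""
    found = []
    for book in subjects("title", title):
        for book_alias in aliases(book):
            for review in objects(book_alias, "hasReview"):
                for reviewer in objects(review, "hasReviewer"):
                    lives_in = set()
                    for person in aliases(reviewer):
                        lives_in |= objects(person, "livesIn")
                    if city in lives_in:
                        found.extend(objects(review, "description"))
    return sorted(found)


print(reviews_by_residents("Programming the Semantic Web", LONDON))
```

Without the two sameAs links the walk stops at the book's own dataset and finds nothing, which is the practical argument for linking.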

Page 50: Peak   cloud based data - linked data

This looks cool, but let’s be realistic.

What is the incentive to publish

Linked Data?

Page 51: Peak   cloud based data - linked data

What was your incentive to publish

an HTML (Intranet) page in 1990?

Page 52: Peak   cloud based data - linked data

1) Share data in documents

2) Because your neighbor was doing it

Page 53: Peak   cloud based data - linked data

So why should we publish

Linked Data in 2011?

Page 54: Peak   cloud based data - linked data

1) Share data as data

2) Because your neighbor is doing it

Page 55: Peak   cloud based data - linked data

You’ll be in good company…

Page 56: Peak   cloud based data - linked data

Linked Data Publishers

UK Government

US Government

BBC

Open Calais – Thomson Reuters

Freebase

NY Times

Best Buy

CNET

DBpedia

Page 57: Peak   cloud based data - linked data

How can I publish Linked Data?

Page 58: Peak   cloud based data - linked data

Publishing Linked Data

• Legacy Data in Relational Databases

• D2R Server

• Virtuoso

• Triplify

• Ultrawrap

• CMS

• Drupal 7

• Native RDF Stores

• Databases for RDF (Triple Stores)

• AllegroGraph, Jena, Sesame, Virtuoso

• Talis Platform (Linked Data in the Cloud)

• In HTML with RDFa

Page 59: Peak   cloud based data - linked data

Consuming Linked Data by Humans

Page 60: Peak   cloud based data - linked data

HTML Browsers

RDF can be serialized in RDFa

Have you heard of

•Yahoo’s SearchMonkey

•Google Rich Snippets?

They are consuming RDFa

But WHY?

Page 61: Peak   cloud based data - linked data

Because there is life beyond ten

blue links

Page 62: Peak   cloud based data - linked data

Google and Yahoo are starting to crawl

RDFa!

The Semantic Web is a reality!

Page 63: Peak   cloud based data - linked data

The Reality

• Yahoo is crawling data that is in RDFa and Microformats under specific vocabularies

• FOAF

• GoodRelations

• Google is crawling RDFa and Microformats that use the Google vocabulary

Page 64: Peak   cloud based data - linked data

Linked Data Browsers

Tabulator

•http://www.w3.org/2005/ajar/tab

OpenLink

•http://ode.openlinksw.com/

Zitgist DataViewer

•http://dataviewer.zitgist.com/

Marbles

•http://www5.wiwiss.fu-berlin.de/marbles/

Explorator

•http://www.tecweb.inf.puc-rio.br/explorator

Page 65: Peak   cloud based data - linked data

Faceted Browsers

Page 66: Peak   cloud based data - linked data

http://dbpedia.neofonie.de

Page 67: Peak   cloud based data - linked data

http://dev.semsol.com/2010/semtech/

Page 68: Peak   cloud based data - linked data

On-the-fly Mashups

Page 69: Peak   cloud based data - linked data

http://sig.ma

Page 70: Peak   cloud based data - linked data

What’s next?

Page 71: Peak   cloud based data - linked data

Time to create new and innovative

ways to interact with Linked Data

Page 72: Peak   cloud based data - linked data

This may be one of the Killer Apps that we have all been

waiting for

http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg

Page 74: Peak   cloud based data - linked data

• Querying a single dataset is quite boring compared to:

• Issuing SPARQL queries over multiple datasets

• How can you do this?

1. Issue follow-up queries to different endpoints

2. Query a central collection of datasets

3. Build a store with copies of relevant datasets

4. Use a query federation system

Page 75: Peak   cloud based data - linked data

Follow-up Queries

• Idea: issue follow-up queries over other datasets based on results from previous queries

• Substituting placeholders in query templates
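
Placeholder substitution could look like the sketch below, which fills a SPARQL query template with URIs returned by an earlier query. The template text, the `http://example.org/hasReview` predicate, the sample URIs and the helper names are all illustrative assumptions. Sending the generated queries to a second endpoint is left out.

```python
from string import Template

# $book is filled in from earlier results; SPARQL variables keep their '?'.
FOLLOW_UP = Template(
    "SELECT ?review WHERE { <$book> <http://example.org/hasReview> ?review }"
)

# Characters that must not appear inside <...> in a query.
UNSAFE = set('<>"{}|^`\\ \n\t')


def is_safe_uri(uri):
    """Reject values that could break out of the <...> term."""
    return not any(ch in UNSAFE for ch in uri)


def follow_up_queries(first_results):
    """Build one follow-up query per URI found by the first query."""
    queries = []
    for row in first_results:
        uri = row["book"]
        if is_safe_uri(uri):
            queries.append(FOLLOW_UP.substitute(book=uri))
    return queries


# Pretend these rows came back from a query against a first endpoint.
first_results = [
    {"book": "http://reviews.example/isbn978"},
    {"book": "http://bad.example/> } # injected"},
]

for query in follow_up_queries(first_results):
    print(query)
```

The check in `is_safe_uri` matters because results from one dataset become query text for another, so a malformed value should be dropped rather than pasted in.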

Page 76: Peak   cloud based data - linked data

Getting Started

• Finding URIs

• Finding Additional Data

• Finding SPARQL Endpoints

Page 77: Peak   cloud based data - linked data

What is a Linked Data application

Software system that makes use of data on the

web from multiple datasets AND that benefits

from links between the datasets

Page 78: Peak   cloud based data - linked data

Characteristics of Linked Data Applications

• Consume data that is published on the web following the Linked Data principles

• Discover further information by following the links between different data sources

• Combine the consumed linked data with data from other sources (not necessarily Linked Data)

• Expose the combined data back to the web following the Linked Data principles

• Offer value to end-users

Page 79: Peak   cloud based data - linked data

Examples

• http://data-gov.tw.rpi.edu/wiki

• http://dbrec.net/

• http://fanhu.bz/

• http://data.nytimes.com/schools/schools.html

• http://sig.ma

• http://visinav.deri.org/semtech2010/

Page 80: Peak   cloud based data - linked data

Hot Research Topics

• Interlinking Algorithms

• Provenance and Trust

• Dataset Dynamics

• UI

• Distributed Query

Page 81: Peak   cloud based data - linked data

Contact

PEAK Consulting

Headquarters

90 Long Acre, Covent Garden

London WC2E 9RZ

United Kingdom

Tel: +44 (0)207 849 3422

Fax: +44 (0)207 990 9478

United States

11 Penn Plaza, 5th floor

New York, NY 1000

United States

Tel: +1 (212) 946 4824

Fax: +1 (212) 946 2801

United Arab Emirates

Unit P12 Rimal, The Walk

PO Box 487 177 Dubai

United Arab Emirates

Tel: +44 (0)207 849 3422

Fax: +44 (0)207 990 9478

http://www.peakconsulting.eu
