peta vs. meta: rethinking data interoperability on the ... · peta vs. meta: rethinking data...

25
Tetherless World Constellation Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for Data Exploration and Applications Rensselaer Polytechnic Institute, USA http://www.cs.rpi.edu/~hendler @jahendler (twitter)

Upload: others

Post on 19-Jan-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Peta vs. Meta: Rethinking Data Interoperability on the

World Wide Web

Jim Hendler

Director, Rensselaer Institute for Data Exploration and Applications

Rensselaer Polytechnic Institute, USA

http://www.cs.rpi.edu/~hendler

@jahendler (twitter)

Page 2: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Office of Research

Roots: Data Exploration

Discover

Integrate

Validate

Explain

Geekopedia: Data exploration helps a data consumer focus an information search on the pertinent

aspect of relevant data before true analysis can be achieved. In large data sets, data is not

gathered or controlled in a focused manner. Even in smaller data sets, it is also true that data

gathered are not in a very rigid and specific technique can result in a disorganized manner and a

myriad of subsets each…

Page 3: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Government Data Sharing

Page 4: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

2012 International Open Government Data Conference—Open Gov Data Tutorial

Semantic Web and Linked Data (UK)

County Council

Ordnance Survey

Royal Mail

9 July 2012 IOGDC Open Data Tutorial 4

Page 5: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Government Data in the linked open data cloud

http://linkeddata.org/

Government Data is

currently over ½ the cloud in

size (~17B triples), 10s of

thousands of links to other

data (within and without)

Page 6: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Distribution Statement

Creating Data Mashups Requires Semantics

More than 50 of these at http://logd.tw.rpi.edu

Page 7: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

BUT: it’s not that easy

Head to head comparions shows that

burglaries in Avon and Somerset (UK) far

exceed those in Los Angeles, California

(one of the highest crime areas in the US)

Page 8: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

The problem is (likely) semantics

Same or

different?

Do the terms mean the same? Are they collected in the same way? Are

they processed differently? …

Page 9: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Metadata

• International data sharing – W3C Govt Linked Data Working Group

– Need for vocabularies within govt sectors

• Esp for cross-langauge use – How can we compare health (or legal, or social, or ….) data

between countries like US, UK, India, Kenya (English) with Norway, China, France, etc.

– How can we link local govts (in traditional languages, local dialects, etc) w/national data

• Modern Metadata design is crucial to govt data sharing

– Needed for search and federation in large data sharing efforts

Page 10: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Traditional Metadata

• Traditionally metadata tries to be comprehensive

–Example:ISO 19115 (GIS standard)

• >400 elements

• 14 “packages”

• Dozens of UML models (not all consistent w/ each other)

Page 11: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Not your “father’s metadata”

• Big Data on the web

– is moving away from traditional relational models (cf. NoSQL)

– Moving towards third party application and extension (cf. Json)

– Focus on interoperability and exchange with “lightweight” semantics

• Using ideas from the Semantic Web – Search: Schema.org

– Social Networking: OGP

Page 12: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Schema.org

Page 13: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Dataset extension to schema.org - April, 2013

Schema.org/Dataset – add this to your pages!

Page 14: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Schema.org/DataBase

Human-readable database

description (HTML)

Page 15: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Schema.org/Database

Embedded meta-

data (RDFa)

Page 16: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Schema.org/Dataset google.com/publicdata (early days)

Page 17: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Distribution Statement

US moving in right direction

Page 18: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

USA “Project Data” – metadata JSON

Aimed at developers

Based on DCAT

Page 19: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

USA “Project Data” – metadata RDFa

Embedded metadata for

Search, Web Apps

Based on Schema.org/Dataset

Page 20: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

EU moving in right direction

ADMS

Page 21: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

My challenge to the comunity

Smith James June 4

Jones Fred May 17

O’Connell Frank April 3

Chang Wu February 21

Hoffman Bernd December 9

Person

Date

It’s not enough just to describe the data elements…

Page 22: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Describing a dataset … requires a context

Smith James June 4

Jones Fred May 17

O’Connell Frank April 3

Chang Wu February 21

Hoffman Bernd December 9

Person

Date

1976 Dates

of Birth

Page 23: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Describing a dataset … requires a context How do we capture more of this information?

Smith James June 4

Jones Fred May 17

O’Connell Frank April 3

Chang Wu February 21

Hoffman Bernd December 9

Person

Date

1976 Cancer

Mortality

dates

Page 24: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Data

MetaData

Challenge: Linked Metadata

“A Small World of Big Data”

Linked Semantic Web “metadata” documents can be used to link very large databases in

distributed data systems. This leads to orders of magnitude reduction in information flow for

large-scale distributed data problems.

Page 25: Peta vs. Meta: Rethinking Data Interoperability on the ... · Peta vs. Meta: Rethinking Data Interoperability on the World Wide Web Jim Hendler Director, Rensselaer Institute for

Tetherless World Constellation

Conclusions

• Open data is becoming “Broad Data”

– World Wide Web trend towards more and more varied data

• In many domains

– E-commerce, Open Govt, many more (cf. Health/Medical care)

• Broad data requires thinking outside the “Database” box

– DIVE: discover, integrate, visualize, explain

• Broad data requires 1.Modern, Web-oriented metadata

2.LINKING the metadata, not the data