caepia 2011

The Data Era: Production, Consumption, Challenges Miriam Fernández 8th November, CAEPIA 2011 Website: http://people.kmi.open.ac.uk/miriam/about/ Twitter: @miri_fs Slide_share: http://www.slideshare.net/miriamfs

Upload: miriamfs

Post on 09-May-2015

DESCRIPTION

Slides for my keynote at the KEESOS workshop http://ir.ii.uam.es/keesos2011/, CAEPIA 2011

TRANSCRIPT

Page 1: CAEPIA 2011

The Data Era: Production, Consumption, Challenges

Miriam Fernández

8th November, CAEPIA 2011

Website: http://people.kmi.open.ac.uk/miriam/about/

Twitter: @miri_fs

Slide_share: http://www.slideshare.net/miriamfs

Page 2: CAEPIA 2011

What is … ?

Page 3: CAEPIA 2011

How do humans infer knowledge?

Alejandro

in Chicago!

Syntactic interpretation

Semantic interpretation

A picture!

Page 4: CAEPIA 2011

How do machines infer knowledge?

Syntactic interpretation

Semantic interpretation

A picture!

Page 5: CAEPIA 2011

The Challenge

• We need to find the way in which machines will interpret and extract knowledge for us!

Page 6: CAEPIA 2011

The Challenge

Page 7: CAEPIA 2011

The Data Era

• The 2011 Digital Universe Study: Extracting Value from Chaos (IDC)

–We have entered the Zettabyte era (a trillion gigabytes or a billion terabytes)

–The rate of information growth appears to be exceeding Moore’s Law

http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
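
The unit arithmetic behind "a trillion gigabytes or a billion terabytes" can be checked with a minimal sketch. It assumes decimal (SI) units, where 1 ZB = 10^21 bytes; the helper name zettabytes_to is hypothetical and not from the study.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary).
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to(unit_bytes: int, zettabytes: int = 1) -> int:
    """Convert a whole number of zettabytes into another decimal unit."""
    return zettabytes * ZETTABYTE // unit_bytes


gigabytes_per_zb = zettabytes_to(GIGABYTE)  # 10**12, a trillion
terabytes_per_zb = zettabytes_to(TERABYTE)  # 10**9, a billion

print(f"1 ZB = {gigabytes_per_zb:,} GB = {terabytes_per_zb:,} TB")
```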

Page 8: CAEPIA 2011

Big Value from Data

• Big Data: The next frontier for innovation, competition and productivity (McKinsey)

–$300 billion potential annual value to US health care

–€250 billion potential annual value to Europe’s public sector administration

http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf

Page 9: CAEPIA 2011

IBM City Forward

The Smarter Cities Challenge is a competitive grant program awarding $50 million worth of IBM expertise over the next three years to 100 cities around the globe. It is designed to address the wide range of challenges facing cities today.

Page 10: CAEPIA 2011

Consumption

• We need to provide efficient ways to consume data in order to extract the value out of it, the knowledge

–Syntactic approaches (visual analytics)

• The data is collected, centralized and analysed

• Visualizations for humans to extract knowledge

–Semantic approaches

• The information is distributed / interlinked

• Semantic structures are added to the data so that machines can better understand it

Page 11: CAEPIA 2011

Syntactic approaches

• Some examples

–Gapminder

–IBM Many Eyes

–Google Public Data Explorer

–Google Correlate

–Google N-Gram Viewer

• What is the most popular hair colour in the literature?
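
The hair-colour question is, at heart, a count of word pairs. A minimal sketch of that idea follows; the corpus is a hypothetical toy text, and this is not how the Google tool is implemented, only an illustration of counting 2-grams.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs (2-grams) in lower-cased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


# Hypothetical toy corpus; a real study would use a large book collection.
corpus = (
    "She had black hair and dark eyes. His red hair shone in the sun. "
    "The girl with black hair smiled. Her brown hair was tied back."
)

counts = bigram_counts(corpus)
colours = ["black", "brown", "red", "blonde"]

# How often each colour word is immediately followed by "hair".
hair_counts = {colour: counts[(colour, "hair")] for colour in colours}
most_popular = max(hair_counts, key=hair_counts.get)

print(hair_counts, most_popular)
```

This is a purely syntactic approach: the program counts strings and leaves the interpretation of the result to a human.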

Page 12: CAEPIA 2011

Google N-Gram Viewer

Page 13: CAEPIA 2011

Semantic approaches

• The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation

Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

Page 14: CAEPIA 2011

The SW vision

• Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge

Page 15: CAEPIA 2011

Adding Structure

• Two paths towards the SW vision

–Metadata embedded in HTML

• Microformats

• RDFa

• Microdata

–Linked Data

• Putting the data online in a standard, web enabled representation (RDF)

• Make the data Web addressable (URIs)

Page 16: CAEPIA 2011

Metadata in HTML

• An example

Knowledge Media Institute
Walton Hall
Milton Keynes
MK7 6AA

<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
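
To illustrate what the class names buy a machine, here is a minimal sketch that reads the markup above with Python's standard html.parser and recovers the address fields by name. The class ClassTextParser is hypothetical, and the sketch assumes well-nested markup without void elements such as <br>; it is not a full microformats parser.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text of each element, keyed by each of its class names."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class names of the currently open elements
        self.fields = {}  # class name -> text found directly inside it

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Attribute the text to the innermost open element only.
            for name in self.stack[-1]:
                self.fields[name] = text


hcard = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""

parser = ClassTextParser()
parser.feed(hcard)
fields = parser.fields

for name in ("fn", "street-address", "locality", "postal-code", "country-name"):
    print(f"{name}: {fields[name]}")
```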

Page 17: CAEPIA 2011

Metadata in HTML

• Schema.org

Semantically enhanced Information Retrieval:

an ontology-based approach

http://people.kmi.open.ac.uk/miriam/about/

Page 18: CAEPIA 2011

Metadata in HTML

• The Open Graph protocol
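
As a rough illustration, Open Graph metadata lives in <meta> tags whose property attribute starts with "og:". The sketch below collects those tags with Python's standard html.parser. The page content is hypothetical and made up for this example, and OpenGraphParser is an illustrative name, not part of any library.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


# Hypothetical page head; the values are for illustration only.
page = """
<html><head>
<title>The Data Era</title>
<meta property="og:title" content="The Data Era: Production, Consumption, Challenges" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://www.slideshare.net/miriamfs" />
<meta name="description" content="Not an Open Graph property" />
</head><body></body></html>
"""

og_parser = OpenGraphParser()
og_parser.feed(page)
og = og_parser.properties

for key, value in og.items():
    print(f"{key} = {value}")
```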

Page 19: CAEPIA 2011

Linked Data

Linking Open Data cloud diagram (2007, 2008, 2009, 2010), by Richard Cyganiak and Anja Jentzsch.

http://lod-cloud.net/

Page 20: CAEPIA 2011

Linked Data

• An example

@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbterm: <http://dbpedia.org/property/> .

dbpedia:Amsterdam
    dbterm:officialName "Amsterdam" ;
    dbterm:longd "4" ;
    dbterm:longm "53" ;
    dbterm:longs "32" ; …

http://data.semanticweb.org/person/miriam-fernandez/rdf

<ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
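
Once data is published as triples, a program can combine them without any page scraping. The sketch below holds the Amsterdam triples from this slide as plain Python tuples and derives a decimal longitude. It assumes that longd, longm and longs mean degrees, minutes and seconds, and that the longitude is east; both are assumptions, and the helper names are hypothetical.

```python
DBPEDIA = "http://dbpedia.org/resource/"
DBTERM = "http://dbpedia.org/property/"

# The triples from the slide, as (subject, predicate, object) tuples.
triples = [
    (DBPEDIA + "Amsterdam", DBTERM + "officialName", "Amsterdam"),
    (DBPEDIA + "Amsterdam", DBTERM + "longd", "4"),
    (DBPEDIA + "Amsterdam", DBTERM + "longm", "53"),
    (DBPEDIA + "Amsterdam", DBTERM + "longs", "32"),
]


def value_of(subject: str, predicate: str) -> str:
    """Return the first object stored for a subject and predicate."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def longitude_degrees(subject: str) -> float:
    """Combine degrees, minutes and seconds into decimal degrees.

    Assumption: longd/longm/longs are degrees/minutes/seconds, east positive.
    """
    degrees = float(value_of(subject, DBTERM + "longd"))
    minutes = float(value_of(subject, DBTERM + "longm"))
    seconds = float(value_of(subject, DBTERM + "longs"))
    return degrees + minutes / 60 + seconds / 3600


amsterdam = DBPEDIA + "Amsterdam"
longitude = longitude_degrees(amsterdam)
print(value_of(amsterdam, DBTERM + "officialName"), round(longitude, 4))
```

Because every subject and predicate is a URI, the same lookup would work unchanged on triples merged from other datasets that reuse those URIs.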

Page 21: CAEPIA 2011

Open Government

• Data.gov

• Data.gov.uk

• Many others…

Research Funding Explorer

Page 22: CAEPIA 2011

BBC

• Programs

• Music

• Artist

• World Cup

Who won it? ;)

Page 23: CAEPIA 2011

Open University

• OU data sources: ORO, Archive of Course Material, Library’s Catalogue of Digital Content, OpenLearn Content, A/V Material (Podcasts, iTunesU), Data from Research Outputs

• External datasets: BBC, DBPedia, DBLP, RAE, geonames, data.gov.uk

• Currently: OU public data sit in different systems and are hard for users to discover, obtain and integrate.

• Exposed as linked data, our data interlink with each other and the external world, becoming part of the “global data space” on the Web

Page 24: CAEPIA 2011

Data.open.ac.uk

data.open.ac.uk

Page 25: CAEPIA 2011

The Value

• Recognized as a critical step forward for the HE sector in the UK

–Favours transparency and reuse of data, both externally and internally

–Reduces the cost of dealing with our own public data

–Enables new kinds of applications and makes the ones that are already feasible more cost-effective

Page 26: CAEPIA 2011
Page 27: CAEPIA 2011

The Value

• Linking educational material across universities

http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm

Page 28: CAEPIA 2011

The Value

• Exploring research communities

Page 29: CAEPIA 2011

The Value

• And many others…

Page 30: CAEPIA 2011

Conclusions

• We have reached the Data Era

–Production: there is currently more than a Zettabyte of information in the digital world, and it is growing rapidly

–Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data

–Challenges: Provide machines with the capabilities to extract the knowledge for us!

Page 31: CAEPIA 2011

Conclusions

• Many more challenges ahead…

–Different formats (text vs. multimedia)

–Different dynamics (time / location)

–Different provenance

–Different topics (heterogeneous)

–Distributed, massive, streaming

–Varying quality

–…

Page 32: CAEPIA 2011

THX!

• Any ideas to make me rich? ☺

• Slide_share: http://www.slideshare.net/miriamfs

• Website: http://people.kmi.open.ac.uk/miriam/about/

• Twitter: @miri_fs

Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing with me some of their slides and for their valuable comments on this presentation