caepia 2011

Post on 09-May-2015

Category:

Technology

DESCRIPTION

Slides for my keynote at the KEESOS workshop http://ir.ii.uam.es/keesos2011/, CAEPIA 2011

TRANSCRIPT

The Data Era: Production, Consumption, Challenges

Miriam Fernández, 8th November, CAEPIA 2011

Website: http://people.kmi.open.ac.uk/miriam/about/

Twitter: @miri_fs

Slide_share: http://www.slideshare.net/miriamfs

What is … ?

How do humans infer knowledge?

Alejandro

in Chicago!

Syntactic interpretation

Semantic interpretation

A picture!

How do machines infer knowledge?

Syntactic interpretation

Semantic interpretation

A picture!

The Challenge

• We need to find a way for machines to interpret and extract knowledge for us!

The Data Era

• The 2011 Digital Universe Study: Extracting Value from Chaos (IDC)

–We have entered the Zettabyte era (a zettabyte is a trillion gigabytes, or a billion terabytes)

–The rate of information growth appears to be exceeding Moore’s Law

http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
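
The unit conversion quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where a gigabyte is 10^9 bytes; the constant names are illustrative and do not come from the IDC study.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

# How many of each smaller unit fit into one zettabyte.
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE  # 10**12, a trillion
terabytes_per_zettabyte = ZETTABYTE // TERABYTE  # 10**9, a billion

print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
print(f"1 ZB = {terabytes_per_zettabyte:,} TB")
```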

Big Value from Data

• Big Data: The next frontier for innovation, competition and productivity (McKinsey)

–$300 billion potential annual value to US health care

–€250 billion potential annual value to Europe’s public sector administration

http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf

IBM City Forward

The Smarter Cities Challenge is a competitive grant program awarding $50 million worth of IBM expertise over the next three years to 100 cities around the globe. It is designed to address the wide range of challenges facing cities today.

Consumption

• We need efficient ways to consume data in order to extract its value: the knowledge

–Syntactic approaches (visual analytics)

• The data is collected, centralized and analysed

• Visualizations let humans extract knowledge

–Semantic approaches

• The information is distributed / interlinked

• Semantic structures are added to the data so that machines can better understand it

Syntactic approaches

• Some examples

–Gapminder

–IBM Many Eyes

–Google Public Data Explorer

–Google Correlate

–Google N-Gram Viewer

• What is the most popular hair colour in the literature?

Google N-Gram Viewer

Semantic approaches

• The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation

Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

The SW vision

• Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge

Adding Structure

• Two paths towards the SW vision

–Metadata embedded in HTML

• Microformats

• RDFa

• Microdata

–Linked Data

• Putting the data online in a standard, web-enabled representation (RDF)

• Making the data Web addressable (URIs)

Metadata in HTML

• An example

Knowledge Media Institute

Walton Hall

Milton Keynes

MK7 6AA

<div class="vcard">

<div class="fn org">Knowledge Media Institute</div>

<div class="adr">

<div class="street-address">Walton Hall</div>

<div>

<span class="locality">Milton Keynes</span>,

<span class="postal-code">MK7 6AA</span>

</div>

<div class="country-name">United Kingdom</div>

</div>

</div>
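
To illustrate why this markup helps machines, the sketch below reads the hCard properties back out of the HTML above using only Python's standard library. The `HCardParser` class and the `WANTED` set are hypothetical names introduced for this example; a real microformats parser handles many more cases (nested cards, value patterns, and so on).

```python
from html.parser import HTMLParser

# The hCard markup from the slide.
HCARD = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""

# hCard class names we want to read back out of the markup.
WANTED = {"fn", "org", "street-address", "locality", "postal-code", "country-name"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: the properties it carries

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in WANTED])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Only the innermost open element's properties receive the text.
        for prop in self._stack[-1]:
            self.fields[prop] = text


parser = HCardParser()
parser.feed(HCARD)
for name, value in parser.fields.items():
    print(f"{name}: {value}")
```

The same text a human reads as an address becomes a set of named fields (`locality`, `postal-code`, ...) that a program can use directly.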

Metadata in HTML

• Schema.org

Semantically enhanced Information Retrieval:

an ontology-based approach

http://people.kmi.open.ac.uk/miriam/about/

Metadata in HTML

• The Open Graph protocol

Linking Open Data cloud diagram (2007, 2008, 2009, 2010), by Richard Cyganiak and Anja Jentzsch.

http://lod-cloud.net/

Linked Data

Linked Data

• An example

@prefix dbpedia: <http://dbpedia.org/resource/> .

@prefix dbterm: <http://dbpedia.org/property/> .

dbpedia:Amsterdam

dbterm:officialName "Amsterdam" ;

dbterm:longd "4" ;

dbterm:longm "53" ;

dbterm:longs "32" ;…
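
As a rough illustration of what a machine can do with such statements, the sketch below writes the four triples from the example as Python tuples and combines three of them into a single decimal longitude. It assumes that `longd`, `longm` and `longs` hold degrees, minutes and seconds of longitude, and that the value lies east of Greenwich; the helper names are hypothetical and no RDF library is used.

```python
# Prefixes from the slide, expanded by hand.
DBPEDIA = "http://dbpedia.org/resource/"
DBTERM = "http://dbpedia.org/property/"

# The four statements from the slide as (subject, predicate, object) triples.
triples = [
    (DBPEDIA + "Amsterdam", DBTERM + "officialName", "Amsterdam"),
    (DBPEDIA + "Amsterdam", DBTERM + "longd", "4"),
    (DBPEDIA + "Amsterdam", DBTERM + "longm", "53"),
    (DBPEDIA + "Amsterdam", DBTERM + "longs", "32"),
]


def value_of(subject, prop):
    """Return the first object stated for (subject, dbterm:prop)."""
    for s, p, o in triples:
        if s == subject and p == DBTERM + prop:
            return o
    raise KeyError(prop)


def longitude_decimal(subject):
    """Combine degrees, minutes and seconds into decimal degrees.

    Assumption: longd/longm/longs are degrees/minutes/seconds, east positive.
    """
    degrees = float(value_of(subject, "longd"))
    minutes = float(value_of(subject, "longm"))
    seconds = float(value_of(subject, "longs"))
    return degrees + minutes / 60 + seconds / 3600


amsterdam = DBPEDIA + "Amsterdam"
longitude = longitude_decimal(amsterdam)
print(value_of(amsterdam, "officialName"), round(longitude, 4))
```

Because each property is identified by a URI rather than by its position in a page, the program does not need to know how the data was laid out, only which vocabulary it uses.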

http://data.semanticweb.org/person/miriam-fernandez/rdf

<ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">

<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
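
The fragment above can be read by a program once it is wrapped into a complete RDF/XML document. The sketch below does this with Python's standard XML parser. The `rdf:RDF` wrapper, the closing tag, and the binding of the `ns1` prefix to the FOAF namespace are assumptions added so that the fragment parses; the slide does not show them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Assumption: ns1 is bound to the FOAF namespace; wrapper and closing tag added.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ns1="{FOAF_NS}">
  <ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </ns1:Person>
</rdf:RDF>"""

root = ET.fromstring(DOC)
person = root[0]  # the single described resource

# The URI that identifies the resource being described.
subject = person.get(f"{{{RDF_NS}}}about")

# Every class the resource is explicitly typed with via rdf:type.
types = [
    element.get(f"{{{RDF_NS}}}resource")
    for element in person.findall(f"{{{RDF_NS}}}type")
]

print(subject)
print(types)
```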

Open Government

• Data.gov

• Data.gov.uk

• Many others…

Research Funding Explorer

BBC

• Programs

• Music

• Artist

• World Cup

Who won it? ;)

Open University

Diagram labels, OU sources: ORO; Archive of Course Material; Library’s Catalogue of Digital Content; OpenLearn Content; A/V Material; Podcasts; iTunesU; Data from Research Outputs

Diagram labels, external datasets: BBC; DBPedia; DBLP; RAE; geonames; data.gov.uk

Currently: OU public data sit in different systems, which makes them hard for users to discover, obtain and integrate.

Exposed as linked data, our data interlink with each other and with the external world, and become part of the “global data space” on the Web.

Data.open.ac.uk

The Value

• Recognized as a critical step forward for the HE sector in the UK

–Favors transparency and reuse of data, both externally and internally

–Reduces the cost of dealing with our own public data

–Enables new kinds of applications, and makes the ones that are already feasible more cost-effective

The Value

• Linking educational material across universities

http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm

The Value

• Exploring research communities

The Value

• And many others…

Conclusions

• We have reached the Data Era

–Production: the digital world already holds more than a zettabyte of information, and the amount is growing fast

–Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data

–Challenges: Provide machines with the capabilities to extract the knowledge for us!

Conclusions

• Many more challenges ahead…

–Different formats (text vs. multimedia)

–Different dynamics (time / location)

–Different provenance

–Different topics (heterogeneous)

–Distributed, massive, streaming

–Varying quality

–…

THX!

• Any ideas to make me rich? ☺

• Slide_share: http://www.slideshare.net/miriamfs

• Website: http://people.kmi.open.ac.uk/miriam/about/

• Twitter: @miri_fs

Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing some of their slides with me and for their valuable comments on this presentation
