![Page 1: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/1.jpg)
The Data Era: Production, Consumption, Challenges
Miriam Fernández
8th November, CAEPIA 2011
Website: http://people.kmi.open.ac.uk/miriam/about/
Twitter: @miri_fs
Slide_share: http://www.slideshare.net/miriamfs
![Page 2: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/2.jpg)
What is … ?
![Page 3: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/3.jpg)
How do humans infer knowledge?
Alejandro
in Chicago!
Syntactic interpretation
Semantic interpretation
A picture!
![Page 4: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/4.jpg)
How do machines infer knowledge?
Syntactic interpretation
Semantic interpretation
A picture!
![Page 5: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/5.jpg)
The Challenge
• We need to find the way in which machines will interpret and extract knowledge for us!
![Page 6: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/6.jpg)
The Challenge
![Page 7: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/7.jpg)
The Data Era
• The 2011 Digital Universe Study: Extracting Value from Chaos (IDC)
–We have entered the Zettabyte era (a zettabyte is a trillion gigabytes, or a billion terabytes)
–The rate of information growth appears to be exceeding Moore’s Law
http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
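The unit equivalence quoted above can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal (SI) units, where a gigabyte is 10^9 bytes; the constant and function names are illustrative and do not come from the IDC study.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def units_per_zettabyte(unit_bytes: int, zettabytes: int = 1) -> int:
    """Return how many units of size `unit_bytes` fit in the given number of zettabytes."""
    return zettabytes * ZETTABYTE // unit_bytes


gigabytes_per_zb = units_per_zettabyte(GIGABYTE)  # 10**12, i.e. a trillion
terabytes_per_zb = units_per_zettabyte(TERABYTE)  # 10**9, i.e. a billion
print(f"1 ZB = {gigabytes_per_zb:,} GB = {terabytes_per_zb:,} TB")
```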
![Page 8: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/8.jpg)
Big Value from Data
• Big Data: The next frontier for innovation, competition and productivity (McKinsey)
–$300 billion potential annual value to US health care
–€250 billion potential annual value to Europe’s public sector administration
http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
![Page 9: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/9.jpg)
IBM City Forward
The Smarter Cities Challenge is a competitive grant program
awarding $50 million worth of IBM expertise over the next three
years to 100 cities around the globe. It is designed to address the
wide range of challenges facing cities today.
![Page 10: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/10.jpg)
Consumption
• We need to provide efficient ways to consume data in order to extract its value: the knowledge
–Syntactic approaches (visual analytics)
  • The data is collected, centralized and analysed
  • Visualizations for humans to extract knowledge
–Semantic approaches
  • The information is distributed / interlinked
  • Semantic structures are added to the data so that machines can better understand it
![Page 11: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/11.jpg)
Syntactic approaches
• Some examples
–Gapminder
–IBM Many Eyes
–Google Public Data Explorer
–Google Correlate
–Google N-Gram Viewer
  • What is the most popular hair colour in the literature?
![Page 12: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/12.jpg)
Google N-Gram Viewer
![Page 13: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/13.jpg)
Semantic approaches
• The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation
Tim Berners-Lee, James Hendler,
Ora Lassila, The Semantic Web,
Scientific American, May 2001
![Page 14: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/14.jpg)
The SW vision
• Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge
![Page 15: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/15.jpg)
Adding Structure
• Two paths towards the SW vision
–Metadata embedded in HTML
  • Microformats
  • RDFa
  • Microdata
–Linked Data
  • Put the data online in a standard, web-enabled representation (RDF)
  • Make the data Web addressable (URIs)
![Page 16: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/16.jpg)
Metadata in HTML
• An example
Knowledge Media Institute
Walton Hall
Milton Keynes
MK7 6AA
<div class="vcard">
<div class="fn org">Knowledge Media Institute</div>
<div class="adr">
<div class="street-address">Walton Hall</div>
<div>
<span class="locality">Milton Keynes</span>,
<span class="postal-code">MK7 6AA</span>
</div>
<div class="country-name">United Kingdom</div>
</div>
</div>
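To illustrate how a machine might read the markup above, here is a minimal sketch that pulls the hCard properties out of the slide's HTML using only Python's standard library. The class `HCardParser` and the `HCARD_PROPERTIES` set are hypothetical names introduced for this example, and the set covers only the property names that appear on the slide. A real microformats parser handles many more cases, such as void elements and nested cards.

```python
from html.parser import HTMLParser

# hCard property names that appear in the slide's markup (a subset of the vocabulary).
HCARD_PROPERTIES = {"fn", "org", "street-address", "locality", "postal-code", "country-name"}

SLIDE_MARKUP = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class attribute names an hCard property."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # property name -> text content
        self._stack = []      # one entry per open element: the property names it carries

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in HCARD_PROPERTIES])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Text is assigned to the properties of the innermost open element only.
        for name in self._stack[-1]:
            self.properties[name] = text


parser = HCardParser()
parser.feed(SLIDE_MARKUP)
card = parser.properties
print(card)
```

The same visible text now yields named fields (`locality`, `postal-code`, and so on), which is the point of embedding metadata in HTML: the page looks unchanged to a human reader while a program can tell the postcode from the city.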
![Page 17: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/17.jpg)
Metadata in HTML
• Schema.org
Semantically enhanced Information Retrieval:
an ontology-based approach
http://people.kmi.open.ac.uk/miriam/about/
![Page 18: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/18.jpg)
Metadata in HTML
• The Open Graph protocol
![Page 19: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/19.jpg)
Linked Data
Linking Open Data cloud diagram (snapshots from 2007, 2008, 2009 and 2010),
by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
![Page 20: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/20.jpg)
Linked Data
• An example
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbterm: <http://dbpedia.org/property/> .
dbpedia:Amsterdam
    dbterm:officialName "Amsterdam" ;
    dbterm:longd "4" ;
    dbterm:longm "53" ;
    dbterm:longs "32" ; …
http://data.semanticweb.org/person/miriam-fernandez/rdf
<ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
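Once data is expressed as triples, a program can combine property values directly. The sketch below holds the Amsterdam triples from the slide as plain Python tuples and derives a decimal longitude from them. It assumes that `longd`, `longm` and `longs` are the degrees, minutes and seconds of the longitude and that the value lies east of Greenwich. The slide states neither, and the helper names are illustrative only.

```python
# The Amsterdam triples from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:Amsterdam", "dbterm:officialName", "Amsterdam"),
    ("dbpedia:Amsterdam", "dbterm:longd", "4"),
    ("dbpedia:Amsterdam", "dbterm:longm", "53"),
    ("dbpedia:Amsterdam", "dbterm:longs", "32"),
]


def properties_of(subject, triples):
    """Return a predicate -> object mapping for every triple about `subject`."""
    return {p: o for s, p, o in triples if s == subject}


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees, minutes and seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


props = properties_of("dbpedia:Amsterdam", TRIPLES)
# Assumption: longd / longm / longs are degrees / minutes / seconds, east positive.
longitude = dms_to_decimal(
    float(props["dbterm:longd"]),
    float(props["dbterm:longm"]),
    float(props["dbterm:longs"]),
)
print(f"{props['dbterm:officialName']}: {longitude:.4f} degrees east")
```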
![Page 21: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/21.jpg)
Open Government
• Data.gov
• Data.gov.uk
• Many others…
Research Funding Explorer
![Page 22: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/22.jpg)
BBC
• Programs
• Music
• Artist
• World Cup
Who won it? ;)
![Page 23: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/23.jpg)
Open University
[Diagram: OU data sources (ORO; Archive of Course Material; Library’s Catalogue of Digital Content; OpenLearn Content; A/V Material, Podcasts, iTunesU; Data from Research Outputs) linked to external datasets (BBC, DBPedia, DBLP, RAE, geonames, data.gov.uk)]
• Currently: OU public data sit in different systems, which makes them hard for users to discover, obtain and integrate.
• Exposed as linked data, our data interlink with each other and with the external world, and become part of the “global data space” on the Web.
![Page 24: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/24.jpg)
Data.open.ac.uk
![Page 25: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/25.jpg)
The Value
• Recognized as a critical step forward for the HE sector in the UK
–Favours transparency and reuse of data, both externally and internally
–Reduces the cost of dealing with our own public data
–Enables new kinds of applications and makes those that are already feasible more cost-effective
![Page 26: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/26.jpg)
![Page 27: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/27.jpg)
The Value
• Linking educational material across universities
http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm
![Page 28: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/28.jpg)
The Value
• Exploring research communities
![Page 29: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/29.jpg)
The Value
• And many others…
![Page 30: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/30.jpg)
Conclusions
• We have reached the Data Era
–Production: there is currently more than a Zettabyte of information in the digital world, and it is growing rapidly
–Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data
–Challenges: provide machines with the capabilities to extract the knowledge for us!
![Page 31: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/31.jpg)
Conclusions
• Many more challenges ahead…
–Different formats (text vs. multimedia)
–Different dynamics (time / location)
–Different provenance
–Different topics (heterogeneous)
–Distributed, massive, streaming data
–Varying quality
–…
![Page 32: CAEPIA 2011](https://reader033.vdocuments.net/reader033/viewer/2022060108/554d96f3b4c90567188b5559/html5/thumbnails/32.jpg)
THX!
• Any ideas to make me rich? ☺
• Slide_share: http://www.slideshare.net/miriamfs
• Website: http://people.kmi.open.ac.uk/miriam/about/
• Twitter: @miri_fs
Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing some of their slides with me and
for their valuable comments on this presentation