Exhibit: lightweight structured data publishing
david huynh + david karger + rob miller
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY
2
good ol’ days ... early 1990s
3
sort
filter
search
4
5
early 1990s → 2007
6
PRESENTATION
HTML
Web Browser
File System
Static Files
Web Server
Images
JavaScript CSS
7
PRESENTATION
DATA
HTML
Web Browser
Database
File System
Static Files
Web Server
Images
MySQL / Postgres / Oracle
JavaScript CSS
8
PRESENTATION
LOGIC
DATA
HTML
JavaScript CSS
XML XSLT
SQL
Web Browser
XmlHttp
Database
File System
Static Files
Application Server
Web Server
ASP
ASP.NET
CGI
JSP/Java
PHP
Images
MySQL / Postgres / Oracle
9
publishing data is hard.
10
can Semantic Web technologies help?
11
duh!
obviously!
SW technologies are supposed to help!
12
what people want
what SW lets them do
sort
filter
search
13
14
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
• implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion
15
PRESENTATION
LOGIC
DATA
HTML
JavaScript CSS
XML XSLT
SQL
Web Browser
XmlHttp
Database
File Systems
Static Files
Application Server
Web Server
ASP
ASP.NET
CGI
JSP/Java
PHP
Images
MySQL / Postgres / Oracle
16
PRESENTATION
LOGIC
DATA
HTML
JavaScript CSS
XML XSLT
SQL
Web Browser
XmlHttp
Database
File Systems
Static Files
Application Server
Web Server
ASP
ASP.NET
CGI
JSP/Java
PHP
Images
MySQL / Postgres / Oracle
17
PRESENTATION
LOGIC
DATA
HTML
JavaScript CSS
Web Browser
XmlHttp
File Systems
Static Files
Web Server
Images
Exhibit API
18
data
Exhibit API
database
expression language
localization
images
css
views
lens
template
facets
exporters
importers
HTML+
Images+
CSS+JS
web browser
HTML
JS
DOM
data
exports
<script src="....../exhibit-api.js"></script>
19
data
Exhibit API
web browser
presentation
sorting filtering maps
timelines
my users
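As a rough illustration of the in-browser sorting and filtering listed on this slide, the sketch below counts facet values, filters items by a selected value, and sorts the result, all on a plain array held in memory. It is not Exhibit's implementation or API: the function names (`valuesOf`, `facetCounts`, `filterByFacet`, `sortBy`) and the two "Example" items are hypothetical, and only the first item is borrowed from the movie in the talk's data feed.

```javascript
// Hypothetical in-memory items. The first mirrors the movie in the talk's
// data feed; the other two are made up for illustration.
const items = [
  { label: "Lord of the Rings: The Return of the King", genre: ["Drama", "Epic"], rating: 4 },
  { label: "Example A", genre: "Drama", rating: 2 },
  { label: "Example B", genre: "Comedy", rating: 5 },
];

// Normalize a property to an array, since it may hold one value or many.
function valuesOf(item, property) {
  const value = item[property];
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Count how many items carry each value of a property
// (roughly what a facet would list next to each choice).
function facetCounts(items, property) {
  const counts = new Map();
  for (const item of items) {
    for (const value of valuesOf(item, property)) {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return counts;
}

// Keep the items that have at least one of the selected values.
function filterByFacet(items, property, selected) {
  const wanted = new Set(selected);
  return items.filter((item) => valuesOf(item, property).some((v) => wanted.has(v)));
}

// Return a sorted copy, ascending by a single-valued property.
function sortBy(items, property) {
  return [...items].sort((a, b) => {
    if (a[property] < b[property]) return -1;
    if (a[property] > b[property]) return 1;
    return 0;
  });
}

const dramas = sortBy(filterByFacet(items, "genre", ["Drama"]), "rating");
console.log(dramas.map((item) => item.label));
```

Scanning the whole array on every interaction is the simplest possible approach, and it is only reasonable for the small data sets the talk has in mind.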
20
data formats
• JSON as default format
• http://simile.mit.edu/babel/
  • BibTeX
  • Excel spreadsheets
  • Tab-separated values
  • RDF/XML, N3
• Dynamic importers
JSON files
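To make the conversion step concrete, here is a minimal sketch of turning tab-separated values into a JSON data file. It is not Babel's code. The output shape (an object with an `items` array whose entries carry `type` plus one property per column) is an assumption about Exhibit's JSON format, since the slides do not show it, and `tsvToExhibitJson` is a hypothetical name.

```javascript
// Convert tab-separated text (first row = column names) into an object
// of the assumed form { items: [ { type, <column>: <value>, ... } ] }.
function tsvToExhibitJson(tsv, itemType) {
  const lines = tsv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return { items: [] };

  const headers = lines[0].split("\t").map((header) => header.trim());
  const items = lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const item = { type: itemType };
    headers.forEach((header, i) => {
      const cell = (cells[i] || "").trim();
      // Skip empty cells so the item only lists properties it really has.
      if (cell !== "") item[header] = cell;
    });
    return item;
  });
  return { items };
}

// Usage with a made-up two-row table.
const sampleTsv = "label\tgenre\nExample A\tDrama\nExample B\t\n";
const converted = tsvToExhibitJson(sampleTsv, "Movie");
console.log(JSON.stringify(converted, null, 2));
```

All values stay strings here; a real converter would also need to handle quoting, numbers, and multi-valued cells.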
21
22
JSONP data feed
gdata.io.handleScriptLoaded({
  ...
  "entry": [
    {
      "id": { "$t": "http://spreadsheets.google.com/feeds/list/.../od6/public/basic/cokwr" },
      "updated": { "$t": "2007-04-16T18:41:56.378Z" },
      "category": [
        {
          "scheme": "http://schemas.google.com/spreadsheets/2006",
          "term": "http://schemas.google.com/spreadsheets/2006#list"
        }
      ],
      "title": { "type": "text", "$t": "Lord of the Rings: The Return of the King" },
      "content": {
        "type": "text",
        "$t": "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom to destroy the One Ring., {rating:number}: 4"
      },
      ...
    },
    ...
  ]
})
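The `content` string in this feed packs a whole spreadsheet row into one text value, so an importer has to split it back into properties. The sketch below shows one way that could work. It is not Exhibit's importer, and it rests on assumptions read off this single example: that `{name}:` starts a property, that `;` separates multiple values, that `:single` means "do not split on `;`", and that `:number` means "parse as a number". `parseFeedContent` is a hypothetical name.

```javascript
// Split a feed "content" string into an item object, under the assumptions
// stated above. Values may contain commas, so the text is cut at each
// "{name}:" marker rather than at commas.
function parseFeedContent(text) {
  const marker = /\{([^}:]+)(?::([^}]+))?\}:\s*/g;
  const found = [];
  let match;
  while ((match = marker.exec(text)) !== null) {
    found.push({
      name: match[1],
      kind: match[2] || "text",
      start: match.index,
      valueStart: marker.lastIndex,
    });
  }

  const item = {};
  found.forEach((entry, i) => {
    // A value runs up to the next marker, or to the end of the string.
    const end = i + 1 < found.length ? found[i + 1].start : text.length;
    const raw = text.slice(entry.valueStart, end).replace(/,\s*$/, "").trim();

    if (entry.kind === "single") {
      item[entry.name] = raw;
    } else if (entry.kind === "number") {
      item[entry.name] = Number(raw);
    } else {
      const parts = raw.split(";").map((part) => part.trim()).filter((part) => part !== "");
      item[entry.name] = parts.length === 1 ? parts[0] : parts;
    }
  });
  return item;
}

// The content string from the slide ("\u0026" is the escape for "&").
const content =
  "{type}: Movie, {genre}: Drama; Epic, {plot:single}: The former Fellowship of the Ring " +
  "prepare for the final battle for Middle Earth, while Frodo \u0026 Sam approach Mount Doom " +
  "to destroy the One Ring., {rating:number}: 4";

const movie = parseFeedContent(content);
console.log(movie);
```

A value that itself contains text shaped like `{name}:` would confuse this parser; the sketch ignores that case.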
23
scalability
• JavaScript is slow, not designed for implementing DBs
• Recommended for < 500 items
  • Some people have been brave: 2733 items or more
• Not a limitation per se
  • Exhibit is intended for small data sets
24
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
• real world uses + discussion
• related work
• future work
• conclusion
25
26
27
28
29
30
31
32
33
oops!
34
35
36
37
38
39
someone is planning a wedding using Exhibit
40
presentations
company members
software tools
restaurants
3 recipes
radio albums
installed fonts
hotels near a dance event
dogs for adoption
lego sets
dances, costumes, performances
breweries and distilleries
Kansai dialect field study data
world conflicts
wedding attendees
41
presentations
company members
software tools
restaurants
3 recipes
radio albums
installed fonts
hotels near a dance event
dogs for adoption
lego sets
dances, costumes, performances
breweries and distilleries
Kansai dialect field study data
world conflicts
wedding attendees
If Semantic Web researchers were to build a web site with data,
what topic would the data be about?
42
scientific papers
43
The Long Tail
information topics
quantity or popularity
publications
merchandise
movies
photos
news
events
software
lego sets
Israeli folk dance videos
breweries and distilleries in Ontario 1914 - 1915
free labor in addition to grad students
✓ ✓ dormant data publishers
44
reuse without scraping
have fun!
45
outline
✓ problem: publishing data is too hard
✓ demo: using Exhibit to publish data in 10 min
✓ implementation: how Exhibit works
✓ real world uses + discussion
• related work
• future work
• conclusion
46
HTML
Ruby on Rails
Flickr
Google Base
DabbleDB
Freebase
customized Semantic MediaWiki extension
wiki, blog
Semantic MediaWiki extension
circle size = amount of effort
flexibility of presentation
flexibility of data modeling
Related Work
custom 3-tier web app
Exhibit
47
Exhibit
personal
Semantic MediaWiki extension
group world
Freebase
data ownership
personal blog
personal web space
wiki Wikipedia
unstructured
structured
Related Work
DBpedia
YAGO
DabbleDB Google Base
48
related work
• database in JavaScript
  • TimBL’s Tabulator
    • generic browsing interface
    • for data consumers to do mash-ups
  • Exhibit
    • customizable publishing framework
    • for data publishers
49
future work
• feature requests
  • more views: calendar, histogram, ...
  • more flexible layouts
  • visual synchronization, e.g., color coding
  • value formats, e.g., $(6,000)
  • localization
• if there will be a lot of exhibits, let people...
  • search over them
  • merge them together
50
future work
• authoring interface
  • HTML got us so far...
    • WYSIWYG editors got us further
  • Exhibit will get us so far...
    • A front-end to Exhibit will get us further
51
conclusion
• many dormant data publishers in the long tail
  • ... with few resources to publish data
• Exhibit
  • answer real world needs of publishing data
  • as easy and expressive as HTML
  • tap the free labor in the long tail
  • produce data that doesn’t have to be scraped
  • build a Data Web representative of the Web
52
google for “exhibit”