Free your Data: Instant Gratification with the Semantic Web David Karger

Upload: kelley-stevens

Post on 26-Dec-2015

Page 1: Free your Data: Instant Gratification with the Semantic Web David Karger

Free your Data: Instant Gratification with the Semantic Web

David Karger

Why everyone should be their own database administrator, UI designer, application developer, and web site builder, and how they can

David Karger

A Semantic Web Vision

• Autonomous computational agents perform sophisticated information tasks on behalf of their human users

• Use data that is annotated with rich semantics

– Ontologies that explain precisely what the data means

– Schema annotations that explain how to align multiple ontologies

– Rules that explain how new data can be formally derived from existing data

– Inference systems that put it all together

– Lots of logicians and AI researchers developing tools

• This vision is frightening

– Involves solving problems that have bedeviled AI for decades

– Often used to attack the semantic web

– Or to argue to slow down deployment

* “we can’t put up that data until we have an ontology!”

Aim Lower: the Semimantic Web

• Not “make computers help” but “make them not hinder”

– “First, do no harm”

• Create a tiny bit of structure:

– Name objects (with URLs)

– Record named relations between them

– No semantics on relations

– No schemas

– No inference

• This is both

– Technically simple

– Immediately useful

• You should do it

– And you can, right now

Why Applications?

• Typical user tasks require interaction with multiple pieces of information

– Display

– Explore

– Query

– Manipulate

• Applications bring together the data, specialized views, and operations necessary to perform tasks

• Irrelevant info

– Distracting

– Covers up more important info

• Artist

– Of dance, not music

– ID3v2 added “Composer”

– Shown in wrong place

• No “difficulty” field

– Place in comment field

– Uses up the field

– Where to put “tempo”?

• Menu of genre choices

– My genre (of dance, not music) missing

– ID3v2 lets user add

Summary of Problems

• Application has fixed idea of “right” data

– Both properties and values for them

• And right way to display that data

• User wants to “stretch” the app to their needs

– Cannot hide irrelevant data

– Cannot incorporate new kinds of data

– Cannot change how data is presented

• Perhaps just use generic comment field?

– Add what you want

– Format how you want

• Properties have structure

– Used for layout

– And for browsing

Sometimes, one application isn’t enough

• Applications inappropriately partition task

– Because task wasn’t planned for in application design

• No application has all the necessary data, operations

– Need to launch several to do task

• Each includes unneeded data, operations

– Clutter distracts from what you need to see

• Can’t work with data “across” application boundaries

– Can’t record or view data connections

– Have to find it again in second application

– Or enter it manually a second time

* Type budget numbers on Post-its to move them to the other application

Why?

• Building applications is hard

– Done by expert few for the many

– They determine which data, views, operations are useful

• Applications are “mass produced”

– Everybody gets the same one

– And only built for large markets

– Word processor, email, photo album, …

• Problem: different people want different applications

– Basket weaving, UFO sightings, junkyard management

– Want to work with unusual information

– Want to see, navigate, manipulate it “their way”

• Developers can’t afford to build these boutique applications

What about the Web?

• Anything can get a URL

• Anything can go in a page, linked to anything

– Common to “schematize on the fly”, making lists of interesting properties/values

• Support for orienteering

– Scan list of choices

– Pick the one that seems to lead in the right direction

– Fact: people orienteer even when there’s an easy query that is faster

– On web, never bounce off an application boundary

Downside

• Hard to author

– Especially if I want to record lots of complex data

• Hard to manipulate, do complex queries

– HTML loses meaning of data

– Can’t “switch to tabular view”

• That’s why web sites are backed by databases

– Data is kept structured to support complex queries

– Templating engines convert to human readable presentation

• End users aren’t going to manage this kind of web site

• Gives powerful operations, but only “inside” web site

– User may discover need to cross site boundaries

– Like applications, web sites create (possibly wrong) data partitions

– So all the problems with applications apply here too
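
The database-plus-template pattern on this slide can be sketched in a few lines. This is a minimal illustration, not any particular site's code: the record, its field names, and the HTML template are assumptions made up for the example.

```python
from string import Template

# Hypothetical structured record, as a site's database might hold it.
MOVIES = [
    {"title": "Superman", "venue": "Loew's", "time": "8PM"},
]

# Hypothetical template: turns one record into one HTML list item.
ROW = Template("<li><b>$title</b> at $venue, $time</li>")


def render(records):
    """Flatten structured records into HTML for human readers."""
    items = "\n".join(ROW.substitute(record) for record in records)
    return "<ul>\n" + items + "\n</ul>"


def titles_at(records, venue):
    """A query that is trivial on the records but awkward on the HTML."""
    return [r["title"] for r in records if r["venue"] == venue]


html = render(MOVIES)
print(html)
print(titles_at(MOVIES, "Loew's"))
```

The rendered page no longer says which piece of text is the venue and which is the time; that knowledge stays behind on the server, which is the sense in which HTML loses the meaning of the data.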

Not just music

• Scientific research generates masses of data

– E.g. Bioinformatics

• Others want to access that data

• Big standards bodies meet to decide on community standard formats and systems under which everyone will distribute data

• When a scientist wants to try or report something new, or needs data from outside the community, they are stuck.

Information Wants to be Free

• Applications and Web Sites make assumptions about how their data will be used

• Those assumptions are hard-coded into the interaction with the data

• But no developer can predict all uses of the data

• Fixed interfaces prevent data repurposing

• Solution: give direct access to the data

• Just set up a SQL server?

– (A long-running screed of the DB community)

But it Can’t be Just about the Data

• People need to look at the data

– (unless we figure out those autonomous agents…)

• And need to create it in the first place

• Apps and template-driven web sites give us nice interfaces for interacting with the data they manage

• But if we use them we can’t repurpose the data

• And what interface can we use for the repurposed data?

• Web needed a server (of data) and a client (to show it)

• How do we make viewing, authoring, and repurposing arbitrary data as easy as viewing and authoring web pages?

– Without knowing precisely what data people will want to view or how they will want to view it?

Example: Piggy Bank

• I need data from more than one web site

• And I need to look at it differently than any web site

• What is minimum necessary support?

• Piggy Bank: A Firefox plugin for navigating structured data

• Find some movies

• Free that data

• Show it a different way

• Combine it with other sources

Mash Ups?

• Developer decides to integrate data from multiple sites

• Writes programmatic “scrapers” – reverse the web site’s templating process to recover data

• Combines resulting data structures

• Presents using their own template driven web site

– Thus guilty of same sin as the one they are fighting

– I only get the mash-ups a programmer decides to create

• Piggy Bank lets end users do their own mashing
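
A scraper in this sense just runs a site's template backwards. The sketch below is illustrative only: the page markup and field names are hypothetical, and it is not how Piggy Bank's own scrapers are written.

```python
import re

# Hypothetical page, as produced by some site's template.
PAGE = "<ul>\n<li><b>Superman</b> at Loew's, 8PM</li>\n</ul>"

# The assumed template, reversed: each named group recovers one field.
ROW = re.compile(
    r"<li><b>(?P<title>.*?)</b> at (?P<venue>.*?), (?P<time>.*?)</li>"
)


def scrape(html):
    """Recover structured records from template-generated HTML."""
    return [match.groupdict() for match in ROW.finditer(html)]


records = scrape(PAGE)
print(records)
```

The pattern is tied to one site's exact markup, so it breaks whenever the site changes its template. That fragility is one reason to write a scraper once and share it, or to have the site expose its data directly.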

Data Model

RDF

• W3C standard

• Minimum data model

– URL for arbitrary objects

– Arbitrary named links between two objects

– No schemas

• Much like the web, except

– URLs need not be web pages

– Machine readable “anchor text” in links

• Yet powerful

– Relations are natural/universal

– Represent a semantic network

[Figure: example semantic network. Node labels: Superman, Loew’s, Kendall Sq., 8PM, Movie, Theater. Edge labels: title, venue, location, time, type.]
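
The data model above needs nothing more than a set of (subject, property, value) statements. Here is a minimal sketch in plain Python rather than a real RDF library. The namespace, the node identifiers `showing1` and `loews`, and the exact shape of the graph are assumptions chosen to resemble the movie example.

```python
EX = "http://www.example.org/ns#"  # hypothetical namespace for the example

# Each statement is a named link from one object to another object or value.
GRAPH = {
    (EX + "showing1", EX + "type", EX + "Movie"),
    (EX + "showing1", EX + "title", "Superman"),
    (EX + "showing1", EX + "time", "8PM"),
    (EX + "showing1", EX + "venue", EX + "loews"),
    (EX + "loews", EX + "type", EX + "Theater"),
    (EX + "loews", EX + "location", "Kendall Sq."),
}


def follow(graph, start, *predicates):
    """Walk a chain of named links from a starting object."""
    current = {start}
    for predicate in predicates:
        current = {o for s, p, o in graph if s in current and p == predicate}
    return current


# Where is the movie showing? Follow "venue", then "location".
where = follow(GRAPH, EX + "showing1", EX + "venue", EX + "location")
print(where)

# No schema: a brand-new property can be added without declaring it anywhere.
GRAPH.add((EX + "showing1", EX + "rating", "4 stars"))
```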

Are we done?

• Is RDF the only answer?

– SQL/Tuples, XML can represent same info

– So any would do

– And user shouldn’t have to know which we’ve chosen

– But RDF is easiest to create sloppily, incrementally

* So best suited to let enthusiasts create some

– And imposes fewest requirements to be “compatible”

• Is RDF the whole answer?

– Still unclear how to interact with it

Visualization

Lenses

• If data is amorphous, monolithic UI won’t do

– Can’t know in advance what kind of data we’ll need to display

– Or what user will want to do with that data

• Let each type come with “view prescription”

– “To display a document, show its title, author, and abstract”

– “To display a person, show his name and affiliation”

– Specifies properties to show, and “decoration” (fonts, layouts)

• After you get the data, assemble lenses to show it

– (recursively)

• Lenses are described in RDF

– So they can be collected, repurposed like any other data
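
The “view prescription” idea can be sketched with lenses held as plain data: for each type, the list of properties to show. The dictionary layout and the sample records below are assumptions for illustration. Real lenses are described in RDF and also carry decoration, which this sketch omits.

```python
# Hypothetical records, keyed by identifier.
DATA = {
    "doc1": {
        "type": "Document",
        "title": "Free your Data",
        "author": "person1",
        "abstract": "Why everyone should be their own database administrator",
    },
    "person1": {"type": "Person", "name": "David Karger", "affiliation": "MIT"},
}

# One lens per type: which properties to show, in which order.
LENSES = {
    "Document": ["title", "author", "abstract"],
    "Person": ["name", "affiliation"],
}


def render(data, lenses, obj_id, depth=0, seen=frozenset()):
    """Show an object through the lens for its type, recursing into values."""
    obj = data[obj_id]
    seen = seen | {obj_id}  # guard against cycles in the data
    indent = "  " * depth
    lines = []
    # With no lens for this type, fall back to showing every property.
    for prop in lenses.get(obj.get("type"), sorted(obj)):
        value = obj.get(prop)
        if isinstance(value, str) and value in data and value not in seen:
            lines.append(f"{indent}{prop}:")
            lines.extend(render(data, lenses, value, depth + 1, seen))
        else:
            lines.append(f"{indent}{prop}: {value}")
    return lines


view = render(DATA, LENSES, "doc1")
print("\n".join(view))
```

Because `LENSES` is itself just data, it can be edited, published, and swapped without touching `render`.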

Fresnel

dsp:publicationLens rdf:type :Lens ;
    :classLensDomain ow:Publication ;
    :group gr:group ;
    :purpose :defaultLens ;
    :showProperties ( dc:description dc:identifier dc:creator
                      dc:contributor dc:date dc:subject
                      dc:type dc:publisher dc:rights ) .

dsp:rightsFomat rdf:type :Format ;
    :group gr:group ;
    :propertyFormatDomain dc:rights ;
    :propertyStyle "dspace-rights" .

Benefits

• Data collected from anywhere can be viewed together

– Each piece of data with its own lens

• Lenses are described, not programmed

– Enthusiasts can write their own

– (especially if we give them WYSIWYG tools)

– No need to build a template driven web site

– Just edit, publish some lenses

Manipulation

Application Development by End Users

• People want applications to manipulate their data

• But applications only manipulate developer’s data

• So let end users build their own

• Use lenses, but refract in both directions

– Lenses describe how to map data to presentation

– Invert, interpret manipulation of presentation as manipulation of data

* (extend lenses to talk about click, drag, drop)

• Operations represented as web services

– Internal and remote operations

– Receive RDF data and act on it
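
“Refracting in both directions” means the mapping from data to presentation can be inverted, so an edit to the presentation becomes an edit to the data. A minimal sketch follows. The `FieldLens` class, its method names, and the sample record are hypothetical, and a text edit stands in for click, drag, and drop.

```python
class FieldLens:
    """Maps one property of a record to a line of text, and back."""

    def __init__(self, prop, label):
        self.prop = prop
        self.label = label

    def show(self, record):
        # Data -> presentation.
        return f"{self.label}: {record[self.prop]}"

    def put(self, record, presented):
        # Presentation -> data: read an edited line back into the record.
        label, sep, value = presented.partition(": ")
        if not sep or label != self.label:
            raise ValueError(f"not a {self.label!r} line: {presented!r}")
        updated = dict(record)  # leave the original record untouched
        updated[self.prop] = value
        return updated


song = {"title": "Example Waltz", "tempo": "90"}
tempo_lens = FieldLens("tempo", "Tempo")

shown = tempo_lens.show(song)                  # what the user sees
edited = tempo_lens.put(song, "Tempo: 120")    # what the user typed over it
print(shown, "->", tempo_lens.show(edited))
```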

The Big Picture

Sufficient for Nice Applications?

• Application design is impoverished

– Divide up the screen

– Put an object in each piece

– Show properties of each object

– With pretty formatting

– Put operations in menus

– And add some toolbars to save time

• This application “vocabulary” is limited enough

– To be described instead of programmed

– So it can be edited by end users

Workspace Designer

• Editing mode for applications

• Define regions of screen

– By splitting existing regions

• Resize regions

• Specify content of each region

– Object to be shown (drag and drop object)

– Lens to use to show object (menu of relevant lenses)

– Operations to make available on object (drag operations)
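
A workspace built this way is a small tree: each region is either split into sub-regions or holds an object, a lens, and some operations. The sketch below is one possible representation. The `Region` class and its field names are assumptions, not the designer's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    obj: Optional[str] = None         # object shown in this region
    lens: Optional[str] = None        # lens used to show it
    operations: List[str] = field(default_factory=list)
    direction: Optional[str] = None   # set once the region has been split
    children: List["Region"] = field(default_factory=list)

    def split(self, direction):
        """Split into two sub-regions; existing content moves to the first."""
        if self.children:
            raise ValueError("region is already split")
        first = Region(self.obj, self.lens, list(self.operations))
        second = Region()
        self.obj, self.lens, self.operations = None, None, []
        self.direction = direction
        self.children = [first, second]
        return first, second

    def leaves(self):
        """All regions that actually display something."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Start with one region showing a paper, then add a to-do region beside it.
workspace = Region(obj="paper", lens="document")
paper_region, todo_region = workspace.split("vertical")
todo_region.obj = "things to do"
todo_region.lens = "list"
todo_region.operations.append("mark done")
print([(r.obj, r.lens) for r in workspace.leaves()])
```

Since the whole workspace is a description rather than a program, editing an application amounts to editing this tree.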

Writing a Brain Research Paper

Adding “Things to Do” Region

Revised Application

Lens Designer

• Specify how a particular object can be shown

• Similar to workspace designer

– Lens is “workspace” for viewed object

• Subdivide canvas

• Specify property to show in each region

• Specify lens for value of each property

Drug Discovery Dashboard

http://www.w3.org/2005/04/swls/BioDash

Topic: GSK3beta Topic

Target: GSK3beta

Disease: DiabetesT2

Alt Dis: Alzheimers

Cmpd: SB44121

CE: DBP

Team: GSK3 Team

Person: John

Related Set

Path: WNT

• Lenses can aggregate, accentuate, or even analyze new result sets

• Behind the lens, the data can be persistently stored as RDF-OWL

• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references

Bridging Chemistry and Molecular Biology

Pathway Polymorphisms

• Merge directly onto pathway graph

• Identify targets with lowest chance of genetic variance

• Predict parts of pathways with highest functional variability

• Map genetic influence to potential pathway elements

• Select mechanisms of action that are minimally impacted by polymorphisms

Non-synonymous polymorphisms from db-SNP

Clinical Dashboard

• Gene Expression Data

• Additional relations and aspects can be defined: Online Mendelian Inheritance in Man (OMIM)

Diseased Tissue

Links to OMIM (RDF)

Bar View Lens for Gene Expression

ClinDash: Clinical Trials Browser

Clinical Obs

Expression Data

Subjects

• Values can be normalized across all measurables (rows)

• Samples can be aligned to their subjects using RDF rules

• Clustering can now be done over all measurables (rows) and types

Shattering Applications

• Specific lenses may be too complex for end users to create

• But end users can

– Assemble these lenses into “applications”

– Decide at which data these lenses point

• Current application developers can build those views

– Much more modular

– Instead of building whole application, just build a lens and add to pool

– Repurposable lenses for repurposable data

• Simpler views can be built by non programmers

– Embedding the complex lenses as subparts

Sharing

Semantic Bank

• Tools directly collect and manipulate RDF

– So sharing just requires publishing the RDF back

• Semantic Bank is just a big RDF repository

– GET a resource to fetch the (XML encoding of) RDF about it

– Similarly, upload an XML encoding of the RDF:

* POST /semantic-bank/foo?command=upload&format=rdfxml HTTP/1.1
  Host: bank.example.org
  Content-Length: 317

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
   <rdf:Description rdf:about="http://www.example.org/ns#item12345">
    <rdfs:label>An Example</rdfs:label>
    <rdf:type rdf:resource="http://www.example.org/ns#Thing"/>
   </rdf:Description>
  </rdf:RDF>
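
The upload shown above can be assembled with Python's standard library. This sketch only builds the request and does not send it. The host, path, and query string come from the slide's example. The `http` scheme and the `Content-Type` header are assumptions, since the slide shows only `Host` and `Content-Length`.

```python
from urllib.request import Request

# Host and path as in the slide's example; the scheme is assumed.
BANK_URL = "http://bank.example.org/semantic-bank/foo"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
 <rdf:Description rdf:about="http://www.example.org/ns#item12345">
  <rdfs:label>An Example</rdfs:label>
  <rdf:type rdf:resource="http://www.example.org/ns#Thing"/>
 </rdf:Description>
</rdf:RDF>"""


def upload_request(bank_url, rdf_xml):
    """Build (but do not send) the POST that publishes RDF to a bank."""
    return Request(
        bank_url + "?command=upload&format=rdfxml",
        data=rdf_xml.encode("utf-8"),
        # Assumed header: the usual media type for RDF/XML.
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


request = upload_request(BANK_URL, RDF_XML)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the upload against a real bank. `bank.example.org` is a placeholder host, so the sketch stops short of that.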

Getting There

What’s wrong?

• It seems obvious: RDF lets anyone

– Ignore web site and application boundaries

– Gather data they need

– Define their own new attributes and relationships

– Look at it the way that they need

– Manipulate it

– Publish it back for others to use, without having to manage a web site

• So why don’t we already have it?

Cost of Getting Started?

• Web:

– Download/run a web server (hardest part, happens only once)

– Download a web browser

– Write a web page

• Semantic Web

– Install database, define schemas

– Add middleware layer

– Create templating engines

– Develop ontologies, data import protocols

– …

• Semimantic web

– Post some RDF (written in N3) to a semantic bank

– Install Piggy Bank
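
“Posting some RDF” can start from nothing more than a list of statements written out one per line. The sketch below emits the simplest line-per-statement form, which is also valid N3. The rule that treats any string starting with `http://` or `https://` as a URI is a shortcut assumed for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://www.example.org/ns#"  # example namespace

TRIPLES = [
    (EX + "item12345", RDFS + "label", "An Example"),
    (EX + "item12345", RDF + "type", EX + "Thing"),
]


def term(value):
    """Write a URI in angle brackets, anything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_n3(triples):
    """One 'subject property value .' statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


text = to_n3(TRIPLES)
print(text)
```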

Absence of Schemas?

• What good is it to put up RDF without explaining all the properties?

• What happens when different people put up “mismatched” data with different (explicit or implicit) schemas?

• What if there are multiple URLs for the same thing, with inconsistent statements about them?

• How can I use data I collected from somewhere else, if it doesn’t have the same schema as mine?

• But designing schemas is hard

– Requires big committees, lots of meetings, deliberation, buy-in

Data First, Schema later (if ever)

• Need for schemas is a fallacy, blocking progress

• Each site is likely consistent with itself

• And will likely “go with the crowd” and be consistent with others

• If not, let users (not machines) translate

– Mapping properties to properties

– As needed, from site to site

* (or site to personal repository)

– Typically only need to blend a few sites
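
Letting users translate can be as small as a property-to-property table supplied when two sources are blended. In this sketch the two sites, their property names, and the mapping are all hypothetical.

```python
# Two hypothetical sites describing the same movie with different properties.
SITE_A = [{"title": "Superman", "venue": "Loew's"}]
SITE_B = [{"name": "Superman", "stars": "4"}]

# The user's translation, written once when the need arises.
B_TO_A = {"name": "title", "stars": "rating"}


def translate(records, mapping):
    """Rename properties; anything not in the mapping is left as it is."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


def blend(key, *sources):
    """Merge records from several sources that agree on one key property."""
    merged = {}
    for records in sources:
        for record in records:
            merged.setdefault(record[key], {}).update(record)
    return merged


blended = blend("title", SITE_A, translate(SITE_B, B_TO_A))
print(blended)
```

No shared schema was agreed in advance. The mapping covers only the two properties this user cared about, and it could be published for others to reuse.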

There’s no RDF?

• Database backed servers can easily expose RDF, if they want to

– E.g., citeseer.csail.mit.edu

– Import into Piggy Bank

– Browse, query, search in interesting ways

– Maintain collections of references

• If server won’t cooperate, scrape

– Piggy Bank has a scraper repository

– One person writes scraper, everyone uses

– Or, one scrapes and publishes to semantic bank, others get from bank

– Also unsupervised machine learning approaches

Clogs and Plogs

• Much blogging is about recycling content

• Clogs (Content Blogs) can manually merge data

– Blogger locates sources of data that ought to be in their schema

– Invests work to align properties and instances

– Publishes resulting single (schema unified) blob of data

– No front end

• Plogs (Presentation Blogs) display data

– Develop interesting lenses

– Point them at clogger content

– Someone else’s back end

• Separate front and back ends into different web sites

Chicken and Egg

• RDF-aware clients useless without data, and vice versa

• What can prime the pump?

Research Projects

• Many of our projects generate interesting data

• Then present through one interface

– E.g., NLP, speech

• Instead, post it to the semantic bank

– Others will find new uses for the data

• Other projects consume data

– Get it from the bank

• Let’s talk…

Conclusion

• We have the tools to separate data from presentation

– RDF repositories

– Lenses to display arbitrary data in arbitrary combination

• Doing so would offer substantial benefits

– Application barriers go away

– Anyone can create interesting content

– People can repurpose it to their own specific needs

• Semantic Web can be lightweight

– Low cost of deployment

– Immediate benefit

– All we need do is ignore semantics

• Haystack.csail.mit.edu

• Simile.mit.edu