David Karger Qing

Upload: alexia-morris

Posted on 29-Jan-2016


Masses' Data: Helping Regular People Communicate on the Web through Rich Interactive Data Visualizations

David Karger Qing

The Question
How can regular users better manage information?

The Context
Documents
  Total flexibility to create and view
  But no structured data processing
Applications
  Structure data
  Powerful interfaces optimized to specific tasks
  But only for fixed, developer-chosen schemas and tasks
Spreadsheets
  Arbitrary data structuring
  But terrible UI for specific tasks

Goal: end-user tools with the data flexibility of spreadsheets, the interface/processing power of applications, and the easy visual flexibility of documents

The Microsoft Angle
Start with Bing's structured data repositories
Let end users create their own interfaces to that data
  Tuned to a specific task
  Surfacing entities and properties they care about
  But reflecting personal taste
For personal applications
  Build my own vacation planner
  Present my favorite art from museums worldwide
And business
  Make my own storefront catalog
Feed users' data work back into Bing

Some Web History

Good old days ... early 1990s
Steve Ballmer

Blog, Forum, Wiki

1990

sort, filter, search, today, template

Today

Bifurcation
Professional web sites have evolved
  Rich visualizations
  Powerful interactive exploration and navigation
Plain user web sites haven't changed

Why?
Professional sites implement a rich data model
  Information stored in databases
  Extracted using complex queries
  Fed into templating web servers to create human-readable content
Rich structure supports rich interaction
  Rich, informative visualizations
  Filtering and sorting
Result: fancy, lively Web 2.0 sites

Plain authors left behind
Can't install/operate/define a database
Can't write the queries to extract the data
Limited to unstructured text pages (even in blogs and wikis)
  Less power to communicate effectively
  Less interest in publishing data

Plain authors left behind
sort, filter, search

Content Carriers
Sites designed to hold content of a specific type
  Photos on Flickr
  Videos on YouTube
  Recipes on Epicurious
  Book reviews on Amazon
  Friend lists and interests on Facebook
Data models and interfaces specialized to that type of data
  Developers define schemas, templates, workflows, etc.
Plain users can
  Contribute data to these content carrier repositories
  Benefit from structure when exploring/consuming that data

Content Carriers Constrain Creativity
I have to publish their way
What if I don't like their schema/theme/layout/organization?
  How can my wife show her books sorted chronologically by birthdate of the author?
  How can I let people filter my folk dance video collection by choreographer, tempo, and year choreographed?
  How does a biologist display his paradigm-changing gene taxonomy?
And there's no carrier for the really unusual stuff
  Where to put UFO sightings, sock collections, sea glass, Roman coin mints, early 20th-century Canadian taverns...?

Even Worse Between Sites
Content carriers are vertical data silos
I get rich interaction with data on one site
But what if I am interested in its connections to data on another site?
  Neither web site understands the other's data
  Neither can offer good interaction with the combined data
Response: Mashups
  Someone finds multiple web sites with info they want
  Writes programs to scrape (extract) data from each site
  Writes programs to merge data from multiple sites
  Programs a new (database-backed) web site to display the merged data
  Requires programming and managing a web site
  Result: another vertical web site
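The merge step of the mashup workflow (combine data scraped from several sites into one collection) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two record lists are hypothetical stand-ins for data already scraped from two sites, records are matched on a shared `title` key, and `merge_on_key` is an illustrative name. Scraping itself and the database-backed display site are left out.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

def merge_on_key(key: str, *sources: List[Record]) -> List[Record]:
    """Merge record lists from several sites on a shared key.

    Records with the same key value are combined into one dict; if two sources
    give different values for the same property, the later source wins.
    """
    merged: Dict[Any, Record] = {}
    for source in sources:
        for record in source:
            # Start a fresh dict per key so the input records are not modified.
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())

# Hypothetical "scraped" data from two sites about the same dance videos.
site_a = [
    {"title": "Dance A", "choreographer": "Choreographer 1"},
    {"title": "Dance B", "choreographer": "Choreographer 2"},
]
site_b = [
    {"title": "Dance A", "tempo": "fast"},
    {"title": "Dance C", "tempo": "slow"},
]

merged = merge_on_key("title", site_a, site_b)
for row in merged:
    print(row)
```

Even this toy version is code someone has to write, run, and host, which is the burden the slide attributes to mashups.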

The Ideal
Democratize creation of rich data interaction
Anyone should be able to
  Create interesting data
  Or, find data on multiple web sites and combine it
  Create compelling, useful presentations of that data
    With rich visualization and interaction
  Share it easily with everyone else on the web
All without knowing
  How to program
  How to install a database
  What a schema is

HOW?

Most of the Web is CRUD
Most of what happens is direct manipulation of information
  Create information according to some model
  Read/explore/visualize/navigate using rich interfaces
  Update using editing interfaces
  Delete
True even on professional web sites
  Flickr, YouTube, Epicurious, Amazon, Facebook
  Sites are dumb storage
  Computation is left to the human users

Large payoff to democratizing just this much power
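To make the "most of the Web is CRUD" point concrete, here is a minimal in-memory sketch of the four operations over items that are plain dictionaries of properties. The `ItemStore` class and its method names are hypothetical, chosen for illustration. The store does no computation beyond storing and returning items, mirroring the "sites are dumb storage" observation.

```python
import itertools
from typing import Any, Dict, List, Optional

class ItemStore:
    """Hypothetical 'dumb storage': each item is a dict of property -> value."""

    def __init__(self) -> None:
        self._items: Dict[int, Dict[str, Any]] = {}
        self._ids = itertools.count(1)  # increasing item ids starting at 1

    def create(self, **properties: Any) -> int:
        item_id = next(self._ids)
        self._items[item_id] = dict(properties)
        return item_id

    def read(self, item_id: int) -> Optional[Dict[str, Any]]:
        item = self._items.get(item_id)
        return dict(item) if item is not None else None  # shallow copy

    def update(self, item_id: int, **changes: Any) -> None:
        self._items[item_id].update(changes)  # raises KeyError if missing

    def delete(self, item_id: int) -> None:
        self._items.pop(item_id, None)

    def list_all(self) -> List[Dict[str, Any]]:
        return [dict(item) for item in self._items.values()]


store = ItemStore()
photo = store.create(title="Beach", tags=["sea glass"])
store.update(photo, title="Beach at dawn")
print(store.read(photo))  # {'title': 'Beach at dawn', 'tags': ['sea glass']}
store.delete(photo)
print(store.read(photo))  # None
```

Everything interesting (choosing what to create, deciding what to look at) happens on the human side of these four calls.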

Approach
Publishing data is easy
  Just put a spreadsheet online
Identify key elements of interactive data visualizations
Add them to the HTML document vocabulary
  So they can be inserted like images or videos today
Configure them by binding them to underlying data
  Like charts in a spreadsheet

sort, filter, search, template

Image

Data

Data
Items (Recipes)
  Each has properties
    Title
    Source magazine
    Publication date
    Rating
    Ingredients
Publish a spreadsheet
  One item per row
  Columns for properties
Template
  Format per item
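The recipe data model (a spreadsheet with one item per row and one column per property, plus a template that formats each item) can be sketched with Python's standard `csv` module and `string.Template`. The column names and recipe rows are hypothetical, and `string.Template` simply stands in for whatever templating a real tool would use.

```python
import csv
import io
from string import Template

# Hypothetical published spreadsheet: one recipe per row, one column per property.
SPREADSHEET = """title,source,date,rating,ingredients
Lentil Soup,Magazine A,2008-01-15,4,lentils;onion;carrot
Apple Tart,Magazine B,2007-11-02,5,apple;flour;butter
"""

# A per-item template: which properties to show, and how.
ITEM_TEMPLATE = Template("$title ($source, $date): rated $rating/5; uses $ingredients")

def load_items(text):
    """Turn spreadsheet rows into items (dicts keyed by column header)."""
    items = []
    for row in csv.DictReader(io.StringIO(text)):
        # Multi-valued property: split the ingredient cell on semicolons.
        row["ingredients"] = row["ingredients"].split(";")
        row["rating"] = int(row["rating"])
        items.append(row)
    return items

def render(item):
    """Apply the template to a single item; keyword args override the mapping."""
    return ITEM_TEMPLATE.substitute(item, ingredients=", ".join(item["ingredients"]))

items = load_items(SPREADSHEET)
for item in items:
    print(render(item))
```

The author edits only the spreadsheet and the one-line template; every item's display follows from them.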

Views
Aggregate a collection
  Sortable list (here)
  Map
  Timeline
  Bar chart
  Thumbnail set
Bound to properties
  Which property to sort by?
  Which property to plot by?
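A view is configured by binding it to a property: which property to sort by for a list, which to plot by for a chart. A minimal sketch, assuming items are plain dicts; `sorted_list_view` and `bar_chart_view` are hypothetical names, and the "bar chart" is drawn as text bars rather than graphics.

```python
from collections import Counter
from typing import Any, Dict, List

Item = Dict[str, Any]

def sorted_list_view(items: List[Item], sort_by: str, descending: bool = False) -> List[str]:
    """Sortable list view: order items by the bound property, show their titles."""
    ordered = sorted(items, key=lambda item: item[sort_by], reverse=descending)
    return [item["title"] for item in ordered]

def bar_chart_view(items: List[Item], plot_by: str) -> List[str]:
    """Bar chart view: count items per value of the bound property, as text bars."""
    counts = Counter(item[plot_by] for item in items)
    return [f"{value}: {'#' * n}" for value, n in sorted(counts.items())]

recipes = [
    {"title": "Lentil Soup", "rating": 4, "source": "Magazine A"},
    {"title": "Apple Tart", "rating": 5, "source": "Magazine B"},
    {"title": "Bean Stew", "rating": 3, "source": "Magazine A"},
]

# The same collection, re-bound to different properties.
print(sorted_list_view(recipes, "rating", descending=True))
# ['Apple Tart', 'Lentil Soup', 'Bean Stew']
print(sorted_list_view(recipes, "title"))
# ['Apple Tart', 'Bean Stew', 'Lentil Soup']
print(bar_chart_view(recipes, "source"))
# ['Magazine A: ##', 'Magazine B: #']
```

Changing a view means changing one argument, much like picking a different column for a spreadsheet chart.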


Facets
Way to filter a collection
Specify some property
  E.g., ingredient
User clicks to pick some values
Collection restricted to items that match
Also text search
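Faceted filtering as described (pick a property, let the user click values, restrict the collection to matching items, plus text search) can be sketched as follows. Assumptions, all illustrative: items are dicts, a multi-valued property such as `ingredients` is a list, several selected values within one facet mean "match any", and selections on different facets combine with AND. These are common faceted-browsing conventions, not necessarily Exhibit's exact behavior.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Set

Item = Dict[str, Any]

def values_of(item: Item, prop: str) -> Set[Any]:
    """Return a property's values as a set; a list counts as several values."""
    value = item.get(prop)
    if value is None:
        return set()
    return set(value) if isinstance(value, list) else {value}

def apply_facets(items: Iterable[Item], selections: Dict[str, Set[Any]], text: str = "") -> List[Item]:
    """Keep items that match every non-empty facet selection and the text search."""
    needle = text.lower()
    result = []
    for item in items:
        # Within one facet any selected value matches; across facets all must match.
        if any(chosen and not (values_of(item, prop) & chosen)
               for prop, chosen in selections.items()):
            continue
        # Text search: case-insensitive substring match over all property values.
        haystack = " ".join(str(v) for prop in item for v in values_of(item, prop)).lower()
        if needle and needle not in haystack:
            continue
        result.append(item)
    return result

def facet_counts(items: Iterable[Item], prop: str) -> Dict[Any, int]:
    """Count items per value of `prop` (the numbers shown beside facet values)."""
    return dict(Counter(v for item in items for v in values_of(item, prop)))

recipes = [
    {"title": "Lentil Soup", "ingredients": ["lentils", "onion", "carrot"]},
    {"title": "Onion Tart", "ingredients": ["onion", "flour", "butter"]},
    {"title": "Apple Tart", "ingredients": ["apple", "flour", "butter"]},
]

print(facet_counts(recipes, "ingredients")["onion"])  # 2
onion = apply_facets(recipes, {"ingredients": {"onion"}})
print([r["title"] for r in onion])  # ['Lentil Soup', 'Onion Tart']
tarts = apply_facets(recipes, {"ingredients": {"onion", "apple"}}, text="tart")
print([r["title"] for r in tarts])  # ['Onion Tart', 'Apple Tart']
```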

Key Primitives of a Data Page
Data
  A spreadsheet
Templates
  Explain how to display a single item
  By describing what properties should be shown, and how
Views
  Ways of looking at collections of items
  Lists, thumbnails, maps, scatterplots
  Specify which properties determine layout
Facets
  Elements for filtering or sorting information based on its structure

Migration to the Web

Text search
Faceted Browsing
Sorting by Properties
Templated Items


Can people author these?
Data?
  Spreadsheets
Views?
  Spreadsheet charts
  Specify which columns play which roles in the view
Facets?
  Like views
  Specify which column to filter on
  Available in Excel
Templates?
  Document templates in MS Word

They just aren't doing it on the web yet

Exhibit
Proof-of-concept implementation

Exhibit
An interactive web site from static files
  One file for data: spreadsheet or CSV, RDF, XML, JSON
  One for presentation: HTML
Extend HTML vocabulary
  Lens tags for showing data items
  View tags for laying them out
  Facet tags for searching, filtering, sorting
Link to a JavaScript library that makes it all work
  Nothing to install or configure
  All runs in the visitor's browser

Demo
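Exhibit keeps the data in its own file, separate from the HTML presentation. The sketch below turns a spreadsheet exported as CSV into a JSON data file. The `{"items": [...]}` layout, with a `label` on each item, is modeled loosely on Exhibit's JSON data format; treat the exact field conventions as an assumption and check the Exhibit documentation before relying on them. The function name, columns, and rows are hypothetical.

```python
import csv
import io
import json

def csv_to_exhibit_json(csv_text: str, label_column: str, list_columns=()) -> str:
    """Convert CSV rows into a JSON document of the form {"items": [...]}.

    Each row becomes one item; `label_column` supplies the item's "label",
    and columns named in `list_columns` are split on ";" into lists.
    """
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = {"label": row.pop(label_column)}
        for column, value in row.items():
            item[column] = value.split(";") if column in list_columns else value
        items.append(item)
    return json.dumps({"items": items}, indent=2)

# Hypothetical spreadsheet export; in practice this would be read from a .csv
# file, and the output saved as the data file that the HTML page links to.
sheet = """name,year,tags
Sea glass find,2009,green;beach
Roman coin,2008,bronze
"""
print(csv_to_exhibit_json(sheet, label_column="name", list_columns={"tags"}))
```

All values stay strings here; a fuller converter might also declare property types so that years sort numerically.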


Scalability
JavaScript is slow, not designed for implementing DBs

Recommended for < 500 items
One person used 2733 items

Not a limitation per se
  Plenty of small data sets
  If it became part of the browser, it could scale much larger
  Typical web page today may be 2Mb
  50,000 data items easy

Incentivizing Data
A data-centric web page is better
  More effective communication
  Easier to maintain (like CSS)
  Creates enthusiasm for working with data
Data is exposed as a side effect
  Enabling reuse
  Alternative visualizations
  Critiques
Selfish incentives lead to global benefit
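A quick check of the estimate on the Scalability slide, reading "2Mb" as roughly 2 megabytes (an assumption; the slide does not say bytes or bits): fitting 50,000 items into a page of that size allows about 40 bytes per item on average.

```latex
\frac{2\ \text{MB}}{50{,}000\ \text{items}}
  = \frac{2 \times 10^{6}\ \text{bytes}}{5 \times 10^{4}\ \text{items}}
  = 40\ \text{bytes per item}
```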


oops!

Authoring by Copying
HTML describes visualization
Copy it, change the data
(Maybe change the presentation too)

Exhibit and IPE

The Virtuous Cycle

Exhibit Authoring Interfaces

Wibit: Exhibit in a Wiki
Start with Semantic MediaWiki
  MediaWiki (Wikipedia platform) extension for structured data
  Infobox contents go into a database
  Wikitext syntax for querying the database
  Results are embedded as a table in the page containing the query
Enrich with Exhibit
  SMW already had result printers for various table formats
  Shove in Exhibit as another format
  User specifies views and facets in wikitext
  Reuse the preexisting infobox template system for lenses
Play here: http://projects.csail.mit.edu/wibit/

Datapress
WordPress plugin
Upload or link to data
  Spreadsheet, JSON
Then WYSIWYG your visualization
  Using the usual WordPress blog post editor
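The Wibit slide's pipeline (infobox contents go into a database, a wikitext query selects from it, and the results are embedded as a table in the page) can be imitated in a few lines. This is a toy sketch: `PAGES`, `ask`, and `to_wikitable` are hypothetical names, the query is reduced to "pages in a category, with these properties", and the output uses MediaWiki's `{| ... |}` table markup. It is not how Semantic MediaWiki is implemented.

```python
from typing import Dict, List

# Hypothetical wiki: each page's infobox fields, stored as a tiny "database".
PAGES: Dict[str, Dict[str, str]] = {
    "Tavern A": {"category": "Tavern", "town": "Town X", "opened": "1905"},
    "Tavern B": {"category": "Tavern", "town": "Town Y", "opened": "1912"},
    "Coin mint C": {"category": "Mint", "town": "Town Z"},
}

def ask(pages: Dict[str, Dict[str, str]], category: str, properties: List[str]) -> List[List[str]]:
    """Select pages in `category`; return one row per page: title + requested properties."""
    rows = []
    for title in sorted(pages):
        fields = pages[title]
        if fields.get("category") == category:
            rows.append([title] + [fields.get(p, "") for p in properties])
    return rows

def to_wikitable(properties: List[str], rows: List[List[str]]) -> str:
    """Render query results as MediaWiki table markup for embedding in a page."""
    lines = ['{| class="wikitable"', "! Page !! " + " !! ".join(properties)]
    for row in rows:
        lines.append("|-")
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)

props = ["town", "opened"]
print(to_wikitable(props, ask(PAGES, "Tavern", props)))
```

Wibit's step, per the slide, is to let that same query feed an Exhibit view instead of a static table.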

Show of hands

WordPress
Here's what's great about WordPress, and blogging platforms in general. They take a technical task, like crafting HTML, configuring a web server, and managing remote files, and they turn it into something we all know how to do.

And they don't just offer text; they offer ways to make our text more effective: styling, such as boldface and italics, and media objects, such as YouTube videos, MP3 files, and pictures.

WordPress + Datapress
Loop through a bunch of pictures of bloggers using plain exhibits.

List of financial programs in the US.

Guide to vegetarian restaurants in Glasgow.

cvx

Tetherless World wiki from Jie Bao.

I want to read what he wrote:

"Actually, I have thought about visualizing ISWC data using Exhibit, but didn't get time (or too lazy) to program."

DIDO: Data Integrated Active Document
Rich view of content
Edit it in the document
  Data AND visualization
  Both stored in document
WYSIWYG
Save the result
  Email to a friend
  Check into SVN repository
  Put on your web site
http://projects.csail.mit.edu/exhibit/Dido

Other research my group does

But Wait! There's More!

Summary
Atomate: automate information tasks using structured data RSS feeds
Listit: dealing with information scraps that don't fit anywhere
Feedme: getting your friends to filter your information for you
Nb: collaborative lecture note annotation/discussion

Atomate

music listened to

running

sleep

desktop activity

physical locations

events

documents

messages

travels

friends/enemies

Wouldn't it be great if computers could use all this information to do stuff for us?

Examples
remind me to take out the trash when I get home on Tuesdays...

bug my friend who hasn't replied to me in 2 days...

send me my grocery shopping list when I arrive at the grocery store

remind friends about an event I am going to attend

text me important emails when I am traveling

What We Need
1. A way for users to express what they want to happen, and when, in terms of predicates relating the states and properties of people, places, and things in their world.

2. A way to retrieve and interpret data from our many heterogeneous web sources as descriptions of these familiar people, places, and things.

ATOM/RSS/REST APIs, end-user mashups + RDF
Controlled Natural Language Interface (CNLI) for rules
actions, conditions, predicates, properties, entities

New Opportunity
Idea of agent-based automation is old
RSS + social networks are new
  Key idea: a standard for dissemination of structured data
  Datapress already hinted at structured data feeds
  Many other sources of (potentially) structured data
Entities with properties and values are tractable for regular-user rule authoring
This becomes key infrastructure for creating those automated agents
Atomate (our Auron)

Conclusion
Separate data from presentation
  Data files
  HTML styling vocabulary for interactive visualization
Doing so would offer substantial benefits
  Anyone can create interesting data and visualizations
  Motivates authoring of data
    Which is directly useful for readers
    And seeds data for other users
    Who can access and repurpose it to their own needs
Put people in the driver's seat
  Not about sophisticated information tools
  About simple, flexible tools to let people do the sophisticated work

Thanks
Dennis Quan
Vineet Sinha
Karun Bakshi
David Huynh ***
Margaret Leibovic
Gabriel Durazo
Nina Guo
Adam Marcus
Ted Benson
Fabian Howahl

More Info
http://haystack.csail.mit.edu/
http://simile-widgets.org/exhibit/
http://projects.csail.mit.edu/datapress/
http://projects.csail.mit.edu/exhibit/Dido/
[email protected]