Linked Data, Free Pictures, and Markets for Semantic Data

Linked Data, Free Pictures and Markets for Semantic Data Paul Houle [email protected]

Upload: paul-houle

Post on 18-May-2015


Category:

Technology



DESCRIPTION

Ookaboo is a collection of about 1,000,000 Creative Commons images gathered from social media and linked to 500,000 Linked Data concepts from Freebase and DBpedia. Ookaboo’s semantic API and RDF dump let applications connect topics such as people, places, species and things to free pictures with almost perfect precision. To create Ookaboo’s photo collection and user interface, I had to extensively clean Linked Data and construct a knowledge base about “commonsense” topics such as grammar, the relative importance of things, offensiveness, and the categorization and naming of things. Had this knowledge been commercially available, I could have spent more time acquiring images and building a community. Although free Linked Data defines a shared vocabulary that enables interoperation, next-generation text analysis, data integration, and content generation systems will depend on reusable knowledge bases that take resources and specialized skills to create; a market in semantic data will fill this need.

TRANSCRIPT

Page 1: Linked Data, Free Pictures, and Markets For Semantic Data

Linked Data, Free Pictures and Markets for Semantic Data

Paul Houle [email protected]

Page 2: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Page 3: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Freebase and DBpedia

Page 4: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Freebase and DBpedia

the social-semantic ecosystem

Page 5: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Freebase and DBpedia

the social-semantic ecosystem

commonsense knowledge in practice

Page 6: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Freebase and DBpedia

the social-semantic ecosystem

commonsense knowledge in practice

the economics of semantic data

Page 7: Linked Data, Free Pictures, and Markets For Semantic Data

Overview

the New taxonomy

Freebase and DBpedia

collecting pictures

the social-semantic ecosystem

commonsense knowledge in practice

the economics of semantic data

proof and trust

Page 8: Linked Data, Free Pictures, and Markets For Semantic Data
Page 9: Linked Data, Free Pictures, and Markets For Semantic Data
Page 10: Linked Data, Free Pictures, and Markets For Semantic Data
Page 11: Linked Data, Free Pictures, and Markets For Semantic Data

virtuous circle

People Use Images

Links

Traffic

Revenue

Get Content

Page 12: Linked Data, Free Pictures, and Markets For Semantic Data

animalphotos.info

Page 13: Linked Data, Free Pictures, and Markets For Semantic Data

Scientific Classification of Animals

Page 14: Linked Data, Free Pictures, and Markets For Semantic Data

Vernacular Taxonomy for Animals

Mammals

Primates Rodents Others

Birds

Page 15: Linked Data, Free Pictures, and Markets For Semantic Data
Page 16: Linked Data, Free Pictures, and Markets For Semantic Data
Page 17: Linked Data, Free Pictures, and Markets For Semantic Data

<http://dbpedia.org/resource/Gear>

Page 18: Linked Data, Free Pictures, and Markets For Semantic Data

automating the process

DBpedia, Flickr

identify topics -> search for candidates -> filter correct images -> describe images
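The four stages on this slide can be sketched as a small pipeline. This is only an illustration, not Ookaboo's actual code: the `Candidate` type, the canned `fake_search` function standing in for a photo-site search, and the tag-based `tag_filter` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Candidate:
    """A candidate photo returned by a search (hypothetical shape)."""
    url: str
    title: str
    tags: List[str]


def find_pictures(
    topics: Iterable[str],
    search: Callable[[str], List[Candidate]],
    is_correct: Callable[[str, Candidate], bool],
) -> Dict[str, List[str]]:
    """Run the slide's stages: identify topics, search, filter, describe."""
    results = {}
    for topic in topics:                                        # 1. identify topics
        candidates = search(topic)                              # 2. search for candidates
        kept = [c for c in candidates if is_correct(topic, c)]  # 3. filter correct images
        # 4. describe images: here, one caption string per kept image
        results[topic] = [f"{c.title} ({c.url}) depicts {topic}" for c in kept]
    return results


def fake_search(topic: str) -> List[Candidate]:
    """Stand-in for a photo-site search; returns canned data."""
    return [
        Candidate("http://example.org/1", "Bevel gear", ["gear", "machine"]),
        Candidate("http://example.org/2", "Camping gear", ["tent", "camping"]),
    ]


def tag_filter(topic: str, candidate: Candidate) -> bool:
    """Assumed filter: keep only images carrying a required tag."""
    return "machine" in candidate.tags


pictures = find_pictures(["http://dbpedia.org/resource/Gear"], fake_search, tag_filter)
```

The filter is passed in as a function so that an automatic rule and a human judgment step can be swapped without changing the rest of the pipeline.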

Page 19: Linked Data, Free Pictures, and Markets For Semantic Data

amazon mechanical turk

Page 20: Linked Data, Free Pictures, and Markets For Semantic Data

carpictures.cc

Page 21: Linked Data, Free Pictures, and Markets For Semantic Data

2012, 2011, 2010, 2009, 2008, 2007, …, 1990, 1989, 1988, 1987, 1986, 1985

Page 22: Linked Data, Free Pictures, and Markets For Semantic Data

Acura, Alfa Romeo, Aston Martin, Audi, Bentley, BMW, …, Scion, Subaru, Suzuki, Toyota, Volkswagen, Volvo

2012, 2011, 2010, 2009, 2008, 2007, …, 1990, 1989, 1988, 1987, 1986, 1985

Page 23: Linked Data, Free Pictures, and Markets For Semantic Data

CC, CC 4Motion, Eos, GTI, Jetta, Jetta SportWagen, New Beetle, New Beetle Convertible, Passat, Passat Wagon, Routan FWD, Tiguan 4Motion, Touareg

2012, 2011, 2010, 2009, 2008, 2007, …, 1990, 1989, 1988, 1987, 1986, 1985

Acura, Alfa Romeo, Aston Martin, Audi, Bentley, BMW, …, Scion, Subaru, Suzuki, Toyota, Volkswagen, Volvo

Page 24: Linked Data, Free Pictures, and Markets For Semantic Data

2012, 2011, 2010, 2009, 2008, 2007, …, 1990, 1989, 1988, 1987, 1986, 1985

Acura, Alfa Romeo, Aston Martin, Audi, Bentley, BMW, …, Scion, Subaru, Suzuki, Toyota, Volkswagen, Volvo

CC, CC 4Motion, Eos, GTI, Jetta, Jetta SportWagen, New Beetle, New Beetle Convertible, Passat, Passat Wagon, Routan FWD, Tiguan 4Motion, Touareg

6 speed automatic

5 speed manual

Page 25: Linked Data, Free Pictures, and Markets For Semantic Data

Chevrolet Honda Volkswagen

Civic, Element, Accord, S360, FCX

Constructed Taxonomy

Page 26: Linked Data, Free Pictures, and Markets For Semantic Data
Page 27: Linked Data, Free Pictures, and Markets For Semantic Data

Good Category…

Page 28: Linked Data, Free Pictures, and Markets For Semantic Data

…Bad Category

Page 30: Linked Data, Free Pictures, and Markets For Semantic Data

“data wiki” -> better data quality

Page 31: Linked Data, Free Pictures, and Markets For Semantic Data

ny-pictures.com

Page 32: Linked Data, Free Pictures, and Markets For Semantic Data
Page 33: Linked Data, Free Pictures, and Markets For Semantic Data

geospatial selection + Wikipedia graph

Page 34: Linked Data, Free Pictures, and Markets For Semantic Data

The only way is no way…

The only limits are no limits…

The only taxonomy is no taxonomy…

Page 35: Linked Data, Free Pictures, and Markets For Semantic Data

network “taxonomy”

people

places

inventions

creative works

life forms

Page 36: Linked Data, Free Pictures, and Markets For Semantic Data

What’s out there?

Type                       Count
Person                     1,035,529
Location                   707,679
Organism Classification    192,632
Organization               177,999
Music Album                118,568
Film                       76,681
Structure                  74,061
Event                      73,992
Written Work               51,937
TV Program                 30,094
Fictional Character        29,461
Celestial Object           24,174
Ship                       23,006

Page 37: Linked Data, Free Pictures, and Markets For Semantic Data

ookaboo.com

Page 38: Linked Data, Free Pictures, and Markets For Semantic Data

User contributed content

Page 39: Linked Data, Free Pictures, and Markets For Semantic Data

ookaboo semantic API <http://dbpedia.org/resource/Thailand>

API

Thanks: Andyindia, Echiner1, Rene Eherhardt

Page 40: Linked Data, Free Pictures, and Markets For Semantic Data

social-semantic ecosystem

Page 41: Linked Data, Free Pictures, and Markets For Semantic Data

linked data

Page 42: Linked Data, Free Pictures, and Markets For Semantic Data

linked data

human contributions

Page 43: Linked Data, Free Pictures, and Markets For Semantic Data

linked data

human contributions

other online communities

Page 44: Linked Data, Free Pictures, and Markets For Semantic Data

linked data

human contributions

other online communities

knowledge engineering

Page 45: Linked Data, Free Pictures, and Markets For Semantic Data
Page 46: Linked Data, Free Pictures, and Markets For Semantic Data
Page 47: Linked Data, Free Pictures, and Markets For Semantic Data
Page 48: Linked Data, Free Pictures, and Markets For Semantic Data
Page 49: Linked Data, Free Pictures, and Markets For Semantic Data
Page 50: Linked Data, Free Pictures, and Markets For Semantic Data
Page 51: Linked Data, Free Pictures, and Markets For Semantic Data
Page 52: Linked Data, Free Pictures, and Markets For Semantic Data

Text Analysis

Page 53: Linked Data, Free Pictures, and Markets For Semantic Data

Text Analysis

Page 54: Linked Data, Free Pictures, and Markets For Semantic Data

Text Analysis

Car Image CC-BY from http://www.flickr.com/photos/aharden/2618801756/

Page 55: Linked Data, Free Pictures, and Markets For Semantic Data

Text Analysis

Page 56: Linked Data, Free Pictures, and Markets For Semantic Data

commonsense logic?

Page 57: Linked Data, Free Pictures, and Markets For Semantic Data

Number of Facts

Cyc: 3 million, Freebase: 600 million

Number of Concepts

SUMO: 1000, DBpedia: 3.9 million, WordNet: 118,000, Freebase: 23 million

Page 58: Linked Data, Free Pictures, and Markets For Semantic Data

Number of Facts

Cyc: 3 million, Freebase: 600 million

Number of Concepts

SUMO: 1000, Wikipedia: 3.9 million, WordNet: 118,000, Freebase: 23 million

critical mass?

Page 59: Linked Data, Free Pictures, and Markets For Semantic Data

“Any brain, machine or other thing that has a mind must be composed of smaller things that cannot think at all”

Marvin Minsky

Page 60: Linked Data, Free Pictures, and Markets For Semantic Data
Page 61: Linked Data, Free Pictures, and Markets For Semantic Data
Page 62: Linked Data, Free Pictures, and Markets For Semantic Data

Saturn1

Rome

Deity

Mythology

Page 63: Linked Data, Free Pictures, and Markets For Semantic Data

Saturn1

Rome

Deity

Mythology

Saturn2

Planet

Rings

Astronomy

Page 64: Linked Data, Free Pictures, and Markets For Semantic Data

Saturn1

Rome

Deity

Mythology

Saturn2

Planet

Rings

Astronomy

Page 65: Linked Data, Free Pictures, and Markets For Semantic Data

Saturn1

Rome

Deity

Mythology

Saturn2

Planet

Rings

Astronomy
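One way to read this diagram: each sense of "Saturn" sits among its own related concepts, so the words around a mention can vote for a sense. A minimal sketch, assuming a plain overlap count; the sense labels and related terms come from the slide, but the scoring rule is an illustration rather than the method used in the talk.

```python
from typing import Dict, Set

# Related concepts for each sense, as shown on the slide.
SENSES: Dict[str, Set[str]] = {
    "Saturn1": {"rome", "deity", "mythology"},
    "Saturn2": {"planet", "rings", "astronomy"},
}


def disambiguate(text: str, senses: Dict[str, Set[str]] = SENSES) -> str:
    """Return the sense whose related concepts overlap most with the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    # Count shared words per sense; ties resolve to the first sense listed.
    return max(senses, key=lambda sense: len(senses[sense] & words))


choice = disambiguate("The rings of Saturn are a favorite target in astronomy.")
```

Here `choice` is `"Saturn2"`, because "rings" and "astronomy" both appear in the sentence while none of the mythology terms do.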

Page 66: Linked Data, Free Pictures, and Markets For Semantic Data

autocompletion

Page 67: Linked Data, Free Pictures, and Markets For Semantic Data
Page 68: Linked Data, Free Pictures, and Markets For Semantic Data
Page 69: Linked Data, Free Pictures, and Markets For Semantic Data

ad-hoc SPARQL query

Page 70: Linked Data, Free Pictures, and Markets For Semantic Data

a database of names…

Page 71: Linked Data, Free Pictures, and Markets For Semantic Data

… plus subjective importance
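A database of names plus an importance score is enough for a basic autocompleter: match the typed prefix, then rank the matches by importance. A minimal sketch; the names and scores below are invented for illustration, and the 0-to-1 scoring scale is an assumption.

```python
from typing import List, Tuple

# (name, importance) pairs; the scores are made up for this example.
NAMES: List[Tuple[str, float]] = [
    ("Paris", 0.95),
    ("Paris Hilton", 0.60),
    ("Paris, Texas", 0.30),
    ("Parma", 0.40),
]


def autocomplete(prefix: str, names: List[Tuple[str, float]] = NAMES, limit: int = 3) -> List[str]:
    """Return up to `limit` names starting with `prefix`, most important first."""
    p = prefix.lower()
    matches = [(name, score) for name, score in names if name.lower().startswith(p)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in matches[:limit]]


suggestions = autocomplete("par")
```

With these scores, `suggestions` is `["Paris", "Paris Hilton", "Parma"]`: the least important match is the one that falls off the end of the list.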

Page 72: Linked Data, Free Pictures, and Markets For Semantic Data
Page 73: Linked Data, Free Pictures, and Markets For Semantic Data

yankees vs. red sox

Page 74: Linked Data, Free Pictures, and Markets For Semantic Data

yankees vs. red sox

carbon vs. silicon

Page 75: Linked Data, Free Pictures, and Markets For Semantic Data

yankees vs. red sox

carbon vs. silicon

aerosmith vs. the ramones

Page 76: Linked Data, Free Pictures, and Markets For Semantic Data

yankees vs. red sox

carbon vs. silicon

aerosmith vs. the ramones

Jeopardy vs. family feud

Page 77: Linked Data, Free Pictures, and Markets For Semantic Data
Page 78: Linked Data, Free Pictures, and Markets For Semantic Data
Page 79: Linked Data, Free Pictures, and Markets For Semantic Data
Page 80: Linked Data, Free Pictures, and Markets For Semantic Data
Page 81: Linked Data, Free Pictures, and Markets For Semantic Data
Page 82: Linked Data, Free Pictures, and Markets For Semantic Data

the airports query

Page 83: Linked Data, Free Pictures, and Markets For Semantic Data

Airports in English

Page 84: Linked Data, Free Pictures, and Markets For Semantic Data
Page 85: Linked Data, Free Pictures, and Markets For Semantic Data

Airports in Japanese (空港)

Page 86: Linked Data, Free Pictures, and Markets For Semantic Data

A cautionary tale

time

advertising revenue

Page 87: Linked Data, Free Pictures, and Markets For Semantic Data

“I know it when I see it” - Supreme Court Justice Potter Stewart

Page 88: Linked Data, Free Pictures, and Markets For Semantic Data

50 offensive categories

Page 89: Linked Data, Free Pictures, and Markets For Semantic Data

50 offensive categories

1000 offensive topics

Page 90: Linked Data, Free Pictures, and Markets For Semantic Data

50 offensive categories

1000 offensive topics

1800 offensive images

Page 91: Linked Data, Free Pictures, and Markets For Semantic Data

50 offensive categories

1000 offensive topics

1800 offensive images

950,000 good images

Page 92: Linked Data, Free Pictures, and Markets For Semantic Data

950,000 good images

Page 93: Linked Data, Free Pictures, and Markets For Semantic Data
Page 94: Linked Data, Free Pictures, and Markets For Semantic Data

99.81% accuracy isn’t good enough!

Page 95: Linked Data, Free Pictures, and Markets For Semantic Data

99.81% accuracy isn’t good enough!

Hyperprecision!
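The 99.81% figure follows from the counts on the preceding slides, assuming it is the share of good images among all images:

```python
good_images = 950_000
offensive_images = 1_800

total = good_images + offensive_images
accuracy = good_images / total          # share of images that are not offensive
error_rate = offensive_images / total   # share of images that are offensive

print(f"accuracy:   {accuracy:.2%}")
print(f"error rate: {error_rate:.2%}")
```

Even at that rate, roughly one image in every 529 is offensive, which is why a site with this many pictures needs far better than 99.81%.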

Page 96: Linked Data, Free Pictures, and Markets For Semantic Data
Page 97: Linked Data, Free Pictures, and Markets For Semantic Data

Publishing Knowledge

SPARQL Endpoint

Dereferencing

API

RDF Dump

Page 98: Linked Data, Free Pictures, and Markets For Semantic Data
Page 99: Linked Data, Free Pictures, and Markets For Semantic Data
Page 100: Linked Data, Free Pictures, and Markets For Semantic Data

Thanks: andrefontana, Isakkk, laynaaa

Page 101: Linked Data, Free Pictures, and Markets For Semantic Data
Page 102: Linked Data, Free Pictures, and Markets For Semantic Data

Clip art licensed from the Clip Art Gallery on DiscoverySchool.com

Page 103: Linked Data, Free Pictures, and Markets For Semantic Data

Dereferencing

Page 104: Linked Data, Free Pictures, and Markets For Semantic Data

Dereferencing

<http://rdf.freebase.com/ns/en.graphene>

Page 105: Linked Data, Free Pictures, and Markets For Semantic Data

Dereferencing

<http://rdf.freebase.com/ns/en.graphene>

http GET

Page 106: Linked Data, Free Pictures, and Markets For Semantic Data

Dereferencing

<http://rdf.freebase.com/ns/en.graphene>

http GET

fbase:en.graphene
    a fbase:common.topic ,
      fbase:award.award_winning_work ,
      fbase:law.invention ;
    fbase:award.award_winning_work.awards_won fbase:m.0dg75z8 ;
    fbase:common.topic.article fbase:m.03p5rz ;
    fbase:common.topic.image fbase:m.089q2k3 ,
      fbase:m.02f5b7f ,
      fbase:m.041wl9z ;
    fbase:law.invention.inventor fbase:en.andre_geim ...
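Dereferencing amounts to an HTTP GET on the concept's URI, with an `Accept` header asking for an RDF serialization. A minimal sketch using Python's standard library; requesting `text/turtle` is an assumption, and because the Freebase service has since been retired, the request is built here but not sent.

```python
import urllib.request

GRAPHENE = "http://rdf.freebase.com/ns/en.graphene"


def build_dereference_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request that asks the server for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


request = build_dereference_request(GRAPHENE)

# To actually fetch a description from a live Linked Data server:
#     with urllib.request.urlopen(request) as response:
#         turtle = response.read().decode("utf-8")
```

The same function works for any dereferenceable URI; only the `Accept` value changes if a server offers a different serialization.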

Page 107: Linked Data, Free Pictures, and Markets For Semantic Data

Thanks: Thomas Shahan

Page 108: Linked Data, Free Pictures, and Markets For Semantic Data
Page 109: Linked Data, Free Pictures, and Markets For Semantic Data

Publishing Knowledge

API

RDF Dump

Page 110: Linked Data, Free Pictures, and Markets For Semantic Data

Ookaboo RDF Dump

Metadata for 950,000 Pictures

500,000+ topics

630 MB

50 million facts

Page 111: Linked Data, Free Pictures, and Markets For Semantic Data

Two Challenges

Ookaboo needs better tools to build navigation

Customers need tools to find concepts

Page 112: Linked Data, Free Pictures, and Markets For Semantic Data

:BaseKB is free under CC-BY

Page 113: Linked Data, Free Pictures, and Markets For Semantic Data

Not so “big” …

:BaseKB is 2.8 GB

:BaseKB is free under CC-BY

:BaseKB takes 1 hour to load on a workstation PC

… but very complex

:BaseKB has 11,361 types and 102,949 properties

“A isPartOf B” can be expressed in 139 different ways!
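When one relationship can be written many ways, a consumer usually ends up mapping the variants onto a single canonical predicate. A minimal sketch of that idea; the `ex:` predicate names are invented for illustration and are not actual :BaseKB or Freebase properties, and the direction flag is an assumption about how such a mapping might be recorded.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Illustrative predicate names only. True means the triple already reads
# "part, predicate, whole"; False means subject and object are swapped.
PART_OF_PREDICATES = {
    "ex:containedBy": True,
    "ex:memberOf": True,
    "ex:contains": False,
    "ex:hasTrack": False,
}


def normalize_part_of(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Rewrite every known part-whole predicate to a single ex:isPartOf."""
    for s, p, o in triples:
        if p not in PART_OF_PREDICATES:
            continue  # not a part-whole statement; skip it
        part, whole = (s, o) if PART_OF_PREDICATES[p] else (o, s)
        yield (part, "ex:isPartOf", whole)


facts = [
    ("ex:Ithaca", "ex:containedBy", "ex:NewYork"),
    ("ex:Album1", "ex:hasTrack", "ex:Song1"),
    ("ex:Ithaca", "ex:label", "Ithaca"),
]
normalized = list(normalize_part_of(facts))
```

The table of variants is exactly the kind of knowledge that is tedious to build once and wasteful for every consumer to rebuild.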

Photo credit: http://commons.wikimedia.org/wiki/User:Evan-Amos

Page 114: Linked Data, Free Pictures, and Markets For Semantic Data

N-Triples is compatible with…

RDF Database

Page 115: Linked Data, Free Pictures, and Markets For Semantic Data

N-Triples is compatible with…

RDF Database

awk, sed, grep, …

Page 116: Linked Data, Free Pictures, and Markets For Semantic Data

N-Triples is compatible with…

RDF Database

awk, sed, grep, …

Hadoop

Page 117: Linked Data, Free Pictures, and Markets For Semantic Data

N-Triples is compatible with…

RDF Database

awk, sed, grep, …

Hadoop

Lucene (SIREn)
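N-Triples puts one complete triple on each line, which is what makes line-oriented tools usable on it. A minimal sketch of the same idea in Python; it handles only the simple cases shown and is not a full N-Triples parser, and the `example.org` data is made up.

```python
from typing import Iterable, Iterator, Tuple


def parse_ntriples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from N-Triples lines.

    Relies on subjects and predicates containing no whitespace, so the
    first two whitespace-separated fields are safe to split off.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating " ."
        yield subject, predicate, obj


sample = [
    '<http://example.org/a> <http://example.org/name> "Saturn"@en .',
    '<http://example.org/a> <http://example.org/type> <http://example.org/Planet> .',
]
names = [o for s, p, o in parse_ntriples(sample) if p == "<http://example.org/name>"]
```

Because each line stands alone, the same filter can be run over a dump in pieces, which is also why the format splits cleanly across parallel workers.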

Page 118: Linked Data, Free Pictures, and Markets For Semantic Data

Data Quality

Page 119: Linked Data, Free Pictures, and Markets For Semantic Data

Data Quality

Quality Perimeter

Page 120: Linked Data, Free Pictures, and Markets For Semantic Data

Repairing Folksonomic Trees

Page 121: Linked Data, Free Pictures, and Markets For Semantic Data

Repairing Folksonomic Trees

Page 122: Linked Data, Free Pictures, and Markets For Semantic Data

Enterprise Data Warehousing

Operations -> ETL -> Data Warehouse -> Analytics

Page 123: Linked Data, Free Pictures, and Markets For Semantic Data

Enterprise Data Warehousing

Operations -> ETL -> Data Warehouse -> Analytics

Page 124: Linked Data, Free Pictures, and Markets For Semantic Data

Knowledge-Based System

Linked Data -> ETL -> Data Warehouse -> Operations

Page 125: Linked Data, Free Pictures, and Markets For Semantic Data

“Businesses often spend five to 10 times more money to correct their data after it is entered into the system than they would have if they had headed the problems off at the source.”

- Larry P. English, Information Impact International

Page 126: Linked Data, Free Pictures, and Markets For Semantic Data

Data Quality Economics

Assume 25 Consumers

Consumers Clean: 25 x $N = $25N

Publisher Cleans: 1 x $N = $N
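The comparison on this slide can be worked through for any cleanup cost. A minimal sketch; the dollar amount passed in is a placeholder chosen for the example, not a figure from the talk.

```python
def cleaning_costs(cost_per_cleanup: float, consumers: int = 25) -> dict:
    """Compare total spend when every consumer cleans vs. the publisher cleaning once."""
    consumers_clean = consumers * cost_per_cleanup  # each consumer repeats the work
    publisher_cleans = 1 * cost_per_cleanup         # the work is done once, upstream
    return {
        "consumers_clean": consumers_clean,
        "publisher_cleans": publisher_cleans,
        "saved": consumers_clean - publisher_cleans,
    }


costs = cleaning_costs(cost_per_cleanup=10_000)  # placeholder value for $N
```

Whatever the value of $N, cleaning at the source saves 24 of the 25 cleanups under the slide's assumption of 25 consumers.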

Page 127: Linked Data, Free Pictures, and Markets For Semantic Data

Reusable Knowledge Base: effect on schedule

decision point

build

adopt

develop knowledge base

develop knowledge base

time

Page 128: Linked Data, Free Pictures, and Markets For Semantic Data

Build Knowledge Base

Develop Profitable Applications

Get Feedback And Revenue

Page 129: Linked Data, Free Pictures, and Markets For Semantic Data

Linked Data Business Models

Free Shared Vocabulary Enables Interconnection…

…but the profit motive spurs investment to create quality data.

Page 130: Linked Data, Free Pictures, and Markets For Semantic Data
Page 131: Linked Data, Free Pictures, and Markets For Semantic Data

Trust

Proof

Page 132: Linked Data, Free Pictures, and Markets For Semantic Data
Page 133: Linked Data, Free Pictures, and Markets For Semantic Data

Publishers

Page 134: Linked Data, Free Pictures, and Markets For Semantic Data

Publishers

Consumers

Page 135: Linked Data, Free Pictures, and Markets For Semantic Data

A Market in Common Sense

Ookaboo: Free Pictures of Everything On Earth

5000 Topics That Aren’t Safe For Kids

How to conjugate verbs and use the correct article

What is similar to this?

What is this document about?

How well known is this?

Page 136: Linked Data, Free Pictures, and Markets For Semantic Data

… big and ambitious systems

Paul Houle [email protected]