World Summit on Big Data and Organization Design - Paris

Page 1

Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems

Cataldo Musto, Giovanni Semeraro, Fedelucio Narducci

Page 2

state of the art.

C.Musto, G.Semeraro - Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems - World Summit on Big Data and Organization Design, Paris, 16-17 May 2013

Page 3

our research: personalization.

Page 4

Recommender Systems

Relevant items (movies, news, books, etc.) are pushed to the user according to her preferences or needs.
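The push-based behaviour described above can be illustrated with a minimal content-based sketch, assuming that both the items and the user profile are represented as bags of words and ranked by cosine similarity. The function names, the naive whitespace tokenizer and the toy movie descriptions are hypothetical; this is not the system presented in the talk.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Naive bag of words: lowercase the text and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile: Counter, items: dict, k: int = 2) -> list:
    """Return the ids of the k items most similar to the user profile."""
    ranked = sorted(items, key=lambda i: cosine(profile, items[i]), reverse=True)
    return ranked[:k]


# Hypothetical toy catalogue of movie descriptions.
items = {
    "m1": bag_of_words("space opera science fiction saga"),
    "m2": bag_of_words("romantic comedy in paris"),
    "m3": bag_of_words("science fiction thriller on mars"),
}
profile = bag_of_words("science fiction space")
print(recommend(profile, items))  # ['m1', 'm3']
```

Cosine similarity is used here only because it is a common choice for vector-space profiles; the slides do not specify the matching function.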

Page 5

Amazon.com

Recommendations

Page 6

current recommendation technologies share three important drawbacks.

Page 7

(1) training is a bottleneck.

Page 8

need for explicit information about user interests.

Page 9

(2) recsys are black boxes.

Page 10

(3) suggestions are not surprising.

Page 11

solution: exploiting big data to build a novel generation of content-based adaptive systems.

Page 12

current work.

near future work.

Page 13

Page 14

big data.

Page 15

Information Overload

we can handle 126 bits of information, but we deal with 393 bits of information

ratio: more than 3x (Source: Adrian C. Ott, The 24-Hour Customer)

consequence:

Page 16

Information Overload

Page 17

Big Data: obstacle or opportunity?

Page 18

cornerstone 1: exploit social media to model user preferences.

Page 19

social media are an opportunity: they provide information about user preferences.

Page 20

example

user preferences in music from Facebook

Page 21

implicit preferences

example
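One way to merge explicit likes and implicit signals into a single music profile is sketched below, assuming the likes and the post texts have already been retrieved (no Facebook API call is shown). The weights EXPLICIT_WEIGHT and IMPLICIT_WEIGHT, the substring matching and the artist names are hypothetical choices for illustration only.

```python
from collections import defaultdict

# Hypothetical weights: an explicit "like" counts more than an implicit mention.
EXPLICIT_WEIGHT = 1.0
IMPLICIT_WEIGHT = 0.5


def build_profile(liked_artists, posts, known_artists):
    """Merge explicit likes and implicit mentions into a normalized artist profile."""
    scores = defaultdict(float)
    for artist in liked_artists:
        scores[artist] += EXPLICIT_WEIGHT
    for post in posts:
        text = post.lower()
        for artist in known_artists:
            # Naive substring match stands in for a real entity recognizer.
            if artist.lower() in text:
                scores[artist] += IMPLICIT_WEIGHT
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()} if total else {}


profile = build_profile(
    liked_artists=["Artist A", "Artist B"],
    posts=["Listening to Artist A again", "Great Artist C concert last night"],
    known_artists=["Artist A", "Artist B", "Artist C"],
)
print(profile)  # Artist A: 0.5, Artist B: ~0.333, Artist C: ~0.167
```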

Page 22

Play.me playlist

The most popular songs of the artists extracted from Last.fm (as well as those added through the enrichment) are proposed to the user.
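The playlist step can be sketched as follows, assuming the most popular tracks of each artist have already been fetched into a local dictionary (the actual Last.fm API calls are not shown). The play counts, the artist and track names and the per_artist cut-off are hypothetical.

```python
def build_playlist(profile_artists, enriched_artists, top_tracks, per_artist=2):
    """Propose the most popular tracks of the profile artists, then of the enriched ones."""
    playlist, seen = [], set()
    for artist in list(profile_artists) + list(enriched_artists):
        if artist in seen:  # an artist may appear in both lists
            continue
        seen.add(artist)
        # Rank the artist's tracks by (hypothetical) play count, highest first.
        ranked = sorted(top_tracks.get(artist, []), key=lambda t: t[1], reverse=True)
        playlist.extend((artist, title) for title, _ in ranked[:per_artist])
    return playlist


# Hypothetical popularity data: artist -> [(track, play count)].
top_tracks = {
    "Artist A": [("Song A1", 900), ("Song A2", 650), ("Song A3", 700)],
    "Artist B": [("Song B1", 800), ("Song B2", 600)],
}
playlist = build_playlist(["Artist A"], ["Artist B", "Artist A"], top_tracks)
print(playlist)
# [('Artist A', 'Song A1'), ('Artist A', 'Song A3'), ('Artist B', 'Song B1'), ('Artist B', 'Song B2')]
```

Profile artists come first so that the user's own preferences dominate the head of the playlist; enriched artists fill the rest.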

Page 23

Myusic recommendations

Page 24

cornerstone 2: exploit entity linking algorithms to make user profiles more transparent and LOD-aware.

Page 25

MyFeeds: RSS recommendations

Page 26

MyFeeds: transparent user preferences

extracted from Facebook.

Page 27

MyFeeds: transparent user preferences

further processing

Page 28

MyFeeds: entity linking algorithms

• They map free text to structured information:
  • Wikipedia pages or DBpedia nodes
• Examples:
  • Tag.me, Wikipedia Miner, DBpedia Spotlight, etc.
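To make the mapping from free text to Wikipedia pages concrete, here is a toy dictionary-based linker. It is a simplified illustration, not the algorithm used by Tag.me, Wikipedia Miner or DBpedia Spotlight: the ANCHORS dictionary and its commonness values are hypothetical, spotting is a greedy longest match, and disambiguation simply picks the most common sense while ignoring context.

```python
import re

# Hypothetical anchor dictionary: surface form -> {Wikipedia title: commonness}.
# Real linkers derive such statistics from Wikipedia anchor texts.
ANCHORS = {
    "jaguar": {"Jaguar": 0.6, "Jaguar Cars": 0.4},
    "apple": {"Apple Inc.": 0.7, "Apple": 0.3},
    "steve jobs": {"Steve Jobs": 1.0},
}
MAX_NGRAM = 3


def link_entities(text):
    """Greedy longest-match spotting, then pick the most common sense of each mention."""
    tokens = re.findall(r"\w+", text.lower())
    links, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            senses = ANCHORS.get(mention)
            if senses:
                # Disambiguation by prior probability only (no context).
                links.append((mention, max(senses, key=senses.get)))
                i += n
                break
        else:
            i += 1  # no mention starts at this token
    return links


print(link_entities("Steve Jobs founded Apple"))
# [('steve jobs', 'Steve Jobs'), ('apple', 'Apple Inc.')]
```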

Page 29

Tag.me

extracts the Wikipedia pages the content refers to.

Page 30

Linked Open Data Cloud

DBpedia: a structured (RDF) representation of the information stored in Wikipedia.

Page 31

Linked Open Data Cloud

Profiles based on Tag.me are LOD-aware, since the Wikipedia pages extracted by Tag.me map to DBpedia nodes in the LOD cloud.
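A profile made of linked Wikipedia pages can therefore be serialized as RDF. The sketch below assumes the DBpedia convention of deriving resource URIs from Wikipedia titles (spaces become underscores) and, for simplicity, ignores percent-encoding of special characters. The user URI and the `likes` predicate under example.org are hypothetical placeholders, not a vocabulary used in the talk.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
LIKES = "http://example.org/vocab/likes"  # hypothetical predicate


def wikipedia_title_to_dbpedia(title):
    """DBpedia resource URIs reuse the Wikipedia title, with spaces as underscores."""
    return DBPEDIA_RESOURCE + title.strip().replace(" ", "_")


def profile_to_ntriples(user_uri, titles):
    """Serialize a LOD-aware profile as one N-Triples line per linked entity."""
    return [
        f"<{user_uri}> <{LIKES}> <{wikipedia_title_to_dbpedia(t)}> ."
        for t in titles
    ]


for line in profile_to_ntriples("http://example.org/user/42", ["Steve Jobs", "Apple Inc."]):
    print(line)
# <http://example.org/user/42> <http://example.org/vocab/likes> <http://dbpedia.org/resource/Steve_Jobs> .
# <http://example.org/user/42> <http://example.org/vocab/likes> <http://dbpedia.org/resource/Apple_Inc.> .
```

Once the profile points at DBpedia resources, any property those resources carry in the LOD cloud becomes available for explanation or reasoning.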

Page 32

cornerstone 3: exploit open knowledge sources to make recommendation techniques more serendipitous.

Page 33

‘in vitro’ experiments: Watchmi plug-in, developed by Aprico.tv

Page 34

From BOW to eBOW

Given the description of a TV show, we exploit ESA (Explicit Semantic Analysis) to obtain an enhanced representation.

The original set of features (the bag of words) is enriched with the Wikipedia articles most related to the TV show.
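The BOW-to-eBOW step can be approximated with a toy version of Explicit Semantic Analysis: each word carries TF-IDF weights over Wikipedia articles, a text is interpreted as the weighted sum of its words' article vectors, and the top-k articles are added to the original features. ESA_INDEX, its weights, the `wiki:` prefix and k are hypothetical; real ESA builds the index from the full Wikipedia corpus.

```python
from collections import Counter, defaultdict

# Hypothetical ESA inverted index: word -> {Wikipedia article: TF-IDF weight}.
ESA_INDEX = {
    "motogp": {"MotoGP": 0.9, "Valentino Rossi": 0.5},
    "duels": {"Valentino Rossi": 0.3, "Max Biaggi": 0.3},
    "wheel": {"Wheel": 0.8, "MotoGP": 0.2},
}


def esa_concepts(text, k=3):
    """Interpret a text as a weighted vector of Wikipedia articles; keep the top-k."""
    bow = Counter(text.lower().split())
    concepts = defaultdict(float)
    for word, tf in bow.items():
        for article, weight in ESA_INDEX.get(word, {}).items():
            concepts[article] += tf * weight
    ranked = sorted(concepts.items(), key=lambda kv: kv[1], reverse=True)
    return [article for article, _ in ranked[:k]]


def ebow(text, k=3):
    """Enhanced bag of words: original tokens plus the top-k ESA concepts."""
    return set(text.lower().split()) | {"wiki:" + c for c in esa_concepts(text, k)}


desc = "wheel to wheel the best duels in the motogp"
print(esa_concepts(desc))  # ['Wheel', 'MotoGP', 'Valentino Rossi']
```

With this toy index the generic word "wheel" pulls in the off-topic article "Wheel", much as "rad (heraldik)" appears among the concepts in the MotoGP example: ESA enrichment can add noise as well as serendipity.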

Page 35

From BOW to eBOW: example

TV show: Rad an Rad. Die besten Duelle der MotoGP (Wheel to wheel: the best duels in the MotoGP)

Wikipedia articles:

• großer preis von italien (motorrad)
• großer preis von malaysia (motorrad)
• großer preis von tschechien (motorrad)
• scuderia ferrari
• valentino rossi
• motorrad-wm-saison 2005
• motorrad-wm-saison 2006
• max biaggi
• großer preis der usa (motorrad)
• motorrad-wm-saison 2008
• rad (heraldik)
• loris capirossi
• shin’ya nakano
• motogp

Page 36

challenges.

issues.

recommendations.

Page 37

Challenges and Issues

• Main challenges and issues:
  • data representation and data filtering
  • How can we exploit these novel data silos?
  • What information is relevant for personalization?
  • What kind of processing do the data need?
  • Which representation is the best?
  • Do reasoning techniques improve profile transparency and personalization accuracy?
  • Do people accept the exploitation of these data?
  • How can the context be modeled?

Page 38

Recommendations

• Cornerstones
  • Social media-based user profiling
  • LOD-aware user profiles
  • Open knowledge sources for serendipitous encounters
• Recommendations
  • Promote the LOD initiative: publish data in a structured form to enable reasoning on the information
  • Make data silos interconnected
  • Design applications able to properly model, manage and exploit the large amount of data coming from social media