World Summit on Big Data and Organization Design - Paris
Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous
content-based adaptive systems
Cataldo Musto, Giovanni Semeraro, Fedelucio Narducci
state of the art.
C.Musto, G.Semeraro - Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems - World Summit on Big Data and Organization Design, Paris, 16-17 May 2013
our research: personalization
Recommender Systems
Relevant items (movies, news, books, etc.) are pushed to the user according to her preferences or her needs.
Amazon.com
Recommendations
current recommendation technologies share three important drawbacks.
(1) training is a bottleneck.
need for explicit information about user interests.
(2) recsys are black boxes.
(3) suggestions are not surprising.
exploiting big data to build a novel generation of content-based adaptive systems
solution
current work.
near future work.
big data.
Information Overload
we can handle 126 bits of information
we deal with 393 bits of information
ratio: more than 3x (Source: Adrian C. Ott, The 24-Hour Customer)
consequence:
Information Overload
Big Data: obstacle or opportunity?
cornerstone 1
exploit social media to model user preferences.
social media are an opportunity
provide information about user preferences
example
user preferences in music from Facebook
implicit preferences
example
Play.me playlist
Most popular songs of the artists extracted from Last.fm (as well as those added through the enrichment) are proposed to the user.
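The playlist step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `TOP_SONGS` and `RELATED` tables, the artist and song names, and the `build_playlist` function are all hypothetical stand-ins for the popularity data and the enrichment step, and no real Last.fm or Play.me call is made.

```python
from typing import Dict, List

# Hypothetical toy data: artist -> songs ordered by popularity.
# In the talk this information comes from Last.fm; here it is hard-coded.
TOP_SONGS: Dict[str, List[str]] = {
    "artist_a": ["a1", "a2", "a3"],
    "artist_b": ["b1", "b2"],
    "artist_c": ["c1"],
}

# Hypothetical enrichment table: artist -> related artists to add to the profile.
RELATED: Dict[str, List[str]] = {"artist_a": ["artist_c"]}


def build_playlist(liked_artists: List[str], songs_per_artist: int = 2) -> List[str]:
    """Collect the most popular songs of liked artists and of artists added by enrichment."""
    artists = list(liked_artists)
    # Enrichment: append related artists that are not already in the profile.
    for artist in liked_artists:
        for related in RELATED.get(artist, []):
            if related not in artists:
                artists.append(related)
    playlist: List[str] = []
    for artist in artists:
        # Keep only the first (most popular) songs of each artist.
        playlist.extend(TOP_SONGS.get(artist, [])[:songs_per_artist])
    return playlist


playlist = build_playlist(["artist_a", "artist_b"])
```

With this toy data, `artist_c` enters the playlist only through enrichment, which is the effect the slide describes.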
Myusic recommendations
cornerstone 2
exploit entity linking algorithms
to make user profiles more transparent and LOD-aware
MyFeeds: RSS recommendations
MyFeeds: transparent user preferences
extracted from Facebook.
MyFeeds: transparent user preferences
further processing
MyFeeds: entity linking algorithms
• They map free text to structured information
• Wikipedia pages or DBpedia nodes
• examples
• Tag.me, Wikipedia Miner, DBpedia Spotlight, etc.
Tag.me
extracts the Wikipedia pages the content refers to.
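To make the idea of mapping free text to Wikipedia pages concrete, here is a toy sketch. It is not Tag.me's algorithm and does not call any of the tools listed above: it only performs a greedy longest-match lookup in a small, hypothetical anchor dictionary (`ANCHORS`), with no disambiguation. The dictionary entries and the `link_entities` function are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical anchor dictionary: surface form -> Wikipedia page title.
ANCHORS: Dict[str, str] = {
    "valentino rossi": "Valentino_Rossi",
    "motogp": "Grand_Prix_motorcycle_racing",
    "ferrari": "Scuderia_Ferrari",
}


def link_entities(text: str, max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams in the anchor dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    links: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate mention first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            if mention in ANCHORS:
                links.append((mention, ANCHORS[mention]))
                i += n
                break
        else:
            # No mention starts at this token; move on.
            i += 1
    return links


links = link_entities("Valentino Rossi wins in MotoGP")
```

A real entity linker also has to choose between several candidate pages for an ambiguous mention; this sketch leaves that step out.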
Linked Open Data Cloud
Structured (RDF)
representation of the information
stored in Wikipedia.
Linked Open Data Cloud
Profiles based on Tag.me are
LOD-aware
cornerstone 3
exploit open knowledge sources to make recommendation
techniques more serendipitous.
‘in vitro’ experiments
Watchmi plug-in
developed by Aprico.tv
From BOW to eBOW
Given a description of a TV show, we exploit Explicit Semantic Analysis (ESA) to obtain an enhanced representation.
The original set of features is enriched with the set of Wikipedia articles most related to the TV show.
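The enrichment from a bag of words (BOW) to an enhanced bag of words (eBOW) can be sketched as follows. This is a simplified illustration, assuming a tiny hand-written concept index in place of a real index built from Wikipedia: `CONCEPT_INDEX`, its weights, the `wiki:` prefix, and both function names are hypothetical. Each concept is scored by the dot product between the term counts of the description and the concept's term weights, which is the core idea of ESA.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical term weights per Wikipedia article (stand-in for a TF-IDF index).
CONCEPT_INDEX: Dict[str, Dict[str, float]] = {
    "MotoGP": {"motogp": 0.9, "duels": 0.2, "race": 0.5},
    "Valentino Rossi": {"motogp": 0.6, "rossi": 0.9},
    "Cooking": {"recipe": 0.8, "kitchen": 0.7},
}


def esa_concepts(tokens: List[str], k: int = 2) -> List[str]:
    """Rank concepts by the dot product of term counts and concept term weights."""
    counts = Counter(tokens)
    scores = {
        concept: sum(counts[term] * weight for term, weight in weights.items())
        for concept, weights in CONCEPT_INDEX.items()
    }
    # Keep only concepts that share at least one term with the description.
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: -scores[c])
    return ranked[:k]


def to_ebow(tokens: List[str], k: int = 2) -> List[str]:
    """Enhanced bag of words: original tokens plus the top-k related concepts."""
    return tokens + [f"wiki:{concept}" for concept in esa_concepts(tokens, k)]


ebow = to_ebow(["best", "duels", "motogp"])
```

Prefixing the added features keeps them distinguishable from the original words, so a downstream recommender can weight the two kinds of feature differently if needed.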
TV SHOW: Rad an Rad - Die besten Duelle der MotoGP (Wheel to wheel - The best duels in the MotoGP)
Wikipedia Articles: großer preis von italien (motorrad), großer preis von malaysia (motorrad), großer preis von tschechien (motorrad), scuderia ferrari, valentino rossi, motorrad-wm-saison 2005, motorrad-wm-saison 2006, max biaggi, großer preis der usa (motorrad), motorrad-wm-saison 2008, rad (heraldik), loris capirossi, shin’ya nakano, motogp
example
From BOW to eBOW
challenges.
issues.
recommendations.
Challenges and Issues
• Main challenge and issue:
• data representation and data filtering
• How to exploit these novel data silos?
• What information is relevant for personalization?
• What kind of processing do data need?
• Which one is the best representation?
• Do reasoning techniques improve profile transparency and personalization accuracy?
• Do people accept the exploitation of these data?
• How to model the context?
Recommendations
• Cornerstones
• Social media-based user profiling
• LOD-aware user profiles
• Open Knowledge Sources for Serendipitous Encounters
• Recommendations
• Promote the LOD initiative: publish data in a structured form to enable reasoning on the information
• Make data silos interconnected
• Design applications able to properly model, manage and exploit the large amount of data coming from social media
questions?
Cataldo Musto, Ph.D. - [email protected]