Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems


Cataldo Musto and Pasquale Lops

Dept. of Computer Science

University of Bari “Aldo Moro”, Italy

semantics-aware techniques for

social media analysis

user modelling

and recommender systems

tutorial@UMAP 2016 Halifax, Canada – July 16, 2016

in this tutorial

how to represent content

to improve information access and build a

new generation of services for social media

analysis, user modeling and

recommender systems?

Agenda

Why?

How?

What?

Why do we need intelligent information access?

Why do we need content?

Why do we need semantics?

How to introduce semantics?

Basics of Natural Language Processing

Encoding exogenous semantics (top-down approaches)

Encoding endogenous semantics (bottom-up approaches)

Semantics-aware Recommender Systems

Cross-lingual Recommender Systems

Explaining Recommendations

Semantic User Profiles based on Social Data

Semantic Analysis of Social Streams

Why?

Why do we need intelligent information access?

physiologically

impossible

to follow the information flow

in real time

(Source: Adrian C. Ott, The 24-Hour Customer)

we can handle

126 bits of

information/day

we deal with

393 bits of

information/day

ratio: 3x

Information overload (Appeared for the first time in «Future Shock» by Alvin Toffler, 1970)


Information overload

“It is not

information

overload.

It is filter failure”

Clay Shirky

talk @Web2.0 Expo

Challenge

To effectively cope with

information overload

we need to filter the information flow

We need technologies and algorithms for intelligent information access

… and we already have some evidence!

Intelligent Information Access

Information Retrieval (Search Engines)

Intelligent Information Access

Information Filtering (Recommender Systems)

Why? Why do we need content?

Search engines need content

Why do we need content?

Trivial: search engines can’t work without content

Why do we need content?

Recommender Systems: not trivial!

Why do we need content?

Recommender Systems can work without content

Why do we need content?

Several Recommender Systems work perfectly without using any content! (e.g. Amazon)

Collaborative Filtering and Matrix

Factorization are state of the art

techniques for implementing

Recommender Systems

(ACM RecSys 2009,

by Netflix Challenge winners)

Why do we need content?

Content can tackle some issues of collaborative filtering

Why do we need content?

Collaborative Filtering issues: sparsity

Why do we need content?

Collaborative Filtering issues: new item problem

Why do we need content?

Collaborative Filtering: lack of transparency!

Why do we need content?

Collaborative Filtering: poor explanations!

Who knows the «customers who bought…»?

Why do we need content?

User modeling based on a simple graph-based representation of social connections is quite poor.

User models can benefit from information about the items the user has consumed (news content, hashtags contained in the tweets she liked, etc.)

To enrich and improve user modeling

Why do we need content?

Because a relevant part of the information spread

on social media is content!

And social media really matter

Social media can be considered as novel data silos:

information about preferences

information about people, feelings and connections

Social media changed the rules for user modeling and personalization

Recap #1

Why do we need content?

In general: to extend and improve user modeling

To exploit the information spread on social media

To overcome typical issues of collaborative filtering

and matrix factorization

Because search engines simply can’t work without content

Why? Why do we need semantics?

Why do we need semantics?

A deep comprehension of the information conveyed by

textual content is crucial to improve the quality of user

profiles and the effectiveness of intelligent information

access platforms.

Why do we need semantics?

…some scenarios can be more convincing

(But we need some basics, before)

Basics: Content-based RecSys (CBRS)

Suggest items similar to those the

user liked in the past

Recommendations generated by matching

the description of items with the

profile of the user’s interests

use of specific features

Recommender Systems Handbook,

The Adaptive Web

Basics: Content-based RecSys (CBRS)

(figure: user profile vs. items)

Basics: Content-based RecSys (CBRS)

Recommendations are generated by matching the features stored in the user profile with those describing the items to be recommended.


Lack of Semantics in User Models

“I love turkey. It’s my choice for these #holidays!”

Social media can be helpful to avoid cold start…

…but pure content-based representations can’t handle polysemy:

a pure content-based representation can easily drive a recommender system towards failure!

Lack of Semantics in Social Media Analysis


What are people worried about?

Are they worried about the eagle

or about the city of L’Aquila?

Lack of Semantics in User Models

…is not only about polysemy

Book recommendation scenario: the profile contains the features «AI», «Artificial Intelligence» and «apple» (multi-word concepts, synonymy).

Most of the preferences regard AI but, due to synonymy, «apple» is the most relevant feature in the profile.

Lack of Semantics in CBRS

user profile (Italian) vs. items (English)

It is likely that the algorithm is not able to suggest a (relevant) English news item, since no overlap between the features occurs!


Recap #2

In general: to improve content representation in intelligent information access platforms

To avoid typical issues of natural language representations (polysemy, synonymy, etc.)

To better model user preferences

To better understand the information spread on social media

To provide multilingual recommendations

Why do we need semantics?

Because language is inherently ambiguous

How?

How to introduce semantics?

Information Retrieval and Filtering

Two sides of the same coin (Belkin & Croft, 1992)

Information

Retrieval

information need expressed

through a query

goal: retrieve information which

might be relevant to a

user

Information

Filtering

information need expressed

through a

user profile

goal: expose users to only the

information that is

relevant to them,

according to personal profiles

[Belkin&Croft, 1992] Belkin, Nicholas J., and W. Bruce Croft.

"Information filtering and information retrieval: Two sides of the same

coin?." Communications of the ACM 35.12 (1992): 29-38.

It’s all about searching!

Search (and content-based recommendation) is not as simple as it might seem

Meno: and how will you enquire, Socrates, into that

which you do not know? What will you put forth

as the subject of enquiry? And if you find what

you want, how will you know that this is the

thing you did not know?

Socrates: I know, Meno, what you mean; but just

see what a tiresome dispute you are introducing. You argue that a man cannot search either

for what he knows or for what he does not

know; if he knows it, there is no need to search;

and if not, he cannot; he does not know the very

subject about which he is to search.

Plato Meno 80d-81a

http://www.gutenberg.org/etext/1643


Meno’s Paradox of Inquiry

Meno’s question at our times:

the “vocabulary mismatch” problem (revisited)

How to discover the concepts that connect us to the information we are seeking (search task) or we want to be exposed to (recommendation and user modeling tasks)?

We need some «intelligent» support (as intelligent information access technologies)

We need to better understand and represent the content

…before semantics

some basics

of Natural Language Processing (NLP)

How?

basics of NLP and keyword-based representations

Scenario

Pasquale really loves the movie «The Matrix», and he asks a content-based

recommender system for some suggestions.

How can we feed the algorithm with some textual features related to the movie

to build a (content-based) profile and provide recommendations?

(figure: the question goes to a recommendation engine)

Scenario

the plot can be a rich source of content-based features…

…but we need to properly process it through a pipeline of Natural Language Processing techniques

Basic NLP operations (a minimal pipeline sketch follows the list)

o normalization: strip unwanted characters/markup (e.g. HTML/XML tags, punctuation, numbers, etc.)

o tokenization: break the text into tokens

o stopword removal: exclude common words having little semantic content

o lemmatization: reduce inflectional/variant forms to the base form (the lemma in the dictionary), e.g. am, are, is → be

o stemming: reduce terms to their “roots”, e.g. automate(s), automatic, automation all reduced to automat

the output of the pipeline is the vocabulary of features
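A minimal sketch of this pipeline in Python with NLTK (assuming the punkt, stopwords and wordnet NLTK data packages are installed; stemming is omitted, since the final representation in the following slides uses lemmas):

import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

plot = ('The Matrix is a 1999 American-Australian neo-noir science '
        'fiction action film written and directed by the Wachowskis.')

# normalization: strip punctuation/markup and lowercase
normalized = re.sub(r'[^A-Za-z0-9 ]', ' ', plot).lower()

# tokenization: break the text into tokens
tokens = word_tokenize(normalized)

# stopword removal: drop common words with little semantic content
stop = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop]

# lemmatization: reduce variant forms to the base form
lemmatizer = WordNetLemmatizer()
features = [lemmatizer.lemmatize(t) for t in tokens]

print(features)  # the content-based features for the item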

Example

The Matrix is a 1999 American-Australian neo-noir

science fiction action film written and directed by the

Wachowskis, starring Keanu Reeves, Laurence

Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe

Pantoliano. It depicts a dystopian future in which reality

as perceived by most humans is actually a simulated

reality called "the Matrix", created by sentient machines

to subdue the human population, while their bodies' heat

and electrical activity are used as an energy source.

Computer programmer "Neo" learns this truth and is

drawn into a rebellion against the machines, which

involves other people who have been freed from the

"dream world".

Example

normalization

The Matrix is a 1999 American Australian neo noir

science fiction action film written and directed by the

Wachowskis starring Keanu Reeves Laurence Fishburne

Carrie Anne Moss Hugo Weaving and Joe Pantoliano It

depicts a dystopian future in which reality as perceived

by most humans is actually a simulated reality called the

Matrix created by sentient machines to subdue the

human population while their bodies heat and electrical

activity are used as an energy source Computer

programmer Neo learns this truth and is drawn into a

rebellion against the machines which involves other

people who have been freed from the dream world

tokenization

Tokenization issues

compound words

o science-fiction: break up hyphenated sequence?

o Keanu Reeves: one token or two? How do you decide it is one token?

numbers and dates

o 3/20/91 Mar. 20, 1991 20/3/91

o 55 B.C.

o (800) 234-2333

Tokenization issues

language issues

o German noun compounds are not segmented: Lebensversicherungsgesellschaftsangestellter means “life insurance company employee”

o Chinese and Japanese have no spaces between words (a unique tokenization is not always guaranteed)

莎拉波娃现在居住在美国东南部的佛罗里达

o Arabic (or Hebrew) is basically written right to left, but with certain items, like numbers, written left to right

“Algeria achieved its independence in 1962 after 132 years of French occupation”

Example

stopword removal

(the slide marks the stopwords to be removed from the normalized text)

Example

lemmatization

Matrix 1999 American Australian neo noir science fiction

action film write direct Wachowskis star Keanu Reeves

Laurence Fishburne Carrie Anne Moss Hugo Weaving

Joe Pantoliano depict dystopian future reality perceived

human simulate reality call Matrix create sentient

machine subdue human population body heat electrical

activity use energy source Computer programmer Neo

learn truth draw rebellion against machine involve people

free dream world

next step: to give a weight to each feature

(e.g. through TF-IDF)

Weighting features: TF-IDF

term frequency – inverse document frequency: the best-known weighting scheme in information retrieval.

Weight of a term as the product of its tf weight and idf weight:

tf: number of times the term occurs in the document

idf: depends on the rarity of the term in the collection

tf-idf increases with the number of occurrences within a document, and with the rarity of the term in the collection:

$w_{t,d} = (1 + \log \mathrm{tf}_{t,d}) \cdot \log(N/\mathrm{df}_t)$
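As a toy illustration of this weighting scheme (a sketch over a three-document collection):

import math
from collections import Counter

docs = [
    ['matrix', 'science', 'fiction', 'film', 'machine', 'film'],
    ['romantic', 'film', 'paris'],
    ['science', 'documentary', 'penguin'],
]

N = len(docs)
df = Counter(term for doc in docs for term in set(doc))  # document frequency

def tf_idf(term, doc):
    tf = doc.count(term)
    if tf == 0:
        return 0.0
    return (1 + math.log(tf)) * math.log(N / df[term])

print(tf_idf('matrix', docs[0]))  # rare term: high weight
print(tf_idf('film', docs[0]))    # frequent term: lower weight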

Example

(the lemmatized features are colored by IDF: green = high IDF, red = low IDF)

The Matrix representation

keywords: Matrix, 1999, American, Australian, science, fiction, world, Hugo, …

a portion of Pasquale’s content-based profile

given a content-based profile, we can easily build a basic recommender system through the Vector Space Model and similarity measures

Vector Space Model (VSM)

given a set of n features (vocabulary) $F = \{f_1, f_2, \ldots, f_n\}$

given a set of M items, each item I is represented as a point in an n-dimensional vector space:

$I = (w_{f_1}, \ldots, w_{f_n})$

where $w_{f_i}$ is the weight of feature i in the item

Similarity between vectors

cosine similarity (the dot product of the unit vectors):

$\cos(\vec{I}, \vec{J}) = \frac{\vec{I} \cdot \vec{J}}{|\vec{I}|\,|\vec{J}|} = \frac{\sum_{i=1}^{V} I_i J_i}{\sqrt{\sum_{i=1}^{V} I_i^2}\,\sqrt{\sum_{i=1}^{V} J_i^2}}$

Basic Content-based Recommendations

o documents represented as vectors

o features identified through NLP operations

o features weighted using tf-idf

o cosine measure for computing similarity between vectors

(a minimal sketch follows the list)
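A minimal sketch of such a recommender with scikit-learn (the item descriptions are toy placeholders; the profile text stands in for the aggregated descriptions of the items the user liked):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    'Matrix Revolutions': 'science fiction film machines dystopian future',
    'Notre Dame de Paris': 'novel by Victor Hugo set in Paris',
    'March of the Penguins': 'documentary about penguins in Antarctica',
}

# the user profile aggregates the descriptions of the items he liked
profile = 'science fiction action film sentient machines dystopian future'

vectorizer = TfidfVectorizer()
item_vectors = vectorizer.fit_transform(items.values())
profile_vector = vectorizer.transform([profile])

# rank items by cosine similarity between profile and item vectors
scores = cosine_similarity(profile_vector, item_vectors).ravel()
for title, score in sorted(zip(items, scores), key=lambda p: -p[1]):
    print(f'{score:.3f}  {title}')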

Drawbacks

a portion of Pasquale’s

content-based profile

Recommendation:

Notre Dame de Paris,

by Victor Hugo

Basic Content-based Recommendations

Why? Entities such as «Hugo Weaving» were not modeled: only the separate keyword «Hugo» appears in the profile.

Drawbacks

Basic Content-based Recommendations

Why? More complex concepts such as «science fiction» were not modeled as single features: «science» and «fiction» appear as separate keywords in the portion of Pasquale’s content-based profile.

Recommendation: The March of the Penguins

Basic Content-based Recommendations: bad recommendations

Recap #3

Natural Language Processing techniques are necessary to build a content-based profile

basic content-based algorithms can be easily built through TF-IDF

a keyword-based representation is too poor and can lead to bad modeling of preferences (and bad recommendations)

we need to shift from keywords to concepts

basics of NLP and keyword-based representation

How?

Semantics-aware Content Representation

Semantic representations

Explicit (Exogenous) Semantics: top-down approaches, based on the integration of external knowledge for representing content. Able to provide linguistic, cultural and background knowledge in the content representation.

Implicit (Endogenous) Semantics: bottom-up approaches, which determine the meaning of a word by analyzing the rules of its usage in the context of ordinary and concrete language behavior.

Semantic representations

Explicit (Exogenous) Semantics:

o introduce semantics by mapping the features describing the item with semantic concepts (Word Sense Disambiguation, Entity Linking, …)

o introduce semantics by linking the item to a knowledge graph (Ontologies, Linked Open Data, …)

Implicit (Endogenous) Semantics:

o distributional semantic models (Explicit Semantic Analysis, Random Indexing, Word2Vec, …)

How? Encoding exogenous semantics

(top-down approaches)

Semantic representations

Explicit (Exogenous) Semantics: Word Sense Disambiguation, Entity Linking, … (introduce semantics by mapping the features describing the item with semantic concepts)

Word Sense Disambiguation (WSD)

using linguistic ontologies

WSD selects the proper meaning, i.e. sense, for a word in

a text by taking into account the context in which it occurs


Sense Repository

WordNet groups words into sets of synonyms called synsets

It contains nouns, verbs, adjectives, adverbs

WordNet lexical matrix: word meanings M1…Mm (rows) × word forms F1…Fn (columns). The word forms in a row are synonyms (a synset); a word form whose column has several entries is polysemous, and disambiguation is needed.

WordNet linguistic ontology [*]

[*] Miller, George A. "WordNet: a lexical database for English." Communications of the ACM 38.11 (1995): 39-41.
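With NLTK, WordNet synsets can be inspected directly (a quick sketch, assuming the wordnet data package is installed); note how «turkey» is polysemous:

from nltk.corpus import wordnet as wn

# a polysemous word: the bird, the country, ...
for synset in wn.synsets('turkey'):
    print(synset.name(), '--', synset.definition())

# a synset groups synonym word forms
print(wn.synset('movie.n.01').lemma_names())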

an example of synset

WordNet Hierarchies

Word Sense Disambiguation

State of the art: the JIGSAW algorithm [*]

Input: D = {w1, w2, …, wh} (the document)

Output: X = {s1, s2, …, sk} (k ≤ h)

Each si is obtained by disambiguating wi based on the context of each word

Some words are not recognized by WordNet

Groups of words are recognized as a single concept

[*] Basile, P., de Gemmis, M., Gentile, A. L., Lops, P., & Semeraro, G. (2007, June). UNIBA: JIGSAW algorithm for word sense disambiguation. In Proceedings of the 4th International Workshop on Semantic Evaluations (pp. 398-401). Association for Computational Linguistics.

How to use WordNet for WSD?

Semantic similarity between synsets is inversely proportional to their distance in the WordNet IS-A hierarchy

Path-length similarity between synsets is used to assign scores to the synsets of a polysemous word, in order to choose the correct sense

JIGSAW WSD algorithm

(figure: a fragment of the WordNet IS-A hierarchy; the path from Cat (feline mammal) up through Feline/felid, Carnivore and Placental mammal, and down through Rodent to Mouse (rodent), has length 5)

SINSIM(cat, mouse) = -log(5/32) = 0.806
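A sketch of this path-length score, assuming a Leacock–Chodorow-style formula SinSim(a, b) = -log10(Np / 2D), where Np is the path length and D = 16 is the maximum depth of the WordNet noun hierarchy (these constants are our assumption, chosen because they reproduce the 0.806 above):

import math

def sinsim(path_length, max_depth=16):
    # -log10(Np / 2D): shorter paths give higher similarity
    return -math.log10(path_length / (2.0 * max_depth))

# cat -> feline -> carnivore -> placental mammal -> rodent -> mouse
print(sinsim(5))  # 0.806

# NLTK offers a comparable WordNet similarity out of the box:
# from nltk.corpus import wordnet as wn
# wn.synset('cat.n.01').lch_similarity(wn.synset('mouse.n.01'))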

Synset semantic similarity

(figure: JIGSAW scores the candidate synsets of a polysemous word w against the synsets of the words in its context C, e.g. 0.806 against 0.107 and 0.0, and picks the highest-scoring sense)

through WSD we can obtain a semantics-aware representation of textual content

Synset-based representation

synsets:

{09596828} American -- (a native or inhabitant of the United States)

{06281561} fiction -- (a literary work based on the imagination and not necessarily on fact)

{06525881} movie, film, picture, moving picture, moving-picture show, motion picture, motion-picture show, picture show, pic, flick -- (a form of entertainment that enacts a story…)

{02605965} star -- (feature as the star; "The movie stars Dustin Hoffman as an autistic man")

The Matrix representation

through WSD we process the textual description of the item and we obtain a semantics-aware representation of the item as output: the keyword-based features are replaced with the concepts (in this case, WordNet synsets) they refer to

keywords (Matrix, 1999, American, Australian, science, fiction, world, Hugo, …) → synsets (as listed above)

The Matrix representation

Word Sense Disambiguation recap

polysemy and synonymy effectively handled

classical NLP techniques helpful to remove further noise (e.g. stopwords)

potentially language-independent (more on this later)

entities (e.g. Hugo Weaving) still not recognized

Semantic representations

Explicit (Exogenous) Semantics: Word Sense Disambiguation, Entity Linking, … (introduce semantics by mapping the features describing the item with semantic concepts, or by linking the item itself to a knowledge graph)

Entity Linking Algorithms

• Basic idea

• Input: free text (e.g. a Wikipedia abstract)

• Output: identification of the entities mentioned in the text

Why Entity Linking?

because we need to identify the entities mentioned in the textual description

to better catch user preferences and information needs.

… and many more

Several state-of-the-art implementations are already available

The Matrix representation

keywords (Matrix, 1999, American, Australian, neo, science, fiction, world, …) → entities (instead of the synsets listed above)

entities are correctly recognized and modeled

partially multilingual (entities are inherently multilingual, but other concepts aren’t)

common sense and abstract concepts are now ignored

very transparent and human-readable content representation

non-trivial NLP tasks automatically performed (stopword removal, n-gram identification, named entity recognition and disambiguation)

Entity Linking Algorithms

Tag.me

Output

each entity identified in the content can be a feature of a

semantics-aware content representation

based on entity linking

Advantage #1: several common sense concepts are now identified

Advantage #2: each entity is a reference to a Wikipedia page (http://en.wikipedia.org/wiki/The_Wachowskis), not a simple textual feature!

Entity Linking Algorithms

Tag.me + Wikipedia Categories

We can enrich this entity-based representation by exploiting the Wikipedia categories’ tree: the final representation of items is obtained by merging the entities identified in the text with the (most relevant) Wikipedia categories each entity is linked to

entities + Wikipedia categories = features

The Matrix representation

keywords (Matrix, 1999, American, Australian, neo, science, fiction, world, …) → Wikipedia pages

entities recognized and modeled (as in OpenCalais)

Wikipedia-based representation: some common sense terms are included, and new interesting features (e.g. «science-fiction film director») can be generated

terms without a Wikipedia mapping are ignored

traditional resources:

o manually curated by experts

o available for a few languages

o difficult to maintain and update

collaborative resources:

o collaboratively built by the crowd

o highly multilingual

o up-to-date

Entity Linking Algorithms

Babelfy

we have both Named Entities and Concepts!

The Matrix representation

keywords (Matrix, 1999, American, Australian, neo, science, fiction, world, …) → Babel synsets

entities recognized and modeled (as in OpenCalais and Tag.me)

Wikipedia-based representation: some common sense terms are included, and new interesting features (e.g. «science-fiction director») can be generated

includes linguistic knowledge and is able to disambiguate terms

also multilingual!

Recap #4

o «Exogenous» techniques use external knowledge sources to inject semantics

o Word Sense Disambiguation algorithms process the textual description and replace keywords with semantic concepts (such as synsets)

o Entity Linking algorithms focus on the identification of entities; some recent approaches are also able to identify common sense terms

o A combination of both approaches is potentially the best strategy

encoding exogenous semantics

by processing textual descriptions

Semantic representations

Explicit (Exogenous) Semantics: Ontologies, Linked Open Data, … (introduce semantics by linking the item to a knowledge graph)

Ontologies

o used to describe domain-specific knowledge

o hierarchies of concepts with attributes and relations

o “An ontology is a formal, explicit specification of a shared conceptualization”

Guarino, Nicola. "Understanding, building and using ontologies." International Journal of Human-Computer Studies 46.2 (1997): 293-310.

Exogenous Semantics through Ontologies

why do we need an ontology?

to share common understanding of the structure of information

o among people

o among software agents

to enable reuse of domain knowledge

o to avoid “re-inventing the wheel”

o to introduce standards to allow interoperability


…let’s have an example!

Exogenous Semantics through Ontologies

A Movie Ontology (a small portion, actually)

we formally encode the relevant aspects and the relationships among them

every item is formally modeled by following this structure and encoded through a Semantic Web language (e.g. OWL, RDF)

why is it useful?

o each feature has a non-ambiguous «meaning»

o we don’t need to process unstructured content

o we can perform some «reasoning» on user preferences. How? We can reason on the preferences and infer that a user interested in The Matrix (SciFi_and_Fantasy genre) is interested in Imaginational_Entertainment and potentially in Logical_Thrilling
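A hedged sketch of what such a formal item description could look like in RDF with rdflib (the namespace and property names below are illustrative placeholders, not the actual Movie Ontology vocabulary):

from rdflib import Graph, Literal, Namespace, URIRef

MO = Namespace('http://example.org/movie-ontology#')  # hypothetical
g = Graph()

matrix = URIRef('http://example.org/items/The_Matrix')
g.add((matrix, MO.hasGenre, MO.SciFi_and_Fantasy))
g.add((matrix, MO.starring, Literal('Keanu Reeves')))

# a reasoner exploiting the ontology (SciFi_and_Fantasy subClassOf
# Imaginational_Entertainment) could infer broader user interests
print(g.serialize(format='turtle'))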

The Matrix representation

from a flat representation toward a graph-based representation

semantics explicitly encoded

explicit relations between concepts exist: reasoning can infer interesting information

ontologies are typically domain-dependent

huge effort to build and maintain an ontology

very huge effort to populate an ontology!

Semantic representations

Explicit (Exogenous) Semantics: Ontologies, Linked Open Data, … (introduce semantics by linking the item to a knowledge graph)

Linked Open Data

the giant global graph

Linked Open Data (cloud)

what is it?

(large) set of interconnected

semantic datasets

Linked Open Data (cloud)

statistics

149 billion triples, 3,842 datasets (http://stats.lod2.eu)

Linked Open Data (cloud)

DBpedia

core of the LOD cloud

RDF mapping of Wikipedia

Linked Open Data

cornerstones

1. a methodology to publish, share and link structured data on the Web

2. the use of RDF

o every resource/entity/relation identified by a (unique) URI

o URI: http://dbpedia.org/resource/Halifax

o re-use of existing properties to express an agreed semantics and connect data sources (a query sketch follows the list)
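For instance, item features can be collected from DBpedia with a SPARQL query (a sketch using the SPARQLWrapper package against the public endpoint, assuming it is reachable):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper('http://dbpedia.org/sparql')
sparql.setQuery('''
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/The_Matrix>
            <http://purl.org/dc/terms/subject> ?category .
    }
''')
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results['results']['bindings']:
    print(row['category']['value'])  # e.g. ...Category:1999_films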

Linked Open Data (cloud)

The Matrix representation

from a flat representation

toward a (richer) graph-based

representation

The Matrix representation

1. we have the advantage of formal semantics defined in RDF, with interesting features coming from Wikipedia

2. without the need of building and manually populating an ontology

another advantage: graph-based data models can be exploited to define more semantic features based on graph topology

Graph-based Data Model

bipartite graph: users = nodes, items = nodes, preferences = edges

Very intuitive representation!

(figure: users u1…u4 and items i1…i4 connected by preference edges)

Semantic Graph-based Data Model

the user-item graph is extended with a DBpedia mapping: item nodes are linked to DBpedia resources

(1-hop): e.g. dcterms:subject edges to «1999 films» (http://dbpedia.org/resource/1999_films) and «Films about Rebellions», or links to «Quentin Tarantino»

(2-hop): e.g. «American film directors», or dbo:award edges to «Lynne Thigpen» (http://dbpedia.org/resource/Lynne_Thigpen)

(n-hop): the expansion can continue for n hops

Semantic Graph-based Data Model (Feature Generation)

PageRank, spreading activation, average neighbors, degree centrality: new semantic features describing the item can be inferred by mining the structure of the tripartite graph (see the sketch below)
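A small sketch of such feature generation on a toy tripartite graph with networkx (PageRank scores used as topological feature weights):

import networkx as nx

G = nx.Graph()
# user-item preference edges
G.add_edges_from([('u1', 'i1'), ('u2', 'i1'), ('u3', 'i3'), ('u4', 'i4')])
# item-entity edges coming from the DBpedia mapping
G.add_edges_from([
    ('i1', 'dbr:1999_films'),
    ('i3', 'dbr:1999_films'),
    ('i1', 'dbc:Films_about_rebellions'),
])

# topological scores usable as semantic feature weights
scores = nx.pagerank(G)
print(scores['dbr:1999_films'])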

Recap #5

o Ontologies enrich the representation by introducing formal semantics, but they are very complicated to build and maintain

o Linked Open Data merges the advantages of ontologies with the simplicity of a collaborative knowledge source such as Wikipedia

o Such approaches build a graph-based representation that triggers the generation of semantic topological features

o Inherently multilingual!

encoding exogenous semantics through Knowledge Graphs

How? Encoding endogenous semantics

(bottom-up approaches)

Insight

very huge availability of textual content: we can use this huge amount of content to directly learn a representation of words

Insight

What is «Peroni»?

Pass me a Peroni!

I like Peroni

Football and Peroni, what a perfect Saturday!

What is «Budweiser»?

Pass me a Budweiser!

I like Budweiser

Football and Budweiser, what a perfect Saturday!

Insight

Distributional Hypothesis: «Terms used in similar contexts share a similar meaning»

The semantics learnt from term usage is called «distributional»

Distributional Semantics

L. Wittgenstein (Austrian philosopher)

Distributional Semantics

by analyzing large corpora of textual data it is possible to infer information about the usage (and thus about the meaning) of terms (*)

beer and wine share a similar meaning, since they are often used in similar contexts (co-occurrence)

(*) Firth, J.R. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32, 1957.

Distributional Semantics

Term-Contexts Matrix

A vector-space representation is learnt by encoding in which contexts each term is used (this representation is called WordSpace)

Each row of the matrix is a vector! (see the sketch below)
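A toy version of such a term-contexts matrix, where the contexts are the neighboring words within a sentence (a sketch with numpy; cosine over the rows reproduces the beer/wine intuition):

import numpy as np

sentences = [
    'pass me a beer', 'pass me a wine', 'i like beer',
    'i like wine', 'the dog barks', 'feed the dog',
]
vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# M[t, c]: how often term t co-occurs with context word c
M = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    words = s.split()
    for t in words:
        for c in words:
            if t != c:
                M[idx[t], idx[c]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(M[idx['beer']], M[idx['wine']]))  # similar contexts: high
print(cosine(M[idx['beer']], M[idx['dog']]))   # different contexts: low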

Distributional Semantics

Term-Contexts Matrix

beer vs wine: good overlap → Similar!

beer vs dog: no overlap → Not Similar!

WordSpace

A vector space representation (called WordSpace) is learnt according to term usage in contexts

Terms sharing a similar usage are very close in the space

(figure: beer, wine and mojito close together, dog far away)

Distributional Semantics

Term-Contexts Matrix

Key question: what is the context?

These approaches are very flexible, since the «context» can be set according to the granularity required by the representation

Coarse-grained granularity: context = whole document. The term-contexts matrix becomes a term-document matrix: this is the Vector Space Model! The Vector Space Model is a distributional model.

Fine-grained granularities: context = paragraph, sentence, window of words

PROs: the more fine-grained the representation, the more precise the vectors

CONs: the more fine-grained the representation, the bigger the matrix

Distributional Semantics

Term-Contexts Matrix

The flexibility of distributional semantics models also regards the rows of the matrix: keywords can be replaced with concepts (such as synsets or entities!)

e.g. Keanu Reeves and Al Pacino are «connected» because they both acted in drama films

Distributional Semantics

Representing Documents

Given a WordSpace, a vector space representation of documents (called DocSpace) is typically built as the centroid vector of the word representations

(figure: a DocSpace with The Matrix, Matrix Revolutions, Donnie Darko and Up!; the semantic representation enables similarity calculations between items)

Distributional Semantics

• We can exploit (big) corpora of data to directly learn a semantic vector-space representation of the terms of a language

• Lightweight semantics, not formally defined

• High flexibility: everything is a vector (term/term, doc/term, term/doc similarity, etc.)

• Context can have different granularities

• A huge amount of content is needed

• Matrices are particularly huge and difficult to build

• Too many features: need for dimensionality reduction

Semantic representations

Implicit (Endogenous) Semantics: distributional semantic models (Explicit Semantic Analysis, Random Indexing, Word2Vec, …)

Distributional semantic models share the same insight, but have important distinguishing aspects


Explicit Semantic Analysis (ESA)

ESA builds a vector-space semantic representation of natural language texts in a high-dimensional space of comprehensible concepts derived from Wikipedia [Gabri06]

(example concepts: Panthera, World War II, Jane Fonda, Island)

[Gabri06] E. Gabrilovich and S. Markovitch. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge. In Proceedings of the 21st National Conf. on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, pages 1301–1306. AAAI Press, 2006.

Explicit Semantic Analysis (ESA)

ESA matrix (terms × Wikipedia articles):

ESA    | Concept 1 | … | Concept n
term 1 | TF-IDF    | … | TF-IDF
…      | TF-IDF    | … | TF-IDF
term k | TF-IDF    | … | TF-IDF

ESA is a distributional semantic model which uses Wikipedia articles as contexts

every Wikipedia article represents a concept, and the words of an article are associated with the concept: each cell contains the semantic relatedness between a word and a concept, as a TF-IDF score

Explicit Semantic Analysis (ESA)

Each Wikipedia page can be described in terms of the words with the highest TF-IDF scores (this is a column of the ESA matrix)

Explicit Semantic Analysis (ESA)

The vector-space representation of each term (a row of the ESA matrix) is called its semantic interpretation vector: the semantics of a word is the vector of its associations with Wikipedia concepts
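A minimal ESA-style sketch (the «Wikipedia articles» here are toy texts; the transposed tf-idf matrix yields one semantic interpretation vector per term):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {   # each article plays the role of one ESA concept
    'Beer': 'beer is a fermented alcoholic drink like wine, brewed from malt',
    'Wine': 'wine is a fermented drink made from grapes',
    'Dog':  'the dog is a domesticated animal that barks',
}

vec = TfidfVectorizer()
article_term = vec.fit_transform(articles.values())   # articles x terms
interpretation = article_term.T.tocsr()               # terms x concepts

term_index = vec.vocabulary_
beer = interpretation[term_index['beer']]
wine = interpretation[term_index['wine']]
dog = interpretation[term_index['dog']]

print(cosine_similarity(beer, wine))  # share the Beer concept: related
print(cosine_similarity(beer, dog))   # no shared concepts: unrelated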

Explicit Semantic Analysis (ESA)

Important: the semantics of the words is not static. It changes as Wikipedia articles are modified or new articles are introduced: ESA provides a representation which evolves over time! (e.g. «web» in 1980 vs. «web» in 2000)

The semantics of a text fragment is the centroid of the semantics of its words

(figure: weighted concepts in the interpretation vector, e.g. Mickey Mouse [0.81], Game Controller [0.64], Game Controller [0.32])

Explicit Semantic Analysis (ESA)

Explicit Semantic Analysis (ESA)

A semantic representation of an item can be built as the

centroid vector of the semantic interpretation vectors of

the terms in the plot.


Explicit Semantic Analysis (ESA)

Representation can be further improved and enriched by

processing content through exogenous techniques (e.g.

entity linking) in order to catch complex concepts

Explicit Semantic Analysis (ESA)

the semantic relatedness of a pair of text fragments (e.g. the descriptions of two items) is computed by comparing their semantic interpretation vectors using the cosine metric

(figure: The Matrix close to Matrix Revolutions and Donnie Darko, far from Up!)

Explicit Semantic Analysis (ESA)

Another advantage: ESA can also be used to generate a set of relevant extra concepts describing the items. How? The Wikipedia pages with the highest TF-IDF scores in the semantic interpretation vector of the item!

Explicit Semantic Analysis (ESA)

(e.g. the extra concept Artificial Intelligence [0.61])

Explicit Semantic Analysis (ESA)

a distributional model which uses Wikipedia articles as contexts

very transparent representation (the columns have an explicit meaning)

the representation can evolve over time!

also language-independent, thanks to cross-language Wikipedia links

the whole matrix is very huge

«empirical» tuning of the parameters: how many articles? how many terms? what is the threshold?

When transparency is not so important, it is possible to learn a more compact vector-space representation of terms and items: dimensionality reduction techniques, a.k.a. word embedding techniques (embedding = a smaller representation of words; a more recent, equivalent buzzword)

Is this new?

Dimensionality reduction techniques

Latent Semantic Analysis (LSA) is a widespread distributional semantics model, which builds a term/context matrix and calculates a truncated Singular Value Decomposition over that matrix. The truncated SVD induces higher-order (paradigmatic) relations between terms.

Dumais, Susan T. "Latent semantic analysis." Annual Review of Information Science and Technology 38.1 (2004): 188-230.
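A sketch of the LSA recipe with scikit-learn: truncated SVD over a (transposed) tf-idf term/context matrix gives compact term vectors:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

contexts = [
    'beer is a fermented alcoholic drink',
    'wine is a fermented alcoholic drink',
    'the dog barks at the postman',
    'feed the dog twice a day',
]

tfidf = TfidfVectorizer().fit_transform(contexts)  # contexts x terms
term_context = tfidf.T                             # terms x contexts

svd = TruncatedSVD(n_components=2)                 # k latent dimensions
term_embeddings = svd.fit_transform(term_context)
print(term_embeddings.shape)                       # (n_terms, 2)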

Dimensionality reduction techniques

PROBLEM: the huge co-occurrence matrix

SOLUTION: don’t build the huge co-occurrence matrix! Use incremental and scalable techniques

Semantic representations

Implicit (Endogenous) Semantics: distributional semantic models (Explicit Semantic Analysis, Random Indexing, Word2Vec, …)

Dimensionality reduction

Random Indexing

an incremental and scalable technique for dimensionality reduction [*]

Insight

o assign a vector to each context (word, document, etc.); the vector can be as big as you want

o fill the vector with (almost) randomly assigned values

o given a word, collect the contexts where that word appears

o update the representation according to term co-occurrences: the final representation is the «sum» of its context vectors

o obtain a (smaller but equivalent) vector space representation of the terms

[*] M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.

Random Indexing

Algorithm

Step 1 – definition of the context granularity: document? paragraph? sentence? word?

Step 2 – building the random matrix R: each «context» (e.g. sentence) is assigned a context vector

o dimension = k

o allowed values = {-1, 0, +1}

o small number of non-zero elements, i.e. sparse vectors

o values distributed in a random way

Example: context vectors of dimension k = 8; each row is a «context»

Random Indexing

Algorithm

Step 3 – building the reduced space B: the vector space representation of a term t is obtained by combining the random vectors of the contexts in which it occurs

Example, with t1 ∈ {c1, c2, c5}:

r1 =  0, 0, -1,  1, 0,  0, 0,  0
r2 =  1, 0,  0,  0, 0,  0, 0, -1
r3 =  0, 0,  0,  0, 0, -1, 1,  0
r4 = -1, 1,  0,  0, 0,  0, 0,  0
r5 =  1, 0,  0, -1, 1,  0, 0,  0
rn = …

t1 = r1 + r2 + r5 = 2, 0, -1, 0, 1, 0, 0, -1

Output: WordSpace

Random Indexing

Algorithm

Step 4 – building the document space: the vector space representation of a document d is obtained by combining the vector space representations of the terms that occur in the document (a compact sketch follows)

Output: DocSpace
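Putting steps 2–4 together in a compact sketch (numpy; documents play the role of contexts, k = 8):

import numpy as np

rng = np.random.default_rng(42)
k = 8  # dimension of the reduced space

docs = ['pass me a beer', 'pass me a wine', 'i like beer', 'the dog barks']

# Step 2: a sparse random vector in {-1, 0, +1}^k for each context
def random_vector():
    v = np.zeros(k)
    nonzero = rng.choice(k, size=2, replace=False)
    v[nonzero] = rng.choice([-1, 1], size=2)
    return v

context_vectors = [random_vector() for _ in docs]

# Step 3: WordSpace, summing the vectors of the contexts a term occurs in
word_space = {}
for ctx, doc in zip(context_vectors, docs):
    for term in doc.split():
        word_space[term] = word_space.get(term, np.zeros(k)) + ctx

# Step 4: DocSpace, combining (here: averaging) the term vectors
doc_space = [np.mean([word_space[t] for t in d.split()], axis=0)
             for d in docs]
print(word_space['beer'])
print(doc_space[0])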

WordSpace and DocSpace

WordSpace: terms t1 … tm × dimensions c1 … ck

DocSpace: documents d1 … dn × dimensions c1 … ck

Uniform representation: k is a simple parameter of the model

Dimensionality reduction

…even if it sounds weird

theory: the Johnson–Lindenstrauss lemma [*]

$B_{m,k} \approx A_{m,n} R_{n,k}$, with $k \ll n$

distances between the points in the reduced space are approximately preserved if the context vectors are nearly orthogonal (and they are)

[*] Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189-206), 1.

Random Indexing

…is also multilingual!

the same concept, expressed in different languages, assumes the same position in language-based geometric spaces: the position of beer in a geometric space based on English and the position of birra in a geometric space based on Italian are (almost) the same

(figure: Italian WordSpace with birra, bicchiere, cucchiaio, cane vs. English WordSpace with beer, glass, spoon, dog)

The position in the space can be slightly different, but the similarity relations between terms still hold

Random Indexing

Incremental and Scalable technique for

learning word embeddings

Semantic representations

Implicit (Endogenous) Semantics: distributional semantic models (Explicit Semantic Analysis, Random Indexing, Word2Vec, …)

Word2Vec

In a nutshell

• a distributional model to learn word embeddings

• uses a two-layer neural network

• training based on the Skip-Gram methodology

• update of the network through mini-batch or stochastic gradient descent

Word2Vec

(Partial) Structure of the network

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• Each term is

represented through

a «one hot

representation»

Word2Vec

(Partial) Structure of the network

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Word2Vec

(Partial) Structure of the network

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Weight of the network:

• Randomly set (initially)

• Updated through the training

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

Word2Vec

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Weight of the network:

• Randomly set (initially)

• Updated through the training

Final Representation for term tk

• Weights Extracted from the network

• tk = [w(tk,v1), w(tk,v2), …, w(tk,vN)]

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

(Partial) Structure of the network

Word2Vec

Training Procedure: how to create training examples?

Skip-Gram Methodology: given a word w(t), predict its context

w(t-2), w(t-1), w(t+1), w(t+2)

Continuous Bag-of-Words Methodology: given a context

w(t-2), w(t-1), w(t+1), w(t+2), predict the word w(t)

Word2Vec

Training Procedure: how to create training examples?

Skip-Gram Methodology

Given a word w(t), predict its

context w(t-2), w(t-1), w(t+1), w(t+2)

Example Input: ”the quick brown fox

jumped over the lazy dog”

Window Size: 1

Contexts:

• ([the, brown], quick)

• ([quick, fox], brown)

• ([brown, jumped], fox) ...

Training Examples:

• (quick, the)

• (quick, brown)

• (brown, quick)

• (brown, fox) ...
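A minimal sketch (toy code, not from the slides) that generates these Skip-Gram training pairs:

    # generate (word, context) training pairs with window size = 1
    sentence = "the quick brown fox jumped over the lazy dog".split()
    window = 1

    pairs = []
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        pairs += [(word, sentence[j]) for j in range(lo, hi) if j != i]

    print(pairs[:4])  # [('the','quick'), ('quick','the'), ('quick','brown'), ('brown','quick')]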

Word2Vec

Training Procedure: how to optimize the model?

And probability is calculated through soft-max

The model tries to maximize the

probability of predicting a context C

given a word w

Given a corpus, we create a set of training examples through Skip-Gram.

Word2Vec

Training Procedure: how to optimize the model?

And probability is calculated through soft-max

The model tries to maximize the

probability of predicting a context C

given a word w

Intuitively, the probability is high when the scalar product

is high, i.e. when the vectors are similar!

Given a corpus, we create a set of training examples through Skip-Gram.

Word2Vec

Training Procedure: how to optimize the model?

Given a corpus, we create a set of training examples through Skip-Gram.

And probability is calculated through soft-max

The model tries to maximize the

probability of predicting a context C

given a word w

Intuitively, the probability is high when the scalar product

is high, i.e. when the vectors are similar!

Word2Vec is a distributional model since it learns a

representation such that couples (word,context)

appearing together have similar vectors

The error is collected and the weights in the network are updated

accordingly. Typically Stochastic Gradient Descent or Mini-Batch

updates (every 128 or 512 training examples) are used.
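Written out, a standard formulation of this soft-max (the exact formula shown on the original slide is not preserved in the transcript) is:

p(c | w) = exp(vc · vw) / Σc'∈V exp(vc' · vw)

and training maximizes Σ(w,c) log p(c | w) over all training examples (w, c).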

Representations can be really, really

small (size < 100, typically)

Trending - Recent and Very Hot

technique

Word2Vec

Learning Word Embeddings

through Neural Networks: it is not

based on «counting» co-occurrences,

it relies on «predicting» the distribution

Not transparent anymore

Needs more computational resources

…Let’s put everything together

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Work on

Vector Space Model Work on

Vector Space Model

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Work on

Vector Space Model Work on

Vector Space Model

Can Exogenous and Endogenous approaches be combined?

Exogenous approaches, such as Entity Linking and WSD,

work on the rows of the matrix

…Let’s put everything together

Exogenous approaches, such as Entity Linking and WSD,

work on the rows of the matrix

…Let’s put everything together

Endogenous approaches, such as ESA or Word2Vec,

work on the columns of the matrix

Exogenous approaches, such as Entity Linking and WSD,

work on the rows of the matrix

…Let’s put everything together

Endogenous approaches, such as ESA or Word2Vec,

work on the columns of the matrix

Both approaches can be combined to obtain richer

and more precise semantic representations

(e.g. Word2Vec over textual descriptions processed with WSD)

What?

semantics-aware recommender systems

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Word Sense

Disambiguation

Entity

Linking …….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking the

Item to a

knowledge graph

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

through WSD we process the

textual description of the item

and we obtain a semantics-

aware representation of the

item as output.

Keyword-based features are

replaced with the concepts (here,

WordNet synsets) they refer to.
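A minimal sketch of the BOW-to-BOS step; here NLTK's simplified Lesk algorithm stands in for the WSD component (the systems cited below actually use JIGSAW):

    # requires nltk data: punkt, wordnet
    from nltk.tokenize import word_tokenize
    from nltk.wsd import lesk

    text = "The movie stars an actor trapped in a fictional world"
    tokens = word_tokenize(text)

    bos = []  # Bag-of-Synsets instead of Bag-of-Words
    for token in tokens:
        synset = lesk(tokens, token)  # disambiguate against the sentence
        if synset is not None:
            bos.append(synset.name())

    print(bos)  # synset ids (e.g. 'movie.n.01') replace the keywords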

Example (figure): the movie «Matrix»

keywords: matrix, 1999, American, Australian, fiction, world, science, Hugo, AI, Artificial, Intelligence, apple, …

Synset-based representation: keywords such as «AI» and «Artificial Intelligence» are mapped to the same synset

M. Degemmis, P. Lops, and G. Semeraro. A Content-collaborative Recommender that Exploits WordNet-based User Profiles for

Neighborhood Formation. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 17(3):217–255,

Springer Science + Business Media B.V., 2007.

G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining Learning and Word Sense Disambiguation for Intelligent User Profiling. In M.

M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-

12, 2007 , pages 2856–2861. Morgan Kaufmann, 2007.

M. Degemmis, P. Lops, G. Semeraro, P. Basile: Integrating tags in a semantic content-based recommender

ACM Conference on Recommender Systems, RecSys 2008: 163-170

Keywords- vs synsets-based profiles

EachMovie dataset

o 1,628 movies grouped into 10 categories

o Users who rated between 30 and 100 movies

o Movie content crawled from IMDb

In the context of cultural heritage personalization, does the integration of UGC and textual descriptions of artwork collections increase the prediction accuracy when recommending artifacts to users?

Results in a cultural heritage scenario

In RecSys ’08, Proceed. of the 2nd ACM Conference on Recommender Systems, pages 163–170, October 23-25, 2008,

Lausanne, Switzerland, ACM, 2008.

Results in a cultural heritage scenario

5-point rating scale

Textual description of items (static content)

Personal Tags

Social Tags (from other users): caravaggio, deposition, christ, cross, suffering, religion, passion

Results in a cultural heritage scenario

Personal Tags

Static Content

Social Tags

Results in a cultural heritage scenario

o Artwork representation

o Artist

o Title

o Description

o Tags

o change of text representation from vectors of words (BOW) into vectors of WordNet synsets (BOS)

o From tags to semantic tags

o supervised Learning

o Bayesian Classifier learned from artworks labeled with user ratings and tags

Results in a cultural heritage scenario

* Results averaged over the 30 study subjects

Profiles compared (figure): Content-based Profiles, Tag-based Profiles, Augmented Profiles

Overall accuracy F1 ≈ 85%

Results in a cultural heritage scenario

personalized museum tours obtained by arranging the most interesting

items for the active user

a step forward: taking into account spatial layout & time constraints

L. Iaquinta, M. de Gemmis, P. Lops, G. Semeraro: Recommendations toward Serendipitous Diversions. ISDA 2009: 1049-1054

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Semantic Analysis using Ontologies

Quickstep & Foxtrot

o on-line recommender systems for academic research papers

o items and user profiles represented through a research

topic ontology

o is-a relationships exploited to

infer general interests when specific

topics are observed

o match based on the correlation

between the topics in the user

profile and topics in papers

ACM Transactions

on Information Systems, 22(1):54–88, 2004

Semantic Analysis using Ontologies

In W. Nejdl, J. Kay,

P. Pu, and E. Herder, editors, Adaptive Hypermedia and Adaptive Web-Based Systems, volume 5149 of Lecture Notes in Computer

Science, pages 279–283. Springer, 2008.

News@hand

Semantic Analysis using Ontologies

o user interests propagated from concepts which received the user feedback to

other related ones through spreading activation

o contextualized propagation strategies of user interests

horizontal propagation among siblings

anisotropic vertical propagation, i.e. user interests propagated differently

upward and downward

. Inf. Sci.,

250:40–60, 2013.

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Linked Open Data & RecSys

structured information source

for item descriptions

LOD & Recommender Systems

Vector Space Model for LOD

Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based

Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)

LOD & Recommender Systems

Vector Space Model for LOD


(starring, directors,

subject, etc.)

LOD & Recommender Systems

Property subset evaluation

subject+broader solution

better than only subject or

subject+more broaders

too many broaders

introduce noise

best solution achieved

with

subject+broader+genres


Graph-based RecSys

Recommendations

obtained by mining

the graph

Graph-based RecSys

Recommendations

obtained by mining

the graph

Identification of the

most relevant (target)

nodes, according to

the recommendation

scenario

Graph-based RecSys

Recommendations

obtained by mining

the graph

Identification of the

most relevant (target)

nodes, according to

the recommendation

scenario

PageRank

Spreading Activation

Personalized PageRank

Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco De Gemmis, Giovanni Semeraro. UMAP 2016

Graph-based RecSys

Personalized PageRank [*] to identify

the most relevant nodes in the graph

[*] T. H. Haveliwala. Topic-

Sensitive PageRank: A

Context-Sensitive Ranking

Algorithm for Web Search.

IEEE Trans. Knowl. Data

Eng., 15(4):784–796, 2003.
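A minimal sketch of this idea with NetworkX; the graph and the node names are illustrative, not the actual experimental setup:

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("u1", "Matrix"), ("u1", "Inception"),   # user preferences
        ("Matrix", "dbo:ScienceFiction"),        # LOD-based properties
        ("Inception", "dbo:ScienceFiction"),
        ("Blade Runner", "dbo:ScienceFiction"),
    ])

    # personalization vector: all the probability mass on the target user
    scores = nx.pagerank(G, alpha=0.85, personalization={"u1": 1.0})

    # unseen items with high scores (e.g. "Blade Runner") get recommended
    print(sorted(scores.items(), key=lambda kv: -kv[1]))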

Graph-based RecSys

MovieLens 100K dataset

G = Personalized PageRank on Bipartite User-Item Graph

G+LOD = Tripartite Graph modeling also Linked Open Data

Graph-based RecSys

is it necessary to inject all the properties available in the LOD cloud?

Graph-based RecSys

is it necessary to inject all the properties available in the LOD cloud?

Graph-based RecSys

is it necessary to inject all the properties available in the LOD cloud?

Graph-based RecSys

what are the most

promising properties

to include?

manual selection

o domain-specific properties

o most frequent properties

o …

automatic selection

o more difficult to implement


Feature selection

selecting the most promising subset of LOD-based

properties

possible techniques

PageRank

Principal Component Analysis

Information Gain

Information Gain Ratio

Minimum Redundancy Maximum Relevance
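A minimal sketch of the automatic route (using one of the techniques listed above), ranking binary item-property features by mutual information with scikit-learn; the data is toy:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    properties = ["dbo:director", "dbo:starring", "dct:subject", "dbo:distributor"]
    X = np.array([[1, 0, 1, 0],   # one row per user-item pair,
                  [1, 1, 1, 0],   # one column per LOD property match
                  [0, 0, 1, 1],
                  [0, 1, 0, 1]])
    y = np.array([1, 1, 0, 0])    # liked / not liked

    scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
    print(sorted(zip(properties, scores), key=lambda kv: -kv[1]))  # keep the top-k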

Graph-based RecSys

tradeoff between accuracy and diversity

MovieLens 100K dataset

Graph-based RecSys

Comparison to state of the art

MovieLens 100K dataset

we can further build some

extra features by

mining the paths

occurring in the graph

path: an acyclic sequence of relations (s, …, rl, …, rL)

wux(j): frequency of pathj in the sub-graph related to u and x

Example (figure): u3 -s- i2 -p2- e1 -p1- i1, i.e. the path (s, p2, p1)

wux(j) = #pathux(j) / Σj' #pathux(j')

Semantic Graph-based Data Model

(Path Based Features)

Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi: Top-N recommendations from implicit feedback leveraging

linked open data. RecSys 2013: 85-92

LOD & Recommender Systems

SPrank system

o analysis of complex relations between user

preferences and the target item

o extraction of path-based features


wu3,x1(j) = ?

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 1

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 2

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 2

path2 (s, p2, p1) : 1

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

path3 (s, p2, p3, p1) : 1

Semantic Graph-based Data Model

(Path Based Features)

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

path3 (s, p2, p3, p1) : 1

wu3,x1(1) = 2/5

wu3,x1(2) = 2/5

wu3,x1(3) = 1/5
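A minimal sketch computing these weights from the path counts of the example:

    from collections import Counter

    # the five paths between u3 and x1 found in the example above
    paths_u3_x1 = [("s", "s", "s"), ("s", "s", "s"),
                   ("s", "p2", "p1"), ("s", "p2", "p1"),
                   ("s", "p2", "p3", "p1")]

    counts = Counter(paths_u3_x1)
    total = sum(counts.values())
    weights = {p: c / total for p, c in counts.items()}
    print(weights)  # {('s','s','s'): 0.4, ('s','p2','p1'): 0.4, ('s','p2','p3','p1'): 0.2}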

Semantic Graph-based Data Model

(Path Based Features)

Evaluation:

LOD-based approach outperforms the state of the art

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Word2Vec

• Empirical Comparison of Word Embedding Techniques

for Content-based Recommender Systems [*]

• Methodology

• Build a WordSpace using different Word Embedding

techniques (and different sizes)

• Build a DocSpace: each item is represented as the centroid of its term vectors

• Build User Profiles as the centroid of the items the user liked

• Provide Users with Recommendations

• Compare the approaches

European Conference on Information Retrieval
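A minimal sketch of this methodology with gensim's Word2Vec on a toy corpus:

    import numpy as np
    from gensim.models import Word2Vec

    docs = {"i1": "space ship crew explores a distant planet".split(),
            "i2": "detective investigates a murder in the city".split(),
            "i3": "astronauts survive on a hostile planet".split()}

    model = Word2Vec(sentences=list(docs.values()), vector_size=50,
                     window=2, min_count=1, epochs=50, seed=1)

    def centroid(tokens):
        return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

    item_vecs = {i: centroid(toks) for i, toks in docs.items()}
    profile = item_vecs["i1"]  # the user liked i1 only

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print({i: round(cosine(profile, v), 3) for i, v in item_vecs.items()})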

Word2Vec

Results on DBbook and MovieLens data

European Conference on Information Retrieval

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

eVSM

• Enhanced Vector Space Model [*]

• Content-based Recommendation Framework

• Cornerstones

• Semantics modeled through Distributional Models

• Random Indexing for Dimensionality Reduction

• Negative Preferences modeled through Quantum Negation [^]

• User Profiles as centroid vectors of items representation

• Recommendations through Cosine Similarity

[*] Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth

ACM conference on Recommender systems. ACM, 2010.

Mathematics of language

eVSM

Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM, 2010.

Distributional Models

to build DocSpace of the items

(whole document used as context)

Random Indexing for

Dimensionality Reduction

eVSM

Cornerstones

• Given two vectors a and b

• Through Quantum Negation we can define a vector (a ∧ ¬b)

• Formally:

• Projection of vector a on the subspace orthogonal to that generated

by vector b

• Intuitively:

• Vector «a» models «positive» preferences

• Vector «b» models «negative» preferences

• Through quantum negation we get a unique vector modeling both

aspects

• Close to vectors containing as many features as possible from «a»

and as few as possible from «b»
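Operationally, quantum negation reduces to vector rejection; a minimal NumPy sketch:

    import numpy as np

    def quantum_negation(a, b):
        # project a onto the subspace orthogonal to b: a AND NOT b
        b_unit = b / np.linalg.norm(b)
        return a - (a @ b_unit) * b_unit

    a = np.array([1.0, 1.0, 0.0])  # «positive» preferences
    b = np.array([0.0, 1.0, 0.0])  # «negative» preferences
    v = quantum_negation(a, b)
    print(v, v @ b)  # [1. 0. 0.] 0.0 -> orthogonal to b, keeps the rest of a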

Mathematics of language

eVSM

• User Profiles

• Calculated as centroid vectors of the items the user liked/disliked

Random Indexing-

based Profiles (RI)

Musto, Cataldo. "Enhanced vector space

models for content-based recommender

systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM,

2010.

eVSM

Cornerstones

• User Profiles

• Calculated as centroid vectors of the items the user liked/disliked

Random Indexing-

based Profiles (W-RI)

Quantum Negation- based

Profiles (W-QN)

Musto, Cataldo. "Enhanced vector space

models for content-based recommender

systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM,

2010.

eVSM

Cornerstones

• Recommendations

• Similarity Calculations on DocSpace

eVSM

Experiments

The size of the embeddings does not significantly affect the overall

accuracy of eVSM (MovieLens data)

eVSM

Experiments

Quantum Negation improves the accuracy of the model

(MovieLens data, embedding size=100)

eVSM

Experiments

eVSM significantly outperformed all the baselines.

(MovieLens data, embedding size=400)

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Word Sense

Disambiguation

Entity

Linking

C-eVSM

• Contextual Enhanced Vector Space Model [*]

• Extension of eVSM: context-aware Framework

• Cornerstones

• Entity Linking of the content through Tag.me

• Semantics modeled through Distributional Models

• Random Indexing for Dimensionality Reduction

• Distributional Models also used to build a representation of

the context

• Context-aware User Profiles as centroid vectors

• Recommendations through Cosine Similarity

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

Let u be the target user

Let ck be a contextual variable (e.g. task, mood, etc.)

Let vj be its value (e.g. task=running, mood=sad, etc.)

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

WRI(u) is the eVSM profile

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

WRI(u) is the eVSM profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

WRI(u) is the eVSM profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Linear Combination

(α = 1: plain eVSM, no context

taken into account)

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

WRI(u) is the eVSM profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Linear Combination

(α = 1: plain eVSM, no context

taken into account)

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u, ck, vj) = α * WRI(u) + (1-α) * context(u, ck, vj)

WRI(u) is the eVSM profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Why this

formula?

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Insight: there exists a set of terms that is more descriptive

of items relevant in that specific context

for a romantic dinner, e.g. candlelight, seaview, violin

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Thanks to Distributional Semantics Models it is

possible to build a vector-space representation of the

context which emphasizes the importance of those

terms, since they are used more often (hence more

important) in that specific contextual setting.
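A minimal sketch of the combination defined above, with toy vectors:

    import numpy as np

    def c_wri(wri_u, context_vec, alpha=0.5):
        # alpha = 1 -> plain eVSM profile, no context taken into account
        return alpha * wri_u + (1 - alpha) * context_vec

    wri_u = np.array([0.9, 0.1, 0.2])  # eVSM profile of user u
    ctx = np.array([0.1, 0.8, 0.7])    # vector for, e.g., company=friends
    print(c_wri(wri_u, ctx, alpha=0.7))  # items are ranked by cosine similarity to it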

C-eVSM

Entities are better than simple keywords!

Selection of Results

C-eVSM

Compared to the Context-aware Collaborative Filtering (CACF) [*]

algorithm: better in 7 contextual segments

[*] Incorporating contextual information in recommender systems using a

multi-dimensional approach

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

ESA effectively used for

o Text Categorization [Gabri09]: experiments on diverse datasets

o Semantic relatedness of words and texts [Gabri09]: cosine similarity between vectors of ESA concepts

o Information Retrieval [Egozi08, Egozi11]: ESA-based IR algorithm enriching documents and queries

[Gabri09] E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. Journal of Artificial

Intelligence Research 34:443-498, 2009.

[Egozi08] Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch: Concept-Based Feature Generation and Selection for Information Retrieval.

AAAI 2008, 1132-1137, 2008.

[Egozi11] Ofer Egozi, Shaul Markovitch, Evgeniy Gabrilovich. Concept-Based Information Retrieval using Explicit Semantic Analysis.

ACM Transactions on Information Systems 29(2), April 2011.


what about ESA for Information Filtering?

Electronic Program Guides

problem

description of TV shows too short or

poorly meaningful to feed a

content-based recommendation algorithm

solution

Explicit Semantic Analysis exploited to obtain an

enhanced representation

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199.

Springer, 2012

Electronic Program Guides

Wikipedia Articles related to the

TV show are added to the

description

Electronic Program Guides


user profile tv show

motogp

sports

motorbike

...

competition

2012 Superbike

Italian Grand Prix

Electronic Program Guides


user profile tv show

motogp

sports

motorbike

...

competition

2012 Superbike

Italian Grand Prix

X No matching!

Electronic Program Guides


user profile tv show

2012 Superbike

Italian Grand Prix

motogp

superbike

sports

motorbike

formula 1

competition

Through ESA we can

add new features to

the profile and we

can improve the

overlap between

textual description

Electronic Program Guides


user profile tv show

2012 Superbike

Italian Grand Prix

motogp

superbike

sports

motorbike

formula 1

competition

Matching!

Electronic Program Guides


results on Aprico.tv data

The more Wikipedia Concepts are added to the textual description

of the items (eBOW+60), the better the precision of the algorithm

eBOW = Bag of Words + Wikipedia Concepts
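A minimal sketch of the ESA intuition with scikit-learn; the three «Wikipedia articles» below are toy stand-ins for real pages:

    from sklearn.feature_extraction.text import TfidfVectorizer

    wiki = {"Superbike": "motorcycle racing championship superbike",
            "MotoGP": "motorcycle grand prix racing motogp",
            "Formula One": "car racing formula one grand prix"}

    vectorizer = TfidfVectorizer()
    M = vectorizer.fit_transform(wiki.values())   # concepts x terms

    q = vectorizer.transform(["superbike grand prix"])  # term vector of the text
    concept_scores = (M @ q.T).toarray().ravel()        # strength of association
    print(dict(zip(wiki.keys(), concept_scores.round(3))))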

Semantic User Profiles

• Research Question

• Is it possible to exploit semantic representation

techniques to improve the quality of user profiles?

• Methodology

• Build User Profiles by extracting data available from social

networks

• Process User Profiles through Semantics-aware

Techniques

• Evaluate the effectiveness of the profiles

• Accuracy

• Transparency

• Serendipity

Narducci, F., Musto, C., Semeraro, G., Lops, P., & de Gemmis, M.

(2013, August). Exploiting big data for enhanced representations

in content-based recommender systems. In International

Conference on Electronic Commerce and Web Technologies (pp. 182-

193). Springer Berlin Heidelberg.

Semantic User Profiles

• Techniques

• Keyword-based Profiles

• Entity Linking (Tag.me)

• Explicit Semantic Analysis

• Scenario I’m in trepidation for my first riding lesson!, I’m really anxious

for the soccer match :( , This summer I will flight by Ryanair to London!, Ryanair

really cheapest company!, Ryanair lost my luggage :(, This summer holidays are

really amazing!, Total relax during these holidays!
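A minimal sketch of the keyword-based baseline over posts like the ones above (the Tag.me and ESA variants would replace raw tokens with entities or concepts):

    from collections import Counter

    posts = ["I'm in trepidation for my first riding lesson!",
             "This summer I will flight by Ryanair to London!",
             "Ryanair lost my luggage :("]

    stop = {"i'm", "i", "in", "for", "my", "to", "by", "will", "this"}
    tokens = [w.strip("!.,:(").lower() for p in posts for w in p.split()]
    profile = Counter(w for w in tokens if w and w not in stop)
    print(profile.most_common(3))  # e.g. [('ryanair', 2), ...]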

Semantic User Profiles

Results – Experiment 1

• User Study involving 63 users

• Twitter and Facebook as Social Networks

• Users were provided with a set of personalized news

• Answers gathered through User Feedback on recommended news


No relevance feedback

Avg. rating   Min rating   Max rating   Std. Dev.
1.49          0            5            1.12
1.89          0            5            1.47
2.86          1            5            1.30
2.59          0            5            1.37

Semantic User Profiles

             Transparency                  Serendipity
             Avg.  Min  Max  Std. Dev.    Avg.  Min  Max  Std. Dev.
KEYWORDS     1.33  0    3    0.65         0.42  0    2    0.57
TAG.ME       3.88  2    5    0.82         0.54  0    2    0.61
ESA          1.16  0    4    1.00         3.24  0    5    1.24

Results – Experiment 2

• User Study involving 51 users

• Twitter and Facebook as Social Networks

• Users were provided with a tag cloud describing their profile

• Answers gathered through Questionnaires

What?

Cross-lingual recommender systems

Cross-lingual access: motivations

relevant information exist in different languages

Web is becoming more and more multilingual

users are becoming increasingly polyglot

more than half of the world population bilingual

cross-language recommender systems

can likely increase the number of tail products suggested (25-30% of sales for online stores)


Cross-lingual access: problems

Vocabulary mismatch

use of different languages

extreme case of vocabulary mismatch


Semantic analytics for cross-lingual access

Sense-based representations

inherently multilingual

terms in each specific language change, while

concepts (word meanings) remain the same

across different languages

match between

items and user profiles at a conceptual level

(Cross-lingual) Concept-based representations

ESA

MultiWordNet


Sense-based representations

Word Sense Disambiguation (JIGSAW) based

on MultiWordNet as sense repository

multilingual lexical database that supports

English, Italian, Spanish, Portuguese, Hebrew,

Romanian, Latin

alignment between synsets in the different languages

semantic relations imported and preserved

MultiWordNet


Bag of MultiWordNet synsets

Figure: documents in different languages are mapped to bags of

MultiWordNet synsets; the match is computed between senses

Some results

cross-language movie recommendation scenario

profiles learned from ENG/ITA descriptions

recommendations provided on ITA/ENG descriptions

MovieLens dataset, F1 measure, Wikipedia source for descriptions


Cross-lingual representation: Tagme

Cross-language links


Cross-lingual representation: Tagme

Italian: Film d’azione, Lana e Lilly Wachowskis, Laurence Fishburne, Keanu Reeves, distopia, …

English: action film, The Wachowskis, Keanu Reeves, Laurence Fishburne, Dystopia, Perception, Carrie-Anne Moss, Cyberspace

Cross-lingual representation: ESA

Translation-based ESA

Figure: texts in languages L1, L2, …, Ln go through a translation process into a pivot

language; the translated text is then represented through the ESA matrix of the pivot

language (rows: terms t1 … tk occurring in Wikipedia articles; columns: Wikipedia

articles C1 … Cn; weights: TF-IDF)

Cross-lingual representation: ESA

Cross-language ESA

Figure: one ESA matrix per language (ESA MATRIX L1 with concepts C1-L1 … Cn-L1,

ESA MATRIX L2 with C1-L2 … Cn-L2, …, ESA MATRIX LN with C1-LN … Cn-LN); a text in

language Li is represented through the ESA matrix of its own language

Cross-lingual representation: Babelfy

Preliminary results

effectiveness of knowledge-based strategies to provide

cross-lingual recommendations

MovieLens and DBbook datasets

F1 measure, Wikipedia source for descriptions

English and Italian languages


Cross-lingual representation: Distributional models

distribution of the terms

(almost) the same in different languages

o cross-lingual representation comes at

no cost thanks to the distributional hypothesis

Distributional Semantics

…. is also multilingual! (Recap)

Italian WordSpace English WordSpace

glass

spoon

dog

beer

cucchiaio

cane

birra

bicchiere

The position in the space can be slightly different, but the

similarity relations between terms still hold

Distributional Semantics

Multilingual DocSpace

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

By following the same procedure we can

obtain a multilingual DocSpace

Different documents in different languages are represented in a uniform space

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

By following the same procedure we can

obtain a multilingual DocSpace

How to build a cross-lingual recommender?

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

P1

We build a user profile in L1 (Italian DocSpace)

How to build a cross-lingual recommender?

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

P1 P1

We build a user profile in L1 (Italian DocSpace)

We can «move» the profile in L2 (English DocSpace)

How to build a cross-lingual recommender?

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

We build a user profile in L1 (Italian DocSpace)

We can «move» the profile in L2 (English DocSpace). We can use similarity measures to suggest items in a different language

How to build a cross-lingual recommender?

P1 P1
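A minimal sketch of the matching step, reusing the names from the figures (P1, D5_L2, D6_L2); all vectors are toy values:

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    p1 = np.array([0.8, 0.1, 0.3])  # profile built on Italian items
    english_docs = {"D5_L2": np.array([0.7, 0.2, 0.4]),
                    "D6_L2": np.array([0.1, 0.9, 0.2])}

    # the shared space allows direct comparison across languages
    ranked = sorted(english_docs, key=lambda d: -cosine(p1, english_docs[d]))
    print(ranked)  # English items ranked for the Italian-profiled user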

Some results

effectiveness of knowledge-based strategies to provide

cross-lingual recommendations

MovieLens dataset

F1 measure

comparable results (gap not statistically significant)


C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro:"Cross-language information filtering: Word sense disambiguation

vs. distributional models." AI*IA 2011: 250-261

What?

Explaining recommendations

Graph-based RecSys: explanations

are properties useful for providing

explanations?

advantages

o readability of properties

disadvantages

o difficult to generate natural language explanations

o selection of the properties to include in the explanation

Graph-based RecSys: explanations

main intuition of the EXPLOD system

o graph connecting user preferences and

recommendations through a set of

LOD-based properties

o scoring and ranking properties

Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Marco de Gemmis, Giovanni Semeraro: ExpLOD: a framework for Explaining Recommendations based on the Linked Open Data cloud. RecSys 2016, to appear

Scoring properties in EXPLOD

Figure: a property connects the items in the user profile with the items in the

recommendation list. Its score considers:

o the number of edges connecting the property with the items in the user profile

o the number of edges connecting the property with the items in the recommendation list

Graph-based RecSys: explanations

Scoring properties in EXPLOD

weighting factors

adaptation of the

Inverse

Document

Frequency

Graph-based RecSys: explanations

Scoring properties in EXPLOD

higher score to properties highly connected to the items in

both the user profile and the recommendation list,

and which are not common.
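A minimal sketch of this scoring intuition (the exact formula is defined in the RecSys 2016 paper):

    import math

    def score(prop, profile_items, rec_items, item_props, n_items):
        n_u = sum(prop in item_props[i] for i in profile_items)   # profile edges
        n_r = sum(prop in item_props[i] for i in rec_items)       # recommendation edges
        df = sum(prop in props for props in item_props.values())  # how common prop is
        return (n_u + n_r) * math.log(n_items / (1 + df))         # IDF-like discount

    item_props = {"Inception": {"shot_in_UK", "dir_Nolan"},
                  "Forrest Gump": {"shot_in_UK"},
                  "The Dark Knight": {"shot_in_UK", "dir_Nolan"},
                  "Toy Story": {"animation"}, "Up": {"animation"}}
    print(score("shot_in_UK", ["Inception", "Forrest Gump"],
                ["The Dark Knight"], item_props, n_items=5))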

Graph-based RecSys: explanations

From properties to Natural Language explanation

I recommend you The Dark Knight because you often like Films

shot in the United Kingdom as Inception and Forrest Gump. In

addition, you sometimes like Films produced by Christopher

Nolan as Inception and Screenplays by Christopher Nolan as

Inception

Graph-based RecSys: explanations

From properties to Natural Language explanation

o top-ranked properties to fill in a template-based structure of

the explanation

o LOD cloud-properties mapped to natural language

expressions

o adverbs (e.g. often, sometimes) obtained by mapping the normalized occurrence

of that property to different ranges of the score

Graph-based RecSys: explanations

preliminary results

aim: transparency
question: "I understood why this movie was recommended to me"
properties: topic, director, distributor, music composer

aim: persuasion
question: "The explanation made the recommendation more convincing"
properties: awards, director, location, producer

aim: engagement
question: "The explanation helped me discover new information about this movie"
properties: writer, director, producer, distributor

aim: trust
question: "The explanation increased my trust in the recommender system"
properties: awards, composer, producer, topic

aim: effectiveness
question: "I like this recommendation"
properties: director, writer, location, composer

user study involving 308 subjects

What?

Semantic analysis of social streams

Social Networks

novel data silos

Goal

To design a unique framework implementing

a pipeline of semantic analysis techniques

Our Contribution: CrowdPulse

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.


CrowdPulse Workflow


CrowdPulse Workflow

Each «project» is an instance of such a workflow


CrowdPulse Workflow

Each «project» is an instance of such a workflow

Every module is set according to the needs of the scenario



CrowdPulse

Solution:

Entity Linking Algorithms

Input: textual content

Output: identification and

disambiguation of the

entities mentioned in the

text.



CrowdPulse

Solution:

Overall sentiment: :-(

We implemented:

• A lexicon-based approach, which assigns a sentiment

according to the words in the social content

• A supervised classification algorithm, which exploits

labeled examples to learn a classification model
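A minimal sketch of the lexicon-based approach, with a toy lexicon:

    # toy polarity lexicon; real systems use much larger resources
    LEXICON = {"amazing": 1, "relax": 1, "cheapest": 1,
               "anxious": -1, "lost": -1, "trepidation": -1}

    def sentiment(text):
        score = sum(LEXICON.get(w.strip("!.,:()"), 0) for w in text.lower().split())
        return ":-)" if score > 0 else ":-(" if score < 0 else ":-|"

    print(sentiment("Ryanair lost my luggage :("))  # :-(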


Supervised learning: classification, regression tasks

Unsupervised learning: clustering

Linguistic Analysis (Distributional Models): building word spaces, similarity

between concepts, analysis of term usage, etc.

CrowdPulse

Step 4: Domain-Specific Processing

CrowdPulse natively supports all these methodologies

The choice is typically scenario-dependent



CrowdPulse Workflow


(recap)


Research Question:

Is it possible to extract and process social

media to monitor in real time people feelings,

opinions and sentiments about the current

state of the social capital of L’Aquila?

Use Case

L’Aquila Social Urban Network


Use Case

L’Aquila Social Urban Network

Use Case

L’Aquila Social Urban Network

Heuristics:

- Twitter users (local newspapers, mentions of politicians)

- Twitter content+geo (50km around L’Aquila and/or specific hashtags such as #laquila,

#earthquake, etc.)


Use Case

L’Aquila Social Urban Network

Extracted Content


Use Case

L’Aquila Social Urban Network

Semantic and Sentiment Analysis of Extracted Content


Use Case

L’Aquila Social Urban Network

Domain-specific Task

Given a piece of social content,

we have to classify it against the social indicators


Use Case

L’Aquila Social Urban Network

Classification Task

Build a supervised classification model

for each social indicator


Use Case

L’Aquila Social Urban Network

Input: Tweet + Social Indicators Classification Model


Use Case

L’Aquila Social Urban Network

Output: Social Indicators and Sentiment conveyed


Use Case

L’Aquila Social Urban Network

The «score» of a social Indicator is the sum of the

Sentiment conveyed by the Tweets


Use Case

L’Aquila Social Urban Network

Overall score of the social indicators between March and August 2014


Use Case

L’Aquila Social Urban Network

COMMUNITY PROMOTER

DEFINES SOME INITIATIVES TO EMPOWER THE SOCIAL CAPITAL

MONITORS THE STATE OF THE SOCIAL INDICATORS

Connecting Real Communities

With Virtual Communities


Use Case

The Italian Hate Map

Inspired by the Hate Map built by

Humboldt State University:

http://users.humboldt.edu/mstephens/hate/hate_map.html

Is it possible to exploit

techniques for semantic

analysis of social media to

detect intolerant content

posted on social networks and

identify the most at-risk areas

of Italy?

Research Question


Use Case

The Italian Hate Map

Heuristics: Twitter content

- 76 intolerant seed terms, defined by the psychologists team

- 5 intolerance dimensions: violence (against women), racism,

homophobia, disability, anti-semitism


Use Case

The Italian Hate Map

Many non-intolerant Tweets are extracted!



Use Case

The Italian Hate Map

Solution: replace a simple keyword-based approach

with a supervised classification model



Use Case

The Italian Hate Map

Entities and Wikipedia categories used as features


Use Case

The Italian Hate Map

A sample of Tweets is manually labeled to build the classification models
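A minimal sketch of this supervised step with scikit-learn; the features (linked entities and Wikipedia categories) and labels are toy/illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # each "document" lists the entities/categories linked in a tweet
    train_docs = ["Category:Racism Entity:SomePolitician",
                  "Category:Football Entity:SomePlayer",
                  "Category:Racism Category:Violence",
                  "Category:Music Entity:SomeSinger"]
    labels = [1, 0, 1, 0]  # 1 = intolerant (racism dimension), 0 = not

    clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"),
                        LogisticRegression())
    clf.fit(train_docs, labels)
    print(clf.predict(["Category:Racism Entity:AnotherUser"]))  # expected: [1]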


Use Case

The Italian Hate Map


Use Case

The Italian Hate Map

Tweet about an Italian minister

Tweet about an Italian football player

Violence against women Disability

Racism Homophobia


Use Case

The Italian Hate Map

Given the maps and given the output of the linguistic analysis of

intolerant Tweets (co-occurrences between terms, time lapse, etc.), the

psychologists team defined some guidelines to tackle and prevent

intolerant behaviors.

These guidelines have been freely distributed to public

administrations in early 2015.


CrowdPulse

Lessons Learned

1. REAL-TIME SEMANTIC CONTENT ANALYSIS: a pipeline of state-of-the-art techniques

(Entity Linking, Sentiment Analysis, Machine Learning, Data Visualization)

2. Use Cases: L’Aquila Social Urban Network, The Italian Hate Map

The outcomes of both use cases showed that very

complex phenomena can be analyzed in a totally new

way, thanks to the huge availability of textual data

Readings

Semantics-aware Recommender Systems

o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Learning Word Embeddings from Wikipedia for Content-

Based Recommender Systems. ECIR 2016: 729-734

o M. de Gemmis, P. Lops, C. Musto, F. Narducci, G. Semeraro: Semantics-Aware Content-Based

Recommender Systems. Recommender Systems Handbook 2015: 119-159

o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Word Embedding Techniques for Content-based

Recommender Systems: An Empirical Evaluation. RecSys Posters 2015

o C. Musto, P. Basile, M. de Gemmis, P. Lops, G. Semeraro, S. Rutigliano: Automatic Selection of Linked

Open Data Features in Graph-based Recommender Systems. CBRecSys@RecSys 2015: 10-13

o P. Basile, C. Musto, M. de Gemmis, P. Lops, F. Narducci, G. Semeraro: Content-Based Recommender

Systems + DBpedia Knowledge = Semantics-Aware Recommender Systems. SemWebEval@ESWC

2014: 163-169

o C. Musto, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Linked Open Data-enabled Strategies for Top-N

Recommendations. CBRecSys@RecSys 2014: 49-56

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Combining Distributional Semantics and Entity Linking

for Context-Aware Content-Based Recommendation. UMAP 2014: 381-392

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Contextual eVSM: A Content-Based Context-Aware

Recommendation Framework Based on Distributional Semantics. EC-Web 2013: 125-136

o C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, R. Clout:

Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP

2012: 188-199

o M. Degemmis, P. Lops, G. Semeraro: A content-collaborative recommender that exploits WordNet-based

user profiles for neighborhood formation. User Model. User-Adapt. Interact. 17(3): 217-255 (2007)

o G. Semeraro, M. Degemmis, P. Lops, P. Basile: Combining Learning and Word Sense Disambiguation for

Intelligent User Profiling. IJCAI 2007: 2856-2861

Readings

Cross-language Recommender Systems

o C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Cross-Language Information

Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011: 250-261

o P. Lops, C. Musto, F. Narducci, M. de Gemmis, P. Basile, G. Semeraro: Cross-Language Personalization

through a Semantic Content-Based Recommender System. AIMSA 2010: 52-60

Explanations

o C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro: ExpLOD: a framework for Explaining

Recommendations based on the Linked Open Data cloud. RecSys 2016, to appear

Semantic Analysis of Social Streams

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: CrowdPulse: A framework for real-time semantic

analysis of social streams. Inf. Syst. 54: 127-146 (2015)
