![Page 1: Social Data Exploration - Paris Dauphine Universitypoc/spoc/SihemAYSoComp.pdfSocial Data Exploration Sihem Amer-Yahia DR CNRS @ LIG Sihem.Amer-Yahia@imag.fr Big Data & Optimization](https://reader033.vdocuments.net/reader033/viewer/2022043010/5fa142880e62504b3f47a5c7/html5/thumbnails/1.jpg)
Social Data Exploration
Sihem Amer-Yahia DR CNRS @ LIG
Big Data & Optimization Workshop 12ème Séminaire POC
LIP6 Dec 5th, 2014
Collaborative data model
User space (with attributes)
Item space (with attributes)
[Figure: users user1 to user6 linked to items Item 1 to Item 7 by actions (boolean, rating, tag, sentiment…)]
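MovieLens instances

| ID | Title | Genre | Director | Name | Gender | Location | Rating |
|----|-------|-------|----------|------|--------|----------|--------|
| 1 | Titanic | Drama | James Cameron | Amy | Female | New York | 8.5 |
| 2 | Schindler’s List | Drama | Steven Spielberg | John | Male | New York | 7.0 |

| ID | Title | Genre | Director | Name | Gender | Location | Tags |
|----|-------|-------|----------|------|--------|----------|------|
| 1 | Titanic | Drama | James Cameron | Amy | Female | New York | love, Oscar |
| 2 | Schindler’s List | Drama | Steven Spielberg | John | Male | New York | history, Oscar |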
More on MovieLens datasets http://grouplens.org/datasets/movielens/
Social Data Exploration
- Rating exploration
  - Meaningful Interpretations of Collaborative Ratings
- Tag exploration
  - Who Tags What? An Analysis Framework
- Perspectives
Meaningful Interpretations of Collaborative Ratings
Data Model
- Collaborative rating site: set of items, set of users, ratings
  - Rating tuple: <item attributes, user attributes, rating>
- Group: set of ratings describable by a set of attribute values
- Notion of group based on the data cube
  - OLAP literature for mining multidimensional data

| ID | Title | Genre | Director | Name | Gender | Location | Rating |
|----|-------|-------|----------|------|--------|----------|--------|
| 1 | Titanic | Drama | James Cameron | Amy | Female | New York | 8.5 |
| 2 | Schindler’s List | Drama | Steven Spielberg | John | Male | New York | 7.0 |
Exploration Space
[Figure: partial rating lattice for a movie (M: Male, Y: Young, CA: California, S: Student)]
- Each node (cuboid) in the lattice is a group
- A = Gender: Male; B = Age: Young; C = Location: CA; D = Occupation: Student
- Task: quickly identify “good” groups in the lattice that help analysts understand ratings effectively
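To make the size of the exploration space concrete, here is a minimal Python sketch that enumerates the groups (cuboids) obtainable from the four attribute values in the slide's legend. The function names `enumerate_groups` and `covers`, and the dictionary encoding of a group, are illustrative assumptions and not part of the talk.

```python
from itertools import combinations

# Attribute values from the slide's legend (A, B, C, D).
ATTRIBUTES = {
    "Gender": "Male",
    "Age": "Young",
    "Location": "CA",
    "Occupation": "Student",
}


def enumerate_groups(attributes):
    """Yield every group (cuboid) as a dict of attribute -> value.

    The empty dict is the top of the lattice: it constrains nothing,
    so it covers every rating of the movie.
    """
    names = sorted(attributes)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield {name: attributes[name] for name in subset}


def covers(group, rating_attrs):
    """A rating belongs to a group if it matches all of the group's values."""
    return all(rating_attrs.get(name) == value for name, value in group.items())


groups = list(enumerate_groups(ATTRIBUTES))
print(len(groups))  # 2**4 = 16 nodes for these four attribute values
```

With more attributes, and several values per attribute, the number of candidate groups grows exponentially, which is why the task stresses identifying good groups quickly.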
DEM: Meaningful Description Mining
- For an input item covering R_I ratings, return a set C of groups such that the description error is minimized, subject to:
  - |C| ≤ k
  - coverage ≥ α
- Description error: measures how well a group's average rating approximates each individual rating belonging to it
- Coverage: measures the percentage of ratings covered by the returned groups
- DEM is NP-hard: proof details in [1]
[1] S. Amer-Yahia, Mahashweta Das, Gautam Das, Cong Yu: MRI: Meaningful Interpretations of Collaborative Ratings. In PVLDB, 2011.
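The transcript describes the two measures only in words. One plausible formalization, assuming the definitions follow the cited paper [1] (the exact formulas are an assumption here, as are the symbols $r.s$ for the score of rating $r$ and $\operatorname{avg}(c)$ for the average score in group $c$), is:

```latex
\mathrm{coverage}(C, R_I) =
  \frac{\bigl|\{\, r \in R_I : \exists\, c \in C,\ r \in c \,\}\bigr|}{|R_I|}
\qquad
\mathrm{error}(C, R_I) = \sum_{r \in R_I} E_r,
\quad
E_r = \operatorname*{avg}_{c \in C,\ r \in c} \bigl| r.s - \operatorname{avg}(c) \bigr|
```

Under this reading, a rating covered by no group contributes nothing to the error, which is why the coverage constraint is needed: without it, returning groups that cover almost no ratings would trivially minimize the error.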
DEM: Meaningful Description Mining
- Identify groups of reviewers who consistently share similar ratings on items
DEM: Meaningful Description Mining
To prove NP-completeness, we reduce the Exact 3-Set Cover problem (EC3) to the decision version of our problem. EC3 is the problem of finding an exact cover for a finite set U, where each of the subsets available for use contains exactly 3 elements. EC3 is itself proved NP-complete by a reduction from the Three-Dimensional Matching problem.
DEM Algorithms
- Exact Algorithm (E-DEM)
  - Brute force: enumerate all possible combinations of cuboids in the lattice and return the exact (i.e., optimal) set as rating descriptions
- Random Restart Hill Climbing Algorithm
  - Often fails to satisfy the coverage constraint; a large number of restarts is required
  - Need an algorithm that handles both the coverage constraint and the description error simultaneously
- Randomized Hill Exploration Algorithm (RHE-DEM)
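The brute-force strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rating data, the candidate groups, and the function names (`matches`, `coverage`, `description_error`, `e_dem`) are hypothetical, and the error is taken to be the sum over ratings of the mean absolute gap to the average of each covering group.

```python
from itertools import combinations

# Hypothetical rating tuples for one movie: (user attributes, score).
RATINGS = [
    ({"Gender": "Male", "Occupation": "Student"}, 8.0),
    ({"Gender": "Male", "Occupation": "Student"}, 9.0),
    ({"Gender": "Female", "Occupation": "Student"}, 4.0),
    ({"Gender": "Female", "Occupation": "Engineer"}, 5.0),
]

# Hypothetical candidate groups (cuboids).
CANDIDATES = [
    {"Gender": "Male"},
    {"Gender": "Female"},
    {"Occupation": "Student"},
]


def matches(group, attrs):
    """True if the user attributes agree with every value of the group."""
    return all(attrs.get(k) == v for k, v in group.items())


def coverage(groups, ratings):
    """Fraction of ratings covered by at least one group."""
    covered = sum(1 for attrs, _ in ratings if any(matches(g, attrs) for g in groups))
    return covered / len(ratings)


def description_error(groups, ratings):
    """Sum over ratings of the mean |score - group average| over covering groups."""
    averages = []
    for g in groups:
        scores = [s for attrs, s in ratings if matches(g, attrs)]
        averages.append(sum(scores) / len(scores) if scores else None)
    total = 0.0
    for attrs, score in ratings:
        gaps = [abs(score - avg) for g, avg in zip(groups, averages)
                if avg is not None and matches(g, attrs)]
        if gaps:
            total += sum(gaps) / len(gaps)
    return total


def e_dem(candidates, ratings, k, alpha):
    """Try every set of at most k groups; keep the feasible set with least error."""
    best, best_error = None, float("inf")
    for size in range(1, k + 1):
        for combo in combinations(candidates, size):
            if coverage(combo, ratings) < alpha:
                continue  # violates the coverage constraint
            err = description_error(combo, ratings)
            if err < best_error:
                best, best_error = list(combo), err
    return best, best_error


best, err = e_dem(CANDIDATES, RATINGS, k=2, alpha=0.8)
print(best, err)
```

The number of combinations grows exponentially with the lattice size, which is the motivation for the randomized heuristics listed above.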
RHE-DEM Algorithm
[Figure sequence: a walk over the rating lattice in two phases, Satisfy Coverage and then Minimize Error]
- Start: C = {Male, Student}, {California, Student}
- Say C does not satisfy the coverage constraint
  - Candidate moves: C = {Male}, {California, Student} or C = {Student}, {California, Student}
- C = {Male}, {California, Student}: say C satisfies the coverage constraint (Satisfy Coverage √)
- C = {Male}, {Student}: end of the walk (Satisfy Coverage √, Minimize Error √)
DIM: Meaningful Difference Mining
- For an input item covering R_I+ (positive) and R_I- (negative) ratings, return a set C of cuboids such that:
  - difference balance is minimized, subject to:
    - |C| ≤ k
    - coverage of R_I+ ≥ α and coverage of R_I- ≥ α
DIM: Meaningful Difference Mining
- Difference balance: measures whether the positive and negative ratings are “mingled together” (high balance) or “separated apart” (low balance)
- Coverage: measures the percentage of +/- ratings covered by the returned groups
- DIM is NP-hard: proof details in [1]
[1] S. Amer-Yahia, Mahashweta Das, Gautam Das, Cong Yu: MRI: Meaningful Interpretations of Collaborative Ratings. In PVLDB, 2011.
DIM: Meaningful Difference Mining
- Identify groups of reviewers who consistently disagree on item ratings
DIM: Meaningful Difference Mining
NP-completeness: by reduction from the Exact 3-Set Cover problem (EC3).
DIM Algorithms
- Exact Algorithm (E-DIM)
- Randomized Hill Exploration Algorithm (RHE-DIM)
  - Unlike the DEM “error”, the DIM “balance” is expensive to compute
    - Quadratic computation: scans all pairs of positive and negative ratings for each set of cuboids
  - Introduce the concept of fundamental regions to speed up balance computation
    - Each rating tuple gets a k-bit signature vector, where bit i is 1 if the tuple is covered by group i
    - A fundamental region is the set of rating tuples that share the same signature
    - Partition the space of all ratings and aggregate the rating tuples in each region
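The saving from fundamental regions can be illustrated with a short Python sketch. It assumes, following the cited paper [1] rather than anything shown in the transcript, that balance is the fraction of (positive, negative) rating pairs that share at least one group. The two groups are the C1 and C2 used on the slides; the ratings and function names are hypothetical.

```python
from collections import Counter


def signature(attrs, groups):
    """k-bit signature packed into an int: bit i is 1 if group i covers the rating."""
    bits = 0
    for i, group in enumerate(groups):
        if all(attrs.get(k) == v for k, v in group.items()):
            bits |= 1 << i
    return bits


def balance_naive(groups, positives, negatives):
    """Quadratic scan over every (positive, negative) pair of ratings."""
    shared = sum(
        1
        for p in positives
        for n in negatives
        if signature(p, groups) & signature(n, groups)
    )
    return shared / (len(positives) * len(negatives))


def balance_by_regions(groups, positives, negatives):
    """Same value, computed from per-region counts instead of individual ratings."""
    pos_regions = Counter(signature(p, groups) for p in positives)
    neg_regions = Counter(signature(n, groups) for n in negatives)
    shared = sum(
        pos_count * neg_count
        for pos_sig, pos_count in pos_regions.items()
        for neg_sig, neg_count in neg_regions.items()
        if pos_sig & neg_sig  # the two regions share at least one group
    )
    return shared / (len(positives) * len(negatives))


C1 = {"Gender": "Male", "Occupation": "Student"}
C2 = {"Location": "California", "Occupation": "Student"}
positives = [
    {"Gender": "Male", "Location": "California", "Occupation": "Student"},
    {"Gender": "Male", "Location": "New York", "Occupation": "Student"},
    {"Gender": "Female", "Location": "California", "Occupation": "Student"},
]
negatives = [
    {"Gender": "Female", "Location": "California", "Occupation": "Student"},
    {"Gender": "Male", "Location": "New York", "Occupation": "Engineer"},
]
print(balance_naive([C1, C2], positives, negatives))
print(balance_by_regions([C1, C2], positives, negatives))
```

With k groups there are at most 2^k regions, so the region-based loop depends on the number of distinct signatures rather than on the number of ratings.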
DIM Algorithms: Fundamental Region
set of k=2 cuboids having 75 ratings (46+, 33-)
balance =
C1 = {Male, Student}
C2 = {California, Student}
Summary of Rating Exploration
- DEM and DIM are hard problems
  - Leverage the lattice structure to improve coverage
  - Exploit properties of the rating function for faster error computation
- Explore other rating aggregation functions
- Explore other constraints, e.g., group size
- Explore other optimization dimensions: group diversity
Social Data Exploration
- Rating exploration
  - MRI: Meaningful Interpretations of Collaborative Ratings
- Tag exploration
  - Who Tags What? An Analysis Framework
- Perspectives
Collaborative Tagging Site (Amazon)
Collaborative Tagging Site (LastFM)
Exploring Collaborative Tagging in MovieLens
Tag Signature for all Users
Tag Signature for all CA Users
Exploring Collaborative Tagging
- Exploration considers three dimensions
  - User, Item, Tag
- and two alternative measures
  - Similarity, Diversity
Data Model
- Tagging action tuple: <user attributes, item attributes, tags>

| ID | Title | Genre | Director | Name | Gender | Location | Tags |
|----|-------|-------|----------|------|--------|----------|------|
| 1 | Titanic | Drama | James Cameron | Amy | Female | New York | love, Oscar |
| 2 | Schindler’s List | Drama | Steven Spielberg | John | Male | New York | history, Oscar |
Tagging Behavior Dual Mining Problem (TagDM)
Tagging Behavior Dual Mining Problem Instance
Problem: Tagging Behavior Dual Mining (TagDM)
- Identify similar groups of reviewers who share similar tagging behavior for a diverse set of items
[Figure: example with the user group “Male Young” and the tag sets “comedy, drama, friendship” and “drama, friendship”]
Tagging Behavior Dual Mining Instance
Tagging Behavior Dual Mining Instance
- Identify diverse groups of reviewers who share diverse tagging behavior for similar items
[Figure: example with the user groups “Male Teen” and “Female Teen” and the tag sets “gun, special effects” and “violence, gory”]
TagDM is NP-hard (proof details in [2])
[2] Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, Cong Yu: Who Tags What? An Analysis Framework. In PVLDB, 2012.
Two algorithms
- LSH (Locality Sensitive Hashing) based algorithm handles TagDM problem instances optimizing similarity
- FDP (Facility Dispersion Problem) based algorithm handles TagDM problem instances optimizing diversity
- Both rely on computing tag signatures for groups
  - Latent Dirichlet Allocation to aggregate tags
  - Comparison between signatures based on cosine
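As a rough illustration of comparing group tag signatures by cosine, the Python sketch below builds a signature per group and compares two of them. It deliberately swaps in a normalized tag-frequency vector for the Latent Dirichlet Allocation step named on the slide, so it shows only the comparison, not the talk's aggregation method. The function names and the vocabulary ordering are assumptions; the tags come from the MovieLens example rows.

```python
import math
from collections import Counter


def tag_signature(tag_lists, vocabulary):
    """Normalized tag-frequency vector for a group (stand-in for an LDA topic vector)."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    return [counts[tag] / total if total else 0.0 for tag in vocabulary]


def cosine(u, v):
    """Cosine similarity between two signature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocabulary = ["history", "love", "Oscar"]
amy_group = tag_signature([["love", "Oscar"]], vocabulary)
john_group = tag_signature([["history", "Oscar"]], vocabulary)
print(cosine(amy_group, john_group))
```

The two groups share one of their two tags, so their signatures are neither identical nor orthogonal.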
Algorithm: LSH Based
- LSH (Locality Sensitive Hashing) based algorithm handles TagDM problem instances optimizing similarity
- LSH is popular for solving nearest neighbor search problems in high dimensions
- LSH hashes similar input items into the same bucket with high probability
  - We hash group tag signature vectors into buckets, and then rank the buckets based on the strength of their (tag) similarity
- SM-LSH
  - Returns a set of at most k groups having maximum similarity in tagging behavior, measured by comparing distances between group tag signature vectors
- SM-LSH-Fi: handles hard constraints by Filtering the result of SM-LSH
- SM-LSH-Fo: handles hard constraints by Folding them into SM-LSH
Algorithm: LSH Based
- Hashing function for SM-LSH
  - We use the LSH scheme in [3], which employs a family of hashing functions based on cosine similarity over Trep(g), the tag signature vector for group g
  - The probability of finding the optimal result set by SM-LSH is bounded in terms of d’, the dimensionality of hash signatures (buckets); proof details in the paper
  - We employ iterative relaxation to tune d’ in each iteration (Monte Carlo randomized algorithm) so that post-processing of the hash tables yields a non-null result set
[3] M. Charikar: Similarity estimation techniques from rounding algorithms. In STOC, 2002.
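The hash family of [3] is the random-hyperplane scheme: each hash bit records on which side of a random hyperplane a vector falls, and two vectors at angle θ agree on a given bit with probability 1 − θ/π. The Python sketch below shows that scheme applied to group tag signature vectors. The function names, the seed parameter, and the example vectors are illustrative assumptions, and the ranking of buckets and the iterative relaxation of d’ are not shown.

```python
import random
from collections import defaultdict


def make_hyperplanes(d_prime, dim, seed=0):
    """Draw d' random Gaussian normal vectors, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d_prime)]


def hash_signature(vector, hyperplanes):
    """d'-bit bucket key: bit j is 1 if the vector lies on the non-negative side of plane j."""
    return tuple(
        1 if sum(r * x for r, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def bucketize(signatures, hyperplanes):
    """Group names whose tag signature vectors hash to the same bucket key."""
    buckets = defaultdict(list)
    for name, vector in signatures.items():
        buckets[hash_signature(vector, hyperplanes)].append(name)
    return dict(buckets)


planes = make_hyperplanes(d_prime=4, dim=3)
signatures = {
    "g1": [0.2, 0.5, 0.3],
    "g2": [0.4, 1.0, 0.6],   # same direction as g1, so always the same bucket
    "g3": [-0.2, -0.5, -0.3],  # opposite direction to g1
}
print(bucketize(signatures, planes))
```

A larger d’ makes buckets more selective, so fewer groups collide; lowering d’ when no bucket yields a usable result is one way to read the iterative relaxation described on the slide.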
Algorithm: LSH Based
- SM-LSH-Fi: dealing with constraints by Filtering
  - For each hash table, check the satisfiability of the hard constraints in each bucket, and then rank the filtered buckets on tagging similarity
    - Often yields null results
- SM-LSH-Fo: dealing with constraints by Folding
  - Fold hard constraints that maximize similarity into SM-LSH as soft constraints
  - Hash similar input tagging action groups (similar with respect to group tag signature vector and user and/or item attributes) into the same bucket with high probability
- However, it is not obvious how LSH hash functions may be inverted to account for dissimilarity while preserving LSH properties
Algorithm: FDP Based
- FDP (Facility Dispersion Problem) based algorithm handles TagDM problem instances optimizing diversity
- The FDP problem locates facilities on a network so as to maximize the distance between facilities
  - We find tagging groups maximizing diversity (distance) between tag signature vectors
- We initialize with the pair of facilities joined by the maximum-weight edge, and then, in each subsequent iteration, add the node with maximum distance to those already selected [4]
[4] S. S. Ravi, D. J. Rosenkrantz, G. K. Tayi: Facility dispersion problems: Heuristics and special cases. In WADS, 1991.
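The greedy heuristic described above can be sketched as follows. This is a minimal Python illustration, assuming Euclidean distance between tag signature vectors and assuming that "maximum distance to those selected" means the largest total distance to the current selection; the function names and example vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two tag signature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_dispersion(vectors, k, distance=euclidean):
    """Pick k group names: start from the farthest pair, then repeatedly add
    the group with the largest total distance to the groups already selected."""
    names = list(vectors)
    if k < 2 or k > len(names):
        raise ValueError("k must be between 2 and the number of groups")
    first, second = max(
        combinations(names, 2),
        key=lambda pair: distance(vectors[pair[0]], vectors[pair[1]]),
    )
    selected = [first, second]
    while len(selected) < k:
        remaining = [n for n in names if n not in selected]
        best = max(
            remaining,
            key=lambda n: sum(distance(vectors[n], vectors[s]) for s in selected),
        )
        selected.append(best)
    return selected


vectors = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [0.0, 1.0, 0.0],
    "g3": [0.9, 0.1, 0.0],  # nearly the same tagging behavior as g1
    "g4": [0.0, 0.0, 1.0],
}
print(greedy_dispersion(vectors, k=3))
```

In this example the near-duplicate g3 is left out, since it adds little distance to a selection that already contains g1.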
Algorithm: FDP Based
- DV-FDP
  - Returns a set of at most k groups having maximum diversity in tagging behavior, measured by maximizing the average pairwise distance between group tag signature vectors
  - If Gopt and Gapp represent the sets of k (k ≥ 2) tagging action groups returned by the optimal and our approximate DV-FDP algorithm, and tag signature vectors satisfy the triangle inequality, an approximation guarantee holds (proof details in the paper)
- DV-FDP-Fi
  - Handles hard constraints by Filtering the result of DV-FDP
- DV-FDP-Fo
  - Handles hard constraints by Folding them into DV-FDP
Some anecdotal evidence on analysts’ preferences
Users prefer TagDM Problems 2 (find similar user sub-populations who agree most on their tagging behavior for a diverse set of items), 3 (find diverse user sub-populations who agree most on their tagging behavior for a similar set of items) and 6 (find similar user sub-populations who disagree most on their tagging behavior for a similar set of items). Each of these uses diversity as the measure for exactly one of the tagging components: item, user and tag, respectively.
Summary and Perspectives
- The notion of group is central to social data exploration
  - Because it is meaningful to analysts: groups are describable
  - Because group relationships can be explored for efficient space exploration
Perspective 1
- Rating exploration
  - A single dimension was optimized at a time (error or balance)
  - We could formulate a problem that seeks the k most uniform (minimal error) and most diverse (least overlapping) groups
Perspective 2
- There is a total of 112 concrete problem instances that our TagDM framework captures!
- And those optimize the tagging dimensions only
Perspective 3
- Social data exploration over time