
Albert-Ludwigs-Universität Freiburg

Various Aspects of Recommender Systems

May 2nd, 2017

Master project SS17

Master Project

Prof. Dr. Georg Lausen

Dr. Michael Färber

Anas Alzoghbi

Victor Anthony Arrascue Ayala

Agenda

Organization

Recommender Systems

Topics

- Finding complementary products (Anthony)

- Cross-domain recommendations (Anthony)

- Scientific Paper recommendation (Anas)

- Recommending new Wikipedia articles (Michael)

- Recommending references for (scientific) texts (Michael)

03.05.2017 Various Aspects of Recommender Systems SS17 2

Requirements

Study regulations (Studienordnung)

- 16 ECTS → 480 hours

Master project

- Team size: 1-3 students

- Project report: ~10-12 pages per student

- Short presentations: 2-3 (individual as needed)

- Final presentation: 25 min

Some preconditions

- Recommended lecture “Data Analysis and Query Language” or similar


General goals

Collective work on a project

Gain experience in research and development methods

Improve individual programming skills

Become familiar with new topics (Semantic Web, recommender systems, …)

Learn about problems of larger projects


Assessment

Each student's workload must be clearly distinguishable

Some Criteria

- Methodology

- The scope and difficulty of the work / implementation

- Individual contribution

- Team performance: a successful project has a positive effect

- Role and participation in the team (coordination, etc.)

- Quality of code (formatting, documentation, testing)

- Individual report (project report)

- Presentations (especially the final presentation)


Organization


Meetings

- Building 51 – SR 01 029

Website

- Apply via HISinOne

SVN repository


Master projects

1. Finding complementary products (Anthony)

2. Cross-domain recommendations (Anthony)

3. Scientific Paper recommendation (Anas)

4. Recommending new Wikipedia articles (Michael)

5. Recommending references for (scientific) texts (Michael)


Finding complementary products - 1st project


“Products that are sold separately but that are used together, each creating a demand for the other”



CP – Traditional Approaches


Data Mining (Association Rules)

- Requires transaction data

Limitations

- Cold start for new items

- Unpopular products

- No explanations
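To make the dependence on transactions concrete, here is a minimal Python sketch of pairwise association rules that computes support and confidence for rules {a} → {b}. It assumes baskets are given as sets of item ids; the function name and example items are hypothetical. A product that never occurs in a basket receives no rule at all, which is the cold-start limitation listed above.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.0):
    """Support and confidence for directional rules {a} -> {b} over item pairs.

    transactions: list of sets of item ids (e.g., products bought together).
    Returns a dict mapping (a, b) to (support, confidence).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        # Count every unordered pair once per basket.
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # confidence(a -> b) = P(b | a), estimated from co-occurrence counts.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules
```

For example, with the baskets {camera, sd_card}, {camera, sd_card, tripod}, {camera} and {phone}, the rule sd_card → camera has support 0.5 and confidence 1.0, while camera → sd_card has confidence 2/3.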


CP – Problem


Predict whether a complementary relationship holds between two products

Without transaction data

Using Semantic Web technologies

- Linked Open Data (DBpedia): knowledge graph

Based on products' meta-data

- Publicly available


CP – Solution scheme


Learning to Identify Complementary Products from DBpedia. Victor Anthony Arrascue Ayala, Trong-Nghia Cheng, Anas Alzoghbi, Georg Lausen. LDOW@WWW 2017

Evaluation using Amazon’s data


1. Reproduce pipeline

2. Add new features

- Observable graph features

- Meta-data, e.g., price

3. Extend evaluation

- Other categories (Books, Movies and TV, etc.)

- Ranking vs. classification
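One possible reading of “observable graph features” is the classic neighborhood-based link-prediction scores. The sketch below computes common neighbors, the Jaccard coefficient and the Adamic-Adar index for a pair of products, assuming each product has been added as a node linked to its extracted DBpedia entities in an undirected adjacency dict. These features are illustrative candidates, not the ones used in the cited paper, and the function name is hypothetical.

```python
import math


def observable_features(graph, u, v):
    """Neighborhood-based link-prediction features for the node pair (u, v).

    graph: dict mapping each node to the set of its neighbors (undirected).
    """
    nu, nv = graph.get(u, set()), graph.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: shared neighbors with few links count more than hubs.
        "adamic_adar": sum(
            1.0 / math.log(len(graph.get(w, ())))
            for w in common
            if len(graph.get(w, ())) > 1
        ),
    }
```

The resulting feature vectors can feed a binary classifier (complementary or not) or, for a ranking-based evaluation, a scoring function whose output is sorted per query product.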


Goal: improving the scheme


Compulsory task


1. Read the paper

2. Extract product attributes

- For the smallest category

- Using an NER tool (Alchemy / Spotlight)

3. Create knowledge graph

- Crawl links between attributes from DBpedia

4. Data analysis

- Products coverage

- Quality of interconnections

- Etc…
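For the products-coverage part of the data analysis, a minimal sketch, assuming the NER step produced a dict from product id to a set of DBpedia entity URIs and the crawl produced an adjacency dict between entities. It reports the share of products with at least one recognized entity and the share with at least one entity that has links in the crawled graph. The function name and entity URIs are hypothetical.

```python
def coverage_stats(annotations, graph):
    """Share of products that are covered by, and connected to, the knowledge graph.

    annotations: dict product_id -> set of entity URIs found by the NER tool.
    graph: dict entity URI -> set of linked entity URIs crawled from DBpedia.
    """
    total = len(annotations)
    if total == 0:
        return {"covered": 0.0, "connected": 0.0}
    # Products with at least one recognized entity.
    covered = sum(1 for ents in annotations.values() if ents)
    # Products with at least one entity that has links in the crawled graph.
    connected = sum(
        1 for ents in annotations.values() if any(graph.get(e) for e in ents)
    )
    return {"covered": covered / total, "connected": connected / total}
```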


Submission of compulsory task


Prerequisite for participation

Report

- Introduction

- Problem statement (1 page)

- Solution proposal (1 page)

- Data analysis (2 pages)

- Related work (1 page)

1 team, max. 3 students

Deadline: 16.05.2017, 12:00


Cross-domain recommendations - 2nd project


“The research on cross-domain recommendation generally aims to exploit knowledge from a source domain D_S to perform or improve recommendations in a target domain D_T” [RS Handbook]



CDRS – Problem


For each user

- Given a set of likes for items in D_S

- Predict items in D_T

Using Semantic Web technologies

- Linked Open Data (DBpedia): knowledge graph

- Items are interconnected
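A simple baseline for this setting scores each target-domain item by counting two-hop paths (target item, shared neighbor, liked item) to the user's liked source-domain items in the knowledge graph. This is a hedged sketch of one possible approach, not the scheme from the cited paper; the graph is assumed to be an adjacency dict, and the function name and item and entity names are toy data.

```python
from collections import Counter


def cross_domain_scores(graph, liked_source, target_items, top_k=3):
    """Rank target-domain items by two-hop paths to the user's liked source items.

    graph: dict node -> set of neighboring knowledge-graph nodes.
    """
    # How many liked source items point to each knowledge-graph neighbor.
    liked_neighbors = Counter()
    for item in liked_source:
        liked_neighbors.update(graph.get(item, set()))
    scores = {}
    for t in target_items:
        # Each shared neighbor contributes one path per liked item linked to it.
        s = sum(liked_neighbors[n] for n in graph.get(t, set()))
        if s > 0:
            scores[t] = s
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

For example, with liked books Dune and Foundation linked to Science_fiction, and Dune also linked to Frank_Herbert, a movie linked to both Frank_Herbert and Science_fiction outranks one linked only to Science_fiction, and a target item with no shared neighbor is not recommended at all.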


CP – Solution scheme (not assessed)


Learning to Identify Complementary Products from DBpedia. Victor Anthony Arrascue Ayala, Trong-Nghia Cheng, Anas Alzoghbi, Georg Lausen. LDOW@WWW 2017

Evaluation using Facebook’s data (likes)



1. Reproduce pipeline

2. Implement a recommender on top

- Predict if a user would like the item

- Predict top-k recommendations

- *Optional: Integrate into RecRD4J

3. Evaluate the recommender

- Use standard metrics: Precision, Recall
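Precision and recall for a recommender are usually computed on the top-k list of each user and then averaged over users. A minimal sketch, assuming recommendations are a ranked list and the held-out likes are a set; dividing precision by k (rather than by the length of the list) is one common convention, and the function name is hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user.

    recommended: ranked list of item ids; relevant: set of held-out liked items.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```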


Goal: try the scheme


Compulsory task


1. Read the paper

2. Build infrastructure

- Large dataset (approx. 15 GB)

3. Data analysis

- For each domain (books, movies, music)

- Quality of interconnections

- Long-tail

- Sparsity

- Etc…
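For the sparsity and long-tail analysis, a minimal sketch over the (user, item) like pairs of one domain. Sparsity is the share of empty cells in the user-item matrix; the long-tail share is taken here as the fraction of likes that fall outside the most popular 20% of items, where the 20% cut-off (`head_share`) is an assumption, not a value from the slides.

```python
from collections import Counter


def sparsity_and_long_tail(likes, head_share=0.2):
    """Return (sparsity, long-tail share) for one domain's (user, item) likes.

    head_share: fraction of the most-liked items treated as the "head";
    0.2 is an illustrative assumption.
    """
    pairs = set(likes)
    if not pairs:
        return 1.0, 0.0
    users = {u for u, _ in pairs}
    popularity = Counter(item for _, item in pairs)
    # Share of empty cells in the |users| x |items| matrix.
    sparsity = 1.0 - len(pairs) / (len(users) * len(popularity))
    ranked = [item for item, _ in popularity.most_common()]
    head = set(ranked[: max(1, round(head_share * len(ranked)))])
    tail_likes = sum(c for item, c in popularity.items() if item not in head)
    return sparsity, tail_likes / len(pairs)
```

With five likes by three users on three items, one of which receives three likes, sparsity is 1 - 5/9 ≈ 0.44 and 40% of the likes fall on the tail.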


Submission of compulsory task


Prerequisite for participation

Report

- Introduction

- Problem statement (1 page)

- Solution proposal (1 page)

- Data analysis (2 pages)

- Related work (1 page)

1 team, max. 3 students

Deadline: 16.05.2017, 12:00


Scientific Paper recommendation - 3rd project

Recommend scientific papers to users

Content-based, collaborative filtering, and hybrid approaches

Paper features (meta-data)

- Textual features: title, abstract, keyword list

- Non-textual features: publication year, authors, venue, publisher, …


Textual paper representation

[Figure: Paper → Term Extraction → Paper Vector over terms k_1 … k_i, k_{i+1} … k_n, with weights 1 … 1 for k_1 … k_i and tf-idf_{i+1} … tf-idf_n for k_{i+1} … k_n]
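A minimal sketch of the tf-idf part of a textual paper representation in plain Python, assuming each paper's text is its title and abstract concatenated. It uses raw term frequency and idf(t) = log(N / df(t)), a common textbook variant; HyPRec or the project may weight terms differently, and the function name is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(papers):
    """Map each paper's text to a sparse tf-idf vector (dict term -> weight).

    papers: dict paper_id -> text (e.g., title + abstract).
    """
    # Lowercase and split into alphanumeric tokens.
    tokens = {pid: re.findall(r"[a-z0-9]+", text.lower()) for pid, text in papers.items()}
    n = len(papers)
    # Document frequency: in how many papers each term occurs.
    df = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for pid, toks in tokens.items():
        tf = Counter(toks)
        vectors[pid] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors
```

Terms that occur in every paper get weight 0, so the vector emphasizes the terms that distinguish a paper from the rest of the corpus.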


Rating Matrix


HyPRec

Master Project WS 2016

Scientific papers recommender

Probabilistic Topic Modeling (LDA)

Matrix factorization (ALS Algorithm)

Python

GitHub https://github.com/mostafa-mahmoud/sahwaka
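The slide names matrix factorization with the ALS algorithm. The sketch below shows plain alternating least squares with L2 regularization on the observed entries of a small rating matrix, using NumPy. It is a generic textbook version for illustration, not HyPRec's implementation, and the function name and parameter values are assumptions.

```python
import numpy as np


def als(R, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares for R ≈ U @ V.T on the observed entries only.

    R: (n_users, n_items) rating matrix; mask: same shape, 1 where observed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Fix V: one ridge regression per user over that user's observed items.
        for u in range(n_users):
            idx = mask[u] > 0
            Vu = V[idx]
            U[u] = np.linalg.solve(Vu.T @ Vu + reg, Vu.T @ R[u, idx])
        # Fix U: one ridge regression per item over the users who rated it.
        for i in range(n_items):
            idx = mask[:, i] > 0
            Ui = U[idx]
            V[i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ R[idx, i])
    return U, V
```

Each half-step is a closed-form ridge regression: with one factor matrix fixed, the loss is quadratic in the other. Predicted scores for unobserved entries are read off `U @ V.T`.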


HyPRec - Architecture

[Figure: HyPRec architecture]

- Data Parser: CiteULike dataset (csv files) → MySQL DB

- Papers Model: textual representation (TF-IDF), latent topics (LDA), features (publication year, authors, publisher, …)

- Recommender: content-based filtering (item-based CBF), collaborative filtering (matrix factorization CF), hybrid (weighted, i.e., linear combination)

- Evaluator: Metrics Calculator (MRR, NDCG, Recall), Train-Test splitter (user-based k-fold split)
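Two pieces of this architecture lend themselves to a short sketch: the weighted hybrid (a linear combination of CF and CBF scores) and the ranking metrics. The code assumes scores are dicts from paper id to a float on comparable scales and uses binary relevance; `alpha` and the function names are hypothetical, and MRR is the mean of `reciprocal_rank` over all test users.

```python
import math


def weighted_hybrid(cf_scores, cbf_scores, alpha=0.5):
    """Per-paper linear combination alpha * CF + (1 - alpha) * CBF."""
    items = set(cf_scores) | set(cbf_scores)
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * cbf_scores.get(i, 0.0)
        for i in items
    }


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant paper (0 if none is retrieved)."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k with 1 / log2(rank + 1) discounts."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # Ideal DCG: all relevant papers at the top positions.
    ideal = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```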


Regulations


One team – max 3 students

Weekly meetings

Programming language: Python


Regulations


Compulsory task (deadline: 17.05.2017; prerequisite for participation)

- Get familiar with HyPRec

- Implement a simple Recommender (User-based CF)

- Submit evaluation results (small presentation)
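For the user-based CF recommender, a minimal sketch with cosine similarity between users' rating dicts, assuming binary ratings (a paper is in a user's library or not) as in a CiteULike-style dataset. The neighborhood size and function names are illustrative assumptions and not part of HyPRec's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts (item -> rating)."""
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_based_cf(ratings, user, n_neighbors=2, top_n=3):
    """Recommend unseen papers to `user` from the most similar users' libraries.

    ratings: dict user -> dict item -> rating.
    """
    sims = sorted(
        ((cosine(ratings[user], r), v) for v, r in ratings.items() if v != user),
        reverse=True,
    )
    # Keep the most similar users with a positive similarity.
    neighbors = [(s, v) for s, v in sims[:n_neighbors] if s > 0]
    scores = {}
    for s, v in neighbors:
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                # Similarity-weighted vote for each unseen item.
                scores[item] = scores.get(item, 0.0) + s * r
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

For example, if alice has papers p1 and p2, bob has p1, p2 and p3, and carol has p2 and p4, alice is recommended p3 before p4 because bob is more similar to her (cosine ≈ 0.82) than carol (0.5).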

Starting Report (Submission: 24.05.2017)

- Problem statement (1 page)

- Solution proposal (1 page)


New Wikipedia Article Recommendation - 4th project


Motivation: Writing New Wikipedia Articles

What to write about?

Michael Slager, LG G4, Dan Fredinburg, Oleg Kalashnikov, Adult Beginners

What to do?

1. Use the list of requested articles: https://en.wikipedia.org/wiki/Wikipedia:Requested_articles

2. Read news or consume other media.

Automatically recommend relevant novel Wikipedia articles based on a news stream.


Distinguish between notable and non-notable entities


Approach: Use diff between Wikipedia dumps
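One way to compute this diff is to compare the page ids listed in the multistream index files of two dumps, whose lines have the form `offset:page_id:title`. A minimal sketch, assuming those index files have been decompressed and read as lines; titles may themselves contain colons, hence the split is limited to two separators. The index alone cannot distinguish redirects from articles and also lists pages from other namespaces, so further filtering against the dump itself would be needed. The function names are hypothetical.

```python
def read_index(lines):
    """Map page_id -> title from index lines of the form "offset:page_id:title"."""
    pages = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Titles may contain ":" themselves, so split at most twice.
        _offset, page_id, title = line.split(":", 2)
        pages[int(page_id)] = title
    return pages


def new_articles(old_lines, new_lines):
    """Titles whose page ids occur in the newer index but not in the older one."""
    old, new = read_index(old_lines), read_index(new_lines)
    return sorted(new[pid] for pid in new.keys() - old.keys())
```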


Existing Approach for Recommending New Wikipedia Articles


See Färber et al.: “On Emerging Entity Detection”, EKAW 2016.


Task

Build a “live system” for Wikipedia article recommendation.


Task

Improve the system via…

- Better selection of news sources

- Distributed processing of news articles (especially text annotation)

- Also considering very recently added Wikipedia pages

- Find and implement better features / adapt existing features

- Improve binary classification, e.g., by using a Recurrent Neural Network.

- Using word embeddings for better representation of candidates in news articles.

- Using other Knowledge Graphs, e.g., Wikidata or CrunchBase.


Compulsory task

1. Read related work (especially “On Emerging Entity Detection”, EKAW 2016).

2. Extract the Wikipedia articles that were added between two Wikipedia dumps (given the Wikipedia indices).

3. Annotate news articles (from between the Wikipedia versions) via an entity linking tool and extract noun phrases.

4. Calculate statistics about annotations.

5. Correlate new Wikipedia articles and their mentions with meta-information of news articles (e.g., which sources are suitable for predicting new Wikipedia articles).
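For step 5, a minimal sketch that computes, per news source, the share of its articles that mention at least one newly added Wikipedia article. It assumes mentions have already been extracted (linked entities or noun phrases) and matches them to titles by case-insensitive exact string comparison, which is a simplification; the function name and source names are hypothetical.

```python
from collections import defaultdict


def source_hit_rates(news, new_titles):
    """Per news source: share of its articles that mention a newly added article.

    news: iterable of (source, mentions), where mentions is a set of noun phrases
    or linked entity labels extracted from one news article.
    new_titles: titles of Wikipedia articles added between the two dumps.
    """
    targets = {t.lower() for t in new_titles}
    totals, hits = defaultdict(int), defaultdict(int)
    for source, mentions in news:
        totals[source] += 1
        if any(m.lower() in targets for m in mentions):
            hits[source] += 1
    return {s: hits[s] / totals[s] for s in totals}
```

Sources with a high rate are candidates for the “better selection of news sources” improvement.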


Submission of compulsory task


1 team, max. 2 students

Report (deadline: 16.05.2017, 12:00; prerequisite for participation)

- Introduction (1 page)

- Data analysis (2 pages)

- Related work (1 page)

Project proposal (24.05.2017)

- Additional sections: Problem statement (1 page), proposed approach/improvements of the system (2 pages), proposed evaluation (1 page)


Citation Recommendation - 5th project

Idea: Enrich (scientific) text with citation markers (e.g., “[1]”) and references.


Approach

1. Create model:

- Extract citations with context from publication corpus.

- Develop & implement features for ranking publications.

2. Apply model:

- Extract citation contexts from input text.

- Determine which publications to cite in which context.

- Add citations to text.
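Extracting citations with their context can be sketched with a regular expression for numeric markers such as “[1]” or “[2, 5]”. This minimal sketch assumes plain text and a fixed character window on each side of a marker; real corpora also use author-year citations and may need sentence-based contexts. The function name and window size are hypothetical.

```python
import re

# Numeric citation markers such as "[1]" or "[2, 5]".
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def citation_contexts(text, window=50):
    """Return (reference numbers, context) pairs for each citation marker.

    The context is up to `window` characters on each side of the marker;
    the matched marker itself is left out.
    """
    contexts = []
    for m in CITATION.finditer(text):
        refs = [int(n) for n in re.split(r"\s*,\s*", m.group(1))]
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        contexts.append((refs, (left + right).strip()))
    return contexts
```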


Useful Data Sets

“Scholarly”

- 101k papers in the computer science domain (PDF+metadata)

arXiv.org

- Over 1M papers (PDF+metadata)

- Different fields: Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics

CiteSeerX

- Database with publications and citations

- Approx. 7M papers

DBLP, Microsoft Academic Graph, …


Compulsory task

Read related work

Analyze and compare existing data sets for citation recommendation, including

- citation context extraction

- publication meta-data retrieval

- citation graph creation

- incorporating external data sets (e.g., DBLP, PageRank, …)
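If PageRank scores are used as a ranking feature, they can also be computed on the citation graph itself. A minimal power-iteration sketch, assuming the graph is a dict from each paper to the list of papers it cites; the damping factor 0.85 is the customary default, papers that cite nothing redistribute their score uniformly, and the function name is hypothetical.

```python
def pagerank(citations, damping=0.85, iters=50):
    """Power-iteration PageRank on a citation graph.

    citations: dict paper -> list of cited papers (edges point to cited papers).
    """
    nodes = set(citations) | {c for cited in citations.values() for c in cited}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # Score held by papers without outgoing citations (dangling nodes).
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p, cited in citations.items():
            for c in cited:
                # Each citing paper splits its score evenly among its references.
                new[c] += damping * rank[p] / len(cited)
        rank = new
    return rank
```

The scores always sum to 1, so they can be compared across iterations or used directly as a prior for candidate publications.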


Submission of compulsory task


1 team, max. 3 students

Report (deadline: 16.05.2017, 12:00; prerequisite for participation)

- Introduction (1 page)

- Analysis & comparison of data sets and tools (2 pages)

- Related work (for task in general) (2 pages)

Project proposal (24.05.2017)

- Additional sections: Problem statement (1 page), proposed approach (2 pages), proposed evaluation (1 page)


Thank you!

Any questions?
