ECE 4502/6502 & CS 6501: Graph Mining Instructor: Jundong Li Spring 2020, University of Virginia Midterm Exam Review

Post on 26-Aug-2020


ECE 4502/6502 & CS 6501: Graph Mining, people.virginia.edu/~jl6qk/sp20-graph-mining/examreview.pdf

ECE 4502/6502 & CS 6501: Graph Mining

Instructor: Jundong Li
Spring 2020, University of Virginia

Midterm Exam Review

Logistics of the exam

• Format: Online and Open Book
• Time: Thursday, March 19, 3:30PM - 4:45PM
• Where to download/upload exam questions?
  • Collab->Assignment
• When can I download exam questions?
  • Available to download from 3:10 pm (20 minutes before the exam)
• When do I need to upload my answers?
  • Submission protocol will be closed at 5:05 pm (20 minutes after the exam)
• How can I upload my answers?
  • (i) Electronically; or (ii) handwritten, then scanned or photographed

The covered topics

• Chapter 1. Introduction
• Chapter 2. Graph Essentials
• Chapter 3. Network Measures
• Chapter 4. Network Models
• Chapter 5. Data Mining Essentials
• Chapter 6. Community Analysis
• Chapter 7. Information Diffusion
• Chapter 8. Recommender System

Organization of the exam

• 1~2 true/false questions and 1 regular question for each chapter (10 T/F questions and 8 regular questions in total)

• For the T/F questions, you need to decide whether the statement is true or not

• For the regular questions, you will be asked to write down your thoughts about a problem or do some simple calculations

Chapter 1: Introduction

Graph mining

• What are graphs/networks?

• Why are graphs important?

• What is graph mining?

• What are interesting applications of graph mining?

Chapter 2: Graph Essentials

Graph basics

• What is a network/graph?
• Nodes and edges
• Undirected networks vs. directed networks
• In-degree and out-degree
• What kind of degree distribution? – power-law
• What is the power-law distribution and how to interpret it?

Graph representation

• Adjacency matrix
• Adjacency list
• Edge list
• Given a graph, can you draw the corresponding adjacency matrix, adjacency list and edge list?
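
The three representations above can be sketched on a small example. This is a minimal illustration, assuming a hypothetical 4-node undirected graph that is not taken from the course material; the variable names are illustrative only.

```python
# Hypothetical 4-node undirected graph, given as an edge list.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: entry [i][j] is 1 when nodes i and j share an edge.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edge_list:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1  # undirected, so the matrix is symmetric

# Adjacency list: each node maps to the sorted list of its neighbors.
adj_list = {i: [] for i in range(n)}
for u, v in edge_list:
    adj_list[u].append(v)
    adj_list[v].append(u)
adj_list = {i: sorted(nbrs) for i, nbrs in adj_list.items()}
```

The matrix costs O(n²) space regardless of how many edges exist, while the adjacency list and edge list grow with the number of edges, which is why the latter two are usually preferred for sparse graphs.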

Connectivity in graphs

• What is a random walk and how to perform a random walk on a graph?
• What is a component of an undirected graph, and what are strongly and weakly connected components of a directed graph?
• What is the shortest path between two nodes?
• What is the diameter of the graph and how to calculate it?

Special subgraphs

• What is a minimum spanning tree (MST)?

• What is a complete graph?
  • All possible edges exist
• What is a regular graph?
  • In a k-regular graph, all nodes have degree k

Graph algorithms

• Depth first search and breadth first search
  • Depth first search – stack structure
  • Breadth first search – queue structure
• How to find the shortest path with Dijkstra’s algorithm?
• How to find the minimum spanning tree with Prim’s algorithm?
• Given a piece of pseudo code, can you get a sense of what it is about?
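
The stack/queue distinction and Dijkstra's algorithm can be sketched as follows. This is a minimal illustration rather than the pseudo code used in the course; the function names and the small example graphs are assumptions made for this sketch.

```python
from collections import deque
import heapq

def bfs_order(adj, start):
    """Visit nodes level by level using a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def dfs_order(adj, start):
    """Visit nodes depth-first using an explicit LIFO stack."""
    seen, order, stack = set(), [], [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push in reverse so the first-listed neighbor is explored first.
        for v in reversed(adj[u]):
            if v not in seen:
                stack.append(v)
    return order

def dijkstra(wadj, source):
    """Shortest-path distances from source; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in wadj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical example graphs (directed, for brevity).
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
wadj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)], "d": []}
```

On the unweighted graph, BFS reaches `d` only after both `b` and `c`, while DFS follows `a`, `b`, `d` before backtracking to `c`. On the weighted graph, Dijkstra finds that the cheapest route to `c` goes through `b`.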

Chapter 3: Network Measures

Centrality

• Degree centrality and normalized version
  • Formulation
• Eigenvector centrality
  • Advantages over degree centrality
  • Formulation
  • Which eigenvalue-eigenvector pair should we choose?
• Katz centrality
  • What are the problems of eigenvector centrality?
  • Formulation – why does the bias term 𝛽 help?
• PageRank
  • What are the problems of the preceding algorithms?
  • Formulation of PageRank
  • How to choose the parameter 𝛼 in PageRank?
  • What is the power method and why is it used?
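
As a reminder of how the power method works, here is a minimal sketch applied to eigenvector centrality, assuming a small hypothetical undirected graph (a triangle with one pendant node). Repeated multiplication by the adjacency matrix converges to the eigenvector of the largest eigenvalue, which is the pair eigenvector centrality uses because its entries can all be taken non-negative. The function name and the fixed iteration count are choices made for this sketch.

```python
def power_method(A, iters=100):
    """Approximate the dominant eigenpair of a square matrix A (list of rows)."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        # Multiply y = A x, then rescale to unit length to avoid overflow.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # The Rayleigh quotient x^T A x estimates the corresponding eigenvalue.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(Ax, x))
    return eigenvalue, x

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
eigenvalue, centrality = power_method(A)
```

Node 2 receives the highest score and the pendant node 3 the lowest. The method only needs matrix-vector products, which is why it scales to large sparse graphs where a full eigendecomposition would be too expensive.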

Centrality

• Betweenness centrality and normalized version
  • Formulation and calculation
• Closeness centrality
  • Formulation and calculation

Transitivity and reciprocity

• What is transitivity?
  • Clustering coefficient measures transitivity in undirected graphs – how to calculate it?
  • Local clustering coefficient measures transitivity at the node level – how to calculate it?
• What is reciprocity?
  • Given a directed graph, how to calculate its reciprocity?
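
Both node-level calculations can be sketched in a few lines. This is a minimal illustration under stated assumptions: the local clustering coefficient is taken as the fraction of a node's neighbor pairs that are connected, and reciprocity as the fraction of directed edges whose reverse edge also exists (one common definition; the course formulation may be normalized differently). The example graphs are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present."""
    edge_set = set(edges)
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Undirected example: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# Directed example: only the pair (0, 1) is reciprocated.
directed = [(0, 1), (1, 0), (1, 2), (2, 3)]
```

Node 2 has three neighbors and only one connected pair among them, so its local coefficient is 1/3, while node 0 sits entirely inside the triangle and scores 1.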

Balance and social status

• What is social balance theory?
  • In which cases is a triangle relationship stable?
• What is social status theory?
  • How to determine if the status theory is violated?

Similarity

• What is structural equivalence?
  • Look at the neighborhood shared by two nodes
  • How to calculate Jaccard and Cosine similarity?
  • How to measure the significance of similarity?
    • Compare the calculated similarity value with its expected value where vertices pick their neighbors at random
• What is regular equivalence?
  • Look at how similar the neighborhoods themselves are
  • How to calculate the regular equivalence?
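
For structural equivalence, the two neighborhood-overlap measures can be sketched as below. This assumes the usual set-based forms: Jaccard divides the shared neighbors by the union of the two neighborhoods, and cosine divides by the geometric mean of the two degrees. The example graph is hypothetical.

```python
from math import sqrt

def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def cosine(adj, u, v):
    """|N(u) & N(v)| / sqrt(|N(u)| * |N(v)|)."""
    denom = sqrt(len(adj[u]) * len(adj[v]))
    return len(adj[u] & adj[v]) / denom if denom else 0.0

# Triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

Nodes 0 and 3 are not adjacent but share neighbor 2, so their Jaccard similarity is 1/2 and their cosine similarity is 1/√2.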

Chapter 4: Network Models

Properties of real-world networks

• Degree distribution
  • Power-law degree distribution - scale-free networks
  • How to decide if it follows power-law distribution?
• Clustering coefficient
  • High clustering coefficient
• Average path length
  • The average path length is small

Random graphs

• What is a random graph?
• How to build a random graph?
  • 𝐺(𝑛, 𝑚) model
  • 𝐺(𝑛, 𝑝) model
• When do the 1st and the 2nd phase transitions occur?
• Properties of random graphs compared with real-world networks
  • Degree distribution?
  • Clustering coefficient – what is the expected local and global clustering coefficient?
  • Average path length?
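
The two construction procedures can be sketched as follows. This is a minimal illustration using only the standard library; the function names and the `seed` parameter are assumptions made for this sketch.

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """G(n, p): keep each of the n(n-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def gnm(n, m, seed=None):
    """G(n, m): choose exactly m edges uniformly at random from all possible edges."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)

sparse = gnp(100, 0.05, seed=1)   # edge count varies from run to run
fixed = gnm(100, 250, seed=1)     # edge count is always exactly 250
```

In G(n, p) the expected degree of a node is (n − 1)p, and because every edge exists independently, the expected clustering coefficient is simply p. That value is far lower than what real-world networks of the same density show.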

Small-world model

• What is a small-world graph?
• How to construct a small-world graph?
  • Why start from regular (ring) lattice?
  • How to add randomness into the regular lattice?
• Properties of small-world graphs compared with real-world networks
  • Degree distribution?
  • Clustering coefficient? How to calculate the global and local clustering coefficient of regular lattice?
  • Average path length?
• How will the clustering coefficient and average path length change?

Preferential attachment model

• What’s the basic intuition of the preferential attachment model? - The rich get richer
• How to construct a preferential attachment model?
• Properties of the preferential attachment model compared with real-world networks
  • Degree distribution?
  • Clustering coefficient?
  • Average path length?
• What are the major differences between the different network models?

Chapter 5: Data Mining Essentials

Data mining and data

• What is the KDD process?
• What are the differences between data mining and databases?
• What are typical data types (nominal, ordinal, interval, ratio) and permissible operations?
• Text representation
  • Vector space model
  • TF-IDF – how to obtain the TF-IDF representation?
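
Obtaining a TF-IDF representation can be sketched as below. This is a minimal illustration assuming one common variant: raw term counts for TF and log2(N / df) for IDF, with whitespace tokenization. Other weighting schemes exist, and the two-document corpus is hypothetical.

```python
from math import log2

def tf_idf(docs):
    """Return one {term: weight} dict per document, using raw counts and log2(N / df)."""
    n_docs = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents does each term appear?
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for tokens in tokenized:
        tf = {}
        for term in tokens:
            tf[term] = tf.get(term, 0) + 1  # term frequency within this document
        vectors.append({t: c * log2(n_docs / df[t]) for t, c in tf.items()})
    return vectors

docs = ["graph mining graph", "graph theory"]
vectors = tf_idf(docs)
```

The term "graph" appears in every document, so its IDF is log2(2/2) = 0 and it contributes nothing, even though it is the most frequent word in the first document. The terms "mining" and "theory" each appear in one document and get weight 1.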

Data quality and processing

• What data qualities need to be checked before applying data mining algorithms?

• What are the typical data processing techniques and when will they be used?

Supervised learning

• What is the process of supervised learning such as classification and regression?
• Classification models
  • Decision tree classifier
  • K-nearest neighbor classifier
  • Naïve Bayes classifier
  • Classification with network information
• Regression models
  • Linear regression
  • Logistic regression

Evaluation of supervised learning

• Training set and test set
• What is leave-one-out and what is K-fold cross validation?
• Evaluation of classification
  • Accuracy
  • Precision, recall, and F1-measure
• Evaluation of regression
  • RMSE
  • MAE
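
The classification measures can be computed directly from the confusion counts. This is a minimal sketch for the binary case; the function name and the toy labels are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3, while plain accuracy is 3/5.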

Unsupervised learning

• What is the target of clustering?
• Different distance measures
  • Euclidean distance
  • L1-norm distance
  • Cosine distance
• Clustering algorithm such as k-means
  • How does the algorithm work?
  • When does it stop?
  • What is the objective function?
• Evaluation of clustering results
  • With ground truth
  • Without ground truth
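
The k-means questions above (how it works, when it stops, what it minimizes) can be made concrete with a short sketch. This assumes 2-D points, squared Euclidean distance, caller-supplied initial centroids, and the stopping rule "assignments no longer change"; these are illustrative choices, not necessarily the exact variant from the course.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and update steps until the assignments stabilize."""
    centroids = list(init_centroids)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return assign, centroids

def objective(points, assign, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assign, centroids = kmeans(points, [(0, 0), (10, 10)])
```

Neither step can increase the objective, which is why the loop terminates; the result is a local minimum that depends on the initial centroids.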

Chapter 6: Community Analysis

Community detection

• Why analyze communities?

• What are explicit and implicit communities?

• What is community detection?

• What’s the difference between community detection and conventional clustering?

Member-based community detection

• What are member-based methods?
  • Nodes with similar characteristics are in a community
• Node characteristics
  • Degree, e.g., clique percolation method (CPM)
  • Reachability, e.g., k-clique, k-club, and k-clan
  • Similarity, e.g., Jaccard and Cosine similarity

Group-based community detection

• What are group-based methods?
  • The global network information and topology are considered to determine communities
• Balanced communities - spectral clustering
  • Community detection → a minimum cut problem
  • Find a graph partition such that the number of edges between the two sets is minimized
  • What are ratio cut and normalized cut?
  • Formulation of spectral clustering – which eigenvector to use?

Group-based community detection

• Modular communities - modularity maximization
  • Modularity is a measure that defines how likely the community structure found is created at random
  • How to calculate the modularity?
  • How to maximize the modularity?
  • How to obtain the community assignment from the modularity matrix?
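
Calculating modularity for a given assignment can be sketched as below. This assumes the standard form for an undirected, unweighted graph, Q = (1/2m) Σ_ij (A_ij − d_i d_j / 2m) δ(c_i, c_j), where m is the number of edges and d_i the degree of node i. The example graph (two triangles joined by one edge) is hypothetical.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum over same-community pairs of (A_ij - d_i * d_j / 2m)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees equals 2m
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1 if j in adj[i] else 0
            # Observed edge minus the expected number under random rewiring.
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {v: "A" for v in adj}
```

Splitting the graph into its two triangles gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0, because the observed and expected edge counts then cancel exactly.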

Group-based community detection

• Hierarchical communities: hierarchical clustering
  • How to build a hierarchical structure of communities?
  • Divisive hierarchical clustering
  • Agglomerative hierarchical clustering

Community evolution

• Evolution of networks

• Interesting patterns in dynamic networks
  • Decreasing probability of new connections between two nodes with increasing distance
  • Many new connections occur as triadic closures
  • Density increases with the network growth
  • Average distance between nodes decreases

Community evaluation

• How to evaluate a community detection assignment?
• Evaluation with ground truth
  • Precision and Recall, or F-Measure
  • Normalized Mutual Information (NMI)
• Evaluation without ground truth
  • Use domain experts or conduct user studies
  • Use multiple community detection methods

Chapter 7: Information Diffusion

Information diffusion

• What is information diffusion?

• Key components of information diffusion
  • Sender(s)
  • Receiver(s)
  • Medium

Herd behavior

• What is herding behavior?
• Under what conditions will herding behavior happen? – global information
• Herd behavior experiment - urn experiment
  • How to use mathematical tools to validate that herd behavior will happen?
• How to interrupt the herding behavior?

Information cascade

• What is an information cascade?
• What’s the major difference between information cascade and herding behavior?
• Independent Cascade Model (ICM) - each node has one chance to activate its neighbors
  • How does it work, and which set of nodes will get activated at the end?
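
The "one chance per node" rule of the ICM can be sketched as a simple simulation. This is a minimal illustration assuming a directed graph stored as `graph[u] = {v: activation probability}`; the function name, the `seed` parameter, and the example graph are assumptions made for this sketch.

```python
import random

def independent_cascade(graph, seeds, seed=None):
    """Simulate one ICM run and return the set of nodes active at the end."""
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            # u gets exactly one attempt at each currently inactive neighbor.
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical graph with probabilities 1.0 and 0.0 so the outcome is deterministic.
graph = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {"e": 1.0}}
result = independent_cascade(graph, ["a"])
```

Node `e` is never reached even though the edge from `c` to `e` always succeeds, because `c` itself is never activated. With probabilities strictly between 0 and 1 the final set is random, so the expected spread is usually estimated by averaging many runs.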

Maximizing the spread of cascades

• To trigger a large spread, which set of individuals should be targeted at the very beginning?
• Given a parameter k (budget), find a k-node set S to maximize f(S)
• A constrained optimization problem with f(S) as the objective function
  • What are the key properties of f(S)?
  • How to optimize it? Greedy algorithm
  • How good is the greedy algorithm?
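
The greedy procedure can be sketched generically: at each step, add the node with the largest marginal gain in f. In this minimal illustration f(S) is replaced by a hypothetical deterministic coverage function (the number of distinct nodes a seed set reaches), which is monotone and submodular like the expected cascade size; in practice f(S) would be estimated by simulating the cascade model.

```python
def greedy_max(candidates, f, k):
    """Pick k elements, each time adding the one with the largest marginal gain in f."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: f(chosen + [c]) - f(chosen))
        chosen.append(best)
    return chosen

# Hypothetical stand-in for f(S): which nodes each candidate seed would reach.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(S):
    """Number of distinct nodes reached by the seed set S."""
    covered = set()
    for s in S:
        covered |= reach[s]
    return len(covered)

seeds = greedy_max(list(reach), coverage, 2)
```

For a non-negative, monotone, submodular f, this greedy selection is guaranteed to reach at least (1 − 1/e), roughly 63%, of the optimal value, which answers the "how good" question on the slide.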

Diffusion of innovations and epidemic models

• What are the key characteristics of diffusion of innovations?
• How to model diffusion of innovation?
  • External-influence, internal-influence, and mixed-influence
• What are epidemics? Three components: pathogen, hosts, and spreading mechanism
• What are differences between epidemics and cascades?

Epidemic models

• SI model
• SIR model
  • Two cases of SIR model
• SIS model
• SIRS model
• How to perform epidemic intervention?

Chapter 8: Recommendation

Recommendation

• What’s the difference between search and recommendation?

• Challenges of recommender systems
  • The cold start problem
  • Data sparsity
  • Attacks
  • Privacy
  • Explanation

Content-based recommendation

• Assumption: a user’s interests should match the descriptions of the items that the system recommends to the user
• Detailed steps:
  • Describe the items to be recommended
  • Create a profile of the user that describes the types of items the user likes
  • Compare items with the user profile to determine what to recommend

Collaborative filtering

• What is collaborative filtering (CF)?

• What’s the advantage of CF over content based recommendation?

• Types of collaborative filtering algorithms
  • Memory-based (recommendation is directly based on previous ratings in the stored matrix that describes user-item relations)
  • Model-based (assumes that an underlying model (hypothesis) governs how users rate items)

Memory-based collaborative filtering

• User-based CF
  • Users with similar previous ratings for items are likely to rate future items similarly
• Item-based CF
  • Items that have received similar ratings previously from users are likely to receive similar ratings from future users
• How to measure the similarity? - cosine similarity or Pearson correlation coefficient
• How to get the final ratings in user-based CF and item-based CF?
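
One way to obtain the final rating in user-based CF is a mean-centered, similarity-weighted average of other users' ratings. The sketch below assumes cosine similarity computed over co-rated items and uses every other user who rated the item as a neighbor; the course formulation may restrict the neighborhood or use Pearson correlation instead. The ratings are hypothetical.

```python
from math import sqrt

def cosine_sim(ru, rv):
    """Cosine similarity over the items both users have rated."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    nu = sqrt(sum(ru[i] ** 2 for i in common))
    nv = sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (nu * nv)

def mean(r):
    """Average of one user's ratings."""
    return sum(r.values()) / len(r)

def predict(ratings, user, item):
    """User's mean rating plus the weighted average of neighbors' deviations."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine_sim(ratings[user], r)
        num += s * (r[item] - mean(r))  # how far above or below their own mean
        den += abs(s)
    base = mean(ratings[user])
    return base + num / den if den else base

ratings = {
    "ann": {"x": 4, "y": 2},
    "bob": {"x": 4, "y": 2, "z": 5},
    "cat": {"x": 2, "y": 4, "z": 1},
}
prediction = predict(ratings, "ann", "z")
```

Bob rates exactly like Ann and liked `z`, so the prediction lands above Ann's mean of 3. Cat still pulls it down noticeably, because cosine similarity on positive ratings never goes negative; this is one reason Pearson correlation is often preferred. Item-based CF follows the same pattern with the roles of users and items swapped.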

Model-based collaborative filtering

• We focus on a well-established technique using singular value decomposition (SVD)
• What is SVD on matrix X? - Decompose X into 3 matrices ($X = U \Sigma V^T$)
• Matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and matrix $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal
• The product of these matrices is equivalent to the original matrix – no information loss
• How to use SVD for the recommendation?
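
For recommendation, the usual step after the full decomposition is a low-rank truncation. The following is a sketch of that step, assuming X is an m × n user-item rating matrix and writing σ_i, u_i, v_i for the i-th singular value and the corresponding columns of U and V; the symbol k for the retained rank is introduced here for illustration.

```latex
\begin{align}
X &= U \Sigma V^{T}, \qquad X \in \mathbb{R}^{m \times n} \\
X_k &= U_k \Sigma_k V_k^{T} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T},
      \qquad k \le \operatorname{rank}(X) \\
X_k &= \operatorname*{arg\,min}_{\operatorname{rank}(B) \le k} \lVert X - B \rVert_F
\end{align}
```

Keeping only the k largest singular values gives the best rank-k approximation of X in the Frobenius norm. The truncated factors act as low-dimensional representations of users and items, which can then be compared (for example with cosine similarity) to find similar users or items.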

Recommendation to groups

• Find content of interest to all members of a group of socially acquainted individuals

• Three strategies:
  • Maximizing average satisfaction
  • Least misery
  • Most pleasure
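
The three aggregation strategies differ only in how the members' individual ratings are combined into one group score. This is a minimal sketch with hypothetical predicted ratings for a three-person group; the function and item names are illustrative.

```python
def group_scores(ratings):
    """ratings[item] = list of predicted ratings, one per group member."""
    return {
        item: {
            "average": sum(r) / len(r),   # maximizing average satisfaction
            "least_misery": min(r),       # the least satisfied member decides
            "most_pleasure": max(r),      # the most satisfied member decides
        }
        for item, r in ratings.items()
    }

ratings = {"movie_a": [3, 3, 3], "movie_b": [5, 1, 1], "movie_c": [4, 4, 2]}
scores = group_scores(ratings)
# For each strategy, recommend the item with the highest aggregated score.
best = {s: max(scores, key=lambda item: scores[item][s])
        for s in ("average", "least_misery", "most_pleasure")}
```

With these ratings each strategy picks a different item: the average favors `movie_c`, least misery favors `movie_a` (nobody rates it below 3), and most pleasure favors `movie_b` (one member rates it 5).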

Social recommendation

• Leverage social connections to improve recommendation performance
• Three different ways:
  • Using social context alone – link prediction
  • Extending classical models – matrix factorization and social similarity regularization
  • Constrain using social context - limit the set of individuals that can contribute to the ratings of a user to the set of friends of the user

Evaluating recommender systems

• Predictive accuracy
  • MAE, RMSE – how to calculate them?
• Classification accuracy
  • Precision, Recall – how to calculate them?
• Rank accuracy
  • Spearman’s Rank Correlation, Kendall’s 𝜏 – how to calculate them?
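
The predictive-accuracy and rank-accuracy measures can be sketched as below. The Spearman function assumes rankings without ties and uses the closed form 1 − 6 Σd² / (n(n² − 1)); with ties, the general definition (Pearson correlation of the ranks) is needed instead. The example values are hypothetical.

```python
from math import sqrt

def mae(pred, true):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean squared error; penalizes large errors more than MAE."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def spearman(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same n items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

pred, true = [3, 5, 2, 4], [4, 5, 1, 2]
```

For these ratings the absolute errors are 1, 0, 1, and 2, so MAE is 1.0 and RMSE is √1.5 ≈ 1.22. Identical rankings give a Spearman correlation of 1 and fully reversed rankings give −1.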