Skew-symmetric matrix completion for rank aggregation
DESCRIPTION
Slides from a talk at Purdue's Machine Learning Seminar on 2011-01-24.
TRANSCRIPT
Skew-symmetric matrix completion for rank aggregation (and other matrix computations). DAVID F. GLEICH, PURDUE UNIVERSITY COMPUTER SCIENCE DEPARTMENT. January 24th, 12pm
Purdue ML Seminar David Gleich, Purdue 1/40
Images copyright by their respective owners.
Matrix computations are the heart (and not the brains) of many methods of computing.
Matrix computations: Physics, Statistics, Engineering, Graphics, Databases, Machine learning, …
Matrix computations

A = [ A_{1,1}  A_{1,2}  ...  A_{1,n}
      A_{2,1}  A_{2,2}  ...   ...
       ...       ...    ...  A_{m-1,n}
      A_{m,1}   ...  A_{m,n-1} A_{m,n} ]

Ax = b (linear systems), min ||Ax - b|| (least squares), Ax = λx (eigenvalues)
NETWORK and MATRIX COMPUTATIONS
Why looking at networks of data as a matrix is a powerful and successful paradigm.
RAPr on Wikipedia
E[x(A)]: United States; C:Living people; France; United Kingdom; Germany; England; Canada; Japan; Poland; Australia
Std[x(A)]: United States; C:Living people; C:Main topic classif.; C:Contents; C:Ctgs. by country; United Kingdom; France; C:Fundamental; England; C:Ctgs. by topic
Gleich (Stanford) Random sensitivity Ph.D. Defense 23 / 41
A new matrix-based sensitivity analysis of Google's PageRank.
Presented at WAW2007 and WWW2010. Published in the J. Internet Mathematics.
Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing. Patent pending.
Improved web-spam detection!
Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation)
PageRank: (I - αP)x = (1 - α)v
SimRank, BlockRank, TrustRank, ObjectRank, HostRank, Random walk with restart, GeneRank, DiffusionRank, IsoRank, ItemRank, ProteinRank, SocialPageRank, FoodRank, FutureRank, TwitterRank
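As a concrete illustration of the PageRank system above, here is a minimal sketch that solves (I - αP)x = (1 - α)v directly; the small graph and the uniform teleportation vector v are made-up examples, not data from the talk.

```python
import numpy as np

# Solve the PageRank linear system (I - alpha * P) x = (1 - alpha) * v,
# where P is a column-stochastic transition matrix of a small 4-node graph.
n = 4
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [1/3, 0.0, 0.5, 0.5],
    [1/3, 0.5, 0.0, 0.5],
    [1/3, 0.0, 0.0, 0.0],
])
alpha = 0.85
v = np.full(n, 1.0 / n)          # uniform teleportation vector
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
print(x.sum())                   # sums to 1 when P is column-stochastic
```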
Network alignment
Matching and overlap. Squares produce overlap: a bonus for some x_i and x_j in a square.
Variables and data: x_i = edge indicator; w_i = weight of edge e_i; S_{ij} = squares in S; e_i ∈ L.

maximize_x  Σ_{i : e_i ∈ L} w_i x_i + Σ_{i,j ∈ S} x_i x_j   subject to x is a matching

equivalently

maximize_x  w^T x + (1/2) x^T S x   subject to Ax ≤ e, x_i ∈ {0, 1}
David F. Gleich (Stanford) Network alignment Southeast Ranking Workshop 11 / 29
Bayati, Gerritsen, Gleich, Saberi, and Wang, ICDM 2009; Bayati, Gleich, Saberi, and Wang, submitted.
David F. Gleich (Purdue) Network alignment INFORMS Seminar 17 / 40
NETWORK ALIGNMENT

maximize_x  α w^T x + (β/2) x^T S x   subject to Ax ≤ e, x_i ∈ {0, 1}
History: quadratic assignment, maximum common subgraph, pattern recognition, ontology matching, bioinformatics.
Sparse problems: a sparse L is often ignored (a few exceptions). Our paper tackles that case explicitly. We do large problems, too.
Conte et al., Thirty years of graph matching, 2004; Melnik et al., Similarity flooding, 2004; Blondel et al., SIREV 2004; Singh et al., RECOMB 2007; Klau, BMC Bioinformatics 10:S59, 2009.
Overlapping clusters for distributed computation. Andersen, Gleich, and Mirrokni, WSDM2012
[Figure: relative work vs. volume ratio for the Metis partitioner, comparing swapping probability and PageRank communication on the usroads and web-Google graphs.]
How much more of the graph we need to store.
Tweet along @dgleich
MAIN RESULTS – SLIDE THREE
David F. Gleich (Sandia) ICME la/opt seminar 4 of 50
TOP-K ALGORITHM FOR KATZ
Approximate the solution, where the right-hand side is sparse. Keep the iterates sparse too. Ideally, don't "touch" all of the graph.
David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 34 of 47
Gleich et al., J. Internet Mathematics, to appear.
Local methods for massive network analysis
Can solve these problems in milliseconds even with 100M edges!
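To make the Katz setting concrete, here is a hedged sketch that approximates a Katz score vector by a truncated Neumann series; it illustrates the quantity being computed, not the talk's top-k algorithm, and the small graph, seed node, and α are illustrative choices.

```python
import numpy as np

# Katz scores relative to a seed node solve (I - alpha*A) k = e_seed.
# Approximate by the truncated series k ≈ sum_t (alpha*A)^t e_seed,
# which converges when alpha < 1/||A||_2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
alpha = 0.1
seed = np.zeros(4); seed[0] = 1.0
k = np.zeros(4); term = seed.copy()
for _ in range(50):               # residual shrinks geometrically
    k += term
    term = alpha * (A @ term)
exact = np.linalg.solve(np.eye(4) - alpha * A, seed)
print(np.allclose(k, exact, atol=1e-8))
```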
Rank aggregation
DAVID F. GLEICH (PURDUE) & LEK-HENG LIM (UNIV. CHICAGO)
Which is a better list of good DVDs?

Nuclear norm based rank aggregation (not matrix completion on the Netflix rating matrix): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Star Wars V: Empire Strikes Back; Raiders of the Lost Ark; Star Wars IV: A New Hope; Shawshank Redemption; Star Wars VI: Return of the Jedi; Lord of the Rings 3: Bonus DVD; The Godfather

Standard rank aggregation (the mean rating): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Lost: Season 1; Battlestar Galactica: Season 1; Fullmetal Alchemist; Trailer Park Boys: Season 4; Trailer Park Boys: Season 3; Tenchi Muyo!; Shawshank Redemption
Rank Aggregation
Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.
Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.
Ranking is really hard
All rank aggregations involve some measure of compromise. A good ranking is the "average" ranking under a permutation distance.
Ken Arrow; John Kemeny; Dwork, Kumar, Naor, and Sivakumar
Kemeny's ranking is NP-hard to compute.
Given a hard problem, what do you do? Numerically relax! It'll probably be easier.
[Image: Embody chair, John Cantrell (flickr)]
Suppose we had scores
Let s_i be the score of the ith movie/song/paper/team to rank. Suppose we can compare the ith to the jth: Y_{ij} = s_i - s_j. Then Y is skew-symmetric, rank 2. Also works for ratios with an extra log.
Kemeny and Snell, Mathematical Models in the Social Sciences (1978)
Numerical ranking is intimately intertwined with skew-symmetric matrices
David F. Gleich (Purdue) KDD 2011 6/20
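The rank-2, skew-symmetric structure above is easy to check numerically; this minimal sketch uses a made-up score vector.

```python
import numpy as np

# If s_i scores item i, the pairwise matrix Y_ij = s_i - s_j is
# Y = s e^T - e s^T: skew-symmetric and (for non-constant s) rank 2.
s = np.array([3.0, 1.0, 4.0, 1.5, 9.0])   # illustrative scores
e = np.ones_like(s)
Y = np.outer(s, e) - np.outer(e, s)
print(np.allclose(Y, -Y.T))               # skew-symmetric
print(np.linalg.matrix_rank(Y))           # rank 2
```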
Using ratings as comparisons
Arithmetic mean. Log-odds.
Ratings induce various skew-symmetric matrices. From David, 1988, The Method of Paired Comparisons.
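One of these constructions, the arithmetic mean, can be sketched as follows: average the rating differences over users who rated both items. The tiny ratings table (rows are users, 0 means unrated) is made up for illustration.

```python
import numpy as np

# Arithmetic-mean pairwise comparison: Y_ij averages R[u, j] - R[u, i]
# over users u who rated both items i and j. The result is skew-symmetric.
R = np.array([
    [5, 3, 0],
    [4, 0, 2],
    [5, 4, 1],
], dtype=float)
n_items = R.shape[1]
Y = np.zeros((n_items, n_items))
for i in range(n_items):
    for j in range(n_items):
        both = (R[:, i] > 0) & (R[:, j] > 0)   # users who rated both items
        if both.any():
            Y[i, j] = np.mean(R[both, j] - R[both, i])
print(np.allclose(Y, -Y.T))
```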
Extracting the scores
Given Y with all entries, then s = (1/n) Y e is the Borda count, the least-squares solution to Y ≈ s e^T - e s^T.
How many comparisons do we have? Most. Do we trust them all? Not really. Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled.
[Figure: number of comparisons per movie pair, log-log scale.]
David F. Gleich (Purdue) KDD 2011 8/20
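The Borda-count extraction above is one line of linear algebra; this sketch checks, on a made-up score vector, that (1/n) Y e recovers the scores up to a constant shift.

```python
import numpy as np

# With a fully observed Y = s e^T - e s^T, the least-squares score
# estimate is sb = (1/n) Y e, which equals s - mean(s).
s = np.array([0.9, 0.1, 0.5, 0.7])        # illustrative scores
n = len(s); e = np.ones(n)
Y = np.outer(s, e) - np.outer(e, s)
sb = Y @ e / n
print(np.allclose(sb, s - s.mean()))
```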
Only partial info? COMPLETE IT!
Let Y_{ij} be known for (i, j) ∈ Ω. We trust these scores.
Goal: find the simplest skew-symmetric matrix that matches the data, either exactly (noiseless) or approximately (noisy). Both of these are NP-hard too.
David F. Gleich (Purdue) KDD 2011 9/20
Solution: GO NUCLEAR!
[Image: a French nuclear test in 1970, from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/]
The nuclear norm
The analog of the 1-norm or ℓ1-norm for matrices.
For vectors: minimizing the number of nonzeros is NP-hard, while minimizing the ℓ1-norm is convex and gives the same answer "under appropriate circumstances."
For matrices: let X = U Σ V^T be the SVD. The nuclear norm ||X||_* = Σ_i σ_i is the best convex under-estimator of rank on the unit ball.
David F. Gleich (Purdue) KDD 2011 11/20
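In code, the nuclear norm is just the sum of singular values, mirroring how the 1-norm sums absolute values; the matrix here is an arbitrary example.

```python
import numpy as np

# Nuclear norm ||X||_* = sum of singular values: the convex surrogate
# for rank, as ||x||_1 is for the number of nonzeros.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
nuclear = np.linalg.svd(X, compute_uv=False).sum()
print(nuclear)
```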
Only partial info? Complete it!
Let Y_{ij} be known for (i, j) ∈ Ω. We trust these scores.
Goal: find the simplest skew-symmetric matrix that matches the data. The rank-minimization formulation is NP-hard; replacing rank with the nuclear norm gives the convex heuristic.
David F. Gleich (Purdue) KDD 2011 12/20
Solving the nuclear norm problem
Use a LASSO formulation: minimize the residual on the observed entries subject to a rank constraint. Jain et al. propose SVP for this problem, without the skew-symmetric constraint.
1. Initialize
2. REPEAT
3. X = rank-k SVD of a gradient step on the observed residual
4. …
5. …
6. UNTIL converged
Jain et al., NIPS 2010. David F. Gleich (Purdue) KDD 2011 13/20
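A hedged sketch of the SVP (singular value projection) iteration of Jain et al.: a gradient step on the sampled squared residual, followed by projection to rank k via a truncated SVD. The problem sizes, sampling rate, and step size eta here are illustrative choices, not the paper's settings.

```python
import numpy as np

# SVP sketch on a skew-symmetric, rank-2 target Y = s e^T - e s^T.
rng = np.random.default_rng(0)
n, k, eta = 30, 2, 1.0
s = rng.random(n)
e = np.ones(n)
Ytrue = np.outer(s, e) - np.outer(e, s)
mask = rng.random((n, n)) < 0.5
mask = np.triu(mask, 1)
mask = mask | mask.T                 # symmetric sample positions, zero diagonal
X = np.zeros((n, n))
for _ in range(100):
    G = np.where(mask, X - Ytrue, 0.0)     # gradient of (1/2)||P_Omega(X - Y)||^2
    U, sv, Vt = np.linalg.svd(X - eta * G)
    X = U[:, :k] @ np.diag(sv[:k]) @ Vt[:k, :]   # rank-k projection
print(np.allclose(X, -X.T, atol=1e-6))     # skew-symmetry preserved
```

Because the sample positions are symmetric and the iterate starts skew-symmetric, every gradient step stays skew-symmetric, and the rank-2 SVD projection keeps it so, matching the "for free" claim on the next slide.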
Skew-symmetric SVDs
Let A be an n × n skew-symmetric matrix with eigenvalues ±ıλ_1, …; then the singular values of A come in pairs λ_k, with the SVD given by the U and V constructed in the proof.
Proof: use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block.
This means that SVP will give us the skew-symmetric constraint "for free."
David F. Gleich (Purdue) KDD 2011 14/20
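The paired-singular-value structure above can be checked numerically; the random matrix here is only for illustration.

```python
import numpy as np

# Singular values of a skew-symmetric matrix come in equal pairs,
# reflecting its 2x2-block (Murnaghan-Wintner) structure.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B - B.T                                # skew-symmetric
sv = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sv[0::2], sv[1::2]))     # sigma_1 = sigma_2, sigma_3 = sigma_4, ...
```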
Matrix completion
A fundamental question in matrix completion is: when do these problems have the same solution?
David F. Gleich (Purdue) KDD 2011 12/20
Exact recovery results
David Gross showed how to recover Hermitian matrices, i.e., the conditions under which we get the exact solution.
Note that ıY is Hermitian. Thus our new result!
Gross, arXiv 2010. David F. Gleich (Purdue) KDD 2011 15/20
indices. Instead we view the following theorem as providing intuition for the noisy problem.

Consider the operator basis for Hermitian matrices: H = S ∪ K ∪ D where

S = { (1/√2)(e_i e_j^T + e_j e_i^T) : 1 ≤ i < j ≤ n };
K = { (ı/√2)(e_i e_j^T - e_j e_i^T) : 1 ≤ i < j ≤ n };
D = { e_i e_i^T : 1 ≤ i ≤ n }.

Theorem 5. Let s be centered, i.e., s^T e = 0. Let Y = s e^T - e s^T where θ = max_i s_i^2 / (s^T s) and ρ = ((max_i s_i) - (min_i s_i)) / ||s||. Also, let Ω ⊂ H be a random set of elements with size |Ω| ≥ O(2nν(1 + β)(log n)^2) where ν = max((nθ + 1)/4, nρ^2). Then the solution of

minimize ||X||_*  subject to  trace(X^* W_i) = trace((ıY)^* W_i), W_i ∈ Ω

is equal to ıY with probability at least 1 - n^{-β}.

The proof of this theorem follows directly by Theorem 4 if ıY has coherence ν with respect to the basis H. We now show this result.

Definition 6 (Coherence, Gross [2010]). Let A be n × n, rank-r, and Hermitian. Let UU^* be an orthogonal projector onto range(A). Then A has coherence ν with respect to an operator basis {W_i}_{i=1}^{n^2} if both

max_i trace(W_i UU^* W_i) ≤ 2νr/n, and
max_i trace(sign(A) W_i)^2 ≤ νr/n^2.

For A = ıY with s^T e = 0:

UU^* = (s s^T)/(s^T s) + (1/n) e e^T  and  sign(A) = (1/(||s|| √n)) A.

Let S_p ∈ S, K_p ∈ K, and D_p ∈ D. Note that because sign(A) is Hermitian with no real-valued entries, both quantities trace(sign(A) D_p)^2 and trace(sign(A) S_p)^2 are 0. Also, because UU^* is symmetric, trace(K_i UU^* K_p) = 0. The remaining basis elements satisfy:

trace(S_p UU^* S_p) = 1/n + (s_i^2 + s_j^2)/(2 s^T s) ≤ (1/n) + θ
trace(D_p UU^* D_p) = 1/n + s_i^2/(s^T s) ≤ (1/n) + θ
trace(sign(A) K_p)^2 = 2(s_i - s_j)^2/(n s^T s) ≤ (2/n) ρ^2.

Thus, A has coherence ν with ν from Theorem 5 and with respect to H, and we have our recovery result. This theorem provides little practical benefit, though, unless both θ and ρ are O(1/n), which occurs when s is nearly uniform.
6. RESULTS
We implemented and tested this procedure in two synthetic scenarios, along with Netflix, MovieLens, and Jester joke-set ratings data. In the interest of space, we only present a subset of these results for Netflix.
[Figure 2 residue: two panels plotting fraction of trials recovered (left) and noise level (right) against the number of samples, with reference lines at 5n, 2n log(n), and 6n log(n).]
Figure 2: An experimental study of the recoverability of a ranking vector. These show that we need about 6n log n entries of Y to get good recovery in both the noiseless (left) and noisy (right) case. See §6.1 for more information.
6.1 Recovery
The first experiment is an empirical study of the recoverability of the score vector in the noiseless and noisy case. In the noiseless case, Figure 2 (left), we generate a score vector with uniformly distributed random scores between 0 and 1. These are used to construct a pairwise comparison matrix Y = s e^T - e s^T. We then sample elements of this matrix uniformly at random and compute the difference between the true score vector s and the output of steps 4 and 5 of Algorithm 2. If the relative 2-norm difference between these vectors is less than 10^{-3}, we declare the trial recovered. For n = 100, the figure shows that, once the number of samples is about 6n log n, the correct s is recovered in nearly all the 50 trials.

Next, for the noisy case, we generate a uniformly spaced score vector between 0 and 1. Then Y = s e^T - e s^T + εE, where E is a matrix of random normals. Again, we sample elements of this matrix randomly, and declare a trial successful if the order of the recovered score vector is identical to the true order. In Figure 2 (right), we indicate the fraction of successful trials as a gray value between black (all failure) and white (all successful). Again, the algorithm is successful for a moderate noise level, i.e., the value of ε, when the number of samples is larger than 6n log n.
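A stripped-down version of the noiseless experiment can be written in a few lines. This sketch recovers s from sampled differences by least squares rather than the paper's SVP pipeline, and the sizes, sampling rate, and seed are illustrative choices.

```python
import numpy as np

# Sample entries of Y = s e^T - e s^T and recover s from the observed
# differences x_i - x_j = Y_ij, pinning the free additive constant.
rng = np.random.default_rng(2)
n = 50
s = rng.random(n)
obs = [(i, j) for i in range(n) for j in range(n)
       if i != j and rng.random() < 0.2]   # sampled comparisons
A = np.zeros((len(obs) + 1, n))
b = np.zeros(len(obs) + 1)
for r, (i, j) in enumerate(obs):
    A[r, i], A[r, j] = 1.0, -1.0
    b[r] = s[i] - s[j]
A[-1, :] = 1.0                             # constraint: sum(x) = sum(s)
b[-1] = s.sum()
x = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, s, atol=1e-8))
```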
6.2 Synthetic
Inspired by Ho and Quinn [2008], we investigate recovering item scores in an item-response scenario. Let a_i be the center of user i's rating scale, and b_i be the rating sensitivity of user i. Let t_j be the intrinsic score of item j. Then we generate ratings from users on items as:

R_{i,j} = L[a_i + b_i t_j + E_{i,j}]

where L[α] is the discrete levels function:

L[α] = max(min(round(α), 5), 1)

and E_{i,j} is a noise parameter. In our experiment, we draw a_i ~ N(3, 1), b_i ~ N(0.5, 0.5), t_j ~ N(0.1, 1), and E_{i,j} ~ εN(0, 1). Here, N(µ, σ) is a normal distribution with mean µ and standard deviation σ, and ε is a noise parameter. As input to our algorithm, we sample ratings uniformly at random by specifying a desired number of average ratings per user. We then look at the Kendall τ correlation coefficient between the true scores t_j and the output of our algorithm using the arithmetic mean pairwise aggregation. A τ value of 1 indicates a perfect ordering correlation between the two sets of scores.
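The generator above translates directly into code; the user/item counts and noise level here are illustrative choices.

```python
import numpy as np

# Item-response ratings generator: R_ij = L[a_i + b_i * t_j + E_ij],
# with L clamping rounded values to the 1-5 rating scale.
rng = np.random.default_rng(3)
n_users, n_items, eps = 1000, 100, 0.2
a = rng.normal(3.0, 1.0, n_users)          # user rating centers
b = rng.normal(0.5, 0.5, n_users)          # user sensitivities
t = rng.normal(0.1, 1.0, n_items)          # intrinsic item scores
E = eps * rng.standard_normal((n_users, n_items))
R = np.clip(np.round(a[:, None] + b[:, None] * t[None, :] + E), 1, 5)
print(R.min() >= 1 and R.max() <= 5)       # ratings land on the 1-5 scale
```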
Gross, arXiv, 2010
Recovery Discussion and Experiments
Confession: in the noiseless case, just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
David F. Gleich (Purdue) KDD 2011 16/20
The Ranking Algorithm
0. INPUT ratings data R and c (for trust on comparisons)
1. Compute Y from R
2. Discard entries of Y with fewer than c comparisons
3. Set Ω, b to the indices and values of what's left
4. X = SVP(Ω, b)
5. OUTPUT scores s from X
David F. Gleich (Purdue) KDD 2011 17/20
Synthetic evaluation
Item Response Model
The synthetic results came from a model inspired by Ho and Quinn [2008].
a_i: center rating for user i; b_i: sensitivity of user i; t_j: value of item j; ε: error level in ratings.
Sample ratings uniformly at random with a specified number of expected ratings per user.
David F. Gleich (Purdue) KDD 2011 21/20
Evaluation
[Figure 3 residue: four panels of median Kendall's tau vs. error, with legend entries 20, 10, 5, 2, and 1.5 average ratings per user.]
Figure 3: The performance of our algorithm (left) and the mean rating (right) at recovering the ordering given by item scores in an item-response theory model with 100 items and 1000 users. The various thick lines correspond to the average number of ratings each user performed (see the in-place legend). See §6.2 for more information.
Figure 3 shows the results for 1000 users and 100 items with 1.1, 1.5, 2, 5, and 10 ratings per user on average. We also vary the parameter ε between 0 and 1. Each thick line with markers plots the median value of τ in 50 trials. The thin adjacent lines show the 25th and 75th percentiles of the 50 trials. At all error levels, our algorithm outperforms the mean rating. Also, when there are few ratings per user and moderate noise, our approach is considerably more correlated with the true score. This evidence supports the anecdotal results from Netflix in Table 2.
6.3 Netflix
See Table 2 for the top movies produced by our technique in a few circumstances using all users. The arithmetic mean results in that table use only elements of Y with at least 30 pairwise comparisons (it is the am all 30 model in the code below). And see Figure 4 for an analysis of the residuals generated by the fit for different constructions of the matrix Y. Each residual evaluation of Netflix is described by a code. For example, sb all 0 is a strict-binary pairwise matrix Y from all Netflix users and c = 0 in Algorithm 2 (i.e., accept all pairwise comparisons). Alternatively, am 6 30 denotes an arithmetic-mean pairwise matrix Y from Netflix users with at least 6 ratings, where each entry in Y had 30 users supporting it. The other abbreviations are gm: geometric mean; bc: binary comparison; and lo: log-odds ratio.

These residuals show that we get better rating fits by only using frequently compared movies, but that there are only minor changes in the fits when excluding users that rate few movies. The difference between the score-based residuals ||Ω(s e^T - e s^T) - b|| (red points) and the SVP residuals ||Ω(U S V^T) - b|| (blue points) shows that excluding comparisons leads to "overfitting" in the SVP residual. This suggests that increasing the parameter c should be done with care and good checks on the residual norms.

To check that a rank-2 approximation is reasonable, we increased the target rank in the SVP solver to 4 to investigate. For the arithmetic mean (6, 30) model, the relative residual at rank 2 is 0.2838 and at rank 4 is 0.2514. Meanwhile, the nuclear norm increases from around 14000 to around 17000. These results show that the change in the fit is minimal and our rank-2 approximation and its scores should represent a reasonable ranking.
[Figure 4 residue: relative residuals between roughly 0.2 and 0.7 for the models am all 30, am 6 30, gm 6 30, gm all 30, am all 100, am 6 100, sb 6 30, sb all 30, gm all 100, gm 6 100, bc 6 30, bc all 30, bc 6 100, bc all 100, lo all 30, lo 6 30, lo 6 100, lo all 100, sb all 100, sb 6 100, am 6 0, am all 0, bc 6 0, bc all 0, lo 6 0, lo all 0, gm 6 0, gm all 0, sb 6 0, sb all 0.]
Figure 4: The labels on each residual show how wegenerated the pairwise scores and truncated the Net-flix data. Red points are the residuals from thescores, and blue points are the final residuals fromthe SVP algorithm. Please see the discussion in §6.3.
7. CONCLUSION
Existing principled techniques such as computing a Kemeny-optimal ranking or finding a minimum feedback arc set are NP-hard. These approaches are inappropriate in large-scale rank aggregation settings. Our proposal is (i) measure pairwise scores Y and (ii) solve a matrix completion problem to determine the quality of items. This idea is both principled and functional with significant missing data. The results of our rank aggregation on the Netflix problem (Table 2) reveal popular and high-quality movies. These are interesting results and could easily have a home on a "best movies in Netflix" web page. Such a page exists, but is regarded as having strange results. Computing a rank aggregation with this technique is not NP-hard. It only requires solving a convex optimization problem with a unique global minimum. Although we did not record computation times, the most time-consuming piece of work is computing the pairwise comparison matrix Y. In a practical setting, this could easily be done with a MapReduce computation.

To compute these solutions, we adapted the SVP solver for matrix completion [Jain et al., 2010]. This process involved (i) studying the singular value decomposition of a skew-symmetric matrix (Lemmas 1 and 2) and (ii) showing that the SVP solver preserves a skew-symmetric approximation through its computation (Theorem 3). Because the SVP solver computes with an explicitly chosen rank, these techniques work well for large-scale rank aggregation problems.

We believe the combination of pairwise aggregation and matrix completion is a fruitful direction for future research. We plan to explore optimizing the SVP algorithm to exploit the skew-symmetric constraint, extending our recovery result to the noisy case, and investigating additional data.
Acknowledgements. The authors would like to thank Amy Langville,
Carl Meyer, and Yuan Yao for helpful discussions.
Conclusions and Future Work
Our motto: "aggregate, then complete."
Rank aggregation with the nuclear norm is principled and easy to compute. The results are much better than simple approaches.
1. Additional comparisons
2. Noisy recovery! More realistic sampling.
3. Skew-symmetric Lanczos-based SVD?
Current research
Data-driven surrogate functions. Beyond spectral methods for UQ.
Graph spectra
[Figure residue: small example graphs with spectral quantities 1.5, 0.5; 1.33 (two!); 1.5; 1.5 (two); 1.833; 0.565741 to 1.767592; 0.725708 to 1.607625.]
Spectral spikes
Google "nuclear ranking gleich"