
Page 1: SVD classification. Does it lend itself to pTree processing? What rendition of the many that are called SVD lends itself best? A new rendition? Funk SVD

Funk SVD uses gradient optimization similar to what we have been doing in our variance maximization. The difference is, of course, that in our gradient optimization so far we applied it to a single entity table or training table (such as IRIS, CEMENT, WINE, etc.) to find the optimal unit vector on which to project to get good gaps, whereas a pTree SVD will apply gradient optimization to the error as a function of the aspects of user preferences and movie characteristics.

The error here is the SVD predicted-rating error (the prediction being the dot product of the user feature vector and the movie feature vector). We follow gradients to minimize the error, not to maximize a variance.

I predict a quantum leap forward in recommenders, text mining, market basket research and many other application areas.

We need to learn SVD training code. Funk makes it sound simple, but his code has many speedup workarounds that go unexplained.

Foray-1: Translate Funk's code to HPVD (AKA pTree code) and look for speedups (undo his workarounds, find our own...).

Foray-2: Go back to basic SVD as matrix factorization for primary eigenvalue discovery, and translate to pTrees with speedups (see the power-iteration sketch after this foray list).

In what follows we find: basic SVD, the boiled-down Funk SVD, Funk's code, and a little of the WP (Wettstein-Perrizo) pTree Netflix code.

Foray-3: Modify mpp-user.C to train a [Modified] Funk SVD and then (the simple part) make predictions using it (I don't think we need movie-vote.C or user-vote.C or prune.C, so that reduces the understanding to just mpp-mpred.C and mpp-user.C).

Foray-4: Code it all for the Netflix dataset and find all kinds of [speed] improvements (so we can pick the number of aspects to be whatever we want - like 10,000?). I guarantee we will find killer ideas that will change the Machine Teasing seas!!!
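For Foray-2's "basic SVD as matrix factorization," here is a minimal dense sketch of primary singular-vector discovery by power iteration (toy matrix and invented names, plain C++; the pTree version would replace the matrix-vector products with AND/COUNT operations, and deflation would peel off subsequent singular vectors):

  #include <cmath>
  #include <cstdio>
  #include <vector>

  int main() {
      const int nUsers = 4, nMovies = 3;
      double A[nUsers][nMovies] = {{5,3,0},{4,0,0},{1,1,5},{0,0,4}};  // toy ratings
      std::vector<double> v(nMovies, 1.0);                 // start vector
      for (int it = 0; it < 100; ++it) {                   // power iteration on A^T A
          std::vector<double> Av(nUsers, 0.0), w(nMovies, 0.0);
          for (int u = 0; u < nUsers; ++u)                 // Av = A v
              for (int k = 0; k < nMovies; ++k) Av[u] += A[u][k] * v[k];
          for (int j = 0; j < nMovies; ++j)                // w = A^T (A v)
              for (int u = 0; u < nUsers; ++u) w[j] += A[u][j] * Av[u];
          double norm = 0;
          for (double x : w) norm += x * x;
          norm = std::sqrt(norm);
          for (int j = 0; j < nMovies; ++j) v[j] = w[j] / norm;  // v converges to v1
      }
      double s = 0;                                        // sigma1 = ||A v1||
      for (int u = 0; u < nUsers; ++u) {
          double Avu = 0;
          for (int k = 0; k < nMovies; ++k) Avu += A[u][k] * v[k];
          s += Avu * Avu;
      }
      std::printf("sigma1=%.4f  v1=(%.4f, %.4f, %.4f)\n", std::sqrt(s), v[0], v[1], v[2]);
      return 0;
  }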

Questions to ponder (and answer):

a. Funk identifies one feature at a time (at each step it is the feature, among those remaining, that maximally reduces the error). Why can't we keep peeling off "keeper aspects" until the error stops reducing much (thereby not having to specify the number of aspects up front, but ending up with the "perfect" number of aspects in the end)? See the sketch following these questions.

b. What are the [speed] bottlenecks for SVD as described by Funk and in the literature so far? Solve each using pTrees.

c. Given the nature of pTree processing and the advantages it gives for speed, can we do things differently (inspired by SVD) to get a breakthrough? (Maybe we gradient-optimize something other than the error? Maybe we skip steps we don't need but they do? Maybe we add steps they can't?)

d. Once we have our set of optimal aspects for our SVD prediction, maybe we start with, as parameters, the set of reductions in residual error that each aspect gained us, and hill-climb those as vote weights to further decrease the error?
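On question a, a minimal sketch of the peel-until-no-gain idea (toy data and invented names; each new feature is trained on the residuals of the accepted ones, and we stop when the RMSE improvement falls below a threshold):

  #include <cmath>
  #include <cstdio>
  #include <vector>

  struct Rating { int user, movie; double r; };

  int main() {
      std::vector<Rating> train = {{0,0,5},{0,1,3},{1,0,4},{2,1,1},{2,2,5},{3,2,4}};
      const int nU = 4, nM = 3, maxF = 10;
      const double lrate = 0.01, eps = 1e-4;            // stop when gain < eps
      std::vector<std::vector<double> > uF, mF;         // accepted (kept) features
      std::vector<double> resid(train.size());
      for (size_t i = 0; i < train.size(); ++i) resid[i] = train[i].r;
      double prevRmse = 1e9;
      for (int f = 0; f < maxF; ++f) {
          std::vector<double> u(nU, 0.1), m(nM, 0.1);   // candidate feature pair
          for (int epoch = 0; epoch < 200; ++epoch)     // gradient-train on residuals
              for (size_t i = 0; i < train.size(); ++i) {
                  double err = resid[i] - u[train[i].user] * m[train[i].movie];
                  double uv  = u[train[i].user];
                  u[train[i].user]  += lrate * err * m[train[i].movie];
                  m[train[i].movie] += lrate * err * uv;
              }
          double sse = 0;
          for (size_t i = 0; i < train.size(); ++i) {
              double err = resid[i] - u[train[i].user] * m[train[i].movie];
              sse += err * err;
          }
          double rmse = std::sqrt(sse / train.size());
          std::printf("feature %d: rmse %.5f\n", f, rmse);
          if (prevRmse - rmse < eps) break;             // error stopped reducing: done
          prevRmse = rmse;
          for (size_t i = 0; i < train.size(); ++i)     // cache residuals, keep feature
              resid[i] -= u[train[i].user] * m[train[i].movie];
          uF.push_back(u);  mF.push_back(m);
      }
      std::printf("kept %zu features\n", uF.size());
      return 0;
  }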

Page 2

Simon Funk: Netflix provided a database of 100M ratings (1 to 5) of 17K movies by 500K users, each as a triplet of numbers: (User, Movie, Rating). The challenge: for (User, Movie, ?) not in the database, predict how the given User would rate the given Movie.

Think of the data as a big sparsely filled matrix, with userIDs across the top and movieIDs down the side (or vice versa then transpose everything), and each cell contains an observed rating (1-5) for that movie (row) by that user (column), or is blank meaning you don't know.

This matrix would have 8.5B entries, but you are only given values for 1/85th of those 8.5B cells (or 100M of them). The rest are all blank. Netflix posed a "quiz": a bunch of question marks plopped into previously blank slots, and your job is to fill in best-guess ratings in their place. Squared error (se) measures accuracy: you guess 1.5, the actual is 2, you get docked $(2 - 1.5)^2 = 0.25$. They use root mean squared error (rmse), but if we minimize mse, we minimize rmse. There is a date for ratings and question marks (so a cell can potentially have more than one rating in it).

Any movie can be described in terms of some features (or aspects) such as quality, action, comedy, stars (e.g., Pitt), producer, etc. A user's preferences can be described in terms of how they rate the same features (quality/action/comedy/star/producer/etc.). Then ratings ought to be explainable by a lot less than 8.5 billion numbers (e.g., a single number specifying how much action a particular movie has may help explain why a few million action-buffs like that movie).

SVD: Assume 40 features. A movie, m, is described by mF[40] = how much that movie exemplifies each aspect. A user, u, is described by uF[40] = how much he likes each aspect. Then

$P_{u,m} = uF \circ mF = \sum_{k=1}^{40} uF_k\, mF_k, \qquad err_{u,m} = P_{u,m} - r_{u,m}$

$u_a \mathrel{+}= lrate\,(\epsilon_{u,i}\, i_a^T - K\, u_a)$, where $\epsilon_{u,i} = p_{u,i} - r_{u,i}$, $r_{u,i}$ = actual rating, and $K$ is the regularization constant used below.

SVD is a trick which finds $U^T$, $M$ which minimize mse(k) (one k at a time). So the rank-40 SVD of the 8.5B training matrix is the best (least-error) approximation we can get within the limits of our user-movie-rating model. I.e., the SVD has found the "best" feature generalizations.

To get the SVD matrices we take the gradient of mse(k) and follow it. This has a bonus: we can ignore the unknown error on the 8.4B empty slots.

Take the gradient of mse(k) (over just the given values, not the empties), one k at a time:

  userValue[user]   += lrate*err*movieValue[movie];
  movieValue[movie] += lrate*err*userValue[user];

More correctly (cache the old user value so the second update uses the pre-update value):

  uv = userValue[user];
  userValue[user]   += err * movieValue[movie];
  movieValue[movie] += err * uv;

This finds the most prominent feature remaining (the one that most reduces the error). When it's good, shift it onto the done features and start a new one (cache the residuals of the 100M ratings. "What does that mean for us???").

This gradient descent has no local minima, which means it doesn't really matter how it's initialized.

With horizontal data, the code is evaluated for each rating. So, to train for one sample:

  real *userValue  = userFeature[featureBeingTrained];
  real *movieValue = movieFeature[featureBeingTrained];
  real lrate = 0.001;
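To make that fragment concrete, here is one plausible reconstruction (ours, not Funk's verbatim code) of the surrounding training step; the array sizes match the Netflix dimensions, and the 0.1 initialization and function names are assumptions:

  #include <vector>
  typedef double real;

  const int NUM_FEATURES = 40, NUM_USERS = 500000, NUM_MOVIES = 17770;
  std::vector<std::vector<real> > userFeature(NUM_FEATURES, std::vector<real>(NUM_USERS, 0.1));
  std::vector<std::vector<real> > movieFeature(NUM_FEATURES, std::vector<real>(NUM_MOVIES, 0.1));

  real predictRating(int movie, int user) {      // dot product over the features
      real sum = 0;
      for (int f = 0; f < NUM_FEATURES; ++f)
          sum += userFeature[f][user] * movieFeature[f][movie];
      return sum;
  }

  void trainOneSample(int user, int movie, real rating, int featureBeingTrained) {
      std::vector<real> &userValue  = userFeature[featureBeingTrained];
      std::vector<real> &movieValue = movieFeature[featureBeingTrained];
      const real lrate = 0.001;
      real err = lrate * (rating - predictRating(movie, user));
      real uv  = userValue[user];                // the "more correctly" cache above
      userValue[user]   += err * movieValue[movie];
      movieValue[movie] += err * uv;
  }

One epoch then just calls trainOneSample for every (user, movie, rating) triple in the training set; per the epoch counts quoted later, a feature trains for on the order of a hundred epochs before we move to the next.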

The matrix picture: $U^T$ is the 500K × 40 matrix of user feature vectors (rows $u_1 \ldots u_{500K}$, columns $a_1 \ldots a_{40}$; row $u$ is $uF$); $M$ is the 40 × 17K matrix of movie feature vectors (rows $a_1 \ldots a_{40}$, columns $m_1 \ldots m_{17K}$; column $m$ is $mF$); and $P = U^T \circ M$ is the 500K × 17K prediction matrix with entries $P_{u,m}$.

$\mathrm{mse} = \frac{1}{8.5B} \sum_{m=1}^{17K} \sum_{u=1}^{500K} \Big( \sum_{k=1}^{40} uF_k\, mF_k - r_{u,m} \Big)^2$

$\partial \mathrm{mse} / \partial uF_h = \frac{2}{8.5B} \sum_{m,u} err_{u,m} \Big[ \partial \Big( \sum_{k=1}^{40} uF_k\, mF_k - r_{u,m} \Big) / \partial uF_h \Big] = \frac{2}{8.5B} \sum_{m,u} err_{u,m}\, mF_h$

$\partial \mathrm{mse} / \partial mF_h = \frac{2}{8.5B} \sum_{m,u} err_{u,m}\, uF_h$

So we increment each $uF_h \mathrel{+}= 2\, err\, mF_h$ and each $mF_h \mathrel{+}= 2\, err\, uF_h$ (stepping so as to descend the gradient). This is a big move and may overshoot the minimum, so the 2 is replaced by a smaller learning rate, lrate (e.g., Funk takes lrate = 0.001).

Page 3

Moving on: 20M free parameters is a lot for a 100M training set. It seems neat to just ignore all the blanks, but we have expectations about them. As-is, this modified SVD algorithm tends to make a mess of sparsely observed movies or users. Say you have a user who has rated only one movie, American Beauty = 2 while its average is 4.5, and further their offset is only -1; prior to SVD we'd expect them to rate it 3.5. So the error given to the SVD is -1.5 (the true rating is 1.5 less than we expect).

m(Action) is training up to measure the amount of Action, say 0.01 for American Beauty (just slightly more than average). SVD optimizes predictions, which it can do by eventually setting our user's preference for Action to a huge -150. I.e., the algorithm naively looks at the only example it has of this user's preferences and, in the context of only the one feature it knows about so far (Action), determines that our user so hates action movies that even the tiniest bit of action in American Beauty makes it suck a lot more than it otherwise might. This is not a problem for users we have lots of observations for, because those random apparent correlations average out and the true trends dominate.

We need to account for priors. As with the average movie ratings, blend our sparse observations in with some sort of prior, but it's a little less clear how to do that with this incremental algorithm. But if you look at where the incremental algorithm theoretically converges, you get:

userValue[user] = [sum residual[user,movie]*movieValue[movie]] / [sum (movieValue[movie]^2)]

The numerator there will fall in a roughly zero-mean Gaussian distribution when charted over all users, which through various gyrations leads to:

  userValue[user] = [sum residual[user,movie]*movieValue[movie]] / [sum (movieValue[movie]^2 + K)]

And finally back to:

  userValue[user]   += lrate * (err * movieValue[movie] - K * userValue[user]);
  movieValue[movie] += lrate * (err * userValue[user] - K * movieValue[movie]);

This is equivalent to penalizing the magnitude of the features, which cuts overfitting and allows the use of more features.

If m only appears once, with r(m,u) = 1 say, is AvgRating(m) = 1? Probably not! View r(m,u) = 1 as a draw from a true probability distribution whose average you want. View that true average itself as a draw from a probability distribution of averages (the histogram of average movie ratings). Assume both distributions are Gaussian; then the best-guess mean should be a linear combination of the observed mean and the apriori mean, with a blending ratio equal to the ratio of variances.

If Ra and Va are the mean and variance (squared standard deviation) of all of the movies' average ratings (which defines your prior expectation for a new movie's average rating before you've observed any actual ratings), and Vb is the average variance of individual movie ratings (which tells you how indicative each new observation is of the true mean; e.g., if the average variance is low, then ratings tend to be near the movie's true mean, whereas if the avg variance is high, ratings tend to be more random and less indicative), then:

  BogusMean  = sum(ObservedRatings) / count(ObservedRatings)
  K          = Vb / Va
  BetterMean = [GlobalAverage*K + sum(ObservedRatings)] / [K + count(ObservedRatings)]

The point here is simply that any time you're averaging a small number of examples, the true average is most likely nearer the apriori average than the sparsely observed average. Note if the number of observed ratings for a particular movie is zero, the BetterMean (best guess) above defaults to the global average movie rating as one would expect.
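A worked instance of the blend (numbers invented for illustration): suppose GlobalAverage = 3.6 and K = Vb/Va = 25. A movie with a single observed rating of 1 gets BetterMean = (3.6·25 + 1)/(25 + 1) = 91/26 ≈ 3.5, i.e., one extreme rating barely moves the estimate off the prior; with 100 observed ratings of 1, BetterMean = (90 + 100)/(25 + 100) = 1.52, and the data dominates.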

Refinements: Prior to starting SVD, note AvgRating(movie) and AvgOffset(UserRating, MovieAvgRating) for every user. I.e.:

  static inline real predictRating_Baseline(int movie, int user) {
      return averageRating[movie] + averageOffset[user];
  }

So, that's the return value of predictRating before the first SVD feature even starts training. You'd think avg rating for a movie would just be... its average rating! Alas, Occam's razor was a little rusty that day.

Page 4

Moving on: Linear models are limiting. We've bastardized the whole matrix analogy so much that we aren't really restricted to linear models: we can add non-linear outputs such that instead of predicting with sum(userFeature[f][user] * movieFeature[f][movie]) for f from 1 to 40, we can use sum G(userFeature[f][user] * movieFeature[f][movie]) for f from 1 to 40.

Two choices for G proved useful. 1. Clip the prediction to [1,5] after each component is added; i.e., each feature is limited to only swaying the rating within the valid range, and any excess beyond that is lost rather than carried over. So, if the first feature suggests +10 on a scale of 1-5, and the second feature suggests -1, then instead of getting a 5 for the final clipped score, it gets a 4 because the score was clipped after each stage. The intuitive rationale here is that we tend to reserve the top of our scale for the perfect movie, and the bottom for one with no redeeming qualities whatsoever, and so there's a sort of measuring back from the edges that we do with each aspect independently. More pragmatically, since the target range has a known limit, clipping is guaranteed to improve our performance, and having trained a stage with clipping on, use it with clipping on. I did not really play with this extensively enough to determine there wasn't a better strategy.
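A minimal sketch of that per-stage clipping (the names and the baseline argument are invented):

  #include <algorithm>   // std::min, std::max

  double predictClipped(const double *uF, const double *mF, int nFeatures, double baseline) {
      double sum = baseline;                         // e.g., movie avg + user offset
      for (int f = 0; f < nFeatures; ++f) {
          sum += uF[f] * mF[f];
          sum = std::min(5.0, std::max(1.0, sum));   // clip after each component
      }
      return sum;
  }

With a baseline of 3, a first-feature contribution of +10 clips to 5, and a second of -1 then yields 4, matching the example above.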

A second choice for G is to introduce some functional non-linearity such as a sigmoid, i.e., G(x) = sigmoid(x). Even if G is fixed, this requires modifying the learning rule slightly to include the slope of G, but that's straightforward. The next question is how to adapt G to the data. I tried a couple of options, including an adaptive sigmoid, but the most general, and the one that worked the best, was to simply fit a piecewise linear approximation to the true output/output curve. That is, if you plot the true output of a given stage vs the average target output, the linear model assumes this is a nice 45-degree line. But in truth, for the first feature for instance, you end up with a kink around the origin such that the impact of negative values is greater than the impact of positive ones. That is, for two groups of users with opposite preferences, each side tends to penalize more strongly than the other side rewards for the same quality. Or put another way, below-average quality (subjective) hurts more than above-average quality helps. There is also a bit of a sigmoid to the natural data beyond just what is accounted for by the clipping. The linear model can't account for these, so it just finds a middle compromise; but even at this compromise, the inherent non-linearity shows through in an actual-output vs. average-target-output plot, and if G is then simply set to fit this, the model can further adapt with this new performance edge, which leads to potentially more beneficial non-linearity, and so on. This introduces new free parameters and encourages overfitting, especially for the later features, which tend to represent small groups. We found it beneficial to use this non-linearity only for the first twenty or so features and to disable it after that.

Moving on: Despite the regularization term in the final incremental law above, overfitting remains a problem. Plotting the progress over time, the probe rmse eventually turns upward and starts getting worse (even though the training error is still inching down). We found that simply choosing a fixed number of training epochs appropriate to the learning rate and regularization constant resulted in the best overall performance. I think for the numbers mentioned above it was about 120 epochs per feature, at which point the feature was considered done and we moved on to the next before it started overfitting. Note that now it does matter how you initialize the vectors: since we're stopping the path before it gets to the (common) end, where we started will affect where we are at that point. I wonder if a better regularization couldn't eliminate overfitting altogether, something like Dirichlet priors in an EM approach; but I tried that and a few others and none worked as well as the above.

(Plots, not reproduced in this transcript:) Here is the probe and training rmse for the first few features with and without the regularization term ("decay") enabled. Same thing, just the probe-set rmse, further along, where you can see the regularized version pulling ahead. This time showing probe rmse (vertical) against train rmse (horizontal); note how the regularized version has better probe performance relative to the training performance.

Anyway, that's about it. I've tried a few other ideas over the last couple of weeks, including a couple of ways of using the date information, and while many of them have worked well up front, none held their advantage long enough to actually improve the final result.

If you notice any obvious errors or have reasonably quick suggestions for better notation or whatnot to make this explanation more clear, let me know. And of course, I'd love to hear what y'all are doing and how well it's working, whether it's improvements to the above or something completely different. Whatever you're willing to share,


Pages 5-16: image-only slides; no transcribed text.

mpp-mpred.C reads PROBE, loops through the (Mi, ProbeSup(Mi)) pairs, and passes each to mpp-user.C. mpp-mpred.C can call separate instances of mpp-user.C for many users in parallel (governed by the number of slots).

mpp-user.C loops through ProbeSup(M), reads the config file, and prints prediction(M,U) to predictions. For user votes, mpp-user.C calls user-vote.C; for movie votes, it calls movie-vote.C.

user-vote.C prunes, then loops through the user voters V, calculating a V-vote; it combines the V-votes and returns a vote. movie-vote.C is similar.

Call graph:

  mpp-mpred.C
    └─ mpp-user.C         gets (Mi, ProbeSup(Mi) = {Ui1, …, Uik})
         ├─ user-vote.C   gets (Mi, Sup(Mi), Uik, Sup(Uik)); returns vote(Mi, Uik)
         ├─ movie-vote.C  gets (Mi, Sup(Mi), Uik, Sup(Uik)); returns VOTE(Mi, Uik)
         └─ prune.C

mpp-user.C loops through ProbeSup(Mi) and, from the user vote and movie VOTE, writes Predict(Mi, Uik) to predictions for each Uik in ProbeSup(Mi).

We must loop through the voters V (VPHD rather than HPVD) because the horizontal processing required by most correlation calculations is impossible using AND/OR/COMP alone.

Cinematch uses the training table Rents(MID, UID, Rating, Date) to classify new (MID, UID, Date) tuples (i.e., to predict ratings).

Nearest Neighbor User Voting: uid votes on rating(MID, UID) if it is near enough to UID in its ratings of movies M = {mid1, …, midk} (i.e., "near" is based on a User-User correlation over M). The open choices: the User-User correlation (Pearson? Cosine?) and the set M = {mid1, …, midk}.

Nearest Neighbor Movie Voting: mid votes on rating(MID, UID) if its ratings by U = {uid1, …, uidk} are near enough to those of MID (i.e., "near" is based on a Movie-Movie correlation over U). The open choices: the Movie-Movie correlation (Pearson? Cosine?) and the set U = {uid1, …, uidk}.
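For the open correlation choice, a plain (non-pTree) sketch of Pearson over the co-support; the two vectors hold the users' ratings of their co-rated movies, and the names are invented:

  #include <cmath>
  #include <vector>

  // Pearson correlation of two users over their co-rated movie set M.
  double pearson(const std::vector<double> &a, const std::vector<double> &b) {
      int n = (int)a.size();              // a[i], b[i]: ratings of the i-th co-rated movie
      double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
      for (int i = 0; i < n; ++i) {
          sa += a[i];  sb += b[i];
          saa += a[i]*a[i];  sbb += b[i]*b[i];  sab += a[i]*b[i];
      }
      double num = sab - sa*sb/n;
      double den = std::sqrt((saa - sa*sa/n) * (sbb - sb*sb/n));
      return den > 0 ? num/den : 0;       // 0 if either user rates with no variance
  }

In the pTree setting the five sums would come from AND/COUNT operations on rating bit-slices rather than a loop over a rating list.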

The data mining algorithms are in movie-vote.C (our first Nearest Neighbor Classification code). Similar (dual) code either exists or will exist in user-vote.C. The file movie-vote-full.C contains the ARM attempts, the boundary-based attempts, and the Nearest Neighbor Classification attempts. The file movie-vote-justNN.C contains only the NN attempts (so we will start with that).

Page 17

movie-vote.C code:

  extern double movie_vote(PredictionConfig *pcfg,        // 2010_11_13 notes
                           unsigned long int M, PTree &supportM,
                           unsigned long int U, PTree &supportU)
  {
    auto double MU = Users.get_rating(U,M) - 2, VOTE = DEFAULT_VOTE,
         VOTE_sum = 0, VOTE_cnt = 0, Nb, Mb, dsSq, UCor = 1,
         supportUsize = supportU.get_count(), supportMsize = supportM.get_count();
    struct pruning *internal_prune;
    struct external_prune *external_prune;
    auto PTree supM = supportM, supU = supportU;
    supM.clearbit(U);  supU.clearbit(M);


    /* External pruning: Prune Users in supM */
    external_prune = pcfg->get_movie_Prune_Users_in_SupM();
    if (external_prune->enabled) {
      if (supM.get_count() > external_prune->params.Ct)
        do_pruning(external_prune, M, U, supM, supU);
      supM.clearbit(U);  supU.clearbit(M);
      if ((supM.get_count() < 1) || (supU.get_count() < 1)) return VOTE;
    }

    /* External pruning: Prune Movies in supU */
    external_prune = pcfg->get_movie_Prune_Movies_in_SupU();
    if (external_prune->enabled) {
      if (supU.get_count() > external_prune->params.Ct)
        do_pruning(external_prune, M, U, supM, supU);
      supM.clearbit(U);  supU.clearbit(M);
      if ((supM.get_count() < 1) || (supU.get_count() < 1)) return VOTE;
    }

    auto PTreeSet &U_ptree_set = Users.get_ptreeset(), &M_ptree_set = Movies.get_ptreeset();
    supU.clearbit(M);  supM.clearbit(U);
    auto PTree
      supU_1 = supU & (~U_ptree_set[(U*3)+0]) & ( U_ptree_set[(U*3)+1]) & ( U_ptree_set[(U*3)+2]),
      supU_2 = supU & ( U_ptree_set[(U*3)+0]) & (~U_ptree_set[(U*3)+1]) & (~U_ptree_set[(U*3)+2]),
      supU_3 = supU & ( U_ptree_set[(U*3)+0]) & (~U_ptree_set[(U*3)+1]) & ( U_ptree_set[(U*3)+2]),
      supU_4 = supU & ( U_ptree_set[(U*3)+0]) & ( U_ptree_set[(U*3)+1]) & (~U_ptree_set[(U*3)+2]),
      supU_5 = supU & ( U_ptree_set[(U*3)+0]) & ( U_ptree_set[(U*3)+1]) & ( U_ptree_set[(U*3)+2]),
      supM_1 = supM & (~M_ptree_set[(M*3)+0]) & ( M_ptree_set[(M*3)+1]) & ( M_ptree_set[(M*3)+2]),
      supM_2 = supM & ( M_ptree_set[(M*3)+0]) & (~M_ptree_set[(M*3)+1]) & (~M_ptree_set[(M*3)+2]),
      supM_3 = supM & ( M_ptree_set[(M*3)+0]) & (~M_ptree_set[(M*3)+1]) & ( M_ptree_set[(M*3)+2]),
      supM_4 = supM & ( M_ptree_set[(M*3)+0]) & ( M_ptree_set[(M*3)+1]) & (~M_ptree_set[(M*3)+2]),
      supM_5 = supM & ( M_ptree_set[(M*3)+0]) & ( M_ptree_set[(M*3)+1]) & ( M_ptree_set[(M*3)+2]),
      sou, souM, souU, som, somU, somM, spM, spU;
    auto double thr1, expnt1, thr2, expnt2, s, S, ss, sn, sM, sU, c, C, wt, XBalVT, wt_const = 16;
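Reading the masks above: each user's (and movie's) ratings are held in three bit-slice pTrees, and supU_r ANDs supU with the slice pattern selecting the movies the user rated r. The patterns are 011, 100, 101, 110, 111, i.e., the values 3 through 7, which suggests the ratings 1-5 are stored with a +2 offset; that would also explain the recurring "get_rating(...) - 2" adjustments in this code.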

Page 18

movie-vote.C NN code:

    /* Nearest Neighbor Code */
    supU.clearbit(M);
    auto unsigned long long int *supUlist = supU.get_indexes();
    for (unsigned long long int n = 0; n < supU.get_count(); ++n) {   // NLOOP (voters)
      auto unsigned long long int N = supUlist[n];
      if (N == M) continue;
      auto double NU = Users.get_rating(U,N) - 2, MAX = 0, smN = 0, smM = 0,
           MM = 0, MN = 0, NN = 0, denom = 0, dm;
      auto PTree supN = Movies.get_users(N), csMN = supM & supN;
      csMN.clearbit(U);
      dm = csMN.get_count();
      if (dm < 1) continue;

      /* External pruning: Prune Users in CoSupMN */
      external_prune = pcfg->get_movie_Prune_Users_in_CoSupMN();
      if (external_prune->enabled) {
        if (csMN.get_count() > external_prune->params.Ct)
          do_pruning(external_prune, M, U, csMN, supU);
        csMN.clearbit(U);  supU.clearbit(M);
        dm = csMN.get_count();
        if (dm < 1) continue;
      }

      /* Adjusted Cosine */
      auto double ACCor, Vbar, ACCnum = 0, ACCden, ACCdenSum1 = 0, ACCdenSum2 = 0;
      auto unsigned long long int *csMNlist = csMN.get_indexes();
      for (unsigned long long int v = 0; v < csMN.get_count(); ++v) {  // VLOOP (dims)
        auto unsigned long long int V = csMNlist[v];
        auto double MV = Users.get_rating(V,M) - 2, NV = Users.get_rating(V,N) - 2;
        if (pow(MV-NV,2) > MAX) MAX = pow(MV-NV,2);
        smN += NV;  smM += MV;  MM += MV*MV;  MN += NV*MV;  NN += NV*NV;  ++denom;
        /* Adjusted Cosine code */
        auto PTree supV = Users.get_movies(V);
        Vbar = Users.get_mean(V, supV);
        ACCnum     += (NV-Vbar)*(MV-Vbar);
        ACCdenSum1 += (NV-Vbar)*(NV-Vbar);
        ACCdenSum2 += (MV-Vbar)*(MV-Vbar);
      }  // VLOOP ends


      /* Adjusted Cosine code */
      ACCden = pow(ACCdenSum1, .5) * pow(ACCdenSum2, .5);
      ACCor = ACCnum / ACCden;
      UCor = ACCor;
      dm = csMN.get_count();
      if (denom < 1) continue;
      else { Nb = smN/dm;  Mb = smM/dm;  dsSq = NN - 2*MN + MM;  VOTE = NU - Nb + Mb; }
      if (UCor > 0) { VOTE_sum += VOTE*UCor;  VOTE_cnt += UCor; }
      else continue;
      if (pcfg->movie_vote_force_in_loop()) {
        if ((VOTE < 1) && (VOTE != DEFAULT_VOTE)) VOTE = 1;
        if ((VOTE > 5) && (VOTE != DEFAULT_VOTE)) VOTE = 5;
      }
    }  /* end NLOOP */

    if (VOTE_cnt > 0) VOTE = VOTE_sum / VOTE_cnt;
    else VOTE = DEFAULT_VOTE;
    /* force_vote_after_Voter_Loop goes here. */
    if (pcfg->movie_vote_force_after_loop()) {
      if ((VOTE < 1) && (VOTE != DEFAULT_VOTE)) VOTE = 1;
      if ((VOTE > 5) && (VOTE != DEFAULT_VOTE)) VOTE = 5;
    }
    return VOTE;
  }

Page 19

createconfigs script in src/mpp-mpred-3.2.0/p95/mu11:

  #!/bin/bash
  for g in .1 .2 .4 .7 .9
  do
    sed -i -e "s/dMNsdsThr=[^ ]*/dMNsdsThr=$g/" t.config
    for h in .1 .2 .4 .7 .9
    do
      sed -i -e "s/dMNsdsExp=[^ ]*/dMNsdsExp=$h/" t.config
      cp t.config configs/a$g$h.config
    done
  done

It creates, in src/mpp-mpred-3.2.0/p95/mu11/configs, one config per (g,h) pair:

  a.1.1.config  a.1.2.config  a.1.4.config  a.1.7.config  a.1.9.config
  a.2.1.config  a.2.2.config  a.2.4.config  a.2.7.config  a.2.9.config
  a.4.1.config  a.4.2.config  a.4.4.config  a.4.7.config  a.4.9.config
  a.7.1.config  a.7.2.config  a.7.4.config  a.7.7.config  a.7.9.config
  a.9.1.config  a.9.2.config  a.9.4.config  a.9.7.config  a.9.9.config

The corresponding outputs, a.1.1.out through a.9.9.out (25 files), I copy to src/mpp-mpred-3.2.0/dotouts.

The submit script, run in src/mpp-mpred-3.2.0, produces subdirectories in mpp-mpred-3.2.0:

  a.1.1  a.1.2  a.1.4  a.1.7  a.1.9
  a.2.1  a.2.2  a.2.4  a.2.7  a.2.9
  a.4.1  a.4.2  a.4.4  a.4.7  a.4.9
  a.7.1  a.7.2  a.7.4  a.7.7  a.7.9
  a.9.1  a.9.2  a.9.4  a.9.7  a.9.9
  (all drwxr-xr-x, created 10:15-10:18)

and e.g., a.9.9 contains:

  a.9.9.config
  hi-a.9.9.txt
  hi-a.9.9.txt.answers
  lo-a.9.9.txt
  lo-a.9.9.txt.answers
  p95test.txt.predictions
  p95test.txt.rmse

The submit script in src/mpp-mpred-3.2.0 that produces these:

  #!/bin/bash
  for g in .1 .2 .4 .7 .9; do
    for h in .1 .2 .4 .7 .9; do
      ./mpp-submit -S -i Data/p95test.txt -c p95/mu11/configs/a$g$h.config -t .05 -d ./p95/mu11 > a$g$h.out
    done
  done

p95test.txt.rmse:

  Movie: 12641:
    0: Ans: 1  Pred: 1.22  Error: 0.04840
    1: Ans: 4  Pred: 3.65  Error: 0.12250
    2: Ans: 2  Pred: 2.55  Error: 0.30250
    3: Ans: 4  Pred: 4.04  Error: 0.00160
    4: Ans: 2  Pred: 1.85  Error: 0.02250
    Sum: 0.49750  Total: 5  RMSE: 0.315436   Running RMSE: 0.315436 / 5 predictions
  Movie: 12502:
    0: Ans: 4  Pred: 4.71  Error: 0.50410
    1: Ans: 5  Pred: 3.54  Error: 2.13160
    2: Ans: 5  Pred: 3.87  Error: 1.27690
    3: Ans: 3  Pred: 3.33  Error: 0.10890
    4: Ans: 2  Pred: 2.97  Error: 0.94090
    Sum: 4.96240  Total: 5  RMSE: 0.996233   Running RMSE: 0.738911 / 10 predictions
  ...
  Movie: 10811:
    0: Ans: 5  Pred: 4.05  Error: 0.90250
    1: Ans: 3  Pred: 3.49  Error: 0.24010
    2: Ans: 4  Pred: 3.94  Error: 0.00360
    3: Ans: 3  Pred: 3.39  Error: 0.15210
    Sum: 1.29830  Total: 4  RMSE: 0.569715   Running RMSE: 0.964397 / 743 preds
  Movie: 12069:
    0: Ans: 4  Pred: 3.20  Error: 0.64000
    1: Ans: 3  Pred: 3.48  Error: 0.23040
    Sum: 0.87040  Total: 2  RMSE: 0.659697
  Prediction summary:  Sum: 691.9061  Total: 745  RMSE: 0.963708
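In these dumps each Error is a squared error, Sum is their total for the movie, and RMSE = sqrt(Sum/Total); e.g., for movie 12641, sqrt(0.49750/5) ≈ 0.3154, matching the reported 0.315436.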

p95test.txt.predictions:

  12641: 1.22 3.65 2.55 4.04 1.85
  12502: 4.71 3.54 3.87 3.33 2.97
  ...
  10811: 4.05 3.49 3.94 3.39
  12069: 3.20 3.48

In dotouts is a script, createtablermse:

  #!/bin/bash
  for g in .1 .2 .4 .7 .9; do
    for h in .1 .2 .4 .7 .9; do
      grep 'RMSE: ' a$g$h.out >> rmse
    done
  done

  Sum: 692.82510  Total: 745  RMSE: 0.964348
  Sum: 691.59330  Total: 745  RMSE: 0.963490
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.84690  Total: 745  RMSE: 0.963667
  Sum: 690.47330  Total: 745  RMSE: 0.962710
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 693.27970  Total: 745  RMSE: 0.964664
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708
  Sum: 691.90610  Total: 745  RMSE: 0.963708

Also in dotouts is a script, createtablejob:

  #!/bin/bash
  for g in .1 .2 .4 .7 .9; do
    for h in .1 .2 .4 .7 .9; do
      grep 'Input:   lo' a$g$h.out >> job
    done
  done

  Input: lo-a.1.1.txt  Input: lo-a.1.2.txt  Input: lo-a.1.4.txt  Input: lo-a.1.7.txt  Input: lo-a.1.9.txt
  Input: lo-a.2.1.txt  Input: lo-a.2.2.txt  Input: lo-a.2.4.txt  Input: lo-a.2.7.txt  Input: lo-a.2.9.txt
  Input: lo-a.4.1.txt  Input: lo-a.4.2.txt  Input: lo-a.4.4.txt  Input: lo-a.4.7.txt  Input: lo-a.4.9.txt
  Input: lo-a.7.1.txt  Input: lo-a.7.2.txt  Input: lo-a.7.4.txt  Input: lo-a.7.7.txt  Input: lo-a.7.9.txt
  Input: lo-a.9.1.txt  Input: lo-a.9.2.txt  Input: lo-a.9.4.txt  Input: lo-a.9.7.txt  Input: lo-a.9.9.txt