
Sentiment Analysis of Movie Reviews: A Study of Features and Classifiers

Siddharth Jain and Sushobhan Nayak
CS221: Stanford University

sjain2, snayak @ stanford.edu

Abstract: We love movies, and in this project we experiment with a sentiment analysis task on movie reviews. Our objective is two-fold: 1) the binary sentiment classification of a large dataset of movie reviews from IMDB, and 2) predicting the reviewer-assigned rating of the movie from the review. We extract bag-of-words, tf-idf and LDA-based language features from the documents to gauge the saliency of different words and sentence structures for the task. We then experiment with different learning algorithms, such as naive Bayes and different flavors of SVM with different kernels, to classify our documents. This lets us compare the importance and use of different textual features, as well as the capability of standard learning algorithms in such a task. We present a detailed analysis of the effects of the many features and classifiers we have considered, and support it with a battery of experiments on a large dataset.

I. INTRODUCTION

In this project, we investigate the problem of gauging the sentiment of a movie reviewer from the review, and predicting how the reviewer rated the movie. Sentiment analysis is useful in all kinds of 'rating games', whether shopping, restaurant or entertainment-venue suggestions. Add to that the growing importance of gauging how people feel from their social network posts, and you have a fairly complex and relevant problem at hand. While many previous works tackle this issue, our work follows [1] and [2], and we use the dataset of the latter.

The IMDB dataset provided by Andrew Maas [3] contains 50,000 reviews, split evenly into 25k training and 25k test sets. The overall distribution of labels is balanced (25k pos and 25k neg, with equal numbers in both the training and test sets). In the labeled train/test sets, a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10, so reviews with more neutral ratings are not included. Our task is two-fold: 1) to classify the documents in the test set as pos or neg, and 2) to predict the score of the documents in the test set, after training our model. This task helps us strengthen our concepts in two ways. First, it lets us investigate different aspects of text processing, especially the relative prominence of features like bag-of-words (hereafter, BOW), term frequency-inverse document frequency (tf-idf) and latent Dirichlet allocation (LDA)-based features. Second, by letting us experiment with some standard classifiers, it helps us internalize the concepts of parameter selection and comparative performance. Furthermore, it lets us test the assumptions derived from the theory.
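The labeling rule above is simple enough to state as code. A minimal sketch, assuming integer ratings from 1 to 10; `score_to_label` is a hypothetical helper, not part of the dataset's own tooling.

```python
from typing import Optional


def score_to_label(score: int) -> Optional[str]:
    """Map a 1-10 IMDB rating to the binary label of the labeled sets.

    Scores <= 4 are negative and scores >= 7 are positive; the neutral
    scores 5 and 6 return None, since such reviews are not in the
    labeled train/test sets.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be in 1..10, got {score}")
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # neutral review: excluded from the labeled data


# Scores that can occur in the labeled data.
labeled_scores = [s for s in range(1, 11) if score_to_label(s) is not None]
print(labeled_scores)  # [1, 2, 3, 4, 7, 8, 9, 10]
```

Only eight distinct scores survive the filter, which is why the score-prediction task is effectively 8-class rather than 10-class.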

II. TASK DEFINITION AND METHODS

The dataset has been described in the previous section. In the entire collection, no more than 30 reviews are allowed for any given movie, because reviews of the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance gain can be obtained by memorizing movie-unique terms and their association with observed labels. We represent each document in the dataset as a feature vector, with the features being either raw bag-of-words, tf-idf or LDA topics (as these are standard document features widely used in NLP, we refrain from explaining each of them here due to space constraints; details can be found in the appendix), which lets us compare the three in different settings. For the classification task, the baseline algorithm predicts the most common label. Note that we run two sets of classification tasks: a binary classifier for sentiment analysis, and a multi-class classifier (and regressor) for score prediction. Classification is handled with naive-Bayes classifiers and SVMs (also standard, and explained in the appendix). We also raise the stakes by investigating different forms of both algorithms, through variations in kernels and in the types of classifier, so that instead of comparing just two complementary methods, we compare a set of possible methods in a systematic way. While naive Bayes gives us a simple and fast classification algorithm, SVM is a more complex, time-consuming algorithm that is expected to give higher accuracy, two aspects we investigate below.

III. EXPERIMENT

Each document was converted to a bag-of-words representation, stored as a sparse matrix in libsvm format [4]. These BOW representations were then transformed into tf-idf and LDA-topic (500 topics) vector representations. We then trained the various classifiers and tested their predictions on the test documents. We used the following standard libraries:
• LibSVM [4], for the SVM classifiers.
• scikits.learn [5], for naive-Bayes classification.
• Gensim [6], for document feature extraction and conversion from one feature representation to another.
The comparisons made were the following:
• For the multi-class SVM task, 4 types of SVM were used, viz. two SVM classifiers, C-SVC and ν-SVC, and two SVM regressors, ε-SVR and ν-SVR. The classifiers predict the score as one of the training scores, to wit 1-4 or 7-10 (reviews with scores 5 and 6 are absent from the dataset because they are neutral, so this is effectively an 8-class classification task, though we sometimes call it a 10-class task in the spirit of the 1-10 score range). The regression outputs, by contrast, are free to be real numbers that should come as close as possible to the expected integer score. We analyze the errors in detail in the following section, and the different types of SVM are detailed in the appendix. For the binary sentiment analysis task, as expected, only the two classifiers (C-SVC and ν-SVC) were used.

• For each type of SVM, 4 kernels were used: linear, $u^T v$; polynomial, $(\gamma u^T v + \mathrm{coef0})^{\mathrm{degree}}$; radial basis function (RBF), $\exp(-\gamma \|u - v\|^2)$; and sigmoid, $\tanh(\gamma u^T v + \mathrm{coef0})$. Following libsvm's numbering, these are denoted as kernel types 0-3, respectively.

• Scaling: each feature set was run both unscaled and scaled, to see how scaling affects SVM classification.

• For naive-Bayes classification, we used two classifiers. Multinomial Naive Bayes is a classic naive-Bayes variant used in text classification; it usually takes data represented as word-count vectors, although tf-idf vectors are also known to work well in practice (as also seen in this experiment). Bernoulli Naive Bayes requires samples to be represented as binary-valued feature vectors, i.e. 1 if the word is present in the document and 0 otherwise, irrespective of its frequency. (Details in appendix.)

• The performance of the three feature representations was analyzed in all the above scenarios.
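The difference between the two naive-Bayes variants is easiest to see side by side. The following is a from-scratch sketch, not the scikit-learn implementation the experiments used. It applies add-one (Laplace) smoothing, alpha = 1, which matches the default of scikit-learn's MultinomialNB and BernoulliNB. The toy reviews and function names are hypothetical. The multinomial model lets every occurrence of a word count. The Bernoulli model looks only at presence or absence, and it also penalizes a class when words typical of that class are missing.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Multinomial NB: per-class word probabilities from summed word counts."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> total count in that class
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)  # adds the document's word counts
    model = {}
    for c, n_c in class_counts.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_theta = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (math.log(n_c / len(labels)), log_theta)
    return model


def predict_multinomial_nb(model, doc):
    def score(c):
        log_prior, log_theta = model[c]
        # Each occurrence of a word contributes; words unseen in training are skipped.
        return log_prior + sum(n * log_theta[w] for w, n in doc.items() if w in log_theta)
    return max(model, key=score)


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Bernoulli NB: per-class probability that a word is present in a document."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    doc_freq = defaultdict(Counter)  # class -> word -> number of docs containing it
    for doc, y in zip(docs, labels):
        doc_freq[y].update(w for w, n in doc.items() if n > 0)
    model = {}
    for c, n_c in class_counts.items():
        p = {w: (doc_freq[c][w] + alpha) / (n_c + 2 * alpha) for w in vocab}
        model[c] = (math.log(n_c / len(labels)), p)
    return model


def predict_bernoulli_nb(model, doc):
    present = {w for w, n in doc.items() if n > 0}  # counts are binarized
    def score(c):
        log_prior, p = model[c]
        # Every vocabulary word contributes: log p if present, log(1 - p) if absent.
        return log_prior + sum(
            math.log(p[w]) if w in present else math.log(1.0 - p[w]) for w in p
        )
    return max(model, key=score)


train_docs = [Counter(s.split()) for s in
              ["great great fun", "great plot", "boring awful plot", "awful awful"]]
train_labels = ["pos", "pos", "neg", "neg"]
mnb = train_multinomial_nb(train_docs, train_labels)
bnb = train_bernoulli_nb(train_docs, train_labels)
for text in ["great fun", "awful plot"]:
    doc = Counter(text.split())
    print(text, predict_multinomial_nb(mnb, doc), predict_bernoulli_nb(bnb, doc))
```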

We ended up testing 155 combinations of these variables, the details of which are included in a long table at the end of this report. Before moving on to the analysis, note that for the multi-class classification task, the baseline accuracy is 20.088%, obtained by always predicting the most frequent label (score 1). For the binary sentiment classification, since both classes have the same number of instances, the baseline is chance, i.e. 50%. As the table shows, even basic classifiers like naive Bayes work quite well here, with roughly 77-83% accuracy with BOW and tf-idf and 66-68% with LDA (Fig. 11). SVM accuracy on the binary task ranges from the 50% baseline to a maximum of 88.632%. For the 8-class task, SVM accuracy mostly lies between the 20.088% baseline and about 41%, although several runs, mostly ν-SVC with the polynomial kernel, fall below the baseline (down to 9.376%). The regression runs show a mean squared error between 6.8 and 28.8 and a squared correlation coefficient between 0 and 0.509 (both defined in the appendix). In general, for the binary task, SVM classification, except with the polynomial kernel, gives better predictions than naive Bayes, though, as expected, it takes much longer. SVM shows a high error rate for multi-class classification, primarily because the classes have very different numbers of training examples. Reviews with scores 1 and 10 are the most frequent in the training set, so the classifiers are heavily biased towards them. With libSVM's default parameters and unscaled raw BOW data, the bias is so strong that all test documents are classified as either 1 or 10. In the following section, we present a detailed analysis of the results and the insights gained.
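The majority-label baseline and the accuracy computation behind these numbers can be sketched in a few lines. This is an illustration on hypothetical toy labels, not the code used for the reported figures.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Return the most common training label and its accuracy (%) on the test labels."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, 100.0 * correct / len(test_labels)


# Toy score labels, skewed towards the extremes as in the real data.
train = [1, 1, 1, 10, 10, 4, 7]
test = [1, 10, 1, 4]
print(majority_baseline(train, test))  # (1, 50.0)
```

On the real test set, the reported 20.088% simply means that 5,022 of the 25,000 test reviews (0.20088 × 25,000) carry the majority score of 1. For the balanced binary task, the same rule gives exactly 50%.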

IV. ANALYSIS

A. Time Complexity

While naive-Bayes classification ran in seconds (exact times were not recorded, but always within half a minute), SVM runs took minutes to hours, the minimum being 3 min and the maximum 4 h 35 min. This is expected. The naive-Bayes implementation used here is essentially a relative-frequency estimator, whereas the SVM implementation has to map an 89,527-dimensional feature space to a higher dimension through a kernel transformation and then run an optimization in that space. As for the effect of features on running time, tf-idf and raw BOW have comparable running times, with neither consistently faster. LDA, however, is much faster than both, usually by a factor of 5 to 6, which is expected because the LDA topic model in this experiment has only 500 features (Figs. 1, 2). Finally, libsvm's training time grows polynomially with the number of training examples. Given the largely linear nature of the data, however, classification without kernel transformations could be done in roughly linear time.

B. Feature-effects

Tf-idf and raw BOW results are comparable, while LDA results show a drop of around 10-15% in comparison (Figs. 3, 4). This is expected, because the former two have a ∼90,000-dimensional feature space, while the latter is 500-dimensional. In general, tf-idf is expected to fare better than raw BOW. Although such a trend exists, it is not pronounced, with a boost of less than 1% in most cases. There might be two reasons for this.

First, our tf-idf features are not cosine-normalized, and tf-idf is generally accepted to work better with normalization. The ill effect of not normalizing, however, is seen mostly with longer documents. Having more terms and higher term frequencies, they tend to get larger dot products than shorter documents, which creates a strong bias. Since our reviews are of roughly similar length, this effect is not pronounced.

Second, tf-idf is oblivious to the class labels in the training set, which can lead to inappropriate weighting of some features, especially in sentiment analysis tasks. Consider two words that are representative of positive sentiment and do not occur in negative documents at all. The more frequent of the two is the better estimator of the sentiment, yet, because of its higher document frequency, tf-idf assigns it a lower weight. This would also explain another anomaly: tf-idf predictions are sometimes much lower than raw BOW predictions. In the multinomial naive-Bayes case they are lower by about 4 and 7 percentage points. (In the Bernoulli case the two are identical, because Bernoulli uses binary features, which are the same for both feature sets; this acts as a sanity check.) With scaled C-SVC and kernel types 2 and 3, they are lower by about 20 points.
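The tf-idf weighting from the appendix, the cosine normalization discussed above, and the libsvm sparse storage format can be sketched together on a toy corpus. This is a stdlib illustration with hypothetical names and data, not the Gensim pipeline used in the experiments. In the toy corpus, "good" and "superb" occur only in positive reviews, so it reproduces the effect described above.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Assign each training word a 1-based feature index (libsvm indices start at 1)."""
    return {w: i + 1 for i, w in enumerate(sorted({w for doc in docs for w in doc}))}


def tfidf(doc_counts, doc_freq, n_docs, normalize=False):
    """weight_ij = freq_ij * log2(D / df_i), optionally L2 (cosine) normalized."""
    weights = {w: tf * math.log2(n_docs / doc_freq[w])
               for w, tf in doc_counts.items() if w in doc_freq}
    if normalize:
        norm = math.sqrt(sum(v * v for v in weights.values()))
        if norm > 0:
            weights = {w: v / norm for w, v in weights.items()}
    return weights


def to_libsvm_line(label, weights, vocab):
    """Serialize a sparse vector as 'label index:value ...', indices ascending, zeros omitted."""
    items = sorted((vocab[w], v) for w, v in weights.items() if w in vocab and v != 0)
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in items])


# Hypothetical toy corpus: "good" and "superb" occur only in positive reviews.
texts = ["good film", "good superb", "dull film", "dull plot"]
labels = [1, 1, -1, -1]
docs = [Counter(t.split()) for t in texts]
doc_freq = Counter(w for doc in docs for w in doc)  # number of docs containing each word
vocab = build_vocab(docs)

for y, doc in zip(labels, docs):
    print(to_libsvm_line(y, tfidf(doc, doc_freq, len(docs)), vocab))
```

"good" appears in more positive reviews than "superb" and is arguably the more reliable cue. Even so, it gets an idf of log2(4/2) = 1, against log2(4/1) = 2 for "superb", because idf looks only at document frequency and never at the labels.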

C. Scaling effect on SVM

[7] explains the importance of scaling in classification. All three feature sets were run in scaled and unscaled modes. Features were scaled to values in [0,1], as opposed to the more common practice of [-1,1], because of the sparse nature of the data. [0,1] scaling keeps missing feature values at 0. With [-1,1] scaling they would often become -1, adding computational load without any gain in classification performance (at least for an RBF kernel with optimal parameters). While scaling is expected to improve classification results, our experiments showed no pronounced difference between runs on scaled and unscaled data. Scaled data was only a marginally better predictor in regression and about equally good in classification (Figs. 5, 6), much like the case of tf-idf vs. BOW. This might be because scaling matters most when the features have different, unknown orders of magnitude; in that case, scaling prevents features at the higher end of the measurement spectrum from being unfairly given larger weights. In our case, however, all feature dimensions are word frequencies and are comparable, not artefacts of a measuring process, so there is little bias of this kind for scaling to eliminate. Scaling also helps the numerical computation, which may explain the small boost.
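The sparsity argument for [0,1] scaling can be checked on a toy matrix. The sketch below applies the usual per-feature min-max mapping, lower + (upper − lower)(x − min)/(max − min). This is the linear scaling that libsvm's svm-scale tool performs, although the code below is not that tool's. The count matrix is hypothetical. Word counts are non-negative, so any column that contains a zero has minimum 0. [0,1] scaling therefore leaves every zero at exactly 0, whereas [-1,1] scaling turns those zeros into -1 and makes the data dense.

```python
def minmax_scale(rows, lower=0.0, upper=1.0):
    """Linearly map each feature column to [lower, upper] using its min and max."""
    n_features = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_features)]
    hi = [max(r[j] for r in rows) for j in range(n_features)]
    scaled = []
    for r in rows:
        out = []
        for j, x in enumerate(r):
            if hi[j] == lo[j]:
                out.append(lower)  # constant column: mapped to the lower bound (a choice of this sketch)
            else:
                out.append(lower + (upper - lower) * (x - lo[j]) / (hi[j] - lo[j]))
        scaled.append(out)
    return scaled


def nonzeros(rows):
    """Number of entries a sparse format would have to store."""
    return sum(1 for r in rows for x in r if x != 0)


# Hypothetical word-count matrix: 3 documents x 3 words, with some zeros.
counts = [[0, 3, 10],
          [1, 0, 0],
          [4, 4, 2]]
print(nonzeros(counts), nonzeros(minmax_scale(counts)), nonzeros(minmax_scale(counts, -1.0, 1.0)))
```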

D. Effect of types of SVM

We used 4 types of SVM: C-SVC and ν-SVC for classification, and ε-SVR and ν-SVR for regression. The C and ν versions are similar, with C ∈ (0, ∞) but ν ∈ (0, 1]. ν is often preferred because it is intuitively appealing: it is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Otherwise the two SVMs are equivalent and are expected to give similar results, which is evident from the graph. However, ν-SVC sometimes gives slightly better results than its counterpart, especially towards the end of the 2- and 10-class classification parts of the graph (Fig. 7). This might be because the default parameter ν (= 0.5) suits our dataset better than the default value of C (= 1) does; optimal parameter selection is expected to give comparable results. In fact, as described in the next subsection, the optimal C was found to be 512, much larger than the default, which might explain the poorer results of C-SVC. The SVRs have a similar relationship, and their results are indeed comparable (Fig. 8).

E. Parameter setting

While libSVM's default parameters are good for general testing, each dataset requires them to be tuned for better classification results. Our initial aim was to explore this, but given the size of the dataset, time constraints prevented it. For example, finding the best C involves a grid search with 5-fold cross-validation on the training data. While this is manageable for datasets with a thousand instances, the running time runs into days for a 25,000-instance dataset: one run went on for two days without producing a result. We improvised by randomly selecting a subset of 5,000 examples and running the grid search on it. The search took 6 hours and produced C = 512. Using this C in binary sentiment classification with C-SVC, on scaled raw BOW with the RBF kernel, gave 88.336% accuracy. This is an improvement of 17 percentage points over the 71% obtained without parameter tuning. Note that a high C of 512 also increases training time. Over the ∼150 runs of the experiment, a large C would seriously strain the time budget of a small class project. Faced with the choice between a larger C with fewer runs and the default C with ∼150 runs, we chose the lesser of two evils. This let us gain more insight into other aspects of the classification task, such as feature selection and kernel manipulation, while getting an idea of the effect of parameter tuning from this one instance.
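The tuning procedure described above, random subsampling followed by a 5-fold cross-validated grid search over C, can be sketched as follows. The report does not state which grid was searched. The powers of two from 2^-5 to 2^15, stepping the exponent by 2, are an assumption modeled on the default C range of libsvm's grid.py, and this grid happens to include 512 = 2^9. The training callback is hypothetical, so the loop can be exercised with any classifier.

```python
import random


def log2_grid(begin=-5, end=15, step=2):
    """Candidate values 2^begin, 2^(begin + step), ..., 2^end."""
    return [2.0 ** e for e in range(begin, end + 1, step)]


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_C(data, train_and_score, subset_size=5000, k=5, seed=0):
    """Return the C with the best mean k-fold score on a random subset of `data`.

    `train_and_score(train, valid, C)` is a hypothetical callback that trains a
    classifier (e.g. a C-SVC) on `train` and returns its accuracy on `valid`.
    """
    rng = random.Random(seed)
    subset = rng.sample(data, min(subset_size, len(data)))  # random subset, in random order
    best_C, best_score = None, float("-inf")
    for C in log2_grid():
        scores = []
        for fold in k_fold_indices(len(subset), k):
            held_out = set(fold)
            valid = [subset[i] for i in fold]
            train = [x for i, x in enumerate(subset) if i not in held_out]
            scores.append(train_and_score(train, valid, C))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_C, best_score = C, mean_score
    return best_C, best_score


# Stand-in scorer that peaks at C = 512, just to exercise the search loop.
def fake_score(train, valid, C):
    return -abs(C - 512)


print(grid_search_C(list(range(100)), fake_score))
```

With this assumed 11-value grid and 5 folds, the search trains 55 models on 4,000 examples each. That gives a feel for why even the subsampled search took hours. Searching the RBF γ as well would multiply the count further.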

F. Effect of Kernels

We notice that, predominantly, the linear kernel gives the best results, whereas the polynomial kernel gives the worst. The RBF and sigmoid kernels give comparable results, with the former slightly better (Figs. 9, 10). This behavior is expected. With polynomial kernels, numerical difficulties tend to arise. As the degree d grows, the d-th power of the base γu^T v + coef0 goes to 0 when the base is smaller than 1 in absolute value, and to ∞ when it is larger. The kernel matrix of the sigmoid kernel is not always positive semi-definite, so its accuracy is, in general, lower than that of the RBF kernel [8]. The relation between class labels and attributes in text classification problems is largely linear, which explains the results of the linear kernel. Moreover, with a large number of features, the linear kernel is especially useful, since a non-linear mapping does not improve performance but takes up a sizable amount of time (the linear kernel had the fastest running time for the classification tasks in this experiment). The RBF kernel, however, should be able to produce comparable results with parameter tuning [9].
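For reference, the four kernels can be written out directly, numbered as in libsvm. This stdlib sketch is not libsvm's code. It uses libsvm's documented defaults coef0 = 0 and degree = 3. The vectors, the γ values and the large degree of 30 are arbitrary. They are chosen only to show the numerical problem with the polynomial kernel: the base γu^T v + coef0 is raised to the power d, so kernel values span many orders of magnitude once d grows.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):                                   # kernel type 0
    return dot(u, v)


def polynomial(u, v, gamma, coef0=0.0, degree=3):   # kernel type 1
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma):                               # kernel type 2
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid(u, v, gamma, coef0=0.0):                # kernel type 3
    return math.tanh(gamma * dot(u, v) + coef0)


u = v = [1.0, 1.0]  # u^T v = 2
# With a high degree, a base below 1 collapses towards 0 and a base above 1 explodes.
for gamma in (0.25, 1.5):                           # bases 0.5 and 3.0
    print(gamma, polynomial(u, v, gamma, degree=3), polynomial(u, v, gamma, degree=30))
```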

V. CONCLUSION

This series of experiments gave us useful insight into the nuances of selecting features and classifiers for a text-classification task. The results conformed to theoretical predictions most of the time. When they seemingly did not, they compelled us to look for explanations, which were satisfactory and helped us understand the task better. The sheer volume of the work (a large dataset and about 150 runs) gave us research experience and strengthened our concepts. We used libSVM to experiment with SVMs, but given the linear nature of the problem, a similar classification task can be completed with libLinear [10], which does not use kernel transformations. In fact, libsvm training scales as O(n^2) to O(n^3) in the number of training examples n, whereas liblinear scales as O(n). Liblinear does not support kernel SVMs, but investigating it would be a natural extension. The next stage would be to group documents by similarity for sentiment classification. The dataset also contains another 50,000 unlabeled reviews, and an unsupervised learning approach on them might take this work further.

REFERENCES

[1] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278, 2004.

[2] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, pages 142-150, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.


[3] Dataset. http://ai.stanford.edu/~amaas/data/sentiment/index.html.

[4] LibSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

[5] Scikits.learn. http://scikit-learn.org/stable/.

[6] Gensim. http://radimrehurek.com/gensim/.

[7] W. S. Sarle. Neural Network FAQ. ftp://ftp.sas.com/pub/neural/FAQ.html, 1997.

[8] Hsuan-Tien Lin and Chih-Jen Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, 2003.

[9] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel.

[10] LibLinear. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.

APPENDIX

1) Graphs
2) Table of results
3) Algorithms

ALGORITHMS

Bag-of-Words
A dictionary was created from all the words in the training set. The size of the dictionary was the dimension of the training space, and each word represented one dimension. For each document, the value along a word dimension was the frequency of that word in the document. Since most documents had 0 in most dimensions, they were represented as a sparse matrix in libSVM format for computational efficiency.

Tf-idf
In this experiment, tf-idf assigned a weight to each dimension of each document as follows: the weight of term i in document j, in a corpus of D documents, is

$$\mathrm{weight}_{ij} = \mathrm{frequency}_{ij} \times \log_2\left(\frac{D}{\mathrm{documentFreq}_i}\right),$$

where $\mathrm{documentFreq}_i$ is the number of documents that contain term i.

LDA
The module used was Gensim's (http://radimrehurek.com/gensim/models/ldamodel.html#id2). We refrain from giving a detailed explanation, because it is lengthy and LDA is well understood. We extracted 500 topics, and each document was projected onto this 500-dimensional space.

C-SVC
Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, and an indicator vector $y \in \mathbb{R}^l$ with $y_i \in \{1, -1\}$, C-SVC solves the optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$$

subject to:

$$y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l,$$

where $\phi$ is the (implicit) feature map defined by the kernel.

ν-SVC

$$\min_{w, b, \xi, \rho} \; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} \xi_i$$

subject to:

$$y_i\left(w^T \phi(x_i) + b\right) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l, \quad \rho \ge 0.$$
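To make the two primal problems concrete, the sketch below evaluates their objectives for a fixed linear classifier on three toy points. It only illustrates the formulas and is not how libsvm trains, since libsvm solves the dual problems. It assumes φ is the identity (linear kernel). It also uses the fact that, for fixed w, b and ρ, the smallest feasible slack is ξ_i = max(0, 1 − y_i(w^T x_i + b)) for C-SVC and ξ_i = max(0, ρ − y_i(w^T x_i + b)) for ν-SVC. The function names and data are hypothetical.

```python
def margins(w, b, X, y):
    """y_i (w^T x_i + b) for each example, with phi(x) = x (linear kernel)."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)]


def c_svc_objective(w, b, X, y, C):
    """(1/2) w^T w + C * sum(xi_i), with the smallest feasible slacks xi_i = max(0, 1 - margin_i)."""
    slacks = [max(0.0, 1.0 - m) for m in margins(w, b, X, y)]
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)


def nu_svc_objective(w, b, rho, X, y, nu):
    """(1/2) w^T w - nu * rho + (1/l) * sum(xi_i), with xi_i = max(0, rho - margin_i)."""
    slacks = [max(0.0, rho - m) for m in margins(w, b, X, y)]
    return 0.5 * sum(wj * wj for wj in w) - nu * rho + sum(slacks) / len(X)


X = [[2.0], [0.5], [-1.0]]
y = [1, 1, -1]
w, b = [1.0], 0.0  # margins are 2.0, 0.5 and 1.0; only the 2nd point violates the margin
print(c_svc_objective(w, b, X, y, C=1.0))          # 0.5 + 1.0 * 0.5 = 1.0
print(nu_svc_objective(w, b, 1.0, X, y, nu=0.5))   # 0.5 - 0.5 + 0.5 / 3
```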

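ε-SVR and ν-SVR
Please refer to [?].

Naive-Bayes Classification
Both classifiers are defined in detail at http://scikit-learn.org/stable/modules/naive_bayes.html.

Accuracy

$$\text{Accuracy} = \frac{\text{number of correctly predicted test data}}{\text{total number of test data}} \times 100$$

Regression
For $l$ test documents with predicted values $f(x_i)$ and true scores $y_i$:

Mean squared error:

$$\mathrm{MSE} = \frac{1}{l} \sum_{i=1}^{l} \left(f(x_i) - y_i\right)^2$$

Squared correlation coefficient:

$$r^2 = \frac{\left(l \sum_{i=1}^{l} f(x_i) y_i - \sum_{i=1}^{l} f(x_i) \sum_{i=1}^{l} y_i\right)^2}{\left(l \sum_{i=1}^{l} f(x_i)^2 - \left(\sum_{i=1}^{l} f(x_i)\right)^2\right)\left(l \sum_{i=1}^{l} y_i^2 - \left(\sum_{i=1}^{l} y_i\right)^2\right)}$$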
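The evaluation formulas above translate directly into code. This is a stdlib sketch with hypothetical function names, where f holds the predicted values f(x_i) and y the true scores y_i; the toy numbers are illustrative only. The example also shows why both regression metrics are reported. r² measures linear association rather than closeness, so predictions that are consistently half the true score reach r² = 1 while still having a large MSE.

```python
def accuracy(pred, true):
    """Percentage of test items whose predicted label equals the true label."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def mean_squared_error(f, y):
    """(1/l) * sum_i (f(x_i) - y_i)^2."""
    return sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / len(y)


def squared_correlation(f, y):
    """Squared correlation coefficient r^2 between predictions f and targets y."""
    n = len(y)  # l in the formula
    sf, sy = sum(f), sum(y)
    sfy = sum(fi * yi for fi, yi in zip(f, y))
    sff = sum(fi * fi for fi in f)
    syy = sum(yi * yi for yi in y)
    den = (n * sff - sf ** 2) * (n * syy - sy ** 2)
    if den == 0:
        return 0.0  # undefined for constant f or y; returning 0.0 is a choice of this sketch
    return (n * sfy - sf * sy) ** 2 / den


print(accuracy([1, 10, 1, 4], [1, 10, 4, 4]))               # 75.0
f, y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]           # predictions at half the true score
print(mean_squared_error(f, y), squared_correlation(f, y))  # 7.5 1.0
```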

FIGURES


Fig. 1. Time taken by classifiers arranged according to the three feature sets.

Fig. 2. Time taken in regression arranged according to the three feature sets.


Fig. 3. Accuracy of classifiers arranged according to the three feature sets.

Fig. 4. Correlation coefficient for regression arranged according to the three feature sets.


Fig. 5. Accuracy of classifiers for scaled and unscaled versions.

Fig. 6. Correlation coefficient of regression for scaled and unscaled versions.


Fig. 7. Accuracy of C-SVC and ν-SVC classifiers.

Fig. 8. Accuracy of ε-SVR and ν-SVR regression.


Fig. 9. Accuracy of classifiers for different kernels.

Fig. 10. Correlation coefficient of regression for different kernels.


Fig. 11. Naive-Bayes algorithm run results.


CS 221 Sentiment Analysis Results

Classification | Data | Scaled data? | Algorithm | Kernel type | Prediction accuracy | Mean squared error | Squared correlation coefficient | Duration (h:mm:ss)
Two Class | Raw | Unscaled | Multinomial Naïve-Bayes | NA | 81.360% | NA | NA | Unavailable
Two Class | Raw | Unscaled | Bernoulli Naïve-Bayes | NA | 83.010% | NA | NA | Unavailable
Two Class | Raw | Unscaled | C-SVC | 0 | 84.500% | NA | NA | 0:11:22
Two Class | Raw | Unscaled | C-SVC | 1 | 50.004% | NA | NA | 0:29:47
Two Class | Raw | Unscaled | C-SVC | 2 | 73.280% | NA | NA | 0:30:29
Two Class | Raw | Unscaled | C-SVC | 3 | 67.900% | NA | NA | 0:30:12
Two Class | Raw | Unscaled | nu-SVC | 0 | 87.152% | NA | NA | 0:17:48
Two Class | Raw | Unscaled | nu-SVC | 1 | 52.852% | NA | NA | 0:16:04
Two Class | Raw | Unscaled | nu-SVC | 2 | 87.084% | NA | NA | 0:19:44
Two Class | Raw | Unscaled | nu-SVC | 3 | 85.788% | NA | NA | 0:21:04
Two Class | Raw | Scaled | C-SVC | 0 | 85.480% | NA | NA | 0:13:51
Two Class | Raw | Scaled | C-SVC | 1 | 50.000% | NA | NA | 0:42:23
Two Class | Raw | Scaled | C-SVC | 2 | 70.896% | NA | NA | 0:44:02
Two Class | Raw | Scaled | C-SVC | 3 | 70.568% | NA | NA | 0:32:01
Two Class | Raw | Scaled | nu-SVC | 0 | 88.352% | NA | NA | 0:20:17
Two Class | Raw | Scaled | nu-SVC | 1 | 50.000% | NA | NA | 0:24:39
Two Class | Raw | Scaled | nu-SVC | 2 | 88.256% | NA | NA | 0:26:20
Two Class | Raw | Scaled | nu-SVC | 3 | 84.412% | NA | NA | 0:21:02
Two Class | TFIDF | Unscaled | Multinomial Naïve-Bayes | NA | 77.100% | NA | NA | Unavailable
Two Class | TFIDF | Unscaled | Bernoulli Naïve-Bayes | NA | 83.010% | NA | NA | Unavailable
Two Class | TFIDF | Unscaled | C-SVC | 0 | 84.856% | NA | NA | 0:12:09
Two Class | TFIDF | Unscaled | C-SVC | 1 | 50.008% | NA | NA | 0:32:10
Two Class | TFIDF | Unscaled | C-SVC | 2 | 87.668% | NA | NA | 0:25:42
Two Class | TFIDF | Unscaled | C-SVC | 3 | 86.688% | NA | NA | 0:26:24
Two Class | TFIDF | Unscaled | nu-SVC | 0 | 88.632% | NA | NA | 0:21:02
Two Class | TFIDF | Unscaled | nu-SVC | 1 | 56.508% | NA | NA | 0:29:34
Two Class | TFIDF | Unscaled | nu-SVC | 2 | 88.524% | NA | NA | 0:22:47
Two Class | TFIDF | Unscaled | nu-SVC | 3 | 88.588% | NA | NA | 0:23:31
Two Class | TFIDF | Scaled | C-SVC | 0 | 85.600% | NA | NA | 0:14:03
Two Class | TFIDF | Scaled | C-SVC | 1 | 50.000% | NA | NA | 0:34:24
Two Class | TFIDF | Scaled | C-SVC | 2 | 50.000% | NA | NA | 0:36:13
Two Class | TFIDF | Scaled | C-SVC | 3 | 50.000% | NA | NA | 0:33:11
Two Class | TFIDF | Scaled | nu-SVC | 0 | 88.400% | NA | NA | 0:21:56
Two Class | TFIDF | Scaled | nu-SVC | 1 | 50.008% | NA | NA | 0:16:28
Two Class | TFIDF | Scaled | nu-SVC | 2 | 88.312% | NA | NA | 0:21:14
Two Class | TFIDF | Scaled | nu-SVC | 3 | 85.636% | NA | NA | 0:34:06
Two Class | LDA | Unscaled | Multinomial Naïve-Bayes | NA | 66.300% | NA | NA | Unavailable
Two Class | LDA | Unscaled | Bernoulli Naïve-Bayes | NA | 68.320% | NA | NA | Unavailable
Two Class | LDA | Unscaled | C-SVC | 0 | 66.133% | NA | NA | 0:05:47
Two Class | LDA | Unscaled | C-SVC | 1 | 50.472% | NA | NA | 0:06:53
Two Class | LDA | Unscaled | C-SVC | 2 | 51.248% | NA | NA | 0:08:02
Two Class | LDA | Unscaled | C-SVC | 3 | 51.240% | NA | NA | 0:08:03
Two Class | LDA | Unscaled | nu-SVC | 0 | 62.837% | NA | NA | 0:03:54
Two Class | LDA | Unscaled | nu-SVC | 1 | 50.224% | NA | NA | 0:03:22
Two Class | LDA | Unscaled | nu-SVC | 2 | 54.152% | NA | NA | 0:04:24
Two Class | LDA | Unscaled | nu-SVC | 3 | 60.881% | NA | NA | 0:04:08
Two Class | LDA | Scaled | C-SVC | 0 | 67.860% | NA | NA | 0:05:56
Two Class | LDA | Scaled | C-SVC | 1 | 50.124% | NA | NA | 0:07:33
Two Class | LDA | Scaled | C-SVC | 2 | 53.768% | NA | NA | 0:08:41
Two Class | LDA | Scaled | C-SVC | 3 | 53.744% | NA | NA | 0:08:13
Two Class | LDA | Scaled | nu-SVC | 0 | 62.744% | NA | NA | 0:04:13
Two Class | LDA | Scaled | nu-SVC | 1 | 50.040% | NA | NA | 0:03:45
Two Class | LDA | Scaled | nu-SVC | 2 | 58.064% | NA | NA | 0:04:53
Two Class | LDA | Scaled | nu-SVC | 3 | 62.776% | NA | NA | 0:05:14
Ten Class | Raw | Unscaled | Multinomial Naïve-Bayes | NA | 38.460% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | Bernoulli Naïve-Bayes | NA | 38.760% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | C-SVC | 0 | 35.496% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | C-SVC | 1 | 20.088% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | C-SVC | 2 | 25.996% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | C-SVC | 3 | 21.960% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | nu-SVC | 0 | 39.816% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | nu-SVC | 1 | 20.804% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | nu-SVC | 2 | 39.688% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | nu-SVC | 3 | 38.096% | NA | NA | Unavailable
Ten Class | Raw | Unscaled | nu-SVR | 0 | NA | 15.625 | 0.274 | 4:35:33
Ten Class | Raw | Unscaled | nu-SVR | 1 | NA | 12.185 | 0.010 | 0:16:29
Ten Class | Raw | Unscaled | nu-SVR | 3 | NA | 11.303 | 0.191 | 0:17:08
Ten Class | Raw | Unscaled | Epsilon-SVR | 0 | NA | 15.185 | 0.280 | Unavailable
Ten Class | Raw | Unscaled | Epsilon-SVR | 1 | NA | 12.687 | 0.002 | Unavailable
Ten Class | Raw | Unscaled | Epsilon-SVR | 2 | NA | 10.312 | 0.184 | Unavailable
Ten Class | Raw | Unscaled | Epsilon-SVR | 3 | NA | 11.105 | 0.129 | Unavailable
Ten Class | Raw | Scaled | C-SVC | 0 | 36.732% | NA | NA | Unavailable
Ten Class | Raw | Scaled | C-SVC | 1 | 20.088% | NA | NA | Unavailable
Ten Class | Raw | Scaled | C-SVC | 2 | 20.088% | NA | NA | Unavailable
Ten Class | Raw | Scaled | C-SVC | 3 | 20.088% | NA | NA | Unavailable
Ten Class | Raw | Scaled | nu-SVC | 0 | 40.484% | NA | NA | Unavailable
Ten Class | Raw | Scaled | nu-SVC | 1 | 9.376% | NA | NA | Unavailable
Ten Class | Raw | Scaled | nu-SVC | 2 | 36.500% | NA | NA | Unavailable
Ten Class | Raw | Scaled | nu-SVC | 3 | 31.220% | NA | NA | Unavailable
Ten Class | Raw | Scaled | nu-SVR | 0 | NA | 10.171 | 0.370 | 0:51:14
Ten Class | Raw | Scaled | nu-SVR | 1 | NA | 12.186 | 0.000 | 0:16:27
Ten Class | Raw | Scaled | nu-SVR | 2 | NA | 12.167 | 0.412 | 0:56:42
Ten Class | Raw | Scaled | nu-SVR | 3 | NA | 12.177 | 0.414 | 0:16:42
Ten Class | Raw | Scaled | Epsilon-SVR | 0 | NA | 9.949 | 0.376 | 0:39:06
Ten Class | Raw | Scaled | Epsilon-SVR | 1 | NA | 12.186 | -0.000 | 0:35:10
Ten Class | Raw | Scaled | Epsilon-SVR | 2 | NA | 12.148 | 0.359 | 0:32:49
Ten Class | Raw | Scaled | Epsilon-SVR | 3 | NA | 12.167 | 0.359 | 0:33:21
Ten Class | TFIDF | Unscaled | Multinomial Naïve-Bayes | NA | 31.520% | NA | NA | Unavailable
Ten Class | TFIDF | Unscaled | Bernoulli Naïve-Bayes | NA | 38.760% | NA | NA | Unavailable
Ten Class | TFIDF | Unscaled | C-SVC | 0 | 36.020% | NA | NA | 0:30:07
Ten Class | TFIDF | Unscaled | C-SVC | 1 | 20.088% | NA | NA | 0:42:26
Ten Class | TFIDF | Unscaled | C-SVC | 2 | 37.720% | NA | NA | 0:40:24
Ten Class | TFIDF | Unscaled | C-SVC | 3 | 37.096% | NA | NA | 0:40:32
Ten Class | TFIDF | Unscaled | nu-SVC | 0 | 40.808% | NA | NA | 0:38:14
Ten Class | TFIDF | Unscaled | nu-SVC | 1 | 20.780% | NA | NA | 0:40:27
Ten Class | TFIDF | Unscaled | nu-SVC | 2 | 40.928% | NA | NA | 0:41:12
Ten Class | TFIDF | Unscaled | nu-SVC | 3 | 40.816% | NA | NA | 0:40:09
Ten Class | TFIDF | Unscaled | nu-SVR | 0 | NA | 28.811 | 0.180 | 1:30:37
Ten Class | TFIDF | Unscaled | nu-SVR | 1 | NA | 12.186 | 0.003 | 0:20:00
Ten Class | TFIDF | Unscaled | nu-SVR | 2 | NA | 7.979 | 0.509 | 0:19:56
Ten Class | TFIDF | Unscaled | nu-SVR | 3 | NA | 9.092 | 0.453 | 0:25:29
Ten Class | TFIDF | Unscaled | Epsilon-SVR | 0 | NA | 26.983 | 0.190 | 0:49:05
Ten Class | TFIDF | Unscaled | Epsilon-SVR | 1 | NA | 12.285 | 0.001 | 0:32:20
Ten Class | TFIDF | Unscaled | Epsilon-SVR | 2 | NA | 6.769 | 0.509 | 0:34:05
Ten Class | TFIDF | Unscaled | Epsilon-SVR | 3 | NA | 7.902 | 0.446 | 0:35:22
Ten Class | TFIDF | Scaled | C-SVC | 0 | 36.768% | NA | NA | 0:47:42
Ten Class | TFIDF | Scaled | C-SVC | 1 | 20.088% | NA | NA | 0:53:07
Ten Class | TFIDF | Scaled | C-SVC | 2 | 20.088% | NA | NA | 0:57:09
Ten Class | TFIDF | Scaled | C-SVC | 3 | 20.088% | NA | NA | 0:59:43
Ten Class | TFIDF | Scaled | nu-SVC | 0 | 40.492% | NA | NA | 0:54:54
Ten Class | TFIDF | Scaled | nu-SVC | 1 | 9.668% | NA | NA | 0:27:47
Ten Class | TFIDF | Scaled | nu-SVC | 2 | 35.812% | NA | NA | 0:41:45
Ten Class | TFIDF | Scaled | nu-SVC | 3 | 31.724% | NA | NA | 0:36:20
Ten Class | TFIDF | Scaled | nu-SVR | 0 | NA | 9.953 | 0.377 | 1:02:48
Ten Class | TFIDF | Scaled | nu-SVR | 1 | NA | 12.186 | 0.000 | 0:22:07
Ten Class | TFIDF | Scaled | nu-SVR | 2 | NA | 12.167 | 0.413 | 0:21:51
Ten Class | TFIDF | Scaled | nu-SVR | 3 | NA | 12.176 | 0.414 | 0:21:11
Ten Class | TFIDF | Scaled | Epsilon-SVR | 0 | NA | 9.747 | 0.383 | 0:40:27
Ten Class | TFIDF | Scaled | Epsilon-SVR | 1 | NA | 14.110 | 0.000 | 0:37:38
Ten Class | TFIDF | Scaled | Epsilon-SVR | 2 | NA | 13.960 | 0.359 | 0:46:16
Ten Class | TFIDF | Scaled | Epsilon-SVR | 3 | NA | 14.034 | 0.359 | 0:43:54
Ten Class | LDA | Unscaled | Multinomial Naïve-Bayes | NA | 26.320% | NA | NA | Unavailable
Ten Class | LDA | Unscaled | Bernoulli Naïve-Bayes | NA | 29.250% | NA | NA | Unavailable
Ten Class | LDA | Unscaled | C-SVC | 0 | 28.150% | NA | NA | 0:08:55
Ten Class | LDA | Unscaled | C-SVC | 1 | 20.082% | NA | NA | 0:10:08
Ten Class | LDA | Unscaled | C-SVC | 2 | 20.082% | NA | NA | 0:11:04
Ten Class | LDA | Unscaled | C-SVC | 3 | 20.082% | NA | NA | 0:10:38
Ten Class | LDA | Unscaled | nu-SVC | 0 | 24.442% | NA | NA | 0:07:39
Ten Class | LDA | Unscaled | nu-SVC | 1 | 10.953% | NA | NA | 0:05:33
Ten Class | LDA | Unscaled | nu-SVC | 2 | 24.686% | NA | NA | 0:08:47
Ten Class | LDA | Unscaled | nu-SVC | 3 | 23.622% | NA | NA | 0:08:57
Ten Class | LDA | Unscaled | nu-SVR | 0 | NA | 10.617 | 0.157 | 0:04:13
Ten Class | LDA | Unscaled | nu-SVR | 1 | NA | 12.185 | 0.000 | 0:04:18
Ten Class | LDA | Unscaled | nu-SVR | 2 | NA | 12.124 | 0.053 | 0:05:01
Ten Class | LDA | Unscaled | nu-SVR | 3 | NA | 12.155 | 0.053 | 0:11:27
Ten Class | LDA | Unscaled | Epsilon-SVR | 0 | NA | 10.835 | 0.128 | 0:07:02
Ten Class | LDA | Unscaled | Epsilon-SVR | 1 | NA | 12.185 | 0.000 | 0:07:30
Ten Class | LDA | Unscaled | Epsilon-SVR | 2 | NA | 12.081 | 0.041 | 0:09:13
Ten Class | LDA | Unscaled | Epsilon-SVR | 3 | NA | 12.119 | 0.041 | 0:09:44
Ten Class | LDA | Scaled | C-SVC | 0 | 29.236% | NA | NA | 0:11:24
Ten Class | LDA | Scaled | C-SVC | 1 | 20.088% | NA | NA | 0:09:28
Ten Class | LDA | Scaled | C-SVC | 2 | 20.088% | NA | NA | 0:12:09
Ten Class | LDA | Scaled | C-SVC | 3 | 20.088% | NA | NA | 0:11:01
Ten Class | LDA | Scaled | nu-SVC | 0 | 25.004% | NA | NA | 0:10:56
Ten Class | LDA | Scaled | nu-SVC | 1 | 16.352% | NA | NA | 0:06:28
Ten Class | LDA | Scaled | nu-SVC | 2 | 22.168% | NA | NA | 0:10:06
Ten Class | LDA | Scaled | nu-SVC | 3 | 23.992% | NA | NA | 0:08:03
Ten Class | LDA | Scaled | nu-SVR | 0 | NA | 10.169 | 0.182 | 0:04:49
Ten Class | LDA | Scaled | nu-SVR | 1 | NA | 12.186 | 0.000 | 0:04:41
Ten Class | LDA | Scaled | nu-SVR | 2 | NA | 12.058 | 0.078 | 0:05:10
Ten Class | LDA | Scaled | nu-SVR | 3 | NA | 12.121 | 0.077 | 0:08:30
Ten Class | LDA | Scaled | Epsilon-SVR | 0 | NA | 10.313 | 0.171 | 0:07:59
Ten Class | LDA | Scaled | Epsilon-SVR | 1 | NA | 12.186 | 0.000 | 0:08:02
Ten Class | LDA | Scaled | Epsilon-SVR | 2 | NA | 11.993 | 0.059 | 0:08:53
Ten Class | LDA | Scaled | Epsilon-SVR | 3 | NA | 12.060 | 0.059 | 0:09:08

