
Sentiment Analysis of Movie Reviews: A Study of Features and Classifiers
Siddharth Jain and Sushobhan Nayak
CS221: Stanford University
sjain2, snayak @ stanford.edu
Abstract
We love movies, and in this project we experiment with a sentiment analysis task on movie reviews. Our objective is twofold: 1) binary sentiment classification of a large dataset of movie reviews from IMDB, and 2) prediction of the critic-assigned rating of a movie from its review. We extract bag-of-words, tf-idf and LDA-based language features from the documents to gauge the saliency of different words and sentence structures for the task. We then experiment with different learning algorithms, namely Naive Bayes and different flavors of SVM with different kernels, to classify our documents. This lets us compare the importance and use of different textual features as well as the capability of the standard learning algorithms in such a task. We present a detailed analysis of the effects of the many features and classifiers we have considered and support it with a battery of experiments on a massive dataset.
I. INTRODUCTION
In this project, we investigate the problem of gauging the sentiment of a movie reviewer from the review, and predict how the reviewer rated the movie. Sentiment analysis is useful wherever ratings matter, whether for shopping and restaurant suggestions or for entertainment venue suggestions. Add to that the increased importance of gauging how a person feels from their social network posts, and you have a fairly complex and relevant problem at hand. While many previous works tackle this issue, our work is in line with [1] and [2], and we use the dataset of the latter.
The IMDB dataset provided by Andrew Maas [3] contains 50,000 reviews split evenly into 25k train and 25k test sets. The overall distribution of labels is balanced (25k pos and 25k neg, with equal numbers in both the train and test sets). In the labeled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. Our task here is twofold: 1) to classify the documents in the test set as pos or neg, and 2) to predict the score of the documents in the test set, after training our model. This task helps us strengthen our concepts in two ways. Firstly, it helps us investigate different aspects of text processing, especially the relative prominence of features like bag-of-words (hereafter, BOW), term frequency-inverse document frequency (tf-idf) and latent Dirichlet allocation (LDA)-based features. Secondly, by letting us play with some standard classifiers, it helps us internalize the concepts of parameter selection and comparative performance. Furthermore, it lets us test the assumptions derived from the theory.
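The labeling rule above can be written down in a few lines. This is only a minimal sketch of the thresholds as described in the text; the function name `sentiment_label` is hypothetical and not part of the dataset's tooling.

```python
def sentiment_label(rating):
    """Map a 1-10 review score to 'neg', 'pos', or None (neutral)."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    # Scores 5 and 6 are treated as neutral and do not appear in the
    # labeled train/test sets.
    return None


ratings = [1, 4, 5, 6, 7, 10]
labels = [sentiment_label(r) for r in ratings]
print(labels)
```

Because scores 5 and 6 never occur, the score-prediction task only has the eight possible targets 1-4 and 7-10.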
II. TASK DEFINITION AND METHODS
The dataset has been described in the previous section. In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance gain is obtained by memorizing movie-unique terms and their association with observed labels. We represent each document in the dataset as a feature vector, with the features being either raw bag-of-words, tf-idf or LDA topics, which lets us compare the three under different settings (as these are standard document features used widely in NLP, we refrain from explaining each of them here due to space constraints; details can be found in the appendix). For the classification task, the baseline algorithm predicts the most common label. Please note that we run two sets of classification tasks: a binary classifier for sentiment analysis, and a multiclass classifier (and regressor) for score prediction. The classification problem is handled with Naive Bayes classification and SVMs (which, being standard, are also explained in the appendix). We also raise the stakes by investigating different forms of both algorithms, through a variation in kernels and types of the classifiers, so that, in effect, instead of comparing just two complementary methods, we compare a set of possible methods in a systematic way. While Naive Bayes gives us a simple and fast algorithm for classification, SVM represents a complex, time-consuming algorithm that is expected to provide higher accuracy, two aspects we investigate below.
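As an illustration of the first two feature types, the following sketch builds bag-of-words counts and applies the tf-idf weighting given in the appendix (term frequency times log2 of D over document frequency). It is a plain-Python stand-in, not the Gensim pipeline used in the experiments; the function name `tfidf_weights` and the toy documents are hypothetical, and any vector normalization a library may apply is left out.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    num_docs = len(docs)
    # Bag-of-words: raw term counts per document.
    bows = [Counter(tokens) for tokens in docs]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for bow in bows:
        doc_freq.update(bow.keys())
    # weight(i, j) = frequency(i, j) * log2(D / documentFreq(i))
    return [
        {term: count * math.log2(num_docs / doc_freq[term])
         for term, count in bow.items()}
        for bow in bows
    ]


docs = [["great", "movie", "great"], ["bad", "movie"]]
weights = tfidf_weights(docs)
print(weights)
```

A term that occurs in every document, such as "movie" here, gets weight 0, which is the sense in which tf-idf discounts uninformative words relative to raw counts.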
III. EXPERIMENT
Each document was converted to a bag-of-words representation, using the sparse-matrix libsvm format [4] for storage. These BOW representations were transformed into tf-idf and LDA-topic vector representations (500 topics). We then trained the various classifiers and tested the predictions on the test documents. We used the following standard libraries:
• LibSVM [4], for the SVM classifiers
• scikits.learn [5], for Naive Bayes classification
• Gensim [6], for document feature extraction and conversion from one feature to the other
The comparisons made were the following:
• For the multiclass SVM classification, 4 types of SVM were used, viz. two SVM classifiers, C-SVC and ν-SVC, and for SVM regression, ε-SVR and ν-SVR. While the classifiers predict the score as one of the train scores, to wit 1-4/7-10 (reviews with scores 5 and 6 are absent from the dataset because they are neutral, hence this is effectively an 8-class classification task, though we sometimes allude to it as a 10-class classification in the spirit of the score range of 1-10), the regression values are free to be real numbers that are supposed to get as close as possible to the expected integer score. We do a detailed error analysis in the following section, and the different types of SVMs are detailed in the appendix. For the binary sentiment analysis task, as expected, only the first two classifiers were used.
• For each type of SVM, 4 types of kernels were used: linear ($u^T v$), polynomial ($(\gamma u^T v + \text{coef0})^{\text{degree}}$), radial basis function (RBF, $\exp(-\gamma \|u - v\|^2)$) and sigmoid ($\tanh(\gamma u^T v + \text{coef0})$), denoted as kernel types 0-3.
• Scaling: the objective was to see how scaling would affect the SVM classification.
• For Naive Bayes classification, we used two classifiers: Multinomial Naive Bayes, a classic Naive Bayes variant used in text classification, which usually takes data represented as word count vectors, although tf-idf vectors are also known to work well in practice (and also in the present experiment), and Bernoulli Naive Bayes, which requires samples to be represented as binary-valued feature vectors, i.e. 1 for a word that is present in the document and 0 otherwise, irrespective of the frequency. (Details in appendix.)
• Performance of the three feature representations in all the above scenarios was analysed.
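The four kernels compared above can be sketched directly from their formulas. This is a plain-Python illustration on dense lists, not the LibSVM implementation; the default values chosen here for `gamma`, `coef0` and `degree` are assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # Kernel type 0: u'v
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    # Kernel type 1: (gamma * u'v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # Kernel type 2: exp(-gamma * ||u - v||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # Kernel type 3: tanh(gamma * u'v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, coef0=1.0, degree=2))
print(rbf_kernel(u, v), sigmoid_kernel(u, v))
```

Only the linear kernel avoids an extra nonlinear transformation of the inner product, which is consistent with it being the cheapest of the four to evaluate.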
We ended up testing 155 combinations of these variables, the details of which are included in a long table at the end of this report. Before we move on to the analysis, it should be noted that, for the multiclass classification task, the baseline accuracy is 20.088%, obtained when the most frequent label (score 1) is predicted. For the binary sentiment classification, since we have an equal number of instances from both classes, the base case is at chance, i.e. 50%. From the table, even basic classifiers like Naive Bayes work pretty well in these cases, with around 80% accuracy with BOW and tf-idf and over 60% with LDA (Fig. 11). Prediction accuracy with SVM varies from the base case of 50% to a maximum of 88.632%. For the 8-class classification, accuracy ranges from the base case of 20.088% to 41%, while the regression results show a mean squared error in the range of 6.8 to 28.8 and a squared correlation coefficient of 0 to 0.509 (both defined in the appendix). In general, for the binary classification task, SVM classification, except with the polynomial kernel, gives better predictions than Naive Bayes, though it takes a fairly long time, as expected. SVM shows a high error rate for multiclass classification, primarily because of the unbalanced number of training examples for different classes. Reviews with scores 10 and 1 are the most frequent in the training set, and as such, this classification is heavily biased towards them, so much so that with the default parameter values of libSVM and unscaled raw BOW data, the test documents are all classified as either 1 or 10. In the following section, we present a detailed analysis of the results and the insights gained.
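The baseline figures quoted above come from always predicting the most common label. A minimal sketch of that baseline is given below; the helper name `majority_baseline_accuracy` and the toy label lists are hypothetical and stand in for the real score labels.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == most_common_label)
    return most_common_label, 100.0 * correct / len(test_labels)


# Toy scores standing in for the real review ratings.
train_scores = [1, 1, 1, 10, 10, 4, 7]
test_scores = [1, 10, 1, 8, 3]
label, accuracy = majority_baseline_accuracy(train_scores, test_scores)
print(label, accuracy)
```

On a balanced two-class test set this baseline sits at 50%, and for score prediction it equals the share of test reviews carrying the most frequent score, which is how a figure such as 20.088% arises.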
IV. ANALYSIS
A. Time Complexity
While Naive Bayes classification ran in seconds (the exact time was not recorded, but it was always within half a minute), SVM runs took minutes to hours, the minimum being 3 min and the maximum 4 hours 35 min. This is expected because the present implementation of Naive Bayes is just a relative frequency estimator, whereas the version of SVM implemented has to map an 89,527-dimensional feature space to a higher dimension through kernel transformation and then run the optimization in that space. As far as the time complexity of features is concerned, tf-idf and raw BOW have comparable running times, with neither consistently faster than the other. LDA, however, takes much less time than the other two, a 5- to 6-fold reduction being usual, which is expected because the LDA topic model in the present experiment has only 500 features (Fig. 1, 2). As implemented, libsvm has polynomial-order running time, whereas, owing to the linear nature of the dataset, classification without kernel transformations can be done in linear time.
B. Featureeffects
Tf-idf and raw BOW results are comparable, while, in comparison, LDA results show around a 10-15% drop (Fig. 3, 4). This is expected because the former two boast of a roughly 90,000-dimensional feature space, while the latter has only 500 dimensions. In general, tf-idf is expected to fare better than raw BOW; however, though such a trend is there, it's not too pronounced, with only

regular features were scaled to have values in [0,1], as opposed to the more common practice of [-1,1], due to the sparse nature of the data: [0,1] scaling keeps missing feature values at 0, which would otherwise be -1 in many cases, leading to extra computational load without any gain in classification performance (at least for an RBF kernel with optimal parameters). While scaling is expected to improve classification results, in our experiments we didn't see a pronounced difference between runs on scaled and unscaled data, with scaled data being only a marginally better predictor in regression and almost equally good in classification (Figs. 5, 6), just like the case of tf-idf vs. BOW. This might be because scaling is relevant when the orders of magnitude of different features are unknown; in that case, scaling prevents features at the higher end of the measurement spectrum from being unfairly given larger weights. In our case, however, all the feature dimensions are word frequencies, which are comparable and not some artefact of the measuring process, so there is minimal bias of that kind for scaling to eliminate. However, scaling is also helpful in the computation process, which explains the small boost.
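The point about [0,1] scaling preserving sparsity can be made concrete with a small sketch. It assumes every feature has a minimum of 0 (reasonable for word counts, where most documents lack any given word), so that min-max scaling reduces to division by the per-feature maximum. The function names and the dict-based sparse representation are hypothetical, not the scaling tool used in the experiments.

```python
def fit_feature_maxima(sparse_docs):
    """Collect the largest value seen for each feature index in the training data."""
    maxima = {}
    for doc in sparse_docs:
        for index, value in doc.items():
            if value > maxima.get(index, 0.0):
                maxima[index] = value
    return maxima


def scale_to_unit_interval(doc, maxima):
    """Scale stored (non-zero) entries to (0, 1]; absent entries stay implicitly 0.

    Features never seen in training are dropped. Values above the training
    maximum would exceed 1, which this sketch does not clip.
    """
    return {
        index: value / maxima[index]
        for index, value in doc.items()
        if maxima.get(index, 0.0) > 0
    }


train = [{0: 2.0, 3: 4.0}, {0: 1.0}]
maxima = fit_feature_maxima(train)
scaled = [scale_to_unit_interval(doc, maxima) for doc in train]
print(scaled)
```

With a [-1,1] target range, each implicit zero would map to -1 and would have to be stored explicitly, which is the extra computational load mentioned above.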
D. Effect of types of SVM
We have used 4 types of SVMs, C- and ν-SVC for classification and ε- and ν-SVR for regression. The C and ν versions are similar, with $C \in [0, \infty)$ but $\nu \in [0, 1]$, with ν being preferred because it is related to the ratio of support vectors and the ratio of the training error, and is intuitively and aesthetically pleasing, though both SVMs are otherwise equivalent and expected to give similar results, which is evident from the graph. However, ν-SVC also shows slightly better results than its counterpart at times, especially towards the end of the 2- and 10-class classification parts of the graph (Fig. 7). This might be because the default parameter (ν = 0.5) conforms to our dataset more than the default value of C (= 1) does. Optimal parameter selection is expected to provide comparable results. In fact, as we describe in the next subsection, the optimal C was found to be 512, much larger than the default value, which might explain the poor results of C-SVC in comparison. The SVRs have a similar relationship, and their results are indeed comparable (Fig. 8).
E. Parameter setting
While the default parameters in libSVM are good for general testing, each dataset requires these parameters to be tuned properly for a better classification result. Our initial aim was to explore this scenario, but given the size of the dataset, time constraints prevented that from happening. For example, the algorithm does a grid search to discover the best C through 5-fold cross-validation on the training data. While that is manageable for datasets with a thousand instances, the running time runs into days for a dataset of 25,000 instances. One instance ran for two days without producing a result. We improvised and randomly selected a subset of 5,000 examples and ran the grid search, which took 6 hours to produce C = 512. This value was subsequently used in a binary sentiment classification using C-SVC to get 88.336% accuracy on scaled raw BOW with the RBF kernel, an improvement of 17 percentage points over the 71% obtained without parameter tuning. Notice that a high C of 512 also increases the training time, so that over the 150-odd runs of the experiment, a large C seriously strains the time constraints of a small class project. Faced with the choice between a larger C with a smaller number of runs and the default C with 150 runs, we chose the latter as the lesser of the two evils, so that we could get more insight into other aspects of the classification task like feature selection and kernel manipulation, and have an idea of the effect of parameter tuning from this one instance.
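The procedure described above (random subsample, then a grid search over C scored by 5-fold cross-validation) can be sketched as follows. Everything here is illustrative: the exponential grid of powers of two is an assumption, the function names are hypothetical, and `toy_scorer` is a stand-in that ignores the data rather than an SVM, included only so the loop runs end to end.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search_c(xs, ys, train_and_score, c_grid, k=5, seed=0):
    """Return (best_c, mean_cv_score) over c_grid using k-fold cross-validation."""
    folds = k_fold_indices(len(xs), k, seed)
    best_c, best_score = None, float("-inf")
    for c in c_grid:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(xs)) if i not in held]
            scores.append(train_and_score(
                [xs[i] for i in train_idx], [ys[i] for i in train_idx],
                [xs[i] for i in held_out], [ys[i] for i in held_out], c))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_c, best_score = c, mean_score
    return best_c, best_score


# Random subset of example indices, mirroring a 5,000-of-25,000 subsample.
subset = random.Random(0).sample(range(25000), 5000)


def toy_scorer(train_x, train_y, test_x, test_y, c):
    # Stand-in scorer (NOT an SVM): peaks at C = 2**9 regardless of the data.
    return -abs(math.log2(c) - 9)


c_grid = [2.0 ** e for e in range(-5, 16, 2)]  # assumed grid: 2^-5, 2^-3, ..., 2^15
xs = list(range(50))
ys = [i % 2 for i in xs]
best_c, best_score = grid_search_c(xs, ys, toy_scorer, c_grid)
print(best_c, best_score)
```

Each grid point costs k training runs, so an 11-value grid with 5 folds means 55 SVM fits, which helps explain why the search was only feasible on a subsample.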
F. Effect of Kernels
We notice that, predominantly, the linear kernel gives the best results, whereas the polynomial kernel gives the worst. RBF and sigmoid kernels give comparable results, with the former giving slightly better results (Fig. 9, 10). This behavior is actually expected. With polynomial kernels, numerical difficulties tend to arise, because the d-th power of a number goes to 0 or to infinity depending on whether the number is smaller or larger than 1. The kernel matrix for the sigmoid kernel is not always positive definite, and therefore its accuracy is, in general, lower than that of the RBF kernel [8]. The relation between class labels and attributes in text classification problems is primarily linear, hence the good results of the linear kernel. Also, for a large number of features, the linear kernel is supremely useful, since a nonlinear mapping doesn't improve the performance while taking up a sizable amount of time (the linear kernel has the fastest running time for the classification tasks in this experiment). The RBF kernel, however, should be able to produce comparable results with parameter tuning [9].
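The numerical difficulty mentioned for polynomial kernels is easy to see with a small worked example. The base values 0.5 and 2.0 are arbitrary choices standing in for the quantity that the polynomial kernel raises to the given degree.

```python
def poly_term(base, degree):
    """The polynomial kernel raises its inner-product term to this power."""
    return base ** degree


for degree in (1, 3, 10, 50):
    # Bases below 1 shrink toward 0, bases above 1 blow up.
    print(degree, poly_term(0.5, degree), poly_term(2.0, degree))
```

Kernel values that collapse toward 0 or grow without bound carry little usable information about how similar two documents are.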
V. CONCLUSION
The series of experiments helped us gain fruitful insight into the nuances of selecting features and deciding on classifiers in a text classification task. The results of the experiments conformed to theoretical predictions most of the time. When they seemingly didn't, they compelled us to look into the explanations, which were satisfactory and helped us understand the task better. The sheer volume of the work, a massive dataset and 150-odd runs, helped us gain research experience and strengthened our concepts. Though we used libSVM to play with SVMs, given the linear nature of the problem, a similar classification task can be completed with libLinear [10], which doesn't use kernel transformations. In fact, libsvm is O(n^2) or O(n^3) whereas liblinear is O(n), though it does not support kernel SVMs, so investigating it would be a natural extension. The next stage would be to follow a similarity-type grouping of documents for sentiment classification. The dataset contains another 50,000 reviews, which are unlabeled, and an unsupervised learning attempt on them might take this work further.
REFERENCES
[1] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278, 2004.
[2] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, pages 142-150, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[3] Dataset. http://ai.stanford.edu/~amaas/data/sentiment/index.html.
[4] LibSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[5] Scikits.learn. http://scikit-learn.org/stable/.
[6] Gensim. http://radimrehurek.com/gensim/.
[7] WS Sarle. Neural Network FAQ. ftp://ftp.sas.com/pub/neural/FAQ.html, 1997.
[8] Hsuan-Tien Lin and Chih-Jen Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, 2003.
[9] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel.
[10] LibLinear. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
APPENDIX
1) Graphs
2) Table of results
3) Algorithms
ALGORITHMS
Bag-of-Words
A dictionary was created from all the words in the train set. The size of the dictionary was the dimension of the training space, and each word represented one dimension. For each document, the value of the document along a word dimension was the frequency of that word in the document. Most documents therefore had 0 in most of the dimensions, so they were represented with a sparse matrix in libSVM format for computational efficiency.
Tf-idf
In this experiment, tf-idf assigned a weight to each dimension for each document as follows: the weight of term i in document j in a corpus of D documents is
$$\text{weight}_{i,j} = \text{frequency}_{i,j} \times \log_2\left(\frac{D}{\text{documentFreq}_i}\right)$$
LDA
The module used was Gensim's (http://radimrehurek.com/gensim/models/ldamodel.html#id2); we refrain from giving a detailed explanation because it's lengthy and LDA is well understood. We extracted 500 topics and each document was projected to this 500-dimensional space.
C-SVC
Given training vectors $x_i \in R^n$, $i = 1, \ldots, l$, and an indicator vector $y \in R^l$, C-SVC solves the optimization problem:
$$\min_{w,b,L} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} L_i$$
subject to:
$$y_i \left(w^T \phi(x_i) + b\right) \ge 1 - L_i, \quad L_i \ge 0, \quad i = 1, \ldots, l$$
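To make the C-SVC objective concrete, the sketch below evaluates it for a given weight vector and bias. It assumes the identity feature map (i.e. a linear kernel) and sets each slack L_i to its smallest feasible value, max(0, 1 - y_i (w.x_i + b)); the function name and the toy data are hypothetical, and no optimization is performed.

```python
def c_svc_objective(w, b, xs, ys, c):
    """Return (objective, slacks) for 0.5 * w.w + C * sum(L_i).

    Each slack L_i is the smallest value satisfying
    y_i * (w.x_i + b) >= 1 - L_i and L_i >= 0.
    """
    slacks = []
    for x, y in zip(xs, ys):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wj * wj for wj in w) + c * sum(slacks)
    return objective, slacks


w, b = [1.0, 0.0], 0.0
xs = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
ys = [1, 1, -1]
objective, slacks = c_svc_objective(w, b, xs, ys, c=1.0)
print(objective, slacks)
```

Only the second point falls inside the margin, so it alone contributes slack; a larger C penalizes that violation more heavily relative to the norm of w.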
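ν-SVC
$$\min_{w,b,L,\rho} \; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} L_i$$
subject to:
$$y_i \left(w^T \phi(x_i) + b\right) \ge \rho - L_i, \quad L_i \ge 0, \quad i = 1, \ldots, l, \quad \rho \ge 0$$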
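ε-SVR and ν-SVR
Please refer to [?].
Naive Bayes Classification
Both defined in detail: http://scikit-learn.org/stable/modules/naive_bayes.html
Accuracy
$$\text{Accuracy} = \frac{\text{no. of correctly predicted data}}{\text{total test data}} \times 100\%$$
Regression
Mean squared error:
$$\text{MSE} = \frac{1}{l} \sum_{i=1}^{l} \left(f(x_i) - y_i\right)^2$$
Squared correlation coefficient:
$$r^2 = \frac{\left(l \sum_{i=1}^{l} f(x_i) y_i - \sum_{i=1}^{l} f(x_i) \sum_{i=1}^{l} y_i\right)^2}{\left(l \sum_{i=1}^{l} f(x_i)^2 - \left(\sum_{i=1}^{l} f(x_i)\right)^2\right)\left(l \sum_{i=1}^{l} y_i^2 - \left(\sum_{i=1}^{l} y_i\right)^2\right)}$$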
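The three evaluation measures defined in this appendix can be computed as in the following sketch. The function names are hypothetical, and returning 0.0 when the denominator of the squared correlation coefficient vanishes (for example, for constant predictions) is a convention chosen for this example rather than something stated in the report.

```python
def accuracy_percent(predicted, actual):
    """Percentage of test items whose predicted label matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


def mean_squared_error(predicted, actual):
    """(1/l) * sum of squared differences between f(x_i) and y_i."""
    return sum((f - y) ** 2 for f, y in zip(predicted, actual)) / len(actual)


def squared_correlation(predicted, actual):
    """Squared correlation coefficient r^2 between predictions and targets."""
    l = len(actual)
    sum_f = sum(predicted)
    sum_y = sum(actual)
    sum_fy = sum(f * y for f, y in zip(predicted, actual))
    sum_ff = sum(f * f for f in predicted)
    sum_yy = sum(y * y for y in actual)
    numerator = (l * sum_fy - sum_f * sum_y) ** 2
    denominator = (l * sum_ff - sum_f ** 2) * (l * sum_yy - sum_y ** 2)
    if denominator == 0:
        return 0.0  # assumed convention for degenerate (constant) inputs
    return numerator / denominator


preds = [1, 2, 3]
targets = [2, 4, 6]
print(accuracy_percent([1, 10, 7], [1, 10, 8]))
print(mean_squared_error(preds, targets), squared_correlation(preds, targets))
```

In this example the predictions are perfectly correlated with the targets even though every one of them is wrong, which illustrates why the results table reports mean squared error alongside the correlation coefficient.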
FIGURES

Fig. 1. Time taken by classifiers arranged according to the three feature sets.
Fig. 2. Time taken in regression arranged according to the three feature sets.

Fig. 3. Accuracy of classifiers arranged according to the three feature sets.
Fig. 4. Correlation coefficient for regression arranged according to the three feature sets.

Fig. 5. Accuracy of classifiers for scaled and unscaled versions.
Fig. 6. Correlation coefficient of regression for scaled and unscaled versions.

Fig. 7. Accuracy of C-SVC and ν-SVC classifiers.
Fig. 8. Accuracy of ε-SVR and ν-SVR regression.

Fig. 9. Accuracy of classifiers for different kernels.
Fig. 10. Correlation coefficient of regression for different kernels.

Fig. 11. NaiveBayes algorithm run results.

CS 221 Sentiment Analysis Results
Classification | Data | Scaled Data? | Algorithm | Kernel Type | Prediction Accuracy | Mean Squared Error | Squared Correlation Coefficient | Duration
1 - Two Class | 1 - Raw | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 81.360% | NA | NA | Unavailable
1 - Two Class | 1 - Raw | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 83.010% | NA | NA | Unavailable
1 - Two Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 0 | 84.500% | NA | NA | 0:11:22
1 - Two Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 1 | 50.004% | NA | NA | 0:29:47
1 - Two Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 2 | 73.280% | NA | NA | 0:30:29
1 - Two Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 3 | 67.900% | NA | NA | 0:30:12
1 - Two Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 0 | 87.152% | NA | NA | 0:17:48
1 - Two Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 1 | 52.852% | NA | NA | 0:16:04
1 - Two Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 2 | 87.084% | NA | NA | 0:19:44
1 - Two Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 3 | 85.788% | NA | NA | 0:21:04
1 - Two Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 0 | 85.480% | NA | NA | 0:13:51
1 - Two Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 1 | 50.000% | NA | NA | 0:42:23
1 - Two Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 2 | 70.896% | NA | NA | 0:44:02
1 - Two Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 3 | 70.568% | NA | NA | 0:32:01
1 - Two Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 0 | 88.352% | NA | NA | 0:20:17
1 - Two Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 1 | 50.000% | NA | NA | 0:24:39
1 - Two Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 2 | 88.256% | NA | NA | 0:26:20
1 - Two Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 3 | 84.412% | NA | NA | 0:21:02
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 77.100% | NA | NA | Unavailable
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 83.010% | NA | NA | Unavailable
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 0 | 84.856% | NA | NA | 0:12:09
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 1 | 50.008% | NA | NA | 0:32:10
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 2 | 87.668% | NA | NA | 0:25:42
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 3 | 86.688% | NA | NA | 0:26:24
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 0 | 88.632% | NA | NA | 0:21:02
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 1 | 56.508% | NA | NA | 0:29:34

1 - Two Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 2 | 88.524% | NA | NA | 0:22:47
1 - Two Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 3 | 88.588% | NA | NA | 0:23:31
1 - Two Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 0 | 85.600% | NA | NA | 0:14:03
1 - Two Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 1 | 50.000% | NA | NA | 0:34:24
1 - Two Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 2 | 50.000% | NA | NA | 0:36:13
1 - Two Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 3 | 50.000% | NA | NA | 0:33:11
1 - Two Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 0 | 88.400% | NA | NA | 0:21:56
1 - Two Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 1 | 50.008% | NA | NA | 0:16:28
1 - Two Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 2 | 88.312% | NA | NA | 0:21:14
1 - Two Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 3 | 85.636% | NA | NA | 0:34:06
1 - Two Class | 3 - LDA | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 66.300% | NA | NA | Unavailable
1 - Two Class | 3 - LDA | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 68.320% | NA | NA | Unavailable
1 - Two Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 0 | 66.133% | NA | NA | 0:05:47
1 - Two Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 1 | 50.472% | NA | NA | 0:06:53
1 - Two Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 2 | 51.248% | NA | NA | 0:08:02
1 - Two Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 3 | 51.240% | NA | NA | 0:08:03
1 - Two Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 0 | 62.837% | NA | NA | 0:03:54
1 - Two Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 1 | 50.224% | NA | NA | 0:03:22
1 - Two Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 2 | 54.152% | NA | NA | 0:04:24
1 - Two Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 3 | 60.881% | NA | NA | 0:04:08
1 - Two Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 0 | 67.860% | NA | NA | 0:05:56
1 - Two Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 1 | 50.124% | NA | NA | 0:07:33
1 - Two Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 2 | 53.768% | NA | NA | 0:08:41
1 - Two Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 3 | 53.744% | NA | NA | 0:08:13
1 - Two Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 0 | 62.744% | NA | NA | 0:04:13
1 - Two Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 1 | 50.040% | NA | NA | 0:03:45
1 - Two Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 2 | 58.064% | NA | NA | 0:04:53
1 - Two Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 3 | 62.776% | NA | NA | 0:05:14
2 - Ten Class | 1 - Raw | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 38.460% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 38.760% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 0 | 35.496% | NA | NA | Unavailable

2 - Ten Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 1 | 20.088% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 2 | 25.996% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 3 - C-SVC | 3 | 21.960% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 0 | 39.816% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 1 | 20.804% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 2 | 39.688% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 4 - nuSVC | 3 | 38.096% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 5 - nuSVR | 0 | NA | 15.625 | 0.274 | 4:35:33
2 - Ten Class | 1 - Raw | 1 - Unscaled | 5 - nuSVR | 1 | NA | 12.185 | 0.010 | 0:16:29
2 - Ten Class | 1 - Raw | 1 - Unscaled | 5 - nuSVR | 3 | NA | 11.303 | 0.191 | 0:17:08
2 - Ten Class | 1 - Raw | 1 - Unscaled | 6 - EpsilonSVR | 0 | NA | 15.185 | 0.280 | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 6 - EpsilonSVR | 1 | NA | 12.687 | 0.002 | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 6 - EpsilonSVR | 2 | NA | 10.312 | 0.184 | Unavailable
2 - Ten Class | 1 - Raw | 1 - Unscaled | 6 - EpsilonSVR | 3 | NA | 11.105 | 0.129 | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 0 | 36.732% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 1 | 20.088% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 2 | 20.088% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 3 - C-SVC | 3 | 20.088% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 0 | 40.484% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 1 | 9.376% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 2 | 36.500% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 4 - nuSVC | 3 | 31.220% | NA | NA | Unavailable
2 - Ten Class | 1 - Raw | 2 - Scaled | 5 - nuSVR | 0 | NA | 10.171 | 0.370 | 0:51:14
2 - Ten Class | 1 - Raw | 2 - Scaled | 5 - nuSVR | 1 | NA | 12.186 | 0.000 | 0:16:27
2 - Ten Class | 1 - Raw | 2 - Scaled | 5 - nuSVR | 2 | NA | 12.167 | 0.412 | 0:56:42
2 - Ten Class | 1 - Raw | 2 - Scaled | 5 - nuSVR | 3 | NA | 12.177 | 0.414 | 0:16:42
2 - Ten Class | 1 - Raw | 2 - Scaled | 6 - EpsilonSVR | 0 | NA | 9.949 | 0.376 | 0:39:06
2 - Ten Class | 1 - Raw | 2 - Scaled | 6 - EpsilonSVR | 1 | NA | 12.186 | 0.000 | 0:35:10
2 - Ten Class | 1 - Raw | 2 - Scaled | 6 - EpsilonSVR | 2 | NA | 12.148 | 0.359 | 0:32:49
2 - Ten Class | 1 - Raw | 2 - Scaled | 6 - EpsilonSVR | 3 | NA | 12.167 | 0.359 | 0:33:21
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 31.520% | NA | NA | Unavailable

2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 38.760% | NA | NA | Unavailable
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 0 | 36.020% | NA | NA | 0:30:07
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 1 | 20.088% | NA | NA | 0:42:26
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 2 | 37.720% | NA | NA | 0:40:24
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 3 - C-SVC | 3 | 37.096% | NA | NA | 0:40:32
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 0 | 40.808% | NA | NA | 0:38:14
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 1 | 20.780% | NA | NA | 0:40:27
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 2 | 40.928% | NA | NA | 0:41:12
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 4 - nuSVC | 3 | 40.816% | NA | NA | 0:40:09
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 5 - nuSVR | 0 | NA | 28.811 | 0.180 | 1:30:37
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 5 - nuSVR | 1 | NA | 12.186 | 0.003 | 0:20:00
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 5 - nuSVR | 2 | NA | 7.979 | 0.509 | 0:19:56
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 5 - nuSVR | 3 | NA | 9.092 | 0.453 | 0:25:29
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 6 - EpsilonSVR | 0 | NA | 26.983 | 0.190 | 0:49:05
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 6 - EpsilonSVR | 1 | NA | 12.285 | 0.001 | 0:32:20
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 6 - EpsilonSVR | 2 | NA | 6.769 | 0.509 | 0:34:05
2 - Ten Class | 2 - TFIDF | 1 - Unscaled | 6 - EpsilonSVR | 3 | NA | 7.902 | 0.446 | 0:35:22
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 0 | 36.768% | NA | NA | 0:47:42
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 1 | 20.088% | NA | NA | 0:53:07
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 2 | 20.088% | NA | NA | 0:57:09
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 3 - C-SVC | 3 | 20.088% | NA | NA | 0:59:43
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 0 | 40.492% | NA | NA | 0:54:54
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 1 | 9.668% | NA | NA | 0:27:47
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 2 | 35.812% | NA | NA | 0:41:45
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 4 - nuSVC | 3 | 31.724% | NA | NA | 0:36:20
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 5 - nuSVR | 0 | NA | 9.953 | 0.377 | 1:02:48
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 5 - nuSVR | 1 | NA | 12.186 | 0.000 | 0:22:07
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 5 - nuSVR | 2 | NA | 12.167 | 0.413 | 0:21:51
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 5 - nuSVR | 3 | NA | 12.176 | 0.414 | 0:21:11
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 6 - EpsilonSVR | 0 | NA | 9.747 | 0.383 | 0:40:27
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 6 - EpsilonSVR | 1 | NA | 14.110 | 0.000 | 0:37:38

2 - Ten Class | 2 - TFIDF | 2 - Scaled | 6 - EpsilonSVR | 2 | NA | 13.960 | 0.359 | 0:46:16
2 - Ten Class | 2 - TFIDF | 2 - Scaled | 6 - EpsilonSVR | 3 | NA | 14.034 | 0.359 | 0:43:54
2 - Ten Class | 3 - LDA | 1 - Unscaled | 1 - Multinomial Naive Bayes | NA | 26.320% | NA | NA | Unavailable
2 - Ten Class | 3 - LDA | 1 - Unscaled | 2 - Bernoulli Naive Bayes | NA | 29.250% | NA | NA | Unavailable
2 - Ten Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 0 | 28.150% | NA | NA | 0:08:55
2 - Ten Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 1 | 20.082% | NA | NA | 0:10:08
2 - Ten Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 2 | 20.082% | NA | NA | 0:11:04
2 - Ten Class | 3 - LDA | 1 - Unscaled | 3 - C-SVC | 3 | 20.082% | NA | NA | 0:10:38
2 - Ten Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 0 | 24.442% | NA | NA | 0:07:39
2 - Ten Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 1 | 10.953% | NA | NA | 0:05:33
2 - Ten Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 2 | 24.686% | NA | NA | 0:08:47
2 - Ten Class | 3 - LDA | 1 - Unscaled | 4 - nuSVC | 3 | 23.622% | NA | NA | 0:08:57
2 - Ten Class | 3 - LDA | 1 - Unscaled | 5 - nuSVR | 0 | NA | 10.617 | 0.157 | 0:04:13
2 - Ten Class | 3 - LDA | 1 - Unscaled | 5 - nuSVR | 1 | NA | 12.185 | 0.000 | 0:04:18
2 - Ten Class | 3 - LDA | 1 - Unscaled | 5 - nuSVR | 2 | NA | 12.124 | 0.053 | 0:05:01
2 - Ten Class | 3 - LDA | 1 - Unscaled | 5 - nuSVR | 3 | NA | 12.155 | 0.053 | 0:11:27
2 - Ten Class | 3 - LDA | 1 - Unscaled | 6 - EpsilonSVR | 0 | NA | 10.835 | 0.128 | 0:07:02
2 - Ten Class | 3 - LDA | 1 - Unscaled | 6 - EpsilonSVR | 1 | NA | 12.185 | 0.000 | 0:07:30
2 - Ten Class | 3 - LDA | 1 - Unscaled | 6 - EpsilonSVR | 2 | NA | 12.081 | 0.041 | 0:09:13
2 - Ten Class | 3 - LDA | 1 - Unscaled | 6 - EpsilonSVR | 3 | NA | 12.119 | 0.041 | 0:09:44
2 - Ten Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 0 | 29.236% | NA | NA | 0:11:24
2 - Ten Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 1 | 20.088% | NA | NA | 0:09:28
2 - Ten Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 2 | 20.088% | NA | NA | 0:12:09
2 - Ten Class | 3 - LDA | 2 - Scaled | 3 - C-SVC | 3 | 20.088% | NA | NA | 0:11:01
2 - Ten Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 0 | 25.004% | NA | NA | 0:10:56
2 - Ten Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 1 | 16.352% | NA | NA | 0:06:28
2 - Ten Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 2 | 22.168% | NA | NA | 0:10:06
2 - Ten Class | 3 - LDA | 2 - Scaled | 4 - nuSVC | 3 | 23.992% | NA | NA | 0:08:03
2 - Ten Class | 3 - LDA | 2 - Scaled | 5 - nuSVR | 0 | NA | 10.169 | 0.182 | 0:04:49
2 - Ten Class | 3 - LDA | 2 - Scaled | 5 - nuSVR | 1 | NA | 12.186 | 0.000 | 0:04:41
2 - Ten Class | 3 - LDA | 2 - Scaled | 5 - nuSVR | 2 | NA | 12.058 | 0.078 | 0:05:10

2 - Ten Class | 3 - LDA | 2 - Scaled | 5 - nuSVR | 3 | NA | 12.121 | 0.077 | 0:08:30
2 - Ten Class | 3 - LDA | 2 - Scaled | 6 - EpsilonSVR | 0 | NA | 10.313 | 0.171 | 0:07:59
2 - Ten Class | 3 - LDA | 2 - Scaled | 6 - EpsilonSVR | 1 | NA | 12.186 | 0.000 | 0:08:02
2 - Ten Class | 3 - LDA | 2 - Scaled | 6 - EpsilonSVR | 2 | NA | 11.993 | 0.059 | 0:08:53
2 - Ten Class | 3 - LDA | 2 - Scaled | 6 - EpsilonSVR | 3 | NA | 12.060 | 0.059 | 0:09:08