

SEEE DIGIBOOK ON ENGINEERING & TECHNOLOGY, VOL. 01, MAY 2018 ALTERNATE ENERGY TECHNOLOGIES

978-81-933187-0-6 © 2018 SEEEPEDIA.ORG Society for Engineering Education Enrichment

P. Masethungh, [email protected] Prof. S. Kumaresan, [email protected]

Centroid Selection Approaches for K-Means Clustering based Recommender Systems

P. Masethungh, S. Kumaresan, Government College of Technology, Coimbatore, India. [email protected], [email protected]

Abstract— A recommender system suggests products to users based on the preferences of similar users. It is used in e-commerce systems and helps to match users with items. When an active user searches for an item, the system recommends products based on the items searched by other users with similar interests, and so avoids the information overload problem. Recommender systems suffer from scalability and sparsity issues, and the traditional methods employed so far have not solved them effectively. This paper addresses these issues using K-Means clustering algorithms. Different centroid selection algorithms are studied to determine their effect on the performance of the K-Means algorithm. The proposed algorithm uses model-based collaborative filtering with an offline clustering method, which makes searching easier and reduces search latency. In this way the drawbacks of memory-based collaborative filtering are overcome. The model-based collaborative filtering is implemented using several variants of the K-Means clustering algorithm.

Index Terms— Data Mining, Clustering, Content-Based Filtering, Collaborative-Based Filtering, Recommender System, Memory-Based Recommender System, Model-Based Recommender System.

I. INTRODUCTION

A. DATA MINING

Data mining, also known as knowledge discovery in databases, has been recognized as a new area of database research. It is the process of analyzing data from different perspectives and summarizing it into useful information, that is, information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

B. CLUSTERING

Clustering is the process of partitioning a set of data into a set of meaningful classes or sub-classes, called clusters. It helps users understand the grouping or structure in a data set. It can be used as a stand-alone tool to examine the data distribution or as a preprocessing step for other algorithms. Many clustering methods can be applied to a dataset in order to partition it, and the choice of algorithm depends on the characteristics of the data set.

Types of clustering
1. Centroid-Based Clustering
2. Distribution-Based Clustering
3. Connectivity-Based Clustering
4. Density-Based Clustering


C. RECOMMENDER SYSTEM

A recommender system is a system that performs information filtering to bring information items such as movies, music, news, images, web pages and tools to a user [2]. A new user who searches for items without any prior knowledge spends a great deal of time doing so, which motivates the use of recommender systems. The need for recommendations from trusted sources arises when personal experience is insufficient to make a choice [1][2].

RECOMMENDER SYSTEM TYPES

There are two types of recommender systems:
1. Content Based Filtering Recommender System
2. Collaborative Based Filtering Recommender System

1. CONTENT BASED FILTERING RECOMMENDER SYSTEM

The content-based filtering approach uses textual (content) features of items in order to make recommendations. These approaches train machine learning classifiers over user and item profiles. The approach consists of analyzing the content of the items being recommended: it computes the similarity between items or users, queries items that are similar to a given item, and matches an item's content against the user's profile. This is fairly easy for text but difficult for music and video, except through their digital signals. Content data also carries a lot of noise (for example misplaced tags and attacks), and the process is time consuming [1][2].

2. COLLABORATIVE BASED FILTERING RECOMMENDER SYSTEM

Collaborative filtering tries to predict the opinion a user will have on different items and recommends the best items to each user based on the user's previous likings and the opinions of other like-minded users [1]. In collaborative filtering, the similarities between the different items in the dataset are calculated using one of a number of similarity measures, and these similarity values are then used to predict ratings for user-item pairs not present in the dataset [1][2][3]. The similarity between two items is measured by observing all the users who have rated both items.
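The item-to-item computation described above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is used as one possible similarity measure, the prediction is a similarity-weighted average, and the toy `ratings` dictionary and the function names are hypothetical rather than taken from the paper.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Missing pairs are unrated.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 2},
}

def item_similarity(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j)

def predict(ratings, user, item):
    """Similarity-weighted average of the user's ratings on the other items."""
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = item_similarity(ratings, item, other)
        num += s * r
        den += abs(s)
    return num / den if den else None

# Predict the rating of user u2 for item C, a pair absent from the data.
print(predict(ratings, "u2", "C"))
```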

TYPES OF COLLABORATIVE FILTERING

There are two types of Collaborative Filtering Recommender Systems.

1. Model-Based Collaborative Filtering
2. Memory-Based Collaborative Filtering

1. MODEL-BASED COLLABORATIVE FILTERING

Model-based approaches first train a model on training data and then make predictions for real data. Usually, these models are based on clustering or classification techniques and are used to find patterns in the training set. The user database is used to estimate or learn a model of user ratings, and new data is then run through the model to obtain a predicted output, namely the expected value of a user's rating for an item given his or her ratings on the other items. The model is a static structure, so in dynamic domains it can soon become inaccurate.
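A minimal sketch of the "train offline, predict online" idea, assuming the model is a set of cluster centroids that store a mean rating per item. The centroid values, the `None` marker for unrated items and the function names are hypothetical illustrations, not the paper's implementation.

```python
def nearest_centroid(user_vec, centroids):
    """Index of the centroid closest to user_vec, using rated items only."""
    def dist(c):
        return sum((user_vec[i] - c[i]) ** 2
                   for i in range(len(c)) if user_vec[i] is not None)
    return min(range(len(centroids)), key=lambda k: dist(centroids[k]))

def predict_from_model(user_vec, centroids, item):
    """Predict a rating as the mean rating of the user's cluster for that item."""
    return centroids[nearest_centroid(user_vec, centroids)][item]

# Hypothetical model learned offline: one mean-rating vector per cluster.
centroids = [[4.5, 4.0, 1.5], [1.5, 2.0, 4.5]]
# Active user rated items 0 and 1; item 2 is unrated (None).
active_user = [5, 4, None]

print(predict_from_model(active_user, centroids, 2))
```

At prediction time only the k centroids are compared with the active user, which is why the online cost does not depend on the number of stored users.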

2. MEMORY-BASED COLLABORATIVE FILTERING

Memory-based approaches make predictions directly from the stored rating data, taking into account the active user's ratings. All the ratings provided by the users are kept in memory and used for predictions [2]. To compute the similarity between items or users, all the previously rated items are considered. Because no model is built in advance, the cost of each prediction grows with the number of users and items.
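For contrast with the model-based case, a memory-based prediction can be sketched as a nearest-neighbour search over all stored users. Cosine similarity over co-rated items, the neighbourhood size `k`, and the toy data are assumptions made for illustration only.

```python
from math import sqrt

# Hypothetical toy data kept entirely in memory: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 5, "B": 5, "C": 2},
    "carol": {"A": 1, "B": 2, "C": 5},
    "dave":  {"A": 4, "C": 1},
}

def user_similarity(a, b):
    """Cosine similarity between two users over their co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_memory_based(ratings, user, item, k=2):
    """Weighted average over the k most similar users who rated the item."""
    # Every stored user is scanned here; this is the step that scales poorly.
    scored = [
        (user_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    neighbours = sorted(scored, reverse=True)[:k]
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return None
    return sum(s * r for s, r in neighbours) / weight

print(predict_memory_based(ratings, "alice", "C"))
```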

II. PROBLEM STATEMENT

2.1. EXISTING APPROACH

The existing approach to selecting centroids in the K-Means algorithm has the following limitations. Euclidean distance measures can weight underlying factors unequally. Choosing the cluster centres randomly does not reliably lead to a good result. The algorithm is applicable only when a mean is defined, so it fails for categorical data. It is unable to handle noisy data and outliers. It also fails for non-linear data sets.


2.2. PROBLEM STATEMENT

Among the types of recommender systems, we use collaborative filtering rather than content-based filtering, because content-based filtering relies on textual (content) features of items in order to make recommendations, whereas collaborative filtering is considered the most popular approach for recommendation systems. Collaborative filtering takes into account the interests of similar users, under the assumption that active users will be interested in items that users similar to them have rated highly. The major concerns for K-Means clustering based recommendation algorithms are accuracy, scalability, cluster quality, coverage, robustness to sparsity, and the cold-start problem. Collaborative filtering based recommender systems are unable to provide reliable recommendations for items that have received only a few ratings, or sometimes simply ignore them. This is known as the long tail problem, and most of the items in recommender systems fall into this category. As these items cannot be overlooked, there is a need for an algorithm that can filter the long tail and provide accurate recommendations from it. In the collaborative filtering approach it is also hard to obtain ratings for a new item from a significant number of users; this is called the new-item cold-start problem. Conventional K-Means algorithms choose initial centroids randomly and can therefore converge to local optima, resulting in poor quality clusters. We want a recommendation algorithm that is more accurate than the conventional K-Means based CF algorithms, has maximum coverage, scales gracefully as the data grows, and produces good quality clusters.
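To illustrate what a non-random centroid selection can look like, the sketch below uses k-means++ seeding, a widely used alternative to uniform random initialisation. It is shown only as an example of the general idea; it is not claimed to be one of the selection approaches evaluated in this paper, and the function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng=None):
    """k-means++ seeding: spread the initial centroids across the data.

    Assumes points contains at least k distinct vectors.
    """
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to d2,
        # so points far from existing centroids are favoured.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Hypothetical two-dimensional user vectors forming two obvious groups.
users = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_pp_init(users, 2))
```

Because an already chosen point has zero weight, it cannot be selected twice, which avoids the duplicate seeds that uniform sampling with replacement could produce.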

III. LITERATURE SURVEY

4.1 NOVEL CENTROID SELECTION APPROACHES FOR K-MEANS CLUSTERING BASED RECOMMENDER SYSTEMS, SOBIA ZAHRA, MUSTANSAR ALI GHAZANFAR, ASRA KHALID, MUHAMMAD AWAIS AZAM, USMAN NAEEM, ADAM PRUGEL BENNETT, [1]

This paper presents a K-Means clustering based recommendation algorithm that addresses the scalability issues associated with traditional recommender systems. The issue with traditional K-Means clustering algorithms is that they choose the initial centroids randomly, which leads to inaccurate recommendations and increased cost for offline training of clusters. The proposed centroid selection methods exploit underlying data correlation structures and are reported to achieve better accuracy and performance than the traditional strategy of choosing centroids randomly. Centroid selection in K-Means based recommender systems can therefore improve performance as well as save cost.

Recommendations can be presented to an active user in two different ways:

1. Predicting the ratings of items that the user has not seen.

2. Constructing a list of items ordered by the user's preferences, which is known as top-N recommendation.
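The two presentation modes are closely related: once ratings have been predicted, a top-N list is obtained by ranking the unseen items. A minimal sketch, with hypothetical predicted scores and a hypothetical `top_n` helper:

```python
def top_n(predicted, seen, n):
    """Return up to n unseen items, highest predicted rating first."""
    candidates = [(score, item) for item, score in predicted.items()
                  if item not in seen]
    # Sort by descending score; break ties alphabetically for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, item in candidates[:n]]

# Hypothetical predicted ratings for one active user.
predicted = {"A": 4.6, "B": 3.1, "C": 4.9, "D": 2.0}
print(top_n(predicted, seen={"A"}, n=2))
```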

The paper proposes centroid selection approaches in K-Means clustering for improving the recommendation process of recommender systems. These selection approaches are applied alongside traditional K-Means in order to compare their performance.

4.2 RECOMMENDER SYSTEM FOR PREDICTING STUDENT PERFORMANCE, NGUYEN THAI-NGHE, LUCAS DRUMOND, ARTUS KROHN-GRIMBERGHE, LARS SCHMIDT-THIEME, [2]

This paper uses a recommender system in an e-learning setting, where it recommends resources (books, papers, etc.) to learners (users or students). Recommender systems focus on reducing information overload and act as information filters. The paper applies collaborative filtering and matrix factorization. These methods suffer from the "new-item" problem, so traditional regression methods (logistic regression or linear regression) are used to overcome it. The aim of a recommender system is to make vast catalogs of products consumable by learning user preferences and applying them to items formerly unknown to the user, and so to recommend what has a high likelihood of being interesting to the target user. The two most common tasks in recommender systems are top-N item recommendation, where the recommender suggests a ranked list of (at most) N items i ∈ I to a user u ∈ U, and rating prediction, where the aim is to predict the preference score (rating) r ∈ R for a given user-item combination. For item recommendation, the training data is usually unary information on items being viewed, clicked, purchased, etc. by the respective users. Rating prediction mainly uses the rating information itself as training data. In the early days of recommender systems, content was deemed very valuable training data, and research data sets contained a lot of attribute information for algorithm training. Since the late nineties, however, the so-called collaborative filtering approach has prevailed. Collaborative filtering is based on the assumption that similar users like similar things and, being content-agnostic, focuses only on the past ratings assigned. The paper makes use of matrix factorization, which is known to be one of the most successful methods for rating prediction and is reported to outperform other state-of-the-art methods.
4.3 PERFORMANCE IMPROVEMENT OF A MOVIE RECOMMENDER SYSTEM BASED ON PERSONAL PROPENSITY AND SECURE COLLABORATIVE FILTERING, WOON-HAE JEONG, SE-JUN KIM, DOO-SOON PARK AND JIN KWAK, [3]

This paper improves the performance of a movie recommender system based on personal propensity and secure collaborative filtering. Collaborative filtering systems have sparsity, scalability and transparency issues, as well as security issues in the collection of the information that forms the basis for preparing user profiles. The paper uses personal propensity together with logistic regression, so that the model avoids the scalability and sparsity problems.

4.4 MULTICRITERIA BASED RESTAURANT RECOMMENDER SYSTEM, GEDIMINAS ADOMAVICIUS, YOUNGOK KWON, [4]

In a restaurant recommender system, restaurants are recommended based on user preferences such as cost, service, distance and quality. The system recommends restaurants based on previous ratings and on the similarity between previous users' preferences and the current user's preferences. The problem addressed is multi-criteria rating: the approach considers all criteria rather than a single one. The proposed approaches are a similarity-based approach and an aggregation-function-based approach. These approaches improve recommendation accuracy.

4.5 A NOVEL COLLABORATIVE FILTERING RECOMMENDATION SYSTEM ALGORITHM, QI WANG, WEI CAO AND YUN LIU, [5]

This paper uses a collaborative filtering approach, which computes the similarity of items or users according to a user-item rating matrix. The rating matrix it constructs is sparse, so the algorithm suffers from the data sparsity problem. The paper proposes an improved clustering based collaborative filtering algorithm, using K-Means, for dealing with data sparsity. This algorithm achieves better accuracy.

4.6 ITEM-BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHMS, BADRUL SARWAR, GEORGE KARYPIS, JOSEPH KONSTAN, AND JOHN RIEDL, [6]

Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are producing high quality recommendations, performing many recommendations per second for millions of users and items, and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues the paper explores item-based collaborative filtering techniques. Item-based techniques analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.
4.7 THE COLLABORATIVE FILTERING RECOMMENDATION BASED ON SOM CLUSTER-INDEXING CBR, TAE HYUP ROH, KYONG JOO OH, INGOO HAN, [7]

This paper uses a collaborative filtering recommender system. The main concerns of the algorithm are prediction accuracy and speed of response. The problems identified are the data sparsity problem and the scalability problem. To overcome them, the paper proposes a collaborative filtering recommendation model with a three-step process:

1. Profiling -> The process of examining the data available in an existing data source, and collecting statistics and information about that data.

2. Inferring

3. Predicting

The model combines a collaborative filtering algorithm with two machine learning processes, SOM (Self-Organizing Map) and CBR (Case Based Reasoning), by changing an unsupervised clustering problem into a supervised user preference learning problem. The paper proposes a Structure Conduct Performance (SCP) model which applies the two combined machine learning techniques, SOM and CBR. It is a new approach in the collaborative filtering recommendation field.

4.8 PROBABILISTIC MEMORY-BASED COLLABORATIVE FILTERING, KAI YU, ANTON SCHWAIGHOFER, VOLKER TRESP, XIAOWEI XU, AND HANS-PETER KRIEGEL, [8]

This paper uses probabilistic memory-based collaborative filtering and a probabilistic active learning method in recommender systems. The existing method is traditional memory-based collaborative filtering, which suffers from the "new user problem". The paper uses probabilistic active learning to overcome the new user problem. The other method, Probabilistic Memory-Based Collaborative Filtering (PMCF), is used to make the prediction of user preferences more accurate and efficient.

4.9 SCALABLE COLLABORATIVE FILTERING USING CLUSTER-BASED SMOOTHING, GUI-RONG XUE, CHENXI LIN, QIANG YANG, WENSI XI, HUA-JUN ZENG, YONG YU, ZHENG CHEN, [9]

This paper uses memory-based collaborative filtering, which identifies the similarity between two users by comparing their ratings on a set of items. This model has two kinds of problems, the sparsity problem and the scalability problem. Model-based collaborative filtering can be used to avoid these problems, but such models tend to limit the range of users. The paper therefore introduces two kinds of approaches:

1. Smoothing-based: solves the missing-value problem, improves accuracy and addresses the scalability problem. (Smoothing is used to remove noise from a dataset.)

2. Neighborhood

These two approaches improve accuracy and increase efficiency, and are compared against state-of-the-art collaborative filtering algorithms.

4.10 COLLABORATIVE FILTERING USING ORTHOGONAL NONNEGATIVE MATRIX TRI-FACTORIZATION, GANG CHEN, FEI WANG, CHANGSHUI ZHANG, [10]

This paper uses a model for collaborative filtering. The aim of the model is to predict a test user's ratings for new items by integrating the rating information of other like-minded users. Traditional collaborative filtering can be divided into two models:

1. Memory Based -> This model has two problems, sparsity and scalability.

2. Model Based -> Establishing the model is too costly.

To overcome these problems, the paper proposes a novel collaborative filtering approach that applies ONMTF (Orthogonal Nonnegative Matrix Tri-Factorization). The sparsity problem is avoided using matrix factorization, and the scalability problem is solved by clustering the user-item matrix. The algorithm thus addresses both sparsity and scalability and achieves good performance. The proposed system is a hybrid collaborative filtering recommender system.

4.11 COMBINING CONTENT-BASED AND COLLABORATIVE RECOMMENDATIONS: A HYBRID APPROACH BASED ON BAYESIAN NETWORKS, LUIS M. DE CAMPOS, JUAN M. FERNÁNDEZ-LUNA, JUAN F. HUETE, MIGUEL A. RUEDA-MORALES, [11]


This paper uses a hybrid approach based on Bayesian networks. A hybrid recommender system is the combination of a content-based recommender system and a collaborative recommender system.

Memory-based collaborative filtering has two problems, sparsity and scalability, while model-based collaborative filtering is too costly to establish. These problems can be avoided by using a hybrid recommender technique, which improves the quality of the recommender system. The hybrid recommender technique avoids the cold-start problem by using boosting algorithms. The paper uses a probabilistic model and proposes a hybrid recommender model based on Bayesian networks.

4.12 A COLLABORATIVE FILTERING RECOMMENDATION BASED ON USER PROFILE AND USER BEHAVIOR IN ONLINE SOCIAL NETWORKS, LU YANG, ANILKUMAR KOTHALIL GOPALAKRISHNAN, [12]

The aim of this paper is to measure the similarity among users and items in a social network. It is based on a collaborative filtering (CF) algorithm and the SimRank algorithm (similarity based on random walk). The collaborative filtering algorithm is used to predict the relationship between users' ratings on items and users' profiles. The SimRank algorithm is used to calculate the similarity among users and to find the nearest neighbors of each user in the social network. The paper focuses on the social recommendation problem. Together, the two algorithms improve the prediction accuracy of the recommender system.

4.13 HYBRID USER-ITEM BASED COLLABORATIVE FILTERING, NITIN PRADEEP KUMAR, ZHENZHEN FAN, [13]

This paper uses a hybrid user-item based CF. Traditional collaborative filtering algorithms used in recommendation systems face two major issues, data sparsity and scalability. To overcome these problems, the paper uses CBR and SOM within a hybrid user-item collaborative filtering scheme.

1. CBR -> Case Based Reasoning. Combined with average filling, it is used to solve the sparsity issue.

2. SOM -> Self-Organizing Map. Optimized with a GA (Genetic Algorithm), it is used to solve the scalability problem for item-based collaborative filtering.

The paper proposes a hybrid user-item based CF to achieve a more personalized product recommendation for a user.

K-MEANS CLUSTERING ALGORITHM

1. Define the desired number of clusters, k.
2. Choose k users uniformly at random from U as the initial starting points.
3. Assign each user to the cluster with the nearest centroid.
4. Calculate the mean of each cluster and update its centroid to that mean.
5. Repeat steps 3 and 4 until no user changes its cluster membership or some other convergence criterion is met.
6. Return the k centroids {c1, c2, ..., ck}.
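The six steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming each user is a dense numeric rating vector and that squared Euclidean distance defines "nearest"; the function names, the `max_iter` guard and the fixed `seed` are choices made here for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def kmeans(users, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(users, k)              # step 2: random initial users
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each user to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(u, centroids[c]))
            for u in users
        ]
        if new_assignment == assignment:          # step 5: memberships stable
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its members.
        for c in range(k):
            members = [u for u, a in zip(users, assignment) if a == c]
            if members:                           # keep old centroid if empty
                centroids[c] = mean(members)
    return centroids, assignment                  # step 6

# Hypothetical rating vectors for four users over two items.
users = [(1, 1), (1, 2), (9, 9), (10, 9)]
centroids, assignment = kmeans(users, k=2)
print(centroids, assignment)
```

Step 2 is the only place where randomness enters, which is why the choice of initial centroids determines which local optimum the loop converges to.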

The main design objectives are scalability, robustness to sparsity, accuracy and cluster quality.

IV. PROPOSED METHODOLOGY

[Figure: flow diagram, K-Means clustering]

DATA PREPROCESSING

Data preparation is the process of collecting, cleaning and consolidating data into one file or data table. Collected data cannot be used directly for analysis, so data preparation is carried out first. Two techniques are used:

1. Data Preprocessing
2. Data Wrangling

CLUSTERING ANALYSIS

Clustering is the process of partitioning a set of data into a set of meaningful classes or sub-classes, called clusters. It helps users understand the grouping or structure in a data set. It can be used as a stand-alone tool to examine the data distribution or as a preprocessing step for other algorithms. Many clustering methods can be applied to a dataset in order to partition it, and the choice of algorithm depends on the characteristics of the data set.

USER AND ITEM-BASED FILTERING

User and item based collaborative filtering recommends items on the basis of a similarity matrix. This algorithm is efficient and scalable. In this project we use the demo MovieLens dataset. The procedure identifies which items are similar in terms of having been purchased by the same people, and recommends to a new user the items that are similar to that user's purchases.
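The "purchased by the same people" criterion can be illustrated with a set-overlap measure. The sketch below assumes Jaccard similarity between the sets of buyers of two items; the measure, the toy `purchases` data and the function names are illustrative assumptions, not the similarity matrix actually used in this work.

```python
# Hypothetical purchase history: user -> set of purchased items.
purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"C", "D"},
    "u4": {"A", "C"},
}

def buyers(purchases, item):
    """Set of users who purchased the given item."""
    return {u for u, items in purchases.items() if item in items}

def jaccard(purchases, i, j):
    """Share of buyers that items i and j have in common."""
    bi, bj = buyers(purchases, i), buyers(purchases, j)
    union = bi | bj
    return len(bi & bj) / len(union) if union else 0.0

def recommend(purchases, basket, n=1):
    """Rank items outside the basket by total similarity to the basket."""
    catalogue = set().union(*purchases.values())
    scores = {
        c: sum(jaccard(purchases, c, b) for b in basket)
        for c in catalogue - set(basket)
    }
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]

# A new user who has only bought item B.
print(recommend(purchases, {"B"}))
```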

K-MEANS CLUSTERING

K-Means is a clustering approach that belongs to the class of unsupervised statistical learning methods and is very popular in a variety of domains. One of the first steps in building a K-Means clustering workflow is to define the number of clusters to work with. Subsequently, the algorithm assigns each individual data point to one of the clusters in a random fashion. The underlying idea of the algorithm is that a good cluster is one with the smallest possible within-cluster variation of its observations in relation to each other. The most common way to define this variation is the squared Euclidean distance.
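The within-cluster variation can be made concrete as the sum of squared Euclidean distances from each point to the centroid of its cluster. A minimal sketch, with hypothetical points and function names:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_cluster_variation(points, assignment, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))

# Hypothetical data: two points in cluster 0, one point in cluster 1.
points = [(0, 0), (0, 2), (10, 0)]
assignment = [0, 0, 1]
centroids = [(0, 1), (10, 0)]

print(within_cluster_variation(points, assignment, centroids))
```

K-Means lowers this quantity at every assignment and update step, so comparing its final value across different initialisations is one simple way to compare centroid selection strategies.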

V. EXPERIMENT RESULT

In these experiments, the datasets were collected from film recommendation sites and are commonly used for evaluating recommender systems. They are large datasets that are frequently used for recommendation algorithms and allow the scalability of our algorithm to be measured. In this paper we use two MovieLens datasets (Latest-Small and 1M):

MovieLens Latest Small Ratings (MLSR)
The MovieLens Latest Small Ratings dataset has 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.

MovieLens 1M Ratings (ML1M)
The MovieLens 1M Ratings dataset has 1 million ratings from 6,000 users on 4,000 movies.

DATA PREPROCESSING
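As a first preprocessing check, the density of each rating matrix can be estimated from the dataset sizes quoted above. The sketch uses only those rounded counts, so the results are approximate; the `density` helper is a hypothetical name.

```python
def density(n_ratings, n_users, n_items):
    """Fraction of user-item cells that contain a rating."""
    return n_ratings / (n_users * n_items)

# Rounded counts as quoted in the text: (ratings, users, movies).
datasets = {
    "MLSR": (100_000, 700, 9_000),
    "ML1M": (1_000_000, 6_000, 4_000),
}

for name, (n_ratings, n_users, n_items) in datasets.items():
    d = density(n_ratings, n_users, n_items)
    print(f"{name}: density {d:.2%}, sparsity {1 - d:.2%}")
```

With these rounded counts, roughly 1.6% of the cells are filled for MLSR and roughly 4.2% for ML1M, so more than 95% of each user-item matrix is empty.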

[Figures not reproduced: user and item based similarity; normalized data; evaluation result; cluster quality; K-Means results with 3, 5, 7 and 9 clusters.]

VI. CONCLUSION AND FUTURE WORK

In this project, issues of recommender systems such as scalability and sparsity have been addressed using different K-Means clustering algorithms. Scalability issues are addressed with the traditional system, and sparsity issues are addressed with a collaborative filtering method using a non-negative factorization algorithm. The performance analysis has been carried out using centroid selection algorithms. Performance has been improved based on the analysis of different K-Means algorithms, and a model-based filtering approach with offline clustering has been used. Thus search latency is reduced, which makes search easier for users. Future work will aim to reduce the scalability and sparsity problems more efficiently using various types of K-Means algorithms, and then to focus on hybrid recommender systems.

VII. REFERENCES

[1] Sobia Zahra, Mustansar Ali Ghazanfar, Asra Khalid, Muhammad Awais Azam, Usman Naeem, Adam Prugel Bennett, “Novel centroid selection approaches for KMeans-clustering based recommender systems”, Information Sciences, Vol. 320, pp. 156-189, May 2015.

[2] Nguyen Thai-Nghe, Lucas Drumond, Artus Krohn-Grimberghe, Lars Schmidt-Thieme, “Recommender System for Predicting Student Performance”, Procedia Computer Science, Vol. 1, pp. 2811-2819, 2010.

[3] Woon-hae Jeong, Se-jun Kim, Doo-soon Park and Jin Kwak, “Performance Improvement of a Movie Recommendation System based on Personal Propensity and Secure Collaborative Filtering”, J Inf Process Syst, Vol. 9, No. 1, March 2013.

[4] Gediminas Adomavicius, YoungOk Kwon, “New Recommendation Techniques for Multi-Criteria Rating Systems”, IEEE Intelligent Systems, Vol. 22, Issue 3, May-June 2007.

[5] Qi Wang, Wei Cao and Yun Liu, “A Novel Clustering Based Collaborative Filtering Recommendation System Algorithm”, Springer Science, Vol. 260, pp. 673-680, November 2013.

[6] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, “Item-Based Collaborative Filtering Recommendation Algorithms”, ACM Proceedings of the 10th International Conference on World Wide Web, pp. 285-295, May 01-05, 2001.

[7] Gui-Rong Xue, Chenxi Lin, Qiang Yang, WenSi Xi, Hua-Jun Zeng, Yong Yu, Zheng Chen, “Scalable Collaborative Filtering Using Cluster-based Smoothing”, ACM, August 15–19, 2005.

[8] Nagaraj, B., P. Muthusami, and N. Murugananth. "Optimum PID Controller Tuning Using Soft computing Methodologies for Industrial Process." Karpagam Journal of Computer Science: 1761.

[9] Vidhya, S., and B. Nagaraj. "Fuzzy based PI Controller for Basis Weight Process in Paper Industry." Fuzzy Systems 4.7 (2012): 268-272.

[10] Nagaraj, B., and P. Vijayakumar. "CONTROLLER TUNING FOR INDUSTRIAL PROCESS-A SOFT COMPUTING APPROACH." Int. J. Advance. Soft Comput. Appl 4.2 (2012).


[11] Nagaraj, B., and P. Vijayakumar. "Bio Inspired Algorithm for PID Controller Tuning and Application to the Pulp and Paper Industry." Sensors & Transducers 145.10 (2012): 149.

[12] Lu Yang, Anilkumar Kothalil Gopalakrishnan, “A Collaborative Filtering Recommendation Based on User Profile and User Behavior in Online Social Networks”, IEEE International Computer Science and Engineering Conference 14, 2014.

[13] Nitin Pradeep Kumar, Zhenzhen Fan, “Hybrid User-Item Based Collaborative Filtering”, Procedia Computer Science 60, pp. 1453–1461, 2015.

[14] Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Miguel A. Rueda-Morales, “Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks”, International Journal of Approximate Reasoning 51, pp. 785–799, April 2010.

[15] Gang Chen, Fei Wang, Changshui Zhang, “Collaborative filtering using orthogonal nonnegative matrix tri-factorization”, Information Processing and Management 45, pp. 368–379, January 2009.

[16] Tae Hyup Roh, Kyong Joo Oh, Ingoo Han, “The collaborative filtering recommendation based on SOM cluster-indexing CBR”, Expert Systems with Applications 25, pp. 413–423, 2003.

[17] Kai Yu, Anton Schwaighofer, Volker Tresp, Xiaowei Xu, and Hans-Peter Kriegel, “Probabilistic Memory-Based Collaborative Filtering”, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, January 2004.
