Mahout tutorial, FOSSMEET NITC
Biju B and Jaganadh G's presentation on Mahout at FOSSMEET-NITC
Practical Machine Learning: A Tutorial on Apache Mahout
Biju B, NLP R&D Division, 365Media Pvt. [email protected]
FOSSMEET NITC, Calicut
4-6 February 2011
Biju B & Jaganadh G Practical Machine Learning
nlp r d $ whoweare
Working in Natural Language Processing (NLP), Machine Learning, and Data Mining
Passionate about Free and Open Source :-)
In free time, teaches Python, blogs at http://jaganadhg.freeflux.net/blog and contributes to OpenStreetMap
Work for 365Media Pvt. Ltd., Coimbatore, India
Twitter handles: @jaganadhg, @bijub
Machine Learning
Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn.
This talk is not intended as an introduction to machine learning.
Don't expect any mathy equations here.
Machine Learning and Our Life
Do you think machine learning has any impact on our lives?
Yes
In day-to-day life we use many machine-learning-powered tools:
Recommendation Engines
Clustering
Classification, Spam Filtering
Sentiment Analysis
Fraud Detection
Mahout
Open source project of the Apache Software Foundation
The goal of the project is to build scalable machine learning libraries
Mahout: a person who drives an elephant ;-)
The name comes from the project’s use of Apache Hadoop, whose logo is an elephant.
Why a new library?
There are more than 30 Java libraries and tools available for machine learning: Weka, Mallet, Classifier4J, RapidMiner, ...
Processing large amounts of data is not an easy task
Machine learning tools are expected to produce results quickly
If the amount of data is too large, it is not easy to process on a single machine (even a powerful one)
Mahout is scalable: the core algorithms in Mahout are implemented on top of Apache Hadoop using the map/reduce paradigm
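The map/reduce paradigm mentioned above can be illustrated with a small single-process sketch. This is not Hadoop or Mahout code: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel on many machines, which is where the scalability comes from.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (word, 1) pair for every word in a line of text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two short lines of text.
counts = run_job(["mahout runs on hadoop", "hadoop scales out"])
```

Because each map call looks at a single record and each reduce call looks at a single key, the work can be split across machines without the steps needing to share state.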
Algorithms in Apache Mahout
Collaborative Filtering
User and Item based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision tree based classifier
Recommendation
Filter information based on user preference
Searching a large set of people and finding a smaller set with tastes similar to yours
e.g. Amazon’s book recommendations, Netflix movie recommendations
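The idea of finding people with similar tastes can be sketched as a tiny user-based collaborative filter. This is a minimal illustration in plain Python, not Mahout's API: the ratings data, the user and item names, and the choice of cosine similarity over co-rated items are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, neighbours=1):
    """Suggest items the user has not rated, taken from the most similar users."""
    others = [
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    others.sort(reverse=True)  # most similar users first
    scores = {}
    for similarity, other in others[:neighbours]:
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                # Weight each neighbour's rating by how similar they are.
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)
```

Here bob rates the shared books exactly as alice does, so he is her nearest neighbour and the book only he has rated is suggested first. Comparing every pair of users like this becomes expensive as the number of users grows, which is the kind of workload a scalable library is meant for.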
Document Classification
Classify documents based on their content
e.g. spam filtering, priority inbox
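Content-based classification such as spam filtering can be sketched with a plain multinomial Naive Bayes classifier using add-one smoothing. This is a simplified stand-in, not the Complementary Naive Bayes variant listed among Mahout's algorithms and not Mahout code; the training sentences, the labels "spam" and "ham", and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train(labelled_docs):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Start from the log prior of the label.
        score = log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word] + 1
            score += log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
model = train(training)
```

Calling classify("cash prize offer", model) picks the label whose training documents make those words most likely, which for this toy data is "spam".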
Demo
Building recommendation engines with Mahout
Document Classification with Mahout
Reference
Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications.
Taming Text - By Grant Ingersoll and Tom Morton, published by Manning Publications.
Introducing Apache Mahout - Grant Ingersoll - Intro to Apache Mahout focused on clustering, classification and collaborative filtering. https://www.ibm.com/developerworks/java/library/j-mahout/index.html
Programming Collective Intelligence: Building Smart Web 2.0 Applications http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
Useful Resources
Apache Mahout Site http://mahout.apache.org/
Apache Mahout Mailing List [email protected]
The code used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
20 Newsgroups data set http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
Questions?
Acknowledgments
Thanks to:
Manning Publications for the review copy of the book "Mahout in Action"
Apache Mahout mailing list members
Ted Dunning and Robin Anil for suggestions
@chelakkandupoda for review and criticism
Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement
Finally