Big Data Analytics
Module 4 – Data Mining and Predictive Analytics Including Mahout
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Overview of predictive analytics & data mining
• How Microsoft supports predictive analytics
• How Mahout fits into the picture
• Demos
Data Mining
Predicting future performance from historical data
Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later.*
• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• IT infrastructure and web app optimization
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance
*Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.
Data mining tool in SQL Server Analysis Services
• Rich data mining algorithms for clustering, classification, forecasting through time series analysis, and more
• Rich developer experience
Analysis Services Data Mining Algorithms
• Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks
• Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks
• Cluster: Clustering
• Forecast: Time Series
• Associate: Association Rules, Decision Trees
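To make the "Estimate" task concrete, here is a minimal sketch of linear regression fitted by ordinary least squares. It is plain Python, not the Analysis Services implementation; the function name `fit_line` and the spend/units data are hypothetical and exist only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, units)
# Estimate the outcome for a spend level not present in the history.
predicted = slope * 5.0 + intercept
```

The same pattern applies to the other tasks in the list: a model is fitted to historical rows and then queried for a value that has not been observed yet.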
Data mining add-in for Excel
• Ease of use through Excel
• Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more
• Scalable through integration with SSAS
Algorithms: Data Mining Add-in for Excel
Menu item: data mining algorithm
• Analyze Key Influencers: Naïve Bayes
• Detect Categories: Clustering
• Fill From Example: Logistic Regression
• Forecast: Time Series
• Highlight Exceptions: Clustering
• Scenario Analysis – Goal Seek: Logistic Regression
• Scenario Analysis – What If: Logistic Regression
• Prediction Calculator: Logistic Regression
• Shopping Basket Analysis: Association Rules
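As an illustration of what association rules measure in a shopping basket analysis, the sketch below counts item pairs and derives support and confidence for each rule. It is a simplified, pair-only version in plain Python and makes no claim to match the add-in's algorithm; `pair_rules`, the threshold, and the baskets are hypothetical.

```python
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence: of the baskets with the antecedent, how many also
        # contain the consequent.
        rules[(a, b)] = (support, count / item_count[a])
        rules[(b, a)] = (support, count / item_count[b])
    return rules


# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
```

With these baskets, every basket that contains butter also contains bread, so the rule butter → bread has confidence 1.0, while butter and milk appear together too rarely to pass the support threshold.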
Demo 1: Excel Data Mining Add-In
[Diagram labels: Flat files (.txt, .dat, .xlsx, etc.); Windows Azure HDInsight; Batch Layer, Speed Layer, Serving Layer; Microsoft Excel (Excel Data Mining Add-in)]
Mahout
• Scalable machine learning algorithms on the Hadoop platform
• Algorithms for clustering, classification, and batch-based collaborative filtering using the MapReduce paradigm
• Supports a wide range of use cases, from email spam filtering to fraud detection to recommendations for books or movies
[Diagram of Mahout components: Applications; Examples; Clustering; Recommenders; Vector Similarity; Pattern Mining; Classification; Regression; Genetic; Dimension Reduction; Matrices; Collocations]
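The collaborative filtering idea behind recommendations for books or movies can be sketched in a few lines. The example below is a single-machine, item-based variant using cosine similarity, written in plain Python; it is not Mahout's API or its MapReduce implementation, and the function names and ratings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=3):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    # Invert user -> {item: rating} into item -> {user: rating}.
    by_item = {}
    for u, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[u] = rating

    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in by_item.items():
        if candidate in seen:
            continue
        num = 0.0
        den = 0.0
        for item, rating in seen.items():
            sim = cosine(cand_vec, by_item[item])
            num += sim * rating
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "carol": {"book_b": 2, "book_c": 5},
}
recommendations = recommend(ratings, "alice")
```

A distributed implementation would compute the item-to-item similarities in batch jobs over the full ratings data rather than in memory, which is the part that benefits from Hadoop.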
Demo 2: Mahout
[Diagram labels: Flat files (.txt, .dat, .xlsx, etc.); Convert to Mahout input; Running Mahout job on Hadoop Command Window to get output file; Output file; Windows Azure HDInsight; HDInsight Consoles; Batch Layer, Speed Layer, Serving Layer]
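The "Convert to Mahout input" step is not spelled out in the slides. As a hedged sketch, Mahout's item-based recommender job reads text lines of the form `userID,itemID,preference`, so a conversion from a tab-separated flat file might look like the following. The source layout (user, item, rating separated by tabs) and the name `to_mahout_csv` are assumptions, not the demo's actual format.

```python
import csv
import io


def to_mahout_csv(lines, delimiter="\t"):
    """Convert 'user<TAB>item<TAB>rating' lines to 'userID,itemID,preference' rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        user, item, rating = line.split(delimiter)[:3]
        # Numeric user and item IDs, floating-point preference value.
        writer.writerow([int(user), int(item), float(rating)])
    return out.getvalue()


# Hypothetical flat-file content (assumed layout: user, item, rating).
sample = ["1\t101\t5", "1\t102\t3.5", "", "2\t101\t4"]
converted = to_mahout_csv(sample)
print(converted, end="")
```

The converted file would then be uploaded to the cluster's storage and passed as the input path of the Mahout job run from the Hadoop command window.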
Learn more
• Data Mining (SSAS): http://msdn.microsoft.com/en-us/library/bb510516.aspx
• Microsoft SQL Server 2012 SP1 Data Mining Add-ins for Microsoft Office 2013: http://www.microsoft.com/en-us/download/details.aspx?id=35578
• Mahout on Windows Azure - Machine Learning Using Microsoft HDInsight: http://social.technet.microsoft.com/wiki/contents/articles/15102.mahout-on-windows-azure-machine-learning-using-microsoft-hdinsight.aspx
Questions?