Introduction to Data Science and Data Visualization
![Page 1: Introduction to Data Science and Data Visualizationdisi.unal.edu.co/.../talks/RapidminerJCEstevez.pdf · Rapidminer OPEN SOURCE DATA SCIENCE PLATFORM Prep data, create models, validate,](https://reader034.vdocuments.net/reader034/viewer/2022051802/5af50a857f8b9ae9488ceb4e/html5/thumbnails/1.jpg)
Rapidminer
Juan Camilo Estevez Cárdenas
July 5th to 29th of 2016
Juan Camilo Estevez Cárdenas
Systems Engineer, Universidad Nacional de Colombia, 2013
Master's in Industrial Engineering, Universidad Nacional de Colombia, 2015
Teaching Assistant Scholarship, Computer Programming, Universidad Nacional de Colombia, 2013 – 2014
Universidad de Buenos Aires (UBA): IT Project Management and Intelligent Systems, 2015 – I
Project Management Professional (PMP), Project Management Institute
Organizational analytical evolution
Advanced analytics
Business Intelligence Architecture
(Rapidminer, 2015)
(Chaudhuri, 2011)
Rapidminer
OPEN SOURCE DATA SCIENCE PLATFORM
Prep data, create models, validate, operationalize and embed in business processes.
https://rapidminer.com/
http://www.kdnuggets.com/
A code-free tool for data scientists
Characteristics
Connection to different data sources
- Excel, CSV, databases, text files, Dropbox, Amazon, Twitter, Salesforce.
Preprocessing or data preparation (formatting and cleaning)
- Attribute creation
- Attribute formatting and cleaning
- Table operations, replacements
- Filters
- Type conversions
- Missing-value treatment
- Normalization
- Outlier treatment
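Three of the preparation steps listed above (missing-value treatment, outlier treatment, normalization) can be sketched outside Rapidminer in plain Python. This is only an illustrative sketch, not how Rapidminer's operators are implemented: the function names, the sample values, and the z-score threshold of 2.0 are all assumptions made for the example.

```python
from statistics import mean, pstdev

def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def drop_outliers(values, z_threshold=2.0):
    """Keep only values within z_threshold standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_threshold]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute column with one missing value and one extreme value.
raw = [10.0, 12.0, None, 11.0, 13.0, 95.0]
filled = impute_missing(raw)        # None is replaced by the column mean
cleaned = drop_outliers(filled)     # the extreme value 95.0 is filtered out
scaled = min_max_normalize(cleaned) # remaining values mapped to [0, 1]
```

The order of the steps matters: here the mean used for imputation is computed before the outlier is removed, so the filled value is pulled toward the extreme value. Removing outliers first is an equally reasonable choice.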
Characteristics
Modeling (Data mining)
- Predictive
- Segmentation (Clustering)
- Classification
- Association
- Correlation
Model validation
- Cross validation, split validation...
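The difference between the two validation schemes named above can be illustrated with row indices alone. The sketch below is plain Python and does not reproduce Rapidminer's validation operators; the function names, the 70/30 ratio, and the fixed seed are assumptions chosen for the example.

```python
import random

def split_validation(n_rows, train_ratio=0.7, seed=0):
    """Shuffle row indices once and cut them into one train/test pair."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_ratio)
    return indices[:cut], indices[cut:]

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield k (train, test) pairs; every row is in the test set exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

train_idx, test_idx = split_validation(10)
folds = list(k_fold_indices(10, k=5))
```

Split validation evaluates the model once on a single held-out part, while cross validation repeats the evaluation k times and averages the results, at the cost of training k models.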
Characteristics
Extensions
- Series
- R
- Python
- Text processing
- Weka
- Reporting
Learning Rapidminer
Documentation
- Web page: http://docs.rapidminer.com/
- Stand alone installation:
Examples
- Welcome window
- Click on an operator and review the help menu
- Repository Samples
Rapidminer Academia
https://rapidminer.com/academia/students/
Rapidminer example Beauty Data
- Load data from BeautyData.csv
- Exploratory data analysis.
- Example of a decision tree with Rapidminer
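The three steps above (load, explore, fit a tree) can be mirrored in a few lines of plain Python. This is a hedged sketch only: the slides do not show the contents of BeautyData.csv, so the column names `beauty`, `age`, `high_eval` and all rows below are hypothetical stand-ins, and the model is a one-level decision tree (a decision stump) chosen by Gini impurity, not Rapidminer's Decision Tree operator.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in for BeautyData.csv; the real columns and values
# are not given in the slides.
SAMPLE_CSV = """beauty,age,high_eval
4.5,35,yes
2.0,50,no
3.8,41,yes
1.5,60,no
4.1,29,yes
2.7,55,no
"""

def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({"beauty": float(record["beauty"]),
                     "age": float(record["age"]),
                     "high_eval": record["high_eval"]})
    return rows

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, features, target):
    """Return (feature, threshold, weighted_gini) of the best single split."""
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

rows = load_rows(SAMPLE_CSV)                                   # load
summary = {name: mean(r[name] for r in rows)                   # explore
           for name in ("beauty", "age")}
feature, threshold, score = best_split(rows, ["beauty", "age"], "high_eval")
```

A full decision tree repeats `best_split` recursively on the left and right groups until the groups are pure or a depth limit is reached.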
Bibliography
Chaudhuri, S., Dayal, U., & Narasayya, V. (2011). An overview of business intelligence technology. Communications of the ACM, 54(8), 88. doi:10.1145/1978542.1978562
Laudon, K. C., & Laudon, J. P. (2012). Management Information Systems (12th ed.). Prentice Hall.
http://businessanalytics.com.mx/2014/08/27/diferencias-entre-business-analytics-y-business-intelligence/
Gartner. Magic Quadrant Survey, 2012.
Rapidminer. (2015). An introduction to advanced analytics.