a brief tutorial on data mining-20140701

A BRIEF TUTORIAL ON DATA MINING Xiaming Chen, OMNI-Lab 2014-07

Upload: xiaming-chen

Post on 24-Jun-2015

DESCRIPTION

An introduction to data mining and RapidMiner, OMNI-Lab, 20140701

TRANSCRIPT

Page 1: A Brief Tutorial On Data Mining-20140701

A BRIEF TUTORIAL ON DATA MINING

Xiaming Chen, OMNI-Lab 2014-07

OUTLINE

• What's Data Mining?

• A Hands-on Practice

WHAT'S DATA MINING

• Science: probability, statistics, graph theory etc.

• Techniques: clustering, classification, regression, prediction etc.

• A way to think about this world.

In textbooks
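As a concrete instance of the clustering technique named above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is illustrative only: the toy points, the function name `kmeans`, and its parameters are assumptions made for this example and do not come from the slides.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with plain Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Made-up toy data: two visibly separate groups of points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points share one label and the last three share the other.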

WHAT'S DATA MINING

• Science? Maybe

In reality

Research on Social Networks

WHAT'S DATA MINING

• Prediction? Yes!

In reality

The Highest Creature Intelligence (100%)

Anti-Prediction

US Election, Bayes Selection!

WHAT'S DATA MINING

• The world, thinking? Spying!

In reality

“Illegal SPYING below!”

WHAT'S DATA MINING

• You Need, You Learn, You Become an Expert

Insights, Thinking, Programming

HANDS-ON PRACTICE

• Tools to Facilitate Your Data Analysis

• Commercial

• SAS

• IBM SPSS

• Matlab etc.

• Free/Open Source

• RapidMiner + Weka

• R (my favorite)

• Python + SciPy + scikit-learn

• Hadoop/Spark etc.

HANDS-ON PRACTICE

• Example: RapidMiner + StoneFlakes

http://archive.ics.uci.edu/ml/datasets/StoneFlakes

HANDS-ON PRACTICE

• RapidMiner (ad-free)

• A Java-based IDE for ML, data mining, text mining etc.

• Modular design, graphical interface, no coding required

• Complete Process logic: data ETL, visualization, modeling, prediction, reports etc.

• Growing extension market

• CLI and API for other programs

• Call functions of Weka and R

Download: http://www.rapidminer.com/
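The process logic listed above (ETL, modeling, prediction) can also be sketched outside RapidMiner. The following plain-Python example is a rough illustration under stated assumptions, not RapidMiner's API: the records are made up, and the function names `etl`, `fit`, and `predict` are hypothetical. The model is a simple nearest-centroid classifier, chosen only because it fits in a few lines.

```python
import math

# Made-up records: (feature_1, feature_2, label); None marks a missing value.
RAW = [
    (1.0, 2.0, "A"), (1.2, None, "A"), (0.8, 2.2, "A"),
    (5.0, 6.0, "B"), (5.2, 6.1, "B"), (None, 5.9, "B"),
]


def etl(records):
    """ETL step: drop records that contain a missing feature."""
    return [r for r in records if None not in r[:2]]


def fit(records):
    """Modeling step: compute one centroid (feature mean) per label."""
    groups = {}
    for x, y, label in records:
        groups.setdefault(label, []).append((x, y))
    return {
        label: tuple(sum(v) / len(pts) for v in zip(*pts))
        for label, pts in groups.items()
    }


def predict(model, point):
    """Prediction step: return the label of the nearest centroid."""
    return min(model, key=lambda label: math.dist(point, model[label]))


clean = etl(RAW)
model = fit(clean)
print(len(clean), predict(model, (1.1, 2.1)), predict(model, (4.9, 6.2)))
```

Each function corresponds to one operator box that would otherwise be wired together in the graphical interface.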

HANDS-ON PRACTICE

• StoneFlakes

• StoneFlakes.csv: flake attribute information

• annotation.csv: inventory properties

Formatted: http://io.hsiamin.com/data/StoneFlakes.tar.gz
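Before the demo, it can help to see what reading an attribute file like StoneFlakes.csv might look like in code. The sketch below parses a small inline sample rather than the real file. The rows are invented for illustration, and the column names (`ID`, `LBI`, `RTI`) and the `?` missing-value marker are assumptions about the file layout, so adjust them to match the actual data.

```python
import csv
import io

# Invented sample rows; the column names (ID, LBI, RTI) and the "?" marker
# for missing values are assumptions, not taken from the slides.
SAMPLE = """ID,LBI,RTI
a,1.23,27.0
b,?,26.5
c,1.10,?
"""


def load_flakes(text, missing="?"):
    """Parse CSV text into dict rows, turning numeric fields into floats
    and the missing-value marker into None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"ID": record["ID"]}
        for name, value in record.items():
            if name != "ID":
                row[name] = None if value.strip() == missing else float(value)
        rows.append(row)
    return rows


def column_mean(rows, name):
    """Mean of a column, ignoring missing entries."""
    values = [r[name] for r in rows if r[name] is not None]
    return sum(values) / len(values)


rows = load_flakes(SAMPLE)
print(len(rows), column_mean(rows, "LBI"))
```

To try it on the real data, the same function could be given the contents of the downloaded file, for example `load_flakes(open("StoneFlakes.csv").read())`, provided the header and missing-value marker match these assumptions.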

HANDS-ON PRACTICE

• Demo

SUMMER COURSE

• Spatio-temporal Data Analysis

• Yu Zheng (郑宇), MSR

• July 1 to 31, 2014

• Tuesday and Thursday afternoons, 2:00 to 5:40 PM

• Room 316, Shang Yuan (上院) building, Minhang campus

http://www.hsiamin.com

Thanks caesar0301@github
