Chapter 01. Introduction to Data Mining

Data Mining (IKO42351), Course Teaching Plan (Bahan Rancangan Pengajaran), Mohamad Ivan Fanany, Dr. Eng.

Course Introduction

● Goals and Objectives

● Textbooks

● Syllabus

● Evaluation

● Lecture Plans

● Rules

Goals and Objectives

● After finishing this course, students are expected to understand the concepts, tools, and techniques of machine learning for data mining.

● Besides acquiring a general picture of the most recent developments in data mining, students are also expected to understand the techniques in depth and to appreciate their strengths and applicability by actively running their own experiments, both individually and as members of a team.

Textbooks

Major textbook before UTS (midterm exam)

Programming Book

1. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley, 2006

2. R and Data Mining: Examples and Case Studies, Yanchang Zhao, 2013

Syllabus (Weekly)

1) Introduction

2) Data

3) Exploring Data

4) Classification: Basic Concepts, Decision Trees, and Model Evaluation

5) Classification: Alternative Techniques

6) Association Analysis: Basic Concepts and Algorithms

7) Association Analysis: Advanced Concepts

8) Cluster Analysis

9) Anomaly Detection

UTS (midterm exam): Witten, Ch. 1-7

UAS (final exam): Kumar, Ch. 6-8; Witten, Ch. 8+

Evaluation

1. Individual assignments (PR, homework): 8 times = 16%

2. Group assignment (TK): 1 time = 14%

3. Midterm exam (UTS) = 35%

4. Final exam (UAS) = 35%

5. Bonus (class participation, pop quizzes, etc.) = ++

6. Total: 100% ++

Rules

● Lateness tolerance: 15 minutes

● Mobile phones must be switched off

● Regarding homework (PR):

◆ All homework and assignments must be done using Python(x,y)

◆ For homework, write your teaching assistant's code on each homework file, and submit the files according to that assistant code.

◆ Late penalty → see the BRP (course teaching plan)

R and RStudio

http://www.rstudio.com/

http://www.r-project.org/

Why Mine Data? Commercial Viewpoint

● Lots of data is being collected and warehoused

◆ Web data, e-commerce

◆ Purchases at department/grocery stores

◆ Bank/credit card transactions

● Computers have become cheaper and more powerful

● Competitive pressure is strong

◆ Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint

● Data is collected and stored at enormous speeds (GB/hour)

◆ Remote sensors on a satellite

◆ Telescopes scanning the skies

◆ Microarrays generating gene expression data

◆ Scientific simulations generating terabytes of data

● Traditional techniques are infeasible for raw data

● Data mining may help scientists

◆ in classifying and segmenting data

◆ in hypothesis formation

Mining Large Data Sets - Motivation

● There is often information “hidden” in the data that is not readily evident

● Human analysts may take weeks to discover useful information

● Much of the data is never analyzed at all

[Figure: chart including a series labeled "Number of analysts"]

What is Data Mining?

● Many definitions

◆ Non-trivial extraction of implicit, previously unknown, and potentially useful information from data

◆ Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is (not) Data Mining?

What is Data Mining?

– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)

– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

What is not Data Mining?

– Look up phone number in phone directory

– Query a Web search engine for information about “Amazon”
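
The contrast between the two lists can be sketched in a few lines of Python. This is only an illustrative sketch: the directory, the resident records, and the helper name `prefix_share_by_city` are all made up for this example and do not come from the course material.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only.
phone_directory = {"Alice": "555-0101", "Bob": "555-0102"}
residents = [
    ("O'Brien", "Boston"), ("O'Reilly", "Boston"), ("O'Rourke", "Boston"),
    ("Smith", "Boston"), ("Smith", "Dallas"), ("Garcia", "Dallas"),
    ("Nguyen", "Dallas"), ("O'Brien", "Dallas"),
]

# Not data mining: a direct lookup of a known key returns a stored fact.
number = phone_directory["Alice"]


def prefix_share_by_city(records, prefix):
    """Return, per city, the fraction of surnames that start with `prefix`."""
    totals = Counter(city for _, city in records)
    hits = Counter(city for name, city in records if name.startswith(prefix))
    return {city: hits[city] / totals[city] for city in totals}


# Closer to data mining: summarise many records to surface a regional pattern.
shares = prefix_share_by_city(residents, "O'")
print(number)
print(shares)
```

In this toy data the "O'" surnames make up a larger share of the Boston records than of the Dallas records, which is the kind of pattern the slide has in mind. A real analysis would search over many candidate patterns rather than test one hand-picked prefix.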

Origins of Data Mining

● Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems

● Traditional techniques may be unsuitable due to

◆ Enormity of data

◆ High dimensionality of data

◆ Heterogeneous, distributed nature of data

[Figure: Data Mining at the overlap of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems]

Search vs. Discovery

                            Search (goal-oriented)    Discover (opportunistic)
Structured Data             Data Retrieval            Data Mining
Unstructured Data (Text)    Information Retrieval     Text Mining

© 2002, AvaQuest Inc.

Data Mining = KDD: Knowledge ‘Discovery’ from DB

Data Mining Tasks

● Prediction Methods

◆ Use some variables to predict unknown or future values of other variables.

● Description Methods

◆ Find human-interpretable patterns that describe the data.

From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996
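
As a minimal sketch of a prediction method, the Python below fits a straight line by ordinary least squares and uses one variable to predict an unseen value of another. The variable names, the numbers, and the helper `fit_line` are assumptions made up for illustration; the slides do not prescribe this particular technique.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy observations, invented for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)

# Prediction: estimate the unknown sales value for a new ad_spend of 5.0.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

The fitted slope and intercept also serve as a small human-interpretable description of the data, which shows that one model can play both the predictive and the descriptive role.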

Data Mining Tasks...

● Classification [Predictive]

● Clustering [Descriptive]

● Association Rule Discovery [Descriptive]

● Sequential Pattern Discovery [Descriptive]

● Regression [Predictive]

● Deviation Detection [Predictive]
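
To make one of the descriptive tasks concrete, here is a small Python sketch of the two standard measures behind association rule discovery, support and confidence. The baskets are invented toy data, and the function names `support` and `confidence` are chosen here for illustration only.

```python
# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Evaluate the candidate rule {diapers} -> {beer}.
sup = support({"diapers", "beer"}, transactions)
conf = confidence({"diapers"}, {"beer"}, transactions)
print(sup, conf)
```

Here three of the five baskets contain both items, and three of the four baskets with diapers also contain beer. A full association-mining algorithm would enumerate candidate itemsets and keep only the rules above chosen support and confidence thresholds.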

Do you want to be a Miner?

[Figure: pyramid with levels Wisdom, Knowledge, Information, Data, and the label "Pattern"]

Why do we need Data Mining?

[Figure labels: The Internet; Storage (Increased Capacity, Lower Cost, Faster... & Faster...); DATA EXPLOSION; DATA MINING; Data, Information, Knowledge, Wisdom; Competitive Advantages]

Data Mining and Machine Learning

[Figure labels: MACHINE LEARNING; DATA MINING; MULTI-SOURCE; MULTI-TYPE; ENSEMBLE LEARNING; MULTI-DIMENSION; SPATIO-TEMPORAL; BIG DATA; DEEP LEARNING]

Data Mining and Database

[Figure labels: DATABASE; DATA MINING; DATA WAREHOUSE; DATA CLEANING; CLUSTER ANALYSIS; DATA CUBE OLAP; ASSOCIATION ANALYSIS; BIG DATA]

Evolution of Database Technology [figure]

Financial Reporting [figure: dashboard]

Another Dashboard [figures: three further dashboard examples]