Introduction to Mining Massive Datasets

Posted on 16-Aug-2015

Mining massive datasets (based on Stanford CS246)

Viet-Trung TRAN

Credits

•  Jure Leskovec, Anand Rajaraman, Jeff Ullman - Stanford University

•  http://web.stanford.edu/class/cs246/
•  http://mmds.org/

What is data mining?

•  Knowledge discovery from data

Data contains value and knowledge

Data mining

•  Store
•  Manage
•  Analyze

Data mining ~ Big Data ~ Predictive Analysis ~ Data science

Demand for data mining (US)

What is data mining

•  Given lots of data
•  Discover patterns and make predictions that are:
   –  Valid
   –  Useful
   –  Unexpected
   –  Understandable

Data mining tasks

•  Descriptive methods
   –  Find human-interpretable patterns that describe the data
   –  Example: clustering
•  Predictive methods
   –  Use some variables to predict the unknown or future values of other variables
   –  Example: recommender systems
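To make the two task types concrete, here is a minimal pure-Python sketch, assuming small in-memory data. The helper names are hypothetical: `kmeans` is plain Lloyd's k-means with a farthest-first initialization (our choice, not the course's) as a descriptive method, and `predict_rating` is user-based collaborative filtering with cosine similarity as one simple kind of recommender. Neither is presented as the course's own implementation; at massive scale both would need distributed or approximate variants.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Ratings = Dict[str, Dict[str, float]]  # hypothetical layout: user -> {item: rating}


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Descriptive: assign each point to one of k clusters (Lloyd's algorithm)."""
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Predictive: estimate user's rating of item as a similarity-weighted average."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbor rated the item


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, k=2))  # [0, 0, 0, 1, 1, 1]
    r = {"ann": {"a": 5.0, "b": 4.0}, "bob": {"a": 5.0, "b": 4.0, "c": 5.0}}
    print(predict_rating(r, "ann", "c"))
```

The clustering output only groups the data (no labels were given), while the recommender produces a value for an unknown variable, which is the descriptive/predictive split in the slide.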

Meaningfulness of analytic answers

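•  Risk of "data mining": the discoveries may be meaningless

•  Bonferroni's principle – if the number of occurrences of a pattern expected purely by chance is much larger than the number of real instances we hope to find, then a method we believe is useful for finding them will return mostly false positives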
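A back-of-the-envelope calculation shows the principle at work. The numbers are illustrative assumptions that follow the hotel example in the MMDS textbook credited above: 10^9 people, each in some hotel on 1% of days, 10^5 hotels, and 1000 days of records. The hypothetical function `expected_coincidences` estimates how many pairs of people, purely by chance, stay in the same hotel on two different days.

```python
from math import comb


def expected_coincidences(people: int = 10**9, days: int = 1000,
                          hotels: int = 10**5, p_visit: float = 0.01) -> float:
    """Expected number of people pairs sharing a hotel on two different days by chance.

    Assumes every person independently visits a hotel on a given day with
    probability p_visit, choosing uniformly among the hotels.
    """
    # Both people are in some hotel on a given day, and it is the same hotel.
    p_same_hotel_one_day = p_visit * p_visit / hotels
    # The coincidence repeats on a second, independent day.
    p_two_days = p_same_hotel_one_day ** 2
    # Multiply by the number of (pair of people, pair of days) combinations.
    return comb(people, 2) * comb(days, 2) * p_two_days


if __name__ == "__main__":
    print(f"{expected_coincidences():,.0f}")  # prints 249,750
```

If only a handful of real suspicious pairs exist, nearly all of the roughly 250,000 pairs such a search flags are false positives.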

Dealing with data?

Data mining cultures

•  Overlaps with
   –  Databases: large-scale data, simple queries
   –  Machine learning: small data, complex models
   –  CS theory: (randomized) algorithms

•  Different cultures
   –  To DB guys: an extreme form of analytic processing
   –  To ML guys: inference of models (inference: a conclusion reached on the basis of evidence and reasoning)

What we will learn

•  Mine different types of data
   –  High-dimensional
   –  Graph
   –  Infinite/never-ending
   –  Labeled

•  Use different models of computation
   –  Batch processing
   –  Stream processing
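The difference between the two models of computation can be sketched with a simple statistic. This is an illustration under our own assumptions, not material from the course: the hypothetical `batch_mean` needs the whole dataset stored before it runs, while the hypothetical `running_means` reads each element once, keeps only a count and the current mean, and can report an answer at any point, which is the constraint a stream algorithm works under.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(data: Sequence[float]) -> float:
    """Batch model: the full dataset is available (and stored) before computing."""
    return sum(data) / len(data)


def running_means(stream: Iterable[float]) -> Iterator[float]:
    """Stream model: see each element once, keep O(1) state, answer at any time."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the mean
        yield mean


if __name__ == "__main__":
    readings = [2.0, 4.0, 6.0, 8.0]
    print(batch_mean(readings))                 # 5.0
    print(list(running_means(iter(readings))))  # [2.0, 3.0, 4.0, 5.0]
```

On an infinite, never-ending input only the streaming version applies, since there is no point at which the whole dataset is available.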

To solve real-world problems
