Introduction to mining massive datasets
Post on 16-Aug-2015
Mining massive datasets (based on Stanford CS246)
Viet-Trung TRAN
Credits
• Jure Leskovec, Anand Rajaraman, Jeff Ullman - Stanford University
• http://web.stanford.edu/class/cs246/
• http://mmds.org/
What is data mining?
• Knowledge discovery from data
Data contains value and knowledge
Data mining
• Store
• Manage
• Analyze
Data mining ~ Big Data ~ Predictive Analysis ~ Data science
Demand for data mining (US)
What is data mining?
• Given lots of data
• Discover patterns and make predictions that are
  – Valid
  – Useful
  – Unexpected
  – Understandable
Data mining tasks
• Descriptive methods
  – Find human-interpretable patterns that describe data
    • Clustering
• Predictive methods
  – Use some variables to predict the unknown or future values of other variables
    • Recommender systems
Meaningfulness of analytic answers
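• The risk of "data mining" is that the discovered patterns are meaningless
• Bonferroni's principle
  – If you look in more places for interesting patterns than your amount of data supports, you are bound to find false positives: a method that seems useful for finding a particular kind of event may mostly return coincidences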
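A minimal sketch of the effect behind Bonferroni's principle, under assumed numbers that are not from the slides: 10,000 hypothetical subjects each flip a fair coin 10 times, and we "mine" for subjects with at least 9 heads. Nobody in this data has any real skill, so every hit is a false positive. The function names and parameters below are illustrative only.

```python
import random
from math import comb


def chance_of_pattern(flips: int, min_heads: int) -> float:
    """Probability that a fair coin shows at least `min_heads` heads in `flips` tosses."""
    favourable = sum(comb(flips, k) for k in range(min_heads, flips + 1))
    return favourable / 2 ** flips


def expected_false_positives(n_subjects: int, flips: int, min_heads: int) -> float:
    """Number of subjects expected to match the pattern purely by chance."""
    return n_subjects * chance_of_pattern(flips, min_heads)


def simulate(n_subjects: int, flips: int, min_heads: int, seed: int = 0) -> int:
    """Count subjects matching the pattern in purely random (signal-free) data."""
    rng = random.Random(seed)  # seeded so repeated runs give the same count
    hits = 0
    for _ in range(n_subjects):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if heads >= min_heads:
            hits += 1
    return hits


if __name__ == "__main__":
    n, flips, min_heads = 10_000, 10, 9  # assumed, illustrative numbers
    print("expected by chance:", expected_false_positives(n, flips, min_heads))
    print("found in random data:", simulate(n, flips, min_heads))
```

With these assumed numbers the chance of the pattern is 11/1024, so roughly 107 of the 10,000 subjects are expected to match by luck alone. If the event we are really looking for is much rarer than that, most of what the search returns will be coincidence.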
Dealing with data?
Data mining cultures
• Overlaps with
  – Databases: large-scale data, simple queries
  – Machine learning: small data, complex models
  – CS theory: (randomized) algorithms
• Different cultures
  – To DB people: an extreme form of analytic processing
  – To ML people: inference of models (inference: a conclusion reached on the basis of evidence and reasoning)
What we will learn
• Mine different types of data
  – High-dimensional
  – Graph
  – Infinite/never-ending
  – Labeled
• Use different models of computation
  – Batch processing
  – Stream
To solve real-world problems