Statistics 202: Data Mining
Introduction
© Jonathan Taylor
Based in part on slides from the textbook and slides of Susan Holmes
October 7, 2011
Data Mining
What is data mining?
Non-trivial extraction of implicit, previously unknown and potentially useful information from data.
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.
A key feature of data mining is that the data sets are larger than those encountered in “classical” statistics, so large that the analysis must be (semi-)automated.
Data Mining
Who uses data mining?
Industry:
1. Netflix
2. Amazon
3. Google (e.g. Google Trends)
Science:
1. Genomics
2. Climate Science
3. Astrophysics
4. Neuroimaging
Netflix
Amazon
[Screenshot: Amazon.com product page for Introduction to Data Mining (Tan, Steinbach, Kumar), showing “Frequently Bought Together” and “Customers Who Bought This Item Also Bought” recommendations; captured 9/25/11.]
Google Trends
[Screenshot: Google Trends results for the search term “andrew luck”, with related news headlines (Nov 2009 to Sep 2011) and the top regions, cities (led by Stanford, CA) and languages for the term; captured 9/25/11.]
Genomics
Neuroimaging
Climate science
Data Mining
Some things that are not data mining
Looking up a record in a database by an identifier such as last name. (No pattern is revealed by this lookup...)
Searching for “Amazon” on Google. (Google has done some data mining, but you have not...)
Testing a two-sample hypothesis in a clinical trial. (The data set is often neither large nor unstructured.)
Data Mining
Some things that are more like data mining
Noting that some last names occur more often in certain geographical areas.
Taking all query results from Google for “Amazon” and discovering that there are at least two groups: “Amazon river” and “Amazon.com”.
When doing multiple tests across many different genes, identifying very strongly significant genes...
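The gene example is a multiple-testing problem. As one illustration (not necessarily the method used in this course), the Benjamini-Hochberg step-up procedure flags hypotheses while controlling the false discovery rate at level q. The function name `benjamini_hochberg` and the ten p-values below are hypothetical, made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Rank hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])

# Hypothetical p-values for ten genes (made-up numbers).
pvals = [0.0001, 0.0004, 0.012, 0.03, 0.2, 0.35, 0.5, 0.61, 0.8, 0.95]
print(benjamini_hochberg(pvals, q=0.05))  # [0, 1, 2]
```

With these made-up p-values, a Bonferroni cutoff of 0.05/10 = 0.005 would keep only the first two genes; Benjamini-Hochberg also keeps the third, since 0.012 ≤ 3 × 0.05/10 = 0.015.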
Data Mining
Prediction / Supervised Problems
In such problems there is an outcome or label we want to predict based on many features.
Classification
Regression
Outlier detection
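To make the supervised setting concrete, here is a minimal k-nearest-neighbour sketch covering two of the tasks listed: classification (majority vote over neighbours' labels) and regression (averaging neighbours' outcomes). k-NN is only one simple choice of method; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, x, k=3):
    """Predict a numeric outcome for x as the mean outcome of its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical toy data: two features per observation.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
labels = ["A", "A", "A", "B", "B", "B"]
outcomes = [1.0, 2.0, 1.5, 6.0, 7.0, 6.5]

print(knn_classify(X, labels, (1.8, 1.2)))   # A
print(knn_regress(X, outcomes, (1.8, 1.2)))  # 1.5
```

The same neighbourhood search serves both tasks; only the way the neighbours' responses are combined changes.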
Data Mining
Descriptive / Unsupervised Problems
In such problems, we are seeking to discover hidden “structure” in the data, without an outcome or label.
Clustering
Dimension Reduction
Association Rules
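Clustering is the unsupervised task closest to the Amazon query example: splitting results into groups without any labels. One common clustering method is k-means (Lloyd's algorithm), sketched below for numeric points. The function name, the toy points and the random initialisation are assumptions for illustration, not the course's implementation.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
    rng = random.Random(seed)
    # Initialise the centres at k data points chosen at random without replacement.
    centres = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its closest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: move each centre to the mean of the points assigned to it.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centres.append(centres[j])  # an empty cluster keeps its old centre
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return labels, centres

# Hypothetical 2-D data with two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(pts, k=2)
print(labels, centres)
```

Which group receives label 0 depends on the random initialisation, so only the grouping itself is meaningful.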
Semisupervised problems
A mix of labelled and unlabelled data is used.
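One simple way to use a mix of labelled and unlabelled data is self-training: fit a classifier on the labelled points, pseudo-label the unlabelled ones, refit on both, and repeat until the pseudo-labels stop changing. The sketch below assumes a nearest-centroid classifier and hypothetical data; it is one illustration among many semi-supervised methods.

```python
import math

def centroids(points, labels):
    """Mean point of each class."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps)) for lab, ps in groups.items()}

def self_train(labelled_X, labelled_y, unlabelled_X, max_rounds=20):
    """Self-training with a nearest-centroid classifier; returns pseudo-labels for unlabelled_X."""
    pseudo = None
    for _ in range(max_rounds):
        # Fit on the true labels, plus the current pseudo-labels after the first round.
        if pseudo is None:
            cents = centroids(labelled_X, labelled_y)
        else:
            cents = centroids(labelled_X + unlabelled_X, labelled_y + pseudo)
        # Pseudo-label every unlabelled point with its nearest centroid's class.
        new_pseudo = [min(cents, key=lambda lab: math.dist(p, cents[lab])) for p in unlabelled_X]
        if new_pseudo == pseudo:
            break  # pseudo-labels are stable
        pseudo = new_pseudo
    return pseudo

# Hypothetical data: one labelled point per class, five unlabelled points.
lab_X, lab_y = [(0.0, 0.0), (10.0, 10.0)], ["A", "B"]
unlab_X = [(1.0, 0.5), (0.5, 2.0), (9.0, 9.5), (4.0, 4.0), (6.0, 6.5)]
print(self_train(lab_X, lab_y, unlab_X))  # ['A', 'A', 'B', 'A', 'B']
```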
13 / 1