Page 1: Data Mining Disasters

Data Mining Disasters

A Report

Mary McGlohon

SIGBOVIK Commission for Workplace Safety

Page 2: Data Mining Disasters

Data Mining Safety

•Data mining disasters are a hazard to the progress of scientific research.

•We will review some common mining disasters and make recommendations for prevention.

Page 3: Data Mining Disasters

Numeric Overflow

“In 2007, numeric floods were responsible for over $600 million in property damages.”

-Department of Made-Up Statistics

Page 4: Data Mining Disasters

Numeric Overflow

ERROR::NUMERICOVERFLOW

Nobody expected the breach of the levees.

Page 5: Data Mining Disasters

Numeric Overflow

•Also caused loss of several hundred nerd-hours.

•1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
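The exchange rates above can be turned into a small converter. This is only a sketch: the names `UNITS_PER_NERD_HOUR` and `convert_hours` are made up for illustration, and the figure of 300 nerd-hours is a hypothetical stand-in for "several hundred".

```python
# Units of each labor type equal to one nerd-hour, taken from the slide.
UNITS_PER_NERD_HOUR = {
    "nerd": 1.0,
    "grad_student": 1.0,
    "faculty": 0.25,
    "undergrad": 6.0,
}


def convert_hours(hours, source, target):
    """Convert hours of one labor type into the equivalent hours of another."""
    nerd_hours = hours / UNITS_PER_NERD_HOUR[source]
    return nerd_hours * UNITS_PER_NERD_HOUR[target]


# Hypothetical loss of 300 nerd-hours, expressed in other units.
lost = 300
print(convert_hours(lost, "nerd", "faculty"))    # 75.0
print(convert_hours(lost, "nerd", "undergrad"))  # 1800.0
```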

Page 6: Data Mining Disasters

Numeric Overflow

•Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
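One reading of the floating log is to do the arithmetic in log space. The sketch below assumes the quantity at risk is a sum of exponentials, and uses the standard log-sum-exp trick; the function names are illustrative and do not come from the report.

```python
import math


def naive_log_sum_exp(log_values):
    """Exponentiate first, then sum: overflows once a value is large enough."""
    return math.log(sum(math.exp(v) for v in log_values))


def log_sum_exp(log_values):
    """Compute log(sum(exp(v))) without leaving log space.

    Subtracting the maximum keeps every exponent at or below zero,
    so math.exp cannot overflow. Expects a non-empty sequence.
    """
    values = list(log_values)
    peak = max(values)
    if peak == float("-inf"):
        return peak  # every term is exp(-inf) = 0
    return peak + math.log(sum(math.exp(v - peak) for v in values))


flood = [1000.0, 1000.0]
try:
    naive_log_sum_exp(flood)
except OverflowError:
    print("ERROR::NUMERICOVERFLOW")
print(log_sum_exp(flood))  # 1000 + log(2), about 1000.693
```

The same idea applies to products of many probabilities: summing their logarithms avoids the matching underflow problem.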

Page 7: Data Mining Disasters

Power Law Failures

•Occurs when confusing heavy-tailed distributions such as:

• Power Law (incl. Pareto, Zipf)

• Lognormal

• Weibull

• Burr

• Log-gamma

• Log-Log-Log-Log-Mushroom-Mushroom
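Before the arguments start, two of the candidates above can at least be compared numerically. The following is a minimal sketch, not a method from the report: it fits a continuous power law and a lognormal by maximum likelihood and compares their log-likelihoods. It assumes positive, non-identical data, takes the power-law cutoff `xmin` to be the smallest observation, and leaves the lognormal untruncated; a careful comparison would fit both above a common threshold and test whether the difference is significant. All function names are illustrative.

```python
import math


def power_law_log_likelihood(data):
    """Log-likelihood of p(x) = ((alpha - 1) / xmin) * (x / xmin) ** -alpha.

    Assumption: xmin is simply the smallest observation.
    """
    n = len(data)
    xmin = min(data)
    log_ratios = sum(math.log(x / xmin) for x in data)
    alpha = 1.0 + n / log_ratios  # maximum-likelihood exponent
    return n * math.log(alpha - 1.0) - n * math.log(xmin) - alpha * log_ratios


def lognormal_log_likelihood(data):
    """Log-likelihood of a lognormal fitted by maximum likelihood."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    # At the fitted parameters the squared-deviation term sums to n / 2.
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n


def log_likelihood_ratio(data):
    """Positive values favor the power law, negative values the lognormal."""
    return power_law_log_likelihood(data) - lognormal_log_likelihood(data)
```

A ratio near zero means the data cannot tell the two apart, which is a reasonable cue to stop arguing.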

Page 8: Data Mining Disasters

Power Law Failures

•Many natural phenomena have heavy tails.

• Magnitude of earthquakes

• Size of human settlements

• Degree distribution of “real” graphs

• Time-to-response in CS professors' email

• Your mom

•However, confusing heavy-tailed distributions results in...

Page 9: Data Mining Disasters
Page 10: Data Mining Disasters

Power Law Failures

•Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.

Page 11: Data Mining Disasters

Power Law Failures

•Statisticians get mean when they get religious. (SIGBOVIK07)

•Recommendation: Calm the hell down.

Page 12: Data Mining Disasters

Decision Tree Forest Fires

•Pruning is used to prevent overfitting.

•When overpruning occurs, trees are burned to stumps.

•This spreads, torching entire forests.

(Aww...)
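How a tree ends up as a stump can be shown on a toy example. The sketch below is a simplified, cost-complexity-flavored pruning rule, not the exact procedure of any particular library: a subtree is collapsed into a leaf when the extra training errors this causes are at most `alpha` times the number of leaves removed. The `Node` class, the error counts, and the `alpha` values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    errors_as_leaf: int  # training errors if this node were turned into a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_leaves(node):
    if node.left is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def subtree_errors(node):
    if node.left is None:
        return node.errors_as_leaf
    return subtree_errors(node.left) + subtree_errors(node.right)


def prune(node, alpha):
    """Bottom-up pruning; internal nodes are assumed to have two children."""
    if node.left is None:
        return node
    kept = Node(node.errors_as_leaf, prune(node.left, alpha), prune(node.right, alpha))
    extra_errors = node.errors_as_leaf - subtree_errors(kept)
    leaves_removed = count_leaves(kept) - 1
    if extra_errors <= alpha * leaves_removed:
        return Node(node.errors_as_leaf)  # collapse the subtree into a leaf
    return kept


tree = Node(10, Node(4, Node(0), Node(1)), Node(3))
print(count_leaves(prune(tree, 1.0)))    # 3: the tree survives
print(count_leaves(prune(tree, 100.0)))  # 1: burned to a stump
```

With a small `alpha` the tree keeps its three leaves; with an overly large one every split looks too expensive and only the root remains.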

Page 13: Data Mining Disasters

Decision Tree Forest Fires

•Recommendation: Researchers should obtain a burning permit before pruning with fire.

•Smoking while researching is not recommended. If you choose to do so, make sure your “butts are out”.

Page 14: Data Mining Disasters

Voting Fraud by One-Armed Bandits

•Cascading failures from other fields may cause disasters in data mining.

•Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.

Page 15: Data Mining Disasters

Voting Fraud by One-Armed Bandits

•One-armed bandits commit voting fraud by:

• Impersonating real voting machines.

• Cramming cake into voting machines.

• (The cake is a lie.)

Page 16: Data Mining Disasters

Other safety measures

•Cool mining helmets

Page 17: Data Mining Disasters

Conclusion

•The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters.

•When faced with data mining disasters:

• Remain Calm.

• Blame it on off-by-one errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.