Stefan Arnborg, KTH http://www.nada.kth.se/~stefan Statistical Methods in Applied Computer Science DD2447, DD3342, spring 2010

Post on 22-Dec-2015

Page 1: Stefan Arnborg, KTH stefan Statistical Methods in Applied Computer Science DD2447, DD3342, spring 2010

Stefan Arnborg, KTH

http://www.nada.kth.se/~stefan

Statistical Methods in Applied Computer Science

DD2447, DD3342, spring 2010

Page 2

SYLLABUS

Common statistical models and their use:

• Bayesian, testing, and fiducial statistical philosophy
• Hypothesis choice
• Parametric inference
• Non-parametric inference
• Elements of regression
• Clustering
• Graphical statistical models
• Prediction and retrodiction
• Chapman-Kolmogorov formulation
• Elements of Vapnik/Chervonenkis learning theory
• Evidence theory, estimation and combination of evidence
• Support Vector Machines and Kernel methods
• Vovk/Gammerman hedged prediction technology
• Stochastic simulation, Markov Chain Monte Carlo

Page 3

LEARNING GOALS

After successfully taking this course, you will be able to:

-motivate the use of uncertainty management and statistical methodology in computer science applications, as well as the main methods in use,

-account for algorithms used in the area and use the standard tools,

-critically evaluate the applicability of these methods in new contexts, and design new applications of uncertainty management,

-follow research and development in the area.

Page 4

GRADING

DD2447: Bologna grades

Grades are E-A during 2009. Completing 70% of the homeworks, together with a very short oral discussion of them, gives grade C. Less gives F-D.

For higher grades, essentially all homeworks should be turned in on time. Alternative assignments will be substituted for those homeworks you miss.

For grade B you must pass one Master's test; for grade A you must do two Master's tests or a project with some research content.

DD3342: Pass/Fail

Research level project, or deeper study of part of course

Page 9

Applications of Uncertainty everywhere

Medical Imaging/Research (Schizophrenia)

Land Use Planning

Environmental Surveillance and Prediction

Finance and Stock

Marketing into Google

Robot Navigation and Tracking

Security and Military

Performance Tuning

Page 10

Some Master’s Projects using this syllabus (subset)

• Recommender system for Spotify
• Behavior of mobile phone users
• Recommender system for book club
• Recommender for job search site
• Computations in evolutionary genetics
• Gene hunting
• Psychiatry: genes, anatomy, personality
• Command and control: Situation awareness
• Diagnosing drilling problems
• Speech, Music, …

Page 12

Aristotle: Logic

Logic as a semi-formal system was created by Aristotle, probably inspired by the current practice in mathematical arguments.

There is no record of Aristotle himself applying logic, but the Elements of Euclid probably derive from Aristotle's illustrations of the logical method.

What role does logic have in Computer Science?

Page 13

Nicomachean Ethics

• Every action is thought to aim at some good
• Should we not, like archers, aim at what is right?
• We must be content to indicate the truth roughly and in outline, and with premises of the same kind, to reach conclusions that are no better.
• It is equally foolish to accept probable reasoning from a mathematician as it is to demand, from a rhetorician, scientific proofs.

Page 14

Visualization

• Visualize data in such a way that the important aspects are obvious. A good visualization strikes you like a punch between the eyes (Tukey, 1970)

• Pioneered by Florence Nightingale, first female member of the Royal Statistical Society, who popularized pie-chart-like polar area diagrams and performance metrics

Page 15

Probabilistic approaches

• Bayes: Probability conditioned by observation

• Cournot: An event with very small probability will not happen.

• Vapnik-Chervonenkis: VC-dimension and PAC, distribution-independence

• Kolmogorov/Vovk: A sequence is random if it cannot be compressed

Page 16

Peirce: Abduction and uncertainty

Aristotle's induction, generalizing from particulars, is considered invalid by strict deductionists.

Peirce made the concept clear, or at least confused on a higher level.

Abduction is verification by finding a plausible explanation. It is a key process in scientific progress.

Page 17

Sherlock Holmes: common sense inference

Techniques used by Sherlock are modeled on Conan Doyle’s professor in medical school, who followed the methodological tradition of Hippocrates and Galen. Abductive reasoning, first spelled out by Peirce, is found in 217 instances in the Sherlock Holmes adventures, 30 of them in the first novel, ‘A Study in Scarlet’.

Page 18

Thomas Bayes, amateur mathematician

If we have a probability model of the world we know how to compute probabilities of events.

But is it possible to learn about the world from events we see?

Bayes’ proposal was forgotten but rediscovered by Laplace.

Page 19

An alternative to Bayes’ method, hypothesis testing, is based on ’Cournot’s Bridge’: an event with very small probability will not happen.

Antoine Augustin Cournot (1801-1877)

Pioneer in stochastic processes, market theory and structural post-modernism. Predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings).
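
As a worked illustration of how hypothesis testing uses Cournot's principle, the sketch below computes an exact binomial tail probability and rejects the hypothesis when that probability is very small. The coin-tossing setting, the function names and the 0.05 threshold are assumptions made for this example; they do not come from the slides.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of a result at least this extreme."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def reject_null(n: int, k: int, p: float = 0.5, alpha: float = 0.05) -> bool:
    """Cournot-style decision: if the hypothesis makes the observed event very
    improbable (tail probability below alpha), reject the hypothesis."""
    return binomial_tail(n, k, p) < alpha

# Hypothetical data: 17 heads in 20 tosses of a coin assumed to be fair.
p_value = binomial_tail(20, 17)
print(f"tail probability = {p_value:.5f}, reject fair coin: {reject_null(20, 17)}")
```

Under the fair-coin hypothesis, 17 or more heads in 20 tosses is an event of very small probability, so observing it counts against the hypothesis; 12 heads would not.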

Page 20

Kolmogorov and randomness

Andrei Kolmogorov (1903-1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces.

Kolmogorov complexity characterizes a random string by the smallest size of a description of it. Used to explain Vovk/Gammerman scheme of hedged prediction. Also used in MDL (Minimum Description Length) inference.
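
Kolmogorov complexity itself is not computable, but the idea that a random string has no short description can be illustrated with an ordinary compressor as a rough stand-in. The sketch below uses `zlib` for that purpose; the choice of compressor, the function name and the test strings are assumptions made for this example.

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (a crude proxy for description length)."""
    return len(zlib.compress(data, 9)) / len(data)

regular = b"01" * 5000           # highly regular: "repeat '01' 5000 times"
random_like = os.urandom(10000)  # operating-system randomness: no short description expected

print(f"regular:     {compressed_ratio(regular):.3f}")
print(f"random-like: {compressed_ratio(random_like):.3f}")
```

The regular string shrinks to a small fraction of its length, while the random bytes do not shrink at all, which is the behaviour the compressibility definition of randomness describes.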

Page 21

Normative claim of Bayesianism

• EVERY type of uncertainty should be treated as probability

• This claim is controversial and not universally accepted: Fisher (1922), Cramér, Zadeh, Dempster, Shafer, Walley (1999) …

• Students encounter many approaches to uncertainty management and identify weaknesses in foundational arguments.

Page 22

Foundations for Bayesian Inference

• Bayes’ method, the first documented method based on probability: the plausibility of an event depends on observation. Bayes’ rule: $f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$
• Bayes’ rule is an organizing principle for uncertainty
• Parameter and observation spaces can be extremely complex, priors and likelihoods also.
• MCMC is the current approach, often but not always applicable (difficult when the posterior has many local maxima separated by low-density regions). Better than numerics?
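
To make the MCMC bullet concrete, here is a minimal random-walk Metropolis sketch that samples from a posterior known only up to a constant. The coin-bias model, the uniform prior, the step size and all names are assumptions chosen for illustration; the slides do not prescribe a particular sampler.

```python
import math
import random

def log_posterior(theta: float, heads: int, tosses: int) -> float:
    """Unnormalised log posterior: binomial likelihood times a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)

def metropolis(heads: int, tosses: int, n_samples: int = 20000,
               step: float = 0.1, seed: int = 0) -> list:
    """Random-walk Metropolis: propose a nearby value, accept with
    probability min(1, posterior ratio), otherwise stay put."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tosses)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tosses)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

samples = metropolis(heads=7, tosses=10)
kept = samples[2000:]  # discard an initial burn-in stretch
posterior_mean = sum(kept) / len(kept)
print(f"estimated posterior mean: {posterior_mean:.3f} (exact: {8 / 12:.3f})")
```

This posterior has a single mode, so the chain mixes easily. With several modes separated by low-density regions, a small `step` would rarely carry the chain from one mode to another, which is the difficulty the last bullet mentions.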

Page 23

Showcase application: PET-camera

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

Labels: camera geometry & noise (likelihood), film (data $D$), scene regularity (prior)

and also any other camera or imaging device …

Page 24

PET camera

D: film, count by detector j

X: radioactivity in voxel i

a: camera geometry

likelihood

prior

Inference about Y gives the posterior; its mean is often a good picture

Page 25

Sinogram and reconstruction

Tumour

Fruit fly, Drosophila family (X-ray)

Page 26

Project Aims

• Support transformation of tasks and solutions in a generic fashion

• Integrate different command levels and services in a dynamic organization

• Facilitate consistent situation awareness

Page 27

WIRED on Total Information Awareness

The WIRED article "Total Info System Totally Touchy" (Dec 2, 2002) discusses the Total Information Awareness system.

"People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature."

Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002.

"What's alarming is the danger of false positives based on incorrect data,"

Herb Edelstein, in Wired, Dec 2, 2002.

Page 28

Combination of evidence

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

$f(\lambda \mid \{d_1, d_2\}) \propto f(d_1 \mid \lambda)\, f(d_2 \mid \lambda)\, f(\lambda)$

In Bayes’ method, evidence is likelihood for observation.
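
The combination rule on this slide can be checked numerically on a small discrete parameter space: updating with d1 and then with d2 gives the same posterior as multiplying both likelihoods into the prior at once. The three-valued parameter and all numbers below are hypothetical, and the product form assumes the two observations are conditionally independent given the parameter.

```python
def normalise(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def update(prior, likelihood):
    """One application of Bayes' rule on a discrete parameter grid."""
    return normalise([p * l for p, l in zip(prior, likelihood)])

# Hypothetical numbers for three parameter values.
prior = [0.5, 0.3, 0.2]
lik_d1 = [0.9, 0.5, 0.1]  # f(d1 | lambda)
lik_d2 = [0.2, 0.6, 0.7]  # f(d2 | lambda)

sequential = update(update(prior, lik_d1), lik_d2)
joint = normalise([p * a * b for p, a, b in zip(prior, lik_d1, lik_d2)])
print(sequential)
print(joint)
```

Because each piece of evidence enters only as a likelihood factor, the order in which the pieces are combined does not matter.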

Page 29

Particle filter: general tracking

Page 30

Chapman-Kolmogorov version of Bayes’ rule

$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$
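
For a finite state space the integral in this recursion becomes a sum, which gives the following minimal predict-and-correct sketch. The two-state model, the transition matrix and the likelihood values are hypothetical. A particle filter approximates the same recursion with random samples when the state space is too large to enumerate.

```python
def predict(belief, transition):
    """Chapman-Kolmogorov step: sum over the previous state (the integral in the formula)."""
    n = len(belief)
    return [sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)]

def correct(predicted, likelihood):
    """Multiply by f(d_t | state_t) and renormalise."""
    weighted = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(weighted)
    return [w / total for w in weighted]

def bayes_filter(belief, transition, likelihoods):
    """Run the recursion once per observation."""
    for likelihood in likelihoods:
        belief = correct(predict(belief, transition), likelihood)
    return belief

# Hypothetical two-state example in which the state tends to persist.
transition = [[0.9, 0.1], [0.2, 0.8]]   # transition[i][j] = f(state_t = j | state_{t-1} = i)
likelihoods = [[0.7, 0.1], [0.7, 0.1]]  # f(d_t | state_t = j) for two observations
posterior = bayes_filter([0.5, 0.5], transition, likelihoods)
print(posterior)
```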

Page 31

Berry and Linoff have eloquently stated their preferences with the often quoted sentence:

"Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works".

“Neural networks typically give the right answer”

Page 34

1950-1980: The age of rationality. Let us describe the world with a mathematical model and compute the best way to manage it!

This is a large Bayesian Network, a popular statistical model.

Page 35

Ed Jaynes devoted a large part of his career to promoting Bayesian inference.

He also championed the use of Maximum Entropy in physics.

Outside physics, he met resistance from people who had already invented other methods.

Why should statistical mechanics say anything about our daily human world?

Page 36

Robust Bayes

• Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability.
• Every member of the posterior is a ’parallel combination’ of one member of the likelihood and one member of the prior.
• For decision making: Jaynes recommends using the member of the posterior with maximum entropy (Maxent estimate).

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

$F(\lambda \mid D) \propto F(D \mid \lambda)\, F(\lambda)$
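
The set-valued update can be sketched with small finite collections standing in for the convex sets: every pairing of one prior with one likelihood yields one member of the posterior set, and the member with the largest entropy is then selected. Representing each set by two hypothetical members is a simplification made for this example, and the maximum over a finite list need not coincide with the maximum over the full convex set.

```python
from itertools import product
from math import log

def posterior(prior, likelihood):
    """Ordinary Bayes' rule for one prior and one likelihood."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(dist):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * log(p) for p in dist if p > 0)

# Hypothetical members of the prior set F(lambda) and the likelihood set F(D | lambda).
priors = [[0.5, 0.5], [0.7, 0.3]]
likelihoods = [[0.8, 0.4], [0.6, 0.5]]

# Each member of the posterior set combines one prior with one likelihood.
posterior_set = [posterior(p, l) for p, l in product(priors, likelihoods)]
maxent = max(posterior_set, key=entropy)
print(posterior_set)
print("maximum-entropy member:", maxent)
```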

Page 37

SVM and Kernel method

Based on Vapnik-Chervonenkis learning theory

Separate classes by a wide-margin hyperplane classifier, or enclose data points between close parallel hyperplanes for regression

Possibly after a non-linear mapping to a high-dimensional space

The only assumption is point exchangeability
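
The non-linear mapping mentioned above does not have to be computed explicitly: a kernel function returns the inner product in the mapped space directly. The sketch below checks this for the degree-2 homogeneous polynomial kernel on two-dimensional inputs. The kernel and the test points are chosen for illustration and are not taken from the slides.

```python
import math

def feature_map(x):
    """Explicit map from a 2-D input to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2, computed in the input space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(feature_map(x), feature_map(y))  # map first, then take the inner product
implicit = poly_kernel(x, y)                    # never leaves the 2-D input space
print(explicit, implicit)
```

Both routes give the same number, so a hyperplane classifier that only needs inner products can work in the high-dimensional space at the cost of evaluating the kernel.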

Page 38

Classify with hyperplanes

Frank Rosenblatt (1928 – 1971)

Pioneering work in classifying by hyperplanes in high-dimensional spaces.

Criticized by Minsky-Papert, since real classes are not normally linearly separable.

ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation.

Predecessor to SVM/kernel methods
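
Classification by a hyperplane can be illustrated with the classic perceptron update rule associated with Rosenblatt's work. This is a minimal sketch on a hypothetical, linearly separable sample; the function names and the data are assumptions, and on non-separable data the loop simply stops after `epochs` passes without finding a separating plane.

```python
def train_perceptron(points, labels, epochs=100):
    """Perceptron rule: whenever a point is on the wrong side of the current
    hyperplane, move the hyperplane towards classifying it correctly."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified, or exactly on the plane
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: the sample is separated
            break
    return weights, bias

def classify(weights, bias, x):
    """Label +1 on one side of the hyperplane, -1 on the other."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Hypothetical sample with labels +1 and -1.
points = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(points, labels)
print(weights, bias, [classify(weights, bias, x) for x in points])
```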

Page 39

Find parallel hyperplanes

Classification

Red: true separating plane.

Blue: wide margin separation in sample

Classify by plane between blue planes
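
The notion of a wide margin can be made concrete by measuring, for a given hyperplane, the distance to the closest sample point. The sketch below only compares two hand-picked candidate hyperplanes on a hypothetical sample. An actual SVM solves an optimisation problem over all hyperplanes, which is not attempted here.

```python
import math

def geometric_margin(weights, bias, points, labels):
    """Smallest signed distance from any sample point to the hyperplane w.x + b = 0.
    It is positive only if every point lies on the side given by its label."""
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        y * (sum(w * xi for w, xi in zip(weights, x)) + bias) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical sample and two candidate separating hyperplanes.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
candidates = {"vertical": ([1.0, 0.0], 0.0), "tilted": ([1.0, 1.0], 0.0)}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "widest margin:", best)
```

Both candidates separate the sample, but the one with the wider margin leaves more room on either side, which is the plane a wide-margin classifier prefers.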

Page 40

SVM and Kernel method

Page 41

Vovk/Gammerman Hedged predictions

• Based on Kolmogorov complexity or non-conformance measure

• In classification, each prediction comes with confidence

• Asymptotically, misclassifications appear independently and with probability 1-confidence.

• Only assumption is exchangeability
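
A hedged prediction of this kind can be sketched as follows: give the new object each candidate label in turn, score how strange every example in the resulting bag looks, and keep the labels whose p-value exceeds 1 - confidence. The non-conformance measure used here (distance to the nearest example with the same label), the one-dimensional data and all names are assumptions made for this example.

```python
def nonconformity(index, bag):
    """Strangeness of one example: distance to its nearest same-label neighbour in the bag."""
    x, label = bag[index]
    distances = [abs(x - xj) for j, (xj, yj) in enumerate(bag)
                 if j != index and yj == label]
    return min(distances) if distances else float("inf")

def p_value(training, x_new, label):
    """Fraction of examples at least as strange as the new one under the tentative label."""
    bag = training + [(x_new, label)]
    scores = [nonconformity(i, bag) for i in range(len(bag))]
    return sum(s >= scores[-1] for s in scores) / len(bag)

def prediction_set(training, x_new, labels, confidence):
    """All labels that cannot be rejected at the given confidence level."""
    return {lab for lab in labels if p_value(training, x_new, lab) > 1.0 - confidence}

# Hypothetical one-dimensional training data with two classes.
training = [(1.0, "a"), (1.2, "a"), (1.5, "a"), (1.9, "a"),
            (5.0, "b"), (5.3, "b"), (5.5, "b"), (5.9, "b"), (6.4, "b")]
region = prediction_set(training, 1.4, ["a", "b"], confidence=0.8)
print(region)
```

Asking for higher confidence can only enlarge the prediction set: with this small sample, raising `confidence` to 0.95 means label "b" can no longer be excluded.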