E. Santovetti, lesson 2: Monte Carlo methods, Statistical test
TRANSCRIPT
Source: statistics.roma2.infn.it/~santovet/Downloads/Stat2new.pdf (2013-05-07)
E. Santovetti
lesson 2
Monte Carlo methods
Statistical test
Introduction
Monte Carlo methods try to simulate the physical effect and the detector response in order to better understand the detector performance (geometrical acceptance, efficiency, dead time, etc.) for the desired signal.
A crucial issue to face when we want to simulate any process is to provide a robust and reliable random number generator.
The usual steps are:
1) Generate sequence r1, r2 …rm, uniform in [0, 1]
2) Use these to produce another sequence x1, x2, ..., xm, distributed according to some pdf f(x) in which we are interested
3) Use the x values to estimate some property of f (x), e.g., fraction of x values with a < x < b.
Monte Carlo (MC) generated values = simulated data
Random number generator
Goal: generate uniformly distributed values in [0, 1].
One could toss a coin for each bit, but for e.g. a 32-bit number this is far too tiring.
→ 'random number generator' = computer algorithm to generate r1, r2, ..., rn.
Example: multiplicative linear congruential generator (MLCG)
The rule n_{i+1} = (a · n_i) mod m produces a sequence of numbers n0, n1, ...
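The MLCG rule can be sketched as a small generator function (a sketch; the function name is mine, and the parameter values are the ones suggested later in the text):

```python
def mlcg(a, m, n0):
    """Multiplicative linear congruential generator:
    n_{i+1} = (a * n_i) mod m, returned as r_i = n_i / m in (0, 1)."""
    n = n0
    while True:
        n = (a * n) % m
        yield n / m

# parameters suggested in Commun. ACM 31 (1988) 742
gen = mlcg(a=40692, m=2147483399, n0=12345)
sample = [next(gen) for _ in range(10)]
```

Restarting with the same seed n0 reproduces the sequence exactly, which is the defining property of a pseudorandom generator.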
Random number generator (2)
The sequence is (unfortunately) periodic!
Example (see Brandt Ch. 4): a = 3, m = 7, n0 = 1 → the sequence repeats.
Choose a, m to obtain long period (maximum = m – 1);
m is usually close to the largest integer that can be represented in the computer.
Only use a subset of a single period of the sequence.
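Brandt's toy example above can be checked directly (a minimal sketch):

```python
# toy MLCG from Brandt Ch. 4: a = 3, m = 7, n0 = 1
a, m, n = 3, 7, 1
seq = []
for _ in range(12):
    n = (a * n) % m
    seq.append(n)
# the sequence 3, 2, 6, 4, 5, 1 repeats: period m - 1 = 6 (the maximum)
```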
Random number generator (3)
Is the sequence really a random sequence?
a and m have to be chosen so that the ri pass basic tests of randomness:
uniform distribution in [0, 1]; all values independent (no correlation between pairs).
Far better generators are available, e.g. TRandom3, based on the Mersenne twister algorithm, period = 2^19937 − 1 ≈ 4·10^6000 (a "Mersenne prime"). See Matsumoto, M.; Nishimura, T. (1998), "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator", ACM Transactions on Modeling and Computer Simulation 8 (1): 3–30.
The values a = 40692, m = 2147483399 are suggested in Commun. ACM 31 (1988) 742.
The sequence is periodic: it is a pseudorandom sequence.
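CPython's standard `random` module also uses the Mersenne twister (MT19937) as its core generator; seeding two generators identically shows the "pseudo" in pseudorandom:

```python
import random

rng1 = random.Random(42)  # Mersenne twister (MT19937) under the hood
rng2 = random.Random(42)
a = [rng1.random() for _ in range(5)]
b = [rng2.random() for _ in range(5)]
# identical seeds give identical sequences: deterministic, hence pseudorandom
```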
Transformation method
Given the sequence r1, r2, ..., rm uniform in [0, 1],
find the transformation x(r) to obtain a sequence x1, x2, ..., xm distributed according to the desired pdf f(x).
Require that P(x(r) ≤ x) = F(x), the cumulative distribution of f, i.e. r = F(x).
Then we have x(r) = F⁻¹(r).
We need the inverse of the cumulative function, which is not always available.
Transformation method example
Consider the exponential pdf f(x) = (1/ξ) e^(−x/ξ) for x ≥ 0. Its cumulative is F(x) = 1 − e^(−x/ξ), so r = F(x) gives x(r) = −ξ ln(1 − r).
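A sketch of this inverse-transform example, taking ξ as the mean of the exponential (the slide's formulas were not captured in this transcript, so the symbol choice is an assumption):

```python
import math
import random

def sample_exponential(xi, rng):
    """Inverse transform: r = F(x) = 1 - exp(-x/xi)  =>  x = -xi * ln(1 - r)."""
    r = rng.random()          # uniform in [0, 1)
    return -xi * math.log(1.0 - r)

rng = random.Random(0)
xi = 2.0
xs = [sample_exponential(xi, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)      # should be close to xi
```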
The acceptance-rejection method
Consider a pdf for which it is not trivial to find the inverse of the cumulative function.
A random sequence following the pdf f(x) can be obtained in the following way:
Enclose the pdf in a box [xmin, xmax] × [0, fmax].
Generate a random variable x uniform in the interval [xmin, xmax]: x = xmin + (xmax − xmin) r, with r uniform in [0, 1].
Generate a second, independent uniform random number u in the interval [0, fmax].
If u < f(x), x is accepted; if not, reject and repeat.
The efficiency is much lower than for direct extraction: < 0.5, depending on the shape.
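The steps above can be sketched as follows (the pdf used here is purely illustrative):

```python
import random

def accept_reject(f, xmin, xmax, fmax, rng):
    """Draw one value distributed according to f by acceptance-rejection."""
    while True:
        x = xmin + (xmax - xmin) * rng.random()   # uniform x in [xmin, xmax]
        u = fmax * rng.random()                   # uniform u in [0, fmax]
        if u < f(x):                              # accept ...
            return x                              # ... otherwise reject and repeat

rng = random.Random(1)
# illustrative pdf: f(x) = 2x on [0, 1], enclosed in a box of height fmax = 2
xs = [accept_reject(lambda x: 2.0 * x, 0.0, 1.0, 2.0, rng) for _ in range(50_000)]
mean = sum(xs) / len(xs)   # E[x] = 2/3 for this pdf
```

Here the box area is 2 while the pdf area is 1, so about half of the trials are rejected, illustrating the efficiency remark above.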
Example of the acceptance-rejection method
If the dot is below the curve, use it in the histogram.
The acceptance-rejection method (2)
The efficiency of the method can be very low if the curve has long tails (low probability values over large x regions, e.g. a Gaussian).
We can split the range into regions in which the maximum is very different. For each region we apply the acceptance-rejection method separately, considering the local maximum.
Of course, the generation in each region must be performed proportionally more often where the local maximum is higher.
Random generator software
Monte Carlo event generator
Consider a very simple event
Generate cosθ and φ
Less simple events:
For these processes, dedicated software packages (PYTHIA, HERWIG, ISAJET, …) are used.
Output = 'events', i.e., for each event we get a list of generated particles with their momentum vectors, types, etc.
Monte Carlo event generator
PYTHIA generator: p p → gluino gluino
First, the elementary particles
...then, the observable particles
Monte Carlo detector simulation
Takes as input the particle list and momenta from the generator.
Simulates the detector response: multiple Coulomb scattering (generate scattering angle), particle decays (generate lifetime), ionization energy loss (generate Δ), electromagnetic and hadronic showers, production of signals, electronics response, …
Output = simulated raw data → input to reconstruction software: track finding, fitting, etc.
Predict what you should see at 'detector level' given a certain hypothesis at 'generator level'. Compare with the real data in a particular channel to tune and calibrate the simulated detector response.
Estimate 'efficiencies' = # events found / # events generated. Programming package: GEANT4.
Statistical test: general concepts
Hypotheses
A hypothesis H is some theory or model that specifies the probability for the data, i.e., the outcome of the observation.
x could be univariate or multivariate, continuous or discrete.
x could represent e.g. observation of a single particle, a single event, or an entire “experiment”.
Possible values of x form the sample space S (or “data space”).
Simple (or “point”) hypothesis: f (x|H) completely specified.
Composite hypothesis: H contains unspecified parameter(s).
The probability for x given H is also called the likelihood of the hypothesis, written L(x|H).
Definition of a test
The goal is to make some statement, based on the observed data x, on the validity of the possible hypotheses.
Consider e.g. a simple hypothesis H1 and alternative H0.
A test of H1 is defined by specifying a critical region W of the data space such that there is no more than some (small) probability α, assuming H1 is correct, to observe the data there, i.e., P(x ∈ W | H1) ≤ α.
If x is observed in the critical region, reject H1.
α is called the size or significance level of the test.
Critical region also called “rejection” region; complement is acceptance region.
Definition of a test (2)
In general there are many possible critical regions that give approximately the same significance α (unless the hypothesis H makes very precise predictions on the event x).
So the choice of the critical region for a test of H1 needs to take into account the alternative hypothesis H0: it is more difficult to give an absolute evaluation of a single hypothesis than to compare two hypotheses.
In other words, we place the critical region where there is a low probability for x to be found if H1 is true, but a high one if H0 is true:
[Figure: pdfs of x for H0 and H1, with critical region W]
Rejecting a hypothesis
Note that rejecting H1 is not necessarily equivalent to the statement that we believe it is false and H0 true.
In frequentist statistics one only associates probability with outcomes of repeatable observations (the data).
In Bayesian statistics, the probability of the hypothesis (degree of belief) would be found using Bayes' theorem: P(H|x) ∝ P(x|H) P(H),
which depends on the prior probability P(H).
What makes a frequentist test useful is that we can compute the probability to accept/reject a hypothesis assuming that it is true, or assuming some alternative is true (there are no a priori probabilities, all hypotheses are equivalent at the beginning).
Errors of type-I and type-II
The first mistake we can make is to reject a hypothesis that is in fact correct: we reject H1 even though it is the right hypothesis. This is a type-I error, and the maximum probability for it to occur is the size α of the test.
On the other hand, we can accept H1 as true while it is false and the hypothesis H0 is true. This is called a type-II error and occurs with probability β = P(x ∉ W | H0).
1 − β is called the power of the test with respect to the alternative H0.
[Figure: critical region W in x, with α and β indicated]
Example 1
An example is the selection of a particle decay by the reconstruction of the daughter tracks. To select these events we can look at their invariant mass.
As we can easily see from MC simulation, signal and background (combinatorial) have completely different distributions.
The signal is a Gaussian (or a Crystal Ball function) while the background is a falling exponential (almost linear).
Going far enough from the peak, we can easily find a rejection region with very low α.
Since the background is almost flat, the β probability will be quite large.
In this case, the power of this test against the background contamination is limited.
[Figure: invariant mass distribution, showing the acceptance region around the peak and the background contamination in the signal region]
Example 2
Another example is the selection of Z → τ τ. To detect this channel we use the τ → µν decay.
A possible background for this channel is the direct decay Z → µ µ
To reject it, the invariant mass considered has to be far from the Z mass.
We can also consider the variable:
This variable peaks at zero for the Z → µµ background.
[Figure: distribution of this variable for signal and background]
Event example in HEP
[Figure: simulated SUSY event display, with high-pT muons, high-pT jets of hadrons, and missing transverse energy]
Event example in HEP
Simulated minimum bias event: this event from Standard Model t-tbar production also has high-pT jets and muons, and some missing transverse energy → it can easily mimic a SUSY event.
Statistical tests in a particle physics context
For each event the result is a collection of numbers x: momentum (for each reconstructed particle), particle mass (for each reconstructed and identified particle), jet energies and pT, missing energy.
x follows some n-dimensional joint pdf, which depends on the type of event produced, e.g.
[Figure: joint pdfs of x for background and for SUSY signal]
We can define for each event two basic hypotheses, H0 (background) and H1 (signal), and build the conditional probability of the event given that it is a background or a signal event.
Selecting events
Suppose we have a data sample with two kinds of events, corresponding to hypotheses H0 (background) and H1 (signal) and we want to select those of type H1.
Each event is a point in space (in the present example 2-dimensional space). What ‘decision boundary’ should we use to accept/reject events as belonging to event types H0 or H1?
A first selection can be done with simple linear cuts:
[Figure: linear cuts defining the accepted region]
Other selection cuts
We could also decide to select signal events in other ways.
[Figure: two decision boundaries, one linear and one non-linear, each with its accepted region]
How do we find the best selection?
Test statistic
A generic way to proceed is to introduce a scalar function of the event variables, t(x).
t is called a test statistic.
Then, let us consider the pdf of the variable t.
Selection can be done by cutting on the variable t. The cut value tcut divides the space into two regions:
t < tcut: signal, event accepted
t > tcut: background, event rejected
In general we do not have the pdfs in explicit form, but we have the Monte Carlo simulation, which gives us the distributions.
[Figure: pdfs of t for signal and background; the cut tcut defines the critical region W, with tails α and β]
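With Monte Carlo samples of t for signal and background, the effect of a cut tcut can be estimated by simple counting (the Gaussian toy distributions below are purely illustrative):

```python
import random

rng = random.Random(7)
# toy test-statistic distributions: signal peaks at low t, background at high t
t_sig = [rng.gauss(-1.0, 1.0) for _ in range(100_000)]
t_bkg = [rng.gauss(+1.0, 1.0) for _ in range(100_000)]

t_cut = 0.0
eff_sig = sum(t < t_cut for t in t_sig) / len(t_sig)  # signal efficiency
eff_bkg = sum(t < t_cut for t in t_bkg) / len(t_bkg)  # background contamination
```

Moving t_cut trades signal efficiency against background contamination, which is exactly the α/β trade-off described above.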
Efficiency
The probability to reject the background hypothesis (and thus accept the signal hypothesis) for a background event (background contamination) is: εb = ∫ f(t|b) dt over the acceptance region t < tcut.
The probability to accept the signal hypothesis for a signal event (signal efficiency) is: εs = ∫ f(t|s) dt over the same region.
α and β are the tails beyond the cut of the background and signal distributions.
[Figure: pdfs of t for signal and background, with the cut tcut and the tails α and β]
Purity of event selection
We now want to quantify how "pure" our signal sample is, i.e. the probability that an event is a signal event, given that it has been accepted (purity).
Suppose also that the fractions of background and signal events in our sample are πb and πs (prior probabilities).
We can use Bayes' theorem: P(signal | accepted) = εs πs / (εs πs + εb πb)
So the purity depends on the prior probabilities as well as on the signal and background efficiencies.
To maximize it:
or better:
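The Bayes-theorem purity can be written out numerically; the slide's formula was not captured in this transcript, so the standard form εs πs / (εs πs + εb πb) is assumed here:

```python
def purity(eff_sig, eff_bkg, pi_sig, pi_bkg):
    """P(signal | accepted) = eff_sig*pi_sig / (eff_sig*pi_sig + eff_bkg*pi_bkg)."""
    num = eff_sig * pi_sig
    return num / (num + eff_bkg * pi_bkg)

# even a tight cut gives modest purity when the signal prior is small
p = purity(eff_sig=0.9, eff_bkg=0.01, pi_sig=0.01, pi_bkg=0.99)
```

This shows the point made above: the purity depends on the priors as much as on the efficiencies.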
Building a test statistic
How can we build a good (optimal) critical region, starting from a test statistic t?
We can invoke the Neyman-Pearson lemma, which states:
To get the highest power for a given significance level in a test of H0 (background) versus H1 (signal), the critical region should have P(x|H1) / P(x|H0) > c
inside the region, and ≤ c outside, where c is a constant which determines the power.
Equivalently, the optimal scalar test statistic is: t(x) = P(x|H1) / P(x|H0)
the likelihood ratio (also called the Bayes factor).
Building a test statistic (2)
The problem is that in many cases we do not have an analytical expression of the pdfs to evaluate P(x|H1) and P(x|H0).
Instead we may have Monte Carlo models for signal and background processes, so we can produce simulated data and enter each event into an n-dimensional histogram (n variables considered for each event). Use e.g. M bins for each of the n dimensions, a total of M^n cells.
But n is potentially large → a prohibitively large number of cells to populate with Monte Carlo data.
Compromise: make an Ansatz* for the form of the test statistic t(x) with fewer parameters; determine them (e.g. using MC) to give the best discrimination between signal and background.
Multivariate methods: Fisher discriminant, neural networks, kernel density methods, support vector machines, decision trees (boosting, bagging)
*An ansatz is an assumed form for a mathematical statement that is not based on any underlying theory or principle
Linear test statistic
Ansatz: linear dependence, t(x) = a1 x1 + … + an xn
Choose the parameters a1, ..., an so that the pdfs f(t|H0), f(t|H1) have maximum 'separation': large distance between the mean values and small widths.
Fisher: maximize J(a) = (τ0 − τ1)² / (Σ0² + Σ1²)
In physics and mathematics, an ansatz is an educated guess that is verified later by its results
Linear test statistic (2)
We have, for each hypothesis Hk, the means and covariances of x: (μk)i = E[xi | Hk], (Vk)ij = cov[xi, xj | Hk].
In terms of the mean and variance of t: τk = E[t | Hk] = aᵀ μk, Σk² = V[t | Hk] = aᵀ Vk a.
Linear test statistic (3)
We have to maximize J(a). The numerator of J(a) is (τ0 − τ1)² = aᵀ B a, with B = (μ0 − μ1)(μ0 − μ1)ᵀ ("between" classes),
and the denominator is Σ0² + Σ1² = aᵀ W a, with W = V0 + V1 ("within" classes).
Setting ∂J/∂ai = 0 gives Fisher's linear discriminant function: t(x) = aᵀ x with a ∝ W⁻¹ (μ0 − μ1)
→ linear decision boundary
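A numerical sketch of the result a ∝ W⁻¹(μ0 − μ1), on toy Gaussian data (the function name and dataset are illustrative):

```python
import numpy as np

def fisher_coefficients(x0, x1):
    """Fisher coefficients a ∝ W^{-1} (mu0 - mu1), with W = V0 + V1."""
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    W = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    return np.linalg.solve(W, mu0 - mu1)

rng = np.random.default_rng(0)
x_bkg = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(5000, 2))  # H0: background
x_sig = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(5000, 2))  # H1: signal
a = fisher_coefficients(x_bkg, x_sig)
t_bkg = x_bkg @ a
t_sig = x_sig @ a
# the two t distributions are well separated along the Fisher direction
```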
Fisher discriminant for Gaussian data
Suppose f(x|Hk) is a multivariate Gaussian with means μk and a common covariance matrix V0 = V1 = V.
The Fisher discriminant (plus an offset) is t(x) = (μ0 − μ1)ᵀ V⁻¹ x + a0.
Computing the likelihood ratio r = f(x|H0)/f(x|H1) for these Gaussians, one finds t ∝ log(r) + const. (monotonic), so for this case the Fisher discriminant is equivalent to using the likelihood ratio, and thus gives maximum purity for a given efficiency.
For non-Gaussian data this no longer holds, but the linear discriminant function may be the simplest practical solution.
One often tries to transform the data so as to better approximate a Gaussian before constructing the Fisher discriminant.
Fisher discriminant for Gaussian data (2)
In the case of a multivariate Gaussian with common covariance matrix, it is interesting to evaluate the "a posteriori" probability of a certain hypothesis
and for a particular choice of the offset a0, we can have:
which is the logistic sigmoid function s(t) = 1 / (1 + e^(−t))
Linear decision boundaries
A linear decision boundary is optimal only when both classes follow a multivariate Gaussian with the same covariance matrix but different means.
For other cases the linear boundaries can be completely useless.
Non-linear transformations
We can try to find a non-linear transformation of the variables such that in the new variables a linear boundary is effective.
A non-linear test statistic is in many cases mandatory.
[Figure: non-linear decision boundary and its accepted region]
Multivariate statistical methods are a Big Industry:
● Neural Networks,● Boosted decision tree,● Kernel density methods● ….
Decision tree
Boosting algorithms can be applied to any classifier. Here they are applied to decision trees.
S means signal, B means background; terminal nodes, called leaves, are shown in boxes. The key issue is to define a criterion that describes the goodness of separation between signal and background in a tree split.
Assume the events are weighted, each event having weight Wi. Define the purity of the sample in a node by P = (Σ signal Wi) / (Σ signal Wi + Σ background Wi).
Note that P(1 − P) is 0 if the sample is pure signal or pure background. For a given node, let Gini = (Σi=1..n Wi) P(1 − P) (n = number of events in the node).
The criterion chosen is to minimize Gini(left daughter) + Gini(right daughter).
To determine the increase in quality when a node is split into two nodes, one maximizes Criterion = Gini(parent) − Gini(left daughter) − Gini(right daughter).
At the end, if a leaf has purity greater than 1/2 (or whatever is set), then it is called a signal leaf, otherwise, a background leaf. Events are classified signal (have score of 1) if they land on a signal leaf and background (have score of -1) if they land on a background leaf.
The resulting tree is a decision tree.
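The purity and the P(1 − P)-based criterion can be sketched with weighted events; the exact Gini normalization on the slide was not captured, so the common choice Gini = (Σ Wi) · P(1 − P) is assumed here:

```python
def node_purity(w_sig, w_bkg):
    """Purity P = (sum of signal weights) / (sum of all weights) in the node."""
    s, b = sum(w_sig), sum(w_bkg)
    return s / (s + b)

def gini(w_sig, w_bkg):
    """Gini = (sum of weights) * P * (1 - P); zero for a pure node."""
    s, b = sum(w_sig), sum(w_bkg)
    p = s / (s + b)
    return (s + b) * p * (1.0 - p)

# a perfect split of a mixed node: the summed daughter Gini drops to zero
parent = gini([1.0, 1.0], [1.0, 1.0])                 # P = 0.5 in the parent
left, right = gini([1.0, 1.0], []), gini([], [1.0, 1.0])
```

The split criterion above is then parent − left − right, which is maximal for this perfect separation.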
Boosted decision tree (BDT)
Decision trees have been available for some time. They are known to be powerful but unstable, i.e., a small change in the training sample can produce a large change in the tree and the results.
Combining many decision trees to make a "majority vote" can improve the stability somewhat. In the boosted decision tree method, the weights of misclassified events are boosted for succeeding trees.
If there are N total events in the sample, the weight of each event is initially taken as 1/N. Suppose that there are M trees, and m is the index of an individual tree.
There are several commonly used algorithms for boosting the weights of the misclassified events in the training sample. The boosting performance can be quite different for the various ways of updating the event weights.
I = 1 if the event is misclassified; I = 0 if the event is correctly classified.
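One common update rule is the standard AdaBoost step, using the I = 1/0 flag defined above (a sketch; the slide's exact formulas were not captured in this transcript, so the Freund–Schapire form is assumed):

```python
import math

def adaboost_update(weights, flags):
    """One AdaBoost step. flags[i] = 1 if event i is misclassified, else 0.
    Misclassified events get their weights multiplied by (1 - err) / err."""
    err = sum(w * f for w, f in zip(weights, flags))   # weighted error rate
    alpha = math.log((1.0 - err) / err)                # weight of this tree
    boosted = [w * math.exp(alpha * f) for w, f in zip(weights, flags)]
    total = sum(boosted)
    return [w / total for w in boosted], alpha

w0 = [1.0 / 4.0] * 4                  # initial weights 1/N
w1, alpha = adaboost_update(w0, [1, 0, 0, 0])
# the single misclassified event now carries a much larger share of the weight
```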
Adaptive boost (AdaBoost)
BDT example
Apply this method to select the decay
In order to discriminate this channel from minimum bias events and combinatorial background, we identify a certain number of kinematic variables.
BDT example (2)
Other variables
BDT example
We train the BDT with Monte Carlo simulated signal and with background samples. Once the BDT is trained (and fixed) we apply this cut to the real data.
Boosted decision trees in practice
Yann Coadou
CPPM Marseille
School of Statistics SOS2012, Autrans, 1 June 2012
Outline

1 BDT performance: Overtraining? Clues to boosting performance
2 Concrete examples: First is best; XOR problem; Circular correlation; Many small trees or fewer large trees?
3 BDTs in real life (. . . or at least in real physics cases. . . ): Single top search at D0; More applications in HEP
4 Software and references
Yann Coadou (CPPM) — Boosted decision trees (practice) SOS2012, Autrans, 1 June 2012 2/37
Before we go on...
!!! VERY IMPORTANT !!!
Understand your inputs well before you start playing with multivariate techniques
Training and generalisation error
Clear overtraining, but still better performance after boosting
Cross section significance
More relevant than testing error
Reaches plateau
Afterwards, boosting does not hurt (just wasted CPU)
Applicable to any other figure of merit of interest for your use case
Clues to boosting performance
First tree is best, others are minor corrections
Specialised trees do not perform well on most events ⇒ decreasing tree weight and increasing misclassification rate
The last tree is not a better evolution of the first tree, but rather a pretty bad DT that only does a good job on the few cases that the other trees couldn't get right
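The "minor corrections" behaviour can be read off the AdaBoost weight formula: a tree with weighted misclassification rate ε gets vote α = β·ln((1−ε)/ε), which shrinks to zero as ε approaches 0.5. A minimal Python sketch (the `beta` learning-rate argument and the sample error rates are illustrative, not taken from the slides' analyses):

```python
import math

def tree_weight(err, beta=1.0):
    """AdaBoost vote of a tree with weighted misclassification rate err:
    alpha = beta * ln((1 - err) / err)."""
    return beta * math.log((1.0 - err) / err)

# As err climbs towards 0.5 (random guessing), the tree's vote shrinks
# towards zero: later, specialised trees are only small corrections.
for err in (0.10, 0.30, 0.45, 0.49):
    print(f"err = {err:.2f}  ->  weight = {tree_weight(err):+.3f}")
```

A tree at ε = 0.5 would get exactly zero weight, which is why boosting longer eventually just wastes CPU.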
Concrete examples
Concrete example
Using TMVA and some code modified from G. Cowan's CERN academic lectures (June 2008)
Concrete example
Specialised trees
Concrete example: XOR
Concrete example: XOR with 100 events
Small statistics
Single tree or Fisher discriminant not so good
BDT very good: high-performance discriminant from combination of weak classifiers
Circular correlation
Using TMVA and the create circ macro from $ROOTSYS/tmva/test/createData.C to generate the dataset
Circular correlation
Boosting longer
Compare performance of Fisher discriminant, single DT and BDT with more and more trees (5 to 400)
All other parameters at TMVA default (would be 400 trees)
Fisher bad (expected)
Single (small) DT: not so good
More trees ⇒ improved performance until saturation
Circular correlation
Decision contours
Fisher bad (expected)
Note: max tree depth = 3
Single (small) DT: not so good. Note: a larger tree would solve this problem
More trees ⇒ improved performance (less step-like, closer to optimal separation) until saturation
Largest BDTs: wiggle a little around the contour ⇒ picked up features of the training sample, that is, overtraining
Circular correlation

Training/testing output
Better shape with more trees: quasi-continuous
Overtraining because of disagreement between training and testing? Let's see
Circular correlation

Performance in optimal significance
Best significance actually obtained with last BDT, 400 trees!
But to be fair, equivalent performance with 10 trees already
Less “stepped” output desirable? ⇒ maybe 50 is reasonable
Circular correlation
Control plots
Boosting weight decreases fast and stabilises
First trees have small error fractions, which then increase towards 0.5 (random guess)
⇒ confirms that best trees are first ones, others are small corrections
Circular correlation
Separation criterion for node splitting
Compare performance of Gini, entropy, misclassification error, s/√(s+b)
All other parameters at TMVA default
Very similar performance (even zooming on corner)
Small degradation (in this particular case) for s/√(s+b): the only criterion that doesn't respect the good properties of an impurity measure (see yesterday: maximal for an equal mix of signal and background, symmetric in p_sig and p_bkg, minimal for a node with either signal only or background only, strictly concave)
Circular correlation
Performance in optimal significance
Confirms previous page: very similar performance, worse for the BDT optimised with significance!
Many small trees or fewer large trees?
Using the same create circ macro but generating a larger dataset to avoid stats limitations
20 or 400 trees; minimum leaf size: 10 or 500 events
Maximum depth (max number of cuts to reach leaf): 3 or 20
Overall: very comparable performance. Depends on use case.
BDTs in real life
Single top production evidence at D0 (2006)
Three multivariate techniques: BDT, Matrix Elements, BNN
Most sensitive: BDT
σ(s+t) = 4.9 ± 1.4 pb
p-value = 0.035% (3.4σ)
SM compatibility: 11% (1.3σ)
σ(s) = 1.0 ± 0.9 pb, σ(t) = 4.2 +1.8/−1.4 pb
Decision trees — 49 input variables
Object Kinematics:
pT(jet1), pT(jet2), pT(jet3), pT(jet4), pT(best1), pT(notbest1), pT(notbest2), pT(tag1), pT(untag1), pT(untag2)

Event Kinematics:
Aplanarity(alljets,W), M(W,best1) (“best” top mass), M(W,tag1) (“b-tagged” top mass), HT(alljets), HT(alljets−best1), HT(alljets−tag1), HT(alljets,W), HT(jet1,jet2), HT(jet1,jet2,W), M(alljets), M(alljets−best1), M(alljets−tag1), M(jet1,jet2), M(jet1,jet2,W), MT(jet1,jet2), MT(W), Missing ET, pT(alljets−best1), pT(alljets−tag1), pT(jet1,jet2), Q(lepton)×η(untag1), √s, Sphericity(alljets,W)

Angular Correlations:
∆R(jet1,jet2), cos(best1,lepton)besttop, cos(best1,notbest1)besttop, cos(tag1,alljets)alljets, cos(tag1,lepton)btaggedtop, cos(jet1,alljets)alljets, cos(jet1,lepton)btaggedtop, cos(jet2,alljets)alljets, cos(jet2,lepton)btaggedtop, cos(lepton,Q(lepton)×z)besttop, cos(leptonbesttop,besttopCMframe), cos(leptonbtaggedtop,btaggedtopCMframe), cos(notbest,alljets)alljets, cos(notbest,lepton)besttop, cos(untag1,alljets)alljets, cos(untag1,lepton)btaggedtop
Adding variables did not degrade performance
Tested shorter lists, lost some sensitivity
Same list used for all channels
Best theoretical variable: HT(alljets,W). But the detector is not perfect ⇒ capturing the essence from several variations usually helps a “dumb” MVA
Decision tree parameters
BDT choices
1/3 of MC for training
AdaBoost parameter β = 0.2
20 boosting cycles
Signal leaf if purity > 0.5
Minimum leaf size = 100 events
Same total weight to signal andbackground to start
Goodness of split - Gini factor
Analysis strategy
Train 36 separate trees:
3 signals (s, t, s+t)
2 leptons (e, µ)
3 jet multiplicities (2, 3, 4 jets)
2 b-tag multiplicities (1, 2 tags)
For each signal train against the sum of backgrounds
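The choices above (AdaBoost with β = 0.2, 20 boosting cycles, reweighting of misclassified events) can be wired into a toy version of the algorithm. A self-contained Python sketch, with decision stumps standing in for small trees and a made-up 1-D dataset; with equal-size samples, uniform initial weights already give signal and background the same total starting weight. Illustrative only, not the D0 code:

```python
import math
import random

def stump_train(data, weights):
    """Best one-variable threshold cut (decision stump) under event weights."""
    best = None
    for thr in sorted({x for x, _ in data}):
        for sign in (+1, -1):
            err = sum(w for (x, y), w in zip(data, weights)
                      if (sign if x >= thr else -sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(data, n_trees=20, beta=0.2):
    """Toy AdaBoost with the slide's settings: beta = 0.2, 20 boosting cycles."""
    weights = [1.0 / len(data)] * len(data)
    trees = []
    for _ in range(n_trees):
        err, thr, sign = stump_train(data, weights)
        err = max(err, 1e-10)
        if err >= 0.5:          # no better than random guessing: stop
            break
        alpha = beta * math.log((1.0 - err) / err)
        # boost the weight of misclassified events, then renormalise
        weights = [w * (math.exp(alpha)
                        if (sign if x >= thr else -sign) != y else 1.0)
                   for (x, y), w in zip(data, weights)]
        tot = sum(weights)
        weights = [w / tot for w in weights]
        trees.append((alpha, thr, sign))
    return trees

def score(trees, x):
    """Weighted vote of all boosted stumps."""
    return sum(a * (s if x >= t else -s) for a, t, s in trees)

# 1-D toy sample: signal (y = +1) near x = +1, background (y = -1) near x = -1
random.seed(1)
data = [(random.gauss(+1.0, 1.0), +1) for _ in range(100)]
data += [(random.gauss(-1.0, 1.0), -1) for _ in range(100)]
trees = adaboost(data)
acc = sum((score(trees, x) > 0) == (y > 0) for x, y in data) / len(data)
print(f"{len(trees)} stumps, training accuracy {acc:.2f}")
```

The real analysis trains depth-limited trees with a minimum leaf size of 100 events and a Gini split criterion; the stump here keeps the sketch short while preserving the boosting logic.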
Analysis validation
Ensemble testing
Test the whole machinery with many sets of pseudo-data
Like running D0 experiment 1000s of times
Generated ensembles with different signal contents (no signal, SM, other cross sections, higher luminosity)
Ensemble generation
Pool of weighted signal + background events
Fluctuate relative and total yields in proportion to systematic errors, reproducing correlations
Randomly sample from a Poisson distribution about the total yield tosimulate statistical fluctuations
Generate pseudo-data set, pass through full analysis chain (including systematic uncertainties)
Achieved linear response to varying input cross sections and negligible bias
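The ensemble-generation recipe can be sketched in a few lines. A toy Python version (the yields and fractional systematic errors are made-up numbers; correlations between systematics, which the real procedure reproduces, are ignored here for brevity):

```python
import math
import random

def poisson(lam, rng):
    """Poisson sample: Knuth's method for small means, normal approx otherwise."""
    if lam <= 0.0:
        return 0
    if lam > 30.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1

def pseudo_experiment(yields, syst, rng):
    """One pseudo-data set: fluctuate each expected yield by its fractional
    systematic error, then draw a Poisson count for the statistical fluctuation."""
    out = {}
    for name, mu in yields.items():
        mu_fluct = max(0.0, mu * (1.0 + rng.gauss(0.0, syst[name])))
        out[name] = poisson(mu_fluct, rng)
    return out

rng = random.Random(42)
yields = {"signal": 25.0, "W+jets": 300.0, "ttbar": 120.0}   # made-up yields
syst = {"signal": 0.10, "W+jets": 0.15, "ttbar": 0.12}       # made-up errors
ensemble = [pseudo_experiment(yields, syst, rng) for _ in range(1000)]
mean_bkg = sum(e["W+jets"] for e in ensemble) / len(ensemble)
print(f"mean W+jets yield over {len(ensemble)} pseudo-experiments: {mean_bkg:.1f}")
```

Each pseudo-data set would then be passed through the full analysis chain, which is how the linear response and bias of the machinery are checked.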
Cross-check samples
Validate method on data in no-signal region
“W+jets”: = 2 jets, HT(lepton, missing ET, alljets) < 175 GeV
“ttbar”: = 4 jets, HT(lepton, missing ET, alljets) > 300 GeV
Good agreement
Boosted decision tree event characteristics
DT < 0.3 DT > 0.55 DT > 0.65
High BDT region: shows masses of real t and W ⇒ expected
Low BDT region: background-like ⇒ expected
The above doesn't prove the analysis is ok, but not seeing this could be a sign of a problem
Comparison for D0 single top evidence
Bayesian NN, ME
Cannot know a priori which method will work best
⇒ Need to experiment with different techniques
Power curve
Search for single top in tau+jets at D0 (2010)
Tau ID BDT and single top search BDT
4% sensitivity gain over the e + µ analysis
PLB690:5,2010
Recent results in HEP with BDT
ATLAS tau identification
Now used both offline and online
Systematics: propagate various detector/theory effects to BDT output and measure the variation
Hot from the press: ATLAS Wt production evidence (since Monday)
arXiv:1205.5764
BDT output used in final fit to measure the cross section
Constraints on systematics from profiling
Recent results in HEP with BDT
ATLAS tt→ e/µ + τ+jets production cross section
BDT for tau ID: one to reject electrons, one against jets
Fit BDT output to get the tau contribution in data
Boosted decision trees in HEP studies
MiniBooNE (e.g. physics/0408124 NIM A543:577-584,physics/0508045 NIM A555:370-385, hep-ex/0704.1500)
D0 single top evidence (PRL98:181802,2007, PRD78:012005,2008)
D0 and CDF single top quark observation (PRL103:092001,2009,PRL103:092002,2009)
D0 tau ID and single top search (PLB690:5,2010)
Fermi gamma ray space telescope (same code as D0)
BaBar (hep-ex/0607112)
ATLAS/CMS: Many other analyses
b-tagging for LHC (physics/0702041)
LHCb: B0(s) → µµ search, selection of B0s → J/ψφ for the φs measurement
More and more underway
Boosted decision tree software
Historical: CART, ID3, C4.5
D0 analysis: C++ custom-made code. Can use entropy/Gini,boosting/bagging/random forests
MiniBooNE code at http://www-mhp.physics.lsa.umich.edu/~roe/
Much better approach
Go for a fully integrated solution
use different multivariate techniques easily
spend your time on understanding your data and model
Examples:
Weka. Written in Java, open source, very good published manual. Not written for HEP but very complete. http://www.cs.waikato.ac.nz/ml/weka/
StatPatternRecognition: http://www.hep.caltech.edu/~narsky/spr.html
TMVA (Toolkit for MultiVariate Analysis). Now integrated in ROOT, complete manual: http://tmva.sourceforge.net
References I
L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification andRegression Trees, Wadsworth, Stamford, 1984
J.R. Quinlan, “Induction of decision trees”, Machine Learning, 1(1):81–106, 1986
J.R. Quinlan, “Simplifying decision trees”, International Journal of Man-MachineStudies, 27(3):221–234, 1987
R.E. Schapire, “The strength of weak learnability”, Machine Learning, 5(2):197–227, 1990
Y. Freund, “Boosting a weak learning algorithm by majority”, Information and Computation, 121(2):256–285, 1995
Y. Freund and R.E. Schapire, “Experiments with a New Boosting Algorithm”, in Machine Learning: Proceedings of the Thirteenth International Conference, edited by L. Saitta (Morgan Kaufmann, San Francisco, 1996) p. 148
Y. Freund and R.E. Schapire, “A short introduction to boosting”, Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, 1999
References II
Y. Freund and R.E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting”, Journal of Computer and System Sciences, 55(1):119–139, 1997
J.H. Friedman, T. Hastie and R. Tibshirani, “Additive logistic regression: astatistical view of boosting”, The Annals of Statistics, 28(2), 377–386, 2000
L. Breiman, “Bagging Predictors”, Machine Learning, 24 (2), 123–140, 1996
L. Breiman, “Random forests”, Machine Learning, 45 (1), 5–32, 2001
B.P. Roe, H.-J. Yang, J. Zhu, Y. Liu, I. Stancu, and G. McGregor, Nucl. Instrum. Methods Phys. Res., Sect. A 543, 577 (2005); H.-J. Yang, B.P. Roe, and J. Zhu, Nucl. Instrum. Methods Phys. Res., Sect. A 555, 370 (2005)
V.M. Abazov et al. [D0 Collaboration], “Evidence for production of single top quarks”, Phys. Rev. D78, 012005 (2008)