
Page 1: TMVA Workshop – Introduction

Andreas Hoecker (CERN)
TMVA Workshop, 21 January 2011, CERN, Switzerland

http://tmva.sf.net

Page 2: Goals of the Workshop

Introduction to multivariate classification and regression with TMVA:

• Pedagogical talks on various multivariate methods (morning)

• Talks from users (14:00)

• Tutorial (15:30)

Page 3: TMVA

• ROOT is the analysis framework used by most (HEP) physicists

• Idea: rather than just implementing new MVA techniques and making them available in ROOT:

﹣ Have one common platform / interface for all MVA methods

﹣ Have common data pre-processing capabilities

﹣ Train and test all classifiers on same data sample and evaluate consistently

﹣ Provide common analysis (ROOT scripts) and application framework

﹣ Provide access with and without ROOT, through macros, C++ executables or Python (see the sketch below)
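
To make this concrete, here is a minimal sketch of a training macro against the TMVA 4-era Factory interface; the file, tree and variable names are placeholders, not part of this talk:

// Minimal training macro (a sketch against the TMVA 4-era API;
// file, tree and variable names are placeholders)
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void trainExample() {
   TFile* input   = TFile::Open("data.root");        // hypothetical input file
   TTree* sigTree = (TTree*)input->Get("TreeS");     // known signal events
   TTree* bkgTree = (TTree*)input->Get("TreeB");     // known background events

   TFile* output = TFile::Open("TMVA.root", "RECREATE");
   TMVA::Factory factory("TMVAClassification", output,
                         "!V:AnalysisType=Classification");

   factory.AddVariable("x1", 'F');                   // discriminating variables
   factory.AddVariable("x2", 'F');
   factory.AddSignalTree(sigTree, 1.0);              // global event weights
   factory.AddBackgroundTree(bkgTree, 1.0);
   factory.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

   // book any number of methods: all are trained, tested and evaluated
   // consistently on the same data sample
   factory.BookMethod(TMVA::Types::kBDT, "BDT", "NTrees=400");
   factory.BookMethod(TMVA::Types::kMLP, "MLP", "HiddenLayers=N+1");

   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   output->Close();
}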

Page 4: TMVA

• TMVA started in 2006 on the Sourceforge development platform

• 6 core developers, 21 contributors so far

• TMVA is written in C++, and relies on ROOT functionality

• Since ROOT 5.15 / TMVA v3.7.2, TMVA is part of ROOT and developed directly in the ROOT SVN

﹣ We continue to maintain the primary tmva-users mailing list on Sourceforge

﹣ New TMVA versions also published as downloadable tgz files on Sourceforge

﹣ For bug reports, use ROOT Savannah

Page 5: Simulated Higgs Event in CMS

[Event display, LHC 2010, 7 TeV] Higgs event in an LHC proton–proton collision at high luminosity (together with ~24 other inelastic events). Such events occur only in a tiny fraction, O(10⁻¹⁰), of the proton–proton collisions.

Page 6: Event Classification in HEP

• Most HEP analyses require discrimination of signal from background:

﹣ Event level (Higgs searches, …)

﹣ Cone level (Tau-vs-jet reconstruction, …)

﹣ Track level (particle identification, …)

﹣ Object level (flavour tagging, …)

﹣ Parameter estimation (significance, mass, CP violation in B system, …)

• The multivariate input information used for this has various sources:

﹣ Kinematic variables (masses, momenta, decay angles, …)

﹣ Event properties (jet/lepton multiplicity, sum of charges, …)

﹣ Event shape (sphericity, Fox-Wolfram moments, …)

﹣ Detector response (silicon hits, dE/dx, Cherenkov angle, shower profiles, muon hits, …)

• Traditionally, a few powerful input variables were combined. New methods allow the use of 100 and more variables without loss of classification power, e.g. MiniBooNE: NIM A 543 (2005), or D0 single top: Phys. Rev. D78, 012005 (2008)

Page 7: Event Classification

• Suppose a data sample with two types of events: H0, H1

﹣ We have found discriminating input variables x1, x2, …

﹣ What decision boundary should we use to select events of type H1?

Rectangular cuts? A linear boundary? A nonlinear one?

[Three (x1, x2) scatter plots of H0 and H1 events, showing a rectangular-cut, a linear and a nonlinear decision boundary]

Low-variance (stable), high-bias methods vs. high-variance, small-bias methods

Page 8: Event Classification

(Same question and figure as page 7.)

• How can we decide this in an optimal way? Let the machine learn it!

Page 9: Parameter Regression

• How to estimate a “functional behaviour” from a set of measurements?

﹣ Energy deposit in the calorimeter, distance between overlapping photons, …

﹣ Entry location of a particle in the calorimeter or on a silicon pad, …

A constant? A linear function? A non-linear one?

[Three plots of f(x) versus x, fitted with a constant, a linear and a non-linear function]

• Looks trivial? What if we have many input variables?

Page 10: Multivariate Event Classification

Page 11: Multivariate Event Classification

Each event, signal or background, has D measured variables that span a D-dimensional “feature space”.

Find a mapping from the D-dimensional input-observable (“feature”) space to one-dimensional output class labels.

Page 12: Multivariate Event Classification

Each event, signal or background, has D measured variables that span a D-dimensional “feature space”.

Most general form: y = y(x), with x = {x1, …, xD} the input variables, i.e. a mapping y(x): R^D → R.

Plotting the resulting y(x) values gives the one-dimensional classifier output distribution.
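
As an illustration of such a mapping (just the concept, not a TMVA method), the simplest possible choice of y(x) is a linear combination of the inputs, as a Fisher discriminant uses; the weights below are invented:

#include <array>
#include <cstdio>

// The simplest mapping y: R^D -> R is a linear combination of the
// inputs (what a Fisher discriminant does); the weights are invented.
constexpr int D = 3;

double y(const std::array<double, D>& x) {
   const std::array<double, D> w = {0.8, -0.3, 1.2};  // illustrative weights
   double sum = 0.0;
   for (int i = 0; i < D; ++i) sum += w[i] * x[i];
   return sum;                                        // one number per event
}

int main() {
   std::array<double, D> event = {1.0, 2.0, 0.5};     // one event in feature space
   std::printf("y(x) = %.3f\n", y(event));
}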

Page 13: Multivariate Event Classification

y(x) is a “test statistic” in the D-dimensional space of input variables: the surface y(x) = const defines the decision boundary.

Distributions of y(x) for the two classes, PDF_S(y) and PDF_B(y): their overlap determines the achievable separation power and purity. Typically the mapping is trained such that y(B) → 0 and y(S) → 1.

y(x) is used to set the selection cut:

﹣ y(x) > cut: signal
﹣ y(x) = cut: decision boundary
﹣ y(x) < cut: background

The choice of cut determines efficiency and purity.
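
A toy sketch of how efficiency and purity follow from a cut on y(x); the score values are invented, and equal signal and background abundances are assumed:

#include <cstdio>
#include <vector>

// Toy sketch: count events above the cut to obtain signal efficiency and
// purity (invented y(x) values; equal S and B abundances assumed).
int main() {
   std::vector<double> ySig = {0.9, 0.8, 0.7, 0.4};   // y(x) of known signal
   std::vector<double> yBkg = {0.1, 0.3, 0.6, 0.2};   // y(x) of known background
   const double cut = 0.5;

   int nS = 0, nB = 0;
   for (double y : ySig) if (y > cut) ++nS;           // y > cut: call it signal
   for (double y : yBkg) if (y > cut) ++nB;

   double efficiency = double(nS) / ySig.size();      // fraction of signal kept
   double purity     = double(nS) / (nS + nB);        // signal fraction of selection
   std::printf("efficiency = %.2f, purity = %.2f\n", efficiency, purity);
}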

Page 14: Multi-Class Classification

Binary classification: two classes, “signal” and “background”

[Illustration: feature space split into a signal and a background region]

Page 15: Multi-Class Classification

Multi-class classification is a natural extension for many classifiers.

[Illustration: feature space split into six regions, Class 1 through Class 6]

Page 16: Event Classification

P(Class = C | x), or simply P(C | x), is the probability that the event is of class C, given the measured observables x = {x1, …, xD} and y = y(x). Bayes' theorem gives the posterior probability:

P(Class = C | y) = P(y | C) · P(C) / P(y)

where:

﹣ P(C) is the prior probability to observe an event of class C, i.e. the relative abundance of “signal” versus “background”

﹣ P(y | C) is the probability density of the measurement, according to the measurements x and the given mapping function, for class C

﹣ P(y) = Σ_Classes P(y | Class) · P(Class) is the overall probability density to observe the actual measurement y(x)

Page 17: Bayes Optimal Classification

With x = {x1, …, xD} the measured observables and y = y(x):

P(Class = C | y) = P(y | C) · P(C) / P(y)

The misclassification error is minimal if C is chosen such that P(C | y) is maximal.

To select S(ignal) over B(ackground), place the decision on the posterior odds ratio:

P(S | y) / P(B | y) = [ P(y | S) / P(y | B) ] · [ P(S) / P(B) ] > c

﹣ P(y | S) / P(y | B): the likelihood ratio as discriminating function y(x) [or any monotonic function of P(S | y) / P(B | y)]

﹣ P(S) / P(B): the prior odds ratio of choosing a signal event (relative probability of signal vs. background)

﹣ the threshold “c” determines efficiency and purity
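
The rule can be written down directly; the sketch below evaluates the posterior odds for one hypothetical measurement (all numbers invented):

#include <cstdio>

// Sketch of the Bayes-optimal decision: call the event signal if the
// posterior odds P(S|y)/P(B|y) exceed a threshold c (all numbers invented).
bool selectAsSignal(double pyGivenS, double pyGivenB,
                    double priorS, double priorB, double c) {
   double posteriorOdds = (pyGivenS / pyGivenB) * (priorS / priorB);
   return posteriorOdds > c;             // c trades efficiency against purity
}

int main() {
   double pS = 0.30, pB = 0.05;          // P(y|S), P(y|B) at the measured y
   double priorS = 0.01, priorB = 0.99;  // rare signal: the prior odds matter!
   std::printf("select as signal: %s\n",
               selectAsSignal(pS, pB, priorS, priorB, 1.0) ? "yes" : "no");
}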

Page 18: Any Decision Involves a Risk

Decide to treat an event as “Signal” or “Background”. Trying to select signal events means trying to disprove the null hypothesis that it were “only” a background event.

Truly is \ Accept as:    Signal          Background
Signal                   (correct)       Type-2 error
Background               Type-1 error    (correct)

• Type-1 error: classify an event as class C even though it is not (accept a hypothesis although it is not true; reject the null hypothesis although it would have been the correct one) → loss of purity (in the selection of signal events)

• Type-2 error: fail to identify an event from class C as such (reject a hypothesis although it would have been true; fail to reject / accept the null hypothesis although it is false) → loss of efficiency (in selecting signal events)

With “A” the region where an event is called signal:

﹣ Significance α: Type-1 error rate (= p-value); α = background selection “efficiency” (should be small!)

﹣ β: Type-2 error rate; power 1 − β = signal selection efficiency (β should be small!)

Page 19: Neyman-Pearson Lemma

Neyman-Pearson: the likelihood ratio

y(x) = P(x | S) / P(x | B)

used as selection criterion gives, for each selection efficiency, the best possible background rejection, i.e. it maximises the area under the “Receiver Operating Characteristic” (ROC) curve.

[ROC curve: signal efficiency ε_signal versus background rejection 1 − ε_backgr.; the diagonal corresponds to random guessing, curves further from it to good and better classification, and the “limit” is given by the likelihood ratio. Moving along the curve trades a small Type-1 error against a large Type-2 error and vice versa.]

Varying the cut in y(x) > cut moves the working point (efficiency and purity) along the ROC curve.

• How to choose the cut? One needs to know the prior probabilities (S, B abundances):

﹣ Measurement of a signal cross section: maximum of S/√(S+B), or equivalently √(ε·p)

﹣ Discovery of a signal: maximum of S/√B

﹣ Precision measurement: high purity (p)

﹣ Trigger selection: high efficiency (ε), sometimes high background rejection
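
A toy sketch of how the ROC curve is traced out by scanning the cut (the scores are invented):

#include <algorithm>
#include <cstdio>
#include <vector>

// Toy sketch: scanning the cut on y(x) traces out the ROC curve as pairs
// (signal efficiency, background rejection); the scores are invented.
int main() {
   std::vector<double> ySig = {0.9, 0.8, 0.7, 0.6, 0.4};
   std::vector<double> yBkg = {0.6, 0.5, 0.3, 0.2, 0.1};

   for (int i = 0; i <= 10; ++i) {
      double cut = 0.1 * i;
      auto above = [cut](double y) { return y > cut; };
      double effS = double(std::count_if(ySig.begin(), ySig.end(), above)) / ySig.size();
      double effB = double(std::count_if(yBkg.begin(), yBkg.end(), above)) / yBkg.size();
      std::printf("cut = %.1f   eff_S = %.2f   1 - eff_B = %.2f\n",
                  cut, effS, 1.0 - effB);
   }
}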

Page 20: Realistic Event Classification

Unfortunately, the true probability density functions are typically unknown: Neyman-Pearson's lemma doesn't really help us…

Supervised (machine) learning: use MC simulation or, more generally, a set of known (already classified) “events”. Use these “training” events to:

• Try to estimate the functional form of P(x|C), from which the likelihood ratio can be obtained: e.g. D-dimensional histograms, kernel density estimators, MC-based matrix-element methods, …

• Find a “discrimination function” y(x) and corresponding decision boundary (i.e. a hyperplane* in the “feature space”: y(x) = const) that optimally separates signal from background: e.g. linear discriminators, neural networks, boosted decision trees, …

* A hyperplane in the strict sense goes through the origin; what is meant here is, to be precise, an “affine set”.

Page 21: Realistic Event Classification

(Same as page 20.)

Of course, there is no magic here. We still need to:

﹣ Choose the discriminating variables

﹣ Choose the class of models (linear, non-linear, flexible or less flexible)

﹣ Tune the “learning parameters” (bias vs. variance trade-off)

﹣ Check the generalisation properties (avoid overtraining)

﹣ Consider the trade-off between statistical and systematic uncertainties

Page 22: Multivariate Analysis Methods in TMVA

Examples for classifiers and regression methods:

– Rectangular cut optimisation

– Projective and multidimensional likelihood estimator

– k-Nearest Neighbor algorithm

– Fisher and H-Matrix discriminants

– Function discriminants

– Artificial neural networks

– Boosted decision trees

– RuleFit

– Support Vector Machine

Preprocessing methods:

– Decorrelation, Principal Component Decomposition (PCA), Gaussianisation

Examples for synthesis methods:

– Boosting, Categorisation (valid for all methods, and their combinations)

(Speakers: Joerg, Jan, Helge, Eckhard, Peter)
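
For illustration, the preprocessing is steered per method through the VarTransform option string; a sketch, assuming a Factory configured as in the earlier training example:

#include "TMVA/Factory.h"
#include "TMVA/Types.h"

// Sketch: input preprocessing is requested per method via the VarTransform
// option (assumes a Factory set up as in the earlier training example).
void bookWithPreprocessing(TMVA::Factory& factory) {
   factory.BookMethod(TMVA::Types::kLikelihood, "LikelihoodD",
                      "VarTransform=Decorrelate");    // decorrelate inputs
   factory.BookMethod(TMVA::Types::kFisher, "FisherPCA",
                      "VarTransform=PCA");            // principal components
   factory.BookMethod(TMVA::Types::kMLP, "MLPG",
                      "VarTransform=Gauss");          // Gaussianised inputs
}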

Page 23: We Have a Users Guide!

Available on http://tmva.sf.net

TMVA Users Guide: 142 pp., incl. code examples; arXiv: physics/0703039

Page 24: Using TMVA

A typical TMVA analysis consists of two main steps:

1. Training phase: training, testing and evaluation of classifiers using data samples with known signal and background composition

2. Application phase: using selected trained classifiers to classify unknown data samples
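
A minimal sketch of the application phase (step 2) with TMVA::Reader; the variable names and the weight-file path are placeholders and must match what was used in the training:

#include <cstdio>
#include "TMVA/Reader.h"

// Sketch of the application phase: variable names and the weight-file path
// are placeholders and must match what was used in the training.
float x1, x2;   // the Reader keeps pointers to these

int main() {
   TMVA::Reader reader("!Color:!Silent");
   reader.AddVariable("x1", &x1);
   reader.AddVariable("x2", &x2);
   reader.BookMVA("BDT", "weights/TMVAClassification_BDT.weights.xml");

   // in a real analysis this sits inside the event loop:
   x1 = 0.3f; x2 = -1.2f;                      // one hypothetical event
   double y = reader.EvaluateMVA("BDT");       // classifier output y(x)
   std::printf("y(x) = %.3f -> %s\n", y, y > 0.0 ? "signal" : "background");
}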

Page 26: Code Flow for Training and Application

The user code can be ROOT scripts, C++ executables or Python scripts (via PyROOT), or any other high-level language that interfaces with ROOT.

Page 27: Strong Methods Need Strong Evaluation

A lot of evaluation information is already provided in the logging output of the training

Simple GUIs provide access to evaluation plots and tools for single and multi-class classification and regression

Page 33: Categorisation

Multivariate training samples often have distinct sub-populations of data:

﹣ A detector element may only exist in the barrel, but not in the endcaps

﹣ A variable may have different distributions in barrel, overlap and endcap regions

Ignoring this dependence may reduce performance and creates correlations between variables, which must be learned by the classifier:

﹣ Classifiers such as the projective likelihood, which do not account for correlations, significantly lose performance if the sub-populations are not separated

Categorisation means splitting the data sample into categories that define disjoint data samples with the following (idealised) properties:

﹣ Events belonging to the same category are statistically indistinguishable

﹣ Events belonging to different categories have different properties

Page 34: MethodCategory is Your Friend!

It provides fully transparent support for categorisation of your input data, applicable to any TMVA method.

See Peter's talk later today.
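
A sketch of booking a categorised classifier, following the MethodCategory interface described in the Users Guide; the variable eta, the category boundary and the sub-method choices are placeholders:

#include "TMVA/Factory.h"
#include "TMVA/MethodCategory.h"
#include "TMVA/Types.h"

// Sketch following the MethodCategory interface of the Users Guide: one
// sub-classifier per detector region ("eta" and the boundary are placeholders).
void bookCategory(TMVA::Factory& factory) {
   TMVA::MethodCategory* cat = dynamic_cast<TMVA::MethodCategory*>(
      factory.BookMethod(TMVA::Types::kCategory, "Category", ""));

   cat->AddMethod("abs(eta)<1.3",  "x1:x2", TMVA::Types::kFisher,
                  "Fisher_Barrel", "");       // trained on barrel events only
   cat->AddMethod("abs(eta)>=1.3", "x1:x2", TMVA::Types::kFisher,
                  "Fisher_Endcap", "");       // trained on endcap events only
}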

Page 35:

The HEP community already has a lot of experience with MVA classification. In particular for rare-event searches, O(all) mature experiments use it. MVAs increase the experiment's sensitivity, and may reduce systematic errors owing to a smaller background component.

MVAs are not black boxes, but (possibly involved) R^D → R mapping functions.

We should acquire more experience in HEP with multivariate regression:

﹣ Our calibration schemes are often still quite simple: linear or simple functions, look-up-table based, mostly depending on few variables (e.g., η, pT)

﹣ Non-linear multivariate regression may significantly improve the calibrations and corrections applied, in particular if it is possible to train from data

﹣ Available since TMVA 4 for: LD, FDA, k-NN, PDERS, PDEFoam, MLP, BDT
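
A minimal regression-training sketch: the same Factory flow as for classification, but with a regression target instead of signal and background trees (file, tree and variable names are placeholders):

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

// Sketch of a regression training: same Factory flow as for classification,
// but with a regression target (file, tree and variable names are placeholders).
void trainRegression() {
   TFile* input  = TFile::Open("calib.root");
   TTree* tree   = (TTree*)input->Get("TreeR");
   TFile* output = TFile::Open("TMVAReg.root", "RECREATE");

   TMVA::Factory factory("TMVARegression", output, "!V:AnalysisType=Regression");
   factory.AddVariable("x1", 'F');
   factory.AddVariable("x2", 'F');
   factory.AddTarget("fvalue");               // the quantity to be estimated
   factory.AddRegressionTree(tree, 1.0);
   factory.PrepareTrainingAndTestTree("", "SplitMode=Random");

   factory.BookMethod(TMVA::Types::kMLP, "MLP", "HiddenLayers=N+5");
   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   output->Close();
}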

Page 36: Status & Outlook

• 2-class classification supported by all methods

• Multi-class classification supported by MLP (NN), BDTG, FDA

• Single-target regression: PDE-RS, PDE-Foam, k-NN, LD, FDA, MLP, BDT

• Multi-target regression: PDE-Foam, k-NN, MLP

• All methods support categorised classification and generalised boosting

• Priorities on the to-do list for future releases:

﹣ Automatic self-optimisation of parameter settings for all methods

﹣ For this: full support of cross validation

﹣ Increase support of multi-dimensional classification and regression

﹣ Individual improvements of methods (see, e.g., the MLP talk by Jan)

﹣ Introduction of unsupervised learning

Page 37: Copyrights & Credits

• Several similar data-mining efforts with rising importance in most fields of science and industry

• Important for HEP:

﹣ Parallelised MVA training and evaluation pioneered by the Cornelius package (BABAR)

﹣ Also frequently used: the StatPatternRecognition package by I. Narsky (Caltech)

﹣ Many implementations of individual classifiers exist

• TMVA is open source software; use and redistribution of the source are permitted according to the terms of the BSD license

Contributors to TMVA: Andreas Hoecker (CERN, Switzerland), Jörg Stelzer (CERN, Switzerland), Peter Speckmayer (CERN, Switzerland), Jan Therhaag (Universität Bonn, Germany), Eckhard von Toerne (Universität Bonn, Germany), Helge Voss (MPI für Kernphysik Heidelberg, Germany), Moritz Backes (Geneva University, Switzerland), Tancredi Carli (CERN, Switzerland), Asen Christov (Universität Freiburg, Germany), Or Cohen (CERN, Switzerland and Weizmann, Israel), Krzysztof Danielowski (IFJ and AGH/UJ, Krakow, Poland), Dominik Dannheim (CERN, Switzerland), Sophie Henrot-Versillé (LAL Orsay, France), Matthew Jachowski (Stanford University, USA), Kamil Kraszewski (IFJ and AGH/UJ, Krakow, Poland), Attila Krasznahorkay Jr. (CERN, Switzerland, and Manchester U., UK), Maciej Kruk (IFJ and AGH/UJ, Krakow, Poland), Yair Mahalalel (Tel Aviv University, Israel), Rustem Ospanov (University of Texas, USA), Xavier Prudent (LAPP Annecy, France), Arnaud Robert (LPNHE Paris, France), Doug Schouten (S. Fraser U., Canada), Fredrik Tegenfeldt (Iowa University, USA, until Aug 2007), Alexander Voigt (CERN, Switzerland), Kai Voss (University of Victoria, Canada), Marcin Wolter (IFJ PAN Krakow, Poland), Andrzej Zemla (IFJ PAN Krakow, Poland).

Page 38: A Few References

Software packages for multivariate data analysis/classification:

﹣ Individual classifier software: e.g. “JETNET”, C. Peterson, T. Rögnvaldsson, L. Lönnblad; many, many other packages!

“All inclusive” packages:

﹣ StatPatternRecognition: I. Narsky, arXiv: physics/0507143, http://www.hep.caltech.edu/~narsky/spr.html

﹣ TMVA: Hoecker, Speckmayer, Stelzer, Therhaag, von Toerne, Voss, arXiv: physics/0703039, http://tmva.sf.net or every ROOT distribution

﹣ WEKA: http://www.cs.waikato.ac.nz/ml/weka/

﹣ Huge data analysis library available in “R”: http://www.r-project.org/

Literature:

﹣ T. Hastie, R. Tibshirani, J. Friedman, “The Elements of Statistical Learning”, Springer 2001

﹣ C.M. Bishop, “Pattern Recognition and Machine Learning”, Springer 2006

Conferences: PHYSTAT, ACAT, …