
Jens Zimmermann, MPI für Physik München, ACAT 2005 Zeuthen

Backups

Jens Zimmermann

Max-Planck-Institut für Physik, München

Forschungszentrum Jülich GmbH


Check Behaviour

Determine efficiency by the principle of orthogonal triggers

Determine efficiency as a function of important quantities

DVCS dataset
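The orthogonal-trigger principle can be sketched as follows: the efficiency of the trigger under study is the fraction of events it accepts among events selected by an independent monitor trigger. The function and argument names are hypothetical, and the binomial uncertainty is an added assumption that does not appear on the slide.

```python
import math

def trigger_efficiency(monitor_fired, trigger_fired):
    """Efficiency of a trigger, measured on events selected by an
    independent (orthogonal) monitor trigger."""
    n_monitor = sum(1 for m in monitor_fired if m)
    n_both = sum(1 for m, t in zip(monitor_fired, trigger_fired) if m and t)
    if n_monitor == 0:
        raise ValueError("no events selected by the monitor trigger")
    eff = n_both / n_monitor
    # Assumption: simple binomial uncertainty on the efficiency.
    err = math.sqrt(eff * (1.0 - eff) / n_monitor)
    return eff, err
```

To study the dependence on an important quantity, the same function could be applied separately to the events in each bin of that quantity.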


k-Nearest-Neighbour

[Figure: k-nearest-neighbour output in the # formulas vs. # slides plane, both axes from 0 to 6 (×10), with panels for k = 1, 2, 3, 4 and 5]

For every evaluation position, the distances to each training position need to be determined!

Regularization: parameter k
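A minimal sketch of the k-nearest-neighbour output, assuming Euclidean distance and class labels 0 and 1 (both are assumptions, the slide does not specify them). It makes the cost visible: every evaluation computes the distance to all training positions.

```python
import math

def knn_output(train_x, train_y, query, k):
    """Mean label of the k training positions closest to the query."""
    # Distance from the evaluation position to every training position.
    dists = [math.dist(x, query) for x in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: (# formulas, # slides) per talk, label 0 or 1.
train_x = [(1, 5), (2, 4), (1, 4), (5, 1), (4, 2), (5, 2)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_output(train_x, train_y, (1.5, 4.5), k=3))
```

A larger k averages over more neighbours and smooths the output, which is why k acts as the regularization parameter.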


Maximum Likelihood / Naive Bayes

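[Figure: projections of the training data onto # formulas and # slides, each axis from 0 to 6 (×10); panel labels 31 and 32]

$p_{Th} = \frac{2}{5} \cdot \frac{3}{5} = 0.24$

$p_{Exp} = \frac{1}{5} \cdot \frac{1}{5} = 0.04$

out=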

Correlation gets lost completely by projection!

Regularization: binning
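The projection approach can be sketched as one binned histogram per feature and per class, with the class likelihood taken as the product of the per-feature bin fractions. All names are hypothetical, and normalising the two likelihoods into a single output is an assumption, since the slide does not show how out is formed.

```python
from collections import Counter

def bin_index(value, width):
    return int(value // width)

def fit_projections(events, width):
    """One 1-D histogram per feature: fraction of events in each bin."""
    hists = []
    for f in range(len(events[0])):
        counts = Counter(bin_index(e[f], width) for e in events)
        hists.append({b: c / len(events) for b, c in counts.items()})
    return hists

def likelihood(hists, x, width):
    """Product of the per-feature bin fractions (correlations are ignored)."""
    p = 1.0
    for f, hist in enumerate(hists):
        p *= hist.get(bin_index(x[f], width), 0.0)
    return p

def nb_output(hists_a, hists_b, x, width):
    """Assumed output: likelihood of class a relative to the sum of both."""
    pa = likelihood(hists_a, x, width)
    pb = likelihood(hists_b, x, width)
    return pa / (pa + pb) if pa + pb > 0 else 0.5
```

The bin width plays the role of the regularization parameter: wide bins smooth the projections, narrow bins follow the training events more closely.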


Linear Discriminant Analysis

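Linear model for the n training events (Fisher):

$y_k = \gamma_0 + \gamma_1 x_{k,1} + \dots + \gamma_d x_{k,d} + v_k, \quad k = 1, \dots, n$

In matrix form:

$Y = A\gamma, \quad A = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,d} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,d} \end{pmatrix}$

Least-squares estimate:

$\hat{\gamma} = (A^T A)^{-1} A^T Y$

[Figure: LDA output in the # formulas vs. # slides plane, both axes from 0 to 6 (×10), with the lines out = 0.0, out = 0.5 and out = 1.0 and the direction (-0.49, 0.87); out is a linear function of formulas and slides with coefficient magnitudes 0.3, 0.21 and 0.12]

Only one separating hyperplane is usually not enough!

Can we combine two or more?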

1930
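A minimal sketch of the least-squares estimate $\hat{\gamma} = (A^T A)^{-1} A^T Y$, written in plain Python so that it is self-contained. The helper names are hypothetical, and solving the normal equations by Gauss-Jordan elimination is only one of several possible choices.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_gamma(xs, ys):
    """Least-squares gamma for rows (1, x_1, ..., x_d) of the matrix A."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    m = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(ata, aty)

def lda_output(gamma, x):
    """Linear output gamma_0 + sum_j gamma_j * x_j."""
    return gamma[0] + sum(g * xi for g, xi in zip(gamma[1:], x))
```

With targets y = 1 for one class and y = 0 for the other, the line lda_output = 0.5 is the single separating hyperplane.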


Neural Networks

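Sigmoid activation function:

$\sigma(a) = \frac{1}{1 + e^{-a}}$

Each neuron applies σ to the weighted sum of its inputs $x_i$ (weights $w_i$), shifted by an offset $s$.

[Figure: network with inputs # formulas and # slides, drawn over the # formulas vs. # slides plane, both axes from 0 to 6 (×10); weights shown: -50, +0.1, +1.1, -1.1, +20, +0.2, +3.6, +3.6, -1.8]

Construct NN with two separating hyperplanes:

Train NN with two hidden neurons (gradient descent):

$E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - out(x_i) \right)^2$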
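A minimal sketch of training a network with two hidden sigmoid neurons by full-batch gradient descent on the mean squared error E. The parameter layout, learning rate, initialisation range and seed are assumptions chosen for illustration, not values from the slide.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(params, x):
    """Two inputs -> two hidden sigmoid neurons -> one sigmoid output."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in params["hidden"]]
    v = params["out"]
    return hidden, sigmoid(v[0] * hidden[0] + v[1] * hidden[1] + v[2])

def mse(params, data):
    return sum((y - forward(params, x)[1]) ** 2 for x, y in data) / len(data)

def train(data, epochs=2000, rate=0.5, seed=1):
    rng = random.Random(seed)
    params = {"hidden": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
              "out": [rng.uniform(-1, 1) for _ in range(3)]}
    for _ in range(epochs):
        grad_h = [[0.0] * 3 for _ in range(2)]
        grad_o = [0.0] * 3
        for x, y in data:
            hidden, out = forward(params, x)
            # Derivative of E with respect to the output pre-activation.
            d_out = -2.0 * (y - out) * out * (1.0 - out) / len(data)
            for j in range(2):
                d_hid = d_out * params["out"][j] * hidden[j] * (1.0 - hidden[j])
                grad_h[j][0] += d_hid * x[0]
                grad_h[j][1] += d_hid * x[1]
                grad_h[j][2] += d_hid
                grad_o[j] += d_out * hidden[j]
            grad_o[2] += d_out
        # Gradient descent step on all weights and offsets.
        for j in range(2):
            params["hidden"][j] = [w - rate * g
                                   for w, g in zip(params["hidden"][j], grad_h[j])]
        params["out"] = [v - rate * g for v, g in zip(params["out"], grad_o)]
    return params
```

Each hidden neuron defines one separating line in the input plane, and the output neuron combines the two.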


NN Training

$E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - out(x_i) \right)^2$

8 hidden neurons = 8 separating lines

[Figure: train error and test error versus training epochs; signal and background events]


Support Vector Machines

Separating hyperplane with maximum distance to each datapoint: Maximum margin classifier

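Found by setting up the condition for correct classification

$y_i (w \cdot x_i + b) \ge 1$

and minimizing $\|w\|^2$, which leads to the Lagrangian

$L = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$

Necessary condition for a minimum is

$w = \sum_i \alpha_i y_i x_i$

So the output becomes

$out = \mathrm{sgn}\left( \sum_i \alpha_i y_i \, x_i \cdot x + b \right)$

KKT: only SV have $\alpha_i \ne 0$

Only linear separation? No! Replace dot products: $x \cdot y \to \Phi(x) \cdot \Phi(y)$

The mapping to feature space $\Phi: R^d \to F$ is hidden in a kernel $K(x, y) = \Phi(x) \cdot \Phi(y)$

Non-separable case: $\frac{1}{2} \|w\|^2 \to \frac{1}{2} \|w\|^2 + C \sum_i \xi_i$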
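The kernel idea can be illustrated with a small sketch: for the polynomial kernel K(x, y) = (x · y)² in two dimensions, an explicit feature map Φ gives the same value as the kernel, so the mapping never has to be computed. The choice of this kernel and all names are assumptions for illustration; the multipliers α_i and the offset b would come from a training step that is not shown here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2 with two inputs."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def kernel(x, y):
    """Same value as dot(phi(x), phi(y)), without building phi."""
    return dot(x, y) ** 2

def svm_output(support_vectors, labels, alphas, b, x):
    """out = sgn(sum_i alpha_i y_i K(x_i, x) + b); a sum of 0 is mapped to +1."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1
```

Only the support vectors enter the sum, because all other training events have α_i = 0.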


Bagging – Procedure

Bootstrap aggregating

• Training events: draw with replacement n times to obtain resampled events 1, 2, ..., n (each sample contains around 63% of the original events, the rest are replications)
• Train one classifier on each resampled set: classifier 1, classifier 2, ..., classifier n
• Combine to final decision: majority voting or (weighted) averaging

Aim is to create strong classifiers which are as independent as possible.
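A minimal sketch of the resampling and combination steps of bagging, with hypothetical function names. Drawing N events with replacement from N originals leaves on average a fraction 1 - 1/e (about 63%) of distinct original events in each sample.

```python
import random
from collections import Counter

def bootstrap_sample(events, rng):
    """Draw len(events) events with replacement."""
    return [rng.choice(events) for _ in events]

def unique_fraction(sample, n_original):
    """Fraction of the original events that appear in the sample."""
    return len(set(sample)) / n_original

def majority_vote(decisions):
    """Most common decision among the classifiers (ties: first one seen)."""
    return Counter(decisions).most_common(1)[0][0]

rng = random.Random(0)
events = list(range(10000))
sample = bootstrap_sample(events, rng)
print(unique_fraction(sample, len(events)))
```

Each resampled set would be used to train its own classifier, and majority_vote (or an average of the outputs) would then combine their decisions.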


Random Forests

Basis: decision tree (CART) without pruning

Modification: at each node of the tree, search only through a randomly selected subset of all features

Use bagging on this classifier

RF: tree, randomness, combination

[Figure: example with 3 trees (panel labels 1 – 2,1; 2 – 2,1; 1 – 1,2). Training: create 3 trees. Testing/Evaluation: final output of the combined trees]


Boosting – Procedure

• Training events with normal weights: train classifier 1
• Raise weights of misclassified events, giving training events with weight config 1: train classifier 2
• Raise weights of misclassified events, giving training events with weight config 2, and so on up to classifier n
• Weight classifiers with their performance and combine to final decision

$E_i = |out_i - true_i|$

$E = \frac{1}{N} \sum_{i=1}^{N} E_i w_i$

$\alpha = \frac{1 - E}{E}$

$w_i \to w_i \cdot \alpha^{E_i}$

Misclassified events get higher weights and are learned better.

Boosting tries to equalize misclassification rates for each event.
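One boosting step can be sketched as below. The sketch assumes a per-event error E_i = |out_i - true_i|, a weighted error E = (1/N) Σ E_i w_i and a boost factor (1 - E)/E raised to the power E_i; this is one reading of the slide's formulas, and the name alpha for the factor is hypothetical.

```python
def boost_weights(weights, out, true):
    """One boosting step: raise the weights of misclassified events."""
    n = len(weights)
    errors = [abs(o - t) for o, t in zip(out, true)]            # E_i
    total = sum(e * w for e, w in zip(errors, weights)) / n     # E
    if not 0.0 < total < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = (1.0 - total) / total                               # boost factor
    # Correctly classified events (E_i = 0) keep their weight.
    new_weights = [w * alpha ** e for w, e in zip(weights, errors)]
    return new_weights, alpha
```

With an error rate below 50% the factor is larger than 1, so the next classifier concentrates on the events that were misclassified so far.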


L2 Neural Network Trigger

Trigger Scheme (H1 at HERA ep Collider, DESY)

10 MHz → L1 (2.3 µs) → 500 Hz → L2 (20 µs) → 50 Hz → L4 (100 ms) → 10 Hz

DVCS, J/Psi µµ, D*, DiJet, CC, J/Psi ee TC

„L2NN“

new

TE    L1ST     Physics
*00   78       Charged Current (old)
01    68       Phi K+K-
02    52, 54   J/Psi ee
03    83       DiJet
04    54       J/Psi µµ
05    32       D* untagged
06    40       Spacal back2back
07    78       Charged Current
08    33       J/Psi ee TC (1999)
09    41       DVCS
10    83       D* tagged
*11   33       J/Psi ee TC (2004)
12    15       J/Psi µµ inelastic


L2NN Rates and Efficiencies

Last day before shutdown

Subtrigger   Channel   des    rej
S83          DiJets    50%    50%
S32          D*        94%    90%
S78          CC        58%    60%
S41          DVCS      80%    80%
S83          D*        43%    50%
S33          J/Psi     94%    90%
S15          J/Psi     30%    30%

All measured rate reductions match design.

No wrong prediction for efficiency found.

S83 DiJets, S32 D*, S78 CC, S41 DVCS, S83 D*, S33 J/Psi ee, S15 J/Psi µµ

95%, 58%, 100%, 97%, 95%, >95%, 96%


Performance Measurement - Classification

Eff@Rej = xx%

Rej@Eff = xx%

[Figure: classifier output from 0 to 1 for signal and background]

Misclassification = 200% - Eff - Rej
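These figures of merit can be sketched as below. The convention that signal is accepted when the output lies above a cut, and the scan over the observed outputs as candidate cuts, are assumptions; all function names are hypothetical.

```python
def efficiency(signal_outputs, cut):
    """Fraction of signal events with output above the cut."""
    return sum(o > cut for o in signal_outputs) / len(signal_outputs)

def rejection(background_outputs, cut):
    """Fraction of background events with output at or below the cut."""
    return sum(o <= cut for o in background_outputs) / len(background_outputs)

def eff_at_rej(signal_outputs, background_outputs, target_rej):
    """Highest efficiency among cuts that reach at least the target rejection."""
    cuts = sorted(set(signal_outputs) | set(background_outputs))
    effs = [efficiency(signal_outputs, c) for c in cuts
            if rejection(background_outputs, c) >= target_rej]
    return max(effs) if effs else 0.0

def misclassification(eff_percent, rej_percent):
    """Slide convention, in percent: 200% - Eff - Rej."""
    return 200.0 - eff_percent - rej_percent
```

Rej@Eff could be obtained in the same way by swapping the roles of efficiency and rejection in the scan.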


Performance Measurement - Regression

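$\Delta = y - out(x)$

$\langle \Delta^2 \rangle = \langle \Delta \rangle^2 + \sigma_\Delta^2$

Mean squared error:

$E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - out(x_i) \right)^2$

Relative mean squared error:

$E = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - out(x_i)}{y_i} \right)^2$

Mean and spread of the deviation:

$m = \langle y_i - out(x_i) \rangle$

$s^2 = \langle (y_i - out(x_i))^2 \rangle - m^2$

[Figure: histogram of $\Delta = y - out(x)$, x-axis from -15.0 to 15.0, y-axis from 0 to 25]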
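A minimal sketch of the regression measures, assuming the deviation is defined as y - out(x) per event. It returns the mean squared error E together with the mean m and spread s of the deviation, which satisfy E = m² + s². The function name is hypothetical.

```python
import math

def regression_summary(targets, outputs):
    """Mean squared error E, mean m and spread s of the deviation y - out(x)."""
    deltas = [y - o for y, o in zip(targets, outputs)]
    n = len(deltas)
    e = sum(d * d for d in deltas) / n
    m = sum(deltas) / n
    # Guard against tiny negative values from rounding.
    s = math.sqrt(max(e - m * m, 0.0))
    return e, m, s
```

A non-zero m indicates a systematic shift of the output, while s measures the resolution around that shift.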


From Classification to Regression

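k-NN

[Figure: training events with target values 3, 4, 5, 3, 2, 2, 5]

$out = \frac{8}{3}$

RS

[Figure: training events with target values 3, 4, 5, 3, 2, 2, 5]

$out = \frac{13}{4}$

NN

$E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - out(x_i) \right)^2$

Fit Gauss

$a = \sigma(-2.1x - 1), \quad b = \sigma(+2.1x - 1), \quad out = \sigma(-12.7a - 12.7b + 9.4)$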
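The fitted network can be evaluated with a short sketch. It assumes that each bracket in the expressions for a, b and out is the argument of the sigmoid σ(a) = 1/(1 + e^(-a)), which is a reading of the slide rather than something it states explicitly. Under that assumption the two hidden neurons combine into a bump centred at x = 0.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gauss_net(x):
    """Two hidden neurons a and b combined into a bump-shaped output."""
    a = sigmoid(-2.1 * x - 1.0)
    b = sigmoid(+2.1 * x - 1.0)
    # Assumption: the output neuron also applies the sigmoid.
    return sigmoid(-12.7 * a - 12.7 * b + 9.4)

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(gauss_net(x), 3))
```

The output is high near x = 0 and falls off symmetrically on both sides, as expected for a fit to a Gaussian shape.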
