Detect Unknown Systematic Effect: Diagnose bad fit to
multiple data sets
Advanced Statistical Techniques in Particle Physics
Grey College, Durham
18 - 22 March 2002
M. J. Wang
Institute of Physics
Academia Sinica

Preface
• Motivation and gratitude
– Learned quite a lot at the workshop on confidence limits at Fermilab in 2000
– Thanks for hosting this conference
• Main title: Detect Unknown Systematic Effect
– More suitable to this conference's aim
– Important for experimentalists
– Might be detectable in a global fit
• Sub-title: Diagnose bad fit to multiple data sets
– Global fit is not internally consistent
– Don't know which part is wrong
– Need to diagnose the data sample

Outline
• Introduction
• Global fit and its goodness of fit
• Parameter fitting criterion
• Diagnose bad fit to multiple data sets
• Conclusion

Introduction
• Knowledge of parton distribution functions is essential for hadron collider research
• A global fit is used to obtain the parton distribution functions
• Uncertainties of parton distribution function parameters
– Precision hadron collider results require estimates of the uncertainties of the parton distribution function parameters
– Important for Fermilab Run II and LHC physics analyses

Introduction
• Knowledge of parton distribution functions is essential for hadron collider research
– Interpretation of data within the SM
– Precision measurement of SM parameters
– Search for signals beyond the SM
• A global fit is used to obtain the parton distribution functions
– Non-perturbative parton distribution functions cannot be determined by PQCD
– Therefore, they are determined by a global fit

Global fit and goodness of fit
• Reliable parton distribution function parameter and uncertainty estimates require passing a goodness-of-fit criterion
– Total chi-square is used for goodness of fit
– A range of +/- sqrt(2N) around the expected chi-square is used as the accepted range
• Is total chi-square good enough for goodness of fit?
– Total chi-square is insensitive to a small subset of data with a bad fit
• Is there any way to get a more stringent criterion?
– Need a new idea
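
The insensitivity of the total chi-square can be illustrated with a few lines of arithmetic. The sketch below is only an illustration under assumed numbers: the point counts and chi-square values are hypothetical, and `within_accepted_range` is a helper name introduced here, not something from the talk.

```python
import math

def within_accepted_range(chi2, n_points):
    """Hypothesis-testing check: is chi2 within +/- sqrt(2N) of N?"""
    return abs(chi2 - n_points) <= math.sqrt(2.0 * n_points)

# Hypothetical global fit: 1000 points, of which a 10-point subset fits badly.
n_good, n_bad = 990, 10
chi2_good = 990.0   # about 1 per point
chi2_bad = 40.0     # about 4 per point, a very poor fit for 10 points

chi2_total = chi2_good + chi2_bad
n_total = n_good + n_bad

total_ok = within_accepted_range(chi2_total, n_total)   # total chi-square alone
subset_ok = within_accepted_range(chi2_bad, n_bad)      # the small subset alone
print(total_ok, subset_ok)
```

With these assumed numbers the total chi-square sits comfortably inside the accepted range while the 10-point subset, judged on its own, is far outside it.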

Parameter fitting criterion
• Idea motivated by Louis Lyons's goodness-of-fit paradox at ACAT 2000
• J.C. Collins and J. Pumplin applied this idea to the goodness of fit of the global fit
– Hypothesis-testing vs parameter-fitting criteria
– Subset chi-square against total chi-square
– Found inconsistent data sets among the CTEQ5 data sets
• Still don't know which part is correct and which is wrong
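
A toy comparison of the two criteria may help. The sketch below is not the Collins and Pumplin procedure; it is a minimal illustration that assumes two constructed data sets measuring one parameter `mu` with equal errors, and it compares the total chi-square range with the chi-square penalty each subset pays at the global best fit. All names and numbers are hypothetical.

```python
import numpy as np

sigma = 1.0
# Constructed (not random) data: points alternate one sigma above and below
# the value each set prefers, so the numbers are reproducible.
set_a = 0.0 + sigma * np.tile([1.0, -1.0], 100)   # 200 points preferring mu = 0.0
set_b = 0.5 + sigma * np.tile([1.0, -1.0], 25)    # 50 points preferring mu = 0.5

def chi2(data, mu):
    return float(np.sum(((data - mu) / sigma) ** 2))

all_data = np.concatenate([set_a, set_b])
mu_global = float(np.mean(all_data))   # least-squares estimate for equal errors

# Hypothesis-testing criterion applied to the total chi-square
chi2_total = chi2(all_data, mu_global)
n_total = all_data.size
passes_hypothesis_test = abs(chi2_total - n_total) <= np.sqrt(2 * n_total)

# Parameter-fitting view: penalty each subset pays for being forced to mu_global
delta_a = chi2(set_a, mu_global) - chi2(set_a, float(np.mean(set_a)))
delta_b = chi2(set_b, mu_global) - chi2(set_b, float(np.mean(set_b)))

print(passes_hypothesis_test, round(delta_a, 3), round(delta_b, 3))
```

In this toy the total chi-square passes the sqrt(2N) test, yet the smaller set would lower its own chi-square by about 8 if it were allowed to move to its preferred value, which is large on the scale of 1 used for parameter fitting.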

Parameter fitting criterion
– Hypothesis-testing vs parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)

Parameter fitting criterion
– Subset chi-square against total chi-square (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 10)

Parameter fitting criterion
– Found inconsistent data sets in CTEQ5 data (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 13)

Diagnose bad fit to multiple data sets
• Importance of studying bad fit
– Is the inconsistent data set free of unknown systematic effects?
– Is the theoretical prediction adequate?
– Is there any hint of new physics?
• Any statistic for diagnostic purposes?
– Pull can be used to identify an inconsistent experiment or data point (thanks to F. James's "Statistical Methods in Experimental Physics")
– But for real data, there is no measured pull distribution for each data point
– What should we do with pull?

Diagnose bad fit to multiple data sets
• Pull definition for each data point (Mi: measured value, Ti: true or predicted value, Ri: residual, Pi: pull)
Mi = Ti + ( random error )
Ri = Ti - Mi = -( random error )
Pi = Ri / sigma( Ri )
• Pull properties
– Gaussian shape
– Centered at zero
– Unit variance
– Pulls of different data points are independent
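
These properties can be checked with a small Monte Carlo. The following is a minimal sketch under assumed inputs: the exponential "true curve", the 5% random error, and the number of pseudo-experiments are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_points, n_trials = 20, 10_000

x = np.linspace(0.0, 1.0, n_points)
truth = 10.0 * np.exp(-2.0 * x)   # hypothetical true curve T_i
sigma = 0.05 * truth              # hypothetical random error on each point

# M_i = T_i + (random error), repeated for many pseudo-experiments
measured = truth + rng.normal(0.0, sigma, size=(n_trials, n_points))
pulls = (truth - measured) / sigma   # P_i = R_i / sigma(R_i)

pull_mean = pulls.mean(axis=0)            # expected near 0 for every point
pull_std = pulls.std(axis=0, ddof=1)      # expected near 1 for every point
corr = np.corrcoef(pulls, rowvar=False)   # point-to-point correlation matrix
off_diag = corr[~np.eye(n_points, dtype=bool)]   # expected near 0

print(np.abs(pull_mean).max(), np.abs(pull_std - 1.0).max(), np.abs(off_diag).max())
```

With purely random errors the three printed numbers should all be small, which is the baseline against which a systematic effect would show up.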

Diagnose bad fit to multiple data sets
• Systematic effects introduce correlation among pulls
– Constant shift on all data points
– Correlated shift on all data points

Diagnose bad fit to multiple data sets
• Correlation among pulls is the key to detecting unknown systematic effects
• Pull correlation study
– Pull distribution built from all data points in one experiment (experiment pull distribution)
– Pull as a function of the measurement variable X

Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties (Si or S: unknown systematic shift, point-dependent or constant)
Mi = Ti + ( random error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - Si ( or S )
Pi = Ri / sigma( Ri )
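
A single pseudo-experiment already shows how a constant shift S moves all pulls the same way. This is a sketch under assumed values: the curve, the error of 0.2, and a shift of half a sigma are hypothetical, and sigma( Ri ) is taken to be the random error only, since S is unknown to the analyst.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_points = 100

x = np.linspace(0.0, 1.0, n_points)
truth = 10.0 * np.exp(-2.0 * x)   # hypothetical true curve T_i
sigma = np.full(n_points, 0.2)    # hypothetical random error
shift = 0.1                       # unknown constant systematic shift S (half a sigma)

measured = truth + rng.normal(0.0, sigma) + shift   # M_i = T_i + random error + S
pulls = (truth - measured) / sigma                  # S is not included in sigma(R_i)

# Experiment pull distribution: all points of one experiment together
mean_pull = pulls.mean()
mean_pull_error = 1.0 / np.sqrt(n_points)   # expected for independent unit-variance pulls
significance = mean_pull / mean_pull_error

print(round(mean_pull, 2), round(significance, 1))
```

The width of the experiment pull distribution stays close to one, but its center moves to about -S/sigma, so the mean pull becomes the natural quantity to test.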

Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (slides 16 to 24, one plot each)
   – Slide 16: MC data vs true curve
   – Slide 17: residual distributions of the first 6 channels with 10,000 entries
   – Slide 18: 10% uncertainty on the error estimate of the first 6 channels with 10,000 entries
   – Slide 19: pull distributions of the first 6 channels with 10,000 entries
   – Slide 20: effect of error-estimate uncertainties of 0%, 10%, 20% on the pull distribution with 10,000 entries
   – Slide 21: experiment residual and pull distributions with 100,000 entries
   – Slide 22: experiment residual and pull profiles as a function of X with 100,000 entries
   – Slide 23: experiment residual and pull distributions with 100 entries
   – Slide 24: experiment residual and pull profile as a function of X with 100 entries
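
For the horizontal-shift case, a first-order expansion suggests why a profile of pull against X is informative: the residual is roughly -T'(X) times the shift, so the pull follows the slope of the curve. The sketch below assumes a hypothetical exponential curve, a constant error, and a shift `dx`; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_points = 50
x = np.linspace(0.0, 1.0, n_points)

def true_curve(x):
    return 10.0 * np.exp(-2.0 * x)   # hypothetical falling spectrum

sigma = np.full(n_points, 0.2)   # hypothetical random error
dx = 0.05                        # unknown constant horizontal shift

# The point is reported at x but actually samples the curve at x + dx
measured = true_curve(x + dx) + rng.normal(0.0, sigma)
pulls = (true_curve(x) - measured) / sigma

# First-order expectation: R_i ~ -T'(x_i) * dx, with T'(x) = -2 * T(x) here
expected_pulls = -(-2.0 * true_curve(x)) * dx / sigma

# Crude two-bin profile of pull versus X
low_x = pulls[: n_points // 2].mean()
high_x = pulls[n_points // 2 :].mean()
print(round(low_x, 2), round(high_x, 2))
```

Where the curve is steep the pulls are large, and where it is flat they shrink, so the mean pull varies with X instead of being a constant offset.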

Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (slides 25 to 31, one plot each)
   – Slide 25: MC data vs true curve
   – Slide 26: residual distributions of the first 6 channels with 10,000 entries
   – Slide 27: pull distributions of the first 6 channels with 10,000 entries
   – Slide 28: experiment residual and pull distributions with 100,000 entries
   – Slide 29: experiment residual and pull profile as a function of X with 100,000 entries
   – Slide 30: experiment residual and pull distributions as a function of X with 100 entries
   – Slide 31: experiment residual and pull profiles as a function of X with 100 entries
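
The difference between the 100-entry and 100,000-entry plots can be anticipated with simple arithmetic, assuming "entries" counts the independent unit-variance pulls entering the distribution. Under that assumption the standard error of the mean pull is 1/sqrt(n). The helper name and the shift of 0.1 sigma below are hypothetical.

```python
import math

def expected_significance(shift_over_sigma, n_entries):
    """Expected |mean pull| in units of its standard error, 1/sqrt(n)."""
    return abs(shift_over_sigma) * math.sqrt(n_entries)

# Hypothetical constant vertical shift of 0.1 sigma
for n in (100, 100_000):
    print(n, round(expected_significance(0.1, n), 1))
```

A shift of this size would be about a one-standard-error effect with 100 entries and a very clear one with 100,000, so small samples limit what the pull distribution can reveal.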

Diagnose bad fit to multiple data sets
• Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (slides 32 to 38, one plot each)
   – Slide 32: MC data vs true curve
   – Slide 33: residual distributions of the first 6 channels with 10,000 entries
   – Slide 34: pull distributions of the first 6 channels with 10,000 entries
   – Slide 35: experiment residual and pull distributions with 100,000 entries
   – Slide 36: experiment residual and pull profiles as a function of X with 100,000 entries
   – Slide 37: experiment residual and pull distributions as a function of X with 100 entries
   – Slide 38: experiment residual and pull profiles as a function of X with 100 entries

Diagnose bad fit to multiple data sets
• Real case with known systematic uncertainties
Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S )
Pi = Ri / sigma( Ri )

Diagnose bad fit to multiple data sets
• Real case with known systematic uncertainties
– Need to take out the known systematic uncertainty term in order to restore the independence property
– Need to fit the residual systematic effect with the aid of the global fit
– Regain the naive case results
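
One common way to take out a known correlated systematic is to fit a nuisance parameter with a unit-Gaussian penalty and subtract the fitted shift. The slides do not specify the method, so the sketch below is an assumption: a single fully correlated source `beta` (a hypothetical 5% normalisation), no unknown shift, and an analytic minimisation for the nuisance parameter `lam`.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n_points = 40

x = np.linspace(0.0, 1.0, n_points)
truth = 10.0 * np.exp(-2.0 * x)   # hypothetical true curve T_i
sigma = np.full(n_points, 0.2)    # uncorrelated random error
beta = 0.05 * truth               # known, fully correlated 1-sigma systematic (assumed)

lam_true = rng.normal()           # one draw of the known systematic source
measured = truth + rng.normal(0.0, sigma) + lam_true * beta

d = measured - truth
# Minimise sum((d - lam * beta)^2 / sigma^2) + lam^2 analytically
lam_fit = np.sum(beta * d / sigma**2) / (1.0 + np.sum(beta**2 / sigma**2))

# Known systematic left in: pulls share a common, correlated component
raw_pulls = -d / np.sqrt(sigma**2 + beta**2)
# Known systematic taken out: pulls are normalised by the random error only
clean_pulls = -(d - lam_fit * beta) / sigma

print(round(lam_true, 2), round(float(lam_fit), 2))
```

After the subtraction the cleaned pulls behave approximately like the naive case, so any remaining common offset or trend in X would point to an effect not covered by the known systematic.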

Conclusion
• Global fit is important in determining parton distribution function parameters and their uncertainties
• Inconsistent data samples have been found with the parameter-fitting criterion
• Studying correlations among pulls could be a technique for detecting unknown systematic effects
• Will implement this technique and apply it to the global fit