Practical Statistics - University of Arizona
ircamera.as.arizona.edu/astr_518/sep-2-numstat.pdf ·...
Practical Statistics
• Lecture 3 (Sep. 2): Read: W&J Ch. 4-5
- Correlation
- Hypothesis Testing
• Lecture 4 (Sep. 4)
- Principal Component Analysis
• Lecture 5 (Sep. 9): Read: W&J Ch. 6
- Parameter Estimation
- Bayesian Analysis
- Rejecting Outliers
- Bootstrap + Jack-knife
• Lecture 6 (Sep. 11): Read: W&J Ch. 7
- Random Numbers
- Monte Carlo Modeling
• Lecture 7 (Sep. 16)
- Markov Chain MC
• Lecture 8 (Sep. 18): Read: W&J Ch. 9
- Fourier Techniques
- Filtering
- Unevenly Sampled Data
Review: Process of Decision Making
Ask a Question
Take Data
Reduce Data
Derive Statistics describing data
Does the Statistic answer your question?
Probability Distribution
Error Analysis
Publish!
No
Reflect on what is needed
Yes
Hypothesis Testing
Simulation
Review: The Binomial distribution
• You are observing something that has a probability, p, of occurring in a single observation.
• You observe it M times.
• Want chance of obtaining n successes. For one particular sequence of observations the probability is:
$$P_1(n) = p^n (1-p)^{M-n}$$
• There are many sequences which yield n successes:
$$P(n) = \frac{M!}{n!\,(M-n)!}\, p^n (1-p)^{M-n} = \binom{M}{n} p^n (1-p)^{M-n}$$
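As a minimal sketch, the binomial probability can be evaluated directly with Python's standard library. The function name and the example values (10 observations, p = 0.5) are illustrative assumptions, not part of the lecture.

```python
from math import comb

def binomial_probability(n, M, p):
    """Probability of exactly n successes in M observations,
    each with single-observation success probability p."""
    return comb(M, n) * p**n * (1 - p)**(M - n)

# Hypothetical example: 3 successes in 10 observations with p = 0.5.
print(binomial_probability(3, 10, 0.5))

# Sanity check: the probabilities over all possible n should sum to 1.
print(sum(binomial_probability(n, 10, 0.5) for n in range(11)))
```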
Example: The importance of the null result
You are reading a telescope proposal to observe stars with transition disks for binarity. They aim to disprove that transition disks are caused by a stellar mass companion.
The proposal requests time for 20 objects. The stated goal is to prove that, in general, transition disks are not due to a stellar companion.
Is this a reasonable sample?
If there is only time to do 10, should you give them time?
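One hedged way to frame these questions uses the binomial distribution: if a fraction f of transition-disk stars truly had a stellar companion, the chance of finding none in M stars is (1 - f)^M. The sketch below solves for the largest f still consistent with a null result. The 95% confidence level, and the assumption that every companion present would be detected, are illustrative choices and not stated in the lecture.

```python
def max_fraction_allowed(M, confidence=0.95):
    """Largest companion fraction f still consistent with zero detections
    in M targets: solves (1 - f)**M = 1 - confidence for f."""
    return 1.0 - (1.0 - confidence) ** (1.0 / M)

# Compare the proposed sample (20) with the reduced sample (10).
for M in (20, 10):
    print(M, round(max_fraction_allowed(M), 3))
```

Under these assumptions, a null result on 20 stars limits the companion fraction to below roughly 14%, while 10 stars only limits it to below roughly 26%.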
Example 2: Counting Statistics
I am at the Kuiper 1.5 m telescope on Mt. Bigelow, getting ready to measure a transit of a planet in front of a star. I expect the drop in brightness to be 1% of the star’s total flux.
I take a 1 s exposure and measure the number of digital units (DU) to be 200 for the star. The camera’s user manual tells me that I expect 5 e-/DU. So photoelectrons=1000.
How long should I expose for each frame to get good quality data?
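A rough sketch of the arithmetic, assuming photon (Poisson) noise is the only noise source, so the fractional uncertainty on N photoelectrons is 1/sqrt(N). Sky background, read noise, and scintillation are ignored, and the significance level `n_sigma` is a hypothetical parameter chosen for illustration.

```python
def required_exposure(rate_e_per_s, depth, n_sigma):
    """Exposure time (s) at which the transit depth equals n_sigma times
    the fractional Poisson noise, 1/sqrt(N), of a single frame."""
    electrons_needed = (n_sigma / depth) ** 2
    return electrons_needed / rate_e_per_s

# From the example: 200 DU in 1 s at 5 e-/DU gives 1000 e-/s.
rate = 200 * 5 / 1.0
for n_sigma in (1, 3):
    print(n_sigma, required_exposure(rate, 0.01, n_sigma))
```

With these assumptions, about 10 s gives noise equal to the 1% depth in a single frame, and about 90 s gives a per-frame noise of one third of the depth.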
Example 3: Recursive Bayesian
If we have a coin we suspect is double headed, how many flips would it take us to be reasonably confident it really is double headed?
What posterior probability criterion should we choose?
Assume we adopt the prior belief that there is only a 1% chance. . .
Assume we have seen two coins, one double-headed and the other normal. We don’t know which is being used. What is the prior?
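The recursive update can be sketched as follows: after each observed head, the posterior from the previous flip becomes the prior for the next. The 0.99 posterior threshold is an assumed criterion for "reasonably confident", and the two priors (0.01 and 0.5) correspond to the two scenarios above.

```python
def update(prior_double, p_heads_normal=0.5):
    """One Bayesian update of P(double-headed) after observing a head.
    A double-headed coin shows heads with probability 1."""
    numerator = prior_double * 1.0
    evidence = numerator + (1.0 - prior_double) * p_heads_normal
    return numerator / evidence

def flips_needed(prior_double, threshold=0.99):
    """Number of consecutive heads needed for the posterior to reach threshold."""
    posterior, flips = prior_double, 0
    while posterior < threshold:
        posterior = update(posterior)
        flips += 1
    return flips

for prior in (0.01, 0.5):
    print(prior, flips_needed(prior))
```

With the assumed 0.99 threshold, the 1% prior needs 14 consecutive heads and the 50% prior needs 7. The sketch does not handle tails; a single tail would rule out the double-headed coin.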
Correlation
• Often the first approach to analyzing data is to look for correlations in various parameters.
- May or may not be physically motivated.
- Understand experimental effects first (be skeptical).
- Be careful of “subclusters” of points.
- Correlation is not (necessarily) causation (remain skeptical).
A mass-separation correlation?
Are people born early in the year better hockey players?
See “Outliers” book by Malcolm Gladwell
Correlation coefficient
• The correlation coefficient for two parameters, x and y, is defined as the covariance between parameters over the scatter in the distribution for each parameter:
$$\rho = \frac{\mathrm{covariance}(x, y)}{\sigma_x \sigma_y}$$
• The correlation coefficient can be estimated directly from the data:
$$r = \frac{\sum_i (X_i - \langle X \rangle)(Y_i - \langle Y \rangle)}{\sqrt{\sum_i (X_i - \langle X \rangle)^2 \, \sum_i (Y_i - \langle Y \rangle)^2}}$$
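A minimal sketch of the sample estimate r, written term by term from the sums in its definition. The function name and the toy data are illustrative assumptions.

```python
import math

def correlation_coefficient(x, y):
    """Sample correlation coefficient r from paired data x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a positive but imperfect trend.
print(correlation_coefficient([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```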
Probability of correlation
• For a bivariate Gaussian distribution, Bayes’ theorem can be used to estimate the probability of correlation:
$$\mathrm{prob}(\rho \mid \mathrm{data}) \propto \frac{(1-\rho^2)^{(N-1)/2}}{(1-\rho r)^{N-3/2}} \left(1 + \frac{1}{N - 1/2}\,\frac{1+\rho r}{8} + \ldots\right)$$
• This is often useful for comparing correlations or giving relative chances on the correlation of data.
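As a hedged illustration, the distribution above can be evaluated on a grid of ρ and integrated numerically to estimate the chance that the true correlation is positive. Only the terms written explicitly in the formula are kept (the series is truncated), and the function names, grid size, and example values of r and N are assumptions made for this sketch.

```python
def jeffreys_posterior(rho, r, N):
    """Unnormalized prob(rho | data), keeping only the explicitly written terms."""
    leading = (1 - rho**2) ** ((N - 1) / 2) / (1 - rho * r) ** (N - 1.5)
    correction = 1 + (1 / (N - 0.5)) * (1 + rho * r) / 8
    return leading * correction

def prob_positive(r, N, steps=20000):
    """Fraction of the posterior at rho > 0, by midpoint-rule integration
    over -1 < rho < 1 (midpoints avoid the endpoints)."""
    h = 2.0 / steps
    total = positive = 0.0
    for k in range(steps):
        rho = -1.0 + (k + 0.5) * h
        weight = jeffreys_posterior(rho, r, N)
        total += weight
        if rho > 0:
            positive += weight
    return positive / total

# Hypothetical case: measured r = 0.5 from N = 20 points.
print(prob_positive(0.5, 20))
```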
Use of Jeffreys’ correlation distribution
[Figure: W&J Figure 4.5]
Probability of a positive correlation
[Figure: W&J Figure 4.6, with curves for r=0.25, r=0.5, and r=0.75]
What if we see a correlation?
• It’s common (but dangerous!) to just fit a line to the data:
“Anscombe’s quartet” illustrates the potential pitfalls of line fitting
Principal Component Analysis
• If we have N objects, n measured variables (x_n) for each object then:
- We want a minimum number of variables that are independent.
- These variables will be linear combinations of the observed variables:
$$\xi_i = \sum_{j=1}^{n} a_{ij} x_j$$
The goal is to define the new variables to minimize the residual variance in the data
Geometrical view of PCA
• Iterative approach of finding the component with maximum variance.
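One way to sketch "finding the component with maximum variance" is power iteration on the sample covariance matrix, which converges to the direction of largest variance. This is an illustrative stand-in, not necessarily the procedure used in the lecture; the function name, iteration count, and toy data are assumptions.

```python
import math

def first_principal_component(data, iterations=200):
    """Direction of maximum variance (unit vector) and the variance along it,
    found by power iteration on the sample covariance matrix.
    `data` is a list of objects, each a list of n measured variables."""
    n_obj, n_var = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_obj for j in range(n_var)]
    centered = [[row[j] - means[j] for j in range(n_var)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n_obj - 1)
            for b in range(n_var)] for a in range(n_var)]
    # Starting guess; it must not be orthogonal to the leading component.
    v = [1.0] * n_var
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n_var)) for a in range(n_var)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n_var))
                   for a in range(n_var))
    return v, variance

# Hypothetical data lying exactly along the line y = 2x.
direction, variance = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
print(direction, variance)
```

Further components could be found the same way after subtracting the projection onto the first one; that step is omitted here.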
PCA manipulation
Statistics for Hypothesis Testing
• Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct.
• Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis).
• A probability is calculated that the value found would be obtained again with another sample.
• Based on the required level of confidence, the hypothesis is rejected or accepted.
Parametric Tests
• Often, the most intuitive way to understand our data is to choose the parameter of interest (say the mean) and compare it to a model.
• Alternatively, we might be comparing two data sets by asking whether the differences in a statistic are meaningful.
• These general tests are called “Parametric tests”.
• They can use frequentist approaches to accept or reject the hypothesis.
• They can use Bayesian approaches to calculate probabilities of different results.
Are two data sets drawn from the same distribution?
• The “t” statistic quantifies the likelihood that the means are the same.
• The “F” statistic quantifies the likelihood that the variances of two data sets are the same.
• Consider two data sets, x and y, with n and m data points:
$$S_x = \frac{\sum_i (x_i - \bar{x})^2}{n}, \qquad s^2 = \frac{n S_x + m S_y}{n + m}$$
$$t = \frac{\bar{x} - \bar{y}}{s \sqrt{1/m + 1/n}}, \qquad \nu = m + n - 2$$
$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n-1)}{\sum_i (y_i - \bar{y})^2 / (m-1)}$$
Student's t test
• Calculate the t statistic. A perfect agreement is t=0.
$$s^2 = \frac{n S_x + m S_y}{n + m}, \qquad t = \frac{\bar{x} - \bar{y}}{s \sqrt{1/m + 1/n}}$$
• Evaluate the probability for t>value.
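A minimal sketch of the t statistic, following the slide's definitions with n points in x and m points in y (S_y is taken to be defined like S_x, with m in place of n). The function name and sample values are illustrative assumptions.

```python
import math

def t_statistic(x, y):
    """t statistic and degrees of freedom for data sets x (n points)
    and y (m points), using s^2 = (n*Sx + m*Sy) / (n + m)."""
    n, m = len(x), len(y)
    xbar = sum(x) / n
    ybar = sum(y) / m
    Sx = sum((xi - xbar) ** 2 for xi in x) / n
    Sy = sum((yi - ybar) ** 2 for yi in y) / m
    s = math.sqrt((n * Sx + m * Sy) / (n + m))
    t = (xbar - ybar) / (s * math.sqrt(1 / m + 1 / n))
    nu = m + n - 2
    return t, nu

# Hypothetical samples.
print(t_statistic([2, 4, 6], [1, 2, 3]))
```

Turning t into a probability requires the Student t distribution with ν degrees of freedom; if SciPy is available, `scipy.stats.t.sf(t, nu)` gives the one-sided tail probability.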
F test
• Calculate the F statistic.
$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n-1)}{\sum_i (y_i - \bar{y})^2 / (m-1)}$$
• Calculate the probability that F>value.
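The F statistic is the ratio of the two sample variances and can be sketched in a few lines. The function name and sample values are illustrative assumptions.

```python
def f_statistic(x, y):
    """Ratio of the sample variance of x (n points) to that of y (m points)."""
    n, m = len(x), len(y)
    xbar = sum(x) / n
    ybar = sum(y) / m
    var_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - ybar) ** 2 for yi in y) / (m - 1)
    return var_x / var_y

# Hypothetical samples: x has twice the spread of y.
print(f_statistic([2, 4, 6], [1, 2, 3]))
```

The probability that F exceeds the measured value comes from the F distribution with (n-1, m-1) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, n - 1, m - 1)` evaluates it.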
Student-t Test Example
Imagine we are observing a sample of stars with known hot Jupiters.
A fraction of these are observed to have stellar companions.
A fraction of the sample have orbits that are significantly misaligned with the stellar spin axis.
There appears to be a connection between these.
What is the chance that the samples with and without stellar companions are drawn from the same distribution?
There are 27 degrees of freedom in this example (29 observations and two means to calculate).
The mean of the misaligned sample is 0.77 detections/star.
The mean of the aligned sample is 0.25 detections/star.
We calculate the t-statistic to be t=2.0.
For t=2.0 with 27 degrees of freedom, this indicates roughly a 5% (two-tailed) probability that the two samples are randomly drawn from the same distribution.
Non-Parametric Tests
If we don’t know the underlying distribution, or have small number statistics, there are still tests that can be used to accept or reject a hypothesis.
Non-parametric tests still make some assumption about the data. Usually this is something related to the data following counting statistics, or the binomial distribution (randomness assumed, in the appropriate form).
The Kolmogorov-Smirnov Test
• Calculate the cumulative distribution function for your model (C_model(x)).
• Calculate the cumulative distribution function for your data (C_data(x)).
• Find the maximum of |C_model(x) - C_data(x)|.
• The variable, x, must be continuous to use the K-S test.
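The steps above can be sketched as follows. Because the data CDF is a staircase, the largest difference is checked just below and just at each data point. The function names and the uniform model are illustrative assumptions.

```python
def ks_statistic(data, model_cdf):
    """Maximum of |C_model(x) - C_data(x)| over the sorted data points."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        c = model_cdf(x)
        # C_data steps from (i-1)/n to i/n at x; test both sides of the step.
        d = max(d, i / n - c, c - (i - 1) / n)
    return d

def uniform_cdf(x):
    """Hypothetical model: uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

print(ks_statistic([0.25, 0.5, 0.75], uniform_cdf))
```

Converting the maximum difference into a probability needs the K-S distribution, which this sketch does not compute; if SciPy is available, `scipy.stats.kstest` returns both the statistic and a p-value.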
K-S test example
Right panel is the CDF of known single radial velocity planets (solid line).
If we model this as a mixture of single planets, and double planets (which mimic a single eccentric planet) the correct mixture is ~50%, constrained by the K-S test.
Chi-squared test
The chi-squared statistic can be used to compare any model to a data set:
$$\chi^2 = \sum_{i=1}^{N} \frac{(E_i - O_i)^2}{E_i}$$
• Assumes variation in data is due to counting statistics.
• Data must be binned so that E_i is reasonable for the model.
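A minimal sketch of the statistic for binned counts, where O_i are the observed counts and E_i the counts expected from the model. The function name and the example bins are illustrative assumptions.

```python
def chi_squared(observed, expected):
    """Sum over bins of (E_i - O_i)^2 / E_i."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical binned counts compared with a model prediction.
print(chi_squared([12, 18], [10, 20]))
```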
General Picture:
Correlation -> Hypothesis Testing -> Model Fitting -> Parameter Estimation.
• Is there a correlation?
• Is it consistent with an assumed distribution?
• Does the assumed model fit the data?
• What parameters can we derive for the model with what uncertainty?