Post on 21-Dec-2015
ST3905
Lecturer : Supratik Roy
Email : [email protected]
(Unix) : [email protected]
Phone: ext. 3626
What do we want to do?
1. What is statistics?
2. Describing Information : Summarization, Visual and non-Visual representation
3. Drawing conclusions from information : Managing uncertainty and incompleteness of information
Describing Information
1. Why summarization of information?
2. Visual representation (aka graphical Descriptive Statistics)
3. Non-visual representation (numerical measures)
4. Classical techniques vs modern IT
Stem and Leaf Plot
Decimal point is 2 places to the right of the colon
0 : 8
1 : 000011122233333333333344444
1 : 55555566666677777778888888899999999999
2 : 0000000111111111111222222233333333444444444
2 : 555556666666666777778889999999999999999
3 : 000000001111112222333333333444
3 : 55555555666667777777888888899999999
4 : 0122234
4 : 55555678888889
5 : 111111134
5 : 555667
6 : 44
6 : 7
Pie-Chart
[Figure: pie chart with slices labelled diffgeom, complex, algebra, reals, statistics]
DotChart
[Figure: dot chart, horizontal axis 10 to 30; groups: Child Care, Health Services, Community Centers, Family & Youth, Other; within each group: Old Suburb, Coast County, New Suburb]
Histogram
[Figure: histogram titled "50 samples from a t distribution with 5 d.f."; horizontal axis my.sample, -4 to 2; vertical axis 0 to 15]
Histogram-Categorical
[Figure: bar chart of state.region with categories Northeast, South, North Central, West; vertical axis 0 to 15]
Rules for Histograms
1. Height of Rectangle proportional to frequency of class
2. No. of classes proportional to sqrt(total no. of observations) [not a hard and fast rule]
3. In case of categorical data, keep rectangle widths identical, and base of rectangles separate.
4. Best, if possible, let the software do it.
Data
[1] -0.053626486 -0.828128399 0.214910482 0.346570399
[5] -0.849316517 0.001077376 0.736191791 1.417540397
[9] -2.382332275 -2.699019949 -0.111907192 1.384903284
[13] 2.113286699 -1.828108272 -1.108280724 0.131883612
[17] -0.394494473 0.829806888 0.023178033 0.019839537
[21] -0.346280222 -0.251981108 1.159853307 -0.249501904
[25] -1.342704742 -2.012653224 -1.535503208 0.869806233
[29] -1.313495887 -0.244408426 -0.998886998 -1.446769605
[33] 1.224528053 -0.410163230 0.032230907 -0.137297112
[37] -2.717620031 -0.728570438 0.034697116 2.202863874
[41] -0.170794163 0.353651680 -0.673296374 3.136364814
[45] -1.260108638 -0.367334893 -0.652217259 -0.301847039
[49] 0.315180215 0.190766333
Tabulation
Class    Tally                  freq
-3,-2    ////                      4
-2,-1    //// //                   7
-1,0     //// //// //// ///       18
0,1      //// //// ////           14
1,2      ////                      4
2,3      //                        2
3,4      /                         1
Total                             50
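The tally can be reproduced programmatically. The following is a minimal Python sketch, not the software used in the lecture: it bins the 50 values listed under Data into unit-width classes and evaluates the sqrt rule for the number of classes. The function name `tabulate` is hypothetical, and the half-open convention [a, a+1) is an assumption, since the slide does not say which class boundary is closed.

```python
import math
from collections import Counter

# The 50 observations listed under "Data".
my_sample = [
    -0.053626486, -0.828128399, 0.214910482, 0.346570399,
    -0.849316517, 0.001077376, 0.736191791, 1.417540397,
    -2.382332275, -2.699019949, -0.111907192, 1.384903284,
    2.113286699, -1.828108272, -1.108280724, 0.131883612,
    -0.394494473, 0.829806888, 0.023178033, 0.019839537,
    -0.346280222, -0.251981108, 1.159853307, -0.249501904,
    -1.342704742, -2.012653224, -1.535503208, 0.869806233,
    -1.313495887, -0.244408426, -0.998886998, -1.446769605,
    1.224528053, -0.410163230, 0.032230907, -0.137297112,
    -2.717620031, -0.728570438, 0.034697116, 2.202863874,
    -0.170794163, 0.353651680, -0.673296374, 3.136364814,
    -1.260108638, -0.367334893, -0.652217259, -0.301847039,
    0.315180215, 0.190766333,
]

def tabulate(values, width=1.0):
    """Count observations per class [k*width, (k+1)*width) (assumed convention)."""
    counts = Counter(math.floor(v / width) for v in values)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}

table = tabulate(my_sample)
for (lo, hi), freq in table.items():
    print(f"{lo:g},{hi:g}  {freq}")

# Rule of thumb from the histogram rules: roughly sqrt(N) classes.
suggested_classes = round(math.sqrt(len(my_sample)))
print("suggested number of classes:", suggested_classes)
```

With unit-width classes the data happen to fall into seven non-empty classes, which agrees with the sqrt rule of thumb for N = 50.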
Box-Plot - I
[Figure: box plot; vertical axis 200 to 800]
Box Plot – II
[Figure: side-by-side box plots for the groups 18-24, 25-34, 35-44, 45-54, 55-64, 65+; vertical axis 1 to 7]
Box Plot – III
[Figure: box plots titled "NJ Pick-it Lottery (5/22/75-3/16/76)"; horizontal axis Leading Digit of Winning Numbers, 0 to 9; vertical axis Payoff, 200 to 800]
Non-Visual (numerical measures)
1. Pictures vs. quantitative measures
2. Criteria for selection of a measure – purpose of study
3. Qualities that a measure should have
4. We live in an uncertain world – chances of error
Measures of Location
1. Mean : sum(x)/N
2. Mode
3. Median
Location : mean, median
Algebra test scores
No.     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Score  43 50 41 69 52 38 51 54 43 47 54 51 70 58 44 54 52 32 42 70
No.    21 22 23 24 25
Score  50 49 56 59 38
Mean = 50.68
10% trimmed mean of scores = 50.33333
Median = 51
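The three location measures can be recomputed from the 25 scores. The Python sketch below is an illustration only; `trimmed_mean` is a hypothetical helper, and it assumes the trimming convention of dropping floor(trim x N) observations from each end, which reproduces the value quoted on the slide.

```python
import statistics

# The 25 algebra test scores from the slide.
scores = [43, 50, 41, 69, 52, 38, 51, 54, 43, 47, 54, 51, 70,
          58, 44, 54, 52, 32, 42, 70, 50, 49, 56, 59, 38]

def trimmed_mean(values, trim=0.1):
    """Drop floor(trim * n) observations from each end, then average the rest."""
    ordered = sorted(values)
    cut = int(trim * len(ordered))          # 2 observations per end when n = 25
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
trimmed_score = trimmed_mean(scores, trim=0.1)

print(f"mean = {mean_score:.2f}")
print(f"10% trimmed mean = {trimmed_score:.5f}")
print(f"median = {median_score}")
```

The trimmed mean discards the two lowest scores (32, 38) and the two highest (70, 70) before averaging, so it is less sensitive to extreme scores than the ordinary mean.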
Location : Non-classical
An M-estimate of location is a solution mu of the equation:
sum(psi( (y-mu)/s )) = 0.
Data set : car.miles
(bisquare) 204.5395
(Huber’s ) 204.2571
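To make the defining equation concrete, here is a minimal Python sketch of a Huber-type M-estimate solved by iterative reweighting. It is not the routine that produced the car.miles figures: the tuning constant k = 1.345, the use of a rescaled MAD for the scale s, the function names and the example data are all assumptions chosen for illustration.

```python
import statistics

def huber_psi(u, k=1.345):
    """Huber's psi: the identity near zero, clipped at +/- k."""
    return max(-k, min(k, u))

def m_estimate_location(y, k=1.345, tol=1e-9, max_iter=200):
    """Solve sum(psi((y - mu) / s)) = 0 for mu by iterative reweighting."""
    mu = statistics.median(y)
    # Assumed scale: median absolute deviation, rescaled for Gaussian data.
    s = 1.4826 * statistics.median([abs(v - mu) for v in y])
    if s == 0:
        raise ValueError("scale estimate is zero; data too concentrated")
    for _ in range(max_iter):
        weights = []
        for v in y:
            u = (v - mu) / s
            weights.append(1.0 if u == 0 else huber_psi(u, k) / u)
        new_mu = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical data with one gross outlier.
data = [1, 2, 3, 4, 100]
mu_hat = m_estimate_location(data)

# Check the defining equation at the solution.
center = statistics.median(data)
s_hat = 1.4826 * statistics.median([abs(v - center) for v in data])
residual = sum(huber_psi((v - mu_hat) / s_hat) for v in data)
print("M-estimate:", mu_hat, " mean:", statistics.mean(data))
```

The outlier pulls the ordinary mean far from the bulk of the data, while the M-estimate stays near the centre because psi bounds the influence of any single observation.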
Tabular method of computing
Class    freq   Class-midpt   Rel. freq   r.f. x midpt
-3,-2      4       -2.5          0.08        -0.20
-2,-1      7       -1.5          0.14        -0.21
-1,0      18       -0.5          0.36        -0.18
0,1       14        0.5          0.28         0.14
1,2        4        1.5          0.08         0.12
2,3        2        2.5          0.04         0.10
3,4        1        3.5          0.02         0.07
Total     50                                 -0.16
Tabular method of computing
(A = -0.5, d = 1)
Class    freq   Class-midpt (x)   x' = (x-A)/d   Rel. freq   r.f. x x'
-3,-2      4        -2.5              -2            0.08       -0.16
-2,-1      7        -1.5              -1            0.14       -0.14
-1,0      18        -0.5               0            0.36        0
0,1       14         0.5               1            0.28        0.28
1,2        4         1.5               2            0.08        0.16
2,3        2         2.5               3            0.04        0.12
3,4        1         3.5               4            0.02        0.08
Total     50                                                    0.34
Mean = A + d x 0.34 = -0.5 + 0.34 = -0.16
Measures of Scale (aka Dispersion)
1. Variance (unbiased) : sum((x-mean(x))^2)/(N-1)
2. Variance (biased) : sum((x-mean(x))^2)/(N)
3. Standard Deviation : sqrt( variance)
Tabular method of computing
(A = -0.5, d = 1)
Class    Class-midpt (x)   x' = (x-A)/d   x'^2   Rel. freq   r.f. x x'^2
-3,-2        -2.5              -2           4       0.08        0.32
-2,-1        -1.5              -1           1       0.14        0.14
-1,0         -0.5               0           0       0.36        0
0,1           0.5               1           1       0.28        0.28
1,2           1.5               2           4       0.08        0.32
2,3           2.5               3           9       0.04        0.36
3,4           3.5               4          16       0.02        0.32
Total                                                           1.74
Variance (biased) = d^2 x (1.74 - 0.34^2) = 1.6244
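The tabular computations with coded midpoints x' = (x-A)/d can be checked in a few lines. This is a Python sketch under the table's own values (A = -0.5, class width d = 1); the variable names are illustrative, and the variance uses the biased (divide by N) form.

```python
# Class midpoints and frequencies from the tabulation.
midpoints = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
freqs = [4, 7, 18, 14, 4, 2, 1]
A, d = -0.5, 1.0  # origin and class width used for coding

n = sum(freqs)
rel = [f / n for f in freqs]                  # relative frequencies
coded = [(x - A) / d for x in midpoints]      # x' = (x - A) / d

mean_coded = sum(r * u for r, u in zip(rel, coded))          # column total 0.34
mean_sq_coded = sum(r * u ** 2 for r, u in zip(rel, coded))  # column total 1.74

# Undo the coding to get back to the original units.
grouped_mean = A + d * mean_coded
grouped_var = d ** 2 * (mean_sq_coded - mean_coded ** 2)     # biased form

print(f"grouped mean = {grouped_mean:.2f}")
print(f"grouped variance = {grouped_var:.4f}")
```

Coding keeps the hand arithmetic in small integers; the shift by A affects only the mean, and the scaling by d enters the variance as d^2.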
Robust measures of scale
1. The MAD scale estimate generally has very small bias compared with other scale estimators when there is "contamination" in the data.
2. Tau-estimates and A-estimates also have 50% breakdown, but are more efficient for Gaussian data.
3. The A-estimate that scale.a computes is redescending, so it is inappropriate if it is necessary that the scale estimate always increase as the size of a datapoint is increased. However, the A-estimate is very good if all of the contamination is far from the "good" data.
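The resistance of the MAD to contamination can be seen on a toy example. The Python sketch below is illustrative only: `mad_scale` is a hypothetical helper, the constant 1.4826 is the conventional rescaling that makes the MAD comparable to the standard deviation for Gaussian data (an assumption, not stated on the slide), and the data are made up.

```python
import statistics

def mad_scale(values, constant=1.4826):
    """Median absolute deviation about the median, rescaled (assumed constant)."""
    center = statistics.median(values)
    return constant * statistics.median([abs(v - center) for v in values])

# Hypothetical "good" data, then the same data with one value replaced by an outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean[:-1] + [100.0]

for name, sample in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{name:13s} MAD = {mad_scale(sample):.4f}   "
          f"sd = {statistics.stdev(sample):.4f}")
```

One contaminated point leaves the MAD unchanged here, while the standard deviation grows by more than two orders of magnitude.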
Comparison of scale measures
MAD(corn.yield) = 4.15128
scale.tau(corn.yield) = 4.027753
scale.a(corn.yield) = 4.040902
var(corn.yield) = 19.04191
sqrt(var(corn.yield)) = 4.363703
N.B. To really compare you have to compare for various probability distributions as well as various sample sizes.
Probability
1. Concept of an Experiment on Random observables
2. Sets and Events, Random variables, Probability
(a) Set of all basic outcomes = Sample space = S
(b) An element of S or union of elements in S = An event
(A singleton event = simple event, else compound)
(c) A numerical function that associates each basic outcome with a number = Random Variable
(d) A map from the set of events into [0,1] obeying certain rules = Probability
Examples of Probability
Consider toss of single coin :
1. A single throw : Only two possible outcomes – Head or Tail
2. Two consecutive throws : Four possible outcomes – (Head, Head), (Head, Tail), (Tail, Head), (Tail, Tail)
3. Unbiased coin : P(Head turns up) = 0.5
4. Define R.V. X to be X(Head)=1, X(Tail)=0. P(X=1)=0.5, P(X=0)=0.5.
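The coin example can be enumerated directly. The Python sketch below lists the sample space for two consecutive throws of an unbiased coin, assigns equal probability to each outcome, and builds the distribution of the number of heads using the R.V. X(Head)=1, X(Tail)=0. Variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = ["Head", "Tail"]

# Sample space for two consecutive throws: all ordered pairs of faces.
sample_space = list(product(faces, repeat=2))

# Unbiased coin: every basic outcome is equally likely.
prob = {s: Fraction(1, len(sample_space)) for s in sample_space}

def X(face):
    """The random variable from the slide: X(Head) = 1, X(Tail) = 0."""
    return 1 if face == "Head" else 0

# A compound event: at least one head in two throws.
at_least_one_head = [s for s in sample_space if "Head" in s]
p_at_least_one_head = sum(prob[s] for s in at_least_one_head)

# Distribution of the total number of heads, X(first) + X(second).
heads_pmf = defaultdict(Fraction)
for s in sample_space:
    heads_pmf[X(s[0]) + X(s[1])] += prob[s]

print(sample_space)
print("P(at least one head) =", p_at_least_one_head)
print(dict(heads_pmf))
```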
Axioms of Probability
1. 0 <= P(A) <= 1 for any event A
2. P[A ∪ B] = P[A]+P[B] if A,B are disjoint sets/events
3. P[S] = 1
Basic Formulae-I
1. P[A’] = 1 - P[A]
2. P[A ∩ B] = 0 if A,B are disjoint
3. P[A ∪ B] = P[A]+P[B]-P[A ∩ B]
4. P[A ∪ B ∪ C] = P[A]+P[B]+P[C] - P[A ∩ B] - P[A ∩ C] - P[B ∩ C] + P[A ∩ B ∩ C]
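These formulae can be checked on a small finite sample space. The Python sketch below uses one roll of a fair die with equally likely outcomes; the events A, B, C are hypothetical choices made only to exercise the complement rule and the three-event union formula.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
S = set(range(1, 7))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # at least four
C = {3, 6}      # multiple of three

# Complement rule: P[A'] = 1 - P[A].
p_not_A = P(S - A)

# Union of three events, computed directly and by inclusion-exclusion.
lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))

print("P[A'] =", p_not_A)
print("P[A or B or C] =", lhs, "=", rhs)
```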
Basic Formulae - II
1. Counting Principle : For an ordered sequence to be formed by taking one element from each of N groups G1,G2,…,GN with sizes k1,k2,…,kN, the total no. of sequences that can be formed is k1 x k2 x … x kN.
2. An ordered sequence of k objects taken from a set of n distinct objects is called a Permutation of size k of the objects, and is denoted by Pk,n.
3. For any positive integer m, m! is read as “m-factorial” and defined by m!=m(m-1)(m-2)…3.2.1
4. Any unordered subset of size k from a set of n distinct objects is called a Combination, denoted Ck,n.
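The definitions can be illustrated by brute-force enumeration. The Python sketch below counts sequences, permutations and combinations by listing them, then compares the counts with the standard library's `math.perm` and `math.comb` (Python 3.8 or later). The groups and letters are made-up examples.

```python
import math
from itertools import combinations, permutations, product

# Counting principle: one element from each of three groups of sizes 2, 3, 4.
groups = [["a", "b"], [1, 2, 3], ["w", "x", "y", "z"]]
n_sequences = len(list(product(*groups)))

# Permutations and combinations of size k = 3 from n = 5 distinct objects.
objects = "ABCDE"
k = 3
n_perm = len(list(permutations(objects, k)))   # ordered selections
n_comb = len(list(combinations(objects, k)))   # unordered subsets

print("sequences:", n_sequences)
print("permutations:", n_perm, " math.perm:", math.perm(len(objects), k))
print("combinations:", n_comb, " math.comb:", math.comb(len(objects), k))
```

Each unordered subset of size k corresponds to k! orderings, which is why the permutation count is k! times the combination count.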
Basic Formulae-III
1. Pk,n = n!/(n-k)!
2. Ck,n = n!/[k!(n-k)!]
3. For any two events A and B with P(B)>0, the Conditional Probability of A given that B has occurred is defined by P(A|B) = P(A ∩ B)/P(B) [taken as 0 if P(B)=0]
4. Let A,B be disjoint with A ∪ B = S and P(A),P(B)>0, and let C be any event. Then P(C)=P(C|A)P(A)+P(C|B)P(B) [Law of Total Probability]
5. Let A,B be as above and C be any event with P[C]>0. Then P(A|C)=P(C|A)P(A)/[P(C|A)P(A)+P(C|B)P(B)]. [Bayes Theorem]
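A small worked example shows how the Law of Total Probability feeds into Bayes Theorem. In the Python sketch below, A and B are assumed to be disjoint events that together make up the whole sample space, and all the numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical inputs: A and B are disjoint and together cover the sample space.
p_A = Fraction(1, 100)
p_B = 1 - p_A
p_C_given_A = Fraction(19, 20)
p_C_given_B = Fraction(1, 20)

# Law of Total Probability: P(C) = P(C|A)P(A) + P(C|B)P(B).
p_C = p_C_given_A * p_A + p_C_given_B * p_B

# Bayes Theorem: P(A|C) = P(C|A)P(A) / P(C).
p_A_given_C = p_C_given_A * p_A / p_C

print("P(C) =", p_C)
print("P(A|C) =", p_A_given_C, "~", float(p_A_given_C))
```

Even though C is very likely when A holds, P(A|C) stays modest here because A itself is rare, which is the usual lesson of Bayes Theorem.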
Random Variables - Discrete
1. A discrete set is a set that is either finite or can be mapped one-to-one into the set of Natural numbers.
2. A discrete random variable is a r.v. which takes values in a discrete set consisting of numbers.
3. The probability distribution or probability mass function (pmf) of a discrete r.v. X is defined for every number x by p(x) = P(X=x) = P(all s ∈ S : X(s)=x). [P(X=x) is read “the probability that the r.v. X assumes the value x”.] Note: p(x) >= 0, and the sum of p(x) over all possible x is 1.
Cumulative Distribution Function
1. The Cumulative distribution function (cdf) F(x) of a discrete r.v. X with pmf p(x) is defined for every number x by F(x) = P(X ≤ x) = Σ{y : y ≤ x} p(y)
2. For any number x, F(x) is the probability that the observed value of X will be at most x.
3. For any two numbers a,b with a ≤ b, P(a ≤ X ≤ b) = F(b)-F(a-), where a- represents the largest possible X value that is strictly less than a.
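The cdf and the interval formula can be sketched for a concrete pmf. The Python example below takes X to be the number of heads in two throws of an unbiased coin (an illustrative choice); `cdf` and `prob_between` are hypothetical helper names.

```python
from fractions import Fraction

# pmf of X = number of heads in two throws of an unbiased coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): sum of p(y) over all possible values y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a-), where F(a-) sums p(y) over y < a."""
    cdf_just_below_a = sum(p for y, p in pmf.items() if y < a)
    return cdf(b) - cdf_just_below_a

print("F(1) =", cdf(1))
print("F(1.5) =", cdf(1.5))      # the cdf is a step function between possible values
print("P(1 <= X <= 2) =", prob_between(1, 2))
```

Using F(a-) rather than F(a) matters for discrete variables, because subtracting F(a) would drop the probability mass sitting exactly at a.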
Operations on RV’s
1. Expectation of a RV
2. Expectations of functions of RV’s
3. Special Cases : Moments, Covariance
Expected Values of Random Variables
1. Let X be a discrete r.v. with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X) or μ_X, is E(X) = μ_X = Σ{x ∈ D} x.p(x)
2. Note that E(X) may not always exist. Consider p(x) = k/x^2 for x = 1, 2, 3, …: then Σ x.p(x) = Σ k/x, which diverges.
Expected Values of functions of Random Variables
1. Let X be a discrete r.v. with set of possible values D and pmf p(x). The expected value or mean value of f(X), denoted by E(f(X)) or μ_f(X), is E(f(X)) = Σ{x ∈ D} f(x).p(x)
2. Example : Variance. Var(X) = V(X) = E[(X-E(X))^2] = E(X^2) - [E(X)]^2
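Both forms of the variance can be checked with a generic expectation helper. The Python sketch below uses the score on one roll of a fair die as an illustrative discrete r.v.; the function name `expect` is hypothetical.

```python
from fractions import Fraction

# pmf of X = score on one roll of a fair die (illustrative choice).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(f, pmf):
    """E(f(X)) = sum over possible values x of f(x) * p(x)."""
    return sum(f(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)                        # E(X)
var_definition = expect(lambda x: (x - mean) ** 2, pmf)  # E[(X - E(X))^2]
var_shortcut = expect(lambda x: x ** 2, pmf) - mean ** 2  # E(X^2) - [E(X)]^2

print("E(X) =", mean)
print("Var(X) by definition =", var_definition)
print("Var(X) by shortcut   =", var_shortcut)
```

Exact fractions are used so the two forms of the variance agree exactly rather than up to rounding.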
Random Variables - Continuous
Joint distribution of >1 RV’s
Gaussian or Normal Distribution
Sample as Random Observables
Parametric Inference
Tests of Hypothesis
Hypothesis Tests for Normal Population