MATHEMATICS and STATISTICS in BIOLOGY
3. Distributions and Parameters
I. Antoniou, K. Krikonis, Department of Mathematics, Aristotle University of Thessaloniki, Winter Semester


A random variable X takes values in a set of possible outcomes:
- finite: {x1, x2, ..., xν}
- countable (discrete): {x1, x2, ...}
- continuous: an interval of (-∞, +∞)

Cumulative distribution function (CDF) of the RV X: F(x) = P[X(y) ≤ x]

Discrete case ({x1, x2, ...}): probability function p(xκ), with
F(xν) = p(x1) + p(x2) + ... + p(xν), ν = 1, 2, ...

Continuous case ((-∞, +∞)): probability density ρ(x), with
F(x) = ∫_{-∞}^{x} ρ(y) dy, ρ(x) = dF(x)/dx

Survival function (or reliability function) of the RV X: S(x) = 1 - F(x)

Example: X = sum of two dice.

X | Observable Events | Probability p(x) | F(x) | 1 - F(x)
2 | {(1,1)} | 1/36 | 1/36 = 2.8% | 35/36
3 | {(1,2), (2,1)} | 2/36 | 3/36 = 8.3% | 33/36
4 | {(2,2), (1,3), (3,1)} | 3/36 | 6/36 = 16.7% | 30/36
5 | {(1,4), (2,3), (3,2), (4,1)} | 4/36 | 10/36 = 27.8% | 26/36
6 | {(1,5), (2,4), (3,3), (4,2), (5,1)} | 5/36 | 15/36 = 41.7% | 21/36
7 | {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)} | 6/36 | 21/36 = 58.3% | 15/36
8 | {(2,6), (3,5), (4,4), (5,3), (6,2)} | 5/36 | 26/36 = 72.2% | 10/36
9 | {(3,6), (4,5), (5,4), (6,3)} | 4/36 | 30/36 = 83.3% | 6/36
10 | {(4,6), (5,5), (6,4)} | 3/36 | 33/36 = 91.7% | 3/36
11 | {(5,6), (6,5)} | 2/36 | 35/36 = 97.2% | 1/36
12 | {(6,6)} | 1/36 | 36/36 = 100% | 0

If P[A(y) ≠ B(y)] = 0 (A = B almost surely), then P[A] = P[B] and E[A] = E[B].

Lists of known distributions:
www.stat.rice.edu/~dobelman/textfiles/DistributionsHandbook.pdf
http://en.wikipedia.org/wiki/List_of_probability_distributions

Discrete distributions:
- The deterministic distribution: the support has only one value.
- The discrete uniform distribution, where all elements of a finite set are equally likely. This is the theoretical distribution model for a balanced coin, an unbiased die, a casino roulette, or the first card of a well-shuffled deck.
- The Bernoulli distribution, which takes value 1 with probability p and value 0 with probability q = 1 - p.
- The Rademacher distribution, which takes value 1 with probability 1/2 and value -1 with probability 1/2.
- The binomial distribution, which describes the number of successes in a series of independent Yes/No experiments all with the same probability of success.

- The beta-binomial distribution, which describes the number of successes in a series of independent Yes/No experiments with heterogeneity in the success probability.
- The degenerate distribution at x0, where X is certain to take the value x0. This does not look random, but it satisfies the definition of random variable. This is useful because it puts deterministic variables and random variables in the same formalism.
- The hypergeometric distribution, which describes the number of successes in the first m of a series of n consecutive Yes/No experiments, if the total number of successes is known. This distribution arises when there is no replacement.
- The Poisson binomial distribution, which describes the number of successes in a series of independent Yes/No experiments with different success probabilities.
- Fisher's noncentral hypergeometric distribution
- Wallenius' noncentral hypergeometric distribution
- The beta negative binomial distribution
- The Boltzmann distribution, a discrete distribution important in statistical physics which describes the probabilities of the various discrete energy levels of a system in thermal equilibrium. It has a continuous analogue. Special cases include: the Gibbs distribution, the Maxwell-Boltzmann distribution, the Bose-Einstein distribution, the Fermi-Dirac distribution.
- The extended negative binomial distribution
- The geometric distribution, a discrete distribution which describes the number of attempts needed to get the first success in a series of independent Yes/No experiments.
- The logarithmic (series) distribution
- The negative binomial distribution, a generalization of the geometric distribution to the nth success.
- The parabolic fractal distribution
- The Poisson distribution, which describes a very large number of individually unlikely events that happen in a certain time interval.
- The Conway-Maxwell-Poisson distribution, a two-parameter extension of the Poisson distribution with an adjustable rate of decay.
- The Skellam distribution, the distribution of the difference between two independent Poisson-distributed random variables.
- The Yule-Simon distribution
- The zeta distribution, which has uses in applied statistics and statistical mechanics, and perhaps may be of interest to number theorists. It is the Zipf distribution for an infinite number of elements.
- Zipf's law or the Zipf distribution, a discrete power-law distribution, the most famous example of which is the description of the frequency of words in the English language.
- The Zipf-Mandelbrot law, a discrete power-law distribution which is a generalization of the Zipf distribution.
- The Dirac delta function, although not strictly a function, is a limiting form of many continuous probability functions. It represents a discrete probability distribution concentrated at 0 (a degenerate distribution), but the notation treats it as if it were a continuous distribution.

Continuous distributions:
- The rectangular distribution, a uniform distribution on [-1/2, 1/2].
- The continuous uniform distribution on [a, b], where all points in a finite interval are equally likely.
- The Beta distribution on [0, 1], of which the uniform distribution is a special case, and which is useful in estimating success probabilities.
- The arcsine distribution on [a, b], a special case of the Beta distribution when a = 0 and b = 1.
- The logit-normal distribution on (0, 1).
- The Irwin-Hall distribution, the distribution of the sum of n i.i.d. U(0, 1) random variables.
- The Kent distribution on the three-dimensional sphere.
- The Kumaraswamy distribution, as versatile as the Beta distribution but with simple closed forms for both the cdf and the pdf.
- The logarithmic distribution
- The raised cosine distribution on [μ - s, μ + s].
- The triangular distribution on [a, b], a special case of which is the distribution of the sum of two independent uniformly distributed random variables (the convolution of two uniform distributions).
- The truncated normal distribution on [a, b].
- The U-quadratic distribution on [a, b].
- The von Mises distribution on the circle.
- The von Mises-Fisher distribution on the N-dimensional sphere, which has the von Mises distribution as a special case.
- The Wigner semicircle distribution, important in the theory of random matrices.
- The Beta prime distribution
- The chi distribution
- The noncentral chi distribution
- The chi-squared distribution, which is the sum of the squares of n independent Gaussian random variables. It is a special case of the Gamma distribution, and it is used in goodness-of-fit tests in statistics.
- The inverse-chi-squared distribution
- The noncentral chi-squared distribution
- The scaled-inverse-chi-squared distribution
- The Dagum distribution
- The exponential distribution, which describes the time between consecutive rare random events in a process with no memory.
- The F-distribution, the distribution of the ratio of two (normalized) chi-squared distributed random variables, used in the analysis of variance. It is referred to as the beta prime distribution when it is the ratio of two chi-squared variates which are not normalized by dividing them by their numbers of degrees of freedom.
- The noncentral F-distribution
- Fisher's z-distribution
- The folded normal distribution
- The Gamma distribution, which describes the time until n consecutive rare random events occur in a process with no memory.
- The Erlang distribution, a special case of the gamma distribution with integral shape parameter, developed to predict waiting times in queuing systems.
- The inverse-gamma distribution
- The half-normal distribution
- Hotelling's T-squared distribution
- The inverse Gaussian distribution, also known as the Wald distribution
- The Lévy distribution
- The log-logistic distribution
- The log-normal distribution, describing variables which can be modelled as the product of many small independent positive variables.
- The Mittag-Leffler distribution
- The Pareto distribution, or "power law" distribution, used in the analysis of financial data and critical behavior.
- The Pearson Type III distribution
- The Rayleigh distribution
- The Rayleigh mixture distribution
- The Rice distribution
- The type-2 Gumbel distribution
- The Weibull distribution or Rosin-Rammler distribution, of which the exponential distribution is a special case; it is used to model the lifetime of technical devices and to describe the particle size distribution of particles generated by grinding, milling and crushing operations.
- The Cauchy distribution, an example of a distribution which does not have an expected value or a variance. In physics it is usually called a Lorentzian profile, and is associated with many processes, including resonance energy distribution, impact and natural spectral line broadening and quadratic Stark line broadening.
- Chernoff's distribution
- The Fisher-Tippett, extreme value, or log-Weibull distribution
- The Gumbel distribution, a special case of the Fisher-Tippett distribution
- The generalized logistic distribution
- The generalized normal distribution
- The geometric stable distribution
- The Holtsmark distribution, an example of a distribution that has a finite expected value but infinite variance.
- The hyperbolic distribution
- The hyperbolic secant distribution
- The Landau distribution
- The Laplace distribution
- The Lévy skew alpha-stable distribution or stable distribution, a family of distributions often used to characterize financial data and critical behavior; the Cauchy distribution, Holtsmark distribution, Landau distribution, Lévy distribution and normal distribution are special cases.
- The Linnik distribution
- The logistic distribution
- The map-Airy distribution
- The normal distribution, also called the Gaussian or the bell curve.
It is ubiquitous in nature and statistics due to the central limit theorem: every variable that can be modelled as a sum of many small independent, identically distributed variables with finite mean and variance is approximately normal.
- The Normal-exponential-gamma distribution
- The Pearson Type IV distribution (see Pearson distributions)
- The skew normal distribution
- Student's t-distribution, useful for estimating unknown means of Gaussian populations.
- The noncentral t-distribution
- The type-1 Gumbel distribution
- The Voigt distribution, or Voigt profile, the convolution of a normal distribution and a Cauchy distribution. It is found in spectroscopy when spectral line profiles are broadened by a mixture of Lorentzian and Doppler broadening mechanisms.
- The Gaussian minus exponential distribution, a convolution of a normal distribution with (minus) an exponential distribution.

Parameters of distributions

Location parameters: the mean, the moments, the modes, the median, the quantiles (percentiles).
Dispersion parameters: the range, the variance, the standard deviation, the relative error (coefficient of variation, CV) = σ/m.
Shape parameters: the skewness and the kurtosis.

The mean (expectation) of the RV X:
Discrete: E[X] = m = X(y1)p1 + X(y2)p2 + ... = x1 p1 + x2 p2 + ... = x1 (F(x1) - 0) + x2 (F(x2) - F(x1)) + ...
Continuous: E[X] = m = ∫_Y X(y) dP = ∫ x ρ(x) dx

Example: X = sum of two dice (table above):
E[X] = m = 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36)

m = (2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12)/36 = 252/36 = 7

Properties of the expectation:
1) Linearity: E[cX] = cE[X] for every constant c, and E[X1 + X2] = E[X1] + E[X2]
2) Positivity: E[X] ≥ 0 if X ≥ 0
3) E[1] = 1 for the constant RV 1(y) = 1, and E[0] = 0 for the constant RV 0(y) = 0
4) Monotonicity: E[X1] ≤ E[X2] if X1 ≤ X2
5) |E[X]| ≤ E[|X|]
6) Jensen's inequality: for convex φ, φ(E[X]) ≤ E[φ(X)], i.e. φ(x1 p1 + x2 p2 + ...) ≤ φ(x1)p1 + φ(x2)p2 + ...

Markov's inequality: P[|X| ≥ ε] ≤ E[|X|]/ε, for every ε > 0
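The dice table and the expectation E[X] = 7 can be reproduced programmatically. A minimal Python sketch (not part of the original notes), using exact fractions from the standard library:

```python
# Sketch: pmf, CDF, survival function and mean of X = sum of two fair dice.
from fractions import Fraction
from itertools import product
from collections import Counter

# count outcomes for each sum, then convert counts to exact probabilities
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

def cdf(x):       # F(x) = P[X <= x]
    return sum(p for v, p in pmf.items() if v <= x)

def survival(x):  # S(x) = 1 - F(x)
    return 1 - cdf(x)

mean = sum(x * p for x, p in pmf.items())  # E[X]

print(pmf[7], cdf(6), survival(6), mean)   # -> 1/6 5/12 7/12 7
```

Working with `Fraction` instead of floats keeps every table entry exact (e.g. F(6) = 15/36, 1 - F(6) = 21/36).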

Example: X = sum of two dice (E[|X|] = E[X] = 7, since X > 0).

For ε = 7: P[|X| ≥ 7] ≤ E[|X|]/7 = 7/7 = 100%
Exact value: P[|X| ≥ 7] = 6/36 + 5/36 + 4/36 + 3/36 + 2/36 + 1/36 = 21/36 = 58.3%

For ε = 8: P[|X| ≥ 8] ≤ E[|X|]/8 = 7/8 = 87.5%
Exact value: P[|X| ≥ 8] = 5/36 + 4/36 + 3/36 + 2/36 + 1/36 = 15/36 = 41.7%

For ε = 9: P[|X| ≥ 9] ≤ E[|X|]/9 = 7/9 = 77.8%
Exact value: P[|X| ≥ 9] = 4/36 + 3/36 + 2/36 + 1/36 = 10/36 = 27.8%
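Markov's bounds can be checked against the exact tail probabilities; a sketch (not from the notes), again for X = sum of two fair dice:

```python
# Sketch: Markov's inequality P[X >= t] <= E[X]/t versus the exact tail.
from fractions import Fraction
from itertools import product
from collections import Counter

pmf = {x: Fraction(n, 36) for x, n in
       Counter(a + b for a, b in product(range(1, 7), repeat=2)).items()}
EX = sum(x * p for x, p in pmf.items())          # E[X] = 7, and X >= 0

for t in (7, 8, 9):
    tail = sum(p for x, p in pmf.items() if x >= t)  # exact P[X >= t]
    bound = EX / t                                   # Markov bound
    assert tail <= bound                             # inequality holds
    print(t, tail, bound)
```

The bound is loose here (100% vs 58.3% at ε = 7), which is typical: Markov's inequality uses only the mean, not the shape of the distribution.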
Moments of the RV X. The r-th moment, r = 1, 2, 3, ...:
mr = E[X^r] = (x1)^r p1 + (x2)^r p2 + ... (discrete) = ∫ x^r ρ(x) dx (continuous)

1) m1 = m = E[X], the mean.
2) m2 = E[X^2], the mean square of X.
3) Under suitable conditions the moments determine the distribution (the moment problem).

Example 4: X = sum of two dice. Compute E[X^2]:

E[X^2] = 4·(1/36) + 9·(2/36) + 16·(3/36) + 25·(4/36) + 36·(5/36) + 49·(6/36) + 64·(5/36) + 81·(4/36) + 100·(3/36) + 121·(2/36) + 144·(1/36)
E[X^2] = (4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144)/36 = 1974/36
E[X^2] = 54.83

Modes: x = mode is a value maximizing the probability function p(x) (discrete) or the density ρ(x) (continuous). A distribution with 1 mode is called unimodal; with 2 modes, bimodal.
Example: X = sum of two dice: mode = 7 = m.

Median: the value x1/2 with P[X < x1/2] ≤ 1/2 ≤ P[X ≤ x1/2].
Example: X = sum of two dice: x1/2 = 7 = m = mode.
For a unimodal symmetric distribution: mean = median = mode.

Skewness: γ1 = E[(X - m)^3]/σ^3.
γ1 > 0: skewed to the right (mean > median > mode); γ1 < 0: skewed to the left (mean < median < mode).
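The moment computations above, together with the mode and median of the dice sum, can be verified numerically; a sketch, not from the notes:

```python
# Sketch: first two moments, variance, mode and median of the two-dice sum.
from fractions import Fraction
from itertools import product
from collections import Counter

pmf = {x: Fraction(n, 36) for x, n in
       Counter(a + b for a, b in product(range(1, 7), repeat=2)).items()}

m1 = sum(x * p for x, p in pmf.items())      # first moment E[X]
m2 = sum(x**2 * p for x, p in pmf.items())   # second moment E[X^2] = 1974/36
var = m2 - m1**2                             # variance = E[X^2] - (E[X])^2
mode = max(pmf, key=pmf.get)                 # value with the largest probability
# smallest x with F(x) >= 1/2 (the median of this discrete pmf)
median = min(x for x in pmf
             if sum(p for v, p in pmf.items() if v <= x) >= Fraction(1, 2))

print(m1, m2, var, mode, median)             # -> 7 329/6 35/6 7 7
```

Note 1974/36 = 329/6 ≈ 54.83, matching the value in the text, and mean = mode = median = 7, as expected for this symmetric unimodal distribution.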

Kurtosis: γ2 = E[(X - m)^4]/σ^4.
γ2 = 3: mesokurtic (the value for the normal distribution)
γ2 < 3: platykurtic
γ2 > 3: leptokurtic
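Skewness and kurtosis of the dice sum follow directly from the definitions above; a sketch (not from the notes). The distribution is symmetric, so γ1 = 0, and it turns out to be platykurtic (γ2 < 3):

```python
# Sketch: skewness and kurtosis of X = sum of two fair dice, from the definitions.
from fractions import Fraction
from itertools import product
from collections import Counter
import math

pmf = {x: Fraction(n, 36) for x, n in
       Counter(a + b for a, b in product(range(1, 7), repeat=2)).items()}
m = sum(x * p for x, p in pmf.items())                 # mean = 7
var = sum((x - m)**2 * p for x, p in pmf.items())      # variance = 35/6
sigma = math.sqrt(var)

skew = float(sum((x - m)**3 * p for x, p in pmf.items())) / sigma**3
kurt = float(sum((x - m)**4 * p for x, p in pmf.items())) / sigma**4

print(skew, kurt, kurt - 3)   # symmetric: skew = 0; kurt < 3 (platykurtic)
```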

Excess kurtosis = γ2 - 3.

The 8 descriptive parameters of a distribution: the mean, the modes, the median, the standard deviation, the relative error, the skewness, the kurtosis, the entropy.

Covariance of the RVs X, Y:

σXY = cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[(X - mX)(Y - mY)]

cov(X, X) = σ^2 = var(X), the variance of X

cov(X, Y) = E[XY] - mX mY; in particular, cov(X, Y) = E[XY] when mX = mY = 0.
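A small covariance check from the definition, with a hypothetical choice of Y (the first die D1, so that X = D1 + D2 and cov(X, Y) reduces to var(D1)); a sketch, not from the notes:

```python
# Sketch: covariance from the definition E[(X - mX)(Y - mY)], exact arithmetic.
from fractions import Fraction
from itertools import product

# joint outcomes of two fair dice; X = sum, Y = first die (hypothetical choice)
outcomes = [(a + b, a) for a, b in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)                       # each ordered pair is equally likely

EX = sum(x * p for x, _ in outcomes)      # E[X] = 7
EY = sum(y * p for _, y in outcomes)      # E[Y] = 7/2
cov = sum((x - EX) * (y - EY) * p for x, y in outcomes)
varY = sum((y - EY)**2 * p for _, y in outcomes)

print(cov, varY)                          # cov(D1+D2, D1) = var(D1) = 35/12
```

The equality cov(D1 + D2, D1) = var(D1) follows from linearity of covariance and the independence of the two dice.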

Correlation of the RVs X, Y: Cor(X, Y) = E[XY]. In particular Cor(X, X) = E[X^2] = ||X||^2, the squared norm of X.

Pearson correlation coefficient of the RVs X, Y:
ρ = r(X, Y) = cov(X, Y)/(σX σY)

1) The Pearson coefficient takes values in [-1, 1]: -1 ≤ ρ ≤ 1 (a consequence of the Cauchy-Schwarz inequality).
2) ρ = ±1 if and only if the dependence is linear: Y = αX + β.
ρ = 1 corresponds to α > 0, and ρ = -1 to α < 0.

The coefficient goes back to Galton (1888) and was developed by Pearson (1895).

Pearson K. 1920, Notes on the History of Correlation, Biometrika 13, 25-45
Rodgers J. L., Nicewander W. A. 1988, Thirteen ways to look at the correlation coefficient, The American Statistician 42(1), 59-66
Stigler S. M. 1989, Francis Galton's Account of the Invention of Correlation, Statistical Science 4(2), 73-79
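The Pearson coefficient can be computed directly from the definition; a sketch on made-up data (not from the notes), illustrating property 2: exact linear dependence gives ρ = ±1, with the sign of the slope:

```python
# Sketch: Pearson r = cov(X, Y) / (sigma_X * sigma_Y) on sample data.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx)**2 for x in xs) / n)
    sy = math.sqrt(sum((y - my)**2 for y in ys) / n)
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2 * x + 1 for x in xs]))    # Y = 2X + 1:  r = 1.0
print(pearson(xs, [-3 * x + 7 for x in xs]))   # Y = -3X + 7: r = -1.0
```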

Mutual-information correlation coefficient of the RVs X, Y:

rMI = I[X ; Y]/min(H[X], H[Y])

where the mutual information of X, Y is
I[X ; Y] = H[X] + H[Y] - H[X, Y] = Σ_{x,y} p(x, y) log2 (p(x | y)/p(x)) = Σ_{x,y} p(x, y) log2 (p(x, y)/(p(x)p(y)))

H[X] = -Σ_x p(x) log2 p(x), the entropy of X
H[Y] = -Σ_y p(y) log2 p(y), the entropy of Y
H[X, Y] = -Σ_{x,y} p(x, y) log2 p(x, y), the joint entropy of X, Y

1) 0 ≤ rMI ≤ 1
2) rMI = 0 if and only if X, Y are independent
3) rMI = 1 in the case of deterministic dependence (one variable is a function of the other)

Note that rMI detects nonlinear dependence: the Pearson coefficient may vanish (|r| = 0) while rMI ≠ 0.
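The definitions above translate directly into code; a sketch (not from the notes) computing rMI for two extreme hypothetical cases, independence and deterministic dependence of two fair coins:

```python
# Sketch: rMI = I[X;Y] / min(H[X], H[Y]) for a discrete joint distribution.
import math

def entropy(p):
    """Shannon entropy in bits; p is a dict mapping outcomes to probabilities."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def r_mi(joint):
    """joint: dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), q in joint.items():          # marginal distributions
        px[x] = px.get(x, 0) + q
        py[y] = py.get(y, 0) + q
    info = entropy(px) + entropy(py) - entropy(joint)   # I[X;Y]
    return info / min(entropy(px), entropy(py))

# two independent fair coins -> rMI = 0; two identical fair coins -> rMI = 1
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
ident = {(0, 0): 0.5, (1, 1): 0.5}
print(r_mi(indep), r_mi(ident))              # -> 0.0 1.0
```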