# leon leon@

Post on 31-Dec-2015



TRANSCRIPT

• eatworms.swmed.edu/~leon leon@eatworms.swmed.edu

• Basic Statistics

Combining probabilities

Samples and Populations

Four useful statistics: the mean, or average; the median, or 50% value; standard deviation; Standard Error of the Mean (SEM).

Three distributions: the binomial distribution; the Poisson distribution; the normal distribution.

Four tests: the chi-squared goodness-of-fit test; the chi-squared test of independence; Student's t-test; the Mann-Whitney U-test.

• Combining probabilities

When you throw a pair of dice, what is the probability of getting 11?

• Combining probabilities

The probability that all of several independent events occur is the product of the individual event probabilities.

The probability that one of several mutually exclusive events occurs is the sum of the individual event probabilities.

• Combining probabilities

When you throw a pair of dice, what is the probability of getting 11?

When you throw five dice, what is the probability that at least one shows a 6?
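Both dice questions can be worked with the product and sum rules. The following is a minimal sketch, assuming fair six-sided dice; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction
from itertools import product

# Product rule: a specific ordered pair, e.g. (5, 6), has probability 1/6 * 1/6.
# Sum rule: (5, 6) and (6, 5) are mutually exclusive, so their probabilities add.
p_eleven = Fraction(1, 6) * Fraction(1, 6) + Fraction(1, 6) * Fraction(1, 6)

# "At least one 6" is the complement of "no 6 on any of the five dice".
# Product rule for the complement: (5/6) for each of five independent dice.
p_at_least_one_six = 1 - Fraction(5, 6) ** 5

# Brute-force check: enumerate all 36 ordered outcomes for a pair of dice.
rolls = list(product(range(1, 7), repeat=2))
p_eleven_enumerated = Fraction(sum(1 for a, b in rolls if a + b == 11), len(rolls))

print(p_eleven, p_eleven_enumerated)
print(p_at_least_one_six, float(p_at_least_one_six))
```

Working with the complement avoids summing the probabilities of one, two, three, four, or five sixes separately.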



• Populations and samples

What proportion of the population is female?

Abstract populations: what does a mouse weigh?

Population characteristics:

Central tendency: mean, median

Dispersion: standard deviation

• Four sample statistics

Sample mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample median: the middle value in a sample of odd size, or the average of the two middle values in a sample of even size.

Sample standard deviation: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$

Standard Error of the Mean: $\mathrm{SEM} = \frac{s}{\sqrt{N}}$
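The four sample statistics can be computed with Python's standard library. This is a sketch under stated assumptions: the weights are made-up values for illustration, `summarize` is a hypothetical helper name, and the standard deviation uses the usual N - 1 denominator.

```python
import math
import statistics

def summarize(sample):
    """Return (mean, median, standard deviation, SEM) for a list of numbers."""
    n = len(sample)
    mean = statistics.mean(sample)
    median = statistics.median(sample)   # averages the two middle values if n is even
    sd = statistics.stdev(sample)        # sample SD, N - 1 in the denominator
    sem = sd / math.sqrt(n)              # standard error of the mean
    return mean, median, sd, sem

# Hypothetical mouse weights in grams (not data from the slides).
weights = [21.0, 23.0, 22.0, 25.0, 24.0]
mean, median, sd, sem = summarize(weights)
print(f"{mean:.2f} ± {sem:.2f} g (mean ± SEM, N = {len(weights)})")
```

Note that the SEM shrinks as the sample grows, while the standard deviation settles toward the population value.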

• Standard deviation and SEM

Use standard deviation to describe how much variation there is in a population. Example: income, if you're interested in how much income varies within the US population.

Use SEM to say how accurate your estimate of a population mean is. Example: measurement of β-gal activity from a 2-hybrid test.

• Sample stats: recommendations

When you report an average, report it as mean ± SEM. Same for error bars in graphs.

In the figure caption or the table heading or somewhere, say explicitly that that's what you're reporting.

Use the median for highly skewed data.

• Three distributions

The binomial distribution: when you count how many of a sample of fixed size have a certain characteristic.

The Poisson distribution: when you count how many times something happens, and there is no upper limit.

The normal distribution: when you measure something that doesn't have to be an integer, or when you average several continuous measurements.

• The binomial distribution

When you count how many of a sample of fixed size have a certain characteristic.

Parameters:

N: the fixed sample size

p: the probability that one thing has the characteristic

q: the probability that it doesn't: (1-p)

Formula: $\Pr(x) = \frac{N!}{x!\,(N-x)!}\, p^{x} q^{N-x}$

Example: Females in a population, animals having a certain genetic characteristic.
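The standard binomial formula, Pr(x) = C(N, x) p^x q^(N-x), is short enough to evaluate directly. The sketch below assumes a made-up example of 10 animals, each female with probability 0.5; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability that exactly x of n independent items have the characteristic."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: 10 animals, each female with probability 0.5.
p_exactly_5 = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible counts 0..N must add up to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))

print(p_exactly_5, total)
```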

• The Poisson distribution

When you count how many times something happens, and there is no (or only a very large) upper limit.

Parameter:

μ: the population mean

Formula: $\Pr(x) = \frac{e^{-\mu}\mu^{x}}{x!}$

Example: Radioactivity counts, positive clones in a library.
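For the Poisson case, the standard formula Pr(x) = e^(-μ) μ^x / x! needs only the population mean. The following sketch assumes a made-up mean of 3 positive clones per plate; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of observing exactly x events when the mean count is mu."""
    return exp(-mu) * mu**x / factorial(x)

# Hypothetical example: on average 3 positive clones per plate.
mu = 3.0
p_zero = poisson_pmf(0, mu)                               # chance of a plate with none
p_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))   # chance of 0, 1, or 2

print(p_zero, p_at_most_2)
```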

• The normal distribution

When you measure something that doesn't have to be an integer, e.g. weight of a mouse, or velocity of an enzyme reaction, and especially when you average several such continuous measurements.

Parameters:

μ: the population mean

σ²: the population variance

Formula: $f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$

Example: Weight, heart rate, enzyme activity
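The normal density can be written out by hand and cross-checked against `statistics.NormalDist` from the Python standard library. This is a sketch with assumed parameters (mean 25 g, standard deviation 2 g) chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density at x, for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical mouse-weight population: mean 25 g, standard deviation 2 g.
mu, sigma = 25.0, 2.0
density_at_mean = normal_pdf(mu, mu, sigma)

# Cross-check against the standard library implementation.
dist = NormalDist(mu, sigma)
reference = dist.pdf(mu)

# Fraction of the population within one standard deviation of the mean.
within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(density_at_mean, reference, within_one_sd)
```

Because the variable is continuous, the formula gives a density; probabilities come from areas under the curve, as in the last calculation.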

• Hypothesis testing

• A genetic mapping problem

| | Mom's genotype | Dad's genotype |
|---|---|---|
| At SSR | (/( | (/( |
| At disease locus | e/+ | e/+ |

Assume we know that Mom inherited both the ( allele of the SSR and the e mutation from her father, and likewise that Dad inherited ( and e from his father.

Suppose SSR and disease locus are unlinked (the null hypothesis). What is the probability that an epileptic (e/e) child has SSR genotype (/(?


Now suppose that SSR and disease locus are genetically linked. What is the probability that an epileptic (e/e) child has SSR genotype (/(?


• The experiment

Look at the SSR genotype of 40 e/e kids.

If about 1/4 have the SSR genotype in question, the SSR is probably unlinked.

If the fraction with that genotype is much less than 1/4, the SSR is probably linked.

We're going to figure out how to make the decision in advance, before we see the results.

[Figure: Binomial, N=40, p=0.25. Pr(x) plotted against x.]

Sheet1: p = 0.25, N = 40

| x | Pr(x) | Upper tail | Lower tail |
|---|---|---|---|
| 0 | 0.0000100566 | 0.0000100566 | 1 |
| 1 | 0.0001340878 | 0.0001441444 | 0.9999899434 |
| 2 | 0.0008715707 | 0.0010157151 | 0.9998558556 |
| 3 | 0.0036799652 | 0.0046956803 | 0.9989842849 |
| 4 | 0.0113465595 | 0.0160422398 | 0.9953043197 |
| 5 | 0.0272317428 | 0.0432739826 | 0.9839577602 |
| 6 | 0.0529506109 | 0.0962245935 | 0.9567260174 |
| 7 | 0.0857295605 | 0.181954154 | 0.9037754065 |
| 8 | 0.1178781457 | 0.2998322997 | 0.818045846 |
| 9 | 0.139707432 | 0.4395397317 | 0.7001677003 |
| 10 | 0.1443643464 | 0.583904078 | 0.5604602683 |
| 11 | 0.1312403149 | 0.7151443929 | 0.416095922 |
| 12 | 0.1057213648 | 0.8208657577 | 0.2848556071 |
| 13 | 0.0759025183 | 0.8967682759 | 0.1791342423 |
| 14 | 0.048794476 | 0.945562752 | 0.1032317241 |
| 15 | 0.0281923639 | 0.9737551159 | 0.054437248 |
| 16 | 0.0146835229 | 0.9884386388 | 0.0262448841 |
| 17 | 0.0069098931 | 0.9953485319 | 0.0115613612 |
| 18 | 0.0029431026 | 0.9982916345 | 0.0046514681 |
| 19 | 0.0011359343 | 0.9994275689 | 0.0017083655 |
| 20 | 0.000397577 | 0.9998251459 | 0.0005724311 |
| 21 | 0.0001262149 | 0.9999513608 | 0.0001748541 |
| 22 | 0.0000363346 | 0.9999876954 | 0.0000486392 |
| 23 | 0.0000094786 | 0.999997174 | 0.0000123046 |
| 24 | 0.000002238 | 0.999999412 | 0.000002826 |
| 25 | 0.0000004774 | 0.9999998895 | 0.000000588 |
| 26 | 0.0000000918 | 0.9999999813 | 0.0000001105 |
| 27 | 0.0000000159 | 0.9999999972 | 0.0000000187 |
| 28 | 0.0000000025 | 0.9999999996 | 0.0000000028 |
| 29 | 0.0000000003 | 1 | 0.0000000004 |
| 30 | 0 | 1 | 0 |
| 31 | 0 | 1 | 0 |
| 32 | 0 | 1 | 0 |
| 33 | 0 | 1 | 0 |
| 34 | 0 | 1 | 0 |
| 35 | 0 | 1 | 0 |
| 36 | 0 | 1 | 0 |
| 37 | 0 | 1 | 0 |
| 38 | 0 | 1 | 0 |
| 39 | 0 | 1 | 0 |
| 40 | 0 | 1 | 0 |
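The spreadsheet values for N = 40, p = 0.25 can be regenerated in a few lines. In this sketch the list names are illustrative: `cum_le` holds the running sum Pr(X ≤ x) and `cum_ge` holds Pr(X ≥ x), which correspond to the two cumulative columns of the sheet.

```python
from itertools import accumulate
from math import comb

N, p = 40, 0.25

# Pr(x) for every possible count x = 0..N under the binomial distribution.
pmf = [comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(N + 1)]

# Running sum from the low end: Pr(X <= x).
cum_le = list(accumulate(pmf))

# Sum from x upward: Pr(X >= x).
cum_ge = [sum(pmf[x:]) for x in range(N + 1)]

# Print the first rows in the same column order as the sheet.
for x in range(11):
    print(f"{x:2d}  {pmf[x]:.10f}  {cum_le[x]:.10f}  {cum_ge[x]:.10f}")
```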

[Figure: Tail probabilities. Binomial, N=40, p=0.25. Series: Upper tail, Lower tail, and Pr(x), plotted against x.]

• Is the SSR linked?

We want to know if the SSR is linked to the epilepsy gene.

What would your answer be if:

10/40 kids had the SSR genotype in question?

0/40 kids had it?

5/40 kids had it?

Need a way to set the cut-off.

• Type I errors

Suppose that in reality, the SSR and the epilepsy gene are unlinked.

Still, by chance, the number of kids with that genotype in our sample may be
• What's the probability of a type I error (α) if we cut off at 5?
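The type I error probability for a given cut-off is a lower-tail binomial sum under the null hypothesis. The sketch below assumes that "cut off at 5" means calling the SSR linked when 5 or fewer of the 40 kids have the genotype, and it uses the conventional 0.05 threshold as an assumption; neither detail is stated on the slide.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Pr(X <= k) for a binomial count with sample size n and probability p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

N, p_null = 40, 0.25   # 40 kids; 1/4 expected if the SSR is unlinked

# Assumed decision rule: declare linkage when the count is <= 5.
alpha_at_5 = binomial_cdf(5, N, p_null)

# Largest cut-off that keeps alpha at or below an assumed 0.05 threshold.
largest_cutoff = max(k for k in range(N + 1) if binomial_cdf(k, N, p_null) <= 0.05)

print(f"alpha at cut-off 5: {alpha_at_5:.4f}")
print(f"largest cut-off with alpha <= 0.05: {largest_cutoff}")
```

Raising the cut-off to 6 would push the type I error probability to roughly 0.096, which is why 5 is the largest cut-off under the assumed threshold.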

[Table: Sheet1, p = 0.25, N = 40. Columns include x0 and Pr(x = x0).]

• Probability of a type I error

[Figure: cumulative probability and Pr(x) plotted against x for the binomial distribution, N=40, p=0.25.]