prepare for stat170 exam


Basic assumptions about you

Many elementary concepts have been skipped. At this stage, it is assumed that you should know them well. In particular, you MUST know how to do HATPC for each of the 8½ hypothesis tests.

Only important things, or those that inter-connect several topics together, are elaborated here.

You have ABSOLUTELY NO hope of passing STAT170 if you do not know the 8½ HATPCs. This PP file will NOT push you from F to P.

The contents of this file will only help the P or above students, given the presumed basic knowledge.


Binding things together

Review of:

• 5 types of graphics

• 5 types of research questions

• 8½ statistical tests

• 8 or MORE types of reports


Displaying Data: 5 types of graphics

| DATA | (alone) | categorical | numerical |
|---|---|---|---|
| categorical | bar chart or pie chart | clustered bar chart | comparative box plots |
| numerical | histogram or stem-and-leaf plot | comparative box plots | scatter plot |

Displaying Data: 5 types of graphics

(The following table conveys the same information as the previous slide.)

| Combination of variable(s) | Lectures | Graphic |
|---|---|---|
| One categorical | Lecture 2, 11 | Bar chart, pie chart |
| One numerical | Lecture 2, 7 | Histogram, stem-&-leaf |
| Two categorical | Lecture 2, 11, 12 | Clustered bar chart |
| Two numerical | Lectures 2, 9 & 10 | Scatter plot |
| One categorical and one numerical | Lecture 2, 8 | Comparative box plots |


5 types of graphics

STAT170 is restricted to only 5 types of combinations of variables, 5 different types of graphics, and 5 possible research questions.

The most important step is correctly identifying the types of variables: NUMERICAL vs CATEGORICAL. Surprisingly, many students have difficulty in this very first step.

The correct/wrong identification of variables would lead you to the correct/wrong:

• Type of graphic

• Research question, and

• Statistical test.

How to comment on graphics:

1. Comments on a single bar chart (seldom asked)

Comment depends on whether variable is ordinal or nominal

• Ordinal: comment similar to histogram

• Nominal: comment on which categories have the highest and lowest frequencies

[Figure: bar chart of count (0 to 400) by diet: meat, vegetarian, vegan]

Example of a wrong comment on this chart: “Skewed to the right.” This doesn’t make any sense!

2. Comments on a single histogram (or stem-and-leaf plot)

1. Comment on shape (skewed left/right, normal)

2. Range from xxxx to xxxx

3. Majority (high frequencies) of data about xxxx

4. Comment on outliers (if present)

5. Comment on any unusual features (if present)

[Figure: histogram of Freq. (0 to 500) against Assessment (0 to 30)]

[Figure: histogram of Freq. (0 to 100) against Individual Days (0 to 12)]

Example:

• U-shaped, high frequencies near both ends, lowest frequencies near the centre

• U-shaped, but slightly skewed left

• Range from 0 to 12


3. Comments on comparative boxplots

• Compare medians

• Compare spread (IQR)

• Compare outliers

(Even when there are no outliers, say “no outliers”.)

[Figure: comparative box plots of Age (15 to 35) by Class: day, evening]

4. Comments on scatter plot

• Comment on linear/curved? Positive or negative slope?

• Comment on amount of scatter (big or small?)

• Comment on outliers, if any

• Comment on residuals

– Symmetric on both sides of the line / normal?

– Constant SD?

[Figures: three scatter plots: Birth Rate and Median Age; age marriage and husband age; UAI and GPA]

5. Comments on clustered bar charts

Compare the shapes of the clusters, NOT the sizes.

Shapes (not size) similar⇒ The 2 variables independent (ie have no association)

(since % are the same)

Shapes (not size) not similar⇒ The 2 variables not independent (ie have association)

(because % are not the same)


Comments on clustered bar charts: explanation

Never compare the actual frequencies (sizes).

Only compare % (or proportions) (shapes).

Since proportions are almost the same, ie about 1/3 and 2/3 for smokers and non-smokers,

smoking status is independent of Activity Level (no association)
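The "compare shapes, not sizes" rule can be checked numerically by turning each cluster into proportions. The sketch below is only an illustration: the counts and the helper name `row_proportions` are made up for this example and do not come from the lecture data.

```python
def row_proportions(counts):
    """Convert each cluster of frequencies into proportions that sum to 1."""
    return {
        group: [c / sum(row) for c in row]
        for group, row in counts.items()
    }


# Hypothetical frequencies (smokers, non-smokers) for each activity level.
# The clusters have very different sizes on purpose.
counts = {
    "low": [10, 20],
    "moderate": [30, 60],
    "high": [5, 10],
}

props = row_proportions(counts)
for group, p in props.items():
    # Every cluster shows about 1/3 smokers and 2/3 non-smokers.
    print(group, [round(x, 2) for x in p])
```

The sizes (30, 90 and 15 people) differ a lot, but the proportions are the same in every cluster, which is what "similar shape" means here.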


Comments on clustered bar charts: explanation

Never compare the actual frequencies (sizes).

Only compare % (or proportions) (shapes).

Since percentages of smokers and non-smokers are obviously different for males and females, there is an association between smoking status and gender.


Different shape, (although same size)

similar in shape (although different sizes)


The 8½ hypothesis tests in STAT170

| DATA | (alone) | categorical | numerical |
|---|---|---|---|
| categorical | bar chart: z-test of proportion or chi sq test of proportions | clustered bar chart: chi sq test of association + OR | comparative boxplots: 2-sample t test |
| numerical | histogram: 1-sample z or t test | comparative boxplots: 2-sample t test | scatter plot: t-test of β |

Note: 7 tests above + paired t-test + “OR” = 8½ tests in STAT170

Determining numerical vs. categorical

You only need to be able to identify between numerical and categorical.

No need to further classify into continuous or discrete(=integer), nor further classify into nominal or ordinal.

If you cannot distinguish between nominal and ordinal, you’ll only lose a few marks in Q.1.

But how about numerical vs categorical ? See next slide.


Example: Numerical vs Categorical

Age: age in years

⇒ Numeric (continuous)

⇒ Histogram / stem-leaf

=> z-test or t-test

Age: 0-12 children (1),

13-18 teenager (2),

> 18 adult (3),

⇒ Categorical (ordinal)

⇒ bar chart /pie chart

=> Chi sq test of proportions (GOF test)

A mistake will cost you at least 6 marks in HATPC, plus other marks in subsequent parts of the question.

The key is to look at the definition, not the meaning we use in daily language. Read the question! The results are unchanged if we use the names ABC or XYZ instead of AGE.


No one can help you …

How many such mistakes can you afford to make in exam? 3 such mistakes => you’ll fail in STAT170

You have absolutely no hope of passing STAT170 if you cannot distinguish between numerical and categorical variables – since the whole philosophy of STAT170 is based on classifying categorical and numerical variables. (This is unlike other 1st-year stat courses in other universities.)


Absolute bottom line:

1. HOW MANY variables?

2. Are the variables numerical or categorical?

Answering these 2 questions correctly will lead you to one of the 5 cases, and almost certainly to the correct test. The HATPC is then, hopefully, bookwork.

How students fail?

But many students already have trouble with the first question: how many variables are there?

For example,

How many variables are there? 3 or 1?

Think of the survey. How many questions? 3 or 1?

How many columns do you need to store the data? 3 or 1?

You are doomed if you choose “3 variables”. In fact there is no test in STAT170 that involves 3 variables.

[Figure: bar chart of frequency (0 to 400) by response to “Who do you find it easier to make friends with?”: same sex, opposite sex, either]

How students fail ?

Another example:

How many variables are there? 1, 2 or 4?

You are doomed if you choose “4 variables”.

| | Smoker | Non-smoker |
|---|---|---|
| Male | 4 | 11 |
| Female | 5 | 8 |

Getting a pass in STAT170

You need to be able to do ALL of the following:

1. Count how many variables

2. Identify the variables as numerical or categorical

3. Do ALL 8½ hypothesis tests

You will fail in STAT170 if you cannot do just ONE of them!

(In fact, if you can do ALL of them well, a Cr is guaranteed.)

How to determine the appropriate test

| Variable(s) | Graphics | Research question (e.g.) | Answering the research Q: formal stat test |
|---|---|---|---|
| One categorical | Bar chart, pie chart | Is the proportion of smokers equal to 0.3? Are the proportions of meat-eaters, vegetarians & vegans equal to 0.8, 0.15 & 0.05? | z-test of proportion (Lect 7), 2 categories only; χ2 test of proportions (GOF) (Lect 11), 2 or more categories |
| One numerical | Hist, stem-leaf, boxplot | Is the mean equal to …? | z and t-tests of mean (Lect 7) |
| Two categorical | Clustered bar chart | Is there an association between … and …? | Chi sq test of association (Lect 11, 12) or odds ratio |
| Two numerical | Scatter plot | Is there a relation between … and …? | Regression analysis: test of slope (Lect 9, 10) |
| One categ (binary) & one numeric | Comparative boxplots | Is there a diff in heights between males and females? | 2-sample t-test (Lect 8) |

Note:

1. There is the paired t-test, which doesn’t fit in any of the 5 cases above; perhaps it fits best in the 2nd case (one-sample t-test).

2. 7½ tests above + paired t-test = 8½ hypothesis tests in STAT170
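The 5 cases can be written down as a simple lookup. This is a minimal sketch for revision only; the names `TEST_GUIDE` and `suggest` are made up for this illustration, and the paired t-test is deliberately left out because it does not fit any of the 5 cases.

```python
# Key: the variable types, sorted alphabetically.
# Value: (graphic, formal statistical test).
TEST_GUIDE = {
    ("categorical",): (
        "bar chart or pie chart",
        "z-test of proportion or chi-sq test of proportions (GOF)",
    ),
    ("numerical",): (
        "histogram or stem-and-leaf plot",
        "1-sample z-test or t-test",
    ),
    ("categorical", "categorical"): (
        "clustered bar chart",
        "chi-sq test of association",
    ),
    ("numerical", "numerical"): (
        "scatter plot",
        "regression: t-test of slope",
    ),
    ("categorical", "numerical"): (
        "comparative box plots",
        "2-sample t-test",
    ),
}


def suggest(*variable_types):
    """Return (graphic, test) for one or two variables of the given types."""
    key = tuple(sorted(variable_types))
    if key not in TEST_GUIDE:
        # Three or more variables, or an unknown type, matches no case.
        raise ValueError(f"no case for {variable_types!r}")
    return TEST_GUIDE[key]


graphic, test = suggest("numerical", "categorical")
print(graphic, "=>", test)
```

Note that the lookup needs exactly the two answers from the bottom line: how many variables, and the type of each.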


Beware of the paired t-test

The paired t-test may be mistaken as:

• 2-sample t-test

• Regression

Read the given Research Question

If you see “relation” or “predict” => regression

If you see “difference” => 2-sample t or paired t. Then think!

Eg: Weight loss program? Y1=Wt before, Y2=Weight after

How to determine the appropriate test

Method 1

The ONLY SURE way to determine the correct test is to identify the variable types correctly!

Method 2

IF you cannot do (1), then you may look for keywords in the research questions. But be warned it is NOT 100% fool-proof.

• “association” => Chi-sq test of association

• “relation”, “predict” => Regression (with t-test on slope)

• “difference” => 2-sample t-test, or paired t-test

• “Proportion” (singular!), “percentage” => Z-test of proportion

• “Proportions” (plural), “percentages” => Chi-sq test of proportions(GoF)

• “mean”, “average” => One-sample z-test or t-test

See the underlined keywords in the previous slide.

NOT 100% fool-proof! Eg: Are proportions of smokers the same for males and females? => Chi-sq test of association

(Method 1 is 100% certain.)

How to determine the appropriate test (continued)

Method 3 (Easiest for you)

Look at the given graphic, then deduce the appropriate test. This is almost certain, but many questions do NOT show graphs!

• ONE histogram/stem-leaf => z-test or t-test or paired t

• Bar chart/pie chart => chi-sq test of proportions (GOF)

(if binary, GOF or z-test of proportion)

• Clustered bar chart => chi-sq test of association

• Scatter plot => regression: test of slope

• TWO histograms/stem-leafs and/OR comparative box plots => 2-sample t

3 types of statistical tests involving categorical data

| Statistical test | Keywords in Res. Q | Ho | Assumptions | Test statistic |
|---|---|---|---|---|
| z-test of proportion | Proportion, % | Ho: π = π0 | nπ0 ≥ 5, n(1-π0) ≥ 5 | $z = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$ |
| Chi sq goodness of fit (chi sq test of proportions) | Proportions, percentages (plural) | Ho: π1=…, π2=…, π3=… | $E_j = n\pi_j \ge 5$ | $\chi^2 = \sum_j \dfrac{(O_j - E_j)^2}{E_j}$, df = c-1 |
| Chi sq test of independence (no association) | Association, independent, proportions | X and Y are independent | $E_{ij} = \dfrac{\text{row tot} \times \text{col tot}}{\text{grand total}} \ge 5$ | $\chi^2 = \sum_{ij} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (r-1)(c-1) |
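As a worked illustration of the chi sq test of independence, the sketch below computes the expected counts and the test statistic for the Sex by Smoking Status table used elsewhere in these slides (Male 4/11, Female 5/8). The function name `chi_square_association` is made up for this example; the p-value is not computed here and would still be read from tables or computer output.

```python
def chi_square_association(table):
    """Return (expected counts, chi-square statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total, one value per cell.
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]
    # Sum (O - E)^2 / E over all cells.
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_sq, df


# Rows: Male, Female. Columns: Smoker, Non-smoker.
observed = [[4, 11], [5, 8]]
expected, chi_sq, df = chi_square_association(observed)
smallest_expected = min(min(row) for row in expected)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_sq, 3), df, round(smallest_expected, 2))
```

For this tiny table the smallest expected count is below 5, so the assumption listed for the test would not be satisfied; the numbers are only there to show the arithmetic.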

3 types of statistical tests involving categorical data (CONTINUED)

| Ho | 95% C.I. | Conclusion (NOT reject Ho): copy Ho + “could be” | Conclusion (reject Ho): opposite of Ho + “is higher/lower” |
|---|---|---|---|
| Ho: π = π0 | $p \pm 1.96\sqrt{p(1-p)/n}$ | Proportion π could be equal to π0 | Proportion π is higher/lower than π0. |
| Ho: π1=…, π2=…, π3=… | Read from computer output | The proportions π1=…, π2=…, π3=… COULD be correct. | The proportions π1=…, π2=…, π3=… are NOT correct. |
| X and Y are independent | ----------- | X and Y COULD be independent (not associated) | X and Y are dependent (associated) |

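5 types of statistical tests involving continuous data

| Statistical test | Keywords in Res. Q. | Ho | Assumptions | Test statistic |
|---|---|---|---|---|
| 1-sample z-test of mean | Mean, average | Ho: µ = µ0 (σ known) | Normal population, or n ≥ 25 (CLT) | $z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$ |
| 1-sample t-test of mean | Mean, average | Ho: µ = µ0 (σ unknown) | Normal population, or n ≥ 25 (CLT) | $t = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}$, df = n-1 |
| Paired t-test | difference | Ho: µd = µ0 | Differences from normal popn, or n ≥ 25 (CLT) | $t = \dfrac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}}$, df = n-1 |
| 2-sample t-test | difference | Ho: µ1 = µ2 | Both groups from normal popn, same SD | $t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, df = n1+n2-2 |
| Test of linear relation between 2 variables | Relation, predict | Ho: β = 0 | Linear; residuals normal; residuals const SD | $t = b/SE_b$, df = n-2 |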

5 types of statistical tests involving continuous data (CONTINUED)

| Ho | 95% CI | Conclusion (NOT reject Ho): copy Ho + “could be” | Conclusion (reject Ho): opposite of Ho + “is higher/lower” |
|---|---|---|---|
| Ho: µ = µ0 (σ known) | $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$ | Ave xxx COULD be equal to µ0 | Ave xxx is higher/lower than µ0 |
| Ho: µ = µ0 (σ unknown) | $\bar{y} \pm t_{n-1}\, s/\sqrt{n}$ | Ave xxx COULD be equal to µ0 | Ave xxx is higher/lower than µ0 |
| Ho: µd = µ0 (paired t) | $\bar{y}_d \pm t_{n-1}\, s_d/\sqrt{n}$ | The difference COULD be µ0 on ave | The difference is higher/lower than µ0 on ave |
| Ho: µ1 = µ2 (2-sample t) | $(\bar{y}_1 - \bar{y}_2) \pm t_{\nu}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ | There COULD be no difference between ave xxx and ave xxx | Ave xxx is higher/lower than ave xxx |
| Ho: β = 0 | $b \pm t_{n-2}\, SE_b$ | There COULD be no relation between X & Y | There is a positive/negative relation. |

In ALL hypothesis tests, include CI in the conclusion.

Examples of the 8 HATPCs?

It is assumed that you know them well at this stage. There are tons of examples of EACH in Lecture and Tutorial notes.

You have absolutely no hope of passing STAT170 if you cannot do the 8 HATPCs – since hypothesis tests, and related questions, span more than 60% of exam materials.

8 types of simple reports (reports that involve only 1 hypothesis test)

1. One sample t-test (See Tutorial 8)

2. One-sample z-test

3. Paired t-test

4. 2-sample t-test

5. Z-test of proportion

6. Regression

7. Chi-sq test of proportions

8. Chi-sq test of independence (See Lect 13)


Key points to write in the Simple report

(Check list) – 1-hypothesis-test only

Introduction

*What this study is about, and why this study – if known

*Research question – any wording is OK

*Target population

Method

*How the sample was collected (why random and representative)

*Define variables

*Statistical method used

*Null hypothesis

*Justify assumptions [put under Method or Result, depending on the type of test]


Results (NO HATPC; NO calculations)

*Test statistic

*P-val, decision (reject/not reject null)

Conclusion

*Decision in words: There is evidence/no evidence …

[Check that the research question is answered.]

*Your conclusion should be almost the same (several sentences) as the conclusion you have in the proper hypothesis test (HATPC), e.g. 95% CI if appropriate.

Note: It is most important that you identify the correct statistical method used (how???). For example, if it is a chi-sq test and you mention t-test, then the rest does not make sense, and you’ll lose most of the marks –and your time!

Complex Reports: Involve several hypothesis tests

Reports involving hypothesis tests of the same type:

• SIBT 2008B, 2009A – regressions

• MQC 2009A, 2009C, 2010B, 2010C – regressions

• SIBT 2009C, MQC 2010A – chi squares

• University 2007, Term 2 – 2-sample t

Reports involving hypothesis tests of different types:

• SIBT 2008C, 2009B – 2-sample t & chi squares

• MQC 2009B, 2011A, 2011B – regressions & 2-sample t

Note: No matter how complicated it may appear (many X’s), there should only be ONE Y. (Several Y’s would bring you to post-graduate level!)

Since so many (at least 5) cases are possible, it is stupid to copy a sample report (eg the one in Tute 8) onto your crib sheet, since there are

• 8½ possible simple reports

• at least 5 complex reports


1st Example: SIBT 2008B exam (report question)

(I do not have a copy of the exam paper.)

Given 6 regressions (6 tables and 6 scatter plots):

Y vs x1, y vs x2, … y vs x6

Research Question: Which variables X1, … X6 are significant predictors for Y, and which best predicts Y?

2nd Example: SIBT 2008C exam

Research question: Which variables X1, X2, X3 and X4 affect Y?

[Figures: four plots, one each for Y and X1, Y and X2, Y and X3, Y and X4]


1st General Rule for COMPLEX report

Discard the bad variables:

• those where assumptions are violated – not valid.

• those whose p-val > 0.05 (ie those where Ho is NOT rejected, because null hypothesis represents no effect)

(eg no difference in 2-sample t, no relation in regression, no association in chi-sq test)

| Variable | P-val | Significant variable? (Reject Ho?) | Result |
|---|---|---|---|
| X1 | 0.01 | Yes (Reject Ho) | Keep X1 |
| X2 | 0.08 | No (Not reject Ho) | Discard X2 |
| X3 | 0.02 | Yes (Reject Ho) | Keep X3 |

1st General rule for COMPLEX report

Warning: Common mistake:

• P-val<0.05 => reject Ho => WRONG: “reject the variable X”. CORRECT: keep X.

• P-val>0.05 => not reject Ho => WRONG: “not reject variable X”. CORRECT: discard X.

Golden rule: You may avoid mentioning Ho!

• p-val<0.05 => Keep X (Small prob (<5%), alarm bell rings)

• P-val>0.05 => Discard X

Warning: If you misunderstand the above, the conclusion of your report will be exactly opposite of what it should be, and you will lose MANY marks!


2nd General rule for complex report

Sometimes the question may ask for the BEST variable that determines Y. Choose the best one within each group. Do NOT compare the p-val of one type of graph with the p-val of another type of graph. (Compare an apple with an apple; compare an orange with an orange.)

• Regressions => choose best X

• 2-sample t’s => choose best X

• Chi squares => choose best X

What is the “best” X and how to choose it?

• In EACH set, choose the variable with the smallest p-val (ie the one that most strongly rejects Ho) – EXCEPT regression.

• For regression, choose the largest r², not the smallest p-val

Example of choosing/discarding variables

An example on regression to illustrate 1st general rule:

| Variable | Assumptions satisfied? | P-val | Significant variable? (Reject Ho?) | r² | Result |
|---|---|---|---|---|---|
| X1 | No | ----- | ----- | ----- | ----- |
| X2 | Yes | 0.006 | Yes | 0.53 | |
| X3 | Yes | 0.000 | Yes | 0.67 | Best |
| X4 | No | ----- | ----- | ----- | ----- |
| X5 | Yes | 0.07 | No (p-val>5%) | ----- | ----- |

(The r² column is needed for choosing the BEST variable(s).)

Hence only X2 and X3 are significant (important) variables affecting Y. And X3 is the best predictor for Y.
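The two general rules for regressions (discard, then pick the best by r²) can be traced step by step with the five variables from the example table. This is only a sketch; the dictionary layout and the name `significant` are made up for this illustration.

```python
# Values copied from the example table; None marks entries that are not
# needed because the variable is discarded earlier.
candidates = [
    {"name": "X1", "assumptions_ok": False, "p_value": None, "r2": None},
    {"name": "X2", "assumptions_ok": True, "p_value": 0.006, "r2": 0.53},
    {"name": "X3", "assumptions_ok": True, "p_value": 0.000, "r2": 0.67},
    {"name": "X4", "assumptions_ok": False, "p_value": None, "r2": None},
    {"name": "X5", "assumptions_ok": True, "p_value": 0.07, "r2": None},
]


def significant(variables, alpha=0.05):
    """Keep variables whose assumptions hold AND whose p-val is below alpha."""
    return [
        v for v in variables
        if v["assumptions_ok"] and v["p_value"] < alpha
    ]


kept = significant(candidates)
# For regressions the best predictor has the largest r2, not the smallest p-val.
best = max(kept, key=lambda v: v["r2"])

print([v["name"] for v in kept], "best:", best["name"])
```

Small p-val means KEEP the variable; the code never "rejects X" when Ho is rejected.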

2nd Example: SIBT 2008C exam

Research question: Which variables (X1, X2, X3 and X4) affect Y?

[Figures: four plots, one each for Y and X1, Y and X2, Y and X3, Y and X4]

Compare:

• Y vs WT : p-val = 0.00055

• Y vs STARTS: p-val = 0.0012

Both p-val< 0.05 => both Wt and Starts affect Y, but Wt has a stronger effect (because of smaller p-val).

[Figures: Y vs Wt; Y vs Starts]

Compare:

• Y vs WIN: p-val=0.5641

• Y vs PAYOUT: p-val=0.0000

Hence WIN has no effect on Y. Payout has an effect.

[Figures: Y vs WIN; Y vs Payout]

Key points to write in the COMPLEX report (No rigid rules!)

Introduction

*Research question

*Target population

Method

*How the sample was collected (why random and representative)

*Define the Y and X variable(s)

*List ALL statistical methods used

*Check assumptions [put under Method or Result, depending on the type of test] in EACH case. (But AVOID lengthy repetitive checking the assumptions one by one.)


Results (NO HATPC; NO calculations)

*Discard poor ones (assumptions violated, or p-val>0.05)

(AVOID lengthy repetitive checking p-val one by one.)

*IF required by the question, pick the best one within each group.

Conclusion

Answer the research question!

---------------------------------------------------------------

BTW, what is the research question like?

Two possibilities:

• Which of the variables X1, x2, …. affect variable Y?

• Which of the variables X1, x2, …. BEST affect variable Y?


Hints and Tips: normal tables

1. 2-tailed normal table vs. 1-tailed normal table:

• 1-tailed – probability calculations

• 2-tailed – hypothesis testing

Suggestions:

The FIRST thing you should do in exam, before you start writing anything, is (on the two z-tables):

(This applies to the HD students as well!)


Hints and Tips: t and tcrit

2. T statistic and the “tν” in C.I.

(This applies to ALL t tests: 1-sample t, 2-sample t, t in regression slope, paired t.)

• The t-statistic is calculated (not read from tables)

• The “tν” in 95% CI is read from table (row ν and column 0.05)

The SECOND thing you should do in exam, before you start writing anything, is (on the t-table):

(This applies to the HD students as well!)
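The difference between the calculated t-statistic and the tabulated “tν” can be seen in a small one-sample example. The sketch below is an illustration under assumptions: the ten marks and the null value 55 are made up, and `t_crit=2.262` is the usual table entry for ν = 9 in the 0.05 (2-tailed) column, passed in by hand exactly as it would be read from the t-table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0, t_crit):
    """Return (t statistic, 95% CI); t_crit is read from the t-table."""
    n = len(data)
    y_bar = mean(data)
    s = stdev(data)          # sample SD s (divides by n - 1)
    se = s / sqrt(n)
    t_stat = (y_bar - mu0) / se           # calculated, never looked up
    ci = (y_bar - t_crit * se, y_bar + t_crit * se)
    return t_stat, ci


# Hypothetical exam marks, n = 10, so df = 9.
marks = [52, 61, 58, 49, 66, 70, 55, 63, 59, 57]
t_stat, ci = one_sample_t(marks, mu0=55, t_crit=2.262)

print(round(t_stat, 3), tuple(round(x, 3) for x in ci))
```

Here the calculated t is smaller than the table value, and the CI contains 55, so both point to the same decision (do not reject Ho).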

Hints and Tips: chi sq table

3. You should only use the top few rows of chi sq.


Hints and Tips: y and y-bar

4. in probability calculations:

Look for the keyword “mean” or “average” => y-bar.

Note that there are NO such formulas:

$z = \dfrac{y - \mu}{\sigma/\sqrt{n}}$ vs. $z = \dfrac{\bar{y} - \mu}{\sigma}$
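The correct pairings are: an individual value y goes with σ, and a sample mean y-bar goes with σ/√n. A minimal sketch of both, assuming a normal population with µ = 100 and σ = 15 and a hypothetical sample size n = 9 (the function names are made up for this illustration):

```python
from math import sqrt
from statistics import NormalDist


def z_individual(y, mu, sigma):
    """z for ONE observation y: divide by sigma."""
    return (y - mu) / sigma


def z_sample_mean(y_bar, mu, sigma, n):
    """z for a sample MEAN y-bar: divide by sigma / sqrt(n)."""
    return (y_bar - mu) / (sigma / sqrt(n))


mu, sigma, n = 100, 15, 9

z_one = z_individual(110, mu, sigma)
z_mean = z_sample_mean(110, mu, sigma, n)

# Upper-tail (1-tailed) probabilities from the standard normal.
p_one = 1 - NormalDist().cdf(z_one)
p_mean = 1 - NormalDist().cdf(z_mean)

print(round(z_one, 3), round(p_one, 4))
print(round(z_mean, 3), round(p_mean, 4))
```

The same cut-off (110) gives a much smaller probability for the mean of 9 values than for a single value, which is why the keyword “mean” or “average” matters.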

Hints and Tips: 2-sample t and paired t

5. Paired-t test vs. 2-sample t-test

No rules!

1st clue:

Different n1 & n2 => CANNOT be paired t-test; must be 2-sample t-test

If n1=n2 => either test is possible.

5. Paired-t test vs. 2-sample t-test (continued)

2nd clue:

Ask yourself “Can I move the values of one variable without moving the corresponding values of the other variable?”

• Can move values of one variable => independent data => 2-sample t

• Cannot move values of one variable (need to move BOTH variables) => dependent data => paired-t

2nd clue (example):

From Lect 13: Age difference between husband and wife

| Baby ID | Mother’s age | Father’s age |
|---|---|---|
| 21 | 28 | 33 |
| 22 | 34 | 40 |
| 23 | 24 | 26 |
| 24 | 34 | 45 |
| 25 | 32 | 35 |
| 26 | 24 | 27 |
| 27 | 30 | 39 |
| 28 | 29 | 27 |
| 29 | 37 | 34 |
| 30 | 41 | 46 |

Can we swap the fathers’ ages “33” and “46” WITHOUT moving the wives and the babies?

• Move alone => independent => 2-sample t

• Move pairwise together => paired t
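Because each father belongs to one particular mother, the data are paired and the test works on the differences. The sketch below computes the paired t-statistic for the ten couples listed above, assuming the null hypothesis is Ho: µd = 0 (that choice of null value is an assumption for this illustration).

```python
from math import sqrt
from statistics import mean, stdev

# Ages copied from the table (Baby ID 21 to 30), kept in the same order
# so that each father stays with the matching mother.
mother = [28, 34, 24, 34, 32, 24, 30, 29, 37, 41]
father = [33, 40, 26, 45, 35, 27, 39, 27, 34, 46]

# One difference per couple: father's age minus mother's age.
diffs = [f - m for f, m in zip(father, mother)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                      # SD of the differences
t_stat = (d_bar - 0) / (s_d / sqrt(n))  # Ho: mu_d = 0 (assumed)
df = n - 1

print(diffs)
print(round(d_bar, 2), round(s_d, 3), round(t_stat, 3), df)
```

Note that s_d is the SD of the ten differences, not a difference of two SDs.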


Hints and Tips: z and t tests

6. Z-test vs. t-test

• Know population standard deviation σ => z-test

• Do NOT know σ => t-test

Clues:

* “It is known that SD=xxx” => likely σ => z-test

* Given numerical summary of data (MUST be sample):

| n | mean | StDev |
|---|---|---|
| xxxx | xxxx | xxxx |

The SD from a data set (sample) MUST be s, never σ => t-test

* Do watch out if both σ and s are given. Once we have σ, s is useless => use z-test.

Hints and Tips: CLT

7. This is a common mistake: “When sample size is large (n≥25), the sample is approximately normally distributed.”

The statement means that if we make a histogram of the sample (n≥25), then the histogram should be approximately bell-shaped. This is NOT CLT; it is WRONG!

We know that as sample size n becomes larger and larger, the sample histogram looks more and more like the population, which could be anything.

The correct statement of CLT is: “When sample size is large (n≥25), the sample mean (y-bar) is approximately normally distributed.” This applies to the one-sample z or t test, the 2-sample t-test and the paired t-test.
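The difference between “the sample” and “the sample mean” can be seen in a small simulation. This is a sketch under assumptions: the population is taken to be exponential with mean 1 (strongly right-skewed), the seed and the number of repeats are arbitrary, and `skewness` is a helper written for this illustration.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Simple skewness measure: average cubed z-score (0 for symmetric data)."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


random.seed(170)
n = 25          # size of each sample
repeats = 4000  # number of samples drawn

# One big sample of raw observations: it looks like the population (skewed).
one_big_sample = [random.expovariate(1.0) for _ in range(repeats)]

# Many sample means, each from a sample of size n: these are what CLT is about.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

print("skewness of raw data    :", round(skewness(one_big_sample), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
print("SD of sample means      :", round(pstdev(sample_means), 3))
```

The raw data stay skewed no matter how many observations are taken, while the sample means are much closer to symmetric, with SD close to σ/√n = 1/√25 = 0.2.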

Tips and Hints: Which condition?

nπ≥5 and n(1-π)≥5, or np≥5 and n(1-p)≥5?

8. Lect 5 (prob calculation on p) or Lect 7 (z-test on π):

$z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

Check nπ≥5 and n(1-π)≥5

Lect 6: CI for π:

$p \pm 1.96\sqrt{\dfrac{p(1-p)}{n}}$

Check np≥5 and n(1-p)≥5

Rule: p goes with p, π goes with π; p NEVER goes together with π.

Note that although the above 2 formulas are in the formula sheet, the 2 corresponding conditions are not. You have to know which one is the correct condition for checking.
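The rule “p goes with p, π goes with π” can be built directly into the two calculations. The sketch below is an illustration with made-up numbers (n = 100, p = 0.36, hypothesised proportion 0.30); the function names are hypothetical.

```python
from math import sqrt


def z_test_proportion(p, pi0, n):
    """z statistic for Ho: pi = pi0. The condition is checked with pi0."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5")
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


def ci_proportion(p, n, z_crit=1.96):
    """95% CI for pi. The condition is checked with the sample proportion p."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need n*p >= 5 and n*(1 - p) >= 5")
    half_width = z_crit * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


n, p, pi0 = 100, 0.36, 0.30   # hypothetical survey result and null value

z = z_test_proportion(p, pi0, n)
low, high = ci_proportion(p, n)

print(round(z, 3), (round(low, 3), round(high, 3)))
```

The test uses π0 under the square root and in its condition; the CI uses p in both places.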

Tips and Hints: pth percentile (including LQ, UQ)

9. Find pth percentile

(a) Given ANY sample of size n, use the formula:

n*p/100 (Lect 2)

Then check whether the result is an integer or a non-integer, etc.

Eg AGE: 12, 17, 28, 32, 33, 40, 40, 67 (MUST be sorted first!)
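The slide does not spell out what to do after computing n*p/100, so the sketch below ASSUMES one common textbook convention: if the result is not an integer, round it up and take that ordered value; if it is an integer k, average the k-th and (k+1)-th ordered values. Check this against the Lect 2 notes before relying on it; the function name `percentile` is made up for this illustration.

```python
from math import ceil


def percentile(values, p):
    """pth percentile of a sample, using the ASSUMED rule described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)            # MUST be sorted first
    n = len(data)
    position = n * p / 100
    if position != int(position):
        # Non-integer: round up and take that ordered value.
        return data[ceil(position) - 1]
    # Integer k: average the k-th and (k+1)-th ordered values.
    k = int(position)
    return (data[k - 1] + data[k]) / 2


age = [12, 17, 28, 32, 33, 40, 40, 67]

lq = percentile(age, 25)       # 8*25/100 = 2, an integer
median = percentile(age, 50)   # 8*50/100 = 4, an integer
uq = percentile(age, 75)       # 8*75/100 = 6, an integer
p10 = percentile(age, 10)      # 8*10/100 = 0.8, a non-integer

print(lq, median, uq, p10)
```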

pth percentile

(b) Given population (of infinite size) of known (given) µ and σ:

(i) Given normal:

Find z from the given area p (1-tailed)

Then find y = µ+σ*z

Eg: “It is known that IQ is normally distributed with mean 100 and SD 15. What is the 10th percentile?” What is the LQ?

[Figure: normal curve centred at 100, with µ = 100 and σ = 15]

(ii) non-normal (or unknown distribution)

CANNOT do it!
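For the IQ example, the two steps (find z from the area, then y = µ+σ*z) can be checked with Python's standard library. In the exam z comes from the 1-tailed normal table; here `NormalDist().inv_cdf` plays the role of the table lookup.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # IQ: normal with mean 100 and SD 15

# Step 1: find z for the given lower-tail area.
z_10 = NormalDist().inv_cdf(0.10)   # 10th percentile
z_lq = NormalDist().inv_cdf(0.25)   # LQ = 25th percentile

# Step 2: convert z back to the original scale.
y_10 = mu + sigma * z_10
y_lq = mu + sigma * z_lq

print(round(z_10, 2), round(y_10, 1))
print(round(z_lq, 2), round(y_lq, 1))
```

Both z values are negative because both percentiles lie below the mean.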

Hints and Tips: Association

10. Common wrong statements:

• “No association/association between males and females.”

• “No association/association between smokers and non-smokers.”

(In fact, males, females, smokers and non-smokers are NOT variables.)

It should be: “Could be no association/There is association between Sex and Smoking Status.”

| | Smoker | Non-smoker |
|---|---|---|
| Male | 4 | 11 |
| Female | 5 | 8 |

Hints and Tips: Writing conclusion when Ho is NOT rejected

11. Many versions, hence students are confused.

Eg in 2-sample t-test:

(1) There could be no difference …; (there is no strong evidence to indicate otherwise.)

(2) There is probably no difference …

(3) There is no significant difference …

(4) There is no evidence to indicate a difference …

All of the above are correct!

Please stick to (1), which is easiest! (3) and (4) are double negatives, with which you may easily make mistakes, (3) being terrible. Keep things simple!

Note that in (2) or (3), if you miss out ‘probably’ or ‘significant’, then “There is no difference …” is wrong (accepting the null hypothesis).


Try the chi sq test of association:

Ho: There is NO association between X and Y

Suppose we do NOT reject Ho.

Conclusion:

(1) There could be no association …; (there is no strong evidence to indicate otherwise.)

(2) There is probably no association …

(3) There is no significant association …

(4) There is no evidence of an association …

Again all of them are correct.


Writing conclusion in HATPC: the rules:

Eg 2-sample t: “Ho: There is no difference in exam marks on average for boys and girls.”

Eg chi sq test of association: “Ho: There is NO association between X and Y”

Eg regression: “Ho: β=0” (No relation between X and Y)

P-val<0.05 => Reject Ho

• Negate (make opposite) Ho

• Be certain, use the verb “is”

• Also give further info: “is greater/less than”, “is longer/shorter” (eg one-sample or 2-sample t), except chi sq

P-val>0.05 => Do not reject Ho

• Copy Ho

• Change the verb “is” to “could be”.
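The two rules are mechanical enough to write as a tiny function. This is only a sketch to show the logic; the function name `write_conclusion` and the wording of the two sentences are made up for this illustration, and a real conclusion should also include the CI.

```python
def write_conclusion(p_value, null_statement, rejected_statement, alpha=0.05):
    """Pick the wording of the conclusion from the p-value."""
    if p_value < alpha:
        # Reject Ho: state the opposite of Ho, with certainty ("is").
        return rejected_statement
    # Do not reject Ho: copy Ho and change the first "is" to "could be".
    return null_statement.replace(" is ", " could be ", 1)


ho = "There is no difference in average exam marks between boys and girls."
opposite = (
    "There is a difference in average exam marks between boys and girls, "
    "with girls having the higher average."
)

print(write_conclusion(0.20, ho, opposite))
print(write_conclusion(0.01, ho, opposite))
```

Notice that the function never returns “There is no difference …” on its own, which would be accepting the null hypothesis.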

Writing conclusion in HATPC: Example 1

Eg 2-sample t: “Ho: There is no difference in exam marks on average between boys and girls.”

P-val<0.05 => Reject Ho

“There is a difference in exam marks between boys and girls, with girls having a higher average than boys.”

P-val>0.05 => Do not reject Ho

“There could be no difference in average exam marks between boys and girls.”

Writing conclusion in HATPC: Example 2

Eg chi sq test of association: “Ho: There is NO association between sex and smoking status”

P-val<0.05 => Reject Ho

“There is association between sex and smoking status.”

P-val>0.05 => Do not reject Ho

“There could be no association between sex and smoking status.”

Hints and Tips: Symbols – their writings and meanings

12. Last, but not least, MANY students have lots of problems here. Surprisingly, it is not much more difficult than Primary 1 !!!

(a) Confusion of symbols of similar meanings:

1st yr Uni, STAT170:

• p = sample proportion

• π = population proportion

• µ and y-bar

• s and σ

A confusion between p and π, µ and y-bar, or s and σ will cost you dearly in the exam.

Primary 1:

• This is my book; this is your book; this is Mary’s book.

• My book, your book and Mary’s book are not the same.

• You will be in big trouble if you regard Mary’s money as the same as yours.

Hints and Tips: Symbols – their writings and meanings

(b) Confusion of look-alike symbols:

1st yr Uni, STAT170:

• µ and u

• σ and θ

• β and B

• ω and w

• Σ and E

Primary 1:

• i, j

• g, p, q

• a, o, e, c

• d, b

• h, k

• m, n

• l, 1

• u, v

• z, 2

Which is more difficult? Surprisingly, students find the symbols in STAT170 more difficult than the 26 English letters in Primary 1. If you have problems in the left column (STAT170), you will be in big trouble. You will NOT lose “just a few marks”, but many!

Predicting the future

The following happened in past semesters without exceptions, and WILL likewise occur in the future in this semester (prob=0.99999):

1. Someone will write u instead of µ.

2. Someone will copy a sample report (from past exam papers or Tute 8) onto the crib (pink) sheet.

3. Someone will leave the whole page blank on the hypothesis test on slope in regression, which is the easiest HATPC.

4. Someone will not know the meaning of r².

5. Someone will write “There is an association between males and females”. This makes no sense at all.

Predicting the future

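6. Someone will write

$z = \dfrac{y - \mu}{\sigma/\sqrt{n}}$ and $z = \dfrac{\bar{y} - \mu}{\sigma}$

7. Someone will use the “formula” for 2-sample t or paired-t

$t = \dfrac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}} = \dfrac{(\bar{y}_1 - \bar{y}_2) - 0}{(s_1 - s_2)/\sqrt{n}}$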

Ask yourself …

“How many hours did I spend on STAT170 each week, on average?”

Macquarie University recommends (minimum):

3 credit points * 3 hours

= 9 hours

= 4 hours in class + 5 hours on your own at home

Every WEEK.

Profile of students who fail – Failure check list

The following are common characteristics of those who fail:

• Low class attendance

• Did not study on a weekly basis

• No/few attempts of online quizzes

• #Can do at most one hypothesis test in exam

• #Cannot do t-test on regression

• #Cannot count how many variables

• #Cannot distinguish between categorical and numerical

• Do not know parameters vs statistics

• Do not know the symbols µ, σ, π, β and ω

• Mix up p and π, y-bar and µ, s and σ

Failure check list (continued!)

• Did not do the exercises on the tutorial sheets

• Gave up assignment(s)

• Do not know how to use calculator to find SD

• Low marks in Practical Test

• Copy past exam solutions, word by word, onto crib sheet

• Copy report(s), word by word, onto crib sheet

• Do not read past exam papers

How many ticks do you have in the above list? ____

Unfortunately, even just ONE tick, eg “Can do just one hypothesis test”, can (and will) make a failure!

Note: # = fatal
