Prepare for STAT170 Exam

Posted on 21-Apr-2015

<p>Basic assumptions about you</p> <p>Many elementary concepts have been skipped; at this stage, it is assumed that you know them well. In particular, you MUST know how to do the HATPC for each of the 8 hypothesis tests. Only important points, or those that connect several topics, are elaborated here. You have ABSOLUTELY NO hope of passing STAT170 if you do not know the 8 HATPCs. This PowerPoint file will NOT push you from F to P; its contents will only help students at P or above, given the presumed basic knowledge.</p> <p>Binding things together</p> <p>Review of: 5 types of graphics; 5 types of research questions; 8 statistical tests; 8 or MORE types of reports.</p> <p>Displaying Data: 5 types of graphics</p> <p>[Flowchart: DATA branches into categorical and numerical variables, leading to one graphic for each combination of variables.]</p> <p>(The following table conveys the same information as the flowchart.)</p> <table> <tr><th>Combination of variable(s)</th><th>Graphic</th></tr> <tr><td>One categorical (Lecture 2, 11)</td><td>Bar chart or pie chart</td></tr> <tr><td>One numerical (Lecture 2, 7)</td><td>Histogram or stem-and-leaf plot</td></tr> <tr><td>Two categorical (Lecture 2, 11, 12)</td><td>Clustered bar chart</td></tr> <tr><td>Two numerical (Lectures 2, 9 &amp; 10)</td><td>Scatter plot</td></tr> <tr><td>One categorical and one numerical (Lecture 2, 8)</td><td>Comparative box plots</td></tr> </table> <p>5 types of graphics</p> <p>STAT170 is restricted to only 5 types of combinations of variables, 5 types of graphics, and 5 possible research questions. The most important step is correctly identifying the types of variables: NUMERICAL vs CATEGORICAL. Surprisingly, many students have difficulty with this very first step. A correct (or wrong) identification of the variables leads you to the correct (or wrong) type of graphic, research question, and statistical test.</p> <p>How to comment on graphics: 1.
Comments on a single bar chart (seldom asked)</p> <p>The comment depends on whether the variable is ordinal or nominal. Ordinal: comment as for a histogram. Nominal: comment on which categories have the highest and lowest frequencies.</p> <p>[Figure: bar chart of frequency (0 to 400) by diet: meat, vegetarian, vegan.]</p> <p>"Skewed to the right." This doesn't make any sense! (Diet is nominal: the order of the bars is arbitrary, so the chart has no shape to describe.)</p> <p>2. Comments on a single histogram (or stem-and-leaf plot)</p> <p>[Figure: histogram of Assessment; frequency (0 to 500) against values from 0 to 30.]</p> <p>1. Comment on the shape (skewed left/right, normal). 2. Range: from xxxx to xxxx. 3. Majority (high frequencies) of the data: about xxxx. 4. Comment on outliers (if present). 5. Comment on any unusual features (if present).</p> <p>Example: [Figure: histogram of Individual Days; frequency (0 to 100) against days from 0 to 12.] U-shaped, with high frequencies near both ends and the lowest frequencies near the centre; U-shaped, but slightly skewed left. Range from 0 to 12.</p> <p>3. Comments on comparative boxplots: compare medians; compare spread (IQR); compare outliers (even when there are no outliers, say "no outliers").</p> <p>[Figure: comparative boxplots of UAI and of Age by Class (day, evening).]</p> <p>4. Comments on a scatter plot: Linear or curved? Positive or negative slope? Amount of scatter (big or small)? Outliers, if any? Residuals: symmetric on both sides of the line (normal)? Constant SD?</p> <p>[Figures: scatter plots of Birth Rate vs Median Age, of age at marriage vs husband age, and of a response vs GPA.]</p> <p>5.
Comments on clustered bar charts</p> <p>Compare the shapes of the clusters, NOT the sizes. Shapes (not sizes) similar: the 2 variables are independent (i.e. have no association), since the percentages are the same. Shapes (not sizes) not similar: the 2 variables are not independent (i.e. have an association), because the percentages are not the same.</p> <p>Comments on clustered bar charts: explanation (similar in shape, although different sizes)</p> <p>Never compare the actual frequencies (sizes); only compare percentages or proportions (shapes). Since the proportions are almost the same, i.e. about 1/3 smokers and 2/3 non-smokers in each Activity Level group, smoking status is independent of Activity Level (no association).</p> <p>Comments on clustered bar charts: explanation (different shape, although same size)</p> <p>Never compare the actual frequencies (sizes); only compare percentages or proportions (shapes). Since the percentages of smokers and non-smokers are obviously different for males and females, there is an association between smoking status and gender.</p> <p>The 8 hypothesis tests in STAT170 (flowchart starting from DATA): two categorical variables =&gt; clustered bar chart =&gt; chi-sq test of association + odds ratio (OR).</p> <p>Determining numerical vs. categorical</p> <p>You only need to be able to distinguish between numerical and categorical. There is no need to classify further into continuous or discrete (= integer), nor into nominal or ordinal. If you cannot distinguish between nominal and ordinal, you'll only lose a few marks in Q.1. But what about numerical vs categorical?
See next slide.</p> <p>(Flowchart, continued.) One categorical variable =&gt; bar chart =&gt; z-test of proportion or chi-sq test of proportions. One numerical variable =&gt; histogram =&gt; 1-sample z- or t-test. One categorical and one numerical variable =&gt; comparative boxplots =&gt; 2-sample t-test. Two numerical variables =&gt; scatter plot =&gt; t-test of slope (regression).</p> <p>Note: 7 tests above + paired t-test = 8 tests in STAT170 (plus the odds ratio, OR).</p> <p>Example: Numerical vs Categorical</p> <p>Age: age in years. Numerical (continuous) =&gt; histogram / stem-and-leaf =&gt; z-test or t-test.</p> <p>Age: 0-12 children (1), 13-18 teenager (2), &gt; 18 adult (3). Categorical (ordinal) =&gt; bar chart / pie chart =&gt; chi-sq test of proportions (GOF test).</p> <p>A mistake will cost you at least 6 marks in the HATPC, plus other marks in subsequent parts of the question. The key is to look at the definition, not the meaning we use in daily language. Read the question! The results are unchanged if we use the names ABC or XYZ instead of AGE.</p> <p>No one can help you. How many such mistakes can you afford to make in the exam? 3 such mistakes =&gt; you'll fail STAT170. You have absolutely no hope of passing STAT170 if you cannot distinguish between numerical and categorical variables, since the whole philosophy of STAT170 is based on classifying variables as categorical or numerical. (This is unlike other 1st-year statistics courses at other universities.)</p> <p>Absolute bottom line: 1. HOW MANY variables? 2. Are the variables numerical or categorical? Answering these 2 questions correctly will lead you to one of the 5 cases, and almost to the correct test. The HATPC is then, hopefully, bookwork.</p> <p>How students fail</p> <p>Many students already have trouble with the first question: how to determine how many variables there are.</p> <p>[Figure: bar chart of frequency (0 to 400) for the survey question "Who do you find it easier to make friends with?"]</p> <p>For example, how many variables are there?
3 or 1?</p> <p>[The bars show the three responses: same sex, opposite sex, either.]</p> <p>Think of the survey. How many questions? 3 or 1? How many columns do you need to store the data? 3 or 1? You are doomed if you choose 3 variables: there is 1 categorical variable (the response) with 3 categories. In fact, there is no test in STAT170 that involves 3 variables.</p> <p>How students fail</p> <table> <tr><th></th><th>Male</th><th>Female</th></tr> <tr><td>Smoker</td><td>4</td><td>5</td></tr> <tr><td>Non-smoker</td><td>11</td><td>8</td></tr> </table> <p>Another example: how many variables are there? 1, 2 or 4? You are doomed if you choose 4 variables: there are 2 categorical variables (gender and smoking status), and the 4 numbers are cell counts.</p> <p>Getting a pass in STAT170</p> <p>You need to be able to do ALL of the following: 1. Count how many variables there are. 2. Identify each variable as numerical or categorical. 3. Do ALL 8 hypothesis tests. You will fail STAT170 if you cannot do even ONE of them! (In fact, if you can do ALL of them well, a Cr is guaranteed.)</p>
<p>How to determine the appropriate test</p> <table> <tr><th>Variable(s)</th><th>Graphics</th><th>Research question (e.g.)</th><th>Answering the research Q: formal stat test</th></tr> <tr><td>One categorical</td><td>Bar chart, pie chart</td><td>Is the proportion of smokers equal to 0.3? Are the proportions of meat-eaters, vegetarians &amp; vegans equal to 0.8, 0.15 &amp; 0.05?</td><td>z-test of proportion (Lect 7): 2 categories only; \(\chi^2\) test of proportions (GOF) (Lect 11): 2 or more categories</td></tr> <tr><td>One numerical</td><td>Histogram, stem-and-leaf, boxplot</td><td>Is the mean equal to …?</td><td>z- and t-tests of mean (Lect 7)</td></tr> <tr><td>Two categorical</td><td>Clustered bar chart</td><td>Is there an association between … and …?</td><td>Chi-sq test of association (Lect 11, 12) or odds ratio</td></tr> <tr><td>Two numerical</td><td>Scatter plot</td><td>Is there a relation between … and …?</td><td>Regression analysis: test of slope (Lect 9, 10)</td></tr> <tr><td>One categorical (binary) &amp; one numerical</td><td>Comparative boxplots</td><td>Is there a difference in heights between males and females?</td><td>2-sample t-test (Lect 8)</td></tr> </table> <p>Note: 1. The paired t-test doesn't fit any of the 5 cases above; perhaps it fits best in the 2nd case (one-sample t-test). 2. 7 tests above + paired t-test = 8 hypothesis tests in STAT170.</p>
<p>Beware of the paired t-test</p> <p>The paired t-test may be mistaken for the 2-sample t-test or for regression. Read the given research question: if you see "relation" or "predict" =&gt; regression; if you see "difference" =&gt; 2-sample t or paired t. Then think! E.g. a weight-loss program: Y1 = weight before, Y2 = weight after, measured on the same people, so the test is paired.</p>
<p>How to determine the appropriate test</p> <p>Method 1 (100% certain): The ONLY SURE way to determine the correct test is to identify the variable types correctly!</p> <p>Method 2: IF you cannot do (1), you may look for keywords in the research question. But be warned: it is NOT 100% fool-proof. Association =&gt; chi-sq test of association; relation, predict =&gt; regression (with t-test on slope); difference =&gt; 2-sample t-test or paired t-test; proportion (singular!), percentage =&gt; z-test of proportion; proportions (plural), percentages =&gt; chi-sq test of proportions (GoF); mean, average =&gt; one-sample z-test or t-test. See the keywords in the research questions of the previous table. NOT 100% fool-proof! E.g.: Are the proportions of smokers the same for males and females? =&gt; chi-sq test of association (not the chi-sq test of proportions, despite the plural "proportions").</p> <p>How to determine the appropriate test (continued)</p> <p>Method 3 (easiest for you): look at the given graphic, then deduce the appropriate test. This is almost certain, but many questions do NOT show graphs! ONE histogram/stem-and-leaf =&gt; z-test, t-test or paired t-test; bar chart/pie chart =&gt; chi-sq test of proportions (GOF) (if binary, GOF or z-test of proportion); clustered bar chart =&gt; chi-sq test of association; scatter plot =&gt; regression: test of slope; TWO histograms/stem-and-leaf plots and/or comparative boxplots =&gt; 2-sample t-test.</p>
<p>3 types of statistical tests involving categorical data</p> <table> <tr><th>Statistical test</th><th>Keywords in Res. Q</th><th>Ho</th><th>Assumptions</th><th>Test statistic</th><th>95% C.I.</th><th>Conclusion (NOT reject Ho): copy Ho + "could be"</th><th>Conclusion (reject Ho): opposite of Ho + "is higher/lower"</th></tr> <tr><td>z-test of proportion</td><td>Proportion, %</td><td>\(H_0: \pi = \pi_0\)</td><td>\(n\pi_0 \ge 5,\ n(1-\pi_0) \ge 5\)</td><td>\(z = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}\)</td><td>\(p \pm 1.96\sqrt{\dfrac{p(1-p)}{n}}\)</td><td>Proportion could be equal to \(\pi_0\)</td><td>Proportion is higher/lower than \(\pi_0\)</td></tr> <tr><td>Chi-sq goodness of fit (chi-sq test of proportions)</td><td>Proportions, percentages (plural)</td><td>\(H_0: \pi_1 = \ldots, \pi_2 = \ldots, \pi_3 = \ldots\)</td><td>\(E_j = n\pi_j \ge 5\)</td><td>\(\chi^2 = \sum_j \dfrac{(O_j - E_j)^2}{E_j}\), \(df = c - 1\)</td><td>n/a</td><td>The proportions \(\pi_1 = \ldots, \pi_2 = \ldots, \pi_3 = \ldots\) COULD be correct</td><td>The proportions \(\pi_1 = \ldots, \pi_2 = \ldots, \pi_3 = \ldots\) are NOT correct</td></tr> <tr><td>Chi-sq test of independence</td><td>Association, independent, proportions</td><td>\(H_0\): X and Y are independent (no association)</td><td>\(E_{ij} = \dfrac{\text{row total} \times \text{col total}}{\text{grand total}} \ge 5\)</td><td>\(\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}\), \(df = (r-1)(c-1)\) (or read from computer output)</td><td>n/a</td><td>X and Y COULD be independent (not associated)</td><td>X and Y are dependent (associated)</td></tr> </table>
<p>5 types of statistical tests involving continuous data</p> <table> <tr><th>Statistical test</th><th>Keywords in Res. Q.</th><th>Ho</th><th>Assumptions</th><th>Test statistic</th><th>95% CI</th><th>Conclusion (NOT reject Ho): copy Ho + "could be"</th><th>Conclusion (reject Ho): opposite of Ho + "is higher/lower"</th></tr> <tr><td>1-sample z-test of mean</td><td>Mean, average</td><td>\(H_0: \mu = \mu_0\) (\(\sigma\) known)</td><td>Normal population, or \(n \ge 25\) (CLT)</td><td>\(z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}\)</td><td>\(\bar{y} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}\)</td><td>Ave xxx COULD be equal to \(\mu_0\)</td><td>Ave xxx is higher/lower than \(\mu_0\)</td></tr> <tr><td>1-sample t-test of mean</td><td>Mean, average</td><td>\(H_0: \mu = \mu_0\) (\(\sigma\) unknown)</td><td>Normal population, or \(n \ge 25\) (CLT)</td><td>\(t = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}\), \(df = n - 1\)</td><td>\(\bar{y} \pm t_{n-1}\,\dfrac{s}{\sqrt{n}}\)</td><td>Ave xxx COULD be equal to \(\mu_0\)</td><td>Ave xxx is higher/lower than \(\mu_0\)</td></tr> <tr><td>Paired t-test</td><td>Difference</td><td>\(H_0: \mu_d = 0\)</td><td>Differences from a normal population, or \(n \ge 25\) (CLT)</td><td>\(t = \dfrac{\bar{y}_d}{s_d/\sqrt{n}}\), \(df = n - 1\)</td><td>\(\bar{y}_d \pm t_{n-1}\,\dfrac{s_d}{\sqrt{n}}\)</td><td>The difference COULD be 0 on average</td><td>The difference is higher/lower than 0 on average</td></tr> <tr><td>2-sample t-test</td><td>Difference</td><td>\(H_0: \mu_1 = \mu_2\)</td><td>Both groups from normal populations with the same SD</td><td>\(t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\), \(df = n_1 + n_2 - 2\)</td><td>\((\bar{y}_1 - \bar{y}_2) \pm t_{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\)</td><td>There COULD be no difference between ave xxx and ave xxx</td><td>Ave xxx is higher/lower than ave xxx</td></tr> <tr><td>Test of linear relation between 2 variables (regression)</td><td>Relation, predict</td><td>\(H_0: \beta = 0\)</td><td>Linear relation; residuals normal; residuals with constant SD</td><td>\(t = b/SE_b\), \(df = n - 2\)</td><td>\(b \pm t_{n-2}\, SE_b\)</td><td>There COULD be no relation between X &amp; Y</td><td>There is a positive/negative relation</td></tr> </table> <p>In ALL hypothesis tests, include the CI in the conclusion.</p>
<p>Examples of the 8 HATPCs?</p> <p>It is assumed that you know them well at this stage. There are tons of examples of EACH in the Lecture and Tutorial notes. You have absolutely no hope of passing STAT170 if you cannot do the 8 HATPCs, since hypothesis tests and related questions make up more than 60% of the exam material. 1. 2. 3. 4. 5. 6. 7.
8.</p> <p>8 types of Simple Reports (involving only 1 hypothesis test): One-sample t-test (see Tutorial 8); One-sample z-test; Paired t-test; 2-sample t-test; Z-test of proportion; Regression; Chi-sq test of proportions; Chi-sq test of independence (see Lect 13).</p> <p>Key points to write in a Simple Report (checklist, 1 hypothesis test only)</p> <p>Introduction: *What the study is about, and why it was done (if known). *Research question (any wording is OK). *Target population.</p> <p>Method: *How the sample was collected (why it is random and representative). *Define the variables. *Statistical method used. *Null hypothesis. *Justify assumptions [put under Method or Results, depending on the type of test].</p> <p>Results (NO HATPC; NO calculations): *Test statistic. *P-value and decision (reject/do not reject the null).</p> <p>Conclusion: *Decision in words: there is evidence / no evidence. [Check that the research question is answered.] *Your conclusion should be almost the same (several sentences) as the conclusion in the proper hypothesis test (HATPC), e.g. include the 95% CI if appropriate.</p> <p>Note: It is most important that you identify the correct statistical method used (how?). For example, if it is a chi-sq test and you mention a t-test, the rest does not make sense, and you'll lose most of the marks and your time!</p> <p>Complex Reports: involve several hypothesis tests</p> <p>Reports involving hypothesis tests of the same type: SIBT 2008B, 2009A (regressions); MQC 2009A, 2009C, 2010B, 2010C (regressions); SIBT 2009C, MQC 2010A (chi-squares); University 2007, Term 2 (2-sample t).</p> <p>Reports involving hypothesis tests of different types: SIBT 2008C, 2009B (2-sample t &amp; chi-squares); MQC 2009B, 2011A, 2011B (regressions &amp; 2-sample t).</p> <p>Note: No matter how complicated it may appear (many Xs), there should only be ONE Y. (Several Ys would bring you to post-graduat...</p>
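<p>The "Absolute bottom line" and "How to determine the appropriate test" slides reduce test selection to two questions: how many variables are there, and of which types. The Python sketch below encodes that lookup. It assumes the variables have already been classified correctly. The function choose_test and its flags (n_categories, sigma_known, paired) are hypothetical names. The paired flag stands in for the "read the research question" step that separates the paired t-test from regression.</p>

```python
from collections import Counter


def choose_test(var_types, n_categories=None, sigma_known=False, paired=False):
    """Suggest a STAT170 test from the variable types (hypothetical helper).

    var_types    -- one entry per variable: "numerical" or "categorical"
    n_categories -- number of categories of a single categorical variable
    sigma_known  -- population SD known (single numerical variable)
    paired       -- two numerical measurements on the same units, and the
                    research question asks about a difference
    """
    counts = Counter(var_types)
    if set(counts) - {"numerical", "categorical"}:
        raise ValueError("each variable must be 'numerical' or 'categorical'")
    if not 1 <= len(var_types) <= 2:
        # "There is no test in STAT170 that involves 3 variables."
        raise ValueError("STAT170 tests involve 1 or 2 variables")

    n_num, n_cat = counts["numerical"], counts["categorical"]
    if n_cat == 1 and n_num == 0:
        if n_categories == 2:
            return "z-test of proportion (or chi-sq test of proportions)"
        return "chi-sq test of proportions (GOF)"
    if n_num == 1 and n_cat == 0:
        return "1-sample z-test of mean" if sigma_known else "1-sample t-test of mean"
    if n_cat == 2:
        return "chi-sq test of association (+ odds ratio)"
    if n_num == 2:
        return "paired t-test" if paired else "regression: t-test of slope"
    return "2-sample t-test"  # one (binary) categorical + one numerical
```

<p>For the friendship survey, choose_test(["categorical"], n_categories=3) returns the chi-sq test of proportions (GOF). For the smoker-by-gender table, choose_test(["categorical", "categorical"]) returns the chi-sq test of association. The lookup is only as reliable as the classification fed into it, which is exactly why Method 1 comes first.</p>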
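<p>The formulas in the "3 types of statistical tests involving categorical data" table can be checked numerically. This is a minimal Python sketch using only the standard library, and the function names are hypothetical. The z-test P-value is two-sided. For the chi-sq tests, the sketch returns only the statistic, the df and the expected counts, because the standard library has no chi-sq distribution; compare the statistic with a chi-sq table or with computer output to get the P-value.</p>

```python
import math


def z_test_proportion(successes, n, pi0):
    """z-test of proportion, Ho: pi = pi0. Returns (z, two-sided P-value, 95% CI)."""
    # Assumptions from the table: n*pi0 >= 5 and n*(1 - pi0) >= 5
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5")
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)  # SE under Ho uses pi0
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail area
    half = 1.96 * math.sqrt(p * (1 - p) / n)         # 95% CI uses the sample p
    return z, p_value, (p - half, p + half)


def chi_sq_gof(observed, proportions):
    """Chi-sq test of proportions (GOF). Returns (chi2, df, expected counts)."""
    n = sum(observed)
    expected = [n * pi for pi in proportions]        # E_j = n * pi_j (check each >= 5)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected         # df = c - 1


def chi_sq_association(table):
    """Chi-sq test of association for an r x c table. Returns (chi2, df, expected)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    # E_ij = row total * column total / grand total (check each >= 5)
    expected = [[r * c / grand for c in col_tot] for r in row_tot]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    df = (len(table) - 1) * (len(table[0]) - 1)      # df = (r - 1)(c - 1)
    return chi2, df, expected


# Smoker / Non-smoker (rows) by Male / Female (columns), from the slides
chi2, df, expected = chi_sq_association([[4, 5], [11, 8]])
print(round(chi2, 3), df, round(min(min(r) for r in expected), 2))  # 0.444 1 4.18
```

<p>Two of the four expected counts (about 4.82 and 4.18) are below 5. The \(E_{ij} \ge 5\) assumption therefore fails for this small table, so the example only illustrates the arithmetic. For the diet question, you would call chi_sq_gof(observed, [0.8, 0.15, 0.05]) with your own observed counts.</p>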
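<p>The test statistics in the "5 types of statistical tests involving continuous data" table can be sketched in the same way with Python's statistics module. This is a minimal sketch with hypothetical function names. It returns only the test statistic and df, because the standard library has no t distribution; look up the P-value and the t multiplier for the CI in a t table or in computer output. The paired t-test is written as a 1-sample t-test on the differences, which is why the slides suggest it fits best with the one-sample case.</p>

```python
import math
from statistics import mean, stdev


def one_sample_t(y, mu0):
    """1-sample t-test of mean: t = (ybar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(y)
    t = (mean(y) - mu0) / (stdev(y) / math.sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: 1-sample t-test on d = after - before, Ho: mu_d = 0."""
    d = [a - b for a, b in zip(after, before)]
    return one_sample_t(d, 0.0)


def two_sample_t(y1, y2):
    """Pooled 2-sample t-test (same SD assumed), df = n1 + n2 - 2."""
    n1, n2 = len(y1), len(y2)
    sp = math.sqrt(((n1 - 1) * stdev(y1) ** 2 + (n2 - 1) * stdev(y2) ** 2)
                   / (n1 + n2 - 2))
    t = (mean(y1) - mean(y2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def slope_t(x, y):
    """Regression test of slope, Ho: beta = 0. Returns (b, SE_b, t, df)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                                    # least-squares slope
    a = ybar - b * xbar                              # least-squares intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # residual SD / sqrt(Sxx)
    return b, se_b, b / se_b, n - 2


# Hypothetical weights (made-up numbers) for a weight-loss program
t, df = paired_t(before=[80, 90, 100], after=[78, 87, 96])
print(round(t, 3), df)  # -5.196 2
```

<p>With these hypothetical weights, the paired t statistic is about -5.196 on 2 df. Running two_sample_t on the same numbers instead would ignore the pairing, which is the mistake the "Beware of the paired t-test" slide warns about.</p>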