statistics topic list descriptive vs. inferential statistics concepts of data, variables, scales...

24
Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of Central Tendency Measures of Variability Shapes of Distributions z-scores Correlation 1

Upload: opal-gray

Post on 18-Jan-2016

232 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Statistics Topic List

Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of Central Tendency Measures of Variability Shapes of Distributions z-scores Correlation

1

Page 2: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Two Main Areas of Statistics:Descriptive vs. Inferential

Descriptive Statistics is used to organize, consolidate or summarize data we have in front of us. Typically in descriptive statistics we describe:

a set of data elements by graphically displaying the information; or its central tendencies and how it is distributed in relation to this

center; or the relationship between two data elements.

Inferential Statistics is a leap into the unknown. We use samples (a selected portion of the data set) to draw inferences about populations (the complete set of data elements).

2

Page 3: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Variables

A good place to begin is with the concept of “variables”. Our students “vary” with regard to many characteristics related to aptitude and achievement. We can think of these variable characteristics using three levels of generality.

3

Page 4: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Making and Reading Frequency Tables Part 1:Frequency Distributions - with special focus on bins (also known as intervals, categories and class intervals)

Purpose of Creating these Tables – To organize data in ways to make our inspection of those data much more manageable.

Frequency Distribution We construct or read a table of counts per score. BUT, when we have many scores, we create intervals (I like

the term “bins”) and place the individual scores in the bins. When making bins:

Determine your score range Determine an appropriate number of bins. Rule of thumb: no fewer than

5 or more than 20 class intervals work best for a frequency table. Make sure no overlap exists so that no data fall into more than one bin. Count each score in its one and only appropriate bin. Notice that in the resulting table, individual scores are lost.

4

Page 5: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Making and Reading Frequency Tables Part 2:Cumulative Distributions

Cumulative frequency distribution: A distribution that indicates cumulative frequency counts (cum f) in each bin, and/or percentage of the total number of cases at and below the upper limit of the associated bin. Sometimes this is referred to simply as cumulative distribution or cumulative frequency.

Note: Educators are using the description statistics of cumulative distributions when speaking of students’ relative standing. Percentile: The point on the original measurement scale at

and below which a specified percentage of scores falls. Also called a percentile point.

Percentile rank: The percentile rank of a score is the point on the percentile scale that gives the percentage of scores falling at and below a student’s specified score.

5

Page 6: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Frequency Distribution Table

6

Page 7: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Tables are Nice, but Pictures are Nicer

Frequency distributions are often converted into graphic form. Bar Graph – Individual counts. The count bins are

separated on the horizontal line. Histogram – Grouped counts. The bins touch each other on

the horizontal line. Pie Graph – Either individual or grouped counts. The media

likes to display data using these graphs.

Explore the CSERD (Computational Science Education Reference Desk) Interactive Website. This is a Pathways project of the National Science Digital Library and funded by the National Science Foundation.

7

Page 8: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Ideas of Data “Centers”; How Does Data Cluster?. . . . starting with a concept from Garrison Keillor.

Keillor’s hometown is Lake Wobegon, located near the geographic center of Minnesota.

Keillor reports that in Lake Wobegon "all the women are strong, all the men are good looking, and all the children are above average."

8

Page 9: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Central Tendency

While graphs and charts are useful to visually represent data, they are inconvenient; they are difficult to display and can not be easily remembered apart from the visual. It is frequently useful to reduce data to a number (sometimes called an index number) that is easy to remember, is easy to communicate, yet captures the essence of the complete data set it represents.

One such index is called Measures of Central Tendency (i.e., how do the raw data tend to cluster) Mean – the arithmetical average Median – the middle score Mode – the most occurring score

So, these are measures of “center” regarding the data, but we are also concerned about how the raw data are spread out around the center.

9

Page 10: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Consider the two graphs below. These graphs represent the scores on two quizzes. The mean score for each quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different. Specifically, the scores on Quiz 1 (top graph) are more densely packed while those on Quiz 2 (bottom graph) are more spread out. The differences among students was much greater on Quiz 2 than on Quiz 1.

10

Page 11: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Variability

Our second index is called Measures of Variability (i.e., how do the raw data tend to spread out or scatter) Range – list the lowest and highest scores, then

take the difference (aka subtract) between them Standard Deviation (S, SD, σ) – this is an

interesting concept; it is akin to finding the average distance that scores are from the center

Variance (SD2) – mathematically the standard deviation squared; we more often use the standard deviation in educational assessment.

11

Page 12: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

12

Page 13: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Shape of Normal Distributions

The frequency histograms for test score data often approximate what is called the “normal distribution” (aka bell curve, normal curve).

The normal curve has three characteristics: unimodal – one hump asymptotic – tails never touch the base symmetrical – mirror image about the center

axis

13

Page 14: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Normal Curve

14

Page 15: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Shape of Other Distributions

Kurtosis – platykurtic looks more flat leptokurtic looks more peaked

Skewness – positive skew means that the tail is to the right negative skew means that the tail is to the left.

-------------------------------------------------------------------- Back to the normal distribution, let’s look at

transforming a data score to a score that will tell us where that score is in relationship to the mean. This score is called a “z-score”.

15

Page 16: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

z-scores

Formula: z = X - M

SD Definition: A measure of how many standard

deviations a raw score is from the mean. If the z score is negative, we say the score is

below the mean If the z score is positive, we say the score is

above the mean

16

Page 17: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

z-scores in normal curve

This Graph Leads In To Percentile Rank

17

Page 18: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Comparing Two Variables

So far we have only dealt with one variable (aka

univariate statistics). Sometimes (I would say

many times) we are curious as to the

relationship between two variables (aka

bivariate statistics). We call this curiosity an

interest in co-relationships or correlation.

18

Page 19: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Some History . . .

Francis Galton (1822-1911)and “Co-relations”

19

Cousin of Charles Darwin Interested in the mathematical treatment of heredity Used statistical analysis to study human variation

noted that arranging measures of a physical trait in a population (height, e.g.) displays a bell-shaped distribution

Coined term "eugenics"—science of improving the stock

variations (deviations) viewed as flaws as well as assets artificial and natural selection will shift median of distribution

Page 20: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

The Eugenics Movement

Scientific “evidence” was used to argue that social ills like feeble-mindedness, alcoholism, pauperism and criminal behavior are hereditary traits.

Aim - "to give the more suitable races or strains of blood a better chance of prevailing speedily over the less suitable"

Can no longer rely on natural selection: unfit survive to childbearing years due to

advances in medicine comforts of civilization social welfare

unfit reproduce at higher rate than fit, Must design society by controlling human reproduction:

encourage fit to have children prohibit unfit from having children

20

Page 21: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Scattergram – Can you “eye ball” the one line you could

draw through the data points that best describes the graphic display?

.

21

Page 22: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Correlation Coefficient – the calculated number that

best describes the relationship between two variables

Correlation coefficient – symbol is “r” – linear relationships Range -1.00 through .00 to +1.00

Sign indicates direction + indicates that as one variable increases, the other variable increases - indicates that as one variable increases, the other variable decreases

Number indicates strength Although the following table is somewhat arbitrary, the following thinking might be useful in interpretation:

-1.0 to -0.7 strong converse association. -0.7 to -0.3 weak converse association. -0.3 to +0.3 little or no association. +0.3 to +0.7 weak direct association. +0.7 to +1.0 strong direct association.

22

Page 23: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Important Notes about “r”:

Not a percentage (decimal makes it look like one)Linear assumption, not curvilinearEqual scatter assumption – no bunchingVariability affects “r”

Greater the variability, greater the “r”Less the variability, lower the “r”

“r” does not imply causation

23

Page 24: Statistics Topic List Descriptive vs. Inferential Statistics Concepts of Data, Variables, Scales Frequency Tables Bar Graphs and Histograms Measures of

Depth Chart

During your YSU field work, you will be asked to organize data through the creation of frequency tables or histograms. Thus, we discussed constructing them as well as understanding them.

Throughout your professional practice, you will be asked to utilize measures of central tendency and variability. Thus, we emphasized understanding them, basic computations, and their relationship to z-scores. These concepts are key to understanding standard scores.

In professional publications you will see correlation coefficients. We discussed (and you were asked to compute) correlation. Correlation is a key tool in exploring our next topic – reliability (and later, validity) .

Hopefully you will see value in computing measures based on your own classroom data. It is actually fun to learn to do these basic descriptive stats with a software package. Commonly used packages include SPSS, SAS, Minitab, and SYSTAT. Any system would be OK. Start simple.

24