hudm4122 probability and statistical inference january 26, 2015

Post on 17-Dec-2015

HUDM4122 Probability and Statistical Inference

January 26, 2015

ASSISTments

• Did everyone get an account for the ASSISTments system?

• Did anyone have difficulties setting up an account?

• First homework is due in a week

Today

• Ch. 1 in Mendenhall, Beaver, & Beaver

• Variables and Variable Types
• Graphing Data
• Basic Exploratory Data Analysis

Variables

• What is a variable?

• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” – MBB p. 8

Which of these are examples of variables?

• GPA
• Shoe size
• Age
• Number of correct answers in ASSISTments
• Number of times gamed the system in ASSISTments
• Favorite vegetable
• Favorite type of pie
• Pi

What is a measurement?

• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas

A measurement

• Person furthest towards my left in the front row, what is your name?

Now I have a measurement

A measurement

• Person furthest towards my right in the second row, what is your name?

Now I have data

• A set of measurements

• Note that in stats class or education journals, the word “data” is plural

• I only know one exception

Everyone repeat after me

• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”
• “His data were hard to collect.”

However…

• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner

Any questions or concerns?

Univariate Data

• A single variable is collected

Height
5’11”
5’11”
5’10”
5’6”

Bivariate Data

• Two variables are collected (for the same data point)

Height    Drum-Playing Skill
5’11”     1
5’11”     2
5’10”     4
5’6”      8

Multivariate Data

• 3+ variables are collected

Name               Height    Drum-Playing Skill
John Lennon        5’11”     1
Paul McCartney     5’11”     2
George Harrison    5’10”     4
Ringo Starr        5’6”      8
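
One way to see the difference between univariate, bivariate, and multivariate data is to hold the table as a list of records in Python. This is only a sketch: the field names (`name`, `height`, `drum_skill`) are assumptions made for illustration, while the values are the ones shown on the slide.

```python
# Each record is one experimental unit (a person); each key is a variable.
# Field names are illustrative assumptions; values come from the slide's table.
beatles = [
    {"name": "John Lennon", "height": "5’11”", "drum_skill": 1},
    {"name": "Paul McCartney", "height": "5’11”", "drum_skill": 2},
    {"name": "George Harrison", "height": "5’10”", "drum_skill": 4},
    {"name": "Ringo Starr", "height": "5’6”", "drum_skill": 8},
]

# Univariate: a single variable per unit.
heights = [row["height"] for row in beatles]

# Bivariate: two variables per unit, kept paired for the same data point.
height_and_skill = [(row["height"], row["drum_skill"]) for row in beatles]

# Multivariate: three or more variables per unit.
n_variables = len(beatles[0])

print(heights)
print(height_and_skill)
print(n_variables)
```

Each list entry is one set of measurements on one person, so dropping a key from every record moves the data from multivariate to bivariate to univariate.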

Any questions or concerns?

Types of Variables

Quantitative/Numerical Data

• Data that can be expressed as numbers

What are some examples

• Of numerical data?

Ordinal Data

• Refers to data where there is a known order, but either
– The data clearly aren’t numbers
– The space between values is not guaranteed to be equal

Examples of Ordinal Data

• Months of the year: January, February, March, April, …

• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree

• Quality of university: Highly selective, selective, somewhat selective, non-selective
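
A minimal Python sketch of what "known order, unequal spacing" means in practice, using the agreement-level example above. The `responses` list is hypothetical, and the helper name `rank` is an assumption made for this illustration.

```python
# Ordinal values have a known order but no guaranteed equal spacing.
# The ordering is the agreement scale from the example, lowest to highest.
AGREEMENT_ORDER = [
    "Strongly Disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly Agree",
]


def rank(response):
    """Return the position of a response in the ordering (0 = lowest)."""
    return AGREEMENT_ORDER.index(response)


# Hypothetical survey responses, for illustration only.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree"]

# Sorting and comparing are meaningful for ordinal data.
sorted_responses = sorted(responses, key=rank)
agree_beats_neutral = rank("Agree") > rank("Neutral")

# Arithmetic on the ranks is not: the gap from "Neutral" to "Agree" need not
# equal the gap from "Agree" to "Strongly Agree".
print(sorted_responses)
print(agree_beats_neutral)
```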

Other examples of ordinal data?

Nominal data

• Values have no order or spacing

• Name
• State of Residence
– New Jersey is not greater or less than New York
– Although my brother might disagree

Other Examples of Nominal Data?

Another name

• Nominal data are often also called categorical data

• Technically ordinal data are also categorical, but no one ever uses the term that way

Any questions or concerns?

Exploratory Data Analysis

• “Analyzing data sets to summarize their main characteristics”

• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”

Goal

• Generate hypotheses
• Understand your data better

Often (but not always) done with graphs

Which of these is your favorite type of graph?

• Pie chart
• Bar graph
• Frequency histogram
• Line graph
• Scatterplot
• Stem-and-leaf plot
• Box plot
• Other

Pie Chart

• Take a set of categories that add to 100%
• Show the proportion each category has
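
The arithmetic behind a pie chart can be sketched in a few lines of Python: each slice is a category's count divided by the total. The vote counts below are hypothetical, since the slides do not give the actual numbers.

```python
# Hypothetical vote counts; the slides do not give the real numbers.
votes = {"Pumpkin": 6, "Apple": 8, "Cherry": 3, "Rhubarb": 1, "Banana Cream": 2}

total = sum(votes.values())

# Each slice is the category's share of the whole, as a percentage.
shares = {pie: 100 * count / total for pie, count in votes.items()}

for pie, pct in shares.items():
    print(f"{pie}: {pct:.0f}%")
```

Because every share is computed against the same total, the slices add to 100%, which is the condition a pie chart needs.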

Pie Chart: Example

[Pie chart: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Interpret This Graph Please

[Pie chart: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Never Ever Do This: Completely Visually Misleading

Fair use; critique

Let’s make a pie chart

• Using the “your favorite graph” data

Any questions?

Alternative: Bar Graphs

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30.]

Interpret this graph please

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30.]

What are the advantages/disadvantages relative to pie chart?

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30.]

By the way: X and Y axes

[Bar graph: “What is everyone's favorite pie?” The horizontal axis (Pumpkin, Apple, Cherry, Rhubarb, Banana Cream) is labeled “X axis”; the vertical axis (0 to 30) is labeled “Y axis”.]

Strengths of bar graphs

• Categories don’t have to add to 100%
• Easier to see small differences between categories
• You can compare variables too

Two-group bar graph

[Grouped bar graph: “School Rankings”. X axis: Football Team, Chess Team, Spiderman Team. Y axis: Quality (Higher is Better), 0 to 60. Groups: Midtown High, Harlem Success Academy.]

Let’s make a bar graph

• Using the “your favorite graph” data

Any questions?

Some suggest always using bar graphs instead of pie charts

• “The only thing worse than a pie chart is several of them.” – Edward Tufte

• “Save the pies for dessert.” – Stephen Few

But they’re wrong

• Pie charts are good for representing part-whole relationships in really easy to see ways

• Pie charts are good at representing overall proportions

Nice example (Gabrielle, 2013)

Any questions?

Frequency Histogram

• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars

• X axis shows values or ranges of a quantitative variable

• Y axis shows how many data points have that value or range for the quantitative variable
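
As a rough sketch of how the X and Y axes of a frequency histogram are built, the Python below groups grades into five-point ranges and counts how many fall in each. The grades are hypothetical (not the data behind the slides), and the helper name `bin_label` and the choice of ranges starting at 1, 6, 11, and so on are assumptions for illustration.

```python
from collections import Counter

# Hypothetical exam grades, not the data behind the slides.
grades = [58, 62, 67, 71, 73, 74, 78, 79, 80, 82, 85, 88, 91, 96]


def bin_label(grade, width=5):
    """Map a grade to a range label such as '71-75' (ranges start at 1, 6, 11, ...)."""
    low = ((grade - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# X axis: the range labels. Y axis: how many data points fall in each range.
counts = Counter(bin_label(g) for g in grades)

# Print the ranges in numeric order, one '#' per data point.
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} {'#' * counts[label]}")
```

Ranges with no data points are simply absent from `counts`; a drawn histogram would show them as empty positions along the X axis.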

Example from the book

Visits to Starbucks

Another Example

[Frequency histogram. X axis: Exam Grade, in ranges 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100. Y axis: Frequency, 0 to 18.]

Was this an easy exam or a hard exam?

[Frequency histogram. X axis: Exam Grade, in ranges 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100. Y axis: Frequency, 0 to 18.]

Would you rather be in the blue class or the orange class?

[Two frequency histograms, one for the blue class and one for the orange class. X axis: Exam Grade (labeled ranges 51-55, 61-65, 71-75, 81-85, 91-95). Y axis: Frequency, 0 to 18.]

By the way: outliers

[Frequency histogram. X axis: Exam Grade, in ranges 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100. Y axis: Frequency, 0 to 18. One bar is marked “OUTLIER”.]

If there’s time, let’s make a frequency histogram

• Everybody: What’s your height in feet-inches?

• (Example: I’m 5’9”)
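
Heights written in feet-inches need to be turned into a single number before they can be grouped into ranges for a histogram. A minimal sketch in Python, assuming the heights arrive as text like 5’9” and that total inches is the unit wanted; the function name `height_to_inches` is hypothetical.

```python
import re


def height_to_inches(text):
    """Convert a height like 5'9" (straight or curly quotes) to total inches."""
    match = re.fullmatch(r"\s*(\d+)\s*['’]\s*(\d+)\s*[\"”]?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized height: {text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches


# The instructor's example height from the slide.
print(height_to_inches("5’9”"))
```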

Any questions?

Line Graph

• Shows trends from left-to-right
• The trend is usually over time
• But it doesn’t have to be…

Example Line Graph

http://www.wilderdom.com/personality/L4-1IntelligenceNatureVsNurture.html
Used under Creative Commons License

Example Line Graph (VanLehn, 2011)

(This graph shows perceptions, not data on effectiveness.)

Any questions?

Not going to discuss today

• Stem-and-leaf plot

• Very, very rare to see in actual use
• Quite poor for any sizable data set

• If you want to learn about them, see the book

Future Classes

• Scatterplot
• Box plot

Upcoming Classes

• 1/28 Describing Data with Numerical Measures
– Ch. 2

• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3

• 2/4 Introduction to Probability
– Ch. 4

Questions? Comments?
