frequency distribution statistics
![Page 1: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/1.jpg)
FREQUENCY DISTRIBUTIONS
How to organize, present and analyze data
Content of 60s Pop Songs
Chart categories: Yeah, Actual Lyrics, Baby, Oooh
![Page 2: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/2.jpg)
2
Consider the following example:
How old is John?
How old is Mary?
How old is Frank?
…
How old am I?
FREQUENCY DISTRIBUTIONS
![Page 3: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/3.jpg)
3
A sample of 40 values gives the ages of EHL students (in whole years, so the variable is discrete).
40Ages manual
Count the number of times each age appears in the sample and mark it on the given diagram.
EXAMPLE: DISCRETE VARIABLE
![Page 4: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/4.jpg)
4
ABSOLUTE FREQUENCY DISTRIBUTION
Here the y-values represent the frequency in absolute values
![Page 5: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/5.jpg)
5
RELATIVE FREQUENCY DISTRIBUTION
Here the y-values represent the frequency in percentage
2/40 = 5%
4/40 = 10%
3/40 = 7.5%
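The step from absolute to relative frequencies can be sketched in Python. This is a minimal illustration, not the course's Excel workflow: the function name `frequency_table` and the eight ages below are assumptions made up for the example, since the 40-value sample itself is not reproduced in the slides.

```python
from collections import Counter


def frequency_table(values):
    """Return {value: (absolute frequency, relative frequency)}, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in sorted(counts.items())}


# Hypothetical sample of 8 ages (NOT the 40-student data set from the slides).
ages = [19, 20, 20, 21, 21, 21, 22, 23]

for age, (absolute, relative) in frequency_table(ages).items():
    # Relative frequency = absolute frequency / sample size, shown as a percentage.
    print(f"{age}: n = {absolute}, f = {relative:.1%}")
```

With a sample of 40 values, an age that appears twice would get a relative frequency of 2/40 = 5%, as on the slide.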
![Page 6: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/6.jpg)
6
THE MOST FREQUENT VALUE: THE MODE
The MODE is found with the Excel function MODE(ranges). Result: 21 years
There are eight 21-year-old students in this sample. This is the LARGEST frequency, so 21 is the MODE
The set of these eight 21-year-old students is called the MODAL CLASS
![Page 7: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/7.jpg)
7
SPECIAL CASE
This frequency distribution has two (nearly equal) peaks: Bi-modal distribution
![Page 8: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/8.jpg)
8
The median divides the data into two EQUAL parts:
50% of the data values are BELOW the MEDIAN value
50% of the data values are ABOVE the MEDIAN value
Excel function: MEDIAN(ranges)
THE MEDIAN VALUE: A “DEMOCRATIC” VALUE
![Page 9: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/9.jpg)
9
POSITION OF THE MEDIAN
The MEDIAN value is 21.5 years (found with Excel).
Notice that there are 20 students younger and 20 students older than the MEDIAN
![Page 10: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/10.jpg)
10
WHAT IS THE MEDIAN?
Median: the central data point of a data set after sorting.
If the data set has an odd number of values, it is literally the data value in the center of the sorted data set.
If the data set has an even number of values, it is the average of the two values closest to the center of the sorted data set.
Example: annual precipitation in Geneva between 1976 and 1993 (mm)
583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939
After sorting:
524 528 583 619 688 730 756 777 850 875 884 890 901 926 939 958 969 1258
To find the position of the median: $(n + 1) / 2$
Here: $(18 + 1) / 2 = 9.5$, the 9.5th value out of 18 (center of the data set). The median is therefore the average of the 9th and 10th sorted values: $(850 + 875) / 2 = 862.5$ mm.
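The Geneva precipitation example can be reproduced with a short Python sketch. The helper name `median_by_position` is an assumption introduced for illustration; it applies the (n + 1) / 2 position rule to the sorted data, and the result can be cross-checked against `statistics.median` from the standard library.

```python
import statistics

# Annual precipitation in Geneva, 1976-1993 (mm), as listed on the slide.
precipitation = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                 730, 688, 528, 901, 884, 969, 1258, 850, 939]


def median_by_position(values):
    """Return (position, median) using the (n + 1) / 2 rule on sorted data."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) / 2                 # 1-based position, ends in .5 when n is even
    lower = data[int(position) - 1]        # value at floor(position)
    upper = data[int(position + 0.5) - 1]  # value at ceil(position)
    return position, (lower + upper) / 2


position, median = median_by_position(precipitation)
print(position, median)
print(statistics.median(precipitation))    # cross-check with the standard library
```

When n is odd, the position is a whole number and both lookups return the same central value.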
![Page 11: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/11.jpg)
11
THE AVERAGE (AVG) VALUE: A “BALANCED” MEASURE
Symbol: $\bar{x}$
Formula: $\bar{x} = \frac{\sum x_i}{n}$
$x_i$: the values of the variable
$\sum$: SUM
$\sum x_i$: the SUM of ALL the given values
$n$ = number of values
Excel function: AVERAGE(ranges)
NB: In many textbooks the average is called the “mean”. This gives the honest average a poor image, so it is not used in this course.
![Page 12: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/12.jpg)
12
POSITION OF THE AVG
The AVG value is 21.65 (found with Excel).
This point on the Age axis can be considered the CENTROID of this distribution, hence the idea of a “balanced” value.
![Page 13: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/13.jpg)
13
You surveyed 10 different families to see how many children they have. You obtained the following observations: 0, 0, 1, 1, 2, 2, 2, 3, 4, 5
Indicate whether each statement is true or false.
• The mode is 5
• The average is 2.5
• The median is 2
• The variable is quantitative
• The variable is quantitative continuous
QUICK QUIZ
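Statements about mode, median and average like the ones in this quiz can be checked by computing the three measures directly. The sketch below uses Python's standard `statistics` module on the ten observations from the survey; the variable names are assumptions chosen for the example.

```python
import statistics

# Number of children in the 10 surveyed families (observations from the quiz).
children = [0, 0, 1, 1, 2, 2, 2, 3, 4, 5]

mode = statistics.mode(children)       # most frequent value
median = statistics.median(children)   # central value of the sorted data
average = statistics.mean(children)    # sum of the values divided by n

print("mode:", mode)
print("median:", median)
print("average:", average)
```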
![Page 14: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/14.jpg)
14
THE AVG OF CLASSIFIED DATA
When data are classified or otherwise grouped, the average can be calculated as follows.
Formula: $\bar{x} = \frac{\sum n_i x_i}{\sum n_i}$
$x_i$ = the value of the variable at the MIDDLE of the frequency class
$n_i$ = the value of the frequency
40Ages computer
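The average of classified data can be sketched as a weighted average of class midpoints, each weighted by its class frequency. The function name `grouped_average` and the four age classes below are hypothetical, chosen only to illustrate the calculation; they are not the classes from the course file.

```python
def grouped_average(classes):
    """Average of classified data.

    classes: list of (lower bound, upper bound, frequency) tuples.
    Each class is represented by its midpoint, weighted by its frequency.
    """
    total = sum(frequency for _, _, frequency in classes)
    weighted_sum = sum((lower + upper) / 2 * frequency
                       for lower, upper, frequency in classes)
    return weighted_sum / total


# Hypothetical age classes (assumed for illustration): 40 students in 4 classes.
age_classes = [(18, 20, 10), (20, 22, 18), (22, 24, 8), (24, 26, 4)]

print(grouped_average(age_classes))
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the average computed from the raw data.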
![Page 15: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/15.jpg)
15
SYMMETRICAL DISTRIBUTIONS
In perfectly symmetrical frequency distributions, the relative positions of MODE, MEDIAN and AVG coincide
![Page 16: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/16.jpg)
16
ASYMMETRICAL DISTRIBUTIONS
In an asymmetrical frequency distribution, the relative positions of these three parameters appear as shown. This distribution is skewed to the right. The mirror image of this situation is also possible.
Figure labels: AVG, MEDIAN, MODE
![Page 17: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/17.jpg)
17
THE RANGE OF A GROUP OF VALUES
Age distribution of 40 students
![Page 18: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/18.jpg)
18
QUICK QUIZ
From the following frequency distribution, indicate whether each statement is true or false.
[Bar chart: frequency on the y-axis (0 to 60) for the values 1 to 17 on the x-axis]
• The distribution is left skewed
• The mode is smaller than the median and the average
• Mode = Median = Average
• The mode is between 50 and 60
• The average is higher than 5
• The median is between 4 and 5
![Page 19: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/19.jpg)
19
You are given the sizes of the last 20 burgers sold at a fast-food restaurant. Answer the following questions.
• What is the type of the variable “Burger Size”?
• Compute the range.
• Calculate the mode, median and average.
• Classify the data into 4 classes and compute the frequency distribution.
• Represent the relative frequency distribution graphically and comment on it.
EXERCISE 1
![Page 20: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/20.jpg)
20
QUICK QUIZ
The table below reports the number of clients who came to your restaurant over the last 50 days.
Compute the missing values.
xi  ni  fi  Fi
25  5  10.00%  10.00%
26  12.00%
27  32.00%
28  9  18.00%  50.00%
29  11  22.00%  72.00%
30
> 30  5  10.00%  100.00%
Indicate whether each statement is true or false.
• x3 = 27 clients
• The sample size is 50 clients
• f4 = 18%: on 18% of days, 28 clients came to your restaurant
• The median is 28 clients
• The average cannot be calculated
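The link between the columns of such a table (absolute frequency ni, relative frequency fi, cumulative relative frequency Fi) can be sketched in Python. The function name `frequency_columns` and the counts below are assumptions made up for illustration, so the sketch does not give away the missing values of the quiz table.

```python
from itertools import accumulate


def frequency_columns(counts):
    """From absolute frequencies n_i, return (f_i list, F_i list).

    f_i = n_i / n           (relative frequency)
    F_i = (n_1 + ... + n_i) / n   (cumulative relative frequency)
    """
    n = sum(counts)
    relative = [count / n for count in counts]
    cumulative = [running / n for running in accumulate(counts)]
    return relative, cumulative


# Hypothetical table (NOT the restaurant data): values x_i with counts n_i.
values = [1, 2, 3, 4]
counts = [4, 6, 8, 2]

relative, cumulative = frequency_columns(counts)
for x, n_i, f_i, F_i in zip(values, counts, relative, cumulative):
    print(f"{x}  {n_i}  {f_i:.2%}  {F_i:.2%}")
```

The same relations work in reverse: a missing fi is the difference between two consecutive Fi values, and a missing ni is fi times the sample size.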
![Page 21: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/21.jpg)
21
Using data from the customer satisfaction feedback of one service, answer the following questions:
• What is the type of the variable?
• Compute the absolute and relative frequency distributions.
• Graph the relative frequency and comment on your results.
EXERCISE 2
![Page 22: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/22.jpg)
22
GRAPHICAL TOOLS
Use of different graphical representations depends on the nature (qualitative or quantitative) of the variable being studied.
Qualitative Variable
• Circle diagram
• Bar chart
Quantitative Variable
• Discrete
  • Bar chart
  • Stem and Leaf
  • Box Plot
• Continuous
  • Histogram
  • Density Curve
  • Box Plot
![Page 23: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/23.jpg)
23
GRAPHICAL TOOLS: CIRCLE DIAGRAM
Represents the categories of the variable as sectors of a disc. The area of each sector is set by an angle proportional to the observed relative frequency.
$\alpha_i = 360^\circ \times f_i$
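The angle rule for the circle diagram can be illustrated with a few lines of Python. The function name `sector_angles` and the category counts are hypothetical, introduced only for this example.

```python
def sector_angles(counts):
    """Return the sector angle in degrees for each category.

    counts: dict mapping category -> absolute frequency.
    Angle = 360 * relative frequency of the category.
    """
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}


# Hypothetical survey answers (assumed for illustration).
answers = {"Satisfied": 10, "Neutral": 6, "Dissatisfied": 4}

for category, angle in sector_angles(answers).items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees because the relative frequencies add up to 1.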
![Page 24: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/24.jpg)
24
GRAPHICAL TOOLS: BAR CHART
Represents the various possible values of the variable according to their absolute or relative frequency.
![Page 25: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/25.jpg)
25
GRAPHICAL TOOLS: STEM AND LEAF PLOTS
Annual precipitation in Geneva between 1976 and 1993 (mm):
583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939
Procedure:
• Separate each number into a stem and a leaf. Here, we choose the number of hundreds as the stem and the tens digit as the leaf (each value is first rounded to the nearest ten, so 528 becomes 5 | 3).
• Group the numbers with the same stems.
Stem | Leaf
5 | 2 3 8
6 | 2 9
7 | 3 6 8
8 | 5 8 8 9
9 | 0 3 4 6 7
10 |
11 |
12 | 6
Remarks:
• Stem and leaf plots show both the distribution of the data and the data values themselves.
• The leaves are sorted in increasing order.
• The most difficult step is the choice of scale: tens/hundreds; sometimes 5/50; 2/20, etc.
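The stem and leaf procedure can be sketched in Python for the Geneva data. This is one possible implementation under stated assumptions: the function name `stem_and_leaf` is made up, the values are taken to be non-negative integers, and each value is rounded half up to the nearest ten before being split into a stem (hundreds) and a leaf (tens digit).

```python
from collections import defaultdict

# Annual precipitation in Geneva, 1976-1993 (mm), as listed on the slide.
precipitation = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                 730, 688, 528, 901, 884, 969, 1258, 850, 939]


def stem_and_leaf(values, leaf_unit=10):
    """Return {stem: sorted leaves}, including empty stems in between.

    Values are rounded half up to the nearest leaf_unit (integers assumed),
    then split into stem (all leading digits) and leaf (last digit).
    """
    rounded = sorted((value + leaf_unit // 2) // leaf_unit for value in values)
    groups = defaultdict(list)
    for number in rounded:
        groups[number // 10].append(number % 10)
    first_stem, last_stem = rounded[0] // 10, rounded[-1] // 10
    return {stem: groups.get(stem, []) for stem in range(first_stem, last_stem + 1)}


for stem, leaves in stem_and_leaf(precipitation).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems (10 and 11 here) preserves the scale, which makes the isolated value 1258 stand out.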
![Page 26: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/26.jpg)
26
QUICK QUIZ
As a marketing consultant you observed 50 consecutive shoppers at a grocery store, and recorded how much money each shopper spent in the store.
The following graph provides this information.
Reading scale: 1 | 0 represents 10 francs
0 | 2 7 7 8 9
1 | 0 1 2 3 3 4 4 4 5 5 5 5 7 7 8 8 9
2 | 0 0 1 1 1 1 4 6 7 9 9
3 | 1 2 3 3 4 5 6 8 9
4 | 1 4 6
5 | 2
6 | 2 4 4 9
Indicate whether each statement is true or false.
• This graphical representation is called a histogram.
• The average expenditure cannot be calculated.
• The expenditures distribution is skewed to the left.
• The median is at 21.
![Page 27: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/27.jpg)
27
QUICK QUIZ
The scores of a team from the last Statistics quiz are given in the stem and leaf plot below. The quiz was graded out of 70 points.
Reading scale: 1 | 5 represents 15 points
1 | 0 7 9
2 | 1 1 3 6 8
3 | 0 1 3 5 6 7 7
4 | 1 1 1 2
5 | 6 9
Indicate whether each statement is true or false.
• Team 2 is made up of 6 students.
• The range of the scores is 59.
• The highest obtained score is 70.
• The median is 32.
• 40% of the students totaled less than 30 points.
• The average cannot be calculated.
• The variable is quantitative discrete.
• 25% of the students have more than 36 points.
• The circle diagram could be a good graphical representation of the observations.
![Page 28: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/28.jpg)
28
GRAPHICAL TOOLS: HISTOGRAM
Represents the distribution of the variable, taking into account both the frequency and the width of the classes.
Distribution of employees' wages by salary class, Switzerland 2008
Monthly net salary, private and public sector (Confederation) together
![Page 29: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/29.jpg)
29
Great visual representation of many important characteristics of a data set.
Data needed:
• Minimum and Maximum
• Average
• Median
• First and Third quartiles (Q1 and Q3)
GRAPHICAL TOOLS: BOX PLOT
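The values needed for a box plot can be computed with Python's standard `statistics` module, shown here on the Geneva precipitation data from the earlier slides. This is a sketch, not the course's Excel procedure: the function name `box_plot_summary` is an assumption, and quartile conventions differ between tools. `statistics.quantiles` uses its default "exclusive" method here, so Excel or a hand calculation may give slightly different Q1 and Q3.

```python
import statistics

# Annual precipitation in Geneva, 1976-1993 (mm), as listed on the slides.
precipitation = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                 730, 688, 528, 901, 884, 969, 1258, 850, 939]


def box_plot_summary(values):
    """Return the values needed to draw a box plot."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "minimum": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": data[-1],
        "average": statistics.mean(data),
    }


for name, value in box_plot_summary(precipitation).items():
    print(f"{name}: {value:.2f}")
```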
![Page 30: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/30.jpg)
36
BOX PLOT ILLUSTRATION
![Page 31: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/31.jpg)
38
QUICK QUIZ
The box plot below represents the Swiss Civil Aviation airport traffic in 2009.
From the box plot, indicate whether each statement is true or false.
• 75% of airports have an annual traffic lower than 100'000 flights.
• Half of the airports have an annual traffic greater than 70'000 flights.
• The skew is positive.
• Two airports in particular have the most traffic.
![Page 32: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/32.jpg)
39
GRAPH EXAMPLES
![Page 33: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/33.jpg)
40
GRAPH EXAMPLES
In October 2012, a well-known newspaper published that “the average salary in Switzerland is ranked 6th among 29 countries used for the study”. Below is the reference graph published by the OFS (Office fédéral de la statistique). What can you conclude?
![Page 34: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/34.jpg)
41
QUICK QUIZ
We would like to study the distribution of net monthly salary for Swiss employees in 2013. Relative frequencies per class are given in the table below:
Salary classification | Relative frequency
0-3000 CHF | 2%
3000-4000 CHF | 14%
4000-5000 CHF | 24%
5000-6000 CHF | 20%
6000-7000 CHF | 13%
7000-8000 CHF | 9%
8000 and more CHF | 19%
Total | 100%
Given this information, indicate whether each statement is true or false.
• The data cannot be graphically represented in terms of relative frequency because the last class “8000 and more” is open.
• The most suitable graph is the circle diagram because the variable "Salary" is quantitative continuous.
• A histogram would be the best graphical representation of the data.
• The stem and leaf graph is not possible because the variable "Salary" is classified.
![Page 35: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/35.jpg)
42
The life cycle of 20 bulbs from the company Superligth SA was measured during a control. The results obtained are in the stem-and-leaf plot (see Excel file).
• Find the quartiles of this distribution and compute the IQR.
• Find the average life cycle, knowing that the sum of the leaves is 18800 hours.
• Find the mode.
EXERCISE 3
![Page 36: Frequency Distribution Statistics](https://reader031.vdocuments.net/reader031/viewer/2022021815/577ccf401a28ab9e788f4608/html5/thumbnails/36.jpg)
43
Answer the following questions using the available exam grades distribution.
• How many students took the exam?
• Compute the 5-number summary of the exam results.
• What is the average grade?
• Draw the graph of the distribution and comment on it.
EXERCISE 4