____________________________________________________________________________ IUT de Saint-Etienne – Département TC –J.F.Ferraris – Math – S2 – Stat2Var – TEx – Rev2018 – page 1 / 10
SALES AND MARKETING Department
MATHEMATICS
2nd Semester
________ Bivariate statistics ________
Tutorials and exercises
Online document: http://jff-dut-tc.weebly.com section DUT Maths S2.
Exercise 1. (Tutorial for lesson page 5)
Is people’s behaviour in relation to tobacco related to their gender, at a 10% significance level?
Here are the results of a survey made on a sample of 51 men and 66 women:
G: variable "gender"        B: variable "behaviour in relation to tobacco"
Gm: men                     Bn: never smoked
Gw: women                   Bs: smoke
                            Bss: stopped smoking

Observed frequencies:
        Gm    Gw
Bn      12    23
Bs      31    26
Bss      8    17

Theoretical frequencies according to H0:
        Gm    Gw
Bn
Bs
Bss

Detailed Chi-squares and total:
        Gm    Gw
Bn
Bs
Bss
1) Write the subtotals and the grand total in the first table, then copy them into the second one.
2) Fill in the second table (6 central theoretical values) using proportional calculations.
3) Table #3: calculate the six partial Chi-squares, then add them to get the value χ²calc.
4) Test writing:
Null hypothesis:
Observed χ²
Value of the variable χ² between the observed and the theoretical samples: χ²calc =
Rejection area
Significance level: α =
Number of dof: (r-1)(k-1) =
Limit value of the variable χ² beyond which H0 is rejected: χ²lim =
Comparison and decision:
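The steps of this tutorial (subtotals, theoretical frequencies, partial Chi-squares, comparison with the limit value) can be sketched in Python to check a manual calculation. This is only an illustration: the function name `chi_square_independence` is hypothetical, and the closed form used for χ²lim is valid for 2 degrees of freedom only; for other cases, read the limit in the Chi² table.

```python
import math

def chi_square_independence(observed):
    """Return (theoretical table, partial chi-squares, chi2_calc, dof)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Theoretical frequency under H0: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    partial = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
               for obs_row, exp_row in zip(observed, expected)]
    chi2_calc = sum(sum(row) for row in partial)
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, partial, chi2_calc, dof

# Observed frequencies of exercise 1 (rows Bn, Bs, Bss; columns Gm, Gw)
observed = [[12, 23], [31, 26], [8, 17]]
expected, partial, chi2_calc, dof = chi_square_independence(observed)

# With exactly 2 degrees of freedom, P(chi2 > x) = exp(-x / 2),
# so the limit value at level alpha is -2 ln(alpha).
alpha = 0.10
chi2_lim = -2 * math.log(alpha)
reject_h0 = chi2_calc > chi2_lim

print(f"chi2_calc = {chi2_calc:.2f}, dof = {dof}, chi2_lim = {chi2_lim:.3f}")
print("H0 rejected" if reject_h0 else "H0 not rejected")
```

The same function accepts any table of observed frequencies given as a list of rows.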
Exercise 2.
Two candidates, NS and FH, compete in a presidential election. A small town has 500 voters: 100 are
retired people, 50 are unemployed and 350 are employees. The vote results there are:
voters \ candidates    FH     NS     blank/abstention
unemployed             24     16     10
employees             122    148     80
retired                36     27     37
1) Decide, with a 1% significance level, whether people’s opinion depends on their social group or not.
2) What can we say if we do not include blank votes and abstentions?
Exercise 3.
The table shows attendance in two stores A and B: how many people made at least one purchase. These
clients have been sorted by age group (10 to 15 years old, and so on).
age \ store     A     B
10 - 15        46    24
15 - 20        29    35
20 - 40        14    17
> 40           12    18
1) Decide, at a 5% significance level, whether the chosen store depends on the age of a client.
2) Which age group contributes most to the previous result? Explain.
3) Explain what the “5% significance level” means for your first answer.
4) According to your Chi² table, can you be more precise about the risk taken in this statement (your first
answer)?
Exercise 4.
In a survey, 100 people were asked about their age and how often they go to the cinema. We name X the
variable "age" and Y the variable "number of annual cinema shows". The survey result is the following table of
counts (fr.: citations):
Y \ X       [15 ; 25[   [25 ; 50[   ≥ 50
none            4           6        13
1 to 11        10          16        15
12 to 23       13           8         4
≥ 24            6           3         2
1) By a χ² independence test, with a 2% significance level, decide whether there’s a link or not between the
age and the level of attendance at the cinema.
2) Using the Chi² table in your form, discuss the level of confidence you can assign to the assertion: “they are
dependent”.
3) Identify the largest partial Chi-squares and explain what these high values mean.
Exercise 5. (Tutorial for lesson page 6)
Let’s take a close look at how a company’s turnover evolves over time.
year             2009              2010              2011              2012
quarter       Q1  Q2  Q3  Q4    Q1  Q2  Q3  Q4    Q1  Q2  Q3  Q4    Q1  Q2  Q3  Q4
(M€)          28  45  49  36    30  44  48  40    28  46  52  37    31  42  54  39
Despite the large seasonal variations, due to the company’s particular activity, is it possible to identify an
overall trend over several years?
Let’s calculate and plot the moving means over 5 consecutive values:
(do it as group work: split the calculations with your neighbours and share your results)
1-5 2-6 3-7 …
X
Y
calculations:
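To check the shared results, the moving means can be computed with a short Python sketch. It assumes that the X value attached to each window (1-5, 2-6, ...) is the central rank of that window, which is a common convention but is not stated in this document; the function name `moving_means` is hypothetical.

```python
# Quarterly turnover (M€) from 2009 Q1 to 2012 Q4, ranks 1 to 16
turnover = [28, 45, 49, 36, 30, 44, 48, 40, 28, 46, 52, 37, 31, 42, 54, 39]

def moving_means(values, window=5):
    """Return one (central rank, mean) pair per run of `window` consecutive values."""
    points = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        # Ranks are numbered from 1; assumption: X is the middle rank of the window
        centre = start + 1 + (window - 1) / 2
        points.append((centre, sum(chunk) / window))
    return points

points = moving_means(turnover)
for x, y in points:
    print(f"X = {x:4.1f}   Y = {y:.1f}")
```

With 16 values and a window of 5, the sketch produces 12 points, from window 1-5 to window 12-16.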
Exercise 6. (Tutorial for lesson page 7)
Let’s return to one of the examples introduced on page 3 of the lesson document: the effect of the amount of
fertilizer on the harvest.
plot #    fertilizer X (kg/ha)    harvest Y (q/ha)
1              150                     46
2               80                     37
3              120                     46
4              220                     51
5              100                     43
1) For each half-cloud, determine the coordinates of the mean point.
2) Determine the expression of Mayer’s line (G1G2).
3) On a graph, plot the points of the table and draw this line.
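The construction of Mayer’s line can be sketched as follows. The points are sorted by increasing X and split into two groups; since this series has an odd number of points, the sketch assumes that the extra point goes to the first group (3 points, then 2). This split is an assumption, so use the convention of your lesson if it differs (the hypothetical parameter `first_group_size` lets you change it).

```python
def mean_point(points):
    """Mean point of a list of (x, y) pairs."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def mayer_line(points, first_group_size=None):
    """Return (slope, intercept, G1, G2) of the line through the two mean points."""
    ordered = sorted(points)  # increasing X
    if first_group_size is None:
        # Assumption: with an odd number of points, the first group is the larger one
        first_group_size = (len(ordered) + 1) // 2
    g1 = mean_point(ordered[:first_group_size])
    g2 = mean_point(ordered[first_group_size:])
    slope = (g2[1] - g1[1]) / (g2[0] - g1[0])
    intercept = g1[1] - slope * g1[0]
    return slope, intercept, g1, g2

# Fertilizer (kg/ha) and harvest (q/ha) for the five plots
plots = [(150, 46), (80, 37), (120, 46), (220, 51), (100, 43)]
slope, intercept, g1, g2 = mayer_line(plots)
print(f"G1 = {g1}, G2 = {g2}")
print(f"y = {slope:.4f} x + {intercept:.2f}")
```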
Exercise 7.
Determine the expression of the Mayer’s line, taking back the case given in exercise 5.
Exercise 8. (Tutorial for lesson page 8)
Calculate or display on your calculator: the means and standard deviations; the covariance.
1) Using the data of exercise 6 (fertilizer/harvest).
2) Using the data of exercise 4 (age/# of cinema shows): take 60 as the average age for the class "50 and
over", and 36 as the average number of shows for the class "24 and over".
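For question 1), the calculator results can be cross-checked with the sketch below. It assumes the population formulas (division by n, not n − 1), which is an assumption about the convention of the lesson; the function name `describe_pair` is hypothetical.

```python
import math

def describe_pair(xs, ys):
    """Return (mean_x, mean_y, sd_x, sd_y, covariance), all with division by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return mean_x, mean_y, math.sqrt(var_x), math.sqrt(var_y), cov

fertilizer = [150, 80, 120, 220, 100]   # X, exercise 6
harvest = [46, 37, 46, 51, 43]          # Y, exercise 6
mean_x, mean_y, sd_x, sd_y, cov = describe_pair(fertilizer, harvest)
print(f"means: {mean_x:.1f}, {mean_y:.1f}")
print(f"standard deviations: {sd_x:.3f}, {sd_y:.3f}")
print(f"covariance: {cov:.1f}")
```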
Exercise 9. (Tutorial for lesson page 9)
Let’s consider the following time series: a company’s annual expenses in advertising.
X : year 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
Y : expense (k€) 41 60 55 66 87 61 90 95 82 120 125 118
The corresponding scatter plot is represented:
Determine the expression of the Y on X fitting line, using the least squares method; then draw it.
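As a check of the calculator output, the least squares line y = a x + b can be computed directly from its usual formulas (a is the sum of products of deviations divided by the sum of squared deviations of X, and the line passes through the mean point). The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) such that y = a * x + b is the least squares line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x   # the line passes through the mean point
    return a, b

years = list(range(2002, 2014))                                   # X
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]    # Y (k€)
a, b = least_squares_line(years, expenses)
print(f"y = {a:.4f} x + ({b:.2f})")
```

Because X is a year, the intercept b is a large negative number; numbering the years from 1 would give the same slope with a more readable intercept.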
Exercise 10.
500 people who have passed their driving licence exam are sorted in the table below.
They are distributed according to the number X of times they took the exam before passing it and the
number Y of hours of driving lessons taken before their first attempt.
1) Define a marginal frequency. Then, give an example from the table.
2) Briefly describe how to enter the data set in your calculator.
3) Calculate the covariance of the pair (X, Y) and comment concretely on this value.
4) Among those who took between 15 and 25 hours of driving lessons, what is the rate of those who passed
their exam on the third attempt?
5) Among those who passed their exam on the third attempt, what is the rate of those who took between 15
and 25 hours of driving lessons?
Exercise 11.
A sales agent wishes to analyse his (or her) activity and efficiency. For each appointment with a prospect,
the length (X, in minutes) of the product presentation and the quantity sold (Y) were recorded. The twelve
values inside the table give the number of appointments corresponding to each pair (X, Y).
1) Give the meaning of the frequency "8" found inside the table.
2) Calculate, by hand, the average time spent per appointment.
3) Give the covariance of the pair (X, Y).
Exercise 12.
The following table gives the sales price (€) of a piece of equipment and the number of items sold, for 4 years
(year 1: 2006).
year rank               1     2     3     4
sales price (€) X     300   210   270   375
# of sold items Y     198   240   222   160
1) Build the scatter plot in an orthogonal frame. The axes must intersect at the point (210, 160);
scales: 1 cm for €15 on the x-axis, 1 cm for 10 items on the y-axis.
2) Determine the coordinates of G, the mean point of the cloud.
3) a. Determine the expression of the Y on X fitting line, using the least squares method.
The coefficients will be expressed with 6 significant figures.
b. Draw this regression line on the graph.
4) Which year saw the highest turnover? What was its amount?
going further:
5) Now, we assume that, each year, the number of sold items y and the sales price x are related this way:
y = – 0.498 x + 349. We denote S(x) the turnover achieved by selling y items, €x each.
a. Express S(x) with respect to x.
b. Find the variations of the function S defined in [210 ; 375].
c. Deduce the sales price we would have to set for a fifth year if we want a maximum turnover. How many
items will be sold (round to one unit)? For what turnover?
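For the "going further" part, S(x) = x · y is a quadratic function of x, so its variations follow from the position of the vertex of the parabola. The sketch below evaluates this numerically with the relation given in question 5); the names `items_sold` and `turnover` are hypothetical, and the sketch is a check of a manual study of the function, not a replacement for it.

```python
# Model given in question 5): y = A * x + B
A, B = -0.498, 349.0
LOW, HIGH = 210.0, 375.0     # interval on which S is studied

def items_sold(x):
    return A * x + B

def turnover(x):
    # S(x) = x * y = A * x**2 + B * x
    return x * items_sold(x)

# Vertex of the parabola; since A < 0, S increases before it and decreases after it
x_vertex = -B / (2 * A)
# Keep the optimum inside the interval in case the vertex falls outside it
x_best = min(max(x_vertex, LOW), HIGH)

print(f"best price: {x_best:.2f} €")
print(f"items sold: {round(items_sold(x_best))}")
print(f"turnover:   {turnover(x_best):.2f} €")
```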
Exercise 13.
A survey compares people's spending on high-tech equipment with their income. Each column of the
table T below gives, for a given French region, the average monthly income of people (X) and the
average monthly expense (Y) on high-tech equipment.
region           A      B      C      D      E      F
income X (€)   1550   1620   1770   1850   1930   2000
expense Y (€)    57     61     66     73     76     82
1) Calculate the covariance and then the linear correlation coefficient of the pair (X, Y).
Give an interpretation of both parameters.
2) a. Using your calculator, give the expression of the Y on X regression line.
b. Obtain the expression of the Mayer's line of the series, from the table T.
c. The two lines differ slightly. Find the income for which they both give the same expense. What makes this
common point special, inside the point cloud?
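For question 1), the linear correlation coefficient is the covariance divided by the product of the two standard deviations. A minimal sketch, assuming the population formulas (division by n) and a hypothetical function name `covariance_and_correlation`:

```python
import math

def covariance_and_correlation(xs, ys):
    """Return (covariance, r) with r = cov / (sd_x * sd_y), all with division by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov, cov / (sd_x * sd_y)

income = [1550, 1620, 1770, 1850, 1930, 2000]   # X (€)
expense = [57, 61, 66, 73, 76, 82]              # Y (€)
cov, r = covariance_and_correlation(income, expense)
print(f"covariance = {cov:.2f}, r = {r:.4f}")
```

The covariance depends on the units of X and Y, whereas r is unit-free and always lies between −1 and 1, which is why the two parameters are interpreted differently.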
Exercise 14. (Tutorial for lesson page 12)
Data about the fuel consumption of a motorcycle have been collected
(consumption Y, in L/100 km; speed X, in km/h):
X 10 20 30 40 50 60 70 80 90
Y 15.2 11.6 9.3 7.8 7 6.6 6.9 8 9.6
The scatter plot clearly shows that a linear regression would be inappropriate to describe how the
consumption changes with the speed. We will therefore use a change of variable.
1) Let’s define the variable T by: T = (X – 60)².
Complete the following table:
T
Y 15.2 11.6 9.3 7.8 7 6.6 6.9 8 9.6
2) Perform a linear regression of Y on T.
3) Thus, deduce the expression of the regression curve, for the initial scatter plot.
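The three questions can be sketched as follows: build T, fit the line Y = a T + b by least squares, then substitute T = (X − 60)² to get the curve in terms of X. The helper names `least_squares_line` and `consumption_model` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) such that y = a * x + b is the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90]                 # X (km/h)
consumption = [15.2, 11.6, 9.3, 7.8, 7, 6.6, 6.9, 8, 9.6]     # Y (L/100 km)

# 1) Change of variable T = (X - 60)^2
t_values = [(x - 60) ** 2 for x in speeds]

# 2) Linear regression of Y on T
a, b = least_squares_line(t_values, consumption)

# 3) Regression curve for the initial scatter plot: Y = a (X - 60)^2 + b
def consumption_model(x):
    return a * (x - 60) ** 2 + b

print("T:", t_values)
print(f"Y = {a:.5f} T + {b:.3f}")
```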
Exercise 15. quadratic fitting
A company took note of its profits Y with respect to X, produced and sold quantity:
X (tons) 2 3 5 7 11
Y (k€) 38 55 72 69 24
T
1) Using your calculator, give the linear correlation coefficient between X and Y. Comment.
2) Let’s define the variable T = -(X - 6)².
a. Complete the table.
b. Calculate Cov(T, Y) and then the linear correlation coefficient between the two variables.
c. Is a linear fitting of Y on T appropriate?
d. Determine the expression of the Y on T fitting line, using the least squares method.
e. Deduce an expression of the regression of Y on X.
Exercise 16. quadratic fitting
A market study was conducted on a new type of product. The table below gives, for several proposed sales
prices, the number of people willing to pay that price.
unit price (€) X 2 3 4 5 6 7
number of people Y 66 47 34 25 18 14
1) Calculate the covariance of the variables X and Y, then comment on its sign.
2) We set T = X(X - 20)
a. Calculate the linear correlation coefficient between the variables T and Y.
b. Comment on its value.
c. Determine the expression of the Y on T fitting line, using the least squares method.
d. Deduce an expanded expression of the regression of Y with respect to X.
3) Here we examine the expected turnover (unit selling price × number of sales), if the numbers of responses
obtained in the survey are considered to be the numbers of units sold.
a. Calculate the turnovers that can be derived from the initial table.
b. Calculate, for the same values of X, the turnovers CA' given by the formula obtained in
question 2)d.
c. What unit selling price should we set in order to reach the maximum turnover?
Exercise 17. inverse fitting
A perfumery, analysing its turnover, relates the quantities sold (Y) to the prices (X) of various perfume
brands and models. The results are gathered in the following table:
X, bottle’s price (€) 15 25 30 40 45 60 75 90
Y, # of sold bottles 202 117 107 82 78 60 55 48
Answer the questions beginning with "calculate" by using your calculator’s results.
1) a. Calculate the covariance of X and Y; comment on its sign.
b. Calculate the linear correlation coefficient of X and Y; comment on its value.
2) In order to have a more precise idea of how X and Y are related, we apply the change of variable
T = 850 / X.
a. After having calculated the list of values of T, in a third list (calculator), justify that the linear correlation
is excellent between T and Y.
b. Give the expression of the Y on T regression line, according to the least squares method.
c. What is the least squares criterion?
d. Deduce from question 2)b a modelled expression of Y with respect to X.
e. According to this model, how many bottles priced at €150 would the perfumery expect to sell?
Exercise 18. (Tutorial for lesson page 13)
Calculate the point estimates in the given situations.
1) Using exercise 9, give an estimate of the expense in 2015.
2) Using exercise 6, give an estimate of the quantity of fertilizer that would give a harvest of 60 q/ha.
3) Using exercise 14, give an estimate of the fuel consumption when the speed is 100 km/h.
Exercise 19. (Tutorial for lesson page 13)
Let’s return to exercise 9. We want to estimate the expense, for the year 2015, by a 95% confidence interval.
1) a. Get the values of Y’, from the values of X and the expression of the fitting line;
b. Get the values of Z, by dividing Y by Y’;
c. Then, give the mean and standard deviation of Z.
2) Give the point estimate of the expense in 2015.
3) Give the coefficient u corresponding to the confidence level.
4) Then, give the confidence interval.
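The four steps above can be sketched in Python. Several points are assumptions, not taken from this document: the standard deviation of Z uses division by n, the coefficient u is the standard normal quantile for the chosen confidence level, and the interval is formed as (point estimate) × (mean of Z ± u × standard deviation of Z). Check the formula on page 13 of the lesson before relying on the numbers.

```python
from statistics import NormalDist, mean, pstdev

years = list(range(2002, 2014))                                   # X
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]    # Y (k€)

# Least squares line y' = a x + b
mean_x, mean_y = mean(years), mean(expenses)
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, expenses))
     / sum((x - mean_x) ** 2 for x in years))
b = mean_y - a * mean_x

# 1) Fitted values Y', ratios Z = Y / Y', then mean and standard deviation of Z
fitted = [a * x + b for x in years]
ratios = [y / y_fit for y, y_fit in zip(expenses, fitted)]
z_mean, z_sd = mean(ratios), pstdev(ratios)   # assumption: division by n

# 2) Point estimate for 2015
point_estimate = a * 2015 + b

# 3) Coefficient u for a 95% confidence level (assumption: normal quantile)
confidence = 0.95
u = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# 4) Assumed form of the interval: point estimate * (z_mean ± u * z_sd)
lower = point_estimate * (z_mean - u * z_sd)
upper = point_estimate * (z_mean + u * z_sd)
print(f"point estimate: {point_estimate:.1f} k€, u = {u:.2f}")
print(f"interval: [{lower:.1f} ; {upper:.1f}] k€")
```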
Exercise 20. (Tutorial for lesson page 13)
Using exercise 6, estimate, by a 99% confidence interval, the harvest obtained with 300 kg/ha of fertilizer.
1) a. Get the values of Y’, from the values of X and the expression of the fitting line;
b. Get the values of Z, by dividing Y by Y’;
c. Then, give the mean and standard deviation of Z.
2) Give a point estimate of the harvest.
3) Give the coefficient u corresponding to the confidence level.
4) Then, give the confidence interval.
Exercise 21. (Tutorial for lesson page 13)
For each person in a sample, a survey recorded the age class (X) and the visual acuity (Y, where 1/10 = 0.1):
Y \ X    [5 ; 35[   [35 ; 45[   [45 ; 55[   [55 ; 65[
0.3          1          5          10          20
0.6          8         12          25          18
0.9         55         30          14           6
Estimate the visual acuity of an 80-year-old person, by a 99% confidence interval.
Exercise 22.
In a country, two variables are compared: the consumer force index and the turnover of its car industry:
consumer force (index) X 3.26 3.85 3.44 3.08 3.6
car industry turnover (G€) Y 9.3 9.56 9.36 9.24 9.47
1) Give the expression of the Y on X Mayer’s line.
2) Using a point estimate, give a value of the consumer force that would correspond to a car industry
turnover of G€ 10.
3) Is a strong correlation between two variables a sign of a cause and effect relationship between them?
Exercise 23. least squares + confidence interval
Monthly revenues of a commercial website are listed below, from January to December 2015:
in k€ : 3 5 4 8 10 9 13 12 17 18 18 21
1) In a few words, describe the least squares method.
2) Using the overall trend of the monthly revenue, give the 95% confidence interval of the
predicted revenue in December 2016. (number the months from 1 for January 2015)
3) Give the probability that, in December 2016, the revenue would be less than k€ 29.23.
4) Build the scatter plot (scale: 2 cm for one month), draw the regression line and finally represent the
confidence interval.
Exercise 24. Mayer + confidence interval
The given table lists eight of the major cities of a country. The variable X gives, in thousands, the number of
city residents; the variable Y gives, in thousands, the number of students in this city.
city     X     Y
A       850    58
B       623    37
C       587    38
D       360    20
E       312    16
F       275    15
G       262    12
H       244    12
1) Build the scatter plot from this data series.
2) Give the coordinates of the mean point of the cloud.
3) a. Using Mayer’s method, determine by hand the expression of the Y on X regression line.
b. Draw this line. Does the mean point G belong to it?
c. State "Mayer’s principle".
4) We will now use another fitting line, whose expression is: y' = 0.07 x – 6.
a. With this line, give the 95% confidence interval of the predicted number of students in a town that has
two million inhabitants.
b. What can we say about the chances that the number of students would exceed 155,000 in such a town?
Exercise 25. logarithmic fitting + confidence interval
The service life of some identical office equipment has been studied. In the following table, ti represents the
duration of use, expressed in thousands of hours, and R(ti) the proportion of equipment still in use at the time ti
(e.g. after 1,000 hours, ti = 1, 90% of the equipment is still in use: R(ti) = 0.90).
ti 1 2 3 4 5 6 7 8 9
R(ti) 0.9 0.66 0.53 0.4 0.32 0.25 0.19 0.14 0.1
1) We set yi = ln[R(ti)] where ln is the natural logarithm. Fill the following table, then build the scatter plot,
using the points Mi (ti, yi), into an orthogonal frame.
ti 1 2 3 4 5 6 7 8 9
yi
2) Would a linear fitting be relevant for the previous scatter plot?
Calculate the linear correlation coefficient between T and Y.
3) Using the least squares method, determine an expression of the Y on T regression line.
Deduce from this expression that there are two positive real numbers k and λ such that: R(t) = k e^(−λt).
4) In this question, we'll take k = 1.174 and λ = 0.266.
a. Determine the predictable rate of equipment still in use after 10,000 hours.
b. After how long are there exactly 50 % of equipment still in use?
5) Give a 99% confidence interval of the rate of equipment still in use after 10,000 hours of service.
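The link between the regression line of question 3) and the exponential model can be sketched as follows: if ln R(t) = a t + b, then R(t) = e^b · e^(a t), so k = e^b and λ = −a. The sketch then uses the rounded values given in question 4). Variable names are hypothetical, and question 5) is not covered here.

```python
import math

t_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]                              # thousands of hours
rates = [0.9, 0.66, 0.53, 0.4, 0.32, 0.25, 0.19, 0.14, 0.1]         # R(ti)

# 1) Change of variable y = ln R(t)
y_values = [math.log(r) for r in rates]

# 3) Least squares line y = a t + b, then k = e^b and lambda = -a
n = len(t_values)
mean_t = sum(t_values) / n
mean_y = sum(y_values) / n
a = (sum((t - mean_t) * (y - mean_y) for t, y in zip(t_values, y_values))
     / sum((t - mean_t) ** 2 for t in t_values))
b = mean_y - a * mean_t
k_fit, lambda_fit = math.exp(b), -a

# 4) With the rounded values given in the question
K, LAMBDA = 1.174, 0.266

def rate_in_use(t):
    return K * math.exp(-LAMBDA * t)

rate_10 = rate_in_use(10)                 # a. after 10,000 hours
t_half = math.log(K / 0.5) / LAMBDA       # b. solve K e^(-LAMBDA t) = 0.5

print(f"fitted k = {k_fit:.3f}, fitted lambda = {lambda_fit:.3f}")
print(f"R(10) = {rate_10:.3f}, 50% reached at t = {t_half:.2f} thousand hours")
```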
Exercise 26.
100 children have been classified by age (X) and height (Y):
X \ Y      [95 ; 105[   [105 ; 125[   [125 ; 135[
[3 ; 5[        15           10             0
[5 ; 7[         8           32             5
[7 ; 9[         2           13            15
1) Enter this table in your calculator.
2) Give the means and standard deviations of X and Y, and calculate their covariance.
3) Calculate their linear correlation coefficient. Comment on this value.
4) Nevertheless, does the table allow us to see some trend?
5) Assuming that the relationship between age and height is linear until the age of 12, give the 95% confidence
interval of the height of a 12-year-old child.
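For questions 2) and 3), a table of counts is handled by weighting each pair of class centres by its count. The sketch below assumes that each class is represented by its centre (4, 6, 8 for X and 100, 115, 130 for Y) and that variances use division by the total count; both are assumptions about the lesson's convention.

```python
import math

# Assumption: each class is replaced by its centre
x_centres = [4, 6, 8]
y_centres = [100, 115, 130]
# counts[i][j]: number of children in X class i and Y class j
counts = [[15, 10, 0],
          [8, 32, 5],
          [2, 13, 15]]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]           # marginal counts of X
col_totals = [sum(col) for col in zip(*counts)]     # marginal counts of Y

mean_x = sum(x * c for x, c in zip(x_centres, row_totals)) / n
mean_y = sum(y * c for y, c in zip(y_centres, col_totals)) / n
var_x = sum(x ** 2 * c for x, c in zip(x_centres, row_totals)) / n - mean_x ** 2
var_y = sum(y ** 2 * c for y, c in zip(y_centres, col_totals)) / n - mean_y ** 2

# Covariance: weighted mean of the products minus the product of the means
mean_xy = sum(x * y * c
              for x, row in zip(x_centres, counts)
              for y, c in zip(y_centres, row)) / n
cov = mean_xy - mean_x * mean_y
r = cov / math.sqrt(var_x * var_y)

print(f"n = {n}, means: {mean_x:.2f}, {mean_y:.2f}")
print(f"standard deviations: {math.sqrt(var_x):.3f}, {math.sqrt(var_y):.3f}")
print(f"covariance = {cov:.3f}, r = {r:.4f}")
```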
IUT TC MATHEMATICS FORM FOR BIVARIATE STATISTICS