
Business Statistics assignment 2014

MSc Marketing and Business Analysis

Marketing Statistics

17/11/2014

B064536

Table of contents

1. Description of secondary school student alcohol consumption dataset
2. Descriptive and summary statistics
3. One-tailed test about a population proportion
4. Chi-square test of association
5. Correlation
6. Further analysis
7. Limitation
8. Conclusion
List of references

Appendices

Appendix 1. Survey questions
Appendix 2. Cross tab

List of tables

Table 1. Variables used
Table 2. Summary statistics
Table 3. Gender frequency
Table 4. School year
Table 5. Ever had a proper alcoholic drink
Table 6. Chi-square test
Table 7. Symmetric measures
Table 8. Correlation between family attitude and drinking frequency

List of figures

Figure 1. Sample gender in percentage
Figure 2. School year in percentage
Figure 3. Ever had a proper alcoholic drink


1. Description of secondary school student alcohol consumption dataset

The secondary data were obtained from the UK Data Service (2014). The dataset was collected through a survey of secondary school pupils (aged 11 to 15) conducted by the National Centre for Social Research (2012). The survey (see Appendix 1) aimed to gain insight into the number of young alcohol drinkers and their drinking behaviour. A total of 7589 valid responses were gathered.

The key reason for selecting this dataset is to gain insight into student drinkers in order to develop effective strategies to curb underage drinking. A body of evidence suggests that drinking at a young age, in particular heavy and regular drinking, can result in physical or mental health problems and put children at risk of alcohol-related accidents or injury. More broadly, it is also associated with missing school or falling behind at school, and with violent and antisocial behaviour. It is therefore necessary to develop strategies to tackle problem drinking at both national and local levels.

This report first provides descriptive and summary statistics about the sample. Next, it conducts a one-tailed hypothesis test about a population proportion to investigate the proportion of UK pupils who have drunk alcohol. This is followed by a Chi-square test to ascertain whether peer pressure and student drinking frequency are associated. Correlation analysis is then conducted to investigate the strength and type of relationship between family attitude and student drinking frequency. Finally, the report discusses further analysis and the limitations of the dataset.


2. Descriptive and Summary Statistics

Table 1. Variables used

| Variable name | Measurement | Analysis conducted |
|---|---|---|
| Age | Ratio | Summary statistics |
| Gender | Nominal | Summary statistics |
| School year | Ordinal | Summary statistics |
| Units of alcohol drunk in last 7 days | Ratio | Summary statistics |
| Ever had a proper alcoholic drink | Nominal | Summary statistics and hypothesis test |
| Peer pressure | Nominal | Chi-square |
| Family attitude to pupil drinking | Interval | Correlation |
| Monthly usual drinking frequency | Interval | Summary statistics, Chi-square, correlation |

Table 2. Summary Statistics

| | Age (11-15) | Units of alcohol drunk in last 7 days | Usual drinking frequency (monthly) |
|---|---|---|---|
| N Valid | 7589 | 7172 | 7314 |
| N Missing | 0 | 417 | 275 |
| Mean | 13.1735 | 1.4194 | 6.7829 |
| Median | 13.0000 | 1.0000 | 8.0000 |
| Mode | 15.00 | 1.00 | 8.00 |
| Std. Deviation | 1.39074 | 1.47192 | 1.68427 |
| Minimum | 11.00 | 1.00 | 1.00 |
| Maximum | 15.00 | 8.00 | 8.00 |

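From Table 2, the mean age of the sample is 13.2 years. Across the 7172 valid responses, the mean recorded value for units of alcohol drunk in the last 7 days is 1.42. Usual drinking frequency is recorded as one of eight coded categories (see Appendix 1 and Appendix 2), so its mean of 6.78 is an average category code rather than a number of drinking occasions; the median and mode of 8 show that most pupils fall in category 8.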

Furthermore, the three variables analysed in Table 2 have sample standard deviations of 1.684 (usual drinking frequency), 1.472 (units of alcohol drunk) and 1.391 (age). Given the ranges of the variables (11 to 15 for age and 1 to 8 for the two drinking variables), this indicates relatively little variability in each variable. The sample standard deviation is calculated using the formula:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
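The statistics in Table 2 were produced in SPSS. As a minimal sketch of what each row means, the Python function below computes the same quantities for a short, hypothetical list of ages in which `None` marks a missing response; the function name `summarise` and the data are illustrative assumptions, not part of the dataset. `statistics.stdev` uses the n − 1 divisor, so it matches the formula for s above.

```python
import statistics
from typing import List, Optional


def summarise(values: List[Optional[float]]) -> dict:
    """Table 2-style summary statistics; None marks a missing response."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "mode": statistics.mode(valid),
        # stdev divides by n - 1, i.e. the sample standard deviation s
        "std_dev": statistics.stdev(valid),
        "minimum": min(valid),
        "maximum": max(valid),
    }


# Hypothetical ages for illustration only (not taken from the survey)
ages = [11, 12, 13, 13, 14, 15, 15, 15, None]
print(summarise(ages))
```

Here the eight valid values give a mean and median of 13.5, a mode of 15 and a sample standard deviation of about 1.51, with one missing response.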

Next, some sample characteristics will be presented using frequency tables and charts.

Table 3. Gender Frequency

| Valid | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Boy | 3809 | 50.2 | 50.2 | 50.2 |
| Girl | 3780 | 49.8 | 49.8 | 100.0 |
| Total | 7589 | 100.0 | 100.0 | |

Figure 1. Sample Gender in Percentage

From Table 3, of the 7589 respondents, 50.2% (3809) are boys and 49.8% (3780) are girls. Figure 1 displays the gender split in percentages.

Table 4. School Year

| Valid | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Year 7 | 1481 | 19.5 | 19.5 | 19.5 |
| Year 8 | 1526 | 20.1 | 20.1 | 39.6 |
| Year 9 | 1580 | 20.8 | 20.8 | 60.4 |
| Year 10 | 1553 | 20.5 | 20.5 | 80.9 |
| Year 11 | 1449 | 19.1 | 19.1 | 100.0 |
| Total | 7589 | 100.0 | 100.0 | |

Figure 2. School Year in Percentage (bar chart; x-axis: school year, Year 7 to Year 11; y-axis: percentage)

From Table 4, the largest group of respondents is from Year 9 (20.8%), followed by Year 10 (20.5%), Year 8 (20.1%), Year 7 (19.5%) and Year 11 (19.1%). The five year groups are therefore of roughly equal size, each making up about a fifth of the sample. Figure 2 displays the respondents' school year in percentages.
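The Percent and Cumulative Percent columns of Table 4 follow directly from the frequencies. A minimal Python sketch (the helper name `frequency_table` is hypothetical) reproduces them from the Table 4 counts, rounding to one decimal place as the SPSS output does:

```python
from itertools import accumulate


def frequency_table(counts: dict) -> list:
    """Return (category, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    running_totals = accumulate(counts.values())  # running sum of frequencies
    rows = []
    for (category, freq), running in zip(counts.items(), running_totals):
        rows.append((category, freq,
                     round(100 * freq / total, 1),
                     round(100 * running / total, 1)))
    return rows


# Frequencies from Table 4 (school year)
school_year = {"Year 7": 1481, "Year 8": 1526, "Year 9": 1580,
               "Year 10": 1553, "Year 11": 1449}
for row in frequency_table(school_year):
    print(row)
```

Each cumulative value is the running total of frequencies divided by 7589, so the cumulative percentage reaches 60.4% at Year 9.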

Table 5. Ever had a proper alcoholic drink

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Yes | 3222 | 42.5 | 43.1 | 43.1 |
| | No | 4256 | 56.1 | 56.9 | 100.0 |
| | Total | 7478 | 98.5 | 100.0 | |
| Missing | Not answered | 111 | 1.5 | | |
| Total | | 7589 | 100.0 | | |

Figure 3. Ever had a proper alcoholic drink

From Table 5, 43% of the respondents who answered this question (3222 of 7478) have had a proper alcoholic drink, while 57% (4256) have not.
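Table 5 differs from Tables 3 and 4 because 111 respondents did not answer: Percent divides by all 7589 respondents, whereas Valid Percent divides only by the 7478 who answered. A small Python sketch (the helper `percent_columns` is hypothetical) using the Table 5 counts shows the difference:

```python
def percent_columns(valid_counts: dict, missing: int) -> dict:
    """Percent uses all respondents; Valid Percent excludes missing answers."""
    n_valid = sum(valid_counts.values())
    n_total = n_valid + missing
    return {
        category: (round(100 * freq / n_total, 1),  # Percent
                   round(100 * freq / n_valid, 1))  # Valid Percent
        for category, freq in valid_counts.items()
    }


# Counts from Table 5: 111 respondents did not answer
print(percent_columns({"Yes": 3222, "No": 4256}, missing=111))
# {'Yes': (42.5, 43.1), 'No': (56.1, 56.9)}
```

The 43% and 57% figures quoted above are the Valid Percent values.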

Next, a one-tailed test about a population proportion will be conducted.

3. One-Tailed Test About a Population Proportion (Hypothesis Test)


Rationale for conducting a one-tailed test about a population proportion:

National Statistic (2013) estimated that 45% of UK pupils (aged 11 to 15) had drunk alcohol at least once. However, in the valid sample of 7478 UK students used in this report, 3222 pupils (43.1%) had drunk alcohol at least once. It is therefore of interest to investigate whether the population proportion is really 45%, or whether it is lower, as the sample suggests.

$H_0: \pi = 0.45$

$H_1: \pi < 0.45$

Level of significance: $\alpha = 0.05$

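The test statistic is calculated using the following formula:

$$ Z = \frac{p - \pi_0}{\sigma_p} $$

where $p$ is the sample proportion and $\sigma_p$ is the standard error of the sample proportion under $H_0$:

$$ \sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} $$

This assumes that $n\pi_0 \ge 5$ and $n(1 - \pi_0) \ge 5$.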

Checking the assumption:

$7478 \times 0.45 = 3365.1 \ge 5$ and $7478 \times 0.55 = 4112.9 \ge 5$, so the assumption holds and the researcher proceeds to calculate the test statistic.

Test statistic calculation:

$$ \sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} = \sqrt{\frac{0.45 \times 0.55}{7478}} = 0.00575 $$

$$ Z = \frac{p - \pi_0}{\sigma_p} = \frac{0.431 - 0.45}{0.00575} = -3.30 $$

Using the critical value approach:

At the 5% significance level, the critical value is $-1.645$.

Since $Z = -3.30 < -1.645$, reject $H_0$.


Checking using the p-value approach:

From the standard normal cumulative probability table, for $z = -3.30$ the p-value is 0.0005.

Since the p-value of 0.0005 is less than 0.05, reject $H_0$; both approaches give the same decision.

There is sufficient evidence to reject $H_0$ in favour of $H_1$. The researcher concludes, at the 5% significance level, that the proportion of UK pupils who have drunk alcohol at least once is less than 45%.
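The calculation above can be checked with a short Python sketch using only the standard library (`statistics.NormalDist`, Python 3.8 or later). The function name `one_tailed_proportion_test` is hypothetical. Note one small difference: the report rounds the sample proportion to 0.431 before computing Z, whereas the sketch uses the unrounded 3222/7478 ≈ 0.4309, giving Z ≈ -3.33 and a p-value of roughly 0.0004. The decision to reject H0 is unchanged.

```python
import math
from statistics import NormalDist


def one_tailed_proportion_test(successes: int, n: int, pi0: float,
                               alpha: float = 0.05) -> dict:
    """Lower-tailed Z test of H0: pi = pi0 against H1: pi < pi0."""
    # Normal approximation requires n*pi0 >= 5 and n*(1 - pi0) >= 5
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not justified")
    p = successes / n
    se = math.sqrt(pi0 * (1 - pi0) / n)     # standard error under H0
    z = (p - pi0) / se
    critical = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    p_value = NormalDist().cdf(z)           # lower-tail probability
    return {"p": p, "se": se, "z": z, "critical": critical,
            "p_value": p_value, "reject_h0": z < critical}


# 3222 of 7478 valid respondents have drunk alcohol; H0 proportion is 0.45
print(one_tailed_proportion_test(3222, 7478, 0.45))
```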

Next, a Chi-square test of association will be conducted.


4. Chi-square test of association

Rationale for conducting Chi-square test:

Borsari and Carey (2001) claimed that excessive drinking is associated with peer pressure among university students. It is therefore of interest to test this claim among younger pupils. In addition, peer pressure is a nominal variable and student drinking frequency is recorded in eight categories (see Appendix 2), so both variables can be cross-tabulated and a Chi-square test of association is appropriate.

$H_0$: Peer pressure and student drinking frequency are independent

$H_1$: Peer pressure and student drinking frequency are dependent

Level of significance: 0.05

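The test statistic is calculated as follows.

Calculate the expected frequencies (see Appendix 2) using:

$$ e_{ij} = \frac{R_i \times C_j}{n} $$

Then calculate the Pearson Chi-square statistic (see Table 6) using:

$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} $$

where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, $o_{ij}$ and $e_{ij}$ are the observed and expected counts, and $n$ is the number of valid cases.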

Table 6. Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 155.105 | 7 | .000 |
| Likelihood Ratio | 152.157 | 7 | .000 |
| Linear-by-Linear Association | 130.174 | 1 | .000 |
| N of Valid Cases | 6981 | | |

Compare with the critical value:

$$ \chi^2_{(r-1)(c-1),\,\alpha} = \chi^2_{(2-1)(8-1),\,0.05} = \chi^2_{7,\,0.05} = 14.067 $$

Since the Chi-square value of 155.105 > 14.067, there is sufficient evidence to reject $H_0$. There is an association between peer pressure and student drinking frequency at the 5% significance level.
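As a cross-check on Table 6, the Python sketch below (the helper `chi_square_test` is hypothetical; standard library only) computes the expected counts with $e_{ij} = R_i C_j / n$ and the Pearson statistic from the observed counts in Appendix 2. The critical value 14.067 is taken from the Chi-square table, as in the report, rather than computed. On these counts the statistic comes out at about 155.105 with 7 degrees of freedom, matching the SPSS output.

```python
def chi_square_test(observed: list) -> tuple:
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell: e_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof, expected


# Observed counts from Appendix 2 (rows: True, False; 8 frequency categories)
observed = [
    [10, 64, 96, 199, 267, 843, 133, 2599],
    [7, 108, 147, 216, 252, 520, 68, 1452],
]
chi2, dof, expected = chi_square_test(observed)
CRITICAL_7_DF_5PCT = 14.067  # table value used in the report
print(round(chi2, 3), dof, chi2 > CRITICAL_7_DF_5PCT)
```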

The Chi-square test only tests whether a relationship exists; therefore, the researcher uses the Phi coefficient, Cramer's V and the contingency coefficient to measure the strength of association.

Table 7. Symmetric Measures

| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .149 | .000 |
| | Cramer's V | .149 | .000 |
| | Contingency Coefficient | .147 | .000 |
| N of Valid Cases | | 6981 | |

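The Phi coefficient (Table 7) is calculated using the formula:

$$ \varphi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{155.105}{6981}} = 0.149 $$

Phi can take values in $[-1, 1]$ only for a 2×2 table; for a larger table such as this 2×8 one it is computed as above and is non-negative, so only its size is interpretable. Phi = 0.149 indicates a weak association. The significance value of .000 (p < 0.001) means that Phi is statistically significant.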

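Cramer's V (Table 7) is calculated using the formula:

$$ V = \sqrt{\frac{\chi^2}{n \times \min(r - 1,\ c - 1)}} = 0.149 $$

Cramer's V takes values between 0 and 1. Because $\min(r - 1, c - 1) = 1$ for a 2×8 table, V equals Phi here. V = 0.149 indicates a weak association, and the significance value of .000 means that Cramer's V is statistically significant.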

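The contingency coefficient (Table 7) is calculated using the formula:

$$ C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = 0.147 $$

The contingency coefficient is at least 0 but cannot reach 1: for a table with two rows its maximum is $\sqrt{1/2} \approx 0.707$. C = 0.147 indicates a weak association, and the significance value of .000 means that the contingency coefficient is statistically significant.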

Therefore, it can be concluded that there is a statistically significant association between peer pressure and student drinking frequency, but the association is weak.
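The three strength measures in Table 7 can all be derived from the Chi-square value. The Python sketch below (the helper `association_measures` is hypothetical) takes the values reported in Table 6 and also computes the upper bound $\sqrt{(k-1)/k}$ of the contingency coefficient, with $k = \min(r, c)$, which is useful when interpreting C because, unlike Cramer's V, C cannot reach 1.

```python
import math


def association_measures(chi2: float, n: int, rows: int, cols: int) -> dict:
    """Phi, Cramer's V and contingency coefficient from a chi-square value."""
    phi = math.sqrt(chi2 / n)  # non-negative form used for tables beyond 2x2
    cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    # Upper bound of C, with k = min(rows, cols)
    k = min(rows, cols)
    c_max = math.sqrt((k - 1) / k)
    return {"phi": phi, "cramers_v": cramers_v,
            "contingency": contingency, "contingency_max": c_max}


# Values from Table 6: chi-square = 155.105, N = 6981, 2 x 8 table
measures = association_measures(155.105, 6981, rows=2, cols=8)
print({name: round(value, 3) for name, value in measures.items()})
# {'phi': 0.149, 'cramers_v': 0.149, 'contingency': 0.147, 'contingency_max': 0.707}
```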

Next, correlation analysis will be conducted.

5. Correlation


Rationale for conducting correlation analysis:

National Statistic (2012) reported that family attitude and student drinking frequency are associated. The researcher is interested in investigating the strength and type of this relationship using correlation. Both variables are treated as scale measurements in this report, so correlation analysis is considered suitable.

Key theory of correlation:

Correlation is a measure of linear association and does not necessarily indicate causation. The correlation coefficient takes values between -1 and +1. Values near -1 indicate a strong negative linear relationship, values near +1 indicate a strong positive linear relationship, and values near 0 indicate little or no linear relationship.

Table 8. Correlation between family attitude and drinking frequency

| | | Family attitudes to pupil drinking | Usual drinking frequency |
|---|---|---|---|
| Family attitudes to pupil drinking | Pearson Correlation | 1 | .553** |
| | Sig. (2-tailed) | | .000 |
| | N | 7183 | 7147 |
| Usual drinking frequency | Pearson Correlation | .553** | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 7147 | 7314 |

**. Correlation is significant at the 0.01 level (2-tailed).

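From Table 8, parents' attitude to pupil drinking and usual drinking frequency have a moderate positive linear correlation of 0.553, which is significant at the 0.01 level. This suggests that student drinking frequency is related to parents' attitude: pupils whose families disapprove of drinking tend to drink less often, while pupils whose parents do not mind them drinking tend to drink more often. This reading of the sign depends on how both variables are coded. In the dataset the drinking frequency categories run from "Almost every day" to "Never had a drink" (Appendix 2), so the interpretation holds only if family attitude is coded in the same direction, from most permissive to most disapproving.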

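The Pearson correlation coefficient is calculated using the following formula:

$$ r = \frac{s_{xy}}{s_x s_y} = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}} = 0.553 $$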

where:

$s_{xy}$ = sample covariance (a measure of the linear association between two variables)

$s_x$ = standard deviation of $x$

$s_y$ = standard deviation of $y$
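The final form of the formula, in which the n − 1 terms cancel, can be sketched directly in Python. The helper `pearson_r` and the paired values are hypothetical illustrations, not the survey data. SPSS uses only pairs in which both values are present (N = 7147 in Table 8), so the sketch assumes incomplete pairs have already been removed.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least two values")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # The (n - 1) divisors of covariance and standard deviations cancel out
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```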


6. Further Analysis

The researcher would like to conduct multiple regression analysis on the dataset to model the relationship between pupil drinking behaviour (dependent variable) and other independent variables. Factor analysis could also be applied to identify and confirm the dimensionality of existing scales. Furthermore, the researcher would like to conduct cluster analysis on the dataset to segment student drinkers based on different characteristics. These further analyses would allow researchers to gain insight into student drinkers so that effective action could be taken to curb alcohol consumption among young pupils.

7. Limitation

The use of stratified sampling in this research could lead to sampling bias, as strata are difficult to identify. In this research, the strata were defined by school type (comprehensive, secondary modern, grammar and private). One key assumption of stratified sampling is that each stratum is homogeneous. However, the strata may be heterogeneous: for example, private schools include single-sex and mixed schools, as well as international schools.

Another limitation of this research was questionnaire administration. Students were given a paper copy of the questionnaire and asked to complete it within 60 minutes, under exam conditions with teacher supervision. The paper format contributed to many missing values, as students did not answer all questions. In addition, the questionnaire was too long, so students might have lost interest and not completed it. Lastly, the presence of a supervising teacher might have pressured students to provide socially desirable answers.

Future researchers could administer computer-based questionnaires with skip logic and compulsory questions, which would reduce the number of missing values. The questionnaire could be shortened to around 20 minutes to prevent students from losing interest. Lastly, there would be no teacher supervision, to avoid any psychological pressure on students.

8. Conclusion

In conclusion, this report has presented descriptive and summary statistics about the sample. It also conducted a hypothesis test on the population proportion of UK pupils who have drunk alcohol at least once, concluding that this proportion is less than 45% ($Z = -3.30$, p-value = 0.0005). A Chi-square test was conducted to ascertain whether peer pressure and drinking frequency are associated; the association is statistically significant but weak ($\chi^2 = 155.105$, Cramer's V = 0.149). Correlation analysis was conducted to measure the strength and type of relationship between family attitude and drinking frequency, finding a moderate positive correlation ($r = 0.553$). Lastly, this report discussed further analysis and the limitations of the dataset.

References


Appendix 1. Survey questions

Are you a boy or a girl?

Boy (1)

Girl (2)

Which year are you at school?

Year 7 (1)

Year 8 (2)

Year 9 (3)

Year 10 (4)

Year 11 (5)

How old are you now?

_______Years old

Have you ever had a proper alcoholic drink?

Yes (1)

No (2)

How often do you usually have an alcoholic drink in a month?

0-3 times (1)

4-7 times (2)

8-11 times (3)

12-15 times (4)

16-19 times (5)

20-23 times (6)

24-27 times (7)

28-31 times (8)

How do your parents/guardian feel about you drinking alcohol?

They won’t like me to drink alcohol at all (1)

They don’t like but allow me to drink limited amount (2)

They won’t mind as long as I don’t drink too much (3)

They would let me drink as much as I like (4)


Write down the number of pints, half pints, large and small cans or bottles of alcohol that you have consumed in the past 7 days.

_____Pints

_____Half pints

_____Large can

_____Small can

_____bottle

I drink due to peer pressure

Yes (1)

No (2)


Appendix 2. Cross Tab

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat) | 6981 | 92.0% | 608 | 8.0% | 7589 | 100.0% |


People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat) Crosstabulation

| People my age drink because of pressure from friends | | Almost every day | About twice a week | About once a week | About once a fortnight | About once a month | A few times a year | Never drinks now | Never had a drink | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| True | Count | 10 | 64 | 96 | 199 | 267 | 843 | 133 | 2599 | 4211 |
| | Expected Count | 10.3 | 103.8 | 146.6 | 250.3 | 313.1 | 822.2 | 121.2 | 2443.6 | 4211.0 |
| False | Count | 7 | 108 | 147 | 216 | 252 | 520 | 68 | 1452 | 2770 |
| | Expected Count | 6.7 | 68.2 | 96.4 | 164.7 | 205.9 | 540.8 | 79.8 | 1607.4 | 2770.0 |
| Total | Count | 17 | 172 | 243 | 415 | 519 | 1363 | 201 | 4051 | 6981 |
| | Expected Count | 17.0 | 172.0 | 243.0 | 415.0 | 519.0 | 1363.0 | 201.0 | 4051.0 | 6981.0 |