
CHAPTER – 4

TESTING OF HYPOTHESIS AND MODEL BUILDING

TESTING OF HYPOTHESIS – ORGANISED RETAIL OUTLETS

Testing of Hypothesis: 98

Inferential statistics is concerned with estimating the true value of population parameters using sample statistics. A statistical hypothesis is a claim (assertion, statement, belief or assumption) about an unknown population parameter value. On the basis of sample findings, the hypothesised value of the population parameter is either accepted or rejected. The process that enables a decision maker to test the validity (or significance) of a claim by analysing the difference between the value of the sample statistic and the corresponding hypothesised parameter value is called testing of hypothesis.

Hypothesis 1

One-Way ANOVA

Customers visiting an outlet fall into three groups (the independent variable): those who visit with family, with friends, or alone. Two continuous dependent variables are examined:

1. The time spent by the groups

2. The purchase made by the groups

Null Hypotheses

1. The mean time spent in the organised outlets is the same for the different groups of customers.

2. The mean amount of purchase made in the organised outlets is the same for the different groups of customers.

98 Sharma, J.K., Business Statistics, Pearson Education, Second Edition, 2007, pp327-329.


Table 4.1: Descriptive Statistics (Time Spent)

                                                                 95% Confidence Interval for Mean
Variable  Group      N    Mean      Std. Deviation  Std. Error   Lower Bound  Upper Bound  Minimum  Maximum
Gr_tim    Family    150   43.8667   15.02734        1.22698      41.4421      46.2912      13.00    113.00
          Friends   150   28.1000   10.17086         .83045      26.4590      29.7410       8.00     45.00
          Alone     150   21.4133    9.27448         .75726      19.9170      22.9097       3.00     40.00
          Total     450   31.1267   15.05371         .70964      29.7320      32.5213       3.00    113.00
Gr_purc   Family    150   931.50    343.741         28.066       876.04       986.96        200      2000
          Friends   150   609.00    227.425         18.569       572.31       645.69        150      1100
          Alone     150   462.83    238.577         19.480       424.34       501.33        100      1000
          Total     450   667.78    337.170         15.894       636.54       699.01        100      2000

Table 4.2: Test of Homogeneity of Variances (Time Spent – Levene Statistics)

Levene Statistic df1 df2 Sig.

gr_tim 15.555 2 447 .000

gr_purc 4.903 2 447 .008

The significance values of 0.000 for the time spent by the groups and 0.008 for the purchases made by the groups indicate that the variances of both variables differ significantly across the groups (Table 4.1 and Table 4.2).

The ANOVA table also shows a significance value of 0.000 in both cases, namely time spent by the groups and purchases made by the groups. This shows that the mean time spent and the mean amount of purchases made by the groups differ significantly, so both null hypotheses are rejected (Table 4.3).


Table 4.3: ANOVA (Time Spent)

Variable  Source            Sum of Squares   df    Mean Square    F         Sig.
gr_tim    Between Groups     39872.573         2    19936.287     144.019   0.000
          Within Groups      61877.207       447      138.428
          Total             101749.780       449
gr_purc   Between Groups     1.725E7           2  8625484.722     114.094   0.000
          Within Groups      3.379E7         447    75599.683
          Total              5.104E7         449
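The F statistic for time spent in Table 4.3 can be cross-checked from the group summaries in Table 4.1 alone. The following is a minimal sketch, assuming only the reported group sizes, means and standard deviations; the function name `one_way_anova_from_summary` is illustrative and is not part of the SPSS output.

```python
# Group summary statistics for time spent, copied from Table 4.1.
groups = {
    "Family":  {"n": 150, "mean": 43.8667, "sd": 15.02734},
    "Friends": {"n": 150, "mean": 28.1000, "sd": 10.17086},
    "Alone":   {"n": 150, "mean": 21.4133, "sd": 9.27448},
}


def one_way_anova_from_summary(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    total_n = sum(g["n"] for g in groups.values())
    grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / total_n
    # Between-groups variation: group means around the grand mean.
    ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
    # Within-groups variation: recovered from each group's standard deviation.
    ss_within = sum((g["n"] - 1) * g["sd"] ** 2 for g in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova_from_summary(groups)
print(round(ss_between, 1), round(ss_within, 1), df_between, df_within, round(f_stat, 2))
```

Because the means in Table 4.1 are rounded to four decimals, the sums of squares agree with Table 4.3 only to within rounding error.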

Table 4.4: Contrast Coefficients (Time Spent) – I

Contrast gr_name

Family Friends Alone

1 0 1 -1

2 1 0 -1

Table 4.5: Contrast Tests (Time Spent) – II

Variable  Condition                        Contrast  Value of Contrast  Std. Error  t        df       Sig. (2-tailed)
gr_tim    Assume equal variances           1           6.6867            1.35857     4.922   447      .000
                                           2          22.4533            1.35857    16.527   447      .000
          Does not assume equal variances  1           6.6867            1.12387     5.950   295.499  .000
                                           2          22.4533            1.44184    15.573   248.127  .000
Gr_purc   Assume equal variances           1         146.17             31.749       4.604   447      .000
                                           2         468.67             31.749      14.762   447      .000
          Does not assume equal variances  1         146.17             26.912       5.431   297.320  .000
                                           2         468.67             34.164      13.718   265.515  .000

The contrast tests show that the mean time spent and the mean purchases made by customers visiting alone and customers visiting with friends (contrast 1) were significantly different under both conditions, namely ‘assuming equal variances’ and ‘not assuming equal variances’. Since Levene’s test shows that the variances are different, the results that do not assume equal variances are the ones considered (Table 4.4 to Table 4.6).

The contrast tests show that the mean time spent and the mean purchases made by customers visiting alone and customers visiting with family (contrast 2) were significantly different under both conditions, namely ‘assuming equal variances’ and ‘not assuming equal variances’. Here too, since Levene’s test shows that the variances are different, the results that do not assume equal variances are the ones considered (Table 4.4 to Table 4.6).
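The ‘does not assume equal variances’ rows can be reproduced from Table 4.1 for a contrast between two groups. The sketch below assumes the usual Welch standard error and the Welch-Satterthwaite approximation for the degrees of freedom; `welch_contrast` is an illustrative name, not an SPSS function.

```python
import math


def welch_contrast(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Contrast (mean_a - mean_b) without assuming equal variances."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group a's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group b's mean
    value = mean_a - mean_b
    std_error = math.sqrt(var_a + var_b)
    t_stat = value / std_error
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return value, std_error, t_stat, df


# Contrast 1 for time spent: Friends minus Alone (values from Table 4.1).
value, std_error, t_stat, df = welch_contrast(28.1000, 10.17086, 150,
                                              21.4133, 9.27448, 150)
print(round(value, 4), round(std_error, 5), round(t_stat, 3), round(df, 1))
```

Swapping in the Family and Alone figures gives contrast 2 in the same way.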

Table 4.6: Multiple Comparisons – Least Significant Difference (LSD) Method (Time Spent)

                                                                                        95% Confidence Interval
Dependent Variable  (I) gr_name  (J) gr_name  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
gr_tim              Family       Friends        15.76667*             1.35857    .000     13.0967      18.4366
                                 Alone          22.45333*             1.35857    .000     19.7834      25.1233
                    Friends      Family        -15.76667*             1.35857    .000    -18.4366     -13.0967
                                 Alone           6.68667*             1.35857    .000      4.0167       9.3566
                    Alone        Family        -22.45333*             1.35857    .000    -25.1233     -19.7834
                                 Friends        -6.68667*             1.35857    .000     -9.3566      -4.0167
gr_purc             Family       Friends       322.500*              31.749      .000    260.10       384.90
                                 Alone         468.667*              31.749      .000    406.27       531.06
                    Friends      Family       -322.500*              31.749      .000   -384.90      -260.10
                                 Alone         146.167*              31.749      .000     83.77       208.56
                    Alone        Family       -468.667*              31.749      .000   -531.06      -406.27
                                 Friends      -146.167*              31.749      .000   -208.56       -83.77

*. The mean difference is significant at the 0.05 level.

– In Table 4.6, the mean differences marked with an asterisk are those for which the means of the two groups in the pair differ significantly.

– The mean time spent in the outlets differs significantly for every pair of groups: family vis-a-vis friends, family vis-a-vis alone, and friends vis-a-vis alone.

– Similarly, it may be inferred from the table that the mean amount of purchases made differs significantly for every pair of groups: family vis-a-vis friends, family vis-a-vis alone, and friends vis-a-vis alone.

Hypothesis 2

One-Way ANOVA

H0: The mean percentage of sales over the different periods of a month (namely days 1-10, 11-20 and 21-30) is the same.

H1: The mean percentage of sales over the different periods of a month (namely days 1-10, 11-20 and 21-30) is not the same.

Table 4.7: Descriptives (Percentage of Sales)

                                                   95% Confidence Interval for Mean
Days    N    Mean    Std. Deviation  Std. Error    Lower Bound  Upper Bound  Min  Max
1-10    150  46.53    8.608          .703          45.14        47.92        30   75
11-20   150  27.22    7.175          .586          26.06        28.38        10   50
21-30   150  26.52    8.216          .671          25.19        27.85        10   40
Total   450  33.42   12.256          .578          32.29        34.56        10   75

Table 4.8: Test of Homogeneity of Variances (Percentage of Sales)

Levene Statistic  df1  df2  Sig.
3.708             2    447  .025

The Levene statistic has a significance value of 0.025. This shows that the variances of the percentage of sales over the different periods of a month do differ significantly (Table 4.7 and Table 4.8).

Table 4.9: ANOVA (Percentage of Sales)

Source          Sum of Squares  df   Mean Square  F        Sig.
Between Groups  38675.204         2  19337.602    300.464  .000
Within Groups   28768.573       447     64.359
Total           67443.778       449

The ANOVA table also shows the significance value of 0.000. This shows

that the mean of the percentage of sales over different period of a month differ

significantly (Table 4.9).

Table 4.10: Contrast Coefficients (Percentage of Sales)

Contrast Days of a Month

1-10 11-20 21-30

1 1 0 -1

2 0 -1 1

3 -1 1 0

Table 4.11: Contrast Tests (Percentage of Sales)

Percentage of Sales              Contrast  Value of Contrast  Std. Error  t        df       Sig. (2-tailed)
Assume equal variances           1          20.01             .926         21.597  447      .000
                                 2           -.70             .926          -.756  447      .450
                                 3         -19.31             .926        -20.842  447      .000
Does not assume equal variances  1          20.01             .972         20.591  297.353  .000
                                 2           -.70             .891          -.786  292.690  .433
                                 3         -19.31             .915        -21.101  288.628  .000

Under both conditions, namely ‘assuming equal variances’ and ‘not assuming equal variances’ (the latter being applicable in the present context), the contrast tests show that the mean percentage of sales over the first ten days (1st-10th) differs significantly from that of the next ten days (11th-20th) and also from that of the last ten days (21st-30th) of a month (Table 4.10 to Table 4.12).

Under both conditions, namely ‘assuming equal variances’ and ‘not assuming equal variances’ (the latter being applicable in the present context), the contrast tests show that the mean percentage of sales over the 11th-20th does not differ significantly from that of the last ten days (21st-30th) of a month (Table 4.10 to Table 4.12).

Table 4.12: Post Hoc Tests – Multiple Comparisons (Percentage of Sales), Least Significant Difference (LSD) Method

                                                                      95% Confidence Interval
(I) Day_Mon  (J) Day_Mon  Mean Difference (I-J)  Std. Error  Sig.     Lower Bound  Upper Bound
1-10         11-20          19.307*              .926        .000      17.49        21.13
             21-30          20.007*              .926        .000      18.19        21.83
11-20        1-10          -19.307*              .926        .000     -21.13       -17.49
             21-30            .700               .926        .450      -1.12         2.52
21-30        1-10          -20.007*              .926        .000     -21.83       -18.19
             11-20           -.700               .926        .450      -2.52         1.12

*. The mean difference is significant at the 0.05 level.

In Table 4.12, the asterisks show that the means of the given pair differ significantly. It could be inferred from the table that:

The mean percentage of sales over the first ten days (1st – 10th) differs significantly from that of the 11th – 20th and also the 21st – 30th of a month.

The mean percentage of sales over the second ten days (11th – 20th) differs significantly from that of the 1st – 10th but not from that of the 21st – 30th of a month.

The mean percentage of sales over the last ten days (21st – 30th) differs significantly from that of the 1st – 10th but not from that of the 11th – 20th of a month.
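The LSD comparisons in Table 4.12 follow from the within-groups mean square in Table 4.9 and the group means in Table 4.7. The sketch below is illustrative only: the critical value `T_CRIT` is an assumed approximation to the two-tailed 5% point of the t distribution with 447 degrees of freedom, and `lsd_comparison` is a hypothetical helper name.

```python
import math

MSE_WITHIN = 64.359   # within-groups mean square, Table 4.9
N_PER_GROUP = 150     # observations per period, Table 4.7
T_CRIT = 1.965        # assumed approximate critical t (5%, two-tailed, 447 df)

means = {"1-10": 46.53, "11-20": 27.22, "21-30": 26.52}  # Table 4.7


def lsd_comparison(a, b):
    """Return (difference, std. error, lower, upper, significant) for a pair."""
    diff = means[a] - means[b]
    # LSD uses the pooled error variance for every pair of groups.
    std_error = math.sqrt(MSE_WITHIN * (1 / N_PER_GROUP + 1 / N_PER_GROUP))
    margin = T_CRIT * std_error
    return diff, std_error, diff - margin, diff + margin, abs(diff) > margin


for pair in [("1-10", "11-20"), ("1-10", "21-30"), ("11-20", "21-30")]:
    diff, se, lower, upper, significant = lsd_comparison(*pair)
    print(pair, round(diff, 2), round(se, 3), round(lower, 2), round(upper, 2), significant)
```

A pair is flagged as significant exactly when its confidence interval excludes zero, which is why only the 11-20 versus 21-30 comparison is not starred.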


CROSS-TABULATION AND CHI-SQUARE ANALYSIS

Hypothesis 3

H0: Footfalls at an organised retail outlet are independent of the location of the outlet.

H1: Footfalls at an organised retail outlet are not independent of the location of the outlet.

Table 4.13: Case Processing Summary (Independence of Location)

                       Valid           Missing         Total
                       N    Percent    N    Percent    N    Percent
Location * Footfalls   126  100.0%     0    .0%        126  100.0%

Table 4.14: Location * Footfalls Cross-Tabulation

                                        Footfalls
Location              Statistic         101-150  151-200  201-250  Total
Main Road             Count             26       27       14       67
                      Expected Count    27.1     25.5     14.4     67.0
                      Residual          -1.1      1.5      -.4
Middle of the Street  Count             19        9        8       36
                      Expected Count    14.6     13.7      7.7     36.0
                      Residual           4.4     -4.7       .3
Shopping Complex      Count              6       12        5       23
                      Expected Count     9.3      8.8      4.9     23.0
                      Residual          -3.3      3.2       .1
Total                 Count             51       48       27       126
                      Expected Count    51.0     48.0     27.0     126.0


Table 4.15: Chi-Square Tests (Location)

                     Value    df  Asymp. Sig. (2-sided)
Pearson Chi-Square   5.492a   4   .240
Likelihood Ratio     5.644    4   .227
N of Valid Cases     126

a. 1 cells (11.1%) have expected count less than 5. The minimum expected count is 4.93.

Table 4.16: Symmetric Measures (Location)

                                 Value  Approx. Sig.
Nominal by Nominal  Phi          .209   .240
                    Cramer's V   .148   .240
N of Valid Cases                 126

Inference: The chi-square analysis of the hypothesis is given in Table 4.13 to Table 4.15. The Pearson chi-square value is 5.492 with 4 degrees of freedom, and its significance value of 0.240 is greater than 0.05, i.e. the null hypothesis that the footfalls at an organised retail outlet are independent of the location of the outlet cannot be rejected. The phi and Cramer’s V measures of association (Table 4.16) are also small and do not approach significance.

The test conducted showed that the footfall is not dependent on location alone. There may be other influencing factors that affect the footfalls.
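The chi-square statistic and the association measures reported in Table 4.15 and Table 4.16 can be recomputed from the observed counts in Table 4.14. This is a minimal standard-library sketch; the function names are illustrative, and the closed-form p-value used here is valid only when the degrees of freedom are even, as they are in this case.

```python
import math

# Observed counts from Table 4.14 (rows: location, columns: footfall band).
observed = [
    [26, 27, 14],  # Main Road
    [19, 9, 8],    # Middle of the Street
    [6, 12, 5],    # Shopping Complex
]


def chi_square_independence(table):
    """Return (chi-square, df, phi, Cramer's V) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))
    return chi2, df, phi, cramers_v


def chi_square_p_value_even_df(chi2, df):
    """Upper-tail probability of chi-square; closed form valid for even df only."""
    half = chi2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


chi2, df, phi, cramers_v = chi_square_independence(observed)
p_value = chi_square_p_value_even_df(chi2, df)
print(round(chi2, 3), df, round(p_value, 3), round(phi, 3), round(cramers_v, 3))
```

Since the p-value exceeds 0.05, the computation leads to the same decision as the inference above: the null hypothesis of independence is not rejected.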

Hypothesis 4

H0: Number of customers is independent of the size of the organised retail outlets.

H1: Number of customers is not independent of the size of the organised retail outlets.


Table 4.17: Case Processing Summary (Size)

                      Valid           Missing         Total
                      N    Percent    N    Percent    N    Percent
FD_Area * FD_NCust    147  100.0%     0    .0%        147  100.0%

Table 4.18: FD_Area * FD_NCust Cross-Tabulation

                                 FD_NCust
FD_Area       Statistic          1000 – 1500  1501 – 2000  2001 – 2500  2501 – 3000  Total
1000 – 1500   Count              45            0            0            0           45
              Expected Count     13.8         22.0          4.0          5.2         45.0
              Residual           31.2        -22.0         -4.0         -5.2
1501 – 2000   Count               0           72            0            0           72
              Expected Count     22.0         35.3          6.4          8.3         72.0
              Residual          -22.0         36.7         -6.4         -8.3
2001 – 2500   Count               0            0           13            0           13
              Expected Count      4.0          6.4          1.1          1.5         13.0
              Residual           -4.0         -6.4         11.9         -1.5
2501 – 3000   Count               0            0            0           17           17
              Expected Count      5.2          8.3          1.5          2.0         17.0
              Residual           -5.2         -8.3         -1.5         15.0
Total         Count              45           72           13           17           147
              Expected Count     45.0         72.0         13.0         17.0         147.0

Table 4.19: Chi-Square Tests (Size)

                     Value     df  Asymp. Sig. (2-sided)
Pearson Chi-Square   4.410E2   9   .000





Table 4.20: Symmetric Measures (Size)

                                 Value   Approx. Sig.
Nominal by Nominal  Phi          1.732   .000
                    Cramer's V   1.000   .000


Inference: The chi-square analysis (Table 4.17 to Table 4.19) gives a Pearson chi-square value of 441.0 with 9 degrees of freedom and a significance value of .000 (p < 0.05), i.e. the null hypothesis could not be accepted. It may be concluded that the number of customers for an organised retail outlet is not independent of the size of the outlet. Also, the phi and Cramer’s V measures of association (Table 4.20) show that the number of customers and the size of the organised outlets are strongly associated; the Cramer’s V of 1.000 reflects the fact that every outlet in Table 4.18 lies on the diagonal of the cross-tabulation.

Hypothesis 5

H0: Number of footfalls at an organised retail outlet is independent of the size of the outlet.

H1: Number of footfalls at an organised retail outlet is not independent of the size of the outlet.

Table 4.21: Case Processing Summary (Footfalls)

                      Valid           Missing         Total
                      N    Percent    N    Percent    N    Percent
FD_Area * Footfalls   132  100.0%     0    .0%        132  100.0%


Table 4.22: FD_Area * Footfalls Cross-Tabulation (Footfalls)

                                 Footfalls
FD_Area       Statistic          101-150  151-200  201-250  Total
1000 – 1500   Count              22       14        7       43
              Expected Count     17.3     16.9      8.8     43.0
              Residual            4.7     -2.9     -1.8
1501 – 2000   Count              29       31        7       67
              Expected Count     26.9     26.4     13.7     67.0
              Residual            2.1      4.6     -6.7
2001 – 2500   Count               1        6        4       11
              Residual           -3.4      1.7      1.8
2501 – 3000   Count               1        1        9       11
              Residual           -3.4     -3.3      6.8
Total         Count              53       52       27       132
              Expected Count     53.0     52.0     27.0     132.0

Table 4.23: Chi-Square Tests (Footfalls)

                     Value     df  Asymp. Sig. (2-sided)
Pearson Chi-Square   36.525a   6   .000

Table 4.24: Symmetric Measures (Footfalls)

                                 Value  Approx. Sig.
Nominal by Nominal  Phi          .526   .000
                    Cramer's V   .372   .000
N of Valid Cases                 132


Inference: The chi-square analysis (Table 4.21 to Table 4.23) gives a Pearson chi-square value of 36.525 with 6 degrees of freedom and a significance value of 0.000 (p < 0.05), i.e. the null hypothesis could not be accepted. It is concluded that the number of footfalls at an organised retail outlet is not independent of the size of the outlet. Also, the phi and Cramer’s V measures of association (Table 4.24) confirm the same.

CORRELATION ANALYSIS

Hypothesis 6

H0: Total Number of Customers depends on Catchment area of an organised retail outlet.

H1: Total Number of Customers does not depend on Catchment area of an organised retail outlet.

Table 4.25: Correlations (Catchment Area and No. of Customers) – I

                                        Catch_Area  No_of_Customers
Catch_Area       Pearson Correlation    1           .179*
                 Sig. (2-tailed)                    .028
                 N                      150         150
No_of_Customers  Pearson Correlation    .179*       1
                 Sig. (2-tailed)        .028
                 N                      150         150

*. Correlation is significant at the 0.05 level (2-tailed).

Table 4.26: Correlations (Catchment Area and No. of Customers) – II

Spearman's rho                              Catch_Area  No_of_Customers
Catch_Area       Correlation Coefficient    1.000       .160
                 Sig. (2-tailed)            .           .051
                 N                          150         150
No_of_Customers  Correlation Coefficient    .160        1.000
                 Sig. (2-tailed)            .051        .
                 N                          150         150


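Inference: The Pearson’s test (Table 4.25) shows that the number of customers for an outlet and the catchment area are positively, though weakly, correlated (r = .179, p = .028). The Spearman’s test (Table 4.26) gives a coefficient of similar size (.160), but its significance value of .051 falls marginally short of the 0.05 level.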

Hypothesis 7

H0: Footfalls per day depend on Catchment area of an organised retail outlet

H1: Footfalls per day do not depend on Catchment area of an organised retail outlet.

Table 4.27: Correlations (Footfalls Vs Catchment Area) – I

                                                    foot_fall_mid  Catch_Area
Foot_fall_mid  Pearson Correlation                  1              .240**
               Sum of Squares and Cross-products    681420.833     1625.667
               Covariance                           4573.294       10.911
               N                                    150            150
Catch_Area     Pearson Correlation                  .240**         1
               Sum of Squares and Cross-products    1625.667       67.173
               Covariance                           10.911         .451
               N                                    150            150

**. Correlation is significant at the 0.01 level (2-tailed).

Table 4.28: Correlations (Footfalls Vs Catchment Area) – II

Spearman's rho                            foot_fall_mid  Catch_Area
foot_fall_mid  Correlation Coefficient    1.000          .306**
               N                          150            150
Catch_Area     Correlation Coefficient    .306**         1.000
               N                          150            150



Inference: Both the Pearson’s and Spearman’s tests show (Table 4.27 and

Table 4.28) that the footfalls at an outlet and catchment area are correlated.
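The Pearson coefficient in Table 4.27 can be recovered from the covariance and the two variances printed in the same table, and the usual t statistic for testing a zero correlation follows from it. The helper name `pearson_from_moments` is illustrative only.

```python
import math


def pearson_from_moments(covariance, var_x, var_y, n):
    """Return (r, t) where t tests the hypothesis of zero correlation."""
    r = covariance / math.sqrt(var_x * var_y)
    # Under zero correlation, t follows a t distribution with n - 2 df.
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t_stat


# Covariance and variances for footfalls and catchment area (Table 4.27).
r, t_stat = pearson_from_moments(covariance=10.911, var_x=4573.294,
                                 var_y=0.451, n=150)
print(round(r, 3), round(t_stat, 2))
```

With 148 degrees of freedom, a t value of about 3.0 lies beyond the two-tailed 1% critical value (roughly 2.6), which is consistent with the double asterisk in the table.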

Hypothesis 8

H0: The sales of the outlets affected by the competition is not more than 10%

H1: The sales of the outlets affected by the competition is more than 10%

A one-sample t-test is used to test the hypothesis. The result is shown in Table 4.29 and Table 4.30. The p-value (0.973) is greater than 0.05 and hence not significant, i.e. statistically the null hypothesis cannot be rejected. It can be inferred that competition has affected the sales of the retail outlets by not more than 10%.

Table 4.29: One-Sample Statistics (Competition)

                 N    Mean     Std. Deviation  Std. Error Mean
Compet_Sal_Mid   150  10.0167  6.00263         .49011

Table 4.30: One-Sample Test (Competition)

Test Value = 10
                                                                95% Confidence Interval of the Difference
                 t     df   Sig. (2-tailed)  Mean Difference    Lower    Upper
Compet_Sal_Mid   .034  149  .973             .01667             -.9518   .9851
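The t value in Table 4.30 follows directly from the figures in Table 4.29. The sketch below is a hand computation under the standard one-sample t formula; `one_sample_t` is an illustrative name.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, test_value):
    """Return (standard error, t statistic, degrees of freedom)."""
    std_error = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - test_value) / std_error
    return std_error, t_stat, n - 1


# Percentage of sales affected by competition, tested against 10 (Table 4.29).
std_error, t_stat, df = one_sample_t(10.0167, 6.00263, 150, 10)
print(round(std_error, 5), round(t_stat, 3), df)
```

A t value this close to zero means the sample mean is practically indistinguishable from the test value of 10, which is why the null hypothesis is retained.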

TESTING OF HYPOTHESIS – KIRANA STORES

Hypothesis 9

Correlation Analysis of Store Size and Number of Customers

To know whether a relationship exists between the store size and the number of customers, correlation analysis was undertaken. The result is as follows.

H0: The number of customers of a store is dependent on size of the store

H1: The number of customers of a store is not dependent on size of the store

Table 4.31: Correlation Between the Size and the Number of Customers (Kirana Stores)

                                Size     NumCust
Size      Pearson Correlation   1.000    .407**
          N                     200      200
NumCust   Pearson Correlation   .407**   1.000
          N                     200      200

**. Correlation is significant at the 0.01 level (2-tailed).

Inference: The correlation analysis (Table 4.31) shows that there exists a statistically significant, moderate positive relationship between the size of a store and the number of customers. The Pearson correlation is r = 0.407, p < 0.01 (two-tailed).

Hypothesis 10

H0: Number of relatives is independent of number of staff working in a kirana store.

H1: Number of relatives depends on number of staff working in a kirana store.

Table 4.32: Correlations (Relatives Vs No. of Staff)

Spearman's rho and Pearson's          Staff    Relatives
Staff      Correlation Coefficient    1.000    .380**
           N                          200      200
Relatives  Correlation Coefficient    .380**   1.000
           N                          200      200



The testing of the hypothesis shows a p-value of less than 0.05 (Table 4.32). Therefore, statistically, the null hypothesis that the number of relatives is independent of the number of staff working in a kirana store could not be accepted. That is, the number of relatives working in a kirana store depends on the number of staff, i.e. the more the staff, the more the relatives working in a store.

Hypothesis 11

Multiple Correlation

H0: Average sales is dependent on size of the store, number of customers of a store, number of footfalls per day, average time spent by a customer.

H1: Average sales is not dependent on size of the store, number of customers of a store, number of footfalls per day, average time spent by a customer.

The analysis (Table 4.33 to Table 4.35) shows that the average sales is dependent on the number of footfalls per day, the average sales per day and the time spent by a customer (the three predictors with Sig. < 0.05 in Table 4.34), whereas the coefficients for Size and NumCust are not significant (Sig. = .426 and .685 respectively).

Table 4.33: Variables Entered / Removed b (Multiple Correlation)

Model  Variables Entered                                      Variables Removed  Method
1      Avg_time_spent, Aver_cust, Avg_sales, Size, NumCust a  .                  Enter

a. All requested variables entered.
b. Dependent Variable: Avg_sale_store.

Table 4.34: Coefficients a (Multiple Correlation)

                  Unstandardized Coefficients  Standardized                       95% Confidence Interval for B
Model 1           B           Std. Error       Beta          t         Sig.       Lower Bound  Upper Bound
(Constant)        -3596.498   238.631                        -15.071   .000       -4067.142    -3125.853
Size                  -.287      .360          -.023           -.798   .426           -.996         .422
NumCust                .401      .987           .012            .406   .685          -1.546        2.347
Aver_cust            48.262     3.057           .433          15.789   .000          42.233       54.290
Avg_sales            70.632     2.343           .719          30.142   .000          66.010       75.253
Avg_time_spent       49.916    21.550           .065           2.316   .022           7.414       92.418

a. Dependent Variable: Avg_sale_store
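Each t value in Table 4.34 is the unstandardized coefficient divided by its standard error, so the significant predictors can be picked out by hand. In the sketch below the critical value is an assumption: it supposes 200 stores and 5 predictors, giving 194 residual degrees of freedom, for which the two-tailed 5% critical t is roughly 1.97.

```python
# (B, Std. Error) pairs copied from Table 4.34.
coefficients = {
    "Size": (-0.287, 0.360),
    "NumCust": (0.401, 0.987),
    "Aver_cust": (48.262, 3.057),
    "Avg_sales": (70.632, 2.343),
    "Avg_time_spent": (49.916, 21.550),
}

# Assumed approximate critical value (5%, two-tailed, about 194 df).
T_CRIT = 1.97


def t_ratio(b, std_error):
    """t statistic for testing whether a regression coefficient is zero."""
    return b / std_error


t_ratios = {name: t_ratio(b, se) for name, (b, se) in coefficients.items()}
significant = [name for name, t in t_ratios.items() if abs(t) > T_CRIT]

for name, t in t_ratios.items():
    print(f"{name:15s} t = {t:7.3f}")
print("Significant at 5%:", significant)
```

The ratios differ from the printed t values only in the last decimal, because B and the standard errors are themselves rounded in the table.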


Table 4.35: Coefficient Correlations a (Multiple Correlation)

Model 1                        Avg_time_spent  Aver_cust  Avg_sales  Size     NumCust
Correlations  Avg_time_spent     1.000          -.073      -.095     -.534    -.027
              Aver_cust          -.073          1.000      -.100      .207    -.532
              Avg_sales          -.095          -.100      1.000      .027    -.192
              Size               -.534           .207       .027     1.000    -.345
              NumCust            -.027          -.532      -.192     -.345    1.000
Covariances   Avg_time_spent   464.397         -4.824     -4.801    -4.135    -.581
              Aver_cust         -4.824          9.344      -.716      .228   -1.605
              Avg_sales         -4.801          -.716      5.491      .023    -.445
              Size              -4.135           .228       .023      .129    -.123
              NumCust            -.581         -1.605      -.445     -.123     .974

a. Dependent Variable: Avg_sale_store

Comparison of traditional kirana stores and modern organised retail outlets

When sample data do not meet the basic assumptions that underlie the parametric procedure (e.g. normality or homogeneity of variance), nonparametric methods are used. The Kolmogorov-Smirnov test is used to find whether the given distribution is normally distributed or not. The Wilcoxon-Mann-Whitney independent two-sample test is used to test whether the two samples come from the same population.99

i. Comparison of Sales per Sq. foot (Rs.)

The investigator was interested in comparing the sales per square foot of

organised retail outlets and unorganised kirana stores. Table 4.36 and Figure 4.1

show the information regarding sales per square foot. Further, non-parametric test

was used to test the difference. The result is as follows.

99 Carver, Robert H. and Nash, Jane G, “Data Analysis with SPSS”, Cengage Learning, India Edition, 2009


Table 4.36: Comparison of Sales per Square foot

                 Kirana Stores                 Organised Outlets
Sales (Rs.)      No. of Outlets  Percentage    No. of Outlets  Percentage
<250              69              34.5          17              11.3
250 – 500         35              17.5          22              14.7
500 – 750         22              11.0          30              20.0
750 – 1000        15               7.5          48              32.0
1,000 – 1,250     28              14.0          22              14.7
1,250 – 1,500     16               8.0           7               4.7
1,500 – 1,750      5               2.5           2               1.3
1,750 – 2,000      6               3.0           1               0.7
>2,000             4               2.0           1               0.7
Total            200             100.0         150             100.0

Source: Field Survey.

Figure 4.1: Sales per Sq. foot (in Rs.)


Hypothesis 12

The one-sample Kolmogorov-Smirnov test shows that the sales per square foot of kirana stores were not normally distributed (p < 0.05), whereas the sales per square foot of organised retail outlets were normally distributed (p > 0.05) (Table 4.37).

Table 4.37: One-Sample Kolmogorov-Smirnov Test (Sales per Square Foot)

Sales per Sq. foot                         Kirana Stores  Organised Retail Outlets
N                                          200            150
Normal Parameters a       Mean             663.9800       758.8117
                          Std. Deviation   599.30898      363.54617
Most Extreme Differences  Absolute         .157           .062
                          Positive         .157           .062
                          Negative         -.143          -.050
Kolmogorov-Smirnov Z                       2.221          .755
Asymp. Sig. (2-tailed)                     0.000          0.620

a. Test distribution is Normal.
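The Kolmogorov-Smirnov Z values in Table 4.37 appear to be the largest absolute difference multiplied by the square root of the sample size. The short check below works under that assumption; small discrepancies arise because the differences are printed to three decimals only.

```python
import math


def ks_z(max_abs_difference, n):
    """Kolmogorov-Smirnov Z, taken here as sqrt(n) times the largest difference."""
    return math.sqrt(n) * max_abs_difference


z_kirana = ks_z(0.157, 200)      # kirana stores, Table 4.37
z_organised = ks_z(0.062, 150)   # organised retail outlets, Table 4.37
print(round(z_kirana, 3), round(z_organised, 3))
```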

The Wilcoxon-Mann-Whitney independent two-sample test is used to test whether the two samples come from the same population.

Table 4.38: Mann-Whitney Test (Sales per Square foot)

Ranks
               Kira_or_Organ             N    Mean Rank  Sum of Ranks
Sales_Sq_ft.   Organised Retail Outlets  150  196.97     29545.50
               Kirana Stores             200  159.40     31879.50
               Total                     350


Table 4.39: Test Statistics (Sales per Square Foot)

Test Statisticsa

Sales_Sq_Mt

Mann-Whitney U 11779.500

Wilcoxon W 31879.500

Z -3.438

Asymp. Sig. (2-tailed) 0.001

a. Grouping Variable: Kira_or_Organ

The result (Table 4.38 and Table 4.39) suggests that there exists a statistically significant difference between the underlying distributions of the sales per square foot of organised retail outlets and the kirana stores (z = -3.438, p < 0.05). The Mann-Whitney test is generally regarded as more conservative than a parametric test such as the t-test. The mean rank of the organised retail outlets (196.97) is higher than that of the kirana stores (159.40), i.e. the sales per square foot of an organised retail outlet is greater than that of a kirana store.
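The Mann-Whitney U and Z values in Table 4.39 can be derived from the rank sums in Table 4.38. The sketch below uses the large-sample normal approximation without a correction for ties, which is an assumption about how the reported Z was obtained; `mann_whitney_from_rank_sums` is an illustrative name.

```python
import math


def mann_whitney_from_rank_sums(rank_sum_a, n_a, rank_sum_b, n_b):
    """Return (U, z) using the normal approximation with no tie correction."""
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    u = min(u_a, u_b)  # the smaller of the two U values is reported
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, z


# Rank sums for organised retail outlets and kirana stores (Table 4.38).
u, z = mann_whitney_from_rank_sums(29545.50, 150, 31879.50, 200)
print(u, round(z, 3))
```

The sign of z is negative simply because the smaller U is used; its magnitude is what is compared with the critical value.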

The two-sample independent t-test (Table 4.40 and Table 4.41), by contrast, suggests that the mean sales per square foot of the two groups are not statistically different (t = 1.833, p > 0.05).

Table 4.40: Independent Two-Sample T-test (Sales per Square foot)

Group Statistics
Kira_or_Organ             N    Mean      Std. Deviation  Std. Error Mean
Organised Retail Outlets  150  758.8117  363.54617       29.68342
Kirana Stores             200  663.9800  599.30898       42.37754


Table 4.41: Equality of Variances (Sales per Square foot)

                              Levene's Test for
                              Equality of Variances  t-test for Equality of Means
                                                                                                                       95% Confidence Interval of the Difference
Sales_Sq_ft                   F       Sig.           t      df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower     Upper
Equal variances assumed       36.66   0.000          1.715  348     0.087            94.832           55.285                 -13.902   203.566
Equal variances not assumed                          1.833  334.60  0.068            94.832           51.739                 -6.9437   196.607

ii. Comparison of Sales per Employee (Rs. in Lakhs)

Comparison of the sales per employee of organised retail outlets and

unorganised kirana stores was done. Table 4.42 and Figure 4.2 show the

information regarding sales per employee. Also, non-parametric test was used to

compare the same. The result is as follows.

Table 4.42: Comparison of Sales per Employee (Rs. in Lakhs)

Organised Retail Outlets
Amount (Rs.)  No. of Outlets  Percentage
<0.3            3              2
0.3 - 0.6      18             12
0.6 - 0.9      32             21
0.9 - 1.2      21             14
1.2 - 1.5      36             24
1.5 - 1.8      24             16
1.8 - 2.1       6              4
2.1 - 2.4       2              1
2.4 - 2.7       0              0
2.7 - 3.0       4              3
3.0 - 3.3       1              1
3.3 - 3.6       1              1
> 3.6           2              1
Total         150            100

Kirana Stores
Amount (Rs.)  No. of Outlets  Percentage
<0.1           21             10.5
0.1 – 0.2      30             15
0.2 – 0.3      30             15
0.3 – 0.4      22             11
0.4 – 0.5      20             10
0.5 – 0.6      47             23.5
0.6 – 0.7       0              0
0.7 – 0.8      22             11
0.8 – 0.9       2              1
0.9 – 1.0       2              1
>1.0            4              2
Total         200            100

Source: Field Survey.



Figure 4.2: Sales per Employee (Rs. in Lakhs)

Hypothesis 13

The one-sample Kolmogorov-Smirnov test shows that the sales per employee of kirana stores (p = 0.04) and of organised retail outlets (p = 0.03) were not normally distributed (Table 4.43).

Table 4.43: One-Sample Kolmogorov-Smirnov Test (Sales per Employee)

Sales per Employee                         Kirana Stores  Organised Retail Outlets
N                                          200            150
Normal Parameters a       Mean             58303          124299
                          Std. Deviation   40358          68654
Most Extreme Differences  Absolute         0.099          0.118
                          Positive         0.099          0.118
                          Negative         -0.075         -0.076
Kolmogorov-Smirnov Z                       1.4            1.448
Asymp. Sig. (2-tailed)                     0.04           0.03



So, the Wilcoxon-Mann-Whitney independent two-sample test is used to test whether the two samples come from the same population.

Table 4.44: Mann-Whitney Test (Sales per Employee)

Kira_or_Organ N Mean Rank Sum of Ranks

Sales_Emp

Organised Retail Outlets 150 240.89 36134.00

Kirana Stores 200 126.46 25291.00

Total 350

Table 4.45: Test Statisticsa (Sales per Employee)

Sales_Emp



Z -10.474

Asymp. Sig. (2-tailed) .000

a. Grouping Variable: Kira_or_Organ.

The result (Table 4.44 and Table 4.45) suggests that there is a

statistically significant difference between the underlying distributions of sales

per employee of organised retail outlets and the kirana stores (z = -10.474, p

< 0.05). The two sample independent t-test also confirms the same (t =

10.492, p<0.05) (Table 4.46 and Table 4.47).

Table 4.46: Group Statistics (Sales per Employee)

Group Statistics
            Kira_or_Organ             N    Mean         Std. Deviation  Std. Error Mean
Sales_Emp   Organised Retail Outlets  150  124298.7654  68654.31821     5605.60161
            Kirana Stores             200   58303.0200  40358.49528     2853.77657


Table 4.47: Equality of Means (Sales per Employee)

                              Levene's Test for
                              Equality of Variances  t-test for Equality of Means
                                                                                                                        95% Confidence Interval of the Difference
Sales_Emp                     F       Sig.           t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower      Upper
Equal variances assumed       16.45   0.00           11.25   348     0.000            65995.75         5866.09                54458.29   77533.2
Equal variances not assumed                          10.492  224.93  0.00             65995.75         6290.22                53600.46   78391.03

iii. Comparison of Monthly Sales (Rs. in Lakhs)

Comparison of monthly sales of organised retail outlets and unorganised

kirana stores was done. Table 4.48 and Figure 4.3 show the information regarding

monthly sales. Also, non-parametric test was used to compare the same. The

result is as follows.

Table 4.48: Comparison of Monthly Sales (Rs. in Lakhs)

Organised Retail Outlets
Amount (Rs.)  No. of Outlets  Percentage
<5             19             12.7
5-10           23             15.3
10-15          43             28.7
15-20          40             26.7
20-25           6              4
25-30           9              6
30-35           6              4
35-40           3              2
>40             1              0.7
Total         150            100

Kirana Stores
Amount (Rs.)  No. of Outlets  Percentage
<0.25          21             10.5
0.25 - 0.50    14              7.0
0.50 - 0.75    25             12.5
0.75 - 1.0      0              0.0
1.0 - 1.25     37             18.5
1.25 - 1.50     0              0.0
1.50 - 1.75    48             24.0
1.75 - 2.0      0              0.0
2.0 - 2.25      7              3.5
2.25 - 2.50    25             12.5
2.5 - 2.75      3              1.5
2.75 - 3.0     15              7.5
>3.0            5              2.5
Total         200            100.0


Figure 4.3 : Monthly Sales (Rs. in Lakhs)

Hypothesis 14

The one-sample Kolmogorov-Smirnov test shows that the monthly sales of organised retail outlets (p = .003) and of the kirana stores (p = .001) were not normally distributed (Table 4.49).

Table 4.49: One-Sample Kolmogorov-Smirnov Test (Monthly Sales)

Monthly Sales                              Organised Retail Outlets  Kirana Stores
N                                          150                       200
Normal Parameters a       Mean             1.4777E6                  146838.7500
                          Std. Deviation   8.93519E5                 92592.35827
Most Extreme Differences  Absolute         .146                      .142
                          Positive         .146                      .142
                          Negative         -.088                     -.078
Kolmogorov-Smirnov Z                       1.786                     2.004
Asymp. Sig. (2-tailed)                     .003                      .001



The Wilcoxon-Mann-Whitney test is used to test the hypothesis

H0: The two samples come from same populations

H1: The two samples come from different populations

Table 4.50: Mann-Whitney Test (Monthly Sales)

Ranks
              Kira_or_Organ             N    Mean Rank  Sum of Ranks
Month_Sales   Organised Retail Outlets  150  275.09     41263.00
              Kirana Stores             200  100.81     20162.00
              Total                     350

Table 4.51: Test Statistics (Monthly Sales)

Test Statisticsa

Month_Sales



Z -15.972

Asymp. Sig. (2-tailed) .000

a. Grouping Variable: Kira_or_Organ

The result (Table 4.50 and Table 4.51) suggests that there is statistically

significant difference between the underlying distributions of the monthly sales of

organised retail outlets and the kirana stores (z = -15.972, p < 0.05).

The two-sample independent t-test also confirms the same (t = 18.17, p < 0.05) (Table 4.52 and Table 4.53).


Table 4.52: Independent t-Test (Monthly Sales)

Group Statistics
              Kira_or_Organ             N    Mean         Std. Deviation  Std. Error Mean
Month_Sales   Organised Retail Outlets  150  1.4777E6     8.93519E5       72955.54046
              Kirana Stores             200  146838.7500  92592.35827      6547.26844

Table 4.53: Test for Equality of Means (Monthly Sales)

                              Levene's Test for
                              Equality of Variances  t-test for Equality of Means
                                                                                                                         95% Confidence Interval of the Difference
Monthly_Sales                 F        Sig.          t       df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower      Upper
Equal variances assumed       147.387  0.000         20.925  348      0.000            1.33E+06         63602.33               1.21E+06   1.46E+06
Equal variances not assumed                          18.17   151.402  0.00             1.33E+06         73248.74               1.19E+06   1.48E+06

CLUSTER ANALYSIS

Cluster analysis100 is a collection of statistical methods that identify groups of samples that behave similarly or show similar characteristics. In common parlance these are also called look-alike groups. The simplest mechanism is to partition the samples using measurements that capture similarity or distance between samples. In this context, clusters and groups are interchangeable words. In market research studies, cluster analysis is often also referred to as a segmentation method.
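The idea of partitioning samples by distance can be illustrated with a plain k-means routine. This is only a sketch: the chapter does not state which clustering procedure was used, the outlet figures below are hypothetical, and seeding the centroids with the first k points is a simplifying assumption.

```python
import math


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical outlets: (area in thousand sq. ft., monthly sales in Rs. lakhs).
outlets = [(1.2, 6.0), (2.0, 14.0), (2.8, 30.0),
           (1.1, 5.0), (2.1, 15.0), (2.9, 32.0)]
labels, centroids = k_means(outlets, k=3)
print(labels)
print(centroids)
```

In practice the variables would normally be standardised first, since otherwise the variable with the largest numeric range dominates the distance.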

100 George, Darren and Mallery, Paul, “SPSS for Windows – Step by Step”, Pearson Education, 8th Edition, 2009.


Analysis 1: An attempt is made to segment the organised outlets on the

basis of select parameters such as area (sq. ft.), number of customers, footfalls,

catchment area (sq. kms), monthly sales, expenditure and PBDIT. The cluster

analysis resulted in 3 distinct clusters. The first cluster of outlets is of small

category, the second one is of medium category and the third cluster of outlets is of

large category. The percentage of outlets in each cluster is 16%, 33% and 51%

respectively. They are as follows (Table 4.54).

Table 4.54: Clusters based on performance

Cluster                              1             2             3
Area (Sq. ft.)                       1,000-1,500   1,500-2,500   2,000-3,000
No. of Customers                     1,000-2,000   1,000-2,000   1,500-3,000
Footfalls                            100-200       100-300       200-400
Catchment Area (Sq. Kms)             2-3           2-4           2-4
Monthly Sales (Rs. in Lakhs)         1-15          5-20          5-40
Expenditure as percentage of Sales   51-80         70-90         61-90
PBDIT (in Rs.)                       Upto 3 lakhs  Upto 4 lakhs  2 – 7 lakhs
No. of SKUs                          1,500-3,000   2,000-3,000   2,000-5,000
No. of Outlets                       24            49            77
Percentage                           16            33            51

Analysis 2: Cluster analysis is used to segment the outlets on the basis of time spent and amount purchased by customer groups. The analysis resulted in three clusters (Table 4.55). The first cluster consists of outlets where customers in all groups spent the least time and made the smallest purchases; it constitutes about one-fifth (19%) of the outlets surveyed. The second cluster, the largest, constitutes about half (51%) of the outlets; here the customers spent more time and made larger purchases. In the third cluster (29% of the outlets), the time spent and the purchases made were the highest among the three clusters.


Table 4.55: Clusters Based on Customer Groups

          Customers visit with Family   Customers visit with Friends   Customers visit Alone        Outlets
Cluster   Time Spent   Amt Purchased    Time Spent   Amt Purchased     Time Spent   Amt Purchased   Number   Percentage
1         21-30        500-1,000        11-20        250-500           <10          <250            29       19
2         31-40        750-1,500        21-30        500-750           11-20        250-500         77       51
3         41-50        1,000-1,500      31-40        500-1,000         21-30        500-1,000       44       29
Total                                                                                               150      100

MATHEMATICAL MODELLING

A mathematical model is a set of equations that describes the behaviour of a system. The majority of interacting systems in the real world are far too complicated to model in their entirety. But even if a model describes just a part of reality, it can be very useful for analysis and design, provided it describes the dominating dynamic properties of the system.¹⁰¹ Hence the first level of compromise is to identify the most important parts of the system. These are included in the model; the rest are excluded.

The second level of compromise concerns the amount of mathematical manipulation which is worthwhile. Although mathematics has the potential to prove general results, these results depend critically on the form of the equations used. Small changes in the structure of the equations may require enormous changes in the mathematical methods. Using computers to handle the model equations may never lead to elegant results, but it is much more robust against alterations.¹⁰²

101 Bender, E.A., An Introduction to Mathematical Modelling, Wiley, 1978.

102 Cross, M. and Moscardini, A.O., Learning the Art of Mathematical Modelling, Ellis Horwood Ltd., Chichester, 1985.


ORGANISED RETAIL OUTLETS – REGRESSION MODELS

Model 1 – Multiple Regression for Determining the Monthly Sales

The monthly sales of an organised retail outlet depend on the number of customers, footfalls and catchment area.

Table 4.56: Correlations (Number of Customers, Footfalls and Catchment Area)

                                        Area_Sqm   Catch_Area   foot_fall_mid   SKU_NUMBER   Mon_Sales   No_of_Customers
Area_Sqm          Pearson Correlation   1          .199*        .367**          .398**       .508**      .577**
                  Sig. (2-tailed)                  0.014        0.000           0.000        0.000       0.000
                  N                     150        150          150             150          150         150
Catch_Area        Pearson Correlation   .199*      1            .240**          .254**       .264**      .179*
                  Sig. (2-tailed)       0.014                   0.003           0.002        0.001       0.028
                  N                     150        150          150             150          150         150
foot_fall_mid     Pearson Correlation   .367**     .240**       1               .509**       .395**      .579**
                  Sig. (2-tailed)       0.000      0.003                        0.000        0.000       0.000
                  N                     150        150          150             150          150         150
SKU_NUMBER        Pearson Correlation   .398**     .254**       .509**          1            .475**      .593**
                  Sig. (2-tailed)       0.000      0.002        0.000                        0.000       0.000
                  N                     150        150          150             150          150         150
Mon_Sales         Pearson Correlation   .508**     .264**       .395**          .475**       1           .834**
                  Sig. (2-tailed)       0.000      0.001        0.000           0.000                    0.000
                  N                     150        150          150             150          150         150
No_of_Customers   Pearson Correlation   .577**     .179*        .579**          .593**       .834**      1
                  Sig. (2-tailed)       0.000      0.028        0.000           0.000        0.000
                  N                     150        150          150             150          150         150

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).


Table 4.57: Correlations - Spearman's rho (Number of Customers, Footfalls and Catchment Area)

                                            Area_Sqm   Catch_Area   foot_fall_mid   SKU_NUMBER   Mon_Sales   No_of_Customers
Area_Sqm          Correlation Coefficient   1          .285**       .408**          .377**       .290**      .405**
                  Sig. (2-tailed)           .          0.000        0.000           0.000        0.000       0.000
                  N                         150        150          150             150          150         150
Catch_Area        Correlation Coefficient   .285**     1            .306**          .262**       .182*       0.16
                  Sig. (2-tailed)           0.000      .            0.000           0.001        0.026       0.051
                  N                         150        150          150             150          150         150
foot_fall_mid     Correlation Coefficient   .408**     .306**       1               .434**       .242**      .453**
                  Sig. (2-tailed)           0.000      0.000        .               0.000        0.003       0.000
                  N                         150        150          150             150          150         150
SKU_NUMBER        Correlation Coefficient   .377**     .262**       .434**          1            .342**      .375**
                  Sig. (2-tailed)           0.000      0.001        0.000           .            0.000       0.000
                  N                         150        150          150             150          150         150
Mon_Sales         Correlation Coefficient   .290**     .182*        .242**          .342**       1           .812**
                  Sig. (2-tailed)           0.000      0.026        0.003           0.000        .           0.000
                  N                         150        150          150             150          150         150
No_of_Customers   Correlation Coefficient   .405**     0.16         .453**          .375**       .812**      1
                  Sig. (2-tailed)           0.000      0.051        0.000           0.000        0.000       .
                  N                         150        150          150             150          150         150

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).


Table 4.58: Model 1 (Number of Customers, Footfalls and Catchment Area)

Model Summary(d)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .834(a)   0.696      0.693               4.95E+05
2       .842(b)   0.709      0.705               4.85E+05
3       .852(c)   0.726      0.72                4.73E+05

a. Predictors: (Constant), No_of_Customers
b. Predictors: (Constant), No_of_Customers, Catch_Area
c. Predictors: (Constant), No_of_Customers, Catch_Area, foot_fall_mid
d. Dependent Variable: Mon_Sales

Table 4.59: Coefficients(a) (Number of Customers, Footfalls and Catchment Area)

                            Unstandardized Coefficients   Standardized Coefficients
Model                       B             Std. Error      Beta                        t        Sig.
1       (Constant)          -974328.6     139334                                      -6.993   0.000
        No_of_Customers     1264.218      68.752          0.834                       18.388   0.000
2       (Constant)          -1.31E+06     188232                                      -6.973   0.000
        No_of_Customers     1232.133      68.546          0.813                       17.975   0.000
        Catch_Area          157249.67     60176           0.118                       2.613    0.01
3       (Constant)          -1.26E+06     184218                                      -6.844   0.000
        No_of_Customers     1366.941      80.696          0.902                       16.939   0.000
        Catch_Area          187397.37     59498           0.141                       3.15     0.002
        foot_fall_mid       -2121.63      712.88          -0.161                      -2.976   0.003

a. Dependent Variable: Mon_Sales


Table 4.60: Excluded Variables(d) (Number of Customers, Footfalls and Catchment Area)

Model   Variable        Beta In     t        Sig.   Partial Correlation   Collinearity Statistics: Tolerance
1       Area_Sqm        .041(a)     .736     .463   .061                  .667
        Catch_Area      .118(a)     2.613    .010   .211                  .968
        foot_fall_mid   -.132(a)    -2.404   .017   -.195                 .665
        SKU_NUMBER      -.030(a)    -.540    .590   -.044                 .648
2       Area_Sqm        .024(b)     .442     .659   .037                  .658
        foot_fall_mid   -.161(b)    -2.976   .003   -.239                 .645
        SKU_NUMBER      -.060(b)    -1.060   .291   -.087                 .626
3       Area_Sqm        .029(c)     .540     .590   .045                  .657
        SKU_NUMBER      -.024(c)    -.420    .675   -.035                 .593

a. Predictors in the Model: (Constant), No_of_Customers
b. Predictors in the Model: (Constant), No_of_Customers, Catch_Area
c. Predictors in the Model: (Constant), No_of_Customers, Catch_Area, foot_fall_mid
d. Dependent Variable: Mon_Sales

The regression analysis using the ‘forward entry’ method shows that the sales of an outlet depend on the number of customers, footfalls and catchment area (Table 4.56 to Table 4.60).

The model explains about 73% of the variance in ‘monthly sales’ (R Square = 0.726, Adjusted R Square = 0.72; Table 4.58) with the three independent variables, namely number of customers, catchment area and footfalls. The footfalls coefficient is negative once the number of customers is held constant (Table 4.59).

Monthly Sales = 1,367 × (Number of customers) + 1,87,397 × (Catchment Area) - 2,122 × (Footfalls) - 12,61,000
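To show how the fitted equation is used, the sketch below evaluates it for a single outlet. The coefficients are those of the third model in Table 4.59; the function name `predict_monthly_sales` and the example outlet are hypothetical, with input values picked from within the ranges listed in Table 4.54.

```python
# Coefficients of the third model in Table 4.59 (dependent variable: Mon_Sales, in Rs.)
COEFFS = {
    "No_of_Customers": 1366.941,
    "Catch_Area": 187397.37,
    "foot_fall_mid": -2121.63,
}
INTERCEPT = -1.26e6  # constant as reported (rounded) in Table 4.59

def predict_monthly_sales(outlet):
    """Evaluate the fitted linear equation for one outlet (a dict of predictor values)."""
    return INTERCEPT + sum(b * outlet[name] for name, b in COEFFS.items())

# Hypothetical outlet: 2,000 customers, 3 sq. km catchment area, 250 footfalls
example = {"No_of_Customers": 2000, "Catch_Area": 3, "foot_fall_mid": 250}
predicted = predict_monthly_sales(example)
print(round(predicted))
```

For this illustrative outlet the prediction comes to roughly Rs. 15 lakhs a month. A linear model of this kind should not be trusted for inputs far outside the range of the surveyed outlets.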


Model 2 – Determining PBDIT using multiple regression

PBDIT for an organised outlet is estimated using multiple linear regression

and the model is as follows.

Table 4.61: Correlation (Estimation of PBDIT)

                                Sales    Per_Exp   PBDIT
Sales     Pearson Correlation   1        0.122     .918**
          Sig. (2-tailed)                0.137     0.000
          N                     150      150       150
Per_Exp   Pearson Correlation   0.122    1         -.237**
          Sig. (2-tailed)       0.137              0.003
          N                     150      150       150
PBDIT     Pearson Correlation   .918**   -.237**   1
          Sig. (2-tailed)       0.000    0.003
          N                     150      150       150


Table 4.62: Model Summary(c) (Estimation of PBDIT)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .918(a)   .842       .841                83208.489
2       .983(b)   .966       .966                38573.059

a. Predictors: (Constant), Sales
b. Predictors: (Constant), Sales, Per_Exp
c. Dependent Variable: PBDIT


Table 4.63: ANOVA(c) (Estimation of PBDIT)

Model                Sum of Squares   df    Mean Square   F         Sig.
1       Regression   5.479E12         1     5.479E12      791.349   .000(a)
        Residual     1.025E12         148   6.924E9
        Total        6.504E12         149
2       Regression   6.285E12         2     3.143E12      2.112E3   .000(b)
        Residual     2.187E11         147   1.488E9
        Total        6.504E12         149

a. Predictors: (Constant), Sales
b. Predictors: (Constant), Sales, Per_Exp
c. Dependent Variable: PBDIT

Table 4.64: Coefficients(a) (Estimation of PBDIT)

                       Unstandardized Coefficients   Standardized Coefficients
Model                  B              Std. Error     Beta                        t         Sig.
1       (Constant)     46042.109      13162.698                                  3.498     .001
        Sales          .215           .008           .918                        28.131    .000
2       (Constant)     862288.395     35597.418                                  24.223    .000
        Sales          .225           .004           .961                        63.070    .000
        Per_Exp        -11092.597     476.601        -.355                       -23.274   .000

a. Dependent Variable: PBDIT

In this model, 97% of the variance in PBDIT is explained by monthly sales and expenditure as a percentage of sales (R Square = .966; Table 4.61 to Table 4.64).

The model is

PBDIT = 0.225 × (Sales) - 11,093 × (Percentage Expenditure) + 8,62,288
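The summary figures for this model can be cross-checked against the sums of squares in Table 4.63. The sketch below is a minimal illustration, assuming the usual definitions of R Square, Adjusted R Square and the F ratio; the helper names are hypothetical.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation that the regression accounts for."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """Penalise R Square for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(ss_regression, ss_residual, n, k):
    """Ratio of the regression mean square to the residual mean square."""
    return (ss_regression / k) / (ss_residual / (n - k - 1))

# Model 2 rows of Table 4.63: n = 150 outlets, k = 2 predictors (Sales, Per_Exp)
ss_reg, ss_res, ss_tot = 6.285e12, 2.187e11, 6.504e12
r2 = r_squared(ss_reg, ss_tot)
adj = adjusted_r_squared(r2, n=150, k=2)
f = f_statistic(ss_reg, ss_res, n=150, k=2)
print(round(r2, 3), round(adj, 3), round(f))
```

Since the sums of squares are printed to four significant digits, the recomputed values agree with Tables 4.62 and 4.63 only up to rounding.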


Model 3 – Multiple linear regression estimation for determining the sales of a kirana store

The analysis showed that the average sales of a kirana store depend on the number of footfalls per day, the average sales per day and the time spent by a customer.

To confirm this proposition, regression analysis using the stepwise and backward methods was carried out. The analysis corroborates the earlier conclusion that the average sales of a kirana store depend on the number of footfalls per day, the average sales per day and the time spent by a customer (Table 4.65 to Table 4.69).

The multiple linear regression equation for determining the average sales of a kirana store is

Average Sales = 48.976 × Aver_cust + 70.746 × Avg_sales + 41.416 × Avg_time_spent - 3584.657

These coefficients differ slightly from the five-predictor estimates in Table 4.68, presumably because the equation comes from the reduced three-predictor model.

Table 4.65: Variables Entered/Removed(b) (Average Sales of a kirana Store)

Model   Variables Entered                                        Variables Removed   Method
1       Avg_time_spent, Aver_cust, Avg_sales, Size, NumCust(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Avg_sale_store

Table 4.66: Model Summary (Average Sales of a kirana Store)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .951(a)   .904       .901                969.928

a. Predictors: (Constant), Avg_time_spent, Aver_cust, Avg_sales, Size, NumCust


Table 4.67: ANOVA(b) (Average Sales of a kirana Store)

Model                Sum of Squares   df    Mean Square   F         Sig.
1       Regression   1.713E9          5     3.426E8       364.207   .000(a)
        Residual     1.825E8          194   940759.419
        Total        1.896E9          199

a. Predictors: (Constant), Avg_time_spent, Aver_cust, Avg_sales, Size, NumCust
b. Dependent Variable: Avg_sale_store

Table 4.68: Coefficients(a) (Average Sales of a kirana Store)

                           Unstandardized Coefficients   Standardized Coefficients                      95% Confidence Interval for B
Model                      B             Std. Error      Beta                        t         Sig.    Lower Bound   Upper Bound
1       (Constant)         -3596.498     238.631                                     -15.071   .000    -4067.142     -3125.853
        Size               -.287         .360            -.023                       -.798     .426    -.996         .422
        NumCust            .401          .987            .012                        .406      .685    -1.546        2.347
        Aver_cust          48.262        3.057           .433                        15.789    .000    42.233        54.290
        Avg_sales          70.632        2.343           .719                        30.142    .000    66.010        75.253
        Avg_time_spent     49.916        21.550          .065                        2.316     .022    7.414         92.418

a. Dependent Variable: Avg_sale_store

Table 4.69: Coefficient Correlations(a) (Average Sales of a kirana Store)

Model                              Avg_time_spent   Aver_cust   Avg_sales   Size     NumCust
1   Correlations   Avg_time_spent  1.000            -.073       -.095       -.534    -.027
                   Aver_cust       -.073            1.000       -.100       .207     -.532
                   Avg_sales       -.095            -.100       1.000       .027     -.192
                   Size            -.534            .207        .027        1.000    -.345
                   NumCust         -.027            -.532       -.192       -.345    1.000
    Covariances    Avg_time_spent  464.397          -4.824      -4.801      -4.135   -.581
                   Aver_cust       -4.824           9.344       -.716       .228     -1.605
                   Avg_sales       -4.801           -.716       5.491       .023     -.445
                   Size            -4.135           .228        .023        .129     -.123
                   NumCust         -.581            -1.605      -.445       -.123    .974

a. Dependent Variable: Avg_sale_store

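In this model, 90% of the variance in the average sale per day of a kirana store is explained (R Square = .904; Table 4.66). Of the five predictors entered, only Aver_cust, Avg_sales and Avg_time_spent are significant at the 0.05 level; Size and NumCust are not (Table 4.68). This is consistent with the conclusion that average sales depend on footfalls per day, average sales per day and the average time spent by a customer.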
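The significance of each predictor in Table 4.68 follows from the ratio of its coefficient to its standard error. The sketch below recomputes those ratios; the critical value of 1.972 is an assumed approximation to the two-tailed 5% point of the t distribution with 194 degrees of freedom, and the names used are hypothetical.

```python
# (B, Std. Error) pairs copied from Table 4.68
COEFFICIENTS = {
    "Size": (-0.287, 0.360),
    "NumCust": (0.401, 0.987),
    "Aver_cust": (48.262, 3.057),
    "Avg_sales": (70.632, 2.343),
    "Avg_time_spent": (49.916, 21.550),
}
T_CRITICAL = 1.972  # assumed two-tailed 5% critical value for about 194 df

def t_statistics(coefficients):
    """Compute t = B / Std. Error for every predictor."""
    return {name: b / se for name, (b, se) in coefficients.items()}

t_values = t_statistics(COEFFICIENTS)
# Keep the predictors whose |t| exceeds the critical value
significant = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]
print(significant)
```

The recomputed ratios match the t column of Table 4.68 up to the rounding of the printed coefficients, and the same three predictors are retained as in the regression equation for the kirana store.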

