
Business Quantitative Tools & Techniques

Submitted to: Mr. Abid Awan

Date: June 27, 2011



Authors Detail:

MBA (Fall 2010)

SECTION (A)

Sr no: Names IDs

1

2

3



ACKNOWLEDGMENT

In the name of Allah Almighty, the Lord of the worlds, who has bestowed abilities upon us and blessed us with knowledge so that we can make the best of the opportunities provided to us. First of all, we are indebted to Allah Almighty, who created us and made us capable enough to withstand the competitive world.

If words could pay gratitude, we would like to pay our esteemed gratitude to our most respected Mr. Abid Awan Ali Khan for assigning us this Business Statistics project. Throughout the course he has been extremely cooperative, has guided us at every single step, and has been very encouraging and kind to us.

Throughout, it has been a collaborative effort of all the group members, who have contributed and helped each other to make this project a successful report.



Objective

The purpose of this project is to apply statistical tools to test which of the following three countries performs best in wheat EXPORT, PRODUCTION and DOMESTIC CONSUMPTION.

Countries are:

INDIA

PAKISTAN

MEXICO

By applying statistical tools to this data we will learn how to apply these tools in real life too, to make our decisions easier.



Table of Contents

History
Description of all Statistical Tools
Brief description of variables
Number of Balls (I.V)
Number of Boundaries (I.V)
Runs (D.V)
Descriptive Analysis
Analysis of Variance
Fisher LSD
Correlation Analysis of Afridi
Correlation Analysis of Yuvraj
Correlation Analysis of Gayle
Regression Analysis of Afridi
ANOVA of Regression Analysis
Regression Analysis of Yuvraj
Regression Analysis of Gayle
Comparison of Players
Crux of Study
Recommendations
Conclusion



List of Tables

Descriptive Analysis Tables

Table: 1
Table: 2
Table: 3

ANOVA Tables

Table: 4
Table: 5
Table: 6

Correlation Tables

Table: 7
Table: 8
Table: 9

Regression Analysis Tables

Table: 10
Table: 10.1
Table: 10.2
Table: 11
Table: 11.1
Table: 11.2
Table: 12
Table: 12.1



History of Indian wheat

Agriculture plays an important role in the Indian economy. It provides gainful employment to a significantly large section of Indian society and provides raw material for a large number of industries in the country. According to the 1991 Population Census, nearly 74 percent of India's population lives in rural areas and earns its livelihood there. Agriculture is the largest contributor to the country's Net Domestic Product, accounting for as much as 34.2 percent in 1991-92 at current prices. Thus, agriculture has a key position in India's economy in view of both employment and contribution to the national income.

India is one of the main wheat producing and consuming countries of the world. After the Green Revolution in the 1970s and 1980s the production of wheat has shown a huge increase. The major states involved in the cultivation of wheat are those located in the plains, like Uttar Pradesh, Punjab and Haryana. They account for nearly 70 percent of the total wheat produced in the country. Punjab and Haryana yield the highest amount of wheat because of the availability of better irrigation facilities.

Wheat is a Rabi crop that is grown in the winter season. Sowing of wheat takes place from October to December and harvesting is done between February and May. The wheat crop needs cool winters and hot summers, which is why the fertile plains of the Indo-Gangetic region are the most conducive for growing it.



History of Pakistans wheat

Agriculture plays an important role in Pakistan's economy. It provides gainful employment to a significantly large section of Pakistan's society and provides raw material for a large number of industries in the country. About 28% of Pakistan's total land area is under cultivation and is watered by one of the largest irrigation systems in the world. Agriculture accounts for about 21% of GDP and employs about 42% of the labor force. The most important crops are cotton, wheat, rice, sugarcane, fruits, and vegetables, which together account for more than 75% of the value of total crop output. Despite intensive farming practices, Pakistan remains a net food importer. Pakistan exports rice, fish, fruits, and vegetables and imports vegetable oil, wheat, cotton (net importer), pulses, and consumer foods.

Wheat is a Rabi crop that is grown in the winter season. Sowing of wheat takes place from October to December and harvesting is done between February and May. The wheat crop needs cool winters and hot summers, which is why the fertile plains of the Indo-Gangetic region are the most conducive for growing it.



History of Mexico Wheat

In 1999, agriculture employed 23 percent of Mexico's labor force but accounted

for only 5 percent of Mexico's GDP. Crop production was and continues to be the most

important agricultural activity in Mexico, accounting for fully 50 percent of agricultural

output. Domestically, the most important crops for consumption purposes are wheat,

beans, corn, and sorghum. The most important crops for export purposes are sugar,

coffee, fruits, and vegetables. Mexico continues to be one of the top producers of crops

in the world. In 1999, the crops produced in the greatest quantity in Mexico were sugar cane
(46.81 million tons), corn (15.72 million tons), sorghum (5.59 million tons), wheat (3
million tons), and beans (1.04 million tons). Fruits and vegetables are the most

economically significant agricultural products exported by Mexico.



Description of all Statistical Tools

Following is a brief description of the statistical tools which we have used in our project to analyze the data:

Mean:

The mean is also called the arithmetic mean. It is the most frequently used measure of central tendency. It is obtained by dividing the sum of all values by the number of values in the data. The single value of the mean represents the whole data. The sample mean is denoted by $\bar{x}$ and the population mean is denoted by $\mu$.

Median:

The median is also an important measure of central tendency. For ungrouped data, the median is the middle value of the arranged data; it divides the arranged data into two equal parts. It is calculated by first arranging the data in increasing order and then finding the middle value.

If the number of values is odd, the median is the middle value of the arranged data; if the number of values is even, the median is the average of the two middle values.
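The two measures can be illustrated with a minimal sketch. The figures below are hypothetical and are not taken from the project data; they are chosen only to show how a single extreme value pulls the mean while leaving the median unchanged.

```python
from statistics import mean, median

# Hypothetical yearly export figures (illustrative only, not the project's data).
exports = [10, 20, 30, 40, 500]

# Mean: sum of all values divided by the number of values.
average = mean(exports)

# Median with an odd count: the middle value of the sorted data.
middle = median(exports)

# Median with an even count: the average of the two middle values.
middle_even = median([10, 20, 30, 40])

print("mean:", average)
print("median (odd count):", middle)
print("median (even count):", middle_even)
```

The outlier 500 raises the mean well above most of the values, which is one reason a median is often reported alongside the mean.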

Standard Deviation:

It is a frequently used measure of dispersion. The standard deviation tells how far the values of the data deviate from their mean. A large standard deviation shows that the values are widely spread around the mean, and a small standard deviation shows that they lie close to the mean. It is obtained by taking the square root of the variance. The variance of the population is denoted by $\sigma^2$ and the variance of the sample is denoted by $s^2$, so the population standard deviation is denoted by $\sigma$ and the sample standard deviation is denoted by $s$.

The variance is based on the squared deviations of each value from the mean, because the sum of the plain deviations from the mean is always equal to zero.

The variance and standard deviation can never be negative. If the data has no variation in it, then the variance and the standard deviation are both 0. The standard deviation is expressed in the same units as the data.
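As a small sketch of these definitions, the following uses Python's standard `statistics` module on a hypothetical data set (not the project data) to show that the deviations sum to zero and how the population and sample versions differ.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical data set (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)

# Plain deviations from the mean always sum to zero,
# which is why the variance uses squared deviations.
sum_of_deviations = sum(x - center for x in data)

# Population versions divide by n; sample versions divide by n - 1.
population_variance = pvariance(data)
population_sd = pstdev(data)
sample_variance = variance(data)
sample_sd = stdev(data)

print("sum of deviations:", sum_of_deviations)
print("population variance and SD:", population_variance, population_sd)
print("sample variance and SD:", sample_variance, sample_sd)
```

The sample variance is slightly larger than the population variance because it divides by n - 1 rather than n.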

Co-Efficient of Variance:

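The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean: $CV = \sigma / \mu$ (often multiplied by 100 and reported as a percentage). A CV below 1 (100%) indicates low variation relative to the mean and a CV above 1 indicates high variation. It also tells the variation per unit of the mean, so it can be compared across data sets measured on different scales.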
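The ratio can be checked with a short sketch. The helper name `coefficient_of_variation` is an assumption introduced here for illustration; the inputs are the standard deviations and means reported for Pakistan in the descriptive analysis of this report.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the CV as a percentage: 100 * standard deviation / mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * std_dev / mean


# (standard deviation, mean) pairs as reported for Pakistan in this report.
pakistan = {
    "production": (3964.253039, 16868.66667),
    "domestic consumption": (3756.152, 17862.03),
    "export": (577.8332, 324.6),
}

cv_by_series = {
    name: coefficient_of_variation(sd, avg) for name, (sd, avg) in pakistan.items()
}

for name, cv in cv_by_series.items():
    print(f"{name}: CV = {cv:.2f}%")
```

Because the CV is unit-free, the export series can be compared directly with the much larger production series.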



Skewness:

Skewness is a measure of the asymmetry of the probability distribution of a random variable. Its value can be positive, negative or undefined. A negative skewness indicates that the tail on the left side is longer than the tail on the right side, which means the bulk of the data, including the median, lies to the right of the mean. A positive skewness indicates that the tail on the right side is longer than the tail on the left side, which means the data is clustered on the left side. A skewness of zero means the values are evenly distributed on both sides of the mean.
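A minimal sketch of a sample skewness calculation follows. It assumes the adjusted Fisher-Pearson formula that spreadsheet tools commonly report; the report does not state which formula was used, so this choice and the function name `sample_skewness` are assumptions. The data sets are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed formula)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


# Hypothetical data sets (illustrative only).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]
left_tailed = [-20, -3, -2, -2, -1, -1]

print("symmetric:", sample_skewness(symmetric))
print("right-tailed:", sample_skewness(right_tailed))
print("left-tailed:", sample_skewness(left_tailed))
```

The sign of the result matches the side of the longer tail: positive for a long right tail, negative for a long left tail.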

Hypothesis Testing:

It is a method of making decisions using data, whether from a controlled experiment or from an observational (uncontrolled) study. We perform this test when we make a decision about a population parameter on the basis of a sample statistic.

Two terms are used in this test: Null Hypothesis and Alternative Hypothesis. The Null Hypothesis is normally a claim about the population parameter that is considered to be true until it is proved false.

The Alternative Hypothesis is the claim about the population parameter that is accepted if the Null Hypothesis is rejected.

ANOVA:

In analysis of variance (ANOVA) we test the null hypothesis that the means of three or more populations are equal.

The hypotheses in the analysis of variance are normally stated as:

$H_0$: all three population means are equal

$H_1$: not all three population means are equal

Assumptions of ANOVA:

For using one-way ANOVA the following assumptions must be fulfilled.

1. The populations from which the samples are drawn are normally distributed.

2. The populations from which the samples are drawn have the same variance (or standard deviation).

3. The samples are random and independent, drawn from the different populations.

In ANOVA we calculate two variance estimates: the variance between samples (MSB) and the variance within samples (MSW). If the means of the populations under consideration are all equal, then the variation between the means of the samples taken from those populations will be small and MSB will be low; if the means of the populations are not equal, MSB will be higher. MSW measures the variation within the data of the samples taken from the different populations. It is similar to the concept of the pooled variance (the square of the pooled standard deviation).
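The two variance estimates can be sketched in a few lines. The function name `one_way_anova` and the three small groups are assumptions made for illustration; they are not the project data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MSB, MSW, F) for a list of samples, one list per population."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of the sample means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own sample mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msb = ssb / (k - 1)        # degrees of freedom between groups: k - 1
    msw = ssw / (n_total - k)  # degrees of freedom within groups: N - k
    return msb, msw, msb / msw


# Hypothetical samples (illustrative only).
msb, msw, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print("MSB:", msb, "MSW:", msw, "F:", f_stat)
```

A large F means the sample means differ by more than the within-sample variation would explain, which is evidence against equal population means.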

Multiple Regression Analyses:

It is the study of how a dependent variable depends on one or more independent variables. For this purpose we use the following model:

$\hat{y} = a + b_1 x_1 + b_2 x_2$

Where:

$\hat{y}$ = dependent variable (estimated value of Y)

$a$ = y-intercept (constant)

$b_1$ = (slope) change in $\hat{y}$ per one-unit change in the first independent variable $x_1$.

$b_2$ = (slope) change in $\hat{y}$ per one-unit change in the second independent variable $x_2$.

The above model is often used for future predictions of the important components/variables of organizations.
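As a rough sketch of how the intercept and the two slopes can be estimated, the following solves the least-squares normal equations with plain Python. The function name `fit_two_predictors` and the data are hypothetical; the report itself appears to rely on spreadsheet output rather than this code.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y, stored as an augmented 3x4 matrix.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [p - factor * q for p, q in zip(aug[r], aug[col])]

    a, b1, b2 = (aug[i][3] / aug[i][i] for i in range(3))
    return a, b1, b2


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [-3, 2, -5, 0, -7]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"y_hat = {a:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")
```

In the setting of this project, `x1` would play the role of production, `x2` of domestic consumption, and `y` of export.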

Coefficient of Determination:

The coefficient of determination is used to answer the question of how good the regression model is. It tells how much of the variation in the dependent variable is explained by the independent variables used in the regression model.

It is denoted by $r^2$. The value of $r^2$ is always between 0 and 1. The closer $r^2$ is to 1, the better the model fits the data and the more confidence we can place in the regression model. Its formula is:

$r^2 = \frac{SSR}{SST}$

where SSR is the regression (explained) sum of squares and SST is the total sum of squares.
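A minimal sketch of the calculation follows. It uses the equivalent form 1 - SSE/SST, which for a least-squares fit with an intercept equals the explained share of the total variation; the function name `r_squared` and the numbers are assumptions for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination computed as 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sst = sum((y - mean_actual) ** 2 for y in actual)           # total variation
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    if sst == 0:
        raise ValueError("r squared is undefined when the actual values do not vary")
    return 1 - sse / sst


# Hypothetical values (illustrative only).
perfect_fit = r_squared([1, 2, 3], [1, 2, 3])  # predictions match exactly
mean_only = r_squared([1, 2, 3], [2, 2, 2])    # predictions are just the mean

print("perfect fit:", perfect_fit)
print("mean-only model:", mean_only)
```

A model that predicts every value exactly scores 1, while a model that only predicts the mean explains none of the variation and scores 0.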



Brief description of variables

For the purpose of this project we have selected following variables.

i. Production (Independent Variable)

ii. Domestic Consumption (Independent Variable)

iii. Export (Dependent Variable)

Production: (Independent Variable)

In this project production is our 1st variable. We collected the last 30 values, which are yearly data on wheat production for three countries: INDIA, PAKISTAN and MEXICO. If a country's production goes up, then it is able to export more and also to fulfil its domestic consumption more easily. Production varies from year to year and affects both domestic consumption and exports. It is a significant independent variable for each country because exports of wheat depend largely on it.

Domestic consumption: (Independent Variable)

In this project domestic consumption is our 2nd variable. We collected 30 values, which are yearly data on domestic consumption of wheat for three countries: INDIA, PAKISTAN and MEXICO. If the domestic consumption of a country goes down, then it is able to export more, but this happens only when production of wheat goes higher. Domestic consumption varies from year to year and affects exports only. It is a significant variable for each country.

Exports: (Dependent)

Exports are our 3rd variable in this project and are the dependent variable. For this variable we collected 30 values, which are yearly data for three countries: INDIA, PAKISTAN and MEXICO. Export depends upon production and domestic consumption of wheat: if production increases, the export of wheat will also increase for each country, and if domestic consumption decreases, the export of wheat will increase for each country. Our dependent variable has a positive relation with production and a negative relation with domestic consumption of wheat.



Descriptive Analysis

INDIA

Table: 1

Statistic                 Production    Domestic Consumption    Export
Mean                      60613.27      60451.8                 832.5333
Median                    63598.5       64342.5                 150
Standard Deviation        13362.26      12830.12                1430.318
Coefficient of Variance   104.925       106.436                 18.0173
Kurtosis                  -1.15216      -0.97222                5.139285
Skewness                  -0.24334      -0.31746                2.327124

Explanation:

MEAN (PRODUCTION, DOMESTIC CONSUMPTION AND EXPORT)

The mean is the average value. The average wheat production of India in this period is 60613.27. The average domestic consumption of wheat in India over these 30 years is 60451.8, and the average export of wheat from India over this 30-year period is 832.5333.

Median

The median is the middle value of the ordered data and is also used as an average. In this period the median production of wheat is 63598.5, the median domestic consumption of wheat in India over these 30 years is 64342.5, and the median export of wheat from India is 150.

Standard Deviation: (Production, Domestic Consumption and EXPORT Respectively)

The standard deviation of India's wheat production in this period is 13362.26, which means that yearly production typically deviates from the average by about 13362.26 over the last 30 years.

The standard deviation of domestic consumption of wheat in India is 12830.12 in that period. The standard deviation of the export of wheat is 1430.318.



Co-Efficient of Variance: (Production, Domestic Consumption and EXPORT Respectively)

The CV value of wheat production shows that there is about 105% per-year variation in wheat production in India in that period.

As far as domestic consumption is concerned, the CV value indicates about 106% variation in domestic consumption.

The CV of the export of wheat indicates about 18% per-year variation in India's exports.

Skewness: (Production, Domestic Consumption and EXPORT Respectively)

The skewness of wheat production is -0.2433, which indicates that production is left skewed and most of the data is clustered on the right side.

The skewness of domestic consumption is -0.3175, which indicates that domestic consumption is also left skewed and most of the data is clustered on the right side.

The skewness of wheat exports is 2.3271, which indicates that exports of wheat are right skewed and most of the data is clustered on the left side.

Pakistan

Table: 2

Statistic                 Production      Domestic Consumption    Export
Mean                      16868.66667     17862.03                324.6
Median                    16779           18640                   30.5
Standard Deviation        3964.253039     3756.152                577.8332
Coefficient of Variance   23.5%           21.02%                  178.01%
Kurtosis                  -0.998029335    -0.98076                5.446105
Skewness                  0.198571805     -0.34223                2.377759



Explanation:

MEAN (Production, Domestic Consumption and Export)

Over the last 30 years the mean production of wheat by Pakistan is 16868.666, the mean domestic consumption is 17862.03 and the mean export of wheat by Pakistan is 324.6.

Median: (Production, Domestic Consumption and EXPORT Respectively)

Over the last 30 years the median production of wheat by Pakistan is 16779 and the median domestic consumption of wheat in Pakistan is 18640. The median export is 30.5, which means that in half of the last 30 years Pakistan exported 30.5 or less.

Standard Deviation: (Production, Domestic Consumption and EXPORT Respectively)

The standard deviation of Pakistan's wheat production is 3964.253039; this value shows the typical deviation from the average in Pakistan's production data over the last 30 years.

The standard deviation of Pakistan's domestic consumption of wheat is 3756.152; this value shows the typical deviation from the average in Pakistan's domestic consumption data over the last 30 years.

The standard deviation of Pakistan's wheat export is 577.8332; this value shows the typical deviation from the average in Pakistan's export data over the last 30 years.

Co-Efficient of Variance: (Production, Domestic Consumption and EXPORT Respectively)

The coefficient of variance of wheat production shows that there is 23.5% per-year variation in the production of wheat by Pakistan. The coefficient of variance of domestic consumption shows 21.02% per-year variation in the domestic consumption of wheat by Pakistan. For the export data the coefficient of variance is 178.01%, which shows the per-year variation in the export of wheat by Pakistan.



Skewness: (Production, Domestic Consumption and EXPORT Respectively)

The skewness of wheat production is 0.198, which means that it is right skewed and most of the data is clustered on the left side.

The skewness of domestic consumption of wheat is -0.34223, which means that it is left skewed and most of the data is clustered on the right side.

The skewness of the export of wheat is 2.377759, which means that it is right skewed and most of the data is clustered on the left side.

MEXICO:

Table: 3

Statistic                 Production      Domestic Consumption    Export
Mean                      3578.9          5146.233333             362.1333
Median                    3587.5          5223                    271
Standard Deviation        524.1380017     749.3137404             386.0117
Coefficient of Variance   15%             14.6%                   106%
Kurtosis                  -0.644972214    -1.388933856            0.963786
Skewness                  -0.09289545     -0.063532895            1.161069

Explanation:

Mean

The mean is the average value. The average wheat production of Mexico in this period is 3578.9. The average domestic consumption of wheat over these 30 years is 5146.233333, and the average export of wheat from Mexico over this 30-year period is 362.1333.



Median: (Production, Domestic Consumption and EXPORT Respectively)

The median is the middle value of the ordered data and is also used as an average. In this period the median production of wheat is 3587.5, the median domestic consumption of wheat over these 30 years is 5223, and the median export of wheat over this 30-year period is 271.

Standard Deviation: (Production, Domestic Consumption and EXPORT Respectively)

The standard deviation of Mexico's wheat production in this period is 524.1380017, which means that yearly production typically deviates from the average by about 524.1380017 in the Mexican production data for the last 30 years.

The standard deviation of domestic consumption of wheat in Mexico is 749.3137404 in that period. The standard deviation of the export of wheat is 386.0117.

Co-Efficient of Variance: (Production, Domestic Consumption and EXPORT Respectively)

The CV value of wheat production shows that there is 15% per-year variation in wheat production in Mexico in that period.

As far as domestic consumption is concerned, the CV value indicates 14.6% variation in domestic consumption.

The CV of the export of wheat indicates 106% per-year variation in Mexico's exports.

Skewness: (Production, Domestic Consumption and EXPORT Respectively)

The skewness of wheat production is -0.09289545, which indicates that production is left skewed and most of the data is clustered on the right side.

The skewness of domestic consumption is -0.063532895, which indicates that domestic consumption is also left skewed and most of the data is clustered on the right side.

The skewness of wheat exports is 1.161069, which indicates that exports of wheat are right skewed and most of the data is clustered on the left side.

Comparison of Averages



Our data has outliers, so we have calculated the median. We have also calculated the mean.

Production:

Now we can compare the averages of production from the above data. Over the last 30 years the average production of INDIA is 60613.27, the average production of PAKISTAN is 16868.66 and the average production of MEXICO is 3578.9. From these averages we can say that INDIA is the largest producer of wheat in this period, PAKISTAN is the second largest, and MEXICO produces less wheat than the other two countries.

Domestic Consumption:

Comparing the averages of domestic consumption of wheat in these countries, we found that the average of INDIA is 60451.8, the average of PAKISTAN is 17862.03 and the average of MEXICO is 5146.23. These results show that average domestic consumption differs widely between these countries over these 30 years: INDIA consumes far more than the other two countries, PAKISTAN is second, and MEXICO consumes the least.

EXPORT:

The export data of the three countries for the last 30 years shows that the average export of INDIA is 832.5333, the average export of PAKISTAN is 324.6 and the average export of MEXICO is 362.1333. These averages show that the average export of INDIA is higher than that of the other two countries, MEXICO is second, and PAKISTAN exports less wheat than the other two countries.

Comparison of Standard Deviation:

As stated above, standard deviation is the most commonly used measure of dispersion. It is used to measure the deviation in data: the smaller the standard deviation, the more consistent the data.

The comparison of the S.D. of all three countries is as below:



Production:

The S.D. of India is 13362.26, of Pakistan is 3964.253039 and of Mexico is 524.1380017. According to our result the S.D. of India is greater than that of the other two countries; its deviation from the mean is larger, so there is more spread in its data and less consistency. The deviation of Pakistan is less than that of India but greater than that of Mexico, which shows that Pakistan is more consistent in producing wheat than India but less consistent than Mexico. The deviation of Mexico from the mean is less than that of the other two countries.

Domestic Consumption:

The S.D. of India is 12830.12, of Pakistan is 3756.152 and of Mexico is 749.3137404. According to our result the S.D. of India is again greater than that of the other countries; its deviation from the mean is larger, so there is more spread in its data and less consistency. The deviation of Mexico is less than that of the other two countries, which shows that it is the most consistent in consumption of wheat. The deviation of Pakistan is less than that of India but greater than that of Mexico.

EXPORT:

The S.D. of India is 1430.318, of Pakistan is 577.8332 and of Mexico is 386.0117. According to our result the S.D. of India is greater than that of the other two countries; its deviation from the mean is larger, so there is more spread in its data and less consistency. The deviation of Mexico is less than that of the other two countries, which shows that it is the most consistent in export. The deviation of Pakistan is less than that of India but greater than that of Mexico.

Comparison of C.V:

As we know, the coefficient of variance (C.V.) tells us the variation per unit. The smaller the value of the C.V., the more favorable it is, and it also indicates consistency.

The comparison of all three countries is as under:

Production:

The C.V. of India is 104.9%, the C.V. of Pakistan is 23.5% and the C.V. of Mexico is 15%. Our result shows that the per-year variation of Mexico is less than that of the other countries.

Domestic Consumption:

The C.V. of India is 106%, the C.V. of Pakistan is 21% and the C.V. of Mexico is 14.6%. Our result shows that the per-year variation of Mexico is again less than that of the other countries.

EXPORT:

The C.V. of India is 18.017%, the C.V. of Pakistan is 178% and the C.V. of Mexico is 106%.



Scenario # 1 Production

We test whether the mean yearly wheat production of the three countries (India, Pakistan and Mexico) is the same. For this purpose the following data is taken from the SBP (State Bank of Pakistan) record. Use these data to test whether the population means of production of these countries are different. Use $\alpha = 0.05$.

India     Pakistan   Mexico
3.6313    1.1473     3.05
3.7452    1.1304     4.2
4.2794    1.2414     3.2
4.5476    1.0882     4.2
4.4069    1.1703     4.4
4.7052    1.3922     4.5
4.4323    1.202      3.7
4.6169    1.2675     3.2
5.411     1.4419     4
4.985     1.4429     3.93
5.5134    1.4565     4.061
5.569     1.5684     3.621
5.721     1.6157     3.582
5.984     1.5212     4.151
6.547     1.7002     3.468
6.2097    1.6907     3.107
6.935     1.6651     3.639
6.635     1.8694     3.235
7.078     1.7858     3.05
7.6369    2.1079     3.4
6.968     1.9024     3.27
7.181     1.8227     3.23
6.51      1.9183     2.7
7.215     1.95       2.42
6.864     2.1612     3.02
6.935     2.1277     3.24
7.581     2.3295     3.593
7.857     2.0959     4
8.068     2.4033     4.3
8.071     2.39       3.9

Null Hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3$

Alternative Hypothesis

$H_1$: not all of $\mu_1, \mu_2, \mu_3$ are equal

Distribution

We use the F-distribution because we are comparing the means of more than two populations.

Critical Region

$F_{tab} = F_{\alpha, v_1, v_2} = F_{0.05, 2, 84} = 3.105$

If F > 3.105, we reject H0

ANOVA Table

TABLE: 4

ANOVA

Source of Variation   SS             df   MS         F          P-value    F crit
Between Groups        287.8838833    2    143.9419   209.8983   2.11E-33   3.105157
Within Groups         57.60466839    84   0.68577
Total                 345.4885517    86
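The arithmetic behind Table 4 can be retraced with a short sketch. It takes the sums of squares, degrees of freedom and critical value exactly as reported in the table; the variable names are assumptions introduced for illustration.

```python
# Values as reported in Table 4.
ss_between, df_between = 287.8838833, 2
ss_within, df_within = 57.60466839, 84
f_crit = 3.105157  # tabulated F for alpha = 0.05 with (2, 84) degrees of freedom

# Mean squares are the sums of squares divided by their degrees of freedom.
msb = ss_between / df_between
msw = ss_within / df_within

# The test statistic is the ratio of the two variance estimates.
f_stat = msb / msw

# Right-tailed decision rule: reject H0 when F exceeds the critical value.
reject_h0 = f_stat > f_crit

print(f"MSB = {msb:.4f}, MSW = {msw:.5f}, F = {f_stat:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The recomputed values agree with the MS and F columns of the table, so the decision follows directly from comparing F with the critical value.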



Calculated F value > Tabulated F value

F = 209.8983 is greater than the tabulated value, which is 3.105

As well as

P value < $\alpha$

So we reject H0

Explanation:

We know that if the means of all the populations are equal, the sample means of production taken from the three countries will be close to each other, the variation between them will be small, and as a result the value of MSB will be low. If the means of the populations under consideration are not all equal, the variation between sample means will be high and the value of MSB will also be high. The MSW value tells the variation within the different samples taken from the different populations. Here the variation within samples (MSW) is 0.68577.

In the light of the above discussion we can see in Table 4 that the value of MSB (143.9419) is very high relative to MSW, and our results point towards rejecting H0, which means that not all population means are equal. The critical value is 3.105 and our calculated F value is 209.8983, which is greater than the critical value, so we reject H0. The one-way ANOVA test is always right-tailed, and the calculated F value lies in the rejection region, which is the main reason to reject H0.

If we compare the p-value, the p-value is 2.11E-33, which is less than $\alpha$ (0.05), and the rule is: if the p-value is less than $\alpha$, reject H0; otherwise do not reject. This also supports our conclusion.

As we have rejected H0, at least one mean is not equal to the others, so now we want to know which population mean differs from the other population means. For this purpose we will calculate the Fisher LSD, which will tell us which means are not equal.
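For reference, the Fisher LSD comparison is commonly written as below. This is the standard textbook form and is given as an assumption about the procedure, since the report's own formula is not shown at this point; $k$ is the number of countries, $N$ the total number of observations, and $n_i$, $n_j$ the sample sizes of the two countries being compared.

```latex
LSD = t_{\alpha/2,\; N-k} \, \sqrt{ MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }
\qquad
\text{conclude } \mu_i \neq \mu_j \text{ if } \left| \bar{x}_i - \bar{x}_j \right| > LSD
```

Each pair of countries is compared in turn, using the MSW from the ANOVA table as the common variance estimate.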



Domestic Consumption

Scenario # 2

We check that the mean of wheat domestic consumption of three different countries (INDIA,

PAKISTAN and MEXICO) are same in One year. For this job the following data is taken from

SBP (State Bank of Pakistan) record. Use following data to test whether the populations mean of

domestic consumption of these countries is different. Use = 0.05.INDIA Pakistan Mexico

3.6313 1.1215 4

3.7838 1.1521 4.093

4.2029 1.2 4.1

4.3076 1.2312 4.35

4.3719 1.2754 4.64

4.5567 1.325.112

5.6492 1.38 4.272

4.8929 1.4886 4.103

5.3201 1.5316 4.152

4.7595 1.6206 4.454

5.8009 1.6907 4.622

5.7515 1.7405 4.888

5.3377 1.79 5.398

5.833 1.8137 5.265

6.4978 1.8904 4.707

6.6064 2.0124 4.815

6.9246 2.0258 5.181

6.3707 2.1284 5.409

6.8793 2.0452 5.378

6.6821 2.05 5.58

6.5125 1.98 5.818

7.4294 1.838 5.9

6.8258 1.89 5.9

7.2838 1.95 6

6.998 2.01 6.1

7.3477 2.17 6.2

7.6423 2.24 5.5

7.0924 2.28 6

7.8201 2.32 6.2

8.2435 2.4 6.25
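The one-way ANOVA applied to this table can be sketched in plain Python as follows. This is a minimal illustration, not the spreadsheet routine used in the report: the function name is hypothetical, and only the first five rows of the table are used so the example stays short.

```python
def one_way_anova(groups):
    """Return (MSB, MSW, F) for a list of samples, one list per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared gap to the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared gap of each value to its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb, msw, msb / msw


# First five rows of the table above, used only to illustrate the call.
india = [3.6313, 3.7838, 4.2029, 4.3076, 4.3719]
pakistan = [1.1215, 1.1521, 1.2, 1.2312, 1.2754]
mexico = [4.0, 4.093, 4.1, 4.35, 4.64]

msb, msw, f_stat = one_way_anova([india, pakistan, mexico])
print(msb, msw, f_stat)
```

A large F arises when MSB (spread between country means) dominates MSW (spread inside each country), which is the pattern the explanation relies on.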




F = 208.475 is greater than the tabulated value, which is 3.105,

as well as

p-value < α

So we reject Ho

Explanation:

As the calculated F value is 208.475, which is greater than the critical F value (208.475 > 3.101296), we reject Ho, which means the average domestic consumption of the three countries is not equal. Our p-value shows the same result, because it is less than alpha, so we reject Ho. As in the above table, the value of MSB is large in comparison with MSW, which shows a large variation in the sample means taken from the three different countries. This happens when the population means are not equal, so the comparison between the critical value and the calculated F value has justified that the population means are not equal. The value of MSW, 0.717472, shows the variation within the sample data taken from the different countries.

We have selected competing countries whose domestic consumption figures are not close to each other, because matching domestic consumption to every level of production is difficult and sometimes impossible; therefore, on average, the domestic consumption of these three countries is not the same.

Comparison:

We have calculated the ANOVA of three players regarding boundaries hit by them so if we

compare the findings of the ANOVA with the raw data, the value of MSB is showing the less

variation between the means of the samples data which gives the indication about the equality in

the population means regarding the boundaries. The findings of the ANOVA about the

boundaries demonstrate the average boundaries hit by all the three players on the population base

are same because the players we have selected have more strike rate, means they have made

more runs on less balls played. This happens only because of hitting more boundaries so all three

players have the same average on population bases regarding the boundaries.



Scenario # 3 Export

We check whether the mean wheat export of three different countries (INDIA, PAKISTAN and MEXICO) is the same in one year. For this job the following data is taken from the SBP (State Bank of Pakistan) record. Use these data to test whether the population means of wheat export of the three countries are different. Use alpha = 0.05.

India Pakistan Mexico

0 0 5

1 0.78 10

0.35 2.05 2

1 0.49 5

4 0 6

5 0 3

5 0 52

0.2 0 231

0.2 0 200

2 0 7

6.8 0.1 8

0.5 0.5 8

0.28 0 12

1 0.06 135

15 0.01 472

20 0.01 227

0 0.12 374

0 0 311

2 0 404

15.69 2.53 705

30.87 4.95 548

48.5 11.85 597

56.5 1.93 451

21.2 6 504

8.01 6 533

0.94 7 548

0.49 22 1261

0.23 21 1406

1 3 839

2 7 1000



Null Hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3$

Alternative Hypothesis

$H_1: \mu_1 \neq \mu_2 \neq \mu_3$ (not all means are equal)

Distribution

We use the F-distribution because we are comparing the means of more than two populations in the hypothesis test.

Critical Region

$F_{tab} = F_{\alpha,\, v_1,\, v_2} = F_{0.05,\, 2,\, 84} = 3.105$

If F > 3.105, we reject H0.

ANOVA Table

TABLE: 6

ANOVA

Source of Variation   SS        df   MS         F         P-value    F crit
Between Groups        2625191   2    1312596    26.2768   1.37E-09   3.105157
Within Groups         4196022   84   49952.64
Total                 6821213   86
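The entries of this table are linked by simple arithmetic: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The short sketch below (the function name is an assumption) reproduces the MS and F columns from the SS and df columns of Table 6.

```python
def anova_from_ss(ss_between, df_between, ss_within, df_within):
    """Rebuild MSB, MSW and F from sums of squares and degrees of freedom."""
    msb = ss_between / df_between
    msw = ss_within / df_within
    return msb, msw, msb / msw


# SS and df values copied from Table 6; 3.105157 is the F crit column.
msb, msw, f_stat = anova_from_ss(2625191, 2, 4196022, 84)
reject_h0 = f_stat > 3.105157
print(round(msb, 1), round(msw, 2), round(f_stat, 4), reject_h0)
```

Running the check gives an F close to the tabulated 26.2768, well beyond the critical value, so the decision to reject Ho follows directly from the table.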




F = 26.2768 is greater than the tabulated value, which is 3.105,

as well as

p-value < α

So we reject Ho

Explanation:

As the calculated value is 26.2768, greater than the critical value (26.2768 > 3.105), we reject Ho, which means that not all means are equal.

According to the results, the average exports of these three countries are not equal on a population basis. The p-value is also less than alpha, so we reject Ho. The value of MSB is high, although it may be less or greater than the corresponding value for production.

Comparison

Here the value of MSB is high but still less than the (balls value of MSB). According to our ANOVA calculations our P and F value are in the favor of Ho, so because of these calculations we also have concluded that the population means (of Runs) are equal and there is

less (but not that much less variation as compared to the boundaries) in sample means. It is all

because of high MSB value than boundaries but less than balls MSB value, as we have found the

MSB value of balls very high and on this base we have rejected Ho.

So as a result we have concluded that on average they made almost same runs in their every

match.



Correlation Analysis of INDIA

Following is the Correlation Analysis of our collected data.

TABLE: 7

Correlation Table

                       Production   Domestic Consumption   Exports
Production             1
Domestic Consumption   0.95653      1
Exports                0.272128     0.322684164            1
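Each entry of a correlation table like this one is a Pearson correlation coefficient between two columns of data. A minimal sketch of that calculation is given below; the function names are hypothetical and the three short series are made-up numbers used only to show the call, not the report's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Pairwise correlations for a dict of named series."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Illustrative (made-up) series; the report's raw data would be used instead.
data = {
    "Production": [10.0, 12.0, 15.0, 19.0, 24.0],
    "Domestic Consumption": [9.0, 11.5, 14.0, 18.0, 22.5],
    "Exports": [1.0, 0.4, 1.2, 0.9, 1.6],
}
table = correlation_table(data)
print(table[("Production", "Domestic Consumption")])
```

The diagonal of the result is always 1, which is why the table shows 1 wherever a variable is paired with itself.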

Explanation:

In the above table we can see that the relation between X1 and X2 is 82.76%, which shows that there is a strong positive correlation between our two independent variables.

Now the relationship between X1 and Y is 92.32%, which indicates that our dependent variable Runs depends strongly on the Balls. It happens in real life as well, because without playing balls the player can't make runs; that is why there is a strong positive correlation between Runs and Balls. Now the value 97.133% also shows the strong positive correlation between the Boundaries and Runs. This value is greater than the value of the Balls relationship because a boundary can increase the runs in high proportion, and we have selected those players whose boundaries are more than those of other batsmen. Therefore the relationship between Boundaries and Runs is stronger.

Now we are going to check whether or not the relations discussed above exist in the population. For this purpose we will apply hypothesis testing.

SCENARIO 1:

We have obtained r = 0.9232, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

Test Statistic:

t (tabulated) = 1.701

If the calculated t value is < 1.701 then we will reject Ho, otherwise do not reject.

Conclusion:

Value = 32.588905, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Balls and Runs.

SCENARIO 2:

We have obtained r = 0.97133, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

t (tabulated) = 1.701

If the calculated t value is < 1.701 then we will reject Ho, otherwise do not reject.

Conclusion:

Value = 34.8655, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Number of Boundaries and Runs.
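For reference, a common textbook way to test a sample correlation is to convert r into a t statistic with n - 2 degrees of freedom and compare it with the tabulated value. The sketch below assumes that form; the report does not show its own formula, so this is not necessarily how the t values above were obtained. The sample size of 30 is also an assumption.

```python
import math


def correlation_t(r, n):
    """t statistic for a sample correlation r based on n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# r is the Production vs Domestic Consumption entry of Table 7; n = 30 is assumed.
t_calc = correlation_t(0.95653, 30)
t_tab = 1.701  # tabulated value quoted in the text
print(round(t_calc, 3), t_calc > t_tab)
```

The closer r is to 1 or -1, the larger the magnitude of t, so strong sample correlations are easy to distinguish from zero even with modest sample sizes.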



Correlation Analysis of PAKISTAN

TABLE: 8

Correlation Table

                       Production    Domestic Consumption   Exports
Production             1
Domestic Consumption   0.937507914   1
Exports                0.59856792    0.504225813            1

Explanation:





without playing balls the player can't make runs; that is why there is a strong positive

correlation between Runs and Balls. Now the value 97.79% also shows the strong



selected those players whose boundaries are more than those of other batsmen. Therefore the relationship between Boundaries and Runs is stronger.



SCENARIO 1:

We have obtained r = 0.9136, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

t (tabulated) = 1.701

t = 35.3425

Conclusion:

Value = 35.3425, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Balls and Runs.

SCENARIO 2:

We have obtained the sample correlation r, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

t (tabulated) = 1.701

t = 37.739

Conclusion:

Value = 37.739, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Boundaries and Runs.

Correlation Analysis of MEXICO

TABLE: 9

Correlation Table

                       Production     Domestic Consumption   Exports
Production             1
Domestic Consumption   -0.243877355   1
Exports                -0.134459644   0.717375962            1

Explanation:





without playing balls the player can't make runs; that is why there is a strong positive

correlation between Runs and Balls. Now the value 93.76% also shows the strong



selected those players whose boundaries are more than those of other batsmen. Therefore the relationship between Boundaries and Runs is stronger.





SCENARIO 1:

We have obtained the sample correlation r, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

t (tabulated) = 1.701

If the calculated t value is < 1.701 then we will reject Ho, otherwise do not reject.

t = 21.63

Conclusion:

Value t = 21.63, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Balls and Runs.

SCENARIO 2:

We have obtained the sample correlation r, so now we want to check whether this relation actually exists in the population or not.

Ho: $\rho \ge 0$

H1: $\rho < 0$

t (tabulated) = 1.701

t = 23.44

Conclusion:

Value = 23.44, so it is > 1.701, which means we do not reject Ho. This result shows that the relation found in the sample data exists in the population as well. It means that there is a strong positive correlation between Boundaries and Runs.

Regression Analysis of INDIA

Following is the Regression Analysis of our data:

TABLE: 10

Regression Table

                       Coefficients   Standard Error   t Stat     P-value
Intercept              -1324.14       1243.271851      -1.06504   0.296288459
Production             -0.04597       0.066270106      -0.69373   0.493779102
Domestic Consumption   0.081772       0.069018762      1.18478    0.246428801

Our multiple regression equation is:

$\hat{Y} = a + (b_1 X_1) + (b_2 X_2)$

$\hat{Y} = -0.4177 + 0.6097 X_1 + 4.1337 X_2$
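To show how an equation of this form is evaluated, the sketch below plugs an intercept and two slopes into the general expression a + b1*X1 + b2*X2. The function name is hypothetical; the coefficients are the ones listed in Table 10, and the input values for X1 and X2 are invented purely for the example.

```python
def predict(a, b1, b2, x1, x2):
    """Evaluate the two-predictor regression equation a + b1*x1 + b2*x2."""
    return a + b1 * x1 + b2 * x2


# Coefficients as listed in Table 10 (intercept, Production, Domestic Consumption).
a, b1, b2 = -1324.14, -0.04597, 0.081772

# Hypothetical input values, chosen only to demonstrate the call.
estimate = predict(a, b1, b2, x1=70000.0, x2=65000.0)
print(round(estimate, 2))
```

Each slope is the change in the predicted value for a one-unit change in its variable while the other variable is held fixed.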

Explanation of the above Equation:

Our intercept a = -0.4177 means that if every value other than a is equal to 0, then the runs of Afridi will be -0.4177. The negative value of the intercept indicates that the average Runs of Afridi will go into the minus if he cannot score runs in the match, because as his matches increase, if he does not bat or cannot score runs, the overall average will decrease.

Now the value of b1 = 0.6097 shows a 0.6097 change in the runs due to a change of one ball. Its value is small, so it does not have much effect on the model; this result arises because runs are not necessarily made on every ball. But in reality runs can't be made without balls.

The value of b2 = 4.1337 shows a 4.1337 change in the runs due to a change of one boundary. This value shows a significant contribution to the model.

Now, to generalize the above slopes to the population, we will have to test hypotheses that tell us whether or not the changes discussed above exist in the population.

SCENARIO 1:

According to our calculation the value of b1 = 0.609794. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_1 \ge 0$

H1: $\beta_1 < 0$

t (tabulated) = 1.701

If the calculated t value is < 1.701 then we will reject Ho, otherwise do not reject.

t = 4.798473

Conclusion:

As t = 4.798473, which is > 1.701, we do not reject Ho. It means that we are in favor of Ho, which indicates that the per unit change b1 = 0.609794 exists in the population.

SCENARIO 2:

According to our calculation the value of b2 = 4.13377197. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_2 \ge 0$

H1: $\beta_2 < 0$

t (tabulated) = 1.701

If the calculated t value is < 1.701 then we will reject Ho, otherwise do not reject.

Conclusion:

As the calculated t value is > 1.701, we do not reject Ho. It means that we are in favor of Ho, which indicates that the per unit change b2 = 4.13377 exists in the population.

We have shown that the per unit changes found in the samples hold for the population. Now we want to check how reliable our regression model is, or to what extent we can control the errors. For this purpose we will apply the Coefficient of Determination. It is denoted by r².

TABLE: 10.1

Regression Statistics

Multiple R 0.346141

R Square 0.119814

Adjusted R Square 0.054615

Observations 30
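The Adjusted R Square row can be recovered from the R Square row using the usual adjustment for sample size and number of predictors. The sketch below assumes the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), with n = 30 observations from the table and k = 2 predictors (Production and Domestic Consumption); the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R Square and Observations copied from Table 10.1; k = 2 predictors.
print(round(adjusted_r2(0.119814, 30, 2), 6))  # 0.054615, matching the table
```

Because the penalty factor (n - 1)/(n - k - 1) is greater than 1 whenever there is at least one predictor, the adjusted value is never larger than R Square itself.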

Explanation:

In the above table the value of r² = 0.9886, which shows that we are controlling errors. It means that Runs, which we have taken as the dependent variable, depends strongly on the factors we have selected as independent variables, namely Balls and Boundaries. There are only approximately 2% chances that the dependent variable will be affected by other factors, i.e. state of pitch, match pressure etc. As more variables are added to the model, the value of r² will increase.

The value of r² also shows the fitness of our model; we are approximately 98% confident that our selected model is suitable to predict our dependent variable.

Our sample also shows that we can strongly control the dependent variable with the independent variables which we have taken.

Actually r² overstates the reliability of the model, which is why we use adjusted r² to get an answer closer to reality. The value of adjusted r² is always less than the value of r², as given in the table.

Adjusted R² = 0.9878. This value shows that we can still control the errors up to approximately 98%, which is a good indication for our model.

ANOVA of Regression Analysis

Overall Effect of Balls and Boundaries

Now we want to check the overall effect of the Balls (b1) and Boundaries (b2) on the Runs. So we will have to apply ANOVA to check this.

TABLE: 10.2

ANOVA

Df SS MS F Significance F

Regression 2 7108367.665 3554184 1.837663842 0.178549

Residual 27 52220085.8 1934077

Total 29 59328453.47
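The figures in this table determine both the overall F statistic and the R Square of the model. As a hedged sketch (the function name is an assumption), R² is the regression sum of squares over the total, and F is the regression mean square over the residual mean square, with k = 2 predictors and n = 30 observations.

```python
def regression_summary(ss_regression, ss_residual, k, n):
    """Return (R squared, F) from the regression and residual sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    msr = ss_regression / k            # mean square for regression
    mse = ss_residual / (n - k - 1)    # mean square error
    return r2, msr / mse


# SS values copied from Table 10.2.
r2, f_stat = regression_summary(7108367.665, 52220085.8, k=2, n=30)
print(round(r2, 6), round(f_stat, 4))
```

The overall test rejects Ho only when this F exceeds the tabulated F value, or equivalently when Significance F falls below alpha.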

Explanation:

As we can see from the above table, the value of the MSR is greater than the MSE; due to this the calculated F value will be greater, and it will tell us whether the b1 and b2 that we have calculated can be generalized to the population or not.

Now we will apply the hypothesis test to check the overall effect of the changes in balls and boundaries.

Ho: $\beta_1 = \beta_2 = 0$

H1: at least one of $\beta_1$, $\beta_2 \neq 0$

F (tabulated) = 4.24

If the calculated value is < -4.24 or > 4.24 then we will reject Ho, otherwise do not reject.

Conclusion

We reject Ho because our calculated F value is 1178.0717, which means that the overall effect of Balls and Boundaries exists in the population.

Regression Analysis of PAKISTAN

TABLE: 11

Regression Table

                       Coefficients   Standard Error   t Stat     P-value
Intercept              -938.981       426.5087124      -2.20155   0.03642519
Production             0.151507       0.063218223      2.396576   0.023740225
Domestic Consumption   -0.07234       0.066720678      -1.08423   0.287851529
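The t Stat column of a regression table is each coefficient divided by its standard error, and a two-tailed test compares its absolute value with the tabulated t. The sketch below illustrates this with the Production row of Table 11; the function names are hypothetical, and 2.048 is the cut-off the report quotes for its two-tailed tests.

```python
def t_stat(coefficient, standard_error):
    """t statistic for a single regression coefficient."""
    return coefficient / standard_error


def two_tailed_reject(t_calc, t_crit):
    """Reject Ho (coefficient equals zero) when |t| exceeds the critical value."""
    return abs(t_calc) > t_crit


# Production row of Table 11.
t_production = t_stat(0.151507, 0.063218223)
print(round(t_production, 4), two_tailed_reject(t_production, 2.048))
```

Applied to the Domestic Consumption row, the same check does not reject Ho, since its t Stat of -1.08423 lies inside the interval from -2.048 to 2.048.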


$\hat{Y} = a + (b_1 X_1) + (b_2 X_2)$

$\hat{Y} = -0.293 + 0.3630 X_1 + 4.7083 X_2$

Explanation:

Our intercept a = -0.293, it means that if every value other than a is equal to 0 then the runs of

the Yuvraj will be -0.293. It is showing the negative value of the intercept which indicates that

the average Runs of the Yuvraj will go in minus if he could not score the runs in the match



Now the value of b1 = 0.3630, it is showing the 0.3630 change in the runs due to the change in





The value of b2 = 4.7083 shows a 4.7083 change in the runs due to a change of one boundary. This value shows a significant contribution to the model.

Now, to generalize the above slopes to the population, we will have to test hypotheses that tell us whether or not the changes discussed above exist in the population.

SCENARIO 1:

According to our calculation the value of b1 = 0.36302. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_1 = 0$

H1: $\beta_1 \neq 0$

t (tabulated) = 2.048

If the calculated t value is < -2.048 or > 2.048 then we will reject Ho, otherwise do not reject.

t = 17.622

Conclusion:

As t = 17.622, which is > 2.048, we will reject Ho. It means that we are in favor of H1, which indicates that the per unit change b1 = 0.36302 exists in the population.

SCENARIO 2:

According to our calculation the value of b2 = 4.7083. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_2 = 0$

H1: $\beta_2 \neq 0$

t (tabulated) = 2.048

If the calculated t value is < -2.048 or > 2.048 then we will reject Ho, otherwise do not reject.

t = 36.58

Conclusion:

As t = 36.58, which is > 2.048, we will reject Ho. It means that we are in favor of H1, which indicates that the per unit change b2 = 4.70831 exists in the population.

We have shown that the per unit changes found in the samples hold for the population. Now we want to check how reliable our regression model is, or to what extent we can control the errors. For this purpose we will apply the Coefficient of Determination. It is denoted by r².

TABLE: 11.1

Regression Statistics

Multiple R 0.62053

R Square 0.385057

Adjusted R Square 0.339506

Observations 30

The value of R square tells us that we can control approximately 98.12% of the errors by using the above regression model. The variables are very important for representing the dependent variable Runs.

The value of Adjusted R square gives a more accurate value than the R square value.


Now we will apply the hypothesis to test the overall affect of the changes in balls and

boundaries.

TABLE: 11.2

ANOVA

             df   SS            MS         F             Significance F
Regression   2    3728451.183   1864226    8.453268432   0.00141
Residual     27   5954394.017   220533.1
Total        29   9682845.2


Ho: $\beta_1 = \beta_2 = 0$

H1: at least one of $\beta_1$, $\beta_2 \neq 0$

F (tabulated) = 4.24

If the calculated value is < -4.24 or > 4.24 then we will reject Ho, otherwise do not reject.

Conclusion

We reject Ho because our calculated F value is 706.972039, lying in the rejection region, which means that the overall effect of Balls and Boundaries exists in the population.

Regression Analysis of MEXICO

TABLE: 12

Regression Table

                       Coefficients   Standard Error   t Stat     P-value
Intercept              -1681.01       577.6260941      -2.91021   0.007151896
Production             0.031707       0.101635148      0.311969   0.757460537
Domestic Consumption   0.374968       0.071092842      5.274339   1.45836E-05


$\hat{Y} = a + (b_1 X_1) + (b_2 X_2)$

$\hat{Y} = -0.3125 + 0.4097 X_1 + 4.3161 X_2$

Explanation:

Our intercept a = -0.3125 means that if every value other than a is equal to 0, then the runs of Yuvraj will be -0.3125. The negative value of the intercept indicates that

the average Runs of the Yuvraj will go in minus if he could not score the runs in the match




Now the value of b1 = 0.4097, it is showing the 0.4097 change in the runs due to the change in



The value of b2 = 4.3161 shows a 4.3161 change in the runs due to a change of one boundary. This value shows a significant contribution to the model.

Now we will generalize the above findings to the populations.

SCENARIO 1:

According to our calculation the value of b1 = 0.4097. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_1 = 0$

H1: $\beta_1 \neq 0$

t (tabulated) = 2.048

If the calculated t value is < -2.048 or > 2.048 then we will reject Ho, otherwise do not reject.

t = 5.285

Conclusion:

As t = 5.285, which is > 2.048, we will reject Ho. It means that we are in favor of H1, which indicates that the per unit change b1 = 0.4097 exists in the population.

SCENARIO 2:

According to our calculation the value of b2 = 4.3161. Now we want to check whether this per unit change exists in the population or not.

Ho: $\beta_2 = 0$

H1: $\beta_2 \neq 0$

t (tabulated) = 2.048

If the calculated t value is < -2.048 or > 2.048 then we will reject Ho, otherwise do not reject.

t = 3.344

Conclusion:

As t = 3.344, which is > 2.048, we will reject Ho. It means that we are in favor of H1, which indicates that the per unit change b2 = 4.3161 exists in the population.


Overall Effect of Balls and Boundaries

Now we will apply the hypothesis test to check the overall effect of the changes in balls and boundaries.

TABLE: 12.1

ANOVA

             df   SS            MS         F             Significance F
Regression   2    2231316.65    1115658    14.41399149   5.51E-05
Residual     27   2089828.817   77401.07
Total        29   4321145.467

Ho: $\beta_1 = \beta_2 = 0$

H1: at least one of $\beta_1$, $\beta_2 \neq 0$

F (tabulated) = 4.24

If the calculated value is < -4.24 or > 4.24 then we will reject Ho, otherwise do not reject.

Conclusion

We reject Ho because our calculated F value is 287.928, lying in the rejection region, which means that the overall effect of Balls and Boundaries exists in the population.



Comparison of COUNTRIES

So, according to our findings, the intercepts of all three players show negative values, which means that all three players should make runs in their coming matches, because if they don't make runs their overall average will be reduced, as the intercept shows. As they increase their runs or strike rate, their average will start going up.

On the other hand, the effect of the Balls and Boundaries is almost the same for all of the players, because we have selected players who have a high strike rate, so that they increase their runs by hitting more boundaries. If we talk about the balls separately, b1 (Balls) has almost the same effect on the model of all three players. It has less effect than boundaries on runs because, as we have already stated in the correlation topic, there are 50% chances of making runs on every ball, and if they make runs without hitting a boundary it will affect the total less than boundaries do. As boundaries increase the runs in high proportion, b2 (Boundaries) shows more effect on the regression model.

If we talk about the R square of the three players, the values are also close to each other. All three players have an R square of more than 95%, so the variables which we have selected are very useful for our three selected players.

If we compare the correlation results of the three players, they are again very close to each other. The correlation tables of all three players show more than 97% relation of the dependent variable (Runs) with the independent variables (Balls) and (Boundaries).



Crux of Study

In this project we have applied the following statistical tools: Mean, Median, Standard Deviation, Skewness, Kurtosis, Hypothesis testing, ANOVA, Multiple Regression, and Correlation. After using these statistical tools we get some results: the average balls played by Shahid Afridi are 29.933 or approximately 21 balls, whereas the average balls played by Yuvraj Singh are 40 and the average balls played by Chris Gayle are 29.5 or 30 balls. This calculation shows that the Indian player Yuvraj Singh has played the maximum balls as compared with the other two players. If we talk about the Median, it is another type of average. We have calculated the Median because there are too many outliers in this data. The reason for hypothesis testing is to check whether the sample can be generalized to the population or not. On the basis of our calculation the sample can be generalized to the population. Similarly, we have used Skewness, which shows the symmetry of the data. In this project all the values of Skewness are positive, which shows that the data is positively skewed, i.e. the tail of the data lies on the right side. Kurtosis indicates the outliers in the data. In this data all the outliers exist on the positive or right side. ANOVA shows the overall reliability or fitness of the model. And correlation shows the relation between each independent variable and the dependent variable individually.



Recommendations

We have some recommendations for the players we have selected, in the light of our findings.

As we have concluded, compared to the other two players Afridi is not a consistent player, and he scored 0 runs in more matches than the other two players, as shown in our raw data. So we recommend that Shahid Afridi should stay at the wicket; as we have calculated, his intercept is negative, which shows a decrease in the average runs of Afridi as his matches increase. If he cannot do so, his average will go down further.

Now we have some recommendations for Yuvraj. He is a better and more consistent player than Afridi. According to our findings his intercept is also negative, which illustrates the decrease in the overall average if he cannot make runs in his matches. His intercept value is -0.29324, which is still smaller in magnitude than both Afridi's and Gayle's intercept values. As he is a more consistent player than Afridi, besides staying at the wicket he should hit some boundaries to boost his average.

Chris Gayle is a very good player compared with the other two. His intercept is also negative but is smaller in magnitude than Afridi's. His intercept value is -0.31253, so a higher strike rate is also needed by Gayle to increase his average scores.




Conclusion

From this project we have concluded that the Pakistani cricketer Shahid Afridi is not a consistent player. There are many outliers in his data, which means that he has played matches but was unable to score in most of them as compared with the other cricketers. This analysis shows that he is not as good a player as the other two. He does not have a fixed place in the batting order, because he comes in at number 2 or at number 7. Besides all this, he is a good hitter of the ball and has very good experience, but his weakness is that after hitting one ball he becomes aggressive and loses his wicket on the next ball; that is his main drawback.

Just like Shahid Afridi, Chris Gayle is also not a good player. His strike rate is lower than Shahid Afridi's, but his average runs are more than Shahid Afridi's. He is an opening batsman, and the reason behind losing his wicket early and his low strike rate is that when the match starts the pitch is fast and grassy, what we can call a bowling wicket. For the first 10 to 15 overs the pitch is fast, so it is difficult for him to survive or stay at the wicket.

If we talk about Yuvraj Singh, he is better than the other two players, because his strike rate is good and his average runs are also more than those of the other two players. Yuvraj Singh is also a consistent player, because he is a middle-order batsman and his batting style is calm and steady. He gets the benefit of coming in late: the rival players become a little bit lazy and the ball becomes older, which is a good chance for making a score.

Thank You
