arya statistics 11

7/31/2019 Arya Statistics 11


Quantitative Techniques 2012


Project Report

On Quantitative Statistics

Submitted to:

Prof. Venkatesh Shekhar

Course In-charge, Managerial Statistics

Roll No-P301311CMG216

NIIT University, Neemrana

Rajasthan

Submitted by:

Arya Pradhan

MBA (F&B), Batch III, Term I



Table of Contents

Objectives
Introduction
Measure of Central Tendency
Frequency Distribution
Probability on the Curve
Sample Probability
Hypothesis Testing for Single Population
Single Factor ANOVA
Hypothesis Testing for 2 Population
F-test
Conclusion



Objective

This project aims to understand statistics as a tool for exploring data collected on the time spent on Moodle during May and June 2012. The project aims to summarise and interpret the data in the correct perspective using statistical models and formulae.

The inference for all statistical results aims to understand various concepts like:

Measure of Central Tendency

Measure of Dispersion

Concept of Outliers

Frequency Distribution

Probability on the Curve

Sample Probability

Hypothesis Testing for Single Population

Hypothesis Testing for 2 Population

Single Factor ANOVA

F-test



Introduction

Statistics is the study of the collection, organization, analysis, and interpretation of data. It

deals with all aspects of this, including the planning of data collection in terms of the design

of surveys and experiments.

A statistician is someone who is particularly well versed in the ways of thinking necessary for

the successful application of statistical analysis. Such people have often gained this

experience through working in any of a wide number of fields. There is also a discipline

called mathematical statistics that studies statistics mathematically.

Statistical methods can be used for summarizing or describing a collection of data; this is

called descriptive statistics. This is useful in research, when communicating the results of

experiments. In addition, patterns in the data may be modelled in a way that accounts for

randomness and uncertainty in the observations, and are then used for drawing inferences

about the process or population being studied; this is called inferential statistics. Inference is

a vital element of scientific advance, since it provides a means for drawing conclusions from

data that are subject to random variation.



POPULATION DATA SET

The population data consists of observations of the time spent on Moodle on different days, with each observation serving as a different member of the overall group. In short, it is the complete set of data on which the statistical analysis is conducted.

The table below represents data collected for the last two months.

Date   Time spent on Moodle (minutes)

01-May 0.0

02-May 0.0

03-May 0.0

04-May 0.0

05-May 2.5

06-May 5.0

07-May 7.5

08-May 2.7

09-May 6.9

10-May 7.0

11-May 0.0

12-May 15.5

13-May 2.5

14-May 2.0

15-May 2.7

16-May 8.6

17-May 0.0

18-May 0.0

19-May 2.5

20-May 15.0

21-May 5.3

22-May 4.4

23-May 16.8

24-May 6.7

25-May 0.0

26-May 7.1

27-May 17.5

28-May 0.0

29-May 10.5

30-May 4.9

31-May 6.9

01-Jun 8.1

02-Jun 11.2

03-Jun 6.9

04-Jun 7.3

05-Jun 0.0

06-Jun 16.8

07-Jun 8.1

08-Jun 19.9

09-Jun 8.1

10-Jun 6.9



11-Jun 16.8

12-Jun 7.0

13-Jun 5.3

14-Jun 10.6

15-Jun 0.0

16-Jun 6.7

17-Jun 0.0

18-Jun 3.9

19-Jun 0.0

DESCRIPTIVE STATISTICS:

Descriptive statistics summarize the population data by describing what was observed, numerically or graphically. Numerical descriptors include the mean and standard deviation for continuous data (like heights or weights), while frequency and percentage are more useful for describing categorical data. The table below presents the descriptive analysis of the population data; the inferences follow.

Mode 0.000

Median 6.003

Mean 6.086

Qmin 0.000

Q1 0.508

Q2 6.003

Q3 8.090

Qmax 19.926

Variance 30.627

Standard Deviation 5.534

Mean Absolute Deviation 4.337

Coefficient Of Variation 91%

Skewness 0.817625
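The figures in the table above can be approximately reproduced with Python's standard `statistics` module. This is a minimal sketch, not the spreadsheet used for the report: it assumes the one-decimal values printed in the data table (so results differ slightly in the later decimals), the sample (n - 1) formulas for variance and standard deviation, and the "inclusive" quartile method. All variable names are illustrative.

```python
import statistics

# Daily time on Moodle, 01-May to 19-Jun, as printed in the data table
# (values are rounded to one decimal place there).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,      # 01-10 May
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,    # 11-20 May
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,   # 21-30 May
    6.9,                                                    # 31 May
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,   # 01-10 Jun
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,         # 11-19 Jun
]

mean = statistics.fmean(times)
median = statistics.median(times)
mode = statistics.mode(times)

# Assumption: quartiles interpolated with the "inclusive" method.
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")

# Assumption: sample (n - 1) variance and standard deviation.
variance = statistics.variance(times)
std_dev = statistics.stdev(times)

# Mean absolute deviation around the mean.
mad = sum(abs(t - mean) for t in times) / len(times)

# Coefficient of variation and range.
cv = std_dev / mean
data_range = max(times) - min(times)

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f}")
print(f"Q1={q1:.3f} Q2={q2:.3f} Q3={q3:.3f}")
print(f"variance={variance:.3f} sd={std_dev:.3f} mad={mad:.3f}")
print(f"cv={cv:.0%} range={data_range:.1f}")
```

Because the inputs are rounded, the output should agree with the reported table only to about two decimal places.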

INFERENCES

Range - In the descriptive statistics, the range is the length of the smallest interval which

contains all the data. It is calculated by subtracting the smallest observation (sample

minimum) from the greatest (sample maximum) and provides an indication of statistical

dispersion. In our case the range of time spent on moodle is 19.9.

Mean - For a data set, the mean is the sum of the values divided by the number of values.

The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar". This

mean is a type of arithmetic mean. If the data set were based on a series of observations

obtained by sampling a statistical population, this mean is termed the "sample mean" to



distinguish it from the "population mean". In our case the population mean is 6.086, which is the average daily time spent on Moodle.

Median - The median of a set of data values is the middle value of the data set when it has

been arranged in ascending order. That is, from the smallest value to the highest value. In

our case the median is 6.003.

Outlier - An outlying observation, or outlier, is one that appears to deviate markedly from

other members of the sample in which it occurs.

Outliers can occur by chance in any distribution, but they are often indicative either of

measurement error or that the population has a heavy-tailed distribution. In the former case

one wishes to discard them or use statistics that are robust to outliers, while in the latter

case they indicate that the distribution has high kurtosis and that one should be very

cautious in using tools or intuitions that assume a normal distribution.

           Q min   Q1      Q2      Q3      Q max
Quartile   0.0     0.508   6.003   8.090   19.926
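One common way to turn the quartiles above into an outlier check is the 1.5 × IQR rule. The report does not state which rule, if any, was applied, so the rule and the names below are assumptions made purely for illustration.

```python
# Quartile values taken from the table above.
q_min, q1, q3, q_max = 0.0, 0.508, 8.090, 19.926

# Assumption: the 1.5 * IQR fence rule; the report does not name a rule.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value: float) -> bool:
    """Return True if the value falls outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence


# Only the extremes need checking to see whether any outlier exists.
flagged = [v for v in (q_min, q_max) if is_outlier(v)]
print(f"IQR={iqr:.3f} fences=({lower_fence:.3f}, {upper_fence:.3f})")
print("flagged extremes:", flagged)
```

Under this assumed rule the maximum lies just beyond the upper fence, while the minimum of 0.0 does not count as an outlier.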

Standard Deviation - In statistics, standard deviation (represented by the symbol σ) shows

how much variation or "dispersion" exists from the average (mean, or expected value). A low

standard deviation indicates that the data points tend to be very close to the mean, whereas

high standard deviation indicates that the data points are spread out over a large range of

values. The Standard Deviation of 5.534 represents the measure of dispersion in data.

Skewness- It is a measure of the asymmetry of the probability distribution of a real-valued

random variable. The skewness value can be positive or negative, or even undefined.

Qualitatively, a negative skew indicates that the tail on the left side of the probability density

function is longer than the right side and the bulk of the values lie to the right of the mean. A

positive skew indicates that the tail on the right side is longer than the left side and the bulk

of the values lie to the left of the mean. A zero value indicates that the values are relatively

evenly distributed on both sides of the mean, typically but not necessarily implying a

symmetric distribution. In this case the data is positively skewed (0.817).
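As a rough check on the skewness figure, the sketch below computes both the moment coefficient and a bias-adjusted sample skewness. It assumes the one-decimal values from the data table and assumes the adjusted formula n / ((n - 1)(n - 2)) · Σ((x - x̄) / s)³; the report does not say which formula produced its value.

```python
import math
import statistics

# Daily time on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

n = len(times)
mean = statistics.fmean(times)

# Moment coefficient of skewness: m3 / m2^(3/2), using population moments.
m2 = sum((t - mean) ** 2 for t in times) / n
m3 = sum((t - mean) ** 3 for t in times) / n
skew_moment = m3 / math.pow(m2, 1.5)

# Assumption: bias-adjusted sample skewness based on the sample std dev.
s = statistics.stdev(times)
skew_adjusted = n / ((n - 1) * (n - 2)) * sum(((t - mean) / s) ** 3 for t in times)

print(f"moment skewness   = {skew_moment:.3f}")
print(f"adjusted skewness = {skew_adjusted:.3f}")
```

Both values are positive, which is consistent with the long right tail described above.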

Frequency Distribution - In statistics, a frequency distribution is an arrangement of the

values that one or more variables take in a sample. Each entry in the table contains the

frequency or count of the occurrences of values within a particular group or interval, and in

this way, the table summarizes the distribution of values in the sample.

Frequency distributions are used for both qualitative and quantitative data. From the

histogram we can infer that most of the time-spent values lie within the 0-0.5 minute bucket.



From this we can infer that, most of the time, Moodle has been used only for downloading the study material.
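A frequency distribution like the one described can be tabulated in a few lines. This is a sketch under stated assumptions: it uses the rounded values from the data table and a bucket width of 0.5, inferred from the "0-0.5" bucket mentioned above. The bucket width actually used for the histogram is not given in the report.

```python
from collections import Counter

# Daily time on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

BUCKET_WIDTH = 0.5  # assumed width; bucket k covers [k * 0.5, (k + 1) * 0.5)

# Count how many observations fall in each bucket.
freq = Counter(int(t // BUCKET_WIDTH) for t in times)

for bucket in sorted(freq):
    low = bucket * BUCKET_WIDTH
    print(f"{low:4.1f} - {low + BUCKET_WIDTH:4.1f}: {freq[bucket]}")

# The bucket with the highest count.
modal_bucket = freq.most_common(1)[0][0]
```

With these inputs the first bucket holds the days on which no time was recorded, and it is the most frequent one.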

Random Sampling - A random sample is one chosen by a method involving an

unpredictable component. Random sampling can also refer to taking a number of

independent observations from the same probability distribution, without involving any real

population.

The random sample drawn in this case is:

Random Sample

1 6.71

2 2.50

3 2.70

4 7.04

5 7.14

6 7.14

7 2.70

8 4.93

9 0.00

10 0.00

11 0.00

12 6.71

13 7.30

14 16.81

15 8.09

16 8.09

17 0.00

18 17.50

19 7.14

20 0.00

21 0.00

22 0.00

23 6.89

24 11.19
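Drawing such a sample can be sketched as follows. The report does not say how the sample was generated. Sampling with replacement is assumed here because the sample above contains repeated values, and the seed is an arbitrary choice made only so the example is repeatable, so this code will not reproduce the 24 values listed.

```python
import random

# Population: daily time on Moodle as printed in the data table.
population = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

SAMPLE_SIZE = 24
SEED = 2012  # hypothetical seed, chosen only for reproducibility

# Assumption: simple random sampling with replacement.
rng = random.Random(SEED)
sample = rng.choices(population, k=SAMPLE_SIZE)

sample_mean = sum(sample) / len(sample)
print(f"sample of {len(sample)} values, mean = {sample_mean:.2f}")
```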

[Figure: Frequency Distribution (histogram; y-axis: Frequency)]



Probability for the Population and Interval Estimates:

Let us consider an example of time spent on Moodle. The probability of spending less than 4 minutes is 43.15%. We now estimate the population mean from the sample mean using the confidence interval approach; the details are as follows:

ESTIMATING POPULATION MEAN FROM SAMPLE

Confidence Interval 95%

t Value 2.07

Sample Mean 5.44

Sample Size 24

Point Estimator 5.44

Interval Estimate (Upper value) 7.66

Interval Estimate (Lower value) 3.22

Population Mean 6.09
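The interval estimate follows the usual form x̄ ± t · s / √n. The sketch below applies it to the 24 sample values listed earlier, with the critical value 2.069 hard-coded from the report. It assumes the sample standard deviation (n - 1) is used; because the report does not state its standard deviation and the listed values are rounded, the bounds need not match the table exactly.

```python
import math
import statistics

# The random sample of 24 values listed in the report.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93, 0.00, 0.00, 0.00, 6.71,
    7.30, 16.81, 8.09, 8.09, 0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

T_CRITICAL = 2.069  # two-tailed, 95%, df = 23 (value quoted in the report)

n = len(sample)
sample_mean = statistics.fmean(sample)

# Assumption: standard error based on the sample (n - 1) standard deviation.
std_error = statistics.stdev(sample) / math.sqrt(n)
margin = T_CRITICAL * std_error

lower, upper = sample_mean - margin, sample_mean + margin
print(f"point estimate = {sample_mean:.2f}")
print(f"95% interval   = ({lower:.2f}, {upper:.2f})")
```

The reported population mean of 6.09 falls inside the interval this produces.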

HYPOTHESIS TESTING ABOUT SINGLE POPULATION

H0: μ = 4.80

Ha: μ ≠ 4.80

t calculated -0.60

Confidence Interval 95%

t Critical Value (Two tailed test) ±2.069

Null hypothesis cannot be rejected

Population mean is within the confidence interval of 2.60 minutes to 7.23 minutes.

Hypothesis Test (Single Population)

Let the null hypothesis be H0: μ = 4.80 and the alternate hypothesis be Ha: μ ≠ 4.80.

Since the sample size is less than 30, we have used the t-distribution. For this we considered a random sample of 24 values, whose sample mean is 4.92 minutes. Using the t-distribution, the calculated t value is 0.11, which is smaller in magnitude than the critical value of ±2.069 for a two-tailed test (confidence level = 95%, degrees of freedom = 23). Since the calculated t lies within the acceptance region, we fail to reject the null hypothesis (μ = 4.80).
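The test statistic is t = (x̄ - μ0) / (s / √n). The sketch below applies this to the 24 sample values listed earlier, with μ0 = 4.80 and the critical value 2.069 taken from the report. The statistic depends on the particular sample drawn, so the value printed here need not equal the figures quoted above. The variable names are illustrative.

```python
import math
import statistics

# The random sample of 24 values listed in the report.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93, 0.00, 0.00, 0.00, 6.71,
    7.30, 16.81, 8.09, 8.09, 0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

MU_0 = 4.80         # hypothesised population mean (H0)
T_CRITICAL = 2.069  # two-tailed, 95%, df = 23 (value quoted in the report)

n = len(sample)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)

# One-sample t statistic.
t_stat = (sample_mean - MU_0) / std_error

# Two-tailed decision: reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.2f}, df = {n - 1}, reject H0: {reject_h0}")
```

For this sample the statistic stays inside the acceptance region, so the decision agrees with the one stated above.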

2 Population Tests:

A Z-test is any statistical test for which the distribution of the test statistic under the null

hypothesis can be approximated by a normal distribution. Because of the central limit

theorem, many test statistics are approximately normally distributed for large samples. For

each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two

tailed) which makes it more convenient than the Student's t-test which has separate critical

values for each sample size. Therefore, many statistical tests can be conveniently performed

as approximate Z-tests if the sample size is large or the population variance known. If the

population variance is unknown (and therefore has to be estimated from the sample itself)

and the sample size is not large, the Student t-test may be more appropriate.

We now consider a two-sample test, with samples taken from the populations of time spent by Devesh and Dennis. The Z-distribution is used to test whether there is any difference in the time spent on Moodle by the two persons.

Let the null hypothesis be H0: μ1 = μ2 and the alternate hypothesis be Ha: μ1 ≠ μ2.

z-Test: Two Sample for Means

                               Dennis   Devesh
Mean                           5.09     3.52
Known Variance                 26.56    21.32
Observations                   31.00    31.00
Hypothesized Mean Difference   0.00
z                              1.26
P(Z
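The z value in the table can be checked from the summary figures alone, using z = (x̄1 - x̄2 - d0) / √(σ1²/n1 + σ2²/n2). The sketch below is illustrative: the names are hypothetical, and the two-tailed p-value it prints is computed here with the standard normal distribution, not copied from the report.

```python
import math
from statistics import NormalDist

# Summary figures from the z-test table above.
mean_1, mean_2 = 5.09, 3.52
var_1, var_2 = 26.56, 21.32   # "known" variances
n_1, n_2 = 31, 31
hypothesized_diff = 0.0

# Standard error of the difference between the two means.
std_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
z = (mean_1 - mean_2 - hypothesized_diff) / std_error

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

Z_CRITICAL = 1.96  # 5% two-tailed, as quoted in the text
reject_h0 = abs(z) > Z_CRITICAL
print(f"z = {z:.2f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```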



chance of committing a type I error. For this reason, ANOVAs are useful in comparing two,

three, or more means. Based on the above samples, we shall undertake the following

hypothesis.

H0: μ1 = μ2 = μ3

Ha: At least one of the means is not equal to the others.

Anova: Single Factor

SUMMARY

Groups   Count   Sum      Average   Variance
Dennis   31      157.90   5.09      28.26
Devesh   31      109.18   3.52      26.30
Arya     31      236.39   7.62      28.80

ANOVA

Source of Variation   SS        df   MS       F      P-value   F crit
Between Groups        265.77    2    132.88   4.78   0.0106    3.098
Within Groups         2501.42   90   27.79
Total                 2767.20   92

Since F calculated (4.78) is greater than F critical (3.098), we reject the null hypothesis. This means that the mean time spent on Moodle differs for at least one of the three persons.
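The ANOVA table can be rebuilt from the SUMMARY figures alone. The sketch below computes the between-group and within-group sums of squares from each group's count, sum and variance, then forms F = MS between / MS within. Because the summary values are rounded, the sums of squares differ slightly from the table. The critical value 3.098 is taken from the report, and the names are illustrative.

```python
# (count, sum, variance) per group, from the SUMMARY table above.
groups = {
    "Dennis": (31, 157.90, 28.26),
    "Devesh": (31, 109.18, 26.30),
    "Arya": (31, 236.39, 28.80),
}

F_CRITICAL = 3.098  # value quoted in the report for df = (2, 90)

total_n = sum(count for count, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / total_n

# Between-group sum of squares: n_i * (group mean - grand mean)^2.
ss_between = sum(
    count * (total / count - grand_mean) ** 2
    for count, total, _ in groups.values()
)

# Within-group sum of squares: (n_i - 1) * sample variance of group i.
ss_within = sum((count - 1) * var for count, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

reject_h0 = f_stat > F_CRITICAL
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), reject H0: {reject_h0}")
```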

F-Test:

An F-test is any statistical test in which the test statistic has an F-distribution under the null

hypothesis. It is most often used when comparing statistical models that have been fit to

a data set, in order to identify the model that best fits the population from which the data

were sampled. Exact F-tests mainly arise when the models have been fit to the data

using least squares.

F-TEST TWO-SAMPLE FOR VARIANCES

               Devesh   Arya
Mean           3.52     7.63
Variance       26.31    28.81
Observations   31.00    31
df             30.00    30
F              0.91
P(F



For the above case F calculated < F critical, so the null hypothesis H0: σ1² = σ2² cannot be rejected.
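The F value in the table is the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. A minimal sketch using the figures from the table follows. The critical value is not reproduced here because it is not legible in the report, and the names are illustrative.

```python
# Sample variances and sizes from the F-test table above.
var_1, var_2 = 26.31, 28.81
n_1, n_2 = 31, 31

# F statistic: ratio of the first sample variance to the second.
f_stat = var_1 / var_2

# Degrees of freedom for numerator and denominator.
df_1, df_2 = n_1 - 1, n_2 - 1

print(f"F = {f_stat:.2f} with df = ({df_1}, {df_2})")
```

A ratio close to 1 indicates that the two sample variances are of similar size.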

CONCLUSION:

We have carried out a statistical analysis of the time spent on Moodle by the members of the group. The time-spend patterns of the members are out of sync, as the measures of dispersion are too wide.
