arya statistics 11
7/31/2019 Arya Statistics 11
Quantitative Techniques 2012
Project Report
On Quantitative Statistics
Submitted to:
Prof. Venkatesh Shekhar
Course In-charge, Managerial Statistics
Roll No-P301311CMG216
NIIT University, Neemrana
Rajasthan
Submitted by:
Arya Pradhan
MBA (F&B), Batch III, Term I
Table of Contents
Objectives
Introduction
Measure of Central Tendency
Frequency Distribution
Probability on the Curve
Sample Probability
Hypothesis Testing for Single Population
Single Factor Anova
Hypothesis Testing for 2 Population
F-test
Conclusion
Objective
This project uses statistics as a tool to explore data collected on the time spent on
Moodle during May and June 2012. It aims to summarise and interpret the data in the
correct perspective with the use of statistical models and formulae.
The inferences drawn from the statistical results illustrate concepts such as:
Measure of Central Tendency
Measure of Dispersion
Concept of Outliers
Frequency Distribution
Probability on the Curve
Sample Probability
Hypothesis Testing for Single Population
Hypothesis Testing for 2 Population
Single Factor Anova
F-test
Introduction
Statistics is the study of the collection, organization, analysis, and interpretation of data. It
deals with all aspects of this, including the planning of data collection in terms of the design
of surveys and experiments.
A statistician is someone who is particularly well versed in the ways of thinking necessary for
the successful application of statistical analysis. Such people have often gained this
experience through working in any of a wide number of fields. There is also a discipline
called mathematical statistics that studies statistics mathematically.
Statistical methods can be used for summarizing or describing a collection of data; this is
called descriptive statistics. This is useful in research, when communicating the results of
experiments. In addition, patterns in the data may be modelled in a way that accounts for
randomness and uncertainty in the observations, and are then used for drawing inferences
about the process or population being studied; this is called inferential statistics. Inference is
a vital element of scientific advance, since it provides a means for drawing conclusions from
data that are subject to random variation.
POPULATION DATA SET
Population data is composed of observations of time spent on Moodle on various days, with
each observation serving as a different member of the overall group. In short, it
is the complete set of data on which the statistical analysis is conducted.
The table below presents the data collected over the last two months.
Date  Time spent on Moodle (minutes)
01-May 0.0
02-May 0.0
03-May 0.0
04-May 0.0
05-May 2.5
06-May 5.0
07-May 7.5
08-May 2.7
09-May 6.9
10-May 7.0
11-May 0.0
12-May 15.5
13-May 2.5
14-May 2.0
15-May 2.7
16-May 8.6
17-May 0.0
18-May 0.0
19-May 2.5
20-May 15.0
21-May 5.3
22-May 4.4
23-May 16.8
24-May 6.7
25-May 0.0
26-May 7.1
27-May 17.5
28-May 0.0
29-May 10.5
30-May 4.9
31-May 6.9
01-Jun 8.1
02-Jun 11.2
03-Jun 6.9
04-Jun 7.3
05-Jun 0.0
06-Jun 16.8
07-Jun 8.1
08-Jun 19.9
09-Jun 8.1
10-Jun 6.9
11-Jun 16.8
12-Jun 7.0
13-Jun 5.3
14-Jun 10.6
15-Jun 0.0
16-Jun 6.7
17-Jun 0.0
18-Jun 3.9
19-Jun 0.0
DESCRIPTIVE STATISTICS:
Descriptive statistics summarise the population data by describing what was observed,
either numerically or graphically. Numerical descriptors include the mean and standard
deviation for continuous data types (like heights or weights), while frequency and
percentage are more useful for describing categorical data. The table below presents the
descriptive analysis of the population data; the inferences follow it.
Mode 0.000
Median 6.003
Mean 6.086
Qmin 0.000
Q1 0.508
Q2 6.003
Q3 8.090
Qmax 19.926
Variance 30.627
Standard Deviation 5.534
Mean Absolute Deviation 4.337
Coefficient Of Variation 91%
Skewness 0.817625
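The figures in the table above can be approximately reproduced with Python's standard library. This is a minimal sketch, not the report's own spreadsheet: it uses the values as printed in the data table (rounded to one decimal), so the results differ slightly from the reported ones, and the "inclusive" quartile method and the sample (n - 1) variance are assumptions about how the report's tool computed them.

```python
import statistics

# Daily minutes on Moodle, 01-May to 19-Jun 2012, as printed in the data
# table (rounded to one decimal place).
minutes = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

mean = statistics.mean(minutes)
median = statistics.median(minutes)
mode = statistics.mode(minutes)

# Assumption: quartiles interpolate between min and max ("inclusive").
q1, q2, q3 = statistics.quantiles(minutes, n=4, method="inclusive")

# Assumption: variance and standard deviation use the n - 1 divisor.
variance = statistics.variance(minutes)
std_dev = statistics.stdev(minutes)

mean_abs_dev = sum(abs(x - mean) for x in minutes) / len(minutes)
coeff_var = std_dev / mean
value_range = max(minutes) - min(minutes)

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f}")
print(f"Q1={q1:.3f} Q2={q2:.3f} Q3={q3:.3f} range={value_range:.1f}")
print(f"variance={variance:.3f} sd={std_dev:.3f} "
      f"MAD={mean_abs_dev:.3f} CV={coeff_var:.0%}")
```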
INFERENCES
Range - In the descriptive statistics, the range is the length of the smallest interval which
contains all the data. It is calculated by subtracting the smallest observation (sample
minimum) from the greatest (sample maximum) and provides an indication of statistical
dispersion. In our case the range of time spent on Moodle is 19.926 - 0.000, or about 19.9 minutes.
Mean - For a data set, the mean is the sum of the values divided by the number of values.
The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar". This
mean is a type of arithmetic mean. If the data set were based on a series of observations
obtained by sampling a statistical population, this mean is termed the "sample mean" to
distinguish it from the "population mean". In our case the population mean is 6.086 minutes,
which is the average daily time spent on Moodle.
Median - The median of a set of data values is the middle value of the data set when it has
been arranged in ascending order. That is, from the smallest value to the highest value. In
our case the median is 6.003.
Outlier - An outlying observation, or outlier, is one that appears to deviate markedly from
other members of the sample in which it occurs.
Outliers can occur by chance in any distribution, but they are often indicative either of
measurement error or that the population has a heavy-tailed distribution. In the former case
one wishes to discard them or use statistics that are robust to outliers, while in the latter
case they indicate that the distribution has high kurtosis and that one should be very
cautious in using tools or intuitions that assume a normal distribution.
          Q min   Q1      Q2      Q3      Q max
Quartile  0.0     0.508   6.003   8.090   19.926
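The report does not say which rule it used to flag outliers. As an illustration only, the sketch below applies Tukey's 1.5 x IQR fences (an assumed rule, not taken from the report) to the data as printed in the data table; the variable names are hypothetical.

```python
import statistics

# Daily minutes on Moodle as printed in the data table (rounded values).
minutes = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

q1, _, q3 = statistics.quantiles(minutes, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: a point is an outlier if it lies more than 1.5 * IQR
# outside the quartiles (Tukey's fences).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in minutes if x < lower_fence or x > upper_fence]

print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}]  outliers: {outliers}")
```

Under this assumed rule only the maximum observation falls outside the fences.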
Standard Deviation - In statistics, standard deviation (represented by the symbol σ) shows
how much variation or "dispersion" exists from the average (mean, or expected value). A low
standard deviation indicates that the data points tend to be very close to the mean, whereas
a high standard deviation indicates that the data points are spread out over a large range of
values. Here the standard deviation is 5.534, which is large relative to the mean of 6.086
(coefficient of variation 91%), so the daily time spent is widely dispersed.
Skewness- It is a measure of the asymmetry of the probability distribution of a real-valued
random variable. The skewness value can be positive or negative, or even undefined.
Qualitatively, a negative skew indicates that the tail on the left side of the probability density
function is longer than the right side and the bulk of the values lie to the right of the mean. A
positive skew indicates that the tail on the right side is longer than the left side and the bulk
of the values lie to the left of the mean. A zero value indicates that the values are relatively
evenly distributed on both sides of the mean, typically but not necessarily implying a
symmetric distribution. In this case the data is positively skewed (0.817).
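The sign and rough size of the skewness can be checked with a short sketch. It uses the adjusted Fisher-Pearson sample skewness formula, which is an assumption about how the reported 0.817625 was obtained, and the rounded values from the data table, so only approximate agreement should be expected.

```python
import statistics

# Daily minutes on Moodle as printed in the data table (rounded values).
minutes = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x-mean)/s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


skew = sample_skewness(minutes)
print(f"skewness = {skew:.3f}")  # positive: the right tail is longer
```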
Frequency Distribution - In statistics, a frequency distribution is an arrangement of the
values that one or more variables take in a sample. Each entry in the table contains the
frequency or count of the occurrences of values within a particular group or interval, and in
this way, the table summarizes the distribution of values in the sample.
Frequency distributions are used for both qualitative and quantitative data. From the
histogram we can infer that most of the daily time-spent values lie within the 0 - 0.5 minutes
bucket.
From this we can infer that, most of the time, Moodle has been used only for downloading
the study material.
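A frequency table of this kind can be built by counting how many observations fall in each interval. The sketch below is illustrative: the 2-minute bin width is an assumption (the report does not state the bucket boundaries used for its histogram), and the data are the rounded values from the data table.

```python
from collections import Counter

# Daily minutes on Moodle as printed in the data table (rounded values).
minutes = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9,
    8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1, 6.9,
    16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

BIN_WIDTH = 2   # assumed bucket width in minutes
NUM_BINS = 10   # covers 0 up to (but not including) 20 minutes

# Map each observation to the index of the half-open bin [lo, lo + width).
bin_index = Counter(int(x // BIN_WIDTH) for x in minutes)
counts = [bin_index.get(i, 0) for i in range(NUM_BINS)]

for i, count in enumerate(counts):
    lo = i * BIN_WIDTH
    print(f"[{lo:>2}, {lo + BIN_WIDTH:>2})  {count:>2}  {'#' * count}")
```

With these assumed bins the lowest bucket is the most frequent, which is consistent with the inference that many days show little or no time on Moodle.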
Random Sampling - A random sample is one chosen by a method involving an
unpredictable component. Random sampling can also refer to taking a number of
independent observations from the same probability distribution, without involving any real
population.
The random sample drawn in this case is:
Random Sample
1 6.71
2 2.50
3 2.70
4 7.04
5 7.14
6 7.14
7 2.70
8 4.93
9 0.00
10 0.00
11 0.00
12 6.71
13 7.30
14 16.81
15 8.09
16 8.09
17 0.00
18 17.50
19 7.14
20 0.00
21 0.00
22 0.00
23 6.89
24 11.19
[Figure: Frequency Distribution histogram; y-axis: Frequency]
Probability for the Population and Interval Estimates:
Let us consider the time spent on Moodle. The probability of spending less than 4
minutes is 43.15%. We now estimate the population mean from the sample mean using the
confidence interval approach; the details are as follows.
ESTIMATING POPULATION MEAN FROM SAMPLE
Confidence Interval 95%
t Value 2.07
Sample Mean 5.44
Sample Size 24
Point Estimator 5.44
Interval Estimate (Upper value) 7.66
Interval Estimate (Lower value) 3.22
Population Mean 6.09
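The interval estimate has the form sample mean ± t × s/√n. A minimal sketch of that calculation follows, applied to the 24 random-sample values listed earlier and the two-tailed t value for 23 degrees of freedom quoted in this report. The report does not show the standard error it used, so the limits computed here from the printed values need not match 3.22 and 7.66 exactly.

```python
import math
import statistics

# The 24 random-sample values listed earlier in the report.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93,
    0.00, 0.00, 0.00, 6.71, 7.30, 16.81, 8.09, 8.09,
    0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

T_CRIT = 2.069          # two-tailed, 95%, df = 23 (value quoted in the report)
POPULATION_MEAN = 6.09  # reported population mean, used only for comparison

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

margin = T_CRIT * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"point estimate = {sample_mean:.2f}")
print(f"95% interval   = ({lower:.2f}, {upper:.2f})")
print("contains population mean:", lower <= POPULATION_MEAN <= upper)
```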
HYPOTHESIS TESTING ABOUT SINGLE POPULATION
H0: μ = 4.80
Ha: μ ≠ 4.80
t calculated -0.60
Confidence Interval 95%
t Critical Value (Two tailed test) ±2.069
Null hypothesis cannot be rejected
Population mean is within the confidence interval of 2.60 minutes to 7.23 minutes.
Hypothesis Test (Single Population)
Let the null hypothesis be H0: μ = 4.80 and the alternative hypothesis be Ha: μ ≠ 4.80.
Since the sample size is less than 30, we have used the t-distribution. We took a
random sample of 24 values and found its mean to be 4.92 minutes. Using the
t-distribution, the calculated t value is 0.11, whose magnitude is less than the
critical t value of ±2.069 for a two-tailed test (Confidence
Interval = 95%, degrees of freedom = 23). Since the calculated t lies within the acceptance
region, we fail to reject the null hypothesis (μ = 4.80).
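The test statistic is t = (x̄ - μ0) / (s/√n). The sketch below shows the calculation; the sample with mean 4.92 used in the paragraph above is not printed in the report, so the code is applied to the 24 random-sample values that are listed earlier (mean 5.44), and its t value therefore differs from 0.11. The function name is hypothetical.

```python
import math
import statistics

# The 24 random-sample values listed earlier in the report.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93,
    0.00, 0.00, 0.00, 6.71, 7.30, 16.81, 8.09, 8.09,
    0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

MU_0 = 4.80     # hypothesised population mean (H0)
T_CRIT = 2.069  # two-tailed critical value, 95%, df = 23 (from the report)


def one_sample_t(values, hypothesised_mean):
    """Return t = (sample mean - hypothesised mean) / (s / sqrt(n))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - hypothesised_mean) / standard_error


t_calc = one_sample_t(sample, MU_0)
reject_null = abs(t_calc) > T_CRIT

print(f"t calculated = {t_calc:.2f}, reject H0: {reject_null}")
```

For this sample too, the statistic falls inside ±2.069, so the null hypothesis is not rejected.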
2 Population Tests:
A Z-test is any statistical test for which the distribution of the test statistic under the null
hypothesis can be approximated by a normal distribution. Because of the central limit
theorem, many test statistics are approximately normally distributed for large samples. For
each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two
tailed) which makes it more convenient than the Student's t-test which has separate critical
values for each sample size. Therefore, many statistical tests can be conveniently performed
as approximate Z-tests if the sample size is large or the population variance known. If the
population variance is unknown (and therefore has to be estimated from the sample itself)
and the sample size is not large, the Student t-test may be more appropriate.
We now consider the two-sample test, taking samples from the populations of time spent
by Devesh and Dennis. The Z-distribution is used to test whether
there is any difference in the time spent on Moodle by the two persons.
Let the null hypothesis be H0: μ1 = μ2 and the alternative hypothesis be Ha: μ1 ≠ μ2.
z-Test: Two Sample for Means
                              DENNIS  ARYA
Mean                          5.09    3.52
Known Variance                26.56   21.32
Observations                  31.00   31.00
Hypothesized Mean Difference  0.00
z                             1.26
P(Z
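The z value in the table can be reproduced from the means, known variances and sample sizes it lists. This is a minimal sketch using only those reported figures; the 1.96 critical value is the 5% two-tailed value mentioned in the Z-test description above, and the variable names are hypothetical.

```python
import math

# Summary figures copied from the z-test table (first and second column).
mean_1, mean_2 = 5.09, 3.52
var_1, var_2 = 26.56, 21.32
n_1, n_2 = 31, 31
hypothesised_difference = 0.0

Z_CRIT = 1.96  # two-tailed critical value at the 5% level

# z = ((mean1 - mean2) - d0) / sqrt(var1/n1 + var2/n2)
standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
z = ((mean_1 - mean_2) - hypothesised_difference) / standard_error
reject_null = abs(z) > Z_CRIT

print(f"z = {z:.2f}, reject H0: {reject_null}")
```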
chance of committing a type I error. For this reason, ANOVAs are useful in comparing two,
three, or more means. Based on the above samples, we shall undertake the following
hypothesis.
H0: μ1 = μ2 = μ3
Ha: At least one of the means is not equal to the others.
Anova: Single Factor
SUMMARY
Groups   Count   Sum      Average   Variance
Dennis   31      157.90   5.09      28.26
Devesh   31      109.18   3.52      26.30
Arya     31      236.39   7.62      28.80
ANOVA
Source of Variation   SS        df   MS       F      P-value   F crit
Between Groups        265.77    2    132.88   4.78   0.0106    3.098
Within Groups         2501.42   90   27.79
Total                 2767.20   92
Since the calculated F (4.78) is greater than the critical F (3.098), we reject the null hypothesis.
This means that the mean time spent on Moodle differs for at least one of the three persons.
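The ANOVA table can be rebuilt from the SUMMARY rows alone, which makes the link between the group statistics and the F value explicit. The sketch below uses only the counts, sums and variances reported above (rounded, so the sums of squares agree with the table only approximately); the variable names are hypothetical.

```python
# (count, sum, variance) per person, copied from the SUMMARY table.
groups = {
    "Dennis": (31, 157.90, 28.26),
    "Devesh": (31, 109.18, 26.30),
    "Arya": (31, 236.39, 28.80),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / total_n

# Between-groups SS: weighted squared distance of each group mean
# from the grand mean.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
# Within-groups SS: each sample variance times its degrees of freedom.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.098  # critical value quoted in the ANOVA table
reject_null = f_stat > F_CRIT

print(f"SS between = {ss_between:.2f} (df {df_between})")
print(f"SS within  = {ss_within:.2f} (df {df_within})")
print(f"F = {f_stat:.2f}, reject H0: {reject_null}")
```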
F-Test:
An F-test is any statistical test in which the test statistic has an F-distribution under the null
hypothesis. It is most often used when comparing statistical models that have been fit to
a data set, in order to identify the model that best fits the population from which the data
were sampled. Exact F-tests mainly arise when the models have been fit to the data
using least squares.
F-TEST TWO-SAMPLE FOR VARIANCES
               Dennis   Devesh
Mean           3.52     7.63
Variance       26.31    28.81
Observations   31.00    31
df             30.00    30
F              0.91
P(F
In the above case F calculated < F critical, so the F-test does not reject the null hypothesis.
Hence the null hypothesis H0: σ1² = σ2² (equal variances) cannot be rejected.
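For a two-sample test of variances, the F statistic is simply the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. The sketch below recomputes it from the figures in the F-test table; the critical value is truncated in the report, so no decision rule is coded here.

```python
# Sample variances and sizes copied from the F-test table
# (first and second column).
var_1, var_2 = 26.31, 28.81
n_1, n_2 = 31, 31

# F = s1^2 / s2^2, with (n1 - 1, n2 - 1) degrees of freedom.
f_stat = var_1 / var_2
df_1, df_2 = n_1 - 1, n_2 - 1

print(f"F = {f_stat:.2f} with df = ({df_1}, {df_2})")
```

A ratio close to 1, as here, indicates that the two sample variances are of similar size.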
CONCLUSION:
We have carried out a statistical analysis of the time spent on Moodle by members within the
group. The time-spending patterns of the members are out of sync, as the measures
of dispersion are very wide.