Page 1: Statistics

Spring 2012

Master of Business Administration- MBA Semester 1

MB0040 – Statistics for Management - 4 Credits (Book ID: B1129)

Assignment Set - 1 (60 Marks)

*Note: Each question carries 10 Marks. Answer all the questions.

Q1. What are the functions of Statistics? Distinguish between Primary data and Secondary Data.

Ans: Functions of Statistics

Statistics is used for various purposes. It is used to simplify mass data and to make comparisons easier. It is also used to bring out trends and tendencies in the data as well as the hidden relations between variables. All this helps to make decision making much easier. Let us look at each function of Statistics in detail.

1. Statistics simplifies mass data

The use of statistical concepts helps to simplify complex data, so that managers can make decisions more easily. Statistical methods reduce the complexity of the data and so make any large mass of data easier to understand.

Example: Fifty people were interviewed and asked to rate a regional movie on a scale of 1 to 10, with 1 for the top movie and 10 for the worst movie. Table 1a shows the ratings given by the 50 customers. Simplify the data.

Table 1a: The ratings (scale of 1 to 10) for a regional movie given by 50 customers

1 5 7 6 8 7 5 3 4 7
1 2 5 8 7 4 7 4 2 4
9 8 7 2 5 4 5 7 9 8
7 8 9 6 7 2 3 2 8 7
6 3 5 7 6 3 9 5 4 8

 

The data in table 1a can be condensed by calculating the frequency and frequency distribution of each rating, and is presented as a frequency table in table 1b. In this example, the frequency table was prepared from the bulk data consisting of 50 rating scores. The frequency table is in a condensed and simple form. From the tabled data, we can easily see that the most common rating for the regional movie was 7 (given by 11 customers). Only two customers gave a rating of 1, which means only two out of the 50 customers surveyed liked the regional movie the most.


Table 1b: Frequency table

Rating   Frequency   Frequency Distribution
1        2           2/50 = 0.04
2        5           5/50 = 0.10
3        4           4/50 = 0.08
4        6           6/50 = 0.12
5        7           7/50 = 0.14
6        4           4/50 = 0.08
7        11          11/50 = 0.22
8        7           7/50 = 0.14
9        4           4/50 = 0.08
10       0           0/50 = 0
Total    50          1
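The condensing step can also be sketched in a few lines of Python. This is only an illustration, not part of the assignment: it assumes the fused entry "54" in the raw ratings stands for the two ratings 5 and 4 (which gives exactly 50 ratings), and the name `frequency_table` is hypothetical.

```python
from collections import Counter

# Ratings from Table 1a (assumption: the fused entry "54" is read as 5 and 4).
ratings = [
    1, 5, 7, 6, 8, 7, 5, 3, 4, 7,
    1, 2, 5, 8, 7, 4, 7, 4, 2, 4,
    9, 8, 7, 2, 5, 4, 5, 7, 9, 8,
    7, 8, 9, 6, 7, 2, 3, 2, 8, 7,
    6, 3, 5, 7, 6, 3, 9, 5, 4, 8,
]


def frequency_table(values, low=1, high=10):
    """Return (rating, frequency, relative frequency) for every rating on the scale."""
    counts = Counter(values)
    n = len(values)
    return [(r, counts.get(r, 0), counts.get(r, 0) / n) for r in range(low, high + 1)]


table = frequency_table(ratings)
for rating, freq, rel in table:
    print(f"{rating:>6} {freq:>9} {rel:>8.2f}")
```

Ratings that nobody chose (here, 10) are kept in the table with a frequency of 0, so that the relative frequencies still add up to 1.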

 

2. Statistics makes comparison easier

Without using statistical methods and concepts, collection of data and comparison cannot be done easily. Statistics helps us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, and the coefficient of correlation all provide ample scope for comparison.

Hence, visual representation of numerical data helps you to compare the data with less effort and to make effective decisions.

3. Statistics brings out trends and tendencies in the data

After data is collected, it is easy to analyse the trends and tendencies in the data by using the various concepts of Statistics.

4. Statistics brings out the hidden relations between variables

Statistical analysis helps in drawing inferences from data. It brings out the hidden relations between variables.

5. Decision making power becomes easier

With the proper application of Statistics and statistical software packages to the collected data, managers can take effective decisions, which can increase the profits of a business.

 

The differences between primary and secondary data are listed below:

Primary Data

1. Data is original and thus more accurate and reliable.
2. Gathering data is expensive.
3. Data is not easily accessible.
4. Most of the data is homogeneous.
5. Collection of data requires more time.
6. Extra precautionary measures need not be taken.
7. Data gives detailed information.

Secondary Data

1. Data is not reliable.
2. Gathering data is cheap.
3. Data is easily accessible through the internet or other resources.
4. Data is not homogeneous.
5. Collection of data requires less time.
6. Data needs extra care.
7. Data may not be adequate.

Q2. Draw a histogram for the following distribution:

Age             0-10   10-20   20-30   30-40   40-50
No. of people   5      10      15      8       2

Ans: [Figure: histogram of the above data]
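As a rough stand-in for the drawn figure, the distribution can be sketched as a text histogram with one bar per age class. This is a minimal sketch only; the function name `text_histogram` and the bar symbol are illustrative choices, and a real answer would draw adjoining rectangles with age on the horizontal axis and number of people on the vertical axis.

```python
# Class intervals and frequencies from the question.
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
people = [5, 10, 15, 8, 2]


def text_histogram(labels, counts, symbol="#"):
    """Build one text bar per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [
        f"{label:>{width}} | {symbol * count} ({count})"
        for label, count in zip(labels, counts)
    ]


lines = text_histogram(classes, people)
print("\n".join(lines))
```

Because all class intervals have the same width (10 years), the bar lengths can be taken directly from the frequencies.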


Q3. Find the median value of the following set of values: 45, 32, 31, 46, 40, 28, 27, 37, 36, 41.

Ans: Arranging in ascending order, we get:

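27, 28, 31, 32, 36, 37, 40, 41, 45, 46

We have n = 10.

Therefore, the median is the value of the ((n + 1)/2)th item = ((10 + 1)/2)th item = 5.5th item, that is, the average of the 5th and 6th values.

M = (36 + 37)/2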

=73/2

= 36.5

The median for the given set of values is 36.5
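The same steps (sort, locate the middle position, average the two middle values when n is even) can be checked with a short Python sketch. The helper name `median_by_position` is illustrative; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

values = [45, 32, 31, 46, 40, 28, 27, 37, 36, 41]


def median_by_position(data):
    """Sort the data and return the middle value (or the mean of the two middle values)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the (n/2)th and (n/2 + 1)th items (1-based positions).
    return (ordered[mid - 1] + ordered[mid]) / 2


result = median_by_position(values)
print(result, statistics.median(values))
```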

Q4. Calculate the standard deviation of the following data:

Marks             78-80   80-82   82-84   84-86   86-88   88-90
No. of students   3       15      26      23      9       4

Ans: The table below gives the frequency distribution values required for calculating the standard deviation, taking d = (x - 83)/2.

Class interval   Mid value x   Frequency f   d = (x - 83)/2   fd    fd^2
78-80            79            3             -2               -6    12
80-82            81            15            -1               -15   15
82-84            83            26            0                0     0
84-86            85            23            1                23    23
86-88            87            9             2                18    36
88-90            89            4             3                12    36
Total                          80                             32    122

Variance = [∑fd^2/N - (∑fd/N)^2] x i^2, where N = ∑f = 80 and i = 2 is the class width

= [122/80 - (32/80)^2] x (2)^2

= [1.525 - 0.16] x 4

= 5.46

Standard deviation = √5.46 = 2.337 (marks)
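The step-deviation calculation can be reproduced with a short Python sketch. It assumes, as the table totals imply, an assumed mean of 83 and a class width of 2; the function name `step_deviation_sd` is hypothetical.

```python
import math

mid_values = [79, 81, 83, 85, 87, 89]   # class mid values x
frequencies = [3, 15, 26, 23, 9, 4]     # number of students f
assumed_mean = 83                       # assumption: origin used for d
class_width = 2                         # assumption: scale i used for d


def step_deviation_sd(xs, fs, a, i):
    """Variance and standard deviation via d = (x - a) / i."""
    n = sum(fs)
    ds = [(x - a) / i for x in xs]
    sum_fd = sum(f * d for f, d in zip(fs, ds))
    sum_fd2 = sum(f * d * d for f, d in zip(fs, ds))
    variance = (sum_fd2 / n - (sum_fd / n) ** 2) * i ** 2
    return variance, math.sqrt(variance)


variance, sd = step_deviation_sd(mid_values, frequencies, assumed_mean, class_width)
print(round(variance, 2), round(sd, 3))
```

The choice of assumed mean does not affect the result; it only keeps the hand arithmetic small.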

Q5. An unbiased coin is tossed six times. What is the probability that the tosses will result in: (i) exactly two heads and (ii) at least five heads

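Ans:

Let 'A' be the event of getting a head. Given that:

n = 6, p = P(A) = 1/2 and q = 1 - p = 1/2

The probability of exactly x heads in n tosses is P(X = x) = nCx p^x q^(n-x).

(i) The probability that the tosses will result in exactly two heads is given by:

P(X = 2) = 6C2 (1/2)^2 (1/2)^4 = 15 x 1/64 = 15/64

Therefore, the probability that the tosses will result in exactly two heads is 15/64.

(ii) The probability that the tosses will result in at least five heads is given by:

P(X ≥ 5) = P(X = 5) + P(X = 6) = 6C5 (1/2)^5 (1/2)^1 + 6C6 (1/2)^6 = 6/64 + 1/64 = 7/64

Therefore, the probability that the tosses will result in at least five heads is 7/64.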
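Both results follow from the binomial distribution with n = 6 and p = 1/2, and can be checked with exact fractions. The sketch below is illustrative; the helper name `binomial_pmf` is not from the source.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, k, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_head = Fraction(1, 2)  # unbiased coin
exactly_two = binomial_pmf(6, 2, p_head)
at_least_five = sum(binomial_pmf(6, k, p_head) for k in (5, 6))
print(exactly_two, at_least_five)
```

Using `Fraction` keeps the answers as 15/64 and 7/64 rather than rounded decimals.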

Q6. Explain briefly the types of sampling.

Ans: There are two types of sampling. They are described briefly as follows:

a) Probability sampling: It provides a specific technique for drawing samples from the population. Samples are drawn according to a law under which each unit has a predetermined probability of being included in the sample. The different ways of assigning probability are:

i) Each unit has the same chance of being selected.
ii) Sampling units have varying probability.
iii) Units have probability proportional to the sample size.

b) Non-probability sampling: Depending upon the object of inquiry and other considerations, a predetermined number of sample units is selected purposely so that they represent the true characteristics of the population. A serious drawback of this sampling design is that it is highly subjective in nature. The selection of sample units depends entirely upon the personal convenience, biases, prejudices and beliefs of the investigator. This method will be more successful if the investigator is thoroughly skilled and experienced.
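The first two ways of assigning probability can be illustrated with Python's standard `random` module. This is a minimal sketch under stated assumptions: the population of 20 labelled units and the weights are hypothetical, and the weighted draw uses `choices`, which samples with replacement.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20 labelled units.
units = [f"unit-{i}" for i in range(1, 21)]

# Equal probability: every unit has the same chance (drawn without replacement).
equal_chance = rng.sample(units, k=5)

# Varying probability: hypothetical weights, so later units are more likely
# to be drawn (random.choices draws with replacement).
weights = list(range(1, 21))
varying_chance = rng.choices(units, weights=weights, k=5)

print(equal_chance)
print(varying_chance)
```

Non-probability sampling has no counterpart here, since the units are picked by the investigator's judgement rather than by a chance mechanism.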


Spring 2012

Master of Business Administration- MBA Semester 1

MB0040 – Statistics for Management - 4 Credits (Book ID: B1129)

Assignment Set - 2 (60 Marks)

*Note: Each question carries 10 Marks. Answer all the questions.

Q1. Explain the following terms with respect to Statistics: (i) Sample, (ii) Variable, (iii) Population.

Ans: (i) Sample

In statistics, a sample is a subset of a population. Typically, the population is very large, making a census, or complete enumeration of all the values in the population, impractical or impossible. The sample represents a subset of manageable size. Samples are collected and statistics are calculated from the samples so that one can make inferences or extrapolations from the sample to the population. This process of collecting information from a sample is referred to as sampling.

A complete sample is a set of objects from a parent population that includes ALL such objects that satisfy a set of well-defined selection criteria. For example, a complete sample of Australian men taller than 2m would consist of a list of every Australian male taller than 2m. But it wouldn't include German males, or tall Australian females, or people shorter than 2m. So to compile such a complete sample requires a complete list of the parent population, including data on height, gender, and nationality for each member of that parent population. In the case of human populations, such a complete list is unlikely to exist, but such complete samples are often available in other disciplines, such as complete magnitude-limited samples of astronomical objects.

An unbiased sample is a set of objects chosen from a complete sample using a selection process that does not depend on the properties of the objects. For example, an unbiased sample of Australian men taller than 2m might consist of a randomly sampled subset of 1% of Australian males taller than 2m. But one chosen from the electoral register might not be unbiased since, for example, males aged under 18 will not be on the electoral register. In an astronomical context, an unbiased sample might consist of that fraction of a complete sample for which data are available, provided the data availability is not biased by individual source properties.

The best way to avoid a biased or unrepresentative sample is to select a random sample, also known as a probability sample. A random sample is defined as a sample where each individual member of the population has a known, non-zero chance of being selected as part of the sample. Several types of random samples are simple random samples, systematic samples, stratified random samples, and cluster random samples.

(ii) Variable

A variable is a characteristic that may assume more than one set of values to which a numerical measure can be assigned. Height, age, amount of income, province or country of birth, grades obtained at school and type of housing are all examples of variables. Variables may be classified into various categories, some of which are outlined in this section.

Categorical variables: A categorical variable (also called qualitative variable) is one for which each response can be put into a specific category. These categories must be mutually exclusive and exhaustive. Mutually exclusive means that each possible survey response should belong to only one category, whereas exhaustive requires that the categories should cover the entire set of possibilities. Categorical variables can be either nominal or ordinal.

Nominal variables: A nominal variable is one that describes a name or category. Contrary to ordinal variables, there is no 'natural ordering' of the set of possible names or categories.

Ordinal variables: An ordinal variable is a categorical variable for which the possible categories can be placed in a specific order or in some 'natural' way.

Numeric variables: A numeric variable, also known as a quantitative variable, is one that can assume a number of real values such as age or number of people in a household. However, not all variables described by numbers are considered numeric. For example, when you are asked to assign a value from 1 to 5 to express your level of satisfaction, you use numbers, but the variable (satisfaction) is really an ordinal variable. Numeric variables may be either continuous or discrete.

Continuous variables: A variable is said to be continuous if it can assume an infinite number of real values. Examples of a continuous variable are distance, age and temperature. The measurement of a continuous variable is restricted by the methods used, or by the accuracy of the measuring instruments. For example, the height of a student is a continuous variable because a student may be 1.6321748755 metres tall.

Discrete variables: As opposed to a continuous variable, a discrete variable can only take a finite number of real values. An example of a discrete variable would be the score given by a judge to a gymnast in competition: the range is 0 to 10 and the score is always given to one decimal (e.g., a score of 8.5).

(iii) Population

A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest. Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or will exist in the future. Probably, geography will also constitute a limitation in that our resources for studying crows are also limited. Population is also used to refer to a set of potential measurements or values, including not only cases actually observed but those that are potentially observable. Suppose, for example, we are interested in the set of all adult crows now alive in the county of Cambridgeshire, and we want to know the mean weight of these birds. For each bird in the population of crows there is a weight, and the set of these weights is called the population of weights.

A) A subset of a population is called a subpopulation. If different subpopulations have different properties, the properties and response of the overall population can often be better understood if it is first separated into distinct subpopulations.

B) For instance, a particular medicine may have different effects on different subpopulations, and these effects may be obscured or dismissed if such special subpopulations are not identified and examined in isolation.

C) Similarly, one can often estimate parameters more accurately if one separates out subpopulations: distribution of heights among people is better modeled by considering men and women as separate subpopulations, for instance.

D) Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution.


Q2. What are the types of classification of data?

Ans:  

According to Nature

1. Quantitative data - information obtained from numerical variables (e.g. age, bills, etc.)
2. Qualitative data - information obtained from variables in the form of categories, characteristics, names or labels, or alphanumeric variables (e.g. birthdays, gender, etc.)

According to Source

1. Primary data - first-hand information (e.g. autobiography, financial statement)
2. Secondary data - second-hand information (e.g. biography, weather forecast from newspapers)

According to Measurement

1. Discrete data - countable numerical observations; whole numbers only, with equal whole-number intervals, obtained through counting (e.g. corporate stocks, etc.)

2. Continuous data - measurable observations; decimals or fractions obtained through measuring (e.g. bank deposits, volume of liquid, etc.)

Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, it is often used interchangeably with "categorical" data.

For example:

favorite color = "yellow"

height = "tall"

Although we may have categories, the categories may have a structure to them. When there is not a natural ordering of the categories, we call these nominal categories. Examples might be gender, race, religion, or sport.

When the categories may be ordered, these are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables; however, we may not know which value is the best or worst on these issues. Note that the distance between these categories is not something we can measure.

QUANTITATIVE DATA

Quantitative data is a numerical measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not all numbers are continuous and measurable. For example, the social security number is a number, but not something that one can add or subtract. For example:

favorite color = "450 nm"

height = "1.8 m"

Quantitative data are always associated with a scale of measurement. Probably the most common scale type is the ratio scale. Observations of this type are on a scale that has a meaningful zero value and also has an equidistant measure (i.e., the difference between 10 and 20 is the same as the difference between 100 and 110). For example, a 10 year-old girl is twice as old as a 5 year-old girl. Since you can measure zero years, time is a ratio-scale variable. Money is another common ratio-scale quantitative measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

A more general quantitative measure is the interval scale. Interval scales also have an equidistant measure. However, the doubling principle breaks down in this scale. A temperature of 50 degrees Celsius is not "half as hot" as a temperature of 100, but a difference of 10 degrees indicates the same difference in temperature anywhere along the scale. The Kelvin temperature scale, however, constitutes a ratio scale because on the Kelvin scale zero indicates absolute zero in temperature, the complete absence of heat. So one can say, for example, that 200 degrees Kelvin is twice as hot as 100 degrees Kelvin.

PRIMARY DATA

Primary data means original data that has been collected specially for the purpose in mind. When an authorized organization, investigator or enumerator collects the data for the first time from the original source, the data collected this way is called primary data.

SECONDARY DATA

Secondary data is data that has been collected for another purpose. When we apply statistical methods to primary data that was collected for another purpose, we refer to it as secondary data for our purpose. In other words, one purpose's primary data is another purpose's secondary data. Secondary data is data that is being reused, usually in a different context.

Q3. Find the (i) arithmetic mean and (ii) range of the following data: 15, 17, 22, 21, 19, 26, 20.

Ans: (i) The arithmetic mean is given by:

= (15 + 17 + 22 + 21 + 19 + 26 + 20)/7

= 140/7

= 20

(ii) Range = highest number - lowest number

= 26 - 15

= 11
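Both quantities can be checked with a couple of lines of Python: the arithmetic mean is the sum of the values divided by their count, and the range is the largest value minus the smallest. The variable names are illustrative.

```python
data = [15, 17, 22, 21, 19, 26, 20]

# Arithmetic mean: sum of the observations divided by the number of observations.
mean = sum(data) / len(data)

# Range: largest observation minus smallest observation.
value_range = max(data) - min(data)

print(mean, value_range)
```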


Q4. Suppose two houses in a thousand catch fire in a year and there are 2000 houses in a village. What is the probability that: (i) none of the houses catch fire and (ii) At least one house catch fire?

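Ans:

Given the probability of a house catching fire is:

p = 2/1000 = 0.002 and n = 2000

Since n is large and p is small, the Poisson distribution applies, with mean m = np = 2000 x 0.002 = 4 and P(X = x) = e^(-m) m^x / x!.

Therefore, the required probabilities are calculated as follows:

i. The probability that none catches fire is given by:

P(X = 0) = e^(-4) x 4^0 / 0! = e^(-4) = 0.01832

Therefore, the probability that none of the houses catches fire is 0.01832.

ii. The probability that at least one catches fire is given by:

P(X ≥ 1) = 1 - P(X = 0) = 1 - 0.01832 = 0.98168

Therefore, the probability that at least one house catches fire is 0.98168.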
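The stated answers, 0.01832 and 0.98168, are consistent with a Poisson model with mean m = np = 2000 x 2/1000 = 4. The sketch below recomputes them on that assumption; the helper name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(k, m):
    """P(X = k) for a Poisson distribution with mean m."""
    return math.exp(-m) * m**k / math.factorial(k)


n_houses = 2000
p_fire = 2 / 1000          # two houses in a thousand catch fire in a year
m = n_houses * p_fire      # Poisson mean, m = np

p_none = poisson_pmf(0, m)
p_at_least_one = 1 - p_none
print(round(p_none, 5), round(p_at_least_one, 5))
```

The "at least one" case is obtained from the complement, which avoids summing the probabilities of 1, 2, 3, ... fires.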

Q5. (i) What are the characteristics of Chi-square test?

(ii) The data given in the below table shows the production in three shifts and the number of defective goods that turned out in three weeks. Test at 5% level of significance whether the weeks and shifts are independent.

Shift   1st week   2nd week   3rd week   Total
I       15         5          20         40
II      20         10         20         50
III     25         15         20         60
Total   60         30         60         150

Ans: (i) The chi-square distribution is not symmetric. Its other characteristics are:

1. The shape of the chi-square distribution depends upon the degrees of freedom, just like Student's t-distribution.

2. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetric as is illustrated in Figure 1.

3. The values are non-negative. That is, the values of chi-square are greater than or equal to 0.

4. This is not a test, but a distribution. The chi-square distribution is derived from the Normal distribution. It is the distribution of a sum of squared Normal distributed variables. That is, if all Xi are independent and all have an identical, standard Normal distribution, then X^2 = X1*X1 + X2*X2 + X3*X3 + ... + Xv*Xv is chi-square distributed with v degrees of freedom, with mean = v and variance = 2*v. The importance of the chi-square distribution stems from the fact that it describes the distribution of the variance of a sample taken from a Normal distributed population.

5. Chi-square is non-negative. It is the ratio of two non-negative values and therefore must be non-negative itself.

6. There are many different chi-square distributions, one for each degree of freedom.

7. The degrees of freedom when working with a single population variance is n-1.

Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is different from the method for looking up right-tail values.

1. Area to the right - just use the area given.
2. Area to the left - the table requires the area to the right, so subtract the given area from one and look this area up in the table.
3. Area in both tails - divide the area by two. Look up this area for the right critical value and one minus this area for the left critical value.
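Part (ii) of the question can be worked as a chi-square test of independence on the shift-by-week table. The sketch below is an illustration rather than the original answer: it computes expected frequencies as (row total x column total)/grand total, and the critical value 9.488 is the standard table value for 4 degrees of freedom at the 5% level, hard-coded here as an assumption instead of being looked up by a library.

```python
# Observed defective goods: rows are shifts I-III, columns are weeks 1-3.
observed = [
    [15, 5, 20],
    [20, 10, 20],
    [25, 15, 20],
]


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)
CRITICAL_VALUE = 9.488  # table value for 4 d.f. at the 5% level (assumed, not computed)
reject_independence = statistic > CRITICAL_VALUE
print(round(statistic, 3), dof, reject_independence)
```

On these figures the statistic is about 3.65 with 4 degrees of freedom, which is below 9.488, so the hypothesis that weeks and shifts are independent would not be rejected at the 5% level.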

Q6. Find Karl Pearson’s correlation co-efficient for the data given in the below table:

X   20   16   12   8    4
Y   22   14   4    12   8

Ans:

X       Y       X^2        Y^2        XY
20      22      400        484        440
16      14      256        196        224
12      4       144        16         48
8       12      64         144        96
4       8       16         64         32
∑X=60   ∑Y=60   ∑X^2=880   ∑Y^2=904   ∑XY=840
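From these totals, the coefficient can be obtained with one common computational form, r = [n∑XY - ∑X∑Y] / √{[n∑X^2 - (∑X)^2][n∑Y^2 - (∑Y)^2]}. The sketch below applies that form to the tabled data; the choice of this form and the function name `pearson_r` are assumptions, since the source stops at the table.

```python
import math

xs = [20, 16, 12, 8, 4]
ys = [22, 14, 4, 12, 8]


def pearson_r(x, y):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


r = pearson_r(xs, ys)
print(round(r, 2))
```

With n = 5 this gives r = 600/√(800 x 920), approximately 0.70, a fairly strong positive correlation between X and Y.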

