market research mb mk 02 - mba - iii sem, uptu syllabus

Marketing Research MB MK – 02

Syllabus of Unit I

1. Marketing Research –
   1. Definition,
   2. Scope,
   3. Significance,
   4. Limitations,
   5. Obstacles in acceptance.

2. Ethics in marketing research.

3. Marketing Intelligence system

4. Research process

5. Management dilemma (problem) –
   1. Decision problem
   2. Research problem

6. Hypothesis statement –
   1. Characteristics of a good hypothesis

7. Drafting the research proposal.

8. Various sources of market Information –
   1. Methods of collecting Market Information
   2. Secondary data – sources – problems of fit and accuracy.

9. Syndicated services.

04/10/2023 Kartikeya Singh 2

Unit 1. Marketing Research

a). Definition

• “Marketing research is the systematic gathering, recording and analyzing of data about problems relating to the marketing of goods and services.”

• “Market research will give you the data you need to identify and reach your target market at a price customers are willing to pay.”


b).Scope

• The scope of Marketing Research could cover business problems relating to the following:
– Types of consumers that comprise present and potential markets.
– Buying habits and purchasing habits.
– Size and location of different markets, not only in India but also overseas.
– New mantras for emerging markets.
– Marketing and manufacturing capabilities of competitors.
– Most suitable entry timing.
– Optimum use of promotion tools.
– Chances of improvement in current channels.
– Pricing strategy.


b). Scope

• Market Research
• Product Research
• Sales related Research
• Packaging Research
• Advertising Research
• Business Economic Research
• Promotion Research
• Distribution Research
• Consumer Research
• Pricing Research


c).Significance

• A manager takes decisions.
• His responsibility is to reduce the risk of failure in decision making.
• Risk arises due to lack of relevant information.
• A manager always seeks information to improve the quality of decision making.
• Information can be collected through MR.
• Hence, MR is an important tool for managerial decision making.


d).Disadvantages/Limitations

• Disadvantages of Market Research
– Information is only as good as the methodology used.
– Can be inaccurate or unreliable.
– Results may not be what the business wants to hear!
– May stifle initiative and ‘gut feeling’.
– Always a problem that we may never know enough to be sure!


e).Obstacles in Acceptance

A} A narrow conception of Marketing Research.

B} Uneven Caliber of Marketing Researchers

C} Late and Occasionally Erroneous Findings by Marketing Research

D} Personality and presentation Differences


2. Ethics in Marketing Research

• Relating to Respondents

• Relating to Clients

• Relating to Research Firms

• Relating to Research Professionals


3.Marketing Intelligence System

• “A Market Intelligence System (MkIS) is one that systematically gathers and processes critical business information, transforming it into actionable Management intelligence for marketing decisions”.


3.Marketing Intelligence System

• Market and customer orientation
• Identification of new opportunities
• Early warning of competitor moves
• Minimizing investment risk
• Better customer interaction
• Better market selection & positioning
• Quicker, more efficient and cost effective information


4. Research Process

1. Define the Problem

2. Develop an Approach to the Problem
• Type of Study? Exploratory, Descriptive, Causal?
• Mgmt & Research Questions, Hypotheses

3. Formulate a Research Design
• Methodology
• Questionnaire Design

4. Fieldwork/Data collection.

5. Prepare & Analyze the Data

6. Prepare & Present the Report

4. Research Process-Simplified

1. Identifying and Defining the Research Problem

2. Conducting Survey

3. Formulating hypothesis

4. Creating Research Design

5. Determining the data Need

6. Sample Selection

7. Designing Questionnaire

8. Selection and Training of Field Staff

9. Collection of Data

10. Data Processing

11. Data Analysis and Interpretation

12. Preparation of Research Report

13. Follow up

5. Management Dilemma

a). Decision Problem

• Research Problem: A research problem must contain the following:

i. An individual or an organization which has the problem.

ii. Some objective/goal to be attained.

iii. The researcher should have some doubt regarding the selection among the possible alternatives.

iv. There must be some environment/condition to which the difficulty pertains.

v. There should be some alternative courses of action through which the objectives can be attained.


5.(b) Research Problem

• A well-defined research problem keeps the researcher on the right path, whereas an ill-defined problem may create difficulties. Indeed, formulating a problem is often more essential than solving it.


6. Hypothesis Statement

The word hypothesis is a compound of two words ‘hypo’ and ‘thesis’. Hypo means under or below and thesis means a reasoned theory or a viewpoint.

• “A hypothesis is an attempt at explanation: a provisional supposition made in order to explain scientifically some fact or phenomenon.” - Coffey

• “Hypothesis is a summary which is temporary and imaginary related to subject of study.” - George Caswell

• “Hypothesis is a proposition which can be put to test to determine its validity”- Good and Hatt


6.(a) Characteristics of a Good Hypothesis

• Guidance
• Clarity
• Not in Exaggerative Language
• Temporary Solution
• Connectivity with Main Problem
• Specialization
• Scientific and Meaningful
• Related with Theories


7.Research Proposal

• A research proposal is a document written by a researcher that describes in detail the program for a proposed scientific investigation.

• A research proposal is a document written by a researcher that provides a detailed description of the proposed program. It is like an outline of the entire research process that gives a reader a summary of the information discussed in a project.


7. Research Proposal

• The Issue: What problem does the researcher address?

• Benefit: What will the research contribute to the existing knowledge?

• Research Design: How will the research achieve its stated objective?


7.Drafting a Research Proposal

1. Brief paragraph – what the research is about.

2. Background to the topic

3. Why this research is important and necessary, and what is new about it.

4. Detail what the research is about – aims and objectives of the research.

5. If a research design is available, give a glimpse of it (not necessary).

6. Methodology – how you are going to carry out this research.

7. Limitations, if any.

8. Various Sources of Market Information

• Importance of Market Information.
– Anticipation of consumer demand
– Complexity of Marketing
– Significance of Economic Indicators
– Significance of Competition
– Development of Science and Technology
– Consumerism
– Marketing Planning
– Information explosion.


8.Various Sources of Market Information

• Marketing Information:- “Marketing research is the function which links the consumer, customer and public to the marketer through information.”

• Data – “Recorded experience that is useful for decision making.”

• Characteristics:
– Accurate
– Current
– Sufficient
– Available
– Relevant


8. Various Sources of Market Information. a) Methods of collecting Market Information

• Questionnaire Method

• In this method, the respondent is questioned directly about his attitudes, opinions etc.

• Observation Method

• In this method the respondent is simply observed and his actions are recorded. This is done by using mechanical devices or by physically watching the respondent.



Questionnaire Method:-

• A questionnaire is simply a formalized schedule to obtain and record specific and relevant information with accuracy.

• A questionnaire has five functions to perform:
– Give the respondent clear comprehension of the questions.
– Assurance of confidentiality.
– Stimulate responses through introspection, using memory or reference to records.
– Give instructions on what is required and the way it would be responded.
– Identify what needs to be known to classify and verify the interview.



Questionnaire Method:-

• Eight steps in designing a questionnaire:-

1. Determine the specific data to be collected

2. Determine Interview Process

3. Then evaluation of the questionnaire content

4. Decide on Question Content
   i. Open ended
   ii. Closed ended
      a) Dichotomous
      b) Ranking
      c) Checklist
      d) Multiple Choice Questions
      e) Scales

Cont…….


Questionnaire Method:- Contd….

• Eight steps in designing a questionnaire:-

5. Determine the wording of the questions.
   I. Use simple language.
   II. Use familiar words.
   III. Avoid lengthy questions.
   IV. Be as specific as possible.

6. Questionnaire structure – sequence of questions.

7. Determine the physical characteristics of the form.

8. Pretest, Revision and Final Draft.



Questionnaire Method:- Contd….

Advantage:-

a. The questionnaire method has the capacity to address or deal with all types/aspects of research problems.

b. This method can maintain confidentiality of answers and hence respondents can freely express themselves.

c. The processing of data can be fast if the questionnaire is well structured.

d. It is less time consuming and less expensive than the observation method.

e. It is very structured and thus leaves little room for manipulation or incorrect recording by the interviewer or respondent.

Disadvantage:-

a. A saturation level has been reached: respondents are now unwilling to fill in questionnaires and can hardly spare the time.

b. If a questionnaire is not well-designed, it generates incomplete information.

c. Interviewers who are not well trained can spoil a good questionnaire.

d. The respondent may not fill in a questionnaire properly if he has to tax his memory too much.

e. Lack of Time.

f. Lack of Interest.


Observation Method:-

• Another method used for gathering research data is observing a respondent’s overt behavior. Observation is used to obtain information on both past and present behavior of people.

• Observation may be used either solely or in conjunction with some other method.

• The potential components in this form must be evaluated on the basis of four criteria to determine what exactly is to be observed:

I. Who should be observed?

II. What should be observed?

III. When should the observation take place?

IV. What should be the expected path?

Equipment used for it:- eye camera, pupilometric camera, psychogalvanometer, video camera, CCTV etc.



Advantage:-

I. It is objective and accurate.

II. It eliminates the subjective element faced in questionnaires.

III. The willingness of the respondents does not matter as the respondent is not aware of being observed.

IV. It is very useful in the case of respondents with whom it is difficult to communicate.

Disadvantage:-

I. The action observed may not necessarily be the one taken in actual normal circumstances.

II. It is very expensive and time consuming to set up and undertake observation studies.

III. It can’t yield information on state of mind, motives etc.

IV. The observer must be properly selected and trained as the data collection depends upon the skill of the observer.

V. Time constraint.

VI. Confidentiality.


Survey Method:-

8. Various Sources of Market Information. b) Secondary data – sources.

a) Secondary Data:- “Data collected by someone else for purpose other than solving the problem being investigated”. Secondary data can be collected through

I. Internal and

II. External sources.
   a) Government Sources
   b) Business References
   c) Commercial Research Agencies


8.Various Sources of Market Information.c) Problems of fit and accuracy.

• It is not enough to know what was the purpose behind the data collection, it is also necessary to know how the data was collected.

• Secondary data suffers from a major limitation of obsolescence. The utility of secondary data diminishes with time.

• Secondary data may be available, but it is not necessarily relevant.

• The Classification bases used in the secondary data often do not coincide with those of the present study.

• Locating appropriate sources of secondary data is a time consuming affair.

• One cannot always be sure of the accuracy of secondary data.


9. Syndicated services.

• Syndicated services may be regarded as an ‘intermediate’ source falling between primary and secondary sources of data. Syndicated services are normally designed to suit the requirements of many individual firms. Such services are particularly useful in the spheres of T.V. viewing, magazine readership and the movement of consumer goods through retail outlets.

• Syndicated services are provided by certain organizations, which collect and tabulate marketing information on a continuous basis. Organizations providing syndicated services may also engage themselves in other types of research work for their clients. However, such organizations usually confine themselves to this activity alone.


Assignment - 1

1. Define Market Research. State its significance and Limitations.

2. What do you understand by the term ethics in Market Research?

3. Elaborate Research Process in detail with suitable example.

4. How is a hypothesis different from a research proposal?

5. What are the various sources of information? Discuss them in detail.

• Date of Submission– 27th September,2013


Case Study-JD sports

• CASE STUDY-MARKET RESEARCH-JD SPORTS.pdf


End of Unit I


Unit II


Syllabus of Unit II

1. Marketing research techniques:

2. Market development research:
   I. Cool hunting – socio cultural trends,
   II. Demand estimation research,
   III. Test marketing,
   IV. Segmentation Research - Cluster analysis,
   V. Discriminant analysis.

3. Sales forecasting –
   I. objective and
   II. subjective methods

4. Marketing Mix Research:
   I. Concept testing,
   II. Brand Equity Research,
   III. Brand name testing,

5. Commercial eye tracking:
   I. package designs,
   II. Conjoint analysis,
   III. Multidimensional scaling
   IV. positioning research,
   V. Pricing Research,
   VI. Shop and retail audits,

6. Advertising Research
   I. Copy Testing,
   II. Readership surveys and viewership surveys,
   III. Ad tracking,
   IV. Viral marketing research.


1. Research Techniques

• Ad Tracking – periodic or continuous in-market research to monitor a brand’s performance using measures such as brand awareness, brand preference, and product usage. (Young, 2005)

• Advertising Research – used to predict copy testing or track the efficacy of advertisements for any medium, measured by the ad’s ability to get attention (measured with Attention Tracking), communicate the message, build the brand’s image, and motivate the consumer to purchase the product or service. (Young, 2005)

• Brand equity research – how favorably do consumers view the brand?

• Brand association research – what do consumers associate with the brand?

• Brand attribute research – what are the key traits that describe the brand promise?

• Brand name testing – what do consumers feel about the names of the products?

• Commercial eye tracking research – examine advertisements, package designs, websites, etc. by analyzing visual behavior of the consumer

• Concept testing – to test the acceptance of a concept by target consumers


1. Research Techniques

• Cool hunting – to make observations and predictions about changes in new or existing cultural trends in areas such as fashion, music, films, television, youth culture and lifestyle

• Buyer decision making process research – to determine what motivates people to buy and what decision-making process they use

• Neuromarketing – emerged over the last decade from the convergence of neuroscience and marketing, aiming to understand the consumer decision making process

• Copy testing – predicts in-market performance of an ad before it airs by analyzing audience levels of attention, brand linkage, motivation, entertainment, and communication, as well as breaking down the ad’s flow of attention and flow of emotion.

• Customer satisfaction research – quantitative or qualitative studies that yield an understanding of a customer's satisfaction with a transaction

• Demand estimation – to determine the approximate level of demand for the product

• Distribution channel audits – to assess distributors’ and retailers’ attitudes toward a product, brand, or company

• Internet strategic intelligence – searching for customer opinions on the Internet: chats, forums, web pages, blogs... where people express themselves freely about their experiences with products, becoming strong opinion formers.


1. Research Techniques

• Marketing effectiveness and analytics – building models and measuring results to determine the effectiveness of individual marketing activities.

• Mystery consumer or mystery shopping – an employee or representative of the market research firm anonymously contacts a salesperson and indicates he or she is shopping for a product. The shopper then records the entire experience. This method is often used for quality control or for researching competitors' products.

• Positioning research – how does the target market see the brand relative to competitors? What does the brand stand for?

• Price elasticity testing – to determine how sensitive customers are to price changes

• Sales forecasting – to determine the expected level of sales given the level of demand, with respect to other factors like advertising expenditure, sales promotion etc.


1. Research Techniques

• Segmentation research – to determine the demographic, psychographic, and behavioral characteristics of potential buyers

• Online panel – a group of individuals who have agreed to respond to marketing research

• Online Store audit – to measure the sales of a product or product line at a statistically selected store sample in order to determine market share, or to determine whether a retail store provides adequate service

• Test marketing – a small-scale product launch used to determine the likely acceptance of the product when it is introduced into a wider market

• Viral Marketing Research – refers to marketing research designed to estimate the probability that specific communications will be transmitted throughout an individual's Social Network. Estimates of Social Networking Potential (SNP) are combined with estimates of selling effectiveness to estimate ROI on specific combinations of messages and media.


2.(a)Cool Hunting.

• The practice of observing current trends and predicting where the youth demographic will shift in trends in the immediate future.

• A term coined in the 90’s referring to marketing firms who looked to design and develop the newest trends.

• The marketing firms then sell these ideas to retail establishments, which use them to earn more profits.


2.(a)Cool Hunting.

• The “hot new designs” influence…

• Art (ex. Magic Poster, Window Pictures, Wall Paint Colors)

• Retail Merchandise (ex. Tights, Baggy pants, Khaki’s, Knee Length Socks, Not Socks)

• Music (ex. Reggae, Punk, Techno, European, Alternative)

• Shoes (ex. Knee length boots, Low Rise Sneakers, Design your own shoes)

• Gaming (ex. Puzzle Solving games, War Rally Games, Real Life Solutions Gaming)

• Travel (ex. Costa Rica, Thailand, Backpacking)

2.(a)Cool Hunting.

Alpha Consumer: A term used by marketers to define the “cool people” setting trends within their peer group. Usually the alpha consumer is setting this trend a year before it is mainstreamed.

Urban Pioneers: People who are established in music, fashion, film, marketing, and advertising.


2.(b)Demand estimation research

• The decision-making task has become difficult and extremely important

• The need of the hour for a manager is to know the behavior of the market related variables, their interrelationship and future movement

• Demand estimation attempts to quantify the links between the level of demand for a product and the variables which determine it, whereas demand forecasting simply attempts to predict the level of sales at some particular future date.


2.(b) Demand estimation research – Methods of Demand Estimation

Demand Estimation

• Qualitative Methods: Consumer Survey, Market Experiment

• Quantitative Methods: Statistical Method, Model Specification, Statistical Models



Qualitative Method

• Consumer Survey.

• Firms can obtain information regarding their demand functions by using interviews and questionnaires, asking questions about buying habits, motives and intentions.

• These can be quick on-the-street interviews, or in-depth ones.



Qualitative Method

Advantage

• They give up-to-date information reflecting the current business environment.

• Much useful information can be obtained that would be difficult to uncover in other ways.

• Firms can also establish product characteristics that are important to the buyer.

Disadvantage

• Validity: Consumers often find it difficult to answer hypothetical questions, and sometimes they will deliberately mislead the interviewer to give the answer they think the interviewer wants.

• Reliability: It is difficult to collect precise quantitative data by such means.

• Sample bias: Those responding to questions may not be typical consumers.



Qualitative Method

• Market experiments:

• Laboratory experiments or consumer clinics seek to test consumer reactions to changes in variables in the demand function in a controlled environment.

• Consumers are normally given small amounts of money and allowed to choose how to spend this on different goods at prices that are varied by the investigator.

• However, such experiments have to be set up very carefully to obtain valid and reliable results; the knowledge of being in an artificial environment can affect consumer behavior.


2.(b) Demand estimation research – Methods of Demand Estimation

Qualitative Method

Advantage

• Gives direct feedback about customer interest.

• Customers are able to act in a simulated atmosphere, so their interest level can be known immediately.

• Direct observation of the consumers takes place, rather than reliance on a hypothetical theoretical model.

Disadvantage

• There is less control in this case.

• The number of variations is greater.

• Experiments may have to be long-lasting.



Quantitative Method

Model specification

• In order to understand this we must first distinguish a statistical relationship from a deterministic relationship. The latter are relationships known with certainty, for example the relationship among revenue, price and quantity:

• R = P × Q; if P and Q are known, R can be determined exactly.

• Statistical relationships are much more common in economics and involve an element of uncertainty. The deterministic relationship is considered first.



Quantitative Method

Mathematical models:

• It is assumed to begin with that the relationship is deterministic. With a simple demand curve the relationship would therefore be:

• Q = f(P)

• If we are also interested in how sales are affected by the past price, the model might in general become:

• Q_t = f(P_t, P_{t-1})


Quantitative Method

In practice we can very rarely specify an economic relationship exactly. Models by their nature involve simplifications; in the demand situation we cannot hope to include all the relevant variables on the right hand side of the equation, for a number of reasons:

1. We may not know from a theoretical viewpoint what variables are relevant in affecting the demand for a particular product.

2. The information may not be available, or may be impossible to obtain. An example might be the marketing expenditures of rival firms.

3. It may be too costly to obtain the relevant information. For example, it might be possible to obtain information relating to the income of customers, but it would take too much time (and may not be reliable).


• Statistical models


Quantitative Method

Statistical Method

• In a perfect relationship the points would exactly fit a straight line, or some other regular curve. In practice they do not, so we have to specify the relationship in statistical terms, using a residual term to allow for the influence of omitted variables. This is shown for the linear form as follows:

• Q_i = a + b P_i + d_i

• where d_i represents a residual term. Thus, even if P is known, we cannot predict Q with complete accuracy because we do not know for any observation what the size or direction of the residual will be.
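As an illustration of the linear form Q_i = a + b P_i + d_i, the sketch below estimates a and b by ordinary least squares and returns the residuals d_i. Least squares is one common way to fit such a line; the slides do not name an estimation method, so this choice is an assumption. The function name `fit_linear_demand` and the price/quantity figures are hypothetical.

```python
def fit_linear_demand(prices, quantities):
    """Fit Q_i = a + b*P_i + d_i by ordinary least squares (an assumed method).

    Returns (a, b, residuals), where residuals[i] is the estimated d_i.
    """
    n = len(prices)
    if n != len(quantities) or n < 2:
        raise ValueError("need at least two matching (price, quantity) pairs")

    mean_p = sum(prices) / n
    mean_q = sum(quantities) / n

    # Sum of squared price deviations and of price-quantity cross deviations.
    sxx = sum((p - mean_p) ** 2 for p in prices)
    if sxx == 0:
        raise ValueError("prices must not all be identical")
    sxy = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))

    b = sxy / sxx            # slope: change in quantity per unit change in price
    a = mean_q - b * mean_p  # intercept
    residuals = [q - (a + b * p) for p, q in zip(prices, quantities)]
    return a, b, residuals


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    prices = [10, 12, 14, 16, 18]
    quantities = [95, 88, 83, 71, 66]
    a, b, residuals = fit_linear_demand(prices, quantities)
    print("a =", round(a, 2), "b =", round(b, 2))
    print("residuals:", [round(d, 2) for d in residuals])
```

The residuals are the part of each observed quantity that price alone does not explain, which is why Q cannot be predicted exactly even when P is known.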


2.(c) Test marketing

• Test marketing is a research technique which is used when the proposed product and the marketing programme for the same is tried out for the first time with a small sample size in the potential market.

• Test marketing is defined as “A controlled experiment done in a limited but carefully selected part of the market place, whose aim is to predict the sales or profit consequences in absolute or relative terms of one or more proposed marketing actions”



2.(c) Test marketing

• Features
– It helps to get information and experience with the marketing programme before finalizing the plans and making a total commitment to it.
– It helps to predict the programme’s outcome when it is applied to the total market.
– It is costly/expensive.
– It is time consuming.
– It allows the competitors to view your new product or your test marketing mix.
– The test market should be large enough to provide meaningful results and it should demographically represent the actual population.

2.(c) Test marketing – Methods of Test Marketing

• Consumer Goods Test Marketing:
– Purchase frequency
– Trial purchase
– Repeat purchase

• Sales Wave Research:
– The product is first offered free for trial and then offered at a charge. The process is repeated 3-4 times.
– When the customer makes the choice to purchase
– How they get an advantage over their competitor.

• Simulated Test Marketing:
– Simulation is an imitation of a real world situation.
– Advertisement shown > Money provided > Store purchase behavior noticed > Compared with the competitor.

• Controlled Test Marketing:
– Shelf position, display method, point of purchase, pricing etc.


2.(d) Segmentation Research – Cluster Analysis

Cluster analysis is used to classify persons or objects into a smaller number of mutually exclusive and exhaustive groups. There should be high internal (within cluster) homogeneity and high external (between cluster) heterogeneity. Cluster analysis has been increasingly used in marketing research due to its utility in resolving the problem of classifying consumers, products etc.
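To make the idea of within-cluster homogeneity concrete, here is a minimal sketch that groups consumers on a single variable using k-means. K-means is one widely used clustering technique; the slides do not say which algorithm is intended, so treat it as an assumption. The function name `kmeans_1d` and the purchase figures are hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Group one-dimensional values into k clusters with a basic k-means loop.

    Returns (centers, clusters). Every value lands in exactly one cluster,
    so the groups are mutually exclusive and exhaustive.
    """
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")

    # Deterministic start: spread the initial centers across the sorted values.
    step = max(k - 1, 1)
    centers = [float(distinct[i * (len(distinct) - 1) // step]) for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)

        # Update step: each center moves to the mean of its members.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Hypothetical monthly purchase counts for six consumers.
    purchases = [2, 3, 4, 20, 22, 24]
    centers, clusters = kmeans_1d(purchases, k=2)
    print("centers:", centers)
    print("clusters:", clusters)
```

With real survey data there would usually be several variables per consumer, in which case the absolute difference would be replaced by a multi-dimensional distance.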




2.(e). Discriminant analysis

• A discriminant analysis enables the researcher to classify the person or objects into two or more categories.

• For Example, consumers may be classified as heavy and light users.

• With the help of such techniques, it is possible to predict the mutually exclusive categories or classes into which individuals are likely to fall. In recent years, discriminant analysis has been used by marketing researchers for tasks such as identifying new product buyers and determining brand loyalty among customers.


3. Sales forecasting – a). Objective and Subjective methods

• Sales analysis:- Sales analysis enables a company to identify the areas where its sales performance has been good or mediocre, customers who have bought in bulk, products with high and low sales volume etc.

• A systematic, comprehensive and periodical sales analysis will be helpful to a company to reinforce its sales effort where it is most needed. In this way, it can achieve the best possible results.



• Sales analysis by Territory
• Sales analysis by Product
• Sales analysis by Customer
• Sales analysis by Order



• The concept of Market Potential.
– Market Potential has been defined as “the maximum demand response possible for a given group of customers within a well-defined geographic area for a given product or service over a specified period of time under well-defined competitive and environmental conditions”.



• Methods of Estimating Current Demand:

• “Total market potential is the maximum amount of sales that might be available to all the firms in an industry during a given period under a given level of industry marketing effort and given environment conditions”

• Symbolically, total market potential is

• Q = n × q × p
– Q = total market potential
– n = number of buyers in the specific product/market under the given assumptions
– q = quantity purchased by an average buyer
– p = price of an average unit.
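The formula Q = n × q × p can be worked through in a few lines. The sketch below is only an arithmetic illustration; the function name `total_market_potential` and the figures used are hypothetical, not taken from the slides.

```python
def total_market_potential(n_buyers, avg_quantity, avg_price):
    """Return Q = n * q * p, the total market potential in currency units.

    n_buyers     -- n, number of buyers in the product/market
    avg_quantity -- q, quantity purchased by an average buyer in the period
    avg_price    -- p, price of an average unit
    """
    if n_buyers < 0 or avg_quantity < 0 or avg_price < 0:
        raise ValueError("inputs must be non-negative")
    return n_buyers * avg_quantity * avg_price


if __name__ == "__main__":
    # Hypothetical market: 1,000 buyers, 2 units each per year, 50 per unit.
    print(total_market_potential(1000, 2, 50))
```

Because Q is a product of three estimates, an error in any one of them scales the result proportionally, so each input deserves its own check.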



• Process of predicting a future event based on historical data

• Educated guessing

• Underlying basis of all business decisions: production, inventory, personnel, facilities


• What is Sales forecasting
• Importance
• Forecasting Process


• Predict the next number in the pattern:

a) 3.7, 3.7, 3.7, 3.7, 3.7,

b) 2.5, 4.5, 6.5, 8.5, 10.5,

c) 5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5,
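The slide leaves these patterns as an exercise and gives no answers. One possible way to extend each series mechanically is a naive rule: take the value one cycle back and add the change observed over the previous cycle. The sketch below implements that rule; the function name `next_value` and the choice of cycle length (`period`) for each series are assumptions made for illustration.

```python
def next_value(series, period=1):
    """Naive forecast: repeat the value one cycle back, plus the last cycle's drift.

    period=1 extends a constant or straight-line series; a larger period
    suits a pattern that repeats every `period` observations.
    """
    if period < 1 or len(series) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    one_cycle_back = series[-period]
    two_cycles_back = series[-2 * period]
    return one_cycle_back + (one_cycle_back - two_cycles_back)


if __name__ == "__main__":
    a = [3.7, 3.7, 3.7, 3.7, 3.7]
    b = [2.5, 4.5, 6.5, 8.5, 10.5]
    c = [5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5]
    print(next_value(a))            # constant series
    print(next_value(b))            # steady increase
    print(next_value(c, period=4))  # assumed four-step repeating pattern
```

Under this rule the three series continue with 3.7, 12.5 and 9.0 respectively; other forecasting rules could reasonably give different answers for series c.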





Importance:

a) It keeps a firm ready for any contingency that may arise.

b) It is a tool to help budgeting for the entire firm.

c) Forecasting is quite necessary for planning for uncertain future in different areas of the economy.

d) For effective planning by providing a scientific and reliable basis for anticipating future operations such as production, inventory, supply of capital and so on.

e) For reducing the area of uncertainty that surrounds management decision-making with respect to cost, production, profits, pricing, etc.

f) Making and reviewing on a continuous basis will compel the managers to think ahead and to search for the best possible decisions with a dynamic approach.

g) For efficient managerial control, as a forecast of sales is a must in order to control the costs of production and the productivity of personnel.





3. Sales forecasting – a). Objective and Subjective methods.






• Short-range forecast – usually < 3 months
  • Job scheduling, worker assignments

• Medium-range forecast – 3 months to 2 years
  • Sales/production planning

• Long-range forecast – > 2 years
  • New product planning

[Figure labels: Design of system; Detailed use of system; Quantitative methods; Qualitative methods]


[Figure: product life cycle curve, Sales plotted against Time; stages: Introduction, Growth, Maturity, Decline]

Quantitative models
- Time series analysis
- Regression analysis

Qualitative models
- Executive judgment
- Market research
- Survey of sales force
- Delphi method


Methods of Forecasting

Subjective or Qualitative

Field Sales Force

Jury of Executives

Users Expectations

The Delphi Method

Objective or Quantitative

Causal


Subjective Methods:

• An important advantage of subjective methods is that they are easily understood.

• Another advantage is that the cost involved in forecasting is quite low.

• One major limitation is the varying perceptions of people involved in forecasting. As a result, wide variance is found in forecasts.

• Subjective methods may be more suitable in case of highly technical products which have a limited number of customers.


3. Sales forecasting – a). Subjective methods.

Subjective Method

a) Field Sales Force
b) Jury of Executives
c) Users Expectations
d) The Delphi Method


Field Sales Force

• Some companies ask their salesmen to indicate the most likely sales for a specified period in the future.

• Usually the salesman is asked to indicate anticipated sales for each account in his territory. These forecasts are checked by district managers who forward them to the company’s head office. Different territory forecasts are then combined into a composite forecast at the head office. This method is more suitable when a short-term forecast is to be made, as there would be no major changes in this short period affecting the forecast.

• Advantage
– The sales force is directly involved, so we get direct feedback.

• Disadvantage
– The sales force may not take an overall or broad perspective.
– The sales force may give a somewhat low figure.





Jury of Executives:

• Some companies prefer to assign the task of sales forecasting to executives instead of a sales force. Given this task, each executive makes his forecast for the next period. Since each has his own assessment of the environment and other relevant factors, one forecast is likely to be different from the other.

• To narrow down the differences in the forecasts, sometimes a discussion between the executives is organized so that they can arrive at a common forecast. In case this is not possible, the chief executive may have to decide which of these forecasts is acceptable as a representative one.

• Advantage:
– It draws on a large base of executives to reach a final consensus.

• Disadvantage:
– Opinion may be influenced by current market conditions.





Users' Expectations

• A forecast can be based on users' expectations or intentions to purchase goods and services.

• It is difficult to use this method when the number of users is large.

• Another limitation of this method is that, although it indicates users' intentions to buy, actual purchases in the subsequent period may be far lower.

• It is most suitable when the number of buyers is small, as in the case of industrial products.





The Delphi Method:

• This method is based on expert opinion. Each expert has access to the same available information. A feedback system generally keeps the experts informed of each other's forecasts, but no majority opinion is disclosed to them. The experts are not brought together, which ensures that one or more vocal experts do not dominate the others.

• The experts are given an opportunity to compare their own previous forecasts with those of the others and to revise them. After three or four rounds, the group of experts arrives at a final forecast.

• The method may involve a large number of experts, and this may delay the forecast considerably. Generally, it involves a small number of participants.
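
The way successive rounds narrow the spread of forecasts can be illustrated with a small Python sketch. The expert figures are hypothetical, and taking the median of the last round as the final forecast is an assumption made here for illustration; the method itself does not prescribe how the final figure is settled.

```python
from statistics import median

# Hypothetical forecasts (units) from four experts over three Delphi rounds.
# After each round the experts see the others' figures and may revise.
rounds = [
    [90, 100, 140, 160],   # round 1
    [105, 110, 130, 140],  # round 2
    [115, 118, 125, 128],  # round 3
]


def summarise(forecasts):
    """Return the median and the spread (max - min) of one round."""
    return median(forecasts), max(forecasts) - min(forecasts)


summaries = [summarise(r) for r in rounds]
for number, (mid, spread) in enumerate(summaries, start=1):
    print(f"Round {number}: median = {mid}, spread = {spread}")

# Assumption: the median of the last round is taken as the final forecast.
final_forecast = summaries[-1][0]
```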




3. Sales forecasting – b). Objective methods.

• Quantitative or Objective Method

1. Causal or Explanatory Methods

Causal or explanatory methods are regarded as the most sophisticated methods of forecasting. These methods yield realistic forecasts provided relevant data are available on the major variables influencing changes in sales. There are three distinct advantages of causal methods.

First, turning points in sales can be predicted more accurately by these methods than by time-series methods.

Second, the use of these methods reduces the magnitude of the random component far more than is possible with time-series methods.

Third, the use of such methods provides greater insight into causal relationships. This helps management in decision making.

Quantitative or Objective Method

1. Causal or Explanatory Methods
a) Leading Indicators
b) Regression Models
c) Input-Output Analysis
d) Econometric Models

2. Time Series
a) Freehand
b) Trend Projection
c) Exponential Smoothing
d) Autoregressive Model
e) Box-Jenkins Model


a) Leading Indicators:

• Sometimes changes in the sales of a particular product or service are preceded by changes in one or more leading indicators. In such cases, it is necessary to identify the leading indicators and to observe changes in them closely.

• One example is the demand for various household appliances, which follows the construction of new houses.

• Likewise, the demand for many durables is preceded by an increase in disposable income.

• Yet another example is the number of births. The demand for baby food and other goods for infants can be ascertained from the number of births in a territory. It may be possible to include leading indicators in regression models.
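
One simple way to check whether a series behaves as a leading indicator is to compare its correlation with sales at different lags. The sketch below is only an illustration: the quarterly figures for new houses and appliance sales are hypothetical and were constructed so that sales follow house completions by exactly one quarter.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(indicator, sales, lag):
    """Correlate the indicator in period t with sales in period t + lag."""
    if lag == 0:
        return pearson(indicator, sales)
    return pearson(indicator[:-lag], sales[lag:])


# Hypothetical quarterly data, built so that sales lag houses by one quarter.
new_houses = [100, 120, 90, 130, 150, 110, 140, 160]
appliance_sales = [50, 52, 62, 47, 67, 77, 57, 72]

for lag in (0, 1):
    r = lagged_correlation(new_houses, appliance_sales, lag)
    print(f"lag {lag}: r = {r:.3f}")
```

With these constructed figures the lag-1 correlation is essentially 1, which is what marks new houses as a one-quarter leading indicator in this toy case; real data would show a much noisier pattern.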




b). Regression Models:

• Linear regression analysis is perhaps the most frequently used and the most powerful of the causal methods.

• Regression models indicate linear relationships only within the range of the observations and for the times when they were made.

• Sometimes there may be a lagged relationship between the dependent and independent variables.

• The data required to establish the ideal relationship may not exist, may be inaccessible or, if available, may not be useful.

• Finally, a regression model reflects only the association among variables; it does not by itself establish causation.
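
A minimal sketch of a regression forecast with a lagged relationship follows. It assumes, purely for illustration, that sales in a period depend linearly on advertising in the previous period; the variable names and figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical figures for six periods (arbitrary units).
advertising = [1, 2, 3, 4, 5, 6]
sales = [3, 2, 4, 5, 4, 5]

# Lagged relationship: pair advertising in period t-1 with sales in period t.
b0, b1 = fit_line(advertising[:-1], sales[1:])

# Forecast next period's sales from the latest advertising figure.
next_period_sales = b0 + b1 * advertising[-1]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, forecast = {next_period_sales:.2f}")
```

The fitted line is only trustworthy within the range of advertising values observed, which is the caution raised in the second bullet above.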











c). Input-Output Method:

• The analyst takes into consideration a large number of factors that affect the outputs he is trying to forecast.

• For this purpose, an input-output table is prepared in which the inputs are shown horizontally as the column headings and the outputs vertically as the stubs. By themselves, input-output flows are of little direct use to the analyst.

• The use of input-output analysis in sales forecasting is appropriate for products sold to governmental, institutional and industrial markets, as they have distinct patterns of usage. It is seldom used for consumer products and services.

• A major constraint on the use of this method is that it needs extensive data for a large number of items, which may not be easily available.











d). Econometric Models:

• Econometrics is concerned with the use of statistical and mathematical techniques to verify hypotheses emerging from economic theory. An econometric model incorporates functional relationships estimated by these techniques into an internally consistent and logically self-contained framework. Econometric models are generally used at the macro level, for example in forecasting national income and its components.

• Such models show how the economy or any specific segment of it operates. Compared with an ordinary regression equation, they bring out the causalities involved more distinctly. This merit of econometric models enables them to predict turning points more accurately. However, their use at the micro level for forecasting has so far been extremely limited.











• Time Series: Values taken by a variable over time (such as daily sales revenue, weekly orders, monthly overheads, yearly income) and tabulated or plotted as chronologically ordered numbers or data points.

• To yield valid statistical inferences, these values must be repeatedly measured, often over a four to five year period. Time series consist of four components:

I. Seasonal variations that repeat over a specific period such as a day, week, month, season, etc.,

II. Trend variations that move up or down in a reasonably predictable pattern,

III. Cyclical variations that correspond with business or economic 'boom-bust' cycles or follow their own peculiar cycles, and

IV. Random variations that do not fall under any of the above three classifications.











a). Freehand Method: One of the methods of obtaining a secular trend is the freehand method.

It is the simplest method of finding the trend line, which is simply extended to give the forecast.

It is a highly subjective method, as the trend line fitted to the same set of data will vary from one person to another. It is therefore the least appropriate method for forecasting.











b). Trend Projection: The trend is forecast simply by substituting the appropriate value of t (i.e. the year for which the forecast is desired) in the least-squares line.

If the data are monthly or quarterly, this trend value is then multiplied by the seasonal index.

Finally, the cyclical component is measured and an attempt is made to ascertain what it is likely to be at the point for which the forecast is being made.
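
As a rough illustration of trend projection, the sketch below fits a least-squares line to the six yearly sales figures (1995 to 2000) that this unit uses in its exponential smoothing example, and substitutes t for 2001. Coding 1995 as t = 0 is a choice made here; because the data are annual, no seasonal index is applied and the cyclical component is ignored.

```python
years = [1995, 1996, 1997, 1998, 1999, 2000]
sales = [15, 24, 15, 20, 22, 28]  # yearly sales figures from this unit

# Code time as t = 0, 1, 2, ... so that 1995 is the origin (an assumption).
t = [year - years[0] for year in years]


def fit_line(x, y):
    """Least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1


b0, b1 = fit_line(t, sales)


def trend_forecast(year):
    """Substitute the coded value of t for the given year in the trend line."""
    return b0 + b1 * (year - years[0])


forecast_2001 = trend_forecast(2001)
print(f"trend line: Y = {b0:.3f} + {b1:.3f} t")
print(f"trend forecast for 2001: {forecast_2001:.2f}")
```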











c). Exponential Smoothing: When a large number of forecasts are to be made for a number of items, exponential smoothing is particularly suitable, as it combines simplicity of computation with flexibility. It may be used for short-term forecasts (one period into the future), particularly when there is no long-term trend in the time-series data or when the trend is not clear.

This method applies differential weights to time-series data. The heaviest weight is assigned to the most recent data and the least weight to the most remote data in the time series. It is a type of moving average that smooths out the sharp variations in the time series.











• Exponential Smoothing:

The formula used for exponential smoothing is based on three terms:

i) The present observed value of the time series, $Y_i$

ii) The previously computed exponentially smoothed value, $E_{i-1}$

iii) A subjectively assigned weighting factor or smoothing coefficient, $W$

Thus, the formula is

$E_i = W Y_i + (1 - W) E_{i-1}$

where

$E_i$ = value of the exponentially smoothed series being computed in time period $i$

$E_{i-1}$ = value of the exponentially smoothed series computed in the preceding time period $i-1$

$Y_i$ = observed value of the time series in period $i$

$W$ = subjectively assigned weight whose value is between 0 and 1











Sales data of a firm for the years 1995 to 2000 are given below:

Year    Sales (million Rs)
1995    15
1996    24
1997    15
1998    20
1999    22
2000    28










Exponentially Smoothed Values of Sales of a Business Firm

Year    Sales (million Rs)    W = 0.5    W = 0.3
1995    15                    15.00      15.00
1996    24                    19.50      17.70
1997    15                    17.25      16.89
1998    20                    18.63      17.82
1999    22                    20.32      19.07
2000    28                    24.16      21.75
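
The table can be reproduced with a short Python sketch of the smoothing recursion. It assumes, as the first row of the table suggests, that the smoothed series starts at the first observed value. Carried at full precision, the recursion gives about 20.31 (W = 0.5) and 19.08 (W = 0.3) for 1999, where the table shows 20.32 and 19.07; the table appears to round each intermediate value before reusing it.

```python
def exponential_smoothing(values, w):
    """Apply E_i = w * Y_i + (1 - w) * E_(i-1), starting from E_1 = Y_1."""
    smoothed = [values[0]]  # assumption: initialise with the first observation
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


years = [1995, 1996, 1997, 1998, 1999, 2000]
sales = [15, 24, 15, 20, 22, 28]  # million Rs, from the table above

smoothed_05 = exponential_smoothing(sales, 0.5)
smoothed_03 = exponential_smoothing(sales, 0.3)

for year, y, e05, e03 in zip(years, sales, smoothed_05, smoothed_03):
    print(f"{year}  {y:>3}  {e05:6.2f}  {e03:6.2f}")
```

A larger W makes the smoothed series follow recent sales more closely, while a smaller W damps the swings, as the two columns show.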


• Autoregressive Model:

• Sometimes the values of a time series are highly correlated with the values that precede and succeed them. In such cases, an autoregressive model is used for forecasting.

• The first-order autoregressive model may be expressed as

$\hat{Y}_i = b_0 + b_1 Y_{i-1}$
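
The coefficients b0 and b1 of the first-order model can be estimated by regressing each value on the one before it. The sketch below does this with ordinary least squares on the six yearly sales figures used earlier in this unit. It only shows the mechanics: six observations are far too few for a reliable autoregressive model, and with these figures the estimated b1 turns out negative.

```python
sales = [15, 24, 15, 20, 22, 28]  # yearly sales, 1995 to 2000

previous = sales[:-1]  # Y_(i-1)
current = sales[1:]    # Y_i

n = len(previous)
mean_prev = sum(previous) / n
mean_curr = sum(current) / n

# Least-squares estimates for Y_i = b0 + b1 * Y_(i-1)
s_xy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
s_xx = sum((p - mean_prev) ** 2 for p in previous)
b1 = s_xy / s_xx
b0 = mean_curr - b1 * mean_prev

# One-step-ahead forecast for 2001 from the 2000 value.
forecast_2001 = b0 + b1 * sales[-1]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, forecast for 2001 = {forecast_2001:.2f}")
```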










• Box-Jenkins Method: The analyst identifies a tentative model considering the nature of the past data. This tentative model and the data are entered into the computer. The Box-Jenkins programme then gives the values of the parameters included in the model. A diagnostic check is then conducted to find out whether the model gives an adequate description of the data. If the model satisfies the analyst in this respect, it is used to make the forecast.










4. Marketing Mix Research – a). Concept Testing

• Concept testing (or market testing) is the process of using quantitative methods and qualitative methods to evaluate consumer response to a product idea prior to the introduction of a product to the market.

• It can also be used to generate communication designed to alter consumer attitudes toward existing products.

• Such methods are commonly referred to as concept testing and have been performed using field surveys, personal interviews and focus groups, in combination with various quantitative methods, to generate and evaluate product concepts.


4. Marketing Mix Research – b). Brand Equity Research:

• A brand is a “name, term, sign, symbol, or design, or a combination of them intended to identify the goods and services of one seller or group of sellers and to differentiate them from those of competition.” (American Marketing Association)



• Brand equity is the added value endowed on products and services. This value may be reflected in how consumers think, feel, and act with respect to the brand, as well as in the prices, market share and profitability that the brand commands for the firm. Brand equity is an important intangible asset that has psychological and financial value to the firm.



• Brand equity research measures the value of a brand. Brand equity research models and quantitative marketing research tools are used to tailor the research study to each client firm.

• Brand equity research studies support branding strategy programs:

– Brand Base Research
– Brand Qualitative Research
– Brand Quantitative Research

4. Marketing Mix Research – c). Brand Name Testing

• Develop Your Brand Strategy
• Research the Market, Competitors, and Consumers
• Identify the Message Your Brand Should Communicate
• Brainstorm without Judging
• Create a Short List
• Trademark and Domain Name Availability Search
• Create a Shorter Short List
• Develop Brand Marketing Mock-ups
• Test Your Brand Marketing Mock-ups
• Roll out and Monitor Your Brand


• So how should names be researched?

• Here’s just a few thoughts and research companies may not respond well to this kind of heresy. Every one will have their own methodology and you will need to decide if it can adapt easily to names. However, these principles apply if you are talking to real people or their ‘avatar’.

• Think about what you are testing. This will help to keep research simple.

• Allow the audience to concentrate on the names. Don’t let them be distracted by elements which potentially cloud their judgement.

• Don’t waste your time producing unnecessary stimulus. Instead, you should be testing the strength of prospective names before entering into design (unless, of course, you have an unlimited budget!).

• Don’t waste time on too many names. If you’ve done your job, you will already have narrowed the list to a manageable size – say, six words. If you can’t be decisive, use a group to screen out then use the subsequent ones to dig deeper.

• Listen out for consumers playing back to you the criteria you’ve used all the way through the development process – then you’ll know that you’ve asked the right questions.



• Put your thoughts into context (without getting caught up in the detail of design work). For example, set the scene with what your product is and does – not the price and pack size. You recruited these people because they’re your target, but they don’t have to like the product to tell you that the name isn’t right for it. You could also tell them about the personality of your brand, what its story is, because then they’ve got something to relate the names back to, not just a product or a usage occasion.

• Don’t let the respondents read the words until they have heard them. The consumer should react to the name, not just to words on a page. Get them to say them out loud – to test ease of pronunciation. (You won’t hear this in an on-line test, so how will you know if it works?).

• Remember the core idea? Which of the names best fits the story?

• Don’t worry about how many people like the name – this is a brand, you want stronger emotions than ‘like’. You want and need people to sit up and take notice; a groan, a laugh. That doesn’t mean that they have to like it. And a name can be right for all the wrong reasons.

• Research is for your guidance and reassurance, not necessarily for cut-and-dried decisions.

• And finally, don’t believe them when they say they don’t like it. If you ask the right questions and prompt them the right way, you may find that the first name they heard – and hated – is actually the one they think works best for the brand! Brand names seep into our consciousness, they do not always bang us over the head. Give them time to fall in love.


4. Marketing Mix Research – d). Commercial Eye Tracking

• Determining what a user looks at. Using sophisticated equipment, eye tracking follows the eye movements of a person looking at any visual such as a printed ad, an application's user interface or a page on a Web site. It is used to analyze the usability and effectiveness of the layout.


4. Marketing Mix Research – e). Package Designs

I. Protect

II. Inform

III. Contain

IV. Transport

V. Preserve

VI. Display

I. Captures Attention

II. Offers First Impression

III. Provides Information

IV. Aids Purchasing

V. Addresses Needs in Global Markets

VI. Meets Legal Requirements


4. Marketing Mix Research – f). Conjoint Analysis

• A technique that allows a subset of the possible combinations of product features to be used to determine the relative importance of each feature in the purchase decision.

• Conjoint analysis is an advanced multivariate technique that helps to identify what consumers value most when making decisions.


4. Marketing Mix Research – f). Conjoint Analysis

Attitudes towards dishwashing products

1. Clean: glass/dishes clean
2. Shiny: glass/dishes shiny
3. Smell: non-perfumed / lemon fresh / intensive lemon fresh
4. Quantity: small / medium / x-large
5. Packaging: loose in box / tab in plastic / tab in dissolving plastic
6. Design: single / multi-colored / multi-colored + ball

• Advantages:
– Estimates the psychological tradeoffs that consumers make when evaluating several attributes together
– Measures preferences at the individual level
– Uncovers real or hidden drivers which may not be apparent to the respondents themselves
– Realistic choice or shopping task
– Able to use physical objects
– If appropriately designed, the ability to model interactions between attributes can be used to develop needs-based segmentation
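
A very small main-effects conjoint calculation can be sketched as follows. It takes two of the dishwashing attributes listed above (smell and quantity) and assumes one respondent has rated all nine combinations; the ratings are hypothetical. For a balanced full-factorial design like this one, the part-worth of a level can be estimated as the mean rating of the profiles containing that level minus the grand mean, and an attribute's relative importance as its part-worth range divided by the sum of all ranges. Commercial conjoint studies use fractional designs and regression-type estimation, which this sketch does not attempt.

```python
from collections import defaultdict

attributes = ("smell", "quantity")

# Hypothetical ratings (1 = least preferred, 7 = most preferred) given by one
# respondent to every (smell, quantity) combination.
ratings = {
    ("non-perfumed", "small"): 2,
    ("non-perfumed", "medium"): 3,
    ("non-perfumed", "x-large"): 4,
    ("lemon fresh", "small"): 5,
    ("lemon fresh", "medium"): 6,
    ("lemon fresh", "x-large"): 7,
    ("intensive lemon fresh", "small"): 4,
    ("intensive lemon fresh", "medium"): 5,
    ("intensive lemon fresh", "x-large"): 6,
}


def part_worths(ratings, attributes):
    """Level mean minus grand mean, per attribute (balanced design only)."""
    grand_mean = sum(ratings.values()) / len(ratings)
    worths = {}
    for index, name in enumerate(attributes):
        by_level = defaultdict(list)
        for profile, rating in ratings.items():
            by_level[profile[index]].append(rating)
        worths[name] = {
            level: sum(values) / len(values) - grand_mean
            for level, values in by_level.items()
        }
    return worths


def relative_importance(worths):
    """Share of each attribute's part-worth range in the total range."""
    ranges = {
        name: max(levels.values()) - min(levels.values())
        for name, levels in worths.items()
    }
    total = sum(ranges.values())
    return {name: value / total for name, value in ranges.items()}


worths = part_worths(ratings, attributes)
shares = relative_importance(worths)
print(worths)
print(shares)
```

With these made-up ratings, smell accounts for about 60% of the importance and quantity for about 40%.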


4. Marketing Mix Research – g). Multidimensional Analysis

• Multidimensional scaling (MDS) is a class of procedures for representing perceptions and preferences of respondents spatially by means of a visual display.

• Perceived or psychological relationships among stimuli are represented as geometric relationships among points in a multidimensional space.

• These geometric representations are often called spatial maps. The axes of the spatial map are assumed to denote the psychological bases or underlying dimensions respondents use to form perceptions and preferences for stimuli.


Statistics and Terms Associated with MDS

• Spatial map. Perceived relationships among brands or other stimuli are represented as geometric relationships among points in a multidimensional space called a spatial map.

• Coordinates. Coordinates indicate the positioning of a brand or a stimulus in a spatial map.

• Unfolding. The representation of both brands and respondents as points in the same space is referred to as unfolding.


Conducting Multidimensional Scaling

Formulate the Problem

Obtain Input Data

Decide on the Number of Dimensions

Select an MDS Procedure

Label the Dimensions and Interpret the Configuration

Assess Reliability and Validity


i). Formulate the Problem

• Specify the purpose for which the MDS results would be used.

• Select the brands or other stimuli to be included in the analysis. The number of brands or stimuli selected normally varies between 8 and 25.

• The choice of the number and specific brands or stimuli to be included should be based on the statement of the marketing research problem, theory, and the judgment of the researcher.


ii). Input Data for Multidimensional Scaling

MDS Input Data
• Perceptions
– Direct (Similarity Judgments)
– Derived (Attribute Ratings)
• Preferences

• Perception Data: Direct Approaches. In direct approaches to gathering perception data, the respondents are asked to judge how similar or dissimilar the various brands or stimuli are, using their own criteria. These data are referred to as similarity judgments.

                          Very                        Very
                          Dissimilar                  Similar
Crest vs. Colgate         1    2    3    4    5    6    7
Aqua-Fresh vs. Crest      1    2    3    4    5    6    7
Crest vs. Aim             1    2    3    4    5    6    7
.
.
.
Colgate vs. Aqua-Fresh    1    2    3    4    5    6    7

• The number of pairs to be evaluated is n (n -1)/2, where n is the number of stimuli.
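
As a quick check of the n(n - 1)/2 rule, the sketch below counts and enumerates the pairs for the ten toothpaste brands that appear in this section's similarity ratings. The function name is a label chosen here for illustration.

```python
from itertools import combinations

brands = [
    "Aqua-Fresh", "Crest", "Colgate", "Aim", "Gleem",
    "Macleans", "Ultra Brite", "Close-Up", "Pepsodent", "Dentagard",
]


def number_of_pairs(n):
    """Number of distinct pairs among n stimuli: n(n - 1)/2."""
    return n * (n - 1) // 2


# Enumerate every unordered pair a respondent would have to judge.
pairs = list(combinations(brands, 2))

print(number_of_pairs(len(brands)))  # 45
print(len(pairs))                    # 45
```

The count grows quickly with n, which is one practical reason to keep the number of stimuli moderate.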


ii). Conducting Multidimensional Scaling: Obtain Input Data

Similarity Ratings of Toothpaste Brands

              Aqua-Fresh  Crest  Colgate  Aim  Gleem  Macleans  Ultra Brite  Close-Up  Pepsodent  Dentagard
Aqua-Fresh
Crest         5
Colgate       6           7
Aim           4           6      6
Gleem         2           3      4        5
Macleans      3           3      4        4    5
Ultra Brite   2           2      2        3    5      5
Close-Up      2           2      2        2    6      5         6
Pepsodent     2           2      2        2    6      6         7            6
Dentagard     1           2      4        2    4      3         3            4         3

• Perception Data: Derived Approaches. Derived approaches to collecting perception data are attribute-based approaches requiring the respondents to rate the brands or stimuli on the identified attributes using semantic differential or Likert scales.

Whitens teeth          ___ ___ ___ ___ ___ ___ ___ ___ ___ ___  Does not whiten teeth

Prevents tooth decay   ___ ___ ___ ___ ___ ___ ___ ___ ___ ___  Does not prevent tooth decay

.
.
.
.

Pleasant tasting       ___ ___ ___ ___ ___ ___ ___ ___ ___ ___  Unpleasant tasting

• If attribute ratings are obtained, a similarity measure is derived for each pair of brands.
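
One common way to derive such a measure is to treat each brand's attribute ratings as a point and use the Euclidean distance between points as a dissimilarity, so that a smaller distance means more similar brands. The choice of Euclidean distance is an assumption here, since the text does not name a specific measure, and the ratings below are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical mean ratings on a 7-point scale for three attributes:
# (whitens teeth, prevents tooth decay, pleasant tasting)
attribute_ratings = {
    "Crest": (6, 7, 5),
    "Colgate": (6, 6, 5),
    "Dentagard": (2, 3, 4),
}

# Derived dissimilarity for each pair of brands: Euclidean distance
# between their rating profiles (smaller = more similar).
distances = {
    (a, b): dist(attribute_ratings[a], attribute_ratings[b])
    for a, b in combinations(attribute_ratings, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))
```

The resulting pairwise values play the same role as direct similarity judgments when they are fed into an MDS procedure.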


[Figure: A Spatial Map of Toothpaste Brands. Both axes run from -2.0 to 2.0. Brands plotted: Aqua-Fresh, Crest, Colgate, Aim, Gleem, Macleans, Ultra Brite, Close-Up, Pepsodent, Dentagard.]

[Figure: Using Attribute Vectors to Label Dimensions. The same spatial map of toothpaste brands, with axes from -2.0 to 2.0, overlaid with the attribute vectors "Fights Cavities", "Whitens Teeth" and "Cleans Stains".]

Conducting Multidimensional Scaling: Assess Reliability and Validity

• Stimuli can be selectively eliminated from the input data and the solutions determined for the remaining stimuli.

• A random error term could be added to the input data. The resulting data are subjected to MDS analysis and the solutions compared.

• The input data could be collected at two different points in time and the test-retest reliability determined.

[Figure: External Analysis of Preference Data. The same spatial map of toothpaste brands, with axes from -2.0 to 2.0, with a respondent's "Ideal Point" added.]

Assumptions and Limitations of MDS

• It is assumed that the similarity of stimulus A to B is the same as the similarity of stimulus B to A.

• MDS assumes that the distance (similarity) between two stimuli is some function of their partial similarities on each of several perceptual dimensions.

• When a spatial map is obtained, it is assumed that interpoint distances are ratio scaled and that the axes of the map are multidimensional interval scaled.

• A limitation of MDS is that dimension interpretation relating physical changes in brands or stimuli to changes in the perceptual map is difficult at best.

4. Marketing Mix Research – h). Positioning Research

• The first component is the product class, or the structure of the market in which a company's brand will compete.

• The second component is consumer segmentation.

• The third component is the consumers' perception of the company’s brand in relation to those of the competitors.

• The fourth component of positioning is the benefit offered by the company’s brand.


4. Marketing Mix Research – i). Pricing Research

Market Segmentation.

Estimate of Demand.

The Market Share.

The Marketing Mix.

Estimate of Costs.

Pricing Strategy.

The price Structure.



• Market Segmentation
– Type of product to be produced or sold
– The kind of service to be rendered
– The costs of operations, to be estimated
– The type of customers or market segments sought



• Estimate of Demand:
– Marketers will estimate total demand for the products. It will be based on the sales forecast, channel opinions and the degree of competition in the market.

• Market Share:
– The marketer will choose a brand image and the desired market share on the basis of competitive reaction. Market planners must know exactly what


4. Marketing Mix Research – j). Shop and Retail Audits

• Normally, the retailer would like research studies to cover:
– Trade area analysis
– Store image
– Customer perception studies
– In-store traffic pattern
– Location analysis


• End of Unit II


• Unit III


Syllabus of Unit III

I. Marketing effectiveness and analytics research:
a) Customer Satisfaction Measurement
b) Mystery shopping
c) Market and Sales Analysis

II. Exploratory designs

III. Descriptive designs
a) Longitudinal and cross-sectional analysis

IV. Qualitative research techniques –
a) Based on questioning: focus groups, depth interviews, projective techniques
b) Based on observations: ethnography, grounded theory
c) Participant observation

V. Causal research –
a) Basic experimental designs
b) Internal and external validity of experiments

I. Marketing effectiveness and analytics research:

• Marketing expert Tony Lennon believes marketing effectiveness is quintessential to marketing, going so far as to say, “It's not marketing if it's not measured.”

• Dimensions of marketing effectiveness:

– Corporate – Each company operates within different bounds. These are determined by its size, its budget and its ability to make organizational change.

– Customers – Groups of consumers act in similar ways, leading to the need to segment them. Based on these segments, they make choices according to how they value the attributes of a product and the brand in return for the price paid for the product.

– Exogenous Factors – There are many factors outside of our immediate control that can impact the effectiveness of our marketing activities. These can include the weather, interest rates, government regulations and many others.


I. Marketing effectiveness and analytics research:

• Factors driving marketing effectiveness:

• Marketing Strategy – Improving marketing effectiveness can be achieved by employing a superior marketing strategy. By positioning the product or brand correctly, the product/brand will be more successful in the market than competitors’ products/brands.

• Marketing Creative – Even without a change in strategy, better creativity can improve results.

• Marketing Execution – By improving how marketers go to market, they can achieve significantly greater results without changing their strategy or their creative execution. At the marketing mix level, marketers can improve their execution by making small changes in any or all of the 4 Ps (Product, Price, Place and Promotion). In this way, without changing the strategic position or the creative execution, marketers can improve their effectiveness and deliver increased revenue.

• Marketing Infrastructure (also known as Marketing Management) – Improving the business of marketing can lead to significant gains for the company. Management of agencies, budgeting, motivation and coordination of marketing activities can lead to improved competitiveness and improved results.

• Exogenous Factors – Generally out of the control of marketers, external or exogenous factors also influence how marketers can improve their results.


Customer satisfaction measurement

“The customer you lose holds information you need to succeed.”

Frederick F. Reichheld

Measures of customer satisfaction

• Overall customer satisfaction with the organization and its products / services
• Rating in the industry on the basis of overall customer satisfaction
• Satisfaction with value for money
• Desire to recommend the product or service to others
• Loyalty in terms of repeat purchases

Means of measuring customer satisfaction

I. Customer feedback after delivery of product or service
II. Customer complaints and suggestions
III. Customer surveys

I. Customer feedback after delivery of product or service

This is one of the simplest, fastest and most effective methods of measuring customer satisfaction. The customers should be asked immediately to evaluate the product or service and to comment on areas of satisfaction and dissatisfaction.

II. Customer complaints and suggestions

The organization must have a formalized system for recording all customer complaints as well as the methods of their disposal. Customer complaints must be taken positively, as valuable inputs by the organization, and should immediately trigger improvement activities.

III. Customer surveys

Steps in conducting customer surveys: -

A. Identify your customers' requirements under various segments.

B. Determine your survey methodology

C. Develop survey / interview questions

D. Conduct survey / Interview your customers

A. Identify your customers' requirement areas.

It is extremely important to know the requirements of your customers before designing a questionnaire or survey. If we do not ask the right questions, the answers we get will be irrelevant, and it will be difficult to find out whether the customers are really satisfied on the issues that are important to them.

Ways to identify customer requirements

• Discuss the issue with a sample group of customers.

• Ask your existing customers: “If we have to develop a questionnaire to measure our customers’ satisfaction, what questions should we ask?”

• Brainstorm with employees from various functions within the organization. A cross section of ideas from various people will give a complete picture of the customers' requirements.

Product requirements

For identifying the customer requirements for a PRODUCT, the survey must cover the following areas:

• Performance

• Timeliness

• Reliability

• Durability

• Serviceability

• Aesthetics

Service requirements

For identifying the customer requirements for a SERVICE, the survey must cover the following areas:

• Security

• Reliability

• Accessibility

• Timeliness

• Responsiveness

• Empathy

• Assurance

B. Determine your survey methodology

This requires the organization to answer the following questions :

• How many customers to survey?

• Whom to survey?

• How to survey?

• When to survey?

• Who should conduct the survey?

How many customers to survey?

The basic rule behind sample selection is to choose a cross section of customers that represents your overall customer base. For example, if your customer database consists of large, medium & small organizations, your sample must represent them in the same proportions.

Other criteria for selection may include the percentage of frequent versus infrequent customers, industry sector & geographic area.
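
A minimal sketch of proportional allocation is given below. The segment sizes and the sample size are hypothetical, and the function name is a label chosen here. The figures divide evenly; with other numbers the rounded allocations may not add up exactly to the intended sample size and would need a small manual adjustment.

```python
def proportional_allocation(segment_sizes, sample_size):
    """Allocate the sample to segments in proportion to their share
    of the customer base (rounded to whole customers)."""
    total = sum(segment_sizes.values())
    return {
        segment: round(sample_size * size / total)
        for segment, size in segment_sizes.items()
    }


# Hypothetical customer base: number of customer organizations per segment.
customer_base = {"large": 50, "medium": 150, "small": 300}

allocation = proportional_allocation(customer_base, sample_size=50)
print(allocation)  # {'large': 5, 'medium': 15, 'small': 30}
```

The same allocation can be repeated for other criteria, such as frequent versus infrequent customers or geographic area.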

Whom to survey?

While conducting the survey, the organizations must include the following customers:

• Present customers• Potential customers• Past customers• Competitor’s customers

Whom to survey?

The customer sample must never be biased. Everyone wants to hear good things from customers and nobody wants to hear negative feedback. There is a natural tendency to include positive feedback and to exclude negative feedback, but a sample chosen this way will never reflect the true level of customer satisfaction. The organization must be willing to hear both the positive and the negative from its customers if it truly wants to improve customer satisfaction.

How to survey?

The following methods can be used for conducting the survey:
• Mail survey
• Telephonic surveys
• Face to face interviews
• Comment cards

The best method will depend on your situation, the number of customers in the sample group and what works best for your customers.

When to survey?

Survey at periodic intervals: Many organizations prefer to conduct the customer satisfaction measurement survey at a certain time of the year. This, however, has certain disadvantages. If the period of the survey is widely known, it can become a signal for enhanced service to customers during that period, and marketing personnel may distribute questionnaires selectively to customers at that time. Such conduct is open to all sorts of bias, and this practice should be discouraged and avoided.

When to survey?

Surveying continuously: More & more organizations are moving towards continuous measurement of customer satisfaction because of the turbulent & dynamic marketing environment. Continuous measurement recognizes the on-going importance of customer satisfaction and is not influenced by momentary events (good or bad). This method keeps the organization completely focused on customer satisfaction & does not allow it to be forgotten between survey waves.

When to survey?

Surveying after “moments of truth”: Moments of truth are any interactions with customers in which an organization's effectiveness is tested. For example:
• Getting a car loan from the bank
• Settlement of insurance claims
• Receiving money from the cash counter of a bank

Every moment of truth can be followed up with a satisfaction survey to determine how well the organization has performed in this important interaction.

Who should conduct the survey?

The survey can be undertaken by the organization itself or given to an outside agency. Getting the survey conducted by an outside professional agency has the following advantages:

Who should conduct the survey?

• They are more objective in formulating questions & analyzing responses.

• Customers are more open when providing information to third parties.

• Professional agencies have the expertise to ensure that the process is productive & effective.

C. Develop survey questions

The organization must develop a pre-determined set of questions which must take into account all the requirements of the customers.

Develop survey questions

The questionnaire must give an impression to the customers that you are thorough & organized when gathering customer satisfaction information. The presentation & packaging of the questionnaire should not be shoddy. A good appearance can suggest evidence of organization’s high commitment to customer satisfaction management process and vice versa.

Sample questionnaire - Airlines

(Each item is rated Excellent / Good / Average.)

A. At the airport
• Waiting time for getting the boarding pass
• Behavior of the front desk executive
• Ready availability of information
• Time taken in identification of luggage

B. In-flight service
• Cabin crew’s welcome at the time of boarding the flight
• Availability of reading material
• Quality and quantity of food & beverages
• Quality of service
• Space in the aircraft to keep your hand baggage
• Responsiveness for special service asked for
• Cleanliness in the toilets

C. In-flight experience
• Timeliness of the flight
• In-flight experience with regard to:
  – Noise level
  – Temperature
  – Ride and landing
  – Flight ambience
• Overall ratings
• Your suggestions for improvement
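As an illustration of how answers to one item on such a form might be summarized, the sketch below counts the share of each rating. The respondent data are made up, and the function name `rating_shares` is hypothetical; only the three rating labels come from the form above.

```python
from collections import Counter

# The three rating labels used on the sample questionnaire.
RATINGS = ("Excellent", "Good", "Average")

def rating_shares(ratings):
    """Return the percentage of responses falling in each rating category."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: 100.0 * counts[label] / total for label in RATINGS}

# Made-up answers from eight respondents to the item
# "Waiting time for getting the boarding pass" (illustration only).
waiting_time_answers = [
    "Good", "Excellent", "Average", "Good",
    "Good", "Excellent", "Average", "Good",
]

shares = rating_shares(waiting_time_answers)
for label in RATINGS:
    print(f"{label}: {shares[label]:.1f}%")
```

Percentages per category are used here, rather than an average score, because the three labels only indicate order and not a measured distance between ratings.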

Customer feedback

Sample survey / feedback forms for consumer durables, consumer non-durables and service industry are given in MS Excel file “Feedback forms” given along with this package.

Advantages of a good survey

A well designed and executed customer satisfaction survey can be a great asset to any organization due to the following reasons:

• It can pinpoint expenditure & resources which are being spent but do not help to satisfy the customers.

Advantages of a good survey

• It can identify opportunities for product & service innovation.
• It can ensure that the quality improvement efforts are correctly focused on issues that are most important to a customer.

Why customer surveys fail?

Unfortunately, a well designed & executed survey tends to be an exception rather than the rule. The challenge of conducting a customer survey is to minimize the total amount of error. This error comes from two different sources.

A. Sampling errors B. Measurement errors

Types of sampling errors

These errors deal with the manner in which people are selected in a survey. They are of the following types:

• Failing to use statistical sampling methods
• Incorrect selection of profile
• Incorrect selection of number of people
• Ignoring non-responses.

Types of measurement errors

These errors are related to the content of the survey and the way in which the results are used. These mistakes deal with:

• Drawing incorrect inferences from the responses
• Asking non-specific questions.
• Failing to ask all the questions.
• Using incorrect or incomplete data analysis methods.
• Error in feeding the results

1.b.Mystery shopping

Mystery shopping:

• Mystery Shopping is a highly valuable performance tool that provides a clear, accurate and unbiased account of the interaction between your employees and your customers.

• It is a performance evaluation process that allows the owners and managers of service organisations to really understand how their customers are treated in their shops, offices or practices, on the telephone, in writing or online. It identifies the 'gap' between their service beliefs and the reality of the customer experience.

Process in mystery shopping:

Mystery shopper:

A mystery shopper is one who is paid by the company to masquerade as a customer to discreetly measure the quality of services in its showrooms and front offices.

Mystery auditors often throw up startling facts and reveal huge room for improvement. Mystery shoppers identify soft skills and intuitiveness as the key problem areas among store staff in the country.

Typically, mystery auditors charge Rs. 1500-2000 for a small retail format store, with fees going up for bigger showrooms. A mystery shopper spends 30-45 minutes to review a small retail shop, while it could take 2-3 days to review a hotel.

1.c.Market and Sales Analysis

• Describe the goal of market analysis.
• Enumerate and classify the different dimensions of market analysis.
• Discuss the dimensions of market analysis and relate them to personal experiences and/or observations.
• Illustrate the value chain and experience curve.

1.c.Market and Sales Analysis Goal of Market Analysis

• To determine the attractiveness of a market and to understand its evolving opportunities and threats as they relate to the strengths and weaknesses of the firm.

1.c.Market and Sales Analysis Dimensions of Market Analysis

1. Market size (current and future)

2. Market growth rate

3. Market profitability

4. Industry cost structure

5. Distribution channel

6. Market trends

7. Key success factors



1.c.Market and Sales Analysis Market Size

The size of the market can be evaluated based on:

• Present sales
• Potential sales (if expanded)

Some information sources for determining market size:

• Government data
• Trade associations
• Financial data from major players
• Customer survey




1.c.Market and Sales Analysis Market Growth Rate

A simple means of forecasting the market growth rate is to extrapolate (infer or estimate) historical data into the future. While this method may provide a first-order estimate, it does not predict important turning points. A better method is to study growth drivers such as demographic information and sales growth in complementary products.
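The extrapolation approach can be sketched as follows. This is a minimal illustration, assuming that a constant compound growth rate is fitted to the first and last observations; the sales figures and the function names are hypothetical. As noted above, such a projection cannot anticipate turning points.

```python
def average_growth_rate(sales):
    """Compound growth rate per period implied by the first and last observations."""
    periods = len(sales) - 1
    return (sales[-1] / sales[0]) ** (1 / periods) - 1

def extrapolate(sales, periods_ahead):
    """Project future sales by applying the historical growth rate forward."""
    rate = average_growth_rate(sales)
    return [sales[-1] * (1 + rate) ** k for k in range(1, periods_ahead + 1)]

# Made-up market sales for four consecutive years (illustration only).
historical_sales = [100.0, 110.0, 121.0, 133.1]

rate = average_growth_rate(historical_sales)
forecast = extrapolate(historical_sales, 2)

print(f"Average growth rate: {rate:.1%}")
print("Forecast for the next two years:", [round(v, 2) for v in forecast])
```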




1.c.Market and Sales Analysis

Ultimately, the maturity and decline stages of the product life cycle will be reached. Some leading indicators of the decline phase include:

• Price pressure caused by competition• Decrease in brand loyalty• Emergence of substitute products• Market saturation• Lack of growth drivers




1.c.Market and Sales Analysis Market Profitability

While different firms in the market will have different levels of profitability, the average profit potential for a market can be used as a guideline for knowing how difficult it is to make money in the market.




1.c.Market and Sales Analysis Porter’s Five Competitive Forces

The five competitive forces, with the effect of the Internet on each:

• Rivalry among Competitors: Internet blurs differences among competitors
• Threat of Substitute Products: Internet creates new substitution threats
• Potential New Entrants: Internet reduces barriers to entry
• Bargaining Power of Buyers: Internet shifts greater power to end consumers
• Bargaining Power of Suppliers: Internet tends to increase bargaining power of suppliers




1.c.Market and Sales Analysis Industry Cost Structure

The cost structure is important for identifying key factors for success. To this end, Porter’s value chain model is useful for determining where value is added and for isolating the costs.

The cost structure also is helpful for formulating strategies to develop a competitive advantage. For example, in some environments the experience curve effect can be used to develop a cost advantage over competitors.
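The experience curve effect is commonly modelled as unit cost falling by a fixed percentage each time cumulative volume doubles. The sketch below assumes that standard form; the 80% rate, the starting cost and the function name `unit_cost` are illustrative assumptions, not figures from these notes.

```python
import math

def unit_cost(first_unit_cost, cumulative_units, learning_rate):
    """Unit cost after `cumulative_units` have been produced.

    `learning_rate` is the fraction of cost remaining after each doubling
    of cumulative volume (0.8 means cost falls to 80% per doubling).
    """
    exponent = math.log2(learning_rate)
    return first_unit_cost * cumulative_units ** exponent

# Hypothetical firm: first unit costs 100, with an assumed 80% experience curve.
for units in (1, 2, 4, 8):
    print(units, round(unit_cost(100.0, units, 0.8), 2))
```

A firm with larger cumulative volume than its competitors sits further down this curve, which is the source of the cost advantage mentioned above.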




1.c.Market and Sales Analysis Porter’s Generic Value Chain

Support Activities
• Infrastructure
• Human Resource Management
• Technology Development
• Procurement

Primary Activities
• Inbound Logistics
• Operations
• Outbound Logistics
• Marketing & Sales
• Service

Elapsed Time - Value added time cost




1.c.Market and Sales Analysis Primary Value Chain Activities:

• Inbound Logistics: the receiving and warehousing of raw materials, and their distribution to manufacturing as they are required.

• Operations: the processes of transforming inputs into finished products and services.

• Outbound Logistics: the warehousing and distribution of finished goods.

• Marketing and Sales: the identification of customer needs and the generation of sales.

• Service: the support of customers after the products and services are sold to them.




1.c.Market and Sales Analysis Supports of the Primary Activities:

• The infrastructure of the firm: organizational structure, control systems, company culture, etc.

• Human resource management: employee recruiting, hiring, training, development, and compensation.

• Technology development: technologies to support value-creating activities.

• Procurement: purchasing inputs such as materials, supplies, and equipment.




1.c.Market and Sales Analysis Distribution Channel

The following aspects of the distribution system are useful in a market analysis:

• Existing distribution channels
  – can be described by how direct they are to the customer.

• Trends and emerging channels
  – new channels can offer the opportunity to develop a competitive advantage.

• Channel power structure
  – for example, in the case of a product having little brand equity, retailers have negotiating power over manufacturers and can capture more margin.




1.c.Market and Sales Analysis Market Trends

Changes in the market are important because they often are the source of new opportunities and threats. The relevant trends are industry-dependent, but some examples include changes in price sensitivity, demand for variety, and level of emphasis on service and support. Regional trends also may be relevant.




1.c.Market and Sales Analysis Key Success Factors

– Elements that are necessary in order for the firm to achieve its marketing objectives.

A few examples are:
– Access to essential unique resources
– Ability to achieve economies of scale
– Access to distribution channels
– Technological progress

It is important to consider that key success factors may change over time, especially as the product progresses through its life cycle.




Syllabus

• Research Design
2. Exploratory Research Design
3. Descriptive Research Design
4. Longitudinal and cross-sectional analysis.

Research Design

• A master plan that specifies the methods and procedures for collecting and analyzing needed information.


Research Design

• Exploratory Design
  – Survey of Experts
  – Pilot Surveys
  – Secondary Data research
• Conclusive Design
  – Descriptive Research

Exploratory Research

• Usually conducted during the initial stage of the research process

• Purposes
  – To narrow the scope of the research topic, and
  – To transform ambiguous problems into well-defined ones

Research Design
• Exploratory
  – Secondary data Research
  – Pilot Survey
  – Survey of Experts
• Conclusive
  – Descriptive
    • Cross Sectional
      – Single Cross Sectional
      – Multiple Cross Sectional
    • Longitudinal
  – Causal

Exploratory Research Techniques

• Secondary Data Analysis
  – Secondary data are data previously collected & assembled for some project other than the one at hand

• Pilot Studies
  – A collective term for any small-scale exploratory research technique that uses sampling but does not apply rigorous standards
  – Includes
    • Focus Group Interviews
      – Unstructured, free-flowing interview with a small group of people
    • Projective Techniques
      – Indirect means of questioning that enables a respondent to project beliefs and feelings onto a third party or an inanimate object
      – Word association tests, sentence completion tests, role playing









Exploratory Research Techniques

• Case Studies
  – Intensively investigate one or a few situations similar to the problem situation

• Experience Surveys
  – Individuals who are knowledgeable about a particular research problem are questioned









Conclusive Research

• Provides specific information that aids the decision maker in evaluating alternative courses of action

• Sound statistical methods & formal research methodologies are used to increase the reliability of the information

• Data sought tends to be specific & decisive

• Also more structured & formal than exploratory data









Types of Conclusive Research

• Descriptive Research:
  – Describes attitudes, perceptions, characteristics, activities and situations.
  – Examines who, what, when, where, why, & how questions

• Causal Research:
  – Provides evidence that a cause-and-effect relationship exists or does not exist.
  – Premise is that something (an independent variable) directly influences the behavior of something else (the dependent variable).









Common Characteristics of Descriptive Studies

• Build on previous information
• Show relationships between variables
• Representative samples required
• Structured research plans
• Require substantial resources
• Conclusive findings









Major Types of Descriptive Studies

Descriptive Studies

• Consumer Perception and Behavior Studies
  – Image
  – Product Usage
  – Advertising
  – Pricing
• Market Characteristic Studies
  – Distribution
  – Competitive Analysis
• Sales Studies
  – Market Potential
  – Market Share
  – Sales Analysis

Cross-sectional Designs


• Involve the collection of information from any given sample of population elements only once.

• In single cross-sectional designs, there is only one sample of respondents and information is obtained from this sample only once.

• In multiple cross-sectional designs, there are two or more samples of respondents, and information from each sample is obtained only once. Often, information from different samples is obtained at different times.

• Cohort analysis consists of a series of surveys conducted at appropriate time intervals, where the cohort serves as the basic unit of analysis. A cohort is a group of respondents who experience the same event within the same time interval.









Longitudinal Designs

• A fixed sample (or samples) of population elements is measured repeatedly on the same variables

• A longitudinal design differs from a cross-sectional design in that the sample or samples remain the same over time









Cross Sectional vs. Longitudinal Designs

• Cross Sectional Design: Sample Surveyed at T1
• Longitudinal Design: Sample Surveyed at T1; Same Sample also Surveyed at T2

(Time axis: T1, T2)

Relative Advantages and Disadvantages of Longitudinal and Cross-Sectional Designs

Evaluation Criteria               Cross-Sectional Design   Longitudinal Design
Detecting change                  -                        +
Large amount of data collection   -                        +
Accuracy                          -                        +
Representative sampling           +                        -
Response bias                     +                        -

Note: A “+” indicates a relative advantage over the other design, whereas a “-” indicates a relative disadvantage.
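The advantage of a longitudinal design in detecting change can be illustrated with a small sketch. The panel data below are made up, and the function names are hypothetical. Two cross-sectional snapshots would show identical brand shares at T1 and T2, while following the same respondents reveals that some of them switched brands.

```python
from collections import Counter

# Made-up brand choices of the same five panel members at T1 and T2.
panel_t1 = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "A"}
panel_t2 = {"r1": "B", "r2": "A", "r3": "A", "r4": "B", "r5": "A"}

def brand_shares(choices):
    """Aggregate view: share of respondents choosing each brand."""
    counts = Counter(choices.values())
    total = len(choices)
    return {brand: counts[brand] / total for brand in sorted(counts)}

def switchers(before, after):
    """Individual view: respondents whose choice changed between waves."""
    return sorted(r for r in before if before[r] != after[r])

shares_t1 = brand_shares(panel_t1)
shares_t2 = brand_shares(panel_t2)
changed = switchers(panel_t1, panel_t2)

print("Shares at T1:", shares_t1)
print("Shares at T2:", shares_t2)
print("Respondents who switched:", changed)
```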

Common Characteristics of Causal Studies

• Logical Time Sequence
  – For causality to exist, the cause must either precede or occur simultaneously with the effect

• Concomitant Variation
  – Extent to which the cause and effect vary together as hypothesized

• Control for Other Possible Causal Factors









How Descriptive & Causal Designs Differ

• Relationship between the variables
  – Descriptive designs determine degree of association
  – Causal designs infer whether one or more variables influence another variable

• Degree of environmental control
  – Descriptive designs enjoy lesser degrees of control

• Order of the variables
  – In descriptive designs, variables are not logically ordered









Uses of Causal Research

• To understand which variables are the cause (independent variables) and which variables are the effect (dependent variables) of a phenomenon

• To determine the nature of the relationship between the causal variables and the effect to be predicted

• METHOD: Experiments









5. Qualitative research techniques –
1. Based on questioning: Focus groups, Depth interviews, Projective techniques.
2. Based on observations: ethnography, grounded theory,
3. Participant observation.


Qualitative Research

Qualitative research is a loosely defined term. It implies that the research findings are not determined by quantification or quantitative analysis.

Qualitative vs. Quantitative Research(1 of 2)

Comparison Dimension         Qualitative Research                         Quantitative Research
Types of questions           Probing                                      Limited probing
Sample size                  Small                                        Large
Information per respondent   Much                                         Varies
Administration               Requires interviewers with special skills    Fewer specialized skills required
Types of analysis            Subjective, interpretive                     Statistical, summarization

Qualitative vs. Quantitative Research(2 of 2)

Comparison Dimension                Qualitative Research                                           Quantitative Research
Tools                               Tape recorders, projection devices, video, pictures            Questionnaires, computers, printouts
Ability to replicate                Low                                                            High
Training needed by the researcher   Psychology, sociology, social psychology, consumer behavior   Statistics, decision models, DSS, computer programming, marketing
Type of research                    Exploratory                                                    Descriptive or causal

Qualitative Research Methods

Include
• Depth Interviews
• Projective Techniques
• Focus Groups
• Observation (Ethnography)

… and other methods

Depth Interview

Example: Wide Seats in an Airplane

I: “Why do you like wide seats in an airplane?”
R: “It makes me comfortable.”
I: “Why is it important to be comfortable?”
R: “I can accomplish more.”
I: “Why is it important that you can accomplish more?”
R: “I feel good about myself.”

Implication: Wide seats may relate to self-esteem!

Projective Techniques

Eliciting deep-seated feelings/opinions by enabling the respondents to project themselves into unstructured situations.

• Word Association
• Sentence Completion
• Role playing
• Story telling with pictures

… and several others

Popularity of Focus Group Research

• Most marketing research firms, advertising agencies, and consumer goods manufacturers use focus groups.

• Focus groups tend to be used more extensively by consumer goods companies than by industrial goods organizations.

Focus Group

A group of people who discuss a subject under the direction of a moderator. Focus groups are used to:

• Spot source of marketing problem
• Spark new product ideas
• Develop questionnaires for quantitative research
• Identify new advertising themes
• Diagnose competitors’ strengths and weaknesses

Focus Group Research - Overview

The goal of focus group research is to learn and understand what people have to say and why

The emphasis is on getting people to talk at length and in detail about the subject at hand

The intent is to find out how they feel about a product, concept, idea, or organization, how it fits into their lives, and their emotional involvement with it

Benefits of Focus Group Research

• Synergy - together, the group can provide more insights than insights obtained individually.

• Snowballing - chain reaction to comment by one individual.

• Stimulation - group interaction excites people.

• Spontaneity/serendipity - participants may get ideas on the spot and discuss them.

Focus Group Research - Steps

1. Define objectives of study

2. Develop questions for discussion - Moderator Guide

3. Recruit participants

4. Conduct Session with a moderator

5. Analyze and report results to decision makers

Results can be misleading if the focus group is not conducted properly.

Focus Group Issues (1 of 2)

• How many people in a focus group?

• What type of people should be recruited?

• Should participants be …

– Knowledgeable?

– Diverse?

– Representative?

Comparing Qualitative and Quantitative Methods

Before discussing the differences between qualitative and quantitative methodologies one must understand the foundational similarities.


Foundational Similarities

• All qualitative data can be measured and coded using quantitative methods.

• Quantitative research can be generated from qualitative inquiries.

• Example: One can code an open-ended interview with numbers that refer to data specific references, or such references could become the origin of a randomized experiment.
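Coding open-ended answers with numbers can be sketched as below. This is only an illustration under stated assumptions: the codebook, the keyword matching rule and the responses are all hypothetical, and real coding is usually done by trained coders working from a fuller codebook rather than by simple keyword search.

```python
from collections import Counter

# Hypothetical codebook mapping a keyword to a numeric code.
CODEBOOK = {"comfort": 1, "price": 2, "service": 3}
OTHER = 9  # assumed catch-all code for answers matching no keyword

def code_response(text):
    """Return the list of numeric codes whose keywords appear in the answer."""
    lowered = text.lower()
    codes = [code for keyword, code in CODEBOOK.items() if keyword in lowered]
    return codes or [OTHER]

# Made-up open-ended answers (illustration only).
responses = [
    "Wide seats give me more comfort.",
    "The price was too high for the service I got.",
    "Nothing in particular.",
]

coded = [code_response(r) for r in responses]
frequencies = Counter(code for codes in coded for code in codes)

print(coded)
print(dict(frequencies))
```

Once answers are coded this way, the frequencies can be analyzed with the same quantitative tools used for closed-ended questions.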

Foundational Differences

• The major difference between qualitative and quantitative research stems from the researcher’s underlying strategies.

• Quantitative research is viewed as confirmatory and deductive in nature.

• Qualitative research is considered to be exploratory and inductive.


• Terminology
• Methods
• Strengths and weaknesses

Terminology

• Grounded theory
• Ethnography
• Phenomenology
• Field research

Grounded Theory

• Grounded theory refers to an inductive process of generating theory from data.

• This is considered ground-up or bottom-up processing.

• Grounded theorists argue that theory generated from observations of the empirical world may be more valid and useful than theories generated from deductive inquiries.

Grounded Theory (con’t)

• Grounded theorists criticize deductive reasoning since it relies upon a priori assumptions about the world.

• However, grounded theory incorporates deductive reasoning when using constant comparisons.

• In doing this, researchers detect patterns in their observations and then create working hypotheses that direct the progression of the inquiry.

Ethnography

• Ethnography emphasizes the observation of details of everyday life as they naturally unfold in the real world. This is sometimes called naturalistic research.

• Ethnography is a method of describing a culture or society. This is primarily used in anthropological research.

Phenomenology

• Phenomenology is a school of thought that emphasizes a focus on people’s subjective experiences and interpretations of the world.

• Phenomenological theorists argue that objectivity is virtually impossible to ascertain, so to compensate, one must view all research from the perspective of the researcher.

Phenomenology (con’t)

• Phenomenologists attempt to understand those whom they observe from the subjects’ perspective.

• This outlook is especially pertinent in social work and research where empathy and perspective become the keys to success.

Field Research

• Field research is a general term that refers to a group of methodologies used by researchers in making qualitative inquiries.

• The field researcher goes directly to the social phenomenon under study and observes it as completely as possible.

Field Research (con’t)

• The natural environment is the priority of the field researcher. There are no implemented controls or experimental conditions to speak of.

• Such methodologies are especially useful in observing social phenomena over time.

Methods

• Participant observation
• Direct observation
• Unstructured or intensive interviewing
• Case studies

Participant Observation

• The researcher literally becomes part of the observation.

• Example: One studying the homeless may decide to walk the streets of a given area in an attempt to gain perspective and possibly subjects for future study.

Direct Observation

• Direct observation is where the researcher observes the actual behaviors of the subjects, instead of relying on what the subjects say about themselves or others say about them.

• Example: The observation booth at the CECP in Martha Van may be used for direct observation of behavior where survey or other empirical methodologies may seem inappropriate.

Unstructured or Intensive Interviewing

• This method allows the researcher to ask open-ended questions during an interview.

• Details are more important here than a specific interview procedure.

• Here lies the inductive framework through which theory can be generated.

Case Studies

• A particular case study may be the focus of any of the previously mentioned field strategies.

• The case study is important in qualitative research, especially in areas where exceptions are being studied.

• Example: A patient may have a rare form of cancer that has a set of symptoms and potential treatments that have never before been researched.

Strengths and Weaknesses

• Objectivity
• Reliability
• Validity
• Generalizability

Objectivity

• It is given that objectivity is impossible in qualitative inquiry. Instead the researcher locates himself or herself in the research.

• Objectivity is replaced by subjective interpretation and mass detail for later analysis.

Reliability

• Since procedure is de-emphasized in qualitative research, replication and other tests of reliability become more difficult.

• However, measures may be taken to make research more reliable within the particular study (such as observer training, or more objective checklists, and so on).

Validity

• Qualitative researchers use greater detail to argue for the presence of construct validity.

• Weak on external validity.

• Content validity can be retained if the researcher implements some sort of criterion settings.

• Having a focused criterion adds to the study’s validity.

Generalizability

• Results for the most part, do not extend much further than the original subject pool.

• Sampling methods determine the extent of the study’s generalizability.

• Quota and Purposive sampling strategies are used to broaden the generalizability.

Summing Up

• Remember that there are always trade-offs in research.

• Are you willing to trade detail for generalizability?

• Will exploratory research enable you to generate new theories?

• Can you ask such sensitive questions on a questionnaire?

Summing Up (con’t)

• Will the results add any evidence toward any pre-existing theory or hypothesis?

• Is FUNDING available for this research?

• Do you really need to see numbers to support your theories or hypotheses?

• Are there any ethical problems that could be minimized by choosing a particular strategy?

Unit - IV (6 sessions)

• Primary data
  – Questionnaire design
  – Administration and analysis considerations in design
  – Attitude measurement
  – Scaling techniques
  – Observation method of primary data collection
  – Web based primary data collection
  – Issues of reach, analysis, accuracy, time and efficiency

• Sampling
  – Sampling methods
  – Sampling and non sampling errors
  – Sample size calculation
  – Population and sample size - large and small samples
  – Practical considerations in determining sample size


• Compilation and interpretation of primary and secondary sources of information.

• The integration of different sources will consolidate the write up of the report.

DATA COLLECTION

SOURCES OF INFORMATION

Primary Source
• Data is collected by researcher himself
• Data is gathered through questionnaire, interviews, observations etc.

Secondary Source
• Data collected, compiled or written by other researchers, e.g. books, journals, newspapers
• Any reference must be acknowledged

STEPS TO COLLECT DATA

1. REVIEW & COMPILE SECONDARY SOURCE INFORMATION (Referred to in the BACKGROUND/ INTRODUCTION section of report)

2. PLAN & DESIGN DATA COLLECTION INSTRUMENTS TO GATHER PRIMARY INFORMATION (Referred to in the FINDINGS, CONCLUSIONS & RECOMMENDATIONS sections of report)

3. DATA COLLECTION

4. DATA ANALYSIS AND INTERPRETATION

METHODS USED TO COLLECT PRIMARY SOURCE DATA

1. Interviews
2. Questionnaires
3. Survey
4. Experimentation
5. Case Study
6. Observation

However, for a small-scale study, the most commonly used methods are interviews, survey questionnaires and observations.

INTERVIEW

• Effective way of gathering information
• Involves verbal and non-verbal communications
• Can be conducted face to face, by telephone, online or through mail

Steps To An Effective Interview

Prepare your interview schedule

Select your subjects/ key informants

Conduct the interview

Analyze and interpret data collected from the interview

Survey Questionnaire

• The most common data collection instrument
• Useful to collect quantitative and qualitative information
• Should contain 3 elements:
  1. Introduction – to explain the objectives
  2. Instructions – must be clear, simple language & short
  3. User-friendly – avoid difficult or ambiguous questions

2 Basic Types of survey questions:

1. Open-ended Questions
– Free-response (Text Open End)
– Fill-in relevant information

2. Close-ended Questions
– Dichotomous question
– Multiple-choice
– Rank
– Scale
– Categorical
– Numerical

Note: For specific examples and students’ activities on each question style, please refer to the notes on Data Collection in the e-learning.

Steps To An Effective Survey Questionnaire

Prepare your survey questions (Formulate & choose types of questions, order them, write instructions, make copies)

Select your respondents/sampling (Random/Selected)

Administer the survey questionnaire (date, venue, time)

Tabulate data collected (Statistical analysis - frequency/mean/correlation/%)

Analyze and interpret data collected
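The tabulation step (frequency, percentage, mean and correlation) can be sketched as follows. The two variables and their values are made up for illustration, and `pearson` is a hypothetical helper written out by hand so the calculation is visible.

```python
from collections import Counter
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up answers from five respondents (illustration only).
satisfaction = [4, 5, 3, 4, 2]        # rating on a 1 to 5 scale
visits_per_month = [3, 5, 2, 4, 1]    # self-reported store visits

frequency = Counter(satisfaction)
percentages = {score: 100 * n / len(satisfaction) for score, n in frequency.items()}
average = mean(satisfaction)
r = pearson(satisfaction, visits_per_month)

print("Frequency:", dict(frequency))
print("Percentages:", percentages)
print("Mean satisfaction:", average)
print("Correlation with visits:", round(r, 3))
```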

A sample of complete survey questionnaire: http://www.custominsight.com/demo/form_widgets.rtf

Observations

• Observe verbal & non-verbal communication, surrounding atmosphere, culture & situation
• Need to keep meticulous records of the observations
• Can be done through discussions, observations of habits, rituals, review of documentation, experiments

Steps To An Effective Observation

Determine what needs to be observed (Plan, prepare checklist, how to record data)

Select your participants (Random/Selected)

Conduct the observation (venue, duration, recording materials, take photographs)

Compile data collected

Analyze and interpret data collected

3.Scaling Techniques

In business research, measurement of variables is an indispensable requirement

Problem – Defining what is to be measured, and how it is to be accurately and reliably measured

Some things (or concepts) which are inherently abstract in their nature (e.g. job satisfaction, employee morale, brand loyalty of consumers) are more difficult to measure than concepts which can be assigned numerical values (e.g. sales volume for employees X, Y and Z)


A scale is basically a continuous spectrum or series of categories and has been defined as any series of items that are arranged progressively according to value or magnitude, into which an item can be placed according to its quantification

Four popular scales in business research are:

– Nominal scales– Ordinal scales– Interval scales– Ratio scales

3.Scaling Techniques

A nominal scale is the simplest of the four scale types; the numbers or letters assigned to objects serve as labels for identification or classification

Example:

• Males = 1, Females = 2
• Sales Zone A = Islamabad, Sales Zone B = Rawalpindi
• Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Mirinda


An ordinal scale is one that arranges objects or alternatives according to their magnitude

Examples:

• Career Opportunities = Moderate, Good, Excellent
• Investment Climate = Bad, inadequate, fair, good, very good
• Merit = A grade, B grade, C grade, D grade

A problem with ordinal scales is that the difference between categories on the scale is hard to quantify, i.e., excellent is better than good, but how much better?


An interval scale is a scale that not only arranges objects or alternatives according to their respective magnitudes, but also distinguishes this ordered arrangement in units of equal intervals (i.e. interval scales indicate order (as in ordinal scales) and also the distance in the order)

Examples:
• Consumer Price Index
• Temperature Scale in Fahrenheit

Interval scales allow comparisons of the differences of magnitude (e.g. of attitudes) but do not allow determinations of the actual strength of the magnitude.


A ratio scale is a scale that possesses absolute rather than relative qualities and has an absolute zero.

Examples:
• Money
• Weight
• Distance
• Temperature on the Kelvin Scale

Ratio scales allow comparisons of the differences of magnitude (e.g. of attitudes) as well as determinations of the actual strength of the magnitude

Measurement, Scaling, Questionnaire & Form Design

Primary Scales of Measurement

Scale      What is recorded                        Values shown
Nominal    Numbers assigned                        7 38
Ordinal    Rank order of winners                   Third place, second place, first place
Interval   Performance rating on a 0 to 10 scale   8.2, 9.1, 9.6
Ratio      Time to finish in seconds               15.2, 14.1, 13.4


Type of Scale   Numerical Operation                                  Descriptive Statistics
Nominal         Counting                                             Frequency in each category, percentage in each category, mode
Ordinal         Rank ordering                                        Median, range, percentile ranking
Interval        Arithmetic operations on intervals between numbers   Mean, standard deviation, variance
Ratio           Arithmetic operations on actual quantities           Geometric mean, coefficient of variation
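As a rough illustration of which statistic suits which scale, the following Python sketch computes one typical statistic per scale type using only the standard `statistics` module. The drink and investment-climate answers are made up for illustration; the ratings and finishing times are the ones from the primary-scales example.

```python
from statistics import geometric_mean, mean, median, mode

# Nominal: labels only, so counting and the mode are meaningful.
drinks = ["Pepsi Cola", "7-Up", "Pepsi Cola"]          # hypothetical answers
most_common_drink = mode(drinks)

# Ordinal: ordered categories, coded 1..5 so a median can be taken.
climate_codes = {"bad": 1, "inadequate": 2, "fair": 3, "good": 4, "very good": 5}
answers = ["fair", "good", "bad", "good", "very good"]  # hypothetical answers
median_code = median(climate_codes[a] for a in answers)

# Interval: equal distances between scale points, so a mean is meaningful.
ratings = [8.2, 9.1, 9.6]
mean_rating = mean(ratings)

# Ratio: a true zero exists, so ratios and the geometric mean are meaningful.
times = [15.2, 14.1, 13.4]
gm_time = geometric_mean(times)
```

Statistics listed for a weaker scale can also be used on a stronger one (a mode can be taken of ratio data), but not the other way round.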

3. Scaling Techniques

Criteria for Good Measurement:

Reliability – Reliability is the degree to which measurements are free of error and therefore yield consistent results, including over repeated attempts over time (ordinal measures always yield the same order; interval measurements always yield the same order and the same distance between the measured items)

Validity – Validity is the ability of a scale or measuring instrument to measure what it is intended to measure (e.g. is absenteeism from work a valid measure of job satisfaction or are there other influences like a flu epidemic which is keeping employees from work)


Sensitivity – Sensitivity is the ability of a measurement instrument to accurately measure variability in stimuli or responses (e.g. on a scale, the choices very strongly agree, strongly agree, agree, don’t agree offer more choices than a scale with just two choices - agree and don’t agree – and is thus more sensitive)

3. Scaling Techniques

Attitude

Measuring Attitude is a frequent undertaking in business research

Attitude may be defined as an enduring disposition to consistently respond in a given manner to various aspects

Attitude has three dimensions:

Affective Component
Cognitive Component
Behavioural Component


Components of Attitude

Affective Component – Reflective of a person’s general feelings or emotions towards an object or subject (like, dislike, love, hate)

Cognitive Component – Reflective of a person’s awareness of and knowledge about an object or subject (know, believe)

Behavioral Component – Reflective of a person’s intentions and behavioral expectations, and predisposition to action


• Attitude can be difficult to measure directly; therefore, indicators such as verbal expression, physiological measurement techniques and overt behavior are used for this purpose. The three different components of attitude may require different measuring techniques

• Common techniques used in business research to determine attitude include rating, ranking, sorting and the choice technique


Rating Scales are frequently employed in business research for measuring attitude, and many scales have been developed for this purpose, including:

– Simple Attitude Scales
– Category Scales
– Likert Scale
– Semantic Differential
– Numerical Scales
– Constant-Sum Scale
– Stapel Scale
– Graphic Scales

3. Scaling Techniques

Simple Attitude Scales

In attitude scaling, individuals are typically asked whether they agree or disagree with a question (or questions) put to them, or they are asked to respond to a question or questions

Simple attitude scales have the properties of a nominal scale and the disadvantages that go with it. They do not permit fine distinctions in the respondents’ answers because the choice of answers is limited, but they can be useful where the respondents’ education level is low and questionnaires are lengthy


Category Scale:

A category scale consists of several response categories to provide the respondent with alternative ratings

Category scales are more sensitive than rating scales which allow only two answer categories (because of the larger number of choices), and thus provide more data and information


The Likert Scale:

A Likert scale is a measure of attitudes designed to allow respondents to indicate how strongly they agree or disagree with carefully constructed statements that range from very positive to very negative towards an object or subject

The number of alternatives on the Likert scale can vary; often five alternatives are provided

A Likert Scale may include a number of question items, each covering some aspect of the respondent’s attitude, and these items collectively form an index
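A minimal Python sketch of how several Likert items might be combined into one index. The numeric codes 1 to 5, the item names and the reverse-scoring of negatively worded statements are common conventions assumed here for illustration; they are not specified in the slides.

```python
# Five-point agreement scale; the numeric codes are an assumed convention.
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def likert_index(responses, negative_items=()):
    """Sum item scores into one index; reverse-score negatively worded items."""
    total = 0
    for item, answer in responses.items():
        score = SCORES[answer]
        if item in negative_items:
            score = 6 - score  # 5 becomes 1, 4 becomes 2, and so on
        total += score
    return total

# Hypothetical respondent answering three statements about a brand.
respondent = {
    "q1_good_value": "agree",
    "q2_would_recommend": "strongly agree",
    "q3_poor_quality": "disagree",  # negatively worded statement
}
index = likert_index(respondent, negative_items={"q3_poor_quality"})
```

A higher index then indicates a more favourable overall attitude across the items.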

3. Scaling Techniques

The Semantic Differential

The semantic differential is an attitude measuring technique that consists of a series of seven-point bipolar rating scales which allow response to a concept (e.g. organization, product, service, job)

An advantage of the semantic differential is its versatility; on the other hand, it uses extremes which may influence respondents’ answers

Strong ____:____:____:____:____:____:____ Weak

Decisive ____:____:____:____:____:____:____ Indecisive

Good ____:____:____:____:____:____:____ Bad

Cheap ____:____:____:____:____:____:____ Expensive

Active ____:____:____:____:____:____:____ Passive

Lazy ____:____:____:____:____:____:____ Industrious

SAMPLING METHODS

SAMPLING

• A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005)

• Why sample?
– Resources (time, money) and workload
– Gives results with known accuracy that can be calculated mathematically
• The sampling frame is the list from which the potential respondents are drawn
– Registrar’s office
– Class rosters
– Must assess sampling frame errors


• What is your population of interest?
• To whom do you want to generalize your results?
– All doctors
– School children
– Indians
– Women aged 15-45 years
– Other

• Can you sample the entire population?


SAMPLING BREAKDOWN


TARGET POPULATION

STUDY POPULATION

SAMPLE

Types of Samples

• Probability (Random) Samples
– Simple random sample
– Systematic random sample
– Stratified random sample
– Multistage sample
– Multiphase sample
– Cluster sample

• Non-Probability Samples
– Convenience sample
– Purposive sample
– Quota

Process

• The sampling process comprises several stages:
– Defining the population of concern
– Specifying a sampling frame, a set of items or events possible to measure
– Specifying a sampling method for selecting items or events from the frame
– Determining the sample size
– Implementing the sampling plan
– Sampling and data collecting
– Reviewing the sampling process


Population definition

• A population can be defined as including all people or items with the characteristic one wishes to understand.

• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.


SAMPLING FRAME

• In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling by lots), it is possible to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming election (in advance of the election)

• As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any in our sample.

• The sampling frame must be representative of the population


PROBABILITY SAMPLING

• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.

• When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.


1. Probability sampling includes:

I. Simple Random Sampling,

II. Systematic Sampling,

III. Stratified Random Sampling,

IV. Cluster Sampling

V. Multistage Sampling.

VI. Multiphase sampling

2. Non probability Sampling includes:

I. Accidental Sampling,

II. Quota Sampling and

III. Purposive Sampling.

NON PROBABILITY SAMPLING

• Any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or where the probability of selection can't be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors.

• Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a non probability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it's not practical to calculate these probabilities.

SIMPLE RANDOM SAMPLING

• Applicable when population is small, homogeneous & readily available

• All subsets of the frame of a given size are given an equal probability. Each element of the frame thus has an equal probability of selection.

• It provides for greatest number of possible samples. This is done by assigning a number to each unit in the sampling frame.

• A table of random numbers or a lottery system is used to determine which units are to be selected.
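The lottery method can be sketched in Python as follows. This is a minimal illustration using the standard `random` module in place of a random-number table; the frame of 500 numbered units and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1..500, as in the lottery method.
frame = list(range(1, 501))
sample = simple_random_sample(frame, n=20, seed=42)
```

Fixing the seed only makes the draw repeatable for checking; in a real survey the draw would normally be left unseeded or the seed documented.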


• Estimates are easy to calculate.
• Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
• Disadvantages
– If the sampling frame is large, this method is impracticable.
– Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.

REPLACEMENT OF SELECTED UNITS

• Sampling schemes may be without replacement ('WOR' - no element can be selected more than once in the same sample) or with replacement ('WR' - an element may appear multiple times in the one sample).

• For example, if we catch fish, measure them, and immediately return them to the water before continuing with the sample, this is a WR design, because we might end up catching and measuring the same fish more than once. However, if we do not return the fish to the water (e.g. if we eat the fish), this becomes a WOR design.
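The difference between the two schemes can be seen in a short Python sketch. The pond of ten labelled fish and the seed are hypothetical; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(7)                       # fixed seed so the run is repeatable
pond = [f"fish_{i}" for i in range(1, 11)]   # hypothetical population of 10 fish

# WOR: each fish can be caught at most once.
wor_sample = rng.sample(pond, k=5)

# WR: each fish is returned to the water, so it may be caught again.
wr_sample = rng.choices(pond, k=5)
```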

SYSTEMATIC SAMPLING

• Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.

• Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k=(population size/sample size).

• It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list.

• A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').


As described above, systematic sampling is an EPS method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection.
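A minimal Python sketch of systematic sampling with a random start. The frame of 1000 entries and the seed are assumptions for illustration; k is computed as population size divided by sample size, as stated above.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th element after a random start within the first k."""
    k = len(frame) // n            # sampling interval (the "skip")
    rng = random.Random(seed)
    start = rng.randrange(k)       # 0-based position among the first k elements
    return frame[start::k][:n]

frame = list(range(1, 1001))       # hypothetical ordered list of 1000 names
sample = systematic_sample(frame, n=100, seed=1)
```

Only k different samples are possible here (one per starting point), which is why the design is EPS but not simple random sampling.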


• ADVANTAGES:
– Sample easy to select
– Suitable sampling frame can be identified easily
– Sample evenly spread over entire reference population
• DISADVANTAGES:
– Sample may be biased if hidden periodicity in population coincides with that of selection.
– Difficult to assess precision of estimate from one survey.












STRATIFIED SAMPLING

Where population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.

• Every unit in a stratum has same chance of being selected.

• Using same sampling fraction for all strata ensures proportionate representation in the sample.

• Adequate representation of minority subgroups of interest can be ensured by stratification & varying sampling fraction between strata as required.
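Proportionate stratified sampling can be sketched in Python as below. The customer frame, the urban/rural strata and the 10% sampling fraction are hypothetical; the rule of taking at least one unit per stratum is an added assumption, not something stated in the slides.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample the same fraction independently within every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))  # assumed: at least one per stratum
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical frame: 800 urban and 200 rural customers.
customers = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]
sample = stratified_sample(customers, stratum_of=lambda c: c[0], fraction=0.1, seed=3)
```

To over-represent a minority subgroup, a different fraction could be passed per stratum instead of a single value.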


• Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata.

• Drawbacks to using stratified sampling.

• First, sampling frame of entire population has to be prepared separately for each stratum

• Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.

• Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods


Draw a sample from each stratum












POSTSTRATIFICATION

• Stratification is sometimes introduced after the sampling phase in a process called "poststratification".

• This approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.
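One common way to implement poststratification weighting is to give each stratum a weight equal to its share of the population divided by its share of the sample. The Python sketch below assumes the population shares are known from an outside source such as a census; the figures are hypothetical.

```python
def poststratification_weights(population_share, sample_counts):
    """Weight = population share of a stratum / its share of the sample."""
    n = sum(sample_counts.values())
    return {
        stratum: population_share[stratum] / (sample_counts[stratum] / n)
        for stratum in sample_counts
    }

# Hypothetical figures: women are 50% of the population but 40% of the sample.
weights = poststratification_weights(
    population_share={"women": 0.5, "men": 0.5},
    sample_counts={"women": 40, "men": 60},
)
```

Under-represented groups receive weights above 1 and over-represented groups weights below 1, so the weighted sample matches the population split.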

OVERSAMPLING

• Choice-based sampling is one of the stratified sampling strategies. In this, data are stratified on the target and a sample is taken from each stratum so that the rare target class will be more represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample even when a smaller overall sample size is taken, compared to a random sample. The results usually must be adjusted to correct for the oversampling.

CLUSTER SAMPLING

• Cluster sampling is an example of 'two-stage sampling'.
• First stage: a sample of areas is chosen;
• Second stage: a sample of respondents within those areas is selected.
• Population divided into clusters of homogeneous units, usually based on geographical contiguity.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• All units from the selected clusters are studied.













• Advantages:
– Cuts down on the cost of preparing a sampling frame.
– This can reduce travel and other administrative costs.
• Disadvantages: sampling error is higher than for a simple random sample of the same size.
• Often used to evaluate vaccination coverage in EPI

• Identification of clusters
– List all cities, towns, villages & wards of cities falling in the target area under study, with their populations.
– Calculate the cumulative population & divide by 30; this gives the sampling interval.
– Select a random no. less than or equal to the sampling interval, having the same no. of digits. This forms the 1st cluster.
– Random no. + sampling interval = population of 2nd cluster.
– Second cluster + sampling interval = 3rd cluster.
– Last or 30th cluster = 29th cluster + sampling interval
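The cluster identification steps can be sketched in Python as follows. This is an illustrative reading of the procedure, assuming each selection number is located in the community whose cumulative population range contains it; the three communities and the use of 5 clusters instead of 30 are hypothetical and chosen to keep the example small.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_clusters(populations, n_clusters=30, seed=None):
    """Pick communities with probability proportional to population size."""
    names = list(populations)
    cumulative = list(accumulate(populations[name] for name in names))
    interval = cumulative[-1] // n_clusters        # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, interval)               # random no. <= sampling interval
    points = [start + i * interval for i in range(n_clusters)]
    # Each point falls in the first community whose cumulative total reaches it.
    return [names[bisect_left(cumulative, p)] for p in points]

# Hypothetical target area with three communities.
communities = {"A": 100, "B": 300, "C": 600}
chosen = select_clusters(communities, n_clusters=5, seed=1)
```

A large community can be chosen more than once, which is how its larger population translates into a higher chance of selection.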

Two types of cluster sampling methods.

One-stage sampling. All of the elements within selected clusters are included in the sample.

Two-stage sampling. A subset of elements within selected clusters are randomly selected for inclusion in the sample.

Difference Between Strata and Clusters


• Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways.

• All strata are represented in the sample; but only a subset of clusters are in the sample.

• With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous













MULTISTAGE SAMPLING

• Complex form of cluster sampling in which two or more levels of units are embedded one in the other.
• First stage: a random number of districts is chosen in all states.
• Followed by a random number of talukas and villages.
• Then the third-stage units will be houses.
• All ultimate units (houses, for instance) selected at the last step are surveyed.


• This technique is essentially the process of taking random samples of preceding random samples.

• Not as effective as true random sampling, but probably solves more of the problems inherent to random sampling.

• An effective strategy because it banks on multiple randomizations. As such, extremely useful.

• Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate.

• Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.

MULTI PHASE SAMPLING

• Part of the information collected from whole sample & part from subsample.
• In Tb survey MT in all cases – Phase I
• X-Ray chest in MT +ve cases – Phase II
• Sputum examination in X-Ray +ve cases – Phase III
• Survey by such procedure is less costly, less laborious & more purposeful

MATCHED RANDOM SAMPLING

• A method of assigning participants to groups in which pairs of participants are first matched on some characteristic and then individually assigned randomly to groups.

• Matched random sampling arises in the following contexts:

• Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.

• Those samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures.

• Examples include the times of a group of athletes for 1500m before and after a week of special training; the milk yields of cows before and after being fed a particular diet.

QUOTA SAMPLING

• The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.

• Then judgment is used to select subjects or units from each segment based on a specified proportion.

• For example, an interviewer may be told to sample 200 females and 300 males between the age of 45 and 60.

• It is this second step which makes the technique one of non-probability sampling.

• In quota sampling the selection of the sample is non-random.

• For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years


CONVENIENCE SAMPLING

• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.

• A type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient.

• The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.

• For example, if the interviewer were to conduct a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that time. Their views would not represent those of other members of society in the area, as they might if the survey were conducted at different times of day and several times per week.

• This type of sampling is most useful for pilot testing.
• In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit more subjects into the sample.


– Use results that are easy to get

Judgmental sampling or Purposive sampling

• The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people that have expertise in the area being researched

PANEL SAMPLING

• Method of first selecting a group of participants through a random sampling method and then asking that group for the same information again several times over a period of time.

• Therefore, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave".

• This sampling methodology is often chosen for large-scale or nation-wide studies in order to gauge changes in the population with regard to any number of variables from chronic illness to job stress to weekly food expenditures.

• Panel sampling can also be used to inform researchers about within-person health changes due to age or help explain changes in continuous dependent variables such as spousal interaction.

• There have been several proposed methods of analyzing panel sample data, including growth curves.

Result from survey is never exactly the same as the actual value in the population. WHY?

Components of total error

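[Figure: a prevalence scale from 0% to 100% marking the true population value (50%) and the point estimate from the survey (40%). The gap between them is the total error, which is made up of nonsampling bias, sampling bias and sampling error.]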

Nonsampling bias

• Is present even if sampling and analysis done correctly

• Would still be present if survey measured outcome in ENTIRE sampling frame

In sum, you have either sampled the wrong people or screwed up your measurements!

Nonsampling bias

• Types:
– Sampling frame is not equal to population to which you want to generalize (sampling universe)
  • Sampling frame out of date
  • Non-response among sampling units in sampling frame
– Measurement error
  • Tape incorrectly fixed to height board
  • Scale consistently reads low by 0.5 kg
  • Failure to remove heavy clothing before weighing
  • Misleading questions
  • Recall bias

Nonsampling bias

Source of bias               Prevention or cure
Sampling frame out of date   Use current sampling frame; limit generalizations
Non-response                 Minimize non-response; use various statistical methods to weight data
Measurement error            Standardize instruments; write clear & simple questions; train survey workers; supervise survey workers

Sampling bias

• Selection of nonrepresentative sample, i.e., the likelihood of selection not equal for each sampling unit
• Failure to weight analysis of unequal probability sample

In sum, you have not sampled people with equal probability and you have not accounted for this in your analysis!

Sampling bias

• Examples
– Nonrepresentative sample
  • Selecting youngest child in household
  • Choosing households close to the road
  • Using a different sampling fraction in different provinces
– Failure to do statistical weighting

Sampling bias

Source of bias               Prevention or cure
Nonrepresentative sampling   ALWAYS ask yourself "Will this choice enhance representativeness or reduce it?"
Failure to do weighting      Calculate the probabilities of selection; apply appropriate statistical weights if selection probabilities unequal
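Applying statistical weights for unequal selection probabilities can be sketched as below. The sketch assumes the usual inverse-probability weighting (each unit weighted by 1 divided by its probability of selection); the provinces, sampling fractions and outcomes are hypothetical.

```python
def weighted_proportion(outcomes, selection_probs):
    """Estimate a proportion, weighting each unit by 1 / its selection probability."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Hypothetical survey: four people from province A (sampled at 10%)
# and two people from province B (sampled at 1%).
outcomes = [1, 1, 1, 0, 0, 0]                      # 1 = has the characteristic
selection_probs = [0.1, 0.1, 0.1, 0.1, 0.01, 0.01]

unweighted = sum(outcomes) / len(outcomes)
weighted = weighted_proportion(outcomes, selection_probs)
```

Here the unweighted figure is 50%, while the weighted figure is much lower, because each respondent from the lightly sampled province stands for ten times as many people.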

Sampling error

• Difference between survey result and population value due to random selection of sample
• Influenced by:
– Sample size
– Sampling scheme

Unlike nonsampling bias and sampling bias, it can be predicted, calculated, and accounted for.

Sampling error

• Measures of sampling error:
– Confidence limits
– Standard error
– Coefficient of variation
– P values
– Others
• Use these measures to:
– Calculate sample size prior to sampling
– Determine how sure we are of result after analysis
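Both uses can be sketched in Python for the simple case of a proportion estimated from a simple random sample. The formulas are the standard normal-approximation ones, assumed here since the slides do not give them; z = 1.96 corresponds to roughly 95% confidence, and the sample size of 400 and the 5-point margin are hypothetical.

```python
import math

def proportion_confidence_limits(p, n, z=1.96):
    """Standard error and approximate confidence limits for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se

def required_sample_size(p, margin, z=1.96):
    """Smallest n for which z * standard error does not exceed the margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# After analysis: a 40% point estimate from a hypothetical sample of 400.
se, lower, upper = proportion_confidence_limits(p=0.40, n=400)

# Before sampling: n needed for a margin of 5 percentage points,
# using p = 0.5 as the most cautious guess.
n_needed = required_sample_size(p=0.5, margin=0.05)
```

Designs other than simple random sampling (cluster sampling in particular) generally need larger samples for the same precision, so these figures are a lower bound in such cases.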

Bias and sampling error

[Figure: total error divided into bias (nonsampling bias and sampling bias) and sampling error]

In sum…

Bias
• Includes nonsampling bias and sampling bias
• Is due to mistakes which can be avoided
• Cannot be precisely measured
• Control and prevention requires careful attention

Sampling error
• Is unavoidable if sampling < 100% of population
• Can be controlled by selecting appropriate sample size and sampling method
• Can be precisely calculated after-the-fact

Introduction to Data Analysis

• Data Measurement
– Measurement of the data is the first step in the process that ultimately guides the final analysis.
– Consideration of sampling, controls, errors (random and systematic) and the required precision all influence the final analysis.
– Validation: Instruments and methods used to measure the data must be validated for accuracy.
– Precision and accuracy…Determination of error
– Social vs. Physical Sciences


• Types of data
– Univariate/Multivariate
  • Univariate: When we use one variable to describe a person, place, or thing.
  • Multivariate: When we use two or more variables to measure a person, place or thing. Variables may or may not be dependent on each other.
– Cross-sectional data/Time-ordered data (business, social sciences)
  • Cross-Sectional: Measurements taken at one time period
  • Time-Ordered: Measurements taken over time in chronological sequence.

The type of data will dictate (in part) the appropriate data-analysis method.

• Measurement Scales
– Nominal or Categorical Scale
  • Classification of people, places, or things into categories (e.g. age ranges, colors, etc.).
  • Classifications must be mutually exclusive (every element should belong to one category with no ambiguity).
  • Weakest of the four scales. No category is greater than or less (better or worse) than the others. They are just different.
– Ordinal or Ranking Scale
  • Classification of people, places, or things into a ranking such that the data is arranged into a meaningful order (e.g. poor, fair, good, excellent).
  • Qualitative classification only



• Measurement Scales (business, social sciences)
– Interval Scale
  • Data classified by ranking.
  • Quantitative classification (time, temperature, etc).
  • Zero point of scale is arbitrary (differences are meaningful).
– Ratio Scale
  • Data classified as the ratio of two numbers.
  • Quantitative classification (height, weight, distance, etc).
  • Zero point of scale is real (data can be added, subtracted, multiplied, and divided).

Univariate Analysis/Descriptive Statistics

• Descriptive Statistics
– The Range
– Min/Max
– Average
– Median
– Mode
– Variance
– Standard Deviation
– Histograms and Normal Distributions

Univariate Analysis/Histograms

• Distributions
– Descriptive statistics are easier to interpret when graphically illustrated.
– However, charting each data element can lead to very busy and confusing charts that do not help interpret the data.
– Grouping the data elements into categories and charting the frequency within these categories yields a graphical illustration of how the data is distributed throughout its range.


[Chart: column chart of the 20 individual data values; x-axis: position 1 to 20, y-axis: Data Values, 0 to 120]

With just a few columns this chart is difficult to interpret. It tells you very little about the data set. Even finding the Min and Max can be difficult.

The data can be presented such that more statistical parameters can be estimated from the chart (average, standard deviation).


• Frequency Table
– The first step is to decide on the categories and group the data appropriately.

(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)

Category Labels   Frequency
0-50              3
51-60             2
61-70             6
71-80             5
81-90             3
>90               1
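The grouping step can be reproduced with a short Python sketch. The category boundaries are those of the table; the function name and the text-based bar output are illustrative choices only.

```python
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# Inclusive upper bound of each category; None means "no upper limit".
categories = [("0-50", 50), ("51-60", 60), ("61-70", 70),
              ("71-80", 80), ("81-90", 90), (">90", None)]

def frequency_table(values, categories):
    """Count how many values fall into each category."""
    counts = {label: 0 for label, _ in categories}
    for v in values:
        for label, upper in categories:
            if upper is None or v <= upper:
                counts[label] += 1
                break
    return counts

table = frequency_table(scores, categories)

# A crude text histogram: one '#' per observation in the category.
for label, count in table.items():
    print(f"{label:>6} {'#' * count} ({count})")
```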


• Histogram
– A histogram is simply a column chart of the frequency table.

Category Labels   Frequency
0-50              3
51-60             2
61-70             6
71-80             5
81-90             3
>90               1

[Histogram: x-axis: Scores by category (0-50, 51-60, 61-70, 71-80, 81-90, >90); y-axis: Frequency, 0 to 7]


• Histogram

[Histogram of Scores by category against Frequency, annotated with Average (68.6) and Median (68), Mode (74), and −1SD and +1SD]

Univariate Analysis/Normal Distributions

• Distributions that can be described mathematically as Gaussian are also called Normal
• The Bell curve
– Symmetrical
– Mean ≈ Median

[Figure: bell curve with Mean, Median, Mode marked]

Univariate Analysis/Skewed Distributions

• When data are skewed, the mean and SD can be misleading
• Skewness
sk = 3(mean − median)/SD
If |sk| > 1 then the distribution is non-symmetrical
• Negatively skewed
– Mean < Median
– sk is negative
• Positively skewed
– Mean > Median
– sk is positive
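The skewness coefficient above can be computed directly, as in the following Python sketch. It uses the 20 scores from the frequency-table example and the population standard deviation, which is an assumption made to match the SD of 13.5 used elsewhere in these slides.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """sk = 3 * (mean - median) / SD, using the population SD."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]
sk = pearson_skewness(scores)
```

For these scores the mean (68.6) is only slightly above the median (68), so sk is a small positive number and the data are close to symmetrical.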

[Figures: two skewed distribution curves]

Central Limit Theorem

• Regardless of the shape of a distribution, the distribution of the sample mean based on samples of size N approaches a normal curve as N increases.
– N must be less than the entire sample

N=10
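A small simulation can make the theorem concrete. The Python sketch below draws repeated samples from a strongly skewed population and records their means; the exponential population, the 2000 repetitions and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev

def sample_means(n, draws=2000, seed=0):
    """Means of `draws` samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(draws)]

# Spread of the sample means for increasing sample size N.
spread = {n: pstdev(sample_means(n)) for n in (1, 10, 50)}
```

As N grows, the sample means cluster more tightly around the population mean (1.0 here), and a histogram of them looks increasingly bell-shaped even though the population itself is skewed.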


• The Range
– Difference between minimum and maximum values in a data set
– Larger range usually (but not always) indicates a large spread or deviation in the values of the data set.

(73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)


• The Average (Mean)
– Sum of all values divided by the number of values in the data set.
– One measure of central location in the data set.

$$\text{Average} = \bar{m} = \frac{1}{N}\sum_{i=1}^{N} m_i$$

Average = (73+66+69+67+49+60+81+71+78+62+53+87+74+65+74+50+85+45+63+100)/20 = 68.6

Excel function: AVERAGE()


[Figure: number line from 0 to 10 with the value 4.8 marked]

The data may or may not be symmetrical around its average value


• The Median
– The middle value in a sorted data set. Half the values are greater and half are less than the median.
– Another measure of central location in the data set.

(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)
Median: 68

(1, 2, 4, 7, 8, 9, 9)

– Excel function: MEDIAN()


• The Median
– May or may not be close to the mean.
– The combination of mean and median is used to define the skewness of a distribution.

[Figure: number line from 0 to 10 with the value 6.25 marked]


• The Mode
  – Most frequently occurring value.
  – Another measure of central location in the data set.
  – (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Mode: 74
  – Generally not all that meaningful unless a large percentage of the values are the same number.


• Variance
  – One measure of dispersion (deviation from the mean) of a data set. The larger the variance, the greater is the average deviation of each datum from the average value.

Variance = $\frac{1}{N}\sum_{i=1}^{N} (m_i - \bar{m})^2$, where $\bar{m}$ is the average value of the data set

Variance = [(45 – 68.6)^2 + (49 – 68.6)^2 + (50 – 68.6)^2 + (53 – 68.6)^2 + …]/20 = 181

Excel Functions: VARP() (divides by N, as above), VAR() (divides by N – 1)


• Standard Deviation
  – Square root of the variance. Can be thought of as the average deviation from the mean of a data set.
  – The magnitude of the number is more in line with the values in the data set.

Standard Deviation = ([(45 – 68.6)^2 + (49 – 68.6)^2 + (50 – 68.6)^2 + (53 – 68.6)^2 + …]/20)^(1/2) = 13.5

Excel Functions: STDEVP(), STDEV()
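The summary measures above can be reproduced for the 20-value data set with Python's standard `statistics` module. This is a sketch rather than part of the original slides. It assumes the population forms (dividing by N), which is what the worked figures of 181 and 13.5 use, and it applies the sk formula from the skewed-distributions slide.

```python
import statistics

# The 20 values used in the slides above.
data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

value_range = max(data) - min(data)     # difference between maximum and minimum
mean = statistics.fmean(data)           # like Excel's AVERAGE()
median = statistics.median(data)        # like Excel's MEDIAN()
mode = statistics.mode(data)            # most frequently occurring value
variance = statistics.pvariance(data)   # divides by N, like Excel's VARP()
std_dev = statistics.pstdev(data)       # square root of the variance, like STDEVP()
skew = 3 * (mean - median) / std_dev    # sk = 3(mean - median)/SD

print("range   ", value_range)
print("mean    ", round(mean, 1))
print("median  ", median)
print("mode    ", mode)
print("variance", round(variance))
print("std dev ", round(std_dev, 1))
print("sk      ", round(skew, 2))
```

The sample forms (`statistics.variance` and `statistics.stdev`, dividing by N – 1) correspond to Excel's VAR() and STDEV() and give slightly larger values.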

Multivariate Analysis

• Many statistical techniques focus on just one or two variables
• Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once
  – Multiple regression is not typically included under this heading, but can be thought of as a multivariate analysis

Outline of Lectures

• We will cover
  – Why MVA is useful and important
    • Simpson’s Paradox
  – Some commonly used techniques
    • Principal components
    • Cluster analysis
    • Correspondence analysis
    • Others if time permits
  – Market segmentation methods
  – An overview of MVA methods and their niches

Simpson’s Paradox

• Example: 44% of male applicants are admitted by a university, but only 33% of female applicants

• Does this mean there is unfair discrimination?

• University investigates and breaks down figures for Engineering and English programmes

                Male   Female
Accept            35       20
Refuse entry      45       40
Total             80       60

Simpson’s Paradox

• No relationship between sex and acceptance for either programme
  – So no evidence of discrimination
• Why?
  – More females apply for the English programme, but it is hard to get into
  – More males applied to Engineering, which has a higher acceptance rate than English
• Must look deeper than single cross-tab to find this out

Engineering     Male   Female
Accept            30       10
Refuse entry      30       10
Total             60       20

English         Male   Female
Accept             5       10
Refuse entry      15       30
Total             20       40
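The two programme tables can be recombined to show where the overall 44% versus 33% gap comes from. The sketch below is an illustration only: the counts are those in the tables above, while the `counts` layout and the `rate` helper are hypothetical names introduced here.

```python
# Counts from the Engineering and English tables above:
# (programme, sex) -> (accepted, refused)
counts = {
    ("Engineering", "Male"): (30, 30),
    ("Engineering", "Female"): (10, 10),
    ("English", "Male"): (5, 15),
    ("English", "Female"): (10, 30),
}


def rate(sex, programme=None):
    """Acceptance rate for one sex, overall or within a single programme."""
    cells = [cell for (prog, s), cell in counts.items()
             if s == sex and (programme is None or prog == programme)]
    accepted = sum(a for a, _ in cells)
    total = sum(a + r for a, r in cells)
    return accepted / total


for sex in ("Male", "Female"):
    print(sex,
          "overall:", round(rate(sex), 3),
          "Engineering:", rate(sex, "Engineering"),
          "English:", rate(sex, "English"))
```

Within each programme the two rates are identical (50% for Engineering, 25% for English). The overall gap arises only because three quarters of the male applicants chose Engineering, while two thirds of the female applicants chose English.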

Another Example

• A study of graduates’ salaries showed negative association between economists’ starting salary and the level of the degree
  – i.e. PhDs earned less than Masters degree holders, who in turn earned less than those with just a Bachelor’s degree
  – Why?
• The data was split into three employment sectors
  – Teaching, government and private industry
  – Each sector showed a positive relationship
  – Employer type was confounded with degree level

Simpson’s Paradox

• In each of these examples, the bivariate analysis (cross-tabulation or correlation) gave misleading results

• Introducing another variable gave a better understanding of the data
  – It even reversed the initial conclusions

Many Variables

• Commonly have many relevant variables in market research surveys
  – E.g. one not atypical survey had ~2000 variables
  – Typically researchers pore over many crosstabs
  – However it can be difficult to make sense of these, and the crosstabs may be misleading
• MVA can help summarise the data
  – E.g. factor analysis and segmentation based on agreement ratings on 20 attitude statements
• MVA can also reduce the chance of obtaining spurious results

Multivariate Analysis Methods

• Two general types of MVA technique
  – Analysis of dependence
    • Where one (or more) variables are dependent variables, to be explained or predicted by others
      – E.g. Multiple regression, PLS, MDA
  – Analysis of interdependence
    • No variables thought of as “dependent”
    • Look at the relationships among variables, objects or cases
      – E.g. cluster analysis, factor analysis

Principal Components

• Identify underlying dimensions or principal components of a distribution

• Helps understand the joint or common variation among a set of variables

• Probably the most commonly used method of deriving “factors” in factor analysis (before rotation)

Principal Components

• The first principal component is identified as the vector (or equivalently the linear combination of variables) on which the most data variation can be projected

• The 2nd principal component is a vector perpendicular to the first, chosen so that it contains as much of the remaining variation as possible

• And so on for the 3rd principal component, the 4th, the 5th etc.

Principal Components - Examples

• Ellipse, ellipsoid, sphere
• Rugby ball
• Pen
• Frying pan
• Banana
• CD
• Book

Multivariate Normal Distribution

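• Generalisation of the univariate normal
• Determined by the mean (vector) and covariance matrix: $X \sim N(\mu, \Sigma)$
• E.g. Standard bivariate normal: $X \sim N\left((0, 0)^T, I_2\right)$, with density $p(x, y) = \frac{1}{2\pi} \exp\left(-\frac{x^2 + y^2}{2}\right)$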

Example – Crime Rates by State

The PRINCOMP Procedure

Observations   50
Variables       7

Simple Statistics

Variable      Mean           StD
Murder        7.444000000    3.866768941
Rape          25.73400000    10.75962995
Robbery       124.0920000    88.3485672
Assault       211.3000000    100.2530492
Burglary      1291.904000    432.455711
Larceny       2671.288000    725.908707
Auto_Theft    377.5260000    193.3944175

Crime Rates per 100,000 Population by State

Obs  State        Murder   Rape  Robbery  Assault  Burglary  Larceny  Auto_Theft
1    Alabama        14.2   25.2     96.8    278.3    1135.5   1881.9       280.7
2    Alaska         10.8   51.6     96.8    284.0    1331.7   3369.8       753.3
3    Arizona         9.5   34.2    138.2    312.3    2346.1   4467.4       439.5
4    Arkansas        8.8   27.6     83.2    203.4     972.6   1862.1       183.4
5    California     11.5   49.4    287.0    358.0    2139.4   3499.8       663.5
…    …               ...    ...      ...      ...       ...      ...         ...

Correlation Matrix

             Murder    Rape  Robbery  Assault  Burglary  Larceny  Auto_Theft
Murder       1.0000  0.6012   0.4837   0.6486    0.3858   0.1019      0.0688
Rape         0.6012  1.0000   0.5919   0.7403    0.7121   0.6140      0.3489
Robbery      0.4837  0.5919   1.0000   0.5571    0.6372   0.4467      0.5907
Assault      0.6486  0.7403   0.5571   1.0000    0.6229   0.4044      0.2758
Burglary     0.3858  0.7121   0.6372   0.6229    1.0000   0.7921      0.5580
Larceny      0.1019  0.6140   0.4467   0.4044    0.7921   1.0000      0.4442
Auto_Theft   0.0688  0.3489   0.5907   0.2758    0.5580   0.4442      1.0000

Eigenvalues of the Correlation Matrix

     Eigenvalue  Difference  Proportion  Cumulative
1    4.11495951  2.87623768      0.5879      0.5879
2    1.23872183  0.51290521      0.1770      0.7648
3    0.72581663  0.40938458      0.1037      0.8685
4    0.31643205  0.05845759      0.0452      0.9137
5    0.25797446  0.03593499      0.0369      0.9506
6    0.22203947  0.09798342      0.0317      0.9823
7    0.12405606                  0.0177      1.0000

Eigenvectors

Prin1 Prin2 Prin3 Prin4 Prin5 Prin6 Prin7

Murder 0.300279 -.629174 0.178245 -.232114 0.538123 0.259117 0.267593

Rape 0.431759 -.169435 -.244198 0.062216 0.188471 -.773271 -.296485

Robbery 0.396875 0.042247 0.495861 -.557989 -.519977 -.114385 -.003903

Assault 0.396652 -.343528 -.069510 0.629804 -.506651 0.172363 0.191745

Burglary 0.440157 0.203341 -.209895 -.057555 0.101033 0.535987 -.648117

Larceny 0.357360 0.402319 -.539231 -.234890 0.030099 0.039406 0.601690

Auto_Theft 0.295177 0.502421 0.568384 0.419238 0.369753 -.057298 0.147046

• 2-3 components explain 76%-87% of the variance
• First principal component has roughly uniform, all-positive variable weights, so is a general crime level indicator
• Second principal component appears to contrast violent versus property crimes
• Third component is harder to interpret
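The first row of the eigenvalue table and the Prin1 column can be approximated directly from the printed correlation matrix. The sketch below uses power iteration in plain Python, which is a technique chosen for this illustration and is not necessarily how PROC PRINCOMP computes its results. Because the matrix entries are rounded to four decimals, the values agree with the SAS output only to about three decimals.

```python
import math

# Correlation matrix from the PRINCOMP output above, in the order
# Murder, Rape, Robbery, Assault, Burglary, Larceny, Auto_Theft.
R = [
    [1.0000, 0.6012, 0.4837, 0.6486, 0.3858, 0.1019, 0.0688],
    [0.6012, 1.0000, 0.5919, 0.7403, 0.7121, 0.6140, 0.3489],
    [0.4837, 0.5919, 1.0000, 0.5571, 0.6372, 0.4467, 0.5907],
    [0.6486, 0.7403, 0.5571, 1.0000, 0.6229, 0.4044, 0.2758],
    [0.3858, 0.7121, 0.6372, 0.6229, 1.0000, 0.7921, 0.5580],
    [0.1019, 0.6140, 0.4467, 0.4044, 0.7921, 1.0000, 0.4442],
    [0.0688, 0.3489, 0.5907, 0.2758, 0.5580, 0.4442, 1.0000],
]


def multiply(matrix, vector):
    """Matrix-vector product for a square matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def first_principal_component(matrix, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit-length eigenvector)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = multiply(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue belonging to the unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, multiply(matrix, v)))
    return eigenvalue, v


eigenvalue, weights = first_principal_component(R)
# The eigenvalues of a correlation matrix sum to the number of variables.
proportion = eigenvalue / len(R)

print("first eigenvalue:", round(eigenvalue, 3))
print("proportion of variance:", round(proportion, 3))
print("weights:", [round(w, 3) for w in weights])
```

All seven weights come out positive and of similar size, which is why the first component reads as a general crime level indicator.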

Cluster Analysis

• Techniques for identifying separate groups of similar cases
  – Similarity of cases is either specified directly in a distance matrix, or defined in terms of some distance function
• Also used to summarise data by defining segments of similar cases in the data
  – This use of cluster analysis is known as “dissection”

Clustering Techniques

• Two main types of cluster analysis methods
  – Hierarchical cluster analysis
    • Each cluster (starting with the whole dataset) is divided into two, then divided again, and so on
  – Iterative methods
    • k-means clustering (PROC FASTCLUS)
    • Analogous non-parametric density estimation method
  – Also other methods
    • Overlapping clusters
    • Fuzzy clusters

Applications

• Market segmentation is usually conducted using some form of cluster analysis to divide people into segments
  – Other methods such as latent class models or archetypal analysis are sometimes used instead

• It is also possible to cluster other items such as products/SKUs, image attributes, brands

Tandem Segmentation

• One general method is to conduct a factor analysis, followed by a cluster analysis

• This approach has been criticised for losing information and not yielding as much discrimination as cluster analysis alone

• However it can make it easier to design the distance function, and to interpret the results

Tandem k-means Example

proc factor data=datafile n=6 rotate=varimax round reorder flag=.54 scree out=scores;
  var reasons1-reasons15 usage1-usage10;
run;

proc fastclus data=scores maxc=4 seed=109162319 maxiter=50;
  var factor1-factor6;
run;

• Have used the default unweighted Euclidean distance function, which is not sensible in every context

• Also note that k-means results depend on the initial cluster centroids (determined here by the seed)

• Typically k-means is very prone to local minima
  – Run at least 20 times to ensure a reasonable minimum

Selected Outputs

19th run of 5 segments

Cluster Summary

Cluster  Frequency  RMS Std Deviation  Maximum Distance from Seed to Observation  Nearest Cluster  Distance Between Cluster Centroids
1        433        0.9010             4.5524                                     4                2.0325
2        471        0.8487             4.5902                                     4                1.8959
3        505        0.9080             5.3159                                     4                2.0486
4        870        0.6982             4.2724                                     2                1.8959
5        433        0.9300             4.9425                                     4                2.0308

Selected Outputs

19th run of 5 segments

FASTCLUS Procedure: Replace=RANDOM Radius=0 Maxclusters=5 Maxiter=100 Converge=0.02

Statistics for Variables

Variable   Total STD   Within STD   R-Squared   RSQ/(1-RSQ)
FACTOR1    1.000000    0.788183     0.379684    0.612082
FACTOR2    1.000000    0.893187     0.203395    0.255327
FACTOR3    1.000000    0.809710     0.345337    0.527503
FACTOR4    1.000000    0.733956     0.462104    0.859095
FACTOR5    1.000000    0.948424     0.101820    0.113363
FACTOR6    1.000000    0.838418     0.298092    0.424689
OVER-ALL   1.000000    0.838231     0.298405    0.425324

Pseudo F Statistic = 287.84
Approximate Expected Over-All R-Squared = 0.37027
Cubic Clustering Criterion = -26.135
WARNING: The two above values are invalid for correlated variables.

Selected Outputs


Cluster Means

Cluster   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
1        -0.17151    0.86945   -0.06349    0.08168    0.14407    1.17640
2        -0.96441   -0.62497   -0.02967    0.67086   -0.44314    0.05906
3        -0.41435    0.09450    0.15077   -1.34799   -0.23659   -0.35995
4         0.39794   -0.00661    0.56672    0.37168    0.39152   -0.40369
5         0.90424   -0.28657   -1.21874    0.01393   -0.17278   -0.00972

Cluster Standard Deviations

Cluster   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
1         0.95604    0.79061    0.95515    0.81100    1.08437    0.76555
2         0.79216    0.97414    0.88440    0.71032    0.88449    0.82223
3         0.89084    0.98873    0.90514    0.74950    0.92269    0.97107
4         0.59849    0.74758    0.56576    0.58258    0.89372    0.74160
5         0.80602    1.03771    0.86331    0.91149    1.00476    0.93635

Cluster Analysis Options

• There are several choices of how to form clusters in hierarchical cluster analysis
  – Single linkage
  – Average linkage
  – Density linkage
  – Ward’s method
  – Many others

• Ward’s method (like k-means) tends to form equal sized, roundish clusters

• Average linkage generally forms roundish clusters with equal variance

• Density linkage can identify clusters of different shapes

FASTCLUS

Density Linkage

Cluster Analysis Issues

• Distance definition
  – Weighted Euclidean distance often works well, if weights are chosen intelligently
• Cluster shape
  – Shape of clusters found is determined by method, so choose method appropriately
• Hierarchical methods usually take more computation time than k-means
• However multiple runs are more important for k-means, since it can be badly affected by local minima
• Adjusting for response styles can also be worthwhile
  – Some people give more positive responses overall than others
  – Clusters may simply reflect these response styles unless this is adjusted for, e.g. by standardising responses across attributes for each respondent

MVA - FASTCLUS

• PROC FASTCLUS in SAS tries to minimise the root mean square difference between the data points and their corresponding cluster means
  – Iterates until convergence is reached on this criterion
  – However it often reaches a local minimum
  – Can be useful to run many times with different seeds and choose the best set of clusters based on this RMS criterion

• See http://www.clustan.com/k-means_critique.html for more k-means issues
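The "run many times and keep the best" advice can be sketched in a few lines of Python. This is an illustration of plain k-means with random restarts, not a reimplementation of PROC FASTCLUS. The function names `kmeans` and `best_of`, the two-dimensional toy data and the choice of 20 runs are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, rng, max_iter=50):
    """One k-means run from random initial seeds; returns (rms criterion, centroids)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (unweighted Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (left in place if the group is empty).
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    squared = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return math.sqrt(squared / len(points)), centroids


def best_of(points, k, runs=20, seed=0):
    """Repeat k-means with different random seeds and keep the lowest criterion."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(runs)), key=lambda run: run[0])


# Hypothetical data: three well-separated groups of 30 points each.
data_rng = random.Random(1)
centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in centres for _ in range(30)]

best_rms, best_centroids = best_of(points, 3)
print("best RMS criterion over 20 runs:", round(best_rms, 3))
```

A single run can stop at a poor local minimum, for example with two seeds inside one group. Taking the minimum criterion over many runs makes that outcome much less likely, although it still does not guarantee the global optimum.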


Iteration History from FASTCLUS

                        Relative Change in Cluster Seeds
Iteration  Criterion    1        2        3        4        5
1          0.9645       1.0436   0.7366   0.6440   0.6343   0.5666
2          0.8596       0.3549   0.1727   0.1227   0.1246   0.0731
3          0.8499       0.2091   0.1047   0.1047   0.0656   0.0584
4          0.8454       0.1534   0.0701   0.0785   0.0276   0.0439
5          0.8430       0.1153   0.0640   0.0727   0.0331   0.0276
6          0.8414       0.0878   0.0613   0.0488   0.0253   0.0327
7          0.8402       0.0840   0.0547   0.0522   0.0249   0.0340
8          0.8392       0.0657   0.0396   0.0440   0.0188   0.0286
9          0.8386       0.0429   0.0267   0.0324   0.0149   0.0223
10         0.8383       0.0197   0.0139   0.0170   0.0119   0.0173

Convergence criterion is satisfied.

Criterion Based on Final Seeds = 0.83824

Results from Different Initial Seeds


Cluster Means

Cluster   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
1        -0.17151    0.86945   -0.06349    0.08168    0.14407    1.17640
2        -0.96441   -0.62497   -0.02967    0.67086   -0.44314    0.05906
3        -0.41435    0.09450    0.15077   -1.34799   -0.23659   -0.35995
4         0.39794   -0.00661    0.56672    0.37168    0.39152   -0.40369
5         0.90424   -0.28657   -1.21874    0.01393   -0.17278   -0.00972


Cluster Means

Cluster   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
1         0.08281   -0.76563    0.48252   -0.51242   -0.55281    0.64635
2         0.39409    0.00337    0.54491    0.38299    0.64039   -0.26904
3        -0.12413    0.30691   -0.36373   -0.85776   -0.31476   -0.94927
4         0.63249    0.42335   -1.27301    0.18563    0.15973    0.77637
5        -1.20912    0.21018   -0.07423    0.75704   -0.26377    0.13729

Howard-Harris Approach

• Provides automatic approach to choosing seeds for k-means clustering
• Chooses initial seeds by fixed procedure
  – Takes variable with highest variance, splits the data at the mean, and calculates centroids of the resulting two groups
  – Applies k-means with these centroids as initial seeds
  – This yields a 2 cluster solution
  – Choose the cluster with the higher within-cluster variance
  – Choose the variable with the highest variance within that cluster, split the cluster as above, and repeat to give a 3 cluster solution
  – Repeat until have reached a set number of clusters
• I believe this approach is used by the ESPRI software package (after variables are standardised by their range)

Another “Clustering” Method

• One alternative approach to identifying clusters is to fit a finite mixture model
  – Assume the overall distribution is a mixture of several normal distributions
  – Typically this model is fit using some variant of the EM algorithm
    • E.g. weka.clusterers.EM method in WEKA data mining package
    • See WEKA tutorial for an example using Fisher’s iris data
• Advantages of this method include:
  – Probability model allows for statistical tests
  – Handles missing data within model fitting process
  – Can extend this approach to define clusters based on model parameters, e.g. regression coefficients
• Also known as latent class modeling

Cluster Means

Cluster 1 Cluster 2 Cluster 3 Cluster 4

Reason 1 4.55 2.65 4.21 4.50

Reason 2 4.32 4.32 4.12 4.02

Reason 3 4.43 3.28 3.90 4.06

Reason 4 3.85 3.89 2.15 3.35

Reason 5 4.10 3.77 2.19 3.80

Reason 6 4.50 4.57 4.09 4.28

Reason 7 3.93 4.10 1.94 3.66

Reason 8 4.09 3.17 2.30 3.77

Reason 9 4.17 4.27 3.51 3.82

Reason 10 4.12 3.75 2.66 3.47

Reason 11 4.58 3.79 3.84 4.37

Reason 12 3.51 2.78 1.86 2.60

Reason 13 4.14 3.95 3.06 3.45

Reason 14 3.96 3.75 2.06 3.83

Reason 15 4.19 2.42 2.93 4.04

=max. =min.

Cluster 1 Cluster 2 Cluster 3 Cluster 4

Usage 1 3.43 3.66 3.48 4.00

Usage 2 3.91 3.94 3.86 4.26

Usage 3 3.07 2.95 2.61 3.13

Usage 4 3.85 3.02 2.62 2.50

Usage 5 3.86 3.55 3.52 3.56

Usage 6 3.87 4.25 4.14 4.56

Usage 7 3.88 3.29 2.78 2.59

Usage 8 3.71 2.88 2.58 2.34

Usage 9 4.09 3.38 3.19 2.68

Usage 10 4.58 4.26 4.00 3.91


Correspondence Analysis

• Provides a graphical summary of the interactions in a table

• Also known as a perceptual map
  – But so are many other charts
• Can be very useful
  – E.g. to provide overview of cluster results

• However the correct interpretation is less than intuitive, and this leads many researchers astray

[Figure: correspondence analysis map titled “Four Clusters (imputed, normalised)”, plotting Reasons 1-15, Usages 1-10 and Clusters 1-4. The two axes account for 53.8% and 25.3% (2D fit = 79.1%); a legend marker flags points with correlation < 0.50.]

Interpretation

• Correspondence analysis plots should be interpreted by looking at points relative to the origin
  – Points that are in similar directions are positively associated
  – Points that are on opposite sides of the origin are negatively associated
  – Points that are far from the origin exhibit the strongest associations
• Also the results reflect relative associations, not just which rows are highest or lowest overall

Software for Correspondence Analysis

• Earlier chart was created using a specialised package called BRANDMAP

• Can also do correspondence analysis in most major statistical packages

• For example, using PROC CORRESP in SAS:

*---Perform Simple Correspondence Analysis - Example 1 in SAS OnlineDoc;
proc corresp all data=Cars outc=Coor;
  tables Marital, Origin;
run;

*---Plot the Simple Correspondence Analysis Results---;
%plotit(data=Coor, datatype=corresp)

Cars by Marital Status

Canonical Discriminant Analysis

• Predicts a discrete response from continuous predictor variables

• Aims to determine which of g groups each respondent belongs to, based on the predictors

• Finds the linear combination of the predictors with the highest correlation with group membership
  – Called the first canonical variate
• Repeat to find further canonical variates that are uncorrelated with the previous ones
  – Produces maximum of g-1 canonical variates

[Figure: CDA plot of Canonical Var 1 against Canonical Var 2]

Discriminant Analysis

• Discriminant analysis also refers to a wider family of techniques
  – Still for discrete response, continuous predictors
  – Produces discriminant functions that classify observations into groups
    • These can be linear or quadratic functions
    • Can also be based on non-parametric techniques
  – Often train on one dataset, then test on another

CHAID

• Chi-squared Automatic Interaction Detection
• For discrete response and many discrete predictors
  – Common situation in market research
• Produces a tree structure
  – Nodes get purer, more different from each other
• Uses a chi-squared test statistic to determine best variable to split on at each node
  – Also tries various ways of merging categories, making a Bonferroni adjustment for multiple tests
  – Stops when no more “statistically significant” splits can be found
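The split-selection step can be illustrated with the Pearson chi-squared statistic computed by hand. The sketch below is a simplification and not the CHAID algorithm itself: it omits category merging, p-values and the Bonferroni adjustment, and it compares raw statistics, which is reasonable here only because both candidate tables have one degree of freedom. The `candidates` dictionary reuses the university admissions counts from the Simpson's Paradox slides as a stand-in for survey data.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a two-way table of counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Candidate predictors cross-tabulated against the response (accept / refuse entry).
# Rows are the response categories, columns are the predictor categories.
candidates = {
    "sex": [[35, 20], [45, 40]],         # columns: Male, Female
    "programme": [[40, 15], [40, 45]],   # columns: Engineering, English
}

for name, table in candidates.items():
    print(name, round(chi_squared(table), 2))

# Pick the predictor with the strongest association as the variable to split on.
best = max(candidates, key=lambda name: chi_squared(candidates[name]))
print("split on:", best)
```

With these counts the programme gives the much larger statistic, so a tree built this way would split on programme first. Within each programme, sex then shows no association with acceptance.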

Example of CHAID Output

Titanic Survival Example

                         Adults (20%)
                        /
                       /
                    Men
                   /   \
                  /     \
                 /       Children (45%)
                /
All passengers
                \
                 \         3rd class or crew (46%)
                  \       /
                   \     /
                    Women
                         \
                          \
                           1st or 2nd class passenger (93%)

CHAID Software

• Available in SAS Enterprise Miner (if you have enough money)
  – Was provided as a free macro until SAS decided to market it as a data mining technique
  – TREEDISC.SAS – still available on the web, although apparently not on the SAS web site
• Also implemented in at least one standalone package
• Developed in 1970s
• Other tree-based techniques available
  – Will discuss these later

TREEDISC Macro

%treedisc(data=survey2, depvar=bs,

nominal=c o p q x ae af ag ai: aj al am ao ap aw bf_1 bf_2 ck cn:,

ordinal=lifestag t u v w y ab ah ak,

ordfloat=ac ad an aq ar as av,

options=list noformat read,maxdepth=3,

trace=medium, draw=gr, leaf=50,

outtree=all);

• Need to specify type of each variable
  – Nominal, Ordinal, Ordinal with a floating value

Partial Least Squares (PLS)

• Multivariate generalisation of regression
  – Have model of form Y=XB+E
  – Also extract factors underlying the predictors
  – These are chosen to explain both the response variation and the variation among predictors

• Results are often more powerful than principal components regression

• PLS also refers to a more general technique for fitting general path models, not discussed here

Structural Equation Modeling (SEM)

• General method for fitting and testing path analysis models, based on covariances

• Also known as LISREL
• Implemented in SAS in PROC CALIS
• Fits specified causal structures (path models) that usually involve factors or latent variables
  – Confirmatory analysis

SEM Example: Relationship between Academic and Job Success

SAS Code

data jobfl (type=cov);
  input _type_ $ _name_ $ act cgpa entry salary promo;
  cards;
n 500 500 500 500 500
cov act 1.024
cov cgpa 0.792 1.077
cov entry 0.567 0.537 0.852
cov salary 0.445 0.424 0.518 0.670
cov promo 0.434 0.389 0.475 0.545 0.716
;

proc calis data=jobfl cov stderr;
  lineqs
    act = 1*F1 + e1,
    cgpa = p2f1*F1 + e2,
    entry = p3f1*F1 + e3,
    salary = 1*F2 + e4,
    promo = p5f1*F2 + e5;
  std
    e1 = vare1,
    e2 = vare2,
    e3 = vare3,
    e4 = vare4,
    e5 = vare5,
    F1 = varF1,
    F2 = varF2;
  cov
    f1 f2 = covf1f2;
  var act cgpa entry salary promo;
run;

Results

• All parameters are statistically significant, with a high correlation being found between the latent traits of academic and job success

• However the overall chi-squared value for the model is 111.3, with 4 d.f., so the model does not fit the observed covariances perfectly
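The 4 degrees of freedom can be checked by counting, assuming the fitted model is exactly the one specified in the PROC CALIS code. With p = 5 observed variables there are p(p+1)/2 = 15 distinct variances and covariances. The model estimates q = 11 free parameters: three loadings (p2f1, p3f1, p5f1), five error variances, two factor variances and one factor covariance.

```latex
\[
\text{d.f.} \;=\; \frac{p(p+1)}{2} - q \;=\; \frac{5 \cdot 6}{2} - (3 + 5 + 2 + 1) \;=\; 15 - 11 \;=\; 4
\]
```

For comparison, the 5% critical value of a chi-squared distribution with 4 d.f. is about 9.49, so a value of 111.3 indicates a clear departure from exact fit.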

Latent Variable Models

• Have seen that both latent trait and latent class models can be useful
  – Latent traits for factor analysis and SEM
  – Latent class for probabilistic segmentation
• Mplus software can now fit combined latent trait and latent class models
  – Appears very powerful
  – Subsumes a wide range of multivariate analyses

Broader MVA Issues

• Preliminaries
  – EDA is usually very worthwhile
    • Univariate summaries, e.g. histograms
    • Scatterplot matrix
    • Multivariate profiles, spider-web plots
  – Missing data
    • Establish amount (by variable, and overall) and pattern (across individuals)
    • Think about reasons for missing data
    • Treat missing data appropriately – e.g. impute, or build into model fitting

MVA Issues

• Preliminaries (continued)
  – Check for outliers
    • Large values of Mahalanobis’ D²
• Testing results
  – Some methods provide statistical tests
  – But others do not
    • Cross-validation gives a useful check on the results
      – Leave-1-out cross-validation
      – Split-sample training and test datasets
        » Sometimes 3 groups needed
        » For model building, training and testing
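The two cross-validation schemes listed above differ only in how the cases are partitioned. The sketch below shows that partitioning step in Python. The function names and the 60/20/20 proportions for the three-group split are assumptions for illustration, since the slides do not specify proportions.

```python
import random


def leave_one_out(n):
    """Yield (training indices, held-out index) once for each of the n cases."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out


def split_sample(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle case indices and cut them into model-building, training and test groups."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    first = int(n * fractions[0])
    second = first + int(n * fractions[1])
    return indices[:first], indices[first:second], indices[second:]


# Leave-1-out: five cases give five folds, each holding out a different case.
folds = list(leave_one_out(5))
print(folds[2])

# Split-sample: 100 respondents divided into three non-overlapping groups.
building, training, testing = split_sample(100)
print(len(building), len(training), len(testing))
```

In either scheme the model is fitted on one part of the data and judged on cases it has not seen. This is what makes the check useful for methods that provide no statistical test.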
