Measurement in Market Research

8/4/2019


Business Research Methods

Measurement and Scaling:

Noncomparative Scaling Techniques



Noncomparative Scaling Techniques

Respondents evaluate only one object at a time, and for this reason noncomparative scales are often referred to as monadic scales.

Noncomparative techniques consist of continuous and itemized rating scales.



Continuous Rating Scale

Respondents rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other.

The form of the continuous scale may vary considerably.

How would you rate Sears as a department store?

Version 1

Probably the worst --------I--------------------------------------- Probably the best

Version 2

Probably the worst --------I--------------------------------------- Probably the best
                   0    10    20    30    40    50    60    70    80    90    100

Version 3

                   Very bad          Neither good nor bad          Very good
Probably the worst --------I--------------------------------------- Probably the best
                   0    10    20    30    40    50    60    70    80    90    100
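A continuous scale is scored by measuring how far along the line the respondent's mark falls and converting that distance into a number, here the 0 to 100 range printed under Versions 2 and 3. The sketch below assumes the distance is measured in millimetres from the "Probably the worst" end; the function name and the example measurements are hypothetical.

```python
def continuous_score(mark_mm: float, line_length_mm: float, scale_max: float = 100.0) -> float:
    """Convert a mark's distance from the left ("Probably the worst") end of the line
    into a score from 0 to scale_max. Hypothetical helper for illustration."""
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    # Multiply before dividing so that simple cases stay exact in floating point
    return mark_mm * scale_max / line_length_mm


# Hypothetical respondent: mark placed 18 mm along a 120 mm line
print(continuous_score(18, 120))  # 15.0
```

Doing this by hand for every respondent is what makes scoring cumbersome unless the measurement is computerized.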



Itemized Rating Scales

The respondents are provided with a scale that has a number or brief description associated with each category.

The categories are ordered in terms of scale position, and the respondents are required to select the specified category that best describes the object being rated.

The commonly used itemized rating scales are the Likert, semantic differential, and Stapel scales.



Likert Scale

The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects.

                                            Strongly   Disagree   Neither agree   Agree   Strongly
                                            disagree              nor disagree            agree

1. Sears sells high quality merchandise.       1          2X            3            4        5
2. Sears has poor in-store service.            1          2X            3            4        5
3. I like to shop at Sears.                    1          2             3X           4        5

The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated.

When arriving at a total score, the categories assigned to the negative statements by the respondents should be scored by reversing the scale.
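The reverse-scoring rule can be sketched in a few lines of Python. On a 1 to 5 scale a reversed response is (1 + 5) minus the original, so the respondent's 2 (disagree) on the negative statement "Sears has poor in-store service" counts as 4 before summing. The function names are illustrative, not part of any standard package.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Reverse a rating so that, on a 1-5 scale, 1 <-> 5 and 2 <-> 4."""
    return low + high - response


def summated_score(responses, negative_items, low=1, high=5):
    """Total Likert score; items whose 0-based index is in negative_items are reverse-scored."""
    total = 0
    for index, response in enumerate(responses):
        if not low <= response <= high:
            raise ValueError(f"response {response} is outside {low}-{high}")
        total += reverse_score(response, low, high) if index in negative_items else response
    return total


# Sears example: item 2 ("poor in-store service") is the negative statement
sears = [2, 2, 3]
print(summated_score(sears, negative_items={1}))  # 2 + 4 + 3 = 9
```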



Semantic Differential Scale

The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning.

SEARS IS:

Powerful     --:--:--:--:-X-:--:--: Weak
Unreliable   --:--:--:--:--:-X-:--: Reliable
Modern       --:--:--:--:--:--:-X-: Old-fashioned

The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right.

This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels.
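Because the negative pole switches sides, each mark has to be recoded before profiles or averages are computed. A minimal sketch, assuming boxes are counted 1 to 7 from the left and that Powerful, Reliable and Modern are the favourable poles (an assumption for illustration), so that 7 always means most favourable:

```python
def sd_item_score(position: int, favourable_on_left: bool, points: int = 7) -> int:
    """Score a semantic differential mark so that `points` is always the favourable end.

    position counts the boxes from the left, 1..points.
    """
    if not 1 <= position <= points:
        raise ValueError("position must be between 1 and the number of points")
    # Reverse items whose favourable adjective sits on the left side of the scale
    return points + 1 - position if favourable_on_left else position


# Sears marks from the example: (box position, is the favourable pole on the left?)
marks = {
    "Powerful / Weak": (5, True),
    "Unreliable / Reliable": (6, False),
    "Modern / Old-fashioned": (7, True),
}
profile = {item: sd_item_score(pos, left) for item, (pos, left) in marks.items()}
print(profile)  # {'Powerful / Weak': 3, 'Unreliable / Reliable': 6, 'Modern / Old-fashioned': 1}
```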


A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts

1) Rugged :---:---:---:---:---:---:---: Delicate

2) Excitable :---:---:---:---:---:---:---: Calm

3) Uncomfortable :---:---:---:---:---:---:---: Comfortable

4) Dominating :---:---:---:---:---:---:---: Submissive

5) Thrifty :---:---:---:---:---:---:---: Indulgent

6) Pleasant :---:---:---:---:---:---:---: Unpleasant

7) Contemporary :---:---:---:---:---:---:---: Obsolete

8) Organized :---:---:---:---:---:---:---: Unorganized

9) Rational :---:---:---:---:---:---:---: Emotional

10) Youthful :---:---:---:---:---:---:---: Mature



Stapel Scale

The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically.

                 SEARS

      +5                      +5
      +4                      +4
      +3                      +3
      +2                      +2X
      +1                      +1
HIGH QUALITY             POOR SERVICE
      -1                      -1
      -2                      -2
      -3                      -3
      -4X                     -4
      -5                      -5

The data obtained by using a Stapel scale can be analyzed in the same way as semantic differential data. The scale shows both the intensity and the direction of the respondent's evaluation.
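Since a Stapel rating carries both direction (its sign) and intensity (its size), it can be split into the two parts, or recoded onto a 1 to 10 range for summary statistics. A sketch assuming the usual instruction that positive numbers mean the descriptor fits the store; the 1 to 10 recoding is a hypothetical convenience, not a rule from the slide.

```python
def stapel_direction_intensity(rating: int) -> tuple:
    """Split a Stapel rating (-5..-1, +1..+5; no zero) into direction and intensity."""
    if rating == 0 or not -5 <= rating <= 5:
        raise ValueError("Stapel ratings run from -5 to +5 with no zero")
    direction = "describes" if rating > 0 else "does not describe"
    return direction, abs(rating)


def stapel_to_ten_point(rating: int) -> int:
    """Map -5..-1 onto 1..5 and +1..+5 onto 6..10 (hypothetical recoding)."""
    stapel_direction_intensity(rating)  # reuse the range check
    return rating + 6 if rating < 0 else rating + 5


# Sears example: HIGH QUALITY was marked -4, POOR SERVICE was marked +2
print(stapel_direction_intensity(-4))  # ('does not describe', 4)
print(stapel_direction_intensity(2))   # ('describes', 2)
print(stapel_to_ten_point(-4), stapel_to_ten_point(2))  # 2 7
```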


Basic Noncomparative Scales

Continuous Rating Scale
Basic characteristics: Place a mark on a continuous line
Examples: Reaction to TV commercials
Advantages: Easy to construct
Disadvantages: Scoring can be cumbersome unless computerized

Itemized Rating Scales

Likert Scale
Basic characteristics: Degrees of agreement on a 1 (strongly disagree) to 5 (strongly agree) scale
Examples: Measurement of attitudes
Advantages: Easy to construct, administer, and understand
Disadvantages: More time-consuming

Semantic Differential
Basic characteristics: Seven-point scale with bipolar labels
Examples: Brand, product, and company images
Advantages: Versatile
Disadvantages: Controversy as to whether the data are interval

Stapel Scale
Basic characteristics: Unipolar ten-point scale, -5 to +5, without a neutral point (zero)
Examples: Measurement of attitudes and images
Advantages: Easy to construct, administer over telephone
Disadvantages: Confusing and difficult to apply


Summary of Itemized Scale Decisions

1) Number of categories: Although there is no single, optimal number, traditional guidelines suggest that there should be between five and nine categories.

2) Balanced vs. unbalanced: In general, the scale should be balanced to obtain objective data.

3) Odd/even number of categories: If a neutral or indifferent scale response is possible from at least some of the respondents, an odd number of categories should be used.

4) Forced vs. non-forced: In situations where the respondents are expected to have no opinion, the accuracy of the data may be improved by a non-forced scale.

5) Verbal description: An argument can be made for labeling all or many scale categories. The category descriptions should be located as close to the response categories as possible.

6) Physical form: A number of options should be tried and the best selected.


Balanced and Unbalanced Scales (Figure 9.1)

Balanced scale                   Unbalanced scale
Jovan Musk for Men is:           Jovan Musk for Men is:
Extremely good                   Extremely good
Very good                        Very good
Good                             Good
Bad                              Somewhat good
Very bad                         Bad
Extremely bad                    Very bad


Rating Scale Configurations (Figure 9.2)

A variety of scale configurations may be employed to measure the gentleness of Cheer detergent. Some examples include:

Cheer detergent is:

1) Very harsh  ---  ---  ---  ---  ---  ---  ---  Very gentle

2) Very harsh   1    2    3    4    5    6    7   Very gentle

3) . Very harsh
   .
   .
   . Neither harsh nor gentle
   .
   .
   . Very gentle

4) ____     ____     ____       ____           ____       ____     ____
   Very     Harsh    Somewhat   Neither harsh  Somewhat   Gentle   Very
   harsh             harsh      nor gentle     gentle              gentle

5) Cheer
   Very harsh          Neither harsh nor gentle          Very gentle
   -3       -2       -1        0        +1       +2       +3


Some Unique Rating Scales (Figure 9.3)

Thermometer Scale

Instructions: Please indicate how much you like McDonald's hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is.

Form: [graphic: thermometer marked 0 (Dislike very much), 25, 50, 75, 100 (Like very much)]

Smiling Face Scale

Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5.

Form: [graphic: five faces numbered 1 to 5]


Thurstone Scale

It is a two-stage procedure. In the first stage, the researcher selects 80 to 100 items indicating different degrees of favourable attitude toward the concept under study.

The items are given to a group of judges, who sort them into categories ranging from favourable to unfavourable, keeping equal intervals between categories. All items on which the judges reach consensus are selected and distributed uniformly on a scale of favourability.

In the second stage, this scale is administered to respondents to measure their attitude toward a particular concept.

It is time-consuming and costly and is rarely used in applied business research.


In psychology, the Thurstone scale was the first formal technique for measuring an attitude. It was developed by Louis Leon Thurstone in 1928 as a means of measuring attitudes towards religion. It is made up of statements about a particular issue, and each statement has a numerical value indicating how favorable or unfavorable it is judged to be. People check each of the statements with which they agree, and a mean score is computed, indicating their attitude.
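The scoring step just described can be sketched as follows: each statement carries the scale value assigned by the judges, and a respondent's attitude is the mean of the values of the statements they agree with. The statement IDs, their values, and the 1 to 11 range are hypothetical.

```python
from statistics import mean

# Hypothetical statements with judge-assigned scale values
# (1 = most unfavourable ... 11 = most favourable)
SCALE_VALUES = {
    "S1": 1.5,
    "S2": 3.0,
    "S3": 5.5,
    "S4": 7.0,
    "S5": 9.5,
}


def thurstone_score(agreed_statements, scale_values=SCALE_VALUES):
    """Attitude score = mean scale value of the statements the respondent agrees with."""
    if not agreed_statements:
        raise ValueError("respondent agreed with no statements")
    return mean(scale_values[s] for s in agreed_statements)


print(thurstone_score(["S2", "S4"]))  # (3.0 + 7.0) / 2 = 5.0
```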


Measurement Accuracy

The true score model provides a framework for understanding the accuracy of measurement.

$X_O = X_T + X_S + X_R$

where

$X_O$ = the observed score or measurement
$X_T$ = the true score of the characteristic
$X_S$ = systematic error
$X_R$ = random error
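A small simulation makes the difference between the two error terms visible. It assumes, for illustration only, that $X_S$ is a constant bias shared by all respondents and $X_R$ is normally distributed with mean zero; the true scores and error sizes are made up.

```python
import random
from statistics import mean


def observed_scores(true_scores, systematic_error, random_sd, seed=0):
    """Simulate X_O = X_T + X_S + X_R for each respondent.

    systematic_error (X_S) is the same constant bias for everyone;
    random error (X_R) is drawn from a normal distribution with mean 0.
    """
    rng = random.Random(seed)
    return [x_t + systematic_error + rng.gauss(0.0, random_sd) for x_t in true_scores]


true_scores = [50.0] * 10_000  # hypothetical: every respondent's true score is 50
observed = observed_scores(true_scores, systematic_error=2.0, random_sd=5.0)
average_error = mean(o - t for o, t in zip(observed, true_scores))
print(round(average_error, 1))  # close to 2.0: random error averages out, systematic error does not
```

Setting both `systematic_error` and `random_sd` to zero reproduces the true scores exactly, which is the perfect-measurement case.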


Potential Sources of Error on Measurement

1) Other relatively stable characteristics of the individual that influence the test score, such as intelligence, social desirability, and education.

2) Short-term or transient personal factors, such as health, emotions, and fatigue.

3) Situational factors, such as the presence of other people, noise, and distractions.

4) Sampling of items included in the scale: addition, deletion, or changes in the scale items.

5) Lack of clarity of the scale, including the instructions or the items themselves.

6) Mechanical factors, such as poor printing, overcrowding items in the questionnaire, and poor design.

7) Administration of the scale, such as differences among interviewers.

8) Analysis factors, such as differences in scoring and statistical analysis.


Criteria for Evaluating Measurement

The criteria for evaluating measurements are:

Reliability

Validity

Sensitivity

Generalizability

Relevance


Reliability

The degree to which measures are free from random error and therefore yield consistent results across time or situations. Perfect reliability requires that there is no random error:

$X_R = 0$


Validity

The ability of a scale to measure what it was intended to measure. Perfect validity requires that there is no measurement error, either systematic or random:

$X_R = 0, \quad X_S = 0$


Relationship between Validity and Reliability

If a measure is perfectly valid, it is also perfectly reliable.

However, if a measure is perfectly reliable, it may or may not be perfectly valid, because systematic error ($X_S$) may still be present.

If a measure is unreliable, it cannot be perfectly valid, because random error ($X_R$) is present.

Reliability is a necessary but not a sufficient condition for validity.


THE GOAL OF MEASUREMENT: VALIDITY AND RELIABILITY


Reliability and Validity on Target

[Figure: three rifle targets]
Target A (old rifle): low reliability
Target B (new rifle): high reliability
Target C (new rifle, sunglare): reliable but not valid


[Diagram: RELIABILITY, with the labels test-retest, stability, equivalent forms, split-half, internal consistency, "repeatability", and "of index measures"]



Types of Reliability

There are two dimensions of reliability: repeatability and internal consistency.

If the results of the research are the same even when it is conducted a second or third time, this confirms the repeatability aspect.

Test-Retest Method: An approach for assessing reliability in which respondents are administered identical sets of scale items at two different times under as nearly equivalent conditions as possible. This measures repeatability, since the same scale or measure is administered to the same set of respondents at two separate points in time. If the measure is stable over time, it should obtain similar results (for example, 40% satisfied with their jobs both times). However, it is difficult to locate all respondents for the second round, their attitudes may change over time, or the first measurement may sensitize the respondents.
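"Similar results" is usually quantified by correlating each respondent's score at the first administration with their score at the second; a correlation near 1 indicates a stable measure. A sketch with hypothetical job-satisfaction scores for six respondents:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six respondents at two administrations
time_1 = [4, 2, 5, 3, 4, 1]
time_2 = [4, 3, 5, 3, 4, 2]
print(round(pearson_r(time_1, time_2), 2))  # 0.97
```

The same calculation applies to the equivalent forms method, with the two forms' scores in place of the two administrations.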


Equivalent Forms Method

An approach to assess reliability that requires two equivalent forms of the scale to be constructed and administered to the same respondents at two different times.

However, it is difficult, time-consuming, and expensive to construct two equivalent forms of a scale.


Internal Consistency

This measure of reliability focuses on the internal consistency of the set of items forming the scale.

It is used to assess the reliability of a summated scale, where several items are summed to form a total score. Each item measures some aspect of the construct, and the items should be consistent in what they indicate about the characteristic.


Split-Half Method

Split-half method: A method of measuring internal consistency reliability in which the items constituting the scale are divided into two halves and the resulting half scores are correlated. High correlation indicates high internal consistency. However, the results will depend on how the scale items are split.

Coefficient alpha: A measure of internal consistency reliability that is the average of all possible split-half coefficients resulting from different splittings of the scale items.
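A sketch of both measures on a hypothetical four-item summated scale. The split-half correlation uses one particular split (odd versus even items), and the Spearman-Brown correction, a standard adjustment not mentioned on the slide, estimates the reliability of the full-length scale. Coefficient alpha is computed with the usual variance-based formula rather than by literally averaging every possible split.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half(rows):
    """Correlate odd-numbered items with even-numbered items (one possible split)."""
    first = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson_r(first, second)
    # Spearman-Brown correction: estimated reliability of the full-length scale
    return r, 2 * r / (1 + r)


def cronbach_alpha(rows):
    """Coefficient alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a four-item, 1-5 summated scale
answers = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
r, corrected = split_half(answers)
print(round(r, 2), round(corrected, 2), round(cronbach_alpha(answers), 2))  # 0.88 0.94 0.96
```

Choosing a different split (for example, items 1 and 2 against items 3 and 4) would give a different split-half value, which is the dependence on the split that the slide warns about.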


Some multi-item scales include several sets of items measuring different dimensions of a multidimensional construct. Since these dimensions are independent, a measure of internal consistency computed across dimensions would be inappropriate, so internal consistency reliability is computed separately for each dimension.

Store image is a multidimensional construct that includes:

--- quality of goods
--- variety of goods
--- returns policy
--- service
--- price
--- location
--- layout
--- billing and credit policy
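Computing alpha per dimension just means grouping the item columns by dimension first. A sketch with hypothetical store-image data: two items on quality of goods and two on price. In this made-up data the two dimensions move in opposite directions, so pooling all four items gives a meaningless (negative) alpha even though each dimension is internally consistent.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for one set of items (rows = respondents, columns = items)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var_sum = sum(variance([row[j] for row in rows]) for j in range(k))
    return k / (k - 1) * (1 - item_var_sum / variance([sum(row) for row in rows]))


def alpha_by_dimension(rows, dimensions):
    """Compute alpha separately for each dimension's items (given as column indexes)."""
    return {
        name: cronbach_alpha([[row[j] for j in cols] for row in rows])
        for name, cols in dimensions.items()
    }


# Hypothetical answers: columns 0-1 rate quality of goods, columns 2-3 rate price
answers = [
    [5, 5, 1, 2],
    [4, 4, 2, 2],
    [2, 3, 4, 5],
    [1, 1, 5, 4],
    [3, 3, 3, 3],
]
dimensions = {"quality of goods": [0, 1], "price": [2, 3]}
print({name: round(a, 2) for name, a in alpha_by_dimension(answers, dimensions).items()})
# {'quality of goods': 0.98, 'price': 0.91}
print(round(cronbach_alpha(answers), 2))  # -7.79: pooling items across dimensions is misleading
```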


[Diagram: VALIDITY, with the labels face or content, criterion (concurrent, predictive), and construct]

Face: Professional agreement that the scale logically appears valid (subjective).

Content: Depends on established theories for support (objective).

Criterion: Does the measure fit or correlate with other similar measures or constructs? For example, body fat can be measured with calipers, water displacement, electrical impedance, or BMI.
   Concurrent: two measures taken at the same time.
   Predictive: two measures taken at different times.

Construct: Confirmed with a network of hypotheses.
   Convergent validity: high relationship with similar concepts.
   Divergent or discriminant validity: low relationship with dissimilar concepts.


Face Validity

Face validity: Subjective agreement among professionals that a scale logically appears to accurately measure what it is intended to measure. It is the weakest form of validity, since it involves no analysis.

Face validity is concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information the researchers are attempting to obtain? Does it seem well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support.


Content Validity

Content validity is based on the extent to which a measurement reflects the specific intended domain of content. Suppose researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers tested only for multiplication and then drew conclusions from that survey, their study would not show content validity, because it excludes other mathematical functions.

To measure the adequacy of facilities in schools:

Not relevant variables: attractiveness of the school name, frequency of old students' meetings, eatables in the canteen.

Relevant variables: number of classrooms, number of qualified teachers, playground, library.


Criterion-Related Validity

Criterion-related validity, also referred to as instrumental validity, is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure that has been demonstrated to be valid.

For example, imagine a hands-on driving test has been shown to be an accurate test of driving skills. By comparing the scores on the written driving test with the scores from the hands-on driving test, the written test can be validated using a criterion-related strategy in which the hands-on driving test is compared to the written test.

The new measure should correlate with the criterion measure.


Predictive Validity

Predictive validity: A type of criterion validity whereby a new measure correlates with a criterion measure administered at a later time. In order for a test to be a valid screening device for some future behaviour, it must have predictive validity. The SAT is used by college screening committees as one way to predict college grades. The GMAT is used to predict success in business school; its usefulness for that purpose depends on its predictive validity.

We determine predictive validity by computing a correlation coefficient comparing, for example, SAT scores (the new measure) and college grades (the criterion). If they are directly related, then we can make a prediction regarding college grades based on SAT score. We can show that students who score high on the SAT tend to receive high grades in college.
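The two steps in the paragraph above, correlating the new measure with the later criterion and then predicting the criterion from the measure, can be sketched with a Pearson correlation and a least-squares line. The SAT scores and grade point averages are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, used to predict the criterion."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical data: SAT scores (new measure) and later college GPA (criterion)
sat = [1100, 1200, 1300, 1400, 1500]
gpa = [2.8, 3.0, 3.3, 3.5, 3.8]

r = pearson_r(sat, gpa)
intercept, slope = fit_line(sat, gpa)
print(round(r, 3))                         # 0.998
print(round(intercept + slope * 1350, 2))  # predicted GPA for an SAT score of 1350
```

Real predictive-validity coefficients are far lower than this tidy example; the point is only the procedure.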


Construct Validity

Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.

Construct validity can be broken down into two subcategories: convergent validity and discriminant validity. Convergent validity is the actual general agreement among ratings where measures should be theoretically related.

Discriminant validity is the lack of a relationship among measures that theoretically should not be related.


To measure: tendency to stay in low-cost hotels

Related to four personality variables: high level of self-confidence, low need for status, low need for distinctiveness, and high level of adaptability.

Not related to: brand loyalty and high level of aggressiveness.

The scale can be said to have construct validity if it correlates highly with other measures of the tendency to stay in low-cost hotels, such as reported hotels patronised and social class (convergent validity),

and has low correlation with the unrelated constructs of brand loyalty and high level of aggressiveness (divergent validity).
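One simple way to check the convergent/divergent pattern is to correlate the new scale with each comparison measure and require every convergent correlation to exceed every divergent one in size. That comparison rule is a heuristic assumed here for illustration, not a formal test, and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def construct_check(scale, convergent, discriminant):
    """Return correlations and whether every convergent r exceeds every |divergent r|."""
    conv = {name: pearson_r(scale, m) for name, m in convergent.items()}
    disc = {name: pearson_r(scale, m) for name, m in discriminant.items()}
    ok = min(conv.values()) > max(abs(r) for r in disc.values())
    return conv, disc, ok


# Hypothetical scores for six respondents
low_cost_hotel_scale = [2, 4, 5, 7, 8, 10]
convergent = {"hotels patronised": [1, 3, 5, 6, 8, 9]}
discriminant = {"brand loyalty": [5, 3, 6, 4, 5, 4]}

conv, disc, ok = construct_check(low_cost_hotel_scale, convergent, discriminant)
print({k: round(v, 2) for k, v in conv.items()})  # {'hotels patronised': 0.99}
print({k: round(v, 2) for k, v in disc.items()})  # {'brand loyalty': -0.13}
print(ok)  # True
```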


SENSITIVITY

A measurement instrument's ability to accurately measure variability in stimuli or responses.

Two-category responses such as yes/no or agree/disagree are not very sensitive.

Categories such as strongly agree, mildly agree, indifferent, mildly disagree, and strongly disagree increase a scale's sensitivity.


Generalizability

It is the degree to which a study based on a sample applies to a universe of generalization.

The universe of generalization includes the set of all conditions of measurement: items, interviewers, modes of data collection, etc.

For example, to generalize a scale developed for personal interviews to other modes of data collection, such as mail and telephone.

Or to generalize from a sample of items to the universe of items.


Relevance

It represents the appropriateness of using a particular scale for measuring a variable:

$\text{Relevance} = \text{Reliability} \times \text{Validity}$

If either reliability or validity is low, then the scale will have little relevance.

If the correlation coefficient is used to analyse both reliability and validity, then, with both coefficients between 0 and 1, the scale can have relevance from 0 to 1.
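The product rule is easy to illustrate: because the two coefficients multiply, a weak value on either side pulls relevance down no matter how strong the other is. A sketch assuming both coefficients are correlations between 0 and 1; the example values are hypothetical.

```python
def relevance(reliability: float, validity: float) -> float:
    """Relevance = reliability x validity, with both coefficients taken as 0-1 correlations."""
    for name, value in (("reliability", reliability), ("validity", validity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return reliability * validity


print(relevance(0.9, 0.5))  # 0.45: high reliability cannot compensate for low validity
print(relevance(0.9, 0.9))
```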
