Research methods: reliability and validity

Validity! We need to find out if our research is sound. Do our tests measure what they claim to measure?

Are the techniques used to collect data in tests, questionnaires, interviews and observations measuring what is claimed? For example, was the Strange Situation really measuring attachment style?

We need to be able to measure or observe something time after time and produce the same or similar results.

I want to measure intelligence. If the same person sits the test on several occasions and the results change each time, then that test lacks reliability.

The test also arguably lacks validity, because the scores are meaningless.

If I test my participants again several months later and their scores remain consistent, I can say the test is reliable, but it might still lack validity.

Is an A level in Psychology a valid and reliable assessment of your performance in Psychology?

This measures consistency from one occasion to another: the same result should be found on different days, in different labs, observations or interviews, and by different researchers.

I exposed these teenage brain cells to 1000 PowerPoint slides last Monday and they’re all dead.

I thought that was a fluke but they seem to be shrivelling after only five minutes!

Participants take the same test on different occasions. A high correlation between the test scores indicates the test has good external reliability.
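The test-retest idea can be sketched in a few lines of Python. This is only an illustration: the scores are invented, and `pearson_r` is a helper written for this example (Pearson's correlation coefficient is assumed as the measure of correlation, which the slides do not specify).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six participants on two occasions.
january = [102, 95, 110, 88, 120, 99]
june = [100, 97, 108, 90, 118, 101]

r = pearson_r(january, june)
print(f"Test-retest correlation: r = {r:.2f}")
```

A value of r close to +1 would suggest the test gives consistent results across the two occasions; a value near 0 would suggest it does not.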

Timing is crucial. Why?

January June

I hope that’s the right answer this time.

This refers to the consistency of a researcher’s behaviour. A researcher should produce similar test results, make similar observations, or carry out interviews in the same way on more than one occasion.

Thanks for taking part today. Any problems and I’ll be right over. Take your time.

Right. Let’s get on. Fast as you can.

How much longer before I can get in the pub and relax my facial muscles?

In observational studies this is known as inter-observer reliability: observers have to agree on what they see and carry out the same procedure.

Consistency between different researchers working on the same study is very important for reliability.

There should be a high positive correlation between the scores of different observers.

Improving Researcher Reliability

1. Increase reliability by standardising instructions
2. Carry out a pilot study to improve procedures and materials
3. You will be thoroughly trained in the use of materials and procedures prior to our study taking place

This measures the extent to which a test or procedure is consistent within itself, i.e. questionnaire items or questions in an interview should all be measuring the same thing.

Do you like to keep to deadlines?
Do you get impatient driving?
Do you like cheese?
Do you like doing several tasks at once?
Do you like chocolate?
Do you get easily irritated?
Are you competitive?

This interviewer seems a little confused about Type A personality traits.

Compares a participant’s performance on two halves of a test or questionnaire – there should be a close correlation between scores on both halves of the test.

Questions in both halves should be of equal quality for good internal reliability.

Odds/Evens Top/Bottom
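The odds/evens and top/bottom splits can be sketched as follows. The questionnaire answers are invented for illustration, and `pearson_r` is a helper written for this example; Pearson's coefficient is an assumed choice of correlation measure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: each row is one participant's answers to an
# 8-item questionnaire (1 = strongly disagree ... 5 = strongly agree).
responses = [
    [4, 5, 4, 4, 5, 4, 5, 5],
    [2, 1, 2, 2, 1, 2, 1, 2],
    [3, 3, 4, 3, 3, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5, 4],
    [1, 2, 1, 2, 2, 1, 1, 2],
]

# Odds/evens split: items 1, 3, 5, 7 versus items 2, 4, 6, 8.
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

# Top/bottom split: first four items versus last four items.
top_totals = [sum(row[:4]) for row in responses]
bottom_totals = [sum(row[4:]) for row in responses]

r_odd_even = pearson_r(odd_totals, even_totals)
r_top_bottom = pearson_r(top_totals, bottom_totals)
print(f"Odds/evens:  r = {r_odd_even:.2f}")
print(f"Top/bottom:  r = {r_top_bottom:.2f}")
```

If the two halves correlate closely, the items are probably measuring the same thing; a weak correlation suggests some items (the cheese and chocolate kind) do not belong.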

Would you see this as bullying or horseplay in the playground?

You would see this from your own subjective viewpoint: we’re biased by experience and expectation.

Observers must agree about what they are observing: they need to use standardised behavioural categories.
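One simple way to check agreement on behavioural categories is percentage agreement, sketched below. This is an assumed, simplified measure chosen for illustration (the slides themselves describe a correlation between observers' scores), and the category codes are invented.

```python
# Hypothetical codes that two observers assigned to the same ten
# playground incidents, using two agreed behavioural categories.
observer_a = ["bullying", "horseplay", "horseplay", "bullying", "horseplay",
              "bullying", "horseplay", "horseplay", "bullying", "horseplay"]
observer_b = ["bullying", "horseplay", "bullying", "bullying", "horseplay",
              "bullying", "horseplay", "horseplay", "horseplay", "horseplay"]

# Count the incidents on which both observers chose the same category.
agreements = sum(a == b for a, b in zip(observer_a, observer_b))
percent_agreement = 100 * agreements / len(observer_a)

print(f"Observers agreed on {agreements} of {len(observer_a)} incidents "
      f"({percent_agreement:.0f}%)")
```

Low agreement would suggest the categories need tighter definitions or the observers need more training before the study goes ahead.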

Measuring Reliability

Match the method of estimating reliability to the description.

Test-Retest reliability

If the measure depends upon interpretation of behaviour, we can compare the results from two or more raters.

If the results in the two halves are similar, we can assume the test is reliable

Split-Half reliability

Splitting a test into two halves, and comparing the scores in both halves

If the results on the two tests are similar, we can assume the test is reliable

Inter-Rater reliability

The measure is administered to the same group of people twice

If there is high agreement between the raters, the measure is reliable

Internal validity = The tool is measuring what it is intended to measure

External validity = The findings can be generalised beyond the context of the research situation

Assessing and Improving Internal Validity

Does our measuring tool appear to be doing what it should?

Face validity: One or more judges assess whether the test seems appropriate and suggest changes if necessary.

Improving Validity

Improving Internal Validity

Does the content of a test cover everything in the area of interest?

Content validity: More rigorous. Experts in the field systematically examine the tool’s components and compare them with set standards. They have to agree the content is appropriate.

Improving internal validity

• Single blind procedure - reduces demand characteristics

• Double blind procedure ….

Population Validity

Can we generalise findings from our research participants to other population groups?

Ecological Validity

Can we apply our findings to other contexts and situations outside of the research setting?

Improving external validity

• Sample must be representative of target population and be unbiased…..

• Research situation must reflect real life situation e.g. debate over Milgram….Strange Situation
