Post on 19-Dec-2015

Reliability and Validity

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts

Lecture Outline:

•Definition of Terms

•Types of Validity

•Threats to Validity

•Types of Reliability

•Threats to Reliability

•Introduction to Measurement Error.

Commonly used terms…

“She has a valid point”

“My car is unreliable”

…in science…

“The conclusion of the study was not valid”

“The findings of the study were not reliable”.

Some definitions…

• Validity

“The soundness or appropriateness of a test or instrument in measuring what it is designed to measure”

(Vincent 1999)

“Degree to which a test or instrument measures what it purports to measure”

(Thomas & Nelson 1996)

• Reliability

“…the degree to which a test or measure produces the same scores when applied in the same circumstances…”

(Nelson 1997)

• Objectivity

“…the degree to which different observers agree on measurements…”

(Atkinson & Nevill 1998)

Types of Experimental Validity

• Internal

– Is the experimenter measuring the effect of the independent variable on the dependent variable?

• External

– Can the results be generalised to the wider population?

Validity

• Logical: Face, Content

• Statistical (AKA Criterion): Concurrent, Predictive

• Construct (Logical/Statistical)

Reliability: Consistency, Objectivity

Logical Validity

• Face Validity

– Infers that a test is valid by definition

– It is clear that the test measures what it is supposed to

e.g. If you want to assess reaction time, measuring how long it takes an individual to react to a given stimulus would have face validity

Externally Valid?

Assessing face validity is therefore a subjective process.

i.e. Would assessing 15 m sprint time be a valid means of assessing reaction time?

Logical Validity

• Content Validity

– Infers that the test measures all aspects contributing to the variable of interest

…also a subjective process.

e.g. Who is the most physically fit?

VO2 max test?

Wingate test?

1 RM?

Overall:

A logically valid test simply appears to measure the right variable in its entirety?

Statistical Validity

• Concurrent Validity

– Infers that the test produces similar results to a previously validated test

e.g. VO2 max

Incremental Treadmill Protocol with expired gas analysis versus Multi-Stage Fitness (Beep) Test

Statistical Validity

• Predictive Validity

– Infers that the test provides a valid reflection of future performance using a similar test

e.g. Can performance during test A be used to predict future performance in test B?

http://www.youtube.com/watch?v=vdPQ3QxDZ1s

Overall:

A statistically valid test produces results that agree with other similar tests?

Logical/Statistical Validity

• Construct Validity

– Infers not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically

– Therefore relates to hypothetical or intangible constructs

e.g. Team Rivalry, Sportsmanship.

– This makes assessment difficult, i.e. if what should exist cannot be detected, this could mean:

a) Test Invalid?

b) Theory Incorrect?

c) Sensitivity/Specificity Issues?

Interesting Example: Breast Cancer

• Incidence: ~1 % (0.8 %) (i.e. approximately 1 in every 100 women tested actually has breast cancer)

• Sensitivity: ~90 % (87 %) (the mammogram is sensitive enough that approximately 90 in every 100 breast cancer patients will receive a positive result)

• Specificity: ~90 % (93 %) (the mammogram is specific enough that approximately 90 in every 100 healthy patients will receive a negative result).

Data from Kerlikowske et al. (1996)

Quick Test

• What is the probability that a patient receiving a positive result actually has breast cancer?
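One way to work through the quick test is Bayes' theorem. The sketch below is illustrative and not part of the original slides: the function name `positive_predictive_value` is hypothetical, and it assumes the quoted incidence can be treated as the proportion of tested women who have the disease.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive result (Bayes' theorem)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Rounded figures from the slide (~1 %, ~90 %, ~90 %)
rounded = positive_predictive_value(0.01, 0.90, 0.90)
# Bracketed figures from the slide (0.8 %, 87 %, 93 %)
reported = positive_predictive_value(0.008, 0.87, 0.93)

print(f"Rounded figures:  {rounded:.1%}")
print(f"Reported figures: {reported:.1%}")
```

With either set of figures, fewer than 1 in 10 positive results corresponds to an actual case, because false positives from the much larger healthy group outnumber the true positives.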

Threats to Validity (and possible solutions?)

Threats to Internal Validity

• Maturation

– Changes in the DV over time irrespective of the IV

e.g. One Group Pre-test Post-test

O1  T  O2

Threats to Internal Validity

• Maturation (possible solution)

Time series

O1  O2  O3  T  O4  O5  O6

Threats to Internal Validity

• Maturation (possible solution)

Pre-test Post-test Randomised Group Comparison

R  O1  T  O2
R  O3  P  O4

n.b. RCT

Threats to Internal Validity

• Maturation (possible solution)

Repeated measures designs can occasionally be an inappropriate solution, even when randomised and counterbalanced

e.g.

Muscle Damage (repeated bout effect)

Vitamin Supplementation (wash-out period)

In which case independent measures designs could be used.

Threats to Internal Validity

• History

– Unplanned events between measurements

O1  T  O2

e.g. exercise?

Therefore, solution = control extraneous variables!

Threats to Internal/External Validity

• Pre-testing

– Interactive effects due to the pre-test (e.g. learning, sensitisation, etc.)

– Also influences External Validity

e.g.

R  O1  T  O2
R  O3  P  O4

Assessing muscle mass here (at the pre-tests O1 and O3) could make them train harder in both trials…

…but then respond better to the T than the P…

…so it is actually T+O1 that is better than P, not T alone.

Threats to Internal/External Validity

• Pre-testing (possible solution)

Solomon Four-Group Design

R  O1  T  O2
R  O3  P  O4
R      T  O5
R      P  O6

Threats to Internal Validity

• Statistical Regression

– AKA regression to the mean

– An initial extreme score is likely to be followed by less extreme subsequent scores

e.g. Training has the greatest effect on untrained individuals.

e.g. Sophomore Slump & SI ‘Cover Jinx’

Therefore, solution = effective sampling.
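Regression to the mean can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything from the lecture: each score is modelled as a stable true ability plus independent test-day noise, and the means, spreads and sample size are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical model: score = stable true ability + random test-day noise
true_ability = [rng.gauss(50, 5) for _ in range(10_000)]
test_1 = [a + rng.gauss(0, 5) for a in true_ability]
test_2 = [a + rng.gauss(0, 5) for a in true_ability]

# Select the most extreme 10 % of scorers on the first test
cutoff = sorted(test_1)[int(0.9 * len(test_1))]
extreme = [i for i, score in enumerate(test_1) if score >= cutoff]

mean_1 = sum(test_1[i] for i in extreme) / len(extreme)
mean_2 = sum(test_2[i] for i in extreme) / len(extreme)
population_mean = sum(test_1) / len(test_1)

print(f"Top scorers, test 1: {mean_1:.1f}")
print(f"Same people, test 2: {mean_2:.1f}")
print(f"Whole-group mean:    {population_mean:.1f}")
```

No treatment is applied between the two tests, yet the group selected for extreme first scores moves back towards the whole-group mean on the retest. A study that samples only extreme scorers could mistake this shift for a treatment effect.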

Threats to Internal Validity

• Instrumentation

– A difference in the way 2 comparable variables were measured

e.g.

Uncalibrated equipment

Therefore, solution = calibrate!

Threats to Internal Validity

• Selection Bias

– The groups for comparison are not equivalent

e.g. Groups not randomly assigned

Static Group Comparison

T  O1
P  Oa

i.e. Group T were resistance trained to start with

Threats to Internal Validity

• Selection Bias (possible solution)

T  O1
P  Oa

Either:

-Randomise group assignment,

-Pre-test and post-test difference,

-Repeated Measures Design.
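The first option, randomised group assignment, could be sketched as follows. The helper name `randomise_groups` and the participant IDs are hypothetical, and a real study might stratify or block the allocation instead of using a single shuffle.

```python
import random

def randomise_groups(participants, seed=None):
    """Shuffle participants and split them into treatment (T) and placebo (P)."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"T": shuffled[:half], "P": shuffled[half:]}

participants = [f"subject_{i}" for i in range(1, 9)]  # hypothetical IDs
groups = randomise_groups(participants, seed=1)
print(groups)
```

Because allocation depends only on the shuffle, prior characteristics such as training status cannot systematically favour one group, although chance imbalances remain possible in small samples.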

Threats to Internal/External Validity

• Experimental Mortality

– Missing Data due to subject drop-out

– Reduced n = reduced statistical Power

– Not only challenges quality of data gathered (Internal Validity) but also our ability to generalise (External Validity).

Therefore, solution = recruit sufficient participants (young?)
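The link between drop-out and statistical power can be made concrete with a rough calculation. The sketch below uses a normal approximation for a two-sided comparison of two group means; the function name `approx_power`, the effect size of 0.5 and the group sizes are all hypothetical choices for illustration.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic for a standardised effect with equal group sizes
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

recruited = approx_power(effect_size=0.5, n_per_group=64)
after_dropout = approx_power(effect_size=0.5, n_per_group=48)

print(f"Power with 64 per group: {recruited:.2f}")
print(f"Power with 48 per group: {after_dropout:.2f}")
```

Under these assumptions, losing a quarter of each group reduces power from roughly 0.8 to roughly 0.7, which is one reason to recruit more participants than the minimum required.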

Threats to External Validity

• Inadequate description

– 5th characteristic of research… should be replicable

If nobody can replicate the methods of a given study, then it is irrefutable and therefore lacks external validity.

Therefore, solution = comprehensive methodology

Threats to External Validity

• Biased sampling

– Linked to statistical regression

– Sample does not reflect target population

– n ≠ N

e.g. Results generalised across gender

Therefore, solution = random sample (of target population).

Threats to External Validity

• Hawthorne Effect

– DV is influenced by the fact that it is being recorded

e.g.

Fastest sprint when professor enters lab

Therefore, solution = control the lab environment.

Threats to External Validity

• Demand Characteristics

– Participants detect the purpose of the study and behave accordingly

e.g. CHO versus H2O

Sports Science students already know that the carbohydrate drink is supposedly superior

Therefore, solution = double or single blinding.

Threats to External Validity

• Operationalisation

– AKA Ecological Validity

– The DV must have some relevance in the ‘real world’

e.g. TTE has no Olympic equivalent

Therefore, solution = choose your DV carefully.

Reliability

• Reliability is a pre-requisite of validity

e.g. Direct versus Indirect measures of VO2 max

Direct: -Gold Standard (i.e. valid and reliable) -Expensive -Complex

Indirect: -Predictive -Cheap -Easy

Reliability

Subject 1 60 ml.kg-1.min-1 60 ml.kg-1.min-1 60 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 55 ml.kg-1.min-1 55 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 70 ml.kg-1.min-1 70 ml.kg-1.min-1

Valid and Reliable

Reliability

Subject 1 60 ml.kg-1.min-1 65 ml.kg-1.min-1 65 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 60 ml.kg-1.min-1 60 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 75 ml.kg-1.min-1 75 ml.kg-1.min-1

Not Valid but Reliable

5 ml.kg-1.min-1 correction?

Reliability

Subject 1 60 ml.kg-1.min-1 72 ml.kg-1.min-1 57 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 61 ml.kg-1.min-1 52 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 40 ml.kg-1.min-1 84 ml.kg-1.min-1

Not Valid and not Reliable

i.e. a test can never be valid without being reliable?
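The three tables above can be summarised numerically. The slides do not label the columns, so this sketch assumes that the first column is a criterion value and the second column is one trial of the test being evaluated; the helper `error_summary` is hypothetical.

```python
from statistics import mean, stdev

# Assumption: column 1 is the criterion value and column 2 is one trial of
# the test being evaluated (the slides do not label the columns).
criterion = [60, 55, 70]
tables = {
    "Valid and Reliable": [60, 55, 70],
    "Not Valid but Reliable": [65, 60, 75],
    "Not Valid and not Reliable": [72, 61, 40],
}

def error_summary(measured, reference):
    """Return (systematic bias, spread of the differences) in ml.kg-1.min-1."""
    differences = [m - r for m, r in zip(measured, reference)]
    return mean(differences), stdev(differences)

summary = {label: error_summary(values, criterion)
           for label, values in tables.items()}

for label, (bias, spread) in summary.items():
    print(f"{label}: bias = {bias:+.1f}, spread = {spread:.1f}")
```

A constant bias with no spread, as in the second table, is the case where a fixed 5 ml.kg-1.min-1 correction would restore agreement with the criterion. A large spread, as in the third table, cannot be removed by any single correction.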

Types of Reliability

• Relative

• Absolute

• Rater reliability (Objectivity)

– Intrarater reliability

– Interrater reliability.

Relative Reliability

Subject 1 60 ml.kg-1.min-1 63 ml.kg-1.min-1 57 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 56 ml.kg-1.min-1 48 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 65 ml.kg-1.min-1 66 ml.kg-1.min-1

Relatively Reliable

i.e. Individuals maintain position in the group

Absolute Reliability

Subject 1 60 ml.kg-1.min-1 63 ml.kg-1.min-1 57 ml.kg-1.min-1

Subject 2 55 ml.kg-1.min-1 56 ml.kg-1.min-1 48 ml.kg-1.min-1

Subject 3 70 ml.kg-1.min-1 65 ml.kg-1.min-1 66 ml.kg-1.min-1

Not Absolutely Reliable

i.e. Test-Retest within individuals
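The difference between the two ideas can be checked directly on the table above. This is a minimal sketch that assumes the three columns are repeated trials of the same test: relative reliability is taken as preserved rank order, and absolute reliability as the change within each individual.

```python
# Values from the table above; columns are assumed to be repeated trials.
scores = {
    "Subject 1": [60, 63, 57],
    "Subject 2": [55, 56, 48],
    "Subject 3": [70, 65, 66],
}

def ranking(trial_index):
    """Subjects ordered from highest to lowest score in one trial."""
    return sorted(scores, key=lambda s: scores[s][trial_index], reverse=True)

# Relative reliability: does everyone keep the same position in the group?
rankings = [ranking(i) for i in range(3)]
relative_reliable = all(r == rankings[0] for r in rankings)

# Absolute reliability: how much does each individual change between trials?
within_subject_range = {s: max(v) - min(v) for s, v in scores.items()}

print("Rank order preserved:", relative_reliable)
print("Within-subject range:", within_subject_range)
```

The rank order is identical in all three trials, yet each subject's own scores vary by 5 to 8 ml.kg-1.min-1, which is why the same data are relatively but not absolutely reliable.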

Rater Reliability

• Intrarater reliability

– The consistency of a given observer or measurement tool on more than one occasion

• Interrater reliability

– The consistency of a given measurement from more than one observer or measurement tool

e.g.

Score for the American Gymnast

British Judge = 9.9

French Judge = 4.4

Japanese Judge = 7.0
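A simple way to quantify the disagreement between the judges is to look at the spread of their scores. The sketch below only uses the three scores from the slide; treating range and standard deviation as the agreement measures is an illustrative choice, not the lecture's stated method.

```python
from statistics import mean, stdev

judge_scores = {"British": 9.9, "French": 4.4, "Japanese": 7.0}
values = list(judge_scores.values())

# A wide range or large SD indicates poor agreement between observers
score_range = max(values) - min(values)

print(f"Mean score: {mean(values):.1f}")
print(f"Range:      {score_range:.1f}")
print(f"SD:         {stdev(values):.2f}")
```

The highest and lowest scores for the same routine differ by 5.5 points, so the measurement depends heavily on who is observing, which is the sense in which objectivity is low.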

Threats to Reliability

• Fatigue

            8 am              9 am              10 am
Subject 1   60 ml.kg-1.min-1  55 ml.kg-1.min-1  50 ml.kg-1.min-1

Therefore, solution = increase time between tests.

Threats to Reliability

• Habituation

Subject 1 60 ml.kg-1.min-1 65 ml.kg-1.min-1 70 ml.kg-1.min-1

Therefore, solution = familiarise prior to test.

Threats to Reliability

• Standardisation of Procedures

– Control of extraneous variables

• Precision of Measurements

– i.e. if we are happy to measure VO2 max to the nearest 10 ml.kg-1.min-1, then it could probably be reliably predicted from your training volume and age.

Measurement Errors

• Ultimately, reliability is dependent on the degree of measurement error in a given study

• The overall error in any measurement is comprised of both systematic and random error

• We will address measurement error further next week…

Literature Search Assignment

• The handout lists 8 questions which can be answered through retrieving the corresponding source articles

• Answer as many as possible and bring them to next week’s lecture

• DO NOT contact author or order articles.

Selected Reading

• Atkinson, G. and A. M. Nevill. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine. 26:217-238, 1998.

• Holmes, T. H. Ten categories of statistical errors: a guide for research in endocrinology and metabolism. American Journal of Physiology. 286: E495-501.

• Thomas, J. R. and J. K. Nelson. Research Methods in Physical Activity, 4th edition. Champaign, Illinois: Human Kinetics, 2001.

J.Betts@bath.ac.uk
