Validity and reliability: language testing
www.ou.edu.vn
HCMC OPEN UNIVERSITY GRADUATE SCHOOL
MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY
TEST CONSIDERATIONS: RELIABILITY & VALIDITY
Presenters: Group 5
Lý Tuấn Phú
Đặng Kiều Anh
Nguyễn Duy Cường
Nguyễn Thị Kim Loan
Mai Xuân Ái
Trần Thị Kim Ngân
July 2013, Ho Chi Minh City
Content
I. Reliability
- Introduction
- Factors that affect language test scores
- Classical true score measurement theory
- Generalizability theory
- Standard error of measurement: interpreting individual test scores within classical true score and generalizability theory
- Item response theory
- Reliability of criterion-referenced test scores
- Factors that affect reliability estimates
- Systematic measurement error
II. Validation/ Validity
- Introduction
- Reliability and validity revisited
- Validity as a unitary concept
- The evidential basis of validity
- Test bias
- The consequential or ethical basis of validity
- Postmortem: face validity
I. RELIABILITY
I.1. Introduction
Relationship between reliability & validity: two complementary aims of test development
(1) (Reliability) to minimize the effects of measurement error, and
(2) (Validity) to maximize the effects of the language abilities we want to measure.
Investigating reliability requires us to identify sources of error and estimate the size of their effects on test scores, which in turn requires distinguishing the effects of the language abilities we want to measure from the effects of other factors. This is a complex problem!
I.2. Factors that affect language test scores
[Figure: communicative language ability, test method facets, personal attributes, and random factors all contribute to the TEST SCORE.]
I.2. Factors that affect language test scores (cont)
Different factors will affect different individuals differently.
In designing and developing language tests, we try to minimize the effects of the following sources of measurement error on test performance:
◦ test method
◦ random factors
◦ personal attributes (these are also sources of test bias, or test invalidity)
• ‘Mean’ (x̄): the average of the scores of a given group of test takers.
• ‘Variance’ (s²): the average of the squared differences between individual scores and the group mean, i.e. how much individual scores vary from the mean.
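The two statistics can be illustrated with a small set of hypothetical scores (the numbers are invented for illustration). This sketch assumes the population form of the variance, dividing by the number of test takers; Python's `statistics.variance` would instead give the sample form, dividing by n - 1.

```python
import statistics

# Hypothetical scores for a small group of test takers (illustrative only).
scores = [10, 12, 14, 16, 18]

mean = statistics.mean(scores)           # average score of the group
variance = statistics.pvariance(scores)  # mean squared deviation from the group mean

print(f"mean = {mean}, variance = {variance}")
```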
I.3. Classical true score measurement theory
Classical true score (CTS) measurement theory consists of a set of assumptions about the relationships between actual, or observed, test scores and the factors that affect these scores.
Concept 1: True score and error score
1. An observed score on a test comprises two components: a true score (an individual's level of ability) and an error score (the effects of factors other than the ability being tested). In symbols: x = t + e, where x is the observed score, t the true score, and e the error score.
2. The relationship between true and error scores: error scores are unsystematic, or random, and are uncorrelated with true scores.
I.3. Classical true score measurement theory (cont)
Concept 2: Parallel tests
Two tests are parallel if, for every group of persons taking both tests, (1) the true score on one test is equal to the true score on the other, and (2) the error variances for the two tests are equal.
3. Reliability of observed scores
a. Reliability as the correlation between parallel tests
If the observed scores on two parallel tests are highly correlated, this indicates that the effects of the error scores are minimal, and that the observed scores can be considered reliable indicators of the ability being measured.
b. Reliability and measurement error as proportions of observed score variance
If an individual's observed score on a test is composed of a true score and an error score, then the greater the proportion of true score, the smaller the proportion of error score, and thus the more reliable the observed score.
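The two views of reliability can be checked against each other in a small simulation. This is a sketch under assumed values, not data: true scores are drawn with standard deviation 1, and each parallel form adds independent random error with standard deviation 0.5, so the expected reliability is 1 / (1 + 0.25) = 0.8. Both the correlation between the two forms and the ratio of true-score variance to observed-score variance should land close to that value.

```python
import random
import statistics


def simulate_parallel_tests(n=20000, true_sd=1.0, error_sd=0.5, seed=1):
    """Two parallel forms under CTS: x = t + e, with the same true score t on both
    forms and independent random errors of equal variance."""
    rng = random.Random(seed)
    t = [rng.gauss(0, true_sd) for _ in range(n)]
    x1 = [ti + rng.gauss(0, error_sd) for ti in t]
    x2 = [ti + rng.gauss(0, error_sd) for ti in t]
    return t, x1, x2


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


t, x1, x2 = simulate_parallel_tests()
theoretical = 1.0 / (1.0 + 0.5 ** 2)                         # var(t) / var(x) under the assumed values
ratio = statistics.pvariance(t) / statistics.pvariance(x1)   # view b: proportion of true-score variance
r_parallel = pearson(x1, x2)                                 # view a: correlation between parallel forms
print(round(theoretical, 3), round(ratio, 3), round(r_parallel, 3))
```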
Three approaches to estimating reliability (pp. 173-185):
1. Internal consistency estimates are concerned primarily with sources of error from within the test and scoring procedures.
2. Stability estimates indicate how consistent test scores are over time.
3. Equivalence estimates indicate the extent to which scores on alternate forms of a test are equivalent.
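Internal consistency can be computed from a single administration. A minimal sketch, assuming a persons-by-items score matrix, of one widely used internal consistency estimate, Cronbach's coefficient alpha (for right/wrong items it equals KR-20). The response data are hypothetical. Stability and equivalence estimates, by contrast, are typically correlations between two sets of scores (test-retest, or alternate forms).

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha from a persons-by-items score matrix.

    item_scores[p][i] is person p's score on item i.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))                   # one tuple of scores per item
    totals = [sum(person) for person in item_scores]  # each person's total score
    item_var_sum = sum(statistics.pvariance(item) for item in items)
    total_var = statistics.pvariance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical right/wrong responses: 4 test takers x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))
```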
Problems with the classical true score model:
+ The CTS model treats error variance as homogeneous in origin, i.e. as coming from a single, undifferentiated source.
+ The CTS model considers all error to be random, and consequently fails to distinguish systematic error from random error.
https://www.youtube.com/watch?v=CSI-1Zk6oeM
https://www.youtube.com/watch?v=k84ksLUWKKc
I.4. Generalizability theory (G-theory) (Cronbach and his colleagues)
G-theory constitutes both a theory and a set of procedures for specifying and estimating the relative effects of different factors on observed test scores.
=> It provides a means for relating the uses or interpretations of test scores to the way test users specify and interpret different factors, or sources of error.
I.4. Generalizability theory (G-theory) (cont)
A given measure or score is treated as a sample from a hypothetical universe of possible measures.
Interpreting a test score means generalizing from a single measure to a universe of measures: on the basis of an individual's performance on a test, we generalize to her performance in other contexts.
The more reliable the sample of performance, or test score, is, the more generalizable it is.
Reliability = generalizability
The extent of generalizability depends on:
- defining the universe of measures
- the universe of generalization
The application of G-theory to test development and use involves:
- a generalizability study (‘G-study’) and
- a decision study (‘D-study’),
which allow us to:
=> specify the different sources of variance,
=> estimate the relative importance of these different sources simultaneously, and
=> employ these estimates in the interpretation and use of test scores.
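To make the G-study/D-study distinction concrete, here is a minimal sketch for one assumed design that the slides do not specify: every person is rated by the same raters (a persons × raters crossed design, one score per cell). The G-study step estimates variance components from two-way ANOVA mean squares; the D-study step uses them to estimate a relative generalizability coefficient for a chosen number of raters. The ratings are hypothetical, and truncating negative variance estimates to zero is one common convention, not the only one.

```python
def g_study_p_by_r(scores):
    """G-study for a persons x raters crossed design with one score per cell.

    scores[p][r] is the rating person p received from rater r.
    Returns estimated variance components (persons, raters, residual), where the
    residual confounds the person-by-rater interaction with random error.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(col) / n_p for col in zip(*scores)]

    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates are set to 0.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    var_res = ms_res
    return var_p, var_r, var_res


def g_coefficient(var_p, var_res, n_raters):
    """D-study: relative generalizability coefficient for the mean of n_raters ratings."""
    return var_p / (var_p + var_res / n_raters)


# Hypothetical ratings: 4 test takers, each scored by the same 3 raters.
ratings = [
    [6, 7, 8],
    [4, 6, 5],
    [3, 4, 5],
    [1, 3, 2],
]
var_p, var_r, var_res = g_study_p_by_r(ratings)
for n in (1, 2, 3):
    print(n, "rater(s):", round(g_coefficient(var_p, var_res, n), 3))
```

The rater component (`var_r`) does not enter this relative coefficient; it would matter for an absolute (dependability) coefficient, where systematic rater severity counts as error.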
Universes of generalization and universes of measures
The universe of generalization is the domain of uses or abilities to which we want test scores to generalize.
The universe of measures is the set of types of test scores we would be willing to accept as indicators of the ability to be measured for the intended purpose.
Populations of persons
The population of persons is the group about whom we are going to make decisions or inferences. The degree of generalizability we require determines the way we define the population.
Ex: if we use test results to make decisions about only one group, that group is the population of persons.
If we use a test with more than one group (e.g. entrance or placement tests), we are generalizing beyond a particular group.
Universe score
If we could obtain measures for an individual under all the different conditions specified in the universe of possible measures, his average score on these measures might be considered the best indicator of his ability.
=> The universe score is defined as the mean of a person's scores on all measures from the universe of possible measures.
I.5. Standard error of measurement: interpreting individual test scores within classical true score and generalizability theory
The standard error of measurement is an indicator of how much we would expect an individual's test scores to vary, given a particular level of reliability.
When investigating the amount of measurement error in individual test scores, we are looking at differences between test takers' obtained scores and their true scores.
The error score is the difference between an obtained score and the true score.
The more reliable the test is, the more closely the obtained scores will cluster around the true score mean => the smaller the standard deviation of the errors.
The less reliable the test, the greater the standard deviation of the errors.
Because of its importance in the interpretation of test scores, the standard deviation of the error scores has a special name: the standard error of measurement (SEM).
=> SEM provides a means for applying estimates of reliability to the interpretation and use of individuals' observed test scores.
=> Its primary advantage: it makes test users aware of how much variability in observed scores to expect as a result of measurement error.
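Under classical true score theory the SEM is usually computed from the observed-score standard deviation and a reliability estimate, as s_x multiplied by the square root of (1 - r_xx'). The sketch below assumes a test with standard deviation 10 and reliability 0.91 (hypothetical values) and builds an approximate band of ±1.96 SEM around one observed score; reading that band as roughly 95% assumes normally distributed errors.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """Classical SEM: s_e = s_x * sqrt(1 - r_xx')."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (z = 1.96 ~ 95% if errors are normal)."""
    return observed - z * sem, observed + z * sem


# Hypothetical test: observed SD 10, reliability 0.91; one test taker scored 70.
sem = standard_error_of_measurement(sd_observed=10.0, reliability=0.91)
low, high = score_band(70.0, sem)
print(f"SEM = {sem:.2f}; band = {low:.1f} to {high:.1f}")
```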
I.6. Reliability of criterion-referenced test scores
Norm-referenced (NR) test scores: designed to maximize inter-individual score differences, or score variance.
Criterion-referenced (CR) test scores:
- provide information about an individual's relative ‘mastery’ of an ability domain
- are developed to be representative of the criterion ability
- occur in educational programs and language classrooms
- commonly come from achievement tests
Aspects of reliability in CR tests: consistency, stability, equivalence
CR test development starts from a well-defined set of tasks or items that constitute a domain.
In CR testing, the “true score” and the “universe score” correspond to the “domain score”.
=> Reliability becomes the dependability of domain score estimates.
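For mastery decisions, one simple way to look at consistency is to ask how often two administrations (or two parallel forms) place the same person on the same side of the cut score. The sketch below, which assumes hypothetical scores and a cut score of 12, reports the raw proportion of agreement (p_o) and Cohen's kappa, which corrects that proportion for chance agreement. These agreement indices are offered as an illustration; the slides do not specify which index the presenters had in mind.

```python
import math


def classification_agreement(scores_1, scores_2, cut):
    """Consistency of master/non-master decisions at a cut score.

    scores_1, scores_2: the same people's scores on two administrations or forms.
    Returns (p_o, kappa): proportion of identical decisions, and Cohen's kappa.
    """
    n = len(scores_1)
    master_1 = [s >= cut for s in scores_1]
    master_2 = [s >= cut for s in scores_2]
    p_o = sum(a == b for a, b in zip(master_1, master_2)) / n
    # Agreement expected by chance from each administration's proportion of masters.
    p1, p2 = sum(master_1) / n, sum(master_2) / n
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    # Kappa is undefined when chance agreement is 1 (everyone on one side both times).
    kappa = (p_o - p_chance) / (1 - p_chance) if p_chance < 1 else math.nan
    return p_o, kappa


# Hypothetical scores for six students on two forms, with a cut score of 12.
form_a = [10, 12, 15, 18, 20, 8]
form_b = [11, 9, 16, 17, 21, 7]
p_o, kappa = classification_agreement(form_a, form_b, cut=12)
print(f"p_o = {p_o:.2f}, kappa = {kappa:.2f}")
```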
I.7. Factors that affect reliability estimates
- Length of test
- Difficulty of test and test score variance
- Cut-off score
I.8. Systematic measurement error
Effects of systematic error:
- general effect
- specific effect
Effects of test method:
- systematic error
- random error
II.1. Introduction: Definition of validation
- Traditionally, validity has been classified into different types, such as content, criterion, and construct validity.
- The measurement specialist Messick (1989) describes validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores”.
- In the most recent revision of the Standards for Educational and Psychological Testing, validity is treated as a whole:
validity is a unitary concept and refers to the degree to which evidence supports the inferences that are made from test scores.
The examination of validity: examining the validity of a given use of test scores is a complex process that must involve the examination of both the evidence that supports that interpretation or use and the ethical values that provide the basis or justification for that interpretation or use (Messick 1975, 1980, 1989).
In test validation we are not examining the validity of the test content or even of the test scores themselves, but rather the validity of the way we interpret or use the information gathered through the testing procedure.
Validity is not simply a function of the content and procedures of the test itself; it must also take into account how test takers perform.
II.2. Reliability and validity revisited
The relationship between reliability & validity
Reliability is a requirement for validity.
The investigation of reliability and validity can be viewed as complementary aspects of identifying, estimating, & interpreting different sources of variance in test scores.
The investigation of reliability is concerned with answering the questions: How much variance in test scores is due to measurement error? How much variance is due to factors other than measurement error?
Validity is concerned with identifying the factors that produce the reliable variance in test scores.
The question addressed: What specific abilities account for the reliable variance in test scores?
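One standard way to express "reliability is a requirement for validity" in classical true score terms: the observed correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities, and the correction for attenuation estimates what the correlation between their true scores would be. The reliability and correlation values below are hypothetical.

```python
import math


def max_observed_correlation(rel_test, rel_criterion):
    """Ceiling on the observed test-criterion correlation under classical true score theory."""
    return math.sqrt(rel_test * rel_criterion)


def correct_for_attenuation(r_observed, rel_test, rel_criterion):
    """Estimated correlation between true scores (correction for attenuation)."""
    return r_observed / math.sqrt(rel_test * rel_criterion)


# Hypothetical values: test reliability 0.81, criterion reliability 0.64.
print(round(max_observed_correlation(0.81, 0.64), 3))
print(round(correct_for_attenuation(0.45, 0.81, 0.64), 3))
```

With these values the test could not correlate above 0.72 with the criterion, however well it measured the intended ability.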
II. Validation/ Validity
- Definition of validation.
- The relationship between reliability and validity, viewing the estimation of reliability as an essential requisite of validation.
- The framework proposed by Messick (1989) for considering validity as a unitary though multifaceted concept.
- The evidential basis for validity: construct validity (including content relevance and criterion relatedness); test bias (including culture, test content, personality characteristics of test takers, sex, age).
- The ethical, or consequential, basis of test use.
The relationship between reliability & validity (cont)
Another way to distinguish reliability from validity is to consider the theoretical frameworks upon which they depend.
In estimating reliability, we are concerned primarily with examining variance in the test scores themselves.
In validation, we must consider other sources of variance and draw on a theory of the abilities that we hypothesize will affect test performance.
The process of validation must look beyond reliability and examine the relationship between test performance and factors outside the test itself.
The relationship between reliability & validity (cont)
However, the distinction between reliability & validity is still not always clear, because it is difficult to distinguish:
- different test methods from each other
- abilities from test methods
The relationship between reliability & validity (cont)
The classic statement of the relationship between reliability & validity, by Campbell and Fiske (1959):
Reliability: agreement between similar measures of the same trait (for example, the correlation between scores on parallel tests)
Validity: agreement between different measures of the same trait (for example, the correlation between scores on a multiple-choice test of grammar and ratings of grammar in an oral interview)
In many cases, the distinctiveness of the test methods is not so clear.
-> We must carefully consider not only similarities in the test content, but also similarities in the test methods, in order to determine whether correlations between tests should be interpreted as estimates of reliability or as evidence supporting validity.
Language testing has a very special and complex problem when it comes to traits and methods -> it is often difficult for a language test to distinguish the traits being measured from the methods used to measure them.
II.3. Validity as a unitary concept
Facets of validity (after Messick 1989): source of justification × function of outcome of testing
- Evidential basis × Test interpretation: Construct validity
- Evidential basis × Test use: Construct validity + Relevance/Utility
- Consequential basis × Test interpretation: Construct validity + Value implications
- Consequential basis × Test use: Construct validity + Relevance/Utility + Social consequences
II.4. THE EVIDENTIAL BASIS OF VALIDITY
II.4.1. Content relevance and content coverage (content validity)
Content relevance requires ‘the specification of the behavioral domain in question and the attendant specification of the task or test domain’ (Messick 1980: 1017).
Content coverage is the extent to which the tasks required in the test adequately represent the behavioral domain in question.
II.4.2. Content relevance and content coverage (cont)
The examination of content relevance and content coverage is a necessary part of the validation process.
II.4.3. Criterion validity
The criterion may be level of ability as defined by group membership, individuals' performance on another test of the ability in question, or their relative success in performing some task that involves this ability.
II.4.4. Concurrent validity
Information on concurrent criterion relatedness is undoubtedly the most commonly used type of criterion-related evidence in language testing.
It takes two forms:
(1) examining differences in test performance among groups of individuals at different levels of language ability;
(2) examining correlations among various measures of a given ability.
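Both forms of concurrent evidence reduce to familiar statistics. In this sketch (hypothetical scores; using Cohen's d to summarize group differences is an assumption, not something the slides prescribe), form (1) compares a higher- and a lower-ability group as a standardized mean difference, and form (2) correlates two measures of the same ability.

```python
import statistics


def cohens_d(group_high, group_low):
    """Form (1): standardized mean difference between two ability groups (pooled SD)."""
    n1, n2 = len(group_high), len(group_low)
    v1, v2 = statistics.variance(group_high), statistics.variance(group_low)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(group_high) - statistics.fmean(group_low)) / pooled_sd


def pearson_r(x, y):
    """Form (2): Pearson correlation between two measures of the same ability."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: advanced vs. beginner groups, and two measures for four learners.
print(round(cohens_d([80, 85, 90], [60, 65, 70]), 2))
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))
```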
II.4.5. Predictive validity
- requires collecting data on the relationship between scores on the test and later job or course performance
- if prediction is the only concern, it can largely ignore the question of what abilities are being measured
II.4.6. Construct validation
“Construct validity is indeed the unifying concept that integrates criterion and content considerations into a common framework for testing rational hypotheses about theoretically relevant relationships” (Messick 1980: 1015).
Construct validation requires both logical analysis and empirical investigation.
II.4.7. Evidence supporting construct validation
The test developer involved in the process of construct validation is likely to collect several types of empirical evidence. These may include any or all of the following (Messick 1989):
(1) the examination of patterns of correlations among item scores and test scores, and between characteristics of items and tests and scores on items and tests;
(2) analyses and modeling of the processes underlying test performance;
(3) studies of group differences;
(4) studies of changes over time;
(5) investigation of the effects of experimental treatment.
II.4.8. Correlational evidence
Correlational evidence is derived from a family of statistical procedures that examine the relationships among variables, or measures.
A correlation is a functional relationship between two measures.
Correlational approaches to construct validation may utilize both exploratory and confirmatory modes.
II.4.9. Directly examining patterns of correlations among measures
It is impossible to make clear, unambiguous inferences regarding the influence of various factors on test scores on the basis of a single correlation between two tests.
II.4.10. Factor analysis
A commonly used procedure for interpreting a large number of correlations is factor analysis.
II.4.11. The multitrait-multimethod (MTMM) design
Characteristic: each measure is considered to be a combination of trait and method, and tests are included in the design so as to combine multiple traits with multiple methods.
Advantage: it permits the investigator to examine patterns of both convergence and discrimination among correlations. Convergence is essentially what
II.4.12. The multitrait-multimethod (MTMM) design (cont)
MTMM data can be analyzed in many ways:
(1) direct inspection of convergent and discriminant correlations;
(2) analysis of variance;
(3) confirmatory factor analysis.
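Option (1), direct inspection, can be sketched with simulated data. The traits (grammar, reading), methods (multiple choice, interview), and effect sizes below are all assumptions chosen for illustration: each score is trait + 0.5 × method effect + random error. Following Campbell and Fiske's logic, convergent correlations (same trait, different methods) should exceed both kinds of heterotrait correlations; in this simulation the heterotrait-monomethod correlations come only from the shared method effect.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def simulate_mtmm(n=2000, method_weight=0.5, noise_sd=0.7, seed=7):
    """Hypothetical scores: each measure = trait + method_weight * method effect + random error."""
    rng = random.Random(seed)
    traits = ("grammar", "reading")
    methods = ("multiple_choice", "interview")
    scores = {(t, m): [] for t in traits for m in methods}
    for _ in range(n):
        trait_level = {t: rng.gauss(0, 1) for t in traits}
        method_effect = {m: rng.gauss(0, 1) for m in methods}
        for t in traits:
            for m in methods:
                scores[(t, m)].append(
                    trait_level[t] + method_weight * method_effect[m] + rng.gauss(0, noise_sd)
                )
    return scores


def inspect_mtmm(scores):
    """Average the three kinds of off-diagonal MTMM correlations."""
    convergent, hetero_mono, hetero_hetero = [], [], []
    keys = list(scores)
    for i, (t1, m1) in enumerate(keys):
        for t2, m2 in keys[i + 1:]:
            r = pearson(scores[(t1, m1)], scores[(t2, m2)])
            if t1 == t2:
                convergent.append(r)     # same trait, different methods
            elif m1 == m2:
                hetero_mono.append(r)    # different traits, same method
            else:
                hetero_hetero.append(r)  # different traits, different methods
    return (statistics.fmean(convergent),
            statistics.fmean(hetero_mono),
            statistics.fmean(hetero_hetero))


conv, hm, hh = inspect_mtmm(simulate_mtmm())
print(f"convergent {conv:.2f} | heterotrait-monomethod {hm:.2f} | heterotrait-heteromethod {hh:.2f}")
```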
II.4.13. Experimental evidence
In an experimental design, individuals are assigned at random to two or more groups, each of which is given a different treatment. At the end of the treatment, observations are made to investigate differences among the groups.
There are two distinguishing characteristics of a true experimental design. The first is that of randomization, which means that (1) a sample of subjects is randomly selected from a population, and (2) the individuals in this random sample are then randomly assigned to two or more groups for comparison.
The second characteristic is that of experimental intervention, or treatment. This means that the different groups of subjects are exposed to distinct treatments, or sets of circumstances, as part of the experiment.
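The two steps of randomization can be written directly. A minimal sketch with a hypothetical population of numbered learners: step (1) draws a random sample, step (2) shuffles it and deals the members into groups of (near-)equal size.

```python
import random


def random_sample_and_assign(population, sample_size, n_groups, seed=None):
    """(1) Randomly select a sample from the population, then
    (2) randomly assign the sampled individuals to n_groups treatment groups."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # step 1: random selection
    rng.shuffle(sample)                           # step 2: random order for assignment
    # Deal the shuffled sample into groups round-robin, so sizes differ by at most one.
    return [sample[g::n_groups] for g in range(n_groups)]


# Hypothetical population of 100 numbered learners; 30 sampled into 3 groups.
groups = random_sample_and_assign(list(range(1, 101)), sample_size=30, n_groups=3, seed=2013)
for g, members in enumerate(groups, start=1):
    print(f"group {g}: {sorted(members)}")
```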
II.4.14. Conclusion
The process of construct validation is a complex and continuous undertaking, involving both (1) theoretical, logical analysis leading to empirically testable hypotheses, and (2) a variety of appropriate approaches to empirical observation and analysis.
The result of this process of construct validation will be a statement regarding the extent to which the test under consideration provides a valid basis for making inferences about the given ability with respect to the types of individuals and contexts that have provided the setting for the validation research.
II.5. Test bias
II.5.1. Definition
What is test bias? Systematic differences in test performance that result from differences in individual characteristics other than the ability being tested.
Example: gender differences in mathematical ability
Suppose a reliable mathematics test is given to a representative group of males and females, and on average males have higher scores than females. => There is a tendency to interpret this as: “males have greater mathematical ability than females”.
However, the test scores should not automatically be interpreted as reflecting purely mathematical ability: the difference between the groups' scores could be due to test bias rather than to differences in true mathematical ability. => Differences in group performance do not in themselves indicate test bias. => Only when the systematic differences are not logically related to the ability being tested is the test biased.
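One widely used way to separate "the groups differ" from "the item is biased" is to compare groups only after matching them on overall ability, for example on total test score. The sketch below uses the Mantel-Haenszel common odds ratio for differential item functioning (DIF); this technique is swapped in as an illustration and is not taken from the slides, and the response records are hypothetical. A value near 1.0 means that, at each total-score level, the two groups answer the item correctly at similar rates; values above 1.0 favor the reference group.

```python
import math
from collections import defaultdict


def mantel_haenszel_alpha(records):
    """Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, total_score, correct), where group is "reference"
    or "focal", total_score is the matching variable, and correct is 1 or 0.
    """
    # Per score level: [reference correct, reference wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in records:
        offset = 0 if group == "reference" else 2
        strata[total][offset + (0 if correct else 1)] += 1
    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    # Undefined (returned as inf) if no stratum has reference-wrong and focal-correct cases.
    return numerator / denominator if denominator else math.inf


# Hypothetical responses to one item, matched on total score (5 or 7).
records = (
    [("reference", 5, 1)] * 3 + [("reference", 5, 0)] * 1
    + [("focal", 5, 1)] * 1 + [("focal", 5, 0)] * 3
    + [("reference", 7, 1)] * 2 + [("reference", 7, 0)] * 2
    + [("focal", 7, 1)] * 2 + [("focal", 7, 0)] * 2
)
print(round(mantel_haenszel_alpha(records), 2))
```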
II.5.2. Topic of test bias: complex
+ misinterpretation of test scores
+ sexist/ racist content (content validity)
+ unequal prediction of criterion performance
+ unfair content (content validity)
+ inappropriate selection procedures
+ inadequate criterion measures
+ threatening atmosphere
+ conditions of testing
II.5.3. Potential sources of bias
+ Cultural background
- Cultural differences (Brière 1968, 1973; Brière and Brown 1971)
- The problem of cultural content (Plaister 1967; Condon 1975)
- Using item response theory, Chen and Henning (1985) found that some items in a multiple-choice vocabulary test favored one linguistic and cultural subgroup.
- Aptitude tests may be biased with respect to culturally different groups (Zeidner 1986).
+ Background knowledge
- Prior knowledge affects test performance (Chacevych et al. 1982).
- In ESP testing, students' performance may be affected as much by their prior knowledge as by their language proficiency.
+ Cognitive characteristics
- Cognitive factors influence language acquisition (Brown 1987).
Cognitive styles/ learning styles:
+ field dependence/ independence
A field-independent learning style is defined by a tendency to separate details from the surrounding context (cited from http://www.teachingenglish.org.uk/knowledge-database/field-independent-learners).
A field-dependent learning style is defined by a relative inability to distinguish detail from other information around it.
Example: field-independent learners tend to rely less on the teacher or other learners for support.
=> Psychological differences
+ Ambiguity tolerance/ intolerance: cognitive flexibility
Tolerance of ambiguity is one's acceptance of confusing situations and a lack of clear lines of demarcation (Ely 1989).
It is one facet of personality, related to risk taking: those who can tolerate ambiguity are more likely to take risks in language learning, which is essential for making progress in language acquisition (as cited in Grace 1997).
II.6. The consequential or ethical basis of validity
Tests serve the needs of an educational system or of society.
The use of language tests reflects in microcosm the role of tests in general as instruments of social policy.
The role of tests can be described via the kinds of tests:
+ placement
+ diagnosis
+ selection (based on proficiency/ achievement)
+ evaluation
+ making decisions
The issues involved in the ethics of testing are:
+ numerous
+ varied across societies, cultures, and testing contexts
=> The focus is on the rights of individual test takers:
+ secrecy
+ access to information
+ privacy
+ confidentiality
+ consent
+ the balance between individual rights and the values of the society
As test developers and test users, we need to consider:
+ the rights & interests of test takers
+ the responsibilities of institutions for making decisions based on tests
+ the public interest
These considerations are political and dynamic, and they vary across societies.
They have implications for the practice of the teaching profession, for the kinds of tests to be developed, and for the ways in which test usefulness is justified.
We must move out of the comfortable confines of applied linguistic and psychometric theory and into the arena of public policy.
Hulin et al. (1983): “it is important to realize that testing and social policy cannot be totally separated and that questions about the use of tests cannot be addressed without considering existing social forces, whatever they are” (p. 285).
Four areas of consideration in the ethical use and interpretation of test results (Messick 1980, 1988b):
+ construct validity, i.e. the evidence that supports the interpretation of test scores
+ the value systems that inform test use
+ the practical usefulness of the test
+ the consequences to the educational system or society of using test results for a particular purpose
In short, complete evidence should be provided:
+ to show that test scores are valid indicators of the abilities of interest, and that these abilities are appropriate to the intended use
+ to justify the particular use of the test
II.7. Postmortem: face validity
Face validity: the appeal or appearance of a test, i.e. whether it appears to measure what it is supposed to measure.
Test appearance has a considerable effect on the acceptability of tests to both test takers and test users:
- whether test takers will take the test seriously enough to try their best
- whether they accept the test
- whether they consider the test useful
=> Test takers' reactions influence the validity and reliability of tests.