University Admission Testing in Chile: The PSU

Post on 13-Dec-2014


DESCRIPTION

Chile's university admission test, the PSU, has been sold as a test that can do everything you might want a test to do. The result is that it does none of those things well. It should be scrapped in favor of new tests modeled on the old system: a universal aptitude test, with highly focused content tests for those faculties that want them.

TRANSCRIPT

Large-scale testing: Uses and abuses

Richard P. Phelps

Universidad Finis Terrae, Santiago, Chile

January 7, 2014


1. Three types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected

Achievement, Aptitude, Non-cognitive

1. Three types of large-scale tests

Achievement tests

Historically, were larger versions of classroom tests

~ 1900 - “scientific” achievement tests developed (Germany & USA)

- J.M. Rice: systematically analyzed test structures & effects
- E.L. Thorndike: developed scoring scales

SOURCE: Phelps, Standardized Testing Primer, 2007

Achievement tests

Purpose: to measure how much you know and can recall

Developed using: content coverage analysis

How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)

Requires a mastery of content prior to test.

Fairness assumes that all have same opportunity to learn content

Coachable – specific content is known in advance

SOURCE: Phelps, Standardized Testing Primer, 2007

Aptitude tests

1890s – A. Binet & T. Simon (France)

- Pre-school children with mental disabilities
- Achievement test not possible
- Developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)

1917 – Adapted by U.S. Army to select and assign soldiers in World War I

1930s – Harvard University president J. Conant

- Wanted new admission test to identify students from lower social classes with the potential to succeed at Harvard
- Developed the first Scholastic Aptitude Test (SAT)

SOURCE: Phelps, Standardized Testing Primer, 2007

Aptitude tests

Purpose: predict how much can be learned

Developed using: skills/job analysis

How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)

Content independent. Measures:

- what the student does with the content provided
- how the student applies skills & abilities developed over a lifetime

Not easily coachable. The content is either:

- not known in advance, or
- basic, broad, commonly known by all, and curriculum-free

Less dependent on the quality of schools

SOURCE: Phelps, Standardized Testing Primer, 2007

Aptitude tests

Aptitude tests can identify:

- Students bored in school who study what interests them on their own

- Students not well adapted to high school, but well adapted to university

- Students of high ability stuck in poor schools

SOURCE: Phelps, Standardized Testing Primer, 2007

Comparing Achievement & Aptitude tests

            | Achievement      | Aptitude
Measure     | past learning    | potential
Development | content analysis | job/skills analysis
Validation  | retrospective    | predictive
Content     | dependent        | independent
Coachable?  | very much        | not much

Non-cognitive tests

More recently developed – measure values, attitudes, preferences

Types: integrity tests, career exploration, matchmaking, employment “fit”

Non-cognitive tests

Purpose: to identify “fit” with others or a situation

Developed using: surveys, personal interviews

How validated? success rate in future activities

Content is personal, not learned

“Faking” can be an issue (e.g., “honesty” tests)

Comparing Achievement, Aptitude, & Non-Cognitive Tests

            | Achievement      | Aptitude            | Non-Cognitive
Measure     | past learning    | potential           | attitudes, values, preferences
Development | content analysis | job/skills analysis | surveys
Validation  | retrospective    | predictive          | predictive
Content     | dependent        | independent         | independent
Coachable?  | very much        | very little         | can be faked

2. Measuring test quality

3 measures are important:

1. Predictive validity
2. Content coverage
3. Sub-group differences

Test reports can be “data dumps”

Predictive validity (values from -1.0 to +1.0)

…measures how well higher scores on admission test match better outcomes at university (e.g., grades, completion)

A test with low predictive validity provides little information.

Source: NIST, Engineering Statistics Handbook

A positive correlation between two measures

Source: NIST, Engineering Statistics Handbook

A negative correlation between two measures

Source: NIST, Engineering Statistics Handbook

No correlation between two measures

How does one measure predictive capacity?

Correlation coefficient: a scale running from -1, through 0, to +1
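As a minimal sketch of the calculation, assuming predictive validity is estimated as a Pearson correlation between admission-test scores and a later university outcome such as first-year grades. The function name `pearson_r` and the sample numbers are hypothetical illustrations, not data from any SAT or PSU study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical admission scores and first-year grade averages (made up).
test_scores = [480, 520, 560, 610, 650, 700]
first_year_grades = [4.1, 4.0, 4.6, 4.4, 5.2, 5.5]

validity = pearson_r(test_scores, first_year_grades)
print(f"predictive validity (r) = {validity:.2f}")
```

A value near +1 would mean higher test scores go with better later outcomes; a value near 0 would mean the test tells admissions officers little.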

[Figure: Predictive validities: SAT and PSU. Series: SAT, PSU 2010. Category labels: Language, Mathematics, SAT Writing, PSU Social Science. Vertical axis: 0 to 0.6.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

[Figure: Predictive validities: SAT and PSU (faculty: Administracion). Series: SAT, PSU Administracion. Category labels: Language, Mathematics, SAT Writing, PSU Social Science. Vertical axis: 0 to 0.6.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

[Figure: Predictive validities: SAT and PSU (faculty: Arquitectura). Series: SAT, PSU Arquitectura. Category labels: Language, Mathematics, SAT Writing, PSU Social Science. Vertical axis: 0 to 0.6.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

[Figure: Predictive validities: SAT and PSU (faculty: Educacion). Series: SAT, PSU Educacion. Vertical axis: 0 to 0.6.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

[Figure: Predictive validities: ACT and PSU. Series: ACT, PSU. Category labels: Language, Mathematics, Social Science, Science. Vertical axis: 0 to 0.6.]

SOURCE: ACT, Research Summary Services, 1997-1998; Pearson, Final Report Evaluation of the Chile PSU, January 2013

[Figure: Predictive validities of the PSU (CTA v Pearson estimates). Series: CTA, Pearson. Category labels: Language, Mathematics. Vertical axis: 0 to 0.6.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; CTA

[Figure: Incremental predictive validities (engineering), controlling for NEM. Series: PAA, PSU. Category labels: Language & Math (U. Chile, PUC); Language & Math + subject test (U. Chile, PUC). Vertical axis: 0 to 35.]

SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparacion con el Sistema PAA, Universidad de Chile
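One common way to quantify incremental predictive validity is the gain in explained variance (R²) when test scores are added to a prediction that already uses high school grades (NEM). The sketch below assumes that definition; whether the cited study computed it this way is not stated in the slides. The function names and the toy numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def r2_two_predictors(x1, x2, y):
    """R-squared of a least-squares fit of y on x1 and x2.

    Uses the standard two-predictor identity based on pairwise correlations.
    """
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    if abs(r_12) == 1:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def incremental_validity(baseline, added, outcome):
    """Gain in R-squared from adding `added` to a model that uses `baseline`."""
    return r2_two_predictors(baseline, added, outcome) - pearson_r(baseline, outcome) ** 2

# Toy data (made up): outcome is exactly baseline + added, and the two
# predictors are uncorrelated, so the test adds 4/9 of explained variance.
nem = [1, 2, 3, 4]
test = [1, -1, -1, 1]
grades = [2, 1, 2, 5]

gain = incremental_validity(nem, test, grades)
print(f"incremental R^2 = {gain:.3f}")
```

A test that merely duplicates the information already in high school grades would show a gain close to zero, however well it correlates with university outcomes on its own.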

Content coverage (values from 0% to 100%)

…how much of the content domain of a test has been taught in the schools.

It is not fair to expect students to master content to which they have not been exposed. …or, to compare students who have been exposed to students who have not.
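The 0% to 100% scale can be read as a simple ratio: the number of tested topics that were actually taught, divided by the number of tested topics. A minimal sketch under that assumption follows; the function name and topic lists are hypothetical, and the Mineduc study may define coverage differently.

```python
def coverage_percent(tested_topics, taught_topics):
    """Share of the tested content domain that was actually taught, in percent."""
    tested = set(tested_topics)
    if not tested:
        raise ValueError("the test must cover at least one topic")
    taught_and_tested = tested & set(taught_topics)
    return 100.0 * len(taught_and_tested) / len(tested)

# Hypothetical topic lists (illustrative only).
tested = ["functions", "geometry", "probability", "statistics"]
taught = ["functions", "geometry", "statistics", "number theory"]

print(f"content coverage = {coverage_percent(tested, taught):.0f}%")  # prints "content coverage = 75%"
```

Topics taught but not tested ("number theory" here) do not raise the figure; only the tested domain counts.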

[Figure: Percentage curricular coverage in Chilean high schools, by type of school: 2012. Panel: Mathematics, Level 1. Category labels: Municipal, Subvencionado, Pagado. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage curricular coverage in Chilean high schools, by type of school: 2012. Panel: Language & Communication, Level 2. Category labels: Municipal, Subvencionado, Pagado. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage curricular coverage in Chilean high schools, by type of school: 2012. Panel: Mathematics, Level 3. Category labels: Municipal, Subvencionado, Pagado. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage curricular coverage in Chilean high schools, by type of school: 2012. Panel: Language & Communication, Level 4. Category labels: Municipal, Subvencionado, Pagado. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012. Panel: Mathematics, Level 4. Category labels: Humanista Cientifica, Tecnico Profesional, Polivalente. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012. Panel: Language & Communication, Level 4. Category labels: Humanista Cientifica, Tecnico Profesional, Polivalente. Vertical axis: 0 to 100.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

[Figure: Percentage of Chilean high schools with full curricular coverage, by subject area: 2012. Levels 1-4. Category labels: Mathematics, Language & Communication. Series: Do NOT cover 100%, Cover 100%. Axis: 0% to 100%.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012

Subgroup differences

Differences in test scores among subgroups (e.g., gender, ethnic, school type) should be due only to differences in the attribute measured by the test and not to systematic biases in the test.
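To make the distinction between a raw and an adjusted gap concrete, here is a minimal sketch. It assumes the adjusted gap is the group coefficient from a least-squares fit of score on group membership plus one background covariate (for example, a prior achievement measure). That adjustment method, the function names, and the toy numbers are assumptions for illustration, not the method of any study cited in this talk.

```python
def mean(values):
    return sum(values) / len(values)

def raw_gap(group, scores):
    """Difference in mean score: group 1 minus group 0."""
    ones = [s for g, s in zip(group, scores) if g == 1]
    zeros = [s for g, s in zip(group, scores) if g == 0]
    return mean(ones) - mean(zeros)

def adjusted_gap(group, covariate, scores):
    """Group coefficient from least squares: score ~ group + covariate."""
    g_bar, c_bar, y_bar = mean(group), mean(covariate), mean(scores)
    # Centered sums of squares and cross-products.
    s_gg = sum((g - g_bar) ** 2 for g in group)
    s_cc = sum((c - c_bar) ** 2 for c in covariate)
    s_gc = sum((g - g_bar) * (c - c_bar) for g, c in zip(group, covariate))
    s_gy = sum((g - g_bar) * (y - y_bar) for g, y in zip(group, scores))
    s_cy = sum((c - c_bar) * (y - y_bar) for c, y in zip(covariate, scores))
    det = s_gg * s_cc - s_gc ** 2
    if det == 0:
        raise ValueError("group and covariate are perfectly collinear")
    return (s_gy * s_cc - s_cy * s_gc) / det

# Toy data (made up), constructed so that score = 10 * group + 2 * prior.
group = [0, 0, 0, 1, 1, 1]
prior = [1, 2, 3, 3, 4, 5]
scores = [2, 4, 6, 16, 18, 20]

print(f"raw gap = {raw_gap(group, scores):.1f}, "
      f"adjusted gap = {adjusted_gap(group, prior, scores):.1f}")
# prints "raw gap = 14.0, adjusted gap = 10.0"
```

In this toy case part of the raw gap reflects the groups' different starting points; the adjusted gap is what remains after accounting for them.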

[Figure: Growing gaps in PSU Mathematics raw & adjusted scores, by type of curriculum: 2002 to 2010. Chart title: Brechas PSU Matemáticas para toda la muestra (PSU Mathematics gaps, full sample). Horizontal axis: year, 2002 to 2010. Vertical axis: Brechas (gaps), 0 to 200. Series: PP Muni-TP Brecha Sin Ajustar (unadjusted gap); PP Muni-TP Brecha Ajustada (adjusted gap); PP Muni-CH Brecha Sin Ajustar (unadjusted gap); PP Muni-CH Brecha Ajustada (adjusted gap). Data labels: 111, 170, 46, 85, 95, 124, 43, 51.]

SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s PSU, forthcoming, 2014

[Figure: Growing gaps in PSU Language & Communication raw & adjusted scores, by type of curriculum: 2002 to 2010. Chart title: Brechas PSU Lenguaje para toda la muestra (PSU Language gaps, full sample). Horizontal axis: year, 2002 to 2010. Vertical axis: Brechas (gaps), 0 to 200. Series: PP Muni-TP Brecha Sin Ajustar (unadjusted gap); PP Muni-TP Brecha Ajustada (adjusted gap); PP Muni-CH Brecha Sin Ajustar (unadjusted gap); PP Muni-CH Brecha Ajustada (adjusted gap). Data labels: 106, 161, 44, 79, 86, 113, 36, 44.]

SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s PSU, forthcoming, 2014

3. A chronology of mistakes

2000, initial proposal, SIES/PSU project

This proposal attempts a redesign of the tests currently used to select students for higher education in Chile. It is expected that [this new test will] have a positive impact in the efficiency of the selection process, improving the psychometric properties of the measuring instruments, and establishing a better articulation between the selection system and the secondary education curriculum.

SOURCE: Proyecto FONDEF, Reformulacion de las Pruebas de Seleccion a la Educacion Superior

2001 (World Bank & MINEDUC)

…the Academic Aptitude Test for entry to the university system is under revision, together with the universities belonging to the Council of Rectors. This instrument of entry selection, needs also to be aligned with the new curriculum and may become an exit exam from the secondary education system.

SOURCE: World Bank, Implementation Completion Report on a Loan in the Amount of $35 million to the Republic of Chile for Secondary Education, 2001

2005 (World Bank)

…The new law adopted in May 2005 (Bulletin 3223-04) established a system of student loans available to all students achieving a threshold score in the University Admission Exam (PSU). …the new system does not impede students unable to provide collateral from financing their studies. The new system promises to improve equity further by increasing options for talented students from non-affluent families to access higher education.

SOURCE: IMPLEMENTATION COMPLETION REPORT (TF-25378 SCL-44040 PPFB-P3360) ON A LOAN IN THE AMOUNT OF US$145.45 MILLION TO THE REPUBLIC OF CHILE FOR THE HIGHER EDUCATION IMPROVEMENT PROJECT, December 2005

2009 (OECD & World Bank)

[One option for revising admission testing] would be for Chile to move away from a university entry test towards a national school leaving test or set of tests – ideally, not simple multiple choice tests but longer exams, which test both knowledge and candidates’ ability to think and to apply knowledge. Such school leaving exams or tests could also remove the need for a separate school leaving certificate, by having two pass levels, the lower level equivalent to the NEM and the higher level setting the minimum standard for entry to an academic or professional degree course.

SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009

2009 (OECD & World Bank)

The second option [to revising admissions testing] would be to reform the PSU by incorporating elements other countries consider useful and important in identifying the students most likely to benefit from HE. These elements would include extended essays and questions designed to test reasoning ability and learning potential. They could also include personal statements which could cover non-curricular experience, personal motivation and interest in the programme. Again, there should be a variant for vocational secondary school students.

SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009

2010 (OECD)

Over time the government should consider replacing the university entry exam with a national school leaving exam as the prime criterion for entry into tertiary education institutions. This could establish a closer link between test results and the school that is responsible for them, making it easier to reach the goal that has been pursued with the introduction of the PSU.

SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; OECD ECONOMICS DEPARTMENT WORKING PAPERS No. 784

2010 (OECD)

There is evidence that central curriculum based exit exams are strongly and positively related to student academic performance (Wößmann, 2005; Bishop, 2006). To allow students to show in more detail their knowledge and their ability to apply it, the school exit exam could be a bit more in-depth than the multiple-choice PSU, including verbal and nonverbal reasoning.

SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; OECD ECONOMICS DEPARTMENT WORKING PAPERS No. 784

4. Economists misunderstand testing

Testing & Measurement PhD program (University of Massachusetts, USA, 2013-2014):

- EDUC 501 Classroom Assessment
- EDUC 553 Construction, Validation, and Uses of Criterion-Referenced Tests
- EDUC 555 Introduction to Statistics & Computer Analysis I
- EDUC 632 Principles of Educational & Psychological Testing
- EDUC 637 Non-Parametric Statistics Analysis
- EDUC 656 Introduction to Statistical & Computer Analysis II
- EDUC 661 Educational Research Methods I
- EDUC 727 Scale and Instrument Development
- EDUC 731 Structural Equation Modeling
- EDUC 735 Advanced Theory & Practice of Testing I
- EDUC 736 Advanced Theory & Practice of Testing II
- EDUC 771 Application of Applied Multivariate Statistics I
- EDUC 772 Application of Applied Multivariate Statistics II
- EDUC 821 Advanced Validity Theory & Test Validation

How economists misunderstand testing - 1

Increasing an admission test’s correlation with high school work can decrease its correlation with university work

Incentives aren’t all that matter in improving efficiency; …also important: more and better information, better classification & allocation

How economists misunderstand testing - 2

Incentives generally work best when applied to the actor responsible for the target behavior; …currently, students bear the consequences when schools do not teach the curriculum tested on the PSU

How economists misunderstand testing - 3

Many useful and successful tests serve multiple purposes. But some purposes are compatible and some are not. Responsible authorities have argued that the PSU will:

1. Measure the implementation of a new curriculum;
2. Fairly measure mastery of two very different curricula;
3. Incentivize high schools to implement the new curriculum;
4. Incentivize high school students to study more;
5. Predict success in university generally;
6. Predict success across very different types of university programs;
7. Reduce socio-economic disparities.

How economists misunderstand testing - 4

The PSU: A test at war with itself

Expected to do too many things…

…it does none of them well,

…and makes some of them worse.

(a science-humanities exit exam, sold originally as a science-humanities curriculum coverage survey, that is used as an entry exam for all students)

You cannot get there from here

A non-cognitive test, used as a high-stakes admission test, will exacerbate the problems. It is easily faked. Wealthier students will pay for coaching and the scores will be invalid.

The PSU cannot be “fixed”; it is fundamentally flawed.

The old system – PAA + PCEs – was a sensible system.

Other options to consider

Option for Technical-Professional Graduates:

- As is done in Germany, offer a short course on the scientific-humanistic 11th & 12th grade curricula, with an exam at the end, for technical-professional graduates who decide after graduation that they wish to change careers.
- Create a separate test for technical-professionals to enter university.

ETS & Pearson recommendations:

- Lessen the content in the PSU to the common level – 10th grade – and to that which is genuinely necessary for a good prediction.


How the PSU Runs:

• CRUCh: "owners" of the PSU
• Comité Técnico Asesor (CTA) para la PSU: designated by CRUCh as supervisors of DEMRE and official evaluators of the PSU
• DEMRE: responsible for developing test items, test assembly, test administration, test scoring, the application system for CRUCh and associated universities, etc.

Ministry of Education: has funded the system since 2007 (fee waivers)

[Organization chart, listed top to bottom: CRUCh; Comité Técnico Asesor del CRUCh para la PSU (CTA); DEMRE, U. de Chile]

Source: adapted from the Pearson Report (2013)

5. How SIMCE is affected

What does this have to do with SIMCE?

Most do not see the difference among tests. In public perception, one bad test makes all tests look bad.

SIMCE’s largest challenge may be the loss of public goodwill towards all testing.

“If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured.”

-- Rene Descartes, Principles of Philosophy, 1644
