July 1, 2008 1

Early Childhood Assessment and Accountability

Presented by Anita Skop, Bill Stroud and Deena Abu-Lughod

SAF Conference, July 1 2008

Presentation

• Rationale and purpose

• Big Issues, Experiences, Challenges and Benefits

• Alignment: ECLAS 2 and the ELA

• Statistical Exploration: Examples from ECLAS 2, DIBELS and Running Records

Rationale and Purpose

• SAF advisors for development of early childhood progress report

– Consider methods that could measure value added while controlling for known demographic trends

– Evaluate students or programs?

– Ensure that EC schools get meaningful feedback, including backmapped data

• Familiarize colleagues with several EC assessments to enable them to advise IQTs on their utility

The Big Questions

• What outcomes are we trying to achieve with an Early Childhood accountability system?

• What do we want to develop in children as a result of their experiences in school?

EC Program Accountability in Other States:

• New Jersey - Early Childhood Environment Rating Scale. Random samples from Abbott districts to assess instructional practices in language and math in order to plan professional development and other supports for learning.

• Texas - Texas Primary Reading Inventory and a social skills test to rate program quality.

• Virginia - Pre-school rating system based on level of teacher training, class size, and expert observation (CLASS: Classroom Assessment Scoring System).

Recommendations (from the National Early Childhood Accountability Task Force)

• Create a unified system that connects standards, assessments, data, and professional development for teachers.

• Align comprehensive standards, curriculum, and assessments as a continuum pre-K - 3.

• Assure that all child assessments and program evaluations are valid, reliable, and well-suited for their intended purpose.

• ELL students should be evaluated in both their primary language and their language of instruction.

• Adaptations in assessment tools and procedures should be made to allow children with disabilities to participate in the same assessments as their peers to allow a valid assessment of their knowledge and abilities.

Concerns Associated with EC Accountability Systems: (NECATF)

• Adequacy of tools. Assessments do not cover all domains nor capture normal fluctuations of children’s development.

• Data Integrity: Integrity can be challenged when assessments are administered by individual teachers under conditions of high stakes accountability.

• Sample size: Small schools may exhibit substantial fluctuations due to changes in the characteristics of a few children.

• Investment: More benefits could come from investing to remedy deficiencies in program quality and staff training rather than developing an accountability system at scale.

• Potential consequences: Using child assessment data for high-stakes decisions could lead to serious negative consequences for children as curriculum and instruction narrow to focus more heavily on the assessment measures.
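The sample-size concern above can be made concrete with the standard error of a proportion. The following is a minimal sketch, assuming a hypothetical true proficiency rate of 50% and illustrative cohort sizes; neither figure comes from the presentation, and the function name is invented for this example.

```python
import math


def proportion_standard_error(rate: float, n_students: int) -> float:
    """Standard error of an observed proportion for a cohort of n_students."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return math.sqrt(rate * (1.0 - rate) / n_students)


# Hypothetical rate and cohort sizes, chosen only for illustration.
for n in (20, 65, 200):
    se = proportion_standard_error(0.50, n)
    print(f"n={n:>3}: standard error = {se * 100:.1f} percentage points")
```

In a cohort of 20 children, each child accounts for 5 percentage points of the school's rate, so a change in two or three children can move the reported result substantially.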

Potential Positive Effects of High Quality Accountability System:

• Development of aligned student assessments to draw attention to trajectories of children’s progress K - 3.

• Development of aligned institutional assessments for environment, opportunities, instruction, and quality of implementation

• Development of vertical teams of teachers and administrators from each grade/age level to review data, plan and adjust practices, and support children’s continuous progress.

• Development of focused professional development efforts coordinated K - 3.

• Development of a stronger sense of shared responsibility for children’s success across the K - 3 continuum.

What do we know?

• What are the advantages of early childhood testing?

• What are the limitations?

Child Assessment Option #1: Observational Tools

• Widely used to generate ratings or estimates of knowledge, skills, or abilities based on performance, behavior or work.

• Criterion referenced (compares the student against criteria/standards)

• Advantages: cover all domains; multiple opportunities to observe over time and various contexts; teachers use this format already for instructional purposes so results need only be standardized and aggregated; risks of ‘teaching to the test’ are minimized because it does not involve individual questions.

• Limitations: assessors must be well trained; possible teacher bias related to culture/language; accuracy of ratings can decline or drift over time; risks of inflating ratings to show rapid progress if results are used to evaluate the program.

Child Assessment Option #2: Direct Assessment Tools

• Standardized “Direct” or “On Demand” instruments: use a common set of questions or tasks.

• Norm referenced.

• An adapted direct approach uses a 2-stage method, adjusting the difficulty depending on children’s responses to an initial set of items. This is quicker, reduces risk of frustration or boredom, and reduces risks of pre-coaching.

• Advantages: Lower risk of errors based on assessor’s judgment; common set of questions creates perception that results are objective; scope, depth and costs of training are lower.

• Limitations: Assessors must be well trained; children must feel comfortable with the assessor; students must be able to process language well; cultural differences and pedagogical practices may influence how children respond to questions or tasks; do not assess social-emotional goals; reliance on a specific set of questions creates risks that teachers can coach children to inflate outcomes.

NYC has laid important groundwork

• NYC has vertically-aligned standards-based early childhood assessments (e.g., ECLAS 2)

• Inquiry team work provides framework for developing vertical teams to review data, plan and adjust practice

• Expertise exists for coordinated professional development

• Required EC testing has increased shared sense of responsibility for success across the K-12 continuum

Unpacking ECLAS-2 – Kindergarten

Unpacking ECLAS-2 – Grade 1

Unpacking ECLAS-2 – Grade 2

Unpacking ECLAS-2 – Grade 3

Paper bag over our heads

• Early childhood schools have NEVER received the results of backmapping.

• “This is like working with a paper bag over our head. We don’t know how our students do after they leave us.”

• “DIBELS is too easy. More students were on benchmark than I believe.” (Informal backmapping showed that 90% of her former 2nd grade students scored in Levels 3+4 in 2007.)

Removing the Paper Bag

• Given the issues identified earlier, how well do our Early Childhood assessment tools help us predict how the students will perform on the Grade 3 ELA?

• What are the implications?

Gr 2 ECLAS 2 Spring 2007 and Grade 3 ELA 2008 (sample school #1; n=65)

• Strong correlations between some ECLAS 2 components and Grade 3 ELA Proficiency Rates

• Reading comprehension, oral fluency and reading accuracy scores are highly correlated

Component               Correlation
Decoding                .256
Vocabulary              .502
Sight Words             .567
Reading Comprehension   .755
Oral Fluency            .790
Reading Accuracy        .797
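Correlations like those in the table above are typically Pearson coefficients computed from paired student scores. The slides do not say how these were calculated, so the following is only a sketch of that standard formula; the function name and the six pairs of scores are hypothetical and are not data from sample school #1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical paired scores for six students (illustration only).
oral_fluency_scores = [40, 55, 62, 78, 90, 104]
ela_proficiency_ratings = [1.8, 2.4, 2.2, 3.1, 3.0, 3.7]

print(f"r = {pearson_r(oral_fluency_scores, ela_proficiency_ratings):.3f}")
```

A coefficient near 1 means students who score higher on the component also tend to score higher on the ELA; a coefficient near 0, as with Decoding here, means the component says little about the later result.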

Grade 2 EPAL and Gr 3 ELA

• EPAL has a listening, reading and writing component, each scored in house on a 3 point rubric.

• In a sample school, a very weak relationship was found between EPAL and ELA scores.

• Exploring the mismatches indicated mis-scoring of student responses and misadministration of running records.

• Implication: School has decided to work on grade-level and schoolwide standard setting to ensure that all teachers understand the standards and rubrics in the same way.

DIBELS (Dynamic Indicators of Basic Early Literacy Skills)

SIX COMPONENTS, Available for Pre-K to Grade 3

• ISF: Initial Sounds Fluency -- The student must select a picture of an object whose name begins with a given phoneme. The teacher monitors both fluency and accuracy. (PreK and K)

• LNF: Letter Naming Fluency -- The student must identify as many upper-case and lower-case letters as possible within one minute. (K and 1)

• PSF: Phoneme Segmentation Fluency -- The teacher says a word aloud, and the student must quickly repeat that word, inserting a clear pause between each phoneme. The student must do this for as many words as possible within one minute. (K and 1)

• NWF: Nonsense Word Fluency -- The student must correctly pronounce as many nonsense words as possible in one minute. (K and 1)

• ORF: Oral Reading Fluency -- The student must read aloud as much of a passage of text as possible in one minute. After reading aloud, the student must also describe or retell the content of the passage of text. (1, 2, 3)

• WUF: Word Use Fluency -- The student is given a word to use in a sentence or to define, and the teacher monitors both the accuracy of the use or definition as well as the number of words the student uses in his or her response. (K, 1, 2, 3)

Sample School 2

• In this sample school, 37% of its 108 3rd graders scored at or above proficiency on the 2008 Grade 3 ELA.

• Last year, 30% of those 108 scored at benchmark on the Grade 2 DIBELS.

• How well did the DIBELS predict the results?

Crosstab: Grade 3 ELA 2008 by Grade 2 DIBELS EOY 2007

Rows: Grade 3 ELA Level. Columns: DIBELS Instructional Recommendation.

ELA Level   Benchmark    Strategic       Intensive     Total
            (low risk)   (medium risk)   (high risk)
Lvl 1       1            5               12            18 (17%)
Lvl 2       9            11              29            49 (45%)
Lvl 3+4     22           15              4             41 (37%)
Total       32 (30%)     31 (29%)        45 (42%)      108

What do you notice?
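One way to read the crosstab for sample school 2 is to ask what share of students in each DIBELS category went on to score in Levels 3+4. The sketch below uses the counts from the crosstab; the dictionary layout and the function name are assumptions made for this illustration.

```python
# Counts from the Grade 3 ELA by Grade 2 DIBELS crosstab (sample school 2).
# Outer keys are ELA levels; inner keys are DIBELS recommendations.
crosstab = {
    "Lvl 1":   {"Benchmark": 1,  "Strategic": 5,  "Intensive": 12},
    "Lvl 2":   {"Benchmark": 9,  "Strategic": 11, "Intensive": 29},
    "Lvl 3+4": {"Benchmark": 22, "Strategic": 15, "Intensive": 4},
}


def share_proficient(table, dibels_category):
    """Fraction of students in one DIBELS category who scored in Levels 3+4."""
    total = sum(row[dibels_category] for row in table.values())
    return table["Lvl 3+4"][dibels_category] / total


for category in ("Benchmark", "Strategic", "Intensive"):
    share = share_proficient(crosstab, category)
    print(f"{category}: {share:.0%} scored in Levels 3+4")
```

On these counts, 22 of 32 benchmark students (about 69%) reached Levels 3+4, compared with 15 of 31 strategic students (about 48%) and 4 of 45 intensive students (about 9%).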

Statistically Significant Relationship

• The relationship between DIBELS scores and the Gr 3 ELA proficiency rates is statistically significant.

• Chi-square between categories is 39.766 (p < .001)

• Correlation between actual ORF score and ELA proficiency rate is .578** (p < .001)
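A chi-square test of independence on a crosstab can be sketched as follows. This is a plain Pearson chi-square applied to the collapsed 3x3 counts from the sample school 2 crosstab (ELA Levels 1, 2 and 3+4 by DIBELS category); the function name is an assumption, and the slides do not state which table or software produced the reported figure.

```python
def pearson_chi_square(observed):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if ELA level and DIBELS category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Rows: ELA Lvl 1, Lvl 2, Lvl 3+4. Columns: Benchmark, Strategic, Intensive.
observed_counts = [
    [1, 5, 12],
    [9, 11, 29],
    [22, 15, 4],
]

statistic, dof = pearson_chi_square(observed_counts)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

On the collapsed table this gives a statistic of about 31.5 with 4 degrees of freedom, which exceeds 18.47, the critical value for p = .001. It does not reproduce the 39.766 reported on the slide, which may have been computed on a different breakdown (for example, with ELA levels not collapsed) that the slides do not show.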

But is it meaningful?

• 4 students who were “high risk” scored in Level 3. Here’s how their teachers explained these false negatives.

• “I taught the students to be careful readers, to self correct and monitor for sense.”

• “I am interested in increasing their comprehension, not their fluency. I couldn’t in good conscience give them different directions for the test.”

• These false negatives suggest that the results can be muddied when the assessment is misadministered.

What about false positives?

• Half of the false positives (low risk on DIBELS but scored as level 2s on ELA) were ELLs. They took the DIBELS as instructed (read as fast as you can; don’t worry about mistakes). Oral reading is not connected to comprehension.

• The other two students were described as “very bright” but had motivational issues and were “uninterested” in the ELA.

• The high representation of ELLs among the false positives suggests that interpreting results for ELLs requires additional considerations.

DIBELS Gr 3 October Oral Reading Fluency and January ELA Proficiency Rate (sample school 3)

[Chart: BOY ORF with Gr 3 ELA Proficiency Rate. Horizontal axis: Gr 3 ELA Proficiency Rate 2008 (1 to 5). Vertical axis: October 2007 DIBELS Oral Reading Fluency score (0 to 180).]

DIBELS Gr 3 October Retell Fluency and January ELA Proficiency Rate

[Chart: RTF. Horizontal axis: Gr 3 ELA Proficiency Rate 2008 (1 to 5). Vertical axis: October 2007 RTF score (0 to 250).]

What about DIBELS in Grade 1?

• At a Reading First school, we were able to backmap September 2005 DIBELS scores (beginning 1st grade) for 129 current 3rd graders.

• 51% were at benchmark on DIBELS in Grade 1; 50% were proficient on the ELA in Grade 3.

• Did their BOY 1st grade DIBELS scores (LNF, NWF, PSF) allow us to predict how well they would perform on the Grade 3 ELA?

Crosstab: Gr 3 ELA by Gr 1 DIBELS (sample school #3, n=66)

Rows: Grade 3 ELA Level. Columns: DIBELS Instructional Recommendation. Body cells show the count and the column percentage; totals show the share of all students in parentheses.

ELA Level   Benchmark     Strategic       Intensive     Total
            (low risk)    (medium risk)   (high risk)
Lvl 1       3 (5%)        4 (13%)         15 (47%)      22 (17%)
Lvl 2       14 (22%)      17 (53%)        10 (31%)      41 (32%)
Lvl 3+4     48 (73%)      11 (34%)        7 (22%)       66 (51%)
Total       100% (50%)    100% (25%)      100% (25%)    100%

Running Records – Fairly Accurate

In a sample of 83 students, where 60 were proficient on the ELA:

• All Level 4 students were above the F&P benchmark

• 63% of Level 3 students were at or above the F&P benchmark

• 5% of Level 2 students were at the F&P benchmark

• No Level 1 students approached the benchmark

Reading and Writing Continuums

• Reading and writing progress is best monitored through use of continuums, keeping track of the date on which particular behaviors associated with particular levels are either observed or evidenced in written samples.

• Certain behaviors associated with a higher level may be observed before the child actually achieves that level.

• These continuums are useful especially for tracking progress in Kindergarten, where teachers often fail to push students who have advanced skills after attending high quality pre-K schools.

Data findings:

• The Grade 1 DIBELS components, especially the LNF, are strong predictors of 3rd grade outcomes.

• The reliability of the Grade 2 DIBELS component, the ORF, could not be determined due to misadministration. Word calling ≠ comprehension: fluency in early grades is not necessarily a good predictor.

• Well-administered running records are good predictors. Mismatches between running records and ELA scores suggest inconsistent administration of running records (lenient scoring of comprehension questions and oral retells).

• Inquiry Team relevance: DIBELS results, running records and ECLAS 2 can all provide very important formative assessment information for guiding instruction and targeting skills, provided they are administered properly. An investment in observing administration will generate benefits to schools.

• Dilemma: For accountability purposes, assessments requiring assessor judgment may be best administered by someone other than the teacher, but this can create stress for the child and lead to unreliable administration.

Improving data use

• Expand initial exploration: Correlation and regression analyses of citywide DIBELS, ECLAS 2 and other EC assessment data against the Grade 3 ELA results should be conducted to verify whether the patterns explored here are replicated on a broader scale.

• Data transparency: Schools should receive information about the relationship between EC assessment results and the ELA so that they build confidence in these instruments and become more vigilant about administration.

• Use our online tools! Both ECLAS 2 and DIBELS results are available online through very user-friendly interfaces from WGEN.

• Study the outliers: the false negatives will help us generate hypotheses about how to beat the odds; the false positives will teach us to look in more nuanced ways.

• Evaluate students and programs: It is important to collect both student assessment data and data on program quality.

Implications

• What are the implications of this information for your work with inquiry teams?

Resources

• Database of EC Assessments – http://www.sedl.org/reading/rad/database.html

• “Taking Stock: Assessing and Improving Early Childhood Learning and Program Quality” - The Report of the National Early Childhood Accountability Task Force

http://www.sedl.org/reading/rad/database.html


