
The Big-Fish-Little-Pond Effect: Persistent Negative Effects of Selective High Schools on Self-Concept After Graduation

Herbert W. Marsh
University of Oxford

Ulrich Trautwein, Oliver Lüdtke, and Jürgen Baumert
Max Planck Institute for Human Development

Olaf Köller
Humboldt University

According to the big-fish-little-pond effect (BFLPE), attending academically selective high schools negatively affects academic self-concept. Does the BFLPE persist after graduation from high school? In two large, representative samples of German high school students (Study 1: 2,306 students, 147 schools; Study 2: 1,758 students, 94 schools), the predictive effects of individual achievement test scores and school grades on math self-concept are very positive, whereas the predictive effects of school-average achievement are negative (the BFLPE). Both studies showed that the BFLPE was substantial at the end of high school and was still substantial 2 years (Study 1) or 4 years (Study 2) later. In addition, because of the highly salient system of school tracks within the German education system, the authors are able to show that negative effects associated with school type (highly academically selective schools, the Gymnasium) were similar to, but smaller than, the BFLPE based on school-average achievement.

KEYWORDS: academically selective schools, big-fish-little-pond effect, frame-of-reference effects, German educational system, grade-on-a-curve effect, multilevel modeling, self-concept

American Educational Research Journal, Month XXXX, Vol. XX, No. X, pp. X–X. DOI: 10.3102/0002831207306728. © AERA. http://aerj.aera.net

The concept of self is one of the oldest (James, 1890/1963) and most important constructs in the social sciences (Branden, 1994; Marsh & Craven, 2002, 2006). However, psychologists from the time of William James have recognized that objective accomplishments are evaluated in relation to frames of reference (also see Festinger, 1954). Thus, James (1890/1963) indicated, “We have the paradox of a man shamed to death because he is only the second pugilist or the second oarsman in the world” (p. 310). Marsh (1974, 1984b, 1991, 1993, 2005; Marsh & Craven, 2002; Marsh & Parker, 1984) proposed the big-fish-little-pond effect (BFLPE) to encapsulate frame-of-reference effects in educational settings.

According to the BFLPE model, students compare their own academic ability with the academic abilities of their classmates and use this social comparison impression as one basis for forming their own academic self-concepts. A negative BFLPE (a contrast effect) occurs where equally able students have lower academic self-concepts when they compare themselves to more able students and higher academic self-concepts when they compare themselves to less able students. For average-ability students who attend a school where the average achievement level of other students is high (hereafter referred to as a high-ability school), their academic abilities are below the average of other students in their school. According to the BFLPE, this educational context will foster social comparison processes leading to academic self-concepts that are lower than if the same students attended an average-ability school. Conversely, if these students attend a low-ability school, their abilities will be above average in relation to other students in the school, and the social comparison processes will result in higher academic self-concepts. Hence, academic self-concepts depend not only on a student’s academic accomplishments but also on the accomplishments of those in the educational setting that the student attends.

HERBERT W. MARSH is a professor of education at Oxford University, Department of Education, Oxford OX2 6PY UK; e-mail: [email protected]. He is widely published (350 articles in 70 journals, 55 chapters, 13 monographs, 350 conference papers), coedits the International Advances in Self Research monograph series, and is an ISI highly cited researcher. He founded the SELF Research Centre, with members and satellite centers around the world. His areas of specialization include self-concept, students’ evaluation of university teaching, and application of advanced quantitative research methods to diverse areas of education and psychology.

ULRICH TRAUTWEIN is a research scientist at the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His main research interests include the effects of various characteristics of the learning environments on student achievement, self-concept, and interest.

OLIVER LÜDTKE is a research scientist at the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His main research interests include the modeling of effects of classroom factors on individual development and the role of personality development in school and university contexts.

JÜRGEN BAUMERT is the director of the Center for Educational Research at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany; e-mail: [email protected]. His major research interests include research on teaching and learning, large-scale assessment, cognitive and motivational development in adolescence, and educational institutions as developmental environments.

OLAF KÖLLER is a full professor of educational research and director of the Institute for Educational Progress at Humboldt University Berlin, Unter den Linden 6, D-10099 Berlin, Germany; e-mail: [email protected]. His main research interests include educational assessment, the association between school achievement and motivation, and social comparison processes.

The BFLPE is typically operationalized (see Marsh & Craven, 2002) as a path model (see Figure 1A) in which the effects of individual student achievement on academic self-concept are predicted to be positive and the effects of school-average achievement are predicted to be negative. Empirical support for this negative effect of school-average achievement on academic self-concept (the BFLPE) comes from numerous studies based on a variety of experimental and analytical approaches (see reviews by Marsh, 2005; Marsh & Craven, 2002). Good support for the cross-cultural generalizability of the BFLPE comes from studies conducted in Germany (e.g., Jerusalem, 1984; Marsh, Köller, & Baumert, 2001), Hong Kong (e.g., Marsh, Kong, & Hau, 2000), Israel (e.g., Zeidner & Schleyer, 1999), the United States (e.g., Marsh, 1991), and Australia (e.g., Marsh, Chessor, Craven, & Roche, 1995). Whereas most research has been limited to results from a single country, Marsh and Hau (2003) tested the cross-cultural generalizability of the BFLPE with nationally representative samples of approximately 4,000 15-year-olds from each of 26 countries who completed the same self-concept instrument and achievement tests as part of the Organization for Economic Cooperation and Development (OECD) Program for International Student Assessment (PISA) study. Consistent with a priori predictions, the standardized effects of individual student achievement were substantial and positive (.38), and the effects of school-average achievement were smaller and negative (–.21). Although the results varied somewhat from country to country, this variation was small.
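As a rough illustration of the single-wave path model in Figure 1A, the following Python sketch simulates students nested in schools and regresses self-concept on individual and school-average achievement. The generating coefficients (0.5 and -0.3), the sample sizes, and the function name are hypothetical choices made only for illustration; they are not estimates from any study cited here, and ordinary least squares is used only to recover the signs of the two effects.

```python
import numpy as np


def simulate_bflpe(n_schools=200, n_students=50, seed=0):
    """Simulate nested data and fit the single-wave BFLPE regression.

    All generating values below are hypothetical, chosen for illustration.
    Returns [intercept, individual effect, school-average effect].
    """
    rng = np.random.default_rng(seed)
    # Schools differ in their true mean achievement.
    school_mean = rng.normal(0.0, 0.6, size=n_schools)
    school_id = np.repeat(np.arange(n_schools), n_students)
    # Individual achievement varies around the school mean.
    ach = school_mean[school_id] + rng.normal(0.0, 0.8, size=school_id.size)
    # Observed school-average achievement, assigned back to each student.
    school_avg = (np.bincount(school_id, weights=ach) / n_students)[school_id]
    # Self-concept: positive effect of own achievement,
    # negative effect of school-average achievement (assumed values).
    asc = 0.5 * ach - 0.3 * school_avg + rng.normal(0.0, 1.0, size=ach.size)
    # Least-squares fit of asc on an intercept and the two predictors.
    design = np.column_stack([np.ones_like(ach), ach, school_avg])
    coef, *_ = np.linalg.lstsq(design, asc, rcond=None)
    return coef


intercept, b_individual, b_school_avg = simulate_bflpe()
print(f"individual: {b_individual:.2f}, school-average: {b_school_avg:.2f}")
```

In this sketch the individual-achievement coefficient comes out positive and the school-average coefficient negative, mirroring the predicted pattern. Ordinary least squares ignores the clustering of students within schools, so a multilevel model would be needed for trustworthy standard errors.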

Distinctive Characteristics of the BFLPE

Multidimensionality and Domain Specificity

In self-concept research, there is growing support for a multidimensional perspective of self-concept rather than a unidimensional perspective that places primary reliance on a single global component of self-concept or self-esteem (Marsh & Craven, 2006). Particularly in educational settings, there is increasing evidence that global self-esteem is nearly unrelated to a wide variety of educational outcomes (Marsh, 1993; Marsh & Craven, 1997). Consistent with theoretical predictions and this growing support for a multidimensional perspective, the BFLPE is very specific to academic components of self-concept. Marsh and Parker (1984; Marsh, 1987) showed that there were large negative BFLPEs for academic self-concept, but little or no BFLPEs on general self-concept or self-esteem. Marsh et al. (1995) reported two studies of the effects of participation in gifted and talented programs on different components of self-concept over time and in relation to a matched comparison group. There was clear evidence for negative BFLPEs in that the academic self-concepts of students in the gifted and talented programs declined over time and in relation to the comparison group. These BFLPEs were consistently large for academic components of self-concept but were small and largely nonsignificant for four nonacademic self-concepts and for global self-esteem. Whereas different studies have evaluated the BFLPE for different academic domains, the most frequent tests are based on a global measure of academic self-concept or on math self-concept (see review by Marsh & Craven, 2002).

Relation to Individual Student Ability

Do both the best and poorest students suffer the BFLPE? Marsh (1984a, 1987, 1991, 1993; Marsh et al., 1995; Marsh & Rowe, 1996) argued that attending selective schools should lead to reduced academic self-concepts for students of all achievement levels, based on several different theoretical perspectives. In their review of BFLPE research, Marsh and Craven (2002; also see Marsh & Hau, 2003) concluded that there is little evidence that the size of the BFLPE varies systematically with individual student ability levels.

Juxtaposition Between Standardized Test Scores, School Grades, and Grading on a Curve

It is important that tests of the BFLPE are based on achievement indicators that vary along a metric that is common to all individual students, classes, and schools. For this reason, BFLPE studies typically use standardized achievement tests as the basis of both individual student and school-average achievement. This, however, creates a potential dilemma in that there is a growing body of research demonstrating that academic self-concept is more strongly related to school-based measures (e.g., school grades) than to standardized achievement (Marsh, 1993; Marsh & Craven, 2006). This follows in that school grades are a more direct source of feedback to students about their academic accomplishments than test scores are, particularly test scores that are not part of the formal assessment process, for which students do not prepare, and about which they do not even receive feedback. Also, school grades are likely to be responsive to motivational processes (e.g., effort, conscientiousness, appropriate preparation, self-regulation) that mediate the relation between self-concept and achievement, particularly when such characteristics are part of the basis for assigning school grades. However, school grades typically reflect an idiosyncratic metric that varies depending on the particular teacher, class, or school. In particular, there is a widespread grading-on-a-curve phenomenon such that the percentage of students getting each grade does not vary much in different schools, even when there are substantial differences in the ability levels of students in the different schools; equally able students are likely to get lower grades in schools where the school-average achievement levels are high than in schools where achievement levels of other students are low.

An early BFLPE study (Marsh, 1987) shows how this grading-on-a-curve phenomenon influences the BFLPE. Based on a large representative sample of U.S. high schools, Marsh (1987; also see Marsh & Rowe, 1996) noted that the BFLPE and the grading-on-a-curve effect have a similar rationale and are mutually reinforcing. He found that the effects of school-average achievement on academic self-concept were negative. The results also clarified the distinction between academic ability and grade-point average, their respective influences on self-concept, and how this influenced the BFLPE. Consistent with the grading-on-a-curve effect, equally able students have lower grade-point averages in high-ability schools than in low-ability schools. This phenomenon is so widespread that university admission officers have to take it into account when evaluating the high school transcripts of university applicants (Espenshade, Hale, & Chung, 2005). However, Marsh demonstrated that this frame-of-reference effect influencing grade-point average was separate from, but contributed to, the BFLPE on academic self-concept. More recently, Trautwein, Lüdtke, Marsh, and Köller (2006) demonstrated a similar phenomenon in German schools.
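The grading-on-a-curve effect can be made concrete with a small Python sketch. It idealizes curve grading as a z-score within each school, which is an assumption made here for illustration rather than a description of how any school actually grades; the two school distributions and the function name are likewise hypothetical.

```python
import numpy as np


def curved_grade(score, school_scores):
    """Grade a test score relative to its own school's distribution.

    Idealization (assumption): the grade is the z-score of the student's
    score within the school, so every school awards the same spread of grades.
    """
    school_scores = np.asarray(school_scores, dtype=float)
    return (score - school_scores.mean()) / school_scores.std()


rng = np.random.default_rng(1)
# Hypothetical achievement distributions on a common standardized metric.
low_ability_school = rng.normal(-0.5, 1.0, size=500)
high_ability_school = rng.normal(0.5, 1.0, size=500)

# Two equally able students, one in each school.
same_score = 0.0
grade_low = curved_grade(same_score, low_ability_school)
grade_high = curved_grade(same_score, high_ability_school)
print(f"low-ability school: {grade_low:+.2f}, high-ability school: {grade_high:+.2f}")
```

Under these assumptions the same test score earns an above-average grade in the low-ability school and a below-average grade in the high-ability school, which is the frame-of-reference effect on grades described above.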

Stability and Persistence of Effects

A critical issue with theoretical and practical implications is whether BFLPEs are short term and ephemeral or stable and long lasting. In response to the Marsh and Hau (2003) OECD PISA study, both Dai (2004) and Plucker et al. (2004) suggested that the BFLPE might be temporary. Marsh and Hau reviewed a number of studies suggesting that the BFLPE remained stable or even increased in size over time for students who remained in the same selective school setting. In evaluating the stability of the BFLPE over time (see Figure 1B), school-average ability is related to academic self-concept, collected on at least two occasions. Hence, the critical effects are the total, direct, and mediated effects of school-average ability on academic self-concept at Time 2 (T2). If the BFLPE were highly stable over time, then most of the effect of school-average ability on T2 self-concept would be mediated through its effect on Time 1 (T1) self-concept so that the direct effects of school-average ability on T2 self-concept (represented by the corresponding path coefficient in Figure 1B) would be close to zero. The direct effect of school-average ability could be positive even though the total effect was negative. This would indicate that the negative indirect effect of school-average ability that was mediated through T1 self-concept was offset to some extent by a positive direct effect. To the extent that the direct effect of school-average ability is negative, it would mean that there is a new (additional) negative effect of school-average ability beyond the negative effect that can be explained in terms of T1 academic self-concept.
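The relation among total, direct, and mediated effects can be written compactly. The following is a sketch in standard linear path-analytic notation; the symbols a, b, c', d, and f and the variable labels are introduced here for illustration and are not taken from the original.

```latex
% Assumed notation: \overline{\mathrm{ACH}} = school-average achievement,
% \mathrm{ACH} = individual achievement, \mathrm{ASC}_t = academic self-concept at time t.
\begin{align}
\mathrm{ASC}_{1} &= a\,\overline{\mathrm{ACH}} + d\,\mathrm{ACH} + e_{1}, \\
\mathrm{ASC}_{2} &= c'\,\overline{\mathrm{ACH}} + b\,\mathrm{ASC}_{1} + f\,\mathrm{ACH} + e_{2}, \\
\underbrace{c}_{\text{total effect}} &= \underbrace{c'}_{\text{direct effect}}
  + \underbrace{a\,b}_{\text{effect mediated through } \mathrm{ASC}_{1}} .
\end{align}
```

With a < 0 (the BFLPE at T1) and b > 0 (stability of self-concept), the mediated effect ab is negative. A direct effect c' near zero corresponds to a BFLPE that is carried forward entirely through T1 self-concept, c' > 0 corresponds to a partial offset of the negative mediated effect, and c' < 0 corresponds to a new negative effect beyond the one already present at T1.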


The Marsh, Köller, and Baumert (2001) study was particularly important in demonstrating the temporal evolvement of the BFLPE. In 1991, former East and West German students experienced a remarkable social experiment: the reunification of very different school systems after the fall of the Berlin Wall. Self-concept scores were collected at the start, middle, and end of the first school year after reunification. East German students had not previously been grouped according to achievement. For them, over the three data collection waves, the BFLPE was initially small, then moderate, and then substantial by the end of the year. West German students had attended schools based on achievement grouping for the 2 years prior to reunification. For them, the BFLPE was substantial at all three times. A large East-West difference in the size of the BFLPE at the start of the year disappeared completely by the end of the year. The BFLPE was stable for West German students who had previously been in academically selective schools, but for East German students who had previously been in heterogeneous, mixed-ability schools, the size of the BFLPE grew steadily larger during the first year of school following the introduction of an ability-tracked school system.


Figure 1. The Big-Fish-Little-Pond Effect (BFLPE). Theoretical predictions: (A) traditional analysis based on a single wave of data, and (B) stability of BFLPEs over time based on longitudinal data.

[Figure 1 not reproduced. Panel A: path diagram with Time 1 individual student achievement and Time 1 school-average achievement predicting Time 1 individual student academic self-concept; the effects of Time 1 achievement on academic self-concept are positive (+ +), and the effects of Time 1 school-average achievement on academic self-concept are negative (-, the BFLPE). Panel B: the same paths, plus a positive path (+ +) from Time 1 to Time 2 individual student academic self-concept; two further paths are labeled "???".]

In a large Hong Kong study of students entering selective schools in Grade 7 (Marsh, Kong, & Hau, 2000), there was a substantial negative effect of school-average ability in Grade 9 even after controlling for the substantial negative effects in Grade 8. Also, in the large U.S. High School and Beyond Study, Marsh (1991) demonstrated that for many outcomes there were new negative effects of school-average achievement at the end of high school even after controlling for those already experienced earlier in high school. This research suggests that the BFLPE is likely to increase in size during the initial period of adjustment after students are introduced to a major shift in the frame of reference and provides no evidence that the effect declines during the period students are in the same frame of reference.

Clearly, as long as students remain in the same high school and school-average achievement is relatively stable so that the immediate frame of reference remains reasonably stable, there is ample evidence that the BFLPE persists or may even increase in size. This is not surprising and is consistent with the rationale underpinning the imposed social comparison paradigm posited by Diener and Fujita (1997). Thus, for example, in classic research on conformity to group norms, Sherif and Hovland (1961) reported that experimentally induced changes in the frame of reference were still evident when participants subsequently made judgments after they were no longer part of the original group. However, a more demanding challenge is to evaluate the stability of the BFLPE on academic self-concept several years after graduation from high school, when the frame of reference based on other students from the same high school is not so salient and is no longer imposed by the immediate context. The current study is apparently unique in evaluating the stability of the BFLPE formed in high school for self-concept measures collected at the end of high school and several years after graduation.

The Present Investigation

Our major focus is to evaluate the long-term stability and persistence of the BFLPE. Based on two large, representative studies of German high school students, we evaluate the effects of high-school-average achievement levels on academic self-concepts at the end of high school and several years after graduation, a time interval of 2 years in Study 1 and 4 years in Study 2. In addition, because of the highly salient system of school tracking within the German education system, we are able to contrast the effects of school-average achievement with those of school type (i.e., the effects associated with attending highly academically selective schools, called the Gymnasium in the German system).

Although the primary focus of our study is on individual and school-average achievement based on a standardized achievement test, we also examine the effects of school grades. Of particular interest is the extent to which the negative effect of school-average achievement on academic self-concept can be explained by the typical grade-on-a-curve phenomenon in which equally able students receive lower grades in more academically selective settings (Espenshade et al., 2005; Marsh, 1987).


An important feature of this research is the appropriate implementation of a multilevel model analysis that is suitable for hierarchical data, in which participants (e.g., students) are nested within naturally occurring groups (e.g., schools). Taking advantage of the appropriate tests of cross-level interactions that are possible in multilevel models, we evaluate the generalizability of the BFLPE by testing the extent to which the negative effects of school-average achievement levels and school type (school-level variables) interact with individual student gender and student achievement levels (individual student-level variables).
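A generic two-level model of this kind can be sketched as follows. The notation follows the common level-1/level-2 convention for multilevel models; the variable labels and the choice of which slopes vary across schools are assumptions made for illustration and are not the authors' exact specification.

```latex
% Student i in school j. Assumed labels: ASC = academic self-concept,
% ACH = individual achievement, SEX = student gender,
% \overline{ACH}_j = school-average achievement, TYPE_j = school type.
\begin{align}
\text{Level 1:}\quad \mathrm{ASC}_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{ACH}_{ij}
  + \beta_{2j}\,\mathrm{SEX}_{ij} + r_{ij}, \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\overline{\mathrm{ACH}}_{j}
  + \gamma_{02}\,\mathrm{TYPE}_{j} + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\overline{\mathrm{ACH}}_{j}
  + \gamma_{12}\,\mathrm{TYPE}_{j} + u_{1j}, \\
\beta_{2j} &= \gamma_{20} + \gamma_{21}\,\overline{\mathrm{ACH}}_{j}
  + \gamma_{22}\,\mathrm{TYPE}_{j} + u_{2j}.
\end{align}
```

In this notation the BFLPE corresponds to a negative coefficient for school-average achievement in the intercept equation (a negative value of gamma_01), and the cross-level interactions are the coefficients gamma_11, gamma_12, gamma_21, and gamma_22, which indicate whether the school-level effects differ by individual achievement or gender.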

Based on research reviewed earlier (also see Figure 1), we propose the following a priori hypotheses and research questions to guide our analyses and presentation of results:

1. Both individual student achievement and school grades at the end of high school have positive effects on academic self-concept at T1 (the end of high school) and T2 (after graduation from high school). We ask whether additional effects of T1 achievement on T2 academic self-concept remain after controlling for the effects of T1 academic self-concept (i.e., how changes in academic self-concept over time are related to prior levels of achievement).

2. The effects of high-school-average achievement and high school type are negative for academic self-concept at T1 and T2. The effects of school-average achievement are predicted to be systematically larger than school-type effects and to be evident even after controlling for school type. However, we pose a research question regarding the size and direction of school-type effects after controlling for school-average achievement: whether the effects are merely diminished substantially, as might be expected if school type functions as a rough, dichotomous proxy measure of school-average achievement, or whether attending a prestigious school type contributes positively to self-concept (assimilation) after controlling for the negative BFLPE (contrast) associated with school-average achievement. We also pose the research question of whether these effects of school-average achievement and school type on T2 academic self-concept remain after controlling for the effects of T1 academic self-concept (i.e., how changes in academic self-concept over time are related to school-average achievement and school type).

3. School grades are a salient source of feedback to students about their academic accomplishments. Because teachers typically grade on a curve, there is a frame-of-reference effect for school grades (equally able students get lower school grades in schools and classes where other students are particularly able) that is similar to the BFLPE. Based on Marsh (1987), we predict that the BFLPE will be partially mediated by school grades; that is, the negative effects of school-average achievement and school type are expected to be statistically significant but smaller after controlling for the effects of school grades.

4. As a research question, we ask whether the negative effects of school-average achievement and school type are generalizable over responses by boys and girls and over different levels of academic achievement. Although cross-level interactions between these school-level variables and individual student-level variables are typically small (see Marsh & Craven, 2002), there is not a sufficient basis for making directional hypotheses. However, the generalizability of the BFLPE across different groups of students has potentially important practical implications for understanding the BFLPE.


Study 1

Method

The German School Setting

The German school system is well known for its early and selective differentiation of students into different school types. Selection takes place after Grade 4 (in a few states after Grade 6), when students are about age 10. Although there is considerable variation across the German states in terms of the number and quality of tracks (Baumert, Trautwein, & Artelt, 2003; Trautwein et al., 2006), the majority of states have adopted a variant of the tripartite system of Hauptschule (least-demanding track), Realschule (intermediate track), and Gymnasium (highest track). Hauptschule and Realschule students graduate after Grade 9 or 10 and then typically enter the dual system, which combines part-time education at vocational school with on-the-job training. Students from traditional Gymnasium graduate after Grade 12 or 13, which is a prerequisite for university entrance. To overcome the strict differentiation between Gymnasium and the other school types, several German states now allow the best students from different tracks to enter upper-secondary education at the vocational Gymnasium and the comprehensive school in order to qualify for university. Approximately 30% of students in the German school system qualify to go to university. Because non-Gymnasium students leave secondary school after Grade 9 or 10, it was not possible to include these students in this investigation.

Description of the Study and Sample

The data for Study 1 come from a large, ongoing German study (Transformation of the Secondary School System and Academic Careers, or TOSCA), conducted at the Max Planck Institute for Human Development, Berlin, and the Institute for Educational Progress at the Humboldt University, Berlin (see Köller, Watermann, Trautwein, & Lüdtke, 2004). The data considered here are based on students from 147 randomly selected upper secondary schools in a single German state. The schools are representative of the traditional and vocational Gymnasium school types, which provide students with the qualifications to attend university. Students in the traditional Gymnasium are a particularly highly selected sample in terms of academic achievement.

A multistage sampling procedure was implemented to ensure that the data were representative. Schools and students were randomly selected. The participation rate at the school level was 99%, and a satisfactory participation rate of more than 80% was achieved at the student level. At T1, the students in the sample were in their final year of upper secondary schooling, with a mean age of 19.51 years (SD = 0.77). Two trained research assistants administered materials in each school between February and May 2002. Students participated voluntarily, without any financial reward. At T1, all students were asked to provide written consent to be contacted again later for a second wave of data collection. At T2, participants were asked to complete an extensive questionnaire, taking about 2 hours, in exchange for a financial reward of €10 (about U.S. $12). Because the focus of this investigation is on the stability of effects over time, actual analyses are based on responses by the 2,306 (48%) students who completed the math self-concept instrument at T1 (final year of high school) and T2 (2 years after graduation from high school). However, T1 school-average achievement was based on responses by all students from each of the 147 schools who completed the achievement tests administered at T1. Comparing students in the final sample with those in the original sample, students responding at T2 were more likely to be female, to be younger in age, to be more mathematically able based on both test scores and school grades, to come from a more academically selective school, and to have a higher math self-concept (all effect sizes were modest, varying between .1 and .3).

Instruments

Math self-concept. Math self-concept was based on the German adaptation of the Self Description Questionnaire III, a multidimensional self-concept instrument for late adolescents and young adults based on the Shavelson, Hubner, and Stanton (1976; Marsh & Shavelson, 1985) model. In the German adaptation (Schwanzer, Trautwein, Lüdtke, & Sydow, 2005), four researchers with English as a second language translated all original items independently of each other. Because the translation was intended to be used with a diverse sample of students, there was an emphasis on using simple, common words that would be easily understood by all students. Subsequently, using the assistance of a professional translator, the most appropriate translation was chosen (and in some instances refined). Extensive pilot testing resulted in a short German instrument with four items per scale with a 4-point (disagree to agree) response format. The four items chosen per scale emphasized cognitive evaluations (e.g., “I’m good at mathematics”) rather than affective items (e.g., “I like mathematics”) and had the highest factor loadings on their respective factors. The math self-concept items were administered at both T1 and T2. Marsh, Trautwein, Lüdtke, Köller, and Baumert (2006) demonstrated strong support for the convergent and discriminant validity of responses to this math self-concept scale. Thus, for example, math self-concept was substantially positively related to math school grades (.71), math standardized achievement test scores (.59), and taking advanced math courses (.51) but was nearly unrelated or even negatively related to English and German outcomes.

Academic outcome measures. The mathematics achievement test consisted of original items from the Third International Mathematics and Science Study (TIMSS; e.g., Baumert, Bos, & Lehmann, 2000). A total mathematics achievement score was constructed using the original metric of the TIMSS study. The test was low stakes (i.e., the results had no consequences and students received no feedback on the results). However, in a separate experimental study with random assignment to conditions, Baumert and Demmrich (2001) showed that the addition of rewards contingent on informational feedback, grading, and performance had no effect on test performances.

All schools used a school grade scale that ranged from 0 points (very low achievement) to 15 points (very high achievement). Grades were based on the school grades that students had received on their report cards that covered work completed over approximately 6 months. Actual grades (based on official school records) were available for students from a majority, but not all, of the schools considered in Study 1. The correlation between self-reported grades and documented grades was r = .93 (p < .001) for students who had nonmissing values for both constructs. However, to minimize missing values and to maximize the similarity between Studies 1 and 2 (actual grades were not available in Study 2), self-reported grades were used in both studies. School-average measures of achievement were always based on test scores rather than on school grades.

Statistical Analyses

Recent BFLPE studies (e.g., Lüdtke, Köller, Marsh, & Trautwein, 2005; Marsh & Hau, 2003) have used multilevel modeling approaches. In most studies conducted in school settings, individual participant characteristics are confounded with those associated with groups (e.g., classrooms, schools) because such groups typically are not established according to random assignment. This clustering effect entails problems with respect to the appropriate levels of analysis, aggregation bias, and heterogeneity of regression (Raudenbush & Bryk, 2002). Participants in the same group are typically more similar to other participants in the same group than they are to participants in other groups. Even when participants are initially assigned at random, they tend to become more similar to each other over time. Furthermore, a variable may have a very different meaning when measured at different levels. For example, the BFLPE research reviewed here suggests that a measure of ability at the student level provides an indicator of a student attribute, whereas school-average achievement at the school level becomes a proxy measure of a school’s normative environment. Thus, the average achievement of a school has an effect on student self-concept above and beyond the effect of the individual student’s ability. Multilevel modeling is designed to resolve the confounding of these two effects by facilitating a decomposition of any observed relationship among variables, such as self-concept and ability, into separate within-school and between-school components (see Goldstein, 2003; Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). The juxtaposition of the effects of individual achievement and class-average achievement is inherently a multilevel issue that cannot be represented adequately at either the individual or the classroom level. A detailed presentation of multilevel modeling (also known as hierarchical linear modeling) is available elsewhere (e.g., Raudenbush & Bryk, 2002; Snijders & Bosker, 1999).

Because the analyses were based on responses by participants with both T1 and T2 data, there were very few missing data (less than 1% missing responses). Although missing data were not an important problem for the final sample, we used the expectation maximization algorithm, a widely recommended approach to imputation for missing data, as operationalized in SPSS (Version 11.5), to impute missing values for T1 math self-concept (T1MSC), math grades, and math test scores. This procedure is preferable to traditional procedures such as listwise and pairwise deletion for missing data.

Based on strong theoretical models, it is appropriate and desirable to posit causal effects and to test causal predictions based on appropriate statistical analyses. However, even in rigorous studies, causality cannot be proven; it can only be shown that the data are consistent with strong tests of causality. Hence, we have been careful to use the generic term predictive effect, which does not imply causality, when discussing the actual results of our statistical analyses.

Multilevel Models

All multilevel analyses in this investigation were conducted with the commercially available MLwiN statistical package (Version 2; see Rasbash, Steele, Browne, & Prosser, 2005; also see Goldstein, 2003), which is specifically designed to analyze multilevel data. To test the a priori predictions and research questions, a set of eight multilevel models was evaluated. In each model, either T1MSC or T2 math self-concept (T2MSC) was the dependent variable. The overall strategy was to begin with the most basic tests of the BFLPE and then evaluate the consequences of adding variables or interaction terms in relation to a priori predictions or research questions. In this sense, all the models are a priori models.

In Models 1, 2, and 3, we tested the effects of the BFLPE based on individual achievement test scores in combination with school-average achievement (Model 1, the traditional basis of the BFLPE), school type (Model 2; 1 = traditional Gymnasium, a selective school; 0 = other), and the combination of both school-average achievement and school type (Model 3). The purpose of these comparisons was to determine whether attending a selective school has a positive (assimilation) effect on self-concept that offsets the negative (contrast) effect of school-average achievement.

For each of these models, separate analyses were conducted for T1MSC (Models 1A, 2A, and 3A), T2MSC (Models 1B, 2B, and 3B), and T2MSC with T1MSC included as a predictor variable (Models 1C, 2C, and 3C). The purpose of these models was to compare the sizes of effects near the end of high school (T1) and several years after graduation from high school (T2) and to determine whether school-average achievement continued to have negative effects at T2 after controlling for the effects observed at T1.
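The three variants of Model 1 described above can be written out as a two-level random-intercept model. This is only a sketch: the symbols below (ACH for individual achievement, a bar for the school average, u and e for school-level and student-level residuals) are notation assumed here for illustration and do not come from the original analyses, and the assumption that only the intercept varies randomly across schools is inferred from the two residual variance components reported in the tables.

```latex
% Student i in school j; all variables z scored at the student level.
\begin{align}
\text{Model 1A:}\quad \mathrm{T1MSC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij}
      + \beta_2\,\overline{\mathrm{ACH}}_{j} + u_j + e_{ij} \\
\text{Model 1B:}\quad \mathrm{T2MSC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij}
      + \beta_2\,\overline{\mathrm{ACH}}_{j} + u_j + e_{ij} \\
\text{Model 1C:}\quad \mathrm{T2MSC}_{ij} &= \beta_0 + \beta_1\,\mathrm{ACH}_{ij}
      + \beta_2\,\overline{\mathrm{ACH}}_{j} + \beta_3\,\mathrm{T1MSC}_{ij} + u_j + e_{ij}
\end{align}
% Assumed residual structure:
% u_j \sim N(0, \sigma_u^2) (Level 2, schools), e_{ij} \sim N(0, \sigma_e^2) (Level 1, students).
```

In this notation the BFLPE corresponds to a negative value of the school-level coefficient (beta_2), and Models 2 and 3 would replace or supplement the school-average term with the school-type indicator. Coefficients differ across the three equations even though the same symbols are reused.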

Models 4, 5, and 6 paralleled the first three models except that they included both achievement test scores (as in the first three models) and high school grades as predictor variables. The purpose of these models was to compare the effects of test scores and school grades on academic self-concept at T1, T2, or T2 controlling for T1. Particularly relevant is the extent to which the BFLPE associated with school-average achievement is mediated by school grades.

In Model 7, we explore whether the size of the BFLPE based on school-average achievement is moderated by gender or individual achievement, that is, whether the effect is larger or smaller for girls than for boys and whether the effect is larger or smaller for students who are initially more able. This is accomplished by adding appropriate main and interaction effects to the predictor variables in Model 4. In Model 8, we conduct parallel analyses in which the BFLPE is based on school type (selective vs. nonselective) rather than school-average achievement.

To facilitate the interpretation of the results and to reduce potential multicollinearity problems, we began by standardizing (z scoring) all variables to have M = 0 and SD = 1 across the entire sample (Appendix, Study 1). School-average achievement was determined by taking the average of achievement scores for students in each school (but not restandardizing these scores, so that individual student and school-average achievement scores were in the same metric). Product terms were used to test interaction effects. In constructing these product variables, we used the product of standardized (z score) variables, but the product terms were not restandardized. In this respect, all parameter estimates based on multilevel models in this investigation are standardized parameter estimates in which the standardized metric is based on the metric of the individual-level variables (for further discussion, see Marsh & Rowe, 1996; also see Aiken & West, 1991; Raudenbush & Bryk, 2002). Total effects can be divided into direct and indirect (or mediated) effects when there is a potential mediating variable. In this investigation, for example, we predict that the negative effects of school-average ability on T2MSC will be largely mediated by T1MSC. It is, of course, relevant to evaluate the statistical significance of mediated effects using appropriate procedures such as those outlined by Krull and MacKinnon (2001) that are applied to multilevel models. In this investigation, whenever we discuss mediated effects, we tested the statistical significance of these effects using what Krull and MacKinnon (p. 259) refer to as the 2-1-1 model, in which the first variable in the causal chain is a group-level variable (school-average ability in this investigation).
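The standardization steps described above (z scoring across the whole sample, school averages kept in the student metric, and product terms that are not restandardized) can be sketched as follows. The function names, the toy data, and the use of the population standard deviation are assumptions made for illustration; the original analyses were run in SPSS and MLwiN, not with this code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore(values):
    """Standardize scores to M = 0, SD = 1 across the entire sample."""
    m, sd = mean(values), pstdev(values)  # population SD is an assumption
    return [(v - m) / sd for v in values]


def school_average(school_ids, z_values):
    """Attach to each student the mean z score of that student's school.

    The school means are deliberately not restandardized, so they stay
    in the same metric as the individual student scores.
    """
    by_school = defaultdict(list)
    for sid, z in zip(school_ids, z_values):
        by_school[sid].append(z)
    school_mean = {sid: mean(zs) for sid, zs in by_school.items()}
    return [school_mean[sid] for sid in school_ids]


def product_term(z_a, z_b):
    """Interaction term: product of two z-scored variables, left as is."""
    return [a * b for a, b in zip(z_a, z_b)]


# Hypothetical toy data: two schools with three students each.
school = ["s1", "s1", "s1", "s2", "s2", "s2"]
math_test = [40.0, 50.0, 60.0, 60.0, 70.0, 80.0]

z_test = zscore(math_test)
z_school_avg = school_average(school, z_test)
z_cross_level = product_term(z_test, z_school_avg)

print([round(z, 2) for z in z_test])
print([round(z, 2) for z in z_school_avg])
```

Because the school means are averages of z scores, their spread is smaller than 1. That is the point of leaving them unrestandardized: a one-unit change in school-average achievement then means the same thing as a one-unit change in individual achievement.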

Preliminary Analyses: School-Level Variation in Test Scores, Grades, and Self-Concept

In preliminary analyses, we evaluated the multilevel structure of math achievement, math school grades, and math self-concept. Consistent with a priori predictions based on previous research, there was substantial variation between schools in terms of math achievement but substantially less variation between schools in math school grades and particularly in math self-concept. These differences are illustrated in the caterpillar plots (Figure 2). Thus, for example, in Figure 2A, the 147 schools (each school is represented by one of the 147 vertical lines) are ranked in terms of math achievement, and a 95% error bar shows how variable scores are within a given school. For math achievement, many schools are clearly above or below the grand mean across all schools (zero, because these scores were standardized). School intercepts vary from more than a standard deviation below the mean to half a standard deviation above the mean. Consistent with the grade-on-a-curve effect, there is much less variation in school grades (Figure 2B). In contrast to math achievement, there are only a few error bars representing different schools that do not overlap with the mean grade across all schools. Consistent with the BFLPE, there is even less school-to-school variation in math self-concepts (Figure 2C). The intraschool correlation is an index of the amount of variation in each outcome that can be explained in terms of differences between schools. Whereas the intraschool correlation is substantial for math achievement (.26), it is much smaller for math school grades (.04) and even smaller for math self-concept (.02). In summary, consistent with the grade-on-a-curve effect and the BFLPE, there is substantial school-to-school variation in math achievement scores but much less variation in math school grades and math self-concept.
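As a rough illustration of the intraschool correlation discussed above, the index can be computed as the share of total variance that lies between schools, assuming the variance components come from an intercept-only two-level model. The function name and the component values below are hypothetical: the values were simply chosen so that the ratio reproduces the reported correlations of .26, .04, and .02, and they are not estimates taken from the study.

```python
def intraschool_correlation(between_var, within_var):
    """Proportion of total variance attributable to differences between schools."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical (between-school, within-school) variance components for
# standardized outcomes, picked to match the reported correlations.
components = {
    "math achievement": (0.26, 0.74),
    "math school grades": (0.04, 0.96),
    "math self-concept": (0.02, 0.98),
}

for outcome, (between, within) in components.items():
    print(f"{outcome}: {intraschool_correlation(between, within):.2f}")
```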

Results and Discussion

The BFLPE

In each of a set of multilevel models, math self-concept responses are predicted from different combinations of individual-level and school-level variables specifically constructed to assess a priori hypotheses and research questions. For each model, three sets of analyses were conducted in which the dependent variable was T1MSC (Table 1, Models 1A, 2A, and 3A), T2MSC (Table 1, Models 1B, 2B, and 3B), and T2MSC in which T1MSC was included as a predictor variable to assess change in math self-concept (Table 1, Models 1C, 2C, and 3C).

The predictive effect of individual achievement is substantial for T1MSC (.68; Model 1A in Table 1) and for T2MSC (.63; Model 1B, Table 1). In Model 1C (Table 1), T1MSC has the largest predictive effect on T2MSC (.71), but the predictive effect of individual achievement is still highly significant (.15). Hence, individual achievement test scores from T1 have a significant direct predictive effect on T2MSC beyond the substantial indirect predictive effect that is mediated by T1MSC.

Figure 2. School-to-school variation. (A) Math achievement: Each of the 147 vertical lines represents the school intercept (in terms of math achievement) with an error bar (± 1.96 SEs). Schools are ranked in terms of math achievement (from lowest to highest). (B) Math school grades: Each of the 147 vertical lines represents the school intercept (in terms of math grades) with an error bar (± 1.96 SEs). (C) Math self-concept: Each of the 147 vertical lines represents the school intercept (in terms of math self-concept) with an error bar (± 1.96 SEs). Individual student scores for achievement, grades, and self-concept were standardized (M = 0, SD = 1) across all students.

For present purposes, the most important results of Model 1 are the predictive effects of school-average achievement, the BFLPE. This predictive effect is statistically significant, substantial, and negative for T1MSC (–.39, Model 1A) and nearly as large for T2MSC (–.34, Model 1B). Hence, there is a substantial negative predictive effect of school-average achievement on T2MSC collected 2 years after graduation from high school. Furthermore, even after controlling for the substantial negative predictive effect of school-average achievement on T1MSC, school-average achievement still has a small, statistically significant, negative predictive effect on T2MSC (–.07, Model 1C). Hence, school-average achievement from T1 has a small (statistically significant) negative direct predictive effect on T2MSC beyond the substantial indirect negative predictive effect that is mediated by T1MSC. This implies that school-average achievement has new negative predictive effects on math self-concept after graduation from high school, beyond what can be explained in terms of the earlier negative predictive effects on T1MSC collected at the end of high school. Hence, at least from this perspective, the negative predictive effect of school-average achievement grows larger after graduation from high school rather than diminishing.
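The split between direct and indirect predictive effects described above can be checked with simple product-of-coefficients arithmetic. This is only a back-of-the-envelope sketch under stated assumptions: it uses the rounded coefficients reported for Models 1A to 1C (with the T1MSC coefficient of .71 from Table 1), the function name is hypothetical, and it is not the Krull and MacKinnon procedure that was used to test the significance of mediated effects. In a multilevel model with rounded estimates the decomposition holds only approximately.

```python
def decompose(a, b, direct):
    """Return (indirect, implied_total) for a single-mediator chain.

    a      : predictor -> mediator (e.g., school-average achievement -> T1MSC)
    b      : mediator -> outcome, controlling for the predictor (T1MSC -> T2MSC)
    direct : predictor -> outcome, controlling for the mediator
    """
    indirect = a * b
    return indirect, direct + indirect


# School-average achievement, Study 1 (rounded published coefficients).
bflpe_indirect, bflpe_total = decompose(a=-0.39, b=0.71, direct=-0.07)
reported_bflpe_total = -0.34  # Model 1B

# Individual achievement, Study 1.
ach_indirect, ach_total = decompose(a=0.68, b=0.71, direct=0.15)
reported_ach_total = 0.63  # Model 1B

print(f"BFLPE: indirect {bflpe_indirect:.3f}, implied total {bflpe_total:.3f}")
print(f"Achievement: indirect {ach_indirect:.3f}, implied total {ach_total:.3f}")
```

The implied totals land within rounding distance of the coefficients reported for Model 1B, which is consistent with most of the T2 effect being carried by T1MSC and only a small part being direct.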

Model 2 (Table 1) is essentially parallel to Model 1 except that the BFLPE is represented by school type (i.e., attending a highly selective traditional Gymnasium) rather than school-average achievement. Whereas the negative predictive effects of school type in Model 2 are systematically smaller than the corresponding predictive effects based on school-average achievement in Model 1, the pattern and direction of predictive effects are similar. The only major difference is that the negative predictive effect of school type on T2MSC after controlling for T1MSC (Model 2C) is not statistically significant (whereas the corresponding predictive effect of school-average achievement in Model 1C is significantly negative).

In Model 3 (Table 1), we included both school-average achievement and school type as predictor variables. Not surprisingly, given that these variables are substantially correlated, the unique predictive effects of each are systematically smaller than in the corresponding analysis in which only one or the other of these school-level variables was included. However, what is interesting is that both of these school-level variables have significantly negative predictive effects on T1MSC (Model 3A) and T2MSC (Model 3B). Thus, for example, whereas the negative predictive effect of school-average achievement was –.39 in Model 1A (which did not include school type), the corresponding predictive effect was –.25 in Model 3A (which did include school type). Similarly, whereas the negative predictive effect of school type was –.20 in Model 2A (which did not include school-average achievement), the corresponding predictive effect was –.12 in Model 3A (which did include school-average achievement). In summary, the results of Models 1, 2, and 3 provide clear support for the BFLPE and for the main a priori prediction of the present investigation. The predictive effect of school-average achievement (the BFLPE) is substantially negative at the end of high school and continues to be substantially negative 2 years after graduation from high school. Furthermore, there are new negative predictive effects of school-average achievement on T2MSC beyond the substantial negative predictive effects that are mediated by T1MSC. Hence, school-average achievement and school type continue to have substantial negative predictive effects on math self-concept several years after graduation from high school. On this basis, we conclude that the BFLPE is a persistent, long-lasting phenomenon.

Table 1
Study 1: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average Achievement, School Type, and Stability Over Time

                            Model 1: School-Average Achievement    Model 2: School Type                   Model 3: School-Average Achievement and School Type
                            Model 1A     Model 1B     Model 1C     Model 2A     Model 2B     Model 2C     Model 3A     Model 3B     Model 3C
                            T1MSC        T2MSC        ΔT2MSC       T1MSC        T2MSC        ΔT2MSC       T1MSC        T2MSC        ΔT2MSC
Variables                   b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)
Fixed effects
Level 1: Individual student predictors
  Math tests                .68 (.02)*   .63 (.02)*   .15 (.02)*   .65 (.01)*   .60 (.02)*   .14 (.02)*   .68 (.04)*   .64 (.02)*   .15 (.02)*
  T1 math self-concept                                .71 (.02)*                             .71 (.02)*                             .71 (.04)
Level 2: School-level predictors
  School-average math test  –.39 (.04)*  –.34 (.04)*  –.07 (.03)*                                         –.25 (.04)*  –.23 (.05)*  –.05 (.03)
  School type                                                      –.20 (.02)*  –.17 (.02)*  –.03 (.02)   –.12 (.02)*  –.10 (.02)*  –.01 (.02)
Residual variance components
  Level 2 school            .02 (.007)*  .01 (.006)   .00 (.003)   .01 (.006)*  .01 (.006)   .00 (.003)   .01 (.005)   .00 (.005)   .00 (.003)
  Level 1 students          .63 (.019)*  .68 (.021)*  .36 (.013)*  .63 (.019)*  .68 (.021)*  .36 (.011)*  .63 (.019)*  .68 (.021)*  .36 (.011)*
Deviance summary            5,530.0      5,673.2      4,176.0      5,530.7      5,679.0      4,178.0      5,505.2      5,656.3      4,175.6

Table 1 (continued)

                            Model 4: School-Average Achievement    Model 5: School Type                   Model 6: School-Average Achievement and School Type
                            Model 4A     Model 4B     Model 4C     Model 5A     Model 5B     Model 5C     Model 6A     Model 6B     Model 6C
                            T1MSC        T2MSC        ΔT2MSC       T1MSC        T2MSC        ΔT2MSC       T1MSC        T2MSC        ΔT2MSC
Variables                   b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)
Fixed effects
Level 1: Individual student predictors
  Math tests                .46 (.02)*   .44 (.02)*   .15 (.02)*   .44 (.01)*   .42 (.02)*   .14 (.02)*   .46 (.01)*   .44 (.02)*   .15 (.02)*
  Math grades               .49 (.02)*   .44 (.02)*   .13 (.02)*   .49 (.01)*   .44 (.02)    .13 (.02)*   .48 (.01)*   .43 (.02)*   .13 (.02)*
  T1MSC                                               .63 (.02)*                             .63 (.02)*                             .63 (.02)
Level 2: School-level predictors
  School-average math test  –.25 (.03)*  –.21 (.04)*  –.06 (.03)*                                         –.13 (.03)*  –.12 (.04)*  –.04 (.03)
  School type                                                      –.10 (.01)*  –.11 (.02)*  –.03 (.01)*  –.10 (.02)*  –.08 (.02)*  –.01 (.02)
Residual variance components
  Level 2 school            .01 (.005)*  .01 (.005)   .00 (.003)   .01 (.004)*  .00 (.004)   .00 (.003)   .00 (.004)   .00 (.004)   .00 (.003)
  Level 1 students          .44 (.013)*  .52 (.016)*  .35 (.010)*  .44 (.013)*  .52 (.016)*  .35 (.010)*  .44 (.013)*  .52 (.016)*  .35 (.010)*
Deviance summary            4,687.0      5,058.9      4,110.2      4,674.7      5,054.6      4,111.0      4,663.1      5,046.0      4,109.5

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ΔT2MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 2,306 students who completed the math self-concept instrument at Time 2.
* p < .05.

The Predictive Effect of School Grades on the BFLPE

Models 4, 5, and 6 (Table 1) largely parallel Models 1, 2, and 3. The main difference is that individual student grades are also included in each of the models as a predictor of math self-concept. Not surprisingly, math grades have a substantial positive predictive effect on math self-concept, and a substantial portion of the predictive effect of math achievement on math self-concept (as shown in Model 1) can be explained in terms of math grades. Thus, for example, in Model 4A the combined predictive effect of individual achievement (.46) and grades (.49) is substantially larger than the predictive effect of either of these individual student variables considered alone. Of particular relevance is the BFLPE, that is, the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is systematically smaller in models that include school grades. Equally able students get lower grades in schools where the school-average achievement level is higher and, in this investigation, in academically selective schools (i.e., the Gymnasium schools). Hence, part of the negative predictive effect of these school-level variables is mediated by school grades. However, the predictive effects of school-average achievement (Model 4) and school type (Model 5) are still significantly negative for both T1MSC and T2MSC. Furthermore, school-average achievement (Model 4C) and school type (Model 5C) continue to have a significant predictive effect on T2MSC even after controlling for T1MSC.

Generalizability of the BFLPE: Interactions With Gender and Individual Achievement

In Models 7 and 8 (Table 2), we added individual student gender and interaction effects involving gender and individual student ability to models already considered. Whereas girls tend to have lower math self-concepts than boys do, this gender difference only reaches statistical significance in Model 7A. Gender does, however, interact with individual student achievement such that the predictive effect of math achievement on math self-concept is somewhat stronger for girls than for boys. The BFLPE (negative predictive effects of school-average achievement in Model 7 and school type in Model 8) also varies somewhat as a function of gender. In each case, the BFLPE is somewhat larger for girls than for boys. Of particular relevance to this investigation, the negative predictive effects of school-average achievement (Model 7) and school type (Model 8) do not vary substantially with individual student achievement. Whereas there is a marginally significant interaction with school type for T1MSC (Model 8A), the predictive effect is not significant for T2MSC, nor is the interaction with school-average achievement significant for T1MSC or T2MSC (Model 7). In summary, the results of these extended models suggest that the BFLPE is robust, generalizing reasonably well across different levels of individual student achievement and across gender.

Table 2
Study 1: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement

                                        Model 7: School-Average Achievement    Model 8: School Type
                                        Model 7A     Model 7B     Model 7C     Model 8A     Model 8B     Model 8C
                                        T1MSC        T2MSC        ΔT2MSC       T1MSC        T2MSC        ΔT2MSC
Variables                               b (SE)       b (SE)       b (SE)       b (SE)       b (SE)       b (SE)
Fixed effects
Level 1: Individual student predictors
  Math test                             .45 (.02)*   .46 (.02)*   .15 (.02)*   .44 (.02)*   .43 (.02)*   .15 (.02)*
  Math grades                           .49 (.02)*   .44 (.01)*   .13 (.02)*   .48 (.01)*   .43 (.02)*   .13 (.02)*
  T1MSC                                                           .63 (.02)*                             .62 (.02)*
  Sex (0 = male, 1 = female)            –.03 (.01)*  –.03 (.02)   –.01 (.01)   –.01 (.02)   –.01 (.02)   –.00 (.01)
  Sex × Math Test                       .04 (.02)*   .06 (.02)*   .03 (.01)*   .04 (.02)*   .06 (.02)*   .03 (.01)*
Level 2: School-level predictors
  School-average math test              –.24 (.04)*  –.20 (.04)*  –.06 (.03)
  School type                                                                  –.13 (.02)*  –.11 (.02)*  –.03 (.01)*
Cross-level interactions
  School-Average Math Test × Sex        –.10 (.02)*  –.10 (.04)*  –.04 (.03)
  School-Average Math Test × Math Test  .01 (.03)    .00 (.03)    .00 (.03)
  School Type × Sex                                                            –.05 (.02)*  –.07 (.02)*  –.04 (.02)*
  School Type × Individual Achievement                                         .04 (.02)*   .03 (.02)    .01 (.01)
Residual variance components
  Level 2 school                        .01 (.004)*  .01 (.005)   .00 (.003)   .01 (.004)*  .00 (.004)   .00 (.003)
  Level 1 students                      .44 (.013)*  .52 (.016)*  .35 (.010)*  .44 (.013)*  .51 (.016)*  .35 (.010)*
Deviance summary                        4,674.0      5,044.7      4,104.8      4,649.7      5,024.6      4,100.1

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; ΔT2MSC = Time 2 math self-concept controlling for T1MSC. School-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 2,306 students who completed the math self-concept instrument at Time 2.
* p < .05.

Study 2

As described earlier, the primary purpose of Study 2 is to extend tests of the stability of the BFLPE over a longer period of time. The main difference between the two studies is that the T1-T2 interval is 4 years in Study 2 compared to 2 years in Study 1. Despite this substantially longer interval, a priori predictions and research questions are the same in both studies (see earlier discussion).

Method

Description of the Study and Sample

Study 2 is based on a subsample of the longitudinal Learning Processes, Educational Careers and Psychosocial Development in Adolescence study. This investigation was conducted by the Max Planck Institute for Human Development in Berlin (for more general descriptions of the study and resulting database, see Baumert et al., 1996). The data analyzed in this article are based on students from 94 randomly selected upper secondary schools in four German states: two in East Germany and two in West Germany. Schools were randomly sampled in each participating state. The schools are representative of the traditional Gymnasium school type and the comprehensive upper secondary school; both school types provide students with the qualifications to attend university. Students in the traditional Gymnasium are a particularly highly selected sample in terms of academic achievement. Trained research assistants administered materials in each school at the end of the 1996–1997 school year (T1). At T1, all students were asked to provide written consent to be contacted again for a second wave of data collection. At T2, 4 years later, participants were asked to complete an extensive questionnaire taking about 2 hours in exchange for a financial reward of €15 (about U.S. $18).

Because our focus is on the stability of predictive effects over time, actual analyses are based on responses by the 1,758 (42%) students who completed the math self-concept instrument at both T1 (1 year prior to graduation from high school) and T2 (3 years after graduation from high school). The mean age of students at T1 was 18.55 (SD = 0.56). However, as in Study 1, T1 school-average achievement was based on responses by all students from each of the 94 schools who completed the achievement tests administered at T1. As in Study 1, students in the final sample compared to students in the original sample were more likely to be female, to be younger in age, to be more mathematically able based on both test scores and school grades, to come from a more academically selective school, and to have a higher math self-concept (all effect sizes were modest, varying between .1 and .3).

Instruments

Math self-concept. T1MSC was assessed using a short, 5-item scale. This standard German measure of the construct has been shown to be reliable and valid in large-scale survey studies conducted over the past 25 years (Möller & Köller, 2001, 2004; Trautwein, Lüdtke, Köller, & Baumert, 2006). An example item is, “Nobody’s perfect, but I’m just not good at math.” Students responded to each item on a 4-point scale (1 = totally agree to 4 = totally disagree). Total scores were computed by aggregating all items, resulting in scores ranging from 1 (low self-concept) to 4 (high self-concept). T2MSC was assessed by the German adaptation of the Self Description Questionnaire III math self-concept instrument that was used in Study 1. To test the comparability of the math self-concept scales used at T1 and T2 in Study 2, a small subsequent study was conducted. A self-concept questionnaire that included the two scales was administered to 115 university students (54.8% women) from various fields of study at a Berlin university. The nine math self-concept items were subjected to a confirmatory factor analysis in which the five items from one scale were used to define one latent factor and the four items from the second scale defined a second latent factor. The latent correlation between the two factors was r = .97, indicating that both scales measured a similar underlying math self-concept construct.

Academic outcome measures. The items on the math achievement test were taken from previous national and international studies, in particular the International Association for the Evaluation of Educational Achievement’s First and Second International Mathematics Study, the Third International Mathematics and Science Study (Beaton et al., 1996; Husén, 1967; Robitaille & Garden, 1989), and an investigation conducted at the Max Planck Institute for Human Development (Baumert, Roeder, Sang, & Schmitz, 1986). Curriculum experts had assessed the curricular validity of all items beforehand. Individual achievement scores were calculated on the basis of item response theory. Internal consistency estimates of reliability were above .80. School grades in mathematics were based on student self-reports of grades from their previous report cards.

Statistical Analyses

Statistical analyses were based on multilevel analyses of the same set of a priori path models considered in Study 1. Because we based analyses on responses by participants with both T1 and T2 data, there were few missing data, and we again used the expectation maximization algorithm to impute missing values for T1MSC, math grades, and math test scores. As described in Study 1, we began by standardizing (z scoring) all variables to have M = 0 and SD = 1 across the entire sample (Appendix, Study 2). School-average achievement was determined by taking the average of achievement scores for students in each school (but not restandardizing these scores, so that individual student and school-average achievement scores were in the same metric). Product terms were used to test interaction effects. Again, intraschool correlations indicated that school-level variation was substantial for math achievement test scores (.23) but substantially less for math school grades (.07) and particularly for math self-concept (.02).

Results and Discussion

The same set of models evaluated in Study 1 is considered in Study 2. Again, the main focus of these analyses is on the predictive effects of school-level variables (school-average achievement and school type) on math self-concept assessed at 1 year prior to the completion of high school (T1) and 3 years after graduation from high school (T2).

The BFLPE

The predictive effect of individual achievement is substantial on T1MSC (.47; Model 1A in Table 3) and T2MSC (.46; Model 1B, Table 3). In Model 1C (Table 3), T1MSC has the largest predictive effect on T2MSC (.61), but the predictive effects of individual achievement are still highly significant (.18). Hence, individual achievement test scores from T1 have a significant direct predictive effect on T2MSC beyond the substantial indirect predictive effect that is mediated by T1MSC.

The BFLPE (the negative predictive effect of school-average achievement in Model 1) is statistically significant and substantial for T1MSC (–.28, Model 1A) and for T2MSC (–.21, Model 1B). Hence, there is a substantial negative predictive effect of school-average achievement on T2MSC collected 4 years later. However, after controlling for the substantial negative predictive effect of school-average achievement on T1MSC, the negative predictive effect of school-average achievement (based on T1 achievement) on T2MSC (–.03, Model 1C) is not statistically significant. This implies that school-average achievement has no additional negative predictive effects on math self-concept after graduation from high school beyond what can be explained in terms of the earlier negative predictive effects on T1MSC collected 1 year before the end of high school.
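The mediation logic in this paragraph can be checked with simple arithmetic on the reported coefficients. The sketch below uses the product-of-coefficients rule from path analysis as an approximation; this decomposition is not reported in the text, and the variable names are illustrative only.

```python
# Standardized coefficients reported for Study 2, Model 1 (Table 3).
effect_on_t1 = -0.28    # school-average achievement -> T1MSC (Model 1A)
t1_to_t2 = 0.61         # T1MSC -> T2MSC (Model 1C)
direct = -0.03          # school-average achievement -> T2MSC, T1MSC controlled (Model 1C)
total_reported = -0.21  # school-average achievement -> T2MSC (Model 1B)

# Assumption: the indirect effect is approximated by the product of the two paths.
indirect = effect_on_t1 * t1_to_t2
implied_total = direct + indirect

print(round(indirect, 2), round(implied_total, 2), total_reported)
```

Under this approximation, nearly all of the negative effect on T2MSC runs through T1MSC, and the implied total is close to the value estimated directly in Model 1B.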

Model 2 (Table 3) is essentially parallel to Model 1 except that the BFLPE is represented by school type (i.e., attending an academically selective high school) rather than school-average achievement. Whereas the negative predictive effects of school type in Model 2 are systematically smaller than the corresponding predictive effects based on school-average achievement in Model 1, the pattern and direction of predictive effects are similar.

In Model 3, both school-average achievement and school type are included as predictor variables. Because these variables are substantially correlated, the unique predictive effects of each are smaller than in the parallel analysis in which the other was not included. Nevertheless, both of these school-level variables had significantly negative predictive effects on T1MSC (Model 3A) and T2MSC (Model 3B). Thus, for example, whereas the negative predictive effect of school-average achievement was –.28 in Model 1A (which did not include school type), the corresponding predictive effect was –.24 in Model 3A (which did include school type).

In summary, the results of Models 1, 2, and 3 provide clear support for the BFLPE and for the main a priori prediction of this investigation. The predictive effects of school-average achievement and school type (the BFLPE) are substantially negative near the end of high school and continue to be substantial and negative 4 years later. Whereas there are no new negative predictive effects of school-average achievement or school type on T2MSC after controlling for the negative predictive effects of these school-level variables on T1MSC, there are no indirect positive predictive effects of these school-level variables to offset the substantial negative predictive effects that are largely mediated by T1MSC. Hence, school-average achievement and school type continue to have substantial negative predictive effects on math self-concept 3 years after graduation from high school. On this basis, we argue that the BFLPE is a persistent, long-lasting phenomenon.

Table 3
Study 2: Stability of the Big-Fish-Little-Pond Effect: Effects of School-Average Achievement, School Type, and Stability Over Time

                              Model 1: School-Average Achievement       Model 2: School Type                      Model 3: School-Average Achievement and School Type
                              Model 1A      Model 1B      Model 1C      Model 2A      Model 2B      Model 2C      Model 3A      Model 3B      Model 3C
                              T1MSC         T2MSC         T2MSC|T1MSC   T1MSC         T2MSC         T2MSC|T1MSC   T1MSC         T2MSC         T2MSC|T1MSC
Variables                     b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)
Fixed effects
Level 1: Individual student predictors
  Math test                   .47 (.02)*    .46 (.02)*    .18 (.02)*    .45 (.02)*    .44 (.02)*    .18 (.02)*    .47 (.02)*    .46 (.02)*    .18 (.02)*
  T1MSC                                                   .61 (.02)*                                .61 (.02)*                                .61 (.02)*
Level 2: School-level predictors
  School-average math test    –.28 (.06)*   –.21 (.05)*   –.03 (.04)                                              –.24 (.05)*   –.15 (.06)*   –.01 (.05)
  School type                                                           –.09 (.02)*   –.08 (.02)*   –.03 (.02)    –.04 (.03)    –.05 (.02)*   –.02 (.02)
Residual variance components
  Level 2 school              .01 (.008)    .00 (.006)    .00 (.004)    .02 (.009)*   .00 (.006)    .00 (.001)    .01 (.008)    .00 (.001)    .00 (.004)
  Level 1 students            .80 (.028)*   .81 (.028)*   .51 (.017)*   .80 (.028)*   .82 (.028)*   .51 (.018)*   .80 (.028)*   .81 (.027)*   .51 (.018)*
Deviance summary              4,628.2       4,630.6       3,812.7       4,637.1       4,633.1       3,811.4       4,626.1       4,626.9       3,811.43

Table 3 (continued)

                              Model 4: School-Average Achievement       Model 5: School Type                      Model 6: School-Average Achievement and School Type
                              Model 4A      Model 4B      Model 4C      Model 5A      Model 5B      Model 5C      Model 6A      Model 6B      Model 6C
                              T1MSC         T2MSC         T2MSC|T1MSC   T1MSC         T2MSC         T2MSC|T1MSC   T1MSC         T2MSC         T2MSC|T1MSC
Variables                     b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)
Fixed effects
Level 1: Individual student predictors
  Math test                   .32 (.02)*    .35 (.02)*    .16 (.02)*    .31 (.01)*    .34 (.02)*    .17 (.02)*    .32 (.02)*    .35 (.02)*    .17 (.02)*
  Math grades                 .40 (.02)*    .30 (.02)*    .06 (.02)*    .41 (.01)*    .30 (.02)     .06 (.02)*    .40 (.02)*    .30 (.02)*    .06 (.02)*
  T1MSC                                                   .58 (.02)*                                .58 (.02)*                                .58 (.02)
Level 2: School-level predictors
  School-average math test    –.18 (.06)*   –.13 (.05)*   –.02 (.04)                                              –.13 (.06)*   –.08 (.06)    .00 (.05)
  School type                                                           –.07 (.02)*   –.06 (.02)*   –.02 (.02)    –.04 (.03)    –.05 (.02)*   –.03 (.02)
Residual variance components
  Level 2 school              .02 (.008)*   .00 (.001)    .00 (.004)    .02 (.008)*   .00 (.001)    .00 (.004)    .02 (.008)*   .00 (.001)    .00 (.004)
  Level 1 students            .66 (.023)*   .74 (.025)*   .51 (.017)*   .66 (.023)*   .74 (.025)*   .51 (.017)*   .66 (.023)*   .74 (.025)*   .51 (.017)*
Deviance summary              4,289.8       4,454.1       3,802.7       4,291.1       4,452.2       3,801.4       4,287.7       4,450.5       3,801.4

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; T2MSC|T1MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .01.
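The note to Table 3 treats an estimate as statistically significant when it differs from zero by more than 2 standard errors. A minimal sketch of that rule, applied to the school-average achievement estimates reported for Models 1A, 1B, and 1C, is given below; the function name and the dictionary layout are hypothetical.

```python
def differs_from_zero(b, se, threshold=2.0):
    """Return True when the estimate is more than `threshold` standard errors from zero."""
    return abs(b) > threshold * se


# (b, SE) for school-average achievement as reported in the text for Model 1.
model_1 = {"1A": (-0.28, 0.06), "1B": (-0.21, 0.05), "1C": (-0.03, 0.04)}

flags = {name: differs_from_zero(b, se) for name, (b, se) in model_1.items()}
print(flags)
```

The rule flags the effects on T1MSC and T2MSC but not the direct effect on T2MSC after controlling for T1MSC, in line with the interpretation given in the text.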

The Predictive Effect of School Grades on the BFLPE

In Models 4, 5, and 6 (Table 3), individual student grades are added to variables considered in Models 1, 2, and 3. Both individual achievement and grades have substantial positive predictive effects on math self-concept. Thus, for example, the predictive effect of individual math achievement on T1MSC is .47 in Model 1A, but only .32 in Model 4A. What is surprising, perhaps, is that the predictive effect of math achievement in Model 4A is nearly as large as the predictive effect of math grades (.40). The combined predictive effects of individual math achievement and math school grades on math self-concept are substantially larger than the predictive effect of either of these individual student variables considered alone.

For our purposes, the most important component of Models 4, 5, and 6 is the BFLPE, that is, the predictive effects of the school-level variables. As predicted on the basis of the grade-on-a-curve effect, the BFLPE is systematically smaller in models that include school grades. Importantly, however, the predictive effects of school-average achievement (in Model 4) and school type (Model 5) are still significantly negative for both T1MSC and T2MSC. Indeed, whereas the sizes of these negative predictive effects are smaller, the pattern of significant predictive effects is nearly the same in Models 4, 5, and 6 (which include math grades) as the corresponding predictive effects in Models 1, 2, and 3 (which do not include math grades).
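To give a sense of how much smaller the BFLPE becomes once school grades are included, the sketch below computes the percentage reduction in the Table 3 coefficients. This is descriptive arithmetic only, not a statistical test, and the function and dictionary names are hypothetical.

```python
def percent_reduction(before, after):
    """Percentage by which the absolute size of a coefficient shrinks."""
    return 100.0 * (1.0 - abs(after) / abs(before))


# (coefficient without grades, coefficient with grades), taken from Table 3.
estimates = {
    "school-average achievement, T1MSC": (-0.28, -0.18),  # Model 1A vs. Model 4A
    "school-average achievement, T2MSC": (-0.21, -0.13),  # Model 1B vs. Model 4B
    "school type, T1MSC": (-0.09, -0.07),                 # Model 2A vs. Model 5A
    "school type, T2MSC": (-0.08, -0.06),                 # Model 2B vs. Model 5B
}

reductions = {
    label: round(percent_reduction(before, after))
    for label, (before, after) in estimates.items()
}
print(reductions)
```

On these figures the coefficients shrink by roughly a quarter to a third or more, yet all four remain negative.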

Generalizability of the BFLPE: Interactions With Gender and Individual Achievement

In Models 7 and 8 (Table 4), we added individual student gender, and interaction effects involving gender and individual student ability, to models already considered. Although (based on previous findings) it is not surprising that girls tend to have lower math self-concepts than boys do, the predictive effect of math achievement on math self-concept does not vary as a function of gender. Of more particular relevance to this investigation is the question of whether the BFLPE (the negative predictive effects of school-average achievement or school type) varies as a function of individual student achievement or individual student gender. Of the total of 12 cross-level interactions in Models 7 and 8, only 1 is marginally significant. For T1MSC, the negative predictive effect of school-average achievement is greater for more able students. This predictive effect is not, however, significant for T2MSC, nor do the negative predictive effects of school type vary with individual levels of student achievement. None of the interactions with individual student gender is significant, indicating that the size of the BFLPE is similar for boys and for girls. In summary, the results of these extended models suggest that the BFLPE is robust, generalizing reasonably well at different levels of individual student achievement and gender.

Table 4
Study 2: Generalizability of the Big-Fish-Little-Pond Effect: Interactions With Gender and Individual Achievement

                                          Model 7: School-Average Achievement       Model 8: School Type
                                          Model 7A      Model 7B      Model 7C      Model 8A      Model 8B      Model 8C
                                          T1MSC         T2MSC         T2MSC|T1MSC   T1MSC         T2MSC         T2MSC|T1MSC
Variables                                 b (SE)        b (SE)        b (SE)        b (SE)        b (SE)        b (SE)
Fixed effects
Level 1: Individual student predictors
  Math test                               .29 (.02)*    .33 (.02)*    .16 (.02)*    .28 (.02)*    .32 (.02)*    .16 (.02)*
  Math grades                             .41 (.02)*    .30 (.02)*    .07 (.02)*    .42 (.01)*    .30 (.02)*    .074 (.02)*
  T1MSC                                                               .58 (.02)*                                .58 (.02)*
  Sex (0 = male, 1 = female)              –.15 (.02)*   –.12 (.02)*   –.04 (.02)*   –.15 (.02)*   –.12 (.02)    –.04 (.02)*
  Sex × Math Test                         .00 (.02)     .03 (.02)     .03 (.02)     .01 (.02)     .03 (.02)     .02 (.02)
Level 2: School-level predictors
  School-average math test                –.18 (.05)*   –.12 (.05)*   –.02 (.04)
  School type                                                                       –.07 (.03)*   –.06 (.03)*   –.02 (.02)
Cross-level interactions
  School-Average Math Test × Sex          .02 (.05)     –.06 (.05)    –.06 (.04)
  School-Average Math Test × Math Test    –.09 (.04)*   –.05 (.05)    .00 (.04)
  School Type × Sex                                                                 .00 (.02)     –.02 (.02)    –.02 (.02)
  School Type × Individual Achievement                                              –.02 (.02)    –.01 (.03)    .01 (.02)
Residual variance components
  Level 2 school                          .01 (.007)*   .00 (.001)    .00 (.004)    .02 (.007)*   .00 (.001)    .00 (.004)
  Level 1 students                        .64 (.022)*   .72 (.024)*   .50 (.017)*   .64 (.022)*   .72 (.024)*   .50 (.017)*
Deviance summary                          4,232.2       4,416.1       3,795.1       4,238.4       4,416.5       3,794.0

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; T2MSC|T1MSC = Time 2 math self-concept controlling for T1MSC; school-average math test = school average of math achievement test scores; school type: 1 = selective Gymnasium, 0 = other. All parameter estimates are statistically significant when they differ from zero by more than 2 standard errors. All outcome and predictor variables were standardized (M = 0, SD = 1) at the individual student level so that parameter estimates are standardized in relation to the mean and standard deviation of individual-level variables. Analyses are based on responses by 1,758 students who completed the math self-concept instrument at Time 2.
*p < .05.

General Discussion for Studies 1 and 2

Is the BFLPE a Persistent, Long-Lasting Phenomenon?

The overarching purpose of this investigation is to test a priori predictions that the BFLPE associated with school-average achievement and school type is clearly evident at the end of high school and that these effects are still evident for participants several years after graduation from high school. In both studies, the BFLPE is substantial for both T1MSC and for T2MSC. Hence, there is clear support for a priori predictions that the BFLPE is a persistent, long-lasting phenomenon.

A critical and apparently unique contribution of this investigation is the evaluation of the BFLPE (negative predictive effects of school-average achievement and school type) on academic self-concept while students were in high school (T1) and again several years after they had graduated from high school (T2). Obviously, this is a more demanding test of the BFLPE than simply assessing the size of the BFLPE on a single occasion while students are in high school or even tests of the BFLPE based on more than one occasion during high school. Whereas there now exists clear support for the BFLPE during high school, a focus of this investigation was whether this predictive effect remained after students had graduated from high school and were no longer in the high school setting in which the BFLPE was established. Although it might be reasonable to suggest that the BFLPE should dissipate over time, the results of the present investigation show that the predictive effects are persistent and continue to have a substantial predictive effect on academic self-concepts long after graduation from high school.

In evaluating the stability of the BFLPE, it is critical to interpret carefully the results from Model A (based on T1MSC), Model B (based on T2MSC), and Model C (based on T2MSC controlling for the predictive effects of T1MSC). Results based on Models A and B show that the negative predictive effects of the BFLPE are statistically significant and substantial near the end of high school and again several years after graduation. If the BFLPE were a short-term, transitory phenomenon, one might expect that the predictive effect of school-level achievement and school type would only be substantial for T1MSC (Model A) but not for T2MSC (Model B). However, once the negative predictive effects of these school-level variables on T1MSC had been controlled, a substantial diminution of the BFLPE should result in a positive direct effect of these school-level variables for T2MSC (Model C, also see earlier discussion of Figure 1B).

Thus, the indirect negative BFLPE mediated by T1MSC would be offset to a large extent by a positive, direct effect of high-school-average achievement on T2MSC. Hence, even though the BFLPE might be negative for T2MSC when T1MSC is not controlled, the direct predictive effect of school-average ability might be positive for T2MSC after controlling for the negative predictive effects mediated through T1MSC. There was, however, absolutely no support for such alternative speculations. Indeed, for Study 1 there was a small but statistically significant negative predictive effect of school-average achievement (Model 1C, Table 1) on T2MSC even after controlling for the substantial negative predictive effects of school-average achievement on T1MSC. This implies that there were new, more negative predictive effects of school-average achievement after graduation from high school (T2) in addition to the substantial negative predictive effects already experienced during high school (T1). Whereas this pattern has been found in previous research based on two data collections during high school (e.g., Marsh, 1991), this is apparently the first such finding following graduation from high school.

In Study 2, the corresponding negative direct predictive effect on T2MSC after controlling for T1MSC was not statistically significant (Model 1C in Table 3). Importantly, however, there was no statistically significant positive predictive effect of school-average achievement, as would be expected if there were a systematic diminution of the BFLPE over time. Hence, whereas there was no evidence that the size of the BFLPE increased during the period following high school in Study 2, it did not decrease. Furthermore, there are differences between the two studies that might account for these different results. In particular, the time lag in Study 1 was only about 2 years (from final year of high school to 2 years after graduation from high school), whereas the time lag in Study 2 was nearly twice as long (from the year before the final year in high school to 3 years after graduation from high school).

School Grades, Grading on a Curve, and the BFLPE

The BFLPE presents a dilemma for academic self-concept researchers. On one hand, there is clear evidence that academic self-concept is more strongly related to school-based performance measures, such as school grades, compared to standardized test scores. On the other hand, because grades tend to be idiosyncratic to a particular setting and teachers tend to grade on a curve, school grades do not provide a common metric that is generalizable over different schools and classes. Hence, most BFLPE studies have been based on standardized achievement test scores. However, Marsh (1987) noted that the BFLPE and a grading-on-a-curve effect have a similar rationale and are mutually reinforcing, such that the BFLPE is mediated in part by school grades.

Results of the present investigation replicate these earlier results but also extend them in some interesting ways. Not surprisingly, in both Studies 1 and 2, school grades have a substantial influence on T1MSC beyond the substantial predictive effect of individual student achievement. What may be more surprising is that the achievement test scores continue to have such a substantial predictive effect on academic self-concept beyond the predictive effect of school grades. This suggests that students know their relative abilities in relation to a broader, more generalizable frame of reference in addition to the more narrowly focused frame of reference provided by other students in their school. It is also interesting to note that the relative contribution of test scores is as high or higher for models based on T2MSC, collected several years after graduation from high school (e.g., Models 4B and 4C), than for models based on T1MSC, collected near the end of high school (e.g., Model 4A). Nevertheless, it is relevant to note that both school grades and, in particular, test scores continue to have significant predictive effects on T2MSC even after controlling for T1MSC. As such, the results of our study demonstrate that each of these sources of information about achievement continues to have substantial predictive effects on academic self-concept. Whereas the predictive effects of test scores and grades on T2MSC are similar in Study 1, the predictive effects of test scores in Study 2 are significantly larger than those of high school grades after students have graduated from high school. Because test scores reflect a broader frame of reference than school grades that are highly dependent on the achievement levels of other students in the same high school, it is not surprising that test scores are strongly related to academic self-concept after graduation from high school.

The BFLPEs, the negative predictive effects of school-average achievement and school type, are substantially smaller after controlling for school grades. This implies that the grading-on-a-curve effect and the BFLPE are substantially overlapping, mutually reinforcing processes that have independent predictive effects on academic self-concept. Hence, results based on these two large, representative samples of German high schools and the distinctive form of tracking students into different school types in the German school system replicate Marsh’s (1987) results based on a large representative sample of U.S. high schools. Although beyond the scope of our study, a useful direction for further research would be to disentangle the apparently confounded effects of these two processes. Whereas typically these processes are positively related, it should be possible to find naturally occurring situations in which the two processes are not confounded (e.g., where school grades are based on absolute criteria or are externally moderated in relation to external criteria that are measured along a common metric) or, perhaps, to experimentally manipulate grades in a laboratory setting so that they are independent of achievement test scores. The critical issue is the extent to which the size of the BFLPE varies as a function of the grading standards.

Generalizability of the Predictive Effects

The results of this investigation indicate that the BFLPE is reasonably robust over two different studies, over time, over gender, and over individual student ability levels. The major focus of this investigation is to demonstrate that the BFLPE that is widely demonstrated in high school settings is long lasting and persistent after students have graduated from high school. We were also interested, however, in determining how robust the BFLPE is over gender and individual student achievement levels. The inclusion of additional variables representing gender, and interactions between individual achievement with gender and with school-average achievement, had almost no effect on the size of the BFLPE in either Study 1 or Study 2. Although the BFLPE was marginally larger for girls than for boys, this interaction was only significant for some models in Study 1 and was not statistically significant for any analyses in Study 2. Consistent with previous results, the BFLPE did not vary much as a function of individual student achievement. In Study 1, individual student achievement did not interact significantly with school-average achievement in Models 6A, 6B, or 6C, but there was a marginally significant interaction with school type in Model 7A (but not 7B or 7C), suggesting that the BFLPE was slightly smaller for more able students. In Study 2, individual student achievement did not interact significantly with school type in Models 8A, 8B, or 8C, but there was a marginally significant interaction with school-average achievement in Model 7A (but not in 7B or 7C), suggesting that the BFLPE was slightly larger for more able students. Although the precise nature of the few small interactions that reached statistical significance (due in part to the large sample sizes) was not entirely consistent across the two studies, both studies provided reasonable support for the robustness of the BFLPE over time, gender, and individual student achievement levels.

Given that Studies 1 and 2 were based on different materials, different cohorts of students, and different time frames, there was good consistency in the pattern of results. In particular, both studies clearly demonstrate that the BFLPE is persistent in that the predictive effects are clearly evident even several years after graduation from high school. However, even though the patterns of predictive effects are largely similar in the two studies, the predictive effects are systematically larger in Study 1 than in Study 2. The positive predictive effects of individual student characteristics (student achievement and school grades) and the negative predictive effects of school-level characteristics (school-average achievement and school type) are all larger in Study 1 than in Study 2. Although there are several potential sources of difference, the most likely seems to be the time frame of the two studies. Study 2 began earlier (T1 was the second-to-the-last year of high school) than Study 1 (T1 was the last year of high school) and lasted longer (the T1-T2 interval was 2 years in Study 1 and 4 years in Study 2). It is not, perhaps, surprising that the predictive effects of school grades and achievement in high school have smaller predictive effects on self-concept measures collected 4 years later in Study 2 than parallel predictive effects after only 2 years in Study 1. More surprising, perhaps, is the finding that the predictive effects in Study 2 are also smaller at T1. The results may reflect the fact that the academic self-concepts of students near the end of high school (Study 1) are more closely aligned to objective indicators of academic accomplishment (including the relative performances of classmates) than they are in the penultimate year of high school (in Study 2). However, the smaller BFLPE on T1 self-concept in Study 2 than in Study 1 may also reflect the different self-concept instruments used at T1 in the two studies. Although clearly beyond the scope of this investigation, it would be useful to assess the strength of the BFLPE more frequently over a longer span of time to test the predictions that (a) the BFLPE grows larger the longer the same students remain in the same school and (b) that the BFLPE is reasonably stable over time even after students have graduated from high school.

Limitations and Directions for Further Research

It is relevant to address potential limitations of this investigation in terms of their implications for interpretation of our results and directions for future research. BFLPE theory offers causal predictions about the effects of school-average ability, and the combination of sophisticated statistical analyses and longitudinal data provides strong tests of these causal predictions. It is, nevertheless, inappropriate to claim that our results prove that the predictive effects that we have found represent true causal effects. This limitation appears to be an inevitable consequence of the nature of this research, but convergence of results based on alternative experimental designs and even stronger statistical techniques would strengthen the interpretation of the results.

In this investigation, we considered math self-concept only, and this has been the focus of a number of other BFLPE studies (e.g., Marsh & Craven, 2002). On the basis of existing theory and a very limited amount of research, there is no reason to expect that the BFLPE does not apply equally well to other academic domains. Because school-average abilities in different academic domains (e.g., school-average abilities in mathematics, science, history, language) are likely to be very highly correlated, it is unlikely that different school-average abilities can be readily differentiated in the general population. It is possible, however, that schools that are highly selective in a specific academic domain (e.g., mathematics, sport, performing arts) will have BFLPEs that are specific to that domain (e.g., Marsh, 1994; also see Marsh & Craven, 2002) and that will undermine some of the domain-specific attributes that such schools seek to reinforce.

A number of issues related to the representativeness of the sample limit the generalizability of the results. Although the original samples in both studies were representative of German students attending the final 2 years of high school, typically in preparation for subsequent university attendance, this only represents about 30% of the total population of this age cohort. As is inevitable in large-scale longitudinal studies, particularly ones that attempt to follow high school students after graduation from high school, the representativeness of the sample was further compromised by nonresponse in the second waves of each of the studies.

Although beyond the scope of this investigation, it would also be useful to pursue how academic self-concept and the BFLPE affected the subsequent life choices that students make after high school graduation and how, in turn, these decisions affect subsequent self-concept. Thus, for example, Marsh and O’Mara (2006) report that there is a reciprocal pattern of relations between academic self-concept in high school, high school achievement, and educational attainment 5 years after high school.

Implications for Policy Practice

In many educational systems across the world, there is an ongoing policy debate about the provision of highly segregated educational settings for very bright students. This policy direction is based in part on a labeling theory perspective, suggesting that bright students will have higher self-concepts and experience other psychological benefits from being educated in the company of other academically gifted students. Yet, our BFLPE research and empirical evaluation of the predictive effects of academically selective settings (e.g., Marsh et al., 1995) show exactly the opposite pattern of results. Placement of gifted students in academically selective settings results in lower academic self-concepts, not higher academic self-concepts. Coupled with other research showing that school-average ability has negative effects on other educational outcomes (coursework selection, educational aspirations, effort; see Marsh, 1991) and that academic self-concept has reciprocal effects with achievement for students of all ability levels (Marsh & Craven, 2006), this finding has important policy implications.

Whereas not all gifted and talented students will suffer lower academic self-concepts when attending academically selective high schools, many will. BFLPE research, however, provides an important alternative perspective to existing policy directions that have not been adequately evaluated in relation to current educational and psychological research. Hence, we urge parents, policy makers, and practitioners to think carefully about the implications of school placements and to reflect on potential negative side effects of current policy toward segregation of schools and classes on the basis of academic ability. A compromise position might be to recognize the negative implications of the BFLPE and to develop policies to try to counter the negative effects of high-school-average achievement. Hence, Marsh and Craven (2002) suggest that the BFLPE is reinforced by highly competitive classroom environments that emphasize normative feedback that rank orders students. In the present investigation, the grade-on-a-curve effect, such that students in academically selective schools get lower school grades than they would get if they were in mixed-ability schools, also reinforces the BFLPE. Hence, even though the BFLPE appears to be stable and pervasive, it may be possible to alter the school environment in such a way as to undermine its negative effects.

The focus of our investigation has been on academically advantaged students. However, the BFLPE has implications for special education at both ends of the ability spectrum. In particular, consistent with BFLPE predictions, research with academically disadvantaged students shows that moving academically disadvantaged students from special classes with other disadvantaged students to mixed-ability classes (mainstreaming or inclusion) lowers not only math, verbal, and academic self-concepts but also social self-concept (Marsh, Tracey, & Craven, 2006; Tracey, Marsh, & Craven, 2003; also see Chapman, 1988).


APPENDIX

Correlations, Means, and Standard Deviations for Variables Considered in the Two Studies

Study 1

Variable               1     2     3     4     5     6     7     8     9    10    11    12     M    SD
1. T1MSC            1.00   .79   .14   .57   .64  –.16   .50  –.07  –.03  –.03  –.04   .07   .00  1.00
2. T2MSC             .79  1.00   .14   .54   .59  –.15   .47  –.05  –.02  –.03  –.05   .06   .00  1.00
3. S-AM              .14   .14  1.00   .52   .07  –.20   .15   .03   .19  –.42   .25  –.40  –.07   .54
4. MTest             .57   .54   .52  1.00   .36  –.35   .72  –.11   .07  –.23   .05  –.12   .00  1.00
5. Math grade        .64   .59   .07   .36  1.00   .01   .20  –.06  –.01   .03  –.01   .08   .00  1.00
6. Sex              –.16  –.15  –.20  –.35   .01  1.00  –.21   .19  –.02   .07  –.00   .05   .00  1.00
7. ST                .03   .05   .65   .36   .03   .00   .05   .05   .26  –.42  –.00  –.23   .00  1.00
8. Sex × MTest      –.07  –.05   .03  –.11  –.06   .19  –.12  1.00   .46  –.26   .35  –.12  –.35   .96
9. Sex × S-AM       –.03  –.02   .19   .07  –.01  –.02  –.03   .46  1.00  –.37   .57  –.20  –.11   .50
10. S-AM × MTest    –.03  –.03  –.42  –.23   .03   .07  –.03  –.26  –.37  1.00  –.25   .68   .28   .55
11. Sex × ST        –.04  –.05   .25   .05  –.01  –.00  –.06   .35   .57  –.25  1.00  –.40   .00  1.00
12. MTest × ST       .07   .06  –.40  –.12   .08   .05   .04  –.12  –.20   .68  –.40  1.00   .36   .97

Study 2

Variable               1     2     3     4     5     6     7     8     9    10    11    12     M    SD
1. T1MSC            1.00   .68   .06   .41   .50  –.21   .01  –.04   .01  –.03  –.01  –.00   .00  1.00
2. T2MSC             .68  1.00   .10   .42   .41  –.19   .02  –.02   .03  –.05  –.03   .01   .00  1.00
3. S-AM              .06   .10  1.00   .42   .03  –.03   .53  –.03  –.12  –.02  –.05  –.31  –.03   .45
4. MTest             .41   .42   .42  1.00   .32  –.18   .23  –.10   .07  –.05  –.05  –.09   .00  1.00
5. Math grade        .50   .41   .03   .32  1.00  –.04   .02  –.02   .04  –.03  –.02   .02   .00  1.00
6. Sex              –.21  –.19  –.03  –.18  –.04  1.00   .04   .09  –.05   .12  –.03  –.05   .00  1.00
7. ST                .01   .02   .53   .23   .02   .04  1.00  –.06  –.30  –.05  –.12  –.59   .00  1.00
8. Sex × MTest      –.04  –.02  –.03  –.10  –.02   .09  –.06  1.00  –.04   .43   .26   .08  –.18  1.01
9. Sex × S-AM        .01   .03  –.12   .07   .04  –.05  –.30  –.04  1.00  –.14  –.01   .57   .18   .46
10. S-AM × MTest    –.03  –.05  –.02  –.05  –.03   .12  –.05   .43  –.14  1.00   .53   .01  –.01   .44
11. Sex × ST        –.01  –.03  –.05  –.05  –.02  –.03  –.12   .26  –.01   .53  1.00  –.03   .04  1.03
12. MTest × ST      –.00   .01  –.31  –.09   .02  –.05  –.59   .08   .57   .01  –.03  1.00   .23  1.11

Note. T1MSC = Time 1 math self-concept; T2MSC = Time 2 math self-concept; S-AM = school-average math achievement (test scores); MTest = math achievement (test scores); math grade = school mark; sex = gender (1 = male, 2 = female); ST = school type (selective school: 1 = no, 2 = yes). For purposes of analysis (see Method section for further discussion), all individual student terms were standardized (M = 0, SD = 1). Cross-product and school averages of individual student responses were based on standardized individual student scores but not restandardized (so that they are in the same metric as the individual student scores).
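The standardization described in the table note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function names (`standardize`, `build_terms`), the toy inputs, and the use of the population standard deviation are all hypothetical choices made here. It shows why individual student terms have M = 0 and SD = 1, whereas school averages and cross-products, which are built from standardized scores but not restandardized, take other values (e.g., the SDs below 1 reported for S-AM).

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (M = 0, SD = 1); population SD is an assumption."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def build_terms(school_ids, math_test, sex):
    """Build standardized terms, a school average, and one cross-product.

    school_ids, math_test, and sex are parallel lists (hypothetical inputs).
    """
    z_test = standardize(math_test)
    z_sex = standardize(sex)

    # School-average achievement: mean of the standardized scores within
    # each school, deliberately NOT restandardized afterwards.
    by_school = defaultdict(list)
    for sid, z in zip(school_ids, z_test):
        by_school[sid].append(z)
    school_mean = {sid: mean(zs) for sid, zs in by_school.items()}
    s_am = [school_mean[sid] for sid in school_ids]

    # Cross-product term: product of standardized components,
    # also left in its raw (non-restandardized) metric.
    sex_x_test = [a * b for a, b in zip(z_sex, z_test)]

    return {"MTest": z_test, "Sex": z_sex, "S-AM": s_am,
            "Sex x MTest": sex_x_test}


# Toy data: two schools with three students each.
terms = build_terms(
    ["A", "A", "A", "B", "B", "B"],
    [40.0, 50.0, 60.0, 60.0, 70.0, 80.0],
    [1, 2, 1, 2, 1, 2],
)
print(pstdev(terms["MTest"]))  # individual term: SD of 1 (up to rounding)
print(pstdev(terms["S-AM"]))   # school average: SD smaller than 1
```

Because school means average out within-school variation, their SD cannot exceed that of the individual scores; leaving them unstandardized keeps them in the same metric as the individual student scores.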

Note

Work on this investigation was conducted, in part, while Professor Marsh was a visiting scholar at the Center for Educational Research at the Max Planck Institute for Human Development and was supported in part by the University of Western Sydney and the Max Planck Institute.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

Baumert, J., Bos, W., & Lehmann, R. (Eds.). (2000). TIMSS/III: Dritte Internationale Mathematik- und Naturwissenschaftsstudie—Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn [TIMSS/III: Third international mathematics and science study—Students’ knowledge of mathematics and science at the end of secondary education]. Opladen, Germany: Leske + Budrich.

Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: The effects of incentives on motivation and performance. European Journal of Psychology of Education, 16, 441–462.

Baumert, J., Roeder, P. M., Gruehn, S., Heyn, S., Köller, O., Rimmele, R., et al. (1996). Bildungsverläufe und psychosoziale Entwicklung im Jugendalter [Educational careers and psychosocial development during adolescence]. In K. P. Treumann, G. Neubauer, R. Möller, & J. Abel (Eds.), Methoden und Anwendungen empirisch pädagogischer Forschung (pp. 170–180). Münster, Germany: Waxmann.

Baumert, J., Roeder, P. M., Sang, F., & Schmitz, B. (1986). Leistungsentwicklung und Ausgleich von Leistungsunterschieden in Gymnasialklassen [Development of achievement and leveling of achievement differences in Gymnasium classes]. Zeitschrift für Pädagogik, 32, 639–660.

Baumert, J., Trautwein, U., & Artelt, C. (2003). Schulumwelten—institutionelle Bedingungen des Lehrens und Lernens [School environments—Institutional conditions for learning and instruction]. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, et al. (Eds.), PISA 2000—Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 259–330). Opladen, Germany: Leske + Budrich.

Beaton, A. E., Mullis, I. V. S., Martin, M. O., Gonzales, E. J., Kelly, D. L., & Smith, T. A. (1996). Mathematics achievement in the middle school years: IEA’s third international mathematics and science study. Chestnut Hill, MA: Boston College.

Branden, N. (1994). Six pillars of self-esteem. New York: Bantam.

Chapman, J. W. (1988). Learning disabled children’s self-concepts. Review of Educational Research, 58, 347–371.

Dai, D. Y. (2004). How universal is the big-fish-little-pond effect? American Psychologist, 59, 267–268.

Diener, E., & Fujita, F. (1997). Social comparison and subjective well-being. In B. P. Buunk & F. X. Gibbons (Eds.), Health, coping, and well-being: Perspectives from social comparison theory (pp. 329–358). Mahwah, NJ: Lawrence Erlbaum.

Espenshade, T. J., Hale, L. E., & Chung, C. Y. (2005). The frog pond revisited: High school academic context, class rank, and elite college admission. Sociology of Education, 78, 269–293.

Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7, 117–140.

Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London: Hodder Arnold.

Husén, T. (1967). International study of achievement in mathematics: A comparison of 12 countries (Vols. 1–2). Stockholm: Almqvist & Wiksell.


James, W. (1963). The principles of psychology. New York: Holt, Rinehart & Winston. (Original work published 1890)

Jerusalem, M. (1984). Reference group, learning environment and self-evaluations: A dynamic multi-level analysis with latent variables. In R. Schwarzer (Ed.), The self in anxiety, stress and depression (pp. 61–73). New York: Elsevier North-Holland.

Köller, O., Watermann, R., Trautwein, U., & Lüdtke, O. (2004). Wege zur Hochschulreife in Baden-Württemberg. TOSCA—Eine Untersuchung an allgemein bildenden und beruflichen Gymnasien [Educational pathways to college in Baden-Württemberg. TOSCA—A study at upper secondary level of traditional and vocational Gymnasiums]. Opladen, Germany: Leske + Budrich.

Krull, J. L., & MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research, 36, 249–277.

Lüdtke, O., Köller, O., Marsh, H. W., & Trautwein, U. (2005). Teacher frame of reference and the big-fish-little-pond effect. Contemporary Educational Psychology, 30, 263–285.

Marsh, H. W. (1974). Judgmental anchoring: Stimulus and response variables. Unpublished doctoral dissertation, University of California, Los Angeles.

Marsh, H. W. (1984a). Self-concept: The application of a frame of reference model to explain paradoxical results. Australian Journal of Education, 28, 165–181.

Marsh, H. W. (1984b). Self-concept, social comparison and ability grouping: A reply to Kulik and Kulik. American Educational Research Journal, 21, 799–806.

Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79, 280–295.

Marsh, H. W. (1991). The failure of high ability high schools to deliver academic benefits: The importance of academic self-concept and educational aspirations. American Educational Research Journal, 28, 445–480.

Marsh, H. W. (1993). Academic self-concept: Theory, measurement and research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59–98). Hillsdale, NJ: Lawrence Erlbaum.

Marsh, H. W. (1994). Using the National Educational Longitudinal Study of 1988 to evaluate theoretical models of self-concept: The Self-Description Questionnaire. Journal of Educational Psychology, 86, 439–456.

Marsh, H. W. (2005). Big fish little pond effect on academic self-concept. German Journal of Educational Psychology, 19, 141–144.

Marsh, H. W., Chessor, D., Craven, R. G., & Roche, L. (1995). The effects of gifted and talented programs on academic self-concept: The big fish strikes again. American Educational Research Journal, 32, 285–319.

Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131–198). Orlando, FL: Academic Press.

Marsh, H. W., & Craven, R. (2002). The pivotal role of frames of reference in academic self-concept formation: The big fish little pond effect. In F. Pajares & T. Urdan (Eds.), Adolescence and education (Vol. 2, pp. 83–123). Greenwich, CT: Information Age.

Marsh, H. W., & Craven, R. G. (2006). Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspectives on Psychological Science, 1, 133–163.

Marsh, H. W., & Hau, K. T. (2003). Big fish little pond effect on academic self-concept: A crosscultural (26 country) test of the negative effects of academically selective schools. American Psychologist, 58, 364–376.

Marsh, H. W., Köller, O., & Baumert, J. (2001). Reunification of East and West German school systems: Longitudinal multilevel modeling study of the big fish little pond effect on academic self-concept. American Educational Research Journal, 38, 321–350.

Marsh, H. W., Kong, C.-K., & Hau, K.-T. (2000). Longitudinal multilevel modeling of the big fish little pond effect on academic self-concept: Counterbalancing social comparison and reflected glory effects in Hong Kong high schools. Journal of Personality and Social Psychology, 78, 337–349.

Marsh, H. W., & O’Mara, A. (2006). Reciprocal effects between academic self-concept, self-esteem, achievement and attainment over seven adolescent-adult years: Unidimensional and multidimensional perspectives of self-concept. Unpublished manuscript, Department of Educational Studies, Oxford University, UK.

Marsh, H. W., & Parker, J. (1984). Determinants of student self-concept: Is it better to be a relatively large fish in a small pond even if you don’t learn to swim as well? Journal of Personality and Social Psychology, 47, 213–231.

Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel modeling. Australian Journal of Education, 40, 65–87.

Marsh, H. W., & Shavelson, R. (1985). Self-concept: Its multifaceted, hierarchical structure. Educational Psychologist, 20, 107–125.

Marsh, H. W., Tracey, D. K., & Craven, R. G. (2006). Multidimensional self-concept structure for preadolescents with mild intellectual disabilities: A hybrid multigroup-mimic approach to factorial invariance and latent mean differences. Educational and Psychological Measurement, 66, 795–818.

Marsh, H. W., Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Integration of multidimensional self-concept and core personality constructs: Construct validation and relations to well-being and achievement. Journal of Personality, 74, 403–455.

Möller, J., & Köller, O. (2001). Frame of reference effects following the announcement of exam results. Contemporary Educational Psychology, 26, 277–287.

Möller, J., & Köller, O. (2004). Die Genese akademischer Selbstkonzepte: Effekte dimensionaler und sozialer Vergleiche [On the development of academic self-concepts: The impact of social and dimensional comparisons]. Psychologische Rundschau, 55, 19–27.

Plucker, J. A., Robinson, N. M., Greenspon, T. S., Feldhusen, J. F., McCoach, D. B., & Subotnik, R. F. (2004). It’s not how the pond makes you feel, but rather how high you can jump. American Psychologist, 59, 268–269.

Rasbash, J., Steele, F., Browne, W., & Prosser, B. (2005). A user’s guide to MLwiN - Version 3.0. Bristol, UK: University of Bristol.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Robitaille, D., & Garden, R. (1989). The IEA study of mathematics: II. Contents and outcomes of school mathematics. Oxford, UK: Pergamon.

Schwanzer, A. D., Trautwein, U., Lüdtke, O., & Sydow, H. (2005). Entwicklung eines Instruments zur Erfassung des Selbstkonzepts junger Erwachsener [Development of a questionnaire on young adults’ self-concept]. Diagnostica, 51, 183–194.

Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Validation of construct interpretations. Review of Educational Research, 46, 407–441.

Sherif, M., & Hovland, C. W. (1961). Social judgment. New Haven, CT: Yale University Press.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.

Tracey, D. K., Marsh, H. W., & Craven, R. G. (2003). Self-concepts of preadolescent students with mild intellectual disabilities: Issues of measurement and educational placement. In H. W. Marsh, R. G. Craven, & D. M. McInerney (Eds.), International advances in self research (Vol. 1, pp. 203–230). Greenwich, CT: Information Age.


Trautwein, U., Lüdtke, O., Marsh, H. W., & Köller, O. (2006). Tracking, grading, and student motivation: Using group composition and status to predict self-concept and interest in ninth-grade mathematics. Journal of Educational Psychology, 98, 788–806.

Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Self-esteem, academic self-concept, and achievement: How the learning environment moderates the dynamics of self-concept. Journal of Personality and Social Psychology, 90, 334–349.

Zeidner, M., & Schleyer, E. J. (1999). The big-fish-little-pond effect for academic self-concept, test anxiety and school grades in gifted children. Contemporary Educational Psychology, 24, 305–329.

Manuscript received May 15, 2006
Revision received September 12, 2006
Accepted September 15, 2006
