Language ability of young English language learners: Definition, configuration, and implications

Language Testing
http://ltj.sagepub.com/

Lin Gu
Published online 4 August 2014
DOI: 10.1177/0265532214542670

The online version of this article can be found at:
http://ltj.sagepub.com/content/early/2014/07/31/0265532214542670

Published by:
http://www.sagepublications.com


Language Testing 1–18
© The Author(s) 2014
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0265532214542670
ltj.sagepub.com

Language ability of young English language learners: Definition, configuration, and implications

Lin Gu
Educational Testing Service, USA

Abstract

In this study I examined the dimensionality of the latent ability underlying language use that is needed to fulfill the demands young learners face in English-medium instructional environments, where English is used as the means of instruction for teaching subject matters. Previous research on English language use by school-age children provided evidence that language proficiency for academic studies relates to, yet differs from, the language ability needed for social communications. Focusing on learners of English as a foreign language (EFL), I investigated the nature of language proficiency of school-age EFL learners in light of their learning experience.

Analyses were based on test performance from the TOEFL Junior® Comprehensive test, a proficiency assessment of English as a foreign language for young learners between the ages of 11 and 15, developed by the Educational Testing Service. The results showed that the two ability constructs (i.e., academic and social language), although theoretically distinct and educationally relevant, were statistically indistinguishable based on EFL learners’ test performance. It was also found that the test performance could be explained best by a higher-order model, indicating that the language ability of these young EFL learners was structurally similar to that usually found with adult learners in a foreign language environment.

The outcomes highlight the interrelatedness of learning environment, age and language proficiency. On the one hand, the nature of the ability construct can vary across groups of learners due to differences in learning environments. On the other hand, learners of different ages who share similar learning environments could be similar in terms of the latent representation of their language proficiency. The study concludes that the interpretation of young EFL learners’ language proficiency needs to take into consideration how language components are developmentally related to each other as a function of learning experience in a foreign language environment.

Keywords

Academic language, language ability, learning environment, social language, young learners

Corresponding author: Lin Gu, English Language Learning and Assessment, Educational Testing Service, MS 04-R, Princeton, NJ, USA. Email: [email protected]


Article


Young learners’ English language proficiency

In the discussion of the nature of English language proficiency developed in young school-age learners, the focus has been placed on the independence between academic and social language.

Cummins (1979) proposed the initial Basic Interpersonal Communicative Skills (BICS) and Cognitive Academic Language Proficiency (CALP) distinction. The former refers to conversational fluency in everyday informal speech. The latter is a dimension of language strongly related to general cognitive skills and academic performance. Evidence in support of this distinction comes from the observation that native-speaking children acquire conversational fluency at an early age, while their academic language use develops throughout their schooling and beyond. Cummins further argued that the independence evidenced in first language acquisition also applies to language acquisition in a naturalistic second language environment. By studying immigrant children in Canada, Cummins (1981a) found an average gap of several years between obtaining peer-appropriate levels in social aspects (e.g., oral fluency, phonology) of English and reaching grade norms in academic aspects (e.g., reading skills) of English. The findings from Hakuta, Butler, and Witt (2000), based on test performance in the USA and Canada, corroborated this difference in the time needed for achieving targeted proficiency levels in social and academic language. In their study, social language was evaluated through the use of an oral proficiency test and academic English was measured via an English reading achievement test. Together, these studies showed that in both monolingual contexts with native-speaking children and multilingual contexts with immigrant children, the developmental patterns for academic and social language tend to diverge. In other words, they are inclined to develop at different rates and become specialized to different degrees. Carroll (1983) pointed out that developmental differences suggest that there exist different aspects of language ability that can be separately recognized and measured. Following Carroll’s argument, academic and social language use can be considered as two distinct components of young learners’ language ability. Cummins (2008) cautioned against treating this distinction as an overall theory of language proficiency, but advocated that nonetheless it is developmentally meaningful and relevant to young language learners.

In a similar vein, Bailey (2007) defined academic language in contrast to the everyday informal speech used outside the classroom environment. Academic language aligns with classroom discourse, textbooks, educational standards, and content-area assessments. This distinction received empirical support from two studies examining students’ concurrent performance on an English proficiency test for measuring social language competence and on academic achievement tests for measuring academic language ability (Butler & Castellon-Wellington, 2000/5; Stevens, Butler, & Castellon-Wellington, 2000). Based on the evaluation of test reliability, test discrimination, and correlations between the social and academic measures, these studies demonstrated the differences between the specialized language of classroom instruction measured by content assessments and everyday English measured by traditional language tests.

Bailey (2007) further argued that “[i]n some regard, it is not meaningful to conceive of language as either social or academic, rather it is the situation that is either predominantly social or academic” (p. 9). This viewpoint underlined the role of situation in shaping the nature and parameters of language use. By Bailey’s definition, language use is seen as inseparable from the contexts in which it is manifested. This view is consistent with the situation-based approach to understanding language proficiency proposed by Chapelle, Grabe, and Berns (1997), which makes an explicit distinction between the situation of language use and the internal capacities of individual language users. The proposed model portrays the relationship between the two as dynamic and integrated, through the belief that “the features of the context call on specific capacities defined within the internal operations” (p. 4).

While the distinction between academic and social language has been repeatedly supported based on observations of and empirical data from native-speaking children and immigrant children, to the author’s knowledge, no previous studies have researched this distinction with young EFL learners. Learning experiences in an EFL environment differ from those in an ESL or a native learning environment in many aspects, among which the most salient is the target language contact. Language contact is a concept developed by study-abroad researchers (e.g., Freed, Dewey, Segalowitz, & Halter, 2004) and specifies the nature and intensity of language learners’ out-of-class contact with the target language. It is reasonable to assume that EFL learners’ target language contact is much more limited in terms of scope and intensity as compared to what is often experienced by learners living in an English-speaking environment. Owing to differences in learning environments, the language ability of EFL learners may have a different underlying configuration than the one found with ESL learners. Therefore, empirical evidence is needed to evaluate the meaningfulness and appropriateness of using the academic and social language distinction to conceptualize the nature of language proficiency of young EFL learners.

Language models based on assessing adult EFL learners

Although there is a lack of studies examining language proficiency developed by young EFL learners, a search of literature reveals that an extensive body of research has examined the dimensionality of language ability in an EFL context by assessing adult learners.

Some researchers, most notably Oller (e.g., 1979), have proposed the unitary competence model, which views language ability as indivisible. The theory most often invoked by proponents of this unidimensional hypothesis comes from cognitive science: in Spearman’s (1904) thinking, a general factor of intelligence accounts for most of the variance in human performance.

Contrary to the unidimensional view, many researchers hold the belief that language ability is multidimensional in nature. According to this school of thought, language ability can be specified as a series of skills or knowledge components. Carroll (1965), for example, proposed a four-skills approach to conceptualize language ability based on the assumption that the four skills of listening, reading, speaking, and writing are distinguishable areas of performance. Results of empirical research have largely supported the multi-component view and have shown that the relationships among the different ability components can be captured by two competing models: a correlated-trait model and a higher-order model. The former asserts that latent ability components have correlational relationships with one another. The latter imposes a higher-order factor whose existence explains the relationships among the first-order ability components.

The two hypotheses were investigated by Fouly, Bachman, and Cziko (1990), who found that both models fit the data well, and concluded that both distinct skill factors and a general language factor existed. By using two large-scale test batteries, Bachman, Davidson, Ryan, and Choi (1995) identified a correlated four-factor solution with speaking, listening, and test-specific writing skills. Owing to high inter-factor correlations, they transferred this model to a higher-order model and concluded that the higher-order model with four first-order factors best represented the construct that was being measured. When choosing between two statistically equivalent models, a correlated three-factor model and a higher-order model with three first-order skill factors, both Shin (2005) and Sasaki (1993) selected the latter as the baseline model for further analyses in their studies. In the context of TOEFL testing using task-based scores, Stricker and Rock (2008) found that a correlated four-factor model and a higher-order model with four first-order skill factors fit the data similarly well. They concluded that the latter was the better model because it had fewer parameters to estimate and therefore was more parsimonious. Sawaki, Stricker, and Oranje (2009) explored the same data set used by Stricker and Rock, but conducted the analysis based on item-level scores. Although they found that the correlated four-factor model fit better than the higher-order model based on the result of a chi-square difference test, they concluded that the higher-order model was preferred because it was more parsimonious. In both TOEFL studies, the first-order factors corresponded to the four skills: reading, listening, speaking, and writing.

In sum, all three models provide potential factorial solutions to account for test performance of adult learners: a unidimensional model, a higher-order model, and a correlated-trait model.

Can one assume that these models also apply to young learners? Harley, Cummins, Swain, and Allen (1990) cautioned that factor structures based on adult learners may not apply to young learners with different experiences. Learning experiences of adult and young learners could be accounted for differently by age-related variables, such as the starting age of learning and the length of learning. Therefore, empirical investigations are needed to evaluate the extent to which a model based on adult learners can be warranted as a portrayal of the language proficiency of young learners.

The review of literature reveals two current trends. First, inquiries about young learners’ language proficiency have been carried out mostly with native-speaking children and with language learners who are immersed in the target language environment. While there has been evidence supporting the use of the two ability constructs, academic and social language, to conceptualize these young learners’ English language ability, this distinction has rarely been researched in the young EFL population. Second, within the EFL learner population, our current understanding of the nature of language ability is largely based on adult learners’ test performance.

The configuration of language ability may vary as a function of learning experience, which differs across an EFL and an ESL environment. The variations in learning experiences could also be described in terms of age-related variables across young and adult learners. Building upon what we have learned about language proficiency in various learner groups and taking into account the variations in learning experiences among the groups, in this study I examined the nature of English proficiency of young learners who learn the target language in a foreign language environment.

Purpose of the study

The overarching goal of the study was to investigate the nature of language proficiency of school-age EFL learners in light of their learning experience.

The purpose of the study was twofold. The first was to investigate empirically the distinction between the two constructs, academic and social language, in a foreign language environment through a latent factor analytic approach. By adopting the situation-based approach (Bailey, 2007; Chapelle et al., 1997), the operational definitions of academic and social language in this article were language use in academic situations and language use in social situations. Methodologically speaking, this investigation differed from the previous efforts of differentiating between social and academic language in that it implemented a latent factor analytic approach to model the relationship between the two theoretical constructs by taking measurement errors into consideration. A distinction between latent constructs can be made if the criteria to evaluate a model that hypothesizes such a distinction can be satisfied. The results may shed light on the potential impact of the two learning environments, that is, ESL and EFL, on the configuration of young learners’ language ability.

The second focus of the study was to inspect whether previously confirmed language proficiency models based on adult EFL learners are applicable to EFL learners of younger ages. The results from this investigation, combined with the ones from the first research inquiry, may illuminate the extent to which age interacts with learning environment in defining young learners’ English language proficiency.

The two research questions proposed were as follows:

RQ1: To what extent does the distinction between academic and social language that has been found with young learners immersed in the target language environment also exist with young learners in a foreign language environment?

RQ2: To what extent are the previously established language ability models based on adult EFL learners’ test performance applicable to young EFL learners?

Method

The TOEFL Junior® Comprehensive Test

The TOEFL Junior Comprehensive Test, developed by the Educational Testing Service, was used in the study to elicit data to answer the research questions. This is a proficiency assessment of English as a foreign language for young learners between the ages of 11 and 15. It measures both social and academic language uses in English-medium instructional environments, using tasks representative of the school context.

This study used a pilot form of the test, which consisted of four sections: Reading Comprehension (Reading), Listening Comprehension (Listening), Speaking, and Writing. The reading items were distributed over four testlets. A testlet is a set of items associated with the same stimulus input. The listening section had five testlets and some independent items. Each listening and reading item was dichotomously scored. The speaking section had four tasks, among which two were integrated tasks, requiring test takers first to listen and then to speak. The writing section had five tasks, among which two were integrated tasks, one requiring listening preceding writing, and the other requiring reading preceding writing. Each speaking or writing task was scored on a scale of 0–4. The score report provides an overall score level and a separate scale score on each of the four sections.

Participants

A total of 498 participants took this pilot test form in the fall of 2011. The participants were asked to provide background information, including age, gender, native country, native language, grade level, the amount of time they spent studying English both at school and outside of school, the amount of time they spent living in an English-speaking country, and so on. Their answers showed that they were between the ages of 11 and 15, evenly distributed across gender, and were from 15 different countries. Slightly more than half of them were native Korean speakers. About 10% of the participants indicated that, at the time of testing, they had lived in an English-speaking country for a period ranging from less than three months to over one year. Since our study focused on EFL learners, we decided to exclude from the analysis learners who had been exposed to the target language environment. The remaining 436 participants constituted the analysis sample for the study.

Analysis

To answer the first research question, the impact of testlets in Listening and Reading was first examined. Both sections grouped items by common input stimulus. A dependence structure could exist among items nested within the same testlet (Bradlow, Wainer, & Wang, 1999). Research has shown that ignoring this dependence structure may induce biases in item parameter estimates, including intercepts and loadings (Bradlow et al., 1999; DeMars, 2006; Rijmen, 2010), which in turn could result in model misspecification.

A unidimensional structure and a bi-factor structure within Reading and Listening, respectively, were imposed. Potential testlet effects were then examined by evaluating model–data fit. A bi-factor model consists of both a general dimension and testlet-specific dimensions. Incorporating testlet-specific dimensions permits an examination of the conditional dependencies between items pertaining to the same testlets (Rijmen, 2010).

In the Reading bi-factor model, all items loaded on both the target modality factor of reading (R) and one of the testlet-based factors (Pk). In the Listening bi-factor model, all items (except for the independent items, which had loadings only on the target modality factor) loaded on both the target modality factor of listening (L) and one of the testlet-based factors (Pk). Schematic representations of the bi-factor models are shown in Figures 1 and 2.
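The bi-factor loading structure just described can be sketched as a 0/1 pattern matrix. This is only an illustration: the function name `bifactor_pattern` and the six-item layout are hypothetical, since the article does not report how many items belong to each testlet.

```python
from typing import List, Optional


def bifactor_pattern(testlet_of_item: List[Optional[int]],
                     n_testlets: int) -> List[List[int]]:
    """Return an items x (1 + n_testlets) pattern of allowed loadings.

    Column 0 is the general (target modality) factor; column k is the
    testlet factor P_k. An item whose testlet is None is an independent
    item and loads on the general factor only.
    """
    pattern = []
    for testlet in testlet_of_item:
        row = [1] + [0] * n_testlets  # every item loads on the general factor
        if testlet is not None:
            row[testlet] = 1          # plus exactly one testlet factor
        pattern.append(row)
    return pattern


# Hypothetical layout (NOT the real test form): two testlets of two items
# each, followed by two independent items.
layout = [1, 1, 2, 2, None, None]
pattern = bifactor_pattern(layout, n_testlets=2)
for row in pattern:
    print(row)
```

In this sketch a Reading-style model would have no `None` entries, whereas a Listening-style model would, because only Listening contained independent items.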

Analysis continued with item categorization. Academic language differs from social language in terms of the following parameters of a language use situation: purpose, content, setting, and register. A correlated two-factor model was then tested via confirmatory factor analysis. In this model, items eliciting language use in academic situations were modeled to load on one common factor, presumably academic language ability (A), and items eliciting language use in social contexts were specified to load on the second common factor, presumably social language ability (S). Figure 3 shows a schematic representation of this model.

To answer the second research question, language ability models based on assessing adult learners were examined to determine whether they applied to young learners. Model testing included three competing models: a unidimensional model, a higher-order model, and a correlated four-factor model. The unidimensional model (Figure 4) asserted that a single general language ability factor (G) accounted for test performance. The higher-order model (Figure 5) consisted of four independent skill factors, Listening (L), Reading (R), Speaking (S), and Writing (W), conditional on a general language ability factor (G). This model corresponded to the test’s scoring scheme, which provides four skill scores and an overall score level. In the four-factor model (Figure 6), the four skill factors correlated with one another. This model corresponded to the section structure of the test. All three models were fitted to the data to determine which model best represented the test construct.
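One way to see how the three competing models differ in complexity is to count their free parameters. The sketch below assumes that, with all 65 items treated as categorical, the thresholds are just-identified and therefore cancel out of the degrees of freedom, so that df equals the number of pairwise item correlations minus the free loadings and factor-level parameters. That counting rule and the identification choices in the comments are assumptions made here, not statements from the article.

```python
def n_item_correlations(n_items: int) -> int:
    """Number of distinct pairwise correlations among n_items indicators."""
    return n_items * (n_items - 1) // 2


def model_df(n_items: int, n_free: int) -> int:
    """Degrees of freedom = pairwise correlations minus free parameters."""
    return n_item_correlations(n_items) - n_free


N_ITEMS = 65  # 28 Reading + 28 Listening + 4 Speaking + 5 Writing

# Assumed free-parameter counts (each item loads on exactly one first-order
# factor; factor scales are fixed by one standard identification choice).
FREE_PARAMETERS = {
    "unidimensional": N_ITEMS,    # 65 loadings on G
    "higher-order": N_ITEMS + 4,  # 65 first-order + 4 second-order loadings
    "four-factor": N_ITEMS + 6,   # 65 loadings + 6 factor correlations
}

for name, n_free in FREE_PARAMETERS.items():
    print(name, model_df(N_ITEMS, n_free))
```

Under these assumptions the counts come out to 2015, 2011, and 2009, which agree with the degrees of freedom reported for the three models in the Results. The higher-order model saves two parameters relative to the four-factor model because four second-order loadings replace six factor correlations.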

Figure 1. Schematic representation of the Reading bi-factor model.



Figure 2. Schematic representation of the Listening bi-factor model.

Figure 3. Schematic representation of the model with academic and social language factors.



Some speaking and writing items were integrated in nature and required the execution of two skills simultaneously or in close succession. Such items could load on the target modality, or both associated modalities. Owing to the possibility that cross-loadings might occur with integrated items, two modeling series were conducted. In one series, items were specified to load on their target modality only. In the other series, integrated items loaded on both the target modality and the associated secondary factor. For example, in the latter series, an integrated speaking item loaded on not just the speaking factor, but also the listening factor.

The analyses were based on item-level raw scores. Latent analyses were performed using Mplus version 6.1 (Muthén & Muthén, 2010). The data set contained both binary and ordinal variables at the item level. Finney and DiStefano (2013) suggested treating ordered categorical data with very few categories as categorical and using robust diagonally weighted least squares (DWLS) estimators to adjust the parameter estimates, standard errors, and fit indices for the categorical nature of the data. In this study I treated all variables as categorical and used the WLSMV estimator, a robust DWLS estimator provided by Mplus. The adequacy and appropriateness of the latent models were evaluated based on three criteria: (a) values of selected global model fit indices; (b) individual parameter estimates; and (c) the principle of parsimony.
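To make the mix of binary and ordinal indicators concrete, the sketch below tallies the item-level variables by section. The item counts are those reported in the item summary table (28, 28, 4, and 5), and the scoring ranges are those described for the test form. The threshold count assumes every score category was actually observed for every item, which the article does not report.

```python
# Item counts per section and the maximum score per item (0/1 or 0-4).
SECTIONS = {
    "Reading":   {"items": 28, "max_score": 1},
    "Listening": {"items": 28, "max_score": 1},
    "Speaking":  {"items": 4,  "max_score": 4},
    "Writing":   {"items": 5,  "max_score": 4},
}


def thresholds_per_item(max_score: int) -> int:
    """An item scored 0..max_score has max_score thresholds between
    adjacent categories (assuming all categories are observed)."""
    return max_score


n_items = sum(s["items"] for s in SECTIONS.values())
n_binary = sum(s["items"] for s in SECTIONS.values() if s["max_score"] == 1)
n_ordinal = n_items - n_binary
n_thresholds = sum(
    s["items"] * thresholds_per_item(s["max_score"]) for s in SECTIONS.values()
)

print(n_items, n_binary, n_ordinal, n_thresholds)
```

Under the stated assumption this gives 65 categorical indicators, 56 of them binary and 9 ordinal, with 92 thresholds in total.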

Figure 4. Schematic representation of the unidimensional model.



The following DWLS-based global fit indices were employed: Chi-square (χ²), comparative fit index (CFI), and root mean square error of approximation (RMSEA). A significant χ² value signals a poor model fit, although this value should be interpreted with caution because it is highly sensitive to sample size. Yu and Muthén (as cited in Finney & DiStefano, 2013) suggested that guidelines similar to those used for maximum likelihood based fit indices can apply to DWLS-based indices. Following their suggestions, a CFI value larger than .94 and a RMSEA value smaller than .06 indicate good model-data fit. Individual parameter estimates were also examined for appropriateness and significance. Previous researchers (Gu, 2014; Sawaki et al., 2009; Stricker & Rock, 2008) employed a correlation of .9 to detect extremely high inter-factor correlations. This criterion was adopted to screen out models with extreme factor dependency. The principle of parsimony favors a simpler model (with more degrees of freedom) over a more saturated one (with fewer degrees of freedom) if the two models fit equivalently and it was implemented when choosing between competing models with similar fits.
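As an illustration of how the RMSEA criterion is applied, the sketch below uses one common textbook form of the index, sqrt(max(χ² − df, 0) / (df · (N − 1))). Software packages differ in whether N or N − 1 appears in the denominator, so this should be read as an approximation rather than the exact computation performed by Mplus. The function names are hypothetical; the input values are those reported for the unidimensional skill-based model (χ² = 2359.239, df = 2015, CFI = 0.984) with the analysis sample of 436.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA using the N - 1 form (an assumption here)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def good_fit(cfi: float, rmsea_value: float,
             cfi_cutoff: float = 0.94, rmsea_cutoff: float = 0.06) -> bool:
    """Apply the two cutoffs adopted in the study."""
    return cfi > cfi_cutoff and rmsea_value < rmsea_cutoff


N = 436  # analysis sample size

value = rmsea(2359.239, 2015, N)
print(round(value, 3))
print(good_fit(0.984, value))
```

With these inputs the computed value rounds to the reported RMSEA of 0.020, and both cutoffs are met.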

Results

Examining the testlet effects

A unidimensional model and a bi-factor model were imposed within the Reading and Listening section, respectively, to examine the impact of the testlets on the internal structure of the section. The estimation of the bi-factor model did not succeed in either case. Testing the one-factor model was successful in both sections. The fit indices summarized in Table 1 indicated that the one-factor model fit the data adequately. Failing to confirm the bi-factor models suggested a lack of supporting evidence of testlet-based dimensions. In the following analyses, testlet-based dimensions were not modeled, meaning that all items were treated as independent of one another with regard to the testlets.

Figure 5. Schematic representation of the higher-order model.

Item content analysis

The demands of language that young learners need to fulfill in an English-medium school context require them to use English in both academic and social situations. Test items were categorized based on the language use situation, which contextualized each item as social or academic.

Previous researchers (Bailey, 2007; Bailey & Heritage, 2008; Chamot & O’Malley, 1994; Schleppegrell, 2001) attempted to define academic language in contrast to social language regarding purpose, content, setting, and register. In terms of purpose, a common defining feature of academic language is that it is used for teaching and learning academic content. Language content used in academic situations is predominantly textbook- and lecture-based. Such language-use episodes mostly take place during instructional time. Determined by the purpose, content, and setting, the academic language register requires formal, precise, and highly structured use of language. Sample test items are as follows: listening to a teacher presenting information on an animal in a science class; reading about information on a recreational sport in a social science class; and summarizing orally and in writing the academic content learned during class time.

Figure 6. Schematic representation of the correlated four-factor model.

By contrast, social language is used to interact in a broader school setting and involves personal, family, and school lives. Most of such language episodes occur during non-instructional time. This register is predominantly informal. Sample test items are as follows: listening to an announcement from a school principal about a schedule change; reading a school event poster; orally summarizing the logistics about an upcoming field trip; and writing an email to peers about sports clubs.

Although both types of items were designed to represent a school context, they differed with regard to the purpose, content, setting, and register of language use. Table 2 summarizes the number of items associated with academic and social language use respectively by skill section.

Modeling the relationship between the language-use-in-context factors

A correlated model with two latent factors, the ability to use English in academic situations and the ability to use English in social contexts, was tested. The resulting fit indices (χ² = 2352.622, df = 2014; CFI = 0.985; RMSEA = 0.020) indicated that at the global level, the model fit the data well. Loadings on the target factors were all significant. However, the latent factor correlation was estimated to be as high as 0.976, which signaled that the two latent factors were not statistically distinguishable.

Testing skill-based language ability models

Based on the previous research assessing adult learners as well as the design and scoring scheme of the TOEFL Junior Comprehensive test, three competing models were fit to the data: a unidimensional model, a higher-order model, and a correlated four-factor model.

When testing the models with cross-loadings allowed, the loadings on the secondary factors were all found to be non-significant. All models with cross-loadings were accordingly discarded. The models with only one loading allowed for each manifest variable were also tested. In these models, integrated speaking and writing items loaded on their target modality. Table 3 summarizes the results. According to the fit indices, all models fit adequately. However, the correlations among the four skill factors in the correlated four-factor model were either larger than or close to 0.9 (Table 3), suggesting a common underlying dimension across the skill factors. The correlated four-factor model was consequently discarded.
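The screening step described above can be sketched as a simple filter over the factor correlations reported for the four-factor model in Table 3. Treating a correlation at or above .9 as "extremely high" is an assumption made for this sketch; the article states the .9 criterion without specifying whether the boundary is inclusive.

```python
CUTOFF = 0.9  # criterion for extremely high inter-factor correlations

# Factor correlations from the four-factor model (Table 3).
FACTOR_CORRELATIONS = {
    ("Reading", "Listening"): 0.928,
    ("Reading", "Speaking"): 0.826,
    ("Listening", "Speaking"): 0.864,
    ("Reading", "Writing"): 0.928,
    ("Listening", "Writing"): 0.901,
    ("Speaking", "Writing"): 0.878,
}


def extreme_pairs(correlations, cutoff=CUTOFF):
    """Return the factor pairs whose correlation is at or above the cutoff."""
    return sorted(pair for pair, r in correlations.items() if r >= cutoff)


flagged = extreme_pairs(FACTOR_CORRELATIONS)
for pair in flagged:
    print(pair, FACTOR_CORRELATIONS[pair])
```

Three of the six pairs exceed the cutoff (Reading with Listening, Reading with Writing, and Listening with Writing), and the remaining three fall between .826 and .878, which is what the text summarizes as "larger than or close to 0.9".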

Table 1. Summary of testing the one-factor models in Reading and Listening.

Skill      Model           Chi-square  df   CFI    RMSEA
Reading    Unidimensional  460.283     350  0.975  0.027 (0.020–0.033)
Listening  Unidimensional  413.353     350  0.990  0.020 (0.010–0.028)



In choosing between the remaining two models, the unidimensional model and the higher-order model, a χ² difference test was conducted. The result was significant (Δχ² = 72.247, Δdf = 4, p < .001), indicating that the more complicated higher-order model fit significantly better than the more restricted unidimensional model. Accordingly, adopting the unidimensional model could not be justified based on the principle of parsimony. The higher-order model was therefore chosen as the final model which, among all the models being tested, best represented the relationships among the items on the test. This model had four first-order factors (reading, listening, speaking, and writing), whose relationships could in turn be accounted for by a higher-order factor, presumably reflecting general language ability.
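The p-value for the reported difference statistic can be checked with a few lines of standard-library Python. The sketch takes Δχ² = 72.247 and Δdf = 4 as given. Note that this reported statistic is not the simple difference of the two χ² values in Table 3 (2359.239 − 2239.772 = 119.467), which is expected with a robust estimator such as WLSMV, where a scaled difference procedure is needed; how the statistic was obtained is not described in the article. The helper name `chi2_sf_even_df` is hypothetical and relies on the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only for positive even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(72.247, 4)
print(f"{p_value:.3g}")
print(p_value < 0.001)
```

The resulting probability is many orders of magnitude below .001, consistent with the conclusion that the higher-order model fit significantly better than the unidimensional model.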

Discussion and implications

In this section, the study results are discussed, with an emphasis on the relationships among learning environment, age, and the nature of language ability developed in young EFL learners.

Table 2. Summary of items associated with social and academic language use by skill section.

            Social  Academic  Total
Reading     12      16        28
Listening   20      8         28
Speaking    2       2         4
Writing     3       2         5
Total       37      28        65

Table 3. Summary of testing the three competing skill-based models.

Model           Chi-square  df    CFI    RMSEA
Unidimensional  2359.239    2015  0.984  0.020 (0.016–0.023)
Higher-order    2239.772    2011  0.990  0.016 (0.012–0.020)
Four-factor     2230.256    2009  0.990  0.016 (0.011–0.020)

Factor correlations in the four-factor model

            Reading  Listening  Speaking  Writing
Reading     1
Listening   0.928    1
Speaking    0.826    0.864      1
Writing     0.928    0.901      0.878     1

Two theoretical constructs, academic and social language abilities, were hypothesized in the study. The results based on young EFL learners’ test performance showed that these two were not distinct enough to be considered as separate factors. This finding accordingly did not agree with the outcomes of past studies that were based on native-speaking and immigrant children. It was shown previously that these two ability constructs were not just conceptually distinct, but also empirically separable.

Although the outcomes from studies based on different learner groups varied, the seemingly contradictory results can be reconciled if we view language proficiency in light of the environment in which learning takes place.

As mentioned earlier, an EFL environment differs from an ESL or a native language learning environment, not just quantitatively, but also qualitatively. In the native or immigrant environment, social language input is abundant. Out-of-school experience provides children with ample opportunities to practice social language use. In contrast, academic language is mostly experienced and learned in a limited classroom setting. Owing to the characteristics of this acquisition environment, it is typical that the social language develops and matures in early years, while the academic language continues to evolve throughout the school years and beyond. At any point in this developmental process, if assessed concurrently, there is a great chance that the two ability constructs can be separately recognized, and considered to be distinct. Therefore, it is reasonable to argue that the distinction of academic language ability and social language ability can be attributed to their developmental differences.

However, what holds in one learning environment might not apply to the developmental relationships among language components in other learning conditions (Cummins, 1981b, 2000). Harley et al. (1990) called for interpreting language proficiency within a developmental context by taking into account learners’ interaction with the target language. Unlike native-speaking or immigrant children, typical foreign language learners have limited exposure to social English use both within and outside of school. The opportunities to practice and develop social language use through contact with the target language community are constrained and, in most cases, largely unavailable. Both social and academic language uses are restricted to classroom settings and are learned mainly through a formal classroom approach. In other words, the acquisition of both social and academic language is an instructed experience for EFL learners. In such an environment, with language input and output opportunities for both social and academic language being relatively equal, it is sensible to assume that the growth patterns for the two constructs are similar in nature; that is, they develop at similar rates and become specialized to similar degrees during the learning process. This study found that academic language ability and social language ability in an EFL environment were statistically indistinguishable, which could be attributed to their similar developmental patterns, shaped by the foreign language learning environment. The results imply that one cannot assume that young EFL learners’ social language ability is already developed by school age. Nor can one assume that social language develops more quickly or more easily than academic language in a foreign language environment. Therefore, equal effort should be made to develop both academic and social language with young EFL learners.

However, it is worth noting that a psychometric lens provides only one partial perspective on the nature of language proficiency. Although not found to be statistically distinct, both social and academic language are educationally relevant. For the purpose of assessing young EFL learners’ proficiency, a test must strive to include both in pursuit of positive washback effects on teaching and learning.




In response to the second research question, the study demonstrated that the language ability of young EFL learners was structurally similar to that usually found with adult EFL learners.

Language learners in the cited factor analytic studies were all adults. In their study, Fouly et al. (1990) reported that the median age of the foreign students admitted to a US university was 26. The median age of the Japanese students learning English in their home country was 20 in Sasaki (1993). Other cited studies (Bachman et al., 1995; Sawaki et al., 2009; Shin, 2005; Stricker & Rock, 2008) all used test instruments that were designed for adult EFL learners. In contrast, the learners in this study were between the ages of 11 and 15.

The results of the analyses indicated that the structure of young EFL learners’ language proficiency consisted of four first-order factors subsumed under a higher-order factor. The four first-order factors corresponded to the four skills measured by the test: reading, listening, speaking, and writing. This hierarchical model with skill-based components was often found to best represent adult learners’ test performance (e.g., Sawaki et al., 2009; Stricker & Rock, 2008). Stated differently, the language proficiency developed in adult and young EFL learners was comparable in terms of the nature of the latent ability components as well as the manner in which these components interact. For both groups, the target language is learned mainly in a foreign language context. The structural resemblance across the groups could, in part, be attributed to the similarities in the learning environment despite the age-related differences.

The final higher-order model intimated that skill-based learning curricula often adopted in foreign language classrooms could lead to a differentiation of skills. It further suggested that parallel developmental patterns for the skills in a foreign language context could give rise to the general language factor accounting for the strong associations among the skills.

From a test validation point of view, this finding provided validity evidence based on the test’s internal structure in support of the test design and score reporting practice. The higher-order model summarized the relationships among the test items as the test developer anticipated. The finding of the four first-order skill factors corroborated the approach of organizing the test content by skill and the decision to report skill-based section scores. The strong associations among the four skills led to the rejection of the correlated four-factor model and to the emergence of the higher-order factor in the final model. The hierarchical nature of the final model supported the decision of reporting both skill-based scores and an overall score level.
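To illustrate how a single higher-order factor can account for strong associations among the four skills, the sketch below fits one set of loadings to the factor correlations reported in Table 3, so that each correlation is approximated by the product of two loadings. This is a simplified, unweighted least-squares illustration on the reported correlations only. It is not the item-level estimation used in the study, and the names `fit_single_factor`, `loadings`, and `residuals` are hypothetical. With four first-order factors, the higher-order structure replaces six factor correlations with four loadings, which is consistent with the two-degree-of-freedom gap between the four-factor and higher-order models in Table 3.

```python
skills = ["Reading", "Listening", "Speaking", "Writing"]

# Factor correlations from the four-factor model (Table 3), as a full matrix.
R = [
    [1.000, 0.928, 0.826, 0.928],
    [0.928, 1.000, 0.864, 0.901],
    [0.826, 0.864, 1.000, 0.878],
    [0.928, 0.901, 0.878, 1.000],
]


def fit_single_factor(corr, n_iter=500):
    """Least-squares loadings so that corr[i][j] ~ load[i] * load[j] for i != j.

    Uses coordinate descent: each loading is updated in turn to the value that
    minimizes the squared off-diagonal residuals given the other loadings.
    """
    k = len(corr)
    load = [0.9] * k  # arbitrary starting values
    for _ in range(n_iter):
        for i in range(k):
            num = sum(corr[i][j] * load[j] for j in range(k) if j != i)
            den = sum(load[j] ** 2 for j in range(k) if j != i)
            load[i] = num / den
    return load


loadings = fit_single_factor(R)

# Residual = observed correlation minus the correlation implied by the loadings.
residuals = {
    (skills[i], skills[j]): R[i][j] - loadings[i] * loadings[j]
    for i in range(len(skills))
    for j in range(i + 1, len(skills))
}
max_abs_residual = max(abs(v) for v in residuals.values())

for name, value in zip(skills, loadings):
    print(f"{name}: higher-order loading ~ {value:.3f}")
print(f"largest absolute residual = {max_abs_residual:.3f}")
```

Small residuals in this sketch would indicate that a single general factor reproduces the pattern of skill correlations closely, which is the intuition behind preferring the higher-order model over four merely correlated factors.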

Conclusions

In this study, the conceptualization of young EFL learners’ English language proficiency was attempted in light of the variability in learning experiences. The outcomes highlighted the interrelatedness of learning environment, age, and language proficiency.

Failing to confirm the hypothesized distinction between academic and social language with young EFL learners underscored the role of the learning environment in defining the language ability construct during its various developmental stages. Being able to fit the hierarchical model with skill-based components to the test performance from young EFL learners further stressed the importance of the learning experience in shaping the configuration of language proficiency across different age groups. While the latent structure of language proficiency can vary across learner groups owing to differences in learning environments, learners of different ages who otherwise share similar learning environments might be comparable in terms of the nature of their language proficiency. The study concludes that the interpretation of young EFL learners’ language proficiency needs to take into consideration how language components are developmentally related to each other as a function of the learning experience in a foreign language environment.

A few study limitations need to be pointed out. First, the data came from a pilot test instead of an operational test, and hence the test takers may not have been as motivated as regular test takers, and may not have been representative in their backgrounds.

Second, the test instrument used in the study may have limited the extent to which social language use was operationalized. Owing to the purpose of the TOEFL Junior Comprehensive Test, that is, to measure language proficiency in situations and tasks representative of an English-medium instructional environment, both academic and social language use elicited by the test items was embedded in a school context. Therefore, the use of social language outside of school in the society at large was not represented by the test instrument. Future researchers are encouraged to use different test instruments, especially those that operationalize a wide range of language use for social interactions, to investigate the nature of the language ability of young EFL learners.

Furthermore, the study compared the configuration of the language ability construct across different learner groups only informally, not analytically. Future studies that formally contrast learners of different experiences via multi-group invariance analyses would help develop a stronger argument for the impact of learning experience on the nature of language ability.

Last but not least, albeit developmental in perspective, this study was limited in that language progression was not monitored over time. Nor was detailed learner background information, for example, learners’ interaction with social and academic language and with the four language skills, collected to better capture the richness of the learning process. A longitudinal design with detailed accounts of learning conditions and experiences would further an understanding of how language ability configures and develops in different learner groups.

Funding

This research was internally funded by Educational Testing Service, the author’s employer.

References

Bachman, L. F., Davidson, F., Ryan, K., & Choi, I.-C. (1995). An investigation into the comparability of two tests of English as a foreign language: The Cambridge-TOEFL comparability study. New York: Cambridge University Press.

Bailey, A. L. (2007). Introduction: Teaching and assessing students learning English in school. In A. L. Bailey (Ed.), The language demands of school: Putting academic English to the test (pp. 1–26). New Haven, CT: Yale University Press.

Bailey, A. L., & Heritage, H. M. (2008). Formative assessment for literacy, grades K-6: Building reading and academic language skills across the curriculum. Thousand Oaks, CA: Corwin Press.




Butler, F. A., & Castellon-Wellington, M. (2000/5). Students’ concurrent performance on tests of English language proficiency and academic achievement. In The validity of administering large-scale content assessments to English language learners: An investigation from three perspectives (CSE Tech. Rep. No. 663). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.

Carroll, J. B. (1983). Psychometric theory and language testing. In J. W. Oller, Jr. (Ed.), Issues in language testing research (pp. 80–107). Rowley, MA: Newbury House.

Chamot, A. U., & O’Malley, J. (1994). The CALLA handbook: Implementing the cognitive academic language learning approach. Reading, MA: Addison-Wesley.

Chapelle, C., Grabe, W., & Berns, M. (1997). Communicative language proficiency: Definition and implications for TOEFL 2000. (TOEFL Monograph Series Report No. 10). Princeton, NJ: Educational Testing Service.

Cummins, J. (1979). Cognitive/Academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, 19, 121–129.

Cummins, J. (1981a). Age on arrival and immigrant second language learning in Canada: A reassessment. Applied Linguistics, 11, 132–149.

Cummins, J. (1981b). The role of primary language development in promoting educational success for language minority students. In California State Department of Education (Ed.), Schooling and language minority students: A theoretical framework (pp. 3–49). Los Angeles, CA: National Dissemination and Assessment Center.

Cummins, J. (2000). Language, power, and pedagogy: Bilingual children in the crossfire. Clevedon, UK: Multilingual Matters.

Cummins, J. (2008). BICS and CALP: Empirical and theoretical status of the distinction. In B. Street & N. H. Hornberger (Eds.), Encyclopedia of language and education, vol. 2: Literacy (2nd ed., pp. 71–83). New York: Springer Science + Business Media LLC.

DeMars, C. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43, 145–168.

Finney, S. J., & DiStefano, C. (2013). Nonnormal and categorical data in structural equation models. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 439–492). Charlotte, NC: Information Age.

Fouly, K. A., Bachman, L. F., & Cziko, G. A. (1990). The divisibility of language competence: A confirmatory approach. Language Learning, 40, 1–21.

Freed, B. F., Dewey, D. P., Segalowitz, N., & Halter, R. (2004). The language contact profile. Studies in Second Language Acquisition, 26, 349–356.

Gu, L. (2014). At the interface between language testing and second language acquisition: Language ability and context of learning. Language Testing, 31, 111–133.

Hakuta, K., Butler, Y. G., & Witt, D. (2000). How long does it take English learners to attain proficiency? (Policy Reports 2000–1). University of California Linguistic Minority Research Institute, University of California, Berkeley.

Harley, B., Cummins, J., Swain, M., & Allen, P. (1990). The nature of language proficiency. In B. Harley, P. Allen, J. Cummins, & M. Swain (Eds.), The development of second language proficiency (pp. 7–25). New York: Cambridge University Press.

Muthén, L. K., & Muthén, B. O. (2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Authors.

Oller, J. W., Jr. (1979). The factorial structure of language proficiency: Divisible or not? In J. W. Oller, Jr. (Ed.), Language tests at school: A pragmatic approach (pp. 423–458). London: Longman.




Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.

Sasaki, M. (1993). Relationships among second language proficiency, foreign language aptitude, and intelligence: A structural equation modeling approach. Language Learning, 43, 313–344.

Sawaki, Y., Stricker, L. J., & Oranje, A. H. (2009). Factor structure of the TOEFL Internet-based test. Language Testing, 26, 5–30.

Schleppegrell, M. J. (2001). Linguistic features of the language of schooling. Linguistics and Education, 12, 431–459.

Shin, S.-K. (2005). Did they take the same test? Examinee language proficiency and the structure of language tests. Language Testing, 22(1), 31–57.

Spearman, C. (1904). “General Intelligence,” objectively determined and measured. The American Journal of Psychology, 15(2), 201–292.

Stevens, R., Butler, F. A., & Castellon-Wellington, M. (2000). Academic language and content assessment: Measuring the progress of ELLs (CSE Tech. Rep. No. 552). Los Angeles, CA: University of California at Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Stricker, L. J., & Rock, D. A. (2008). Factor structure of the TOEFL Internet-Based Test across subgroups. (TOEFL iBT Research Report No. 07; ETS Research Report No. 08–66). Princeton, NJ: Educational Testing Service.

Yu, C., & Muthén, B. (2002, April). Evaluation of model fit indices for latent variable models with categorical and continuous outcomes. Paper presented at the meeting of the American Educational Research Association, New Orleans, LA.


