results - tokyo university of foreign...


Page 1: RESULTS - Tokyo University of Foreign Studiesrepository.tufs.ac.jp/bitstream/10108/51459/11/dt-ko-0106009.pdf · results of analyzhlg the comparability ofraw ... Specifically, examinees’responses

CHAPTER FOUR

RESULTS

     This chapter reports the results of quantitative and qualitative analyses of examinees’ performances and their attitudes toward testing speaking in the computer mode and the face-to-face mode. In order to answer the research questions posed in this study, data were gathered from multiple sources: (1) examinees’ scores on the two tests, (2) examinees’ speech samples, and (3) post-exam questionnaire results.

     The analyses focused primarily on the differences in test scores, speech samples, and examinee attitudes between the computer mode and the face-to-face mode. Quantitatively, t-test and factor analysis results are examined to determine the relationships between test scores across modes. The results of ANOVAs are discussed to explore the relationships between delivery mode and speech samples. Moreover, t-test and chi-square test results from the analysis of questionnaire items are reported. Qualitatively, examinees’ comments on open-ended questions are discussed.

     This chapter is composed of four main sections. The first section presents the results of analyzing the comparability of raw scores across the two modes. The second section reports the comparability of underlying constructs measured across modes. Examinees’ speech samples are examined in the third section, in which results regarding the effect of computer delivery mode on speech samples and the relationship between delivery mode and examinees’ proficiency are reported. The final section deals with questionnaire results. Specifically, examinees’ responses to questionnaire items regarding the two modes are analyzed. The comments from the examinees on comparisons of the two modes are categorized and illustrated in detail.

東京外国語大学 博士学位論文 Doctoral thesis (Tokyo University of Foreign Studies)

4.1 Comparing the magnitude of raw scores

     This section first reports the rater reliability for ratings assigned to the monologic tasks delivered in the computer mode and the face-to-face mode. It then examines the existence of an order effect, which is a concern when using a counterbalanced design. Finally, it describes the results of comparing the mean scores of ratings across modes.

4.1.1  Inter-rater reliability

     In speaking test scores that were obtained from two raters, a source of error typically lies in the inconsistency of the ratings. Thus, inter-rater reliabilities using two types of indexes were calculated to measure the consistency between the two ratings awarded to each rating element for each task. First, Pearson product-moment correlation coefficients were computed between the ratings. In addition, given that a high correlation coefficient could be obtained despite relatively different ratings being awarded by the two raters, the inter-rater agreement percentage was also calculated. Exact agreement indicates that the two raters assigned the same score; adjacent agreement means that the rating difference between the two raters was one.
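The two reliability indexes described above can be illustrated with a minimal sketch. The ratings below are hypothetical and the function names are illustrative; they are not taken from the study. The sketch only shows how a Pearson coefficient and the exact, adjacent, and total agreement percentages could be computed for one rating element on one task.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def agreement(x, y):
    """Return (exact %, adjacent %, total %) agreement between two raters."""
    n = len(x)
    exact = 100 * sum(a == b for a, b in zip(x, y)) / n
    adjacent = 100 * sum(abs(a - b) == 1 for a, b in zip(x, y)) / n
    return exact, adjacent, exact + adjacent

# Hypothetical ratings from two raters for eight examinees (not study data).
rater1 = [1, 2, 2, 3, 3, 4, 2, 1]
rater2 = [1, 2, 3, 3, 2, 4, 2, 3]

r = pearson_r(rater1, rater2)
exact, adjacent, total = agreement(rater1, rater2)
print(f"r = {r:.2f}, exact = {exact}%, adjacent = {adjacent}%, total = {total}%")
```

A high r with low exact agreement would indicate that one rater is consistently harsher than the other, which is why both indexes are reported together.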

     As can be seen in Table 4.1, the inter-rater reliability estimates (Pearson correlation coefficients) for the computer mode ranged from .52 to .75. Except for vocabulary (r = .52), fluency (r = .53), and pronunciation (r = .54) for the opinion task, these estimates are sufficiently high. Moreover, the agreement between the ratings awarded by the two raters showed satisfactory results in total agreement, ranging from 96.2% to 100% for all cases, with moderately high exact agreement percentages (49.5%-72.2%) and adjacent agreement percentages (26.6%-45.6%).

Table 4.1 Inter-rater reliabilities for the ratings in the computer mode

Rating element  Task       r^a          Exact        Adjacent     Total^b
                                        agreement %  agreement %  agreement %
Grammar         Narrative  .75          72.2         26.6          98.8
                Opinion    .64          58.2         38.0          96.2
Vocabulary      Narrative  .68          57.0         40.5          97.5
                Opinion    .52          49.4         36.7          86.1
Fluency         Narrative  [illegible]  70.9         29.1         100
                Opinion    .53          59.5         30.4          89.9
Pronunciation   Narrative  [illegible]  62.0         38.0         100
                Opinion    .54          55.7         40.5          96.2

Note. N = 79. ^a Pearson correlation coefficient between the two ratings. ^b Total agreement percent is the sum of the exact and adjacent agreement percent.

     Table 4.2 displays the results of inter-rater reliabilities for the face-to-face mode. Pearson correlation coefficients ranged from .60 to .74. Except for the pronunciation figure for the opinion task (r = .60) being slightly low, the other figures are sufficiently high. Similar to the findings for the computer mode, rater agreement turned out to be satisfactory for all cases in terms of exact agreement (54.4%-68.4%), adjacent agreement (29.1%-45.6%), and total agreement (96.2%-100%).

     The results of rater agreement indicate that the two ratings assigned to both modes were almost all within one score difference. Taking the results of both types of inter-rater reliability indexes into account, the consistency of the ratings in both modes was considered to be acceptable for this study. Thus, the two ratings awarded to each rating element were averaged for each task. They are named element scores in this study and are used in the following quantitative analyses.


Table 4.2 Inter-rater reliabilities for the ratings in the face-to-face mode

Rating element  Task       r^a   Exact        Adjacent     Total^b
                                 agreement %  agreement %  agreement %
Grammar         Narrative  .72   60.8         39.2         100
                Opinion    .73   65.8         31.6          97.4
Vocabulary      Narrative  .70   54.4         45.6         100
                Opinion    .74   60.8         39.2         100
Fluency         Narrative  .74   68.4         31.6         100
                Opinion    .65   67.1         29.1          96.2
Pronunciation   Narrative  .74   65.8         34.2         100
                Opinion    .60   55.7         41.8          97.5

Note. N = 79. ^a Pearson correlation coefficient between the two ratings. ^b Total agreement percent is the sum of the exact and adjacent agreement percent.

4.1.2  Order effect

     Prior to comparing raw scores awarded to the two modes, the order effect was examined to address the concern about the counterbalanced research design that was adopted in this study. Table 4.3 presents the means and standard deviations for the average of element scores across the two tasks and for total scores by mode and order. The means of test scores in Table 4.3 showed that the two groups assigned to different test orders seemed to perform differently across the two modes. The different trends for the two groups were most evident in the total scores. That is, for the group who took the computer mode first, the total score was much higher in the face-to-face mode (M = 9.05, SD = 2.44) than in the computer mode (M = 7.68, SD = 2.58). However, total scores for the other group indicate only a small discrepancy between the computer mode (M = 7.55, SD = 2.30) and the face-to-face mode (M = 7.59, SD = 2.55). It seems that a practice effect occurred for the group who took the computer mode first.


Table 4.3 Descriptive statistics of scores by mode and order

Rating element  Order  Computer       Face-to-face
                       M      SD      M      SD
Grammar         C→F    1.90   0.72    2.19   0.67
                F→C    1.83   0.60    1.86   0.70
Vocabulary      C→F    2.06   0.70    2.48   0.73
                F→C    1.98   0.63    2.04   0.71
Fluency         C→F    1.79   0.62    1.99   0.67
                F→C    1.89   0.70    1.70   0.60
Pronunciation   C→F    1.94   0.67    2.39   0.55
                F→C    1.84   0.54    1.99   0.69
Total score     C→F    7.68   2.58    9.05   2.44
                F→C    7.55   2.30    7.59   2.55

Note. C→F: Computer test first/face-to-face test second; F→C: Face-to-face test first/computer test second.

     In order to test whether the practice effect was statistically significant, repeated measures ANOVAs were carried out on element scores across tasks and on total score. Table 4.4 shows that the mode-by-order interactions were statistically significant for all types of scores. There were also significant main effects of mode on total score and on all element scores except fluency. However, in this case, the interpretation of the interactions between order and mode should take precedence over the main effects.
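The mode-by-order interaction can be illustrated with a small sketch. In a design with one two-level within-subject factor (mode) and two groups (order), the interaction F is equal to the square of an independent t statistic computed on each examinee's difference score between the two modes. The scores below are hypothetical, the function name is illustrative, and the pooled-variance t-test is an assumption of the sketch, not a description of the software used in the study.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical (computer, face-to-face) total scores, not study data.
cf_group = [(7.0, 9.0), (8.0, 9.5), (6.5, 8.0), (9.0, 10.0)]  # computer first
fc_group = [(7.5, 7.5), (8.0, 8.5), (6.0, 6.0), (9.0, 8.5)]   # face-to-face first

# Difference score per examinee: face-to-face minus computer.
diff_cf = [f2f - comp for comp, f2f in cf_group]
diff_fc = [f2f - comp for comp, f2f in fc_group]

t = pooled_t(diff_cf, diff_fc)
f_interaction = t ** 2  # equals the mode-by-order interaction F, df = (1, N - 2)
print(f"t = {t:.2f}, interaction F = {f_interaction:.2f}")
```

A large interaction F in this sketch reflects the same pattern as in Table 4.3: one order group gains between modes while the other does not.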

Table 4.4 ANOVA results of scores by mode and order

Rating element  df   Mode           Order         Mode*order
                     F       p      F      p      F       p
Grammar         1    14.62   .00    1.83   .18     9.31   .00
Vocabulary      1    21.22   .00    3.13   .08    12.04   .00
Fluency         1     0.01   .92    0.47   .49    16.71   .00
Pronunciation   1    41.95   .00    3.67   .06    11.09   .00
Total score     1    25.04   .00    2.22   .14    22.31   .00

4.1.3 Comparing test scores

     Given the significant interaction between delivery mode and test order, it was decided not to combine the data from the two administrations of the tests in different orders. Instead, in order to compare the magnitude of test scores across modes, independent t-tests were conducted separately on the scores of the first and the second test administered to the examinees.

     Table 4.5 presents the results of the t-test on the first test. As shown in Table 4.5, for the narrative task, the means of all the element scores in the computer mode were slightly higher than those in the face-to-face mode. On the other hand, the opinion task showed an opposite trend; that is, the means of all the element scores were higher in the face-to-face mode. For the total score, the mean was higher in the computer mode (M = 7.68) than in the face-to-face mode (M = 7.59). Further, all the differences in the means of element scores and total scores between the two modes were small, being no greater than 0.21.

     The t-test results confirmed that none of the differences in the means of element scores and total scores between the two modes were significant. This indicates that, from the data of the first test administered, delivery mode did not make a difference to the magnitude of examinees’ test scores in terms of either element scores or total score.
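The t values in Table 4.5 can be approximated from the reported means, standard deviations, and group sizes. The sketch below assumes a pooled-variance (Student's) independent t-test, which the text does not state explicitly, and the function name is illustrative. Because the inputs are rounded to two decimals, the result only approximates the reported value (t = 1.09 for grammar on the narrative task).

```python
from math import sqrt

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance independent t statistic from summary statistics.

    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df

# Grammar, narrative task, first test administered (values from Table 4.5):
# computer mode M = 2.05, SD = 0.81, n = 41; face-to-face M = 1.86, SD = 0.76, n = 38.
t, df = t_from_summary(2.05, 0.81, 41, 1.86, 0.76, 38)
print(f"t({df}) = {t:.2f}")
```

The same function can be applied to any other row of the table to check that a small mean difference relative to the standard deviations yields a t value well below conventional significance thresholds.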


Table 4.5 Independent t-test results of scores of the first test administered across modes

Rating element  Task       Computer (n = 41)  Face-to-face (n = 38)  t       p
                           M      SD          M      SD
Grammar         Narrative  2.05   0.81        1.86   0.76             1.09   .28
                Opinion    1.74   0.71        1.87   0.72            -0.77   .44
Vocabulary      Narrative  2.21   0.77        2.05   0.72             0.92   .36
                Opinion    1.91   0.77        2.03   0.75            -0.65   .52
Fluency         Narrative  2.01   0.79        1.80   0.63             1.29   .20
                Opinion    1.56   0.54        1.59   0.62            -0.24   .81
Pronunciation   Narrative  2.05   0.68        2.05   0.75            -0.02   .98
                Opinion    1.83   0.71        1.92   0.69            -0.58   .56
Total score                7.68   2.58        7.59   2.55             0.17   .87

     On the other hand, the results of the t-test based on the data of the second test administered revealed a different pattern from that of the first test. Table 4.6 shows that there were significant differences across modes in the scores of grammar on the opinion task and in those of vocabulary and pronunciation on both the narrative task and the opinion task. The total score was also significantly different across modes. These results were considerably different from those of the first test, where no significant difference was found in any type of score across modes. The disparity of the results seems to provide evidence for the concern about the interaction between the delivery mode and test order. Thus, to evaluate the effects of the delivery mode on the magnitude of the test score, it was decided to use only the results from the analysis of the first test, which are considered to be more valid, being without the contamination of the order effect.


Table 4.6 Independent t-test results of scores of the second test administered across modes

Rating element  Task       Computer (n = 38)  Face-to-face (n = 41)  t      p
                           M      SD          M      SD
Grammar         Narrative  1.91   0.62        2.15   0.74            1.55   .13
                Opinion    1.75   0.70        2.23   0.73            2.97   .00
Vocabulary      Narrative  2.12   0.70        2.51   0.79            2.34   .02
                Opinion    1.84   0.71        2.45   0.76            3.66   .00
Fluency         Narrative  2.08   0.81        2.10   0.78            0.10   .92
                Opinion    1.71   0.74        1.89   0.70            1.11   .27
Pronunciation   Narrative  1.97   0.52        2.40   0.57            3.48   .00
                Opinion    1.71   0.64        2.38   0.62            4.69   .00
Total score                7.55   2.30        9.05   2.44            2.82   .01

4.2 Comparing psychometric constructs

     One of the purposes of the present study was to investigate the effect of computer delivery mode on the underlying constructs in comparison to the face-to-face mode. To this end, a series of exploratory factor analyses were performed to explore statistically whether there are components that are shared in common by the monologic tasks delivered in the two modes. In this section, first, the results of checking the assumptions of the exploratory factor analysis are presented, and then the results of the exploratory factor analyses are reported.

     Given that the analysis in the previous section revealed a practice effect, to deal with this problem, the original data were transformed by subtracting each score from the mean score on each variable for each group assigned to the different testing orders. For reporting purposes, all eight variables for each mode used in the following analyses were assigned labels as shown in Table 4.7. For example, CGN refers to the grammar score for the narrative task of the computer mode, and FVO represents the vocabulary score for the opinion task of the face-to-face mode.
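The transformation described above amounts to centering each variable within its test-order group, which removes mean differences between the two order groups before the factor analysis. A minimal sketch follows, with hypothetical scores and an illustrative function name. The sketch computes score minus group mean; the opposite sign convention would leave the correlations among variables unchanged as long as it is applied to every variable.

```python
from statistics import mean

def center_by_group(scores, groups):
    """Subtract each group's mean from the scores of that group's members."""
    group_means = {
        g: mean(s for s, member in zip(scores, groups) if member == g)
        for g in set(groups)
    }
    return [s - group_means[g] for s, g in zip(scores, groups)]

# Hypothetical scores on one variable (e.g. CGN) for five examinees.
scores = [1.0, 2.0, 3.0, 5.0, 7.0]
groups = ["CF", "CF", "CF", "FC", "FC"]  # CF: computer first; FC: face-to-face first

centered = center_by_group(scores, groups)
print(centered)
```

After centering, both order groups have a mean of zero on the variable, so any remaining covariation reflects differences among examinees and not the order in which they took the tests.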

Table 4.7 Measured variables used in factor analysis

Measured variable          Task       Label
Computer mode
  Grammar scores in        Narrative  CGN
                           Opinion    CGO
  Vocabulary scores in     Narrative  CVN
                           Opinion    CVO
  Fluency scores in        Narrative  CFN
                           Opinion    CFO
  Pronunciation scores in  Narrative  CPN
                           Opinion    CPO
Face-to-face mode
  Grammar scores in        Narrative  FGN
                           Opinion    FGO
  Vocabulary scores in     Narrative  FVN
                           Opinion    FVO
  Fluency scores in        Narrative  FFN
                           Opinion    FFO
  Pronunciation scores in  Narrative  FPN
                           Opinion    FPO

4.2.1 Preliminary data analyses

     Table 4.8 presents descriptive statistics of all the variables. They were computed to check the assumptions of the exploratory factor analysis.

     Univariate normality of the 16 observed variables was assessed through examination of the skewness and kurtosis for each variable. As seen in Table 4.8, none of the observed variables deviated from normality, with values for the skewness and kurtosis within an acceptable range of -2 to 2. The Pearson product-moment correlations among all the variables were calculated (see Appendix I). The correlation figures ranged from .56 to .84, indicating that all the variables correlated fairly well with each other and none of the correlation coefficients were particularly large. This suggests that multicollinearity is not a problem for the present data. Univariate outliers were also examined; no subject was found to be a univariate outlier (z < -3 or z > 3) on any of the variables.
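The screening steps above can be sketched as follows. The sketch uses simple moment-based formulas for skewness and excess kurtosis; statistical packages usually apply small-sample corrections, so their values may differ slightly. The data and function names are hypothetical.

```python
from math import sqrt

def central_moments(values):
    """Return the second, third, and fourth central moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m2, m3, m4

def skewness(values):
    m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3.0

def univariate_outliers(values, cutoff=3.0):
    """Values whose z score (using the sample SD) exceeds the cutoff in size."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [v for v in values if abs((v - m) / sd) > cutoff]

# Hypothetical scores on one variable, not study data.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
within_range = all(-2 <= x <= 2 for x in (skewness(scores), excess_kurtosis(scores)))
print(within_range, univariate_outliers(scores))
```

A variable passes the screening used in this chapter when both statistics fall between -2 and 2 and the outlier list is empty.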

Table 4.8 Descriptive statistics for all variables

Variable  Min  Max  M  SD  Skewness  Kurtosis

CGN, CGO, CVN, CVO, CFN, CFO, CPN, CPO, FGN, FGO, FVN, FVO, FFN, FFO, FPN, FPO

Note. N = 79.

4.2.2  Exploratory factor analyses

     First, an exploratory factor analysis, using the principal axis method (principal factor analysis) and a varimax rotation pattern, was carried out to explore the number of factors underlying the eight observed variables for the computer mode. The solution revealed only one factor. The factor was produced based on eigenvalues greater than 1.0, as shown in Table 4.9. The scree plot (see Figure 4.1) confirmed the meaningfulness of the factor. The factor accounted for about 75% of the total variance. As seen in Table 4.10, the magnitudes of all the factor loadings were substantial, ranging from .82 to .90. This suggests that all eight variables, which represent four rating elements in each task, are reasonably good indicators of this factor.
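The relationship between the eigenvalues and the percentages of variance in Table 4.9 can be checked with a short calculation. When the analysis is based on a correlation matrix of p variables, the eigenvalues sum to p, so each percentage is the eigenvalue divided by p times 100. The sketch below uses the eigenvalues reported in Table 4.9; the variable names are illustrative, and small discrepancies from the table come from rounding of the reported eigenvalues.

```python
# Eigenvalues for the computer mode as reported in Table 4.9.
eigenvalues = [6.04, 0.74, 0.41, 0.30, 0.17, 0.17, 0.10, 0.08]
n_variables = len(eigenvalues)  # eight observed variables

# Percentage of variance for each factor and the running cumulative percentage.
percentages = [100 * ev / n_variables for ev in eigenvalues]
cumulative = []
running_total = 0.0
for pct in percentages:
    running_total += pct
    cumulative.append(running_total)

# Eigenvalue-greater-than-1.0 criterion: count the factors that pass it.
n_retained = sum(ev > 1.0 for ev in eigenvalues)

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percentages, cumulative), 1):
    print(f"{i}  {ev:5.2f}  {pct:6.2f}  {cum:7.2f}")
print("factors retained:", n_retained)
```

Only the first eigenvalue exceeds 1.0, which matches the single-factor solution described in the text.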

Table 4.9 Exploratory factor analysis of data from the computer mode

Factor  Eigenvalue  Percentage of variance  Cumulative percentage
1       6.04        75.45                    75.45
2       0.74         9.23                    84.68
3       0.41         5.14                    89.82
4       0.30         3.73                    93.55
5       0.17         2.12                    95.67
6       0.17         2.09                    97.76
7       0.10         1.29                    99.05
8       0.08         0.95                   100.00

Figure 4.1 Scree plot for data from the computer mode (eigenvalue by factor number)

Table 4.10 Factor loading for the data from the computer mode

Variable  Factor loading
CGN       .87
CVN       .87
CFN       .84
CPN       .88
CGO       .89
CVO       .89
CFO       .82
CPO       .90

     A principal factor analysis with varimax rotation was also conducted for the face-to-face mode, and the results obtained were similar to those for the computer mode. As indicated in Table 4.11, the results showed that only one factor with an eigenvalue greater than 1.0 was extracted. The scree test also suggested one factor (see Figure 4.2). Table 4.11 shows that about 77% of the variance was explained by this factor. Table 4.12 presents the factor loadings of the variables, which demonstrate that the factor was well defined by the variables since factor loadings were high, within a range of .83 to .91.

Table 4.11 Exploratory factor analysis of data from the face-to-face mode

Factor  Eigenvalue  Percentage of variance  Cumulative percentage
1       6.12        76.54                    76.54
2       0.62         7.70                    84.24
3       0.43         5.36                    89.60
4       0.25         3.18                    92.77
5       0.19         2.42                    95.19
6       0.16         1.96                    97.16
7       0.13         1.66                    98.81
8       0.09         1.19                   100.00

Figure 4.2 Scree plot for data from the face-to-face mode (eigenvalue by factor number)

Table 4.12 Factor loading for the data from the face-to-face mode

Variable  Factor loading
FGN       .87
FVN       .90
FFN       .86
FPN       .85
FGO       .90
FVO       .91
FFO       .83
FPO       .86

     The analyses to this point revealed that both modes seemed to measure only one factor. Further, given that factor loading reflects the portion of the total variance that each variable contributes to the factor, a comparison of the factor loadings of the 8 variables across modes shows that each pair of observed variables generally has equivalent loading on the factor.


     In order to support the above analyses for the two modes, another principal factor analysis with varimax rotation was performed with the 16 observed variables of both the computer mode and the face-to-face mode. Again, the solution produced only one component on the basis of eigenvalues greater than 1.0 (see Table 4.13), which was confirmed by the scree plot in Figure 4.3. This single factor accounted for most of the total variance (71%). As can be seen in Table 4.14, all the variables loaded highly on the factor, ranging from .78 to .88. Further, they seemed to contribute similarly to the major component, with similarly high values of factor loadings when compared across modes.

     Taken together, the results described in this section indicate that monologic tasks delivered in the computer mode and the face-to-face mode seem to measure the same psychometric construct.

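Table 4.13 Exploratory factor analysis of combined data from both modes

Factor  Eigenvalue      Percentage of variance  Cumulative percentage
1       11.[illegible]  71.12                    71.12
2       [illegible]      5.41                    76.53
3       [illegible]      4.97                    81.50
4       [illegible]      3.79                    85.29
5       [illegible]      2.78                    88.07
6       [illegible]      2.55                    90.62
7       [illegible]      2.03                    92.65
8       [illegible]      1.40                    94.05
9       [illegible]      1.17                    95.22
10      [illegible]      1.07                    96.29
11      [illegible]      0.92                    97.21
12      [illegible]      0.83                    98.04
13      [illegible]      0.62                    98.65
14      [illegible]      0.52                    99.17
15      [illegible]      0.42                    99.60
16      [illegible]      0.40                   100.00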

Figure 4.3 Scree plot for the combined data from both modes (eigenvalue by factor number)

Table 4.14 Factor loading for the combined data from both modes

Variable  Factor loading

4.3 Comparing speech samples

     In this section, reliabilities of the codings of speech samples between two coders are first reported. The results of grouping examinees according to their scores on computer-delivered tasks are then introduced. Finally, the results of comparing speech samples between the two modes are presented.

4.3.1  Inter-coder reliability

     The inter-coder reliability was examined through agreement between the codings from the two coders. Table 4.15 summarizes the inter-coder reliabilities for all the coding units. As can be seen in Table 4.15, the achieved levels were high in almost all cases, ranging from 87% to 99%.

Table 4.15 Summary of inter-coder reliability

Category    Coding units                  Inter-coder agreement (%)
Fluency     Speech time
            Length of pause time
            Unfilled pauses
            Filled pauses
            Words of repetition
            Words of self-correction
            Words of false starts
            Total words
Accuracy    AS-unit
            Independent clause
            Subordinate clause
            Errors
Complexity  Type
            Token
            Grammatical words
            Lexical words
            High-frequency lexical words
            Low-frequency lexical words

4.3.2 Grouping of examinees based on test scores

     In order to examine a possible interaction effect between language proficiency and delivery mode, participants were categorized into three groups based on their total score on computer-delivered tasks^17. As shown in Table 4.16, those who scored over 66.6% were assigned to the high proficiency group (n = 27; M = 20.63), and those who scored between 66.6% and 33.3% were assigned to the middle proficiency group (n = 26; M = 15.17). The rest were assigned to the low proficiency group (n = 26; M = 9.69). The results of a one-way ANOVA revealed a significant effect for the placement in a proficiency group, F(2, 76) = 227.82, p < .00. Post hoc analyses (Tukey) indicated that each group was significantly different in their total scores at p < .00.

Table 4.16 Descriptive statistics for proficiency groups

Proficiency group  M      SD    Min    Max
Low                 9.69  1.50   8.00  12.00
Mid                15.17  1.27  13.00  17.50
High               20.63  2.54  17.50  26.50
Total              15.23  4.87   8.00  26.50

4.3.3 Effect of delivery mode and interaction with proficiency

     In order to examine the effect of delivery mode on examinees’ speech samples and the interaction of delivery mode with examinees’ proficiency, repeated measures ANOVAs were conducted. The results are presented in terms of fluency, accuracy, and complexity.

^17 Given that the GTEC for STUDENTS is a computer-delivered test and the correlation figure between the total scores for the computer and face-to-face versions was quite high (r = .95), it was decided to use the total scores from the computer version.

4.3.3.1  Fluency

     Table 4.17 displays the means and standard deviations for fluency measures by delivery mode and proficiency level. As can be seen in Table 4.17, fluency is higher in computer-delivered monologic tasks for measures of speech rate and dysfluent words, whereas it is higher in the face-to-face mode regarding filled pauses and unfilled pauses.
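The fluency measures are expressed per 60 seconds, which puts speech samples of different lengths on a common scale. A minimal sketch of that normalization follows. The counts are hypothetical, the function name is illustrative, and two points are assumptions of the sketch: that the denominator is the examinee's speech time in seconds, and that dysfluent words are the sum of repetition, self-correction, and false-start words.

```python
def per_60_seconds(count, speech_time_seconds):
    """Scale a raw count to a rate per 60 seconds of speech."""
    if speech_time_seconds <= 0:
        raise ValueError("speech time must be positive")
    return count * 60.0 / speech_time_seconds

# Hypothetical counts for one 40-second speech sample (not study data).
speech_time = 40.0
total_words = 45
repetition, self_correction, false_starts = 3, 2, 1

# Assumption: dysfluent words are the three dysfluency types combined.
dysfluent_words = repetition + self_correction + false_starts

words_rate = per_60_seconds(total_words, speech_time)
dysfluent_rate = per_60_seconds(dysfluent_words, speech_time)
print(words_rate, dysfluent_rate)
```

A higher word rate and a lower dysfluent-word rate would both be read as greater fluency under these measures.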

Table 4.17 Descriptive statistics of fluency measures by mode and proficiency

Measure (per 60 seconds)      Prof  Computer   Face-to-face
                                    M    SD    M    SD
No. of words
No. of unfilled pauses
No. of filled pauses
No. of dysfluent words
No. of repetition words
No. of self-correction words
No. of false start words

Note. Prof = Proficiency groups.

     Table 4.18 summarizes the statistics of measures of fluency by means of the repeated measures ANOVA. As shown in Table 4.18, a main effect of delivery mode was found for three measures, and the results were somewhat mixed. There were significant differences in the number of dysfluent words per 60 seconds (F(1, 2) = 15.16, p < .00) and the number of repetition words per 60 seconds (F(1, 2) = 16.05, p < .00). That is, examinees used more repetition words in face-to-face monologic tasks than in those delivered by computer. This means that examinees were more fluent with the computer mode than with the face-to-face mode. A significant difference was also observed for the measure of filled pauses per 60 seconds (F(1, 2) = 7.55, p < .01) but in the opposite direction. That is, examinees used more filled pauses in computer-delivered tasks, indicating that they were more fluent in the face-to-face mode. There was no significant interaction effect between delivery mode and proficiency level, suggesting that fluency of examinees’ speech produced with the two modes was not affected differently by their proficiency.

Table 4.18 ANOVA results for fluency measures

Measure (per 60 seconds)      Mode    Level        Mode*level
                              F       df   F       df   F
No. of words                   2.14   2    38.46   2    0.16
No. of unfilled pauses         0.11   2    36.33   2    0.48
No. of filled pauses           7.55   2     0.92   2    1.69
No. of dysfluent words        15.16   2     0.77   2    0.32
No. of repetition words       16.05   2     1.35   2    0.14
No. of self-correction words   2.83   2     0.27   2    0.41
No. of false start words       2.06   2     0.05   2    1.26

4.3.3.2 Accuracy

     Accuracy was examined in terms of two measures: ratio of error-free clauses and ratio of error-free AS-units. Table 4.19 presents the means and standard deviations for accuracy measures by delivery mode and proficiency level. According to Table 4.19, the face-to-face tasks yielded a slightly higher accuracy on both measures. However, the results of the repeated measures ANOVAs failed to show these differences to be statistically significant (see Table 4.20). This means that linguistic accuracy was not affected by the delivery mode. In addition, no interaction effect between delivery mode and proficiency level was statistically significant. This demonstrates that linguistic performance across modes was not different in the aspect of accuracy among the three proficiency groups.

Table 4.19 Descriptive statistics of accuracy measures by mode and proficiency

Measure                            Prof   Computer          Face-to-face
                                          M     SD    n     M     SD    n
Percentage of error-free clauses   Low    0.28  0.21  26    0.28  0.24  26
                                   Mid    0.47  0.19  26    0.54  0.19  26
                                   High   0.65  0.19  27    0.63  0.18  27
                                   Total  0.47  0.24  79    0.49  0.25  79
Percentage of error-free AS-units  Low    0.18  0.19  26    0.19  0.23  26
                                   Mid    0.35  0.21  26    0.44  0.19  26
                                   High   0.54  0.21  27    0.56  0.18  27
                                   Total  0.36  0.25  79    0.40  0.25  79

Note. Prof = Proficiency groups.

Table 4.20 ANOVA results for accuracy measures

Measure                              Mode                 Level                 Mode*level
                                     df   F      p        df   F       p        df   F      p
Percentage of error-free clauses     1    0.51   .48      2    30.05   .00      2    1.50   [illegible]
Percentage of error-free AS-units    1    2.69   .10      2    30.41   .00      2    1.13   [illegible]


4.3.3.3 Complexity

     Complexity was measured in terms of both syntactic complexity and lexical
complexity. Table 4.21 shows descriptive statistics for complexity measures by delivery
mode and proficiency level. For the measures of syntactic complexity, the means for the
computer mode were slightly higher than those for the face-to-face mode. Lexical
complexity also increased in the computer mode, but only with respect to Guiraud’s
Index, while the two other measures, both of lexical density, went in the opposite direction.
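As an illustration of the lexical complexity measure named above, the sketch below computes Guiraud’s Index under its standard definition, the number of word types divided by the square root of the number of word tokens. The whitespace tokenization, the lowercasing, and the sample sentence are simplifying assumptions for illustration only; the thesis may have counted types and tokens differently.

```python
import math


def guiraud_index(tokens: list[str]) -> float:
    """Guiraud's Index: word types divided by the square root of word tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    # Assumption: case-insensitive matching of word forms counts as one type.
    types = {token.lower() for token in tokens}
    return len(types) / math.sqrt(len(tokens))


# Hypothetical sample: 6 tokens, 5 types ("the" occurs twice).
sample = "the cat sat on the mat".split()
g = guiraud_index(sample)
print(round(g, 3))
```

Dividing by the square root of the token count, rather than by the token count itself, reduces the sensitivity of the index to text length compared with a plain type-token ratio.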

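Table 4.21 Descriptive statistics of complexity measures by mode and proficiency

Measure                              Prof     Computer        Face-to-face
                                              M      SD       M      SD
Percentage of clauses                [cell values illegible in the source]
Percentage of subordinate clauses    [cell values illegible in the source]
No. of words                         [cell values illegible in the source]
Guiraud’s Index                      [cell values illegible in the source]
Lexical density                      [cell values illegible in the source]
Weighted lexical density             [cell values illegible in the source]

Note. Prof = Proficiency groups.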


     As shown in Table 4.22, the repeated measures ANOVAs revealed these
differences not to be significant in terms of the main effect of the delivery mode. That is,
there was no significant difference in syntactic complexity and lexical complexity of the
language produced in the two modes. Again, no significant interaction effect between
proficiency level and delivery mode could be established. This implies that examinees
at different proficiency levels did not perform differently in terms of linguistic
complexity across modes.

Table 4.22 ANOVA results for complexity measures

Measure                              Mode     Level          Mode*level
                                     F        df   F         df   F
Percentage of clauses                0.62     2    35.17     2    0.53
Percentage of subordinate clauses    0.01     2    27.15     2    1.96
No. of words                         1.16     2    40.33     2    0.14
Guiraud’s Index                      0.07     2    50.59     2    0.25
Lexical density                      1.67     2    11.77     2    1.73
Weighted lexical density             1.94     2    9.42      2    1.83

4.4 Comparing examinee attitudes

     In order to explore the face validity of the computer-delivered speaking test from
the examinees’ perspective, two questionnaires, described in Section 3.2.3, were
administered immediately after the two tests were completed. The questionnaires aimed
to collect information in two areas: general attitudes toward speaking tests delivered in
the computer and the face-to-face modes (Questionnaire 1) and a direct comparison of
the two modes (Questionnaire 2) (see Appendix E and Appendix F for the
questionnaires).


4.4.1 Examinee attitudes toward the two modes

     Table 4.23 presents the means and standard deviations for the five statements in
Questionnaire 1 on examinee attitudes and perceptions regarding the computer mode
and the face-to-face mode. As shown in Table 4.23, mean scores for examinees’
responses were all above 3, except for those on favorableness (Q4) for the computer
mode (M = 2.95), indicating that examinees generally showed agreement with the
statements for both modes. Specifically, examinees reported that they felt nervous on
both tests. They considered both tests to be difficult but fair. They held a roughly neutral
position toward the computer mode (M = 2.95) but showed slightly favorable attitudes
toward the face-to-face mode (M = 3.16). Finally, they perceived both tests to be
accurate measures of their spoken English.

Table 4.23 Comparative results on Questionnaire 1

Statement                                                  Computer        Face-to-face     t
                                                           M      SD       M      SD
I felt nervous when I was taking the test.                 3.13   1.23     3.56   1.17     2.78**
I feel this test was difficult.                            3.68   1.07     3.82   0.99     1.49
I feel the test was fair.                                  3.57   0.89     3.76   0.83     1.97
I liked the format of the test.                            2.95   1.05     3.16   0.98     2.20*
The test reflects accurately how well I speak English.     3.14   0.96     3.40   0.92     2.32*

Note. *p < .05; **p < .01.

     In order to evaluate differences in examinees’ responses to the statements
regarding the two modes, a dependent t-test was carried out for each statement. The
results in Table 4.23 revealed significant differences in examinees’ responses to three
statements. That is, examinees reported a lower level of nervousness in the computer
mode (M = 3.13, SD = 1.23) than in the face-to-face mode (M = 3.56, SD = 1.17). The
computer mode was viewed as less favorable (M = 2.95, SD = 1.05) than the
face-to-face mode (M = 3.16, SD = 0.98) and less accurate in reflecting their English
speaking level (M = 3.14, SD = 0.96) than the face-to-face mode (M = 3.40, SD = 0.92).
However, the two modes were not found to be significantly different in test difficulty
and test fairness.
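The dependent (paired) t-test used here can be sketched as follows. Table 4.23 reports only means, standard deviations, and t values, so the individual ratings below are hypothetical and serve only to show the computation: the statistic is the mean of the paired differences divided by its standard error.

```python
import math
import statistics


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Return the dependent-samples t statistic for y - x and its degrees of freedom."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two cases")
    diffs = [b - a for a, b in zip(x, y)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical 5-point ratings from five examinees (not data from the thesis).
computer = [3, 2, 4, 3, 5]
face_to_face = [4, 3, 4, 5, 5]
t_stat, df = paired_t(computer, face_to_face)
print(round(t_stat, 3), df)
```

Because each examinee rated both modes, the test works on within-person differences, which removes between-person variability from the error term.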

4.4.2 Direct comparisons of the two modes

     Questionnaire 2 aimed to gather information enabling a direct comparison of
examinees’ attitudes and perceptions concerning testing speaking in the computer mode
and the face-to-face mode. Table 4.24 presents the percentage of the examinees that chose
each of the three options in the six questions.

     Chi-square tests were performed to statistically test the difference in percentages
of examinees that chose each option. The results in Table 4.24 showed that the observed
frequencies differed from the expected frequencies at a statistically significant level for
all the questions except that on test difficulty. Specifically, more frequently than
expected, examinees preferred the face-to-face mode and found it more favorable and
more valid, though also more nerve-racking. As for Question 2, examinees did not find
the tests in the two modes to be significantly different in difficulty. Regarding Question
3, the results revealed that significantly more examinees than expected considered the
fairness of the two tests to be the same. Overall, the results of Questionnaire 2
corroborate those of Questionnaire 1.
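The chi-square test described above can be illustrated with a goodness-of-fit computation for Question 1. This is a sketch under two assumptions that the thesis does not state explicitly: that the expected frequencies are equal across the three options, and that the counts 18, 48, and 28 can be back-calculated from the reported percentages (19.1%, 51.1%, 29.8%) and the value 94 printed beside the Question 1 statistic in Table 4.24, taken here to be N.

```python
import math


def chi_square_equal_expected(observed: list[int]) -> tuple[float, int]:
    """Goodness-of-fit chi-square against equal expected frequencies."""
    if len(observed) < 2 or sum(observed) == 0:
        raise ValueError("need at least two categories and a non-zero total")
    expected = sum(observed) / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, len(observed) - 1


# Assumed counts for Question 1: Computer, Face-to-face, Both the same.
q1_counts = [18, 48, 28]
chi2, df = chi_square_equal_expected(q1_counts)

# With df = 2 the upper-tail probability has the closed form exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), df, p_value < 0.05)
```

Under these assumptions the statistic rounds to the 14.89 reported for Question 1, which suggests, but does not confirm, that equal expected frequencies were used.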


Table 4.24 Comparative results on Questionnaire 2

Question                                                                  Computer   Face-to-face   Both the same   χ²        N
                                                                          (%)        (%)            (%)
1 Which test did you feel more nervous taking?                            19.1       51.1           29.8            14.89*    94
2 Which test did you find more difficult?                                 26.3       31.6           42.1            3.68      95
3 Which test did you feel was fairer?                                     34.1       10.6           55.3            25.51*    85
4 Which test do you like better?                                          30.9       49.0           19.1            13.68*    94
5 Which test do you think reflected your English level more accurately?   11.6       50.5           37.9            22.51*    95
6 Which type of test do you prefer to take in the future?                 30.2       61.6           8.2             37.28*    86

Note. *p < .05.

     In the following, responses to each open-ended question in Questionnaire 2 are
categorized by means of content analysis, and the results for each question are reported
with examples of comments from examinees.

Q1. Which test did you feel more nervous taking?

     As can be seen in Table 4.24, a majority of the examinees (51.1%) felt more
nervous in taking the face-to-face test, while a minority (19.1%) perceived the computer
mode to be more anxiety-provoking. Table 4.25 presents the percentage of detailed
comments given for each option. Of those who gave comments on choosing the
face-to-face mode, 75% attributed their higher level of anxiety to the presence of the
interviewer. In contrast, comments from those who chose the computer mode focused
primarily, and surprisingly, on the facilitative role of reactions from the interviewer as
opposed to the one-sided nature of the computer mode (57%). The following are some
examples of the comments from the examinees:

     a. During the face-to-face test, I cared about the expression of the interviewer and
        worried about whether I was speaking well or not. But when taking the
        computer-delivered speaking test, I don’t need to face an interviewer directly,
        which made me feel quite relaxed. (Face-to-face)

     a. I felt assured during the face-to-face test when the interviewer gave reactions
        with eye contact and backchannels. However, when there was no reaction from
        computer, I felt somewhat tense. (Computer)

     a. Rather than to the computer, I prefer speaking in front of a human being, since
        I felt somehow she could understand what I was talking about. (Computer)

Table 4.25 Summary of comments on nervousness (Q1)

Reasons                                                  Frequency (N = 68)   Percentage (%)
Computer mode                                            28                   41
  a. no reactions from computer                          [illegible]          57
  b. being the first test taken                          [illegible]          [illegible]
  c. no opportunity to clarify unclear instructions      [illegible]          [illegible]
  d. distracted by the timer on the screen               [illegible]          [illegible]
  e. concerned with quality of recording with computer   [illegible]          [illegible]
Both the same                                            3                    4
  a. having the same task contents                       [illegible]          [illegible]
  b. no confidence in English speaking ability           [illegible]          [illegible]
Face-to-face mode                                        37                   54
  a. the presence of the interviewer                     [illegible]          75
  b. being the first test taken                          [illegible]          [illegible]
  c. unable to concentrate on practice                   [illegible]          [illegible]
  d. unable to practice loudly                           [illegible]          [illegible]


Q2. Which test did you find more difficult?

     According to Table 4.24, 42.1% of the examinees thought that the two tests were
not very different in difficulty. Table 4.26 indicates that among examinees who
commented on this question, those who chose “Both the same” gave as their main
reason the fact that the content and procedure of the two types of tests were the same
(86%). Here are some examples of the comments:

     a. Since the format of the two types of tests and instructions from the interviewer
        and the computer were all the same, I did not feel any difference in difficulty of
        the tests. (Both the same)

     a. There is no difference in task contents, so I feel the difficulty of the test itself
        does not differ. (Both the same)

Table 4.26 Summary of comments on test difficulty (Q2)

Reasons                                                        Frequency (N = 60)   Percentage (%)
Computer mode                                                  20                   33
  a. feeling nervous without any reactions from computer       [illegible]          [illegible]
  b. being the first test taken                                [illegible]          [illegible]
  c. unable to ask questions                                   [illegible]          [illegible]
  d. unfamiliar with recording their voices on a microphone    [illegible]          [illegible]
  e. unconfident to communicate with voice only                [illegible]          [illegible]
  f. feeling unmotivated to perform better                     [illegible]          [illegible]
Both the same                                                  21                   35
  a. having the same task contents                             [illegible]          86
  b. no confidence in English speaking ability                 [illegible]          [illegible]
Face-to-face mode                                              19                   32
  a. more anxiety-provoking                                    [illegible]          [illegible]
  b. being the first test taken                                [illegible]          [illegible]
  c. less control of the pace of test taking                   [illegible]          [illegible]

Q3. Which test did you feel was fairer?

     Table 4.24 shows that more than half of the examinees (55.3%) chose “Both the
same” for fairness of the two types of speaking test. It should also be noted that about
three times as many examinees felt that the computer mode was fairer as felt that the
face-to-face mode was fairer (34.1% vs. 10.6%). Table 4.27 summarizes the percentage
of detailed comments given for each option. As indicated in Table 4.27, the fact that the
test contents and procedures were the same across modes was the main reason
examinees chose “Both the same” (87%). In addition, examinees mentioned the absence
of influence by the interviewer in the computer test as the primary reason for its
fairness (72%). The detailed comments are as follows:

     a. No matter which type of speaking test it is, since the preparation and response
        time, the content of the tasks, and test conditions were the same, I think the
        fairness were the same. (Both the same)

     a. I think some interviewers may make the examinee feel nervous or relaxed. So,
        the atmosphere of the interviewer may well influence how the examinee
        performs in the face-to-face test. (Computer)

     a. The interviewer looked quite kind, so I was pretty relaxed during the
        face-to-face test. But when I took EIKEN in the past, the interviewer at that
        time looked quite harsh and unfriendly. So I was quite scared and couldn’t
        speak well at that time. (Computer)

Table 4.27 Summary of comments on test fairness (Q3)

Reasons                                                      Frequency (N = 48)   Percentage (%)
Computer mode                                                25                   52
  a. no influence of the interviewer                         [illegible]          72
  b. administration under the same conditions                [illegible]          [illegible]
  c. less anxiety-provoking                                  [illegible]          [illegible]
Both the same                                                16                   33
  a. having the same test contents and test procedures       [illegible]          87
  b. feeling no anxiety on both tests                        [illegible]          [illegible]
Face-to-face mode                                            7                    15
  a. having opportunity to talk to real people               [illegible]          [illegible]
  b. possible to clarify instructions with the interviewer   [illegible]          [illegible]


Q4. Which test did you like better?

     Table 4.24 indicates that almost half of the students (49%) reported that they
liked the face-to-face test better than the computer test, while 30.9% favored the
computer test. As can be seen in Table 4.28, comments regarding the affective appeal of
the face-to-face test focused primarily on the following: (a) it felt natural to talk in the
presence of the interviewer (27%); (b) it was less anxiety-provoking (20%); (c) it was
similar to a real communication situation (20%); (d) it was pleasant to talk with people
(16%); (e) it was possible to clarify unclear instructions with the interviewer (11%); and
(f) they felt motivated to use facial expressions and gestures to communicate (7%).
Those who chose the computer mode gave the fact that it was less anxiety-provoking as
the main reason (54%). They also mentioned better concentration (12%) and better
control of the pace of test-taking in the computer mode (12%). Comments from
examinees shed some light on these findings:

     a. When I speak in front of a person, I feel like talking to that person. So, I feel I
        could speak naturally in the face-to-face test. (Face-to-face)

     a. I like the face-to-face test because I feel someone is listening to what I am
        talking about. On the contrary, I feel lonely in the computer test. (Face-to-face)

     b. I feel more relaxed when I talk to someone than when I talk to a machine.
        (Face-to-face)

     e. I think it is practical to test how we speak when someone is present. Personally,
        I don’t like taking a test on the computer because I feel that computer is easy to
        break down. But in the face-to-face test, the interviewer can deal with problems
        that may happen during the test. So I feel it is more flexible. (Face-to-face)

     e. In the face-to-face test, when there is anything I don’t understand or I want to
        clarify, I could ask questions. The interviewer could help me cope with the
        situation. (Face-to-face)

     f. When the interviewer is present, I feel motivated to use non-verbal means such
        as facial expression and gesture to express what I want to say. (Face-to-face)

     a. I felt quite nervous in the face-to-face test. But in the computer-delivered
        speaking test, since I don’t need to talk in front of someone, I could keep calm
        and speak as usual. (Computer)

     b. During the computer test, there was no one around. So compared with the
        face-to-face test, I was able to concentrate on thinking and giving responses to
        the tasks. (Computer)

     c. I could give answers to the questions on my own pace in the computer test.
        (Computer)

Table 4.28 Summary of comments on affective appeal of the test modes (Q4)

Reasons                                                      Frequency (N = 78)   Percentage (%)
Computer mode                                                26                   33
  a. less anxiety-provoking                                  [illegible]          54
  b. better concentration on thinking and responding         [illegible]          12
  c. better control of the pace of test taking               [illegible]          12
  d. being a fair test                                       [illegible]          [illegible]
  e. being able to practice loudly                           [illegible]          [illegible]
  f. getting used to working on the computer                 [illegible]          [illegible]
  g. test-like                                               [illegible]          [illegible]
  h. having spare time during the tasks                      [illegible]          [illegible]
  i. not getting shy                                         [illegible]          [illegible]
Both the same                                                7                    9
  a. having the same task contents                           [illegible]          [illegible]
  b. having both advantages and disadvantages                [illegible]          [illegible]
Face-to-face mode                                            45                   58
  a. feeling natural to talk in front of the interviewer     12                   27
  b. less anxiety-provoking                                  9                    20
  c. similar to the real communication situation             9                    20
  d. pleasant to talk to real people                         7                    16
  e. possible to clarify instructions with the interviewer   5                    11
  f. motivated to use expression and gestures                3                    7


Q5. Which test do you think reflected your English level more accurately?

     As shown in Table 4.24, half of the students (50.5%) considered the face-to-face
test as giving a better representation of their actual English speaking level, whereas only
11.6% chose the computer mode. Table 4.29 presents the percentage of comments given
for each option. According to Table 4.29, those who gave comments on choosing the
face-to-face mode mainly believed that it was similar to a real communication situation
(55%) and that it was less anxiety-provoking (16%). Here are some examples of the
comments:

     a. In the situation of speaking English, it usually involves conversation between
        people. So, I think the face-to-face test, which is more similar to real
        communication than the computer test, can measure my English speaking
        ability more accurately. (Face-to-face)

     a. When we actually speak, there are always other people present to listen to us or
        talk to us. I think it is meaningless to talk to a computer. (Face-to-face)

Table 4.29 Summary of comments on test validity (Q5)

Reasons                                                      Frequency (N = 54)   Percentage (%)
Computer mode                                                11                   20
  a. less anxiety-provoking                                  [illegible]          [illegible]
  b. no influence of the interviewer                         [illegible]          [illegible]
  c. test-like                                               [illegible]          [illegible]
Both the same                                                5                    9
  a. having the same task contents                           [illegible]          [illegible]
  b. having both advantages and disadvantages                [illegible]          [illegible]
Face-to-face mode                                            38                   70
  a. similar to the real communication situation             [illegible]          55
  b. less anxiety-provoking                                  [illegible]          16
  c. possible to clarify instructions with the interviewer   [illegible]          [illegible]
  d. feeling natural to talk in front of the interviewer     [illegible]          [illegible]
  e. nonverbal performance should also be evaluated          [illegible]          [illegible]
  f. motivated to talk more                                  [illegible]          [illegible]
  g. providing good extent of pressure to concentrate        [illegible]          [illegible]


Q6. Which type of test would you prefer to take in the future?

     Table 4.24 revealed that nearly two-thirds of the examinees (61.6%) would prefer
the face-to-face test when given a choice, while only 30% of the students chose the
computer test. As indicated in Table 4.30, those who preferred the face-to-face test gave
the following reasons: (a) they felt less nervous (31%); (b) it was similar to real
communication (29%); and (c) it was pleasant to talk to real people (17%). Interestingly,
those who chose the computer test also mentioned feeling less anxious as the primary
reason (77%). Specific comments included the following:

     a. Since I felt quite calm in the face-to-face test, I think this type of speaking test
        suits me well. (Face-to-face)

     b. Although I felt a little nervous in the face-to-face test, I think it is similar to the
        actual communication situation where we need to talk to native speakers. Also,
        taking the test seems to be good practice. That’s why I prefer the face-to-face
        test. (Face-to-face)

     b. Although in the daily life, we have a chance to speak English in front of
        someone, we seldom need to talk to a computer. So the face-to-face test seems
        to be the more natural one. (Face-to-face)

     a. If the only purpose of taking the test is to pass it, I would prefer the
        computer-delivered speaking test because I didn’t feel very nervous during the
        test. (Computer)

     a. Since I felt very nervous during the face-to-face test, I prefer the computer test
        in which I could perform to the best as I usually do. (Computer)


Table 4.30 Summary of comments on preference of the test modes (Q6)

Reasons                                                          Frequency (N = 72)   Percentage (%)
Computer mode                                                    22                   31
  a. less anxiety-provoking                                      [illegible]          77
  b. being a fairer test without the interviewer’s influence     [illegible]          [illegible]
  c. because he (she) is good at it                              [illegible]          9
  d. better concentration                                        [illegible]          5
Both the same                                                    2                    3
  a. having both advantages and disadvantages                    [illegible]          100
Face-to-face mode                                                48                   67
  a. less anxiety-provoking                                      [illegible]          31
  b. similar to the real communication situation                 [illegible]          29
  c. pleasant to talk to real people                             [illegible]          17
  d. motivated to strive for better performance                  [illegible]          [illegible]
  e. possible to clarify instructions with the interviewer       [illegible]          6
  f. concerned with quality of recording with computer           [illegible]          4
  g. getting used to the face-to-face test format                [illegible]          4
