SUPPORTING ENVIRONMENTAL SCANNING AND ORGANIZATIONAL
COMMUNICATION WITH THE PROCESSING OF TEXT:
THE USE OF COMPUTER-GENERATED ABSTRACTS
by
ANDREW H. MORRIS, B.S., M.B.A.
A DISSERTATION
IN
BUSINESS ADMINISTRATION
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF PHILOSOPHY
Approved
May, 1988
copyright 1988 Andrew H. Morris
ACKNOWLEDGMENTS
The opportunity to undertake a task such as this is a
great privilege; not many people are given the freedom to
spend the time necessary, nor the guidance needed to
achieve it. I have been fortunate in both respects.
I would like to first acknowledge my wife, Rebecca,
who has supported me in many ways during these years, but
especially for allowing me the liberty of many hours
dedicated to this work. Her contribution was vital to the
realization of the goal.
Secondly, I want to express my appreciation to George
Kasper. He has been a good friend as well as a mentor, and
his contributions to this research are both numerous and
substantive. Much of what is good in the dissertation can
be attributed to him.
I would also like to express appreciation to those who
served on my committee or as readers, Drs. Peter Westfall,
Grant Savage, Ritch Sorenson, and Van Wood. Each has been
patient with me, and willingly gave their counsel when it
was needed.
TABLE OF CONTENTS
ACKNOWLEDGMENTS ii
ABSTRACT vii
LIST OF TABLES ix
LIST OF FIGURES xi
CHAPTER
I. INTRODUCTION 1
Research Overview 1
Problem Statement 4
Research Objectives 13
Definition of Important Terms 14
Chapter Outline 17
Summary of Introduction 18
II. LITERATURE REVIEW 19
Theory of Inquiring Systems 21
Leibnizian Inquiring Systems . . . . 21
Lockean Inquiring Systems 23
Kantian Inquiring Systems 25
Hegelian Inquiring Systems 26
Singerian Inquiring Systems . . . . 28
Application of Inquiring Systems to Information Systems Research . 29
Text-Based Information Systems 31
Computer-Mediated Communication
Systems 33
Document-Based Systems 40
Text-Based Decision Support
Systems 43
Automatic Abstracting 46
Abstracting Concepts 49
Natural Language Processing
Techniques 51
Automatic Extracting Techniques . . 53
Summary of Related Research 57
III. MODEL OF A TEXT-BASED DECISION
SUPPORT SYSTEM 59
Objectives of the System 59
System Processes 60
Singerian Component 66
Automatic Indexing and Abstracting . . . 67
Summary of System Features 69
IV. MODEL VALIDATION 70
Research Model 71
Research Question 74
Research Hypotheses 75
Treatments 79
Full Text Treatment 83
Abstract Treatment 84
Extract Treatments 84
An Extracting Algorithm 85
Experiment Design 91
Dependent Variables 92
Subjects 94
Procedures 95
Summary of Experimental Methodology . . . 97
V. ANALYSIS 99
Overview of Analysis 99
Analysis Related to Comprehension . . . . 106
Main Effects Analysis 107
Multiple Comparisons of Means . . . 111
Influential Test Items 113
Analysis Related to Reading Time . . . . 119
Main Effects Analysis 119
Multiple Comparisons of Means . . . 122
Analysis Related to Reading Difficulty . 126
Main Effects Analysis 128
Multiple Comparisons of Means . . . 130
Analysis of Information Availability . . 133
Main Effects Analysis 133
Multiple Comparisons of Means . . . 137
Summary of Analysis 139
VI. CONCLUSION 142
Implications of the Experimental Results 142
Implications for Text-Based
Information Systems 144
Implications for Organizational Management 146
Implications for Future Research . . . . 147
Limitations of the Research 149
Summary of Conclusions and Final Remarks 152
REFERENCES 153
APPENDICES
A. INSTRUMENTS USED IN EXPERIMENT 164
B. ADDITIONAL DATA TABLES 209
ABSTRACT
This research proposes a model text-based decision
support system designed to support the activities of
environmental scanning and organizational communication by
actively filtering and condensing text. To filter text-
based information requires the use of automatic routing
schemes; to condense text requires the use of computer-
generated abstracts or extracts. A key element in the
model system is the ability of the computer to condense
text by generating short abstracts of documents.
Two approaches to condensing text have been proposed:
(1) using natural language processing techniques to
construct a knowledge base of the document contents, from
which to write an abstract, and (2) employing algorithm-
based extracting systems to generate extracts of important
sentences and phrases. Systems using natural language
techniques are still being researched; most are successful
only in limited domains. Systems using extracting
algorithms have been researched, but have not been applied
to the problem of information overload in an organizational
decision-making context. These two approaches were tested
in a laboratory setting with student subjects.
An algorithm for generating extracts was developed
based on the combined work of previous researchers, and
tested against an expertly written abstract such as might
be constructed by a non-domain specific artificial
intelligence system if one is developed in the future.
Results of the study indicate that there was no difference
in comprehension of the documents when the information was
presented with the full text, by extract, or by abstract.
These results demonstrate that an algorithm for computer-
generated extracts can be successfully applied to text,
reducing reading time and document length without
significantly reducing comprehension of the information
contained in the original text.
LIST OF TABLES
4.1 Order of treatment and passage pairs in the experimental design 93
5.1 Sample means and standard errors by treatment and by passage for four dependent variables . . 102
5.2 Frequency tables for comprehension score results 104
5.3 Main effects analysis for comprehension score . 108
5.4 Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment comprehension score means 112
5.5 Correct responses to individual test items by treatment 114
5.6 Main effects analysis for reading time . . . . 121
5.7 Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment reading time means 124
5.8 Number of words and percentage reduction in text by passage 127
5.9 Main effects analysis for reading difficulty . 129
5.10 Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment reading difficulty scale means . . . 132
5.11 Main effects analysis for information availability 135
5.12 Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment information availability scale means 138
5.13 Summary of hypotheses tests 141
B.1 Fog indices for full text treatment passages . 210
B.2 Comprehension score results by subject for passage A 211
B.3 Comprehension score results by subject for passage B 212
B.4 Comprehension score results by subject for passage C 213
B.5 Comprehension score results by subject for passage D 214
B.6 Comprehension score results by subject across treatment 215
B.7 Comprehension score results by subject across passage 216
B.8 Reading time results by subject across
treatment 217
B.9 Reading time results by subject across passage 218
B.10 Reading difficulty results by subject across treatment 219
B.11 Reading difficulty results by subject across passage 220
B.12 Information availability results by subject across treatment 221
B.13 Information availability results by subject across passage 222
B.14 Mean and standard error by passage controlling for treatment for four dependent variables . . 223
B.15 Mean and standard error by treatment controlling for passage for four dependent variables 224
LIST OF FIGURES
1.1 Information flows in management planning
and control 7
3.1 Model of a text-based decision support system . 62
4.1 A behavioral model of CMCS performance . . . . 72
5.1 Plot of residual errors versus predicted values for comprehension score model 110
5.2 Plot of residual errors versus predicted values for reading time model 123
5.3 Plot of residual errors versus predicted values for reading difficulty model 131
5.4 Plot of residual errors versus predicted
values for information availability model . . 136
A.1 Experience and background questionnaire . . . . 165
A.2 Instructions displayed by computer program
to subjects 168
A.3 Passage A--full text treatment 169
A.4 Passage A--long extract treatment 171
A.5 Passage A--short extract treatment 172
A.6 Passage A--abstract treatment 173
A.7 Passage A--comprehension test questions . . . . 174
A.8 Passage B--full text treatment 176
A.9 Passage B--long extract treatment 178
A.10 Passage B--short extract treatment 179
A.11 Passage B--abstract treatment 180
A.12 Passage B--comprehension test questions . . . . 181
A.13 Passage C--full text treatment 183
A.14 Passage C--long extract treatment 185
A.15 Passage C--short extract treatment 186
A.16 Passage C--abstract treatment 187
A.17 Passage C--comprehension test questions . . . . 188
A.18 Passage D--full text treatment 190
A.19 Passage D--long extract treatment 192
A.20 Passage D--short extract treatment 193
A.21 Passage D--abstract treatment 194
A.22 Passage D--comprehension test questions . . . . 195
A.23 Reading difficulty scale 197
A.24 Information availability scale 198
A.25 Experiment program header file listing . . . . 199
A.26 Main experiment program listing 200
CHAPTER I
INTRODUCTION
Research Overview
The optimal amount of information needed in a given
decision-making situation lies somewhere along a continuum
from "not enough" to "too much." Over two decades ago,
Ackoff (1967) pointed out that management information
systems (MIS) will often hinder the decision-making process
by creating information overload. To deal with this
problem, he called for systems that could filter and
condense data so that only the relevant information reached
the decision-maker's desk. In the years since Ackoff's
challenge, the rapid growth of the information processing
industry has reinforced the importance of filtering and
condensing data. However, as was quickly pointed out by
Rappaport (1968) in a rebuttal to Ackoff's article, there
also exists the danger of over-filtering and over-
condensing data. When this happens, information at the
decision-maker's level is restricted to only that data
which the agent responsible for filtering and condensing
deems relevant.
The potential for information overload is especially
critical in text-based information (Hiltz and Turoff,
1985). Communication is arguably the most critical process
in organizations (Culnan and Bair, 1983); it is a vital
supporting routine which permeates the process of solving
unstructured problems (Mintzberg, et al., 1976). On the
other hand, the amount of attention that managers can
devote to information in any form is a scarce resource, and
an organization can only perform a finite amount of
information processing (Simon, 1973a). Computer-mediated
communication systems (CMCS) are rapidly becoming
commonplace (Valle, 1984; Kerr and Hiltz, 1982), and just
as the computer has the ability to rapidly and efficiently
process large amounts of data, CMCS can result in an
increase in the volume of text messages among
organizational members. Unless filtering and condensing
tools are developed, this increase can place an excessive
strain on the limited resource of managerial attention
(Denning, 1982).
In addition to the problem of increased communication
within the organization which accompanies computer-mediated
communication systems, external sources of text-based
information are becoming of greater importance to the
management of the firm (Lenz and Engledow, 1986a). Any
organization that does not engage in an ongoing process of
examining the environment to determine the conditions under
which it must operate invites disaster (Mitroff, 1985), and
yet because of the limited amount of attention that can be
given to external information (Simon, 1973a), managers are
forced to restrict their scanning activities to a small
subset of the potential information sources (El Sawy,
1985).
Thus text-based information, from both within and
without the firm, threatens to overload the capability of
an organization to effectively process that information. As
indicated by Ackoff (1967), information overload needs to
be addressed by systems designed to filter and condense.
However, most MIS research and implementation has ignored
the potential for supporting managers with the processing
of text, and has focused instead on quantitative data,
particularly accounting or financial data (Huber, 1984;
Ariav and Ginzberg, 1985). Strategic decision making,
however, relies heavily on qualitative, text-based data
(Schwenk, 1984; Blair, 1984b), and the potential pay-off in
more effective and efficient decisions that could be
realized with the support of systems designed for filtering
and condensing text is great.
The purpose of this research is to present a model of
a text-based decision support system. The system supports
the scanning of the external environment and the flow of
communication within the organization by applying computer-
based filtering and condensing techniques to text. These
techniques were developed by information scientists
originally for maintaining the familiar secondary source
literature databases, and have been largely ignored by MIS
research. In particular, the utility of computer-generated
abstracts or extracts as a means to condense text-based
information has received little attention by researchers or
practitioners in the field of organizational information
systems. Since an important feature of the model system is
the ability to generate abstracts or extracts of text-based
information, this research presents a computer algorithm
for creating extracts which was developed and empirically
tested to determine if existing techniques could be used in
the implementation of the system.
Problem Statement
The goal of all information systems is to improve the
performance of knowledge workers in organizations (Sprague,
1980). Implied in this goal are both a criterion for
evaluating the success of a particular system and a
challenge for continued research and development. We can
examine the success of a system in terms of the performance
of the system's users, and we can continue to develop new
applications of information technology aimed at further
improvements in performance.
MIS research in recent years has focused on Decision
Support Systems (DSS), which were defined by Ginzberg and
Stohr (1981, p.8) as systems "used by decision makers to
support their decision-making activities in situations
where it is not possible or desirable to have an automated
system perform the entire decision process." This focus
has evolved with the recognition that the essence of
management is decision-making (Simon, 1960), and that many
decisions are so ill-structured and unique that an
automated decision-making system cannot or should not be
developed. The concept of DSS is to provide tools to
support and augment the decision-making process; in other
words, to improve the performance of the class of knowledge
workers whose primary responsibility is decision making
(i.e., managers).
The model of organization management described by
Anthony (1965) has been widely used by MIS/DSS theorists as
a framework for understanding the role of information
systems in the context of the organization. In Anthony's
model, there are three levels of management: operational
control, management control, and strategic planning. The
operational control level is the lowest management level,
which functions as a direct control over the day-to-day
operations and transactions of the firm. The next highest
level, management control, allocates resources to meet
objectives, controls budgeting, and the like. The time
horizon of the management control level is much larger than
the day-to-day focus of operational control, and may be
measured in months or perhaps a few years. At the apex of
the management structure is the strategic planning level,
which sets organizational objectives and long-range plans.
Anthony pointed out that at the strategic planning level,
the required tasks often involve novel, unique challenges
which are more likely to depend upon information from
sources external to the organization than do those tasks
associated with the lower levels. One of the stated goals
of DSS is to support tasks associated with the highest
level of Anthony's pyramid (Gorry and Scott Morton, 1971;
Sprague and Carlson, 1982; Bonczek, Holsapple and Whinston,
1981).
Building on Anthony's work, Swanson and Culnan (1978)
presented a model of information flows in an organization
(see Figure 1.1). In the figure, the transaction
processing level interacts with the environment external to
the organization through the day-to-day operations. This
activity creates the data that are summarized and passed up
the pyramid of control levels. Within the management
pyramid itself, there is a constant flow of information and
data among the decision makers and managers. The strategic
planning level also interacts with the environment, in
order to be aware of changing forces or developments that
may affect the business of the firm.
[Figure 1.1. Information flows in management planning and control (taken from Swanson and Culnan, 1978): a pyramid of management levels (strategic planning, management control, operational control, and transaction processing) exchanging information with the strategic environment and with other organizations and individuals.]
Environmental scanning is a necessary and critical
function of strategic management. At the highest level of
the management pyramid a greater percentage of the
information necessary for decision making must come from
external sources (Anthony, 1965). External environmental
scanning has received considerable attention in recent
years; in fact, many corporations have established
strategic scanning units (Lenz and Engledow, 1986a, 1986b).
However, there exists an overwhelming and growing
number of potential sources for external environmental
information. As a result, the manager is forced by the
constraints of time to limit his or her scanning to a
selected group of favored sources (El Sawy, 1985). It has
been shown that executives in high performing organizations
engage in more effective and broader scanning activities
than do those in low performing companies (Daft, et al.,
1987). If an information system could "pre-scan" the
potential sources, by automatically filtering and
condensing the information, then managers would be able to
reduce the time spent in scanning activities, increase the
number of information sources covered, and better focus
their scanning efforts.
Just as external sources are proliferating, the
intensity of internal communication threatens to result in
another source of information overload (Hiltz and Turoff,
1985). As more and more organizations install computer-
mediated communication systems (CMCS) (Kriebel and Strong,
1984), we can expect the overload of internal communication
to grow. Denning (1982) described this increase of
unwanted computer mail as "electronic junk." Communication
is vital to the decision-making process (Mintzberg, et al.,
1976), but tools to control electronic communication need
to be in place to prevent this potential deluge (Hiltz and
Turoff, 1985).
To deal with information overload, Ackoff (1967)
suggested the use of systems that filter and condense. To
filter information is to limit the received information by
discarding irrelevant information; to condense is to take
the relevant information and reduce it to a more compact,
summarized format.
In the case of the external and internal sources of
text described above, filtering would involve pre-scanning
for coverage of relevant topics, and directing documents to
the appropriate recipients. This could make use of
automatic routing schemes (Tsichritzis, 1984), perhaps
based on system-maintained profiles of the users' areas of
interest (Ewusi-Mensah, 1981; Kasper and Morris, 1986).
There exist automatic indexing systems (van Rijsbergen,
1979; Dillon and Gray, 1983) which are able to classify
documents according to the topics addressed within them.
The techniques in these systems can be utilized to match
documents to the needs of the users based on the users'
interest profiles.
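To illustrate the kind of matching described above, the following is a minimal, hypothetical sketch of routing documents to users by comparing each document's index terms against user interest profiles. It is not drawn from any of the cited systems; all names and data are invented for illustration.

```python
def route_documents(documents, profiles):
    """Route each document to every user whose interest profile
    shares at least one term with the document's index terms.
    A toy stand-in for automatic routing; real systems would use
    much richer matching than simple keyword overlap."""
    routed = {user: [] for user in profiles}
    for doc_id, index_terms in documents.items():
        terms = {t.lower() for t in index_terms}
        for user, interests in profiles.items():
            if terms & {i.lower() for i in interests}:
                routed[user].append(doc_id)
    return routed

# Hypothetical example: two managers with simple keyword profiles.
profiles = {"manager_a": ["merger", "regulation"],
            "manager_b": ["pricing"]}
documents = {"doc1": ["Merger", "antitrust"],
             "doc2": ["pricing", "demand"]}
print(route_documents(documents, profiles))
# -> {'manager_a': ['doc1'], 'manager_b': ['doc2']}
```

In a production setting the index terms would come from an automatic indexing system and the profiles would be maintained by the system itself, as the text suggests.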
Condensing text-based data is more difficult to
achieve: in order for a human processor to condense text
information (e.g., create an abstract of the document), he
or she must understand the essence of both the directly
stated as well as the implied, contextual meaning
(Cremmins, 1982). Teaching the computer to "understand"
the content of a document is not an easy task (van
Rijsbergen, 1979).
On the other hand, it may not be necessary to wait
until natural language processing (NLP) systems can mimic
the human's abstracting capabilities before systems to
condense text-based information can be employed. NLP
systems are based on recent developments in artificial
intelligence (AI) research; however, there are systems for
condensing text which do not use AI. Luhn (1958) pioneered
these systems by developing a method for generating
"automatic abstracts" (actually extracts of important
sentences) based on the frequency of word-stems in a
document. This method has been refined and extended (Earl,
1970; Edmundson, 1969; Rush, et al., 1971; Mathis, et al.,
1973; Pollack and Zamora, 1975; Paice, 1981) and results
have been achieved that may be suitable for use in a text-
based DSS designed to support scanning and communication
activities in business. Borko and Bernier (1975) suggested
that these automatic abstracts should preferably be called
"computer-generated extracts," to distinguish them from
abstracts created by intelligent reading and writing;
nevertheless, the term automatic abstract is commonly used
in the literature.
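The core of the frequency-based approach can be sketched in a few lines. The following is a simplified, hypothetical rendering of the idea: score each sentence by the frequency of its significant words, then extract the top-scoring sentences in document order. Luhn's actual method worked on word stems and rewarded dense clusters of significant words, so this is an illustration of the principle, not his algorithm.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is",
             "that", "it", "on", "for"}

def luhn_extract(text, n_sentences=2):
    """Return an extract of the n_sentences highest-scoring
    sentences, where a sentence's score is the summed document
    frequency of its significant (non-stopword) words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sent):
        toks = re.findall(r"[a-z]+", sent.lower())
        return sum(freq[t] for t in toks if t not in STOPWORDS)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)

text = ("The market grew. The market grew quickly last year. "
        "Unrelated filler sentence here.")
print(luhn_extract(text, n_sentences=1))
# -> The market grew quickly last year.
```

Because sentences are selected verbatim, the result is an extract in the sense defined later in this chapter, not an abstract.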
In the prior published research on automatic extracts,
a common difficulty has been that the extracting algorithms
tend to work well only in a certain domain (Pollack and
Zamora, 1975; Paice, 1981). Part of the reason for this
may well be that the researchers were focusing on the
problem of generating extracts of journal articles in the
scientific literature, for use within the secondary source
database industry. To increase the usefulness of an
extracting algorithm, however, it should be parameterized
to the extent that it could be applied in many different
settings and to different types of text sources. Given the
flexibility of modern computer hardware, multiple versions
of extracting algorithms could be maintained within one
system and selectively applied as indicated by the
characteristics of the source documents.
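Such parameterization might look like the following hypothetical sketch, which combines weighted extracting features in the spirit of Edmundson's cue-word, title-word, and positional methods. The weights and the position formula are illustrative assumptions, not any published algorithm; tuning the weights is what would adapt the scorer to different types of source documents.

```python
def extract_score(sentence_index, n_sentences, tokens,
                  cue_words, title_words,
                  w_cue=1.0, w_title=1.0, w_position=1.0):
    """Score one sentence as a weighted sum of three features:
    count of cue words, count of title-word overlaps, and a
    positional bonus favoring sentences near the beginning or
    end of the document. Weights parameterize the algorithm."""
    cue = sum(1 for t in tokens if t in cue_words)
    title = sum(1 for t in tokens if t in title_words)
    # Distance from the nearer end of the document, scaled to [0, 1].
    edge_dist = min(sentence_index, n_sentences - 1 - sentence_index)
    position = max(0.0, 1.0 - edge_dist / n_sentences)
    return w_cue * cue + w_title * title + w_position * position
```

One system could then maintain several weight settings (for news stories, internal memos, journal articles, and so on) and select among them based on the characteristics of each incoming document.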
Automatic extracts have never been used in any
production system in the secondary source databases
(Cremmins, 1982; Borko and Bernier, 1975), nor have they
been tested or applied to the problem of information
processing in the management decision-making domain. Their
possible utility for condensing text-based information in a
business document base remains unexamined. Further, the
extracting algorithms may have utility as an important
intermediary for NLP systems: when the extracts are first
generated, there exists the possibility that they will seem
disjointed and difficult to read if they consist of
sentences selected from diverse parts of the document. NLP
techniques can be applied to the extracting algorithm to
make the extract more readable and improve its quality; one
successful system was able to do this through structural
analysis (Mathis, et al., 1973). Since the extracting
approach uses considerably less processing than the NLP
techniques, a successful combination of the two methods
provides a text-condensing system which is more cost-
effective than unaided NLP techniques, while at the same
time providing a more readable extract than is possible
with simple extracting.
To recap the problems discussed in this section,
consider the following: (1) the goal of DSS is to support
decision making (i.e., managing), particularly for the less
well-structured decisions which are commonly found in the
upper management levels, (2) text-based information
processing has been largely overlooked by most MIS/DSS
research as a potential tool for supporting decision making
even though it is one of the major activities of managers,
(3) existing systems which do support the flow of text-
based information are passive, and do not contribute much
to the activities of management in the decision-making
process, and (4) the amount of external and internal
communication flow in organizations has created a potential
for information overload. To address these problems, a
system designed to support the communication and text-based
information flow that is vital to decision making and which
can meet the goals of filtering and condensing information
offers great potential for improving the performance of the
knowledge workers at the higher levels of the management
pyramid.
Research Objectives
The objectives of this research are as follows:
1. A system designed to support environmental
scanning and organizational communication activities is
presented. Information in the form of text will provide
the basis for the system, and the goals of filtering and
condensing will be achieved by the use of automatic
indexing, routing, and extracting techniques. As part of
this system, an extracting algorithm which combines
features of previous extracting systems and makes limited
use of existing NLP techniques to improve the effectiveness
and readability of the extracts is developed and presented.
2. To investigate the effectiveness of computer-
generated extracts, and thus to validate the model system,
an empirical study was performed. The study compares the
comprehension of text when presented in the complete
document (the control treatment) with that achieved through
abstracts written by an expert and extracts produced by a
simulated computer algorithm. Results of the study provide
information regarding the potential utility of automatic
extracting and abstracting techniques in supporting the
scanning and communication activities of management.
Definition of Important Terms
In this section, definitions of important terms are
presented, beginning with terms taken from the title of the
dissertation. Definitions presented here will be used
consistently throughout the remaining chapters.
Environmental scanning is an activity in which the
managers of an organization seek information about the
events, trends, and relationships in an organization's
external environment, the knowledge of which helps identify
potential problems or opportunities (El Sawy, 1985). A key
concept behind scanning activities is the realization that
the organization is an open system, subject to constantly
changing and undetermined inputs, and therefore must pay
attention to the environment in which the system operates.
Organizational communication is one of the most basic
activities of any organization. For the purposes of this
research, it is defined as the flow of decision-relevant
data among the decision-makers (i.e., the managers of the
organization). Studies have shown that as much as 90% of
managers' time is spent in communication activities, either
verbal or written (Mintzberg, 1973; Rice and Bair, 1983;
Kurke and Aldrich, 1983).
Text-based information is information presented to
decision makers in the form of natural language text. This
is opposed to the structured data found in typical MIS/DSS
systems, whether that data is numeric or alphanumeric.
Only unstructured natural language is considered text-based
information for purposes of this research.
Text processing in this research is limited to those
systems which process text after it has been created, and
which treat text-based information as data which may be
manipulated by the computer. This excludes such systems as
text-editing systems, word processing systems, spelling and
grammar-checking systems, and so forth. These systems are
analogous to the transaction processing systems in business
which generate large amounts of quantitative data; just as
transaction processing systems generate the raw data for
the data-based decision support systems and management
information systems, the text-editing systems generate the
raw data for text-based decision support systems and
communication systems. For the purposes of this research,
text-editing systems are not considered in the discussion
of text-processing techniques.
Computer-generated abstracts and extracts are defined
as follows. In this paper, the term "extract" will mean a
set of sentences or phrases selected verbatim from a
document (allowing for possible minor transformations on
the words in the selected phrases), while the term
"abstract" will be used to indicate a summary or
condensation intended to convey the meaning of a document
and written by the author or some other intelligent agent.
As mentioned above, there has been inconsistent use of
these terms in the literature, and the term "automatic
abstract" has often been applied to computer-generated
extracts.
Computer-mediated communication systems (CMCS) are
systems designed to support communication among individuals
and groups in an organization or organizations, and which
use the computer as the medium or channel for communication
(Kerr and Hiltz, 1982). In simple CMCS, the computer may
act as does a postal system, merely passing the message
faithfully from sender to recipient. This is typically the
case in electronic mail systems. However, the use of the
computer as a medium allows us to consider the
possibilities for intelligently processing the message,
inserting pre-programmed logic to aid and enhance the
communication. For the purposes of this research, the term
CMCS is intended to include those systems which are capable
of more than simple message transmittal.
Chapter Outline
Chapter II presents a selected review of important
related research. The model of a text-based decision
support system which supports environmental scanning and
organizational communication with the processing of text
follows in Chapter III. In Chapter IV, the experiment
designed to validate the model by testing the extracting
and abstracting techniques is presented, along with a
research model which has applicability to other CMCS
research questions. The analysis of the data collected in
that experiment is presented in the following chapter.
Last, there is a final chapter for the presentation of the
conclusions and implications of this research.
Summary of Introduction
In the first part of this chapter, the idea of
filtering and condensing to reduce the problem of
information overload is recalled and applied to text-based
information systems. The goal of decision support systems
has been to address needs of top management; however, their
promise has been largely unfulfilled. This chapter
presents the concept of supporting upper management
decision-making by actively filtering and condensing text-
based information. Both environmental scanning and
organizational communication, which represent the internal
and external text-based information needed in decision
making, are given as examples of functions which can be
supported with this concept. This chapter also presented
the research objectives and definitions of important terms,
and concluded with an outline of the following chapters.
CHAPTER II
LITERATURE REVIEW
In this chapter a selected review of related research
is presented. The chapter is divided into three sections.
First, the theory of inquiring systems is discussed, which
forms the theoretical basis for the design of the model
system presented in this research. Second, there is a
review of the important related research in text-based
information systems, which provides many of the tools and
techniques that are incorporated in the model system. The
final section of this chapter discusses the research in
computer-generated abstracts and extracts, an important
capability of the model system. The prior research in
extracting and abstracting provides a basis for the
extracting algorithm presented and tested by this research
as a partial validation of the model system.
The theory of inquiring systems as presented by
Churchman (1971) provides a useful framework for developing
systems to support organizational management in a complex
and ever-changing world. In the section that follows, each of Churchman's inquiring system designs is presented, and important concepts for applying inquiring systems theory to the development of management information systems are discussed. These concepts
provide a basis for considering the structure and design of
the system described in the following chapter.
The second section of this chapter reviews the
research on text-based information systems. Text-based
systems are easily divided into two functional categories:
systems that support communication of text-based
information, and systems that support document archiving
and retrieval. The additional category of text-based
decision support systems is introduced and discussed, and
the model system presented in the following chapter is
shown to belong to this category.
An important part of the model system presented in
this research is the ability to filter and condense text-
based information by indexing and abstracting/extracting
documents. Computer-based indexing has been researched
more extensively than computer-based abstracting or
extracting (Paice, 1981): automatic indexing and
classification systems exist and have been described in the
literature (Smeaton and van Rijsbergen, 1986; Dillon and
Gray, 1983). The model system uses automatic indexing
techniques as a filtering mechanism to eliminate irrelevant
information. In order to condense the information in the
relevant documents, the system uses automatic abstracting
and/or extracting techniques. The last section of this
chapter presents the research on computer-generated
abstracts and extracts.
Theory of Inquiring Systems
The methodology and the philosophical basis of
"inquiry" (any systematic investigation or search for
knowledge or information) has been examined by Churchman
(1971). He identifies five approaches to inquiry, each of
which emanates from the writings of a different
philosopher. These five "inquiring systems," identified by
the name of the philosopher, are briefly described below.
The descriptions included here are condensed from
Churchman's book (1971), which the reader is encouraged to
examine for greater detail and background.
Leibnizian Inquiring Systems
In the Leibnizian inquiring system, rationalism is the
key to the discovery of truth. The inquirer has the innate
ability to separate the tautologies from the non-
tautologies, truth from error, as well as to identify
"contingent truths," data which are neither true nor false.
These latter contingent truths are built into "fact nets"
by the process of logically connecting them through their
implications. Those contingent truths that are implied by
many others lie at the bottom of the fact net; thus they
are the best candidates to become identified as
tautologies. To challenge one of these requires the
willingness to reconsider the entire net, for if a critical
assumption is proven false, all the statements which imply
its truth must also be false. On the other hand,
contingent truths at the top of the net are the "most
contingent"; if they are proven false, there is little
consequence to the overall model. The process of inquiry
seeks to build these fact nets by deduction. A Leibnizian
inquirer is a model builder, a theory-driven rationalist.
An important aspect of the Leibnizian system of
inquiry is the assumption of an ultimate solution. All
fact nets must converge to the optimum, and the system
must be capable of ranking and classifying the model units
(e.g., the well-formed formulas, the fact nets) in such a
way as to eventually arrive at the optimum solution and to
know that it has arrived.
Mason and Mitroff (1973), in commenting on Churchman's
inquirers, describe the Leibnizian system as being best
suited for well-structured problems. In such cases an
algorithm or mathematically derived model (as in certain
operations research problems) may be thought of as a "fact
net" which converges to the optimum solution. This type of
system relies on such notions as completeness, internal
consistency, the specification of a mathematical proof, and
the like to validate the correctness of its conclusions.
Churchman comments that much of the practice of modern
science can be perceived as a Leibnizian inquirer. By this
he means that in spite of their professed objectivity, a
group of researchers will tend to build upon established
theory (the accepted fact net) and resist any challenge
from findings that lie outside the prevailing dogma,
particularly if the challenge is to a contingency at the
bottom of the net. Surely such a reluctance to question
the status quo is risky business for any organization, no
matter how stable the environment.
Lockean Inquiring Systems
Where the Leibnizian inquirer is a rationalist, the
Lockean is an empiricist. Mason and Mitroff (1973) call
the Lockean system "data based" as opposed to "model
based." The Leibnizian builds interlocking sets of
contingencies based on deductive reasoning, while the
Lockean builds data banks of direct observations and
inductively derives the appropriate generalizations. But
without the rigor of a theoretical model, upon what can the
system rely to validate its conclusions? To the Lockean,
the answer is consensus. If a group of inquirers are all
in agreement, the conclusion must be valid. Thus an
empirically derived inference is felt to be true if it is
agreed upon by the other members of the "Lockean
community."
A Lockean inquirer would claim to approach all
questions as a "blank tablet," only able to make direct
observations and store them into memory. This only allows
the system to make statements that are in the indicative
mood; there is no theoretical discussion of what "ought" to
be, only what "is." Of course, one such inquirer has no
validity acting alone; there must be agreement among a
community of experts that these indicative statements are
correct. When there is disagreement among the community,
then the question is reconsidered by the members until the
majority is sufficiently large to overwhelm the dissent.
Agreement plays the important (and dangerous!) role of
terminating the inquiry.
On closer examination, however, it becomes clear that
the concept of a "blank tablet" is impossible: a true
empiricist cannot exist. There must be some minimal set of
innate ideas, some "given" which enables the inquirer to
make observations, apply labels, make inferences, and draw
inductive generalizations. These basic ground-rules upon
which the system builds its inquiry must be built into the
system from the start.
There is another way in which pure empiricism is
impossible: there is too much data. No question, no
matter how small or seemingly insignificant, could ever be
decided if the inquirers first insisted upon examining all
the relevant data. As Churchman says, "such phrases as
'thorough examination of the facts,' 'study of all aspects
of the situation,' are sheer nonsense on the face of it"
(p. 120). The inquiring system must somehow select a
subset of the data from an infinitely large set of
possibilities and draw its conclusions there. How the
subset is chosen depends upon the "given."
Kantian Inquiring Systems
The concept of the "given" takes on much greater
significance in the Kantian system. In the Lockean
community, the inquirers desire to minimize any reliance on
the innate, depending instead upon direct observation of
the "hard facts" of the real world. However, this is an
unobtainable ideal. As Churchman observes, the mere
ability to examine data implies the existence of built-in,
a priori tools. Further, the process of drawing inference
from the data base would not be possible without the a
priori.
In Kantian inquiring systems the a priori is a model
or representation of the world, which acts as a lens
through which the data are viewed. A key question, of
course, is which model is most appropriate for a given
problem. There may be many such models, as many as there
are ways of looking at the world. We can imagine an
inquiring system which has the ability to store many
models, and is able to bring these to bear upon a problem.
By using each model or representation as an analogy to the
real world, the system's executive can decide which is the
richest and most productive approach to the problem.
Thus the Kantian system is both Leibnizian and
Lockean, in that the models or representations are tested
by examining their assumptions and predictions empirically.
The validity of the conclusions of the system is based on
this combination of theory and data: by looking at the
problem from many points of view and testing the data
within the context of each view, we identify the "best
fit." This ability to choose from among the alternative
points of view implies the existence of another point of
view, that of the observer, a part of the system that
observes the process of inquiry as it attempts to fit the
data to the model, and decides when the inquiry has reached
a stopping point.
Hegelian Inquiring Systems
To the Hegelian inquirer, these alternative points of view (Weltanschauungen) are more than just competing theories; they become the deadliest of enemies. This is the
thesis-antithesis approach to inquiry. In the Hegelian, a
world-view is chosen so as to maximize the likelihood of
the thesis, given the observed data. (Note that the
Hegelian is thus similar to the Kantian, containing both
Leibnizian and Lockean components.) In other words, the
inquirer finds the way of looking at the data such that the
thesis is best supported. This view is challenged by the
antithesis, the most likely opposing viewpoint which also
explains the data.
The conflict between opposing interpretations exposes
the weaknesses and unwarranted assumptions of each. As in
the Kantian, the observer (or decision-maker) is able to
develop a separate point of view as the debate progresses.
But by observing the conflict, the observer is able to
develop a higher point of view, one that explains and
understands the conflicting arguments and their resolution.
Churchman calls this view the synthesis. In the Hegelian,
the conflict validates the synthesis.
In one sense, the process of Hegelian inquiry can
continue indefinitely. As the synthesis rises above the
conflict, it develops a Weltanschauung of its own: the
synthesis replaces the thesis. The new thesis can then be
challenged by its own antithesis, and the process
continues.
Singerian Inquiring Systems
The key concept in Singerian inquiry is that the
system can never arrive at the final truth. Inquiry must
continue, with no terminating point. Churchman uses the
term "partitioning" to describe this process, by which he
means that whenever a point of agreement is reached, the
problem needs to be partitioned into problem subsets, a
finer level of inquiry. In other words, if the problem
seems to be solved, you must not be looking at the problem
closely enough. This partitioning of a "solved" problem
continues until it is clear that the problem is not solved,
until the hypothesis is no longer consistent with the data.
When that point is reached, there are three alternatives:
(1) revise the hypothesis (perhaps by including previously
ignored factors, or perhaps by rearranging the factors
already considered), (2) revise the method of examining the
data (possibly by discarding the conflicting data), or (3)
search for more evidence until the nature of the
inconsistency becomes clear (tolerating the inconsistency
for the time being). Thus inquiry is a never-ending
process, a constant challenge to the status quo.
At first glance, such an approach may seem unsettling,
especially to those who prefer to view the world as would a
Leibnizian inquirer. But the strength of this approach is
its rebuke of complacency. As Mitroff (1985) has pointed
out, nothing is more likely to breed failure than a
complacent attitude toward the underlying assumptions on
which a system operates. If an idea is thought to possess
perfect validity, why question it? If our Leibnizian
inquirer has discovered the ultimate reality, why look any
further? Mitroff must have been thinking of the Singerian
inquirer when he said: "Thinking rationally in today's
world basically means... engaging in a continually ongoing
process of challenging one's assumptions" (Mitroff, 1985,
p. 198).
Application of Inquiring Systems to Information Systems Research
The brief discussion above necessarily simplifies and
limits Churchman's work, which he applies to a much broader
scope than systems that support decision-making and
management in an organization. Nevertheless, we can gain
insight into the development of such systems by applying
Churchman's theoretical ideas in that context.
Mason and Mitroff (1973) applied the inquiring system
concepts to problems at either end of the structured-
unstructured continuum: well-structured problems were seen
as Leibnizian or Lockean, while ill-structured problems
required the use of the Hegelian or Singerian approach, in
their view. As pointed out by Ginzberg and Stohr (1981),
the role of a DSS is to bring as much structure to bear on
an ill-structured problem as the problem will allow. In
essence, the process of problem-solving is a process of
creating structure where there was none. If a problem is
moved from the ill-structured end of the continuum to the
well-structured end, then it has been "solved," in the
Leibnizian sense (Simon, 1973b). In other words, if a
problem can have enough structure built into it so that it
can be considered solved, then the Leibnizian approach has
been applied and will work well, at least as long as the
underlying assumptions of the model hold.
Another approach to the application of Churchman's
inquirers to the problem of management of organizations is
to consider them in the context of the steps or phases in
the decision-making process. For example, the intelligence
phase (Simon, 1960) requires a Lockean, data-gathering
activity. At the same time, however, we recognize that
Kantian models are being employed in the process of
gathering data, and models may be challenged (lest they
fail to describe the world). Since the world constantly
changes, a model may prove to be inadequate even if it was
appropriate before the changes occurred; thus the Hegelian
or Singerian approach should be considered.
The activity of environmental scanning is a process
that is Lockean in the sense that it requires empirical,
data-gathering activity, yet it should be recognized that
the observers (the scanning team) incorporate models (in
the Kantian sense) through which the data are filtered.
Research has shown that high-performing firms employ a
greater degree of scanning activity than others (Daft, et
al., 1987), and yet, due to the constraints of time, scanning is necessarily limited (El Sawy, 1986). Systems which are
designed to support the scanning activity will contain
elements of the Lockean inquirer, allowing the collection,
organization, and drawing of generalizations from large
amounts of data. They will also display Kantian qualities,
in that models will have to be programmed into the system
for the filtering and condensing of the data. And they
should include Singerian capabilities, in that the system
should observe itself, monitor its activities and
constantly challenge the assumptions and models which drive
the activity of the system.
Text-Based Information Systems
Most systems designed to process text-based
information fall into one of two categories. These are
(1) systems which are primarily intended to support
communication functions and (2) systems which are
intended to support document archiving and retrieval.
This categorization does not include systems which aid in
creating text, such as word processors, style critiquing
systems, spelling checkers, etc. Text editing systems
such as these are analogous to the transaction processing
systems in the typical MIS/DSS: just as the transaction
processing systems create the data which are summarized
and processed by MIS and DSS, text editing systems
provide raw data for text-based information processing.
Systems which support the creating and editing of text
are not considered in this review.
The term "computer-mediated communication systems"
(CMCS) has been used to describe those systems which
support organizational communication (Kerr and Hiltz,
1982). Most of the MIS research which can be classified
as text-based information processing falls into this
category. Systems which support document archiving and
retrieval include the automated secondary source
databases, electronic filing systems, and on-line text
searching systems, among others. This category of
systems has been called "document-based systems" (Swanson
and Culnan, 1978). Both categories of systems are
typically passive in nature, requiring the user to
initiate the processing activity (Montgomery, 1981).
A third category of systems designed for the
processing of text is beginning to emerge in the MIS/DSS
literature. These are systems which use text processing
as a tool for decision support. Typically, these systems
combine features of both the CMCS and the document-based
systems, and often have programmed logic and models which
enable the system to actively assist the manager in
dealing with the document base. In fact it would seem
that in order for a text-based system to be considered in
the DSS domain, all three of these characteristics would
need to be evident. For the purposes of this research,
the term "text-based decision support system" will be
used to indicate a system which actively supports
managers with the maintenance and flow of the textual
information necessary to decision making.
In the following, each of these three categories of
text-based information systems will be discussed in turn,
and a selected review of related research will be
presented.
Computer-Mediated Communication Systems
One of the most significant trends in information
processing is the growth of CMCS (Kiesler, et al., 1984).
Examples of CMCS that have already found widespread use
include electronic mail, bulletin-board systems, computer
conferencing, and others (Valle, 1984). Several recent
surveys of major business firms have projected significant
growth of CMCS during the rest of this decade (Kolodziej,
1985; Kriebel and Strong, 1984; Dickson, et al., 1984).
The projected growth of CMCS and its impact on society has
important implications for text-based DSS, in that much of
the hardware and software necessary for implementing a
text-based DSS (networking systems and protocols,
communications equipment, user interface software, etc.)
will already be in place once CMCS are installed. In fact,
well-designed CMCS systems treat the message units
independently of the processing that occurs at the
destination, and thus allow "client" software packages to
use the CMCS for naming, addressing, resource location, and
other functions. The actual packet being transmitted might
be a mail message, a print file, digitized voice, software,
or whatever the client sends to the destination "mailbox"
(Birrell, et al., 1982).
The simplest form of CMCS is the ubiquitous electronic mail system. Very large electronic mail
systems exist, and their impact on the computing
environment has been significant (Birrell, et al., 1982;
Crawford, 1982). Smaller, local implementations of
electronic message systems have also been successful,
although not all types of communication are readily adapted
to CMCS (Rice and Case, 1983). Most wide-area computer
networks feature electronic mail as an integral component
of the system (Quarterman and Hoskins, 1986).
One study found that users of CMCS became dissatisfied
with simple electronic mail systems as their experience
increased. Group conferencing and group addressing
features became important, as well as features to support
the filing, manipulation, and retrieval of messages as
documents (Hiltz and Turoff, 1981). As CMCS evolve, these
features will become more common. Kerr and Hiltz (1982)
provide a review of several major CMCS, most of which
involve conferencing capability. Their book offers many
insights into the current state of the art for these
systems as well as suggestions for future system features.
Tombaugh (1984) presents a useful description of computer
conferencing systems, and although generally positive about
the future of computer conferencing via CMCS, raises
several warnings against embracing the technology too
hastily.
The use of group conferencing techniques in a CMCS is
especially important for supporting the decision-making
process. Turoff and Hiltz (1982) argue that embedding the
DSS software in a computer conferencing system will allow
DSS to become generalized tools for group decision making
and communication. The important concept is that most
decisions are made by groups, and the opinions and views of
the decision makers must be communicated and discussed to
be of value. The advantage of CMCS as a tool to support
group decision making is that the communication can be
structured according to whatever design or paradigm the
management of a firm decides to employ. Thus the structure
could call for more egalitarian participation, an
authoritative control approach, some sort of voting or
utility analysis, or some other technique. The authors
argue that this structured communication will improve the
organization's ability to adapt to change and increase
their flexibility. The typical model-based DSS approach
has fostered a tendency to over-quantify decisions, when a
more qualitative approach (i.e., more communication) may be
indicated. They feel that structuring the communication
process is one important way that structure can be brought
to bear on an ill-structured problem, which is how such
problems are solved (Simon, 1973b).
Huber (1984) also argues for the use of communication
systems to impose structure on group decision-making
meetings through systems he called "Group Decision Support
Systems" (GDSS). Huber's model of a GDSS incorporates
tools normally associated with a DSS and adds a
communication capability. The GDSS is capable of both text
and numeric data processing, and utilizes a public display
screen in addition to regular terminals so that meeting
participants can work individually or jointly on the
problem at hand. Rathwell and Burns (1985) extend the idea
of a GDSS to a distributed environment, a "meeting" for
group decision making that could be separated by time and
place.
Most CMCS are passive systems. Several recent
articles, however, describe the potential for improving
CMCS by programming some intelligent, active processing
into the system. For example, Schicker (1982) suggested
that a distributed database of attributes which uniquely
identify the subscribers in a very large communicating
network be maintained, which would allow senders to query
the system for identifiers of the recipients. These identifiers would be independent of the actual addresses.
The address of the recipient would be determined by the
system as it processed the mail message, perhaps assisted
by a "suggested" slot identifier provided by the sender.
This scheme allows for quicker, less expensive updates to
maintain the system, an important contribution since in
most large networks there will be a constant movement of
subscribers from one location to another.
A more elaborate scheme was proposed by Tsichritsis,
et al. (1982) in which pre-specification of the routing
schemes is built into the design of a messaging system.
This system relies on the concept of structured messages;
that is, the messages are somehow identified by their
structure as belonging to a particular message class.
Classifying the messages allows them to be processed as in
a database, using message templates as schemata. Automatic
procedures assist in the routing of the messages through
the system. More recently, Tsichritsis (1984) described a
system in which the messages contain the intelligence to
alter their routing patterns based on the actions taken by
the users at various points along the path. His analogy
was that of a village word-of-mouth chain, in which a news
item effectively works its way through the village and
returns the knowledge that the intended recipient got the
information. One possible application of such a system
would be poll-taking or Delphi studies; the originator
simply specifies a few starting recipients, and the message
has enough logic to run the poll or Delphi study by itself.
Mazor and Lochovsky (1984) describe a "Message
Management System" for office automation applications, in
which the system knows how to process certain message types
without the originator having to specify the routing. They
introduce the role of Communication Base Administrator
(CBA), analogous to a Database Administrator, who would be
responsible for the creation, maintenance, security, and
integrity of the communication base. In their concept, there are two kinds of routing: type routing and instance routing. A sender can specify instance routing
when he or she desires the message to receive special
routing treatment, and the instance routing plan applies
only to the single message instance. Type routing is
maintained by the CBA and applies to all messages of a particular type, as long as it is not overridden by instance routing. As in the Tsichritsis, et al. (1982)
scheme, this system relies on the structured message
concept, such that the system is able to determine the
values of fields in the message template.
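The distinction between the two routing kinds can be sketched in a few lines of Python. This is an illustrative sketch only; the message types, recipient lists, and dictionary layout are invented for the example and are not taken from Mazor and Lochovsky's design.

```python
# Sketch of type routing vs. instance routing (hypothetical example).
# The type-routing table is the kind of structure a Communication Base
# Administrator (CBA) would maintain centrally.

type_routes = {                      # maintained by the CBA
    "expense-report": ["accounting", "manager"],
    "trip-request":   ["travel-office"],
}

def route(message):
    """Instance routing, when a sender supplies it, overrides the
    type routing that applies to all messages of that type."""
    if "instance_route" in message:
        return message["instance_route"]
    return type_routes.get(message["type"], [])

print(route({"type": "expense-report"}))             # ['accounting', 'manager']
print(route({"type": "expense-report",
             "instance_route": ["auditor"]}))        # ['auditor']
```

The point of the sketch is simply that per-message routing takes precedence, while the centrally maintained table handles the common case.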
Malone, et al. (1987) recently described a prototype
system which also is based on structured message templates,
and extends the capabilities of the system to manage the
information by allowing users to build rule-based filters
which are designed to screen messages based on the
attributes of the fields in the message templates. The
filters can act upon messages as they enter the system;
thus the system has moved away from a purely passive CMCS
to one that takes an active role in processing information.
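The idea of rule-based filters acting on the fields of structured message templates can be illustrated with a short sketch. This is a hypothetical Python rendering, not the authors' actual implementation; the field names, rules, and actions are invented.

```python
# Sketch of rule-based filtering over structured message templates.
# A rule pairs a predicate on a message's fields with an action; the
# first matching rule decides what happens to an incoming message.

def filter_message(message, rules, default_action):
    """Apply the first matching rule to an incoming message."""
    for predicate, action in rules:
        if predicate(message):
            return action(message)
    return default_action(message)

# Hypothetical rules for a structured "meeting-notice" message type.
rules = [
    (lambda m: m["type"] == "meeting-notice" and m["urgent"],
     lambda m: ("alert", m["subject"])),
    (lambda m: m["type"] == "meeting-notice",
     lambda m: ("file", m["subject"])),
]

msg = {"type": "meeting-notice", "urgent": True, "subject": "Budget review"}
print(filter_message(msg, rules, lambda m: ("inbox", m["subject"])))
# -> ('alert', 'Budget review')
```

Because the filters run as messages enter the system, the screening happens without any action by the recipient, which is what makes the system active rather than passive.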
Interest in the behavioral aspects of CMCS and its
effect on organizations is also evident in the literature.
Reviews of this topic may be found in Rice (1980; 1983),
Rice and Bair (1983), and Svenning and Ruchinskas (1983).
However, the interpersonal, behavioral aspects of CMCS and
how they affect the organizational environment are not well
understood, and research in this area continues (Turoff and
Hiltz, 1982; Olson, 1982; Olson and Lucas, 1982; Culnan and
Bair, 1983; Kiesler, et al., 1984; Siegel, et al., 1986).
Document-Based Systems
Systems designed to process unstructured text
information are quite different from those which process
data. Blair (1984) points out one of the most obvious
differences, the distinction between data retrieval and
document retrieval. In data retrieval a query is
deterministic, while a document retrieval query is
nondeterministic. The question is "I want to know a fact"
as opposed to "I want to know about a subject." Blair also
describes an important difference in the evaluation of a
query response. For data-based systems, the criterion is
correctness: the system should respond with the right
answer to the factual question. For text or document-based
retrieval, the criterion is utility: the system should
provide a useful response to the person requesting
information.
Brookes (1983) pointed out other distinctions between
text and data-based systems. An important factor, one
which has perhaps been the cause of the relative neglect of
document-based information in MIS/DSS, is the fact that the
meaning of textual data is often ambiguous and thus
difficult to process by automated systems. This ambiguity
has been the source of many problems for researchers in
natural language processing (Smeaton and van Rijsbergen,
1986). Other factors noted by Brookes were that (1) users
need to be aware of the source of the text in order to make
judgments of its accuracy, (2) the author of a piece of
text may want to exercise control over its distribution,
and (3) the element of time is often critical to the value
of text-based information.
There has been extensive research on document-based
systems, but this research has concentrated in the area of
library science and the secondary source databases which
are used to index scientific and technical publications.
Little attention has been paid to document processing in
the business environment (Swanson and Culnan, 1978;
Schwartz, et al., 1980; Slonim, et al., 1981). One of the
most important distinctions between the two application
areas is that in the typical secondary source database, the
content is relatively stable compared to what might be
expected in a business document base (Slonim, et al.,
1981). It also seems clear that a pure document retrieval
system would be useful only in certain "electronic filing
cabinet" applications; more features would be needed to
broaden the application to the more general concept of
decision support (Turoff and Hiltz, 1982).
The choice of a document retrieval access method is
likely to depend on the type of application. Faloutsos and
Christodoulakis (1984) review the five document retrieval
methods and suggest that for business retrieval systems,
which will include messages as well as reports and other
documents, the signature file method is the most
appropriate. Tsichritsis and Christodoulakis (1983) also
recommend the use of signature files in a message filing
and retrieval system. A signature file is essentially a
sequence of bits which approximately represent the
important words in a document. When searching for a
document match, the signature file is searched prior to
accessing the text file itself, thus reducing access time
and storage costs (Christodoulakis and Faloutsos, 1984).
Most secondary source databases use the clustering access
method for text retrieval (van Rijsbergen, 1979), which
seems more appropriate for large, stable collections.
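The signature file idea can be made concrete with a small sketch. The version below assumes a superimposed-coding scheme, one common way of building signatures; the bit width and hashing choices are arbitrary illustrations, not details drawn from the cited papers.

```python
# Sketch of the signature file access method: each document's important
# words are hashed into a fixed-width bit string, and a query signature
# is tested for containment before the document text itself is read.

SIG_BITS = 64

def signature(words):
    """Superimpose a few hash-selected bits per word into one integer."""
    sig = 0
    for w in words:
        for seed in (1, 2, 3):           # set several bits per word
            sig |= 1 << (hash((seed, w)) % SIG_BITS)
    return sig

def may_contain(doc_sig, query_sig):
    """True if every query bit is set in the document signature.
    False positives are possible, so a match must still be verified
    against the actual text; false negatives are not possible."""
    return doc_sig & query_sig == query_sig

doc = "the signature file method reduces access time".split()
doc_sig = signature(doc)
print(may_contain(doc_sig, signature(["signature", "file"])))   # True
print(may_contain(doc_sig, signature(["clustering"])))          # usually False
```

Because the signatures are small relative to the documents, most of the collection is eliminated by cheap bit operations before any text is fetched, which is the source of the access-time and storage savings noted above.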
Swanson and Culnan (1978) reviewed a number of
document-based systems which have been used to support
business activities. Schwartz, et al., (1980) described a
document handling facility which was successfully employed
in a medical firm. Slonim, et al., (1981) present
equipment designed exclusively for document and message
handling, and suggest that combining the hardware and
software into a document-based system apart from the
regular data processing equipment is a more reasonable
solution when designing systems to process text.
Text-Based Decision Support Systems
Simon (1960) argued that the essence of management is
decision making. Many studies have shown that the majority
of the manager's time involves some form of written or
verbal communication (see Rice and Bair, 1983, for a review
of these studies). Thus an important aspect of decision
making is the processing of text-based information in the
form of communication and environmental data (Aguilar,
1967; Mintzberg, et al., 1976). As mentioned above, the
term text-based DSS will be used to describe systems which
are designed to actively support the decision-making
process through the use of text-based information.
Some of the CMCS described above may be thought of as
systems that support decision making through communication,
particularly the more sophisticated computer conferencing
systems. In fact, Turoff and Hiltz (1982) demonstrated
that the conferencing system they described functioned as a
DSS. Huber's work on GDSS (1984) also seems to closely
resemble this category, although the GDSS concept is not
limited to communication processes. Smeaton and van
Rijsbergen (1986) review the filing and retrieval
techniques for unstructured information, and describe their
work with project Minstrel. Part of the project involves
content retrieval of text from an office filing database.
The techniques they are experimenting with offer
significant promise for text-based decision support.
A prototype text-based DSS is presented by Brookes
(1983), which provides for the "capture, storage,
retrieval, and transmission of text" (p. 135) within a
vehicle designed to examine ways text processing tools can
be used for decision support. A key feature of the system
is a set of user interest profiles, used for content
addressing and match-making procedures. In the text
database, the system maintains free and fixed format
information; each piece of unformatted text is associated
with database fields such as author and source identity,
date received, and so forth. Cross-referencing keywords
can be associated with each text entry for addressing and
retrieval purposes. Brookes notes that although the
keyword indexing scheme often leads to difficult
ambiguities in larger environments (different keywords can
be used to describe the same basic concept), the experience
in the prototype indicates that a limited group of users
(such as would be found in a management team or strategic
scanning unit) typically uses a particular word to describe
a particular concept. Thus the ambiguities are resolved by
the habitual use of a restricted keyword list.
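Brookes's combination of fixed-format fields, free text, and a restricted cross-referencing keyword list might be sketched as follows. The class and field names are hypothetical, chosen only to mirror the fields mentioned above (author, source identity, date received):

```python
from dataclasses import dataclass, field

@dataclass
class TextEntry:
    """One piece of unformatted text plus its fixed-format fields."""
    author: str
    source: str
    date_received: str
    text: str
    keywords: set = field(default_factory=set)

class TextBase:
    def __init__(self, allowed_keywords):
        # The restricted keyword list that resolves ambiguities:
        # a small user group habitually uses one word per concept.
        self.allowed = set(allowed_keywords)
        self.entries = []

    def add(self, entry):
        unknown = entry.keywords - self.allowed
        if unknown:
            raise ValueError(f"keywords not in restricted list: {unknown}")
        self.entries.append(entry)

    def retrieve(self, keyword):
        """Content addressing by cross-referencing keyword."""
        return [e for e in self.entries if keyword in e.keywords]
```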
The importance of text-based information to
environmental scanning is discussed by Ewusi-Mensah (1981).
He argues that information systems have largely ignored the
external environment, acting as though the organization is
a closed system. This has produced some degree of success,
particularly at the management control level where problems
tend to be well-structured or semi-structured. However,
the requirements of the strategic planning level demand an
open system approach with reliance upon information from
the external environment. He remarks that "to date most
management information systems have focused almost
exclusively on internal information needs" (p. 307).
After presenting a framework for comparing and
contrasting different organizational environments, Ewusi-
Mensah develops suggestions for systems to support the
information needs of all levels of management. Included in
his suggestions is the concept of user interest profiles,
similar to those in the prototype system described by
Brookes (1983). He also makes it clear that qualitative
(text-based) information processing is a critical component
of such a system. The goals of filtering and condensing
text-based information are mentioned in the following
excerpt (p. 312).
The qualitative information can be generated through such available techniques as automatic abstracting, encoding, classification and indexing of externally-based non-structured information. In both instances the computer can be used to filter and condense the information gleaned from the environment.
Despite Ewusi-Mensah's mention of automatic
abstracting and its applicability to the problem of
scanning text-based environmental information, there are no
operational systems which have employed these techniques.
The present research presents a model of a text-based DSS
in the following chapter, as mentioned above. An important
part of the system is the ability to generate computer-
based abstracts as suggested by Ewusi-Mensah (1981). In
the next section, the topic of automatic abstracting and
extracting will be reviewed.
Automatic Abstracting
The simplest definition of an abstract is that it is
an abbreviated, accurate representation of the contents of
a document (American National Standards Institute, 1979).
The use of abstracts as a surrogate for the complete text
goes back at least to the ancient Greeks; at present there
are over a thousand services which provide abstracts to
their clients (Borko and Bernier, 1975). The idea of using
the computer to automatically generate abstracts was first
proposed and tested by Luhn (1958).
Automatic abstracting has only received slight
attention by researchers in the field of library and
information science as compared with the amount of
attention paid to automatic indexing techniques (Paice,
1977; 1981). Three reasons have been suggested for this.
First, most automatic abstracts are really extracts; that
is, they consist of complete sentences or phrases lifted
verbatim from the text. Because of this, the quality of
automatic extracts has never been as good (in the
subjective literary sense) as a well-written abstract
(Bernier, 1985; Cremmins, 1982). Secondly, the cost
effectiveness of automatic abstracting is still an open
question, particularly with respect to entering the full
text of the document to be abstracted. In the mid-1970's,
when most of the automatic abstracting work took place, the
cost of keying in the text was significant. It seemed
wasteful to enter the full text and then reduce it by
computer abstracting (Paice, 1981; Borko and Bernier,
1975). Paice (1977), in commenting on the costliness of
inputting the text, predicted that when the input problem
is solved, "the interest in automatic extracting will be
revived" (p. 144). A third problem with automatic
abstracting research is that no adequate objective measure
of abstract quality exists. Without an objective measure,
there is no guideline for judging the success of an
automatic abstracting system.
The problem of the quality of automatic extracts may
be solvable (or at least reduced) by the use of NLP
techniques applied to the extracting process. A simple
rule-based parsing technique which derives from the work of
Paice (1981) can greatly improve the readability of
extracts. This technique was incorporated into the
extracting algorithm presented in Chapter IV.
The second problem, that of the cost of inputting the
documents, is being solved from two directions. On the one
hand, the technology of optical character readers is much
more sophisticated than before. In addition, more and more
text is being created with the aid of computers, and is
therefore economically available for abstracting by direct
access to the source computer. If Paice's prediction is
correct, more interest in automatic abstracting should be
evident in the near future.
The problem of measuring the quality of abstracts and
extracts remains an ill-structured problem, one that
appears to depend upon subjective evaluations alone.
However, the approach used in this research provides an
objective procedure for measuring the effectiveness of
extracts and abstracts, if not their subjective qualities.
Rather than attempting to develop an objective measure of
extract quality, the approach used in this research was to
empirically test the extracting system in a designed and
controlled experiment. In other words, we can measure the
extent to which an extract or abstract performs the task
which it is intended to do, rather than try to place a
value on its subjective qualities.
In the following paragraphs some basic concepts
related to abstracting are presented. Then the research on
abstracting using NLP techniques is briefly reviewed,
followed by a review of the research on automatic
extracting.
Abstracting Concepts
Borko and Bernier (1975) identify seven functions that
abstracts serve. These are (1) promote current awareness,
(2) save reading time, (3) facilitate selection, (4) help
overcome the language barrier, (5) facilitate literature
searches, (6) improve indexing efficiency, and (7) aid in
the preparation of reviews. While these functions describe
the role of abstracts in the academic and professional
community, which is served by the current abstracting and
indexing industry through the secondary source publications
and databases, they can also be used to describe the
functions abstracting would serve in a system that
processes text to support organizational decision making.
In the context of a management team or strategic scanning
unit, abstracts could serve all these functions with the
possible exception of improving indexing efficiency.
Most authors identify three main types of abstracts:
indicative, informative, and critical (Borko
and Bernier, 1975; Weil, 1970; Cremmins, 1982). Briefly,
the indicative (sometimes called descriptive) abstract
describes what the text is about and helps the reader
decide if the full text should be consulted; the
informative abstract tries to convey the information in the
text in summary form so that the reader will not need to
consult the full text; while the critical abstract in some
way evaluates or criticizes the text, letting the reader
know the reviewer's opinion of the full text. Automatic
abstracting work has focused on indicative or informative
abstracts, since critical abstracts seem to be beyond the
reach of current technology (Paice, 1981).
Work on establishing a standard for abstracting began
with an extensive review of published guidelines used by
the various abstracting and indexing services (Borko and
Chatman, 1963). In 1970, a standard was adopted by the
American National Standards Institute (ANSI) as a guideline
for the preparation of abstracts (Weil, 1970; Borko and
Bernier, 1975). An excellent abstract of the ANSI standard
on abstracts can be found in appendix 1 of Cremmins (1982).
Natural Language Processing Techniques
In this subsection a brief review of NLP studies that
contribute to computer-generated abstracting is presented.
For a thorough review of NLP research in general, see
Ballard, et al. (1984); for a discussion of the potential
market for NLP systems, see Johnson (1986). Smeaton and
van Rijsbergen (1986) provide a good introduction to NLP as
it relates to the processing of free, unstructured text in
a business environment.
Linguistic information resides in at least two forms:
syntax and semantics. Syntactic information is based on
the constructs of the language syntax, while semantic
information is domain-related (Smeaton and van Rijsbergen,
1986). A system that can fully analyze the semantic
information of free text remains beyond the scope of
current technology, and probably will for the foreseeable
future (Epstein, 1985). Those systems which do use
semantic information require "deep" understanding of the
domain in which the system is operating, and thus tend to
be slow, restricted to very small domains, and expensive
(Smeaton and van Rijsbergen, 1986). Some efforts at
separating domain-specific semantics from more general
semantic information have been reported (Hafner and Godden,
1985), but for use in an automatic abstracting system, the
expanse of semantic knowledge required would be
prohibitive.
On the other hand, the analysis of syntax is
relatively domain-independent. Several important NLP
systems have been built upon syntactic processing, such as
the experimental EPISTLE project (Miller, 1980; Miller, et
al., 1981; Heidorn, et al., 1982; Schriber, 1983). In most
NLP systems, the syntactic information is analyzed by a
general-purpose parser which provides input to the semantic
analysis component of the system (Hafner and Godden, 1985).
When ambiguities occur, the semantic processor is called on
to resolve them by using heuristics (Smeaton and van
Rijsbergen, 1986). However, for the EPISTLE system, a
unique parse is always produced (Heidorn, et al., 1982);
ambiguities are not allowed. While this may result in some
incorrect interpretations, particularly of idiomatic
phrases such as "he threw the book at me," the overall
effect of using syntax alone is robust with respect to
developing a system which could support content analysis
and retrieval in a business environment (Smeaton and van
Rijsbergen, 1986).
The original goals of the EPISTLE project as outlined
by Miller (1980) included automatic abstracting and
indexing. Thus far, however, the reported functions of the
EPISTLE system are oriented toward text-critiquing
(Heidorn, et al., 1982; Schriber, 1983). Also, it has been
noted that the system is very expensive to run, requiring a
large mainframe computer and considerable computer
resources to parse a single sentence (Smeaton and van
Rijsbergen, 1986).
One experimental system was developed which produced
abstracts of children's stories (Taylor and Krulee, 1977).
These were based on the semantic network knowledge
representation scheme. Maximally connected sub-graphs were
located in the network, and the most influential nodes in
the sub-graphs identified. Proceeding iteratively, a
single sub-graph was obtained which served as an abstract
of the original network, and from this graph a set of
natural language sentences was produced. This system seems
to come the closest to the human abstracting process of
reading the text and writing the abstract. The results
were said to be encouraging and plausible, although
significant difficulties were encountered. As in most NLP
systems based on semantic information, however, it was
restricted to a very limited domain.
Automatic Extracting Techniques
In addition to syntactic and semantic information,
statistical information can be derived from unstructured
text. Luhn (1958) pioneered this research direction with a
system designed to produce "auto-abstracts" based on the
frequencies and relative positions of the non-trivial words
in a document. These are really "extracts" (Weil, 1970) in
that they consist of sentences selected verbatim from the
body of the document. Work on automatic extracting has
continued sporadically during the period since Luhn's
initial efforts, and a brief review of the most important
studies is presented in the following. More detailed
reviews can be found in Borko and Bernier (1975), Mathis
and Rush (1985), and Paice (1977). Reviews of the non-
English language extracting research can be found in
Wellisch (1984).
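Luhn's idea of scoring sentences by the concentration of their non-trivial words can be illustrated with a minimal sketch. The stopword list and the averaging formula below are simplified assumptions for exposition, not Luhn's exact procedure:

```python
import re
from collections import Counter

# A deliberately small stopword list; real systems use larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "was", "for", "on", "that", "this", "it", "as", "with",
             "be", "also"}

def luhn_extract(text, n_sentences=2):
    """Score each sentence by the average document frequency of its
    non-trivial words; return the top sentences in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)
    scored = []
    for idx, s in enumerate(sentences):
        toks = [w for w in re.findall(r"[a-z']+", s.lower())
                if w not in STOPWORDS]
        score = sum(freq[w] for w in toks) / (len(toks) or 1)
        scored.append((score, idx, s))
    top = sorted(scored, reverse=True)[:n_sentences]
    # Re-sort the winners by position to preserve document order.
    return [s for _, idx, s in sorted(top, key=lambda t: t[1])]
```

The output is an extract in Weil's (1970) sense: sentences lifted verbatim, selected statistically rather than by any understanding of the content.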
During the 1960's work was done on extracting systems
by Edmunson and Wyllys (1961) and Edmunson (1964, 1969).
These studies involved the analysis of four methods of
sentence weighting: the location method, the cue method,
the key method, and the title method. In the location
method, the position of the sentence in the document served
to weight its importance. This method is based on the work
of Baxendale (1958) who discovered during the course of her
investigation of automatic indexing that so-called "topic
sentences" were most likely to occur as either the first
(85%) or the last (7%) sentence in a paragraph. The cue
method used a dictionary of words selected by the
researchers as associated statistically with either an
extract-worthy sentence or a sentence of negative value to
the extract, and weighted the sentences accordingly. The
key method is similar to the Luhn approach, in that it
identifies the most frequent non-common words not included
in the cue dictionary. The title method used the words in
the title and subtitles as indicative of relative
importance, and assigned appropriate weights when the words
were used in a sentence in the text. The four methods were
used and tested in combination, and the best results were
obtained when the title, cue, and location methods were
used together (Edmunson, 1969).
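The combination of cue, title, and location evidence for a single sentence might be sketched as below. The particular weights and the two-position location rule are illustrative assumptions (loosely echoing Baxendale's first/last-sentence finding), not Edmunson's actual parameters:

```python
def edmundson_score(sentence_words, position, n_sentences,
                    cue_weights, title_words):
    """Combine cue, title, and location evidence for one sentence --
    the combination found most effective in the 1969 study."""
    # Cue method: dictionary of positively or negatively weighted words.
    cue = sum(cue_weights.get(w, 0.0) for w in sentence_words)
    # Title method: words shared with the title or subtitles.
    title = sum(1 for w in sentence_words if w in title_words)
    # Location method: first sentences weigh most, last a little
    # (weights here are arbitrary illustrative values).
    if position == 0:
        location = 2.0
    elif position == n_sentences - 1:
        location = 1.0
    else:
        location = 0.0
    return cue + title + location
```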
Earl (1970) investigated the idea that similarities of
syntax structure might indicate extract-worthy sentences.
However, her results indicated that the vast majority of
sentences exhibited unique syntax patterns, and therefore
syntax was not useful for selecting sentences. She did,
however, develop a statistical selection algorithm similar
to Luhn's and achieved results which were "mildly
encouraging" (p. 327).
Perhaps the most successful extracting system was
developed in the 1970's and reported on by Rush, et al.
(1971), Mathis, et al. (1973), and Pollock and Zamora
(1975). This system is called ADAM (for Automatic Document
Abstracting Method), and is based primarily on the use of
cue words in a "word control list" (WCL). In addition,
algorithms for improving the extract sentences by deleting
certain phrases and combining redundant phrases through the
use of structural analysis were developed (Mathis, et al.,
1973).
More recently, Paice (1981) described a method for
automatic extracting based on the use of self-indicating
phrases. An important aspect of the method is the use of
exophoric references to indicate clusters of sentences
which should be extracted as a unit. Thus the indicated
sentences serve as a basis around which to build the
extract, greatly reducing the sometimes disjointed
appearance of automatic extracts.
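One way to approximate the clustering idea is a simple rule: when a selected sentence opens with a word that refers outside itself, pull in the preceding sentence so the pair is extracted as a unit. The marker list below is a hypothetical simplification of Paice's method:

```python
# Hypothetical markers of references that point outside the sentence.
EXOPHORIC_OPENERS = ("this", "these", "those", "such", "it", "they")

def expand_with_antecedents(sentences, selected_idx):
    """If a selected sentence opens with a referring word, include its
    predecessor, reducing the disjointedness of the extract."""
    needed = set(selected_idx)
    for i in selected_idx:
        first = sentences[i].split()[0].lower().strip(",.")
        if first in EXOPHORIC_OPENERS and i > 0:
            needed.add(i - 1)
    return [sentences[i] for i in sorted(needed)]
```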
In discussing the use of automatic extracts as a
surrogate for abstracting, Paice (1981, p. 172) made the
following observations:
The possibility of producing abstracts by computer has not received very much attention. There are perhaps two main reasons for this. First, it appears that the production of well-constructed abstracts is an artificial intelligence problem, and therefore unlikely to be either feasible or worthwhile until well into the future: the alternative of picking sentences here and there in a document is a rather unattractive proposition. Second, the cost of key-punching texts for input to an abstracting program can hardly be justified--especially since the program will then in effect discard most of the text which has been so laboriously prepared. It now appears that the first of these objections is exaggerated--reasonable-looking abstracts can often be produced by quite "unintelligent" programs--while with advances in technology the second problem should soon disappear.
Summary of Related Research
The five philosophies of inquiring systems as
presented by Churchman (1971) were presented and reviewed.
The application of these systems to the phases and routines
in the decision-making process yields insight into the
development of systems to support information processing in
organizations. In particular, systems to support
environmental scanning and which allow organizations to
challenge the assumptions underlying their view of the
world are indicated.
There are two familiar categories of information
systems which are designed to process text-based
information: CMCS and document-based systems. A third
category, text-based DSS, is defined and discussed. Text-
based DSS use both communication functions and document
handling functions to actively support the text information
needs of decision makers in an organization, and much of
the research in CMCS and document-based systems will
contribute to the development of text-based DSS.
The creation of computer-generated abstracts offers
potential for supporting decision makers in a text-based
DSS by condensing information to a manageable level. Prior
research in automatic abstracting has been concentrated in
the library science field, with the purpose of providing
automatic abstracts for the secondary source databases.
Most of this research has been on systems that use
algorithms for selecting important sentences from the
document, although some attempt has been made to generate
abstracts through artificial intelligence using NLP
techniques. It appears that creditable extracts have been
produced by the first method, although their quality is not
as good as a well-written abstract prepared by an expert.
The latter method has seen modest success in very limited
domains, but a working system using NLP in unrestricted
domains is considered unattainable for the foreseeable
future.
CHAPTER III
MODEL OF A TEXT-BASED DECISION
SUPPORT SYSTEM
Most existing DSS rely upon the processing of data,
not text. However, much of the information which managers
use in decision making is text-based, either spoken or
written. In this section, a model of an information system
to support decision making based on the processing of text
is presented.
The objectives of the system are presented first,
followed by an overview of the system model and features.
A discussion of the automatic indexing and extracting which
will be used to filter and condense information in the
system is presented. Characteristics of the system that
reflect ideas from Churchman's inquiring systems are then
discussed. The chapter concludes with a review of the
unique features and the contributions of the system.
Objectives of the System
The objectives of the text-based DSS presented here
are: (1) to improve the performance of knowledge workers
whose primary responsibility is decision making by the use
of text-based information processing, and (2) to filter and
condense text-based information by using automatic indexing
techniques as a means of attuning users to relevant
information (filtering) and automatic extracting and/or
abstracting to reduce that information (condensing).
In meeting these objectives, the system provides a
vehicle to support scanning of external environmental
information, as called for by El Sawy (1985), Ewusi-Mensah
(1981), and others. In addition, the system supports
organizational communication, which is critical to the
decision-making process (Mintzberg, et al., 1976).
Further, the system actively processes information.
In an active information system the information seeks the
user, while in a passive system the user must seek the
information (Montgomery, 1981). In the system described
here, new information will be brought to the user's
attention when a document relevant to the user's area of
interest enters the system.
System Processes
A representation of the components of a text-based DSS
which actively supports environmental scanning and
organizational communication by filtering and condensing
text is presented in Figure 3.1. The system as pictured is
a multi-processor system; there are one or more central
processors (only one is shown) and many local processors.
61
[Figure 3.1 depicts the components of the system: a central
processor containing the document base, the indexing function,
and a "Singerian" component; external sources (trade journals,
reports, articles, summaries, etc.); internal sources (memos,
field reports, communications, etc.); and local processors
providing CMCS, model, and extracting functions.]
Figure 3.1. Model of a text-based decision support system.
Document-based information may enter the system through
external sources or internal sources.
External documents, consisting of trade journals,
reports, articles, on-line information services, and the
like could be directly linked to the central processor from
the originating source. These sources can be selected by
the decision-making team and adjusted over time; research
has shown that decision makers consult a limited set of
external sources on a routine basis (El Sawy, 1985).
By automating the scanning activity, the number of sources
will be expanded and/or the time spent in the scanning
activity will be reduced, increasing the effectiveness of
the decision-making team (Daft, et al., 1987).
Internal documents enter the system through the local
processors, and would for the most part originate from
members of the decision-making team. The local processors
support a full range of the typical CMCS functions, and
allow the user to specify a particular communication as one
which should enter the document base, or as one which
should remain private between the sender and receiver. The
document base (or bases) is maintained on the central
processor (or processors), and the system supports a full
range of the typical document-based retrieval activities.
These aspects of the system just described (the
ability to retrieve and store large amounts of external and
internal communication, to support the sharing of that
information among the decision-making team, to catalog and
organize that information in a document base, etc.) are
characteristic of the Lockean inquiring system. Decision
makers must be constantly examining the environment
(internal and external), building a database of knowledge
from which to draw conclusions and generalizations about
the world.
When a new document enters the document base, a
signature file is prepared by automatic keyword indexing
such as described in Dillon and Gray (1983). A signature
file is a short representation of the document based on
keywords and relationships between them, and is an access
method well-suited for documents in a business document
base (Faloutsos and Christodoulakis, 1984). The central
processor then "announces" the new document by broadcasting
the signature file on the network. At the local processor
level, models are maintained which reflect the interest-
areas of the particular users associated with that site.
The models are used by the local processors to evaluate the
relevance of the new document by matching the model to the
signature file when it appears on the network. If the
local processor finds a "match" (the models can be designed
so that close, rather than only perfect, matches are
retrieved), the
full text of the document is requested from the central
processor and down-loaded to the local processor. The
filters used to match the users' needs to the signature
files can be thought of as models representing the
knowledge requirements of the users. These filters may be
stored, updated, and managed as models in any model base.
Thus the system is actively filtering text-based
information and routing documents to the appropriate user
as the documents become available by applying models (the
filters for matching) to the signature files.
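The matching step at a local processor might be sketched as a keyword-overlap test against the user's interest model. The threshold form below is an illustrative assumption, not a prescribed design:

```python
def relevance(signature_keywords, interest_model, threshold=0.5):
    """Retrieve the document when enough of the user's interest terms
    appear in the broadcast signature; the match need not be perfect."""
    if not interest_model:
        return False
    overlap = len(set(signature_keywords) & set(interest_model))
    return overlap / len(interest_model) >= threshold
```

Lowering the threshold widens the net, retrieving documents that only partially fit the interest model; raising it makes the filter stricter.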
In addition to the models used to determine relevancy
of new documents, the local processors can also be
programmed with the ability to generate extracts of the
documents. Recall that extracts consist of selected
sentences and phrases from a document, and act as pseudo-
abstracts. Automatic extracting research has shown that an
algorithm for generating an extract may be tuned to the
type of document typically extracted as well as the
interests of the user. This tuning relies in part on a
"word control list" (WCL), a list of key words or phrases
and associated weights, as well as other parameters that
can be adjusted to fit the document categories. In fact,
users may have a set of extracting algorithms, each
designed for a different purpose or document category, to
further refine the ability of the system to filter and
condense the received information.
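WCL-based sentence scoring reduces to a weighted lookup; the particular words, weights, and cutoff below are hypothetical, chosen only to show how negative weights can argue against extraction:

```python
def wcl_score(sentence, wcl):
    """Sum the word control list weights of the sentence's words;
    weights may be negative for words of negative extract value."""
    words = sentence.lower().split()
    return sum(wcl.get(w.strip(",."), 0.0) for w in words)

def wcl_extract(sentences, wcl, cutoff=1.0):
    """Keep the sentences whose WCL score reaches the cutoff."""
    return [s for s in sentences if wcl_score(s, wcl) >= cutoff]
```

A user-specific WCL of this kind is what lets each team member's local processor condense the same document differently.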
By maintaining user-specific WCL's and parameters
within the extracting models, the local processor can
condense the text-based information in a manner appropriate
to the interests of the individual members of the decision
making team. After a document is selected by matching the
signature file to the local filters, the local processor
retrieves the document from the central processor,
generates an extract of the document, and prepares an
announcement of the new document for the user. When the user
invokes the system, the system advises him or her of the
new document(s) which were judged relevant. The user can
review the title and source (and any other identifying
data), the extract, and/or the full text of the document.
The capability of the system to apply different models
(the signature file generating models, the matching
criteria used as filters, and the extracting models) is
suggestive of the Kantian inquiring system. The system
could be programmed with the ability to select the
appropriate models based on characteristics of the incoming
document; to find the best fitting model. Further, the
building of new models could also be programmed through
expert system technology. Thus the system could expand its
model base over time by paying attention to the use and
retrieval activities of the decision makers, and through
dialogue with them.
Singerian Component
At the central processor level, a Singerian capability
can be built into the system. A characteristic of
Singerian inquiring systems is the continual re-examination
of the underlying assumptions and models which define the
"world-view" of the decision makers. A Singerian system
would never assume to have reached a true problem solution,
since what is a solution for today's reality may not be
correct for tomorrow. In the system under consideration,
for example, profiles can be developed of the documents
that are overlooked by the decision makers. This
information could be used to question the underlying
assumptions and the effectiveness of the models and
extracting activity. Unless the system can respond to
subtle changes in the environment, in time its
effectiveness will deteriorate. By monitoring the activity
of the users in the system, warning signals can be given
when certain information patterns are consistently
overlooked, thus prompting a re-examination of the models
and extracting functions.
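The monitoring idea can be sketched as a count of keywords among documents that were routed to users but never opened. The data shapes and warning threshold here are hypothetical:

```python
from collections import Counter

def overlooked_keywords(delivered, opened, warn_after=3):
    """delivered maps document ids to their keywords; opened is the
    set of ids the users actually read.  Keywords that recur among
    unread documents signal that the filters and extracting models
    may need re-examination."""
    ignored = Counter()
    for doc_id, keywords in delivered.items():
        if doc_id not in opened:
            ignored.update(keywords)
    return [kw for kw, n in ignored.items() if n >= warn_after]
```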
Automatic Indexing and Abstracting
There has been much research on the topic of automatic
indexing. The system described here would use an approach
similar to the
FASIT system (Dillon and Gray, 1983), which is a fast,
economical, syntax-based analyzer. The output of the index
routine could be stored in signature files as discussed by
Faloutsos and Christodoulakis (1984). In addition,
relational thesauri can be developed over time to further
refine the system's ability to generate consistent sets of
keywords, reducing the problems associated with ambiguous
terms.
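A relational thesaurus reduced to its simplest form is a mapping from variant terms to preferred terms; the entries below are hypothetical examples:

```python
def normalize(keywords, thesaurus):
    """Map each raw keyword to its preferred term so that different
    words for the same concept index consistently."""
    return sorted({thesaurus.get(k, k) for k in keywords})
```

Developing such a table over time is what lets the indexing routine emit consistent keyword sets despite ambiguous vocabulary.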
As described above, automatic abstracting can
conceptually take place in one of two ways: by a system
that "understands" the content of the document and writes
an abstract using NLP techniques, or by using an algorithm
to extract significant sentences verbatim from the text.
Theoretically, if the NLP approach were successful it would
provide abstracts that were as good as those written by a
human expert. The extracting systems can never be expected
to consistently produce extracts that are of the same
quality as a well-written abstract, although some of the
systems reported in the literature have had impressive
results.
Unfortunately, the NLP techniques have not yet been
developed to the point that they could be solely used in
the system described in this chapter. It is possible that
the technology will be available in the future, as we
understand more about language and its analysis. On the
other hand, the extracting algorithms that can be
implemented today may be adequate for use in the system
presented here. The purpose of the abstracting capability
in this system is to condense the information in documents
so that (1) users do not have to spend as much time reading
as they otherwise would, thus improving their efficiency,
and (2) users can increase their coverage of relevant
documents, thus increasing their effectiveness. If we can
demonstrate that the extracting technique generates an
informative extract that suffices, then the system can be
developed using techniques available today.
A major problem with the extracts, of course, is that
they tend to be disjointed, and may be difficult to read.
This problem can be ameliorated somewhat by selecting
groups of sentences from the text based on exophoric
references, as was done in the Paice (1981) design, or by
joining closely related sentences or phrases and removing
redundancy by structural analysis as was done in the
Mathis, et al. (1973) study. It is also possible that NLP
techniques could be applied to the output of the extracts
to "clean them up" and make the output more readable. If
the important information in the document can be extracted
using a simple algorithm, the additional use of NLP to
improve the quality of the "abstract" by processing only
the extracted sentences can reasonably be expected to
consume considerably less processing resources than an
abstracting system based entirely on NLP techniques. Thus
it may be that the extracting approach could serve as an
important intermediate step in the preparation of an
automatic abstract, if it can be shown that the extract
contains the necessary information from the document.
Summary of System Features
The system presented here contributes to the research
on text-based processing as a tool for decision support.
Further, the goals of filtering and condensing are
addressed by the use of automatic indexing and extracting
techniques, which have never been employed in a working
system for the support of text-based information processing
in a business environment. The system also contributes to
the goal of supporting environmental scanning activities,
which remains as one of the areas where little support from
information systems is available. The system directly
supports the communication activities of the strategic
planning level, filtering and condensing the information
flows among the decision makers and increasing their
efficiency and effectiveness.
CHAPTER IV
MODEL VALIDATION
This chapter describes the experiment which was
conducted to study the effectiveness of computer-based
extracting and abstracting techniques. The results of this
experiment have important implications for the development
of the system described in the previous chapter, and serve
as a partial validation of the model system.
First, the research model is presented, followed by
the research question, the research hypotheses, and the
experimental treatments. Next, the extracting algorithm
which was developed to demonstrate the capability of the
computer to perform as described in the previous chapter
is presented. Following that are descriptions of the
experimental design, the dependent variables, the
subjects for the experiment, and the procedures used to
conduct the experiment.
Research Model
A communication system consists of a message
transmitter, a channel through which that message is sent,
and a message recipient (Shannon and Weaver, 1964). Each
of the three components of the communication system is
subject to the law of increasing entropy, and as such may
introduce error into the system which will reduce the
overall effectiveness of a particular message instance.
Figure 4.1, which was taken from Kasper and Morris (1988),
is a simple box and arrow diagram which pictures these
relationships. For the purposes of this discussion, the
sending agent is the originator of the message, the channel
is a CMCS which has the capabilities to modify the message
(e.g., by creating extracts), and the receiving agent is
the message recipient.
In the figure, the differences associated with each
message instance are shown as determining the effectiveness
or performance of the communicating system. Each of the
three components of the communication system contributes to
system performance by generating these differences. To
illustrate, the sending agent controls such variables as
message length, difficulty level, implicit background
assumptions, and so forth. Each of these factors will
affect the comprehension of the message at the receiving
end. The channel selected for message transmission
influences comprehension through media differences. The
study from which the figure was taken demonstrated that
CMCS which use audio or video communication channels may
expect to find reduced comprehension of difficult messages
(Kasper and Morris, 1988). At the receiving end, many
[Figure 4.1 appeared here: a box-and-arrow diagram in which
message differences (sending agent), media differences
(channel), and individual recipient differences (receiving
agent) determine communication system performance.]
Figure 4.1. A behavioral model of CMCS performance
variables inherent to the recipient of a message can
influence the effectiveness of the communication. Such
factors as interest level, experience with the channel
characteristics, background in the subject of the
communication, and others can all affect the message
transmission.
The relationships in the figure suggest a simple linear
model which allows the investigation of the effects due to
each component of the communication process. Consider a
communication system in which there are a limited set of
messages being sent to a group of recipients over a fixed
set of CMCS channels. The performance of a given message
may be modeled as
Y = u + a(i) + b(j) + c(k) + e(ijk),
where Y is a measure of communication performance, u is the
overall mean of Y and is an unknown constant, a(i) is the
effect due to the (i)th message, b(j) is the effect due to
the (j)th CMCS channel, c(k) is the effect due to the (k)th
individual recipient, and e(ijk) is the error associated
with each message instance.
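The additive structure of this model can be illustrated with a short sketch. This is not part of the original study; given a balanced table of performance scores, each effect is estimated by differencing cell means from the grand mean, and all names and data here are illustrative.

```python
from itertools import product

def estimate_effects(scores):
    """Estimate u, a(i), b(j), c(k) of Y = u + a(i) + b(j) + c(k) + e(ijk)
    from a balanced table scores[(i, j, k)] by differencing cell means."""
    msgs   = sorted({i for i, _, _ in scores})
    chans  = sorted({j for _, j, _ in scores})
    recips = sorted({k for _, _, k in scores})
    u = sum(scores.values()) / len(scores)                    # grand mean
    a = {i: sum(scores[i, j, k] for j, k in product(chans, recips))
            / (len(chans) * len(recips)) - u for i in msgs}   # message effects
    b = {j: sum(scores[i, j, k] for i, k in product(msgs, recips))
            / (len(msgs) * len(recips)) - u for j in chans}   # channel effects
    c = {k: sum(scores[i, j, k] for i, j in product(msgs, chans))
            / (len(msgs) * len(chans)) - u for k in recips}   # recipient effects
    e = {(i, j, k): y - u - a[i] - b[j] - c[k]
         for (i, j, k), y in scores.items()}                  # residuals
    return u, a, b, c, e
```

With noiseless synthetic data whose true effects sum to zero, the estimated effects recover the true values exactly, which is the property the experimental design exploits when isolating the treatment (channel) effect.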
The experiment described here investigated the effects
on the communication process of the use of computer-
generated extracts and abstracts. In order to examine
these effects, the channels were manipulated to study the
techniques of interest. By creating an environment with a
limited set of text messages and a small group of
recipients, the effects due to the different CMCS channels
(the treatment effects) can be isolated from the effects
due to the messages and the individual recipient
differences.
Research Question
The primary purpose of the experiment described in
this chapter is to determine the effectiveness of
extracting algorithms as compared to intelligently written
abstracts. There are two main concerns: first, will the
extracts contain enough information about the passages from
which they are taken to suffice as surrogates for the
passage in the same way as would an abstract written by an
expert; second, will the extracts be easy to read and
understand or will they seem disjointed and confusing?
The first concern can be examined by having subjects
read the extracts and attempt to answer questions which
were designed to test their knowledge and understanding of
the complete passages. If the extract has done a good job
of selecting the most important sentences, then the
subjects should be able to perform reasonably well on the
questions. The performance of the subjects on the
questions will also give at least an indirect indication of
the readability of the extract, but to assess this more
directly a simple response scale was used to ask the
subjects about the readability of the passages.
The primary research questions can be stated as
follows: (1) will the comprehension of the material in a
passage be significantly reduced when subjects are only
allowed to read an algorithm-based extract of a passage;
and (2) will subjects who read an algorithm-based extract
of a passage find it difficult to read and understand as
compared to the readability of the full passage?
As a side issue, two other questions were addressed.
These are (1) will the subject's perception of the
information content of the extracts be similar to that of a
well-written abstract or to that of the full text of the
passage, and (2) will the amount of time spent reading the
extracted passages be reduced in an amount consistent with
the reduction in length of the passages?
Research Hypotheses
The goal of a good abstract is to provide an
abbreviated, accurate representation of the contents of a
document (ANSI, Inc., 1979). Due to the constraint of
length, however, abstracts "rarely equal and never surpass
the information content of the basic document" (Cremmins,
1982, p. 3). It can therefore be expected that the
comprehension of the subjects in the experiment will be
lowered in treatments where only the abstracts or extracts
are presented as compared with the full text. On the other
hand, if the abstracts/extracts are of sufficient quality,
then the reduction in length will not greatly reduce the
information content, and the comprehension will not be
significantly different from the full text treatment. It
is also conceivable that an excellent abstract or extract
may actually enhance the subject's ability to answer
comprehension questions, since the information is presented
in a concise, summarized format without a lot of
distracting information (as might occur in information
overload). The primary research hypothesis can thus be
stated in the null form as follows:
Hypothesis 1: There is no effect on the
subject's comprehension due to treatment
differences after removing the effect of
recipient and passage.
If the null hypothesis is not rejected, then there is
an indication that the abstracts and extracts were of
sufficient quality such that comprehension was not
significantly reduced in the treatments of interest. If
there is a significant treatment effect, then it will be
necessary to examine the nature of the effect more
carefully. In either case, simultaneous confidence limits
for the difference between the treatment means in all
pairwise comparisons should be constructed to clarify the
nature of the results. Note that even if the null is
accepted, we cannot conclude that there is no difference
between the treatments since we expect a priori some
reduction in information.
It is also reasonable to expect that there will be a
reduction in reading time for shortened passages (abstracts
and extracts), since less material is presented for
reading. However, if the passages are confusing or
disjointed, as they may be in the extract treatments, then
the time reduction may not be significant since subjects
might take longer to process the information. Stated in
the null form, the hypothesis concerning reading time is as
follows:
Hypothesis 2: There is no effect on reading time
due to the treatment differences after removing
the effect of recipient and passage.
Confidence limits for all pairwise comparisons of the
mean reading times by treatment are also of interest,
regardless of the result of the hypothesis test. These
intervals will provide information about the exact nature
of the experimental results.
Another important research question concerns the
reading difficulty of the passage. While it seems likely
that the reading difficulty of the treatments will be
different, it is not clear what the exact nature of the
differences will be. One concern is that the text in the
extract treatments will appear disjointed and be difficult
to read. However, it may be that the reduction of text
will make it easier to focus on the content, and subjects
will not find the reading difficult. The null hypothesis
associated with the difficulty rating is as follows:
Hypothesis 3: There is no effect on the
difficulty of reading the passage due to the
treatment differences after removing the effect
of recipient and passage.
Once again, confidence limits for the difference in
the reading difficulty ratings for all pairwise comparisons
are of interest.
The fourth and last hypothesis of interest is the
subject's perception of information availability. It is
reasonable to expect that this variable will be highly
correlated with the comprehension, since subjects will
indirectly be judging how well they think they scored on
the comprehension test in the experiment. However, our
interest is in their perception of information content
across treatments. Thus the null hypothesis is:
Hypothesis 4: There is no effect on the amount
of information in the passage due to the
treatment differences after removing the effect
of recipient and passage.
Once again, confidence limits for the difference in
the mean information availability for all pairwise
treatment comparisons are of interest.
Treatments
There were four treatments in the experiment, a
control and three treatments of interest. The control
treatment simulated a typical passive CMCS; i.e., the full
text of the passage was presented as in an electronic mail
system. The second treatment simulated a CMCS which uses
human expertise to prepare abstracts of documents for
distribution, as is done in many secondary source databases
and other systems for the selective dissemination of
information. One can also think of the second treatment as
simulating the future ability of the computer to generate
abstracts through artificial intelligence; if such ability
is developed in the future, then it is expected that the
computer could do no better than to write an abstract as
would a human expert. The third and fourth treatments
simulate a system which has the ability to generate
extracts using a simple algorithm. The difference in these
two treatments is length: there is a short extract
treatment and a long extract treatment. One of the
parameters in an extracting algorithm is the length; it can
be adjusted based on the nature of the source documents.
Our concern was that if we chose a length parameter that
was too long or too short, the outcome of the experiment
would be less clear. By using two different extract
lengths, it was felt that more information concerning the
extracting algorithm's effect on comprehension could be
obtained.
Each of the four treatments was presented to the
subject using a microcomputer program written for the
purposes of this experiment. A copy of the program listing
is included in Appendix A. Several features of the program
are worth noting.
The program was written in Turbo C, a product of
Borland International. The program made use of the special
function keys on the microcomputer, in particular the
"PgUp," "PgDn," "End," and "Home" keys. A consistent,
user-friendly interface was developed, which minimized any
distractions or problems associated with the mechanics of
reading the texts on the computer. Prior to actually
reading the first treatment passage, subjects read an
instructions text using the same interface as the actual
treatments. Thus, they were able to practice reading the
passages and using the program before actually beginning
the experiment. A copy of the instructions text is also
included in Appendix A.
The program determined the treatment order by reading
a disk file which contained a list of the possible orderings.
All that was required of the researcher when administering
the experiment was to assign a number to each subject to
ensure that no two subjects had the same treatment/passage
ordering. Once started, the program controlled the
presentation of the passages and questions through all four
treatments. In addition, the program was designed to
record the time spent reading the passages in each
treatment by measuring the time from the moment the subject
requested the document until he or she signaled completion
by pressing the "End" key.
In fact, each keystroke made by the subjects was recorded,
and the time from the start of the treatment passage
presentation until the moment the key was pressed was
recorded as well.
The reason for recording keystroke information was in
response to a side issue that had been raised during a
previous study (Kasper and Morris, 1988). During that
study, which compared several presentation media (voice,
video, paper, and electronic mail), subjects were not free
to refer back or re-read the text in the electronic mail
treatment. Many subjects commented about this during the
debriefing, indicating that they would have re-read the
passage if they could have done so. Therefore, this study
provided that capability to the subjects, intending to
observe whether or not subjects took advantage of the
opportunity to re-read the documents, and perhaps draw some
inferences about the subjects who referred back versus
those who did not.
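The keystroke-recording logic described above can be sketched as follows. The original program was written in Turbo C; this Python version is purely illustrative, and the class and method names are invented for the sketch.

```python
import time

class KeystrokeLog:
    """Record each key pressed and its elapsed time since the passage was
    first displayed, as the experiment's Turbo C program did. This sketch
    and its names are illustrative, not the original code."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()            # moment the passage is presented
        self.events = []                 # list of (key, seconds_elapsed)

    def record(self, key):
        self.events.append((key, self._clock() - self._start))

    def reading_time(self):
        """Elapsed time from presentation until the "End" key, if pressed."""
        for key, elapsed in self.events:
            if key == "End":
                return elapsed
        return None
```

Because every keystroke is timestamped, both the overall reading time and any back-references (e.g., "PgUp" presses after the first screen) can be recovered from the log afterward.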
The text passages and their associated test questions
were randomly selected from sample reading comprehension
tests prepared for the Graduate Management Admission Test
(GMAT) (Educational Testing Service, 1986). Four similar
GMAT passages were used successfully in the previous study
(Kasper and Morris, 1988).
As mentioned above, the algorithm which was used to
create the extracts in the experiment can be "tuned" by
selecting terms for the word control list and adjusting the
associated weights. In a working system, this feature
would be used to adjust the extracts to reflect the
interests of each user as well as the document types that
the system is scanning. In the experiment, a similar
procedure was used which increases the generalizability of
the results in this manner: working with the four passages
used in the previous experiment, the algorithm was
developed and tested by creating appropriate extracts for
this type of document and type of task required. Once the
algorithm was established, the set of four reading
comprehension tests used in this experiment was then
randomly selected from the available sample GMAT tests.
The extracts for the experiment were generated by applying
the algorithm prior to the researcher having any knowledge
of the contents of the comprehension questions. The
abstracts were also written by an expert at the same time.
In this way, knowledge of the comprehension questions did
not bias the extracts or abstracts.
Full Text Treatment
The full text treatment was considered the control
treatment for this experiment. The passages were selected
randomly from a set of twelve sample GMAT reading
comprehension tests (Educational Testing Service, 1986).
The passages ranged from 447 to 470 words, and the average
length was 457 words. During the experiment, the program
for displaying the texts required three or four screens for
the full text treatments.
The Fog Index, a well-known indicator of the
difficulty of reading a passage, was calculated for each
(Gunning, 1968). The Fog Index is considered to be an
estimate of the educational grade level required to read
and understand a passage. Results indicated that the
passages were all about the same level of difficulty, with
indices from 15.5 to 18.9, which must be considered fairly
difficult. The use of moderately difficult passages and
questions was intentional, in order to generate enough
variance for analysis.
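Gunning's Fog Index estimates the grade level as 0.4 times the sum of the average sentence length in words and the percentage of "hard" words (three or more syllables). A minimal sketch follows; the syllable count is a crude vowel-group heuristic, and Gunning's full procedure also excludes proper nouns and familiar compounds, so results are approximate.

```python
import re

def fog_index(text):
    """Approximate Gunning Fog Index:
    0.4 * (avg words per sentence + 100 * hard_words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # crude heuristic: count groups of consecutive vowels
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    hard = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100.0 * len(hard) / len(words))
```

On short, simple sentences the index is low; long sentences dense with polysyllabic words push it toward the 15.5 to 18.9 range reported for the experimental passages.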
Abstract Treatment
The abstracts were written by a hired expert, who is a
teaching assistant in the Library and Information Studies
Department at Florida State University, and holds a
master's degree in library sciences. No instructions
regarding style or length were given to the expert; he was
just asked to write indicative abstracts of the passages,
and he was never shown the comprehension questions. The
abstracts ranged in length from 85 to 100 words, and appear
to be of very good quality.
Extract Treatments
The extracts of the passages were prepared according
to the algorithm described below. For the short extract, a
stopping rule of 100 words was chosen, while 200 words was
chosen as the stopping rule for the long extract. In the
algorithm, once an indicated sentence is selected the
algorithm continues to select sentences which are related
to that sentence through exophoric references, without
checking the stopping parameter until all exophora are
resolved. Thus, the length of the extracts will always be
greater than or equal to the stopping criterion, and may be
considerably greater if there are a lot of exophoric
references to resolve. This was the case with one of the
passages chosen; the lengths of the short extracts were
198, 108, 125, and 103 words. For the long extracts, based
on the stopping rule of 200 words, the lengths were 235,
204, 228, and 212 words. Note that the short extracts were
totally contained within the long extracts, since the long
extracts were just a continuation of the algorithm past the
stopping point used for the short extracts.
As can be seen, one of the short extracts was quite
long as compared to the others. This was a result of
linking all exophoric references in the selected parts of
the passage, as described below. Since this short extract
was almost as long as the long extract for that passage, it
may have influenced the experimental results in such a way
as to obscure the differences between the long and short
extract treatments which might otherwise have been
observed. However, since the passages were randomly
selected and the algorithm applied strictly as developed on
the test passages, the treatment passage was left as it was
generated by the algorithm.
An Extracting Algorithm
The extracting algorithm which was tested as a part of
this research is based in part on the work of the ADAM
system (Pollock and Zamora, 1975), in part on the work of
Edmunson (1969), and in part on the work of Paice (1981).
The ADAM system was designed to rely on a word control
list (WCL). The WCL primarily contained negative terms,
terms that if they were present in a sentence, would
indicate that the sentence was less likely to be a good
candidate for extraction. Examples would be a phrase such
as "for instance," or the words "perhaps," "possible," or
"we cannot." A smaller number of words from the WCL were
positive, and increased the likelihood of a sentence being
selected. Examples would be phrases such as "it was
found," or the word "results" or "conclusion." A review of
the WCL portions given in the published papers (some 70
words and phrases) allows us to draw some general
conclusions about the nature of the WCL used by ADAM: (1)
the system was targeted toward journal articles, therefore
the key words and phrases reflect styles used in scientific
literature; (2) negative words were those which seemed to
imply a sense of qualification or hedging, a statement of
obvious facts or previously known research, and other types
of words which are characteristic of the peripheral topics
that often accompany journal articles; (3) positive words
were those which were characteristic of the statements of
findings or of the intent of the research. It was a simple
matter to construct a small test WCL by selecting similar
words from the trial passages which were used to tune the
algorithm (as described below) and adding them to the words
given in the published reports on ADAM.
Another feature of the ADAM system which was adapted
in this research was a technique for the removal of
parenthetic material. The rule is this: if a pair of
commas occurs in a sentence, and the second comma is
followed by a verb or verb form or by an infinitive, then
the material between the commas is deemed parenthetical and
can be deleted from the extract.
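The comma-pair rule can be sketched in a few lines. Detecting a verb properly requires a part-of-speech tagger or verb lexicon, neither of which is given in the text, so the small word set below is an illustrative stand-in only.

```python
import re

# Stand-in verb lexicon: a real implementation would need a fuller list or
# a part-of-speech tagger. These entries are illustrative only.
VERB_FORMS = {"is", "are", "was", "were", "has", "have", "had",
              "can", "may", "will", "would", "to"}   # "to" marks an infinitive

def strip_comma_parenthetical(sentence):
    """ADAM-style rule: if material lies between a pair of commas and the
    word following the second comma is a verb, verb form, or infinitive,
    the material between the commas is deemed parenthetic and deleted."""
    match = re.search(r",[^,]*,\s*(\w+)", sentence)
    if match and match.group(1).lower() in VERB_FORMS:
        return sentence[:match.start()] + " " + sentence[match.start(1):]
    return sentence
```

The rule is conservative by design: when the word after the second comma is not recognized as a verb form, the sentence is left untouched rather than risk deleting essential material.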
The Edmunson research contributed to this research in
the following way. Edmunson tested a number of methods
that had been suggested for generating extracts and found
that a combination of methods worked best. Following
Edmunson's lead, a simple summate of functions was
developed which can be parameterized (and easily adjusted),
and which includes terms reflecting all the approaches that
have been shown to be useful in selecting sentences and
phrases from a document.
Paice presented an approach which built an extract
based on the selection of indicative phrases. His approach
spends considerable effort identifying key phrases in a
journal article which describe the purpose or intent of the
research. For the algorithm used in this research, it was
felt that since the nature of the texts to be extracted in
the experiment and subsequently in a business document base
was not at all similar to journal publications, we could
not expect to find such indicative phrases. However, Paice
does describe a useful approach to resolving exophora which
was adopted in the present research. Exophora are words
within a sentence which require reference to a prior or
following sentence for resolution; the most common example
of this is a pronoun which has its antecedent in a previous
text unit. Exophora present in a sentence which has been
indicated as a good candidate for inclusion in an extract
imply that the sentences which are referred to should also
be included, so that the exophoric reference is not left
unresolved in the final product.
The basic structure of the extracting algorithm used
in this research consists of three steps. These are: (1)
remove parenthetic material, (2) determine the sentence
weights for the remaining material, and (3) build the
extract by selecting the highest weighted sentences and
resolving the exophoric references. Each of these three
steps will be discussed in turn.
The algorithm was developed and first tested on four
documents which were used in a previous research study
(Kasper and Morris, 1988). These four documents were
sample reading comprehension tests used for the GMAT
(Educational Testing Service, 1984). Once the parameters
and procedures were worked out, four passages were randomly
selected from a more recent version of sample GMAT reading
comprehension tests (Educational Testing Service, 1986) and
the algorithm was used to produce extracts for those
passages. These latter passages and extracts were used in
the experiment.
The first step in the procedure is to remove
parenthetic material from the text. Three operations were
required to do this. First, all text contained within
parentheses or between a pair of dashes was removed.
Second, using the rule taken from the ADAM system
described above, any material contained within a pair of
commas where the second comma was followed immediately by a
verb, verb form, or by an infinitive, was assumed to be
parenthetic and was deleted. Third, "padding" expressions
were deleted. Examples of padding expressions included "In
fact," "Indeed," "Of course," "In any case," etc. The
effect of the first step was to reduce the document length by
about five to ten percent.
The second step used in the procedure was to determine
a weight for each sentence. The weight used is a simple
summate of variables, each with a weighting factor. The
weighting factors reside in the algorithm as parameters,
which can easily be adjusted to different levels to achieve
the desired effect. The variables used in the summate were
(1) the number of words in the sentence which are title
words (non-trivial words in the title of the passage), (2)
counts of the words in the sentence that were in each of
the categories of the word control list, (3) two indicator
(zero-one) variables which designated a sentence as being
either the first or the last sentence in a paragraph, and
(4) a count of the high-frequency non-trivial words
contained in the sentence (the definition of high frequency
is also a parameter).
A matrix was constructed for each passage, which
consisted of the values of each variable for each sentence,
and the vector of parameters was then applied. The result is a
vector of sentence weights. Working with the test
passages, the parameters were adjusted to produce a
reasonable extract; however, little effort was spent trying
to optimize the parameters as only two iterations were
performed. The vector of parameters used consisted of the
following values: the weight for title words was 2.5, the
weight for first position in a paragraph was 1.5, for last
position 0.5, the weight for the count of very positive WCL
words was 2.0, for positive WCL words 1.5, for very
negative WCL words -1.5, for negative WCL words -1.0, and
for high frequency words 0.1. A word was deemed to be a
high-frequency word if it occurred more than once per
hundred words of text. These parameters worked well with
the GMAT passages used in this research. For other
document sets, different parameter values may be more
successful.
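The weighted summate can be sketched directly from the parameter values reported above. The parameters are those given in the text; the word control list entries and sample inputs are illustrative, not the study's actual lists.

```python
# Parameter values as reported in the text; the word control list (WCL)
# entries supplied by callers are illustrative, not the study's lists.
PARAMS = {"title": 2.5, "first": 1.5, "last": 0.5,
          "very_pos": 2.0, "pos": 1.5, "very_neg": -1.5, "neg": -1.0,
          "high_freq": 0.1}

def sentence_weight(tokens, title_words, wcl, high_freq_words,
                    is_first, is_last, params=PARAMS):
    """Simple summate of weighted counts for one sentence. `tokens` is the
    sentence as lower-case words; `wcl` maps a word to its category
    ('very_pos', 'pos', 'neg', or 'very_neg')."""
    w = params["title"] * sum(1 for t in tokens if t in title_words)
    w += sum(params[wcl[t]] for t in tokens if t in wcl)
    w += params["high_freq"] * sum(1 for t in tokens if t in high_freq_words)
    if is_first:
        w += params["first"]
    if is_last:
        w += params["last"]
    return w
```

Because the parameters are a plain dictionary, tuning the algorithm for a different document set amounts to supplying a different `params` argument, which reflects the adjustability described in the text.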
The third step in the extract procedure was to build
the extract. This was accomplished by (1) selecting the
highest weighted sentence; (2) checking to see if that
sentence had any exophoric references, and if so, adding
the indicated sentences; (3) checking the sentence
following the selected sentence to see if that sentence had
exophoric references back to the selected sentence, and if
so, adding that sentence; (4) iterating (2) and (3) on the
newly selected sentences until all exophora were resolved;
(5) counting the number of words in the set of sentences
selected thus far and comparing the total to a stopping
criterion. If the total was less than the stopping
criterion, the process began again with step (1), selecting
the next highest weight among the unselected sentences.
The stopping criterion is another parameter; for the
purposes of this research two values were tested, 100 and
200 words. The rules for exophora presented by Paice
(1981) were followed without modification or exception.
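The selection loop described in steps (1) through (5) can be sketched as follows. The study applied Paice's (1981) exophora rules exactly; those rules are more elaborate than the crude pronoun test used here, which is an illustrative stand-in only.

```python
# Pronoun test as a crude stand-in for Paice's (1981) exophora rules; the
# study followed Paice's rules, which are more elaborate than this.
PRONOUNS = {"it", "this", "these", "they", "he", "she", "such"}

def looks_exophoric(sentence):
    first = sentence.split()[0].lower().strip(",.;")
    return first in PRONOUNS

def build_extract(sentences, weights, stop_words=100):
    """Select the highest-weighted sentence, pull in neighbouring sentences
    needed to resolve (this sketch's notion of) exophoric references, and
    repeat until the selected text reaches `stop_words` words."""
    chosen = set()
    for i in sorted(range(len(sentences)), key=lambda i: -weights[i]):
        if i in chosen:
            continue
        chosen.add(i)
        j = i                                   # resolve references backward
        while j > 0 and looks_exophoric(sentences[j]):
            j -= 1
            chosen.add(j)
        if i + 1 < len(sentences) and looks_exophoric(sentences[i + 1]):
            chosen.add(i + 1)                   # a following dependent sentence
        if sum(len(sentences[k].split()) for k in chosen) >= stop_words:
            break
    return [sentences[k] for k in sorted(chosen)]
```

Note that, as in the study, the word count is checked only after all references are resolved, so the extract length is always at least the stopping criterion and may exceed it considerably.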
The process generated extracts which appeared
reasonable. The use of the exophora-resolution procedure
was especially important in terms of making the extracts
readable. Copies of the extracts which were used in the
experimental study are presented in Appendix A.
Experiment Design
A blocked design was used where each subject received
all treatments. This design takes advantage of the good
statistical power available through the use of repeated
measures (Horton, 1978). The order of the treatments was
assigned in such a way as to allow each of the 24 possible
combinations to be used once. This minimizes any possible
learning effect. In order to isolate the message
difference effect, four passages were used for each
subject and assigned across treatments. Combining the
passage and the treatment orderings results in the
treatment pairings. A complete listing of the treatment
and passage pairings is given in Table 4.1.
Table 4.1. Order of treatment and passage pairs in the experimental design.

SUBJ   POSITION 1       POSITION 2       POSITION 3       POSITION 4
  1    Full Text,  D    Abstract,   C    Short Ext., B    Long Ext.,  A
  2    Full Text,  C    Abstract,   D    Long Ext.,  A    Short Ext., B
  3    Full Text,  B    Short Ext., A    Abstract,   D    Long Ext.,  C
  4    Full Text,  A    Short Ext., B    Long Ext.,  C    Abstract,   D
  5    Full Text,  A    Long Ext.,  D    Abstract,   B    Short Ext., C
  6    Full Text,  D    Long Ext.,  A    Short Ext., C    Abstract,   B
  7    Abstract,   D    Full Text,  C    Short Ext., A    Long Ext.,  B
  8    Abstract,   C    Full Text,  D    Long Ext.,  B    Short Ext., A
  9    Abstract,   B    Short Ext., A    Full Text,  C    Long Ext.,  D
 10    Abstract,   A    Short Ext., B    Long Ext.,  D    Full Text,  C
 11    Abstract,   B    Long Ext.,  C    Full Text,  A    Short Ext., D
 12    Abstract,   C    Long Ext.,  B    Short Ext., D    Full Text,  A
 13    Short Ext., D    Full Text,  B    Abstract,   C    Long Ext.,  A
 14    Short Ext., C    Full Text,  A    Long Ext.,  D    Abstract,   B
 15    Short Ext., B    Abstract,   D    Full Text,  A    Long Ext.,  C
 16    Short Ext., A    Abstract,   C    Long Ext.,  B    Full Text,  D
 17    Short Ext., B    Long Ext.,  C    Full Text,  D    Abstract,   A
 18    Short Ext., C    Long Ext.,  B    Abstract,   A    Full Text,  D
 19    Long Ext.,  D    Full Text,  B    Abstract,   A    Short Ext., C
 20    Long Ext.,  C    Full Text,  A    Short Ext., B    Abstract,   D
 21    Long Ext.,  B    Abstract,   D    Full Text,  C    Short Ext., A
 22    Long Ext.,  A    Abstract,   C    Short Ext., D    Full Text,  B
 23    Long Ext.,  D    Short Ext., A    Full Text,  B    Abstract,   C
 24    Long Ext.,  A    Short Ext., D    Abstract,   C    Full Text,  B
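The 24 subject orderings in Table 4.1 correspond to the 4! permutations of the four treatments. As an illustrative sketch (not the study's code), they can be enumerated as follows; the pairing of each ordering with a passage sequence follows the table itself.

```python
from itertools import permutations

TREATMENTS = ("Full Text", "Abstract", "Short Ext.", "Long Ext.")

def treatment_orderings():
    """All 4! = 24 orderings of the four treatments, one per subject, so
    that no two subjects see the treatments in the same order."""
    return list(permutations(TREATMENTS))
```

Using every permutation exactly once balances position effects across treatments, which is what minimizes any possible learning effect in the design.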
Dependent Variables
The primary dependent variable was the subject's score
(percentage correct) on the multiple choice questions given
following each passage. This served as an estimate of the
comprehension of the material in the text.
A second dependent variable was the time spent reading
the passages, which was measured by the computer program.
It was expected that the subjects would spend less time
reading the shorter treatments.
An important consideration in this study is the
reaction of the subjects to the extracts, which may appear
disjointed and distracting to read. As was mentioned
above, this is one possible reason why the techniques of
automatic abstracting have not been adopted by the
secondary source database industry. To measure the
attitude of the subjects concerning the difficulty of the
treatment text to read, an unlimited response direct
judgment scale was used (Green and Tull, 1978). The scale
was anchored by the terms "VERY DIFFICULT TO READ" and
"VERY EASY TO READ." There were four scales for the
subject's response on one sheet of paper, with two example
scales and some brief instructions. The subjects marked
the reading difficulty scale immediately after reading each
passage. This procedure allowed the subjects to rate each
passage relative to the others.
The fourth dependent variable was measured
subjectively in a similar manner. This was an unlimited
direct response scale with the anchors, "LITTLE OR NO
INFORMATION AVAILABLE," and "ALL INFORMATION AVAILABLE."
These scales, four on one page with two example scales as
in the case of the reading difficulty scales, were
administered to the subjects after completing the questions
for each passage, before moving on to the next passage.
The scale was intended to measure the subjects' impression
of how much of the information needed to answer the
questions was included in the treatment passage.
Subjects
Subjects for the experiment were twenty-four self-
selected graduate and upper-division undergraduate students
from the College of Business at Florida State University.
The subjects were appropriate for the task, which was to
read and comprehend a text passage presented on a computer
screen, and to then answer some multiple choice questions
about the passages. The GMAT test is normally administered
to the same population of students. A cash stipend of five
dollars was paid to each participant, and an award of
twenty-five dollars for the highest score on the
comprehension test was offered; ten dollars was offered to
the subjects who had the second and third highest scores.
The use of cash awards was intended to motivate the
subjects to do their best on the questions, which were in
some cases fairly difficult.
The average age of the subjects was 22 years, and most
had previous employment experience. In addition, all the
subjects indicated that they had at least moderate
experience with computers or computer terminals.
Procedures
The experiment was administered in a student
microcomputer lab, which had a room that could be reserved
for this purpose. This offered several advantages, in that
the subjects were familiar with the setting and equipment,
and yet by reserving the room the subjects were able to
participate in the experiment with a minimum of
distractions.
Each subject took all four treatments in a single
session, without having to get up or leave the
microcomputer at which he or she was seated. At the
beginning of the experiment, subjects completed a brief
questionnaire designed to collect biographic and background
information relevant to the study. The researcher then
started the computer program for the subject, entering the
sequence number described above which determined the
treatment-passage pairs and their order for each subject.
After that, the computer program was self-explanatory.
When the subject completed reading each passage, the
program instructed them to ask for the questions for that
passage from the proctor (the researcher). Using the
program in this way, the experiment was administered to
several subjects at the same time without any difficulties
or confusion. The order of administration for each
treatment was, in every case, as follows: (1) read the
passage, (2) mark the reading difficulty scale, (3) take
the comprehension test, and (4) mark the information
availability scale. After the conclusion of the four
passages, the subjects were spoken to briefly and given an
opportunity to comment on the experiment (only one comment
was made with any regularity, that the passages were
difficult and a bit boring).
Summary of Experimental Methodology
The purpose of the experiment described in this
chapter is to test the effectiveness of the computer-based
extracting and abstracting techniques which are used in the
system described above. The results of this experiment
provide direction for the development of the system. If
the extract treatments are not significantly less effective
in terms of the comprehension of the information in the
texts, then the model system presented above can use
existing techniques to condense text-based information.
The experiment uses a repeated measures design with
four treatments, and a general linear model was presented
for analysis of the data. The treatments consist of
reading comprehension tests derived from a sample
standardized test. The test passages were presented on a
microcomputer terminal through a custom interface which
looked like an electronic mail system, and the subjects
were business students. Formal research hypotheses were
developed for each of four dependent variables:
comprehension score, reading time, subjective reading
difficulty rating, and subjective information availability
rating. In addition, simultaneous confidence intervals for
the mean differences for all pairwise combinations of
treatments for each of the dependent variables are of
interest.
CHAPTER V
ANALYSIS
This chapter presents the analysis of the data from
the experiment described in the preceding chapter. In the
first part of this chapter, there is a discussion of the
overall approach used in the analysis and a brief
discussion of some simple descriptive statistics. The
second section discusses the analysis of the comprehension
scores. Following that, discussions of the hypotheses
related to the reading time, reading difficulty, and
information availability variables are presented. The last
section of this chapter summarizes the analysis and
hypothesis test results.
Overview of Analysis
The data were analyzed using the SAS software package,
running on an IBM 3090 computer under the MVS/XA operating
system in batch mode. In agreement with the research model
presented above, the analysis in most cases used the
general linear model paradigm to describe the effects
observed in the experiment. In SAS, the GLM procedure is
used to examine general linear models (SAS Institute, Inc.,
1985). For each of the four hypotheses listed above, the
model
Y(ijk) = u + a(i) + b(j) + c(k) + e(ijk)
was examined, where Y(ijk) is the observed value of the
dependent variable (a different variable for each of the
four hypotheses); u is the overall mean of Y, an unknown
constant; a(i) is the subject effect, where i varies from 1
to 24; b(j) is the passage effect, where j varies from 1 to
4; c(k) is the treatment effect, where k varies from 1
to 4; and e(ijk) is the random error associated with each
of the 96 observations. For each model, there are 95 total
degrees of freedom, 66 error degrees of freedom, and 29
model degrees of freedom. Four analysis of variance tables
are presented in the following sections, which include the
standard summary statistics as well as type III sum of
squares tests of the main effects.
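The degrees-of-freedom accounting in this paragraph can be verified with a short sketch (Python here, although the original analysis was run in SAS; the factor sizes are those stated in the text):

```python
# Degrees-of-freedom accounting for the additive model
#   Y(ijk) = u + a(i) + b(j) + c(k) + e(ijk)
n_subjects, n_passages, n_treatments = 24, 4, 4
n_obs = 96   # one observation per subject per treatment

total_df = n_obs - 1                                                 # corrected total: 95
model_df = (n_subjects - 1) + (n_passages - 1) + (n_treatments - 1)  # 23 + 3 + 3 = 29
error_df = total_df - model_df                                       # 66

print(total_df, model_df, error_df)  # 95 29 66
```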
For each of the four hypotheses, the six possible
pairwise comparisons of the treatment means are of
interest. To control the Type I error rate, the pairwise
comparisons were examined simultaneously by constructing
Bonferroni confidence intervals. The Bonferroni method
uses critical values from the Student's t distribution,
dividing the overall alpha-level of the confidence
intervals (in this case .05) by the number of comparisons
101
to be made (in this case six). The Bonferroni method is
discussed in Johnson and Wichern (1984, p. 197) and in SAS
Institute, Inc. (1985, p. 470).
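As a sketch of the Bonferroni computation just described (Python rather than SAS; the mean square error, group size, and tabulated critical value are taken from Table 5.4 below rather than computed here):

```python
import math

# Bonferroni-adjusted simultaneous intervals for six pairwise comparisons.
alpha = 0.05
n_comparisons = 6                        # 4 treatment means -> 4*3/2 = 6 pairs
per_comparison_alpha = alpha / n_comparisons

mse = 0.02894      # mean square error of the comprehension model (Table 5.4)
n_per_mean = 24    # observations behind each treatment mean
t_crit = 2.7201    # tabulated t for the adjusted alpha (two-sided), 66 df

# Half-width of each interval: the "minimum significant difference."
msd = t_crit * math.sqrt(2 * mse / n_per_mean)
print(round(msd, 4))  # 0.1336
```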
Analysis of the residual errors (differences between
the predicted and observed values of Y) is used to examine
the equal variances assumption which underlies the general
linear model paradigm. In addition, models on the ranks of
the dependent variables (Conover and Iman, 1976; Conover,
1980, p. 236) were tested as a check for the presence of
serious outliers.
Table 5.1 presents the sample means and standard
errors for each of the four dependent variables measured in
the experiment. The values are printed by passage and by
treatment, as well as for the total dataset. Appendix B
contains additional tables which present the raw data in
more detail.
The dependent variable associated with the first
hypothesis is comprehension score. Each comprehension
score was determined by summing the correct answers on the
comprehension test and dividing by eight, the number of
questions per test. Thus the variable is discrete, with
the set S of all possible values given as
S = {.000,.125,.250,.375,.500,.625,.750,.875,1.00}.
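A one-line check of this nine-point grid (Python; the only input is the eight items per test):

```python
# Attainable comprehension scores: correct answers (0..8) over 8 questions.
n_questions = 8
S = [k / n_questions for k in range(n_questions + 1)]
print(S[1], S[-1])  # 0.125 1.0
```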
Table 5.1. Sample means and standard errors by treatment and by passage for four dependent variables.

                            Passage A  Passage B  Passage C  Passage D  Abstract  Full Text  Long Ext.  Short Ext.   Total
Comprehension Score   mean     .500      .620       .693       .521      .641      .620       .547       .526        .583
                      s.e.     .043      .038       .034       .040      .034      .041       .039       .049        .023
Reading Time          mean   239.04    199.79     168.54     167.46    101.42    342.83     184.46     146.13      193.71
                      s.e.    26.17     35.46      16.98      19.94      7.79     28.47      15.86      14.64       12.96
Reading Difficulty    mean     .504      .361       .263       .325      .211      .459       .423       .360        .363
                      s.e.     .050      .050       .045       .048      .030      .046       .054       .056        .025
Information           mean     .468      .452       .519       .446      .368      .656       .537       .323        .471
  Availability        s.e.     .044      .053       .041       .052      .041      .041       .043       .032        .024
The discrete nature of the dependent variable raises
questions concerning the equal variance assumption,
particularly if the data were skewed either toward the top
or bottom of the distribution. However, as can be seen
from Table 5.1, the mean values were close to the midpoint
of the Y range, and the variability was not excessive.
Table 5.2 shows frequency tables for the comprehension
scores by passage and by treatment. In both cases it
appears that there is some skewness to the right, although
not severe. Note that in none of the observations did any
subject fail to get at least one question right, nor was
any subject able to achieve a perfect score on any of the
eight-question tests.
The other three dependent variables are continuous
variables; however, they were measured discretely. For
example, the computer program described in the previous
chapter measured the time (a continuous variable) from the
start of each treatment reading session to the time the
subject signaled they were finished reading the passage (by
typing the "End" key). Each measurement was rounded off to
the nearest second (i.e., the resulting values are discrete
integers). Likewise, for the reading difficulty scales and
the information availability scales, distance along a scale
is continuous, but for practical reasons the measurements
were rounded to the nearest sixteenth of an inch. To
Table 5.2. Frequency counts for comprehension score results.

Score    Passage A  Passage B  Passage C  Passage D   Total
0.125        1          1          0          1          3
0.250        5          0          1          2          8
0.375        4          2          2          5         13
0.500        4          5          1          7         17
0.625        4          9          4          4         21
0.750        5          2         11          3         21
0.875        1          5          5          2         13

Score    Abstract  Full Text  Long Ext.  Short Ext.   Total
0.125        0          0          0          3          3
0.250        1          2          2          3          8
0.375        2          2          7          2         13
0.500        4          6          4          3         17
0.625        6          5          4          6         21
0.750        8          3          5          5         21
0.875        3          6          2          2         13
obtain the actual value for the difficulty scale and the
information availability scale used in the analysis, the
inches from the left of the scale to the subject's mark (to
the nearest sixteenth inch) was divided by 6.5, the total
length of the scale in inches. This ratio represents the
proportion of the distance along the scale. For example,
the left-hand anchor for the readability scale is "VERY
EASY TO READ"; thus a score of .010 (the smallest value
observed) represents a passage which the subject felt was
very readable, while a score of .913 (the largest
observation) represents a passage which the subject felt
was very difficult. Likewise, for the information
availability scale the minimum observation of .096
represents the subject's view that little or no information
was available to answer the questions, while the maximum
response value of .962 indicates the subject felt that
almost all of the needed information was contained in the
passage.
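This conversion can be sketched as a small function (Python; the function name and example mark position are ours, while the sixteenth-inch rounding and 6.5-inch scale length are as stated above):

```python
def scale_proportion(mark_inches, scale_length=6.5):
    """Convert a subject's mark position (inches from the left anchor)
    to a proportion of the scale, after rounding the measurement to
    the nearest sixteenth of an inch as described in the text."""
    sixteenths = round(mark_inches * 16)       # measure to nearest 1/16 inch
    return (sixteenths / 16) / scale_length    # proportion of scale length

print(scale_proportion(3.25))  # a mark at the midpoint -> 0.5
```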
An additional measurement was taken during the
experiment: the number and value of the keys pressed by
the subjects, as well as the time (in seconds) from the
start of each reading session to the moment of each
keystroke were recorded. By examining these data, it was
possible to determine which subjects had taken the
opportunity to review the passages before asking for the
test questions. Recall that in a previous study of text-
based information systems, which did not allow the subjects
to review the text in the electronic mail treatment, many
of the subjects observed that they would like to have
reviewed the passages before answering the test questions
(Kasper and Morris, 1988). The results here confirm this:
twenty-one of the twenty-four subjects used the "PgUp" key
and reviewed the passages. Of these, all but two took
substantial time to read over the passage; in the two
exceptions the subjects appeared to glance back briefly and
then move on. Of the twenty-four subjects, only three did
not use the "PgUp" key at all. Since such a small group of
subjects failed to review the passages, we made no attempt
to differentiate population characteristics of subjects who
review from those who do not. Furthermore, little benefit
could be gained by such analysis; the obvious conclusion is
that any text-based information system should provide the
ability to review and re-read a document.
Analysis Related to Comprehension
This section contains the analysis for the first
dependent variable, comprehension score, and presents the
results for hypothesis one. This section also contains
confidence intervals for the six pairwise comparisons of
the difference in mean comprehension scores by treatment.
Main Effects Analysis
The primary research hypothesis, stated in null form,
is that there is no effect on comprehension score due to
the treatments after removing the effects due to subject
and passage. Table 5.3 presents the analysis of variance
table, along with the type III sum of squares tests for the
main effects. As is readily observed from the table, both
the subject and passage had a highly significant effect on
comprehension score, but the effect due to the treatment
was not significant. Thus we do not reject hypothesis one.
This implies that there was no significant reduction in
comprehension in the different treatments; i.e., the
extracting algorithms appear to have produced extracts that
were sufficiently informative such that the subjects were
able to answer the comprehension questions as well as they
did in the other treatments.
However, we expect that there really is a reduction of
information in the extract treatments, since the extracts
consist of sets of sentences selected from the original
texts. And in fact, the p-value of .0632 for the type III
sum of squares test of the treatment effect suggests that
an effect may exist, even though this analysis does not
have enough power to detect it. Further analysis of the
nature of possible treatment differences is presented in
Table 5.3. Main effects analysis for comprehension score.

SOURCE             DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
MODEL              29        2.017          0.0695       2.40    0.0017
ERROR              66        1.910          0.0289
CORRECTED TOTAL    95        3.927

R-SQUARE    C.V.      ROOT MSE    Y MEAN
 0.5136    29.1639     0.1701     0.5833

SOURCE      DF   TYPE III SS   F VALUE   PR > F
SUBJECT     23     1.2161        1.83    0.0300
PASSAGE      3     0.5794        6.67    0.0005
TREATMENT    3     0.2214        2.55    0.0632
the next subsection, the discussion of multiple comparisons
of pairwise differences.
A model of the main effects with the inclusion of the
passage-treatment interaction term was also examined, as
well as models which included the treatment position effect
(whether the treatment was the subject's first, second,
third, or fourth passage), both with and without
interaction terms. In all of these models, the treatment
effect remained insignificant (p-value > .05) and none of
the additional terms contributed significantly to the
model. Also, the rank transformation model was examined,
the results of which were very similar to the main effects
model in Table 5.3 (the p-value for the treatment effect in
the rank transformation model was .10), implying that there
is no serious problem with outliers.
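The Conover and Iman rank transformation referred to here replaces each observation by its rank (with tied values sharing the average of the ranks they occupy) before the same linear model is refit; a minimal ranking sketch, assuming midranks for ties (Python):

```python
def midranks(values):
    """Rank-transform observations, assigning tied values the average
    of the ranks they occupy (midranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(midranks([0.500, 0.625, 0.500, 0.750]))  # [1.5, 3.0, 1.5, 4.0]
```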
To check the main effects model of Table 5.3 for
constant error variance, the plot of residuals versus
predicted values was prepared. This is presented in Figure
5.1. We would expect, given (1) the discrete character of
the data, (2) the fact that the frequency table showed a
slight skew to the right, and (3) the upper bound of the
data, that there may be a reduction in error variance as
the predicted values approach their upper bound. However,
while some evidence of a reduction in error variance
appears at the right-hand side of the plot, for the most
[Plot omitted: residual errors versus predicted values]
Figure 5.1. Plot of residual errors versus predicted values for comprehension score model. Legend is A = 1 observation, B = 2 observations, etc.
part there seems to be a fairly even distribution of the
errors, implying that there is not a serious problem with
the assumption of constant variance. Note that the
diagonal lines apparent in the plot are characteristic of
discrete data where there are few values in the set of
possible values.
Multiple Comparisons of Means
To examine the differences between the treatment
means, Bonferroni confidence intervals were prepared.
These are presented in Table 5.4, and are in agreement with
the overall conclusion of hypothesis one; that is, none of
the differences between treatment means is significantly
different from zero, since the simultaneous confidence
intervals all contain the zero value.
In spite of the fact that none of the comparisons is
significantly different than zero, the a priori realization
that a reduction in comprehension score is expected due to
the reduction in the amount of text presented in the
different treatments must be recalled. Bonferroni
comparisons are conservative; the true alpha-level is
something less than .05 and the stated intervals are
slightly wider as a result (of course, the intervals are
also wider than intervals not adjusted for multiple
comparisons). A simple unadjusted pairwise t-test between
Table 5.4. Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment comprehension score means. DF are 66, mean square error is .02894, critical value for t is 2.7201, and minimum significant difference is .1336.

Treatment Comparison      Lower Limit   Difference Between Means   Upper Limit
Abstract - Full Text         -.113               .021                  .154
Abstract - Long Ext.         -.040               .094                  .227
Abstract - Short Ext.        -.019               .115                  .248
Full Text - Long Ext.        -.061               .073                  .207
Full Text - Short Ext.       -.040               .094                  .227
Long Ext. - Short Ext.       -.113               .021                  .154
the abstract treatment (which had the highest treatment
mean) and the short extract treatment (the lowest treatment
mean) had a p-value of .0227 (none of the other unadjusted
pairwise t-tests for mean difference between treatments had
p-values less than .05).
However, it can be stated with at least 95% confidence
that each of these intervals contains the difference
between the population means for the treatments. Thus
subjects reading automatic extracts of short text passages
can be expected to perform as well as, better than, or (as a
worst case) no more than 22.7% less than subjects reading
the full text (in terms of comprehension). This is in
spite of the fact that the reduction in length averaged
51.9% for the long extracts and 71.0% for the short
extracts.
Influential Test Items
Table 5.5 presents a listing of the number of correct
responses for each test item by treatment. Examining these
data, a clearer picture of the (insignificant) difference
due to the treatments emerges. For example, items A.l
through A.4 have correct response totals that are evenly
distributed across the treatments. Such items would have
influenced the analysis toward the conclusion of no
significant treatment effect. On the other hand, items A.5
Table 5.5. Correct responses to individual test items by treatment. The maximum possible score for each cell is 6; the maximum possible total for each test item is 24.

Results for Passage A
Item   Abstract  Full Text  Long Extract  Short Extract   Total
A1         0         1           1              0             2
A2         4         4           4              4            16
A3         6         5           5              6            22
A4         3         3           3              3            12
A5         3         4           0              1             8
A6         2         4           3              4            13
A7         5         4           1              3            13
A8         2         1           4              3            10
Total     25        26          21             24            96

Results for Passage B
Item   Abstract  Full Text  Long Extract  Short Extract   Total
B1         6         6           6              5            23
B2         4         5           4              5            18
B3         5         6           5              4            20
B4         6         4           4              4            18
B5         4         3           2              3            12
B6         0         1           1              2             4
B7         2         5           3              3            13
B8         4         2           4              1            11
Total     31        32          29             27           119

Results for Passage C
Item   Abstract  Full Text  Long Extract  Short Extract   Total
C1         5         6           6              6            23
C2         3         3           5              3            14
C3         5         5           5              5            20
C4         0         1           1              1             3
C5         6         6           5              5            22
C6         4         4           1              2            11
C7         6         5           6              6            23
C8         5         5           4              3            17
Total     34        35          33             31           133

Results for Passage D
Item   Abstract  Full Text  Long Extract  Short Extract   Total
D1         6         5           4              5            20
D2         6         2           4              3            15
D3         3         3           0              1             7
D4         6         5           5              2            18
D5         1         2           0              2             5
D6         5         6           3              4            18
D7         4         3           5              2            14
D8         2         0           1              0             3
Total     33        26          22             19           100
and A.7 are skewed toward the full text and abstract
treatments with fewer correct responses in the extract
treatments, influencing the analysis in the opposite
direction. In this subsection, an anecdotal analysis of
selected items is given to further explain the subjects'
performance relative to the different treatments.
Since each question had five multiple choice answers,
those test items in which there were fewer than five or six
correct responses among the twenty-four subjects,
especially where the correct responses were spread across
treatments, indicate little about the results other than
that those test items were difficult. If the probability
of a correct answer were one out of five (a random guess),
then the expected number of correct responses for each test
item by treatment would be 1.2. The expected total for
each item over all treatments would be 4.8 correct
responses, while the expected total of each eight-question
test by treatment would be 9.6, and the expected total over
all treatments would be 38.4. A few test items (e.g., A.l,
B.6, C.4, and D.8) were so difficult that subjects could do
no better than if they had merely guessed the answer
blindly (each had four or fewer correct responses).
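The chance-level figures quoted in this paragraph follow directly from the one-in-five guessing probability; a sketch of the arithmetic (Python; six subjects per treatment within each passage, as implied by the design):

```python
p_guess = 1 / 5          # five answer choices per multiple choice item
cell_size = 6            # subjects per treatment within a passage
items_per_test = 8
treatments = 4

per_item_per_treatment = p_guess * cell_size                       # expected 1.2
per_item_total = per_item_per_treatment * treatments               # expected 4.8
per_test_per_treatment = per_item_per_treatment * items_per_test   # expected 9.6
per_test_total = per_test_per_treatment * treatments               # expected 38.4

print(per_item_per_treatment, per_item_total,
      per_test_per_treatment, per_test_total)
```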
However, all of the test totals by treatment were well
above the 9.6 figure, indicating that on average, subjects
in all treatments were able to do better than random
guessing. To prove this, a simple one-sample t-test of the
null hypothesis that the mean of the population treatment
totals is equal to 9.6, with sigma unknown and 15 degrees
of freedom, gives a test-statistic of 22.55, which has a p-
value less than .0001 (Conover and Iman, 1983, p.237).
In fact, many items had a large number of correct
responses, and these were often evenly spread across
treatments. Examples include A.2, A.3, B.l through B.5,
and others. Many of these questions were general in
nature, tending to require the subjects to have a good
understanding of the primary meaning or thrust of the
passage. For example, question A.2 asks for "the primary
purpose of the passage," question B.l is designed to see if
the subject can determine what the "passage is most
probably an excerpt from," and question C.7 asks the
subject to identify the source the "passage most likely
appeared in." A prima facie examination of those items in
which all treatments had roughly the same number of correct
responses shows that most of the questions require a good
understanding of the overall meaning and purpose of the
passages, a "feel" for the intent and motivation of the
original authors.
Other questions, however, exhibited a greater range of
response totals among the treatment groups. For example,
four subjects in both the full text and abstract treatments
responded correctly to question C.6, but there were only
three correct responses in the long and short extract
treatments combined. A correct response to this question
required that the reader understand the financial
statistics mentioned in the second paragraph of the full
text treatment. These statistics were excluded from the
extracts, and do not contribute much to the overall
understanding of the passage; however, the author of the
abstracts included them. As a result, subjects in the
extract treatments did not do well relative to the full
text and abstract treatments for that question. The
omission of the statistics from the extracts did not
seriously hurt the overall performance of the subjects on
the passage C tests, however, as can be seen from the
treatment totals.
Other items also appear to depend upon the presence or
absence of one or two phrases in the text, which may or may
not have been included in the extracts. Examples would
include A.5, A.7, B.7, B.8, C.8, D.2, D.3, D.4, D.7, and
D.8. The reader may examine the instruments used in the
experiment to confirm this; they are presented in Appendix
A. For example, a correct response to item A.5 depended on
the subject's realization that writers use injustices to
elicit sympathy and support for the victims in the minds of
their readers. The phrase from the full text that provides
the most information for the correct response is that
certain authors "often enlist their readers on the side of
their tragic heroines by describing injustices so cruel
that readers cannot but join in protest." The author of
the abstract chose to include a modified version of this
sentence, but neither of the extracts contained it. The
results, which included seven correct responses for the
abstract and full text treatments, and only one correct
response in the extract treatments combined, were not
surprising.
Each of the other items listed above which appeared to
have a clear difference among treatments can also be shown
to be dependent on relatively unimportant phrases contained
in the text. The fact that these items played an important
role in the analysis of variance in the model is obvious,
but to make this point clear consider the following: if we
remove a single item, item A.5, from the computation of the
scores and run the model on the resulting data, the p-value
for the type III sum of squares test for the treatment
effect becomes .1961, where before (with all items) it was
.0632. In addition, the pairwise t-test for the difference
between the abstract and short extract treatment means
(unadjusted for multiple comparisons) mentioned above which
has a .0227 p-value, becomes insignificant with the removal
of item A.5 (p-value greater than .0548). Further, if just
three of the thirty-two items (A.5, A.7, and D.4) are
removed, then the treatment effect sum of squares test has
a p-value of .4961, while the same unadjusted pairwise t-
test has a p-value of .1660. While it is not valid to
arbitrarily remove the most influential items from the
analysis, this was done here merely to make the point that
a few items, for which the subjects had little information
in the extract treatments, and which relied upon
information that one would not necessarily expect to find
in an abstract or extract, accounted for a large portion of
the treatment effect variance as presented in Table 5.3.
Analysis Related to Reading Time
In this section the hypothesis related to the second
dependent variable is examined, along with the multiple
comparisons of treatment means. The dependent variable is
reading time, the amount of time in seconds from the moment
the subject signaled the computer to present the next
passage to the moment they indicated they were finished
reading by typing the "End" key. As discussed above, the
computer program used to present the passages to the
subjects recorded this variable without their knowledge.
Main Effects Analysis
We anticipate a highly significant treatment effect
for reading time, since the length of the extracts and the
abstracts was dramatically shorter than the full text in
each case. It is obvious that one of the key purposes of a
system which delivers abstracts of documents to interested
recipients (regardless of the extent to which the system is
or is not computer-based) is defeated if recipients do not
save time by reading abstracts as opposed to complete
documents.
Table 5.6 presents the analysis of variance for the
main effects model with reading time as the dependent
variable, along with the type III sum of squares test for
the main effects. As expected, there is a very significant
treatment effect, and we can soundly reject null hypotheses
two, that there is no effect due to treatment on reading
time after the effect due to subject and passage is
removed.
The ordering of the treatment means for reading time
(refer to Table 5.1) is also as expected, in that the full
text treatment took the most time, followed by the long
extract, the short extract, and lastly the abstract
treatment, which had a treatment mean of less than one
third the time of the full text treatment mean.
As in the case of the comprehension score model,
reading time models which included feasible interaction
terms and the effect of treatment position were examined
with similar results; that is, none of the other variables
Table 5.6. Main effects analysis for reading time.

SOURCE             DF   SUM OF SQUARES    MEAN SQUARE   F VALUE   PR > F
MODEL              29    1,217,093.42      41,968.74      8.78    0.0001
ERROR              66      315,444.42       4,779.46
CORRECTED TOTAL    95    1,532,537.83

R-SQUARE    C.V.      ROOT MSE    Y MEAN
 0.7942    35.6896     69.1336   193.7083

SOURCE      DF   TYPE III SS    F VALUE   PR > F
SUBJECT     23    340,606.33      3.10    0.0002
PASSAGE      3     81,949.00      5.72    0.0015
TREATMENT    3    794,538.08     55.41    0.0001
changed the above analysis, nor did they contribute
significantly to the model.
The constant variance assumption for the analysis in
Table 5.6 was examined as before by preparing a plot of
residuals versus the predicted values. This plot is
presented in Figure 5.2. There does appear to be an
outward funnel shape to the plot, which implies that the
assumption of constant variance is violated. The plot also
shows that there were a few outliers in this data, subjects
who spent a much longer time reading the passages than did
the rest of the sample. However, the effect of the
heteroscedasticity and the outliers should not change the
overall results of the analysis, especially since the
evidence for the treatment effect is so strong, and is in
agreement with prior expectations. It is not the intent of
this model to predict reading time as a function of
treatment, but merely to confirm the expectation that a
reduction in length of text will reduce the amount of time
spent reading that text.
Multiple Comparisons of Means
To consider the differences between the treatment mean
reading times, Table 5.7 presents Bonferroni simultaneous
confidence intervals for the six pairwise comparisons.
Those intervals which do not contain zero indicate a
[Plot omitted: residual errors versus predicted values]
Figure 5.2. Plot of residual errors versus predicted values for reading time model. Legend is A = 1 observation, B = 2 observations, etc.
Table 5.7. Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment reading time means. DF are 66, mean square error is 4779.46, critical value for t is 2.7201, and minimum significant difference is 54.286.

Treatment Comparison      Lower Limit   Difference Between Means   Upper Limit
Abstract - Full Text        -295.71            -241.42               -187.13
Abstract - Long Ext.        -137.33             -83.04                -28.75
Abstract - Short Ext.        -99.00             -44.71                  9.58
Full Text - Long Ext.        104.08             158.37                212.66
Full Text - Short Ext.       142.42             196.71                251.00
Long Ext. - Short Ext.       -15.96              38.33                 92.62
significant difference between treatment means; thus, we
have at least 95% simultaneous confidence that there is a
significant difference between the mean reading time in the
full text treatment and the mean reading time in each of
the other three treatments, as well as a significant
difference between the mean reading time in the abstract
treatment and the mean reading time in the long extract
treatment. However, there is insufficient evidence for a
significant difference between the mean reading time in the
abstract treatment as compared with the short extract
treatment, and insufficient evidence for a difference
between the mean reading time in the short extract
treatment as versus the long extract treatment.
The difference in mean reading time between the full
text treatment and each of the other three treatments can
be expressed in terms of a percentage reduction in reading
time by dividing the upper and lower limits of the
confidence intervals (for the three comparisons in Table
5.7 that involve the full text treatment) by the estimate
of the overall mean reading time in
the full text treatment, given in Table 5.1 as 342.83
seconds. We can then re-express the intervals (although
the confidence is no longer 95%): the reduction in mean
reading time in the abstract treatment relative to the
full text treatment is between 54.6%
and 86.3%; for the short extract treatment, the reduction
in mean reading time is between 41.5% and 73.2%; for the
long extract treatment, between 30.4% and 62.0%. These
estimates of the percentage reduction in mean reading time
are consistent with what would be expected given the
reduction in the length of the passages. Table 5.8 gives
the number of words per passage by treatment with the
percentage of reduction in text for each of the reduced
treatments. In all cases, the percentage reduction in text
is within the intervals just stated.
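The re-expression described above can be sketched as follows, using the full text mean of 342.83 seconds from Table 5.1 and the confidence limits from the three Table 5.7 comparisons that involve the full text treatment:

```python
full_text_mean = 342.83  # overall mean reading time, full text (Table 5.1)

# Confidence limits (seconds saved relative to full text) from the
# three Table 5.7 comparisons involving the full text treatment.
reductions = {
    "abstract":      (187.13, 295.71),
    "long extract":  (104.08, 212.66),
    "short extract": (142.42, 251.00),
}

for name, (lo, hi) in reductions.items():
    pct_lo = 100 * lo / full_text_mean
    pct_hi = 100 * hi / full_text_mean
    print(f"{name}: {pct_lo:.1f}% to {pct_hi:.1f}%")
# abstract ~54.6% to 86.3%; long extract ~30.4% to 62.0%;
# short extract ~41.5% to 73.2%
```

Note that dividing by an estimated mean means the nominal 95% simultaneous confidence no longer applies to the percentage intervals.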
Analysis Related to Reading Difficulty
Analysis of the third dependent variable, reading
difficulty, was conducted in a manner similar to that
presented in the preceding sections. Reading difficulty is
a" subjective concept, and the intent of the analysis was to
determine if there was a significant effect due to
treatment on the subject's perception of the difficulty of
the passages in the experiment. The passages were somewhat
difficult to start with, as discussed in the previous
chapter, and a concern was that the extracts may seem
disjointed or difficult to understand since they consisted
of sentences selected from different portions of the
passages.
The reading difficulty was measured by having the
subjects make a mark on a scale between two anchors. The
Table 5.8. Number of words and percentage reduction in text by passage.

             Full Text    Long Extract        Short Extract       Abstract
             Words        Words   % Reduced   Words   % Reduced   Words   % Reduced
Passage A     447          235      47.4       193      56.8        85      81.0
Passage B     470          204      56.6       108      77.0        98      79.1
Passage C     462          228      50.6       125      72.9        99      78.6
Passage D     450          212      52.9       103      77.1       100      77.8
Averages      457          220      51.9       132      71.0        96      79.1
left hand anchor was "VERY EASY TO READ," while the right
hand anchor was "VERY DIFFICULT TO READ." The reading
difficulty score was the percentage of the scale from the
left hand end to the subject's mark. The rank ordering of
the treatment means presented in Table 5.1 indicated that
on average, the subjects felt that the abstract was easiest
to read, followed by the short extract and the long
extract, while the full text was considered the most
difficult. This is not too surprising, since the length of
the passages might contribute to a perception of reading
difficulty. The passages were difficult, as evidenced by
the fog indices, and the more of each passage the subject
had to read, the more difficult the passage seemed.
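The fog indices mentioned here refer to Gunning's readability measure (Gunning, 1968). A rough sketch of the standard computation, assuming a naive vowel-group syllable counter and simple tokenization (an illustration only, not necessarily the scoring procedure used on the experimental passages):

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    """Gunning fog index: 0.4 * (avg sentence length + % complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)

sample = ("The organization communicated extensively. "
          "Managers required condensed information.")
print(round(fog_index(sample), 1))
```

Higher values indicate harder text; passages dense with polysyllabic words, like the sample above, score far higher than ordinary prose.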
Main Effects Analysis
A model was examined as in the previous analyses, with
the reading difficulty score as the dependent variable.
The results for this model are presented in Table 5.9.
There was a very significant treatment effect in the
analysis, and we can reject hypothesis three, that there is
no effect on perceived reading difficulty due to treatment
differences after removing the effect due to passage and
recipient.
Models with interaction terms and the position effect
were examined using reading difficulty as the dependent
Table 5.9. Main effects analysis for reading difficulty.
SOURCE             DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
MODEL              29       3.1774          0.1096       2.70    0.0004
ERROR              66       2.6798          0.0406
CORRECTED TOTAL    95       5.8572

R-SQUARE    C.V.      ROOT MSE   Y MEAN
0.5425      55.4822   0.2015     0.3632

SOURCE      DF   TYPE III SS   F VALUE   PR > F
SUBJECT     23      1.5660       1.68    0.0534
PASSAGE      3      0.7523       6.18    0.0009
TREATMENT    3      0.8591       7.05    0.0003
variable, with similar results. The added variables did
not change the results of the hypothesis test, nor did they
contribute significantly to the model.
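The entries in Table 5.9 are internally consistent and can be checked with the usual ANOVA identities; a small sketch using the printed sums of squares:

```python
# Values as printed in Table 5.9 (reading difficulty model).
ss_model, df_model = 3.1774, 29
ss_error, df_error = 2.6798, 66
ss_total = 5.8572  # corrected total (df = 95)

ms_model = ss_model / df_model      # 0.1096 in the table
ms_error = ss_error / df_error      # 0.0406 in the table
f_model = ms_model / ms_error       # overall F for the model
r_square = 1 - ss_error / ss_total  # proportion of variance explained

print(round(f_model, 2))    # ~2.70
print(round(r_square, 4))   # ~0.5425

# Type III F test for the treatment effect: (SS / df) / MSE.
f_treatment = (0.8591 / 3) / ms_error
print(round(f_treatment, 2))  # ~7.05
```

The same identities reproduce the F values and R-square reported for the information availability model in Table 5.11.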
The residual versus predicted values plot for the
reading difficulty model is presented in Figure 5.3. There
is some apparent heteroscedasticity, and it appears that
there are at least two outliers. This raises some question
about the stability of the estimates, and the mean square
error is possibly inflated. Therefore, confidence
intervals based on this model (such as presented in the
next subsection) are probably wider than they would
otherwise be.
Multiple Comparisons of Means
Bonferroni 95% confidence intervals for the difference
in treatment reading difficulty means were constructed.
These are presented in Table 5.10. There is evidence for
concluding that the mean reading difficulty scores were
significantly different between the full text and the
abstract treatments, and between the long extract and the
abstract treatments. However, there is no evidence of any
difference between the mean reading difficulty scores in
the full text treatment and either of the extract
treatments, nor is there evidence for a difference between
the abstract treatment and the short extract treatment.
Figure 5.3. Plot of residual errors versus predicted values for reading difficulty model. Legend is A = 1 observation, B = 2 observations, etc.
Table 5.10. Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment reading difficulty scale means. DF are 66, mean square error is .0406, critical value for t is 2.7201, and minimum significant difference is .1582.
Treatment Comparison      Lower Limit   Difference Between Means   Upper Limit
Abstract - Full Text         -.406             -.248                 -.089
Abstract - Long Ext.         -.370             -.212                 -.053
Abstract - Short Ext.        -.307             -.149                  .009
Full Text - Long Ext.        -.122              .036                  .194
Full Text - Short Ext.       -.060              .099                  .257
Long Ext. - Short Ext.       -.096              .063                  .221
Because of the problems with the stability of the model
mentioned above, however, it should be noted that the power
of these tests may be low.
Analysis of Information Availability
As in the case of the reading difficulty variable,
information availability is a subjective concept, and was
measured by asking the subjects to make a mark along a
scale. The left hand anchor for the scale was "LITTLE OR
NO INFORMATION AVAILABLE," while the right hand anchor was
"ALL INFORMATION AVAILABLE." Subjects were instructed to
mark these scales after answering the questions for each
passage in the experiment.
The treatment means for information availability
listed in Table 5.1 are not in the same order as the
results for comprehension. The order indicates that the
subjects felt they had more information in the full text
and long extract treatments than they did in the abstract
and short extract treatments. The variable appears on the
surface to be closely related to the length of the passage:
the longer the passage, the more information the subjects
felt they were getting.
Main Effects Analysis
Hypothesis four is that there is no effect on the
information availability score due to the treatments after
removing the subject and passage effects. To test this
hypothesis, a model was examined using the information
availability score as the dependent variable. The results
of this analysis are presented in Table 5.11. Based on the
type III sum of squares test, we can reject hypothesis four
and conclude that there is a significant treatment effect
on the subject's perception of information availability.
As a side note, it is interesting that the passage
effect was far from significant in this model (p-value =
.4849). It made little difference which passage the
subjects were reading; the main source of variance in the
information availability scale was the treatment effect.
As was done for the three previous dependent
variables, models were examined for information
availability which included interaction terms as well as
the treatment position term. None of these showed any
difference in the treatment effect, nor were any of the
added variables significant in the model.
A residual versus predicted values plot was again
constructed, and is presented in Figure 5.4. Note that
while there is some evidence of non-constant variance
apparent in the plot, on the whole the data seem to be more
in line with the model assumptions than was true in the
case of the reading difficulty variable. Nonetheless, the
Table 5.11. Main effects analysis for information availability.
SOURCE             DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
MODEL              29       3.0008          0.1035       3.18    0.0001
ERROR              66       2.1454          0.0325
CORRECTED TOTAL    95       5.1462

R-SQUARE    C.V.      ROOT MSE   Y MEAN
0.5831      38.2733   0.1803     0.4711

SOURCE      DF   TYPE III SS   F VALUE   PR > F
SUBJECT     23      1.2155       1.63    0.0647
PASSAGE      3      0.0804       0.82    0.4849
TREATMENT    3      1.7049      17.48    0.0001
Figure 5.4. Plot of residual errors versus predicted values for information availability model. Legend is A = 1 observation, B = 2 observations, etc.
heteroscedasticity may contribute to instability in the
model, particularly with respect to the confidence
intervals.
Multiple Comparisons of Means
To examine the differences in the information
availability score treatment means, Bonferroni simultaneous
95% confidence intervals were constructed, and are
presented in Table 5.12. It appears that there is a
significant difference between the mean perceived
information availability in the full text treatment versus
that in the abstract treatment, between that in the full
text versus the short extract treatments, between that in
the abstract versus the long extract treatments, and
between the short extract and the long extract treatments.
However, there is insufficient evidence to conclude that
there is any difference in mean perceived information
availability in the full text treatment versus the long
extract treatment, or in the short extract treatment
versus the abstract treatment.
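The pattern of significant differences read from Table 5.12 follows from whether each interval excludes zero; as a sketch:

```python
# (lower, upper) limits from Table 5.12.
intervals = {
    "Abstract - Full Text":   (-0.429, -0.146),
    "Abstract - Long Ext.":   (-0.311, -0.028),
    "Abstract - Short Ext.":  (-0.096,  0.188),
    "Full Text - Long Ext.":  (-0.023,  0.260),
    "Full Text - Short Ext.": ( 0.191,  0.474),
    "Long Ext. - Short Ext.": ( 0.073,  0.356),
}

# A difference is declared significant when its interval excludes zero.
significant = {name: lo > 0 or hi < 0
               for name, (lo, hi) in intervals.items()}
for name, sig in significant.items():
    print(f"{name}: {'significant' if sig else 'not significant'}")
```

Only the Abstract - Short Ext. and Full Text - Long Ext. intervals straddle zero, matching the conclusions stated above.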
This means that for the longer passages, the full text
and long extract treatments, the subjects felt that they
had significantly more information than they did in the
shorter treatments. These results raise questions about
the amount of confidence that subjects may or may not have
Table 5.12. Bonferroni 95% simultaneous confidence intervals for six pairwise comparisons for treatment information availability scale means. DF are 66, mean square error is .0325, critical value for t is 2.7201, and minimum significant difference is .1416.
Treatment Comparison      Lower Limit   Difference Between Means   Upper Limit
Abstract - Full Text         -.429             -.288                 -.146
Abstract - Long Ext.         -.311             -.169                 -.028
Abstract - Short Ext.        -.096              .045                  .188
Full Text - Long Ext.        -.023              .119                  .260
Full Text - Short Ext.        .191              .333                  .474
Long Ext. - Short Ext.        .073              .214                  .356
in reduced text passages. While this study did not examine
the subjects' confidence in the information as distinct
from their perception of its availability, the two
constructs would seem to be closely related. It seems
that, in spite of the fact that subjects performed as well
or better in the abstract and short extract treatments,
their perception of the information they were getting, and
thus their confidence, was lower in the reduced text
treatments. This may imply that some information users
would choose not to use an abstracting (or automatic
extracting) service if it were available to them, since
they would lack confidence in the amount of received
information. These questions remain a topic for future
research.
Summary of Analysis
Results of the four hypothesis tests are summarized
below. First, there is insufficient evidence to reject
hypothesis one, that there is no effect on comprehension
score due to the effect of treatment after removing the
effects of passage and recipient. Thus, in the data
collected in this experiment, the algorithm for generating
extracts worked well enough such that the subjects'
comprehension of the documents was not significantly
reduced. Hypothesis two was soundly rejected, and we can
conclude that the amount of time taken to read the extracts
and abstracts in the experimental treatments was
significantly less than that taken for the full text. This
is in agreement with our expectations: the shorter passages
should certainly take less time to read. Hypothesis three,
that there is no effect on the difficulty of reading the
passage due to treatment after removing the effect of
passage and recipient, was also rejected. The multiple
comparisons analysis for reading difficulty indicates that
the longer passages were seen as more difficult to read,
while the shorter extract and abstracts were perceived as
easier to read. There was insufficient evidence to
conclude that there was any difference between the
abstracts and the short extracts in terms of reading
difficulty. Information availability was examined, and
hypothesis four was rejected. The multiple comparisons for
the information availability variable indicate that
subjects perceived that there was greater information in
the longer treatment passages than in the shorter. Table
5.13 contains a summary of the hypotheses tests and their
interpretations.
Table 5.13. Summary of hypotheses tests.
Hyp.   Dependent Variable         Test Result     Interpretation
1      Comprehension Score        Do Not Reject   Comprehension not significantly reduced in extract treatments.
2      Reading Time               Reject          Shorter treatments had shorter reading times.
3      Reading Difficulty         Reject          Longer passages perceived as more difficult to read.
4      Information Availability   Reject          Longer passages perceived as having more information.
CHAPTER VI
CONCLUSION
In this chapter, the implications of this research are
presented. First, the implications of the experimental
results are reviewed. Next there is a discussion of the
implications for text-based information systems, an
emerging area of considerable interest within the field of
MIS/DSS. Third, implications concerning the impact of this
research on organizational management are suggested. This
research raises a number of questions for future research,
and these are presented in the fourth subsection of this
chapter. We then present a discussion of the limitations
of the research, followed by a summary of the conclusions
and final remarks.
Implications of the Experimental Results
The analysis presented in the preceding chapter
demonstrates some interesting and useful results. First,
it appears that the comprehension scores in the experiment
were not seriously reduced in the extract treatments as
compared to the full text and abstract treatments. Thus,
we have shown that it is possible to apply a simple
computer algorithm to text and produce extracts (i.e.,
pseudo-abstracts) that capture enough of the information in
the text such that comprehension of the passages is
reasonable, even with difficult passages and comprehension
questions. At the same time, the savings in terms of
reading time which would be expected due to the reduction
in text was realized. Thus there are significant benefits
that can be anticipated in terms of time savings in a
system designed to automatically condense the content of
text-based information, without a serious loss in the
comprehension of that information. These benefits can be
achieved with current technology.
The subject's impression of the difficulty of the
passages to read and understand was examined. The results
presented above did not find a difference between the full
text treatments and the extract treatments; therefore a
system for generating automatic extracts using a simple
algorithm such as presented here could be expected to
produce extracts that are at least as readable in the minds
of the recipients as are the full passages. On the other
hand, the analysis did show that subjects felt that the
expertly written abstracts were easier to read than either
the full texts or the long extracts. However, questions
concerning the quality of the model make the multiple
comparisons of readability effects less certain. It may be
that the analysis did not have enough power to detect all
significant differences between the treatment means for
reading difficulty.
Finally, the analysis of the subject's perception of
the information availability showed that the longer
passages (full text as well as long extract) were perceived
as containing more of the information needed to answer the
comprehension questions than the shorter passages (abstract
and short extract), even though the comprehension score
results did not agree with that assessment. As was pointed
out, this may indicate a potential problem with all
reduced-text information systems, whether the computer
generates an extract or an expert writes an abstract, in
that the users of the information may lack confidence in
the completeness of the information when the passages are
condensed.
Implications for Text-Based Information Systems
There is a growing awareness by researchers in MIS/DSS
that text-based information systems are important to the
future of the field. An example of this increased interest
can be seen by the number of papers presented at a recent
international conference on systems sciences which dealt
with the subject of document-based and text-based
communication systems (e.g., Martinez and Mohamed, 1988;
Tonge, 1988; Rau, 1988). At the same conference a special
task force convened to discuss the topic of document-based
decision support systems (Sprague, 1988). The research
presented in this dissertation contributes to a growing
body of work on text-based decision support systems.
Of particular importance is the evidence for the
success of an extracting (i.e., condensing) algorithm in a
non-domain specific application, using technology that is
easily within current capabilities. Much of the research
on intelligent and/or active text processing systems to
date has been toward rather esoteric artificial
intelligence approaches, which are typically restricted to
very limited domains and, as such, are of limited use. The
system presented here can be developed using existing
technology, and has potential for application in a broader
domain.
The limited laboratory experiment presented here cannot
determine the extent to which the model
system developed as part of this research will be accepted
by managerial users in a business environment. The next
step is to develop a prototype which implements the major
concepts of the text-based filtering and condensing system,
and then apply that prototype in an actual organizational
situation. Data generated in field studies will offer
opportunities to test the effectiveness of the prototype,
as well as demonstrate which features and capabilities are
well-received by users. The laboratory findings presented
here indicate that such field studies may well be
worthwhile.
Implications for Organizational Management
The concept of systems to support environmental
scanning and organizational communication at the strategic
decision-making level in an organization has been
discussed before, but little has been done to implement
such systems. The techniques presented here offer an
approach that could significantly change the activity of
managers in these important areas. Managers have long
known that information in text form is important to their
needs, but little MIS/DSS support has been provided to
business in this area.
The growing application of data communications
networks may also provide an opportunity for systems such
as presented here to find fertile ground for development.
As the networks become more common, managers will begin
using them as a communication channel of choice (resulting
in a potential for information overload), and the filtering
and condensing requirement will become more pressing. The
techniques discussed and tested here offer promise for a
near-term solution to this growing problem.
Computer-mediated communication systems and large-
scale digital data/voice networks will produce dramatic
changes in organizational structure and decision-making
behavior (Kerr and Hiltz, 1979; Keen, 1986). Systems for
filtering and condensing text-based information may well
have an important place in this new environment.
Implications for Future Research
Several areas for future research are indicated by the
findings of this research. As mentioned in the analysis of
the experimental data, one area of concern is the subjects'
perception that less information was present in the
shortened treatments, in spite of the fact that they did as
well or better in those treatments on the comprehension
tests. These results raise important and interesting
questions concerning users' confidence levels in text-based
DSS: what influences confidence in text-based DSS, and how
can confidence be maintained or increased?
In addition, research should look into the question of
extract length, since we found several differences between
the shorter extracts and the long extracts. Also, passages
of a simpler, more straightforward style could be examined
for effectiveness with extracting techniques, since this
research was limited to more difficult passages.
The question of extract quality has been a problem to
researchers in this area. Since most of the prior research
on automatic extracting was done with the secondary source
databases in mind, the issue of extract quality versus the
quality of expertly-written abstracts had implications for
the marketability of the abstracting services. However, it
has proved difficult to devise a method for objectively
measuring the quality of abstracts and extracts. The
approach taken here was to sidestep this issue, and measure
the quality of the abstracts and extracts by examining
their effectiveness as media through which information is
communicated. In other words, the output in terms of
comprehension of the information served as a surrogate for
quality. This approach to measuring the effectiveness of
abstracting and extracting techniques works well. Future
research should consider a similar approach both to test
techniques for generating abstracts and also as a means to
verify or validate other proposed measures for objectively
measuring the information in text.
Two of the tools used in the experiment will be useful
to other researchers as well. First, the research model
and experimental design that was applied here and in a
previous study (Kasper and Morris, 1988) has wide
applicability to the study of mediated communication,
either in field studies or experimental settings. Also,
the computer program that was developed is an effective way
to control and administer an experiment of this type.
Since the program is highly modular and parameterized, it
could easily be adapted by researchers for other, similar
experiments.
Limitations of the Research
There are several limitations of the research that
need to be noted. First, the sample was taken from a
population of students, many of whom were young. The task
involved in the study (reading and comprehending a passage
on the computer screen) does not seem too dependent on the
sample population, and we anticipate that results are
likely to be similar with an older, employed population.
However, it may be that the effects would be different for
other populations, and future studies of similar effects
should consider possible population differences. For
example, a field study of a prototype extracting system may
find that mature, managerial workers in a text-based DSS
environment respond differently to difficult passages than
did these students. Given the straightforward nature of
the task, however, it would seem that these results will
hold for other types of subjects.
The type of passages chosen for the study is a
limitation. These passages are difficult; the use of
easier passages may alter the results. Also, the algorithm
was designed for this type of passage, having been
developed and tested on four passages from a similar
collection.
with the intent of the researcher developing the extracts
(that is, the researcher deliberately tried to develop an
algorithm that would produce extracts to help subjects do
well on the comprehension tests), we might suspect that an
artifact of the experimental procedure was demonstrated
rather than a real effect. However, care was taken to
prevent such an artifact. The researcher developed the
algorithm by combining several previously published
approaches, tuned and tested the algorithm by using
separate passages from a similar, yet different, set of
documents, and then randomly chose the documents to be
extracted from the more recent set of comprehension tests.
The abstracts were written and the extracts generated prior
to either the researcher or the writer of the abstracts
viewing the comprehension questions, so that they would not
be biased in favor of the information needed for the
questions.
Differences in length may also be a limitation. These
passages were only about 450 words to begin with; many
documents are much longer. Future research should examine
these effects in the context of longer original documents.
Also, this study had one short extract which was rather
long, almost as long as the long extract for that passage.
This may have obscured possible differences between the
short and long extract treatments in the study. Future
research should examine in more detail the effect of
extract length on comprehension and the other dependent
variables.
In some cases, as noted in the analysis chapter, there
was concern about the assumptions of the models used. If
the underlying assumptions of equal variance fail, then the
confidence intervals used for the multiple comparisons are
not accurate. It may be that because of the instability of
the estimates, some of the intervals were misstated. This
was especially important for the analysis of the reading
difficulty scale variable, and to a lesser extent for the
information availability scale.
It also seemed from the examination of the data that
the subjects' perception of reading difficulty and
information availability was more related to length than to
anything else. This was surprising, especially in that the
comprehension results were different. These findings may
have been a result of the difficulty of the passages,
however, and future research involving less difficult
passages may clarify this effect.
Summary of Conclusions and Final Remarks
The findings reported here have demonstrated that a
system designed to support environmental scanning and
organizational communication by filtering and condensing
text-based information can be designed and built using
current technology. The application of a simple algorithm
for generating extracts of short, difficult passages has
been shown to be an effective tool for condensing text in
such a system.
The approach taken here has implications for
researchers in the field of MIS/DSS, for developers of new
systems, and for decision-makers who will use systems such
as these in the near future.
REFERENCES
Ackoff, R. L. "Management Misinformation Systems," Management Science, 14, 4 (1967), pp. B147-158.
Aguilar, F. J. Scanning the Business Environment. MacMillan, New York, 1967.
American National Standards Institute, Inc. American National Standard for Writing Abstracts. ANSI, Inc., New York, 1979.
Anthony, R. N. Planning and Control Systems: A Framework for Analysis. Harvard University, Boston, 1965.
Ariav, G. and Ginzberg, M. J. "DSS Design: A Systemic View of Decision Support," Communications of the ACM, 28, 10 (1985), pp. 1045-1052.
Ballard, B. W., Lusth, J. C., and Tinkham, N. L. "LDC-1: A Transportable, Knowledge-Based Natural Language Processor for Office Environments," ACM Transactions on Office Information Systems, 2, 1 (1984), pp. 1-25.
Baxendale, P. B. "Machine-made Index for Technical Literature--An Experiment," IBM Journal of Research and Development, 2 (1958), pp. 354-361.
Bernier, C. L. "Abstracts and Abstracting," in Subject and Information Analysis. E. D. Dym, editor. Marcel Dekker, Inc., New York, 1985.
Birrell, A. D., Levin, R., Needham, R. M., and Schroeder, M. D. "Grapevine: An Exercise in Distributed Computing," Communications of the ACM, 25, 4 (1982), pp. 260-274.
Blair, D. C. "The Management of Information: Basic Distinctions," Sloan Management Review, 26, 1 (1984), pp. 13-23.
Bonczek, R. H., Holsapple, C. W., and Whinston, A. B. Foundations of Decision Support Systems. Academic Press, New York, 1981.
Borko, H. and Bernier, C. L. Abstracting Concepts and Methods. Academic Press, New York, 1975.
Borko, H. and Chatman, S. "Criteria for Acceptable Abstracts: A Survey of Abstracters' Instructions," American Documentation, April, 1963, pp. 149-160.
Brookes, C. H. P. "Text Processing as a Tool for DSS Design," in Processes and Tools for Decision Support, H. G. Sol, editor, North-Holland Publishing Company, Amsterdam, 1983, pp. 131-138.
Christodoulakis, S. and Faloutsos, C. "Design Considerations for a Message File Server," IEEE Transactions on Software Engineering, SE-10, 2 (1984), pp. 201-210.
Churchman, C. W. The Design of Inquiring Systems: Basic Concepts of Systems and Organizations. Basic Books, Inc., 1971.
Conover, W. J. Practical Nonparametric Statistics, Second Edition. John Wiley and Sons, Inc., New York, 1980.
Conover, W. J. and Iman, R. L. "On Some Alternative Procedures Using Ranks for the Analysis of Experimental Designs," Communications in Statistics--Theory and Methods, A5 (1976), pp. 1349-1368.
Conover, W. J. and Iman, R. L. Introduction to Modern Business Statistics. John Wiley and Sons, Inc., 1983.
Crawford, A. B. "Corporate Electronic Mail--A Communication-Intensive Application of Information Technology," MIS Quarterly, 6, 3 (1982), pp. 1-13.
Cremmins, E. T. The Art of Abstracting. ISI Press, Philadelphia, 1982.
Culnan, M. J. and Bair, J. H. "Human Communication Needs and Organizational Productivity: The Potential Impact of Office Automation," Journal of the American Society for Information Science 34, 3 (1983), pp. 215-221.
Daft, R. L., Sormunen, J., and Parks, D. "Chief Executive Scanning, Environmental Characteristics, and Company Performance: An Empirical Study," Strategic Management Journal, in press.
Denning, P. "Electronic Junk," Communications of the ACM, 25, 3 (1982), pp. 163-165.
Dickson, G. W., Leitheiser, R. L., Wetherbe, J. C., and Nechis, M. "Key Information Systems Issues for the 1980's," MIS Quarterly, 8, 3 (1984), pp. 135-147.
Dillon, M. and Gray, A. S. "FASIT: A Fully Automatic Syntactically-Based Indexing System," Journal of the American Society for Information Science, 34, 2 (1983), pp. 99-108.
Earl, L. L. "Experiments in Automatic Extracting and Indexing," Information Storage and Retrieval, 6 (1970), pp. 313-334.
Edmunson, H. P. "Problems in Automatic Abstracting," Communications of the ACM, 7, 4 (1964), pp. 259-263.
Edmunson, H. P. "New Methods in Automatic Extracting," Journal of the ACM, 16, 2 (1969), pp. 264-285.
Edmunson, H. P. and Wyllys, R. E. "Automatic Abstracting and Indexing--Survey and Recommendations," Communications of the ACM, 4, 5 (1961), pp. 226-234.
Educational Testing Service. The Official Guide to the GMAT. Graduate Management Admissions Council, Princeton, New Jersey, 1984.
Educational Testing Service. The Official Guide for GMAT Review. Graduate Management Admissions Council, Princeton, New Jersey, 1986.
El Sawy, O. A. "Personal Information Systems for Strategic Scanning in Turbulent Environments: Can the CEO Go On-line?," MIS Quarterly, 9, 1 (1985), pp. 53-60.
Epstein, S. S. "Transportable Natural Language Processing Through Simplicity--The PRE System," ACM Transactions on Office Information Systems, 3, 2 (1985), pp. 107-120.
Ewusi-Mensah, K. "The External Organizational Environment and Its Impact on Management Information Systems," Accounting, Organizations and Society, 6, 4 (1981), pp. 301-316.
Faloutsos, C. and Christodoulakis, S. "Signature Files: An Access Method for Documents and its Analytical Performance Evaluation," ACM Transactions on Office Information Systems, 2, 4 (1984), pp. 267-288.
Ginsberg, M.J. and Stohr, E.A. "Decision Support Systems: Issues and Perspectives," presented at the NYU Symposium on Decision Support Systems, New York, May 21-22, 1981.
Gorry, G. A. and Scott Morton, M. S. "A Framework for Management Information Systems," Sloan Management Review, 13, 1 (1971), pp. 55-70.
Green, P. E. and Tull, D. S. Research for Marketing Decisions, 4th Edition. Prentice Hall, Englewood Cliffs, New Jersey, 1978.
Gunning, R. Technique of Clear Writing, Revised Edition. McGraw-Hill, New York, 1968.
Hafner, C. G. and Godden, K. "Portability of Syntax and Semantics in Datalog," ACM Transactions on Office Information Systems, 3, 2 (1985), pp. 141-164.
Heidorn, G. E., Jensen, K., Miller, L. A., Byrd, R. J., and Chodorow, M. S. "The EPISTLE Text-Critiquing System," IBM Systems Journal, 21, 3 (1982), pp. 305-326.
Hiltz, S. R. and Turoff, M. "The Evolution of User Behavior in a Computerized Conferencing System," Communications of the ACM, 24, 11 (1981), pp. 739-751.
Hiltz, S. R. and Turoff, M. "Structuring Computer-Mediated Communication Systems to Avoid Information Overload," Communications of the ACM, 28, 7 (1985), pp. 680-689.
Horton, R.L. The General Linear Model. McGraw-Hill, Inc., 1978.
Huber, G. P. "Issues in the Design of Group Decision Support Systems," MIS Quarterly, 8, 3 (1984), pp. 195-204.
Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.
Johnson, T. "NLP Takes Off," Datamation, January 15, 1986, pp. 91-93.
Kasper, G. M. and Morris, A. H. "Text Processing Tools for Decision Support," in Proceedings of the 19th Annual Hawaiian International Conference on Systems Sciences, Vol. I. Y. Chu, L. Haynes, L. W. Hoevel, A. Speckard, R. H. Sprague, Jr., and E. A. Stohr, editors, IEEE Computer Society Press, Los Angeles, 1986, pp. 566-572.
Kasper, G. M. and Morris, A. H. "The Effect of Presentation Media on Recipient Performance in Text-based Information Systems," Journal of Management Information Systems, 4, 4 (1988), pp. 25-43.
Keen, P. G. W. Competing in Time. Ballinger Publishing Company, Cambridge, Massachusetts, 1986.
Kerr, E. B. and Hiltz, S. R. Computer-Mediated Communication Systems: Status and Evaluation. Academic Press, New York, 1982.
Kiesler, S., Siegel, J., and McGuire, T.W. "Social Psychological Aspects of Computer-mediated Communication," American Psychologist, 39, 10 (1984), pp. 1123-1134.
Kolodziej, S. "Where is the Electronic Messaging Explosion?," Computerworld Focus, 19, 41A (Oct. 16, 1985), 21-23.
Kriebel, C.H. and Strong, D.M. "A Survey of the MIS and Telecommunications Activities of Major Business Firms," MIS Quarterly 8, 3 (1984), 171-177.
Kurke, L. B. and Aldrich, H. E. "Mintzberg was Right! A Replication and Extension of 'The Nature of Managerial Work'," Management Science, 29, 8 (1983), pp. 975-984.
Lenz, R. T. and Engledow, J. L. "Environmental Analysis Units and Strategic Decision-making: A Field Study of Selected 'Leading-edge' Corporations," Strategic Management Journal, 7 (1986), pp. 69-89.
Lenz, R. T. and Engledow, J. L. "Environmental Analysis: The Applicability of Current Theory," Strategic Management Journal, 7 (1986), pp. 329-346.
Luhn, H. P. "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, 2, 2 (1958), pp. 159-165.
Malone, T. W., Grant, K. R., Turbak, F. A., Brobst, S. A., and Cohen, M. D. "Intelligent Information-sharing Systems," Communications of the ACM, 30, 5 (1987), pp. 390-402.
Martinez, R. and Mohamed, S. "Automated Document Distribution using AI Based Workstations and Knowledge Based Systems," in Proceedings of the Twenty-first Annual Hawaii International Conference on System Sciences, Vol. III. B. R. Konsynski, editor, IEEE Computer Society Press, Washington, D.C., 1988, pp. 61-67.
Mason, R. O. and Mitroff, I.I. "A Program for Research on Management Information Systems," Management Science, 19, 5 (1973), pp. 475-487.
Mathis, B. A., Rush, J. E., and Young, C. E. "Improvement of Automatic Abstracts by the Use of Structural Analysis," Journal of the American Society for Information Science, 24 (1973), pp. 101-109.
Mazor, M. S. and Lochovsky, F. H. "Logical Routing Specifications in Office Information Systems," ACM Transactions on Office Information Systems, 2, 4 (1984), pp. 303-330.
McLeod, R. and Bender, D. H. "The Integration of Word Processing Into a Management Information System," MIS Quarterly, 6, 4 (1982), pp. 11-28.
Miller, L. A. "Project EPISTLE: A System for the Automatic Analysis of Business Correspondence," Proceedings of the First Annual National Conference on Artificial Intelligence, Stanford University (1980), pp. 280-282.
Miller, L. A., Heidorn, G. E., and Jensen, K. "Text-Critiquing With the EPISTLE System: An Author's Aid to Better Syntax," AFIPS Conference Proceedings, AFIPS Press, Arlington, VA (1981), pp. 649-655.
Mintzberg, H. The Nature of Managerial Work. Harper and Row, New York, 1973.
Mintzberg, H., Raisinghani, D., and Theoret, A. "The Structure of 'Un-structured' Decision Processes," Administrative Science Quarterly, 21 (1976), pp. 246-275.
Mitroff, I. I. "Two Fables for Those Who Believe in Rationality," Technological Forecasting and Social Change, 28 (1985), pp. 195-202.
Montgomery, C. A. "Where Do We Go From Here," in Information Retrieval Research. Oddy, R. N., Robertson, S. E., van Rijsbergen, C. J., and Williams, P. W., editors, Butterworths, London, 1981.
Olson, M. H. "New Information Technology and Organizational Culture," MIS Quarterly Special Issue, (1982), pp. 71-92.
Olson, M. H. and Lucas, H. C. "The Impact of Office Automation on the Organization: Some Implications for Research and Practice," Communications of the ACM, 25, 11 (1982), pp. 838-847.
Paice, C. D. Information Retrieval and the Computer. Macdonald and Jane's, London, 1977.
Paice, C. D. "The Automatic Generation of Literature Abstracts: An Approach Based on the Identification of Self-indicating Phrases," in Information Retrieval Research. Oddy, R. N., Robertson, S. E., van Rijsbergen, C. J., and Williams, P. W., editors, Butterworths, London, 1981.
Pollock, J. J. and Zamora, A. "Automatic Abstracting Research at Chemical Abstracts," Journal of Chemical Information and Computer Science, 15, 4 (1975), pp. 226-232.
Quarterman, J. S. and Hoskins, J. C. "Notable Computer Networks," Communications of the ACM, 29, 10 (1986), pp. 932-971.
Rappaport, A. "Management Misinformation Systems--Another Perspective," Management Science, 15, 4 (1968), pp. B133-136.
Rathwell, M. A. and Burns, A. "Information Systems Support for Group Planning and Decision-making Activities," MIS Quarterly, 9, 3 (1985), pp. 255-271.
Rau, L. F. "Conceptual Information Extraction from Financial News," in Proceedings of the Twenty-first Annual Hawaii International Conference on System Sciences, Vol. III. B. R. Konsynski, editor, IEEE Computer Society Press, Washington, D.C., 1988, pp. 501-509.
Rice, R. E. "The Impacts of Computer-mediated Organizational and Interpersonal Communication," in Annual Review of Information Science and Technology, Vol. 15. M. Williams, editor. Knowledge Industry Publications, White Plains, New York, 1980, pp. 221-249.
Rice, R. E. "Mediated Group Communication," in The New Media. R. E. Rice, editor. Sage Publications, Beverly Hills, California, 1983, pp. 129-154.
Rice, R. E. and Bair, J. H. "New Organizational Media and Productivity," in The New Media. R. E. Rice, editor. Sage Publications, Beverly Hills, California, 1983, pp. 185-215.
Rice, R. E. and Case, D. "Electronic Message Systems in the University: A Description of Use and Utility," Journal of Communication, 33, 1 (1983), pp. 131-152.
Rush, J. E., Salvador, R., and Zamora, A. "Automatic Abstracting and Indexing. II. Production of Indicative Abstracts by Application of Contextual Inference and Syntactic Coherence Criteria," Journal of the American Society for Information Science, 22 (1971), pp. 260-274.
SAS Institute, Inc. SAS User's Guide: Statistics, Version 5 Edition. SAS Institute Inc., Cary, N.C., 1985.
Schicker, P. "Naming and Addressing in a Computer-Based Mail Environment," IEEE Transactions on Communications, COM-30, 1 (1982), pp. 46-52.
Schriber, J. "Move Over, Strunk and White," Forbes, August 15, 1983, pp. 100-101.
Schwartz, R., Fortune, J., and Horwich, J. "AMANDA: A Computerized Document Management System," MIS Quarterly, 4, 3 (1980), pp. 41-49.
Schwenk, C. R. "Effects of Planning Aids and Presentation Media on Performance and Affective Responses in Strategic Decision-Making," Management Science, 30, 3 (1984), pp. 263-272.
Shannon, C.E. and Weaver, W. The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1964.
Siegel, J., Dubrovsky, V., Kiesler, S. and McGuire, T. W. "Group Processes in Computer-mediated Communication," Organizational Behavior and Human Decision Processes, 37 (1986), pp. 157-187.
Simon, H. A. The New Science of Management Decisions. Harper and Row, New York, 1960.
Simon, H. A. "Applying Information Technology to Organizational Design," Public Administration Review, 33, 3 (1973), pp. 268-278.
Simon, H. A. "The Structure of Ill Structured Problems," Artificial Intelligence, 4 (1973), pp. 181-201.
Slonim, J., MacRae, L. J., Mennie, W. E., and Diamond, N. "NDX-100: An Electronic Filing Machine for the Office of the Future," Computer, May, 1981, pp. 24-36.
Smeaton, A. F. and van Rijsbergen, C. J. "Information Retrieval in an Office Filing Facility and Future Work in Project Minstrel," Information Processing and Management, 22, 5 (1986), pp. 135-149.
Sprague, R. H. "A Framework for Research on Decision Support Systems," in Decision Support Systems: Issues and Challenges. G. Fick and R. H. Sprague, editors, Pergamon Press, Oxford, 1980.
Sprague, R. H. "Task Force on Document Based Decision Support Systems," in Proceedings of the Twenty-first Annual Hawaii International Conference on System Sciences, Vol. IV. R. H. Sprague, editor, IEEE Computer Society Press, Washington, D.C., 1988, p. 262.
Sprague, R. H. and Carlson, E. D. Building Effective Decision Support Systems. Prentice Hall, Englewood Cliffs, New Jersey, 1982.
Svenning, L. L. and Ruchinskas, J. E. "Organizational Teleconferencing," in The New Media. R. E. Rice, editor. Sage Publications, Beverly Hills, California, 1983, pp. 217-248.
Swanson, E. B. and Culnan, M. J. "Document-Based Systems for Management Planning and Control: A Classification, Survey, and Assessment," MIS Quarterly, 2, 4 (1978), pp. 31-46.
Taylor, S. L. and Krulee, G. K. "Experiments with an Automatic Abstracting System," in Information Management in the 1980's. Proceedings of the ASIS Annual Meeting, vol. 14. Knowledge Industry Publications, White Plains, New York, 1977, p. 83.
Tombaugh, J.W. "Evaluation of an international scientific computer-based conference," Journal of Social Issues, 40, 3 (1984), 129-144.
Tonge, F. "Ontological Analysis of Document Usage: An Exploratory Study," in Proceedings of the Twenty-first Annual Hawaii International Conference on System Sciences, Vol. III. B. R. Konsynski, editor, IEEE Computer Society Press, Washington, D.C., 1988, pp. 68-76.
Tsichritzis, D. "Message Addressing Schemes," ACM Transactions on Office Information Systems, 2, 1 (1984), pp. 58-77.
Tsichritzis, D. and Christodoulakis, S. "Message Files," ACM Transactions on Office Information Systems, 1, 1 (1983), pp. 88-98.
Tsichritzis, D., Rabitti, F. A., Gibbs, S., Nierstrasz, O., and Hogg, J. "A System for Managing Structured Messages," IEEE Transactions on Communications, COM-30, 1 (1982), pp. 66-73.
Turoff, M. and Hiltz, S.R. "Computer support for group versus individual decisions," IEEE Transactions on Communications, COM-30, 1 (1982), pp. 82-90.
Vallee, J. Computer Message Systems. McGraw-Hill, New York, 1984.
van Rijsbergen, C. J. Information Retrieval (2nd edition). Butterworths, London, 1979.
Weil, B. H. "Standards for Writing Abstracts," Journal of the American Society for Information Science, 21, 5 (1970), pp. 351-357.
Wellisch, H. H. Indexing and Abstracting 1977-1981. ABC-Clio Information Services, Santa Barbara, California, 1984.
APPENDIX A
INSTRUMENTS USED IN EXPERIMENT
EXPERIENCE AND BACKGROUND QUESTIONNAIRE
NAME:
INSTRUCTIONS
This questionnaire is primarily concerned with your experience in business. Your candid response to these questions is critical to the success of this study.
Any information you provide on this questionnaire will be held in complete confidence. Your responses will be combined with the responses of the other participants and only the combined, aggregated data will be used. NO individual responses will be singled out for identification or reported to anyone in any way or for any purpose.
Please read each question carefully and write your response in the space provided on the questionnaire. If you feel a specific question is not applicable to your particular background, write "N/A" in the space provided for your response.
Figure A.l. Experience and background questionnaire.
1. Please list all degrees you have received and your major(s).
DEGREE MAJOR(S)
(1)
(2)
(3)
2. The following questions refer to your employment history. Please ANSWER ALL the QUESTIONS in the section.
A. Have you been employed as a manager on a full-time year-round basis? (check one)
Yes No
If yes, how many years of full-time, year-round management experience do you have?
Number of years
B. Have you been employed in a non-management position on a full-time year-round basis? (check one)
Yes No
If yes, how many years of experience do you have in a full-time non-management position?
Number of years
C. Have you been employed as a manager on a part-time basis (including summer employment)? (check one)
Yes No
If yes, how many years of part-time management experience do you have?
Number of years
D. Have you been employed in a non-management position on a part-time basis (including summer employment)? (check one)
Yes No
If yes, how many years of experience do you have in a part-time non-management position?
Number of years
Figure A.l. Continued
3. The following pertains to your computer experience. Please indicate your response by CIRCLING the ONE NUMBER which most closely corresponds to your experience in each of the situations. If you have no experience in any one of these areas, circle "1", the NO EXPERIENCE category.
A. How much experience do you have working with video display computer terminals?
1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7
NO               MODERATE              EXTENSIVE
EXPERIENCE       EXPERIENCE            EXPERIENCE
B. How much experience do you have with computer-based communication systems, such as electronic mail?
1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7
NO               MODERATE              EXTENSIVE
EXPERIENCE       EXPERIENCE            EXPERIENCE
Figure A.l. Continued
INSTRUCTIONS FOR ABSTRACTING EXPERIMENT
This experiment is designed to test the computer's ability to generate abstracts of short text passages.
You will be reading four passages, three of which are abstracts.
While reading the passages you may use the 'PgDn' and 'PgUp' keys which are found on the numeric keypad. (The numeric keypad is on the right-hand side of the keyboard.) The 'PgDn' key will let you see more text, and the 'PgUp' key will let you review prior text. You may try the 'PgUp' and 'PgDn' keys as you read these instructions.
Some of the abstracts will be so short that you will not need to use the 'PgDn' or 'PgUp' keys.
When you have finished reading the passages, you may press the 'End' key (number 1 on the keypad) to signal the computer that you are finished reading.
After each passage, there will be a short test of your comprehension of the information in the passage. Each test will have eight questions. If the computer's abstract is accurate, you should have enough information to answer the comprehension questions.
However, you should read each passage carefully since some of the questions require critical thinking.
It may be that the computer's abstract will not contain the information needed to answer a particular question. If that happens, just make the best guess you can using the information you did receive.
Each participant in the experiment will receive all four passages and be asked to answer the same questions as you. The order of presentation will not be the same, however, and different abstract types will be used for different passages. For example, on a particular passage you may see a long abstract, while another participant may see a shorter abstract. We have balanced the order of treatments so that no participant has an advantage over the others.
In addition to the comprehension questions, there will be a scale for you to indicate the readability of the computer's abstract, and a scale for you to indicate how much information was included in the abstract. Each scale looks like a ruler: you indicate your opinion by placing a mark along the ruler between the endpoints of the scale. These scales are intended to measure your subjective evaluation of the readability and information content of the abstracts. You will be asked to mark the readability scale as soon as you finish reading each passage. The information content scale is marked after each set of comprehension questions.
** END OF INSTRUCTIONS **
Figure A.2. Instructions displayed by computer program to subjects.
*** PASSAGE A ***
"Social Injustice versus Poetic Justice in Literature"
Those examples of poetic justice that occur in medieval literature and Elizabethan literature, and that seem so satisfying, have encouraged a whole school of twentieth-century scholars to "find" further examples. In fact, these scholars have merely forced victimized characters into a moral framework by which the injustices inflicted on them are, somehow or other, justified. Such scholars deny that the sufferers in a tragedy are innocent; they blame the victims themselves for their tragic fates. Any misdoing is enough to subject a character to critical whips. Thus, there are long essays about the misdemeanors of Webster's Duchess of Malfi, who defied her brothers, and the behavior of Shakespeare's Desdemona, who disobeyed her father.
Yet it should be remembered that the Renaissance writer Matteo Bandello strongly protests the injustice of the severe penalties issued to women for acts of disobedience that men could, and did, commit with virtual impunity. And Shakespeare, Chaucer, and Webster often enlist their readers on the side of their tragic heroines by describing injustices so cruel that readers cannot but join in protest. By portraying Griselda, in The Clerk's Tale, as a meek, gentle victim who does not criticize, much less rebel against the persecutor, her husband Walter, Chaucer incites the reader to espouse Griselda's cause against Walter's oppression. Thus, efforts to supply historical and theological rationalizations for Walter's persecutions tend to turn Chaucer's fable upside down, to deny its most obvious effect on the readers' sympathies. Similarly, to assert that Webster's Duchess deserved torture and death because she chose to marry the man she loved and bear their children is, in effect, to join forces with her tyrannical brothers, and so to confound the operation of poetic justice, of which readers should approve, with precisely those examples of social injustice that Webster does everything in his power to make readers condemn. Indeed, Webster has his heroine so heroically lead the resistance to tyranny that she may well inspire members of the audience to imaginatively join forces with her against the cruelty and hypocritical morality of her brothers.
Thus Chaucer and Webster, in their different ways, attack injustice, argue on behalf of the victims, and prosecute the persecutors. Their readers serve them as a court of appeal that remains free to rule, as the evidence requires, in favor of the innocent and injured parties. For, to paraphrase the noted eighteenth-century scholar, Samuel Johnson, despite all the refinements of subtlety and the dogmatism of learning, it is by the common sense and
Figure A.3. Passage A--full text treatment
compassion of readers who are uncorrupted by the prejudices of some opinionated scholars that the characters and situations in medieval and Elizabethan literature, as in any other literature, can best be judged.
*** END OF PASSAGE A ***
Figure A.3. Continued
** PASSAGE A **
"Social Injustice versus Poetic Justice in Literature"
Those examples of poetic justice that occur in medieval literature and Elizabethan literature have encouraged a whole school of twentieth-century scholars to "find" further examples. These scholars have merely forced victimized characters into a moral framework by which the injustices inflicted on them are justified. Such scholars deny that the sufferers in a tragedy are innocent; they blame the victims themselves for their tragic fates.
By portraying Griselda, in The Clerk's Tale, as a meek, gentle victim who does not criticize, much less rebel against the persecutor, her husband Walter, Chaucer incites the reader to espouse Griselda's cause against Walter's oppression. Thus, efforts to supply historical and theological rationalizations for Walter's persecutions tend to turn Chaucer's fable upside down, to deny its most obvious effect on the readers' sympathies. Similarly, to assert that Webster's Duchess deserved torture and death because she chose to marry the man she loved and bear their children is to join forces with her tyrannical brothers, and so to confound the operation of poetic justice, of which readers should approve, with precisely those examples of social injustice that Webster does everything in his power to make readers condemn.
Thus Chaucer and Webster attack injustice, argue on behalf of the victims, and prosecute the persecutors. Their readers serve them as a court of appeal that remains free to rule, as the evidence requires, in favor of the innocent and injured parties.
** END OF PASSAGE A **
Figure A.4. Passage A--long extract treatment.
** PASSAGE A **
"Social Injustice versus Poetic Justice in Literature"
Those examples of poetic justice that occur in medieval literature and Elizabethan literature have encouraged a whole school of twentieth-century scholars to "find" further examples. These scholars have merely forced victimized characters into a moral framework by which the injustices inflicted on them are justified. Such scholars deny that the sufferers in a tragedy are innocent; they blame the victims themselves for their tragic fates.
By portraying Griselda, in The Clerk's Tale, as a meek, gentle victim who does not criticize, much less rebel against the persecutor, her husband Walter, Chaucer incites the reader to espouse Griselda's cause against Walter's oppression. Thus, efforts to supply historical and theological rationalizations for Walter's persecutions tend to turn Chaucer's fable upside down, to deny its most obvious effect on the readers' sympathies. Similarly, to assert that Webster's Duchess deserved torture and death because she chose to marry the man she loved and bear their children is to join forces with her tyrannical brothers, and so to confound the operation of poetic justice, of which readers should approve, with precisely those examples of social injustice that Webster does everything in his power to make readers condemn.
** END OF PASSAGE A **
Figure A.5. Passage A--short extract treatment.
** PASSAGE A **
"Social Injustice versus Poetic Justice in Literature"
Examples of poetic justice in medieval and Elizabethan literature have encouraged a school of twentieth-century scholars to "find" further examples. However, these scholars have merely forced victimized characters into a moral framework where the victims themselves are blamed for their tragic fates. Yet, by describing cruel injustices, Shakespeare, Chaucer, and Webster often enlist their readers on the side of their tragic heroines. Thus, the scholars' efforts to supply historical and theological rationalizations tend to deny poetic justice's most obvious effect on the reader's sympathies.
** END OF PASSAGE A **
Figure A.6. Passage A--abstract treatment.
QUESTIONS AND ANSWERS FOR PASSAGE A
1. According to the passage, some twentieth-century scholars have written at length about (A) Walter's persecution of his wife in Chaucer's The Clerk's Tale (B) the Duchess of Malfi's love for her husband (C) the tyrannical behavior of the Duchess of Malfi's brothers (D) the actions taken by Shakespeare's Desdemona (E) the injustices suffered by Chaucer's Griselda
2. The primary purpose of the passage is to
(A) describe the role of the tragic heroine in medieval and Elizabethan literature
(B) resolve a controversy over the meaning of "poetic justice" as it is discussed in certain medieval and Elizabethan literary treatises
(C) present evidence to support the view that characters in medieval and Elizabethan tragedies are to blame for their fates
(D) assert that it is impossible for twentieth-century readers to fully comprehend the characters and situations in medieval and Elizabethan literary works
(E) argue that some twentieth-century scholars have misapplied the concept of "poetic justice" in analyzing certain medieval and Elizabethan literary works
3. It can be inferred from the passage that the author considers Chaucer's Griselda to be
(A) an innocent victim (B) a sympathetic judge (C) an imprudent person (D) a strong individual (E) a rebellious daughter
4. The author's tone in her discussion of the conclusions reached by the "school of twentieth-century scholars" is best described as
(A) plaintive (B) philosophical (C) disparaging (D) apologetic (E) enthusiastic
5. It can be inferred from the passage that the author believes that most people respond to intended instances of poetic justice in medieval and Elizabethan literature with
(A) annoyance (B) disapproval (C) indifference (D) amusement (E) gratification
Figure A.7. Passage A--comprehension test questions
6. As described in the passage, the process by which some twentieth-century scholars have reached their conclusions about the blameworthiness of victims in medieval and Elizabethan literary works is most similar to which of the following?
(A) Derivation of logically sound conclusions from well-founded premises
(B) Accurate observation of data, inaccurate calculation of statistics, and drawing of incorrect conclusions from the faulty statistics
(C) Establishment of a theory, application of the theory to ill-fitting data, and drawing of unwarranted conclusions from the data
(D) Development of two schools of thought about a factual situation, debate between the two schools, and rendering of a balanced judgment by an objective observer
7. The author's paraphrase of a statement by Samuel Johnson serves which of the following functions in the passage?
(A) It furnishes a specific example. (B) It articulates a general conclusion. (C) It introduces a new topic. (D) It provides a contrasting perspective. (E) It clarifies an ambiguous assertion.
8. The author of the passage is primarily concerned with
(A) reconciling opposing viewpoints (B) encouraging innovative approaches (C) defending an accepted explanation (D) advocating an alternative interpretation (E) analyzing an unresolved question
Figure A.7. Continued
*** PASSAGE B ***
"Economic Difficulties in Eighteenth Century Japan"
In the eighteenth century, Japan's feudal overlords, from the shogun to the humblest samurai, found themselves under financial stress. In part, this stress can be attributed to the overlords' failure to adjust to a rapidly expanding economy, but the stress was also due to factors beyond the overlords' control. Concentration of the samurai in castle-towns had acted as a stimulus to trade. Commercial efficiency, in turn, had put temptations in the way of buyers. Since most samurai had been reduced to idleness by years of peace, encouraged to engage in scholarship and martial exercises or to perform administrative tasks that took little time, it is not surprising that their tastes and habits grew expensive. Overlords' income, despite the increase in rice production among their tenant farmers, failed to keep pace with their expenses. Although shortfalls in overlords' income resulted almost as much from laxity among their tax-collectors (the nearly inevitable outcome of hereditary officeholding) as from their higher standards of living, a misfortune like a fire or a flood, bringing an increase in expenses or a drop in revenue, could put a domain in debt to the city rice-brokers who handled its finances. Once in debt, neither the individual samurai nor the shogun himself found it easy to recover.
It was difficult for individual samurai overlords to increase their income because the amount of rice that farmers could be made to pay in taxes was not unlimited, and since the income of Japan's central government consisted in part of taxes collected by the shogun from his huge domain, the government too was constrained. Therefore, the Tokugawa shoguns began to look to other sources for revenue. Cash profits from government-owned mines were already on the decline because the most easily worked deposits of silver and gold had been exhausted, although debasement of the coinage had compensated for the loss. Opening up new farmland was a possibility, but most of what was suitable had already been exploited and further reclamation was technically unfeasible. Direct taxation of the samurai themselves would be politically dangerous. This left the shoguns only commerce as a potential source of government income.
Most of the country's wealth, or so it seemed, was finding its way into the hands of city merchants. It appeared reasonable that they should contribute part of that revenue to ease the shogun's burden of financing the state. A means of obtaining such revenue was soon found by levying forced loans, known as goyo-kin; although these
Figure A.8. Passage B--full text treatment.
were not taxes in the strict sense, since they were irregular in timing and arbitrary in amount, they were high in yield. Unfortunately, they pushed up prices. Thus, regrettably, the Tokugawa shoguns' search for solvency for the government made it increasingly difficult for individual Japanese who lived on fixed stipends to make ends meet.
*** END OF PASSAGE B ***
Figure A.8. Continued
** PASSAGE B **
"Economic Difficulties in Eighteenth Century Japan"
In the eighteenth century, Japan's feudal overlords found themselves under financial stress. In part, this stress can be attributed to the overlords' failure to adjust to a rapidly expanding economy, but the stress was also due to factors beyond the overlords' control.
It was difficult for individual samurai overlords to increase their income because the amount of rice that farmers could be made to pay in taxes was not unlimited, and since the income of Japan's central government consisted in part of taxes collected by the shogun from his huge domain, the government too was constrained. Therefore, the Tokugawa shoguns began to look to other sources for revenue. Direct taxation of the samurai themselves would be politically dangerous. This left the shoguns only commerce as a potential source of government income.
A means of obtaining such revenue was soon found by levying forced loans, known as goyo-kin; although these were not taxes in the strict sense, since they were irregular in timing and arbitrary in amount, they were high in yield. Unfortunately, they pushed up prices. Thus, regrettably, the Tokugawa shoguns' search for solvency for the government made it increasingly difficult for individual Japanese who lived on fixed stipends to make ends meet.
** END OF PASSAGE B **
Figure A.9. Passage B--long extract treatment.
** PASSAGE B **
"Economic Difficulties in Eighteenth Century Japan"
In the eighteenth century, Japan's feudal overlords found themselves under financial stress. In part, this stress can be attributed to the overlords' failure to adjust to a rapidly expanding economy, but the stress was also due to factors beyond the overlords' control.
It was difficult for individual samurai overlords to increase their income because the amount of rice that farmers could be made to pay in taxes was not unlimited, and since the income of Japan's central government consisted in part of taxes collected by the shogun from his huge domain, the government too was constrained. Therefore, the Tokugawa shoguns began to look to other sources for revenue.
** END OF PASSAGE B **
Figure A.10. Passage B--short extract treatment.
** PASSAGE B **
"Economic Difficulties in Eighteenth Century Japan"
In the eighteenth century, Japan's feudal overlords found themselves under financial stress by failing to adjust to a rapidly expanding economy. Eventually, overlords' income failed to keep pace with their expenses. It was difficult for the overlords to increase their income since the amount farmers could be taxed was limited. Thus, the government too was constrained, and began to look for other sources of revenue. Revenue was soon found by levying forced loans, known as goyo-kin, on city merchants. Unfortunately, they pushed up prices, and made it increasingly difficult for individual Japanese on fixed incomes to make ends meet.
** END OF PASSAGE B **
Figure A.11. Passage B--abstract treatment.
QUESTIONS AND ANSWERS TO PASSAGE B
1. The passage is most probably an excerpt from
(A) an economic history of Japan
(B) the memoirs of a samurai warrior
(C) a modern novel about eighteenth-century Japan
(D) an essay contrasting Japanese feudalism with its Western counterpart
(E) an introduction to a collection of Japanese folktales
2. Which of the following financial situations is most analogous to the financial situation in which Japan's Tokugawa shoguns found themselves in the eighteenth century?
(A) A small business borrows heavily to invest in new equipment, but is able to pay off its debt early when it is awarded a lucrative government contract.
(B) Fire destroys a small business, but insurance covers the cost of rebuilding.
(C) A small business is turned down for a loan at a local bank because the owners have no credit history.
(D) A small business has to struggle to meet operating expenses when its profits decrease.
(E) A small business is able to cut back sharply on spending through greater commercial efficiency and thereby compensate for a loss of revenue.
3. Which of the following best describes the attitude of the author toward the samurai?
(A) Warmly approving
(B) Mildly sympathetic
(C) Bitterly disappointed
(D) Harshly disdainful
(E) Profoundly shocked
4. According to the passage, the major reason for the financial problems experienced by Japan's feudal overlords in the eighteenth century was that
(A) spending had outdistanced income
(B) trade had fallen off
(C) profits from mining had declined
(D) the coinage had been sharply debased
(E) the samurai had concentrated in castle-towns
Figure A.12. Passage B--comprehension test questions.
5. The passage implies that individual samurai did not find it easy to recover from debt for which of the following reasons?
(A) Agricultural production had increased.
(B) Taxes were irregular in timing and arbitrary in amount.
(C) The Japanese government had failed to adjust to the needs of a changing economy.
(D) The domains of samurai overlords were becoming smaller and poorer as government revenues increased.
(E) There was a limit to the amount in taxes that farmers could be made to pay.
6. The passage suggests that, in eighteenth-century Japan, the office of tax collector
(A) was a source of personal profit to the officeholder
(B) was regarded with derision by many Japanese
(C) remained within families
(D) existed only in castle-towns
(E) took up most of the officeholder's time
7. The passage implies that which of the following was the primary reason why the Tokugawa shoguns turned to city merchants for help in financing the state?
(A) A series of costly wars had depleted the national treasury.
(B) Most of the country's wealth appeared to be in city merchants' hands.
(C) Japan had suffered a series of economic reversals due to natural disasters such as floods.
(D) The merchants were already heavily indebted to the shoguns.
(E) Further reclamation of land would not have been economically advantageous.
8. According to the passage, the actions of the Tokugawa shoguns in their search for solvency for the government were regrettable because those actions
(A) raised the cost of living by pushing up prices
(B) resulted in the exhaustion of the most easily worked deposits of silver and gold
(C) were far lower in yield than had originally been anticipated
(D) did not succeed in reducing government spending
(E) acted as a deterrent to trade
Figure A.12. Continued
*** PASSAGE C ***
"A 1978 Discussion of Minority Business Opportunities"
Recent years have brought minority-owned businesses in the United States unprecedented opportunities—as well as new and significant risks. Civil rights activists have long argued that one of the principal reasons why Blacks, Hispanics, and other minority groups have difficulty establishing themselves in business is that they lack access to the sizable orders and subcontracts that are generated by large companies. Now Congress, in apparent agreement, has required by law that businesses awarded federal contracts of more than $500,000 do their best to find minority subcontractors and record their efforts to do so on forms filed with the government. Indeed, some federal and local agencies have gone so far as to set specific percentage goals for apportioning parts of public works contracts to minority enterprises.
Corporate response appears to have been substantial. According to figures collected in 1977, the total of corporate contracts with minority businesses rose from $77 million in 1972 to $1.1 billion in 1977. The projected total of corporate contracts with minority businesses for the early 1980's is estimated to be over $3 billion per year with no letup anticipated in the next decade.
Promising as it is for minority businesses, this increased patronage poses dangers for them, too. First, minority firms risk expanding too fast and overextending themselves financially, since most are small concerns and, unlike large businesses, they often need to make substantial investments in new plants, staff, equipment, and the like in order to perform work subcontracted to them. If, thereafter, their subcontracts are for some reason reduced, such firms can face crippling fixed expenses. The world of corporate purchasing can be frustrating for small entrepreneurs who get requests for elaborate formal estimates and bids. Both consume valuable time and resources, and a small company's efforts must soon result in orders, or both the morale and the financial health of the business will suffer.
A second risk is that White-owned companies may seek to cash in on the increasing apportionments through formations of joint ventures with minority-owned concerns. Of course, in many instances there are legitimate reasons for joint ventures; clearly, White and minority enterprises can team up to acquire business that neither could acquire alone. But civil rights groups and minority business owners have complained to Congress about minorities being set up as "fronts" with White backing, rather than being accepted as
Figure A.13. Passage C--full text treatment.
full partners in legitimate joint ventures.
Third, a minority enterprise that secures the business of one large corporate customer often runs the danger of becoming--and remaining--dependent. Even in the best of circumstances, fierce competition from larger, more established companies makes it difficult for small concerns to broaden their customer bases; when such firms have nearly guaranteed orders from a single corporate benefactor, they may truly have to struggle against complacency arising from their current success.
*** END OF PASSAGE C ***
Figure A.13. Continued
** PASSAGE C **
"A 1978 Discussion of Minority Business Opportunities"
Recent years have brought minority-owned businesses in the United States unprecedented opportunities—as well as new and significant risks.
The projected total of corporate contracts with minority businesses for the early 1980's is estimated to be over $3 billion per year with no letup anticipated in the next decade.
Promising as it is for minority businesses, this increased patronage poses dangers for them, too. First, minority firms risk expanding too fast and overextending themselves financially, since most are small concerns and, unlike large businesses, they often need to make substantial investments in new plants, staff, equipment, and the like in order to perform work subcontracted to them. If, thereafter, their subcontracts are for some reason reduced, such firms can face crippling fixed expenses.
A second risk is that White-owned companies may seek to cash in on the increasing apportionments through formations of joint ventures with minority-owned concerns. In many instances there are legitimate reasons for joint ventures; clearly, White and minority enterprises can team up to acquire business that neither could acquire alone. But civil rights groups and minority business owners have complained to Congress about minorities being set up as "fronts" with White backing, rather than being accepted as full partners in legitimate joint ventures.
Third, a minority enterprise that secures the business of one large corporate customer often runs the danger of becoming dependent.
** END OF PASSAGE C **
Figure A.14. Passage C--long extract treatment.
** PASSAGE C **
"A 1978 Discussion of Minority Business Opportunities"
Recent years have brought minority-owned businesses in the United States unprecedented opportunities—as well as new and significant risks.
First, minority firms risk expanding too fast and overextending themselves financially, since most are small concerns and, unlike large businesses, they often need to make substantial investments in new plants, staff, equipment, and the like in order to perform work subcontracted to them. If, thereafter, their subcontracts are for some reason reduced, such firms can face crippling fixed expenses.
A second risk is that White-owned companies may seek to cash in on the increasing apportionments through formations of joint ventures with minority-owned concerns.
Third, a minority enterprise that secures the business of one large corporate customer often runs the danger of becoming dependent.
** END OF PASSAGE C **
Figure A.15. Passage C--short extract treatment.
** PASSAGE C **
"A 1978 Discussion of Minority Business Opportunities"
Congress has required businesses awarded federal contracts of more than $500,000 to do their best to find minority subcontractors. Corporate contracts with minority businesses rose from $77 million in 1972 to $1.1 billion in 1977. The projected total in the early 1980's is estimated at $3 billion per year. This increased patronage poses dangers for minority businesses. First, minority firms risk expanding too fast and overextending themselves financially. Second, White-owned companies may seek to cash in on increasing apportionments through formations of joint ventures with minority-owned concerns. Third, a minority enterprise that secures the business of one large corporate customer may become dependent.
** END OF PASSAGE C **
Figure A.16. Passage C--abstract treatment
QUESTIONS AND ANSWERS TO PASSAGE C
1. The primary purpose of the passage is to
(A) present a commonplace idea and its inaccuracies
(B) describe a situation and its potential drawbacks
(C) propose a temporary solution to a problem
(D) analyze a frequent source of disagreement
(E) explore the implications of a finding
2. The passage supplies information that would answer which of the following questions?
(A) What federal agencies have set percentage goals for the use of minority-owned businesses in public works contracts?
(B) To which government agencies must businesses awarded federal contracts report their efforts to find minority subcontractors?
(C) How widespread is the use of minority-owned concerns as "fronts" by White backers seeking to obtain subcontracts?
(D) How many more minority-owned businesses were there in 1977 than in 1972?
(E) What is one set of conditions under which a small business might find itself financially overextended?
3. According to the passage, civil rights activists maintain that one disadvantage under which minority-owned businesses have traditionally had to labor is that they have
(A) been especially vulnerable to governmental mismanagement of the economy
(B) been denied bank loans at rates comparable to those afforded larger competitors
(C) not had sufficient opportunity to secure business created by large corporations.
(D) not been able to advertise in those media that reach large numbers of potential customers
(E) not had adequate representation in the centers of government power
4. The passage suggests that the failure of a large business to have its bids for subcontracts result quickly in orders might cause it to
(A) experience frustration but not serious financial harm
(B) face potentially crippling fixed expenses
(C) have to record its efforts on forms filed with the government
(D) increase its spending with minority subcontractors
(E) revise its procedure for making bids for federal contracts and subcontracts
Figure A.17. Passage C--comprehension test questions.
5. The author implies that a minority-owned concern that does the greater part of its business with one large corporate customer should
(A) avoid competition with larger, more established concerns by not expanding
(B) concentrate on securing even more business from that corporation
(C) try to expand its customer base to avoid becoming dependent on the corporation
(D) pass on some of the work to be done for the corporation to other minority-owned concerns
(E) use its influence with the corporation to promote subcontracting with other minority concerns
6. Which of the following, if true, would most weaken the author's assertion that, in the 1970's, corporate response to federal requirements was substantial?
(A) Corporate contracts with minority-owned businesses totaled $2 billion in 1979.
(B) Between 1970 and 1972, corporate contracts with minority-owned businesses declined by 25 percent.
(C) The figures collected in 1977 underrepresented the extent of corporate contracts with minority-owned businesses.
(D) The estimate of corporate spending with minority-owned businesses in 1980 is approximately $10 million too high.
(E) The $1.1 billion represented the same percentage of total corporate spending in 1977 as did $77 million in 1972.
7. The passage most likely appeared in
(A) a business magazine
(B) an encyclopedia of Black history to 1945
(C) a dictionary of financial terms
(D) a yearbook of business statistics
(E) an accounting textbook
8. The author would most likely agree with which of the following statements about corporate response to working with minority subcontractors?
(A) Annoyed by the proliferation of "front" organizations, corporations are likely to reduce their efforts to work with minority-owned subcontractors in the near future.
(B) Although corporations showed considerable interest in working with minority businesses in the 1970's, their aversion to government paperwork made them reluctant to pursue many government contracts.
(C) The significant response of corporations in the 1970's is likely to be sustained and conceivably be increased throughout the 1980's.
(D) Although corporations are eager to cooperate with minority-owned businesses, a shortage of capital in the 1970's made substantial response impossible.
(E) The enormous corporate response has all but eliminated the dangers of overexpansion that used to plague small minority-owned businesses.
Figure A.17. Continued
*** PASSAGE D ***
"Botticelli and the Critics of Florentine Art"
The history of responses to the work of the artist Sandro Botticelli (1444?-1510) suggests that widespread appreciation by critics is a relatively recent phenomenon. Writing in 1550, Vasari expressed an unease with Botticelli's work, admitting that the artist fitted awkwardly into his (Vasari's) evolutionary scheme of the history of art. Over the next two centuries, academic art historians denigrated Botticelli in favor of his fellow Florentine, Michelangelo. Even when antiacademic art historians of the early nineteenth century rejected many of the standards of evaluation espoused by their predecessors, Botticelli's work remained outside of accepted tastes, pleasing neither amateur observers nor connoisseurs. (Many of his best paintings, however, remained hidden away in obscure churches and private homes.)
The primary reason for Botticelli's unpopularity is not difficult to understand: most observers, up until the mid-nineteenth century, did not consider him to be noteworthy because his work, for the most part, did not seem to these observers to exhibit the traditional characteristics of fifteenth-century Florentine art. For example, Botticelli rarely employed the technique of strict perspective and, unlike Michelangelo, never used chiaroscuro. Another reason for Botticelli's unpopularity may have been that his attitude toward the style of classical art was very different from that of his contemporaries. Although he was thoroughly exposed to classical art, he showed little interest in borrowing from the classical style. Indeed, it is paradoxical that a painter of large-scale classical subjects adopted a style that was only slightly similar to that of classical art.
In any case, when viewers began to examine more closely the relationship of Botticelli's work to the tradition of fifteenth-century Florentine art, his reputation began to grow. Analyses and assessments of Botticelli made between 1850 and 1870 by the artists of the Pre-Raphaelite movement, as well as by the writer Pater (although he, unfortunately, based his assessment on an incorrect analysis of Botticelli's personality), inspired a new appreciation of Botticelli throughout the English-speaking world. Yet Botticelli's work, especially the Sistine frescoes, did not generate worldwide attention until it was finally subjected to a comprehensive and scrupulous analysis by Horne in 1908. Horne rightly demonstrated that the frescoes shared important features with paintings by other fifteenth-century Florentines--features such as skillful representations of anatomical proportions, and of the human figure in motion. However, Horne argued that Botticelli did not treat these
Figure A.18. Passage D—full text treatment.
qualities as ends in themselves--rather, that he emphasized clear depiction of a story, a unique achievement and one that made the traditional Florentine qualities less central. Because of Horne's emphasis on the way a talented artist reflects a tradition yet moves beyond that tradition, an emphasis crucial to any study of art, the twentieth century has come to appreciate Botticelli's achievements.
*** END OF PASSAGE D ***
Figure A.18. Continued
** PASSAGE D **
"Botticelli and the Critics of Florentine Art"
The history of responses to the work of the artist Sandro Botticelli suggests that widespread appreciation by critics is a relatively recent phenomenon. Over the next two centuries, academic art historians denigrated Botticelli in favor of his fellow Florentine, Michelangelo. Even when antiacademic art historians of the early nineteenth century rejected many of the standards of evaluation espoused by their predecessors, Botticelli's work remained outside of accepted tastes, pleasing neither amateur observers nor connoisseurs.
The primary reason for Botticelli's unpopularity is not difficult to understand: most observers did not consider him to be noteworthy because his work did not seem to these observers to exhibit the traditional characteristics of fifteenth-century Florentine art. For example, Botticelli rarely employed the technique of strict perspective and never used chiaroscuro. Another reason for Botticelli's unpopularity may have been that his attitude toward the style of classical art was very different from that of his contemporaries.
When viewers began to examine more closely the relationship of Botticelli's work to the tradition of fifteenth-century Florentine art, his reputation began to grow. Because of Horne's emphasis on the way a talented artist reflects a tradition yet moves beyond that tradition, an emphasis crucial to any study of art, the twentieth century has come to appreciate Botticelli's achievements.
** END OF PASSAGE D **
Figure A.19. Passage D--long extract treatment
** PASSAGE D **
"Botticelli and the Critics of Florentine Art"
The primary reason for Botticelli's unpopularity is not difficult to understand: most observers did not consider him to be noteworthy because his work did not seem to these observers to exhibit the traditional characteristics of fifteenth-century Florentine art. For example, Botticelli rarely employed the technique of strict perspective and never used chiaroscuro. Another reason for Botticelli's unpopularity may have been that his attitude toward the style of classical art was very different from that of his contemporaries.
When viewers began to examine more closely the relationship of Botticelli's work to the tradition of fifteenth-century Florentine art, his reputation began to grow.
** END OF PASSAGE D **
Figure A.20. Passage D--short extract treatment
** PASSAGE D **
"Botticelli and the Critics of Florentine Art"
Widespread appreciation by critics of artist Sandro Botticelli (1444?-1510) is a relatively recent phenomenon. The primary reasons for Botticelli's past unpopularity were that his work did not seem to exhibit the traditional characteristics of fifteenth-century Florentine art, and his lack of interest in borrowing from the classical style. Botticelli's work, especially the Sistine frescoes, did not generate worldwide attention until 1908, when it was rightly demonstrated that the frescoes shared important features with other fifteenth-century Florentine paintings. However, Botticelli also emphasized a clear depiction of a story, a unique achievement that made the traditional Florentine qualities less central.
** END OF PASSAGE D **
Figure A.21. Passage D--abstract treatment.
QUESTIONS AND ANSWERS TO PASSAGE D
1. Which of the following would be the most appropriate title for the passage?
(A) Botticelli's Contribution to Florentine Art
(B) Botticelli and the Traditions of Classical Art
(C) Sandro Botticelli: From Denigration to Appreciation
(D) Botticelli and Michelangelo: A Study in Contrasts
(E) Standards of Taste: Botticelli's Critical Reputation up to the Nineteenth Century
2. It can be inferred that the author of the passage would be likely to find most beneficial a study of an artist that
(A) avoided placing the artist in an evolutionary scheme of the history of art
(B) analyzed the artist's work in relation to the artist's personality
(C) analyzed the artist's relationship to the style and subject matter of classical art
(D) analyzed the artist's work in terms of both traditional characteristics and unique achievement
(E) sanctioned and extended the evaluation of the artist's work made by the artist's contemporaries
3. The passage suggests that Vasari would most probably have been more enthusiastic about Botticelli's work if that artist's work
(A) had not revealed Botticelli's inability to depict a story clearly
(B) had not evolved so straightforwardly from the Florentine art of the fourteenth century
(C) had not seemed to Vasari to be so similar to classical art
(D) could have been appreciated by amateur viewers as well as by connoisseurs
(E) could have been included more easily in Vasari's discussion of art history
4. The author most likely mentions the fact that many of Botticelli's best paintings were "hidden away in obscure churches and private homes" in order to
(A) indicate the difficulty of trying to determine what an artist's best work is
(B) persuade the reader that an artist's work should be available for general public viewing
(C) prove that academic art historians had succeeded in keeping Botticelli's work from general public view
(D) call into question the assertion that antiacademic art historians disagreed with their predecessors
(E) suggest a reason why, for a period of time, Botticelli's work was not generally appreciated
Figure A.22. Passage D--comprehension test questions.
5. The passage suggests that most seventeenth- and eighteenth-century academic art historians and most early-nineteenth-century antiacademic art historians would have disagreed significantly about which of the following?
I. The artistic value of Botticelli's work
II. The criteria by which art should be judged
III. The features that characterized fifteenth-century Florentine art

(A) I only
(B) II only
(C) III only
(D) II and III only
(E) I, II, and III
6. According to the passage, which of the following is an accurate statement about Botticelli's relation to classical art?
(A) Botticelli more often made use of classical subject matter than classical style.
(B) Botticelli's interest in perspective led him to study classical art.
(C) Botticelli's style does not share any similarities with the style of classical art.
(D) Because he saw little classical art, Botticelli did not exhibit much interest in imitating such art.
(E) Although Botticelli sometimes borrowed his subject matter from classical art, he did not create large-scale paintings of these subjects.
7. According to the passage, Horne believed which of the following about the relation of the Sistine frescoes to the tradition of fifteenth-century Florentine art?
(A) The frescoes do not exhibit characteristics of such art.
(B) The frescoes exhibit more characteristics of such art than do the paintings of Michelangelo.
(C) The frescoes exhibit some characteristics of such art, but these qualities are not the dominant features of the frescoes.
(D) Some of the frescoes exhibit characteristics of such art, but most do not.
(E) More of the frescoes exhibit skillful representation of anatomical proportions than skillful representation of the human figure in motion.
8. The passage suggests that, before Horne began to study Botticelli's work in 1908, there had been
(A) little appreciation of Botticelli in the English-speaking world
(B) an overemphasis on Botticelli's transformation, in the Sistine frescoes, of the principles of classical art
(C) no attempt to compare Botticelli's work to that of Michelangelo
(D) no thorough investigation of Botticelli's Sistine frescoes
(E) little agreement among connoisseurs and amateurs about the merits of Botticelli's work
Figure A.22. Continued
ON THE SCALE BELOW, YOU WILL INDICATE YOUR OPINION OF HOW EASY OR HARD THE PASSAGE WAS TO READ AND UNDERSTAND. INDICATE YOUR OPINION BY PLACING AN 'X' ANYWHERE ALONG THE SCALE FROM "VERY EASY TO READ" TO "VERY DIFFICULT TO READ".
FOR EXAMPLE, IF YOU THINK THE FIRST PASSAGE WAS EASY TO READ, YOU WOULD MAKE A MARK SUCH AS ILLUSTRATED BELOW:
IF THE SECOND PASSAGE WAS MORE DIFFICULT TO READ THAN THE FIRST, YOU WOULD INDICATE THE DIFFERENCE BY PLACING THE MARK FOR THE SECOND PASSAGE TO THE RIGHT OF THE MARK FOR THE FIRST PASSAGE, AS ILLUSTRATED:
VERY EASY TO READ ----------------------------------- VERY DIFFICULT TO READ
(EXAMPLES)
1ST PASSAGE READ: ----X------------------------------
2ND PASSAGE READ: ---------X-------------------------

USE THE FOLLOWING SCALES FOR YOUR RESPONSES:

VERY EASY TO READ ----------------------------------- VERY DIFFICULT TO READ
1ST PASSAGE READ: -----------------------------------
2ND PASSAGE READ: -----------------------------------
3RD PASSAGE READ: -----------------------------------
4TH PASSAGE READ: -----------------------------------
Figure A.23. Reading difficulty scale.
ON THE SCALE BELOW, YOU WILL INDICATE YOUR OPINION OF HOW MUCH OF THE INFORMATION YOU NEEDED TO ANSWER THE QUESTIONS WAS INCLUDED IN THE ABSTRACT OF EACH PASSAGE. INDICATE YOUR OPINION BY PLACING AN 'X' ANYWHERE ALONG THE SCALE FROM "LITTLE OR NO INFORMATION" TO "ALL INFORMATION".
FOR EXAMPLE, IF YOU THINK MOST OF THE NEEDED INFORMATION WAS AVAILABLE FOR THE FIRST PASSAGE, YOU WOULD MAKE A MARK SUCH AS ILLUSTRATED BELOW:
IF THE SECOND PASSAGE HAS LESS OF THE INFORMATION NEEDED TO ANSWER THE QUESTIONS RELATIVE TO THE FIRST PASSAGE READ, YOU WOULD INDICATE THIS BY PLACING THE MARK FOR THE SECOND PASSAGE TO THE LEFT OF THE MARK FOR THE FIRST PASSAGE, AS ILLUSTRATED:
LITTLE OR NO INFORMATION AVAILABLE ----------------- ALL INFORMATION AVAILABLE
(EXAMPLES)
1ST PASSAGE READ: ----------X-----------------------
2ND PASSAGE READ: ----X-----------------------------

USE THE FOLLOWING SCALES FOR YOUR RESPONSES:

LITTLE OR NO INFORMATION AVAILABLE ----------------- ALL INFORMATION AVAILABLE
1ST PASSAGE READ: ----------------------------------
2ND PASSAGE READ: ----------------------------------
3RD PASSAGE READ: ----------------------------------
4TH PASSAGE READ: ----------------------------------
Figure A.24. Information availability scale.
#define MAIN

/* include files provided by TURBO-C for screen handling functions */
#include "mcalc.h"
#include "mcvars.h"
#include <stdio.h>
#include <dos.h>
#include <bios.h>
#include <string.h>
#include <math.h>
#include <alloc.h>
#include <mem.h>
#include <time.h>

/* messages displayed to the subject */
#define MESTHANKS   "THANK YOU FOR PARTICIPATING IN THIS EXPERIMENT!!"
#define MESPREAD    "**** PLEASE READ THE INSTRUCTIONS CAREFULLY ****"
#define MESBEFBEG   "**** BEFORE BEGINNING ****"
#define MESANY      "Press any key "
#define MESHOME     "Press the 'Home' key "
#define MESINST     "to read instructions..."
#define MESREAD     "to begin reading..."
#define MESTEXT     "text is passage "
#define MESQUEST    "after you have completed the questions..."
#define MESWAIT     "Wait a minute..."
#define MESMORE     "More..."
#define MESMENU     "'PgDn' for more 'PgUp' to review 'End' when done"
#define MESGETQUEST "ASK PROCTOR FOR YOUR COPY OF THE QUESTIONS FOR PASSAGE "

/* data files, treatment codes, and passage file names */
#define INSTFILE    "INSTRUC.DOC"
#define DESIGNFILE  "DESIGN.DTA"
#define TREAT1CODE  'F'
#define TREAT2CODE  'A'
#define TREAT3CODE  'L'
#define TREAT4CODE  'S'
#define TREAT1EXT   "FUL"
#define TREAT2EXT   "ABS"
#define TREAT3EXT   "EXL"
#define TREAT4EXT   "EXS"
#define PASSAGE1    "TEXT304."
#define PASSAGE2    "TEXT236."
#define PASSAGE3    "TEXT178."
#define PASSAGE4    "TEXT238."
#define RECORDFILE  "RESULTS.DAT"
#define RECFILELEN  442

/* display and array limits */
#define TLINELIMIT  100
#define TCOLLIMIT   75
#define TDISPLEN    18
#define MENUROW     23
#define MAXTIMES    25
Figure A.25. Experiment program header file listing.
#include "subject.h"

main()
{
    initdisplay();
    initvars();
    clrscr();
    run();
} /* main */

/* MAIN PROGRAM LOOP */
run()
{
    int pos, keycntrs[4][4], times[4][MAXTIMES], keys[4][MAXTIMES];
    char text[TLINELIMIT][TCOLLIMIT], passage[4], design[5][2];
    int design_num;

    getdesign(design, &design_num);
    displaystart();
    readinstr();
    initarrays(keycntrs, times, keys);
    for (pos = 0; pos < 4; pos++) {
        gettext(pos, design[pos]);
        readtext(design[pos], keycntrs[pos], times[pos], keys[pos]);
        getquestions(design[pos]);
    }
    recordit(design, keycntrs, times, keys, &design_num);
    /* displayend(); */
} /* run */

/* function getdesign -- gets the order of treatments from file */
getdesign(char *design, int *dnum)
{
    FILE *dfileptr;
    int this_record = 0, rcntr;

    getdesign_num(&this_record);
    dfileptr = fopen(DESIGNFILE, "r");
    for (rcntr = 1; rcntr <= this_record; rcntr++) fgets(design, 10, dfileptr);
    fclose(dfileptr);
    *dnum = this_record;
    return;
} /* getdesign */
Figure A.26. Main experiment program listing.
201
getdesign_num(int *input_num)
/* function to allow input of the design number */
{
    int input_key, num = 0;

    input_key = 'N';
    clrscr();
    while (input_key == 'N' || input_key == 'n') {
        while (num < 1 || num > 24) {
            printf("\n\n\n** ENTER THE NUMBER GIVEN YOU BY THE PROCTOR : ");
            scanf("%d", &num);
            if (num < 1 || num > 24) printf("%c", 7);
        }
        printf("\n%d\n\n", num);
        printf("Is this correct? (type 'y' or 'n') : ");
        input_key = getkey();
        if (!(input_key == 'Y' || input_key == 'y')) {
            input_key = 'N';
            num = 99;
        }
    }
    *input_num = num;
    return;
} /* getdesign_num */

/* function to display messages at start of the program */
displaystart()
{
    int mes_len = 0;

    clrscr();
    writef((80-strlen(MESTHANKS))/2, 10, WHITE, strlen(MESTHANKS), MESTHANKS);
    writef((80-strlen(MESPREAD))/2, 12, WHITE, strlen(MESPREAD), MESPREAD);
    writef((80-strlen(MESBEFBEG))/2, 14, WHITE, strlen(MESBEFBEG), MESBEFBEG);
    mes_len = strlen(MESANY) + strlen(MESINST);
    writef((80-mes_len)/2, 18, WHITE, strlen(MESANY), MESANY);
    writef(((80-mes_len)/2) + strlen(MESANY), 18, WHITE, strlen(MESINST), MESINST);
    gotoxy((((80-mes_len)/2) + mes_len), 18);
    getkey();
} /* displaystart */
/* function to read the instructions before starting */
readinstr()
{
    FILE *ifileptr;
    char inst_text[TLINELIMIT][TCOLLIMIT];
    char *text_ptrs[TLINELIMIT];
    int line = 0, bottom_line, toprow, bottomrow, row;
    int input = NULL;

    clrscr();
    writef((80-strlen(MESWAIT))/2, 15, WHITE, strlen(MESWAIT), MESWAIT);
    gotoxy((((80-strlen(MESWAIT))/2) + strlen(MESWAIT)), 15);
    ifileptr = fopen(INSTFILE, "r");
    while (fgets(inst_text[line], TCOLLIMIT, ifileptr) != NULL) {
        text_ptrs[line] = inst_text[line];
        line++;
    }
    bottom_line = --line;
    fclose(ifileptr);
    toprow = 0;
    if (bottom_line < TDISPLEN) bottomrow = bottom_line;
    else bottomrow = TDISPLEN;
    while (input != ENDKEY) {
        clrscr();
        for (line = toprow, row = 1; line <= bottomrow; line++, row++)
            writef(1, row, WHITE, strlen(text_ptrs[line])-1, text_ptrs[line]);
        if (bottomrow < bottom_line)
            writef(73, MENUROW-2, WHITE, strlen(MESMORE), MESMORE);
        writef(1, MENUROW, HIGHLIGHTCOLOR, strlen(MESMENU), MESMENU);
        gotoxy(strlen(MESMENU)+1, MENUROW);
        input = getkey();
        switch(input) {
        case PGUPKEY :
            toprow -= (TDISPLEN+1);
            if (toprow < 0) toprow = 0;
            bottomrow = toprow + TDISPLEN;
            break;
        case PGDNKEY :
            if (bottomrow < bottom_line) {
                toprow += TDISPLEN+1;
                bottomrow += TDISPLEN+1;
                if (bottomrow > bottom_line) bottomrow = bottom_line;
            }
            break;
        case ENDKEY :
            if (bottomrow < bottom_line) input = PGDNKEY;
            break;
        default :
            printf("%c", 7);
        } /* switch */
    } /* while input loop */
} /* readinstr */
/* builds the name of the passage file to be used as the text to read */
build_textfilename(char *design, char *textfile)
{
    switch (*(design+1)) {
    case '1' :
        strcpy(textfile, PASSAGE1);
        break;
    case '2' :
        strcpy(textfile, PASSAGE2);
        break;
    case '3' :
        strcpy(textfile, PASSAGE3);
        break;
    case '4' :
        strcpy(textfile, PASSAGE4);
        break;
    }
    switch (*design) {
    case (TREAT1CODE) :
        strncat(textfile, TREAT1EXT, 4);
        break;
    case (TREAT2CODE) :
        strncat(textfile, TREAT2EXT, 4);
        break;
    case (TREAT3CODE) :
        strncat(textfile, TREAT3EXT, 4);
        break;
    case (TREAT4CODE) :
        strncat(textfile, TREAT4EXT, 4);
        break;
    }
} /* build_textfilename */

initarrays(int *keycntrs, int *times, int *keys)
{
    int cntr;

    for (cntr = 0; cntr < 16; cntr++) *(keycntrs+cntr) = 0;
    for (cntr = 0; cntr < 4*MAXTIMES; cntr++) *(times+cntr) = 0;
    for (cntr = 0; cntr < 4*MAXTIMES; cntr++) *(keys+cntr) = 99;
} /* initarrays */
/* function to read the texts of the passages */
readtext(char *design, int *keycntrs, int *times, int *keys)
{
    FILE *ifileptr;
    char text[TLINELIMIT][TCOLLIMIT];
    char *text_ptrs[TLINELIMIT];
    char textfile[13];
    typedef enum { PgUp, PgDn, End, Other } keycodes;
    int line = 0, bottom_line, toprow, bottomrow, row, keypress_cntr = 0;
    int input = NULL;
    long start_time = 0, strike_time = 0;

    clrscr();
    writef((80-strlen(MESWAIT))/2, 15, WHITE, strlen(MESWAIT), MESWAIT);
    gotoxy((((80-strlen(MESWAIT))/2) + strlen(MESWAIT)), 15);
    build_textfilename(design, textfile);
    ifileptr = fopen(textfile, "r");
    while (fgets(text[line], TCOLLIMIT, ifileptr) != NULL) {
        text_ptrs[line] = text[line];
        line++;
    }
    bottom_line = --line;
    fclose(ifileptr);
    toprow = 0;
    if (bottom_line < TDISPLEN) bottomrow = bottom_line;
    else bottomrow = TDISPLEN;
    time(&start_time);
    keypress_cntr = 0;
    while (input != ENDKEY) {
        clrscr();
        for (line = toprow, row = 1; line <= bottomrow; line++, row++)
            writef(1, row, WHITE, strlen(text_ptrs[line])-1, text_ptrs[line]);
        if (bottomrow < bottom_line)
            writef(73, MENUROW-2, WHITE, strlen(MESMORE), MESMORE);
        writef(1, MENUROW, HIGHLIGHTCOLOR, strlen(MESMENU), MESMENU);
        gotoxy(strlen(MESMENU)+1, MENUROW);
        input = getkey();
        time(&strike_time);
        switch(input) {
        case PGUPKEY :
            toprow -= (TDISPLEN+1);
            if (toprow < 0) {
                toprow = 0;
                printf("%c", 7);
            }
            bottomrow = toprow + TDISPLEN;
            if (bottom_line < TDISPLEN) bottomrow = bottom_line;
            keycntrs[PgUp] += 1;
            if (keypress_cntr < MAXTIMES) {
                /* log the keystroke and its time offset from the start */
                *(keys+keypress_cntr) = PgUp;
                *(times+keypress_cntr) = (strike_time - start_time);
            }
            break;
        case PGDNKEY :
            if (bottomrow < bottom_line) {
                toprow += TDISPLEN + 1;
                bottomrow += TDISPLEN + 1;
                if (bottomrow > bottom_line) bottomrow = bottom_line;
            } else {
                printf("%c", 7);
            }
            keycntrs[PgDn] += 1;
            if (keypress_cntr < MAXTIMES) {
                *(keys+keypress_cntr) = PgDn;
                *(times+keypress_cntr) = (strike_time - start_time);
            }
            break;
        case ENDKEY :
            if (bottomrow < bottom_line) {
                printf("%c", 7);
                input = PGDNKEY;
            }
            keycntrs[End] += 1;
            if (keypress_cntr < MAXTIMES) {
                *(keys+keypress_cntr) = End;
                *(times+keypress_cntr) = (strike_time - start_time);
            }
            break;
        default :
            printf("%c", 7);
            keycntrs[Other] += 1;
            if (keypress_cntr < MAXTIMES) {
                *(keys+keypress_cntr) = Other;
                *(times+keypress_cntr) = (strike_time - start_time);
            }
        } /* switch */
        keypress_cntr += 1;
    } /* while input loop */
} /* readtext */
getquestions(char *design)
{
    char message[80], passagename[2];
    int mes_len = 0, passagenum;

    strcpy(message, MESGETQUEST);
    strncpy(passagename, (design+1), 1);
    passagenum = atoi(passagename);
    switch (passagenum) {
    case 1 :
        strncat(message, "A", 1);
        break;
    case 2 :
        strncat(message, "B", 1);
        break;
    case 3 :
        strncat(message, "C", 1);
        break;
    case 4 :
        strncat(message, "D", 1);
    } /* switch passagenum */
    clrscr();
    writef((80-strlen(message))/2, 10, WHITE, strlen(message), message);
    mes_len = strlen(MESHOME) + strlen(MESQUEST);
    writef((80-mes_len)/2, 14, WHITE, strlen(MESHOME), MESHOME);
    writef(((80-mes_len)/2) + strlen(MESHOME), 14, WHITE, strlen(MESQUEST), MESQUEST);
    gotoxy((((80-mes_len)/2) + mes_len), 14);
    while (getkey() != HOMEKEY) printf("%c", 7);
} /* getquestions */
gettext(int posit, char *design)
{
    int mes_len, passagenum;
    char message[80];
    char passagename[2];

    switch (posit) {
    case 0 :
        strcpy(message, "Your first ");
        break;
    case 1 :
        strcpy(message, "Your second ");
        break;
    case 2 :
        strcpy(message, "Your third ");
        break;
    case 3 :
        strcpy(message, "Your fourth (last) ");
    } /* switch posit */
    strcat(message, MESTEXT);
    strncpy(passagename, (design+1), 1);
    passagenum = atoi(passagename);
    switch (passagenum) {
    case 1 :
        strncat(message, "A", 1);
        break;
    case 2 :
        strncat(message, "B", 1);
        break;
    case 3 :
        strncat(message, "C", 1);
        break;
    case 4 :
        strncat(message, "D", 1);
    } /* switch passagenum */
    clrscr();
    writef((80-strlen(message))/2, 10, WHITE, strlen(message), message);
    mes_len = strlen(MESHOME) + strlen(MESREAD);
    writef((80-mes_len)/2, 14, WHITE, strlen(MESHOME), MESHOME);
    writef(((80-mes_len)/2) + strlen(MESHOME), 14, WHITE, strlen(MESREAD), MESREAD);
    gotoxy((((80-mes_len)/2) + mes_len), 14);
    while (getkey() != HOMEKEY) printf("%c", 7);
} /* gettext */
recordit(char *design, int *keyscnt, int *times, int *keys, int *dnum)
{
    FILE *out;
    int i;
    char string[15], *path;
    char outfile[13] = "DUMMY", ext[4];

    itoa(*dnum, ext, 10);
    strcpy(outfile, design);
    outfile[8] = '.';
    outfile[9] = 'A';
    strcat(outfile, ext);
    path = searchpath(outfile);
    if (path != NULL)
        outfile[9] = 'B';
    out = fopen(outfile, "w");
    fputs(" START ", out);
    itoa(*dnum, string, 10);
    fputs(string, out);
    fputs(design, out);
    for (i = 0; i < 16; i++) {
        if (*(keyscnt+i) < 99) {
            itoa(*(keyscnt+i), string, 10);
            fputs(string, out);
            if ((i+1)%4 == 0) fputs(" ", out);
        }
        else
            fputs("99 ", out);
    }
    for (i = 0; i < 4*MAXTIMES; i++) {
        if (*(times+i) < 999) {
            itoa(*(times+i), string, 10);
            if (i == 0) fputs("x", out);
            fputs(string, out);
            if (*(times+i) != 0) fputs(" ", out);
            if ((i+1)%MAXTIMES == 0) fputs("x", out);
        }
        else
            fputs("999 ", out);
    }
    for (i = 0; i < 4*MAXTIMES; i++) {
        if (*(keys+i) < 9) {
            itoa(*(keys+i), string, 10);
            fputs(string, out);
        }
        else
            fputs("9", out);
    }
    fputs(" END", out);
    fclose(out);
} /* recordit */
APPENDIX B
ADDITIONAL DATA TABLES
Table B.1. Fog indices for full text treatment passages.

Passage    Fog Index
   A         18.86
   B         15.55
   C         17.96
   D         17.92
Table B.2. Comprehension score results by subject for passage A.

Subj.   A1  A2  A3  A4  A5  A6  A7  A8    Ave.
  1      0   1   1   1   0   1   1   0    .625
  2      1   1   1   1   1   1   1   0    .875
  3      0   1   1   1   0   0   0   0    .375
  4      0   1   1   0   0   1   1   1    .625
  5      0   1   1   0   0   1   1   0    .500
  6      0   0   0   0   1   0   1   0    .250
  7      0   1   1   0   1   0   1   0    .500
  8      0   1   1   1   1   1   1   0    .750
  9      0   1   1   1   0   0   1   1    .625
 10      0   1   1   1   0   1   1   1    .750
 11      0   1   1   1   1   1   0   1    .750
 12      0   1   1   0   0   1   0   0    .375
 13      0   1   1   1   1   1   1   0    .750
 14      0   0   1   0   0   0   0   0    .125
 15      0   0   1   1   0   0   0   0    .250
 16      0   0   1   0   0   0   1   1    .375
 17      1   0   0   1   0   0   0   1    .375
 18      0   0   1   0   0   0   0   1    .250
 19      0   1   1   0   0   1   0   1    .500
 20      0   1   1   1   0   1   1   1    .750
 21      0   1   1   1   0   1   1   0    .625
 22      0   1   1   0   0   1   0   1    .500
 23      0   0   1   0   1   0   0   0    .250
 24      0   0   1   0   1   0   0   0    .250
Total    2  16  22  12   8  13  13  10
Table B.3. Comprehension score results by subject for passage B.

Subj.   B1  B2  B3  B4  B5  B6  B7  B8    Ave.
  1      1   0   1   0   0   1   0   0    .375
  2      1   1   1   1   1   0   0   0    .625
  3      1   1   1   0   0   0   0   0    .375
  4      1   1   1   1   1   0   1   1    .875
  5      1   1   1   1   0   0   0   1    .625
  6      1   1   1   0   0   0   1   1    .625
  7      1   1   1   1   0   1   1   1    .875
  8      1   1   1   0   1   0   1   0    .625
  9      1   1   0   1   1   0   1   0    .625
 10      1   0   0   1   1   0   0   1    .500
 11      1   1   1   1   1   0   1   1    .875
 12      1   0   1   1   1   0   0   0    .500
 13      1   1   1   1   1   1   1   0    .875
 14      1   0   1   1   0   0   0   1    .500
 15      1   1   1   1   0   0   0   1    .625
 16      1   1   1   0   0   0   1   0    .500
 17      1   1   1   1   1   0   1   0    .750
 18      1   1   1   1   0   0   1   0    .625
 19      0   1   0   0   0   0   0   0    .125
 20      1   0   1   1   0   0   0   1    .500
 21      1   1   1   1   1   0   1   1    .875
 22      1   0   1   1   1   1   1   0    .750
 23      1   1   0   1   1   0   0   1    .625
 24      1   1   1   1   0   0   1   0    .625
Total   23  18  20  18  12   4  13  11
Table B.4. Comprehension score results by subject for passage C.

Subj.   C1  C2  C3  C4  C5  C6  C7  C8    Ave.
  1      1   1   1   0   1   1   1   1    .875
  2      1   0   1   1   1   0   1   1    .750
  3      1   1   1   0   1   0   1   1    .750
  4      1   0   0   0   1   1   1   0    .500
  5      1   1   1   0   1   1   1   0    .750
  6      1   0   1   0   1   1   1   1    .750
  7      1   1   1   0   1   0   1   0    .625
  8      1   1   1   0   1   0   1   1    .750
  9      1   0   0   0   0   0   1   0    .250
 10      1   1   1   0   1   1   1   1    .875
 11      1   1   1   0   1   1   1   1    .875
 12      1   1   1   0   1   0   1   0    .625
 13      1   1   1   0   1   0   1   0    .625
 14      1   0   1   0   1   1   1   1    .750
 15      1   1   1   0   1   0   0   1    .625
 16      1   1   1   0   1   0   1   1    .750
 17      1   0   1   0   1   1   1   1    .750
 18      1   1   1   0   1   0   1   1    .750
 19      0   0   0   0   1   0   1   1    .375
 20      1   0   1   1   1   1   1   1    .875
 21      1   0   1   0   1   1   1   1    .750
 22      1   0   1   1   1   1   1   1    .875
 23      1   1   0   0   0   0   1   0    .375
 24      1   1   1   0   1   0   1   1    .750
Total   23  14  20   3  22  11  23  17
Table B.5. Comprehension score results by subject for passage D.

Subj.   D1  D2  D3  D4  D5  D6  D7  D8    Ave.
  1      1   1   0   0   1   0   1   0    .500
  2      1   1   0   1   0   0   1   0    .500
  3      1   1   0   1   0   1   1   0    .625
  4      0   1   0   0   0   0   1   0    .250
  5      1   0   1   0   1   1   1   0    .625
  6      0   0   0   0   1   1   0   0    .250
  7      1   0   0   1   0   0   1   0    .375
  8      1   1   1   1   0   1   0   0    .625
  9      0   1   1   1   0   1   0   0    .500
 10      1   1   1   1   1   1   1   0    .875
 11      1   1   1   1   0   1   1   0    .750
 12      1   1   0   1   0   1   0   0    .500
 13      1   1   1   1   0   0   1   0    .625
 14      1   0   0   1   0   1   0   0    .375
 15      1   1   0   1   0   1   1   1    .750
 16      1   1   0   1   0   1   0   0    .500
 17      1   0   0   0   0   0   0   0    .125
 18      1   0   0   0   0   1   1   0    .375
 19      1   0   0   1   0   1   1   0    .500
 20      0   1   0   1   0   1   1   0    .500
 21      1   0   0   1   0   1   0   0    .375
 22      1   1   1   1   1   1   0   1    .875
 23      1   0   0   1   0   1   0   0    .375
 24      1   1   0   1   0   1   1   1    .750
Total   20  15   7  18   5  18  14   3
Table B.6. Comprehension score results by subject across treatment.

Subject   Abstract   Full Text   Long Extract   Short Extract   Average
   1        .875       .625          .375           .500         .594
   2        .625       .875          .500           .750         .688
   3        .625       .750          .375           .375         .531
   4        .875       .500          .250           .625         .563
   5        .750       .625          .625           .500         .625
   6        .750       .250          .625           .250         .469
   7        .500       .875          .375           .625         .594
   8        .750       .625          .750           .625         .688
   9        .625       .500          .625           .250         .500
  10        .500       .875          .750           .875         .750
  11        .750       .875          .875           .750         .813
  12        .500       .500          .375           .625         .500
  13        .625       .750          .625           .875         .719
  14        .750       .375          .500           .125         .438
  15        .625       .625          .750           .250         .563
  16        .375       .500          .750           .500         .531
  17        .750       .750          .375           .125         .500
  18        .750       .625          .250           .375         .500
  19        .375       .500          .500           .125         .375
  20        .500       .875          .500           .750         .656
  21        .625       .875          .375           .750         .656
  22        .875       .500          .875           .750         .750
  23        .250       .375          .375           .625         .406
  24        .750       .250          .750           .625         .594
Table B.7. Comprehension score results by subject across passage.

Subject   Passage A   Passage B   Passage C   Passage D   Average
   1        .625        .375        .875        .500       .594
   2        .875        .625        .750        .500       .688
   3        .375        .375        .750        .625       .531
   4        .625        .875        .500        .250       .563
   5        .500        .625        .750        .625       .625
   6        .250        .625        .750        .250       .469
   7        .500        .875        .625        .375       .594
   8        .750        .625        .750        .625       .688
   9        .625        .625        .250        .500       .500
  10        .750        .500        .875        .875       .750
  11        .750        .875        .875        .750       .813
  12        .375        .500        .625        .500       .500
  13        .750        .875        .625        .625       .719
  14        .125        .500        .750        .375       .438
  15        .250        .625        .625        .750       .563
  16        .375        .500        .750        .500       .531
  17        .375        .750        .750        .125       .500
  18        .250        .625        .750        .375       .500
  19        .500        .125        .375        .500       .375
  20        .750        .500        .875        .500       .656
  21        .625        .875        .750        .375       .656
  22        .500        .750        .875        .875       .750
  23        .250        .625        .375        .375       .406
  24        .250        .625        .750        .750       .594
Table B.8. Reading time results by subject across treatment.

Subject   Abstract   Full Text   Long Extract   Short Extract
   1         78         306          131            108
   2         62         246           97             59
   3         84         241          236             92
   4         90         298          142            163
   5        104         251          154            181
   6        167         670          429            115
   7        187         745          165            216
   8        195         509          186            169
   9         88         288           97            119
  10         80         311          189            119
  11         86         241           98            219
  12         76         174          201            100
  13         40         194          182             80
  14        109         325          125            213
  15        140         354          248            353
  16         62         230          103             60
  17        121         361          188            136
  18        114         464          308             96
  19        103         309          308            208
  20         76         217          158            273
  21         81         352          198             90
  22         97         431          162            143
  23        113         379          165             92
  24         81         332          157            103
Table B.9. Reading time results by subject across passage.

Subject   Passage A   Passage B   Passage C   Passage D
   1         306         131          78          108
   2         246          62          59           97
   3         236          92         241           84
   4         163          90         298          142
   5         181         154         104          251
   6         670         429         167          115
   7         187         745         216          165
   8         195         509         186          169
   9          88          97         119          288
  10         189          80         119          311
  11         219          98         241           86
  12         201         174         100           76
  13         194          80         182           40
  14         213         125         109          325
  15         353         140         354          248
  16          62         230         103           60
  17         188         121         361          136
  18         308         464         114           96
  19         308         208         103          309
  20         273          76         217          158
  21          81         352          90          198
  22         431         143         162           97
  23         113          92         165          379
  24         332         103         157           81
Table B.10. Reading difficulty results by subject across treatment.

Subject   Abstract   Full Text   Long Extract   Short Extract
   1        .375       .615          .192            .269
   2        .077       .067          .837            .231
   3        .125       .500          .817            .625
   4        .279       .337          .337            .596
   5        .058       .250          .183            .692
   6        .192       .635          .827            .500
   7        .212       .529          .587            .269
   8        .260       .788          .288            .192
   9        .096       .625          .587            .010
  10        .019       .163          .346            .029
  11        .058       .087          .087            .510
  12        .067       .250          .385            .260
  13        .221       .548          .394            .087
  14        .337       .817          .423            .577
  15        .404       .356          .538            .788
  16        .596       .356          .029            .029
  17        .135       .596          .288            .212
  18        .125       .625          .875            .240
  19        .058       .625          .500            .308
  20        .279       .663          .346            .913
  21        .221       .337          .212            .048
  22        .221       .279          .087            .106
  23        .500       .183          .827            .817
  24        .154       .779          .154            .337
Table B.11. Reading difficulty results by subject across passage.

Subject   Passage A   Passage B   Passage C   Passage D
   1        .615        .192        .375        .269
   2        .067        .077        .231        .837
   3        .817        .625        .500        .125
   4        .596        .279        .337        .337
   5        .692        .183        .058        .250
   6        .635        .827        .192        .500
   7        .212        .529        .269        .587
   8        .260        .788        .288        .192
   9        .096        .587        .010        .625
  10        .346        .019        .029        .163
  11        .510        .087        .087        .058
  12        .385        .250        .260        .067
  13        .548        .087        .394        .221
  14        .577        .423        .337        .817
  15        .788        .404        .356        .538
  16        .596        .356        .029        .029
  17        .288        .135        .596        .212
  18        .875        .625        .125        .240
  19        .500        .308        .058        .625
  20        .913        .279        .663        .346
  21        .221        .337        .048        .212
  22        .279        .106        .087        .221
  23        .500        .817        .827        .183
  24        .779        .337        .154        .154
Table B.12. Information availability results by subject across treatment.

Subject   Abstract   Full Text   Long Extract   Short Extract
   1        .260       .500          .231            .115
   2        .096       .856          .462            .154
   3        .125       .875          .942            .250
   4        .721       .538          .587            .337
   5        .500       .962          .750            .625
   6        .817       .750          .875            .183
   7        .279       .779          .404            .519
   8        .375       .625          .750            .346
   9        .212       .837          .538            .212
  10        .163       .087          .298            .279
  11        .721       .452          .615            .492
  12        .385       .500          .327            .442
  13        .356       .798          .538            .096
  14        .308       .500          .260            .510
  15        .462       .471          .538            .337
  16        .163       .788          .635            .288
  17        .413       .721          .596            .231
  18        .490       .740          .606            .365
  19        .298       .885          .375            .125
  20        .096       .663          .346            .250
  21        .346       .596          .154            .471
  22        .654       .404          .788            .346
  23        .250       .817          .760            .625
  24        .346       .596          .519            .154
Table B.13. Information availability results by subject across passage.

Subject   Passage A   Passage B   Passage C   Passage D
   1        .500        .231        .260        .115
   2        .856        .096        .154        .462
   3        .942        .250        .875        .125
   4        .337        .721        .538        .587
   5        .625        .750        .500        .962
   6        .750        .875        .817        .183
   7        .279        .779        .519        .404
   8        .375        .625        .750        .346
   9        .212        .538        .212        .837
  10        .298        .163        .279        .087
  11        .492        .615        .452        .721
  12        .327        .500        .442        .385
  13        .798        .096        .538        .356
  14        .510        .260        .308        .500
  15        .337        .462        .471        .538
  16        .163        .788        .635        .288
  17        .596        .413        .721        .231
  18        .606        .740        .490        .365
  19        .375        .125        .298        .885
  20        .250        .096        .663        .346
  21        .346        .596        .471        .154
  22        .404        .346        .788        .654
  23        .250        .625        .760        .817
  24        .596        .154        .519        .346
Table B.14. Mean and standard error by passage controlling for treatment for four dependent variables.

                                              Reading            Reading         Information
                        Comprehension          time              Difficulty      Availability
Treatment   Passage    mean    stderr     mean     stderr     mean    stderr    mean    stderr
Abstract       A       .521    .075      121.00    23.142     .314    .078      .271    .033
Abstract       B       .646    .060       94.83    12.112     .199    .059      .325    .102
Abstract       C       .708    .070      112.50    12.024     .191    .056      .446    .085
Abstract       D       .688    .054       77.33     7.990     .141    .029      .431    .090
Full Text      A       .542    .105      363.17    69.602     .487    .107      .651    .073
Full Text      B       .667    .070      412.33    84.925     .481    .083      .671    .047
Full Text      C       .729    .060      285.33    25.299     .423    .085      .620    .067
Full Text      D       .542    .077      310.50    17.274     .444    .114      .681    .135
Long Ext.      A       .438    .070      238.33    23.147     .535    .103      .524    .100
Long Ext.      B       .604    .068      172.33    52.078     .383    .116      .545    .106
Long Ext.      C       .688    .070      159.17    12.169     .296    .119      .665    .048
Long Ext.      D       .458    .070      168.00    20.917     .476    .092      .415    .063
Short Ext.     A       .500    .107      233.67    28.400     .679    .061      .425    .057
Short Ext.     B       .563    .111      119.67    19.773     .380    .118      .266    .081
Short Ext.     C       .646    .088      117.17    21.749     .141    .051      .346    .062
Short Ext.     D       .396    .075      114.00    15.040     .240    .062      .255    .040
Table B.15. Mean and standard error by treatment controlling for passage for four dependent variables.

                                              Reading            Reading         Information
                        Comprehension          time              Difficulty      Availability
Treatment   Passage    mean    stderr     mean     stderr     mean    stderr    mean    stderr
Abstract       A       .521    .075      121.00    23.142     .314    .078      .271    .033
Full Text      A       .542    .105      363.17    69.602     .487    .107      .651    .073
Long Ext.      A       .438    .070      238.33    23.147     .535    .103      .524    .100
Short Ext.     A       .500    .107      233.67    28.400     .679    .061      .425    .057
Abstract       B       .646    .060       94.83    12.112     .199    .059      .325    .102
Full Text      B       .667    .070      412.33    84.925     .481    .083      .671    .047
Long Ext.      B       .604    .068      172.33    52.078     .383    .116      .545    .106
Short Ext.     B       .563    .111      119.67    19.773     .380    .118      .266    .081
Abstract       C       .708    .070      112.50    12.024     .191    .056      .446    .085
Full Text      C       .729    .060      285.33    25.299     .423    .085      .620    .067
Long Ext.      C       .688    .070      159.17    12.169     .296    .119      .665    .048
Short Ext.     C       .646    .088      117.17    21.749     .141    .051      .346    .062
Abstract       D       .688    .054       77.33     7.990     .141    .029      .431    .090
Full Text      D       .542    .077      310.50    17.274     .444    .114      .681    .135
Long Ext.      D       .458    .070      168.00    20.917     .476    .092      .415    .063
Short Ext.     D       .396    .075      114.00    15.040     .240    .062      .255    .040