mapping the diachronic meaning of existential there

Mapping meaning with distributional methods:

a diachronic corpus-based study of existential there

Gard B. Jenset

Abstract

The semantics of existential there is discussed from a diachronic, corpus-based perspective. While

previous studies of there have been qualitative or relied on interpreting relative frequencies directly,

the present study combines sophisticated statistical techniques with linguistic theory by means of

distributional semantics (Lenci 2008). It is argued that there in earlier stages of English is not

semantically empty, and that its original meaning is deictic rather than locative. This analysis

combines key insights from previous studies of existential there with a cognitive construction

grammar perspective, and discusses some methodological concerns regarding statistical methods for

creating computational semantic maps from diachronic corpus data.

Keywords: construction grammar, distributional semantics, existential there, semantic maps

1. Introduction

The semantic status of the so-called existential there in English is complicated by two facts. Not only

is the word’s precise meaning disputed, but the more fundamental question of whether or not it has any

meaning at all remains to be conclusively settled. The question of the semantics of there is made

more pertinent by the differences in theoretical assumptions underlying the various analyses. A

theoretical stance grounded in the importance of an autonomous syntactic domain would

presumably be more comfortable with postulating a use of the word there with no meaning,

compared to one based on cognitive linguistics where semantics has a more prominent explanatory

function (Gries and Divjak 2010:333).

Diachronically, existential there (1) has evolved from the locative adverb there (2), and already in Old

English (OE) existential uses are attested (Traugott 1992:218).

(1) There was a big squeeze on companies. (existential)

(2) I immediately knew why I was there. (locative)

However, unlike the Present-day English (PdE) examples in (1-2), both of which are from COCA

(Davies 2008), the distinction between locative and existential uses of there in OE can be quite

difficult to draw. In other words, added to the two initial problems facing PdE, the historical linguist

faces a third problem: how to adequately distinguish the two uses. Since historical linguistics is

barred from access to native speaker judgments, corpora are the inevitable and natural source of

information available. Moreover, a move towards more rigorous empirical methods employing

statistical techniques makes it more feasible to treat the complex, multidimensional nature of the

meaning of there.

Against this backdrop, the present study attempts to demonstrate the benefits of a quantitative,

corpus-driven approach to the question of the meaning of there in historical English. It is furthermore

argued that this approach is eminently compatible with the theoretical framework of cognitive

linguistics, specifically variants of construction grammar (CxG) as advocated by e.g. Goldberg (1995)

and Croft (2001). That is, this enquiry attempts to quantitatively identify key stages in the semantic

evolution of the word there, by employing statistical methods to single out its most salient

grammatical contexts in OE, Middle English (ME), and Early Modern English (EME). Essentially, the

study responds to the call for more data-driven, or bottom-up, multivariate approaches in historical

semantics (Gries 2011), by seeking to establish (a) whether there in early English has a meaning at all,

and (b) whether such meaning(s) can be distinguished and characterized in a manner consistent with

the corpus data. Further, the study discusses the relative merits of different multivariate statistical

methods in the context of historical corpus linguistics, a context which may differ considerably from

that of contemporary corpus linguistics with accompanying methodological consequences. The

findings of this research are expected to shed new light on the semantics of there, as well as

to demonstrate the usefulness of distributional methods in historical semantics.

The article has the following structure: first, the literature on existential there is reviewed. Next, the

research methods and data used in the study are discussed. This is followed by a discussion of the

results of the enquiry. Finally, the article presents the implications, as well as limitations, of the

study.

2. Background and conceptual framework

The semantic status of existential there in English has received considerable attention in previous

studies. However, for methodological reasons a new perspective on this word is warranted. First,

previous studies are mostly limited to qualitative methods, an approach that is particularly

problematic in the context of historical studies of there. Second, to the extent that quantitative data

are used at all, the studies are limited to using raw frequencies, percentages, or simple statistical

tests. Combined, these shortcomings conspire to cloud the issue of the semantic status of existential

there, especially from a diachronic viewpoint. In the following paragraphs, the previous research on

the topic is discussed with the aim of highlighting the advantages of a new methodological approach.

2.1 To mean or not to mean

Two broad, competing perspectives offer different characterizations of the semantic status of

existential there. The first suggests that existential there is (virtually) semantically empty, and

primarily present in the sentence for syntactic and/or pragmatic reasons, as articulated by Abbott,

who does “not think that the word there itself in existential sentences means anything” (Abbott

1993:41). The competing view is that existential there is locative, either directly or in a more broad or

metaphorical sense, whereby it designates a metaphorical or mental space (Breivik 2003:219). The

argument that there has a broad, or metaphorical, locative meaning can be found in Bolinger (1977),

Lakoff (1987), Breivik (1997), and Breivik (2003), whereas Coopmans (1989), Freeze (1992), and to

some extent Lyons (1967), take the view that there is explicitly locative.

The view that there is semantically empty is reflected in terms such as “dummy subject”, applied to

grammatical subjects like existential there that supposedly do not refer to anything (Cruse 2010:390).

It seems reasonable to assume that such a view is at least partially based on the methodological bias

in linguistics towards introspection and native speaker judgments. Based on the existing research,

such judgments do not seem to converge in the case of existential there, with its multifaceted

grammatical affiliations.

In the context of historical studies, this position of there as a dummy subject has been defended (to

various degrees) in Breivik (1981), Breivik (1990), Pérez-Guerra (1999), and Ingham (2001). Being

diachronic studies, they are corpus based and both Breivik (1981, 1990) and Pérez-Guerra (1999)

make use of graphs and statistics in their argumentation. However, since these studies were

published the availability of both annotated corpora and free, user-friendly statistics software has

improved considerably. In light of those methodological advances it is worth asking whether the

putative meaninglessness of there still holds.

Such a bipartite division, meaning or no meaning, is of course a simplification, as illustrated by Breivik

and Swan’s statement that although “the original semantics of [existential there] has been lost, this

loss has been balanced out by the development of a more abstract meaning” (Breivik and Swan

2000:28). In other words, some meaning (ostensibly of a more concrete kind) has been lost, whereas

another, more abstract yet related to the original one, has been gained. Nevertheless, drawing a line

between the two perspectives is an aid to gaining an overview of previous research on the semantics

of there.

2.2 The “locative existential there” hypothesis

As Breivik notes, the semantics of existential there, even in Present-day English (PdE), is a

controversial and thorny topic (Breivik 2003:219). When considering there in earlier English it

becomes even more difficult to tease out the semantics in question, even when taking context and

commutability into consideration, as the discussion in Breivik and Swan (2000:21–22) attests.

Broadly speaking, studies that posit a locative meaning for existential there employ many of the

same methods as the ones stressing the word’s lack of reference. An exception is Breivik (1997)

which draws on a large collection of data but does not employ statistical tests. What is lacking is a

method which is both more empirically transparent, in that it clearly spells out the relation between

hypotheses and data, and which at the same time is comprehensive in its coverage of the data.

The acuteness of the methodological problem is illustrated by returning briefly to there in PdE, which

should be more semantically accessible than its diachronic counterpart. There is a tendency in the

literature to stress the naturalness or intuitiveness of a locative interpretation of existential there, as

attested by Lyons, who states that “it might appear reasonable to say that all existential sentences are

at least implicitly locative” (Lyons 1967:390). While this intuition may not be incorrect, it does not

provide a firm methodological foundation for investigating the semantics of there. Similarly, Lakoff

(1987), in what is arguably one of the most elaborate and comprehensive treatments of the

semantics of there, argues that existential there has a metaphorically locative meaning referring to a

mental space, based on intuition and introspection. Its elegance notwithstanding, the analysis lacks

empirical transparency and is difficult to objectively reproduce.

Lakoff (1987) posits that both existential and locative there belong in sets of constructions, radially

structured around a prototypical center, with the existential construction being a metaphorical

extension of the locative one. However, the introspective approach forming the basis of such an

analysis has been sharply (and rightly) criticized for being ad-hoc and “rather weak from a

methodological point of view” (Sandra and Rice 1995:123). A case in point is provided by Newmeyer

(1998:209–223), who considers a subset of the constructions posited by Lakoff and, based on the

same data, argues that no such radial structure exists. The point of the criticism is not that a radially

structured category of there-constructions does not exist; it might well do. The problem with such an

analysis is that it is highly dependent on the researcher’s introspections. This, coupled with the lack

of native speaker intuitions, makes it an especially poor methodology for diachronic semantics,

particularly when the semantics in question is heavily intertwined with grammar.

2.3 Grammatical meaning and constructions

The locative meaning of there, i.e. its ability to refer to a concrete location, comes about through

deixis which can be defined as locating a referent by using a speech act or one of its participants as

reference point (Cruse 2010:401). It is clear that the existential use of there can only be

differentiated in writing from the locative use based on context (which licenses such differences as

vowel reduction and whether or not the expression can be accompanied by a pointing gesture): the

word forms are identical.

This raises the question of what precisely in the context distinguishes the two uses of there. Position in

the sentence is not enough in itself, since the deictic there may appear in initial position. Nor is the

difference only a result of different verb senses. Instead, the necessary defining context seems to be

supplied by the interplay of grammatical, semantic, and pragmatic factors. This suggests that the

proper context for evaluating the differences between the two uses of there is not at the word level,

but at the level of the construction.

The present study takes its notion of construction from construction grammars (CxG) such as

Goldberg (1995) and Croft (2001). Constructions are assumed to be “form-meaning correspondences

that exist independently of particular verbs [and which] themselves carry meaning, independently of

the words in the sentence” (Goldberg 1995:1). The construction-grammar view implies that

arguments are “licensed not directly as arguments of the verbs, but by the particular constructions”

(Goldberg 1995:10). It follows from this that “semantics” has a wider scope than merely “lexical

semantics” (Goldberg 1995:14). The existential semantics, and the function of the existential

sentence, are in this view properly supplied by the existential construction (EC), not by the verb or

the verb in conjunction with other arguments.

The discussion of there in the context of an EC raises the question of whether there is such a thing as a

separate existential construction. Following Croft (2001), such constructions are taken to be

language-specific, but since the present study is concerned with English only, “EC” is used as a shorthand for the

more cumbersome “English Existential Construction”. Newmeyer (1998:221) implies that the notion

of construction itself is considerably less disputed (terminological differences aside) than the

existence of specific, named constructions. Hence, positing the (diachronic) existence of an EC

requires some substantiation. The strongest counter-evidence to the EC would come from

demonstration that its semantics is completely compositional. However, there are strong reasons to

favor a construction-based account.

If existential there is semantically empty then it could hardly contribute to the compositionality of

the semantics of the EC. This would mean that much of the semantic work would be left to the verb,

in most cases a form of be. However, it is not evident that the semantics of be is primarily existential.

It could be argued that be is locative or, as has been argued by Langacker (1988:393), neither. A final

argument comes from research on frame semantics and argument-structure constructions. If a

construction carries some kind of meaning, it can be expected to fill in missing meaning in verbs

occurring in it, even with semantically empty nonsense verbs as demonstrated experimentally by

Kako (2006). Although Kako does not deal with the EC, his results predict that native speakers of

English would assign a nonsense verb like logift a semantic property linked with existence and/or

occurrence were it to occur in the context of an EC, as in “there logift a mat on the floor”. As Kako

(2006:573) points out, this would be a property arising from the entire construction (or frame), not a

result of individual words in the construction.

2.4 A distributional approach to meaning

The alternative adopted by the present study is one based on corpus-driven, distributional semantics,

a one-level contextual approach (Cruse 2010:213–214). The notion that the meaning of a word can

be determined based on patterns of co-occurrence in a corpus was famously articulated by Firth

(1957). Distributional methods have been used to study verb semantics (Redington, Chater, and Finch

1998) and constructions (Stefanowitsch and Gries 2003), to mention two examples. Far from being a

“theory-free” approach to semantics, distributional semantics has been used to test linguistic

hypotheses, and there is some evidence supporting the view that semantic associations and textual

co-occurrences are related (Lenci 2008:17–18). Notably, distributional semantics is particularly well

suited to describing those aspects of meaning that interact with syntax, such as argument structure

(Lenci 2008:25).

The fundamental assumption in distributional semantics is that word meaning is considered to be

distributed, and lexical representations are assumed to be gradual, quantitative functions of their

global (corpus) distributions (Lenci 2008:12). Thus, conceptualizing meaning in terms of co-

occurrences has a number of advantages in the context of historical linguistics. First, meaning is seen

as gradual rather than categorical. Second, the influence of syntax can be naturally accounted for by

reference to context. Such a conceptualization of semantics owes much to structuralist thinking, and

might at first glance seem at odds with traditional work in cognitive linguistics, with its emphasis on

rich, elaborately connected meanings (Cuyckens, Sandra, and Rice 1997:36). However, the traditional

cognitive models are vague with respect to methodology (Sandra and Rice 1995; Cuyckens, Sandra,

and Rice 1997). Quantitative, corpus-based methods offer an alternative (and more objective) usage-

based approach to semantics, which is gradually (albeit slowly) gaining ground in cognitive linguistics

(Gries and Divjak 2010).

Combining distributional semantics with a CxG approach has the benefit that it provides specific

motivation for a corpus-based methodology. Since constructions are form-meaning pairings in their

own right, they are identifiable as such in a corpus. This in turn opens the way for a systematic and

empirically transparent treatment of the meaning of there based on co-occurrence patterns. Such

patterns will act as proxies for constructions in the study, since the corpora do not annotate

constructions explicitly. The notion that semantics can be reduced to corpus co-occurrence patterns

is not uncontroversial, especially if taken literally. However, far from replacing linguistic theorizing

about semantics, corpus co-occurrences, or distributional semantics, are tools to operationalize and

test theoretical notions about meaning (Lenci 2008; Gries 2011). The ability of corpus-driven

distributional methods to handle the complex interplay between words and their contexts is

particularly pertinent in a CxG framework where meaning is considered a function between words

and constructions.

3. Data and methods

Diachronic linguistics is almost inherently corpus-based, in one way or another. The present study

follows a quantitative, bottom-up (or data-driven) methodology, as advocated by Gries (2011).

Although independent from CxG, this distributional approach to semantics is highly compatible with

CxG through the latter’s growing emphasis on usage-based explanations, as argued by Lenci (2008).

Section 3.1 describes the data used in the study and discusses sample size, corpus annotation, and

data extraction. In section 3.2 the general framework of distributional semantics is discussed in the

context of computational semantic maps in linguistics. Finally, section 3.3 offers a comparison of

available techniques for creating such maps, with special reference to the challenges faced by

historical corpus linguistics.

3.1 Data

The data for the study were drawn from three diachronic, manually syntactically annotated corpora

of historical English: the York-Toronto-Helsinki Parsed Corpus of Old English or YCOE (Taylor et al.

2003), the Penn-Helsinki Parsed Corpus of Middle English or PPCME (Kroch and Taylor 2000), and the

Penn-Helsinki Parsed Corpus of Early Modern English or PPCEME (Kroch, Santorini, and Delfs 2004).

Together these corpora total approximately 4.5 million words drawn from historical English prose

between approximately AD 850 and 1700. All three corpora follow a similar, relatively flat phrase

structure annotation scheme, exemplified in figure 1 below.

Figure 1: Syntactic tree illustrating the phrase structure corpus annotation with an example sentence (or matrix clause, annotated as “IP-MAT”) from the proceedings of the trial of Titus Oates (1683) included in PPCEME.

From these three corpora, a total of 23761 sentences containing either at least one locative adverb

or at least one instance of existential there were extracted using CorpusSearch 2.0. Of these, 9203

sentences were extracted from YCOE, 5471 from PPCME, and 9087 from PPCEME. The extracted

sentences were further processed with bespoke programs written in the scripting language Perl, and

the resulting files were enriched with meta-information from the corpus documentation (e.g. date,

author – if known, and genre) as well as with further information extracted from each sentence. This

included the first verb of the sentence, the syntactic tag of that verb, the length of the sentence (in

phrase-structure nodes) and other features. For the present study, the most important information

extracted from the corpora was the grammatical context.

Unlike the two other corpora, YCOE’s annotation does not distinguish between locative and

existential uses of there. To overcome this obstacle, the study treated all instances of there as

unannotated, and instead relied on bottom-up approaches that attempt to infer properties of words from

their distributions (Schütze 1998). To achieve this, the data were extracted with an eye toward

providing rich contextual information. For each adverb or existential there a trigram was available:

the adverb itself, the element occurring in its immediate right-context (C1), as well as the element

occurring in the next slot (C2). This can be represented schematically as Adverb + Context 1 +

Context 2. A concrete example is provided below.

her + (BEPI is) + (ADVP-TMP (ADV^T nu …

The example, from Ælfric’s Homilies Supplemental (coaelhom,+AHom_1:84.56), represents the string

here is now, with grammatical annotation. As the example shows, the context information

provided by the data is relatively rich. For a full discussion of the extraction and enriching process,

including example Perl scripts, see Jenset (2010).
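
The trigram format can be illustrated with a short sketch in Python. The sketch is not the Perl code used for the study (for which see Jenset 2010); the function name extract_trigrams, the flattened (tag, word) input, the tag ADV^L assigned to her, and the filler value NONE for missing context slots are all assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (syntactic tag, word form), as in "(BEPI is)"


def extract_trigrams(tokens: List[Token], targets: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (adverb, C1 tag, C2 tag) for every target word in a sentence.

    Context slots that fall outside the sentence are filled with "NONE"
    (a hypothetical placeholder, not a corpus tag).
    """
    trigrams = []
    for i, (_tag, word) in enumerate(tokens):
        if word.lower() not in targets:
            continue
        c1 = tokens[i + 1][0] if i + 1 < len(tokens) else "NONE"
        c2 = tokens[i + 2][0] if i + 2 < len(tokens) else "NONE"
        trigrams.append((word.lower(), c1, c2))
    return trigrams


# Hypothetical flattened version of the example string "her is nu" (here is now).
# The adverb tag "ADV^L" is assumed; BEPI and ADV^T come from the example.
sentence = [("ADV^L", "her"), ("BEPI", "is"), ("ADV^T", "nu")]
print(extract_trigrams(sentence, {"her", "there"}))
```

Only the syntactic tags of the two context slots are retained in this sketch, in keeping with the use of right-context tags as the basis for the co-occurrence counts.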

3.2 Semantic maps and distributional semantics

The corpus data amount to frequencies of co-occurrence between locative adverbs (as well as

existential uses of there) and syntactic contexts. On their own, such frequencies are uninformative,

since the systematic patterns in such large collections of numeric data cannot readily be grasped by

humans. What is required is a statistical method which is unbiased and can systematically reduce the

frequencies into salient, interpretable patterns. The data, exemplified in table 1, are multivariate,

that is, we have multiple categories of rows (words) and multiple categories of columns (syntactic

environments), with observed frequencies of co-occurrence in each cell. Such multivariate data lend

themselves particularly well to reduction and visualization techniques, such as PRINCIPAL

COMPONENT ANALYSIS (PCA), CORRESPONDENCE ANALYSIS (CA), and MULTIDIMENSIONAL SCALING

(MDS).

Table 1: Excerpt from the data set extracted from PPCEME illustrating the matrix format with rows corresponding to locative adverbs (as well as existential there) and columns corresponding to syntactic tags found in the immediate right context of the adverbs. Each cell corresponds to frequencies of co-occurrence. The syntactic categories are (from left to right): be past tense, be present tense, complementizer, conjunction, that-complementizer clause. The full data set contains 62 rows and 135 columns. Similar matrices were constructed for the data from YCOE and PPCME. Note the prolific zero-frequency cells, a feature shared by all of the corpora.

           BED   BEP   C   CONJ   CP-THT
aboute       0     1   0      0        0
above        0     3   0      2        0
abroade      0     0   0      0        0
afore        0     0   0      1        0
after        0     0   0      0        0
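
A matrix of this kind can be assembled directly from (adverb, tag) observations. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the toy observations reproduce only the four non-zero cells of the excerpt in table 1.

```python
from collections import Counter


def cooccurrence_matrix(pairs):
    """Build a row-by-column frequency table from (adverb, tag) pairs."""
    counts = Counter(pairs)
    rows = sorted({adverb for adverb, _ in pairs})
    cols = sorted({tag for _, tag in pairs})
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


def sparsity(matrix):
    """Proportion of cells in the matrix with a zero count."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Toy observations matching the non-zero cells of the table 1 excerpt.
pairs = ([("above", "BEP")] * 3 + [("above", "CONJ")] * 2
         + [("aboute", "BEP"), ("afore", "CONJ")])
rows, cols, matrix = cooccurrence_matrix(pairs)
print(rows, cols, matrix)
print(round(sparsity(matrix), 3))
```

Because the sketch builds the matrix from attested pairs only, adverbs and tags without any observation in the toy input (such as abroade or BED) do not appear; in the full data set they are present because they are attested elsewhere in the corpus.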

Common to multivariate techniques is the assumption that the full dataset, as illustrated in table 1,

can be considered a multidimensional space with columns corresponding to dimensions, and with

cells (i.e. cell values, in this case frequencies) providing coordinates for the row variables in this

multidimensional space. Such an approach allows the use of well-known mathematical techniques to

reduce the multivariate space into a smaller sub-space, which can be visualized as a two-dimensional

plot or map. A good visualization is one that captures as much as possible of the variation (i.e. the

patterns of co-occurrence) in the data in only two dimensions, expressed as a percentage of

explained variation. If the variation explained by the first two principal components is high we have

greater reason to trust the patterns or associations manifesting themselves in the plot. Conversely if

the explained variation is low, the degree of association in the data is low and patterns seen in the

plot are likely to be nothing more than distortions resulting from the visualization technique. For a

more technical discussion of such techniques, see Baayen (2008:127–148) or Rencher (2002). The

versatility of multivariate techniques is attested by the broad range of applications they have found

within linguistics, including research in morphology (Baayen 1994), typology (Croft and Poole 2008),

pragmatics (Iyeiri, Yaguchi, and Baba 2011), translation studies (Jenset and McGillivray 2012), and

language classification (Kroeber and Chrétien 1937).
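
The notion of explained variation can be made concrete with a small sketch in Python. It is an illustration under stated assumptions rather than the procedure used in the present study: the function names and the toy data are invented, columns are centered but not rescaled, and power iteration with deflation is used only to keep the example free of external libraries.

```python
import math
import random


def covariance(data):
    """Sample covariance matrix of the columns (centered, not rescaled)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenpair(c, iterations=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in c]
    for _ in range(iterations):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    cv = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
    return sum(a * b for a, b in zip(v, cv)), v


def explained_variation(data, k=2):
    """Share of the total variance captured by each of the first k components."""
    c = covariance(data)
    total = sum(c[i][i] for i in range(len(c)))
    if total == 0.0:
        raise ValueError("data have no variance")
    shares = []
    for _ in range(k):
        lam, v = top_eigenpair(c)
        shares.append(lam / total)
        # Deflate: remove the component just found before extracting the next.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(c))]
             for i in range(len(c))]
    return shares


# Invented three-dimensional data whose variance lies mostly in two dimensions.
toy = [[3, 0, 0], [-3, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 1], [0, 0, -1]]
shares = explained_variation(toy, k=2)
print([round(s, 3) for s in shares], round(sum(shares), 3))
```

A two-dimensional plot of these toy data would capture most of the variation, which is the situation in which the patterns in the plot can be trusted; a low combined share would call for caution.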

Multivariate visualization techniques can be applied to linguistic data to produce computational

semantic maps (Croft and Poole 2008). Such maps are well suited to semantic analyses based on the

distributional hypothesis (Lenci 2008:11), since multivariate techniques

highlight salient associations that are weighted by frequency of use, or co-occurrence. The resulting

maps or plots can be considered proxies of linguistic (sub-) systems, since the visualization

techniques highlight frequent co-occurrence, while simultaneously taking the full dataset into

account. Thus, the semantic maps represent a best approximation to the overall, or total, structure

of multiple associations that exist in a multivariate set of linguistic data. The strength of these

techniques is that they pick out the most salient associations in the data in a systematic and unbiased

manner, and provide a quantifiable way of assessing the quality of the visualization. This can be

compared with a human being working with some data, who, after noticing a pattern which might be

real enough but perhaps of minor importance “thereafter […] notes mentally every corroborative

item, but unconsciously overlooks or weighs more lightly items which point in other directions”

(Kroeber and Chrétien 1937:97). Since multivariate techniques are based on visualization rather than

null-hypothesis testing (such as the chi-square or t-tests) they yield a richer, more readily

interpretable result: a plot of associations, rather than a single p-value. Hence, multivariate

techniques can be said to bridge the gap between quantitative and qualitative methods.

3.3 Exegesis on the choice of visualization technique

Deciding which multivariate clustering technique to use is a non-trivial methodological choice that

requires some knowledge of both the available techniques and the data at hand. The present section

offers some comments on three commonly used techniques, CA, MDS, and PCA, and provides

specific justification of the chosen technique in the context of historical corpus linguistics.

The multivariate techniques mentioned above are similar in many respects, but subtle differences

exist, both with respect to the recommended type of input data, and the actual reduction and

visualization technique. Hence, the choice of method requires some consideration. MDS is typically

used to visualize associations among columns based on ordinal data. In our case, the aim is to map

out associations between rows and columns based on frequencies of co-occurrence, and the

technique that is typically recommended for such data is CA (Greenacre 2007). The third technique,

PCA, is typically recommended for datasets with measurements (height, distance, weight,

temperature, or duration), not frequencies. In most cases, these recommendations found in standard

introductory textbooks such as Baayen (2008); Everitt and Hothorn (2006); Venables and Ripley

(2002) are sound, since they represent a form of best practice within applied statistics.

However, there are situations in which the above textbook approach might fall short. One such

situation is historical corpus linguistics. Table 1 illustrates a situation not uncommon in historical

linguistics: the data are categorized into multiple row and column categories. However, since the size

of historical corpora is limited by what material was passed on and survived, such corpora tend to be

dramatically smaller than corpora for e.g. PdE. As a case in point, compare the 4.5 million words of

historical English prose available through the three corpora used for the present study, covering over

a thousand years, with the over 400 million words in the Corpus of Contemporary American English

(Davies 2008), covering the time from 1990 up to the present. Since even a small sample of language

is likely to contain many syntactic categories, the smaller sample sizes found in historical data run a

particular risk of data matrices with many zero counts such as illustrated in table 1.

As table 1 shows, the data set with as many rows as there are adverbs (including existential there)

and as many columns as there are syntactic tags occurring to the right of the adverbs results in a

matrix which is very sparse. That is, we are left with a matrix in which many, perhaps most, cells have

zero as their value. Corpus data typically result in precisely such frequency matrices, and historical

data are especially likely to have many sparsely attested categories (owing to the typically smaller

datasets in historical linguistics). Sparsity is therefore an important methodological concern for

data-driven, computational semantic map-making in diachronic linguistics. The critical question with

such data is which multivariate technique to choose.

Croft and Poole (2008) use MDS for their semantic maps. MDS is a technique for representing (dis-)

similarities between items, whether they be stimuli, linguistic items, or individuals (Everitt and

Hothorn 2006:227). Such an approach is useful for representing the typological variation in e.g.

indefinite pronouns, that is, we have items from a number of different languages but are primarily

interested in the distances between the items (Croft and Poole 2008). The data for the present paper

differ in this respect, since we are not only interested in the items themselves (syntactic tags), but

also how they cluster together with the categories over which they have been sampled (locative

adverbs). Put differently, MDS is a technique for representing (dis-)similarities among columns,

whereas the present study is concerned with co-occurrence patterns of rows and columns. There is

also another reason for not choosing MDS in the present case. MDS will try to maximize the variation

accounted for in two or three dimensions (Baayen 2008:136), whereas techniques such as PCA and

CA will (typically) provide a larger number of dimensions ordered by descending explained variation.

In some cases this ordered, descending explained variation in the dimensions can contribute to the

final analysis, as argued in the present study. For another study where this feature of PCA is given a

linguistic interpretation, see Barðdal et al. (2012). Having excluded MDS, the choice stands between

PCA and CA.

In a study evaluating the use of PCA and CA for inferential purposes (rather than the

exploratory purposes highlighted in the present article), Lynn and McCulloch note that when frequency

data are highly correlated, PCA can perform as well as, or better than, CA (Lynn and McCulloch

2000:571). Since CA essentially compares proportions of representations, it tends to give greater weight

to rare instances. The effect of this with sparse, historical data is to produce a bi-plot which is hard

to interpret, if interpretable at all, and where rare co-occurrences are given a disproportionate

influence on the final plot. With a dataset such as the one illustrated in table 1, we are not really

interested in proportional representation. We know that there will be frequent zero-count cells, and

many rare co-occurrences with counts close to zero. The real aim with a dataset such as the one

exemplified in table 1 is to reliably pull out the co-occurrences that are highly salient from the great

mass of low-correlational background noise which Meehl (1990:123–127) calls the “crud factor”. One

way of achieving this is to run a PCA on the observed frequencies in the matrix, without normalizing

or standardizing them (i.e. without turning them into proportions). This ensures that the differences

in magnitudes are better represented, and although at odds with the conventional advice, it is

supported theoretically by the results presented in Lynn and McCulloch (2000).

To test this, the data for the present paper were analyzed twice: first with CA using the ca package

(Nenadić and Greenacre 2007) and secondly with PCA. All analyses were carried out in the statistical

software package R (R Development Core Team 2011). As predicted, the CA bi-plots were hard to

interpret, with all categories clustered near the center and no clearly visible associations. PCA with

non-normalized (i.e. raw, rather than relative) frequencies performed much better, with clear,

interpretable associations along the first two dimensions. Again as predicted, using PCA with

normalized frequencies yielded results no better than with CA, providing further support for the

approach advocated here. This line of reasoning might be labeled methodological opportunism,

i.e. simply going for the result that seems to best suit the researcher,

were it not for the fact that the results discussed in Lynn and McCulloch (2000) also provide

theoretical support for such an approach. It should be stressed that the procedure advocated here is

dependent on the input data. In many cases, CA will be superior with frequency data. In some cases,

the two approaches will be similar, as observed by Iyeiri, Yaguchi, and Baba (2011), who find that for

their data PCA with and without normalized frequencies (i.e. observed frequencies vs. proportions)

give very similar results.
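
The contrast between PCA on observed frequencies and PCA on proportions can be illustrated with a minimal sketch. It is not the analysis reported in this paper, which was carried out in R; the count matrix is hypothetical, and the function name pca_explained_variance is an assumption introduced only for illustration. PCA is computed here through a singular value decomposition of the column-centered matrix.

```python
import numpy as np

def pca_explained_variance(matrix):
    """Return the share of total variance explained by each principal component."""
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each column (context tag)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Hypothetical adverb-by-context counts (rows: adverbs, columns: context tags).
counts = np.array([
    [120, 95, 10, 4],  # a frequent adverb with strong context preferences
    [15, 60, 40, 2],   # a second frequent adverb with a different profile
    [1, 0, 2, 1],      # rare adverbs: zeros and counts close to zero
    [0, 2, 1, 0],
    [2, 1, 0, 1],
])

# PCA on the observed frequencies, without normalizing or standardizing.
raw = pca_explained_variance(counts)

# PCA on row proportions, i.e. each adverb's counts divided by its row total.
proportions = counts / counts.sum(axis=1, keepdims=True)
normalized = pca_explained_variance(proportions)

print("raw counts: ", np.round(raw, 3))
print("proportions:", np.round(normalized, 3))
```

With raw counts, the differences in magnitude between frequent and rare rows stay in the data, so the leading components are shaped by the frequent adverbs. With proportions, every row carries the same total weight regardless of how sparsely it is attested.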

4. Results

The following sections describe the results of applying distributional methods to the data at hand.

Section 4.1 reports on the multivariate analyses with PCA of the three datasets from OE, ME, and

EME. The accompanying bi-plots representing semantic maps are described and interpreted in the

context of the corpus data. Section 4.2 provides a second, more specialized investigation of

association patterns in the data using another statistical technique, viz. logistic regression.

4.1 First investigation: semantic maps

PCA produces a sub-space of the initial multidimensional space represented by each of the three full

matrices. These sub-spaces can be visualized by plotting them in a two-dimensional figure, and it is

this two-dimensional figure that will form the basis of the semantic analysis. The two-dimensional

plot, or bi-plot, of the PCA is a spatial representation of the similarities and differences that arise

from differences in distributions of co-occurrence of locative adverbs and grammatical tags. Hence,

the distances that arise in the bi-plot can be interpreted as a proxy of distributional semantic

similarity. Since the dimensions in the PCA sub-space are ordered from the greatest to the smallest

explanatory value (i.e. PC 1 > PC 2 > … > PC n, where n is the total number of dimensions in the

analysis), we can see that some co-occurrences are more important than others. A rule of thumb is

that for a dimension to have some explanatory value, it should account for at least 5% of the

variation. In other words, if a dimension has too little explanatory value, we should be careful about

trying to interpret it in semantic terms, since it might simply be a result of random noise.
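
The rule of thumb can be written down as a simple filter. The sketch below is only an illustration: the function name interpretable_components is an assumption, and the PC 1 shares for OE and ME are not reported directly but derived here by subtracting the reported PC 2 share from the reported two-dimension total (99.2% and 98.5%, respectively). The remaining figures are those reported for the three periods in this section.

```python
def interpretable_components(explained, threshold=0.05):
    """Return the 1-based indices of components at or above the threshold."""
    return [i + 1 for i, share in enumerate(explained) if share >= threshold]

# Shares of explained variation for PC 1 and PC 2 in each period.
# OE and ME PC 1 values are derived (total minus PC 2), not reported directly.
reported = {
    "OE": [0.835, 0.157],
    "ME": [0.943, 0.042],
    "EME": [0.971, 0.028],
}

kept = {period: interpretable_components(shares)
        for period, shares in reported.items()}
print(kept)
```

Under a strict reading of the 5% threshold, only the OE analysis retains a second interpretable dimension.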

Figure 2: PCA bi-plot of the associations between locative adverbs and right-context tags. The total explained variation is 99.2%. The second dimension (PC2) representing a distinction between there (+d+ar) and here (h+ar) is substantial. Note the association between there and the verb be (BEDI) as well as a complementizer position (C).

Figure 2 is a bi-plot of the PCA analysis of the OE data. The quality of the analysis is excellent, with

the first two dimensions accounting for 99.2% of the total variation (PC 1 + PC 2). PC 1, by necessity,

accounts for the largest portion of the variation, but since PC 2 accounts for 15.7%, well above the

rule of thumb threshold of 5%, we can safely interpret the vertical as well as the horizontal

dimensions in the plot. Starting with PC1, we see that this dimension is dominated by the difference

between there (OE ðær, coded as +d+ar in YCOE) and all the other adverbs. Turning to the second

dimension, we see that here (OE hær, coded as h+ar in YCOE) stands out from the rest as well. There

is associated with the tags “C”, i.e. a complementizer position, and to a somewhat lesser extent the

verb be. The co-occurrence with a complementizer position is exemplified in (3). In the majority of

cases the complementizer position is empty and the most interesting fact is not the C position itself,

but what it means in the corpus. In YCOE, most subordinate clauses are annotated as complements

of complementizer phrases, hence a complementizer position indicates that a subordinate clause

follows next, and it is clear that this context instantiates a locative use of there.

(3) on ælcum lande þær hys geleafa byð. “in every country there his faith is” (coaelhom,+AHom_5:185.797)

The syntactic structure of (3) following the corpus annotation in YCOE is outlined in figure 3,

illustrating the syntactic relationship between there and the complementizer position which appears

in the bi-plot in figure 2.

Figure 3: Syntactic tree illustrating the corpus annotation’s phrase-structure representation of the example phrase in (3). Only the relevant PP context is shown, the full sentence having been omitted for reasons of space.

In example (4), on the other hand, we see a use of there which, despite the corpus annotation’s

insistence on a locative label, is at least potentially interpretable as an existential use. As the

corresponding syntactic structure in figure 4 suggests, this sentence is very close to the prototypical

structure of an existential clause consisting of there, be, and a nominative NP.

Figure 4: Syntactic tree illustrating a phrase structure that is ambiguous as to whether there is used as a locative adverb or an existential subject. The syntactic structure corresponds very closely to that of an EC.

(4) þær wæs eac oðer cyricweard

“there was also another sexton”

(cogregdC,GDPref_and_3_[C]:25.228.4.3143)

(5) Her Ignatus biscep þrowude.

“Here bishop Ignatus suffered.” (cochronA-1,ChronA_[Plummer]:110.1.92)

Turning to here, we can see from (5) and the corresponding syntactic structure in figure 5, that co-

occurrence with a nominative NP points to a locative use.

Figure 5: Syntactic tree illustrating example sentence (5) where here is used in an unambiguous locative position.

As far as OE is concerned, we can conclude that both there and here appear in contexts that are

clearly locative and in contexts that may be interpreted as existential. A secondary distinction

exists between here and there, and this secondary distinction represents a tendency for here to

occur in more typically locative contexts than there. We see a division of labor between there and

here in that they define or outline different dimensions in the PCA solution (PC1 vs. PC2, respectively).

The substantial proportion of variance explained by PC2 leads to the conclusion that there and here

are distributionally complementary signs.

Figure 6: PCA bi-plot of associations between locative adverbs / existential there and their respective grammatical contexts in ME. The total explained variation is very high (98.5%), making it a good representation. Compared with OE (figure 2) the second dimension has a much lower explanatory potential, which points to less differentiation between here and there.

In ME, the situation is at first glance very similar to OE. The bi-plot in figure 6 shows that the first two

dimensions account for 98.5% of the total variation, again an excellent result. As in OE, most of the

adverbs are clustered together so tightly that none stand out, a sign of overlapping distributions. Also

as in OE, there defines the horizontal axis (PC1) and here defines the vertical axis (PC2), with the

former occurring in both locative and existential contexts, whereas the latter only occurs in locative

contexts. Unlike in OE, we can see that the share of explained variance accounted for by PC2, i.e.

here, is declining. The vertical axis explains only 4.2% of the total variance, which is less than the rule

of thumb threshold value of 5%. Even if this is only a rule of thumb, the reduction in the explained

variance contributed by here is dramatic. However, an explained variance of 4.2% still seems

close enough to the threshold value to merit some consideration, but the overall impression is that

of a gradual decline in contrast. Although there still occurs in locative contexts frequently enough to

show signs of its deictic opposition to here, the magnitude, or importance, of this distinction is clearly

on the ebb.

(6) +ter is a noble Cytee +tat is called Tours. “there is a noble city that is called Tours.” (CMBRUT3,9.234)

(7) +ter he fastyde fourty dayes “there he fasted forty days” (CMAELR3,42.466)

(8) God wolde ye had nat come here “God would that you had not come here” (CMMALORY,50.1675)

Some of the salient contexts for here and there in ME are exemplified in (6-8). In (6) there co-occurs

with be, a typical existential context, whereas in (7) there co-occurs with a subject NP (he) which

seems to indicate a locative position. In (8) a sentence with here illustrates another locative position,

the end of the sentence (labeled as “END” in the bi-plot).

Figure 7: Bi-plot of associations between locative adverbs / existential there and their grammatical contexts in EME. The total explained variation is still very high (99.9%), but the second dimension no longer has any meaningful explanatory value. The first dimension, dominated by there, accounts for all the systematic variation in the data.

Finally, turning to the bi-plot for EME in figure 7, we see that the first two dimensions account for a

total of 99.9%, i.e. practically all the variation. PC 1 is as before dominated by there and accounts for

a staggering 97.1% of the total variation, while the second dimension, which is defined by here, only

accounts for 2.8%. The percentage accounted for by PC 2 falls far short of the 5% rule of thumb. This

can only be interpreted to mean that the systematic opposition between there and here no longer

represents the most salient usage. Since the present study’s definition of semantics relies crucially on

contrasts, or oppositions, as the means by which meaning is created, this has a clear

implication: the main opposition (i.e. meaning-bearing distinction) that is found in usage between

there and all other locative adverbs is now the most salient source of the word’s meaning, not its

distinction with here. There is now tied very closely to one specific context, namely different forms of

be, exemplified in (9). Although the bi-plot (figure 7) suggests that here still occurs before subjects

(10) and at the end of sentences more often than in other contexts, the minute explanatory value of

PC 2 tells us that this association is no longer strong enough to reliably differentiate here from all

other adverbs. Put differently, while there has become more discriminating about the contexts in

which it appears, here has become more uncritical and occurs in more or less the same contexts as

any other locative adverb.

(9) there is one that accuseth you “there is one who accuses you” (AUTHNEW-E2-H,V,40J.635)

(10) Here we refreshed our selues very well with fresh water, “Here we refreshed ourselves very well with fresh water,” (COVERTE-E2-H,18.130)

Following the distributional hypothesis, it is not unreasonable to interpret the PCA bi-plots above as

models of systems of linguistic signs. The crucial aspect of these models is that they primarily

consider the most frequently attested correlations and oppositions, and that they do so in a

systematic and unbiased manner. As such, the models attest to a gradual, diachronic shift in the

system. In OE, the deictic opposition between here and there was clearly highly salient in usage.

Although these two adverbs, through deixis, could refer to locations, this deixis set them apart from

the great mass of other locative adverbs. However, in ME, and even more so in EME, it is clear that

from the point of view of a usage-based model of this system, the deictic properties of here and

there become less prominent. Instead, here becomes more like all the other locative adverbs, while

there becomes defined by its opposition to these locative adverbs. However, the bi-plots make use of

a rather limited grammatical context. The next section considers to what extent these contexts

represent grammatical constructions and, following CxG, hence semantics.

4.2 Second investigation: searching for constructions

An important question that remains despite the semantic maps presented above is whether we can

really identify an EC in OE by statistical means alone. Although a clear association between there and

the verb be was found, this does not necessarily prove that an EC is the underlying source of that

association. The bigram there – is could easily constitute a deictic use, as in (11). The case for having

identified a potential EC would be much stronger if another element, namely the indefinite NP, could

be identified as well. Since the use and meaning of existential there is most controversial in OE, the

second investigation will deal with OE data only.

(11) There is that book I’ve been looking for all day. (locative)

(12) There is a book on the table. (existential)

The following section will test whether a significant positive correlation exists between there and

indefinite NPs occurring in the second n-gram slot after the adverb, corresponding to the boldface

constituent in (12). If such a correlation can be found, it is reasonable to assume that it reflects the

EC.

To test the association between locative adverbs and indefinite NPs in OE, the second left

context of the adverbs in the OE dataset was converted into binary values, with “true”

indicating the presence of an indefinite NP in the second left-context of the adverb, and “false”

indicating its absence. The criterion used for defining an NP as “indefinite” was the presence of either

an indefinite quantifier or man, represented in the corpus as either “Q^N” or “MAN^N”.

Furthermore, unambiguous plural nouns occurring without a determiner were also included. A

further restriction was that the NP should be in the nominative case (cf. the “^N” tag), to more

closely correspond to the “logical subject” of an EC. This resulted in 221 explicit indefinite NP

contexts being identified, alongside 15 plurals. Of the total 236 indefinite contexts, 192 occurred with

there, 41 with here and 3 with other locative adverbs. The total dataset numbers 9203 observations,

which makes this a relatively rare phenomenon; however, this is to be expected since it was only in

ME that the existential use of there was fully conventionalized.
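
The binary coding and the reported totals can be sketched as follows. This is an illustration under stated assumptions rather than the extraction procedure actually used: the observation list and the tags "NP-NOM" and "PP" are hypothetical, the function name has_indefinite_np is introduced here, and the criterion for determinerless plural nouns is left out because it cannot be read off a single tag.

```python
# Tags named in the text as marking an explicit indefinite NP in the nominative.
INDEFINITE_TAGS = {"Q^N", "MAN^N"}

def has_indefinite_np(second_slot_tag):
    """Binary coding of the second context slot: True for an indefinite NP tag."""
    return second_slot_tag in INDEFINITE_TAGS

# Hypothetical observations: (adverb category, tag in the second context slot).
observations = [
    ("there", "Q^N"),
    ("there", "NP-NOM"),
    ("here", "MAN^N"),
    ("other", "PP"),
]
coded = [(adverb, has_indefinite_np(tag)) for adverb, tag in observations]

# Consistency check on the totals reported in the text.
explicit, bare_plurals = 221, 15
by_adverb = {"there": 192, "here": 41, "other": 3}
total_indefinite = explicit + bare_plurals
overall_rate = total_indefinite / 9203

print(coded)
print(total_indefinite, round(overall_rate, 4))
```

The two breakdowns agree on 236 indefinite contexts, which corresponds to roughly 2.6% of the 9203 observations.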

Since the EC is distinguished by the combination of there, an indefinite NP, and a form of the verb be,

information about whether a form of be is present should be taken into account. The presence of all

three elements (there, be, and an indefinite NP) together is highly indicative of the EC. Thus, if a

significant correlation can be established between all the elements, this would strengthen our

confidence in the results presented in section 4.1 above.

To determine the presence and nature of any correlations, a binary logistic regression model was

used, with indefinite NP (true/false) as the response variable and the locative adverbs, categorized as

here, there and “other” for better reliability (Manning and Schütze 1999:192), as the first predictor.

The second predictor used in the model is a binary variable indicating whether or not a form of be

was present as the first left context of the adverb (C 1). The third predictor combines information

about the sentence’s total number of tree-nodes, the total number of NPs, and the total number of

finite verbs (as represented by the corpus annotation) into an index of syntactic complexity (Jenset

2010). For a more in-depth explanation of logistic regression, see Baayen (2008:195-208). The

regression model, created with the rms package (Harrell 2012) in R, can be expressed as follows:

$$\operatorname{logit}\bigl(\Pr(\mathrm{IndefNP})\bigr) = \mu + \beta_{\mathrm{Adverb}} + \beta_{\mathrm{Be}} + \beta_{\mathrm{Complexity}} + \varepsilon$$

In other words: the probability of finding an indefinite NP in the second left context of a locative

adverb can be modeled as a linear (log-transformed) function of the overall mean rate of indefinite

NPs ($\mu$), a modifying factor for the adverb categories ($\beta_{\mathrm{Adverb}}$), the presence (or absence) of the

verb be ($\beta_{\mathrm{Be}}$), and the overall annotation-complexity of the sentence ($\beta_{\mathrm{Complexity}}$), as well as

some random variation ($\varepsilon$). The overall model is statistically significant (p < 0.0001). Furthermore,

although the coverage is modest (R² = 0.15), the predictive capability (C = 0.79) is reasonably close to

the 0.80 threshold proposed by Baayen (2008:204). However, some problems with prediction and

coverage in this case are not surprising: the response data only contained 236 indefinite NPs out of a

total of 9203, which amounts to a weak signal compared to the overall size of the dataset. Thus,

keeping in mind that all models are imperfect simplifications anyway, it seems warranted to accept

the model as being reasonably useful for our purposes.
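
For readers unfamiliar with the technique, the general shape of such a model and of the C index can be sketched on synthetic data. This is not the rms model fitted for this study: the data are simulated, the "true" coefficients are hypothetical values chosen only to make the example run, and the response is made far more common than in the OE data so that the sketch is numerically stable. The model is fitted by Newton-Raphson in NumPy, and the C index is computed as the share of correctly ranked pairs of one positive and one negative observation.

```python
import numpy as np

def fit_logistic(X, y, iterations=25):
    """Fit a binary logistic regression by Newton-Raphson.

    X must already contain an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def concordance_index(scores, y):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Synthetic data (hypothetical): adverb category, presence of be, complexity.
rng = np.random.default_rng(0)
n = 4000
adverb = rng.integers(0, 3, size=n)  # 0 = other, 1 = here, 2 = there
be = rng.integers(0, 2, size=n)      # 1 = a form of be is present
complexity = rng.normal(size=n)      # standardized complexity index

# Dummy coding with "other" as the baseline category.
X = np.column_stack(
    [np.ones(n), adverb == 2, adverb == 1, be, complexity]
).astype(float)

# Hypothetical coefficients: intercept, there, here, be, complexity.
true_beta = np.array([-3.0, 2.0, 1.5, 1.5, -0.4])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, y)
c_index = concordance_index(X @ beta_hat, y)
print(np.round(beta_hat, 2), round(float(c_index), 2))
```

A C index of 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination, which is why values approaching 0.80 are read as acceptable predictive capability.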

Table 2 summarizes the model. In line with the initial hypothesis, we see that the category “other

locative adverbs” has a negative coefficient; hence it is not associated with indefinite NPs. Here and

there on the other hand both have positive coefficients, revealing a positive correlation with

indefinite NPs. The small standard errors show that the uncertainties inherent in the estimates are

smaller than the correlations themselves. Together with the significant p-values, this indicates that

we can be reasonably confident in the correlation and its size. The coefficient itself is a log odds ratio,

a measure that is not easy to interpret in an intuitive manner. Instead, Gelman and Hill (2007:82)

recommend dividing the coefficient by four as a convenient estimate of the maximum impact that a

variable may have as a percentage increase in the probability of the response. The last column of

table 2 shows the percentage increase in the probability of finding an indefinite NP when switching

from the baseline of “other locative adverbs” to either here or there. Although the absolute

estimated correlation rates between indefinite NPs and here or there are modest (respectively 1.8

and 3.6 per hundred occurrences), the relative impact that here or there has on the probability of

finding such an NP is substantial, and warrants the conclusion that a real, non-trivial correlation

exists between here / there and indefinite NPs in the OE data.
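
The divide-by-four figures can be reproduced from the coefficients in table 2 with a few lines of arithmetic. The sketch below is illustrative, and the function names are assumptions. It also shows where the rule comes from: the slope of the inverse logit curve is p(1 - p), which is largest, namely 0.25, where the probability is 0.5, so a coefficient divided by four is an upper bound on the change in probability per unit change in the predictor.

```python
import math

def inverse_logit(x):
    """Map a value on the log odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def divide_by_four(coefficient):
    """Upper bound on the change in probability per unit change in a predictor."""
    return coefficient / 4.0

# Coefficients (log odds ratios) as reported in table 2.
coefficients = {"there": 2.49, "here": 1.77, "be": 1.82, "complexity": -0.37}
max_change = {name: divide_by_four(b) for name, b in coefficients.items()}

# The slope of the inverse logit at probability p is p * (1 - p).
p_mid = inverse_logit(0.0)
steepest_slope = p_mid * (1.0 - p_mid)

for name, change in max_change.items():
    print(f"{name}: {change:+.4f}")
print("steepest slope:", steepest_slope)
```

The values 0.6225, 0.4425, 0.455 and -0.0925 correspond to the rounded percentages in the last column of table 2.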

Table 2: Summary of a binary logistic regression model of the OE data showing that compared to all other locative adverbs, here and there are positively correlated with indefinite NPs in the second context-slot, and both are associated with a considerable increase in the probability of finding an indefinite NP. Conversely, increasing the structural complexity of the sentence decreases the probability of finding an indefinite NP. The model was tested for interactions between be and adverbs, but the interaction was not significant and hence removed from the model. All predictors are statistically significant at least at the 0.01 level.

Predictor          Coefficient   Standard error   P-value    ± Pr(IndefNP)
Mean: other advs   -6.19         0.58             <0.0001    (baseline)
There               2.49         0.58             <0.0001    + 62%
Here                1.77         0.60             0.0032     + 44%
Context 1 = be      1.82         0.14             <0.0001    + 46%
Complexity         -0.37         0.07             <0.0001    - 9%

Figure 8: Plot visualizing the logistic regression model. Here, there, and be all have similar, positive associations with indefinite NPs. Other locative adverbs are highly negatively correlated with indefinite NPs. The complexity index is also negatively correlated with indefinite NPs, but the effect is much smaller. For each variable, the coefficient in table 2 is

represented as a dot. The lines extending from the dot represent confidence intervals, i.e. the maximum uncertainty associated with the coefficient. The dotted vertical line represents no correlation.

In summary, a positive correlation exists not only between there and be, but also between there and

indefinite NPs in the second n-gram slot, as well as between be in the first n-gram slot and indefinite

NPs in the second slot. In effect, it is possible to make reasonable, quantitative predictions about the

presence of one element of the construction in question based on the presence of other elements.

Such a correlation provides a strong indication of the existence of an EC in OE which involves there. It

is worth noting that the effect size of here is also substantial, a fact that will be further discussed

below. Finally, the structural, syntactic complexity of the sentence, as captured by the index

summarizing the corpus annotation, clearly plays a role. It is hardly surprising to discover an

interaction between syntax and semantics in the EC; however, a close scrutiny of the diachronic

syntax of English ECs falls outside the scope of the present study and will be dealt with in subsequent

work.

5. Discussion

The results from the first investigation, discussed in section 4.1 above, testify to a gradually vanishing

opposition between here and there, in their most salient uses. Concurrently with this, here becomes

gradually more similar to other locative adverbs in its most salient use. Of course, this is in some

respect a matter of perspective, but the PCA lets us unpick the complexities, giving a

more fine-grained and nuanced analysis than if the relationships were to be stated in simple

categorical terms. The association between there and be was substantiated by the second investigation, which attested the close

association with another crucial element of an EC, namely indefinite NPs.

5.1 The transition of there

Starting from the first bi-plot (figure 2), its most striking feature is that most of the locative adverbs

in OE are in free variation with their context (that is, the contexts that allow locative adverbs). These

are the ones lumped together in the center of the plot. Compared to this, here and there are highly

associated with the same contexts. If we next turn to the vertical axis, it is clear that although

there are similarities between here and there, we can also observe differences. From the perspective

of PC 2, there are clear differences between here and there. Note that no similar meaningful

distinctions can be found among the other locative adverbs: the two first dimensions explain so

much of the variation that any remaining differences are likely to be random noise.

In ME (figure 6), we see a broadly similar pattern, but from the perspective of PC 1, here and there

are less similar in the sense that they share contexts to a lesser degree. This is evident from the fact

that here has moved closer to the undifferentiated mass of adverbs in the center of the plot, whereas

there remains far out to one side. At the same time, crucially, the distinction between here and there

in PC 2 is disappearing: the vertical axis now only accounts for 4.2% of the variation, marginally below

the 5% rule of thumb lower threshold for taking it into consideration in the first place. A reasonable

linguistic interpretation is that the deictic opposition is no longer the primary defining difference

between here and there in terms of usage.

Turning next to EME (figure 7), we see broadly the same pattern as in ME but with nuances: the

contexts defining there are more narrowly circumscribed, and PC 2 now accounts for such a small

proportion of the variation that it would be irresponsible to attach any linguistic significance to it.

Thus, as far as usage patterns are concerned, here is now indistinguishable from the great mass of

locative adverbs – the fact that the word is clearly visible on the bi-plot means nothing as long as the

axis representing that difference has too little explanatory value. Although there in EME still

functions as a locative adverb, this is no longer a salient feature to our model: the most prominent

usage feature of there is that it is used differently from all the other locative adverbs in a consistent

and decisive manner.

The chronology outlined above is broadly similar to that outlined in e.g. Breivik and Swan (2000). In

OE, the distinction between existential and locative uses of there is less clear than in PdE. By ME, the

use of there as a subject in the EC is established, and by EME it is conventionalized to such a degree

that the locative uses of there can no longer influence the model in any discernible way. As such, the

bi-plot analyses are useful and interesting, but it is only when they are interpreted in the context of

distributional semantics that they can shed new light on the semantics of there.

5.2 A distributional semantics interpretation

A central premise in this context is that a sign, in the linguistic sense, acquires its meaning through

contrasts with other signs (Saussure 1983). Based on this, the following proposal is offered: the

crucial test is not whether there contrasts with its absence in the EC, as argued by e.g. Bolinger

(1977) and Breivik (1990). Rather, the fundamental distributional property governing the semantics of

there is its systematic contrastiveness with all other (relevant) words in all (relevant) syntactic

contexts. This is essentially the position discussed in Cruse (2010:217–218) under the label

“combinatorial normality”. The position entails that word meaning can be captured in two

dimensions: all possible (well-formed) syntactic contexts it may appear in, coupled with all the

possible substitutes for that word. Cruse expresses skepticism toward the usefulness of corpora

(Cruse 2010:216–217), and his criticism of collocational profiles is warranted to the extent that

corpus semantics restricts itself to looking at raw frequencies of co-occurrence between a word A

and a word B. However, as this study has aimed to demonstrate, there is no reason whatsoever for

corpus-based semantics to adopt such a simplistic approach. By creating matrices of co-occurrence

frequencies of all relevant words (in this case all locative adverbs and existential there) and all their

contexts, the resulting semantic maps represented by the bi-plots go a long way toward mirroring a

usage-based, linguistic (sub-) system. The primary point of difference from the position discussed by

Cruse (2010) is the use of corpus frequencies rather than native speaker judgments.

Based on these frequencies, the PCA algorithm has produced what arguably constitutes the most

salient adverb – context associations for each adverb and each context, while simultaneously

creating the best possible representation of the system as a whole. The bi-plots show a marked

decline (and eventually the absence) of the deictic opposition between here and there (i.e. a loss of

contrast, or meaning, as far as usage goes), but we also see new contrasts emerging, specifically

between the combination there + be and the rest of the linguistic (sub-) system. In other words, the

loss of contrast (i.e. meaning) is followed by the gradual acquisition of contrast, that is, the lexical

item there + be is arguably gaining meaning. Hence, from this perspective it does not make sense to

claim that existential there is semantically empty: the usage data clearly indicate a contrastive

situation which again implies a difference in meaning, also at the constructional level (Goldberg

1995:67–69).

5.3 The meaning of existential there

The crucial question, then, is whether this newly acquired meaning is somehow related to the

original, locative one, as suggested by the mental space analyses in Lakoff (1987) and Breivik (1997).

The proposition that existential there in earlier English referred to a mental space via a metaphorical

extension is virtually impossible to test quantitatively. It may well be a valid observation; however, it

is commonly assumed that conceptual metaphors exist to furnish target domains (in this case

“existence”) with the structure necessary for everyday functional communication (Cruse 2010:250).

Yet in early English it was possible to form an EC without there (Williams 2000). This implies

that we would have to assume two syntactically distinct ECs, one with and one without there.

Furthermore, following the principle of no synonymy, we would have to assume that the two ECs

were either semantically or pragmatically distinct (Goldberg 1995:67), and that one of the two ECs

was felt to be lacking in some respect, requiring some sort of locative elaboration or substantiation.

The present study takes a rather different view, grounded in the principles of distributional

semantics. The position entails that the final analysis should always place great emphasis on the most

salient patterns and associations in the data, i.e. the attested usage. It follows from the bi-plots that

both uses of there are equally meaningful (in terms of distinctions), in a related way. The basic,

locative use has a fairly “grammatical” meaning, since its function is to designate a distal location

distinct from its deictic counterpart here. As the analysis of the bi-plot for the OE data attests, this

is clearly a highly salient meaning of the sign there. The existential use, on the other hand, acts as

what Breivik calls a “presentative signal” (Breivik 1990:150–156) and serves pragmatically to bring

new information to the hearer’s attention in discourse. As such, both uses of there are related to the

psychological concept of selection, i.e. the ability to focus on what is relevant and take focus away

from what is irrelevant (Croft and Cruse 2004:47), and previous scholars have also pointed to the

similarity between deixis and existential statements (Lyons 1967:391). In the locative use, there

serves to selectively focus on a location that is speaker-distal; whereas in the existential use, the

word is used in a construction which selectively focuses on information that is hearer-distal, i.e. new

(Birner and Ward 1993).

The bi-plots highlighted a gradual shift in the (locative) adverb-system from OE to EME. While the

deictic adverbs here and there could point to locations in OE, figure 2 showed that they nevertheless

both stood out in usage from the great mass of locative adverbs. However, the subsequent

development attested in figures 6 and 7 points to a situation in which the locative meaning of here

becomes more prominent, while the locative meaning of there becomes less prominent, giving way

to its existential meaning. Thus, both uses of there have a meaning that can more easily be

described in grammatical rather than traditional lexical semantic terms. This grammatical meaning is

in both cases connected with clear, somewhat abstract concepts, and the combinatorial distinctions

present in the relevant linguistic sub-system attest to differences in meaning arising out of contrasts

with other linguistic signs. Far from being an empty subject, the existential use of there arguably had

a meaning in early English. That a pragmatic, presentative-signal meaning is associated with there as

argued by Breivik (1990) seems relatively uncontroversial. However, according to distributional

semantics, the question of whether there has any meaning in addition to discourse pragmatics (and

what that meaning is) can only be settled by distributional evidence.
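
As a purely illustrative aside, the notion of an adverb "standing out" in a bi-plot can be made concrete with the chi-square distance between row profiles, which is the distance that correspondence analysis displays. The sketch below is a minimal example under stated assumptions: the counts, the three adverbs chosen, and the three unnamed context columns are all invented for illustration and are not taken from the corpora or the analyses in section 4.

```python
from math import sqrt

# Invented toy counts (NOT from the corpora used in this study):
# rows = adverbs, columns = three hypothetical context types.
counts = {
    "here":  [30, 10, 5],
    "there": [40, 8, 4],
    "away":  [5, 25, 30],
}

def chi_square_distances(table):
    """Return chi-square distances between the row profiles of a count table."""
    rows = list(table)
    n_cols = len(next(iter(table.values())))
    grand = sum(sum(v) for v in table.values())
    # Column masses: share of all observations falling in each column.
    col_mass = [sum(table[r][j] for r in rows) / grand for j in range(n_cols)]
    # Row profiles: each row divided by its own total.
    profiles = {r: [x / sum(table[r]) for x in table[r]] for r in rows}
    dist = {}
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            # Squared profile differences, weighted by inverse column mass.
            d2 = sum((profiles[a][j] - profiles[b][j]) ** 2 / col_mass[j]
                     for j in range(n_cols))
            dist[(a, b)] = sqrt(d2)
    return dist

distances = chi_square_distances(counts)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

With these invented counts, here and there end up close to each other and far from away, which is the kind of pattern a bi-plot would render as two neighbouring points set apart from the remaining adverbs.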

The second investigation, in section 4.2, highlighted an interesting distributional fact: both here and

there share an association with the EC (cf. also figure 2 which shows the association between here

and be). This fact indicates that the “background location”, or mental space, meaning of there cannot

have been a primary factor behind its involvement in the EC. If the “background location” meaning

were an important motivating factor, we would simply not expect to find such a reliable association

between the EC and here, with its meaning of proximal location. Since both here and there appear to

have a stake in the EC in OE, I submit that the primary semantic factor responsible for this is what

they share: a deictic, grammatical meaning, rather than reference to some location. The primary

function of deixis is to locate, or select, a referent using the current speech act as a reference point

(Cruse 2010:401). This closely corresponds to the EC, which functions to highlight a referent (the new

information represented by the indefinite NP) with reference to the current speech act. Thus, here

and there share an important semantic property with the EC; however, the property is not a locative

one, but deixis or the psychological act of selection itself (Croft and Cruse 2004:47).
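
To make the notion of a "reliable association" between an adverb and the EC more tangible, the following sketch computes Pearson residuals for a small contingency table. It is only one common way of quantifying association and is not claimed to be the procedure used in section 4.2. The counts and the two-way split into "EC" versus "other" contexts are invented assumptions introduced solely for this example.

```python
from math import sqrt

# Invented toy counts (NOT from the corpora used in this study):
# how often each adverb occurs inside vs. outside a hypothetical EC context.
observed = {
    "here":  {"EC": 20, "other": 80},
    "there": {"EC": 60, "other": 140},
    "away":  {"EC": 5,  "other": 195},
}

def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(row_tot.values())
    res = {}
    for r in rows:
        for c in cols:
            # Expected count if adverb and context were independent.
            expected = row_tot[r] * col_tot[c] / grand
            res[(r, c)] = (table[r][c] - expected) / sqrt(expected)
    return res

residuals = pearson_residuals(observed)
# The sum of squared residuals is the Pearson chi-square statistic.
chi_square = sum(v ** 2 for v in residuals.values())

for cell, value in residuals.items():
    print(cell, round(value, 2))
print("chi-square:", round(chi_square, 2))
```

A positive residual in the "EC" column indicates that an adverb occurs in that context more often than independence would predict. In this invented table both here and there show positive residuals while away shows a negative one, mirroring the shape of the argument made above.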

If this analysis is correct, the grammaticalization process of there appears to closely follow the

general account proposed by Croft (2001:260–268). In the first stage, the word there shares

important semantic properties with the EC, i.e. the word and construction are “profile equivalent” to

use his terminology (Croft 2001:257). A large semantic overlap implies that only a subtle semantic

shift is required to switch attention away from the individual word over to the construction as a

whole. Diachronically, this shift could have been brought about by the increasingly pragmatic

function of the EC, as it gradually became more entrenched as a means to introduce new information

into a discourse, as argued by Breivik (1990). Since the existential use of there was bound to the EC, it

would not be surprising if the semantic profile of the EC itself were to take on greater salience than

there, ultimately leaving there as a grammatical element whose meaning has been left somewhat

redundant or usurped by the EC. This suggestion receives indirect support from experimental studies

showing that constructions can supply meaning to the words that comprise them (Kako and Wagner

2001; Kako 2006).

An important corollary of the analysis above is that the grammaticalization of there is described as a

matter of degree rather than kind. The focus is not that there evolves from being referential/lexical

to being “empty”, but that a grammatical meaning (deixis) in a specific context gradually fused with

the meaning of its host construction (EC), and that the semantic similarities of there and the EC

probably facilitated this process.

This analysis has a number of advantages. First, it provides a principled explanation for why some

varieties of Swedish and Norwegian can use here as an existential subject (Falk 1993:273;

Jenset 2010:19), a possibility which also existed in ME (Jenset 2010:213–215). Focusing on the

mental-space reference of existential there in English leaves this fact unexplained, whereas under the

present account the choice of here rather than there is secondary since both are deictic. Second, it

provides a more precise characterization of the diachronic semantics of there. Rather than going

from locative to empty, or from referential to grammatical, the meaning of there appears to have

expanded, adding one grammatical meaning (existential marker) to another, existing grammatical

meaning (deixis). Third, it provides a comprehensive, empirically transparent characterization of the

relevant semantic properties involved in the inclusion of there into the EC based on corpus data,

rather than on intuitions about a language for which native speakers can no longer be found. In this

third capacity, the analysis highlights the potential that distributional methods represent for positing

new semantic explanations in historical linguistics.

6. Conclusion

To conclude, the existential use of there was neither empty nor locative, but rather exhibited a

special form of deixis that was eminently compatible with the semantics of the EC itself. The

evidence for this conclusion is twofold. First, the bi-plots in section 4.1 attest to a gradually

weakening distinction between here and there, as the latter becomes increasingly

entrenched as a component of the EC. However, following the principle that differences in form

imply differences in meaning, a new meaning can be observed arising from the growing distinction

between there (in the context of the EC) and all other locative adverbs. Second, the fundamental

importance of deixis can be seen in the statistical analyses of the OE data, where section 4.2

demonstrated that both here and there are significantly correlated with the EC. Such a correlation

would not be expected if the most relevant semantic property of the existential subject were to

denote a distal location acting as a (mental) background. Thus, while the discourse pragmatic

meaning of existential there proposed in previous studies seems natural, the present investigation

argues against the proposed locative meaning of existential there in earlier English. In this, the study

has shown the potential of corpus-driven, distributional semantics in historical linguistics. By

combining annotated historical corpora and the judicious use of appropriate statistical techniques,

historical linguists can not only cover more data in their analyses, but also expect to answer more

questions and do so in a more empirically transparent way.

References

Abbott, Barbara. 1993. A pragmatic account of the definiteness effect in existential sentences. Journal of Pragmatics 19.39–55. doi:10.1016/0378-2166(93)90069-2.

Baayen, R. Harald. 1994. Derivational productivity and text typology. Journal of Quantitative Linguistics 1.16–34.

--- 2008. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.

Barðdal, Jóhanna, Thomas Smitherman, Valgerður Bjarnadóttir, Serena Danesi, Gard B Jenset, and Barbara McGillivray. 2012. Reconstructing Constructional Semantics: The Dative Subject Construction in Old Norse-Icelandic, Latin, Ancient Greek, Old Russian and Old Lithuanian. Studies in Language 36.511–547.

Birner, Betty, and Gregory Ward. 1993. There-sentences and inversion as distinct constructions: A functional account. Proceedings of the nineteenth annual meeting of the Berkeley Linguistics Society, 19: http://elanguage.net/journals/bls/article/download/2915/2853.

Bolinger, Dwight. 1977. Meaning and form. London: Longman.

Breivik, Leiv Egil. 1981. On the interpretation of existential there. Language 57.1–25.

--- 1990. Existential there: a synchronic and diachronic study. 2nd ed. Oslo: Novus.

--- 1997. There in space and time. Language in time and space: studies in honour of Wolfgang Viereck on the occasion of his 60th birthday, ed. by Heinrich Ramisch and Kenneth Wynne, 32–45. Stuttgart: Franz Steiner Verlag.

--- 2003. On Relative Clauses and Locative Expressions in English Existential Sentences. Pragmatics 13. http://elanguage.net/journals/pragmatics/article/view/380.

Breivik, Leiv Egil, and Toril Swan. 2000. The desemanticisation of existential there in a synchronic-diachronic perspective. Words: Structure, meaning, function–A Festschrift for Dieter Kastovsky, ed. by Christiane Dalton-Puffer and Nikolaus Ritt, 19–34. Berlin: Mouton de Gruyter.

Coopmans, Peter. 1989. Where stylistic and syntactic processes meet: Locative inversion in English. Language 65.728–751.

Croft, William. 2001. Radical Construction Grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press.

Croft, William, and D. Alan Cruse. 2004. Cognitive linguistics. Cambridge: Cambridge University Press.

Croft, William, and Keith T. Poole. 2008. Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics 34.1–37.

Cruse, D. Alan. 2010. Meaning in Language: An Introduction to Semantics and Pragmatics. 3rd ed. Oxford: Oxford University Press.

Cuyckens, Hubert, Dominiek Sandra, and Sally Rice. 1997. Towards an empirical lexical semantics. Human contact through language and linguistics, ed. by Birgit Smieja, 35–54. Wiesbaden: Peter Lang.

Davies, Mark. 2008. The Corpus of Contemporary American English (COCA): 410+ million words, 1990-present. Brigham Young University. http://www.americancorpus.org/.

Everitt, Brian S, and Torsten Hothorn. 2006. A handbook of statistical analyses using R. Boca Raton, Fl.: Chapman & Hall/CRC.

Falk, Cecilia. 1993. Non-referential subjects in the history of Swedish. Lund: Department of Scandinavian Languages, University of Lund.

Firth, J. R. 1957. Papers in linguistics 1934–1951. London: Oxford University Press.

Freeze, Ray. 1992. Existentials and Other Locatives. Language 68.553–595.

Gelman, Andrew, and Jennifer Hill. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

Goldberg, Adele E. 1995. Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Greenacre, Michael. 2007. Correspondence analysis in practice. 2nd ed. Boca Raton, FL.: Chapman & Hall/CRC.

Gries, Stefan Th. 2011. Commentary: corpus-based methods. Current methods in historical semantics, ed. by Kathryn Allan and Justyna Robinson, 184–195. Berlin: Mouton de Gruyter.

Gries, Stefan Th, and Dagmar S Divjak. 2010. Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal. Quantitative methods in cognitive semantics: corpus-driven approaches, ed. by Dylan Glynn and Kerstin Fischer, 333–354. Berlin: Mouton de Gruyter.

Harrell, Frank E. 2012. rms: Regression Modeling Strategies. http://CRAN.R-project.org/package=rms.

Ingham, Richard. 2001. The structure and function of expletive there in pre-modern English. Reading working papers in linguistics 5.231–249.

Iyeiri, Yoko, Michiko Yaguchi, and Yasumasa Baba. 2011. Principal component analysis of turn-initial words in spoken interactions. Literary and Linguistic Computing 26.139–152.

Jenset, Gard B. 2010. A Corpus-based Study on the Evolution of There: Statistical Analysis and Cognitive Interpretation. Bergen: University of Bergen (phd thesis). http://hdl.handle.net/1956/4444.

Jenset, Gard B, and Barbara McGillivray. 2012. Multivariate analyses of affix productivity in translated English. Quantitative Methods in Corpus-Based Translation Studies, ed. by Michael P Oakes and Meng Ji, 301–323. Amsterdam: John Benjamins Publishing Company.

Kako, Edward. 2006. The semantics of syntactic frames. Language and cognitive processes 21.562–575.

Kako, Edward, and Laura Wagner. 2001. The semantics of syntactic structures. Trends in Cognitive Sciences 5.102–108.

Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.

Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English. 2nd ed.

Kroeber, A. L, and C. D Chrétien. 1937. Quantitative Classification of Indo-European Languages. Language 13.83–103.

Lakoff, George. 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.

Langacker, Ronald W. 1988. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind by George Lakoff - Review by Ronald W. Langacker. Language 64.384–395.

Lenci, Alessandro. 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics 20.1–31.

Lynn, Henry S., and Charles E. McCulloch. 2000. Using Principal Component Analysis and Correspondence Analysis for Estimation in Latent Variable Models. Journal of the American Statistical Association 95.561–572.

Lyons, John. 1967. A Note on Possessive, Existential and Locative Sentences. Foundations of Language 3.390–396.

Manning, Christopher, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA.: MIT Press.

Meehl, Paul E. 1990. Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles that Warrant It. Psychological Inquiry 1.108–141. doi:10.1207/s15327965pli0102_1.

Nenadić, Oleg, and Michael Greenacre. 2007. Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package. Journal of Statistical Software 20.1–13.

Newmeyer, Frederick. 1998. Language form and language function. Cambridge, MA.: MIT Press.

Pérez-Guerra, Javier. 1999. Historical English syntax: A statistical corpus-based study on the organisation of Early Modern English sentences. Lincom studies in Germanic linguistics 11. München: LINCOM.

R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna. http://www.r-project.org.

Redington, Martin, Nick Chater, and Steven Finch. 1998. Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science 22.425–469.

Rencher, Alvin C. 2002. Methods of Multivariate Analysis. 2nd ed. New York: Wiley-Interscience.

Sandra, Dominiek, and Sally Rice. 1995. Network analyses of prepositional meaning: Mirroring whose mind—the linguist’s or the language user’s? Cognitive Linguistics 6.89–130. doi:10.1515/cogl.1995.6.1.89.

Saussure, Ferdinand de. 1983. Course in general linguistics. London: Duckworth.

Schütze, Hinrich. 1998. Automatic word sense discrimination. Computational Linguistics 24.97–123.

Stefanowitsch, Anatol, and Stefan Th Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International journal of corpus linguistics 8.209–243.

Taylor, Ann, Anthony Warner, Susan Pintzuk, and Frank Beths. 2003. The York-Toronto-Helsinki Parsed Corpus of Old English Prose.

Traugott, Elizabeth Closs. 1992. Syntax. The Cambridge History of the English Language, ed. by Richard Hogg, I: Old English:168–289. Cambridge: Cambridge University Press.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. 4th ed. New York: Springer.

Williams, Alexander. 2000. Null subjects in Middle English existentials. Diachronic syntax: Models and mechanisms, ed. by S. Pintzuk, G. Tsoulas, and A. Warner, 285–310. Oxford: Oxford University Press.
