Mapping meaning with distributional methods:
a diachronic corpus-based study of existential there
Gard B. Jenset
Abstract
The semantics of existential there is discussed in a diachronic, corpus-based perspective. While
previous studies of there have been qualitative or relied on interpreting relative frequencies directly,
the present study combines sophisticated statistical techniques with linguistic theory by means of
distributional semantics (Lenci 2008). It is argued that there in earlier stages of English is not
semantically empty, and that its original meaning is deictic rather than locative. This analysis
combines key insights from previous studies of existential there with a cognitive construction
grammar perspective, and discusses some methodological concerns regarding statistical methods for
creating computational semantic maps from diachronic corpus data.
Keywords: construction grammar, distributional semantics, existential there, semantic maps
1. Introduction
The semantic status of the so-called existential there in English is complicated by two facts. Not only
is the word’s precise meaning disputed, but the more fundamental question of whether or not it has any
meaning at all remains to be conclusively settled. The question of the semantics of there is made
more pertinent by the differences in theoretical assumptions underlying the various analyses. A
theoretical stance grounded in the importance of an autonomous syntactic domain would
presumably be more comfortable with postulating a use of the word there with no meaning,
compared to one based on cognitive linguistics where semantics has a more prominent explanatory
function (Gries and Divjak 2010:333).
Diachronically, existential there (1) has evolved from the locative adverb there (2), and already in Old
English (OE) existential uses are attested (Traugott 1992:218).
(1) There was a big squeeze on companies. (existential)
(2) I immediately knew why I was there. (locative)
However, unlike the Present-day English (PdE) examples in (1-2), both of which are from COCA
(Davies 2008), the distinction between locative and existential uses of there in OE can be quite
difficult to draw. In other words, added to the two initial problems facing PdE, the historical linguist
faces a third problem: how to adequately distinguish the two uses. Since historical linguistics is
barred from access to native speaker judgments, corpora are the inevitable and natural source of
information available. Moreover, a move towards more rigorous empirical methods employing
statistical techniques makes it more feasible to treat the complex, multidimensional nature of the
meaning of there.
Against this backdrop, the present study attempts to demonstrate the benefits of a quantitative,
corpus-driven approach to the question of the meaning of there in historical English. It is furthermore
argued that this approach is eminently compatible with the theoretical framework of cognitive
linguistics, specifically variants of construction grammar (CxG) as advocated by e.g. Goldberg (1995)
and Croft (2001). That is, this enquiry attempts to quantitatively identify key stages in the semantic
evolution of the word there, by employing statistical methods to single out its most salient
grammatical contexts in OE, Middle English (ME), and Early Modern English (EME). Essentially, the
study responds to the call for more data-driven, or bottom-up, multivariate approaches in historical
semantics (Gries 2011), by seeking to establish (a) whether there in early English has a meaning at all,
and (b) whether such meaning(s) can be distinguished and characterized in a manner consistent with
the corpus data. Further, the study discusses the relative merits of different multivariate statistical
methods in the context of historical corpus linguistics, a context which may differ considerably from
that of contemporary corpus linguistics with accompanying methodological consequences. The
findings of this research are expected to shed new light on the semantics of there, as well as to
demonstrate the usefulness of distributional methods in historical semantics.
The article has the following structure: first, the literature on existential there is reviewed. Next, the
research methods and data used in the study are discussed. This is followed by a discussion of the
results of the enquiry. Finally, the article presents the implications, as well as limitations, of the
study.
2. Background and conceptual framework
The semantic status of existential there in English has received considerable attention in previous
studies. However, for methodological reasons a new perspective on this word is warranted. First,
previous studies are mostly limited to qualitative methods, an approach that is particularly
problematic in the context of historical studies of there. Second, to the extent that quantitative data
are used at all, the studies are limited to using raw frequencies, percentages, or simple statistical
tests. Combined, these shortcomings conspire to cloud the issue of the semantic status of existential
there, especially from a diachronic viewpoint. In the following paragraphs, the previous research on
the topic is discussed with the aim of highlighting the advantages of a new methodological approach.
2.1 To mean or not to mean
Two broad, competing perspectives offer different characterizations of the semantic status of
existential there. The first suggests that existential there is (virtually) semantically empty, and
primarily present in the sentence for syntactic and/or pragmatic reasons, as articulated by Abbott,
who does “not think that the word there itself in existential sentences means anything” (Abbott
1993:41). The competing view is that existential there is locative, either directly or in a broader or
metaphorical sense, whereby it designates a metaphorical or mental space (Breivik 2003:219). The
argument that there has a broad, or metaphorical, locative meaning can be found in Bolinger (1977),
Lakoff (1987), Breivik (1997), and Breivik (2003), whereas Coopmans (1989), Freeze (1992), and to
some extent Lyons (1967), take the view that there is explicitly locative.
The view that there is semantically empty is reflected in terms such as “dummy subject”, applied to
grammatical subjects like existential there that supposedly do not refer to anything (Cruse 2010:390).
It seems reasonable to assume that such a view is at least partially based on the methodological bias
in linguistics towards introspection and native speaker judgments. Based on the existing research,
such judgments do not seem to converge in the case of existential there, with its multifaceted
grammatical affiliations.
In the context of historical studies, this position of there as a dummy subject has been defended (to
various degrees) in Breivik (1981), Breivik (1990), Pérez-Guerra (1999), and Ingham (2001). Being
diachronic studies, they are corpus based and both Breivik (1981, 1990) and Pérez-Guerra (1999)
make use of graphs and statistics in their argumentation. However, since these studies were
published the availability of both annotated corpora and free, user-friendly statistics software has
improved considerably. In light of those methodological advances it is worth asking whether the
putative meaninglessness of there still holds.
Such a bipartite division, meaning or no meaning, is of course a simplification, as illustrated by Breivik
and Swan’s statement that although “the original semantics of [existential there] has been lost, this
loss has been balanced out by the development of a more abstract meaning” (Breivik and Swan
2000:28). In other words, some meaning (ostensibly of a more concrete kind) has been lost, whereas
another, more abstract yet related to the original one, has been gained. Nevertheless, drawing a line
between the two perspectives is an aid to gaining an overview of previous research on the semantics
of there.
2.2 The “locative existential there” hypothesis
As Breivik notes, the semantics of existential there, even in Present-day English (PdE), is a
controversial and thorny topic (Breivik 2003:219). When considering there in earlier English it
becomes even more difficult to tease out the semantics in question, even when taking context and
commutability into consideration, as the discussion in Breivik and Swan (2000:21–22) attests.
Broadly speaking, studies that posit a locative meaning for existential there employ many of the
same methods as the ones stressing the word’s lack of reference. An exception is Breivik (1997)
which draws on a large collection of data but does not employ statistical tests. What is lacking is a
method which is both more empirically transparent, in that it clearly spells out the relation between
hypotheses and data, and which at the same time is comprehensive in its coverage of the data.
The acuteness of the methodological problem is illustrated by returning briefly to there in PdE, which
should be more semantically accessible than its diachronic counterpart. There is a tendency in the
literature to stress the naturalness or intuitiveness of a locative interpretation of existential there, as
attested by Lyons who states that “it might appear reasonable to say that all existential sentences are
at least implicitly locative” (Lyons 1967:390). While this intuition may not be incorrect, it does not
provide a firm methodological foundation for investigating the semantics of there. Similarly, Lakoff
(1987), in what is arguably one of the most elaborate and comprehensive treatments of the
semantics of there, argues that existential there has a metaphorically locative meaning referring to a
mental space, based on intuition and introspection. Its elegance notwithstanding, the analysis lacks
empirical transparency and is difficult to objectively reproduce.
Lakoff (1987) posits that both existential and locative there belong in sets of constructions, radially
structured around a prototypical center, with the existential construction being a metaphorical
extension of the locative one. However, the introspective approach forming the basis of such an
analysis has been sharply (and rightly) criticized for being ad-hoc and “rather weak from a
methodological point of view” (Sandra and Rice 1995:123). A case in point is provided by Newmeyer
(1998:209–223), who considers a subset of the constructions posited by Lakoff and, based on the
same data, argues that no such radial structure exists. The point of the criticism is not that a radially
structured category of there-constructions does not exist; it might well do. The problem with such an
analysis is that it is highly dependent on the researcher’s introspections. This, coupled with the lack
of native speaker intuitions, makes it a particularly poor methodology for diachronic semantics,
especially when the semantics in question is heavily intertwined with grammar.
2.3 Grammatical meaning and constructions
The locative meaning of there, i.e. its ability to refer to a concrete location, comes about through
deixis which can be defined as locating a referent by using a speech act or one of its participants as
reference point (Cruse 2010:401). It is clear that the existential use of there can only be
differentiated in writing from the locative use based on context (which licenses such differences as
vowel reduction and whether or not the expression can be accompanied by a pointing gesture): the
word forms are identical.
This raises the question of what precisely in the context distinguishes the two uses of there. Position in
the sentence is not enough in itself, since the deictic there may appear in initial position. Nor is the
difference only a result of different verb senses. Instead, the necessary defining context seems to be
supplied by the interplay of grammatical, semantic, and pragmatic factors. This suggests that the
proper context for evaluating the differences between the two uses of there is not at the word level,
but at the level of the construction.
The present study takes its notion of construction from construction grammars (CxG) such as
Goldberg (1995) and Croft (2001). Constructions are assumed to be “form-meaning correspondences
that exist independently of particular verbs [and which] themselves carry meaning, independently of
the words in the sentence” (Goldberg 1995:1). The construction-grammar view implies that
arguments are “licensed not directly as arguments of the verbs, but by the particular constructions”
(Goldberg 1995:10). It follows from this that “semantics” has a wider scope than merely “lexical
semantics” (Goldberg 1995:14). The existential semantics, and the function of the existential
sentence, are in this view properly supplied by the existential construction (EC), not by the verb or
the verb in conjunction with other arguments.
The discussion of there in the context of an EC raises the question of whether there is such a thing as a
separate existential construction. Following Croft (2001), such constructions are taken to be language
specific, but since the present study is concerned with English only, “EC” is used as shorthand for the
more cumbersome “English Existential Construction”. Newmeyer (1998:221) implies that the notion
of construction itself is considerably less disputed (terminological differences aside) than the
existence of specific, named constructions. Hence, positing the (diachronic) existence of an EC
requires some substantiation. The strongest counter-evidence to the EC would come from
demonstration that its semantics is completely compositional. However, there are strong reasons to
favor a construction-based account.
If existential there is semantically empty then it could hardly contribute to the compositionality of
the semantics of the EC. This would mean that much of the semantic work would be left to the verb,
in most cases a form of be. However, it is not evident that the semantics of be is primarily existential.
It could be argued that be is locative or, as has been argued by Langacker (1988:393), neither. A final
argument comes from research on frame semantics and argument-structure constructions. If a
construction carries some kind of meaning, it can be expected to fill in missing meaning in verbs
occurring in it, even with semantically empty nonsense verbs as demonstrated experimentally by
Kako (2006). Although Kako does not deal with the EC, his results predict that native speakers of
English would assign a nonsense verb like logift a semantic property linked with existence and/or
occurrence were it to occur in the context of an EC, as in “there logift a mat on the floor”. As Kako
(2006:573) points out, this would be a property arising from the entire construction (or frame), not a
result of individual words in the construction.
2.4 A distributional approach to meaning
The alternative adopted by the present study is one based on corpus-driven, distributional semantics,
a one-level contextual approach (Cruse 2010:213–214). The notion that the meaning of a word can
be determined based on patterns of co-occurrence in a corpus was famously articulated by Firth
(1957). Distributional methods have been used to study verb semantics (Redington, Chater, and Finch
1998) and constructions (Stefanowitsch and Gries 2003), to mention two examples. Far from being a
“theory-free” approach to semantics, distributional semantics has been used to test linguistic
hypotheses, and there is some evidence supporting the view that semantic associations and textual
co-occurrences are related (Lenci 2008:17–18). Notably, distributional semantics is particularly well
suited to describing those aspects of meaning that interact with syntax, such as argument structure
(Lenci 2008:25).
The fundamental assumption in distributional semantics is that word meaning is considered to be
distributed, and lexical representations are assumed to be gradual, quantitative functions of their
global (corpus) distributions (Lenci 2008:12). Thus, conceptualizing meaning in terms of co-
occurrences has a number of advantages in the context of historical linguistics. First, meaning is seen
as gradual rather than categorical. Second, the influence of syntax can be naturally accounted for by
reference to context. Such a conceptualization of semantics owes much to structuralist thinking, and
might at first glance seem at odds with traditional work in cognitive linguistics, with its emphasis on
rich, elaborately connected meanings (Cuyckens, Sandra, and Rice 1997:36). However, the traditional
cognitive models are vague with respect to methodology (Sandra and Rice 1995; Cuyckens, Sandra,
and Rice 1997). Quantitative, corpus-based methods offer an alternative (and more objective) usage-
based approach to semantics, which is gradually (albeit slowly) gaining ground in cognitive linguistics
(Gries and Divjak 2010).
Combining distributional semantics with a CxG approach has the benefit that it provides specific
motivation for a corpus-based methodology. Since constructions are form-meaning pairings in their
own right, they are identifiable as such in a corpus. This in turn opens the way for a systematic and
empirically transparent treatment of the meaning of there based on co-occurrence patterns. Such
patterns will act as proxies for constructions in the study, since the corpora do not annotate
constructions explicitly. The notion that semantics can be reduced to corpus co-occurrence patterns
is not uncontroversial, especially if taken literally. However, far from replacing linguistic theorizing
about semantics, corpus co-occurrences, or distributional semantics, are tools to operationalize and
test theoretical notions about meaning (Lenci 2008; Gries 2011). The ability of corpus-driven
distributional methods to handle the complex interplay between words and their contexts is
particularly pertinent in a CxG framework where meaning is considered a function between words
and constructions.
3. Data and methods
Diachronic linguistics is almost inherently corpus-based, in one way or another. The present study
follows a quantitative, bottom-up (or data-driven) methodology, as advocated by Gries (2011).
Although independent from CxG, this distributional approach to semantics is highly compatible with
CxG through the latter’s growing emphasis on usage-based explanations, as argued by Lenci (2008).
Section 3.1 describes the data used in the study and discusses sample size, corpus annotation, and
data extraction. In section 3.2 the general framework of distributional semantics is discussed in the
context of computational semantic maps in linguistics. Finally, section 3.3 offers a comparison of
available techniques for creating such maps, with special reference to the challenges faced by
historical corpus linguistics.
3.1 Data
The data for the study were drawn from three diachronic, manually syntactically annotated corpora
of historical English: the York-Toronto-Helsinki Parsed Corpus of Old English or YCOE (Taylor et al.
2003), the Penn-Helsinki Parsed Corpus of Middle English or PPCME (Kroch and Taylor 2000), and the
Penn-Helsinki Parsed Corpus of Early Modern English or PPCEME (Kroch, Santorini, and Delfs 2004).
Together these corpora total approximately 4.5 million words drawn from historical English prose
between approximately AD 850 and 1700. All three corpora follow a similar, relatively flat phrase
structure annotation scheme, exemplified in figure 1 below.
Figure 1 Syntactic tree illustrating the phrase structure corpus annotation with an example sentence (or matrix clause, annotated as “IP-MAT”) from the proceedings of the trial of Titus Oates (1683) included in PPCEME.
From these three corpora, a total of 23761 sentences containing either at least one locative adverb
or at least one instance of existential there were extracted using CorpusSearch 2.0. Of these, 9203
sentences were extracted from YCOE, 5471 from PPCME, and 9087 from PPCEME. The extracted
sentences were further processed with bespoke programs written in the scripting language Perl, and
the resulting files were enriched with meta-information from the corpus documentation (e.g. date,
author – if known, and genre) as well as with further information extracted from each sentence. This
included the first verb of the sentence, the syntactic tag of that verb, the length of the sentence (in
phrase-structure nodes) and other features. For the present study, the most important information
extracted from the corpora was the grammatical context.
Unlike the two other corpora, YCOE’s annotation does not distinguish between locative and
existential uses of there. To overcome this obstacle, the study treated all instances of there as
unannotated, and instead relied on bottom-up approaches that attempt to infer properties of words from
their distributions (Schütze 1998). To achieve this, the data were extracted with an eye toward
providing rich contextual information. For each adverb or existential there a trigram was available:
the adverb itself, the element occurring in its immediate right-context (C1), as well as the element
occurring in the next slot (C2). This can be represented schematically as Adverb + Context 1 +
Context 2. A concrete example is provided below.
her + (BEPI is) + (ADVP-TMP (ADV^T nu …
The example, from Ælfric’s Homilies Supplemental (coaelhom,+AHom_1:84.56), represents the string
here is now, with grammatical annotation. As the example shows, the context information
provided by the data is relatively rich. For a full discussion of the extraction and enriching process,
including example Perl scripts, see Jenset (2010).
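The Adverb + Context 1 + Context 2 format can be sketched in a few lines of Python. This is only an illustration of the idea, not the Perl pipeline used in the study (for which see Jenset 2010). The function name extract_trigrams, the flat list of (tag, word) pairs used as input, the tag given to her, and the filler value NONE are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# Hypothetical input format: one sentence as a flat list of (tag, word) pairs.
# The parsed corpora store bracketed trees; flattening them is assumed done.
Token = Tuple[str, str]

def extract_trigrams(sentence: List[Token], targets: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (adverb, C1 tag, C2 tag) for every target adverb in the sentence.

    C1 is the tag of the element immediately to the right of the adverb and
    C2 the tag of the element after that. Missing slots are filled with 'NONE'.
    """
    trigrams = []
    for i, (_tag, word) in enumerate(sentence):
        if word.lower() in targets:
            c1 = sentence[i + 1][0] if i + 1 < len(sentence) else "NONE"
            c2 = sentence[i + 2][0] if i + 2 < len(sentence) else "NONE"
            trigrams.append((word.lower(), c1, c2))
    return trigrams

# Toy version of the string "her is nu" (here is now); tags are illustrative.
sentence = [("ADV", "her"), ("BEPI", "is"), ("ADVP-TMP", "nu")]
result = extract_trigrams(sentence, {"her"})
print(result)
```

Recording only the tags of the two right-hand slots keeps the word form of the adverb separate from its grammatical context, which is what the frequency matrices in the next section rely on.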
3.2 Semantic maps and distributional semantics
The corpus data amount to frequencies of co-occurrence between locative adverbs (as well as
existential uses of there) and syntactic contexts. On their own, such frequencies are uninformative,
since the systematic patterns in such large collections of numeric data cannot readily be grasped by
humans. What is required is a statistical method that is unbiased and can systematically reduce the
frequencies into salient, interpretable patterns. The data, exemplified in table 1, are multivariate,
that is, we have multiple categories of rows (words) and multiple categories of columns (syntactic
environments), with observed frequencies of co-occurrence in each cell. Such multivariate data lend
themselves particularly well to reduction and visualization techniques, such as PRINCIPAL
COMPONENT ANALYSIS (PCA), CORRESPONDENCE ANALYSIS (CA), and MULTIDIMENSIONAL SCALING
(MDS).
Table 1: Excerpt from the data set extracted from PPCEME illustrating the matrix format with rows corresponding to locative adverbs (as well as existential there) and columns corresponding to syntactic tags found in the immediate right context of the adverbs. Each cell corresponds to frequencies of co-occurrence. The syntactic categories are (from left to right): be past tense, be present tense, complementizer, conjunction, that-complementizer clause. The full data set contains 62 rows and 135 columns. Similar matrices were constructed for the data from YCOE and PPCME. Note the prolific zero-frequency cells, a feature shared by all of the corpora.
Adverb   BED  BEP  C  CONJ  CP-THT
aboute     0    1  0     0       0
above      0    3  0     2       0
abroade    0    0  0     0       0
afore      0    0  0     1       0
after      0    0  0     0       0
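A matrix of this kind can be built by counting (adverb, tag) pairs. The following Python sketch illustrates the data structure only and does not reproduce the scripts used for the study. The function name cooccurrence_matrix and the list of observations are assumptions; the observations are invented so that they match the nonzero cells shown for aboute, above, and afore in table 1.

```python
from collections import Counter

def cooccurrence_matrix(pairs):
    """Build an adverb-by-tag frequency matrix from (adverb, tag) pairs.

    Returns the row labels, the column labels, and the matrix as a list of
    rows. Combinations that were never observed receive a count of zero.
    """
    counts = Counter(pairs)
    rows = sorted({adverb for adverb, _ in pairs})
    cols = sorted({tag for _, tag in pairs})
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix

# Invented observations (one pair per attested right-context of an adverb).
pairs = [
    ("aboute", "BEP"),
    ("above", "BEP"), ("above", "BEP"), ("above", "BEP"),
    ("above", "CONJ"), ("above", "CONJ"),
    ("afore", "CONJ"),
]
rows, cols, matrix = cooccurrence_matrix(pairs)
for label, row in zip(rows, matrix):
    print(label, row)
```

Note that in this sketch an adverb with no observations at all (such as abroade in the excerpt) would not receive a row unless the list of row labels were supplied separately.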
Common to multivariate techniques is the assumption that the full dataset, as illustrated in table 1,
can be considered a multidimensional space with columns corresponding to dimensions, and with
cells (i.e. cell values, in this case frequencies) providing coordinates for the row variables in this
multidimensional space. Such an approach allows the use of well-known mathematical techniques to
reduce the multivariate space into a smaller sub-space, which can be visualized as a two-dimensional
plot or map. A good visualization is one that captures as much as possible of the variation (i.e. the
patterns of co-occurrence) in the data in only two dimensions, expressed as a percentage of
explained variation. If the variation explained by the first two principal components is high we have
greater reason to trust the patterns or associations manifesting themselves in the plot. Conversely if
the explained variation is low, the degree of association in the data is low and patterns seen in the
plot are likely to be nothing more than distortions resulting from the visualization technique. For a
more technical discussion of such techniques, see Baayen (2008:127–148) or Rencher (2002). The
versatility of multivariate techniques is attested by the broad range of applications they have found
within linguistics, including research in morphology (Baayen 1994), typology (Croft and Poole 2008),
pragmatics (Iyeiri, Yaguchi, and Baba 2011), translation studies (Jenset and McGillivray 2012), and
language classification (Kroeber and Chrétien 1937).
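To make the notion of explained variation concrete, the sketch below computes the share of total variance captured by the first two principal components of a small frequency matrix. It is written in plain Python rather than R, and it is not the code used for the study. The function names (covariance, top_eigenvalues, explained_share) and the toy counts are assumptions for the example, and power iteration with deflation stands in for a library eigen-decomposition.

```python
import math
import random

def covariance(matrix):
    """Covariance between columns; rows are observations (here: adverbs)."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def top_eigenvalues(sym, k=2, iters=500, seed=0):
    """Largest k eigenvalues of a symmetric positive semi-definite matrix,
    found by power iteration with deflation."""
    p = len(sym)
    a = [row[:] for row in sym]              # work on a copy
    rng = random.Random(seed)
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]      # unit-length starting vector
        for _step in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                 # no variance left to extract
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * av[i] for i in range(p))   # Rayleigh quotient
        values.append(lam)
        for i in range(p):                   # deflate: remove this component
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    return values

def explained_share(matrix, k=2):
    """Share of total variance captured by the first k principal components."""
    cov = covariance(matrix)
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(top_eigenvalues(cov, k)) / total

# Invented raw counts: four adverbs (rows) by three syntactic tags (columns).
counts = [[0, 1, 0], [0, 3, 2], [0, 0, 1], [5, 40, 2]]
share = explained_share(counts)
print(f"first two components: {100 * share:.1f}% of the variation")
```

The covariance is computed on the raw counts without rescaling the columns, so high-frequency columns dominate the leading components; this is a property of the sketch, and of any PCA on unstandardized data.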
Multivariate visualization techniques can be applied to linguistic data to produce computational
semantic maps (Croft and Poole 2008). Such maps are well suited to analyses based on the
distributional hypothesis in semantics (Lenci 2008:11), since multivariate techniques
highlight salient associations that are weighted by frequency of use, or co-occurrence. The resulting
maps or plots can be considered proxies of linguistic (sub-) systems, since the visualization
techniques highlight frequent co-occurrence, while simultaneously taking the full dataset into
account. Thus, the semantic maps represent a best approximation to the overall, or total, structure
of multiple associations that exist in a multivariate set of linguistic data. The strength of these
techniques is that they pick out the most salient associations in the data in a systematic and unbiased
manner, and provide a quantifiable way of assessing the quality of the visualization. This can be
compared with a human being working with some data, who, after noticing a pattern which might be
real enough but perhaps of minor importance “thereafter […] notes mentally every corroborative
item, but unconsciously overlooks or weighs more lightly items which point in other directions”
(Kroeber and Chrétien 1937:97). Since multivariate techniques are based on visualization rather than
null-hypothesis testing (such as the chi-square or t-tests) they yield a richer, more readily
interpretable result: a plot of associations, rather than a single p-value. Hence, multivariate
techniques can be said to bridge the gap between quantitative and qualitative methods.
3.3 Exegesis on the choice of visualization technique
Deciding which multivariate clustering technique to use is a non-trivial methodological choice that
requires some knowledge of both the available techniques and the data at hand. The present section
offers some comments on three commonly used techniques, CA, MDS, and PCA, and provides
specific justification of the chosen technique in the context of historical corpus linguistics.
The multivariate techniques mentioned above are similar in many respects, but subtle differences
exist, both with respect to the recommended type of input data, and the actual reduction and
visualization technique. Hence, the choice of method requires some consideration. MDS is typically
used to visualize associations among columns based on ordinal data. In our case, the aim is to map
out associations between rows and columns based on frequencies of co-occurrence, and the
technique that is typically recommended for such data is CA (Greenacre 2007). The third technique,
PCA, is typically recommended for datasets with measurements (height, distance, weight,
temperature, or duration), not frequencies. In most cases, these recommendations, found in standard
introductory textbooks such as Baayen (2008), Everitt and Hothorn (2006), and Venables and Ripley
(2002), are sound, since they represent a form of best practice within applied statistics.
However, there are situations in which the above textbook approach might fall short. One such
situation is historical corpus linguistics. Table 1 illustrates a situation not uncommon in historical
linguistics: the data are categorized into multiple row and column categories. However, since the size
of historical corpora is limited by what material was passed on and survived, such corpora tend to be
dramatically smaller than corpora for e.g. PdE. As a case in point, compare the 4.5 million words of
historical English prose available through the three corpora used for the present study, covering over
a thousand years, with the over 400 million words in the Corpus of Contemporary American English
(Davies 2008), covering the time from 1990 up to the present. Since even a small sample of language
is likely to contain many syntactic categories, the smaller sample sizes found in historical data run a
particular risk of data matrices with many zero counts such as illustrated in table 1.
As table 1 shows, the data set with as many rows as there are adverbs (including existential there)
and as many columns as there are syntactic tags occurring to the right of the adverbs results in a
matrix which is very sparse. That is, we are left with a matrix in which many, perhaps most, cells have
zero as their value. Corpus data typically result in precisely such frequency matrices, and
historical data are likely to have many sparsely attested categories (owing to the typically
smaller datasets in historical linguistics). Sparsity is therefore an important concern for
data-driven, computational semantic map-making in diachronic linguistics. The critical question with
such data is which multivariate technique to choose.
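The degree of sparsity can be quantified as the proportion of zero cells in the matrix. The short Python sketch below does this for the five rows and five columns shown in the excerpt in table 1; the function name zero_share is an assumption, and the resulting figure describes the excerpt only, not the full 62 by 135 data set.

```python
def zero_share(matrix):
    """Proportion of cells in a frequency matrix whose count is zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

# The excerpt from table 1 (rows: aboute, above, abroade, afore, after;
# columns: BED, BEP, C, CONJ, CP-THT).
excerpt = [
    [0, 1, 0, 0, 0],
    [0, 3, 0, 2, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(f"{100 * zero_share(excerpt):.0f}% of the cells are zero")
```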
Croft and Poole (2008) use MDS for their semantic maps. MDS is a technique for representing (dis-)
similarities between items, whether they be stimuli, linguistic items, or individuals (Everitt and
Hothorn 2006:227). Such an approach is useful for representing the typological variation in e.g.
indefinite pronouns, that is, we have items from a number of different languages but are primarily
interested in the distances between the items (Croft and Poole 2008). The data for the present paper
differ in this respect, since we are not only interested in the items themselves (syntactic tags), but
also how they cluster together with the categories over which they have been sampled (locative
adverbs). Put differently, MDS is a technique for representing (dis-)similarities among columns,
whereas the present study is concerned with co-occurrence patterns of rows and columns. There is
also another reason for not choosing MDS in the present case. MDS will try to maximize the variation
accounted for in two or three dimensions (Baayen 2008:136), whereas techniques such as PCA and
CA will (typically) provide a larger number of dimensions ordered by descending explained variation.
In some cases this ordered, descending explained variation in the dimensions can contribute to the
final analysis, as argued in the present study. For another study where this feature of PCA is given a
linguistic interpretation, see Barðdal et al. (2012). Having excluded MDS, the choice stands between
PCA and CA.
In their study where they evaluate the use of PCA and CA for inferential purposes (rather than the
exploratory purposes highlighted in the present article), Lynn and McCulloch note that when frequency
data are highly correlated, PCA can perform as well as, or better than, CA (Lynn and McCulloch
2000:571). Since CA essentially compares proportions of representations, it tends to give greater weight
to rare instances. The effect of this with sparse, historical data is to produce a bi-plot which is hard
to interpret, if interpretable at all, and where rare co-occurrences are given a disproportionate
influence on the final plot. With a dataset such as the one illustrated in table 1, we are not really
interested in proportional representation. We know that there will be frequent zero-count cells, and
many rare co-occurrences with counts close to zero. The real aim with a dataset such as the one
exemplified in table 1 is to reliably pull out the co-occurrences that are highly salient from the great
mass of low-correlational background noise which Meehl (1990:123–127) calls the “crud factor”. One
way of achieving this is to run a PCA on the observed frequencies in the matrix, without normalizing
or standardizing them (i.e. without turning them into proportions). This ensures that the differences
in magnitudes are better represented, and although at odds with the conventional advice, it is
supported theoretically by the results presented in Lynn and McCulloch (2000).
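The point about proportions can be made concrete with a minimal sketch. It is written in Python rather than the R used in the study, and the counts and the label `rare_adverb` are hypothetical, not corpus data. It shows how converting counts to row proportions gives a single rare token the most extreme profile, which is the property that makes proportion-based methods such as CA sensitive to sparse data.

```python
def to_proportions(row):
    """Convert a row of raw co-occurrence counts into row proportions."""
    total = sum(row)
    if total == 0:
        # An all-zero row has no profile; keep it as zeros.
        return [0.0 for _ in row]
    return [count / total for count in row]

# Hypothetical counts over two context tags (invented for illustration).
raw = {
    "there": [90, 10],
    "rare_adverb": [1, 0],
}
profiles = {adverb: to_proportions(row) for adverb, row in raw.items()}

for adverb in raw:
    print(adverb, raw[adverb], profiles[adverb])
```

In the raw counts the frequent adverb dominates by two orders of magnitude; in the profiles the adverb attested once looks like the most categorical item in the table.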
To test this, the data for the present paper were analyzed twice: first with CA using the ca package
(Nenadić and Greenacre 2007) and secondly with PCA. All analyses were carried out in the statistical
software package R (R Development Core Team 2011). As predicted, the CA bi-plots were hard to
interpret, with all categories clustered near the center and no clearly visible associations. PCA with
non-normalized (i.e. raw, rather than relative) frequencies performed much better, with clear,
interpretable associations along the first two dimensions. Again as predicted, using PCA with
normalized frequencies yielded results no better than with CA, providing further support for the
approach advocated here. The above line of reasoning might run the risk of being labeled as
methodological opportunism by simply going for the result that seems to best suit the researcher,
had it not been for the fact that the results discussed in Lynn and McCulloch (2000) also provide
theoretical support for such an approach. It should be stressed that the procedure advocated here is
dependent on the input data. In many cases, CA will be superior with frequency data. In some cases,
the two approaches will be similar, as observed by Iyeiri, Yaguchi, and Baba (2011), who find that for
their data PCA with and without normalized frequencies (i.e. observed frequencies vs. proportions)
give very similar results.
4. Results
The following sections describe the results of applying distributional methods to the data at hand.
Section 4.1 reports on the multivariate analyses with PCA of the three datasets from OE, ME, and
EME. The accompanying bi-plots representing semantic maps are described and interpreted in the
context of the corpus data. Section 4.2 provides a second, more specialized investigation of
association patterns in the data using another statistical technique, viz. logistic regression.
4.1 First investigation: semantic maps
PCA produces a sub-space of the initial multidimensional space represented by each of the three full
matrices. These sub-spaces can be visualized by plotting them in a two-dimensional figure, and it is
this two-dimensional figure that will form the basis of the semantic analysis. The two-dimensional
plot, or bi-plot, of the PCA is a spatial representation of the similarities and differences that arise
from differences in distributions of co-occurrence of locative adverbs and grammatical tags. Hence,
the distances that arise in the bi-plot can be interpreted as a proxy of distributional semantic
similarity. Since the dimensions in the PCA sub-space are ordered from the greatest to the smallest
explanatory value (i.e. PC 1 > PC 2 > … > PC n, where n is the total number of dimensions in the
analysis), we can see that some co-occurrences are more important than others. A rule of thumb is
that for a dimension to have some explanatory value, it should account for at least 5% of the
variation. In other words, if a dimension has too little explanatory value, we should be careful about
trying to interpret it in semantic terms, since it might simply be a result of random noise.
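The idea that distance reflects distributional similarity can be illustrated without running a full PCA. The sketch below is in Python rather than R and uses invented counts: the adverbs up and down, the four tags, and all numbers are hypothetical. It measures plain Euclidean distance between raw co-occurrence rows, which is a simplification of the distances in the PCA sub-space and not a substitute for them.

```python
import math

# Hypothetical co-occurrence counts: adverb -> counts for the tags
# BE, C, NP, END (invented for illustration, not taken from YCOE).
counts = {
    "there": [120, 90, 30, 10],
    "here": [15, 5, 60, 40],
    "up": [3, 1, 4, 2],
    "down": [2, 1, 5, 3],
}

def distance(a, b):
    """Euclidean distance between the count rows of two adverbs."""
    return math.dist(counts[a], counts[b])

for pair in [("up", "down"), ("here", "up"), ("there", "up")]:
    print(pair, round(distance(*pair), 1))
```

Adverbs with near-identical rows end up close together, while an adverb with a strong preference for particular tags lies far from the rest, which is the pattern the bi-plots are read for.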
Figure 2: PCA bi-plot of the associations between locative adverbs and right-context tags. The total explained variation is 99.2%. The second dimension (PC2) representing a distinction between there (+d+ar) and here (h+ar) is substantial. Note the association between there and the verb be (BEDI) as well as a complementizer position (C).
Figure 2 is a bi-plot of the PCA analysis of the OE data. The quality of the analysis is excellent, with
the first two dimensions accounting for 99.2% of the total variation (PC 1 + PC 2). PC 1, by necessity,
accounts for the largest portion of the variation, but since PC 2 accounts for 15.7%, well above the
rule of thumb threshold of 5%, we can safely interpret the vertical as well as the horizontal
dimensions in the plot. Starting with PC1, we see that this dimension is dominated by the difference
between there (OE ðær, coded as +d+ar in YCOE) and all the other adverbs. Turning to the second
dimension, we see that here (OE hær, coded as h+ar in YCOE) stands out from the rest as well. There
is associated with the tags “C”, i.e. a complementizer position, and to a somewhat lesser extent the
verb be. The co-occurrence with a complementizer position is exemplified in (3). In the majority of
cases the complementizer position is empty and the most interesting fact is not the C position itself,
but what it means in the corpus. In YCOE, most subordinate clauses are annotated as complements
of complementizer phrases, hence a complementizer position indicates that a subordinate clause
follows next, and it is clear that this context instantiates a locative use of there.
(3) on ælcum lande þær hys geleafa byð. “in every country there his faith is” (coaelhom,+AHom_5:185.797)
The syntactic structure of (3) following the corpus annotation in YCOE is outlined in figure 3,
illustrating the syntactic relationship between there and the complementizer position which appears
in the bi-plot in figure 2.
Figure 3: Syntactic tree illustrating the corpus annotation’s phrase-structure representation of the example phrase in (3). Only the relevant PP context is shown, the full sentence having been omitted for reasons of space.
In example (4), on the other hand, we see a use of there which, despite the corpus annotation’s
insistence on a locative label, is at least potentially interpretable as an existential use. As the
corresponding syntactic structure in figure 4 suggests, this sentence is very close to the prototypical
structure of an existential clause consisting of there, be, and a nominative NP.
Figure 4: Syntactic tree illustrating a phrase structure that is ambiguous as to whether there is used as a locative adverb or an existential subject. The syntactic structure corresponds very closely to that of an EC.
(4) þær wæs eac oðer cyricweard
“there was also another sexton”
(cogregdC,GDPref_and_3_[C]:25.228.4.3143)
(5) Her Ignatus biscep þrowude.
“Here bishop Ignatus suffered.” (cochronA-1,ChronA_[Plummer]:110.1.92)
Turning to here, we can see from (5) and the corresponding syntactic structure in figure 5, that co-
occurrence with a nominative NP points to a locative use.
Figure 5: Syntactic tree illustrating example sentence (5) where here is used in an unambiguous locative position.
As far as OE is concerned, we can conclude that both there and here appear with contexts that are
clearly locative and contexts that may be interpreted as existential, whereas a secondary distinction
exists between here and there, and that this secondary distinction represents a tendency for here to
occur in more typically locative contexts than there. We see a division of labor between there and
here in that they define or outline different dimensions in the PCA solution (PC1 vs. PC2, respectively).
The substantial proportion of variance explained by PC2 leads to the conclusion that there and here
are distributionally complementary signs.
Figure 6: PCA bi-plot of associations between locative adverbs / existential there and their respective grammatical contexts in ME. The total explained variation is very high (98.5%), making it a good representation. Compared with OE (figure 2) the second dimension has a much lower explanatory potential, which points to less differentiation between here and there.
In ME, the situation is at first glance very similar to OE. The bi-plot in figure 6 shows that the first two
dimensions account for 98.5% of the total variation, again an excellent result. As in OE, most of the
adverbs are clustered together so tightly that none stand out, a sign of overlapping distributions. Also
like in OE, there defines the horizontal axis (PC1) and here defines the vertical axis (PC2), with the
former occurring in both locative and existential contexts, whereas the latter only occurs in locative
contexts. Unlike in OE, we can see that the share of explained variance accounted for by PC2, i.e.
here, is declining. The vertical axis explains only 4.2% of the total variance, which is less than the rule
of thumb threshold value of 5%. Even if this is only a rule of thumb, the reduction in the explained
variance that here contributes is dramatic. However, an explained variance of 4.2% still seems
close enough to the threshold value to merit some consideration, but the overall impression is that
of a gradual decline in contrast. Although there still occurs in locative contexts frequently enough to
show signs of its deictic opposition to here, the magnitude, or importance, of this distinction is clearly
on the ebb.
(6) +ter is a noble Cytee +tat is called Tours. “there is a noble city that is called Tours.” (CMBRUT3,9.234)
(7) +ter he fastyde fourty dayes “there he fasted forty days” (CMAELR3,42.466)
(8) God wolde ye had nat come here “God would that you had not come here” (CMMALORY,50.1675)
Some of the salient contexts for here and there in ME are exemplified in (6-8). In (6) there co-occurs
with be, a typical existential context, whereas in (7) there co-occurs with a subject NP (he) which
seems to indicate a locative position. In (8) a sentence with here illustrates another locative position,
the end of the sentence (labeled as “END” in the bi-plot).
Figure 7: Bi-plot of associations between locative adverbs / existential there and their grammatical contexts in EME. The total explained variation is still very high (99.9%), but the second dimension no longer has any meaningful explanatory value. The first dimension, dominated by there, accounts for all the systematic variation in the data.
Finally, turning to the bi-plot for EME in figure 7, we see that the two first dimensions account for a
total of 99.9%, i.e. practically all the variation. PC 1 is as before dominated by there and accounts for
a staggering 97.1% of the total variation, while the second dimension, which is defined by here, only
accounts for 2.8%. The percentage accounted for by PC 2 falls far short of the 5% rule of thumb. This
can only be interpreted to mean that the systematic opposition between there and here no longer
represents the most salient usage. Since the present study’s definition of semantics relies crucially on
contrasts, or oppositions, as the means by which meaning is created, this has a clear
implication: the main opposition (i.e. meaning-bearing distinction) that is found in usage between
there and all other locative adverbs is now the most salient source of the word’s meaning, not its
distinction with here. There is now tied very closely to one specific context, namely different forms of
be, exemplified in (9). Although the bi-plot (figure 7) suggests that here still occurs before subjects
(10) and at the end of sentences more often than in other contexts, the minute explanatory value of
PC 2 tells us that this association is no longer strong enough to reliably differentiate here from all
other adverbs. Put differently, while there has become more discriminating about the contexts in
which it appears, here has become more uncritical and occurs in more or less the same contexts as
any other locative adverb.
(9) there is one that accuseth you “there is one who accuses you” (AUTHNEW-E2-H,V,40J.635)
(10) Here we refreshed our selues very well with fresh water, “Here we refreshed ourselves very well with fresh water,” (COVERTE-E2-H,18.130)
Following the distributional hypothesis, it is not unreasonable to interpret the PCA bi-plots above as
models of systems of linguistic signs. The crucial aspect of these models is that they primarily
consider the most frequently attested correlations and oppositions, and that they do so in a
systematic and unbiased manner. As such, the models attest to a gradual, diachronic shift in the
system. In OE, the deictic opposition between here and there was clearly highly salient in usage.
Although these two adverbs, through deixis, could refer to locations, this deixis set them apart from
the great mass of other locative adverbs. However, in ME, and even more so in EME, it is clear that
from the point of view of a usage-based model of this system, the deictic properties of here and
there become less prominent. Instead, here becomes more like all the other locative adverbs, while
there becomes defined by its opposition to these locative adverbs. However, the bi-plots make use of
a rather limited grammatical context. The next section considers to what extent these contexts
represent grammatical constructions and, following CxG, hence semantics.
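The 5% rule of thumb applied across the three periods can be summarized in a short sketch. It is written in Python rather than R; the function name `interpretable` is hypothetical, and the PC 1 shares for OE and ME are not stated in the text but derived here as the reported total minus the reported PC 2 share.

```python
THRESHOLD = 5.0  # rule-of-thumb minimum share of explained variation, in percent

# Percent of variation explained by (PC 1, PC 2) in each period.
# Totals and PC 2 shares are as reported in section 4.1; PC 1 for OE and ME
# is derived as total minus PC 2 (an assumption of this sketch).
explained = {
    "OE": (99.2 - 15.7, 15.7),
    "ME": (98.5 - 4.2, 4.2),
    "EME": (97.1, 2.8),
}

def interpretable(shares, threshold=THRESHOLD):
    """Return the 1-based indices of dimensions meeting the threshold."""
    return [i for i, share in enumerate(shares, start=1) if share >= threshold]

for period, shares in explained.items():
    print(period, interpretable(shares))
```

Only in OE do both dimensions clear the threshold; in ME and EME the rule leaves PC 1 as the sole dimension to interpret, in line with the reading given above.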
4.2 Second investigation: searching for constructions
An important question that remains despite the semantic maps presented above is whether we can
really identify an EC in OE by statistical means alone. Although a clear association between there and
the verb be was found, this does not necessarily prove that an EC is the underlying source of that
association. The bigram there – is could easily constitute a deictic use, as in (11). The case for having
identified a potential EC would be much stronger if another element, namely the indefinite NP, could
be identified as well. Since the use and meaning of existential there is most controversial in OE, the
second investigation will deal with OE data only.
(11) There is that book I’ve been looking for all day. (locative)
(12) There is a book on the table. (existential)
The following section will test whether a significant positive correlation exists between there and
indefinite NPs occurring in the second n-gram slot after the adverb, corresponding to the boldface
constituent in (12). If such a correlation can be found, it is reasonable to assume that it reflects the
EC.
To test the association between locative adverbs and indefinite NPs in OE, the second left-context
slot of the adverbs in the OE dataset was converted into binary values, with “true”
indicating the presence of an indefinite NP in the second left-context of the adverb, and “false”
indicating its absence. The criterion used for defining an NP as “indefinite” was the presence of either
an indefinite quantifier or man, represented in the corpus as either “Q^N” or “MAN^N”.
Furthermore, unambiguous plural nouns occurring without a determiner were also included. A
further restriction was that the NP should be in the nominative case (cf. the “^N” tag), to more
closely correspond to the “logical subject” of an EC. This resulted in 221 explicit indefinite NP
contexts being identified, alongside 15 plurals. Of the total 236 indefinite contexts, 192 occurred with
there, 41 with here and 3 with other locative adverbs. The total dataset numbers 9203 observations,
which makes this a relatively rare phenomenon; however, this is to be expected since it was only in
ME that the existential use of there was fully conventionalized.
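As a quick consistency check on the figures just given, the following Python sketch (the study itself used R; variable names are illustrative) recomputes the totals and the overall rate of indefinite NP contexts from the reported counts.

```python
# Counts as reported for the OE dataset.
explicit_indefinites = 221
bare_plurals = 15
by_adverb = {"there": 192, "here": 41, "other": 3}
total_observations = 9203

indefinite_total = explicit_indefinites + bare_plurals
# The two breakdowns should describe the same set of contexts.
assert indefinite_total == sum(by_adverb.values())

base_rate = indefinite_total / total_observations
shares = {adverb: n / indefinite_total for adverb, n in by_adverb.items()}

print(f"base rate: {base_rate:.2%}")
for adverb, share in shares.items():
    print(f"{adverb}: {share:.1%} of indefinite contexts")
```

The base rate is the share of all observations with an indefinite NP in the relevant slot; the per-adverb shares show how strongly those contexts are concentrated on there.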
Since the EC is distinguished by the combination of there, an indefinite NP, and a form of the verb be,
information about whether a form of be is present should be taken into account. The presence of all
three elements (there, be, and an indefinite NP) together is highly indicative of the EC. Thus, if a
significant correlation can be established between all the elements, this would strengthen our
confidence in the results presented in section 4.1 above.
To determine the presence and nature of any correlations, a binary logistic regression model was
used, with indefinite NP (true/false) as the response variable and the locative adverbs, categorized as
here, there and “other” for better reliability (Manning and Schütze 1999:192), as the first predictor.
The second predictor used in the model is a binary variable indicating whether or not a form of be
was present as the first left context of the adverb (C 1). The third predictor combines information
about the sentence’s total number of tree-nodes, the total number of NPs, and the total number of
finite verbs (as represented by the corpus annotation) into an index of syntactic complexity (Jenset
2010). For a more in-depth explanation of logistic regression, see Baayen (2008:195-208). The
regression model, created with the rms package (Harrell 2012) in R, can be expressed as follows:
$$\operatorname{logit}\big(\Pr(\text{IndefNP})\big) = \mu + \beta_{\text{Adverb}} + \beta_{\text{Be}} + \beta_{\text{Complexity}} + \varepsilon$$
In other words: the probability of finding an indefinite NP in the second left context of a locative
adverb can be modeled as a linear (log-transformed) function of the overall mean rate of indefinite
NPs (µ), a modifying factor for the adverb categories (β_Adverb), the presence (or absence) of the
verb be (β_Be), and the overall annotation-complexity of the sentence (β_Complexity), as well as
some random variation (ε). The overall model is statistically significant (p < 0.0001). Furthermore,
although the coverage is modest (R² = 0.15), the predictive capability (C = 0.79) is reasonably close to
the 0.80 threshold proposed by Baayen (2008:204). However, some problems with prediction and
coverage in this case are not surprising: the response data only contained 236 indefinite NPs out of a
total of 9203, which amounts to a weak signal compared to the overall size of the dataset. Thus,
keeping in mind that all models are anyway imperfect simplifications, it seems warranted to accept
the model as being reasonably useful for our purposes.
Table 2 summarizes the model. In line with the initial hypothesis, we see that the category “other
locative adverbs” has a negative coefficient; hence it is not associated with indefinite NPs. Here and
there on the other hand both have positive coefficients, revealing a positive correlation with
indefinite NPs. The small standard errors show that the uncertainties inherent in the estimates are
smaller than the correlations themselves. Together with the significant p-values, this indicates that
we can be reasonably confident in the correlation and its size. The coefficient itself is a log odds ratio,
a measure that is not easy to interpret in an intuitive manner. Instead, Gelman and Hill (2007:82)
recommend dividing the coefficient by four as a convenient estimate of the maximum impact that a
variable may have as a percentage increase in the probability of the response. The last column of
table 2 shows the percentage increase in the probability of finding an indefinite NP when switching
from the baseline of “other locative adverbs” to either here or there. Although the absolute
estimated correlation rates between indefinite NPs and here or there are modest (respectively 1.8
and 3.6 per hundred occurrences), the relative impact that here or there has on the probability of
finding such an NP is substantial, and warrants the conclusion that a real, non-trivial correlation
exists between here / there and indefinite NPs in the OE data.
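The divide-by-four conversion can be reproduced with a few lines of Python (the study used R; the function name `max_probability_change` is hypothetical). The reasoning behind the rule is that the slope of the logistic curve is β·p(1 − p), which peaks at p = 0.5 and therefore never exceeds β/4.

```python
# Coefficients (log odds ratios) as reported in table 2.
coefficients = {
    "there": 2.49,
    "here": 1.77,
    "be": 1.82,
    "complexity": -0.37,
}

def max_probability_change(coefficient):
    """Upper bound on the change in probability per unit change in the
    predictor, using the divide-by-four rule (slope at p = 0.5)."""
    return coefficient / 4

bounds = {name: max_probability_change(b) for name, b in coefficients.items()}

for name, bound in bounds.items():
    print(f"{name}: {bound:+.4f}")
```

Expressed as percentages and rounded, these bounds correspond to the last column of table 2. They are upper limits: at the low baseline probabilities found in this dataset, the actual change in probability is considerably smaller.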
Table 2: Summary of a binary logistic regression model of the OE data showing that compared to all other locative adverbs, here and there are positively correlated with indefinite NPs in the second context-slot, and both are associated with a considerable increase in the probability of finding an indefinite NP. Conversely, increasing the structural complexity of the sentence decreases the probability of finding an indefinite NP. The model was tested for interactions between be and adverbs, but the interaction was not significant and hence removed from the model. All predictors are statistically significant at least at the 0.01 level.
Predictor           Coefficient   Standard error   P-value    ± Pr(IndefNP)
Mean: other advs    -6.19         0.58             <0.0001    (baseline)
There               2.49          0.58             <0.0001    + 62%
Here                1.77          0.60             0.0032     + 44%
Context 1 = be      1.82          0.14             <0.0001    + 46%
Complexity          -0.37         0.07             <0.0001    - 9%
Figure 8: Plot visualizing the logistic regression model. Here, there, and be all have similar, positive associations with indefinite NPs. Other locative adverbs are highly negatively correlated with indefinite NPs. The complexity index is also negatively correlated with indefinite NPs, but the effect is much smaller. For each variable, the coefficient in table 2 is
represented as a dot. The lines extending from the dot represent confidence intervals, i.e. the maximum uncertainty associated with the coefficient. The dotted vertical line represents no correlation.
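To show how the coefficients in table 2 combine on the log-odds scale, here is a minimal Python sketch (not the rms model fitted in R). The function names are hypothetical, and fixing the complexity index at zero is an assumption made purely for illustration, since the scale of the index is not given here. The resulting probabilities therefore need not match the rates reported in the text; only their ordering is of interest.

```python
import math

# Coefficients (log odds) as reported in table 2.
INTERCEPT = -6.19  # baseline: other locative adverbs
ADVERB = {"other": 0.0, "there": 2.49, "here": 1.77}
BE = 1.82
COMPLEXITY = -0.37

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_indef_np(adverb, be_present, complexity=0.0):
    """Probability of an indefinite NP for a given predictor setting.
    complexity=0.0 is an illustrative assumption, not a corpus value."""
    log_odds = (
        INTERCEPT
        + ADVERB[adverb]
        + (BE if be_present else 0.0)
        + COMPLEXITY * complexity
    )
    return inv_logit(log_odds)

for adverb in ("other", "here", "there"):
    for be_present in (False, True):
        p = prob_indef_np(adverb, be_present)
        print(f"{adverb:6s} be={be_present!s:5s} p={p:.4f}")
```

Because the effects add up on the log-odds scale, there combined with be yields a higher probability than either element alone, which mirrors the additive structure of the model without an interaction term.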
In summary, a positive correlation exists not only between there and be, but also between there and
indefinite NPs in the second n-gram slot, as well as between be in the first n-gram slot and indefinite
NPs in the second slot. In effect, it is possible to make reasonable, quantitative predictions about the
presence of one element of the construction in question based on the presence of other elements.
Such a correlation provides a strong indication of the existence of an EC in OE which involves there. It
is worth noting that the effect size of here is also substantial, a fact that will be further discussed
below. Finally, the structural, syntactic complexity of the sentence, as captured by the index
summarizing the corpus annotation, clearly plays a role. It is hardly surprising to discover an
interaction between syntax and semantics in the EC; however, a close scrutiny of the diachronic
syntax of English ECs falls outside the scope of the present study and will be dealt with in subsequent
work.
5. Discussion
The results from the first investigation, discussed in section 4.1 above, testify to a gradually vanishing
opposition between here and there, in their most salient uses. Concurrently with this, here becomes
gradually more similar to other locative adverbs in its most salient use. Of course, this is in some
respect a matter of perspective, but the PCA allows us to unpick the complexities, allowing for a
more fine-grained and nuanced analysis than if the relationships were to be stated in simple
categorical terms. The correlation was substantiated by investigation two, which attested the close
association with another crucial element of an EC, namely indefinite NPs.
5.1 The transition of there
Starting from the first bi-plot (figure 2), its most striking feature is that most of the locative adverbs
in OE are in free variation with their context (that is, the contexts that allow locative adverbs). These
are the ones lumped together in the center of the plot. Compared to this, here and there are highly
associated with the same contexts. If we next turn to the horizontal axis, it is clear that although
there are similarities between here and there, we can also observe differences. From the perspective
of PC 2, there are clear differences between here and there. Note that no similar meaningful
distinctions can be found among the other locative adverbs: the two first dimensions explain so
much of the variation that any remaining differences are likely to be random noise.
In ME (figure 6), we see a broadly similar pattern, but from the perspective of PC 1, here and there
are less similar in the sense that they share contexts to a lesser degree. This is evident from the fact
that here has moved closer to the undifferentiated mass of adverbs in the center of the plot, whereas
there remains far out to one side. At the same time, crucially, the distinction between here and there
in PC 2 is disappearing: the vertical axis now only accounts for 4.2% of the variation, marginally below
the 5% rule of thumb lower threshold for taking it into consideration in the first place. A reasonable
linguistic interpretation is that the deictic opposition is no longer the primary defining difference
between here and there in terms of usage.
Turning next to EME (figure 7), we see broadly the same pattern as in ME but with nuances: the
contexts defining there are more narrowly circumscribed, and PC 2 now accounts for such a small
proportion of the variation that it would be irresponsible to attach any linguistic significance to it.
Thus, as far as usage patterns are concerned, here is now indistinguishable from the great mass of
locative adverbs – the fact that the word is clearly visible on the bi-plot means nothing as long as the
axis representing that difference has too little explanatory value. Although there in EME still
functions as a locative adverb, this is no longer a salient feature to our model: the most prominent
usage feature of there is that it is used differently from all the other locative adverbs in a consistent
and decisive manner.
The chronology outlined above is broadly similar to that outlined in e.g. Breivik and Swan (2000). In
OE, the distinction between existential and locative uses of there is less clear than in PdE. By ME, the
use of there as a subject in the EC is established, and by EME it is conventionalized to such a degree
that the locative uses of there can no longer influence the model in any discernible way. As such, the
bi-plot analyses are useful and interesting, but it is only when they are interpreted in the context of
distributional semantics that they can shed new light on the semantics of there.
5.2 A distributional semantics interpretation
A central premise in this context is that a sign, in the linguistic sense, acquires its meaning through
contrasts with other signs (Saussure 1983). Based on this, the following proposal is offered: the
crucial test is not whether there contrasts with its absence in the EC, as argued by e.g. Bolinger
(1977) and Breivik (1990). Rather, the fundamental distributional property governing the semantics of
there is its systematic contrastiveness with all other (relevant) words in all (relevant) syntactic
contexts. This is essentially the position discussed in Cruse (2010:217–218) under the label
“combinatorial normality”. The position entails that word meaning can be captured in two
dimensions: all possible (well-formed) syntactic contexts it may appear in, coupled with all the
possible substitutes for that word. Cruse expresses skepticism toward the usefulness of corpora
(Cruse 2010:216–217), and his criticism of collocational profiles is warranted to the extent that
corpus semantics restricts itself to looking at raw frequencies of co-occurrence between a word A
and a word B. However, as this study has aimed to demonstrate, there is no reason whatsoever for
corpus-based semantics to adopt such a simplistic approach. By creating matrices of co-occurrence
frequencies of all relevant words (in this case all locative adverbs and existential there) and all their
contexts, the resulting semantic maps represented by the bi-plots go a long way toward mirroring a
usage-based, linguistic (sub-) system. The primary point of difference from the position discussed by
Cruse (2010) is the use of corpus frequencies rather than native speaker judgments.
Based on these frequencies, the PCA algorithm has produced what arguably constitutes the most
salient adverb – context associations for each adverb and each context, while simultaneously
creating the best possible representation of the system as a whole. The bi-plots show a marked
decline (and eventually the absence) of the deictic opposition between here and there (i.e. a loss of
contrast, or meaning, as far as usage goes), but we also see new contrasts emerging, specifically
between the combination there + be and the rest of the linguistic (sub-) system. In other words, the
loss of contrast (i.e. meaning) is followed by the gradual acquisition of contrast, that is, the lexical
item there + be is arguably gaining meaning. Hence, from this perspective it does not make sense to
claim that existential there is semantically empty: the usage data clearly indicate a contrastive
situation which again implies a difference in meaning, also at the constructional level (Goldberg
1995:67–69).
5.3 The meaning of existential there
The crucial question, then, is whether this newly acquired meaning is somehow related to the
original, locative one, as suggested by the mental space analyses in Lakoff (1987) and Breivik (1997).
The proposition that existential there in earlier English referred to a mental space via a metaphorical
extension is virtually impossible to test quantitatively. It may well be a valid observation; however, it
is commonly assumed that conceptual metaphors exist to furnish target domains (in this case
“existence”) with the structure necessary for everyday functional communication (Cruse 2010:250).
However, in early English it was possible to form an EC without there (Williams 2000). This implies
that we would have to assume two syntactically distinct ECs, one with and one without there.
Furthermore, following the principle of no synonymy, we would have to assume that the two ECs
were either semantically or pragmatically distinct (Goldberg 1995:67), and that one of the two ECs
was felt to be lacking in some respect, requiring some sort of locative elaboration or substantiation.
The present study takes a rather different view, grounded in the principles of distributional
semantics. The position entails that the final analysis should always place great emphasis on the most
salient patterns and associations in the data, i.e. the attested usage. It follows from the bi-plots that
both uses of there are equally meaningful (in terms of distinctions), in a related way. The basic,
locative use has a fairly “grammatical” meaning, since its function is to designate a distal location
distinct from its deictic counterpart here. As the analysis of the bi-plot for the OE data attests, this
is clearly a highly salient meaning of the sign there. The existential use, on the other hand, acts as
what Breivik calls a “presentative signal” (Breivik 1990:150–156) and serves pragmatically to bring
new information to the hearer’s attention in discourse. As such, both uses of there are related to the
psychological concept of selection, i.e. the ability to focus on what is relevant and take focus away
from what is irrelevant (Croft and Cruse 2004:47), and previous scholars have also pointed to the
similarity between deixis and existential statements (Lyons 1967:391). In the locative use, there
serves to selectively focus on a location that is speaker-distal; whereas in the existential use, the
word is used in a construction which selectively focuses on information that is hearer-distal, i.e. new
(Birner and Ward 1993).
The bi-plots highlighted a gradual shift in the (locative) adverb-system from OE to EME. While the
deictic adverbs here and there could point to locations in OE, figure 2 showed that they nevertheless
both stood out in usage from the great mass of locative adverbs. However, the subsequent
development attested in figures 6 and 7 points to a situation in which the locative meaning of here
becomes more prominent, while the locative meaning of there becomes less prominent, giving
way to its existential meaning. Thus, both uses of there have a meaning that can more easily be
described in grammatical rather than traditional lexical semantic terms. This grammatical meaning is
in both cases connected with clear, somewhat abstract concepts, and the combinatorial distinctions
present in the relevant linguistic sub-system attest to differences in meaning arising out of contrasts
with other linguistic signs. Far from being an empty subject, the existential use of there arguably had
a meaning in early English. That a pragmatic, presentative-signal meaning is associated with there as
argued by Breivik (1990) seems relatively uncontroversial. However, according to distributional
semantics, the question of whether there has any meaning in addition to discourse pragmatics (and
what that meaning is) can only be settled by distributional evidence.
The second investigation, in section 4.2, highlighted an interesting distributional fact: both here and
there share an association with the EC (cf. also figure 2 which shows the association between here
and be). This fact indicates that the “background location”, or mental space, meaning of there cannot
have been a primary factor behind its involvement in the EC. If the “background location” meaning
were an important motivating factor, we would simply not expect to find such a reliable association
between the EC and here, with its meaning of proximal location. Since both here and there appear to
have a stake in the EC in OE, I submit that the primary semantic factor responsible for this is what
they share: a deictic, grammatical meaning, rather than reference to some location. The primary
meaning of deixis is to locate, or select, a referent using the current speech act as reference point
(Cruse 2010:401). This closely corresponds to the EC, which functions to highlight a referent (the new
information represented by the indefinite NP) with reference to the current speech act. Thus, here
and there share an important semantic property with the EC; however, the property is not a locative
one, but deixis or the psychological act of selection itself (Croft and Cruse 2004:47).
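The kind of distributional association at issue here can be illustrated with a minimal sketch. It is not the model reported in section 4.2; it merely shows, in the spirit of collostructional analysis (Stefanowitsch and Gries 2003), how the attraction between an adverb and a construction could be quantified from a 2x2 contingency table. The function names and, above all, the counts are invented for illustration and do not come from the corpora.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Tests whether the top-left cell (adverb inside the construction) is
    larger than expected if adverb and construction were independent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of all tables at least as extreme.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


def odds_ratio(a, b, c, d):
    """Sample odds ratio with a 0.5 correction to avoid division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# INVENTED toy counts (hypothetical, not corpus data).
# Cells: adverb in EC, adverb elsewhere, other adverbs in EC, other adverbs elsewhere.
toy_tables = {
    "there": (30, 70, 20, 880),
    "here": (15, 85, 35, 865),
}

for adverb, table in toy_tables.items():
    print(adverb, round(odds_ratio(*table), 2), fisher_exact_greater(*table))
```

Under these invented counts, both adverbs would be attracted to the EC. That is the pattern the argument above relies on: an association shared by a proximal and a distal form cannot be attributed to distal location.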
If this analysis is correct, the grammaticalization process of there appears to closely follow the
general account proposed by Croft (2001:260–268). In the first stage, the word there shares
important semantic properties with the EC, i.e. the word and construction are “profile equivalent” to
use his terminology (Croft 2001:257). A large semantic overlap implies that only a subtle semantic
shift is required to switch attention away from the individual word over to the construction as a
whole. Diachronically, this shift could have been brought about by the increasingly pragmatic
function of the EC, as it gradually became more entrenched as a means to introduce new information
into a discourse, as argued by Breivik (1990). Since the existential use of there was bound to the EC, it
would not be surprising if the semantic profile of the EC itself were to take on greater salience than
that of there, ultimately leaving there as a grammatical element whose meaning has been left somewhat
redundant or usurped by the EC. This suggestion receives indirect support from experimental studies
showing that constructions can supply meaning to the words that comprise them (Kako and Wagner
2001; Kako 2006).
An important corollary of the analysis above is that the grammaticalization of there is described as a
matter of degree rather than kind. The claim is not that there evolves from being referential/lexical
to being “empty”, but that a grammatical meaning (deixis) in a specific context gradually fused with
the meaning of its host construction (EC), and that the semantic similarities of there and the EC
probably facilitated this process.
This analysis has a number of advantages. First, it provides a principled explanation for why some
varieties of Swedish and Norwegian can use here as an existential subject (Falk 1993:273;
Jenset 2010:19), a possibility which also existed in ME (Jenset 2010:213–215). Focusing on the
mental-space reference of existential there in English leaves this fact unexplained, whereas under the
present account the choice of here rather than there is secondary since both are deictic. Second, it
provides a more precise characterization of the diachronic semantics of there. Rather than going
from locative to empty, or from referential to grammatical, the meaning of there appears to have
expanded, adding one grammatical meaning (existential marker) to another, existing grammatical
meaning (deixis). Third, it provides a comprehensive, empirically transparent characterization of the
relevant semantic properties involved in the inclusion of there into the EC based on corpus data,
rather than on intuitions about a language for which native speakers can no longer be found. In this
third capacity, the analysis highlights the potential that distributional methods represent for positing
new semantic explanations in historical linguistics.
6. Conclusion
To conclude, the existential use of there was neither empty nor locative, but rather exhibited a
special form of deixis that was eminently compatible with the semantics of the EC itself. The
evidence for this conclusion is twofold. First, the bi-plots in section 4.1 attest to a gradually
weakening distinction between here and there, as the latter becomes increasingly
entrenched as a component of the EC. However, following the principle that differences in form
imply differences in meaning, a new meaning can be observed arising from the growing distinction
between there (in the context of the EC) and all other locative adverbs. Second, the fundamental
importance of deixis can be seen in the statistical analyses of the OE data, where section 4.2
demonstrated that both here and there are significantly correlated with the EC. Such a correlation
would not be expected if the most relevant semantic property of the existential subject was to
denote a distal location acting as a (mental) background. Thus, while the discourse pragmatic
meaning of existential there proposed in previous studies seems natural, the present investigation
argues against the proposed locative meaning of existential there in earlier English. In this, the study
has shown the potential of corpus-driven, distributional semantics in historical linguistics. By
combining annotated historical corpora and the judicious use of appropriate statistical techniques,
historical linguists can not only cover more data in their analyses, but also expect to answer more
questions and do so in a more empirically transparent way.
References
Abbott, Barbara. 1993. A pragmatic account of the definiteness effect in existential sentences. Journal of Pragmatics 19.39–55. doi:10.1016/0378-2166(93)90069-2.
Baayen, R. Harald. 1994. Derivational productivity and text typology. Journal of Quantitative Linguistics 1.16–34.
--- 2008. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Barðdal, Jóhanna, Thomas Smitherman, Valgerður Bjarnadóttir, Serena Danesi, Gard B Jenset, and Barbara McGillivray. 2012. Reconstructing Constructional Semantics: The Dative Subject Construction in Old Norse-Icelandic, Latin, Ancient Greek, Old Russian and Old Lithuanian. Studies in Language 36.511–547.
Birner, Betty, and Gregory Ward. 1993. There-sentences and inversion as distinct constructions: A functional account. Proceedings of the nineteenth annual meeting of the Berkeley Linguistics Society, 19: http://elanguage.net/journals/bls/article/download/2915/2853.
Bolinger, Dwight. 1977. Meaning and form. London: Longman.
Breivik, Leiv Egil. 1981. On the interpretation of existential there. Language 57.1–25.
--- 1990. Existential there: a synchronic and diachronic study. 2nd ed. Oslo: Novus.
--- 1997. There in space and time. Language in time and space: studies in honour of Wolfgang Viereck on the occasion of his 60th birthday, ed. by Heinrich Ramisch and Kenneth Wynne, 32–45. Stuttgart: Franz Steiner Verlag.
--- 2003. On Relative Clauses and Locative Expressions in English Existential Sentences. Pragmatics 13. http://elanguage.net/journals/pragmatics/article/view/380.
Breivik, Leiv Egil, and Toril Swan. 2000. The desemanticisation of existential there in a synchronic-diachronic perspective. Words: Structure, meaning, function–A Festschrift for Dieter Kastovsky, ed. by Christiane Dalton-Puffer and Nikolaus Ritt, 19–34. Berlin: Mouton de Gruyter.
Coopmans, Peter. 1989. Where stylistic and syntactic processes meet: Locative inversion in English. Language 65.728–751.
Croft, William. 2001. Radical Construction Grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press.
Croft, William, and D. Alan Cruse. 2004. Cognitive linguistics. Cambridge: Cambridge University Press.
Croft, William, and Keith T. Poole. 2008. Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics 34.1–37.
Cruse, D. Alan. 2010. Meaning in Language: An Introduction to Semantics and Pragmatics. 3rd ed. Oxford: Oxford University Press.
Cuyckens, Hubert, Dominiek Sandra, and Sally Rice. 1997. Towards an empirical lexical semantics. Human contact through language and linguistics, ed. by Birgit Smieja, 35–54. Wiesbaden: Peter Lang.
Davies, Mark. 2008. The Corpus of Contemporary American English (COCA): 410+ million words, 1990-present. Brigham Young University. http://www.americancorpus.org/.
Everitt, Brian S, and Torsten Hothorn. 2006. A handbook of statistical analyses using R. Boca Raton, FL.: Chapman & Hall/CRC.
Falk, Cecilia. 1993. Non-referential subjects in the history of Swedish. Lund: Department of Scandinavian Languages, University of Lund.
Firth, J. R. 1957. Papers in linguistics 1934-1951. London: Oxford University Press.
Freeze, Ray. 1992. Existentials and Other Locatives. Language 68.553–595.
Gelman, Andrew, and Jennifer Hill. 2007. Data analysis using regression and multilevel / hierarchical models. Cambridge: Cambridge University Press.
Goldberg, Adele E. 1995. Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Greenacre, Michael. 2007. Correspondence analysis in practice. 2nd ed. Boca Raton, FL.: Chapman & Hall/CRC.
Gries, Stefan Th. 2011. Commentary: corpus-based methods. Current methods in historical semantics, ed. by Kathryn Allan and Justyna Robinson, 184–195. Berlin: Mouton de Gruyter.
Gries, Stefan Th, and Dagmar S Divjak. 2010. Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal. Quantitative methods in cognitive semantics: corpus-driven approaches, ed. by Dylan Glynn and Kerstin Fischer, 333–354. Berlin: Mouton de Gruyter.
Harrell, Frank E. 2012. rms: Regression Modeling Strategies. http://CRAN.R-project.org/package=rms.
Ingham, Richard. 2001. The structure and function of expletive there in pre-modern English. Reading working papers in linguistics 5.231–249.
Iyeiri, Yoko, Michiko Yaguchi, and Yasumasa Baba. 2011. Principal component analysis of turn-initial words in spoken interactions. Literary and Linguistic Computing 26.139–152.
Jenset, Gard B. 2010. A Corpus-based Study on the Evolution of There: Statistical Analysis and Cognitive Interpretation. Bergen: University of Bergen (PhD thesis). http://hdl.handle.net/1956/4444.
Jenset, Gard B, and Barbara McGillivray. 2012. Multivariate analyses of affix productivity in translated English. Quantitative Methods in Corpus-Based Translation Studies, ed. by Michael P Oakes and Meng Ji, 301–323. Amsterdam: John Benjamins Publishing Company.
Kako, Edward. 2006. The semantics of syntactic frames. Language and cognitive processes 21.562–575.
Kako, Edward, and Laura Wagner. 2001. The semantics of syntactic structures. Trends in Cognitive Sciences 5.102–108.
Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.
Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English. 2nd ed.
Kroeber, A. L, and C. D Chrétien. 1937. Quantitative Classification of Indo-European Languages. Language 13.83–103.
Lakoff, George. 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Langacker, Ronald W. 1988. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind by George Lakoff - Review by Ronald W. Langacker. Language 64.384–395.
Lenci, Alessandro. 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics 20.1–31.
Lynn, Henry S., and Charles E. McCulloch. 2000. Using Principal Component Analysis and Correspondence Analysis for Estimation in Latent Variable Models. Journal of the American Statistical Association 95.561–572.
Lyons, John. 1967. A Note on Possessive, Existential and Locative Sentences. Foundations of Language 3.390–396.
Manning, Christopher, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA.: MIT Press.
Meehl, Paul E. 1990. Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles that Warrant It. Psychological Inquiry 1.108–141. doi:10.1207/s15327965pli0102_1.
Nenadić, Oleg, and Michael Greenacre. 2007. Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package. Journal of Statistical Software 20.1–13.
Newmeyer, Frederick. 1998. Language form and language function. Cambridge, MA.: MIT Press.
Pérez-Guerra, Javier. 1999. Historical English syntax: A statistical corpus-based study on the organisation of Early Modern English sentences. Lincom studies in Germanic linguistics 11. München: LINCOM.
R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna. http://www.r-project.org.
Redington, Martin, Nick Chater, and Steven Finch. 1998. Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science 22.425–469.
Rencher, Alvin C. 2002. Methods of Multivariate Analysis. 2nd ed. New York: Wiley-Interscience.
Sandra, Dominiek, and Sally Rice. 1995. Network analyses of prepositional meaning: Mirroring whose mind—the linguist’s or the language user’s? Cognitive Linguistics 6.89–130. doi:10.1515/cogl.1995.6.1.89.
Saussure, Ferdinand de. 1983. Course in general linguistics. London: Duckworth.
Schütze, Hinrich. 1998. Automatic word sense discrimination. Computational Linguistics 24.97–123.
Stefanowitsch, Anatol, and Stefan Th Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International journal of corpus linguistics 8.209–243.
Taylor, Ann, Anthony Warner, Susan Pintzuk, and Frank Beths. 2003. The York-Toronto-Helsinki Parsed Corpus of Old English Prose.
Traugott, Elizabeth Closs. 1992. Syntax. The Cambridge History of the English Language, ed. by Richard Hogg, I: Old English:168–289. Cambridge: Cambridge University Press.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. 4th ed. New York: Springer.
Williams, Alexander. 2000. Null subjects in Middle English existentials. Diachronic syntax: Models and mechanisms, ed. by S. Pintzuk, G. Tsoulas, and A. Warner, 285–310. Oxford: Oxford University Press.