A Proposed Approach to Handling Unbounded Dependencies in Automatic Parsers


  • 8/13/2019 A Proposed Approach to handling unbounded dependencies in automatic parsers


    University of Alexandria

    Faculty of Arts

    English Language Department

    A Proposed Approach to Handling Unbounded

    Dependencies in Automatic Parsers

    A THESIS SUBMITTED TO THE ENGLISH LANGUAGE

    DEPARTMENT, FACULTY OF ARTS, THE UNIVERSITY OF

    ALEXANDRIA IN FULFILLMENT OF THE REQUIREMENTS FOR

    THE DEGREE OF MASTER OF ARTS IN COMPUTATIONAL

    LINGUISTICS

    By

    Ramy Muhammad Magdi Ragab Abdel Azim

    Supervised by:

    Dr. Sameh Al-Ansary

    Associate Professor of Computational

    Linguistics

    Department of Phonetics and Linguistics

    Faculty of Arts

    Alexandria University

    Dr. Heba Labib

    Assistant Professor of Linguistics

    Department of English Language and

    Literature

    Faculty of Arts

    Alexandria University


    to the memory of

    Professor Hassan Atiyya Taman

    (2010)


    Contents

    Abstract

    Acknowledgements

    Symbols and Abbreviations

    List of Figures

    List of Tables

1. INTRODUCTION 17
   1.1. Motivation 18
   1.2. The Problem 21
   1.3. Aims and Contributions 23
   1.4. Thesis Structure 25
   1.5. UDs Defined 27
   1.6. The Class of UDs 30
      1.6.1. Strong UDs 31
      1.6.2. Weak UDs 32
   1.7. Nomenclature 34
2. UDS AND SYNTACTIC FORMALISMS 37
   2.1. Derivational Approaches 38
   2.2. Generalized Phrase Structure Grammar (GPSG) 44
   2.3. Head-driven Phrase Structure Grammar (HPSG) 52
   2.4. Categorial Grammar (CG) 60
   2.5. Lexical Functional Grammar (LFG) 63


   2.6. Towards an Ontology of Gaps 70
      2.6.1. Gaps between Objects and Subjects 71
      2.6.2. The Distribution of Gaps 73
      2.6.3. The Ontology 77
3. Parsing and Formal Languages 81
   3.1. The Concept of a Formal Language 82
   3.2. Defining a Generative Grammar 83
   3.3. Formal Grammars and their Relation to Formal Languages 84
   3.4. The Chomsky Hierarchy 86
   3.5. Automata 89
   3.6. Parsing Theories and Strategies 91
   3.7. The Universal Parsing Problem 92
   3.8. Major Parsing Direction 93
   3.9. Top-down Parsing 95
   3.10. Bottom-up Parsing 96
   3.11. The Cocke-Kasami-Younger Algorithm 98
   3.12. The Earley Algorithm 100
   3.13. Statistical or Grammarless Parsing 103
   3.14. Text vs. Grammar Parsing: the Nivre Model 104
   3.15. Text Parsing and the Problem of UDs 105
4. UDs Parsing Complexity 108
   4.1. The Rimell-Clark-Steedman (RCS) Test 110
   4.2. The Parsers Set 112


    "In the beginning was the word. But by the time the second word was

    added to it, there was trouble. For with it came syntax, the thing that

    tripped up so many people."

John Simon, Paradigms Lost

    This is a fertile area of research, in which definitive answers have not

    yet been found.

Sag & Wasow, Syntactic Theory: A Formal Introduction


    Abstract

Unbounded dependencies (UDs) are a set of syntactic constructions in the English language that confront syntactic and computational analysis with a number of challenges. Unbounded dependencies cover such constructions as wh-questions, relative clauses, topicalized sentences, tough-movement clauses, it-clefts and many more. Though each of these constructions may have received considerable attention in the syntactic literature, awareness of the unity of all these constructions and of the like-minded behavior that makes them form a coherent whole was largely missing from such treatments.

This thesis explores the linguistic nature of UDs and how they have been handled within the current flurry of syntactic theories. The thesis provides analyses of UDs within the Principles & Parameters model (as representative of derivational approaches to syntax), Generalized Phrase Structure Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Categorial Grammar (as representatives of non-derivational approaches). The thesis then offers a newly devised gaps-ontology that aims at gathering all the information and rules related to the behavior of gaps in unbounded dependencies into one integral theoretical entity that can be utilized in computational environments.

The thesis claims that the problem of parsing UDs is basically a computational problem, not a syntactic one; i.e., the solution lies in the parsing strategy and techniques used, not in the theoretical underpinnings of the different syntactic analyses available. Accordingly, the thesis proposes two types of solutions to the parsing


problem of UDs: the first introduces modifications to the architectural design of the universal parser, subscribing to the highly useful technique of modularity and thus devising what the thesis calls a Small-scale Latent Parser. The other proposes processing modifications, represented by the techniques of gap-threading and memoization. The thesis also claims that top-down parsing cannot be endorsed as a possible strategy for parsing UDs and thus favors bottom-up parsing strategies instead.


    Acknowledgments

My interest in computer science and the study of computational linguistics was triggered 13 years ago when I began my work with Dr. Nabil Ali. Dr. Ali, an engineer by training and the father of Arabic informatics and computational linguistics, brought to my attention many important works and gave me the opportunity to see what a real computational system looks like. The late Prof. Hasan Taman, the original supervisor of this thesis, is the one who should be credited with its current organization. He insisted, against my disposition to work on theoretical issues alone, on a problem-solving method that finds a problem and proposes solutions, which explains the title of the thesis itself (his exact phrasing). Prof. Taman's belief in me and in my academic abilities was crucial in infusing me with the spirit that made me work on this thesis and recover from so many bouts of despair. May his soul rest in peace.

Prof. Azza el-Khouly's and Prof. Sahar Hamouda's kindness and support made this thesis see the light of day. Dr. Sameh al-Ansary's patience, unflinching support and understanding also revitalized the hope of finishing this thesis. Without him I would not have been able to finish the thesis in the first place, not to mention his comments and suggestions that improved the outlook and organization of the work. My debt to him will always be remembered.

Also, Prof. Olga Matar's kind approval to be one of the examiners brought me such happiness, because she was the first one I hoped could supervise my work, even before Prof. Taman, but unfortunately at that time she was unable to slot me into her already full schedule of graduate thesis supervisions. Dr. Heba Labib's sweet kindness and


    Symbols and Abbreviations

_ Underscores represent the position(s) of gaps in a sentence.

/ Represents the SLASH feature, in which the slashed category on the right-hand side of the slash is missing.

e Null or empty categories.

⊕ Adding up (appending) in HPSG.

A↑B A category B with a category A missing somewhere within it, in Moortgat's version of CG.

↓ In LFG, a variable that refers to the lexical item being categorized.

↑=↓ In LFG, an equation meaning that the features of the nodes below and above are being shared.

λ Lambda, a symbol referring to a string consisting of zero elements.

L Language in formal language theory.

G Grammar in the theory of formal languages.

VN Nonterminal variables.

VT Terminal variables.

L(G) The language generated by a grammar G in formal language theory.

(N, Σ, S, P) Elements of a formal grammar G.

→ The left-hand side elements are rewritten as the right-hand side elements, e.g. S → NP VP.

x ∈ S x belongs to, or is a member of, S.

Σ Refers either to the root of a sentence or, in formal language theory, to the terminals of a sentence, in contrast to N, which refers to non-terminals.

• In the Earley parsing algorithm, the dot is used on the right-hand side of a grammar rule to tell us where the rule has reached, or to what extent it has progressed, e.g. S → • VP, [0, 0].

[n] Boxed numbers, or tags, in AVMs indicate structure sharing in HPSG.


    NLP Natural Language Processing

    UDs Unbounded Dependencies

    ST Syntactic Theory

    GPSG Generalized Phrase Structure Grammar

    LFG Lexical Functional Grammar

    HPSG Head-driven Phrase Structure Grammar

    CG(s) Categorial Grammar(s)

    CCG Combinatory Categorial Grammar

    TG Transformational Grammar

    ATN Augmented Transition Networks

    PSG Phrase Structure Grammar

    GB Government and Binding theory

    P&P Principles and Parameters theory

    MP Minimalist Program

    TP A clause consisting of an NP and a VP.

    C Complementizer within a P&P context.

    CP Complementizer phrase within a P&P context.

    DP Determiner Phrase within a P&P context.

    SPEC Specifier within a P&P context.

    CF-PSG Context-free Phrase Structure Grammar.

    FFP Foot Feature Principle within a GPSG context.

    ID Immediate Dominance rules within a GPSG context.

    LP Linear Precedence within a GPSG context.

    HFP Head Feature principle within a GPSG context.

CSLI Stanford University's Center for the Study of Language and Information.

QUE A feature of questions in HPSG.

    REL A feature of relative clauses in HPSG.

    INHER Inheritance feature in HPSG.

    AVM Attribute Value Matrix in HPSG and Unification Grammars.

    SYNSEM Syntax-semantics interface in HPSG.

    SPR Specifiers in HPSG.

    3sg Third person singular in HPSG.


    List of Figures and Tables

    Figures

(1) The Class of Unbounded Dependency Constructions.
(2) A derivational analysis of the sentence Who do you think Jim kissed?
(3) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(4) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(5) A derivational analysis of the sentence Who do you think Jim kissed? (modified).
(6) A derivational analysis of the sentence Which city did Ian visit?
(7) Tree geometry of the structure of a UD in GPSG.
(8) A GPSG analysis of the sentence Sandy we want to succeed.
(9) An HPSG analysis of the sentence Kim, we know Sandy claims Dana hates.
(10) An attribute value matrix (AVM) for the verb sees in HPSG.
(11) An HPSG structural description (SD) of gaps in UDs.
(12) A CG analysis of the sentences Whom do you think he loves? and Who do you think loves him?
(13) A CG analysis of the sentence Who Jo hits?
(14) An LFG analysis of the sentence What Rachel thinks Ross put on the shelf?
(15) The c-structure of What Rachel thinks Ross put on the table?
(16) The f-structure of What Rachel thinks Ross put on the table?
(17) C-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)
(18) F-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)
(19) A subject-predicate analysis of the topicalized sentence The others I know are genuine. (CGEL)
(20) A proposed gaps-ontology.
(21) GAPS AVM.
(22) The Chomsky Hierarchy and its corresponding automata.
(23) A top-down analysis of the sentence Book that flight.
(24) A bottom-up analysis of the sentence Book that flight.
(25) A CKY parsing of the sentence Book the flight through Houston.
(26) An illustration of an attachment ambiguity in the sentence I shot the elephant in my pajamas.


(27) Components of a language processing system.
(28) The structure of a compiler within a language processing system.
(29) The parser within the compiler.
(30) Small-scale Latent Parser.
(31) GAPS AVM.
(32) Flowchart of the UDs SPL algorithm.
(33) Gap-threading in the sentence John, Sally gave a book to.
(34) A parse of the sentence Who do you claim that you like? using Python.
(35) A parser blueprint incorporating all proposed modifications.

    Tables

(1) Position/function of gaps.
(2) Multi-locus gaps.
(3) Formal elements of a PSG.
(4) Chomsky hierarchy grammars and their corresponding automata.
(5) An Earley algorithm analysis of the sentence Book that flight.
(6) Examples of the seven types of UDs used in the RCS Test.
(7) Parser accuracy on the UDs corpus according to the RCS Test.


Chapter 1: Introduction


1.1. Motivation

From the beginning of 2000 until the end of 2003, I worked on natural

    language processing (NLP) solutions for two major companies in Egypt. My first-hand

    experience with actual large-scale parsers made me aware of some problems facing

    those parsers in the processing of certain grammatical constructions. I decided, back

then, to tackle one of the most difficult problems facing those parsers: unbounded

    dependencies.

    Complex syntactic phenomena stand out as a challenge to computational

    implementation in NLP applications. The challenge resides in the problematic nature of

these phenomena: they are syntactically rich in detail and, as a consequence of their complexity, they are interleaved with many other linguistic phenomena. In addition,

    they exhibit a sufficiently perplexing tendency towards being polymorphous and

    diverse. Unbounded dependencies (or, alternatively, long-distance dependencies, filler-

    gap constructions, wh-movement constructions, A-bar dependency constructions,

    extraction dependencies, etc.) are classic examples of how complex and theoretically as

    well as computationally challenging these syntactic phenomena can be. Terry

    Winograd (Winograd 1983) gives us an unequivocal statement about the significance of

UDs to the then-current syntactic theory. He says:


    The need to account for this phenomenon [UDs] is one of the major forces

    shaping grammar formalisms. It was one of the motivations for the original

    idea of transformations, and in some recent versions of TG, the only

    remaining transformations are those needed to handle it. The hold register

    in ATN grammars, the distant binding arrows of LFG, and the derived

    categories of PSG are other examples of special devices that have been

    added on top of simpler underlying mechanisms in order to handle it.

(Winograd 1983: 478)

Since the 1970s, it has been generally assumed that a number of grammatical constructions show such uniform behavior and architecture that they should be

    considered en masse. Chomsky (1977) notes that the rule of wh-movement has, inter

    alia, the following general characteristics:

1- it leaves a gap.
2- where there is a bridge, there is an apparent violation of subjacency.
3- it observes wh-islands. (Chomsky 1977: 86)

    Grammatical phenomena that fall under the rubric of UDs cover the following

    constructions: topicalization, wh-questions, wh-relatives, it-clefts, tough movement, etc.

The most important feature marking all these constructions is the existence of gaps, as

    Chomsky noted above.
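Each of these construction types leaves a gap. Purely for illustration (the example sentences below are my own, not drawn from the thesis), the class can be tabulated in code with `__` marking each gap site:

```python
# Illustrative examples of UD construction types; "__" marks the gap
# left by the displaced (or understood) constituent.
UD_EXAMPLES = {
    "topicalization": "Sam, I like __.",
    "wh-question": "Who did you see __?",
    "wh-relative": "the book which I read __",
    "it-cleft": "It was Sam that I saw __.",
    "tough movement": "This problem is tough to solve __.",
}

# Every UD construction, whatever its type, contains a gap site.
assert all("__" in example for example in UD_EXAMPLES.values())
```

The unifying property across the otherwise diverse constructions is precisely this shared gap site.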

    UDs represent a unique class of grammatical constructions that require some

    especially devised mechanisms in order to successfully process them syntactically and

computationally. A basic example of UDs is given in the following sentence:

(a) Sam, I think he told me he tried to understand __.

The above sentence can be represented in the following, largely theory-neutral, tree diagram:


[Tree diagram: a nested clausal structure (S dominating NP and VP at each level of embedding) linking the fronted NP Sam to the gap in the object position of understand.]

Sam, I think he told me he tried to understand ___

    Sentence (a) above is a topicalized sentence where the object of the sentence is

fronted to add emphasis to the intended message of the construction. The fronting of Sam, i.e. its displacement from the normal object position in English (an SVO language), left a trace in the position of the displaced object that tells us about the history, or the original constitution, of the structure before the displacement process. This trace is usually marked with a hyphen or a dash representing the displaced element. This account to some extent subscribes to a movement-based hypothesis that is part

    of the derivational approach to UDs evidenced in TG, GB, P&P and MP theories of

    syntax.1

1 The example above and the subsequent explanation should not be taken as a sign of the researcher's subscription to the Chomskyan model and its various manifestations and developments. On the contrary, the present work openly criticizes those approaches and spots many deficiencies in them, as will be seen in chapter 2.
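The filler-gap relation just described can be sketched very schematically in code. The following is a hypothetical, hand-rolled illustration (not the parser proposed later in this thesis): each clause is reduced to its verb plus a flag saying whether that verb's complement position is filled, and the fronted filler Sam is threaded inward until it is discharged at the verb whose object is missing.

```python
# Hypothetical sketch of gap handling for sentence (a).
# Each clause is (verb, complement_filled); the filler is passed
# inward and bound at the first verb missing its complement.

def thread_gap(clauses, filler):
    for verb, complement_filled in clauses:
        if not complement_filled:
            return (verb, filler)  # gap site found: discharge the filler here
    return None  # no gap: the filler cannot be licensed

clauses = [("think", True), ("told", True), ("tried", True), ("understand", False)]
print(thread_gap(clauses, "Sam"))  # ('understand', 'Sam')
```

This miniature anticipates, in spirit, the gap-threading technique the thesis later develops as a processing modification.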


1.2. The Problem

UDs represent a unique instance in the history of contemporary syntactic theory (henceforth: ST) and NLP. In fact, they became the raison d'être of a handful of

    extremely influential syntactic formalisms and a number of novel computational

    theorems and techniques. Robust syntactic formalisms such as Generalized Phrase

    Structure Grammar (henceforth: GPSG), Lexical Functional Grammar(s) (henceforth:

    LFG), Head-driven Phrase Structure Grammar(s) (henceforth: HPSG), and modern

    Categorial Grammar(s) all owe, some way or another, many of their formative concepts

    and notational devices to studies of UDs. Ivan Sag (Sag 1982) expresses this fact

    succinctly by saying that:

    Few linguists would take seriously a theory of grammar which did not

    address the fundamental problems of English that were dealt with in the

    framework of standard transformational grammar by such rules as There-

insertion, It-extraposition, Passive, Subject-Subject raising, and Subject-Object raising. (Sag 1982, p. 427)

    UDs happen to be one of those constructions. This is not the whole picture, though.

    UDs form an integrated component in most syntactic theories that have attained a

considerable degree of maturity. Their internal complexity and the sophistication needed to handle them formally and computationally made them a benchmark against which the validity, expressive power, and exhaustiveness of treatment of any given syntactic theory are gauged. Nonetheless, only a few works have paid attention to handling UDs in a uniform manner; works dealing with UDs as a uniform whole, surveying their


treatment in different syntactic formalisms that subscribe to different linguistic frameworks, are quite meager.1

    As regards the computational handling of UDs, there have been various attempts at

    unraveling their syntactic complexity through computers. The basic idea was to test the

    robustness of a particular grammar formalism or computational system (Oltmans 1999).

    The idea of robustness is of the essence here. A computational system is deemed robust

    if it exhibits graceful behavior in the presence of exceptional conditions. Robustness in

NLP is concerned with the system's behavior when input falls outside its initial coverage. For instance, if the system is fed with rules describing and specifying the behavior and structure of relative clauses in English, it will not be negatively affected if the input is not covered in full by these rules. But the question remains: why study UDs from a

    computational viewpoint? The answer to this question seems to be unanimous in the

    computational literature. UDs have been always identified in computational linguistics

    works as a problem. Charniak (1993) mentions the following concerning UDs:

Another standard problem with CFGs is long distance dependencies ... This problem can be solved within a CFG, although it gets a bit complicated. (Charniak 1993: 8-9)

In Mellish et al. (1994) the situation is even more clear-cut:

    The problem is more severe when we come to consider long distance

    dependencies, or more correctly unbounded dependencies in which two unrelated

    pieces of structure may be arbitrarily far apart and not in the same level in the tree.

    (Mellish et al. 1994: 129-130)

1 Only recently have Robert Levine and Thomas Hukari produced a uniform treatment of UDs in their full manifestations in: R. Levine & T. Hukari (2006) The Unity of Unbounded Dependency Constructions. CSLI Publications, Stanford University. Unfortunately, I was unable to secure a copy of the book, but I read a detailed academic review of it by Robert Borsley. However, the main thrust of the book is on the syntax-theoretic aspects of UDs within the framework of HPSG, without any reference to computational issues (see Borsley 2009).


Pereira (1981) finds that one of the most important benefits of connecting parsing with deduction is the "[h]andling of gaps and unbounded dependencies 'on the fly' without adding special mechanisms." Such an excessive interest and engagement with UDs gives us a clear, unhampered view of the status of UDs as a computational

    problem. There seems to be a common realization amongst computational linguists and

    syntacticians of the problematic nature of UDs; a fact that precipitated many of the

    current theoretical frameworks both in pure syntax and in computational linguistics.

Statistically speaking, there is a common belief that UDs and similar phenomena do not represent a sizable portion of any general large-scale corpus, and hence their treatment is ignored. Surprisingly, however, around three quarters of the Wall Street Journal corpus (WSJC) in the Penn Treebank (PTB) contain non-local dependencies, which happen to include UDs most of the time. The internal sophistication of UDs, their typological diversity, the existence of gaps, and their considerable corpus frequency lay bare UDs not only as an engaging problem (syntactically and computationally) but as a compelling one as well.

    1.3. Aims and Contributions:

The main goal of this thesis is to provide outlines of solutions to the problem of UDs as a computational problem. The overall aims of the thesis can be summarized in the

    following points:

• Placing UDs in their proper positions as regards simple, non-theoretic grammatical analysis.


• Uncovering the role of UDs in the formation and evolution of many syntactic theories and formalisms.

• Laying hands on the key element(s) that could enable us to unravel the grammatical complexity of UDs.

• Proposing a syntax-theoretic solution for UDs in terms of a proposed gaps-ontology.

• Highlighting the complexity of processing UDs computationally, i.e. UDs as a parsing problem.

• Proposing two types of solutions regarding the automatic parsing of UDs: the first has to do with the overall parser design (some tweaking and modifications of the parser architecture), while the second offers two parsing techniques that may enable the parser to process UDs in a robust and efficient way.

    However, before embarking on discussing the general outlines of my study, it is of

    paramount importance to examine a question of method which confronts the researcher

    at the outset. A linguist who has been trained on the dynamics and sophisticated details

    of the many linguistic theories currently available while hardly having any formal

    training in computability theory or computer science is unlikely to offer any detailed or

    profoundly technical treatment of a phenomenon such as UDs from a computational

viewpoint. Besides, in order to prepare a serious, computationally viable study of UDs, a linguist needs an intricate set of computational tools that can only be secured and afforded by large commercial/research entities (IBM, Microsoft, Carnegie Mellon, etc.), not to mention the academic and technical expertise that cannot be obviated.


Thus, what a linguist can do on their own is to adumbrate certain guidelines that relate to the interface between theoretical and computational linguistics. Many a research project has been marred because its author was unable to resist the temptation of going computational: a temptation that normally leads to a chaotic morass of computational nuances that, with the wisdom of hindsight, prove quite hard to disentangle. This aptitude towards things computational can be ascribed to the current hype surrounding anything that has to do with computers, without the researcher having any proper knowledge, training or experience to do so.

I have attempted to get around this dilemma by focusing on the theoretical syntactic issues that relate directly to computational parsing, offering a broad, semi-technical approach to solutions. As such, none of the arguments or proposals in the computational section of this work should be judged as technical; they are just a number of theoretical postulations, conjectures and refutations on how, in my opinion and according to my knowledge of computer science, these problems can be solved.

1.4. Thesis Structure

The thesis is broadly divided into three sections: the first focuses on the extensive

    theoretical backdrop of the phenomenon, providing an eclectic approach towards a

uniform view of one of the lynchpin components of both the theoretical division of the work and the computational one: gaps. The second represents a rough treatment of the computational and parsing problems involved, using the first section as a springboard. The third represents the researcher's contribution to the problem of UDs


parsing by giving two sets of proposed modifications at the architectural and processing levels of the parser.

The first section can be seen as the syntax-theoretic part that deals with the definitions, typology and grammatical analysis of UDs (Chapters 1 and 2) and with how a number of syntactic theories and formalisms dealt with them. In addition to the analytical exposition, this section is permeated with critiques of those theories and formalisms in their treatment of UDs, along with an attempt (a perfunctory one, though) at digging up their intellectual milieus and methodological underpinnings (Chapter 2). Section 2.6 proposes a gaps-ontology in which an eclectic, but hopefully harmonious, mélange of the theoretical component of gaps and gap handling is offered. This concludes the syntax-theoretic section of the thesis.

The second section of the thesis focuses on parsing theory and its roots in the study of formal languages (Chapter 3). Sections 3.6-3.14 discuss the various strategies and techniques of parsing available in the literature. Chapter 4 considers the complexity of UDs parsability as evidenced in a recent computational experiment. Sections 4.3-4.5 examine the architecture and design of mainstream parsers and how they are built.

    The final section of the thesis represents the contributions part of the work where the

proposed modifications mentioned earlier are found. Chapter 5 proposes architectural and design modifications to the universal parser by introducing the notion of modularity and by devising a Small-scale Latent Parser. Chapter 6 proposes the next set

    of modifications that relate to the processing of the parser itself. This final section

    concludes with a brief account of the conclusions of the thesis.


    1.5. UDs Defined

    The syntactic phenomenon of unbounded dependencies has been, as alluded to above, a

    major springboard for many theoretical proposals and syntactic formalisms. Naturally,

    this multiplicity of origins generated concomitantly a multiplicity of definitions and

    designations in the literature. First, I will look at the different definitions of UDs and

    how these differences can be accounted for. Then I will survey the various designations

    found in the relevant syntactic literature.

    The concept of "unbounded dependencies" was first introduced by Gerald Gazdar

    (1981) to refer to a set of syntactic structures handled within transformational

    frameworks in terms of movement or, more specifically, wh-movement. The use of the

    adjective "unbounded" in such contexts, however, goes back to J. Bresnan (1976)

    during the heyday of transformational approaches to grammatical analysis. Originally,

    however, the idea of "unboundedness" is a mathematical concept used in algebraic and

    computational studies of unbounded operators, set theory, number theory and

algorithmics (Gowers 2009). The mathematical undertones of the term will be discussed in the following section.

    Crystal (2008) defines an unbounded dependency as

    [a] term used in some theories of grammar (such as GPSG) to refer to a

    construction in which a syntactic relationship holds between two

    constituents such that there is no restriction on the structural distance

    between them (e.g. a restriction which would require that both be

    constituents of the same clause); also called a long-distance clause. In

    English, cleft sentences, topicalization, wh-questions and relative clauses

    have been proposed as examples of constructions which involve this kind

    of dependency; for instance, a wh-constituent may occur at the beginning


    of a main clause, while the construction with which it is connected may be

    one, two or more clauses away, as in What has John done?/What do they

    think John has done?/ What do they think we have said John has done?,

    etc. In GB theory, unbounded dependencies are analyzed in terms of

    movement. In GPSG, use is made of the feature SLASH. The term is

    increasingly used outside the generative context1. (Crystal 2008: 501)

Crystal's definition merits a moment of analytical contemplation. First, we need to establish that Crystal (2008) is a relatively basic specialized dictionary targeted at professional as well as lay readers. This means that encountering detailed argumentative analyses of linguistic phenomena in his work would be a rare occurrence.

    He establishes his definition of UDs upon an abstract postulate that describes UDs as

    having a syntactic relationship between two constituents "such that there is no

    restriction on the structural distance between them." The idea of having no restriction

on the structural distance between two constituents is a mathematically or logically oriented idea rather than a natural-language-based one. In other words, natural language

    cannot permit such infinitely continuous clausal concatenations. It has to have a bound

    (i.e. a sentence must end somewhere in a linguistic text). The idea of unboundedness is

    thus a potentiality rather than an actuality. Mathematically-oriented thinking about

    language, however, has a natural proclivity towards abstraction and higher-order

    1 The final two sentences in Crystal's definition are interesting from an error analysis viewpoint,

    however. First, he describes GB as handling UDs in terms of movement, which is essentially correct.

However, he continues his description by stipulating another fact about the handling of UDs in GPSG through the feature SLASH. The feature SLASH, as we will see later, is postulated in GPSG to account primarily for the existence of gaps in UDs, while movement is described only as the main technique for handling UDs in GB. This implicitly, and incorrectly, suggests that GB theory has no theorem for handling gaps. Second, Crystal describes GPSG, HPSG, LFG and CGs as theories "outside the generative context." In fact all these theories are "generative" in essence; they are only non-transformational.


language. A more linguistically realistic term would be "long-distance dependencies",

    which was later adopted by most non-transformational syntactic theories and syntactic

    formalisms handling the phenomenon of UDs.

    Trask (1993) defines UDs in a more poised manner. He notes how UDs present "a

    major headache for syntactic analysis," and that "all sorts of special machinery have

    been postulated to deal with them." He takes a more development-oriented approach to

the handling of the phenomenon: for example, he mentions that classical TG made

    liberal use of the theoretically problematic unbounded movement rules, and that GB

    and GPSG both reanalyzed UDs in terms of chains of local dependencies. GB used

    traces and GPSG came up with a feature SLASH. LFG, on the other hand, used arcs in

    its f-structures. I shall deal with all these formative concepts in more detail later in this

    work.

    Matthews (1997) defines the phenomenon of UDs as a "[r]elation between syntactic

    elements that is not subject to a restriction on the complexity of intervening structures."

    His definition is a restriction-based one, bearing in mind the formative concepts of

    island and cross-over constraints.

    Another definition based on psycho-syntactic realization of UDs is found in Slack

(1990). According to him, UDs represent a unique linguistic phenomenon. He writes:

    One linguistic phenomenon which, more than any other, focuses on the

    problem of addressing structural configurations is that of unbounded

dependency. Typically, in sentences like The boy who John gave the book to __ last week was Bill, the phrase The boy is taken as the filler for the missing argument, or gap, of the gave predicate, as indicated by the underline. At the level of constituent structure there are no constraints on the number of lexical items that can intervene between a filler and its corresponding gap. (Slack 1990: 268)

    Slack (1990) dissects the phenomenon of UDs in a more profound manner. He states

    that UDs belong to a class of linguistic phenomena in which the structural address of

    an element is determined by information which is only accessible over some arbitrary

distance in the structure. According to him, it is necessary to determine the address of the gap to which a filler belongs. The arbitrariness of the distance separating the gaps and their fillers in the input strings makes the specification of the set of potential

    predicate-argument relations that the filler can be involved in (and thus the

    identification of a direct address of the gap) quite an impossible task (ibid.).

The foregoing definitions can be classified as non-partisan, i.e. they do not subscribe to any particular syntactic theory, framework or formalism. Also, being mostly dictionary entries, they are naturally confined by the constraints of brevity, simplicity and neutrality. Beyond such encyclopedic definitions, it should be noted that the study of UDs was originally formulated within more arcane journal articles and research monographs. In that regard, Gazdar et al. (1985) present the first perspicuous and formally rigorous definition of UDs. I shall not dwell further here on GPSG and its treatment of UDs, for I have included a whole section dedicated to this classic and most influential treatment of UDs (see 2.2.).

    1.6. The Class of UDs:

    Any rigorous treatment of the phenomenon of Unbounded Dependencies should rest on

    a uniform, holistic comprehension of its nature. By "holistic" I refer to the necessity of


    treating UDs in an undivided manner; i.e. studying relative clauses, wh-questions or

    topicalized constructions separately will not shed enough light on the nature and

    dynamics of the phenomenon. The study of UDs should be applied to the complete set

    of constructions recognized and classified as unbounded dependency constructions.

    These constructions are included within the following two subsets: strong UDs and

    weak UDs.

    1.6.1. Strong UDs:

    In what sense is the first subset of UDs "strong"? "Strength" here is rather a misnomer

    for compatibility or isomorphism. They are strong because they require the filler and

the gap to be of the same syntactic category. According to Pollard & Sag (1994: 157-158), the first subset clearly represents strong UDs because there is an overt constituent

    in a non-argument position (sentences 1-5 group A) (normally the wh-phrase) that is

    strongly associated with the gap indicated by "_". Strong UDs include the following

    structures:

    GROUP (A)

Topicalization:

(1) This sort of problemi, my motherj is difficult to talk to _j about _i.1

Wh-questions:

(2) Which violini are these sonatasj difficult for them to play _j on _i?

Wh-relative clauses:

(3) This is the booki that the manj we told the story to _j bought _i.

It-clefts:

1 Underscores and small subscripts (j, i, etc.) in this and the following sentences represent gaps or empty elements (traces of nominal or pronominal antecedents); this is a notational convention found in the majority of syntactic analyses of UDs and similar grammatical constructions.


(4) It is Kim whoi Sandy loves _i.

Pseudo-clefts:

(5) This is whati Kim loves _i.

    1.6.2. Weak UDs:

    Weak UDs, on the other hand, have no overt filler in a non-argument position

    (sentences 1-4 group B); instead they have a constituent in an argument position that is

    "loosely" co-referential with the gap or the trace. Weak UDs include the following

    structures:

    GROUP (B)

Tough movement:

(1) Sandyi is hard to love _i.

Purpose infinitives:

(2) I bought iti for Sandy to eat _i.

Non-wh relatives:

(3) This is the politiciani Sandy loves _i.

Non-wh clefts:

(4) It's Kimi Sandy loves _i.

Two important points have to be mentioned here. First, UDs are indeed unbounded, which means that the dependency may, theoretically speaking, extend ad infinitum. Second, there is a syntactic category-matching condition between the filler and the gap, especially in strong UDs. The following examples illustrate these two points:

    (1)

    a) Kimi, Sandy trusts _i.

    b) [On Kim]i, Sandy depends _i.

    (2)

a) Kimi, Chris knows Sandy trusts _i.

b) [On Kim]i, Chris knows Sandy depends _i.


    (3)

a) Kimi, Dana believes Chris knows Sandy trusts _i.

    b) [On Kim]i, Dana believes Chris knows Sandy depends _i.

    In (1) the gap is an argument of the main clause, in (2) it is an argument of an

    embedded complement clause, and in (3) it is an argument of a complement clause

    within a complement clause. Mathematically speaking, there is no bound on the depth

of embedding. The following diagram represents the above classification more clearly.

    Figure (1) The Class of Unbounded Dependency Constructions
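The unboundedness illustrated in (1)-(3) above, where each added complement clause lengthens the filler-gap distance without any grammatical limit, can be mimicked by a small generator. The function below is a purely illustrative sketch of mine (the function name and word lists are hypothetical, not part of any formalism discussed in this thesis):

```python
# Illustrative sketch: topicalized sentences with arbitrarily deep embedding,
# modeled on examples (1)-(3) above. The function and word lists are
# hypothetical illustrations only.
def topicalized(depth):
    """Build 'Kim, (Dana believes / Chris knows ...)* Sandy trusts _.'"""
    embedders = ["Dana believes", "Chris knows"]
    middle = " ".join(embedders[i % 2] for i in range(depth))
    clause = (middle + " " if middle else "") + "Sandy trusts _"
    return "Kim, " + clause + "."

print(topicalized(0))  # Kim, Sandy trusts _.
print(topicalized(2))  # Kim, Dana believes Chris knows Sandy trusts _.
```

Nothing in the grammar caps the `depth` argument; only performance factors (memory, attention) bound actual sentences, which is precisely the sense in which unboundedness is a potentiality rather than an actuality.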

    Evidently, the class of UDs has a rich taxonomical structure that justifies its

    complexity. As noted above, studying each of the branches in the above tree diagram

on its own will yield only insubstantial insights into UDs. As a first approximation, the thing that gathers all these different syntactic constructions under a uniform category is the existence of a "gap" somewhere in the construction. Paradoxical as it might seem,


the existence of gaps or missing elements in the sentence is the common denominator that holds all the above branches under one node: UDs. That is why I allocate a

    special section for the handling of gaps in UDs later in this work (see Chapter 4). Also,

    I found out that an eclectic theory of gaps might be a step towards a better and more

    profound comprehension of the phenomenon of UDs and the more general

    phenomenon of gapping.

For the sake of brevity and clarity, the present work will focus mainly on strong UDs throughout the proposed analyses and critiques. Weak UDs will be mentioned sporadically throughout the work, though they will not receive a proper treatment in their own right. The partial exclusion of weak UDs from the work will hardly affect the treatment of the overall phenomenon. Strong UDs have all the features that we need in order to analyze UDs. Weak UDs, on the other hand, are more of a subset of strong UDs: a fact that makes omitting the handling of weak UDs a reasonable act in the footsteps of Ockham's razor.

    1.7. Nomenclature

    UDs have been variously termed in the literature. Y. Falk (2006: 316) recognized the

    following designations: extraction, long-distance dependencies, wh dependencies

    (or wh-movement), A' dependencies (or A' movement), syntactic binding,

    operator movement, and constituent control. The concept owes its multifarious

    terminological manifestations to different realizations of its nature and functions. Each

    linguistic school or syntactic formalism saw UDs according to its defining

    characteristics and theoretical grounding. Transformational theories (such as


    Chomsky's GB, P&P and MP), for instance, have essentially a dynamic, movement-

    based conception of most linguistic constructions; a fact which clearly explains the use

of such terms as "wh-movement, A' movement, syntactic binding," etc. By contrast,

    non-transformational theories (such as GPSG, HPSG, CG) proceed from a static

    monostratal1 conception of linguistic constructions, hence their use of such terms as

    "unbounded dependency constructions and long-distance dependencies."

    Terminologically speaking, the term "extraction" is the only common ground where

    transformational and non-transformational theories meet (on the use of extraction in

    non-transformational contexts see Sag 1994).

1 This term refers to the idea that syntactic structures are essentially monostratal, i.e. they consist of only one level of representation, which is an apparent surface level. The Chomskyan postulate of a deep structure is irrevocably repudiated within this monostratal framework. Gazdar et al. (1985) was the first unequivocal statement of this theorem, on which the whole frameworks of GPSG, HPSG and DCG are based. For more details see Horrocks (1987), Gazdar et al. (1985), Sag et al. (1994), Sag et al. (2003), Brown ed. (2006).


    Chapter 2:

    UDs and Syntactic Formalisms

    2.1. Derivational Approaches to UDs

    Most of the reviews of the literature I came across in academic theses and books

    dealing with UDs, from a historiographical viewpoint, seem to be a disparate collation

    of information that hardly precipitates profound understanding or evaluation of the

    intellectual context that spawned and fostered the growth of syntactic theory. This is

    not the case here as far as I hope. My faith is that syntactic theory (and its handling of

    UDs) can hardly be understood or profoundly appreciated without a firm belief in the

    utility of coming to grips with the intellectual milieu that made such scholarly feats

    possible. Fortunately, the historiography of UDs in both the syntactic and the

    computational realms is as much variegated as could help build a mosaic that is

    informative, insightful and sufficiently panoramic. I believe, thus, along with Tomalin

    (2006) that

    [i]t could hardly be claimed that to consider the aims and goals of

contemporary generative grammar, without first attempting to comprehend something of the intellectual context out of which the theory developed, is

    to labour in a penumbra of ineffectual superficiality. (Tomalin 2006: 20)

    Another important factor that necessitates this line of research has to do with UDs

    themselves. The study of UDs has been a major formative force in the field, a fact that

    made it a prerequisite (and a keepsake) for anyone embarking on a serious study of


    syntactic theory. The inherent complexity of unbounded dependency constructions and

the challenges they posed for syntacticians of different persuasions, and the various

    analytical strategies and tools proposed to handle them endowed these constructions

    with a level of significance unprecedented in the field. That is why I adopted a

    historical-cum-theoretical approach in studying them, because, as far as I can see, this

    is the approach that is the most felicitous and the most enlightening as well.

    Historically, UDs have been studied according to two different approaches: the

    transformational and the non-transformational.1 Transformational approaches analyze

UDs from a movement-based perspective. The original position of the filler of a UD is marked with an underscore (as in [a] below); the filler then changes its location through a series of movements till it reaches the leftmost position in the tree.

    (a)

    1. Which car does John think you should purchase_?

    2. That book you should read_.

    3. This is the car which_ John told me he thinks I should purchase_.

    4. Whom do you think Jim kissed _?

    Sentence (4) (see Carnie 2006: 325) can be represented according to a transformational

    (derivational) framework like the following2:

1 Transformational approaches have also been known as derivational approaches, because they depend on processes that derive, via transformations, the final output of a sentence from certain hypothesized deep structures to their final realizations as surface structures (see Bussmann 1996; Trask 1993; Radford 2003).
2 The version used here is a recent version of the transformational enterprise known as P&P (Principles and Parameters), which is the version before the last emendation stated in Chomsky's The Minimalist Program (1995).


    Figure (2)

According to TG analyses, this is the original deep representation of the sentence, where the wh-word is situated at the bottom of the tree. This means that in order to move whom to its proper position a number of movements have to be made. These movements can also be illustrated in the following tree (see Carnie 2006: 326):


    Figure (4)

The two arcs in the above figure represent the two "hops" Carnie just referred to. Now we can have the correct S-structure, where the wh-phrase will be situated at its rightful initial position in the tree, as shown in figure (5) (Carnie 2006: 328):


    Figure (5)

Fodor (1978) pointed out that the effects of Wh-movement are not strictly local. The s-structure position of a wh-phrase can be arbitrarily far from its d-structure position. The sentence Which city did Ian visit? can serve as an example:


    Figure (6)

The analysis proceeds by creating the appropriate CP structure, attaching the phrase which city in the [SPEC, CP] position. It then accounts for did by attaching it to the C position. Next, it handles the verb visit by identifying it as a verb that requires an NP. The analysis then recognizes an antecedent for which there is no argument position; here comes the role of the wh-trace (t), which is attached to a post-verbal NP node (Gorrell 1995: 132-133). The fundamental line of argument evident in transformational analyses proceeds from a psychological springboard entrenched in hypothetical reasoning that hardly accounts for the computational handling we aspire to study.
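The step-by-step procedure just described (attach the wh-phrase in [SPEC, CP], attach did to C, then posit a trace when visit finds no overt object) can be caricatured in a few lines of code. This is only a toy sketch under my own simplifying assumptions (a one-verb lexicon and a flat token list, with no real phrase structure), not an implementation of Gorrell's model:

```python
# Toy sketch of filler-gap resolution for "Which city did Ian visit?",
# loosely after Gorrell's (1995) step description. The lexicon and the
# flat token list are simplifying assumptions, not Gorrell's model.
TRANSITIVE = {"visit"}  # verbs that subcategorize for an NP object

def resolve_gap(tokens):
    """Bind each posited post-verbal trace to the stored wh-filler."""
    filler = None
    bindings = []
    for i, tok in enumerate(tokens):
        if tok.startswith("which"):          # wh-phrase: hold it as a filler
            filler = tok
        elif tok in TRANSITIVE and filler is not None:
            if i + 1 >= len(tokens):         # no overt object follows:
                bindings.append((tok, filler))  # posit a trace bound to filler
    return bindings

print(resolve_gap(["which city", "did", "Ian", "visit"]))
# [('visit', 'which city')]
```

Even this caricature shows why the dependency is nonlocal: the filler must be held in memory for an arbitrary number of intervening tokens before the gap that discharges it is found.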

    2.2. UDs in Generalized Phrase Structure Grammar (GPSG)

The domineering nature of Noam Chomsky's transformational grammar generated a sense of dissatisfaction among leading younger linguists during the early 1980s. Gerald Gazdar was one of those leading linguists. Back at that time, linguists began to call what is now GPSG "Gazdar Grammar". Gerald Gazdar, however, did not like that, nor did his collaborators: Ewan Klein, Geoffrey Pullum and Ivan Sag. Their main focus was on the study of PSGs (Phrase Structure Grammars), but they did not have a specific name for what they were doing. After attending a talk by Emmon Bach called Generalized


    It has to be noted that GPSG was something of a revolution against Chomskyan TG

(Gazdar et al. 1985; Horrocks 1987; Borsley 1999; Falk 2006). Since the class of

    UDs was one of the constructions that TG adherents used as proof of the inadequacy of

    the class of Phrase Structure Grammars (PSGs) in describing natural language syntax,

    Gazdar and his collaborators decided to show that this assumption was basically

    mistaken (Falk 2006).1 Thus the earliest work in GPSG dealt with UDs in greater

    detail.

Gazdar's paper opened up new avenues of research in theoretical linguistics and formal computer science, producing four years later the seminal and foundational work by Gazdar, Klein, Pullum and Sag (1985).2 In Gazdar et al. (1985) we encounter the first formally perspicuous exposition of the nature of UDs. According to Gazdar et al. (1985: 137), an unbounded dependency construction is one in which

(i) a syntactic relation of some kind holds between the substructures in the construction, and
(ii) the structural distance between these two substructures is not restricted to some finite domain (e.g. by a requirement that both be substructures of the same simple clause).
(iii) topicalization, relative clauses, constituent questions, free relatives, clefts, and various other constructions in English have been taken to involve a dependency of this kind.

1 GPSG was a frontal attack on transformational grammar. It not only attacked the lynchpins of the concept of transformations, but also showed how unfounded other sacrosanct concepts such as Deep Structure vs. Surface Structure are. Another attack was against the permeating psychologism of TG and its claim to universality. Against this backdrop, GPSG was founded on a monostratal model (a model that accepts no dualisms or hypothesized deep vs. surface dichotomies) with an intricate use of set-theoretic concepts, precisely to cleanse the syntactic model of any possible trace of psychologism. In spite of the rigorous nature of GKPS, the first chapter has the air of a revolutionary manifesto, and it is by far the authors' clearest statement on what GPSG is (see Gazdar et al. 1985: 1-16).
2 Sometimes abbreviated as GKPS, based on the authors' initials.

    According to Gazdar et al. (1985: 137), it is analytically useful to think of such

    constructions, conceptualized in terms of tree geometry (in the usual way, root up and

    leaves down), as having three parts: the top, the middle and the bottom. The top is the

    substructure which introduces the dependency, the middle is the domain of the structure

    that the dependency spans, and the bottom is the substructure in which the dependency

ends, or is eliminated. Gazdar et al. (1985: 138) illustrate their proposed tree geometry

    as follows:


    Figure (7) Tree geometry of the structure of a UD in GPSG

Gazdar et al.'s (1985: 138) theory of UDs claims that the principles which govern the bottom and the middle are completely general in character, in that all types of UDs

    receive the same treatment. The idea is that the proposed analysis of UDs will be

    focused on the middle of the construction which involves no more than the feature

    SLASH along with feature instantiation principles. Of these principles the Foot Feature

    Principle (FFP) is the most important.


The central claim of the GPSG analysis of unbounded dependencies is that these

    dependencies are simply a global consequence of a linked series of mother-daughter

    feature correspondences.

The main formative components of GPSG are a set of metarules that generate other rules, such as Immediate Dominance (ID) and Linear Precedence (LP) rules, along with feature instantiation principles, such as the FFP, the Head Feature Principle (HFP) and SLASH. The feature SLASH, however, is our mainstay in the analysis of UDs, because it represents and accounts for the behavior of the most significant element in an unbounded dependency construction: gaps. But what is a SLASH?

    When we write down in quasi-algebraic notation that we have, for instance, a set

    A/B, this means that the set A lacks or is missing the element B. The SLASH or [/] is

    originally an algebraic symbol for a missing element. The value of the SLASH feature

    will be a category corresponding to a gap dominated by the categories bearing a

    SLASH specification. A gap is created by some Immediate Dominance (ID) rule which

    introduces a constituent that has a SLASH feature; the feature-matching principles of

    GPSG push it down the head path of the category on which it first appears, and a

multiplicity of metarules allow it eventually to be cashed out as a gap at the bottom of a

    nonlocal tree structure (see Levine 1989: 124-5). The best way to come to grips with

    the effects of the FFP apropos slash categories is to inspect an example of its

    application. Consider the following ID rules:


    According to the above rules and according to feature instantiation principles, we can

    predict that the resulting structures will be the following:

Though the above notation seems a little difficult to follow, it is actually very straightforward. Rule (e.) above, for instance, refers to a verb phrase (VP) missing (/) a noun phrase, in this case an object (NP). This conforms with ID rule number (45),1 which deals with transitive verbs, such as approve of, that take a prepositional object as part of their subcategorization; the prepositional phrase itself lacks this object (PP/NP).

1 Numbers in square brackets refer to a list of rules provided as an appendix in Gazdar et al. (1985: 245-9).

    Now, we need to see an example illustrating all the formal nuances mentioned

    above. A topicalized sentence like (a) will suffice.

(a) Sandy we want to succeed.

The normal ordering of this sentence would read We want Sandy to succeed. However, a topicalized structure such as (a) within the framework of GPSG can be represented according to the following tree:

    Figure (8)

The basic idea in the GPSG analysis of UDs is that the constituent containing the gap has a missing-element feature (Falk 2006). This is represented by the [+NULL] e above. The constituent headed by want is a VP/NP (a verb phrase with a missing object). The e (empty) element is a pronominal that refers back to Sandy. The whole clausal constituent

    containing this VP/NP is S/NP, since it is missing the same NP as the VP it dominates.
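The sharing of the SLASH value just described, where the VP/NP and the S/NP bear the same missing-NP specification until the topicalized filler discharges it, can be sketched as a toy check. The 'X/Y' string notation and the function names below are my own illustration, not the FFP machinery itself:

```python
# Toy sketch of GPSG-style SLASH sharing: a mother category inherits the
# SLASH of a slashed daughter, and a filler discharges a matching SLASH.
# The 'X/Y' strings and helper names are illustrative assumptions only.
def percolate(daughters):
    """Return the SLASH value ('Y' in 'X/Y') shared up from the daughters."""
    for d in daughters:
        if "/" in d:
            return d.split("/", 1)[1]
    return None

def discharge(filler_cat, clause_cat):
    """True iff the filler's category matches the clause's SLASH value."""
    return "/" in clause_cat and clause_cat.split("/", 1)[1] == filler_cat

# "Sandy we want to succeed": the VP/NP passes SLASH:NP up to S/NP,
# and the topicalized NP filler Sandy discharges it.
print(percolate(["NP", "VP/NP"]))   # NP
print(discharge("NP", "S/NP"))      # True
```

The point of the sketch is that the dependency is carried entirely by local mother-daughter feature matching; no element ever moves.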

    As a result of the above feature sharing, the same element occupies the filler and gap

    positions at the same time, without any indication or sign of movement. This

    movement-less approach to UDs along with a solid formal apparatus (ID & LP rules,

    metarules, FCRs, FSDs and FIPs) catapulted GPSG as a suitable alternative to the

    much disputed TG framework. However, GPSG was short-lived: its sophisticated

    formalism and nuanced quasi-algebraic treatment of complex phenomena such as UDs

    made it forbidding to the majority of linguists during the 80s. But this was not the end

    of GPSG, though. For, it continued its existence, as we shall see in the next chapter, in

    a different guise, this time as the much more successful framework of HPSG (Head-

    driven Phrase Structure Grammar).

    2.3. UDs in Head-driven Phrase Structure Grammar (HPSG)

According to Sag et al. (1999: 435), HPSG was formulated in an intellectually eclectic environment at Stanford's Center for the Study of Language and Information (CSLI). During the 1980s, CSLI was incubating a number of theories, approaches and frameworks that aimed at formulating a kaleidoscopic view of language and its

    mechanisms. Sag and Pollard established their theory of HPSG on a variety of theories

    and formalisms: situation semantics, data type theory, TG, GPSG, CG and Unification

    Grammars. This eclectic formation endowed HPSG with an undeniable flexibility on

    the theoretical and formal levels.


There are three known hallmarks in the history of HPSG: the publications of Pollard and Sag (1987), Pollard and Sag (1994) and Sag and Wasow (1999). These are

    hallmarks in the sense that they marked some definitive changes in the views of the

    authors or the formal apparatus of HPSG in general.

Unlike GPSG, HPSG shifted its attention from rules to features. This is clearly manifested in its adoption of Unification Grammar's typed (or sorted) feature structures. A typed feature structure consists of features representing linguistic entities (words, phrases, sentences) and values that identify the dimensions of those features. For example, the feature PERSON in a given feature structure has three values: 1st, 2nd and 3rd. Accordingly, the word you has the property "second person", and this is represented by the feature-value pair [PERSON 2nd]. Pollard and Sag (1994: 8) suggest that the role of their proposed linguistic theory is to give a precise specification of which feature structures are to be considered admissible. Also according to their view, the types of linguistic entities that correspond to the admissible feature structures constitute the predictions of the theory.
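The notion of admissible feature structures can be made concrete with a minimal sketch. The feature inventory below is an illustrative fragment of my own devising, not HPSG's actual type signature:

```python
# Minimal sketch of feature-value pairs and an admissibility check, in the
# spirit of Pollard and Sag's description. APPROPRIATE is an illustrative
# fragment, not HPSG's real feature signature.
APPROPRIATE = {
    "PERSON": {"1st", "2nd", "3rd"},
    "NUMBER": {"sing", "plur"},
}

def admissible(fs):
    """A feature structure is admissible iff every feature bears a licensed value."""
    return all(f in APPROPRIATE and v in APPROPRIATE[f] for f, v in fs.items())

you = {"PERSON": "2nd"}              # the word "you": [PERSON 2nd]
print(admissible(you))                # True
print(admissible({"PERSON": "4th"}))  # False: no such value is licensed
```

On this view, the grammar's predictions are exactly the structures the check lets through: anything else is simply not a linguistic object of the theory.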

UDs have received considerable treatment within HPSG. This could be ascribed to two reasons: the first has to do with the theoretical status of UDs as a sophisticated syntactic phenomenon that many see as a testing ground for any proposed syntactic theory or formalism (see Winograd 1983; Falk 2006). The second has to do with the importance of UDs within HPSG's contributory progenitor, GPSG.¹ However, HPSG took the analysis of UDs some steps further. In HPSG, UDs receive more than the single wh-feature they used to receive in GPSG.

¹ It has to be noted that Ivan Sag, one of the original expositors of GPSG, later became the central figure in HPSG work through his 1987 and 1994 publications with Carl Pollard.


In HPSG they get two distinct features: QUE and REL, for questions and relative constructions respectively (Pollard and Sag 1994: 159). This separation can be accounted for on the ground that the only information that needs to be kept track of in an interrogative dependency is the nominal-object corresponding to the wh-phrase, while in a relative dependency the referential index of the relative pronoun is all that is required (see Pollard and Sag 1994).

Another difference relates to the realization of feature structures in GPSG and HPSG. In GPSG, foot features take the same kind of value, which is normally a syntactic category, while in HPSG, nonlocal features take sets as values.¹ According to Pollard and Sag (1994: 159), this strategy enables HPSG to deal with more sophisticated UDs, such as multiple UDs, as in the following sentences:

1- [A violin this well crafted]1, even [the most difficult sonata]2 will be easy to play _2 on _1.

2- This is a problem which1 John2 is difficult to talk to _2 about _1.

It is noteworthy that in HPSG, strong UDs are analyzed in terms of

a filler-gap conception. This peculiar conception underscores the centrality of the concept of the gap in any treatment of UDs. This is why I think that HPSG is ahead of most other syntactic theories in the analysis of UDs: because of this very gap-based analysis. This competitive edge will be accounted for more clearly later in this work (see ch. ?).

¹ Again, the mathematical, especially algebraic, influence on syntactic theory is very much manifested in this instance, where the use of sets is borrowed from Cantorian set theory.
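The set-valued bookkeeping described above can be sketched in miniature. The category labels and the discharge helper below are invented for illustration and greatly simplify HPSG's actual nonlocal feature machinery:

```python
# Illustrative sketch only: nonlocal dependencies tracked as a set, so a
# single node can carry two SLASH members at once (a multiple UD, as in
# the "violin"/"sonata" example). Labels are simplified stand-ins.

vp = {"CAT": "vp", "SLASH": {"np_sonata", "np_violin"}}  # ...play _2 on _1

def discharge(node, filler):
    """Bind off one member of the SLASH set when its filler is found."""
    remaining = set(node["SLASH"]) - {filler}
    return {**node, "SLASH": remaining}

s1 = discharge(vp, "np_sonata")   # topic 2 bound off
s2 = discharge(s1, "np_violin")   # topic 1 bound off
print(s2["SLASH"])                # set() -- all dependencies resolved
```

A single-valued feature, as in GPSG, could not keep the two pending dependencies apart; a set (or list) can.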


Take the following sentence (Pollard and Sag 1994: 160) as an example of how HPSG analyzes a topicalized clause of the strong UD type:

1- Kim1, we know Sandy claims Dana hates _1.

    Figure (8)

The analysis provided above looks similar, to a great extent, to Gazdar's bottom-middle-top model (see figure 6). In HPSG, the bottom of the arboreal skeleton is where the dependency is introduced, because at the bottom there exists the terminal node that triggers the whole unbounded dependency. This terminal node is associated with a special sign that must be nonempty. In interrogative dependencies, this sign is an interrogative pronoun (what, which, where, etc.) with a nonempty value for the QUE


nonlocal feature, while in relative dependencies, the sign is a relative word (e.g. who, which), also with a nonempty value for the REL nonlocal feature.

What really distinguishes HPSG from previous theories or formalisms is its reliance on associativity: it attempts to associate linguistic objects with each other via a number of concepts and techniques. Central to these is the concept of the inheritance hierarchy, the embodiment of which can be seen in the above tree diagram (figure 8). Instead of the crude movement transformations of all versions of Transformational Grammar, we get here a more computationally sound technique, whereby the traits of a certain linguistic object are inherited from one object to another. The SLASH category in the above tree, for example, is inherited from one stratum of analysis to the next by means of boxed numbers and the feature INHER. So the SLASH feature at the bottom of the dependency passes from daughter to mother up the tree, and the top is where the dependency is discharged or bound off (Pollard and Sag 1994: 160-161). As with GPSG, HPSG is strongly inclined towards computational implementation, because it originally availed itself of many computational models and procedures; and it has to be noted here that the concept of inheritance is a genuine computational procedure that HPSG incorporated into its theoretical architectonic.¹
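The daughter-to-mother passing of SLASH can be mimicked in a toy fashion. The tree shape, node labels, and the inherit_slash helper below are my own simplification, not Pollard and Sag's formal apparatus:

```python
# A toy sketch of SLASH inheritance: each mother's SLASH value is
# collected from its daughters and passed up the tree until a filler
# binds it off at the top. Labels are invented for illustration.

def inherit_slash(label, daughters, bind_off=frozenset()):
    """Mother's SLASH = union of daughters' SLASH, minus what is bound off."""
    slash = frozenset().union(*(d["SLASH"] for d in daughters)) - bind_off
    return {"LABEL": label, "SLASH": slash, "DTRS": daughters}

# "Kim, we know Sandy claims Dana hates __"
gap   = {"LABEL": "np-gap", "SLASH": frozenset({"Kim"})}
hates = {"LABEL": "hates",  "SLASH": frozenset()}
dana  = {"LABEL": "Dana",   "SLASH": frozenset()}

vp    = inherit_slash("vp", [hates, gap])       # SLASH passes up...
s_low = inherit_slash("s",  [dana, vp])         # ...stratum by stratum...
top   = inherit_slash("s",
                      [{"LABEL": "Kim (filler)", "SLASH": frozenset()}, s_low],
                      bind_off=frozenset({"Kim"}))  # ...and is discharged here

print(top["SLASH"])  # frozenset() -- the dependency is bound off at the top
```

The point of the sketch is only that no movement is involved: the same feature value is threaded through successive mothers until the filler discharges it.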

    HPSG uses a number of features to construct what it considers to be a complete

    description of a given linguistic entity. For the description of the syntax-semantics

    interface, for example, it employs a feature SYNSEM that represents the syntactic as

    well as the semantic content of a particular lexical item. This is realized via what HPSG

¹ The idea of inheritance is directly borrowed from computer science, especially from work on genetic algorithms, which resorts to biological jargon and concepts such as inheritance, evolution and survival of the fittest (see Dopico et al. 2009).


    Head-driven phrase structure grammar is a monostratal theory of natural

    language grammar, based on richly specified lexical descriptions which

    combine according to a small set of abstract combinatory principles stated as

    formulae in a constraint logic regulating, for the most part, the satisfaction of

    valence and other properties of syntactic heads. These constraints, applying

    locally, determine the flow of information, encoded as feature specifications,

    through arbitrarily complex syntactic representations, and capture all

    syntactic dependenciesboth local and non-local in elegant and compact

    form requiring no derivational apparatus.

This theoretically rich definition deserves an equally rich analysis. The first fact about HPSG in this definition is that it is monostratal, meaning that it does not subscribe to derivational or transformational theories of natural language grammar (see fn. 1 on p. 28 above). This, of course, reminds us of the early beginnings of GPSG (Gazdar 1982). The second feature that really characterizes the theory of HPSG is its lexicalism: as Levine (2003) puts it, HPSG is based on 'richly specified lexical descriptions'. This highlights HPSG's attention to the value of lexical items as bearers of information and as the glue that binds linguistic descriptions together. In fact, HPSG is head-driven because it relies on lexical heads, such as sees above, in its descriptions of linguistic entities. Finally, the definition gives us a hint concerning HPSG's recourse to mathematical and logico-mathematical jargon in its descriptions of local and non-local (UD) syntactic dependencies in an 'elegant and compact form'.¹ Implied here is

¹ Note here also the use of 'elegant and compact', which is a commonplace description in mathematical and logico-mathematical literature. A mathematical proof, for instance, has to be elegant and compact in the sense that it admits of no logical fallacies, internal inconsistencies or needlessly tortuous sub-proofs.


the idea that a derivational apparatus, as in the GB, P&P and MP formalisms, is essentially inelegant and incompact.

HPSG, then, looks at UDs as filler-gap constructions (Pollard & Sag 1994), or as constructions with gaps (GAPs) that can be resolved by detecting the sites or positions of those gaps and relating them to their fillers via inheritance. This is realized by stipulating what HPSG calls the GAP Principle (Sag & Wasow 1999; Carnie 2003). The GAP Principle states the following:

A well-formed phrase structure licensed by a headed rule other than the Head-Filler Rule must satisfy the following SD¹:

    Figure (11)

This means that the mother's GAP feature subsumes all the GAP values of its daughters. The symbol ⊕ in the diagram above simply refers to the arithmetical notion of adding up, but this time the entities added are not single linguistic objects but lists of linguistic objects (Sag & Wasow 1999: 351). The boxed n above is likewise the arithmetical indication of the idea of 'any number of'. Gaps in HPSG will be explored more thoroughly, and comparatively, later on, alongside other syntactic frameworks.
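The list-appending behavior of ⊕ can be illustrated with a small sketch; gap_principle is an invented name and the categories are simplified stand-ins for Sag & Wasow's feature structures:

```python
# A sketch of the GAP Principle's bookkeeping: the mother's GAP list is
# the append (the + operation below, standing in for ⊕) of its daughters'
# GAP lists. Unlike the set-based SLASH sketches, ⊕ here joins *lists*.

def gap_principle(daughter_gaps):
    """Mother GAP = daughter-1 GAP ⊕ ... ⊕ daughter-n GAP."""
    mother = []
    for g in daughter_gaps:
        mother = mother + g   # list append: the ⊕ operation
    return mother

# A gapless head combining with a complement containing one gap:
print(gap_principle([[], ["NP"]]))        # ['NP']
# Two gapped daughters, i.e. a multiple dependency:
print(gap_principle([["NP"], ["PP"]]))    # ['NP', 'PP']
```

Because lists, unlike sets, preserve order and multiplicity, ⊕ can keep two identical pending gaps distinct, which is exactly what the boxed n in the SD quantifies over.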

¹ SDs stand for Structural Descriptions: the amalgamation of constraints from lexical entries, grammar rules, and relevant grammatical principles. See Sag & Wasow (1999: 68).


    2.4. Categorial Grammar(s)

From a historical vantage point, Categorial Grammars (or CGs) antedate all generative theories of syntax. CG was first formulated against a strictly logical backdrop: it was Kasimir Ajdukiewicz, the famous Polish logician and algebraist of the Lvov-Warsaw school of logic and mathematics, who introduced the idea of functional syntax in his Die syntaktische Konnexität (1935). But Ajdukiewicz's treatment was strictly logico-mathematical, a fact which made his work quite forbidding for linguists.¹ Two decades later, Yehoshua Bar-Hillel (1953), also a logician, came along with a revived interest in Ajdukiewicz's CG, this time combining it with many insights and methods from American linguistics of the 1950s. This new combination of the ideas and methods of mathematical logic and structural linguistics spawned a novel interest in CG in the USA and on the Continent. The interesting thing about Bar-Hillel's revival of CG is his belief in the suitability of CG for machine translation purposes. That explains why computational linguists tend to prefer CG, and other like-minded formalisms, over syntactic theories bereft of such computational aptitude.

As an offshoot of advanced logical and formal studies, CG naturally emphasizes the semantics of natural languages. Unlike other formalisms and theories of syntax, CG has no separate module for semantic processing, for it sees semantics as an inextricable component of syntactic description. In other words, syntax and semantics in CG are one and the same thing: every rule of syntax is,

¹ Besides being excruciating reading even for those initiated in mathematical logic, Ajdukiewicz's paper 'appeared in a Polish philosophical journal and has therefore been unknown to most linguists' (Bar-Hillel 1953: 1).


inherently, a rule of semantics (Wood 1993: 3). CG has the following properties (Wood 1993: 3-5):

(1) It sees language in terms of functions and arguments rather than of constituent structure.

(2) Syntax and semantics are integral.

(3) It is monotonic (monostratal), i.e. it avoids destructive devices such as movement or deletion rules, which characterize transformational grammars.

(4) It takes to its logical extreme the move towards lexicalism, i.e. the syntactic behavior of any linguistic item is directly encoded in its lexical category specifications.

The other peculiar aspect of CG and UDs is the somewhat troubled relationship between the two. Ironically, Bar-Hillel lost faith in CG because he found that it was unable to process discontinuous constructions (such as UDs) (Wood 1993: 23, 104). But the theory of CG during the 1960s was not developed enough to handle such sophisticated syntactic constructions as UDs. Even at that early stage, the intractability of UDs was recognized as a processing fact that any syntactic theory or formalism has to account for efficiently and rigorously.

Classical CG did not offer any straightforward method of dealing with UDs (Wood 1993: 104). However, Ades and Steedman (1982) used the recursive power of generalized composition to reach what they called a 'derivational constituent', which can then be applied backwards to the fronted object, giving the correct semantic interpretation (Wood 1993: 105). A sentence like Who(m) do you think he loves? can be represented, following Ades and Steedman (1982), in the following way:

  • 8/13/2019 A Proposed Approach to handling unbounded dependencies in automatic parsers

    62/149

    62

    Figure (12)
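The composition rule that Ades and Steedman exploit can be illustrated in miniature. The string encoding of categories and the forward_compose helper below are my own toy rendering of the rule X/Y + Y/Z => X/Z, not their parser:

```python
# Toy sketch of (forward) function composition in CG: a category X/Y
# composed with Y/Z yields X/Z, provided the middle categories match.
# Categories are modeled as plain strings for illustration only.

def forward_compose(left, right):
    """(X/Y) composed with (Y/Z) => X/Z when Y matches; else None."""
    x, y1 = left.rsplit("/", 1)
    y2, z = right.split("/", 1)
    if y1 == y2:
        return f"{x}/{z}"
    return None

# Composing step by step, a chain like "do you think he loves" can
# collapse into a single s/np constituent, which the fronted "who(m)"
# then consumes:
print(forward_compose("s/s", "s/np"))    # 's/np'
print(forward_compose("s/np", "s/np"))   # None (middle categories differ)
```

Repeated composition of this kind is what lets the incomplete clause behave as one 'derivational constituent' missing an np, however deep the gap sits.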

Recent advances in CG have produced the more elaborate type-logical categorial grammar. What interests me most in this more advanced formalism is its proposal of a novel procedure for handling gaps in UDs. Bob Carpenter (1997) adopts Moortgat's approach to UDs to account for the existence of gaps and how they should be treated within a CG-based framework. As Carpenter (1997: 203) mentions, Moortgat's analysis rests on proposing an additional binary category constructor, ↑, that can be used to construct categories of the form A↑B. Such a category denotes an A missing a B somewhere within it. For instance, s↑np is a sentence from which a noun phrase has been extracted. The extraction constructor A↑B is a generic form for both A/B and A\B, and may be instantiated as follows:

s↑np = s/np or s\np

which indicate a sentence lacking a noun phrase on its right or left frontier. The use of the SLASH feature in CG is similar to that in GPSG and HPSG; the difference lies in the adoption of feature structures and AVMs in HPSG and the adoption of the Lambek


calculus (a semi-algebraic linear formalism) in CG. An example of how advanced CG handles a UD may be of use here. The phrase Who Jo hits is formally represented in CG according to the following schema (see Carpenter 1997: 206).

Figure (13) A representation of who Jo hits

The postulation of s↑np at the beginning of the relative or interrogative clause (under who) is the notational tool that unravels the unboundedness of the structure, by postulating that there is a missing noun phrase somewhere in the construction.
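The two instantiations of the extraction constructor can be sketched mechanically. In the toy code below, ^ stands in for ↑, and the instantiate helper is invented for illustration:

```python
# Toy rendering of Moortgat's extraction constructor: a category A↑B
# ("an A missing a B somewhere") may be instantiated as A/B (gap on the
# right edge) or A\B (gap on the left edge). '^' encodes '↑' here.

def instantiate(extraction):
    """Expand 'A^B' into its two edge instantiations, A/B and A\\B."""
    a, b = extraction.split("^")
    return {f"{a}/{b}", f"{a}\\{b}"}

print(sorted(instantiate("s^np")))   # ['s/np', 's\\np']
```

A gap in a non-peripheral position is precisely what the plain slashes cannot express and what the extra constructor is introduced to capture.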

    2.5. Lexical Functional Grammar (LFG)

This is the fourth syntactic theory through which I try to explain and unravel the nature of UDs. LFG is one of the most prominent theories of grammar belonging to the generative tradition. It is also one of the theories that subscribe to a non-transformational agenda. Being non-transformational has boosted the theory's potential for a rigorous treatment of UDs, owing to the fact that most non-transformational theories and formalisms are couched in mathematical or semi-mathematical terms. This is the case with LFG.

But in what sense is LFG different from the other theories mentioned above? It differs from GPSG and CG in that LFG is, in fact, a complete theory of language, with a separate explanatory module for the study of language acquisition, universals and cognitive aspects. This is not the case with GPSG or CG, because both of them, and especially GPSG, posed ruthless critiques of the psychologism prevalent in GB and P&P, and both are more devoted to such applications as computational linguistics and AI. LFG is similar to HPSG in that the latter also sustains certain claims to universality and psychological reality. But all of these theories share a staunch rejection of transformational rules and assumptions. They also share an avid interest in lexicalism: all four (GPSG, HPSG, CG, LFG) see the lexicon as the springboard for any viable and true grammatical analysis.

As opposed to GB and P&P, the non-transformational approaches mentioned above see lexical categories as the keys with which we can unravel syntactic riddles, especially the riddle of UDs. That also accounts for the high importance of analyses of UDs within the frameworks of all those theories. GPSG proposed the Head Feature Principle, which restores to lexical items their due powers instead of ascribing all powers to extra-linguistic features and movements, as is the case with transformational grammars (Falk 2001). HPSG, which is a more stringent framework than GPSG (Carnie 2008), bases the entire linguistic analysis on the head sign, which is an instantiation of a certain lexical item or word. CG is even more extreme on the issue of lexicalism; that is why it derives its analytical momentum from certain atomic lexical categories.

LFG is also lexical, or lexicalist, because the lexicon plays a major role in it. In LFG (Dalrymple 2006), the lexicon is richly structured, with lexical relations, rather than transformations or operations on phrase structure trees, as the means of capturing linguistic generalizations. Yehuda Falk (2003) adds to the major tenets of LFG what he calls the Lexical Integrity Principle, which states the following:

Words are the atoms out of which syntactic structure is built. Syntactic rules cannot create words or refer to the internal structures of words, and each terminal node (or leaf of the tree) is a word. (Falk 2003: 4)

The other aspect of LFG has to do with its emphasis on functionalism. The functional part of LFG means that grammatical functions (or grammatical relations) such as subject and object are primitives of the theory, not defined in terms of structural configurations or semantic roles¹ (Dalrymple 2006). LFG grants grammatical functions such as subject and object a universal character: such abstract grammatical functions are at play in the structure of all languages, no matter how dissimilar those languages might appear. The theory assumes that just as languages obey certain universal principles as regards abstract syntactic structures, they do the same regarding the principles of functional organization (Dalrymple 2001). So much, then, for LFG's nomenclature, i.e. the 'lexical' and 'functional' epithets.

¹ This is the standard view of transformational approaches. According to this view, subject and object are not part of the syntax vocabulary, i.e. they are extra-configurational. Those grammatical functions or relations derive from the phrase structure in which they happen to occur. If subjects, for example, can be controlled, this control is attributed to the structural lineaments of the position where the subject occurs. For a more in-depth discussion, see Falk (2003).


C-structure and F-structure:

The two divisions of the formal architecture of LFG are constituent structure (c-structure) and functional structure (f-structure). C-structure is concerned with the description of syntactic structure, while f-structure details the semantic-cum-functional structure of the linguistic entities concerned. The formal machinery of c-structure depends on X-bar syntax, with the addition of a number of techniques and concepts that characterize the LFG theory and its formalism. C-structure can be illustrated by the following figure (Falk 2003), analyzing the clause:

What Rachel thinks Ross put on the shelf

    Figure (14)

According to this description, the empty category (e) is tied to, or bound with, the antecedent filler by what LFG calls metavariables, represented by the up and down arrows. The double-arrow notation has been abandoned in the more recent versions of LFG,


which incorporate functional components into the tree; this can be illustrated in the following:

Figure (15) The c-structure of What Rachel thinks Ross put on the table?

The corresponding f-structure looks like the following:

The previous descriptions are classic representations of UDs, due to Kaplan and Bresnan (1982) and Kaplan and Zaenen (1989) respectively.
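An f-structure of this classic kind can be approximated as a nested attribute-value matrix. The Python dict below is my own rough rendering, in the spirit of figure (16), with the structure sharing between TOPIC and the embedded OBJ modeled by aliasing a single object; the attribute names follow common LFG practice but the details are illustrative:

```python
# Rough sketch of an f-structure as a nested attribute-value matrix.
# The unbounded dependency appears as reentrancy: TOPIC and the deeply
# embedded OBJ are one and the same structure (the same dict object).

what = {"PRED": "what"}

f_structure = {
    "TOPIC": what,                        # the displaced constituent...
    "PRED": "think<SUBJ, COMP>",
    "SUBJ": {"PRED": "Rachel"},
    "COMP": {
        "PRED": "put<SUBJ, OBJ, OBL>",
        "SUBJ": {"PRED": "Ross"},
        "OBJ": what,                      # ...reentrant with the gap's function
        "OBL": {"PRED": "on the shelf"},
    },
}

# Reentrancy check: TOPIC and the embedded OBJ are literally identical.
print(f_structure["TOPIC"] is f_structure["COMP"]["OBJ"])   # True
```

No empty c-structure node is needed on this view: the dependency is stated entirely at f-structure, as identity between two grammatical functions.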

More recent advances in LFG tend to be more detailed and hence more sophisticated; the following analysis from Asudeh (2009) is a case in point. The clause What did the strange, green entity seem to try to quickly hide? gets the following constituent and functional descriptions respectively:

Figure (17) C-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)


Figure (18) F-structure of What did the strange, green entity seem to try to quickly hide? (Asudeh 2009)

The interesting thing about this clause, however, is that its analysis not only shows how LFG handles the phenomenon of UDs but also covers a host of other syntactic phenomena, such as adjunction, raising and control.

To sum up, early LFG (Kaplan and Bresnan 1982) analyzed UDs in terms of a c-structure that explicitly drew the relation between a displaced constituent and its corresponding gap via the double-arrow notation. However, Kaplan and Zaenen (1989) showed that this treatment was deficient in accounting for functional constraints on UDs (Dalrymple 2001). This led them to incorporate f-structure components into their analysis of UDs, thus abandoning the double-arrow notation, as seen in figure (15) above.

    2.6. Towards an Ontology of Gaps

The previous accounts pose a serious question as to the various treatments of UDs. But despite the many moot points among the theories and formalisms scantily described in the previous sections, the one thing all those theories tend to agree upon is that the key to unlocking the sophistication of unbounded constructions lies in providing a rigorous account of gaps (a.k.a. empty categories, null elements, missing elements, SLASH categories, traces). A correct and rigorous account of gaps will be the liaison between the purely theoretical treatment of UDs and computational implementation. This is due to the fact that dealing with gaps represents a crystallized problem, and all computational theorizing or implementation is based on problem-solving. Thus we first need to identify what might be called an ontology of