

Pergamon

Language Sciences, Vol. 20, No. 2, pp. 113-135, 1998 © 1998 Elsevier Science Ltd. All rights reserved

Printed in Great Britain 0388-0001/98 $19.00+0.00

PII: S0388-0001(97)00024-7

REPRESENTATIVENESS IN THE DATA BASE: A POLEMICAL UPDATE FOR THE TWENTY-FIRST CENTURY

JONATHAN OWENS

1. Introduction

In his book Understanding Grammar (1979), Talmy Givón criticized the state of the art as he interpreted it in theoretical linguistic studies, arguing that linguists unnaturally and unnecessarily restricted their purview of linguistic data to that which could be conveniently described by certain models of linguistics. The data base, according to Givón, was gutted to promote the serviceability of the model. In particular, aspects of grammar relating to pragmatics, text structure and diachronic development had been neglected in favor of the study of structures which could be formalized within a 'context-free' sentence-based model of language.

Givón's critical interest was, like that of many good scholars, expansionist and integrationist. He defined aspects of language which had not, in his conception of the linguistic field, been adequately grafted into the standard linguistic canon and argued for the necessity of such an integration. It is not my purpose here to examine the extent to which Givón's criticisms have been 'redressed' 1 since 1979. Impressionistically, judging for example by the variegated range of specialized linguistic journals on pragmatics, discourse structure, and semantics, much of his concern has been given earnest consideration.

A basic consideration lying behind Givón's criticisms is the question of representativeness. The question which Givón addressed was whether a generative grammar, of whatever fashion, is adequately representative of the entire structure of language. Representativeness is of central concern to any generally formulated proposition, and hence must be of central concern to all branches of linguistics. Within linguistics it has two aspects, which may be termed thematic and language-specific representativeness. Givón's criticisms related to the former, defining the more or less discrete components which can be systematized within the study of language. The latter relates to the range of languages which are taken into serious and detailed account in defining the thematic content. It is this aspect of representativeness which I am concerned with in this paper.

There are, of course, sub-fields of linguistics where language representativeness forms part of the definition of the field of inquiry, such as language typology. In most, however, the range of languages covered is treated as independent of the essential thematic content itself. That is, there exist such linguistic entities, more or less well defined, as phonemes, phrase structure, semantics, discourse structure, and sociolinguistics independent of any language realizing them. For instance, the statement that there tend to be significant correlations between given linguistic structures and given social categories would, I assume, be accepted as a valid axiom of sociolinguistics, without requiring substantiating evidence from particular languages.

Correspondence relating to this paper should be addressed to: Professor J. Owens, Arabistik, Universität Bayreuth, Bayreuth 95440, Germany.

What I will suggest in this paper, however, is that there has in recent years arisen an unarticulated alliance between linguists working among a small set of languages and linguists (often the same ones) defining the thematic content of the various sub-disciplines of linguistics. Furthermore, the unrepresentativeness of the languages chosen for study impinges on the validity of the definition of thematic content.

The thesis I argue here derives from my own work in linguistics (PhD 1978) over the last twenty years, and I have therefore chosen to illustrate it in two domains where I have accumulated some experience: sociolinguistics and creole linguistics. To provide some orientation for the reader I begin in section 2 by citing some statistics about the content of articles from standard journals in these two fields. In sections 3 and 4 I discuss some substantive issues arising from data restrictiveness in the respective fields, and in section 5 offer some general observations.

2. Two sub-disciplines, two journals

In the modern academic world one feature identifying a discipline or sub-discipline as an entity unto itself is the existence of a journal devoted exclusively to it. The two sub-disciplines considered here have at least one. For the present review I have chosen Language in Society (= LS, established 1972) and The Journal of Pidgin and Creole Languages (= JPCL, established 1986) for the six-year period 1988-1993. The articles (no reviews) in these journals were classified according to the languages which they dealt with. I should emphasize that the following figures are in no way assumed to reflect on the editorial policy of any of the journals. Rather, the general range of content is taken to reflect the interests of linguists specializing in the various sub-fields. That is, the content is assumed to be determined by the general linguistic climate, not by individual editorial policies. The following represents a summary of the range of languages covered in the issues.

In Table 1 creolists may object to the classification of creole languages such as Papiamentu, Negerhollands and Bislama as being based on European languages. For present purposes the classification may be regarded as uncontroversial from one classical comparativist perspective. The lexicon of all the creoles I regard as European-based is overwhelmingly from a European language (English, French, Portuguese or Dutch), and on this basis regular sound-meaning correspondences may be established between the European language and the respective creole.

Table 1. JPCL

Year     European-based creoles    Others
1988     7                         0
1989     4                         1 (Fanagalo)
1990     7                         0
1991     8                         0
1992     7                         0
1993     2                         0
Total    33                        1

Probability = 0.0000 i.

In Table 2 the LS articles cover the greatest range of languages and geographical areas, so inevitably a degree of arbitrariness enters into the grouping parameters. On the whole I have followed a simple geographical distribution, while singling out the Middle East and the native languages (= 'native') of North America and Australia (but not South America) from their larger geographical unity. As might be expected, using finer categorizations (e.g. dividing Asia into China, Japan, India and the remainder) has no significant effect on the probabilities. 2

Table 2. LS

Year     Euro.    Middle E.    Asia    Africa    S.+C. Am.    Native
1988     11       1            2       1         0            0
1989     11       2            1       0         0            1
1990     13.5     0            1.5     0         0            0
1991     7        1            5       1         2            2
1992     13       2            3       0         0            1
1993     5.5      0.5          2       2         3            0
Total    61       6.5          14.5    4         5            4

p = 0.0000.

The overall tendency in each of the journals points in one direction, namely an over-proportional attention to articles dealing with European languages, plus some interest in Japanese and Chinese (for LS). Note that this tendency has nothing to do with the stated aims of any of these journals, none of which delimit the range of languages they intend to treat in any predetermined way, beyond their general thematic focus. It is certainly no coincidence that it is just these languages which are spoken in what very roughly can be termed the industrial or post-industrial regions of the world. For the most part these are European languages, with a heavy interest in English. Since the statistics generally assign languages to geographical areas, I consider Japanese (and Chinese) as in a separate group and refer to the dominance of languages spoken in the 'West'. Other perspectives would be suggestive: languages of the industrialized world (including Japan), for example, or languages of the rapidly developing world, or English vs. all other languages. I will not pursue these classificatory strands and their implications here, however.
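
The tendency can be made concrete from the figures in Table 2. The following is a minimal sketch, not part of the original analysis: it recomputes the column totals and expresses each region as a share of all LS articles counted. The names `LS_COUNTS`, `region_totals` and `region_shares` are illustrative assumptions, as is the reading of fractional counts as articles divided between two regions. The significance test behind the reported p value is not specified in the text and is not reproduced here.

```python
# Yearly article counts transcribed from Table 2 (LS, 1988-1993).
# Assumption: fractional values stand for articles split between two regions.
REGIONS = ["Euro.", "Middle E.", "Asia", "Africa", "S.+C. Am.", "Native"]
LS_COUNTS = {
    1988: [11, 1, 2, 1, 0, 0],
    1989: [11, 2, 1, 0, 0, 1],
    1990: [13.5, 0, 1.5, 0, 0, 0],
    1991: [7, 1, 5, 1, 2, 2],
    1992: [13, 2, 3, 0, 0, 1],
    1993: [5.5, 0.5, 2, 2, 3, 0],
}


def region_totals(counts):
    """Sum each region's column over all years."""
    return [sum(row[i] for row in counts.values()) for i in range(len(REGIONS))]


def region_shares(totals):
    """Convert column totals into fractions of all classified articles."""
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


totals = region_totals(LS_COUNTS)
shares = region_shares(totals)
for name, total, share in zip(REGIONS, totals, shares):
    print(f"{name:<10} {total:>5} {share:6.1%}")
```

On these figures the European column accounts for 61 of the 95 articles counted, or about 64%.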

Assuming, for the sake of the following discussion, that this overemphasis on one set of languages is typical of these sub-fields of linguistics generally, a question arises whether the delimitation of the data to the Western languages doesn't affect the tenor and range of questions which are addressed in them.

In the following I would like to indicate where a lack of adequate data has consequences for general issues in the linguistic sub-field. While my purpose is to sketch broad problems, proof of their existence is shown by a consideration of specific sets of data. In section 3 I consider issues in creolization and in section 4 issues in sociolinguistics.

3. Creole languages and restructured languages

Probably the most interesting attraction of creole languages for linguists is the question of their origin. 3 Roughly speaking, three possible sources for the grammars of creoles have tended to dominate the discussion about creole origins as in (1-3) below, with various intermediate versions derived from these (as e.g. in (4)).

(1) a universal grammar of some sort (UG)
(2) the grammar of the lexifier language
(3) substrate languages
(4) least common denominator of substrates (+ UG)

Each of these sources has been put forward at one time or another to explain the origin of at least some creole languages. Indeed, Haitian creole has been adduced to support three of the four positions. Hall (1974, pp. 109 ff., 117) classified Haitian creole as most closely related to seventeenth-century Norman French, Bickerton (1981) explains its structure in terms of a universal bioprogram, while Lefebvre (1986, following Sylvain, 1936) sees it as a relexification of a West African substrate language, like Ewe. The fourth explanation is a more specific version of the third, though it also allows input from language universals (Thomason and Kaufman, 1988, p. 153). Paralleling the skewed distribution of creole languages cited in Table 1, no non-European based creole (in the sense of section 2) has been allowed into this debate in a serious manner. This is unfortunate, because such languages with different socio-historical origins from those of the European-based creoles have the potential to at least verify or modify hypotheses developed on the basis of the latter. 4

A good example of this I believe pertains to Bickerton's (1981) bioprogrammatic origin for the grammar of certain creoles, a variant of (1) above. His hypothesis is clearly formulated and intuitively appealing. He reasoned that when creole languages crystallize within a short time (a period of not more than two generations), when the speakers of this generation are not dominated by a single group of substrate speakers and when they have been cut off from contact with their native homeland, then the speakers are forced to develop a language on the basis of a universal input, a bioprogram. Not all creoles developed under such circumstances. Examples of those which did include Saramaccan, Hawaiian Creole English and Haitian Creole (so-called early-creolizing creoles). In his 1981 book he described a number of common linguistic properties which this class of creoles have, arguing that these features arise out of a tendency of speakers, lacking linguistic input of other types, to develop their grammars on the basis of an inherent, biological blueprint.

The plausibility of Bickerton's argument rests on the exclusion of alternative explanations for the existence of the shared linguistic properties, borrowing, common substrate and the like. There has been considerable debate among creolists as to what extent preconditions for the emergence of a bioprogram are met in the various creoles adduced as data in Bickerton's 1981 study (cf. remarks above on Haitian creole). The argumentation in this arena has become so convoluted that one might think that creolists would welcome the chance to verify Bickerton's hypothesis with a creole language with a completely different historical, linguistic, social and geographical background, yet which essentially fulfils the assumed conditions for the appearance of the bioprogram.

Such a language is East African Nubi, a creole language originating with speakers in the (extremely multilingual) southern Sudan at the end of the nineteenth century, yet who, within forty years of its formation in 1888 had to flee the Sudan, settling in Uganda and Kenya, where they live today. The language is lexified from Sudanic Arabic. Testing Bickerton's hypothesis is thus an easy matter: does Nubi have the bioprogram features? Bickerton's hypothesis would predict that it should, since Nubi is a creole originating in a multilingual environment, demographically (at least to 80% of the population) dominated by speakers of the different substrate languages, whose speakers left the substrate homeland within two generations of its incipience.

It turns out (Owens 1990, 1991) that it does not, that the bioprogrammatic features are found in Nubi to a degree no greater than would be predicted by chance. According to the bioprogram, for instance, a Q-word is predicted to be preposed to the beginning of the sentence (Bickerton, 1981, pp. 70-72). In Nubi, however, the Q-word holds the position of its non-Q counterpart.

(5) ita gi-ja miteeni
    you prog-come when
    'When are you coming?'

In a comparison between Bickerton's 14 bioprogram features and their reflex in Nubi, only five agree.

End of Bickerton's bioprogram? Dispassionate application of the comparative method might call for such. But the implications of Nubi for this universal creole hypothesis have largely been ignored, in line with the tendency of creolists to ignore data from outside the realm of creoles lexified with European languages.

As noted above, hypothesis (2) was formulated most explicitly by Hall, writing on Haitian creole. It has fallen out of favor in recent years, though I have argued that a systematic phonological, morphological and syntactic comparison between Nubi, its lexifier (Sudanic Arabic) and potential substrate languages reveals that the greatest degree of correspondence is found between Nubi and Sudanic Arabic (Owens 1990, 1991). Does this mean, however, that a recent tendency, exemplified for instance in Holm's (1988) compendious survey of the world's pidgin and creole languages, is justified, namely to refer to certain creoles (like Nubi, Holm 1988, p. 568) as 'restructured x', Nubi = 'restructured Arabic'? Samarin (1991, p. 51 n. 2) has taken strong exception to such a terminology, and I think for good reason. It turns out that 'restructured x's' are, for Holm, mainly non-European-based creoles. The term is not applied to pidgin/creole (PC) French or Portuguese, and only sporadically to PCs of European lexical origin (e.g. restructured English in West Africa (p. 406), but otherwise, creole or Pidgin English, e.g. of Surinam (p. 432), restructured Spanish (p. 304)). Now, the terminology itself is unobjectionable, provided it were applied to all creoles equally. Saramaccan is 'restructured English', Haitian creole 'restructured French', and so on. However, one hardly needs another synonym for the already existing 'creole' or 'pidgin'.

The term 'restructured x' has never been defined by Holm (or most other creolists; Mufwene, 1996, p. 83, democratically applies it to all creoles), so I will venture an explanation for its existence: it is applied especially to the more exotic creoles (i.e. those not of European language lexical source) in order to marginalize their importance to creole studies. Nubi is a creole, yes, but it is only (really?) just restructured Arabic, a variant of Arabic. Saramaccan, on the other hand, is a creole, but not restructured English. Only true creoles are of interest to creolists.

However, when one looks for substantive differences between the various creoles it is apparent that the African creoles share many similarities with other creoles. Haitian creole is not mutually intelligible with French. Similarly, East African Nubi is not mutually intelligible with any dialect of Arabic, Sango is not mutually intelligible with Ngbandi or other Central African Republic languages, and Kituba, which will be discussed in greater detail below, is not mutually intelligible with Kikongo. The speakers of Nubi do not consider their language 'Arabic' (either dialectal or Classical) and even the speakers of 'Juba Arabic', the variety mutually intelligible with Nubi spoken by southern Sudanese, do not consider Juba Arabic to be Arabic. Similarly, Kituba speakers do not consider their language to be Kikongo, and so on. Furthermore, the African creoles have lost key structural elements which are intimately associated with their lexical donors. Unlike Arabic, for example, Nubi words are not based on the well-known Semitic tri-consonantal root structure whereby word classes are derived via vowel intercalation and ablaut changes. Kituba, as will be seen below, has lost the key Bantu system of concordial relations based on an extensive gender class system. In other words, Nubi, Sango, and Kituba have been structurally and communicatively so vastly restructured that they are probably no more (but also perhaps no less) like their lexical donor language than Haitian creole is like French. Unless it can be shown that the 'restructuring' of English into Saramaccan or Jamaican creole is qualitatively different from that of Arabic into Nubi, the terminology is better dispensed with.

Moreover, there is one interesting difference which distinguishes Sango, Kituba and Lingala from most other creoles. While Nubi is not simply restructured Arabic, nor Haitian French simply restructured French, both of these creoles, and many others, can come under the status- based influence of their lexical donors, leading to the formation of the well-known creole continuum. 6 These creoles can build a structural bridge leading stepwise 'back' to the lexical donor. While detailed information is lacking, it has been reported that both Kituba and Sango in undergoing nativization are gaining complexity and moving in the structural direction of the other languages in the area. Yanga (1980), for example, speaks of the 'rebantuization' of Lingala and Fehderau (1966, p. 116) speaks of a 'streamlining' of Kituba. In this process, formerly free affixes are tending to become bound, and to increase in allomorphic complexity. What is not reported, however, is a re-Kongoization of Kituba (nor a re-Ngbandization of Sango). The factors behind this difference are probably as much socio-political as they are linguistic. The fact remains, however, that it would be ironical to call Kituba 'restructured Kikongo', when it is one of the few creoles that refuses to realign itself with its structural mother (see Mufwene, 1997, p. 184).

At this point it is time to take the discussion to a more concrete level, looking in greater detail at what sort of restructuring actually characterizes two African creoles: Kituba and Lingala.

The Congo River basin is a fairly rich area for creolists, harboring no less than three creole languages: Sango, spoken in the Central African Republic, Lingala, an urban lingua franca and lingua franca of Central Zaire, and Kituba, in western Democratic Republic of Congo. 7 According to Samarin (1982, 1991), who is the only scholar to have examined the question in detail not only from a linguistic, but also a demographic and historical standpoint, all of the creole languages in the area arose at the end of the nineteenth century under similar circumstances, namely the need of a large group of non-native African colonial functionaries working for either the French or the Belgians, to communicate with the local population. These African foreigners came from all parts of Africa: East African Zanzibaris, Zulus from South Africa, and West Africans, often referred to generically as 'Senegalese'. This latter group was particularly important because it tended to constitute a mid-rank bureaucracy, forming an intermediate level between the indigenes and the European colonialists. In all three cases there arose a language simpler in certain respects than that from which it was 'derived'.

There appears to be consensus as to which language was the lexical source: for Kituba it was Kikongo (Fehderau, 1966, p. 70 ff.; Mufwene, 1988), for Lingala a language or group of languages known as Bobangi (Guthrie, 1944; Knappert, 1979 8), and for Sango an indigenous language of the same name (termed Vernacular Sango) belonging to the Ngbandi group of languages (Jacquot, 1961). What is striking (for creolist circles) is the relative unanimity about the source of these creoles. 9

One reason for this harmony may, of course, be that there has been too little research done on these languages for widely divergent interpretive positions to have developed. In fact, the three creoles differ from each other structurally and in their comparative linguistic history in clear ways. Rather than deal with all three, however, a concise point can be made about the advantages for creole studies of including these languages in a comparative purview by considering only two of them: Kituba and Lingala. Both of these creoles (adopting Samarin's interpretation) arose at roughly the same time, Kituba probably slightly before Lingala, in adjacent areas of the Congo River basin. Kituba developed in the westernmost regions of Democratic Republic of Congo, west of the capital Kinshasa, and was used, approximately, as far as Stanley Pool (at modern Kinshasa), whereafter Lingala took over, extending to the northeast of today's Kinshasa along the Congo River. These two languages allow one to look at the interplay between the lexical source language and substrates in their genesis, where a large number of the substrate languages belong to a typologically (and genetically) similar group of languages, namely Bantu languages.

The languages in the NW region of Democratic Republic of Congo are exclusively Bantu. They are typologically very close to one another, though differentiated enough to guarantee some substrate variation. For example, according to Guthrie (1953) along the stretch of the Congo River where the postulated origin of Lingala lies, between Stanley Pool and the confluence of the Ubangi and Congo rivers, there are, depending on how the count is made, between six and 10 languages spoken. 10 This includes Bobangi, probably the most immediate ancestor of Lingala. To this residential substrate would have to be added speakers from neighboring languages who participated in various early colonial work projects. The Kikongo-speaking area where Kituba arose (see Mufwene, 1988, p. 40) is more unified, though Kikongo itself shows considerable dialectal differentiation. The data used in this paper is taken from Laadi spoken in Congo (Brazzaville), which Jacquot (1982, p. 7) considers a Kikongo dialect. Fehderau (1966, p. 74) found that on a phono-lexical basis Laadi is one of the Kikongo dialects with the highest degree of affinity to Kituba. In general it would appear that there was a considerably greater degree of agreement between superstrate (Kikongo and Bobangi) and substrate languages in this area than is the case for most creole languages.

This historical background is tailored to testing in particular hypothesis (4) above. Given the structural similarity between the contact languages of the speakers who presumably contributed to the formation of Kituba and Lingala one would expect that the resulting creole should show a high degree of structural congruence with the languages of the area.

Before introducing one set of linguistic data bearing on this point it is necessary to describe briefly the descriptive material used. This is because there is not one variety of Kituba and Lingala, but rather a number of varieties. As might be expected, both languages form a continuum whose 'upper' end is a variety strongly influenced by the Bantu regions of the area. For Lingala in particular authors have spoken of the rebantuization of the language (see n. 14). A number of textbooks (e.g. Odhner, 1981) exemplify this variety of Lingala, its morphology completely falling within the Bantu structural type. Unfortunately there are no variational studies on the spoken language (see Rottland, 1979, for the written) allowing a precise characterization of the scale of variation in the language. Yanga (1980) speaks of three stages in the development of the language: a pidgin stage virtually lacking inflection, a creole with a minimal amount of inflectional structure, and a post-creole variety, the rebantuized variety. He notes that the rebantuized variety has been markedly unsuccessful in establishing itself among the population at large, being largely restricted to literary varieties and to normative textbooks. Interesting though Yanga's observations are, as the basis for the Lingala description I use Guthrie (1935). 11 Guthrie was a noted Bantuist, and well aware of the significant differences between what was then termed vehicular Lingala and other varieties of the language. More importantly, published in 1935, his grammar describes the language as it was spoken by the earliest generations of its speakers. If Lingala came into existence at the end of the nineteenth century, some of its founding speakers would probably still have been alive at the time Guthrie did his research. 12

For Kituba I have used Swift and Zola (1963), Fehderau (1966) and Hochegger (1981), which show a fairly high degree of agreement. In Kituba as well there exists a continuum of varieties, with a tendency towards the development of inflectional morphological categories, similar probably to what is occurring with Lingala. The works used, however, either tend to ignore the more inflected variety (Swift and Zola), or to offer it as an alternative (Hochegger). I base my discussion on the less inflected variety, assuming that this more closely reflects the earlier state.

Given the large degree of structural correspondence between both the lexifier languages and the areal substrate languages, one would expect Kituba and Lingala to evince a large degree of congruence with these languages. I will test this idea in one component of grammar, namely morphology. To gain an overview of the problem I have quantified the number of inflectional morphemes which are found in the two creoles, Lingala and Kituba, and their purported lexical donor languages, Bobangi and Kikongo respectively. Such a quantification in the case of the creoles is not particularly problematic, since there is not a great deal. For the other two languages the problems are much greater, and I would not claim to have achieved a perfect analysis. Nonetheless, I believe the current results are adequate for the comparative purposes at hand.

While a great deal could be said about the accounting practices used to derive these figures, I will try to be as brief as possible, as the point which I am making does not ultimately depend on the precise method used. First, this is intended to be a listing of individual inflections, not of linguistic forms. Thus, the Bobangi future continuative, which Whitehead lists as a separate tense (high tone here and elsewhere left unmarked)

(6) nh-kó-kata-ka
    I-prog-hold-cont
    'I shall be holding' (Whitehead 1899, p. 45)

is not counted here as contributing an inflectional affix. Rather the two component aspect inflections, kó- 'progressive' and -ka 'continuative', are included among the Bobangi tense and mode affixes, and each of these combines in various ways to form simple or complex verbal finite inflectional categories.

Table 3. Formally distinctive inflections: creole lexifiers

Inflections                       Bobangi    Kikongo
Noun              sg              6          10
                  pl              6          6
Bound pronouns    verb prefix     16         16
                  object prefix   16         16
Verb              voice           6          24
                  tense, mode     23         11
Derivational                      2
(Agreeing categories              7          5)
Total                             75         83
With agreeing categories:         82         88

Table 4. Inflections: two Bantu creoles

Inflections                       Lingala    Kituba
Noun              sg              6          4
                  pl              6"         4
Bound pronouns    subject         8          0
Verb              voice           6          6
                  tense, mode     5          4 m
(Agreeing categories              1          0)
Total                             31         16
With agreeing categories:         32         16

The discrete inflectional markers can be marked tonally, as in the Lingala subjunctive.

(7) bá-sala they-worked (indicative) vs. ba-salá (subjunctive) (Guthrie, 1943, p. 122)

In Kikongo the five modes are distinguished exclusively by different tone patterns (Jacquot, 1982, p. 144).

For the nominal prefixes I have counted according to morphological forms, not according to nominal classes. In the traditional Bantu grammar each noun class generally has a singular form paired with a plural. Whitehead, for example, gives the first two Bobangi classes as the N- (beginning with nasal), ba- class and the mo-, ba- class:

(8) mpómba 'senior'    ba-mpómba 'seniors'
    mo-to 'person'     ba-to 'people'

In my reckoning these have been counted as representing three inflectional elements, N-, mo- and ba-, where the last may be considered to be shared among different noun classes. In Kikongo there are three classes (1/3, 8/18, 14/19) identical except for the nominal prefix. For brevity's sake I have not counted these as separate classes.

For Kituba and Lingala I have followed the authors of the grammars in considering the various voice suffixes as productive suffixes. In Bantu languages these are suffixes which indicate such meanings as reciprocity, causativity, and stativeness (with attendant diathetic changes). It is possible that originally these suffixes were introduced into Kituba and Lingala as frozen lexical material which a Bantuist like Guthrie would naturally interpret as a normal productive voice suffix. With the rebantuization of the languages they would have become productive. In this case one would no more consider the Kituba causative form ku-disa 'to feed s.o.' to contain the suffix -is (cf. ku-dia 'eat') than one would consider the Nubi lobfi 'wind' as having a definite article (< Sudanic Arabic al-habuub). Lack of a diachronic perspective on these languages hinders a judgement in this case. Note here for Kituba that I have given a lower figure of inflectional markers for the sg-pl pairs than is usually found in the grammars. One can consider (as I have done) the sg form di- in di-lala 'orange' (Hochegger, 1981, p. 25) a prefix of some sort on the basis of the fact (and only on this) that the plural is formed here (and for many other nouns beginning with di-) by substituting ma- for di-, ma-lala 'oranges'. However, there is no grammatical basis for recognizing a noun prefix lu-, as Hochegger (1981, p. 27) does, because it pluralizes by prefixing ba- to lu-, as in ludimi 'tongue', ba-ludimi. Lu- derives historically from a class marker (cf. Kikongo linkbdi - mahki~ndi 'banana(s)'), though in Kituba it has been integrated into the root, just as in the Nubi example above al- has become a part of the root.

Two Kituba verbal prefixes are problematic in that in the grammars they are generally not marked as clitics, i.e. are written as separate words (as opposed to other affixes, which are). These are the future marker ke and the perfect marker me.

(9) inki nge ke pesa munu
    what you fut give me
    'What will you give me?' (Swift and Zola, 1963, p. 36)

I have classified ke- and me- as inflectional prefixes because in the examples in all grammars they occur immediately before a verb (or auxiliary), and they do not appear to be independent, adverbial-like elements.

Tables 3 and 4 show clearly that, relative to their lexical source languages, Kituba and Lingala have a sharply reduced inflectional system. In fact, I would go so far as to say that they have a drastically reduced system when two further points are considered. First, the statistics do not take into account morphophonological variation, and this is far higher in the source languages than in the creoles. In Bobangi, for instance, Whitehead (1899, p. 28) recognizes no fewer than 17 different verb conjugations, which are necessary to account for allomorphic variants of the voice suffixes. The morphophonological and tonological variation for Kikongo is, if anything, greater than that for Bobangi. Some of the variation thereby accounted for is of a relatively automatic morphophonological nature (especially vowel harmony), but much is not. In Bobangi, the vowels of the stative suffix, for example, appear to follow vowel-harmonic rules, though the consonant differs: -VnV, -wa, -VmV, -VlV and -ya are all different realizations of this suffix, according to the verb conjugation it is suffixed to. While Lingala and Kituba do have a small degree of morphophonological variation, usually associated with the addition of a voice suffix (and hence lexically inherited), it is of a far smaller degree than in either Bobangi or Kikongo. I have also ignored reduplication (frequently used) and compounding (relatively little used), processes which are more common in Bobangi and Kikongo than in the two creoles.
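The scale of the reduction can be illustrated with simple arithmetic on the totals printed in Tables 3 and 4. The following Python sketch is illustrative only: the dictionary names and the function `retention` are assumptions introduced here, not part of the original analysis, and the pairing of each creole with its lexifier follows the text (Lingala with Bobangi, Kituba with Kikongo).

```python
# Totals of formally distinctive inflections as printed in Tables 3 and 4,
# excluding agreeing categories. All names here are illustrative assumptions.
LEXIFIER_TOTALS = {"Bobangi": 75, "Kikongo": 83}
CREOLE_TOTALS = {"Lingala": 31, "Kituba": 16}
LEXIFIER_OF = {"Lingala": "Bobangi", "Kituba": "Kikongo"}


def retention(creole: str) -> float:
    """Return the creole's inflection count as a fraction of its lexifier's."""
    return CREOLE_TOTALS[creole] / LEXIFIER_TOTALS[LEXIFIER_OF[creole]]


for creole, lexifier in LEXIFIER_OF.items():
    print(f"{creole}: {retention(creole):.0%} of the {lexifier} total")
```

On these totals Lingala has roughly two fifths, and Kituba roughly one fifth, of the number of inflections counted for its lexifier. The ratio is only as good as the counting conventions behind the tables.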

Even more important, however, is the number in parentheses in Tables 3 and 4, the agreeing categories. The noun-class system in Bantu is part of a larger system of agreement which encompasses nearly all independent syntactic elements. The following example from Kikongo illustrates this.

(10) m~-n t~ g~-u laJ n~-n-j~k~li eeti ko 1-person 1-this not I-him/l-saw absolutely 'This person, I never saw him' (Jacquot, 1982, p. 230)

The prefix mt~t~ is the mark of the first gender class, with two agreement morphemes in this example: g~ the prefix marking a demonstrative in this gender class, and -n- the class 1 object marker.

By 'agreeing category' is meant those items which take agreement markers indicating agreement with the noun. Thus in Bobangi, the verb, adjective, demonstrative, numerals 1-5, reflexive ('myself'), two different possessive morphemes, and third-person pronouns all vary according to the noun they agree with. For Bobangi, and even more so for Kikongo, the number of stated agreeing categories is certainly too small, since individual items have often been lumped together into one category. 13 Each of the agreeing categories has, therefore, its own set of inflectional prefixes. For Bobangi, if each individual agreeing category is reckoned to have an inflectional paradigm, there are seven agreeing categories × (approximately) 14 12 nominal classes = 84 further inflected elements, while for Kikongo the figure is (approximately) 16 × 5 = 80. Note that the verbal agreeing category of subject (the only agreement category for Lingala) has already been included on the list in Table 4.

Against this, Lingala has only one agreeing category, that of the verb, which carries a subject marker agreeing with the subject noun, while Kituba has none at all.
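The reckoning of agreement paradigms above amounts to one multiplication per language. A minimal Python sketch follows, under the assumption that 12 and 16 are the approximate numbers of nominal classes for Bobangi and Kikongo and that 7 and 5 are the agreeing categories given in Table 3. The variable and function names are hypothetical.

```python
# Approximate figures taken from the text and Table 3.
# Variable and function names are hypothetical, for illustration only.
AGREEING_CATEGORIES = {"Bobangi": 7, "Kikongo": 5}
NOMINAL_CLASSES = {"Bobangi": 12, "Kikongo": 16}


def further_inflected_elements(language: str) -> int:
    """One agreement prefix per agreeing category per nominal class."""
    return AGREEING_CATEGORIES[language] * NOMINAL_CLASSES[language]


for language in AGREEING_CATEGORIES:
    print(language, further_inflected_elements(language))
```

This reproduces the figures of 84 for Bobangi and 80 for Kikongo. Lingala's single agreeing category is already counted among its bound pronouns, and Kituba contributes nothing on this score.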

Looking at the fate of morphological categories in these two Bantu creoles, one sees that a drastic morphological restructuring has occurred relative to the lexifiers. This conclusion coincides with that reached by Mufwene (1988, p. 47) for Kituba, though the precise ramifications of this for comparative creole linguistics have not been sufficiently specified. 15 Theory (4) above predicts that where there is typological convergence in the input languages, the resulting creole will be typologically similar to the source language(s). Clearly in the case of Lingala and Kituba morphology this is not so. What these languages show is that there is operative a rule of morphological simplification in the formation of creoles, which overrides commonalities of the substrate languages. Note that while this rule of precedence is not contradicted by creoles of European language origin, its explicit effect is evident only in non-European creoles, for two reasons. First, the input European languages are not inflectionally rich enough for such a drastic morphological simplification process to be as evident as in the present examples. Secondly, the creoles based on European languages had a far more diverse typological (substrate+lexifier) input than did Kituba and Lingala and would, following hypothesis (4) above at least, be less expected to preserve marked input features.

This, however, brings us to two further points. First, Lingala and Kituba themselves show a considerable difference in the degree of retention of inflectional morphology, Lingala having it to a higher degree. 16 This fact runs counter to expectations in two ways. The Kikongo area probably was one of greater language homogeneity than the riverine Bobangi area, since Kikongo is a far larger language. Moreover, Kikongo was the language of a kingdom with considerable economic and political power, whereas the Bobangi were riverine traders. In these linguistic and social terms one might have expected Kituba to have retained more of the Kikongo-like inflectional structure. 17

Secondly, thus far I have spoken of the lexifier sources and only of the Bantu substrates whose speakers were present in the formation of Lingala and Kituba. As summarized above, however, Samarin has shown that a significant demographic presence of non-Bantu-speaking West Africans was instrumental in the formation of these two creoles. What linguistic effect, however, did their presence have on the present languages? As Samarin indicates (1991, p. 69), outside of the lexical realm it was negligible. Moreover, there are no significant traits in either language which have been demonstrated to derive from a West African language (or European language). Only Swahili has made an impact, largely in the lexical realm. It appears that speakers of West African languages, as well as the other foreigners in the contact situation who also spoke typologically very different languages, the Europeans, 18 were the ones instrumental in breaking down the typical Bantu morphological system. Their need to acquire a stable communicative vehicle quickly did not allow (or require) mastery of a complicated inflectional system. Their function must thus be seen as largely a negative one, to break down extant systems.

The perspectives to be gained by a more active integration of the African creoles into creole studies are considerable. In this short sketch three points are significant. First, it allows purportedly universalistic claims to be put into a sharply relativized comparative perspective, as the example of Nubi in relation to Bickerton's bioprogram shows. Secondly, it allows one to gauge the importance of competing factors in creole genesis. In this case it was seen that even when lexifier source language(s) and many of the substrate languages show a high degree of structural congruence, they do not lead to the development of typologically similar morphological systems. While the point has often been made that creoles are characterized by a loss of morphological complexity, the massive restructuring, or perhaps better, destructuring, evident in the formation of Kituba and Lingala indicates that the social dynamics which lead to creole formation are so powerful that potentially countervailing unifying structural linguistic elements contributed by a significant proportion of the 'donor' population are filtered out. Thirdly, the role of substrate speakers as destroyers of the morphology of the lexical source language is highlighted by the latter two languages (and could be shown for Nubi as well). If this is so for these creoles, might it not also have been the case in the formation of the European-based ones?

4. Sociolinguistics

In this section I would like to address one very specific and one more general problem that arise from lack of representativeness.

4.1. Familiar models, unfamiliar data

The dangers of interpreting data from lesser known languages in terms of models developed on the basis of languages from other areas are well known, though they deserve continual repeating.

Many examples of this kind involve specific interpretations of limited problems, and are easily corrected. A case in point is Labov's (1994, p. 345-346) reading of uvular phonological variation in certain varieties of Arabic in the Levant and Egypt of the type q ~ g ~ k (or ok(?)) ~ ?, such as is found in,

(11) qaal ~ gaal ~ kaal ~ ?aal 'he said'

Labov (1994, p. 295 ff.) is interested in the question of phonological mergers, inter alia the question whether mergers can be unmerged, and it is in this context that the present problem arises. Generally speaking, Labov argues, mergers are irreversible (this is termed Garde's principle, Labov, 1994, p. 311). The purported existence of such a demerger in Arabic is thus worthy of mention. Filling in implicit details, Labov would contend that at some point Classical Arabic had an inventory of phonemes including q, k and ? and that at some point q merged entirely with one of these sounds. In recent years, however, under the influence of 'Islamic' (not the best description) education, q has been reintroduced into the language via Standard Arabic, which does maintain q. q, however, is used on a variable basis, as in (11). What is particularly notable is that speakers do not hypercorrect: q is reintroduced only in those words which should have q (in Standard Arabic), so that one does not find substitutions of the type qam 'how much?' < kam, since kam derives from etymological *k, not *q. Since the modern language has both q and the sound with which q earlier merged, one can, in some sense (see below) speak of a demerger.

The problem with Labov's introducing this example under the heading of mergers is that it is not clear (as with Labov's 'meat, meet' example) whether a merger ever took place. Labov follows Abdel-Jawad's (1981) account of the development of Arabic, whereby the modern dialects are seen as the direct descendants of Classical Arabic (Abdel-Jawad, 1981, p. 162), a variety described by the Arabic grammarian Sibawaih in the second half of the eighth century. Such a position is inadequate for explaining a number of developments. In the following discussion, for the sake of brevity I will restrict my remarks to the question of q and ?. I do not address the question of g and k, simply because giving them adequate treatment (particularly k) takes the discussion unnecessarily far into the intricacies of Arabic dialectology and historical linguistics. A detailed consideration would show that g or k do not fit the demerger model any better than ? does.

According to Abdel-Jawad (1981, p. 166), Old Arabic at some point saw the development (12).

(12a) *q > ?

Abdel-Jawad follows Garbell here (1978, p. 211), 2 who places this change in the eleventh-fifteenth century. According to Garbell, *q and CA *? thereby merged, as in bada? 'begin' < *bada?a and ?aal < *qaala.

(12b) *q, *? > ?

This interpretation is implausible, however. According to this view, bada? would be expected to have preserved its ? until today, just as the ? in ?aal (or ḥara? 'he burned') has. This, however, is not the case. In those dialects where *q = ?, original etymological *? has disappeared, so that rather than the 'expected' *bada?-ti 'you fsg began', one finds badee-ti. This form has arisen from the loss of ?, leading to the merger of verbs having original final -? with the class of original final -V verbs, like *bana(a) 'he built'. Modern dialectal bada has merged with the bana conjugation (cf. banee-ti 'you f built'). In fact, this is nearly a universal development among Arabic dialects. 19 To assume that CA *q and *? merged in ? in some dialects raises yet another intractable problem in Labov's treatment of demergers. If such a merger occurred, some explanation has to be found for the subsequent demerger whereby words of the original *? class, e.g. bada? (< *bada?a) lost their ? (merging in this example with -V final verbs), whereas words which showed the *q > ? shift (12a), like ?aal 'he said' and ḥara? (< *qaala, *ḥaraqa) kept their ?. This implied development can be represented as follows.

(13)  proto-Ar      merger      split (demerger)

      *bada?a   >   bada?   >   bada, cf. badee-ti 'you began', like banee-ti 'you built'
      *ḥaraqa   >   ḥara?   >   ḥara?, cf. ḥara?-ti 'you f burned', like katab-ti 'you f wrote'

In fact, the original merger (12b) postulated by Garbell, Abdel-Jawad and Labov probably never took place. Evidence against it comes from two sources. First, one of the very best-documented facets of variation in Old Arabic (see e.g. Rabin, 1951, p. 130) is that even before the original Arabic diaspora (c. AD 630) there were Arabic dialects where *? had disappeared. Early Quranic orthography, originally lacking a glottal stop, reflects precisely this dialect (see e.g. Diem, 1976, p. 256 ff.), and in the Quranic reading traditions there are a number of versions which leave off the glottal stop in most contexts (see Ibn Mujaahid (d. 932), p. 132 ff.). Secondly, with the exception of a small number of areas in central Yemen (see n. 22), original *? is universally absent from Arabic dialects. Such unitary agreement throughout the Arabic-speaking world supports the supposition that the glottal-stop-less variant was very widespread in spoken Arabic even prior to the Arabic diaspora of the seventh and eighth centuries, and was exported along with the diaspora. This glottal-stop-less dialect equally is the basis of the Jordanian dialects on which Abdel-Jawad based his analysis, none of which today continue etymological ? in native words. Beyond the independent evidence just adduced, assuming that the change *? > Ø (e.g. bada? > bada) occurred independently of and probably prior to the change *q > ? avoids the problem of subsequent demerger alluded to in (13) above.

Accepting these arguments, however, obviates the need for Labov's interpretation that 'Amman, then, seems to show a remarkable reversal of the mergers of q with these other phonemes.' (1994, p. 346).

On what terms, then, is one to understand the synchronic variation q ~ ? (~ g ~ k, the latter two left out of this brief discussion)? The key is found in Labov's subsequent discussion of the same type of variation attested in Cairene Arabic (see e.g. Sallam, 1980). He notes that q in modern spoken Arabic is, above all, introduced in a lexically restricted vocabulary. The basic framework for this type of variation (not cited by Labov) is provided by Mitchell (e.g. 1986). Mitchell observes that in the modern Arabic-speaking Middle East, advances in mass education 20 have led to the introduction of a great deal of vocabulary from Standard Arabic, including the introduction of the grammatical and phonological rules associated with this variety. At the same time, speakers continue to have at their disposal their corresponding native dialectal forms. Which forms they actually use, a q or a ? in the word meaning 'he said', for instance, will loosely depend on a range of situational and stylistic factors. Precisely because the importation of certain forms, like Standard Arabic q, is sanctioned at one diglossic level, this 'high' form will not displace the local variant, but rather will remain (more or less) in free variation with it.

Note that if the forms so introduced correspond to comparable ones in the dialect, effects will be produced which may mimic instances of demerger (suspending the historical interpretation argued for above for the moment). Equally, however, there may be introduced forms and rules which the native dialect never had (see e.g. Bani-Yasin 1987, on agreement rules). This process may be termed 'diglossic borrowing' and it should be kept distinct, conceptually at least, from the type of language change which Labov generally subsumes under merger and demerger. 21

4.2. Norms and dialect leveling

The above is one small, concrete example illustrating the dangers of transferring explanatory models developed to describe one socio-historic context to a quite different set of circumstances. It happens, not surprisingly, that the transfer moves from better-studied Western languages (particularly English, in Labov, 1994), to a less, though still, comparatively speaking, fairly well-studied non-Western one. The problem becomes more acute when general parameters of explaining sociolinguistic variation are considered. Central to any sociolinguistic study is the question of linguistic norms. It is fair to say, I believe, that most quantitative-based sociolinguistic studies proceed on the assumption that well-defined norms are discernible, linguistic norms often being characterized in terms of standard-vernacular variants. Recently Chambers (1995, p. 52 ff.) has suggested that one mechanism through which linguistic norms are established is via the leveling and homogenizing effects of mobility.

Mobility can be understood in both social and geographical terms, and through his examples Chambers appears to imply that both types of mobility lead to dialect leveling. With regard to social mobility, citing work by Labov, Chambers notes, for example, that upwardly mobile working class speakers tend to use fewer non-standard forms of ð (i.e. comparatively more ð than d) than do stable working class speakers. Geographical mobility, on the other hand, appears to be the basis of his observation that various historical documents point to a rapid homogenization of diverse English dialects in the colonial New World. Furthermore, he characterizes a tendency for younger speakers in the Canadian cities of Victoria, Vancouver and Toronto to front diphthongs (to eu, vu or ~eu) as a general feature of the Canadian urban environment. One may further cite work done on Scandinavian cities established after World War II (e.g. Thelander, 1982, on the Swedish city of Burträsk), showing that within three generations of their foundation a city-wide dialect had developed, based on a mixture of standard (Swedish or Norwegian) and local dialect features. Urban immigration appears to have a leveling effect that cuts across class, gender and other sociolinguistic categories.

Is it correct, however, as Chambers implies, that mobility is always a leveling force? This proposition may be examined from the perspective of spoken Arabic in Maiduguri, Nigeria.

Maiduguri is the capital of Borno state in NE Nigeria. The urban population of Maiduguri has grown from 41,000 residents in 1950 to between 200,000 and 600,000 today (exact figures are impossible to come by today). A great deal of this increase has been due to immigration, largely from rural areas in Borno and neighboring states of NE Nigeria, but also from Ndjamena in Chad and rural areas of Cameroon and Chad. 22 Immigrants come from a wide spectrum of ethnic and linguistic groups, with native Arabic speakers constituting a not insignificant segment of these. Probably not less than 10% of the Maiduguri population speaks Arabic as a native language, making Arabic one of the larger minority languages in the city.

Mobility is thus an important factor in the establishment of Arabic in Maiduguri. The dependence of Maiduguri Arabic on migration is discernible in the fact that its dialectal varieties may best be understood relative to varieties found outside of the city. Roughly speaking, there are three Arabic dialectal regions contributing to Maiduguri Arabic: one in the western part of Arabic-speaking NE Nigeria, one encompassing the eastern part of Arabic-speaking NE Nigeria, Cameroon and parts of Chad (termed eastern Nigerian Arabic for short), and a third, western Chad, including its capital Ndjamena. The approximate position of these three areas is shown in Fig. 1. All three of these have found their way into Maiduguri, tending to reconstitute themselves in different Maiduguri neighborhoods. The process of 'dialect reconstitution' can be illustrated on the basis of a single linguistic feature. The expression for 1pl in the imperfect verb is marked either by n- alone, or by the discontinuous morpheme n-...-u, as in n-imši ~ n-imš-u 'we go' or n-aktub ~ n-aktub-u 'we write'. In rural Nigerian Arabic the n- form is nearly categorical. In a sample of texts (c. 100,000 words) from 52 speakers in 22 rural Nigerian villages the n- form has a 99% incidence (the eastern and western dialects being undifferentiated for this feature). Somewhere in northern Cameroon (precisely where requires independent research) runs the isogloss separating the n- ~ n-...-u varieties, with n-...-u lying to the east of the line. I know of no quantitatively based studies of Arabic dialectology in Chad, though all grammars of Chadian Arabic which I am acquainted with report the n-...-u form exclusively.

[Map of the Lake Chad area showing Nigeria, Chad, Kano, Maiduguri and Ndjamena]
Fig. 1. Arabic dialects in the Lake Chad area.

Text-based samples (c. 220,000 words) of 58 speakers were made of Arabic in Maiduguri, inter alia concentrating on the question of neighborhood differences. Three areas in particular were singled out, each of which tended to be settled by speakers from one of the three different dialect areas outside of Maiduguri. Two of these neighborhoods (Gwange, Gamboru) were settled predominantly by rural Nigerians originating either from the western or the eastern Nigerian Arabic dialect region. The third, Ruwan Zafi, was settled largely by immigrants from Chad. I will refer to the areas of emigration for these three Maiduguri neighborhoods as their ancestral areas. The percentage of n- ~ n-...-u forms tends to reflect these different settlement patterns. In Table 5 I have written in parentheses the ancestral source area for each of the three Maiduguri residential neighborhoods. A score of 100% means that the n- form is used exclusively, 0% the n-...-u.

The Maiduguri percentages tend to replicate the usage found in the immigrants' ancestral source areas: the two areas populated by immigrants from Nigeria, Gamboru and Gwange, have predominantly the rural Nigeria n- variant, whereas Ruwan Zafi, an area dominated by speakers of Chadian ancestry, has predominantly n-...-u. Beyond these tendential correspondences, it is interesting to note that Gamboru and Gwange have lower percentages of n-, Ruwan Zafi lower percentages of n-...-u than their respective ancestral areas. In Gwange in particular, this is due in part to the fact that two of the speakers have a Cameroonian ancestry, one having been born there, and may therefore have come originally from an n-...-u area. As will be seen below, however, other factors are at work as well.

Table 5. 1pl marking in imperfect verb among four groups of speakers

Rural Nigeria                     99%
Maiduguri:
  Gamboru (western Borno)         82%
  Gwange (eastern Borno)          69%
  Ruwan Zafi (Chad)               26%

Returning to Chambers's observation that mobility tends to level dialects, note that the percentages in Table 5 imply a maintenance of heterogeneity. This follows from the fact that two Maiduguri neighborhoods whose ancestral 1pl imperfect value is n- tend to increase the alternative n-...-u value. The neighborhood with a Chadian ancestry, on the other hand, tends to increase its non-ancestral n- frequency.
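The direction and size of these shifts can be read off Table 5 with a few subtractions. The Python sketch below is an illustration, not part of the original analysis: the names are hypothetical, and the ancestral baseline of 0% n- for the Chadian area is an assumption resting only on the grammars that report n-...-u exclusively, since no quantitative Chadian sample is available.

```python
# Percentage of n- (as opposed to n-...-u) 1pl imperfect forms, from Table 5.
MAIDUGURI = {"Gamboru": 82, "Gwange": 69, "Ruwan Zafi": 26}

# Ancestral baselines. 99 is the rural Nigerian sample; 0 for Chad is an
# ASSUMPTION, based only on grammars reporting n-...-u exclusively.
ANCESTRAL = {"Gamboru": 99, "Gwange": 99, "Ruwan Zafi": 0}


def shift(neighborhood: str) -> int:
    """Percentage-point change in n- usage relative to the ancestral area."""
    return MAIDUGURI[neighborhood] - ANCESTRAL[neighborhood]


def spread(scores: dict) -> int:
    """Gap between the highest and lowest n- percentage in a set of groups."""
    return max(scores.values()) - min(scores.values())


for name in MAIDUGURI:
    print(f"{name}: {shift(name):+d} points")
print("spread across Maiduguri neighborhoods:", spread(MAIDUGURI))
```

Under these assumptions each neighborhood moves some way towards the non-ancestral variant (by 17, 30 and 26 points respectively), yet the three neighborhoods remain 56 points apart, which is the sense in which heterogeneity is maintained.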

The Maiduguri data discussed thus far is based on formal interviews conducted in various parts of the city with the aim of gaining an overview of the significant forms of Arabic found in Maiduguri. In addition more detailed surveys were taken of various Arabic households, generally in the form of informal conversations between the household members. One large compound is of note in this context. It is surrounded by a wall with a large open courtyard in the middle big enough to comfortably accommodate a number of cattle and goats. Inside the outer compound wall are a further five residences.

The current residents first began settling there in 1979 on a (then) open site owned by a prominent Maiduguri Arab (and relative), whose house is about 150 yards away. The composition of the compound was observed over a five-year period and underwent a certain amount of change, though essentially it is built around five different sub-compounds, each of which, roughly, is based on one (in one case, two) nuclear family (husband+co-wives). Kinship is the basis of settlement, three of the five compounds being related to each other, though the relationship is for some a distant one, being traced through a maternal aunt of the owner of the land on which the large compound stands. The common ancestor binding the families flourished around 1900 in a village SW of Lake Chad. Two of the three families moved to Maiduguri directly from a rural Nigerian village. The scion of the third (now deceased) went to Ndjamena and became a successful cattle trader before moving to the present compound in 1979 (compound 3 in Table 6). His relatives continue to maintain a home in Ndjamena, besides their Maiduguri residence, and the children and wives in this compound continue to live between the two cities. This Maiduguri residence is thus linguistically interesting for the fact that speakers who potentially acquired Ndjamena (Chadian) variants live in close proximity to those whose ancestors spoke a Nigerian one.

In addition, three further Arab families were given houses in the compound (living in compounds 1 and 5 in Table 6) through their friendship with the compound owner. All of these non-kin residents had relations with the present kin residents of the compound and/or its owner before 1979, some of them already having lived together with the kin residents in an earlier Maiduguri residence, others tracing their relationship to their earlier common rural residence. Table 6 summarizes the number of residents in the five sub-compounds.

It should be noted that all but one of the residents of the compounds is a native Arabic speaker. In all, 37 different speakers were recorded from these compounds.

Table 6. Population of five sub-compounds

Sub-compound                          1990  1991  1992  1993  1994  1995
1 (friend of owner)                      7     8     9     7     6     6
2 (brother of owner)                    11    11    11    10    11    12
3 (cousin of owner)                     11    11    11    29    26    26
4 (brother of owner)                     8     8     8     5     4     5
5 (friends of owner, two families)      17    17    17    18    18    18
Totals                                  44    44    45    69    65    57

From the brief biographical sketch it might be expected that compound 3 would show a higher percentage of the Chadian n-...-u variant, and indeed this expectation is borne out, as the percentages in Table 7 indicate.

Table 7. % n- 1pl imperfect verb

Compound 1    76%
Compound 2    98%
Compound 3    30%
Compound 4    72%
Compound 5    96%

What is of greater interest than the retention of the n-...-u variant in compound 3 is the relatively lower percentages of the n- variant in compounds 1 and 4. Recall that residents of both of these compounds have ancestries relating to rural Nigerian Arabic, which as seen above is nearly categorically n-. The explanation is that two girls live in these compounds, aged 10-13 at the times they were recorded. Each of them tends strongly towards the n-...-u form. One of the girls was recorded on six occasions, the other on two, so the possibility that the frequencies are a one-off aberration is small. Strikingly, the behavior of the two girls in respect of this variable can be contrasted with that of their older brothers, who categorically use the n- form (see Table 8).

A post hoc explanation for this pattern may be proposed. It appears that there existed a girls' network in the larger compound which, besides the two girls summarized in Table 8, included three daughters from compound 3, the 'Ndjamena' compound. Their speech probably had a certain status since they were slightly older than the two girls summarized here (two were married during the course of the study).

Whatever the explanation, the detailed look at one feature from the present compound allows two points to be made relative to the major themes of this section. First, as already seen above, mobility does not automatically lead to dialect leveling. Even speakers living in fairly intimate contact with one another are seen here to maintain inherited dialect differences. Two tentative conclusions may be drawn. Either mobility is not an independent variable at all in determining dialect leveling, or it is, but only in conjunction with other still to be defined factors. Secondly, it indicates that the search for Maiduguri-wide norms among Nigerian Arabs has to begin at a local level, and probably will not yield to a simple dichotomous contrast phrased, for example, in terms of prestige vs. non-prestige variants, let alone standard-vernacular. The case of the present compound helps substantiate this supposition. The compound lies within the Gamboru area of Maiduguri, an area, as seen in Table 5 above, where the n- form predominates. Local conditions within the compound, however, allow the n-...-u variant to have diffused to two speakers who, given their family history, would not be expected to use it. However, the restricted nature of this development (it is restricted even within the compound) suggests that marked local contrasts will continue to be a feature of the Maiduguri Arabic landscape for the foreseeable future. 23

Table 8. % n- for four siblings

              Sister   Brother
Compound 1       2%      100%
Compound 5      12%      100%

To conclude this section, it may be emphasized that sociolinguists have more than an altruistic interest in looking at linguistic variation in non-Western societies. If one facet of sociolinguistics is, as Labov would have it, to explain linguistic change, then one has to entertain the possibility that in the past, societies in the West had a texture of linguistic variation which was more like that which can be observed in Maiduguri today, in the sense that well-defined linguistic norms encompassing the entire speech community were weak or non-existent. This indeed is implied in at least some work on variation in Western societies. Mougeon and Beniak (1991, p. 156) find that fluctuations in the use of je vais ~ je vas 'I go' in Canadian French do not correlate consistently with their standard set of sociolinguistic categories (age, sex, social class, degree of knowledge of English etc.). They point out that the je vais ~ je vas variation was endemic to pre-revolutionary French everywhere and that the variation was brought to Canada with the original French migrants. When French Canada was lost to France, the standardizing forces and institutions which effaced (inter alia) the je vais ~ je vas variation in metropolitan French in the aftermath of the French Revolution had ceased to exist in Canada, which allowed the variation to continue.

If the je vais ~ je vas type of variation in Canadian French is a socio-historical relic of past migrations, it is a basic feature of Maiduguri Arabic, and perhaps of certain third-world urban landscapes in general. n- ~ n-...-u variation in Maiduguri may correlate most strongly with neighborhood; the variation is found throughout the city, however, and as seen above, often has a dynamic that can only be understood at a very local level. A basic consideration therefore arising from this discussion is that socio-historical processes which may have been operative in Western languages in the past are no longer so today, and will primarily, sometimes exclusively, be observable in a detailed way in various non-Western societies.

The assumption which underlies this suggestion is the uniformitarian principle, the assumption that past processes may be inferred from observing the present (Labov, 1994, p. 21). If, however, the present socio-technological environment in Western societies is markedly different from those in the same geographical areas two hundred or four hundred years ago, it follows from the uniformitarian principle that a great deal of past linguistic reality will be filtered out and lost by reconstructions based on contemporary realities. 24

5. Conclusion

I have suggested that a bias towards linguistic issues in Western societies (see Tables 1 and 2) can lead to various types of distortions in the linguistic data. Hypotheses may be tested with a less than maximally diverse set of data (section 3), explanatory models which work in the West may misleadingly be assumed to be valid elsewhere (4.1), and universals of linguistic behavior may turn out to have less than universal scope (4.2).

Is this a problem? I think it is, though I would divide the 'yes' into two distinct aspects. On the one hand, one may say it is, but there probably isn't much that can be done about it. If the simple statistical tests in Tables 1 and 2 above were expanded to include related disciplines in the humanities like political science or sociology the same sort of Western bias would likely be found, if not worse. Nor are the causes difficult to detect. The journals cited here serve all linguists, including those in language departments. The demographics of academia simply dictate that much more will be written about English than about Kituba, because there are more people employed to write about English. This doubtlessly explains the heavy creolist bias towards languages with European lexifiers. While a PhD about Jamaican Creole can be gained in an English (or Linguistics) Department, one on Kituba would have to come from a much smaller range of institutions (Linguistics, Africa Department, those few that exist), and very likely the employment prospects of the latter (in the West) would be correspondingly poorer.


In this context a pecking order is discernible among languages which command academic attention. English is clearly at the top, followed by various European languages, Japanese and increasingly Chinese. Thereafter the pecking order continues to unfold among regionally dominant languages. In the LS statistics cited in Table 2, the Middle East languages written about are, predictably, Arabic, Hebrew and Iranian. The rule is simple: the smaller (a language can be small for various reasons) the language, the less likely it will be written about, and from the fewer thematic perspectives. The Middle Eastern languages Kurdish, modern Aramaic and Mehri may gain the attention of the odd grammar, but a quantitatively oriented sociolinguistic study is most unlikely, while politeness formulas if written about at all will be the task of the anthropologist, but then from an anthropological, not a linguistic perspective in all likelihood.

On the other hand, such considerations do not ameliorate the fact that linguistics, a discipline presumably committed to the study of general principles of language, and not to principles which emerge from a biased subset thereof, can hardly claim to be general when its own data base is seriously unrepresentative. I attempted to indicate from my own limited perspective some negative consequences of this situation, and certainly other linguists with more experience than myself with 'exotic' languages could amplify on them.

If there is a problem, I would not suggest that there is an obvious solution, or at least not one which linguists can arrive at. Quotas, for example, are unworkable. It is impossible to decide objectively whether it is more worthwhile to publish, say, about the language of doctor-patient interaction (in English) than about West African naming practices, and even if quotas were instituted on some arbitrary basis, there would be no guarantee that material on the 'right' subject would be forthcoming.

If this is so, however, one may ask what the point of the current article is. Simply this: all pessimism aside, it is worthwhile for linguists to remind themselves occasionally that general laws and principles in many cases are seriously constrained by a limited and unrepresentative comparative perspective. If there are no solutions for the most intractable of these problems, having a clear awareness of them may at least lead to an amelioration of some of them.

I began this article referring to Givón's criticisms of then-prevailing linguistic practices. Of course, Givón's own criticism already reflected a growing interest in the very topics which he claimed had not been allowed into the paradigm: pragmatics, text linguistics, semantics and so on. The nearly twenty years intervening since have seen these aspects of linguistics find their own niche in the discipline. Moreover, the development of the English-based generative paradigm has not always been detrimental to more exotic languages: when grammars on them are written the range of data addressed is often broader than what one would have found in the pre-Chomskyian mode. Might not the same happen with the sub-disciplines which I have considered?

In some cases the answer will eventually be positive. It is not in principle difficult to gather data on African creole languages and integrate the findings into the creole genesis debate. Sociolinguistic material on 'exotic' languages, unfortunately, is harder to gather. Whereas a grammar of a language can be written wherever a speaker of it is found, sociolinguistics usually has to be practiced in situ, and assumes a good knowledge both of the language and the society it is spoken in, which implies time and money. Given the limited institutional support which the smaller languages can expect, rapidly expanding the data base to include them will be difficult. Institutionalizing co-operation with local research centers may help matters, though very often the bulk of such support will have to come from the outside. Lacking such initiatives, however, general linguists will humbly have to recognize the limits to what their discipline can offer.


Acknowledgements--The author would like to thank Pierre Larcher, Kees Versteegh and an anonymous Language Sciences reader for their critical remarks on an earlier version of this paper. None, of course, are responsible for any shortcomings in the paper. The work on Nigerian Arabic was sponsored by the German Research Foundation (DFG, SFB 214).

NOTES

1. Givón was attacking in particular the inheritors of American structuralism, Chomskyism as it were, and the linguistic world in 1979 was certainly more than Chomskyism (cf. e.g. Halliday's series of three articles on transitivity and theme, Halliday 1968, 1969).

2. South America includes South and Central America and the Caribbean. A few general articles with no language or geographical focus (e.g. one on code switching by Scotton) were left out. Half points were given when the article dealt with two languages in widely different regions (e.g. Israeli Hebrew and American English). One article on Panjabi in London was left unclassified since it neither fit into the above schema (immigrant languages in the West, for example), nor merited a new category of its own. One article on English (included under European languages) is on South African English.

3. This of course is not to say that there aren't other interesting questions relating to them, though probably the only feature distinguishing all creole languages is their unusual origin, languages which develop from nothing, as it were, into full-fledged native languages in the period of three or four generations or less.

4. The statement that the non-European-based creoles developed in different socio-historical circumstances has to be taken with caution. Just as there are no large-scale studies covering the linguistics of all creole languages, there are also no comparative studies on their socio-historical emergence. Certainly all known creole languages developed after, most in fact in one way or another in response to, the era of European colonization. This statement applies as much to non-European-based creoles such as East African Nubi, Sango (Samarin, 1982) and Lingala as it does to the European-based ones.

5. True to form, in a recent compendium on pidgin and creole languages (Arends et al., 1995), where the structures of eight pidgin or creole languages are summarized (pp. 137-246), the only language referred to as a restructured x is Shaba Swahili ('the restructured variety of Swahili spoken on the copperbelt of Zaire's south-eastern Shaba province', p. 179). The other seven languages are referred to either as a pidgin (pidgin Eskimo) or a creole (Haitian = French lexifier, Saramaccan = English lexifier, Fa d'Ambu = Portuguese lexifier, Papiamentu = Portuguese/Spanish lexifier, Sranan = English lexifier, Berbice Dutch = Dutch lexifier).

6. Strictly speaking, Nubi has no contact with Arabic and forms no continuum with it. Juba Arabic, however, does coexist with Arabic in such a way; see Mahmud (1979).

7. I leave out of discussion the question whether or not Lubumbashi Swahili is a creole.

8. Knappert (1979, p. 158) reports that 56% of the words in Guthrie's Lingala dictionary are also found in Whitehead's Bobangi dictionary. Other languages he compared Lingala vocabulary with, e.g. Ngombe at 10% and Lomongo at 1.3%, have a far weaker affinity. I use Whitehead's term Bobangi for the language. The Bantu class name for the language would be either Kibangi or Lobangi. Miehe (1981) contains a thorough discussion of the etymology of the word Lingala and related terms (Mongala, Bangala, Ngala). A discussion of its genetic classification, following a traditional Bantuist approach, is also found there. Some scholars (e.g. Knappert) equate Kiyansi with Bobangi. Mufwene (1994, p. 68) reports that they are different languages, however.

9. An unanswered, and probably unanswerable, debate exists as to when the creolization set in, in particular whether it antedated the period of European colonization. I find Samarin's arguments most compelling (summarized in text), though others (e.g. Fehderau, 1966, p. 99 for Kituba, Mufwene, 1990 for Kituba and perhaps Lingala) would see at least the existence of a koine form of the source language (same sources as mentioned) as being the ultimate precursor of the creole. Both Kituba and Lingala are creoles today, particularly in urban settings, though large-scale creolization is apparently a post-independence phenomenon. Samarin (1991, p. 72) considers it possible that Lingala was creolized (i.e. a native language for some) in the colonial era.

10. Guthrie is not always clear as to which of his units are dialects of a 'cluster' and which are separate languages. It could of course be that he did not have adequate information to always make a clear decision.

11. Yanga's description is far less complete than Guthrie's. It would appear that Guthrie's Lingala would correspond to what Yanga terms the pidgin stage of Lingala (1980, p. 170), though there are differences. Guthrie, for instance, has no morphological N-Adj agreement whereas Yanga has number agreement for these categories.

12. Miehe (1981, p. 62) reports that an earlier Lingala (or Bangala) grammar by Stapleton (1914), which I have not seen, shows almost perfect agreement with Guthrie's grammar.

13. For instance, Bobangi has a category of possessor meaning 'belonging to' which has two members, either -lila" or -mbo.

14. The 12 distinctive nominal classes are not always represented by 12 distinctive inflections on their agreeing categories, though the number of distinctive inflections does not fall below 10.

15. Mufwene (1994, p. 69 ff.) takes a different tack. He observes that Bantu languages in the area provide potential models for certain 'creole-like' elements in Kituba/Lingala. For example, Mufwene compares the Kituba use of an independent subject pronoun to the Kimanyanga (Kikongo dialect) alternative of emphasizing a subject through repetition of an independent pronoun, e.g. bu n-tel-de mono 'this I-tell-perf I' 'This is what I personally said'. While observations such as these may explain the appearance of certain patterns in Kituba, they do not account for the disappearance of the rich morphological system in the creole.

16. Non-parametric chi-square tests comparing the totals in Tables 4 and 5 give significances of 0.028 (not counting the agreeing category) and 0.021 (counting it) between Lingala and Kituba. Differences between the non-creoles and creoles are all at the 0.0000 level.

Thomason and Kaufman (1988, p. 184) have suggested that a common typological substrate has led to a lesser degree of restructuring for Kituba than for many other PCs. Unfortunately, they do not suggest general parameters by which 'degree of restructuring' can be measured, so the question remains an undefined one. One aspect of the answer, however, has been defined here, namely the large degree to which Kituba has been simplified morphologically relative to its lexical source.

17. Of course it is difficult to read a detailed socio-historical interpretation into pure linguistic data. It could have been that the Bakongo encouraged a differentiated Kituba to keep foreigners at arm's length from their social and political life.

18. Samarin, perhaps rightly, downplays the role of Europeans in the creolization process in the early history of the Congo Free State. It is remarkable, however, that it is in former French-speaking colonies that the most important African creoles emerged (Lingala, Kituba, perhaps Lubumbashi Swahili in the Congo, Sango in the Central African Republic). Only East African Nubi, associated mainly with Arab-African contact, is an exception.

19. Indeed, it is only with the publication of Behnstedt's dialect map of Yemen, showing that etymological ? has been preserved in about 3-11 surveying points (most in the highland region, Behnstedt, 1985, pp. 135, 156, 158; the matter is complicated enough to require further discussion for an adequate summary), that the universality of the disappearance of etymological * ? in the dialects has been disproved.

20. Labov's attribution of linguistic changes to 'Islamic' schooling is off the mark. Variation with q is above all, as Mitchell correctly notes, characteristic of educated Arabic speakers. Whether greater use of q marks a certain 'Islamic' religious orientation (certainly a possibility) requires independent investigation.

21. A hypothetical exposition elucidates this. A situation can be imagined whereby fourteenth century English becomes accepted as Standard English (e.g. a group of language 'purifiers' manage to capture the educational system of English-speaking lands). 'Meat' and 'meet' thus receive their original distinctive pronunciations, meet/meet (Labov, 1994, p. 296). At the same time, in the spoken colloquial people continue to use the historically merged variety. If the fourteenth century variety never manages to displace the colloquial, it is doubtful whether grammarians of English (to the extent they would be allowed a free opinion) would speak of demerger.

22. In the urban sample illustrated below, no speaker above the age of 30 was born in Maiduguri itself.

23. In a recent study, Johnstone (1996) argues for a sociolinguistics which highlights the diversity of individual expression as opposed to group-orientated generalizations. The problem I focus on here is that in certain societies the nature of group-orientated generalizations may differ significantly from what has become familiar in the sociolinguistic literature on the basis of studies carried out in Western societies.

24. Labov (1994, p. 22) duly notes the shift of American society from rural to urban, with attendant consequences for linguistic change, though he ignores the problems which reconstructing linguistic developments in rural agrarian America (say pre-1990) on the basis of observations in urban late twentieth century America may bring. Rather, he turns necessity into a virtue, claiming that '... there is good reason to think that cities have always been the center of linguistic innovation, and that most rural dialects are relics of developments that began in the cities and spread to progressively smaller speech communities until they reached the countryside' (1994, p. 23). This may be so for the history of English in America, or it may not be. Whether this is and always has been the case for language generally is, I believe, doubtful. A more measured approach is to assume that linguistic innovation can occur anywhere and spread from anywhere to anywhere.

REFERENCES

Abdel-Jawad, H. (1981) Lexical and phonological variation in spoken Arabic in Amman. Ph.D. thesis, University of Pennsylvania.
Arends, J., Muysken, P. and Smith, N. (Eds) (1995) Pidgins and Creoles: An Introduction. Benjamins, Amsterdam.
Bani-Yasin, R. and Owens, J. (1987) The lexical basis of variation in Jordanian Arabic. Linguistics 25, 705-738.
Behnstedt, P. (1985) Die Jemenitischen Dialekte. Reichert, Wiesbaden.
Bickerton, D. (1981) The Roots of Language. Karoma, Ann Arbor, MI.
Chambers, J. (1995) Sociolinguistics. Cambridge University Press, Cambridge, UK.
Diem, W. (1976) Some glimpses at the rise and early development of the Arabic orthography. Orientalia 45, 251-61.
Fasold, R. (1990) The Sociolinguistics of Language. Blackwell, Oxford.
Fehderau, H. (1966) The origin and development of Kituba. University Microfilms, Ann Arbor, MI.
Givón, T. (1979) On Understanding Grammar. Academic Press, London.
Guthrie, M. (1935) Lingala Grammar and Dictionary. Conseil Protestant du Congo, Léopoldville-Ouest.
Guthrie, M. (1943) The Lingua Franca of the Middle Congo. Africa 14, 118-123.
Guthrie, M. (1953) The Bantu Languages of Western Equatorial Africa. Oxford University Press, London.
Hall, R. (1974) Pidgin and Creole Languages. Cornell University Press, Ithaca, NY.
Halliday, M. (1967-68) Notes on transitivity and theme in English. Journal of Linguistics 3, 37-81, 199-244; 4, 179-216.
Hancock, I. (Ed.) (1979) Readings in Creole Linguistics. Storia Scientia, Ghent.
Hochegger, H. (1981) Grammaire du Kikongo ya Leta. Ceba, Bandundu.
Holm, J. (1989) Pidgins and Creoles, Vol. 2. Cambridge University Press, Cambridge, UK.
Ibn Mujaahid. Kitaab al-Sabʕa fiy al-Qiraaʔaat. Shawqi Dayf (Ed.), Cairo, Dar al-Maʕaarif.
Jacquot, A. (1961) Sur la Situation du Sango à Bangui. Africa 31, 158-66.
Jacquot, A. (1982) Etude Descriptive de la Langue Laadi. Université de Lille, Lille.
Johnstone, B. (1996) The Linguistic Individual. Oxford University Press, Oxford.
Knappert, J. (1979) Origin and development of Lingala. In Hancock, I. (Ed.), pp. 153-164.
Labov, W. (1994) Principles of Linguistic Change: Internal Factors. Blackwell, Oxford.
Lefebvre, C. (1986) Relexification in Creole genesis revisited: the case of Haitian Creole. In Muysken, P. and Smith, N. (Eds), Substrate Versus Universals in Creole Genesis, pp. 279-300. Benjamins, Amsterdam.
Mahmud, U. (1979) Variation in the aspectual system of Juba Arabic. Ph.D. thesis, Georgetown University.
Miehe, G. (1981) Anmerkungen zum Begriff und zur Klassifikation des Lingala. In Jungraithmayr, H. (Ed.), Berliner Afrikanistische Vorträge, pp. 53-79. Reimer, Berlin.
Mitchell, T. (1986) What is educated spoken Arabic? International Journal of the Sociology of Language 61, 7-32.
Mougeon, R. and Beniak, É. (1991) Linguistic Consequences of Language Contact and Restriction. Oxford University Press, Oxford.
Mufwene, S. (1988) Formal evidence of pidginization/creolization in Kituba. Journal of African Languages and Linguistics 10, 33-52.
Mufwene, S. (1990) La Créolisation en Bantou: Les Cas du Kituba, du Lingala Urbain, et du Swahili du Shaba. MS.
Mufwene, S. (1994) Restructuring, feature selection, and markedness: from Kimanyanga to Kituba. In Moore, K., Peterson, D. and Wentum, C. (Eds), Proceedings of the Twentieth Annual Meeting of the Berkeley Linguistics Society, pp. 67-89. Berkeley Linguistics Society, Berkeley, CA.
Mufwene, S. (1996) The founder principle in Creole genesis. Diachronica 13, 83-134.
Mufwene, S. (1997) Kitúba. In Thomason, S. (Ed.), Contact Languages: A Wider Perspective, pp. 173-208. Benjamins, Amsterdam.
Odhner, J. (1981) English-Lingala Manual. University Press of America, Washington, DC.
Owens, J. (1990) East African Nubi: bioprogram vs. inheritance. Diachronica 7, 217-250.
Owens, J. (1991) Nubi, genetic linguistics and language classification. Anthropological Linguistics 33, 1-30.
Owens, J. (1995) Minority languages and urban norms: a case study. Linguistics 33, 305-358.
Rabin, C. (1951) Ancient West Arabian. Taylor's Foreign Press, London.
Rottland, F. (1979) Free variation in the Concord system of written Lingala. In Hancock, I. (Ed.), pp. 165-174.
Sallam, A. (1980) Phonological variation in educated spoken Arabic. Bulletin of the School of Oriental and African Studies 43, 77-100.
Samarin, W. (1982) Goals, role and language skills in colonizing Central Equatorial Africa. Anthropological Linguistics 24, 412-422.
Samarin, W. (1991) The origins of Kituba and Lingala. Journal of African Languages and Linguistics 12, 47-77.
Swift, L. and Zola, E. (1963) Kituba. Foreign Service Institute, Washington.
Sylvain, S. (1936) Le Créole Haitien: Morphologie et Syntaxe. Wetteren, Port-au-Prince.
Thelander, M. (1982) A qualitative approach to the quantitative data of speech variation. In Romaine, S. (Ed.), Sociolinguistic Variation in Speech Communities, pp. 65-83. Arnold, London.
Thomason, S. and Kaufman, T. (1988) Language Contact, Creolization, and Genetic Linguistics. University of California Press, Berkeley, CA.
Whitehead, J. (1899) Grammar and Dictionary of the Bobangi Language. London.
Yanga, T. (1980) A sociolinguistic identification of Lingala. Ph.D. thesis, University of Texas.