crossroads corpus manual · crossroads area; two from djibonker, one from brin, one from essil, and...


Crossroads Corpus Manual

Abbie Hantgan-Sonko

March 2, 2017

Contents

1 Workflow
  1.1 Outline of Workflow
  1.2 Timeline

2 Overview
  2.1 Proportionality

3 File Incorporation
  3.1 New Recording Sessions
  3.2 Existing Recording Sessions
  3.3 Corpus Ready

4 Searches
  4.1 Variables

5 Results
  5.1 Vignettes
  5.2 Charts

6 Sample Design
  6.1 Sampling Based on Metadata
  6.2 Sampling Based on (Transcribed) Data

7 Interlinearisation
  7.1 Monolingual, Single Participant Text
  7.2 Multilingual, Multi-participant Text
  7.3 Pre-existing Data

8 How to
  8.1 Export Arbil
  8.2 LAMUS Upload
  8.3 Number of Hours of Media

9 External Links

10 Troubleshooting
  10.1 FLEx

A List of .xlsx files in Crossroads Corpus

B List of .spss files in Crossroads Corpus

References


Chapter 1

Workflow

1.1 Outline of Workflow

Collect Backup Metadata Copy Analyse Archive

1. Record following either Elicitation methods or SNA Manual
2. Convert files using Data Management Workflow and associated Fiches Techniques
3. Back up original files to your external hard drive using Free File Sync
4. Complete metadata in Arbil according to SNA Manual
5. Prepare Elan files using Segment and Splice and give working copy to transcribers
6. Bring back transcribed Elan files on dedicated transport hard drive
7. Transfer all files to Q-drive and then yours to Y-drive according to Data Management Workflow
8. Analyse data according to your area
9. Update Arbil and export to Y-drive according to Ideal Arbil Workflow
10. Upload to LAMUS

1.2 Timeline

Every two months with alerts in calendar!


Chapter 2

Overview

The corpus includes data gathered during fieldwork conducted by the Crossroads project’s six researchers since December 2014. Additionally incorporated into these data are those gathered by two post-doctoral researchers during fieldwork for their Ph.D. theses in the area since 2008. The corpus comprises 516 recording sessions, encompassing just over 100 hours (101:39:17 of audio and 47:00:03 of accompanying video).

The working corpus is part of a larger body of data, which includes not only the data gathered during the current project’s focus on the three villages, but also data collected by the project’s principal investigator since 2008 from a different area of Casamance, which are to be used as control data for comparison. Also included in the larger corpus are data collected by a resident of The Kingdom for his thesis and archived at ELAR (Sagna, 2008). The data within the larger corpus in their entirety add up to 306 hours of audio recordings (.wav format) and 119 hours of video (.mp4 format), 134 hours of which (52 hours of video and 82 hours of audio) have been transcribed, representing a total of 621 transcribed and translated Elan files.

All of the recordings in the corpus have been transcribed by a team of five residents of the crossroads area: two from Djibonker, one from Brin, one from Essil, and one who grew up in Essil but currently lives in Djibonker. In addition to providing meticulous transcriptions, time-aligned by utterance, the transcribers also annotate each utterance with an identification of language and participant. This is particularly essential for multilingual, naturally occurring conversations; see §2 below.

Organisation

The team of trained crossroads-based transcribers transcribe, translate, and tag speaker and language information for the sound and/or video files produced during fieldwork by the project researchers using ELAN. Upon transcription completion, the files are transferred to the project’s home university for the corpus manager and assistants to integrate into the corpus structure.

The corpus structure is organised by genre, as depicted in §2 below, and then by session. Each recording session in the corpus consists of a folder labelled with a mnemonic that indicates the location in which the recording took place, the date, and the researcher’s initials. Each folder consists minimally of a sound file and its transcription (ideally with a video file as well), together with a metadata file which depicts the language(s) spoken in the recording and the participants in the recording, their age, gender, reported languages, and ethnicity/village.
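The minimal contents of a session folder described above can be checked mechanically. The following is a minimal sketch in Python, assuming that the files of one session sit together in a single folder and are identified by their extensions (.wav, .eaf, .imdi, and optionally .mp4). The function names and the example file names are hypothetical and are not part of the project's tooling.

```python
from pathlib import Path

# Extensions a session needs at minimum, per the description above.
REQUIRED = {".wav": "sound file", ".eaf": "transcription", ".imdi": "metadata file"}


def missing_components(filenames):
    """Return the labels of required components absent from a list of file names."""
    present = {Path(name).suffix.lower() for name in filenames}
    return [label for ext, label in REQUIRED.items() if ext not in present]


def check_session_folder(folder):
    """List what a session folder on disk is missing (empty list means complete)."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return missing_components(names)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(missing_components(["SESSION.wav", "SESSION.mp4"]))
```

A video file is treated as optional here because the text describes it as ideal rather than required.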

Elan files constitute a bundle within themselves: the transcribed file itself is linked to media files (audio and/or visual), and a metadata file. Due to restrictions on what may be linked to Elan and uploaded to the remote server, see §2, all video files are converted to .mp4 and audio files to .wav. Metadata files are created in Arbil, a program developed by the Max Planck Institute for Psycholinguistics to create metadata files. With the assistance of three research assistants, the corpus manager links the metadata files directly to the transcribed Elan file for ease of information retrieval.

Certain Elan files have been exported for interlinearisation and incorporation into the 5723 lexical-entry multilingual dictionary, compiled using the project-team databases, by the author, using the SIL lexical database program, Fieldworks, otherwise known as FLEx. These files, once re-imported and time-aligned, can be searched at the level of the morpheme or part of speech.

Location

The corpus is housed on a university server which is accessible to each project researcher. A remote copy of the corpus is located on the SOAS Endangered Language Archive (ELAR) LAT server. Paralleling the local corpus copy’s non-hierarchical folder structure, each researcher has his or her own node in which is found the recording (audio and/or video), transcription, and metadata files.

Partition

In order to facilitate the use of the corpus, sessions are partitioned into one of four communicative events (cf. Himmelmann, 1998). The content of each partition is described in the following subsections.

Observed Communicative Events

Observed communicative events in the corpus include the Social Network Study and Sociolinguistic Study of Multilingualism.

1. Social Network Study
In this study, ongoing since November 2016, two participants from each of the three villages are recorded for an entire day as they go about their normal activities and interactions, using a portable digital recorder. The researcher who initiated the recording then debriefs the social network study participant, asking with whom s/he interacted during the day and whether any sections of the recording should be omitted for ethical or confidentiality reasons. These recordings are spliced into ten-minute cuts, which are then given to the transcriber best suited for the languages included in the recording.

2. Sociolinguistic Study of Multilingualism
Two project members who are currently conducting research for their Ph.D. theses are also focusing on recordings gathered in naturalistic settings such as within the household, outside of local shops, and during work in the rice fields and house construction. These recordings are also often accompanied by video recordings, which are beneficial in gaining knowledge of turn-taking and other subtleties of multilingual conversations not captured by audio content alone.
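The splicing of day-long social network recordings into ten-minute cuts can be illustrated with a short calculation of cut boundaries. This is only a sketch of the arithmetic, assuming a fixed cut length of 600 seconds with a shorter final cut; it does not perform any audio processing and is not the Segment and Splice procedure itself. The function name is hypothetical.

```python
def ten_minute_cuts(total_seconds, cut_seconds=600):
    """Return (start, end) offsets in seconds that cover a recording.

    Every cut is cut_seconds long except possibly the last one,
    which holds whatever remains.
    """
    if total_seconds < 0 or cut_seconds <= 0:
        raise ValueError("durations must be non-negative and cut length positive")
    cuts = []
    start = 0
    while start < total_seconds:
        end = min(start + cut_seconds, total_seconds)
        cuts.append((start, end))
        start = end
    return cuts


if __name__ == "__main__":
    # A 25-minute recording yields two full cuts and one 5-minute remainder.
    print(ten_minute_cuts(25 * 60))
```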

Staged Communicative Events

Staged communicative events are subdivided into narratives and experiments.

1. Narratives

• DOBES Project on Bainounk language and culture

One of the project’s post-doctoral researchers and the project principal investigator contributed to a study of Bainounk language and culture; those recordings which were made in Djibonker are included in the corpus.

2. Experiments

• Spatial language and cognition beyond Mesoamerica

Another of the project’s post-doctoral researchers performed tests with participants primarily based in Brin to assess linguistic specifications of spatial relations among objects.

• Deictic gestures and spatial language

The third of the Ph.D. students on the current project is gathering data through experimentation on gesture.

• Pear Stories

The Pear Story (The Pear Stories: Cognitive, Cultural, and Linguistic Aspects of Narrative Production, 1980) was used by one of the project’s post-doctoral researchers as a means by which to determine the frequency of argument ellipsis among the three Crossroads languages.

Interview

Interviews are included in the corpus because, although many of them were collected in French, the former colonial language of Senegal, many were also collected using languages indigenous to the area and thus provide examples of a genre not found among other types of collected data. Further, the information collected during the interviews is included among the participants’ metadata, discussed in §2 above.

1. Social Network Study Interviews
As noted above, six participants (two from each village) are primarily involved in the social network study. Interviews were conducted with these six participants as well as two levels of their social networks.

2. Sociolinguistic Interviews
Sociolinguistic questionnaires were gathered by the project Ph.D. students for their sociolinguistic studies and by a member of the transcriber team who is currently working towards his Master’s at the University of Ziguinchor.

Elicitation

Elicitation, while not being a natural form of discourse, is included in the corpus as the data serve as a means by which the researcher may investigate aspects of a language’s grammar, for instance the phonemic inventory as depicted in §4.1.

1. Lexical Elicitation
Another component of the current project is the obtaining of a comparative word-list for the area. The word-list currently contains over 1300 items from each of the three Crossroads languages. Lexical items with the same meaning and which were phonetically similar over the three languages were also targeted by the author for a study of pronunciation, specifically of that which constitutes a ‘foreign accent’ (Hantgan, 2016).

2. Grammatical Elicitation
Two of the current project members also studied in the area for their Ph.D. research; their fieldwork data, which were gathered towards completing reference grammars for two of the crossroads languages, are included in the corpus.

2.1 Proportionality

The data which are housed in the corpus were neither gathered nor compiled with any intention of being proportionate in terms of demographics, language, or genre. Following Gries & Berez (to appear), Chapter 6 takes the disproportionality outlined here into account through the creation of samples. The descriptive statistics presented in this chapter were obtained on or before March 1, 2017.

Representation

There are 211 participants (83 females and 128 males) and 20 languages represented in the corpus. The participants’ individual proportionality is illustrated in Figure 2.1 and that of language in Figure 2.2.

Figure 2.1: Percentage of Participant Duration

We can see from this figure that certain participants such as LM and CB3 have a higher proportional duration in the corpus.

Figure 2.2: Percentage of Language Duration

The nearly equal duration of the three crossroads-area languages in the corpus was unintentional, but it speaks to the make-up of the data which have been gathered thus far.


Classification

As shown in §2, the corpus is partitioned according to one of four speech genres. The number of transcribed Elan files and the duration of media files (hrs:min:sec format) are summarised in Table 2.1.

Amount         Observed   Staged     Interview  Elicitation
Transcription  110        331        27         48
Audio          16:55:58   31:23:04   04:46:01   05:21:35
Video          05:14:59   31:58:17   01:55:07   02:26:41

Table 2.1: Corpus Contexts
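Row totals for Table 2.1 can be derived by summing the hrs:min:sec values across the four genres. The sketch below does this in Python with the figures copied from the table; the helper names are hypothetical, and the script is offered as an illustration of the arithmetic rather than as the method used to produce the statistics in this chapter.

```python
from datetime import timedelta

# Values copied from Table 2.1 (Observed, Staged, Interview, Elicitation).
TABLE_2_1 = {
    "Transcription": [110, 331, 27, 48],
    "Audio": ["16:55:58", "31:23:04", "04:46:01", "05:21:35"],
    "Video": ["5:14:59", "31:58:17", "01:55:07", "02:26:41"],
}


def parse_hms(text):
    """Convert an hrs:min:sec string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta as hrs:min:sec, letting hours exceed 24."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def total_duration(values):
    """Sum a list of hrs:min:sec strings and return the formatted total."""
    return format_hms(sum((parse_hms(v) for v in values), timedelta()))


if __name__ == "__main__":
    print("Transcribed files:", sum(TABLE_2_1["Transcription"]))
    print("Audio total:", total_duration(TABLE_2_1["Audio"]))
    print("Video total:", total_duration(TABLE_2_1["Video"]))
```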

The largest proportion of data is classified as belonging to the genre of staged communicative events. This is important to note because, as shown in Figures 2.3 and 2.4, the number of languages represented greatly increases with the naturalness of speech genre. This observation is explored in depth in the next section, §2.1.

Figure 2.3: Proportionality of Languages in Observed Communicative Events

Figure 2.4: Proportionality of Languages in Staged Communicative Events


Contextualisation

By tagging the corpus for instances of inter- and intra-utterance language changes, we can count the number of times a speaker changed languages within a speech genre. Figures 2.5 and 2.6 illustrate that single-language utterances are less common among observed than among staged communicative events.

Figure 2.5: Proportionality of Language Switching in Observed Communicative Events

Figure 2.6: Proportionality of Language Switching in Staged Communicative Events

While we see that one language remained pervasive through each type of speech event, there were fewer language changes in the staged communicative genre than in the observed communicative genre. Two main reasons account for this disparity. First, staged communicative events were those for which a researcher asked a consultant to speak in a particular language; this also accounts for the lower number of languages, as these five (Joola Kujireray, Joola Banjal, Bainounk Gubeeher, Wolof, and French) were either the target or the source language of these events. Secondly, as is stated by Green and Abutalebi (2013), natural contexts in which multilingual speakers are present are those in which language changing is expected to occur. Further, as noted in §2.1 above, participants in the corpus are not equally represented. To account for this factor, just those participants who were found in both speech contexts were sampled for their language distributions. Four participants were found to be active in both speech genres; their durations are shown in Table 2.2.

PAR   OBSV  STAG
LM    40    12
GS    8     5
HPS   4     2
JHS   12    1

Table 2.2: Duration of Transcribed Utterances (minutes)
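The imbalance between genres for the four sampled participants can be quantified directly from Table 2.2. The following Python sketch computes, for each participant, the share of transcribed minutes that falls in observed communicative events. The values are copied from the table; the function name is hypothetical and the calculation is only an illustration.

```python
# Minutes of transcribed utterances per participant: (observed, staged),
# copied from Table 2.2.
TABLE_2_2 = {"LM": (40, 12), "GS": (8, 5), "HPS": (4, 2), "JHS": (12, 1)}


def observed_share(observed, staged):
    """Percentage of a participant's transcribed minutes that are observed events."""
    return 100 * observed / (observed + staged)


if __name__ == "__main__":
    for code, (observed, staged) in TABLE_2_2.items():
        print(f"{code}: {observed_share(observed, staged):.1f}% observed")
    total_observed = sum(obs for obs, _ in TABLE_2_2.values())
    total_staged = sum(stag for _, stag in TABLE_2_2.values())
    print(f"All four: {total_observed} min observed, {total_staged} min staged")
```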

While the amount of time spent speaking in each genre was not equal, we can still see from the percentages shown in Figures 2.7 and 2.8 that single-language usage eclipses multiple-language usage in staged communicative events.

Figure 2.7: Proportionality of Language Switching in Observed Communicative Event Sample

Figure 2.8: Proportionality of Language Switching in Staged Communicative Event Sample


Thus, we see that, although many languages are represented in the corpus, active multilingualism increases in proportion to the naturalness of the communicative event. However, note that the proportion of single language usage is still higher than that of multiple language usage even among observed communicative events.

Collection

A component of the data gathered for the social network study is that of participants’ reported speech repertoires. Focusing here on those languages which are associated with the Crossroads area, we see a surprisingly low number of participants (29 out of 113) who claim proficiency in all three languages. Although speakers’ reported proficiency may not be a reliable diagnostic of speech usage, it is still worth acknowledging in terms of speech accommodation patterns.

Figure 2.9: Speaker Reported Repertoires from Social Network Study

As noted in §2.1, a relatively even amount of data from each of the three target languages was, unintentionally, included in the overall corpus. As a comparison of actual speech usage, a sample was created which includes about ten minutes of transcribed, naturally occurring speech events with at least five participants, from the three villages. As can be viewed in Figure 2.10, the relative proportionality differs significantly from that shown in Figure 2.2 above.

Figure 2.10: Language Usage in Observed Communicative Events Sample


We see from this graph that Joola Banjal is spoken with a disproportionate duration relative to the other two languages in the sample, contradicting the trend illustrated in Figure 2.9 of reported language proficiency in Joola Kujireray being higher than in the other two languages.


Chapter 3

File Incorporation

At this stage in the corpus’ organisation, both existing and new files need to be modified and incorporated into the corpus structure. The procedure for incorporating newly recorded sessions into the corpus is described in §3.1 and that of existing sessions in §3.2.

3.1 New Recording Sessions

To incorporate newly acquired transcribed sessions into the corpus, the procedure is outlined as follows. Crossroads researchers are encouraged to follow steps outlined in (1) and the corpus manager in (2).

1. Crossroads Researchers

a) Save newly transcribed .eaf files to Q:\Shared_Resources\Files_for_Transfer

Newly transcribed .eaf files which are brought back with a researcher returning from the field and those sent by Dodo will continue to be saved in the Files_for_Transfer folder in the Q-drive.

b) Notify all researchers of transfer

The person who transfers the files will notify all researchers that there are newly transcribed files which need to be moved by each researcher to his/her folder on the Y-drive.

c) Each researcher prepares his/her files for corpus inclusion.

Following guidelines in §3.3, each researcher prepares his/her files to be included in the corpus.

d) Make copy of transcribed (.eaf), metadata (.imdi), and media (.wav and .mp4) files in Q:\Shared_Resources\Files_for_Corpus

Each researcher saves a copy of their session files in the Files_for_Corpus folder. Please follow steps outlined in §8.

e) Notify corpus manager upon completion of this process


2. Corpus Manager

a) Move both media and metadata files to corresponding folders in Q-drive

Please move (rather than copy) the first .eaf’s corresponding .wav to the Q:\Crossroads_Corpus\WAV_MP4\WAV folder, the .mp4 to the Q:\Crossroads_Corpus\WAV_MP4\MP4 folder, and the .imdi file to the Q:\Crossroads_Corpus\ARBIL folder.

(Right click the file, select cut, right click the destination folder, select paste. Repeat for each type of accompanying file except the .eaf itself.)

b) Open first .eaf file in Q:\Shared_Resources\Files_for_Corpus folder

c) When prompted select media files from designated folders

For .mp4 files, navigate to the Q:\Crossroads_Corpus\WAV_MP4\MP4 folder and paste in the name of the file. Click open and the video file will be linked to the Elan file. Repeat the process for the audio file (navigate to the Q:\Crossroads_Corpus\WAV_MP4\WAV folder).

d) Ensure that file is corpus-ready (see §3.3)

Once the Elan file is opened and the media files are linked, look over the file to ensure that all necessary components are complete. If not, note them in the Corpus_Upkeep file or resolve them following the procedures outlined below in §3.3.

e) Link metadata (.imdi) file

f) Save in designated folder based on genre in metadata

Save the file to the folder for the genre listed in the metadata file. For example, the file pictured above is listed as being in the genre of Observed Communicative Event so it would be saved in Q:\Crossroads_Corpus\ELAN\Obsv. The sub-genre is listed as being Language Use Collection data so it is saved within the KGB sub-folder.

g) Note any missing information in Corpus Upkeep

Upon completion of the verification of the files to be incorporated into the corpus, the corpus manager will inform the research assistant, who will then make the necessary changes.
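Step (2a) above, moving each accompanying file to its corpus folder while leaving the .eaf in place, could be scripted instead of being done by cut and paste. The following is a minimal Python sketch under that assumption. The folder paths are the ones named in this section; the function names are hypothetical, and the script has not been tested against the actual Q-drive.

```python
import shutil
from pathlib import Path, PureWindowsPath

# Destination folders as named in step (2a).
CORPUS_ROOT = PureWindowsPath(r"Q:\Crossroads_Corpus")
DESTINATIONS = {
    ".wav": CORPUS_ROOT / "WAV_MP4" / "WAV",
    ".mp4": CORPUS_ROOT / "WAV_MP4" / "MP4",
    ".imdi": CORPUS_ROOT / "ARBIL",
}


def destination_for(filename):
    """Return the corpus folder for an accompanying file, or None (e.g. for .eaf)."""
    return DESTINATIONS.get(PureWindowsPath(filename).suffix.lower())


def move_accompanying_files(source_folder):
    """Move media and metadata files out of source_folder; .eaf files stay put."""
    for path in Path(source_folder).iterdir():
        target = destination_for(path.name)
        if path.is_file() and target is not None:
            shutil.move(str(path), str(target / path.name))


if __name__ == "__main__":
    # Intended to be run on the machine where the Q-drive is mounted.
    move_accompanying_files(r"Q:\Shared_Resources\Files_for_Corpus")
```

Because the files are moved rather than copied, it may be prudent to confirm that the researcher's own copy on the Y-drive exists before running anything of this kind.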

3.2 Existing Recording Sessions

Some of the existing transcribed files are missing information such as metadata. The following procedure outlines how to incorporate such changes into existing Elan files.

1. Metadata

If there is no metadata file for a transcribed audio file, notify the researcher so that s/he can create or amend his/her Arbil. Once the file has been created, the researcher will then export his/her Arbil, either the entire database or those specific files, to the researcher’s Metadata folder located in the Q-drive (Q:\Shared_Resources\Metadata\). Once the researcher has saved the new Arbil files to his/her sub-folder in the Metadata folder, you may incorporate them into the corpus following the procedure outlined here:

a) Locate all newly exported .imdi files in researcher’s Metadata folder

b) Paste a copy of all newly exported .imdi files to Q:\Crossroads_Corpus\ARBIL folder


c) Sort Corpus Upkeep by .IMDI column

d) In Elan, open first file listed as not having an .IMDI file


e) Select .imdi file located in Q:\Crossroads_Corpus\ARBIL folder

f) Adjust participants accordingly


Using the newly incorporated metadata as a guide, adjust tier prefixes to match those of the participant(s)’ IDs accordingly.
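Finding the Elan files that still lack a metadata file, which the steps above do by sorting the Corpus Upkeep file, can also be sketched as a comparison of file names. The Python sketch below assumes that an .imdi file carries the same base name as its .eaf file, as suggested by the linking instructions in §3.3. The function names are hypothetical.

```python
from pathlib import Path


def eafs_missing_imdi(eaf_names, imdi_names):
    """Return the .eaf file names that have no .imdi file with the same base name."""
    imdi_stems = {Path(name).stem.lower() for name in imdi_names}
    return sorted(name for name in eaf_names if Path(name).stem.lower() not in imdi_stems)


def scan(elan_root, arbil_folder):
    """Compare .eaf files under elan_root with .imdi files in arbil_folder."""
    eafs = [p.name for p in Path(elan_root).rglob("*.eaf")]
    imdis = [p.name for p in Path(arbil_folder).glob("*.imdi")]
    return eafs_missing_imdi(eafs, imdis)


if __name__ == "__main__":
    # Folder names taken from this chapter.
    for name in scan(r"Q:\Crossroads_Corpus\ELAN", r"Q:\Crossroads_Corpus\ARBIL"):
        print(name)
```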

3.3 Corpus Ready

Once the steps outlined in §3.1 have been followed, a file can be made Corpus-Ready. This section lists the necessary components of a Corpus-Ready file and how to adjust the file if necessary. The first step is to link the metadata directly into the Elan file.

1. Link Metadata

a) Open first file in Q:\Shared_Resources\Files_for_Corpus folder

b) Link media files when prompted as shown in (2c) above

(Ensure that the media and metadata files have been moved to their corresponding corpus folders following step (2a) above before opening the Elan file.)

c) Link metadata file

Click on the metadata tab on the right side of the Elan screen, then on Select Metadata Source... Navigate to the Q:\Crossroads_Corpus\ARBIL folder (Elan will automatically navigate to this folder after the initial file), and paste the name of the Elan file into the search field. Simply add the extension .imdi and press RETURN or click Select and the file will appear.

d) Click Apply for the Arbil file to be linked to the Elan file.


e) Click configure, select all the options (CTRL + A), and click Apply

f) Right click anywhere in the metadata field and select Hide Empty Metadata fields and Tree View


Now you have the information needed to complete the next corpus-readying steps (actor(s), language, and genre).

2. Check participant codes match tier prefixes

a) Ensure participant code in metadata matches tier prefixes

For those files in which the transcribers have indicated the participants, make sure these are the same as those indicated in the metadata and tier structure as well.

b) If incorrect, change via Tier > Change Tier Attributes


3. Annotate for language

a) Create Language note tier if not already present

b) Tier > Add New Tier...

The tier name is the participant code, underscore Language, hyphen note. The parent tier is the Transcription tier and the Tier Type is Note.

c) Create a copy of the annotations from the Transcription tier on the Language tier


d) Tier > Create Annotations on Dependent Tiers...

First, select Next, then uncheck the Word tier and check the Language note tier. Click finish. The newly created Language note tier will appear with empty annotations corresponding to the transcribed utterances in the Transcription tier.

e) Populate empty annotations with two letter language abbreviation

f) Search > Find and Replace (or CTRL + F)

Search for all the empty annotations on the Language note tier. Replace all these empty annotations with the two letter language abbreviation of the language that was listed in the metadata for the file.


4. Save in Crossroads Corpus genre-specific folder

a) File > Save as...

Navigate to the folder which corresponds to the genre listed in the metadata and save the file there.
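The tier naming convention from step 3 (participant code, underscore Language, hyphen note) lends itself to an automated check before a file is saved to the corpus. The sketch below reads an .eaf file as XML and reports participants without a Language note tier. It relies on the fact that .eaf files store each tier as a TIER element with a TIER_ID attribute; the function names are hypothetical, and the transcription tier names in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET


def language_tier_name(participant_code):
    """Build the Language note tier name: code, underscore Language, hyphen note."""
    return f"{participant_code}_Language-note"


def tiers_missing_language_note(eaf_xml, participant_codes):
    """Return participant codes whose Language note tier is absent from the EAF XML."""
    root = ET.fromstring(eaf_xml)
    tier_ids = {tier.get("TIER_ID") for tier in root.iter("TIER")}
    return [code for code in participant_codes if language_tier_name(code) not in tier_ids]


if __name__ == "__main__":
    # Minimal stand-in for an .eaf file; real files contain many more elements.
    sample = (
        "<ANNOTATION_DOCUMENT>"
        '<TIER TIER_ID="LM_Transcription"/>'
        '<TIER TIER_ID="LM_Language-note"/>'
        '<TIER TIER_ID="GS_Transcription"/>'
        "</ANNOTATION_DOCUMENT>"
    )
    print(tiers_missing_language_note(sample, ["LM", "GS"]))
```

To check a file on disk, its contents can be read as text and passed to the same function.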

Read the next chapter to learn how to use your newly ingested corpus files which are ready to be searched for comparison and analysis.


Chapter 4

Searches

I have compiled the Elan corpus in such a way as to facilitate searches. This chapter provides instruction on how to search for a linguistic variable (§4.1).

For the time being, the Crossroads corpus is yet to be interlinearised or tagged for part of speech. For interlinearised texts (see §7), searches for syntactic categories or morphemes may also be performed using the steps that follow.

4.1 Variables

This section describes a means by which one can search for a given variable such as a specific lexicalitem (even with inconsistent spellings), or a phoneme in an Elan-based corpus.

1. Elan

In this example, we will use Elan to search the Crossroads corpus for the lexical item [-kkan] ‘put’. This lexeme appears in all three main Crossroads languages, but is pronounced (and spelled) variably with a geminate consonant and a short vowel [-kkan], or with a singleton consonant and a long vowel [-kaan]. By using regular expressions, we can include results which reflect this variation.

a) Open Elan
(it is not necessary to open an Elan file unless you wish to search only that file)

b) Search > Search Multiple eaf...

c) Define Search Domain

The first task is to create a domain; usually (and for this example), this is the Crossroads corpus of Elan files located in the Q-drive. To create a domain, first click Define domain, which then reveals the option to Specify a new domain. Click New Domain... and then navigate to and select the folder(s) which contain the Elan file(s) you wish to search.

d) Click OK > Name the domain (for example Crossroads Corpus) > OK

Once the search dialogue reappears, you are ready to perform the search. For this example, you will use the regular expression exactly as it appears in the picture below.

e) Tick regular expression


f) Click search

g) Right click the table to remove any unwanted columns.


Results are displayed as a table which includes adjacent annotations as well as the target, in bold, listed within the annotation in which it occurs. Click on any of the results to listen to the utterance.

h) Click Export to export search results to .csv file
(This export produces all columns)

or

i) Right click search results table > Export Table As Tab-delimited Text...
(This export only produces displayed columns)

2. Excel

As an additional step, the tokens found as a result of the search can be imported or opened in Excel (or other data processing programs).

a) Open Excel
(simply open the program, do not open the created file yet)

b) File > Open > .txt or .csv file

c) Select Unicode (UTF-8) encoding


d) Leave everything else as is, just click Next until the end, then click Finish

The results may now be sorted, for example as I have done in the .txt file pictured on the right, according to whether they have been transcribed with a geminate consonant or a long vowel. I can then return to Elan to listen to these instances by locating them by number in the original output of my search in Elan, as indicated in the picture on the right.

When we return to compare the results in Elan, we can, as an additional step, compare the associated metadata for the chosen files.


In this example in particular, we learn that the difference in pronunciation between a geminate and a long vowel is due to the language of the speaker; the participant who is speaking in Bandial uses the geminate variant while the participant speaking Kujireray uses a long vowel.
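The same kind of search and sorting can be approximated outside Elan. The sketch below is only an illustration: the regular expression shown in the manual's screenshot is not reproduced here, so the pattern is an assumption that covers the two spellings named above, [-kkan] and [-kaan]. The function names are hypothetical, and the code assumes that .eaf files are standard Elan XML with TIER elements containing ANNOTATION_VALUE elements.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed pattern: one or two k, one or two a, then n. It matches kkan and
# kaan, and also kan and kkaan, which may or may not be wanted.
PUT = re.compile(r"k{1,2}a{1,2}n")


def classify(token):
    """Label a transcribed token by how the 'put' root is spelled."""
    match = PUT.search(token)
    if match is None:
        return None
    form = match.group(0)
    if form.startswith("kk"):
        return "geminate"
    if "aa" in form:
        return "long vowel"
    return "other"


def search_domain(folder, pattern=PUT):
    """Yield (file name, tier name, annotation text) for each match.

    The folder plays the role of the search domain and is searched
    recursively for .eaf files.
    """
    for eaf in sorted(Path(folder).rglob("*.eaf")):
        root = ET.parse(eaf).getroot()
        for tier in root.iter("TIER"):
            for value in tier.iter("ANNOTATION_VALUE"):
                text = value.text or ""
                if pattern.search(text):
                    yield eaf.name, tier.get("TIER_ID"), text
```

With this pattern, classify("kkan") gives "geminate" and classify("kaan") gives "long vowel", so the hits returned by search_domain can be grouped by variant before returning to Elan to listen to them.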

An additional optional step is to export specific Elan files which contain examples of particular interest to FLExTexts and then as texts into the multilingual FLEx Gujireray database project for interlinearisation and further analysis, particularly that of word-internal, morpheme-by-morpheme language interchanges. Skip to §7 to learn how to export and reimport to/from FLEx.


Chapter 5

Results

5.1 Vignettes

The vignette is a convenient way to obtain descriptive statistics for languages tagged in an Elan file, such as the proportionality of one language versus another in a session or conversation. The following pie chart provides an example to illustrate the proportionality of languages spoken in one of the natural spoken language recordings.

Figure 5.1: Proportionality of Languages in Session BRI080416RWRD

This section will teach you how to create such a chart using information exported from Elan and then imported into either Excel or SPSS. You will need an Elan file for which the languages (and ideally the participants) have been identified.

1. Elan

a) Open .eaf file

b) File > Export as > Tab Delimited Text...

c) Select Transcription, Translation, and Language note tiers
(for each participant in the file)


d) Check that duration is also selected

e) Save as...
(I recommend saving as temp.txt on the desktop)

You have now exported the relevant information from Elan (the transcription and translation, and the language and duration of each utterance) for each participant in the recorded session. The reason we export from Elan is in order to view the information in a list, or tabular, format. However, the export from Elan requires some clean up before the results are viewable, particularly if there are multiple participants in the file.

2. Excel

a) Open Excel program
(simply open the program, do not open the created file yet)

b) File > Open > .txt file

c) Select Unicode (UTF-8) encoding


d) Leave everything else as is, just click Next until the end, then click Finish

Now you should have one column which has the duration for each participant’s utterances, followed by each participant’s tiers in a separate column for transcription, translation, and language. The utterances, however, are spanned across separate rows. We want to conflate these rows into one single column. But first, we have to indicate which rows belong to which participants.

e) Select first participant’s Transcription tier

f) Ctrl + C
(copy first participant’s transcription column)

g) Home tab > Insert
(make copy of first participant’s transcription in adjacent column)

h) Ctrl + H > Replace with first participant’s ID
(replace first participant’s utterances with their ID code)


i) Repeat for each participant

Now that we have created a separate column which identifies the speaker of each utterance in the file, we can conflate the utterances into one column.

j) Select second participant’s columns
(participant ID, transcription, translation, and language note columns)

k) Ctrl + C
(copy second participant’s columns)

l) Select first participant’s tiers
(participant ID, transcription, translation, and language note columns)

m) Paste > Paste Special... > Skip Blanks > OK


n) Save > Save as... Excel Binary Workbook
(Use the same file name as the original file and save it in the docs folder of the corpus)

Now we have the Elan data in a tabular format that is easy to read and to view the locus for a language shift. We can also create a pivot table in Excel which allows us to compare the duration of each language spoken in the file, and by which participants. For additional statistical operations, we can open the .xlsx file in IBM SPSS, either creating variables for each tier/column or using the existing SPSS vignette template located in the docs folder of the corpus.
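For readers who prefer to script the clean-up, the conflation of participant columns and the comparison of language durations described above can be sketched in Python. This is an illustration under stated assumptions rather than a description of the files actually produced: it assumes that the exported text file has a header row, one numeric duration column, and a separate transcription and language column for each participant. The column names, participant IDs and function names are all hypothetical and would need to be adjusted to the actual export.

```python
import csv
from collections import defaultdict


def conflate(path, participants, duration_col="Duration"):
    """Turn one-column-per-participant rows into one record per utterance.

    participants maps a participant ID to the names of that participant's
    (transcription, language) columns in the exported file.
    """
    utterances = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for pid, (text_col, lang_col) in participants.items():
                text = (row.get(text_col) or "").strip()
                if text:  # a row is filled only in its speaker's columns
                    utterances.append({
                        "participant": pid,
                        "text": text,
                        "language": (row.get(lang_col) or "").strip(),
                        "duration": float(row[duration_col]),
                    })
    return utterances


def language_shares(utterances):
    """Return each language's share of the total duration, in percent."""
    totals = defaultdict(float)
    for utterance in utterances:
        totals[utterance["language"]] += utterance["duration"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100 * dur / grand_total for lang, dur in totals.items()}
```

The dictionary returned by language_shares corresponds to the "percentages of total" view of a pivot table, and filtering the utterances by participant before calling it gives the per-participant figures.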

5.2 Charts

In this section, you will learn how to create a chart like that which was featured at the beginning of the previous section.

1. Excel

a) Open the Excel file that you just created following the steps in §5.1.

b) Select upper left-hand cell of spreadsheet

c) Insert > pivot table
(leave all options as is)

d) Select duration and language for given participant

e) Unselect blank

f) Right click within pivot table to select Show as percentages of total

g) Design tab > select Pivot Chart

h) Create desired type of chart

The chart can now be saved as a picture, or directly copied and pasted to preferred platform (Word, PowerPoint, etc). Some charts have already been created and are stored in the .jpg folder of the Crossroads Corpus.

2. SPSS

Page 39: Crossroads Corpus Manual · crossroads area; two from Djibonker, one from Brin, one from Essil, and one who grew up in Essil but currently lives in Djibonker. In addition to providing

Chapter 6

Sample Design

The data which are housed in the corpus were neither gathered nor compiled with any intention of being proportionate in terms of demographics, language, or genre (for information about the proportionality of the corpus, see §2). Therefore, it is necessary to create samples based on the topic you are interested in researching. Samples may be created either using Elan or R, depending on where the information is stored (i.e. in the data or the metadata respectively). Here, I will give two examples of potential samples: one based on gender which was compiled using R in §6.1, and one based on genre which was compiled using Elan in §6.2.

6.1 Sampling Based on Metadata

6.2 Sampling Based on (Transcribed) Data


Chapter 7

Interlinearisation

This chapter will teach you how to create an interlinearised transcription for a recording session using Elan and FLEx. A combination of Elan and FLEx is recommended for interlinearisation of texts because of the ability to time-align transcriptions with audio in Elan, coupled with the capability of dictionary creation and morphological parsing in FLEx. If you have never used Elan to transcribe before, or have a simple recording session such as elicitation, begin with §7.1. If you have a more complex recording scenario, or are already familiar with Elan, skip to §7.2.

7.1 Monolingual, Single Participant Text

1. Elan

First, you will create a new Elan file for the media file you recorded. So that the Elan annotation file can be processed into FLEx, we use a set of tiers which have already been created and stored in an Elan template file.

a) Open Elan

b) File > New...

c) Locate media file(s), .wav and/or .mp3

Elan does not allow certain types of media files (such as .mov) so these must be converted to .mp3 or .mp4. It is also advisable to create a .wav file. If a .wav file is present, the wave form is visible in Elan and can be opened with Praat (see §X).

d) Click Template, Select Crossroads.etf (Located in ETF folder in Q drive)


e) Click OK

You have now created a new Elan file which is ready for transcription.

f) File > Save as... (name the Elan file the same as the media file(s))

Now you are ready to begin transcribing.

g) Options > Annotation Mode > Highlight utterance > Double click > Transcribe

Using the waveform as a guide, highlight the utterance you wish to transcribe. The level of the utterance is your choosing: phrase, word, morpheme, or even phoneme [1]. Click the play button to listen to the utterance and slow down the rate if necessary.

h) Transcribe the file

Once you are happy with your transcription, you are ready to export the file to FLEx for interlinearisation, including glossing and morpheme-breaks.

i) File > Export as... > FLEx file

j) Click Export interlinear-text tier > Select Interlinear-title-kuj
(unclick Export paragraph tier)

[1] Phonetic transcription would be better done in Praat; see §X


Click next until you are prompted to save the file. Save the file with the same name as the media file(s), but add segmentation to the end of the file name, e.g. BAN07012016AH3 segmentation.flextext

2. FLEx

You are now ready to import your newly created transcribed file into FLEx for interlinearisation.

a) Open FLEx database for your language
(if you have not yet created a FLEx database for your language, see §X).

b) File > Import > FLExText Interlinear... > Select .flextext file > Open > OK

You may now interlinearise the text. [2]

You may use the category of spelling variant to add any words which are spelled in a way which is different from your lexeme form. Alternatively, you could list these as allomorphs if you believe that the spelling accurately represents the pronunciation of the word in the audio file.

Once you are finished with the interlinearisation, you will export the file back into an .eaf file format.

c) File > Export Interlinear... > FLEXTEXT > Export... > Select file > OK

Save the file with the same name as the media file(s), but add interlinearisation to the end of the file name, e.g. BAN07012016AH3 interlinearisation.flextext

3. Elan

a) Open Elan

[2] For instructions on how to use FLEx to interlinearise a text, see §X. If you are prevented from interlinearising the text, see §10.1.

b) File > Import > FLEx File

c) Select file > Check as above > OK
Select only the .flextext file, the media files are already attached.

Now you have imported your interlinearised text from FLEx back into Elan.

d) File > Save as...

Save the file as an .eaf, e.g. BAN07012016AH3 interlinearisation.eaf

You have now successfully created an interlinearised text file which may be searched for morphological category or part of speech as described in Chapter 4. The next section describes how to deal with more complex recording sessions such as interviews or natural speech data.

7.2 Multilingual, Multi-participant Text

Although FLEx was not designed to handle multi-party, conversational data in different languages, with the help of the multilingual Gujireray lexical database (located in the FLEx folder of the Crossroads Corpus), we can accurately depict some of the nuances of multilingual speech through the identification of language switching at the lexical and morpheme levels; something that is not captured simply through the transcribers’ identification of languages at the utterance level. For example, note that in Fig. 7.1, the language(s) which the transcriber has identified differ from those of the researcher.

Figure 7.1: Elan Excerpt from Natural Conversation Interlinearised Transcription

Where they have been identified by the transcriber, both the participant and language note tiers are included in the FLEx import.

The process for exporting from Elan, importing into FLEx, and then re-importing into Elan remains the same as that which is outlined above in §7.1, repeated simply here.

1. Elan

a) File > Export as... > FLEx file...

b) Tick: Export Interlinear-text tier > Select: Interlinear-title-kuj
(ensure that the suffix of your Elan file’s Interlinear-title tier matches that of the three-letter code of your FLEx database vernacular writing system)

c) Untick: Export paragraph tier

d) Select: All transcription and word tiers > Next


Figure 7.2: FLEx Excerpt from Natural Conversation Interlinearised Transcription

e) Leave everything as is > Next

For any tiers which do not yet have an associated language for export, select a language

f) Next > Save > Finish

2. FLEx

a) Select Texts & Words tab

b) File > Import > FLExText Interlinear... > Locate file > Import

Now that you have imported the transcribed Elan file into FLEx, you may use the automatic parser if you have created morphological templates for it or you may interlinearise your text by hand. When you are done, you can export it back into Elan by following the steps thoroughly outlined above, and repeated succinctly here.

3. Elan

a) File > Import > FLEx file...

b) Select: exported file > Tick: Include “interlinear text” element

c) Smallest time-alignable element, select: phrase

d) Duration per phrase element: 4000

e) OK

You will now see the interlinearised text in Elan with a tier structure which reflects the columns you configured in FLEx. As with that above, you now have the additional ability to search the file for morphological or syntactic category.

7.3 Pre-existing Data

If you have been using Elan and/or FLEx for some time for your fieldwork, but did not anticipate using the two together, you may still adapt files so that they can be used with either program. This section will teach you how to do so.

1. Existing FLEx but no Elan

The process for exporting a file from FLEx into Elan is the same as that above, save the added step of time-aligning the transcription with the audio recording.

a) Open the text you want to export in Fieldworks under the analyse pane.

b) Under tools select configure to choose which interlinear lines to show.

c) Go to File, Export Interlinear, and then select FLExText.

d) Save this file somewhere you can easily retrieve it.

e) Open Elan.

f) Go to File, Import, FLEx File.

g) Select the FLExText file you just generated plus the accompanying sound file as the media file.

h) The smallest alignable time element is the phrase. Leave everything else unchecked.

i) Save this file as filename raw.eaf (where filename is the same as that of the recording).

Here is the point at which you will need to follow some additional steps to time-align the Elan file.

j) In Elan, go to file and select new, here only select your sound file.

k) Open this side by side with the Fieldworks text in baseline mode/tab.

l) In Elan, go to Options and select segmentation mode.

m) Select one keystroke per annotation (adjacent annotations).

n) Play the sound file in Elan, and while listening to it, follow along with however you have arranged the text into paragraphs in Fieldworks (remember every time you hit return in the baseline mode, that is a new paragraph).

o) After each paragraph in Fieldworks, hit a key on the Elan screen to segment the sound file.

p) Save this file as filename segmentation.eaf


q) Go to (or download it if you don’t have it) Notepad++.

r) Open up both the .eaf files (raw and segmentation) in Notepad++.

s) They should have the same number of time slots. Select all the time slots (like this: <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1423"/>) in the segmentation eaf file and copy them into the raw eaf file.

t) Save this file as filename done.eaf.

u) Open filename done.eaf in Elan.
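Steps q) to u) can also be carried out with a short script instead of copying by hand in Notepad++. The sketch below is a variant of the manual step rather than an exact reproduction of it: instead of pasting whole TIME_SLOT elements, it copies only the TIME_VALUE of each slot, position by position, so that the slot IDs already referenced by the raw file's annotations stay unchanged. The function name is hypothetical, and the code assumes that both files are standard Elan XML with a TIME_ORDER element directly under the document root.

```python
import xml.etree.ElementTree as ET


def copy_time_values(raw_path, segmentation_path, out_path):
    """Give the raw file's time slots the values of the segmentation file."""
    raw = ET.parse(raw_path)
    segmentation = ET.parse(segmentation_path)
    raw_slots = raw.getroot().find("TIME_ORDER").findall("TIME_SLOT")
    seg_slots = segmentation.getroot().find("TIME_ORDER").findall("TIME_SLOT")
    # As in step s), the two files should have the same number of time slots.
    if len(raw_slots) != len(seg_slots):
        raise ValueError(
            f"raw file has {len(raw_slots)} time slots, "
            f"segmentation file has {len(seg_slots)}"
        )
    for raw_slot, seg_slot in zip(raw_slots, seg_slots):
        value = seg_slot.get("TIME_VALUE")
        if value is not None:  # skip slots that carry no time value
            raw_slot.set("TIME_VALUE", value)
    # Save under a new name (the 'done' file) and keep the raw file as is.
    raw.write(out_path, encoding="UTF-8", xml_declaration=True)
```

If the numbers of time slots differ, the function stops with an error instead of writing a misaligned file, which is the situation in which the segmentation should be checked again in Elan.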

You have now created a time-aligned, transcribed and interlinearised text in Elan.

2. Existing Elan but no FLEx

For those of you who have been using Elan for some time but have never exported a file to FLEx, you are most likely using a template which is beneficial to your needs as a researcher, but cannot be read by FLEx. The process for exporting the file from Elan, importing it into FLEx, and then re-importing it into Elan is the same as outlined in §7.1 above; however, the tier structure of Elan must be modified before doing so. The process is not too onerous, and is very much worth the effort in order to utilise the text tools available in FLEx, let alone the ability to add words from a text directly into the lexicon. This section will teach you how to adapt your tier structure so that it can be imported correctly into FLEx.

a) Open file for conversion in Elan

b) Tier > Import Tiers > Browse > Import > Close
(Choose the Crossroads template as the file with the tiers to import.)

c) Right click your transcription tier > Select: Change Attributes of (your tier name)

d) Change Tier Name to match that of Transcription tier from imported template
(Remember to change tier suffix to match 3-letter code of your FLEx database vernacular language writing system. The tier prefix may also be exchanged with the speaker’s initials.)

e) Change Linguistic Type to Phrases

f) Leave everything else unchanged

g) Repeat process for translation tier, matching imported template translation tier

This is the process for changing existing transcription and translation tiers in Elan to tiers which can be read in FLEx. These tiers are assumed to be the minimum that any Elan file would contain. If the transcription has also been parsed by morpheme (in Elan), then change that tier to match that of the imported template Word tier. If no such morpheme tier yet exists, proceed as follows.

h) Tier > Add new tier... > Change Tier Name to match Word tier from imported template

i) Parent tier is Transcription Tier

j) Linguistic type is Words

k) Additional tiers may be changed to that of Note tier
(right click imported note tier for specifications and follow steps above, changing tier name and linguistic type)

l) Add title tier
(following steps above, changing tier name and linguistic type)

m) Right click A Transcription-txt-kuj > Delete A Transcription-txt-kuj

(As a final step, delete the imported tiers.)

This is the method by which a simple Elan tier structure can be adapted to that of a FLExText import. The following, more complex, example is from a researcher who was using Elan to interlinearise her texts. The added complication was that she has tiers in multiple different writing systems.

Following the steps outlined above, we were able to convert this Elan file’s tier structure into one that can be read in FLEx so that she can incorporate new lexical items from the text into her dictionary. We performed the additional step of exporting one of her existing texts from FLEx into Elan to gain the information about the additional tiers.
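The core of the conversion described in this section is renaming a tier and assigning it a different linguistic type. When many files need the same change, this can be sketched in Python on the .eaf XML. The sketch is an illustration under assumptions, not the procedure used for the Crossroads corpus: the function name is hypothetical, the code assumes standard Elan XML (TIER elements with TIER_ID, LINGUISTIC_TYPE_REF and PARENT_REF attributes, and LINGUISTIC_TYPE elements with a LINGUISTIC_TYPE_ID), and it does not check that the new type's constraints suit the annotations already on the tier. It is advisable to work on a copy and to open the result in Elan to verify it.

```python
import xml.etree.ElementTree as ET


def retarget_tier(eaf_path, old_id, new_id, new_type, out_path):
    """Rename tier old_id to new_id and set its linguistic type to new_type."""
    tree = ET.parse(eaf_path)
    root = tree.getroot()
    # The new type must already be defined in the file, for example after
    # the template tiers have been imported.
    known = {t.get("LINGUISTIC_TYPE_ID") for t in root.iter("LINGUISTIC_TYPE")}
    if new_type not in known:
        raise ValueError(f"linguistic type {new_type!r} is not defined in this file")
    found = False
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") == old_id:
            tier.set("TIER_ID", new_id)
            tier.set("LINGUISTIC_TYPE_REF", new_type)
            found = True
        # Dependent tiers refer to their parent by name, so keep them in step.
        if tier.get("PARENT_REF") == old_id:
            tier.set("PARENT_REF", new_id)
    if not found:
        raise ValueError(f"no tier named {old_id!r} in this file")
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)
```

The new tier name passed to the function should follow the naming pattern of the imported template, including the writing-system suffix discussed above.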


Chapter 8

How to

8.1 Export Arbil

Because there have been difficulties with Arbil exports, a workflow is presented in this section. Please follow these steps when transferring .imdi files to the corpus.

1. Open Arbil

• Right click either corpus node or individual session node to export > Select Export

• Click Export Branch Destination Directory

Export to the Files_for_Corpus folder. (If exporting from your laptop to an external hard-drive, follow the same procedure as above but select a folder on your own hard-drive as the destination folder for the export.)

• Uncheck Export Resource Files, uncheck Rename Metadata Files

Media files associated with your Arbil sessions are not to be included in the export as these will cause errors in transfer.

• Click Start

The exported file(s) will now appear in the Files_for_Corpus folder. If the export was done from the researcher’s laptop, please copy the files to the Files_for_Corpus folder. Please notify the Corpus Manager of the successful transfer.

8.2 LAMUS Upload

The corpus in its entirety was uploaded to LAMUS in March 2016. The process for uploading and updating files on LAMUS is described in this section.

1. Navigate to LAMUS https://lat1.lis.soas.ac.uk/jkc/lamus/ > Click Create new workspace

The system will ask you to login with your credentials if you have not already done so and may also ask for a Java update.

2. Right click Crossroads corpus node > Click Select this node as top node for new workspace

The page should now load all the files that have been uploaded to this corpus node. This process takes a few minutes given that there are many files associated with the node. If a quicker result is desired, instead of selecting the corpus node, select the node of the individual researcher with whom the files to be uploaded/updated are associated. The limitation therein is that you will only have access to that specific node.

3. Click on Upload Files (lower left-hand corner)

4. Search and select or drag and drop files to be uploaded from corpus on Q-drive to LAMUS
(An Arbil .imdi file must be included in the files to be uploaded as this file will create the session node on the LAT server.)

The simplest way to upload multiple files to LAMUS is to select them from the Q-drive then drag and drop them into the uploader. Once the selected files are ready, click Upload. You should then see a message that the files have been successfully uploaded.

5. Right click corpus subnode where files will be linked

6. Select link session node


7. Check box next to session to be linked

8. Select Link

A new session will be created.

9. Right click newly created session node

10. Select link media resource


11. Repeat steps 7 - 8.

https://lat1.lis.soas.ac.uk

8.3 Number of Hours of Media

The process to determine the number of hours of corresponding audio and video files for the transcribed corpus files is arduous; therefore, I have put copies of all the media files into a folder within the corpus folder so that these totals can be easily obtained. Otherwise, the method is as follows:

1. Windows Explorer

a) Search for all Elan files in Corpus

b) Select the names of all Elan files in Corpus (ctrl A)

c) Copy as Path (ctrl shift > right click > copy as path)


d) Results have been copied to clipboard

2. Notepad

a) Open Notepad

b) Paste results (ctrl v)

c) Clean up so only file name remains (ctrl h)

d) Select first 50

e) Replace line breaks with OR using RegEx ([\r\n]+)

f) Select first line and copy
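The Notepad steps above can be sketched in Python as follows. This is a hedged illustration, not the tool used for the corpus: the function name is hypothetical, and the sketch assumes that "only file name remains" means the name without its folder and without the .eaf extension, so that the search also finds media files of the same name. Line breaks are matched with the same regular expression as in step e).

```python
import re
from pathlib import PureWindowsPath


def search_queries(pasted_paths, batch_size=50):
    """Build 'name1 OR name2 ...' search strings from pasted file paths."""
    # "Copy as path" puts each path on its own line, wrapped in double quotes.
    lines = [line.strip().strip('"') for line in re.split(r"[\r\n]+", pasted_paths)]
    # Keep only the file name, without folder and extension.
    names = [PureWindowsPath(line).stem for line in lines if line]
    # Join the names in batches (50 by default, as in step d).
    return [
        " OR ".join(names[start:start + batch_size])
        for start in range(0, len(names), batch_size)
    ]
```

Each string in the returned list can then be pasted into the Windows Explorer search box in turn.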


3. Windows Explorer

a) Paste file names into search box

b) Search

c) Select all > Right click > Properties > Details

d) Copy length

4. Excel

a) Open Excel

b) Paste lengths into columns

c) Obtain total time
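The final step, obtaining the total time, can also be done without Excel. The following minimal sketch assumes that the copied lengths are strings in hh:mm:ss form; the function name is hypothetical, and lengths in another format would need a different parser.

```python
from datetime import timedelta


def total_hours(lengths):
    """Sum lengths given as 'hh:mm:ss' strings and return the total in hours."""
    total = timedelta()
    for length in lengths:
        hours, minutes, seconds = (int(part) for part in length.split(":"))
        total += timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return total.total_seconds() / 3600
```

For example, three files of 30 minutes, 1 hour 15 minutes and 15 minutes add up to 2 hours.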


Chapter 9

External Links

http://www.mpi.nl/tools/elan/tp/how-to/ELAN-FLEx-ELAN.zip


Chapter 10

Troubleshooting

10.1 FLEx

Problem: Unable to interlinearise imported text from Elan

Solution: In the Baseline tab of the text interface, select the entire text. Ensure that the writing system which is selected is the same as that which is used for the lexeme forms.

Problem: Export from FLEx into Elan is not time aligned correctly.


Appendix A

List of .xlsx files in Crossroads Corpus

1. XRoads_ACTORS
   Updated duration and list of identified actors in Crossroads corpus; pivot table to illustrate proportionality

2. GENDER_LANG
   Speakers' reported repertoires and gender distribution of participants in Crossroads corpus

3. Xroads_PAR_DUR
   Previous version of (1)

4. SAMP_OBSV_LANG
   Comparison of observed language use from equivalent length of each village

5. LANG_DUR_CORP_Xroads
   Languages identified in Crossroads corpus; pivot table to illustrate proportionality

6. LANG_DUR_CORP_ALL
   Languages identified in entire corpus; pivot table to illustrate proportionality

7. OverlappingPar
   Participants’ overlapping presence in genres

8. SelfReportedRepertoires
   Previous graph of (2), redone there to account for loss of original data

9. SAMP_WORD_EXP_OBVS
   Frequency of words found in observed communicative events genre

10. SAMP_WORD_EXP
   Frequency of words found in Crossroads corpus

11. CORP_WORD_EXP
   Frequency of words found in entire corpus


12. CORP_DUR_PAR
   Duration and list of identified actors and languages in entire corpus; pivot tables to illustrate proportionality

13. OBSV_LM_DUR_LANG
   LM’s language usage across observed communicative events in Crossroads corpus; pivot table to illustrate proportionality

14. STAG_PAR_DUR
   Duration and list of identified actors and languages in staged communicative events; pivot tables to illustrate proportionality

15. OBVS_PAR_DUR
   Duration and list of identified actors and languages in observed communicative events; pivot tables to illustrate proportionality

16. ALL_PAR_DUR_LANG
   Duration and list of identified actors and languages in entire corpus; pivot tables to illustrate proportionality

17. INTV_PAR_DUR
   Duration and list of identified actors and languages in interviews; pivot tables to illustrate proportionality

18. STAG_ALL_DUR_LANG
   Duration and list of identified actors and languages in staged communicative events tagged by single, dual, or code-switching context; pivot tables to illustrate proportionality

19. OBVS_ALL_DUR_LANG
   Duration and list of identified actors and languages in observed communicative events tagged by single, dual, or code-switching context; pivot tables to illustrate proportionality

20. OBVS_HPS_DUR_LANG
   HPS’s language usage across observed communicative events in Crossroads corpus; pivot table to illustrate proportionality

21. STAG_GS_DUR_LANG
   GS’s language usage across staged communicative events in Crossroads corpus; pivot table to illustrate proportionality

22. STAG_LNS_DUR_LANG
   LNS’s language usage across staged communicative events in Crossroads corpus; pivot table to illustrate proportionality

23. STAG_JHS_DUR_LANG
   JHS’s language usage across staged communicative events in Crossroads corpus; pivot table to illustrate proportionality


24. OBVS_JHS_DUR_LANG
   JHS’s language usage across observed communicative events in Crossroads corpus; pivot table to illustrate proportionality

25. OBVS_GS_DUR_LANG
   GS’s language usage across observed communicative events in Crossroads corpus; pivot table to illustrate proportionality

26. STAG_HPS_DUR_LANG
   HPS’s language usage across staged communicative events in Crossroads corpus; pivot table to illustrate proportionality

27. STAG_LM_DUR_LANG
   LM’s language usage across staged communicative events in Crossroads corpus; pivot table to illustrate proportionality

28. Vignettes
   ‘Footprints’ for each of the observed communicative event sessions, depicting language use and duration for each identified participant

29. Sample_obsv_dur
   Exported transcription and duration for each participant in observed communicative event genre


Appendix B

List of .spss files in Crossroads Corpus


References

Green, D. W., & Abutalebi, J. (2013). Language control in bilinguals: The adaptive control hypothesis. Journal of Cognitive Psychology, 25(5), 515–530.

Gries, S., & Berez, A. (to appear). Linguistic annotation in/for corpus linguistics. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation. Berlin, New York: Springer.

Hantgan, A. (2016). How foreign is accent? Expressions of peace in Casamance. Voices from around the world.

Himmelmann, N. P. (1998). Documentary and descriptive linguistics. Linguistics, 36, 161–195.

The pear stories: Cognitive, cultural, and linguistic aspects of narrative production. (1980). Norwood, New Jersey: Ablex.

Sagna, S. (2008). Formal and semantic properties of the Gujjolaay Eegimaa (a.k.a. Banjal) nominal classification system (Unpublished doctoral dissertation). SOAS.
