Introduction to Xaira Part One: All about Xaira Andrew Hardie

Post on 28-Mar-2015

Introduction to Xaira Part One: All about Xaira

Andrew Hardie

What is Xaira?

XML Aware Indexing and Retrieval Architecture

The XML-aware version of SARA for the BNC corpus

Several programs, including the Index Toolkit and the Client

How do you pronounce “Xaira”?

Its designers pronounce it like “Sarah”

We pronounce it like “Zirah”

Other pronunciations may vary

Why are we talking about it?

Andrew and Richard have been beta-testers for Xaira for several years

Andrew wrote the help file

What sort of program is Xaira?

Xaira is an analysis program for indexed corpora

Searching indexed vs. non-indexed corpora

Indexing, then retrieval

Xaira does both

Indexing

Retrieval

Xaira contains

The Indexer itself

Xaira-tools: an “easy” user interface for corpus set-up and for using the indexer

The Xaira “client”: a sophisticated corpus analysis system

Wordlist, concordance, collocation

Structured searching
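For readers unfamiliar with the three analyses named above, here is a rough Python sketch of what a wordlist, a concordance and a collocate count compute. It is an assumption-laden toy (pre-tokenized input, a fixed context window, raw counts rather than any association statistic) and says nothing about how the Xaira client implements them.

```python
from collections import Counter

def wordlist(tokens):
    """Frequency list: each word type with its count, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, node, width=2):
    """Key-word-in-context lines: (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, width=2):
    """Raw counts of words occurring within `width` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - width):i])
            counts.update(tokens[i + 1:i + 1 + width])
    return counts

# Hypothetical miniature corpus, already tokenized.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
```

Calling `concordance(tokens, "sat")` gives one line per occurrence of "sat", with two words of context on each side.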

Client, server?

Why does Xaira describe itself as a client?

Xaira splits the work between…

one program that you use to build the search (the client), and

one program that actually looks in the index and finds the solutions (the server)

But you can just use the client like any concordancer software: the user never deals directly with the server
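The division of labour described here can be pictured with a small Python sketch. The class names, the dictionary-shaped query and the in-memory index are all hypothetical; Xaira's real client and server are separate programs and their protocol is not shown in these slides.

```python
class Server:
    """Holds the index and finds the solutions; the user never calls it directly."""

    def __init__(self, index):
        self._index = index

    def solve(self, query):
        # A query is just a dict such as {"word": "sat"}; purely illustrative.
        return self._index.get(query["word"], [])


class Client:
    """What the user sees: builds the query and hands it to the server."""

    def __init__(self, server):
        self._server = server

    def search(self, word):
        query = {"word": word.lower()}
        return self._server.solve(query)


# Hypothetical index: word -> list of (file, token position) hits.
client = Client(Server({"sat": [("a.txt", 2), ("b.txt", 1)]}))
```

From the user's side there is only `client.search("sat")`, which is why the client can be treated like any stand-alone concordancer.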

What is special about Xaira?

Xaira is based on XML

XML is based on Unicode

Thus Xaira can be used with any language in any alphabet

Xaira has also been specially designed to aid multilingual analysis, e.g. it allows Unicode keyboard setup for any language

Do I need a Unicode corpus?

Yes! (… but ASCII counts as valid UTF-8)

Both UTF-8 and UTF-16 are OK

(If in doubt, ask Andrew about variant text encodings)
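One quick way to check whether a file's bytes are acceptable is simply to try decoding them. The sketch below uses only Python's built-in codecs; the helper name `decodes_as` and the sample strings are assumptions for illustration. It shows that pure ASCII decodes cleanly as UTF-8, while a legacy encoding such as Latin-1 generally does not once accented characters appear.

```python
def decodes_as(data, encoding):
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical sample data in four encodings.
ascii_bytes = "plain corpus text".encode("ascii")
utf8_bytes = "naïve café".encode("utf-8")
utf16_bytes = "naïve café".encode("utf-16")    # Python prepends a byte-order mark
latin1_bytes = "naïve café".encode("latin-1")  # a legacy, non-Unicode encoding
```

In practice the bytes would come from `open(path, "rb").read()` rather than from string literals.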

Does my corpus need to be XML?

No!

Xaira can add basic XML to a corpus of plain-text files

Xaira can also upgrade SGML to XML

TEI XML is perfect for Xaira…

… warning: Xaira will reject ill-formed XML or SGML files.
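To see what "adding basic XML" and "ill-formed" mean in practice, here is a hedged Python sketch using only the standard library. The `<text>` and `<p>` element names and the blank-line paragraph rule are assumptions chosen for the example; they are not a description of the markup Xaira itself generates.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_plain_text(text):
    """Wrap blank-line-separated paragraphs in minimal XML, escaping &, < and >."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<text>{body}</text>"

def is_well_formed(xml_string):
    """Return True if the string parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False
```

A check like `is_well_formed` run over each file before indexing would flag problems such as unclosed tags or unescaped ampersands early.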

First, index your corpus

Messages from the different tools appear here (you don’t need to worry about them)

Access the commands you need to set up and run the indexer from the Tools menu

The Tools Menu

Tools for preparing your corpus and its header

Tools for telling Xaira how to handle the XML markup in your corpus

The indexer itself

Scared?

Using Xaira-tools to prepare a corpus manually can be a bit complex

Instructions: http://www.oucs.ox.ac.uk/rts/xaira/Doc/

But don’t despair: there is a wizard!

File >> Index Wizard

The index wizard

Live Indexing!