Introduction to Xaira, Part One: All about Xaira
Andrew Hardie
What is Xaira?
XML Aware Indexing and Retrieval Architecture
The XML-aware version of SARA for the BNC corpus
Several programs, including the Index Toolkit and the Client
How do you pronounce “Xaira”?
Its designers pronounce it like “Sarah”
We pronounce it like “Zirah”
Other pronunciations may vary
Why are we talking about it?
Andrew and Richard have been beta-testers for Xaira for several years
Andrew wrote the help file
What sort of program is Xaira?
Xaira is an analysis program for indexed corpora
Searching indexed vs. non-indexed corpora
Indexing, then retrieval: Xaira does both
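To make the indexed vs. non-indexed distinction concrete, here is a minimal Python sketch. It is not Xaira's own index format or code; the function names and the toy two-text corpus are assumptions made purely for illustration.

```python
from collections import defaultdict

def linear_search(docs, word):
    """Non-indexed search: rescan every token of every text on each query."""
    hits = []
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            if token == word:
                hits.append((doc_id, pos))
    return hits

def build_index(docs):
    """Indexing: record once, in advance, where each word form occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
    return index

def indexed_search(index, word):
    """Retrieval: a single lookup in the index, with no rescanning of the texts."""
    return index.get(word, [])

# Toy corpus (hypothetical data)
docs = ["the cat sat on the mat", "the dog chased the cat"]
index = build_index(docs)
print(indexed_search(index, "cat"))
```

Both functions return the same (text, position) pairs; the difference is that the indexed version pays the scanning cost once, when the index is built, rather than on every search.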
Xaira contains
The Indexer itself
Xaira-tools: an “easy” user interface for corpus set-up and using the indexer
The Xaira “client”: a sophisticated corpus analysis system (wordlist, concordance, collocation, structured searching)
Client, server?
Why does Xaira describe itself as a client?
Xaira splits the work between one program that you use to build the search (the client), and one program that actually looks in the index and finds the solutions (the server)
But you can just use the client like any concordancer software: the user never deals directly with the server
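The division of labour can be sketched in a few lines of Python. This is only an illustration of the idea: the class names, the query format and the toy index are all hypothetical, and nothing here reflects how the real Xaira client and server communicate.

```python
class Server:
    """Hypothetical stand-in for the server: the only part that touches the index."""

    def __init__(self, index):
        self._index = index

    def solve(self, query):
        # Look the requested word up in the index and return its solutions.
        return self._index.get(query["word"], [])


class Client:
    """Hypothetical stand-in for the client: builds the search and passes it on."""

    def __init__(self, server):
        self._server = server

    def search(self, word):
        # Build a query from the user's input, then hand it to the server.
        query = {"word": word.lower()}
        return self._server.solve(query)


# Toy index mapping a word to (text, position) pairs (hypothetical data)
toy_index = {"cat": [(0, 1), (1, 4)]}
client = Client(Server(toy_index))
hits = client.search("Cat")
print(hits)
```

The user only ever calls `client.search`; the server stays hidden behind it, which is why the client can be used like any other concordancer.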
What is special about Xaira?
Xaira is based on XML
XML is based on Unicode
Thus Xaira can be used with any language in any alphabet
Xaira has also been specially designed to aid multilingual analysis, e.g. it allows Unicode keyboard setup for any language
Do I need a Unicode corpus?
Yes! (… but ASCII counts as valid UTF-8)
Both UTF-8 and UTF-16 are OK
(If in doubt, ask Andrew about variant text encodings)
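A quick way to see why ASCII counts as valid UTF-8, and to sanity-check a corpus file before indexing, is sketched below in Python. The helper name `detect_encoding` is an assumption for illustration, and the check is deliberately rough: it only recognises byte-order marks and valid UTF-8, so UTF-16 saved without a byte-order mark is not detected.

```python
import codecs

def detect_encoding(raw: bytes) -> str:
    """Rough guess at a file's encoding from its raw bytes (hypothetical helper)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte-order mark
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # UTF-16 with a byte-order mark
    try:
        raw.decode("utf-8")  # pure ASCII always passes this test
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"     # e.g. Latin-1 or another legacy encoding

# Every ASCII byte sequence decodes to the same text under UTF-8.
ascii_bytes = "plain ASCII text".encode("ascii")
print(ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii"))

print(detect_encoding(ascii_bytes))
print(detect_encoding("héllo".encode("utf-16")))
print(detect_encoding("héllo".encode("latin-1")))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper. A result of "unknown" suggests the text is in a variant encoding and needs converting first.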
Does my corpus need to be XML?
No!
Xaira can add basic XML to a corpus of plain-text files
Xaira can also upgrade SGML to XML
TEI XML is perfect for Xaira…
… warning: Xaira will reject ill-formed XML or SGML files.
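The two ideas above, adding basic XML to plain text and rejecting ill-formed files, can be illustrated with Python's standard library. This is a sketch under stated assumptions, not what Xaira itself does: the `<text>` and `<p>` element names and both function names are hypothetical, and the check covers XML only, not SGML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_plain_text(text: str) -> str:
    """Wrap blank-line-separated paragraphs in hypothetical <text>/<p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # escape() turns &, < and > into entities so the result stays well-formed.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<text>{body}</text>"

def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

wrapped = wrap_plain_text("Fish & chips\n\nSecond paragraph")
print(wrapped)
print(is_well_formed(wrapped))                  # properly nested and escaped
print(is_well_formed("<text><p>Hello</text>"))  # <p> is never closed
print(is_well_formed("<p>Fish & chips</p>"))    # bare ampersand
```

Running a check like this over every file before indexing catches the two most common problems in hand-made corpora: unclosed tags and unescaped ampersands.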
First, index your corpus
Messages from the different tools appear here (you don’t need to worry about them)
Access the commands you need to set up and run the indexer from the Tools menu
The Tools Menu
Tools for preparing your corpus and its header
Tools for telling Xaira how to handle the XML markup in your corpus
The indexer itself
Scared?
Using Xaira-tools to prepare a corpus manually can be a bit complex
Instructions: http://www.oucs.ox.ac.uk/rts/xaira/Doc/
But don’t despair – there is a wizard! File >> Index Wizard
The index wizard
Live Indexing!