Machine Translation, Digital Libraries, and the Computing
Research Laboratory
Indo-US Workshop on Digital Libraries
June 23, 2003
The Computing Research Laboratory (CRL)
New Mexico State University
Las Cruces, New Mexico
http://crl.nmsu.edu
Stephen Helmreich
(505) 646-2141
Machine Translation (MT)
• Component technologies
• Comparable technologies
• Composed technologies
MT -- Purposes
• Dissemination (high quality): sublanguages, controlled languages
• Assimilation (broad coverage)
• Communication (speed)
MT -- Types
• Direct – string-for-string
• Transfer – structure-for-structure
• Interlingual – to and from a meaning representation
• Statistical – most probable translation given a corpus
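The contrast between the direct and statistical types can be sketched in a few lines of Python. This is a toy illustration only, not any CRL system: the Spanish-English lexicon, the three-pair "corpus", and the function names `direct_translate` and `statistical_translate` are all hypothetical, and real statistical MT estimates probabilities over large aligned corpora rather than counting whole phrases.

```python
from collections import Counter

# Hypothetical toy lexicon (Spanish -> English); an assumption for illustration.
LEXICON = {"la": "the", "casa": "house", "blanca": "white"}

# Hypothetical aligned phrase pairs standing in for a parallel corpus.
CORPUS = [
    ("casa blanca", "white house"),
    ("casa blanca", "white house"),
    ("casa blanca", "house white"),
]


def direct_translate(sentence: str) -> str:
    """Direct MT: replace each source word with its lexicon entry, in place."""
    return " ".join(LEXICON.get(word, word) for word in sentence.lower().split())


def statistical_translate(phrase: str) -> str:
    """Statistical MT reduced to its core idea: return the target phrase
    seen most often with this source phrase in the corpus."""
    counts = Counter(tgt for src, tgt in CORPUS if src == phrase)
    if not counts:
        # Fall back to word-for-word substitution for unseen phrases.
        return direct_translate(phrase)
    return counts.most_common(1)[0][0]
```

Under these assumptions, `direct_translate("la casa blanca")` keeps the source word order, while `statistical_translate("casa blanca")` returns whichever rendering the corpus attests most often. Transfer and interlingual systems would sit between the two, operating on parse structures or meaning representations instead of strings.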
Component technologies -- I
• Character encoding and representation, text editing (Unicode)
• Text segmenting (OCR, sandhi?)
• Morphological analysis
• Lexical annotation (part of speech tagging, proper name identification, others)
Component technologies -- II
• Syntactic analyzers (grammars, parsers)
• Bilingual/multilingual dictionaries
• Ontologies (WordNet, OntoSem, Cyc) (lexical, linguistic, world-knowledge)
• Generation systems
Comparable technologies
• Information Retrieval (IR) (URSA)
• Information Extraction (IE) (MUC)
• Text Summarization (DUC)
• Word Sense Disambiguation (SensEval)
• Cross-Document Named Entity Identification (Coreference Resolution)
Composed technologies -- I
• All of the above (IR/IE/Summarization)
• multi-lingual
• multi-modal
• with attention to human-computer interaction (HCI)
Composed technologies -- II
• Personal Profiler – searches the web for information about a particular person, translates it if appropriate, and organizes it in temporal order
• Quick Ramp-up MT (Expedition) – allows a non-linguist language user and a computer expert to construct a simple MT system
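The Personal Profiler flow (retrieve, translate if appropriate, order by time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual system: the `Item` record, the `translate` stub, and `build_profile` are hypothetical names, and the retrieval step is assumed to have already produced dated items.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    when: date   # date the retrieved item refers to
    lang: str    # language code of the text
    text: str


def translate(text: str, lang: str) -> str:
    """Placeholder for an MT component; here it only tags the text."""
    return f"[translated from {lang}] {text}"


def build_profile(items, target_lang="en"):
    """Translate items not already in the target language, then sort by date."""
    normalized = [
        Item(
            item.when,
            target_lang,
            item.text if item.lang == target_lang else translate(item.text, item.lang),
        )
        for item in items
    ]
    return sorted(normalized, key=lambda item: item.when)
```

Calling `build_profile` on a mixed-language list returns the items oldest first, with the non-English ones passed through the translation stub. A real profiler would also need the comparable technologies listed earlier, such as cross-document named entity identification, to decide which retrieved items concern the same person.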
Question-Answering Systems
• Advanced Question Answering for Intelligence (AQUAINT)
• MOQA – Meaning-Oriented Question Answering
• Allows user to pose structured or natural language queries, obtains answer from a variety of sources, and presents the answer appropriately
Summary
• Choose an appropriate purpose and type
• Look at related technologies: component, comparable, composed
• Search for an appropriate research partner