Terminology Extraction, tekom 2008 - tekom e.V.
Agenda
What is terminology extraction?
What are the complications and difficulties associated with terminology extraction?
Comparison of different extraction strategies:
Manual
Concordance
Statistical
Linguistic
File formats for extraction and export
Terminology work
After translation memory systems and sentence-level recycling, terminology now comes into focus as a source of possible savings.
Source language text needs to become even more consistent
More accuracy in translation
Faster translation
But … terminology work is WORK
never-ending (new technologies, more languages)
ever-changing (mergers, marketing innovations)
time-consuming / resource-intensive
Automation???
Monolingual Extraction
Extraction of terms from documents in one language.
Creation of term lists…
Important terms
Who defines what is important?
How can a tool “know” what is important?
Frequent terms
What is frequent? 3 times / 10 times…
Are frequent terms also important?
New terms
According to whose level of subject matter knowledge?
Compared to which term list / term database?
Bilingual Extraction
Term extraction from bilingual sources like translation memory files or bilingual translation files
Creation of parallel lists of terms and their translation(s)
All forms of the term and all its translations
Only basic form
Most frequent translation of source term
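The "most frequent translation of source term" option can be illustrated with a minimal sketch. It assumes the term pairs have already been pulled from a translation memory; the function name and the sample pairs are hypothetical and not taken from any of the tools discussed.

```python
from collections import Counter, defaultdict


def most_frequent_translations(term_pairs):
    """Map each source term to the translation seen most often.

    term_pairs is an iterable of (source_term, target_term) tuples,
    assumed to have been collected from bilingual material beforehand.
    """
    counts = defaultdict(Counter)
    for source, target in term_pairs:
        counts[source][target] += 1
    # most_common(1) returns a one-element list of (translation, count)
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}


# Hypothetical sample data
pairs = [
    ("Schraube", "screw"),
    ("Schraube", "bolt"),
    ("Schraube", "screw"),
    ("Mutter", "nut"),
]
result = most_frequent_translations(pairs)
```

A list built this way hides the less frequent translations ("bolt" in the sample), which is the trade-off against listing all forms and all translations.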
This is how a text looks to a statistical extraction tool…
Vot gnig harengoga fuor tok gnig nor shewerginhatz. Mirhon bortup tip trewshu gnig batbo loqtet. Bortup ter, bortup nofdas, semsel nih furpo ayano bliktreptat. Mirhon granbevtrov driktopret grig go wasbrekit mut mirkep taptro gnig suf. Aktrep zitpek nitnit bortup mil. Setrimb ak troptan bur metlatkento.
Term extraction issues
Terminology extraction is a highly individual process
Goal of extraction, subject matter expertise, available time
Tools use different methods for terminology extraction
Concordance, statistics, linguistics
Tools support different file formats for extraction and export
Monolingual, bilingual, export formats
Tools sometimes don’t show the context from which the term was taken
Term Extraction Tools
Assistance for manual extraction
Concordance tools
Extraction of all term combinations
Statistical extraction tools
Frequent terms
All languages
Linguistic extraction tools
Extraction of noun phrases…
Supported languages only
Manual Extraction
Human reads the text, understands the meaning and selects terms (or term pairs) according to previous knowledge of the subject matter and/or the goal for the extraction.
List of standard terms
List of company terms
List of new terms
Additional information like source, context example…
Tools assisting manual extraction
Tools that connect to an editor and allow the collection of terms or term pairs
Translation memory tools that save terms and term pairs directly into the term database component
Term checking tools that report missing terms / translations
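What a term checking tool reports can be sketched roughly as follows. This is an assumption-laden illustration, not the logic of any specific product: it uses plain case-insensitive substring matching, and the function name, term base, and segments are hypothetical.

```python
def missing_translations(source_segment, target_segment, term_base):
    """Report term base entries whose translation is absent from the target.

    term_base maps source terms to their approved translations.
    Matching is a naive, case-insensitive substring test, so inflected
    forms would be missed; real tools are likely more sophisticated.
    """
    src = source_segment.lower()
    tgt = target_segment.lower()
    return [
        (source_term, target_term)
        for source_term, target_term in term_base.items()
        if source_term.lower() in src and target_term.lower() not in tgt
    ]


# Hypothetical sample data
term_base = {"torque wrench": "Drehmomentschlüssel", "bolt": "Schraube"}
report = missing_translations(
    "Tighten the bolt with a torque wrench.",
    "Ziehen Sie die Schraube mit einem Schlüssel an.",
    term_base,
)
```

Here the translator used a generic word instead of the approved term, so only the "torque wrench" entry is reported.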
Manual extraction
Time consuming
Resource intensive
Subject matter and language expertise required
Most accurate regarding the goal
Individual goals can be set
Concordance Tools
Automatic creation of a list of all terms and term combinations from a document
No term is missed
Long list of terms
Manual selection process necessary
TM tool
Term can consist of up to X words
Terms that already exist in the database are not extracted
Extraction from all files of a project (various file formats)
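The two settings above (terms of up to X words, skipping terms already in the database) can be approximated in a few lines. This is only a sketch under simple assumptions: words are split with a regular expression, combinations do not cross sentence boundaries, and the names `candidate_terms`, `max_words`, and `known_terms` are hypothetical.

```python
import re


def candidate_terms(text, max_words, known_terms=()):
    """List every word combination of up to max_words words.

    Combinations are built per sentence, and terms already present in
    known_terms (standing in for the term database) are skipped.
    """
    known = {t.lower() for t in known_terms}
    seen = set()
    candidates = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"\w+(?:-\w+)*", sentence.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                term = " ".join(words[i:i + n])
                if term not in known and term not in seen:
                    seen.add(term)
                    candidates.append(term)
    return candidates


terms = candidate_terms("Open the valve. Close the valve.", 2, known_terms=["the"])
```

Even for two short sentences the list contains six candidates, most of them noise such as "open the", which is why a manual selection step follows.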
Extraction with TM tool
Export of term list for translation
Export of term list to term database
Statistical Extraction Tool
Monolingual and bilingual extraction
Terms that occur more than X times are extracted
List of frequent terms – frequent terms are seen as important
Important terms / new terms that appear in this document fewer than X times are not extracted
Can be used for any language
List of term candidates must be checked by a human with subject matter and language expertise
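The frequency criterion can be made concrete with a small sketch. It assumes a purely count-based approach with no linguistic knowledge; `frequent_terms`, `threshold`, and `max_words` are hypothetical names, and for brevity the word combinations here run across sentence boundaries.

```python
import re
from collections import Counter


def frequent_terms(text, threshold, max_words=2):
    """Return word combinations that occur more than threshold times.

    Every sequence of 1..max_words consecutive words is counted;
    nothing about the language itself is taken into account.
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {term: count for term, count in counts.items() if count > threshold}


sample = "pump housing pump housing pump housing seal"
extracted = frequent_terms(sample, threshold=2)
```

In the sample, "seal" occurs only once and is dropped even if it is the one new term in the document, which is exactly the limitation named above.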
Terminology Tool of TM Suite
Settings for number of words per term
Settings for frequency
Linguistic Extraction Tool
Tool knows about the structure of the language
Extracted terms can be reduced to their basic form with the help of dictionaries and rules
User can define the rules used for extraction
Extraction limited to supported languages
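Reducing terms to a basic form "with the help of dictionaries and rules" might look like the toy sketch below. The dictionary and the suffix rules are invented for illustration and cover only a sliver of English; they are not the rule set of any actual linguistic extraction tool.

```python
# Hypothetical dictionary of irregular forms (assumption, not from the source)
IRREGULAR_FORMS = {"indices": "index", "matrices": "matrix"}
# Hypothetical suffix rules, tried in order: (suffix, replacement)
SUFFIX_RULES = [("ies", "y"), ("s", "")]


def base_form(word):
    """Reduce one English word to a basic form: dictionary first, then rules."""
    w = word.lower()
    if w in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[w]
    if w.endswith("ss"):  # e.g. "glass" is not a plural
        return w
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return w[: len(w) - len(suffix)] + replacement
    return w


def group_by_base_form(terms):
    """Group term variants under a basic form (only the last word is reduced)."""
    groups = {}
    for term in terms:
        words = term.lower().split()
        if not words:
            continue
        key = " ".join(words[:-1] + [base_form(words[-1])])
        groups.setdefault(key, []).append(term)
    return groups


grouped = group_by_base_form(["control valve", "Control valves", "indices"])
```

Because both the dictionary and the rules are language-specific, the same sketch would need a different rule set for every additional language, which mirrors the "supported languages only" restriction.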
Linguistic Settings
Extraction according to specific rules of the language
Frequency settings
Bilingual Extraction Results SDL PhraseFinder
Translations of terms come from the extraction files and internal dictionaries
Each term is shown with its context and a grammatical analysis
Results of extraction
List of one-word terms
List of multi-word terms
List of context sentences
Export and view can be filtered
File Formats
TM tools extract from every file format they support
Concordance tools are usually limited to text or Word RTF files, maybe also HTML
Bilingual extraction can be produced from bilingual file formats like translation memories, project files of a TM tool or bilingual translation files, but not from two separate files
Export usually in Excel, tab-delimited TXT or directly into the terminology component
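A tab-delimited TXT export of a bilingual term list can be sketched with Python's standard `csv` module. The column headers and the function name are assumptions; actual tools define their own column layout.

```python
import csv
import io


def export_tab_delimited(term_pairs):
    """Serialize (source, target) term pairs as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])  # hypothetical header row
    writer.writerows(term_pairs)
    return buffer.getvalue()


exported = export_tab_delimited([("valve", "Ventil"), ("pump housing", "Pumpengehäuse")])
```

The resulting text can be written to a .txt file and opened in a spreadsheet program or imported into a terminology component that accepts tab-delimited input.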
Conclusion
No one tool can do what a human can do, but depending on the goal, the tools can help to automate repetitive tasks and comparisons with stop word lists and/or term bases
Concordance tools extract all words and provide filter and search settings for the view of the term list
Statistical tools offer settings for frequencies, term length and comparison with stop word lists or existing term lists / term databases
Linguistic tools can be customized by rules for the extraction, which could be different for various languages and use language-specific dictionaries
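The comparison with stop word lists and existing term lists mentioned above is one of the repetitive tasks that is easy to automate. The sketch below is one possible filter, with hypothetical names and sample data; dropping candidates that begin or end with a stop word is an assumed heuristic, not a documented feature of the tools listed.

```python
def filter_candidates(candidates, stop_words, existing_terms=()):
    """Drop candidates that start or end with a stop word or are already known."""
    stop = {w.lower() for w in stop_words}
    known = {t.lower() for t in existing_terms}
    kept = []
    for candidate in candidates:
        words = candidate.lower().split()
        if not words:
            continue
        if words[0] in stop or words[-1] in stop:
            continue  # e.g. "the valve", "open the"
        if candidate.lower() in known:
            continue  # already in the term list / term database
        kept.append(candidate)
    return kept


remaining = filter_candidates(
    ["the valve", "valve", "open the", "pressure valve", "of"],
    stop_words=["the", "of"],
    existing_terms=["valve"],
)
```

The filter shortens the list a human has to review; it does not decide whether "pressure valve" is actually important for the goal of the extraction.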
Some Terminology Extraction Tools
Concordance tools
Simple Concordance Program (SCP), http://www.textworld.com/scp/
ExtPhr32, http://publish.uwo.ca/~craven/freeware.htm
Term extraction tools / components of translation memory tools
Statistical Extraction
MultiTerm Extract, Déjà Vu Lexicon, Heartsome Dictionary Editor, across
TermiDOG (www.dog-gmbh.de), Chamblon Terminology Extractor (http://www.chamblon.com/terminologyextractor.htm)…
Linguistic Extraction
Synthema Terminology Wizard (http://www.synthema.it/english/servizi/traduzioni.html), SDL PhraseFinder…