Types of Machine Translation
Drop me a mail: [email protected]
Visit me at: http://rushdishams.googlepages.com
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh
Translation Approach
The translation process may be stated as:
1. Decoding the meaning of the source text
2. Re-encoding this meaning in the target language.
Machine translation can use a method based on linguistic rules: words are translated in a linguistic way, so that the most suitable words of the target language replace the ones in the source language.
Translation Approach
The success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated.
Translation Approach
According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation.
These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
Translation Approach
Machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by a native speaker of the other language.
Translation Approach
The large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods.
But then, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.
Types of Machine Translation
[Figure: translation from Source (Arabic) to Target (English). Labels: Syntactic Parsing, Semantic Analysis, Sentence Planning, Text Generation, Transfer Rules, Direct: SMT, EBMT, Interlingua.]
Rule based MT
The rule-based machine translation paradigm includes:
1. transfer-based machine translation,
2. interlingual machine translation and
3. dictionary-based machine translation.
Transfer based MT
It is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation.
In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT it has some dependence on the language pair involved.
Transfer based MT
The original text is first analyzed morphologically and syntactically in order to obtain a syntactic representation.
This representation can then be refined to a more abstract level, putting emphasis on the parts relevant for translation and ignoring other types of information.
Transfer based MT
The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language.
These two representations are referred to as "intermediate" representations.
From the target language representation, the stages are then applied in reverse.
Transfer based MT
Transformation process
Morphological analysis
Surface forms of the input text are classified by:
○ part of speech (e.g. noun, verb, etc.) and
○ sub-category (number, gender, tense, etc.)
Transformation process
Lexical categorization
In any given text some of the words may have more than one meaning, causing ambiguity in analysis.
Lexical categorization looks at the context of a word to try to determine the correct meaning in the context of the input.
Transformation process
Lexical transfer
This is basically dictionary translation: the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.
Transformation process
Structural transfer
While the previous stages deal with words, this stage deals with larger constituents, for example phrases.
Transformation process
Morphological generation
From the output of the structural transfer stage, the target language surface forms are generated.
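The stages of the transformation process can be strung together in a few lines. What follows is only a minimal sketch for a Spanish-to-English noun phrase: the lexicon entries, the single reordering rule, and all function names are hypothetical, and lexical categorization is left out because the toy lexicon contains no ambiguous words.

```python
# Toy transfer pipeline (Spanish -> English). Every lexicon entry and the
# single reordering rule below are hypothetical illustrations.

# Morphological analysis: surface form -> (lemma, part of speech, number)
ANALYSIS = {
    "la": ("el", "DET", "sg"),
    "las": ("el", "DET", "pl"),
    "casa": ("casa", "NOUN", "sg"),
    "casas": ("casa", "NOUN", "pl"),
    "roja": ("rojo", "ADJ", "sg"),
    "rojas": ("rojo", "ADJ", "pl"),
}

# Lexical transfer: source lemma -> target lemma (bilingual dictionary)
BILINGUAL = {"el": "the", "casa": "house", "rojo": "red"}


def analyse(sentence):
    """Classify each surface form by lemma, part of speech and number."""
    return [ANALYSIS[word] for word in sentence.lower().split()]


def lexical_transfer(tokens):
    """Look up each source lemma in the bilingual dictionary."""
    return [(BILINGUAL[lemma], pos, num) for lemma, pos, num in tokens]


def structural_transfer(tokens):
    """Reorder constituents: Spanish NOUN ADJ becomes English ADJ NOUN."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(tokens):
    """Produce target surface forms; only nouns inflect for number here."""
    words = []
    for lemma, pos, num in tokens:
        if pos == "NOUN" and num == "pl":
            words.append(lemma + "s")
        else:
            words.append(lemma)
    return " ".join(words)


def translate(sentence):
    return generate(structural_transfer(lexical_transfer(analyse(sentence))))


print(translate("la casa roja"))
print(translate("las casas rojas"))
```

Each function corresponds to one stage, so a language pair that needs more rules would extend the tables and the reordering step rather than the overall flow.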
Transfer Types
Superficial transfer (or syntactic)
This level is characterized by transferring "syntactic structures" between the source and target languages.
It is suitable for languages in the same family or of the same type, for example between Romance languages such as Spanish, Catalan, French and Italian.
Transfer Types
Deep transfer (or semantic)
This level constructs a semantic representation that is dependent on the source language.
This representation can consist of a series of structures which represent the meaning.
In these transfer systems predicates are typically produced.
The translation also typically requires structural transfer.
This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque).
Dependency Grammar
Case Grammar
Interlingual MT
The source language text, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract language-independent representation.
The target language is then generated from the interlingua.
Interlingual MT
In the direct approach, words are translated directly without passing through an additional representation.
In the transfer approach, the source language is transformed into an abstract, less language-specific representation.
Interlingual MT
Advantage and disadvantage
The advantage in multilingual machine translation is that no transfer component has to be created for each language pair.
The obvious disadvantage is that the definition of an interlingua is difficult and maybe even impossible for a wider domain.
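The size of the advantage can be made concrete with a little arithmetic. The sketch below assumes that a transfer system needs one transfer component per ordered (source, target) pair, and that an interlingua system needs one analysis module and one generation module per language; both function names are made up for the illustration.

```python
def transfer_components(n_languages):
    # Assumption: one transfer component per ordered (source, target) pair.
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    # Assumption: one analyser plus one generator per language.
    return 2 * n_languages


for n in (3, 5, 10):
    print(n, transfer_components(n), interlingua_components(n))
```

Under these assumptions the two counts are equal for three languages, and the transfer count grows quadratically after that while the interlingua count grows linearly.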
Components
Dictionaries for analysis and generation.
A conceptual lexicon, which is the knowledge base about events and entities known in the domain.
A set of projection rules (specific to the domain and the languages).
Grammars for the analysis and generation of the languages involved.
Dictionary-based MT
The words are translated as a dictionary does: word by word, usually without much correlation of meaning between them.
Dictionary lookups may be done with or without morphological analysis or lemmatisation.
This approach can be used to expedite manual translation, if the person carrying it out is fluent in both languages and therefore capable of correcting syntax and grammar.
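A word-by-word lookup, with lemmatisation as an optional step, might look like the following sketch. The English-to-French dictionary, the lemma table, and the function name are all invented for the illustration and are far smaller than anything usable.

```python
# Hypothetical toy resources, not taken from any real dictionary.
DICTIONARY = {"the": "le", "cat": "chat", "eat": "manger", "fish": "poisson"}
LEMMAS = {"cats": "cat", "eats": "eat", "ate": "eat"}


def translate_word_by_word(sentence, lemmatise=True):
    """Replace each word independently; unknown words pass through unchanged."""
    out = []
    for word in sentence.lower().split():
        key = LEMMAS.get(word, word) if lemmatise else word
        out.append(DICTIONARY.get(key, word))
    return " ".join(out)


print(translate_word_by_word("The cat eats fish"))
print(translate_word_by_word("The cat eats fish", lemmatise=False))
```

With lemmatisation the verb is found but comes out uninflected; without it the verb is not found at all. Either way the output needs the correction of syntax and grammar that a fluent human translator would supply.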
Example-based MT
Example-based MT is characterized by its use of a bilingual corpus with parallel texts as its main knowledge base.
It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
Example-based MT
Bilingual parallel corpora contain sentence pairs like the example shown in the table.
"How much is that X ?" corresponds to "Ano X wa ikura desu ka."
"red umbrella" corresponds to "akai kasa"
"small camera" corresponds to "chiisai kamera"
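The correspondences above are enough for a small translation-by-analogy sketch: match the sentence template, translate the variable part from the stored fragments, and fill the target template. The regular expression and the function name are assumptions made for this illustration; a real system would learn such templates from the corpus rather than hard-code them.

```python
import re

# Template and fragment pairs taken from the correspondences on the slide;
# the regular expression used to match the template is an assumption.
SOURCE_PATTERN = re.compile(r"^how much is that (.+?)\s*\?$", re.IGNORECASE)
TARGET_TEMPLATE = "Ano {x} wa ikura desu ka."
FRAGMENTS = {"red umbrella": "akai kasa", "small camera": "chiisai kamera"}


def translate_by_analogy(sentence):
    """Return a translation if the sentence fits the stored example, else None."""
    match = SOURCE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    fragment = FRAGMENTS.get(match.group(1).lower())
    if fragment is None:
        return None
    return TARGET_TEMPLATE.format(x=fragment)


print(translate_by_analogy("How much is that red umbrella?"))
print(translate_by_analogy("How much is that small camera?"))
```

Sentences that do not fit the template, or whose variable part is not among the stored fragments, return None instead of a guess.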
Example-based MT
Suppose we already have translations of the sentences "President Kennedy was shot dead during the parade." and "The convict escaped on July 15th."
We could then translate the sentence "The convict was shot dead during the parade." by substituting the appropriate parts of the sentences.
Statistical MT
The idea behind statistical machine translation comes from information theory.
A document is translated according to the probability distribution p(e | f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French).
Statistical MT
The problem of modeling the probability distribution p(e | f) has been approached in a number of ways. One intuitive approach is to apply Bayes' theorem:
p(e | f) ∝ p(f | e) p(e)
where the translation model p(f | e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target language string.
Statistical MT
Finding the best translation is done by picking the one that gives the highest probability:
ê = argmax_e p(e | f) = argmax_e p(f | e) p(e)
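Picking the highest-probability candidate can be illustrated with made-up numbers. In the sketch below the candidate list and both probability tables are hypothetical values for a single source string f = "la maison"; a real system estimates them from corpora and searches a far larger candidate space.

```python
import math

# Hypothetical model values for one source string f = "la maison".
TRANSLATION_MODEL = {  # p(f | e)
    "the house": 0.6,
    "house the": 0.6,
    "the home": 0.3,
}
LANGUAGE_MODEL = {  # p(e)
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def score(candidate):
    """log p(f | e) + log p(e), i.e. the log of the product being maximised."""
    return math.log(TRANSLATION_MODEL[candidate]) + math.log(LANGUAGE_MODEL[candidate])


def best_translation(candidates):
    """Return the candidate e that maximises p(f | e) * p(e)."""
    return max(candidates, key=score)


print(best_translation(list(TRANSLATION_MODEL)))
```

In this toy setup the translation model cannot tell "the house" from "house the", so it is the language model that rules out the ill-formed word order.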