ARASU ENGINEERING COLLEGE
AUTHORS: R. MUTHU KUMARAN (II-CSE), R. PANNEER SELVAM (II-ECE)
NATURAL LANGUAGE PROCESSING (NLP) TAMIL - HINDI CONVERSION
Nowadays, information is available electronically. Indeed, there has been an explosion of text and multimedia content on the World Wide Web. For many people, a large and growing fraction of work and leisure time is spent navigating and accessing this universe of information. The presence of so much text in electronic form is a huge challenge to NLP.
The Universal Networking Language (UNL) is an electronic language in the form of a semantic network that acts as an intermediate representation to express and exchange every kind of information. The UNL represents information, i.e. meaning, sentence by sentence. Sentence information is represented as a hyper-graph having Universal Words (UWs) as nodes and relations as arcs.
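A minimal sketch of that representation, assuming plain strings stand in for full UW notation and ignoring hyper-nodes (scopes) and attributes. The class names `Relation` and `UNLSentence` are hypothetical; `agt` (agent) and `obj` (object) are used as example relation labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A labelled arc between two Universal Words (UWs)."""
    label: str   # e.g. "agt" (agent) or "obj" (object)
    source: str  # UW the arc starts from
    target: str  # UW the arc points to


@dataclass
class UNLSentence:
    """One sentence: UWs as nodes, relations as arcs."""
    nodes: set = field(default_factory=set)
    arcs: list = field(default_factory=list)

    def relate(self, label, source, target):
        # Adding an arc also registers both UWs as nodes.
        self.nodes.update((source, target))
        self.arcs.append(Relation(label, source, target))

    def outgoing(self, uw):
        """Return (label, target) pairs for arcs leaving `uw`."""
        return [(a.label, a.target) for a in self.arcs if a.source == uw]


# "John eats an apple" as an illustrative graph.
sentence = UNLSentence()
sentence.relate("agt", "eat", "John")
sentence.relate("obj", "eat", "apple")
print(sentence.outgoing("eat"))  # [('agt', 'John'), ('obj', 'apple')]
```

Because the graph holds meaning rather than word order, the same structure could in principle feed a generator for any target language.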
INTRODUCTION
The text, once converted into UNL, can be converted to many different languages. For example, once a home page is expressed in UNL, it can be read in a variety of natural languages.
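A rough way to see why an intermediate language helps: with n languages, direct translation needs one translator per ordered language pair, while a pivot such as UNL needs one enconverter and one deconverter per language. The sketch below only does this count; the function names are hypothetical and the four languages are the ones named in this deck.

```python
def direct_translators(n_languages: int) -> int:
    """Ordered language pairs needed without an interlingua: n * (n - 1)."""
    return n_languages * (n_languages - 1)


def pivot_converters(n_languages: int) -> int:
    """One enconverter plus one deconverter per language: 2 * n."""
    return 2 * n_languages


languages = ["Tamil", "Hindi", "French", "Russian"]
n = len(languages)
print(direct_translators(n))  # 12
print(pivot_converters(n))    # 8
```

The gap widens as languages are added, since the first count grows quadratically and the second linearly.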
The meaning representation is directly available to retrieval and indexing mechanisms and to tools for automatic summarization and knowledge extraction; it is converted to a natural language only when communicating with a human user.
UNL greatly reduces the cost of developing the knowledge or contents necessary for knowledge processing, by sharing knowledge and contents. Furthermore, if the type of knowledge required for doing some task is described in UNL, the software only needs to interpret unambiguous intermediate instructions written in the language to be able to perform its functions.
UNL FEATURES
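[Figure: UNL at the centre, linked to TAMIL, HINDI, FRENCH and RUSSIAN through ENCONVERSION and DECONVERSION.]
[Figure: Enconverter, with Analysis Rules and Dictionary, windows (W) over a Node List (..., ni-1, ni, ni+1, ni+2, ni+3, ...), and a Node-net (labels V, TM, N, GM).]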
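The enconverter figure names a dictionary, analysis rules, windows over a node list, and a node-net. A heavily simplified sketch of how such parts might interact is given below. It is an assumption-laden toy, not the actual enconverter rule format: the dictionary entries, the rule table and the function `enconvert` are all hypothetical, and only two adjacent windows are modelled.

```python
# Hypothetical toy dictionary: word -> category.
DICTIONARY = {"John": "N", "eats": "V", "apples": "N"}

# Hypothetical analysis rules keyed by the categories in the two windows:
# (left category, right category) -> (relation label, which side is the head)
RULES = {
    ("N", "V"): ("agt", "right"),
    ("V", "N"): ("obj", "left"),
}


def enconvert(words):
    """Scan the node list with two adjacent windows and build a node-net."""
    node_list = [(w, DICTIONARY[w]) for w in words]
    node_net = []  # collected (relation, head, dependent) triples
    i = 0
    while i < len(node_list) - 1:
        (lw, lc), (rw, rc) = node_list[i], node_list[i + 1]
        rule = RULES.get((lc, rc))
        if rule is None:
            i += 1  # no rule fires: shift the windows right
            continue
        label, head_side = rule
        if head_side == "right":
            node_net.append((label, rw, lw))
            del node_list[i]      # dependent (left) leaves the node list
        else:
            node_net.append((label, lw, rw))
            del node_list[i + 1]  # dependent (right) leaves the node list
        i = max(i - 1, 0)  # re-check around the surviving head
    return node_net


print(enconvert(["John", "eats", "apples"]))
# [('agt', 'eats', 'John'), ('obj', 'eats', 'apples')]
```

Each rule that fires moves one node from the node list into the node-net, so the loop ends when only unattached heads remain.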
Currently there are several approaches to analysis for language conversion:
Aspects Model Standard Theory
Extended Standard Theory (EST)
ASPECTS MODEL STANDARD THEORY
In Aspects of the Theory of Syntax, nouns are chosen on the basis of context-free rules; verbs are then chosen on the basis of context-sensitive rules, which are stated in terms of lexical features. Since nouns are the first words to be chosen, they are identified by lexical features only. Verbs and adjectives require additional features to indicate the environments in which they can appear. The Aspects grammar was organized into three major components: syntactic, semantic, and phonological.
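The two-step choice described above can be sketched as follows: nouns carry only their own features, while each verb also states the environment it needs, and a verb is admitted only if the already chosen noun fits. The lexicon, the feature name `animate` and the function `verbs_allowed` are hypothetical illustrations.

```python
# Hypothetical toy lexicon. Nouns carry only inherent features;
# verbs also state the environment (object features) they require.
NOUNS = {
    "boy": {"animate": True},
    "sincerity": {"animate": False},
}
VERBS = {
    "frighten": {"object": {"animate": True}},
    "admire": {"object": {}},  # no restriction on the object
}


def verbs_allowed(object_noun):
    """Context-sensitive step: keep verbs whose frame fits the chosen noun."""
    features = NOUNS[object_noun]
    return sorted(
        verb
        for verb, frame in VERBS.items()
        if all(features.get(k) == v for k, v in frame["object"].items())
    )


print(verbs_allowed("boy"))        # ['admire', 'frighten']
print(verbs_allowed("sincerity"))  # ['admire']
```

The noun is picked without looking at its surroundings; only the verb lookup consults context.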
EXTENDED STANDARD THEORY
Ray Jackendoff offered a substantial criticism of the Standard Theory and showed that surface structure plays a much more important role in semantic interpretation than deep structure. Here a partial representation of meaning is determined by grammatical structure. The derivation of logical form proceeds step by step, determined by a derivational process analogous to those of syntax and phonology.
ADVANTAGES
Developing Machine Translation (MT) systems between Tamil and other languages, particularly English and Hindi
Building lexical resources in Tamil that are essential for researchers and developers
Developing basic tools for computational work in Tamil, such as a morphological analyzer, Part-Of-Speech (POS) tagger, etc.
Applying NLP tools to Information Extraction from domain-specific texts, so as to build Information Extraction systems for various domains such as medicine, agriculture, etc.
Tamil-Hindi MAT was chosen because both are free word-order languages, unlike English, which is a positional language. Ultimately, our aim is to build a Human-Aided Machine Translation system for Hindi-Tamil. An MT system basically has three major components: a morphological analyser (MA), a mapping unit, and a generator.
TAMIL-HINDI SYSTEM
Tamil to Hindi Translation: Tamil Word -> Morphological Analyser (MA) -> Mapping Unit -> Generator -> Hindi Word
Morphological Analyser (MA): Tamil Sentence -> Splitting -> Words -> Morphons
Morphons: Root word, Help word, Tense marker, GNP marker, Vibakthi
Example: “ ”
Mapping Unit: Morphons -> Dictionary -> Conversion -> Morphons
Generator: Morphons (Root word, Help word) -> Words -> Sentence
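The three stages of the Tamil-Hindi system (morphological analyser, mapping unit, generator) can be sketched for a single word as below. This is a toy under stated assumptions: the romanized lexicon entries, the suffix table and all function names are hypothetical and not taken from the system described here, and real Tamil analysis must also handle sandhi and stem changes.

```python
# All lexicon entries are hypothetical romanized examples.
TAMIL_SUFFIXES = {"ukku": "DAT", "il": "LOC"}        # case suffix -> tag
ROOT_MAP = {"oor": "gaanv"}                          # Tamil root -> Hindi root
HINDI_POSTPOSITIONS = {"DAT": "ko", "LOC": "mein"}   # tag -> postposition


def analyse(tamil_word):
    """Morphological analyser: split a word into (root, case tag or None)."""
    # Try longer suffixes first so a short suffix never shadows a long one.
    for suffix in sorted(TAMIL_SUFFIXES, key=len, reverse=True):
        if tamil_word.endswith(suffix) and len(tamil_word) > len(suffix):
            return tamil_word[: -len(suffix)], TAMIL_SUFFIXES[suffix]
    return tamil_word, None


def map_morphons(root, tag):
    """Mapping unit: look the root up in the bilingual dictionary."""
    return ROOT_MAP[root], tag


def generate(hindi_root, tag):
    """Generator: realise the case tag as a Hindi postposition."""
    if tag is None:
        return hindi_root
    return f"{hindi_root} {HINDI_POSTPOSITIONS[tag]}"


def translate_word(tamil_word):
    return generate(*map_morphons(*analyse(tamil_word)))


print(translate_word("ooril"))    # gaanv mein
print(translate_word("oorukku"))  # gaanv ko
```

Keeping the case tag separate from the root is what lets a Tamil suffix surface as a separate Hindi postposition.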
This paper describes the development of Tamil-Hindi translation. In Tamil, most of the information needed to generate a sentence from a UNL structure is handled at the morphological and syntactic levels.
This modest effort could potentially alleviate some of the most pressing issues in NLP. The applications of NLP are as vast as an ocean; we have seen only a small drop of it. In the future, NLP will help people communicate comfortably with computers.
CONCLUSION
Morphological Analysis
Semantic Analysis
Natural Access to Internet & Other Resources
Headline Generation
Headline Translation
Document Translation
Multilingual Multi-document Summarization
Cross-lingual Information Management
Multilingual and Cross-lingual IR
Open Domain Question Answering
Name the component: Morphological Analyzer for Tamil; Morphological Analyzer for Hindi
(would like to collaborate with consortium)
The performance of these techniques in other languages: Kimmo Analyser, 95% (English)
Tamil Morphological Analyser: Present Performance: 92%
1st Year: 96%
2nd Year: 98-99%
Language pair: Tamil-Hindi
Name the component: POS Tagger
The performance of these techniques in other languages: Brill's Tagger (English), 99%
Tamil: Present Performance: 90+%
1st Year: 96%
2nd Year: 98-99%
Language pair: Tamil-Hindi
Evaluation metrics in addition to the domain: Precision and Recall
Name the component: NP Chunker
The performance of these techniques in other languages: FnTBL, 98%
Tamil: Present Performance: 94+%
1st Year: 96%
2nd Year: 98-99%
Language pair: Tamil-Hindi
Name the domain for which the performance will be optimized: Crime/Tourism
Name other evaluation metrics in addition to the domain: Precision and Recall
Name the component: Transfer Grammar Component
The performance of these techniques in other languages: NA
Tamil: Present Performance: 50%
1st Year: 90%
2nd Year: 95% and above
Language pair: Tamil-Hindi
Name other evaluation metrics in addition to the domain: Precision and Recall
Name the component: Word Generator and Local Language Splitter for Target Language
Present Performance: 50%
1st Year: 90%
2nd Year: 95% and above
Language pair: Tamil-Hindi
Name other evaluation metrics in addition to the domain: Precision, Recall and F measure
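For reference, the evaluation metrics named for these components can be computed as in the sketch below. It assumes system output and gold annotation are compared as sets of items; the example spans and the function name `precision_recall_f` are hypothetical, and the F measure shown is the balanced F1.

```python
def precision_recall_f(predicted, gold):
    """Score predicted items against gold items (both iterables of hashables)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical chunker output: (start, end) token spans of noun phrases.
gold_chunks = {(0, 2), (3, 5), (6, 7), (8, 10)}
predicted_chunks = {(0, 2), (3, 5), (6, 8)}
p, r, f = precision_recall_f(predicted_chunks, gold_chunks)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Precision penalises spurious output, recall penalises missed items, and the F measure balances the two.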
Name the lexical resource: Hindi-Tamil Bilingual Dictionary
The final size of the lexical resource: 30,000 root words
The average size of such a resource in other languages: 20,000 root words
1st Year: 15,000 root words
2nd Year: 15,000 root words
Language pair: Tamil-Hindi