Arabic NLP: Challenges & Opportunities
Dr. Samir Tartir
Scientific Day, Faculty of Information, Philadelphia University
May 15th, 2013
ثمن
علم
ق
General Information
• History
– (Classical) Arabic has remained unchanged, intelligible and functional for more than fifteen centuries.
• Strategically important
– 330 million speakers living in an important region
  • Huge oil reserves, sacred sites
– 1.4 billion Muslims use it in their prayers.
• Cultural and literary heritage
– Closely associated with Islam
Distribution
Versions
• Classical
• Modern
• Dialects
Arabic Language Characteristics
• Highly structured
• Highly derivational language
– Morphology
• Free word order
• Modern Arabic lacks diacritics (short vowels)
Example*
*Microsoft Arabic NLP Toolkit (ATK) For Academia in the Arab World Presentation, 11/2012
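The lack of diacritics noted above can be illustrated with a minimal Python sketch. It assumes the diacritics of interest are the marks U+064B to U+0652 plus the superscript alef U+0670; the function name `strip_diacritics` and the three vocalized readings of علم are illustrative choices, not taken from the slides.

```python
# Arabic short-vowel and related marks (tashkeel): fathatan .. sukun,
# plus the superscript alef. This set is an assumption of the sketch.
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Return text with the marks in TASHKEEL removed."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


# Three vocalized words written with escapes so the code points are explicit.
vocalized = {
    "\u0639\u0650\u0644\u0652\u0645": "knowledge",      # 'ilm
    "\u0639\u064E\u0644\u064E\u0645": "flag",           # 'alam
    "\u0639\u064E\u0644\u0650\u0645\u064E": "he knew",  # 'alima
}

# All three collapse to the same undiacritized surface form.
surface_forms = {strip_diacritics(word) for word in vocalized}
print(len(surface_forms))
```

In undiacritized text a reader, or a program, has to recover the intended reading from context.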
Arabic Language Characteristics
• Synonymy and confusion of non-standardized terms
– Thermometer: ميزان حرارة، مقياس حرارة، محرار، محر، ترمومتر
• Technical translation
– Hydrometer: جهاز قياس كثافة السوائل
• Uncle, parent…
Letters
• One letter, one sound
• Letters change shape
• Hamza
• No capital letters
• Can use normalization
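One possible reading of "normalization" is sketched below in Python. It is an assumption of this sketch, not the presenter's method, that normalization means three folds: mapping positional presentation forms back to base letters with Unicode NFKC, dropping the tatweel (elongation) character, and folding hamza-carrying alef variants to bare alef. The names `normalize` and `ALEF_VARIANTS` are hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation character, carries no lexical content

# Alef with madda, hamza above, hamza below -> bare alef (lossy by design).
ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
})


def normalize(text: str) -> str:
    """Fold common orthographic variants to one canonical spelling."""
    # NFKC maps presentation forms (initial, medial, final shapes)
    # back to the base letters in the U+0600 block.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(TATWEEL, "")
    return text.translate(ALEF_VARIANTS)


# The name Ahmad spelled with and without the hamza on the alef.
with_hamza = "\u0623\u062D\u0645\u062F"
without_hamza = "\u0627\u062D\u0645\u062F"
print(normalize(with_hamza) == normalize(without_hamza))  # True
```

Which variants to fold is a design choice: folding improves matching in search but discards distinctions that other tasks may need.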
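Ambiguity
• Homographs
– قدم (e.g. "foot", "he presented", "antiquity")
• Internal word structure ambiguity
– بعقوبة (the city of Baqubah, or "with a punishment")
• Syntactic ambiguity
– قابلت مدير البنك الجديد ("I met the new manager of the bank" or "I met the manager of the new bank")
• Semantic ambiguity
– يحب علي احمد اكثر من ابراهيم ("Ali loves Ahmad more than Ibrahim")
• Anaphoric ambiguity
– قابل الصحفي الوزير الذي انتقده ("The journalist met the minister who criticized him")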
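Internal word structure ambiguity can be sketched in Python as enumerating every way a written word splits into an optional one-letter prefix plus a known stem. The lexicon, the prefix list, the glosses, and the function name `analyses` are all hypothetical toy data for illustration; a real morphological analyzer relies on much larger resources.

```python
# Hypothetical toy lexicon: written form -> English gloss.
LEXICON = {
    "\u0628\u0639\u0642\u0648\u0628\u0629": "Baqubah (city name)",
    "\u0639\u0642\u0648\u0628\u0629": "punishment",
}

# Hypothetical list of one-letter proclitics with rough glosses.
PREFIXES = {
    "\u0628": "with/by",  # ba
    "\u0648": "and",      # wa
    "\u0644": "for/to",   # la
}


def analyses(word: str) -> list[str]:
    """Return every reading of word licensed by the toy lexicon."""
    results = []
    if word in LEXICON:  # reading 1: the whole word is a lexical entry
        results.append(LEXICON[word])
    for prefix, gloss in PREFIXES.items():  # reading 2: prefix + stem
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in LEXICON:
            results.append(f"{gloss} + {LEXICON[stem]}")
    return results


word = "\u0628\u0639\u0642\u0648\u0628\u0629"
print(analyses(word))
# ['Baqubah (city name)', 'with/by + punishment']
```

Both readings are well formed, so choosing between them needs context beyond the word itself.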
NLP
• Automatic summarization
• Machine translation
• Named entity recognition (NER)
• Natural language generation
• Natural language understanding
• Optical character recognition (OCR)
• Question answering
• Sentiment analysis
• Speech recognition
• Word sense disambiguation
• Information retrieval (IR)
• Speech processing
• Text-to-speech
• Natural language search
• Automated essay scoring
• etc.
Question Answering**
**Hammo et al. QARAB: A Question Answering System to Support the Arabic Language. Workshop on Computational Approaches to Semitic Languages, ACL 2002.
Arabic NLP Issues
• Lack of tools
• Lack of linguistic references
• Lack of training data
Available Tools
• Arabic Treebank
• Arabic WordNet
– MySQL database
– SUMO Ontology
– Java
• Microsoft Arabic Toolkit (ATK)
Summary
• Arabic is difficult to deal with
• Progress has been made
• More work is being done on different parts
• Any progress is valuable
– Business
– Personal
– Governmental
Thank you