Arabic NLP: Challenges & Opportunities

Dr. Samir Tartir

Scientific Day, Faculty of Information, Philadelphia University

May 15th 2013

ثمن

علم

قِ

General Information

• History
– (Classical) Arabic has remained unchanged, intelligible and functional for more than fifteen centuries.
• Strategically important
– 330 million speakers living in an important region (huge oil reserves, sacred sites).
– 1.4 billion Muslims use it in their prayers.
• Cultural and literary heritage
– Closely associated with Islam

Distribution

Versions

• Classical
• Modern
• Dialects

Arabic Language Characteristics

• Highly structured
• Highly derivational language
– Morphology
• Free word order
• Modern Arabic lacks diacritics (short vowels)

Example*

*Microsoft Arabic NLP Toolkit (ATK) For Academia in the Arab World Presentation, 11/2012
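The point about missing diacritics can be made concrete with a short sketch. It assumes three vocalized readings of the letters علم chosen here for illustration (they are not taken from the slides) and uses only Python's standard `unicodedata` module; the helper name `strip_diacritics` is hypothetical.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

# Illustrative vocalized words, written with escapes so every mark is explicit.
vocalized = {
    "\u0639\u0650\u0644\u0652\u0645": "knowledge",              # عِلْم
    "\u0639\u064E\u0644\u064E\u0645": "flag",                   # عَلَم
    "\u0639\u064E\u0644\u0651\u064E\u0645\u064E": "he taught",  # عَلَّمَ
}

# Once the marks are removed, all three readings collapse to one written form.
bare_forms = {strip_diacritics(word) for word in vocalized}
print(bare_forms)
```

Because everyday Modern Arabic text is written like the stripped form, a system has to recover the intended reading from context.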

Arabic Language Characteristics

• Synonymy and confusion of non-standardized terms
– Thermometer: ميزان حرارة، مقياس حرارة، محرار، محر، ترمومتر
• Technical translation
– Hydrometer: جهاز قياس كثافة السوائل (literally "device for measuring the density of liquids")
• Uncle, parent…

Letters

• One letter, one sound
• Letters change shape
• Hamza
• No capital letters
• Can use normalization
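As a rough illustration of the normalization bullet, the sketch below folds positional letter shapes and hamza-carrying alef variants into a single form. The folding table is an assumption for illustration: which variants to merge is a design choice that differs between systems, and the names `FOLD` and `normalize` are hypothetical.

```python
import unicodedata

# Hypothetical folding table; real systems choose their own set of rules.
FOLD = str.maketrans({
    "\u0623": "\u0627",  # أ alef with hamza above -> ا
    "\u0625": "\u0627",  # إ alef with hamza below -> ا
    "\u0622": "\u0627",  # آ alef with madda      -> ا
    "\u0640": None,      # ـ tatweel (elongation stroke) is removed
})

def normalize(text: str) -> str:
    # NFKC maps the positional glyphs in the Arabic Presentation Forms
    # blocks (initial, medial, final, isolated) back to the base letters.
    text = unicodedata.normalize("NFKC", text)
    # Then merge the alef variants and drop tatweel.
    return text.translate(FOLD)

print(normalize("\u0623\u062D\u0645\u062F"))  # أحمد written with hamza
```

Folding of this kind trades precision for recall: it lets a search for احمد match أحمد, but it also erases distinctions that are sometimes meaningful.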

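Ambiguity

• Homographs
– قدم ("foot", "he presented" or "he arrived", depending on the vowels)
• Internal word structure ambiguity
– بعقوبة (the city of Baqubah, or بـ + عقوبة, "with a punishment")
• Syntactic ambiguity
– قابلت مدير البنك الجديد ("I met the new manager of the bank" or "I met the manager of the new bank")
• Semantic ambiguity
– يحب علي احمد اكثر من ابراهيم ("Ali loves Ahmad more than Ibrahim does" or "more than he loves Ibrahim")
• Anaphoric ambiguity
– قابل الصحفي الوزير الذي انتقده ("The journalist met the minister who criticized him" or "whom he criticized")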
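The internal word structure case (بعقوبة) can be sketched as a toy segmenter that tries each known prefix and keeps every split whose remainder is a known word. The prefix list and two-entry lexicon are hypothetical stand-ins; a real morphological analyzer would rely on a full lexicon and many more rules.

```python
# Hypothetical toy resources, for illustration only.
PREFIXES = ("", "\u0628", "\u0648", "\u0644")  # no prefix, ب "with/in", و "and", ل "for"
LEXICON = {
    "\u0628\u0639\u0642\u0648\u0628\u0629": "Baqubah (city name)",  # بعقوبة
    "\u0639\u0642\u0648\u0628\u0629": "punishment",                 # عقوبة
}

def analyses(word):
    """Return every (prefix, stem, gloss) split whose stem is in the lexicon."""
    results = []
    for prefix in PREFIXES:
        if word.startswith(prefix):
            stem = word[len(prefix):]
            if stem in LEXICON:
                results.append((prefix, stem, LEXICON[stem]))
    return results

for prefix, stem, gloss in analyses("\u0628\u0639\u0642\u0648\u0628\u0629"):
    print(repr(prefix), stem, gloss)
```

Both analyses are valid on the surface, so choosing between the place name and the prefixed noun has to be left to context.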

NLP

• Automatic summarization
• Machine translation
• Named entity recognition (NER)
• Natural language generation
• Natural language understanding
• Optical character recognition (OCR)
• Question answering
• Sentiment analysis
• Speech recognition
• Word sense disambiguation
• Information retrieval (IR)
• Speech processing
• Text-to-speech
• Natural language search
• Automated essay scoring
• etc.

Question Answering**

**Hammo et al., QARAB: A Question Answering System to Support the Arabic Language, Workshop on Computational Approaches to Semitic Languages, ACL 2002

Arabic NLP Issues

• Lack of tools
• Lack of linguistic references
• Lack of training data

Available Tools

• Arabic Treebank
• Arabic WordNet
– MySQL database
– SUMO Ontology
– Java
• Microsoft Arabic Toolkit (ATK)

Summary

• Arabic is difficult to deal with
• Progress has been made
• More work is being done on different parts
• Any progress is valuable
– Business
– Personal
– Governmental

Thank you
