Page 1: Arabic NLP: Challenges & Opportunities

Arabic NLP: Challenges & Opportunities

Dr. Samir Tartir

Scientific Day, Faculty of Information, Philadelphia University

May 15th 2013

Page 2: Arabic NLP: Challenges & Opportunities

ثمن

Page 3: Arabic NLP: Challenges & Opportunities

علم

Page 4: Arabic NLP: Challenges & Opportunities

ق

Page 5: Arabic NLP: Challenges & Opportunities

General Information

• History
– (Classical) Arabic has remained unchanged, intelligible and functional for more than fifteen centuries.
• Strategically important
– 330 million speakers living in an important region
• Huge oil reserves, sacred sites.
– 1.4 billion Muslims use it in their prayers.
• Cultural and literary heritage
– Closely associated with Islam

Page 6: Arabic NLP: Challenges & Opportunities

Distribution

Page 7: Arabic NLP: Challenges & Opportunities

Versions

• Classical
• Modern
• Dialects

Page 8: Arabic NLP: Challenges & Opportunities

Arabic Language Characteristics

• Highly structured
• Highly derivational language
– Morphology
• Free word order
• Modern written Arabic usually omits diacritics (short vowels)
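The effect of missing diacritics can be illustrated with a minimal Python sketch. It assumes the marks of interest are the Unicode range U+064B to U+0652 (tanween, short vowels, shadda, sukun); the function name `strip_diacritics` and the three sample words are illustrative choices, not material from the slides.

```python
import re

# Assumption: "diacritics" here means the Arabic marks U+064B..U+0652
# (tanween, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return text with Arabic short-vowel marks removed."""
    return DIACRITICS.sub("", text)


# Three differently vocalized words built on the letters ain-lam-meem.
# Rough glosses (illustrative): "he knew", "knowledge", "flag".
vocalized = [
    "\u0639\u064E\u0644\u0650\u0645\u064E",
    "\u0639\u0650\u0644\u0652\u0645",
    "\u0639\u064E\u0644\u064E\u0645",
]

# Without diacritics, all three collapse to the same written form.
bare_forms = {strip_diacritics(word) for word in vocalized}
print(len(bare_forms))
```

Once the marks are dropped, a reader or a program has to recover the intended word from context alone.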

Page 9: Arabic NLP: Challenges & Opportunities

Example*

*Microsoft Arabic NLP Toolkit (ATK) For Academia in the Arab World Presentation, 11/2012

Page 10: Arabic NLP: Challenges & Opportunities

Arabic Language Characteristics

• Synonymy and confusion of non-standardized terms
– Thermometer: ميزان حرارة، مقياس حرارة، محرار، محر، ترمومتر
• Technical translation
– Hydrometer: جهاز قياس كثافة السوائل (a descriptive phrase: "device for measuring the density of liquids")
• Uncle, parent…

Page 11: Arabic NLP: Challenges & Opportunities

Letters

• One letter, one sound
• Letters change shape
• Hamza
• No capital letters
• Can use normalization
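As a rough illustration of normalization, the sketch below folds letter variants that are often written inconsistently. The specific folding rules (hamza-carrying alef forms to bare alef, alef maqsura to yeh, teh marbuta to heh, tatweel removed) are one common convention assumed here, not a scheme given in the slides, and the name `normalize_arabic` is hypothetical. NFKC is applied first so that positional presentation forms map back to their base letters.

```python
import unicodedata

# Assumed folding table: one common convention, not a standard.
FOLD = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0640": None,      # tatweel (elongation)  -> removed
})


def normalize_arabic(text: str) -> str:
    """Fold common Arabic spelling variants into a single form."""
    # NFKC maps presentation forms (isolated/initial/medial/final
    # shapes encoded as separate code points) to the base letters.
    text = unicodedata.normalize("NFKC", text)
    return text.translate(FOLD)


# A name written with hamza on the alef and the same name without it
# become identical after normalization.
with_hamza = "\u0623\u062D\u0645\u062F"
without_hamza = "\u0627\u062D\u0645\u062F"
print(normalize_arabic(with_hamza) == normalize_arabic(without_hamza))
```

Folding this aggressively trades precision for recall: it helps search and matching, but it can also merge words that are genuinely different, so the rule set is usually tuned per task.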

Page 12: Arabic NLP: Challenges & Opportunities

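Ambiguity

• Homographs
– قدم (for example "foot" or "presented", depending on the unwritten vowels)
• Internal word structure ambiguity
– بعقوبة (the city name "Baqubah", or ب + عقوبة, "with a punishment")
• Syntactic ambiguity
– قابلت مدير البنك الجديد ("I met the new bank manager" or "I met the manager of the new bank")
• Semantic ambiguity
– يحب علي احمد اكثر من ابراهيم ("Ali likes Ahmad more than Ibrahim does" or "Ali likes Ahmad more than he likes Ibrahim")
• Anaphoric ambiguity
– قابل الصحفي الوزير الذي انتقده ("The journalist met the minister who criticized him" or "... the minister whom he criticized")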
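Internal word structure ambiguity, as in بعقوبة, can be sketched as a search over possible prefix splits. The following is a minimal Python illustration, not a real morphological analyzer: the prefix list (single-letter proclitics beh, waw, feh, lam, kaf) and the function name `candidate_segmentations` are assumptions made for the example, and no lexicon is consulted to rule candidates out.

```python
from typing import List, Tuple

# Assumed, deliberately tiny list of one-letter proclitics:
# beh, waw, feh, lam, kaf.
PREFIXES = ("\u0628", "\u0648", "\u0641", "\u0644", "\u0643")


def candidate_segmentations(word: str) -> List[Tuple[str, str]]:
    """List (prefix, stem) readings of a word; prefix may be empty."""
    # Reading 1: the whole word is a single unit (e.g. a proper noun).
    candidates = [("", word)]
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        # Require a stem of at least two letters to avoid silly splits.
        if word.startswith(prefix) and len(stem) >= 2:
            candidates.append((prefix, stem))
    return candidates


# The slide's example word: beh-ain-qaf-waw-beh-teh marbuta.
word = "\u0628\u0639\u0642\u0648\u0628\u0629"
for prefix, stem in candidate_segmentations(word):
    print(repr(prefix), repr(stem))
```

Both readings are returned, the unsplit word and the split after the leading beh; choosing between them requires a lexicon or sentence context, which is where the real difficulty lies.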

Page 13: Arabic NLP: Challenges & Opportunities

NLP

• Automatic summarization
• Machine translation
• Named entity recognition (NER)
• Natural language generation
• Natural language understanding
• Optical character recognition (OCR)
• Question answering
• Sentiment analysis
• Speech recognition
• Word sense disambiguation
• Information retrieval (IR)
• Speech processing
• Text-to-speech
• Natural language search
• Automated essay scoring
• etc.

Page 14: Arabic NLP: Challenges & Opportunities

Question Answering**

Hammo et al. QARAB: A Question Answering System to Support the Arabic Language. Workshop on Computational Approaches to Semitic Languages. ACL 2002

Page 15: Arabic NLP: Challenges & Opportunities

Arabic NLP Issues

• Lack of tools
• Lack of linguistic references
• Lack of training data

Page 16: Arabic NLP: Challenges & Opportunities

Available Tools

• Arabic Treebank
• Arabic WordNet
– MySQL database
– SUMO Ontology
– Java

• Microsoft Arabic Toolkit (ATK)

Page 17: Arabic NLP: Challenges & Opportunities

Summary

• Arabic is difficult to deal with
• Progress has been made
• More work is being done on different parts
• Any progress is valuable
– Business
– Personal
– Governmental

Page 18: Arabic NLP: Challenges & Opportunities

Thank you

