Challenges in Building NLP Applications in Nepali Language
DESCRIPTION
This presentation gives an overview of the challenges in building Natural Language Processing applications for the Nepali language and of why Python is a good fit for NLP development.
TRANSCRIPT
![Page 1: Challenges in Building NLP Applications in Nepali Language](https://reader034.vdocuments.net/reader034/viewer/2022051411/540d74728d7f728d7e8b494d/html5/thumbnails/1.jpg)
CHALLENGES IN BUILDING NATURAL LANGUAGE PROCESSING
APPLICATIONS FOR नेपाली LANGUAGE
- Chandan Goopta
Unicode number: U+0915 HTML-code: `&#2325;` (क)
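The relationship between the character on this slide, its code point, and its HTML numeric reference can be checked in Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not from the talk.

```python
import html
import unicodedata

ka = "\u0915"  # the Devanagari letter shown on the slide

code_point = f"U+{ord(ka):04X}"  # hexadecimal code point notation
html_code = f"&#{ord(ka)};"      # decimal numeric character reference

print(ka, code_point, html_code, unicodedata.name(ka))

# html.unescape turns the numeric reference back into the character
round_trip = html.unescape(html_code)
```

The same two lines work for any character, which helps when a tool shows only the escaped form.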
NATURAL LANGUAGE PROCESSING
| NLP Task | English | Indic Languages | Nepali |
|---|---|---|---|
| Machine Translation | Very Good | Good | Very Poor (Google/Microsoft) |
| Named Entity Recognition | Very Good | Fair | None (little groundwork) |
| Optical Character Recognition | Very Good | Poor | Very Poor |
| POS Tagging | Good | Poor | Very Poor |
| Sentiment Analysis | Very Good | Fair | Poor (work ongoing) |
| Speech Recognition | Good | Poor | None (Google's work in progress) |
What So Far?
SENTIMENT ANALYSIS
• Chunking | Sentence Chunker
• Tagging | POS Tagger
• Resources | SentiWordNet, Subjectivity WordList
• Machine Learning | Corpus, Tagged Samples
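As a rough illustration of how the pieces listed above fit together, the sketch below splits text into sentences and scores each one against a word list. Everything in it is an assumption made for illustration: the two-entry `LEXICON` is a hypothetical stand-in for a resource such as SentiWordNet, splitting on the danda (।) stands in for a real sentence chunker, and no POS tagging or machine learning is involved.

```python
import re

GOOD = "राम्रो"    # "good"
BAD = "नराम्रो"    # "bad"

# Hypothetical toy lexicon (not a real resource): word -> polarity score
LEXICON = {GOOD: 1.0, BAD: -1.0}

def split_sentences(text):
    """Split on the Devanagari danda, '?' or '!' and drop empty pieces."""
    return [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]

def score_sentence(sentence):
    """Sum the lexicon scores of whitespace-separated tokens."""
    return sum(LEXICON.get(token, 0.0) for token in sentence.split())

def label(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# "This film is good. That food is bad."
text = f"यो फिल्म {GOOD} छ। त्यो खाना {BAD} छ।"
labels = [label(score_sentence(s)) for s in split_sentences(text)]
print(labels)
```

A whitespace tokenizer and a two-word list are far too crude for real use; they only show where a chunker, a tagger, and a lexicon would plug in.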
Build Everything from Scratch
OR
I CAN USE ENGLISH LANGUAGE RESOURCES FOR NEPALI
SENTIMENT ANALYSIS
• Chunking | Sentence Chunker
• Tagging | POS Tagger
• Resources | SentiWordNet, Subjectivity WordList
• Machine Learning | Corpus, Tagged Samples
I am like / Others are like / Professors are like
BACK TO CHALLENGES
• Unicode Rendering in Dev-tools
• Lack of Resources
• Very Little Previous Work/Research
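When a terminal or editor fails to render Devanagari, inspecting code points directly can still show what a string contains. The following is a minimal sketch with the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# "Nepali" in Devanagari, written with escapes so the source stays readable
# even where the script itself does not render.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"

for row in describe(word):
    print(*row)

# len() counts code points, not the syllables a reader sees:
# here, consonant letters (category Lo) plus dependent vowel signs (M*).
letters = [ch for ch in word if unicodedata.category(ch) == "Lo"]
marks = [ch for ch in word if unicodedata.category(ch).startswith("M")]
print(len(word), len(letters), len(marks))
```

Because vowel signs are separate code points, naive character counts and slicing can split a syllable, which is worth keeping in mind when tokenizing.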
WHY PYTHON?
“I have the students learn Python in our undergraduate and graduate Semantic Web courses. Why? Because basically there's nothing else with the flexibility and as many web libraries.”
- Prof. James A. Hendler, University of Maryland
WHY PYTHON?
• NLTK, although not the most efficient implementation, provides a lot of awesome tools to quickly prototype a hypothesis
Source: Quora
WHY PYTHON?
• SciPy + NumPy: Everything that isn't in NLTK is definitely in these libraries. If you want to use more advanced algorithms like Latent Semantic Indexing or Latent Dirichlet Allocation, Python has libraries to do that.
Source: Quora
WHY PYTHON?
• Python has really great XML/HTML parsing libraries such as Beautiful Soup and Scrape.py. You can use these libraries to quickly scrape the web and generate large data sets to improve the performance of your models (because let's face it, big data trumps complexity).
Source: Quora
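To make the scraping idea concrete, here is a minimal sketch that pulls paragraph text out of an HTML snippet. It uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra installs; the `ParagraphCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is still part of the paragraph
        if self._in_p:
            self._buf.append(data)

# Hypothetical sample page
SAMPLE = (
    "<html><body><h1>News</h1>"
    "<p>First <b>story</b>.</p><p>Second story.</p>"
    "</body></html>"
)

collector = ParagraphCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.paragraphs)
```

A dedicated library handles malformed markup and richer selectors; this only shows how little code a first corpus-collection pass needs.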
WHY PYTHON?
• Python has great web frameworks like Django/Pylons/Tornado. If you invent a revolutionary sarcasm detector that can predict trends in the stock market, you can quickly integrate it into a web service, make millions, and buy a large island in a third-world country.
Source: Quora
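As a sketch of what "integrate it into a web service" can look like, the snippet below wraps a model behind a plain WSGI callable, the interface that frameworks such as Django build on. It deliberately uses only the standard library instead of a framework, and `detect_sarcasm` is a hypothetical placeholder, not a real model.

```python
import json
from urllib.parse import parse_qs

def detect_sarcasm(text):
    """Hypothetical placeholder model: flags text containing 'yeah right'."""
    return "yeah right" in text.lower()

def app(environ, start_response):
    """WSGI application: a request with ?text=... returns a JSON verdict."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    body = json.dumps(
        {"text": text, "sarcastic": detect_sarcasm(text)}
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To serve it locally, one option is the standard library's reference server:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

Keeping the model in its own function means the same code can later sit behind any of the frameworks named on the slide.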
WHY PYTHON?
• Consider your other options: It would not make sense to use a compiled language like C++/Java for this type of work unless you needed to increase performance (computational speed, not model accuracy). As far as I can tell, Ruby is completely useless for any Machine Learning, Data Mining, or Natural Language Processing task. Maybe you could use Lisp, but at this point, Python has a larger ecosystem.
Source: Quora
THANK YOU