Natural language processing tools
Lê Đức Trọng
Crawler and Parser tools
• Crawler tools:
  • crawler4j: http://code.google.com/p/crawler4j/
  • HttpClient: http://hc.apache.org/httpclient-3.x/
• Parser tools:
  • HTMLParser: http://htmlparser.sourceforge.net/
  • jsoup HTML parser: http://jsoup.org/
  • NekoHTML parser: http://nekohtml.sourceforge.net/
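
To make the parser step concrete, here is a minimal sketch of what an HTML parser contributes to a crawl: given a fetched page, it pulls out the links to visit next. It uses only the Python standard library (`html.parser`, `urllib.parse`), not the Java tools listed above, and the names `LinkExtractor`, `extract_links`, and the sample URL are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative paths against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content and address, for illustration only.
sample = '<p><a href="/docs">Docs</a> <a href="http://jsoup.org/">jsoup</a></p>'
print(extract_links(sample, "http://example.org/index.html"))
```

A crawler would feed each extracted URL back into its fetch queue. The dedicated libraries listed above offer more than this sketch, so treat it as a picture of the idea rather than a replacement.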
Vietnamese NLP – Tools
• JVnTextPro: http://sourceforge.net/projects/jvntextpro/
  • Sentence segmentation, sentence tokenization, word segmentation, POS tagging
• VnToolkit: http://www.loria.fr/~lehong/softwares.php
  • An automatic tagger for Vietnamese texts
  • A tokenizer for automatic word segmentation of Vietnamese texts
  • A sentence detector for automatically detecting sentences in Vietnamese texts
• VLSP Tools: http://vlsp.vietlp.org:8080/demo/?page=resources
  • Vietnamese chunking
NLP Toolkits
• LingPipe: http://alias-i.com/lingpipe/
  • Find the names of people, organizations, or locations in news
  • Automatically classify Twitter search results into categories
  • Suggest correct spellings of queries
• Mallet (Machine Learning for Language Toolkit): http://mallet.cs.umass.edu/
  • Statistical NLP, document classification, clustering, topic modeling, information extraction
• Stanford NLP software: http://www-nlp.stanford.edu/software/
  • Word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, classification, and coreference resolution
• NLTK: http://www.nltk.org/
  • Open-source Python modules, linguistic data, and documentation for research and development in natural language processing and text analytics
• OpenNLP: http://opennlp.apache.org/
  • Tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution
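
Sentence segmentation and tokenization are the first steps in most of these toolkits. The sketch below shows both with plain regular expressions from the Python standard library. It is a deliberately naive baseline and not the method or API of any toolkit listed here: it assumes sentences end in `.`, `!`, or `?` followed by whitespace, so abbreviations will break it. The function names and the sample text are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Hypothetical sample text, for illustration only.
text = "LingPipe finds names in news. NLTK is written in Python! Is OpenNLP a Java library?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

For Vietnamese, whitespace-based tokenization like this only yields syllables. Grouping syllables into words needs a dedicated word segmenter such as those in the Vietnamese NLP tools above.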
Machine learning libraries
• Conditional random fields (CRF)
  • CRF: http://crf.sourceforge.net/
• Maximum entropy (Maxent)
  • OpenNLP, Mallet
• Support vector machine (SVM)
  • libSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • svmLight: http://svmlight.joachims.org/
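
Both libSVM and svmLight read training data as sparse text lines of the form `label index:value index:value ...`, with feature indices in ascending order. The sketch below serializes one example into that format. The helper name `to_libsvm_line` and the dictionary-based feature representation are assumptions made for illustration, not part of either library.

```python
def to_libsvm_line(label, features):
    """Format one example as a sparse 'label index:value ...' line.

    `features` maps 1-based feature indices to numeric values;
    zero-valued features are omitted, as is usual for sparse data.
    """
    parts = [str(label)]
    for index in sorted(features):  # indices must be ascending
        value = features[index]
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Hypothetical examples: one positive and one negative instance.
print(to_libsvm_line(1, {3: 0.5, 1: 2.0}))
print(to_libsvm_line(-1, {2: 0, 5: 1}))
```

Writing one such line per training example gives a file that can be passed to the command-line trainers shipped with these libraries.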