IMPLEMENTATION OF THE NAÏVE BAYES CLASSIFIER ALGORITHM
FOR CLASSIFYING USERS BASED ON TWEETS
FINAL PROJECT
Submitted as a Requirement for the Bachelor's Degree (Strata 1)
in Informatics Engineering, Universitas Muhammadiyah Malang
Prepared by:
Lalu Taqi Mustaqim
09560301
DEPARTMENT OF INFORMATICS ENGINEERING
FACULTY OF ENGINEERING
UNIVERSITAS MUHAMMADIYAH MALANG
2015
PREFACE
Alhamdulillahirabbil'alamin, all praise be to Allah SWT, Lord of the worlds, who has granted
His guidance and mercy so that the author could complete this final-project research entitled
“IMPLEMENTATION OF THE NAÏVE BAYES CLASSIFIER ALGORITHM FOR CLASSIFYING USERS
BASED ON TWEETS”.
In this research, the author aims to classify users based on their tweets in order to identify
positive and negative opinions (opinion mining) in users' news tweets, covering politics,
entertainment, sports, and education.
The author is aware that this research is still far from perfect and therefore welcomes
constructive criticism and suggestions. Finally, the author thanks everyone who helped bring
this final project to completion.
Malang, April 2015
Lalu Taqi Mustaqim
TABLE OF CONTENTS
TITLE PAGE
APPROVAL SHEET
ENDORSEMENT SHEET
DECLARATION SHEET
ABSTRACT (INDONESIAN)
ABSTRACT (ENGLISH)
DEDICATION
PREFACE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I: INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope and Limitations
1.5 Methodology
1.6 Organization of the Thesis
CHAPTER II: THEORETICAL BACKGROUND
2.1 Twitter
2.2 Sentiment Analysis
2.3 Text Mining
2.3.1 Text Preprocessing
2.3.2 Feature Selection
2.4 Stemming
2.4.1 Morphology
2.4.2 Morphological Processes
2.4.2.1 Affixation
2.4.2.2 Infixes
2.4.2.3 Suffixes
2.4.2.4 Confixes
2.5 N-Gram
2.6 Confix-Stripping Algorithm
2.6.1 Root-Word Elision (Peluruhan) Rules
2.7 Naïve Bayes Classifier
CHAPTER III: SYSTEM ANALYSIS AND DESIGN
3.1 System Analysis
3.1.1 Software Description
3.1.2 System Requirements Analysis
3.1.3 Use Case Diagram
3.2 System Design
3.2.1 System Activity Diagrams
3.2.2 System Sequence Diagrams
3.2.3 Class Diagram
3.2.4 Database Design
3.2.6 Interface Design
CHAPTER IV: IMPLEMENTATION AND TESTING
4.1 System Implementation
4.1.1 Hardware Development Implementation
4.1.2 Software Development Environment Implementation
4.1.3 Application Case Implementation
4.1.4 Twitter API Keys and Tokens
4.2 System Testing
4.2.1 Code and Screenshot: Main Query Search Page
4.2.2 Code and Screenshot: Loading Tweet Data (Twitter API)
4.2.3 Code and Screenshot: Classifying Users Based on Tweets
4.2.4 Code and Screenshot: Positive and Negative Sentiment Output
4.2.5 Code and Screenshot: Naïve Bayes Classification of Training Data
4.2.6 Accuracy Testing
4.2.7 Black-Box Testing
CHAPTER V: CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusions
5.2 Recommendations
LIST OF FIGURES
Figure 3.1 System Flowchart
Figure 3.2 Tweet Retrieval Process Scheme
Figure 3.3 General Process Sequence
Figure 3.4 Use Case Diagram
Figure 3.5 Activity Diagram: Tweet Query Input
Figure 3.6 Activity Diagram: Load Twitter API
Figure 3.7 Activity Diagram: Classifying Users Based on Tweets
Figure 3.8 Activity Diagram: Positive and Negative Opinion Output
Figure 3.9 Sequence Diagram: Query Input
Figure 3.10 Sequence Diagram: Load Twitter API
Figure 3.11 Sequence Diagram: Classifying Users Based on Tweets
Figure 3.12 Sequence Diagram: Positive and Negative Opinion Output
Figure 3.13 Class Diagram
Figure 3.14 Application Interface
Figure 4.1 Application Code Folders and Files
Figure 4.2 Twitter API Key Used
Figure 4.3 Twitter API Token Used
Figure 4.4 Code for the Main Query Search Page
Figure 4.5 Application Screenshot for the Code in Figure 4.4
Figure 4.6 Code for Loading Tweet Data (Twitter API)
Figure 4.7 Screenshot of Loading Tweet Data (Twitter API)
Figure 4.8 Code for Classifying Users Based on Tweets
Figure 4.9 Code for Positive and Negative Output
Figure 4.10 Application Screenshot for the Code in Figures 4.8 and 4.9
Figure 4.11 Code for Naïve Bayes Classification of Training Data
Figure 4.12 Screenshot of Naïve Bayes Classification of Training Data
LIST OF TABLES
Table 2.1 Example of Character-Based N-Gram Splitting
Table 2.2 Example of Word-Based N-Gram Splitting
Table 2.3 Disallowed Prefix and Suffix Combinations
Table 2.4 Root-Word Elision (Peluruhan) Rules
Table 2.5 Root-Word Elision (Peluruhan) Rules
Table 3.1 Sample Entries from the Keyword Table
Table 3.2 Text Preprocessing Results Used as Input
Table 3.3 Stopword List
Table 3.4 Filtering Results
Table 3.5 Training Data Table (Tweet Documents)
Table 3.6 Term Table (Split into Words)
Table 3.7 Structure of the Realtime Table
Table 3.8 Structure of the Stopword Table
Table 3.9 Structure of the Root-Word (Kata Dasar) Table
Table 3.10 Structure of the Keyword Table
Table 3.11 Structure of the Status Table
Table 3.12 Structure of the Term Table
Table 3.11 Structure of the Training Table
Table 4.1 Accuracy Testing
Table 4.2 Confusion Matrix
Table 4.3 Black-Box Testing