Page 1: IMPLEMENTASI ALGORITMA NAÏVE BAYES CLASSIFIER DALAM …eprints.umm.ac.id/22490/1/jiptummpp-gdl-lalutaqimu-40670... · 2016-03-26 · KATA PENGANTAR Alhamdulillahirrobbil’alamin,

IMPLEMENTATION OF THE NAÏVE BAYES CLASSIFIER ALGORITHM

FOR CLASSIFYING USERS BASED ON TWEETS

FINAL PROJECT

Submitted in Partial Fulfillment of the Requirements for the Bachelor's Degree (Strata 1)

in Informatics Engineering, Universitas Muhammadiyah Malang

By:

Lalu Taqi Mustaqim

09560301

DEPARTMENT OF INFORMATICS ENGINEERING

FACULTY OF ENGINEERING

UNIVERSITAS MUHAMMADIYAH MALANG

2015

PREFACE

Alhamdulillahirrobbil’alamin, all praise be to Allah SWT, Lord of the universe, who has granted His guidance and mercy so that the author could complete this final-project research, entitled “IMPLEMENTATION OF THE NAÏVE BAYES CLASSIFIER ALGORITHM FOR CLASSIFYING USERS BASED ON TWEETS”.

In this research, the author aims to classify users based on their tweets by mining the positive and negative opinions found in users' news tweets, which cover politics, entertainment, sports, and education.

The author is aware that this research is still far from perfect. The author therefore welcomes constructive criticism and suggestions. Finally, the author thanks everyone who helped bring this final project to completion.

Malang, April 2015

Lalu Taqi Mustaqim

TABLE OF CONTENTS

TITLE PAGE

APPROVAL SHEET

ENDORSEMENT SHEET

DECLARATION SHEET

ABSTRACT (INDONESIAN)

ABSTRACT (ENGLISH)

DEDICATION

PREFACE

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER I: INTRODUCTION

1.1 Background

1.2 Problem Statement

1.3 Objectives

1.4 Scope and Limitations

1.5 Methodology

1.6 Organization of the Thesis

CHAPTER II: THEORETICAL FOUNDATION

2.1 Twitter

2.2 Sentiment Analysis

2.3 Text Mining

2.3.1 Text Preprocessing

2.3.2 Feature Selection

2.4 Stemming

2.4.1 Morphology

2.4.2 Morphological Processes

2.4.2.1 Affixation

2.4.2.2 Infixes

2.4.2.3 Suffixes

2.4.2.4 Confixes

2.5 N-Gram

2.6 Confix-Stripping Algorithm

2.6.1 Root-Word Elision (Peluruhan) Rules

2.7 Naïve Bayes Classifier

CHAPTER III: SYSTEM ANALYSIS AND DESIGN

3.1 System Analysis

3.1.1 Software Description

3.1.2 System Requirements Analysis

3.1.3 Use Case Diagram

3.2 System Design

3.2.1 System Activity Diagrams

3.2.2 System Sequence Diagrams

3.2.3 Class Diagram

3.2.4 Database Design

3.2.6 Interface Design

CHAPTER IV: IMPLEMENTATION AND TESTING

4.1 System Implementation

4.1.1 Hardware Development Environment

4.1.2 Software Development Environment

4.1.3 Application Case Implementation

4.1.4 Twitter API Keys and Tokens

4.2 System Testing

4.2.1 Code and Screenshot: Main Query Search Page

4.2.2 Code and Screenshot: Loading Tweet Data (Twitter API)

4.2.3 Code and Screenshot: Classifying Users Based on Tweets

4.2.4 Code and Screenshot: Positive and Negative Sentiment Output

4.2.5 Code and Screenshot: Naïve Bayes Classification of Training Data

4.2.6 Accuracy Testing

4.2.7 Black-Box Testing

CHAPTER V: CONCLUSIONS AND SUGGESTIONS

5.1 Conclusions

5.2 Suggestions

LIST OF FIGURES

Figure 3.1 System Flowchart

Figure 3.2 Tweet Retrieval Process Schema

Figure 3.3 General Process Sequence

Figure 3.4 Use Case Diagram

Figure 3.5 Activity Diagram: Tweet Query Input

Figure 3.6 Activity Diagram: Loading the Twitter API

Figure 3.7 Activity Diagram: Classifying Users Based on Tweets

Figure 3.8 Activity Diagram: Positive and Negative Opinion Output

Figure 3.9 Sequence Diagram: Query Input

Figure 3.10 Sequence Diagram: Loading the Twitter API

Figure 3.11 Sequence Diagram: Classifying Users Based on Tweets

Figure 3.12 Sequence Diagram: Positive and Negative Opinion Output

Figure 3.13 Class Diagram

Figure 3.14 Application Interface

Figure 4.1 Application Code Folders and Files

Figure 4.2 Twitter API Key Used

Figure 4.3 Twitter API Token Used

Figure 4.4 Code for the Main Query Search Page

Figure 4.5 Screenshot of the Application Produced by the Code in Figure 4.4

Figure 4.6 Code for Loading Tweet Data (Twitter API)

Figure 4.7 Screenshot of Loading Tweet Data (Twitter API)

Figure 4.8 Code for Classifying Users Based on Tweets

Figure 4.9 Code for Positive and Negative Output

Figure 4.10 Screenshot of the Application Produced by the Code in Figures 4.8 and 4.9

Figure 4.11 Code for Naïve Bayes Classification of Training Data

Figure 4.12 Screenshot of Naïve Bayes Classification of Training Data

LIST OF TABLES

Table 2.1 Example of Character-Based N-Gram Segmentation

Table 2.2 Example of Word-Based N-Gram Segmentation

Table 2.3 Disallowed Prefix and Suffix Combinations

Table 2.4 Root-Word Elision (Peluruhan) Rules

Table 2.5 Root-Word Elision (Peluruhan) Rules

Table 3.1 Sample Excerpt of the Keyword Table

Table 3.2 Text Preprocessing Results Used as Input

Table 3.3 Stopword Collection

Table 3.4 Filtering Results

Table 3.5 Training Data Table (Tweet Documents)

Table 3.6 Term Table (Split Word by Word)

Table 3.7 Structure of the Realtime Table

Table 3.8 Structure of the Stopword Table

Table 3.9 Structure of the Root Word (Kata Dasar) Table

Table 3.10 Structure of the Keyword Table

Table 3.11 Structure of the Status Table

Table 3.12 Structure of the Term Table

Table 3.13 Structure of the Training Table

Table 4.1 Accuracy Testing

Table 4.2 Confusion Matrix

Table 4.3 Black-Box Testing
