DATA MINING TECHNIQUES FOR PREDICTIVE
CLASSIFICATION OF ANEMIA TYPES
UNDERGRADUATE THESIS
Submitted to Fulfill the Requirements for the Bachelor of Computer Science Degree (S.Kom.)
Dita Amalia
00000025849
INFORMATION SYSTEMS STUDY PROGRAM
FACULTY OF ENGINEERING AND INFORMATICS
UNIVERSITAS MULTIMEDIA NUSANTARA
TANGERANG
2021
DECLARATION
APPROVAL PAGE
The thesis entitled
“Data Mining Techniques for Predictive Classification of Anemia Types”
by
Dita Amalia
has been approved for submission to the
Thesis Examination of Universitas Multimedia Nusantara
Tangerang, 7 June 2021
Approved by,
Supervisor 1: Johan Setiawan, S.Kom., M.M., M.B.A.
Supervisor 2: Iwan Prasetiawan, S.Kom., M.M.
Head of the Study Program
Ririn Ikana Desanti, S.Kom., M.Kom.
ENDORSEMENT PAGE
The thesis entitled
“Data Mining Techniques for Predictive Classification of Anemia Types”
by
Dita Amalia
was examined on Wednesday, 16 June 2021,
from 08.30 to 10.00 and declared to have passed,
with the following board of examiners.
Chair of the Examination: Wella, S.Kom., M.MSI.
Examiner: Friska Natalia, Ph.D.
Supervisor 1: Johan Setiawan, S.Kom., M.M., M.B.A.
Supervisor 2: Iwan Prasetiawan, S.Kom., M.M.
Head of the Study Program
Ririn Ikana Desanti, S.Kom., M.Kom.
DATA MINING TECHNIQUES FOR PREDICTIVE
CLASSIFICATION OF ANEMIA TYPES
ABSTRAK (INDONESIAN ABSTRACT, IN TRANSLATION)
By: Dita Amalia
Anemia can be prevented and treated according to its type, so that care matches
each patient's needs. Anemia can also be predicted from the main factor used to
detect the disease, namely Complete Blood Count data. Anemia patients can be
analyzed using demographic data to support further mitigation.
To address this problem, a data mining classification model was built using the
CRISP-DM framework, which consists of 6 phases. This study compares three
supervised learning algorithms, namely Naïve Bayes, J48 Decision Tree, and
Random Forest, using the RapidMiner tool. The best-performing algorithm is
judged by accuracy, mean sensitivity, and mean precision.
The model comparison concludes that J48 Decision Tree gives the best results,
followed by Random Forest and Naïve Bayes. The analysis found that anemia of
renal disease and chronic anemia were the most common types, with the largest
percentage of cases among women.
Keywords: Anemia, Data Mining, J48 Decision Tree, Naïve Bayes, Random Forest
DATA MINING TECHNIQUES FOR PREDICTIVE
CLASSIFICATION OF ANEMIA TYPES
ABSTRACT
By: Dita Amalia
Anemia can be prevented and treated according to its type, so that care matches
each patient's needs. It can also be predicted from the main factor used to
detect the disease, namely Complete Blood Count data. In addition, demographic
data on anemia patients can be analyzed to support further mitigation.
To address this problem, a data mining classification model was built following
the CRISP-DM framework, which consists of six phases. This study compared three
supervised learning algorithms, namely Naïve Bayes, J48 Decision Tree, and
Random Forest, using RapidMiner. The best-performing algorithm was identified
from its accuracy, mean recall (sensitivity), and mean precision.
The comparison showed that J48 Decision Tree performed best, followed by Random
Forest and Naïve Bayes. The analysis found that anemia of renal disease and
chronic anemia were the most common types, with the largest percentage of cases
among women.
Keywords: Anemia, Data Mining, J48 Decision Tree, Naïve Bayes, Random Forest
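As a minimal sketch of the three evaluation measures named in the abstract (accuracy, mean recall, and mean precision), the Python function below computes them from a multi-class confusion matrix. The function name `summarize`, the matrix orientation (rows are true classes, columns are predicted classes), the unweighted averaging across classes, and the example counts are all assumptions made for illustration; none of them is taken from the thesis or from RapidMiner output.

```python
from typing import Dict, List


def summarize(confusion: List[List[int]]) -> Dict[str, float]:
    """Return accuracy, mean recall, and mean precision.

    Assumed layout: confusion[i][j] counts samples whose true class is i
    and whose predicted class is j (rows = true, columns = predicted).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    recalls, precisions = [], []
    for k in range(n_classes):
        true_k = sum(confusion[k])                  # samples truly in class k
        pred_k = sum(row[k] for row in confusion)   # samples predicted as k
        recalls.append(confusion[k][k] / true_k if true_k else 0.0)
        precisions.append(confusion[k][k] / pred_k if pred_k else 0.0)

    return {
        "accuracy": correct / total if total else 0.0,
        # Unweighted (macro) averages: every class counts equally.
        "mean_recall": sum(recalls) / n_classes,
        "mean_precision": sum(precisions) / n_classes,
    }


# Hypothetical three-class counts, invented for illustration only.
example = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
metrics = summarize(example)
```

Averaging recall and precision per class gives each anemia type equal weight regardless of how many samples it has, which is one common reading of "mean recall" and "mean precision"; a sample-weighted average would be an equally valid alternative.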
PREFACE
Praise and gratitude to God Almighty for all His grace, by which the thesis
entitled “Data Mining Techniques for Predictive Classification of Anemia Types”
was completed on time. The author submits this thesis to the Undergraduate
(Strata 1) Program, Information Systems Study Program, Faculty of Engineering
and Informatics, Universitas Multimedia Nusantara. Having worked through the
obstacles and challenges encountered while preparing it, the author recognizes
that this thesis could not have been completed without help from many parties.
The author therefore wishes to express the deepest gratitude to:
1. Mr. Johan Setiawan, S.Kom., M.M., M.B.A., who provided guidance, thesis
writing guidelines, and advice to the author throughout the work on this
thesis,
2. Mr. Iwan Prasetiawan, S.Kom., M.M., who provided guidance and discussion
and consistently helped direct the author on the thesis material,
3. Mr. Yanwira, Mrs. Lilis, older sibling Nadira, younger siblings Adi and Zui,
and all family and relatives, who always prayed for and supported the author
in completing this thesis,
4. The author's dearest friends, who always gave encouragement, support,
advice, help, and prayers while this thesis was being prepared,
5. Everyone else involved in producing this thesis report who cannot be
named one by one.
The author has done the best possible in preparing this thesis, but recognizes
that shortcomings remain in its writing. Constructive criticism and suggestions
from all parties are therefore welcome. The author hopes that this thesis
offers useful insight and inspiration to its readers.
Tangerang, 7 June 2021
Dita Amalia
TABLE OF CONTENTS
DECLARATION
APPROVAL PAGE
ENDORSEMENT PAGE
ABSTRAK (INDONESIAN ABSTRACT)
ABSTRACT
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF FORMULAS
CHAPTER I INTRODUCTION
1.1. Background
1.2. Problem Statement
1.3. Scope and Limitations
1.4. Research Objectives
1.5. Research Benefits
CHAPTER II LITERATURE REVIEW
2.1. Anemia
2.2. Types of Anemia
2.2.1. Aplastic Anemia
2.2.2. Chronic Anemia
2.2.3. Iron Deficiency Anemia
2.2.4. Thalassemia
2.2.5. Anemia of Renal Disease
2.3. Data Mining
2.3.1. Data Mining Techniques
2.3.2. CRISP-DM
2.4. Naïve Bayes
2.5. J48 Decision Tree
2.6. Random Forest
2.7. Accuracy
2.8. Sensitivity
2.9. Precision
2.10. RapidMiner
2.11. Previous Research
CHAPTER III RESEARCH METHODOLOGY
3.1. Overview of the Research Object
3.2. Research Method
3.3. Data Collection Techniques
3.3.1. Data Collection
3.3.2. Research Variables
3.4. Data Analysis Techniques
3.4.1. Business Understanding
3.4.2. Data Understanding
3.4.3. Data Preparation
3.4.4. Data Modeling
3.4.5. Evaluation
3.4.6. Deployment
CHAPTER IV ANALYSIS AND RESEARCH RESULTS
4.1. Business Understanding Phase
4.2. Data Understanding Phase
4.3. Data Preparation Phase
4.3.1. Data Cleansing
4.3.2. Set Parameter
4.3.3. Split Data
4.4. Modeling Phase
4.4.1. Modeling Naïve Bayes
4.4.2. Modeling J48 Decision Tree
4.4.3. Modeling Random Forest
4.5. Evaluation Phase
4.6. Result and Discussion
CHAPTER V CONCLUSIONS AND RECOMMENDATIONS
5.1. Conclusions
5.2. Recommendations
REFERENCES
LIST OF APPENDICES
LIST OF TABLES
Table 2.1. Description of the CRISP-DM Phases
Table 2.2. Comparison of Previous Research
Table 3.1. Comparison of Data Mining Techniques
Table 3.2. Comparison of Data Mining Tools [25], [26]
Table 3.3. Comparison of Algorithm Usage
Table 3.4. Attributes of the NHANES Complete Blood Count Data
Table 4.1. Statistical Description of Attributes
Table 4.2. Confusion Matrix for Naïve Bayes
Table 4.3. Confusion Matrix for J48 Decision Tree
Table 4.4. Confusion Matrix for Random Forest
Table 4.5. Comparison of Algorithm Performance Results
Table 4.6. Comparison of This Study's Results with Previous Research
LIST OF FIGURES
Figure 2.1. Stages of the CRISP-DM Process [19]
Figure 3.1. CDC (Centers for Disease Control and Prevention) Logo [28]
Figure 3.2. CRISP-DM Research Flow [19]
Figure 3.3. Modeling Phase Flowchart
Figure 4.1. Three Categories of NHANES Data
Figure 4.2. Merge Data in TurboPrep
Figure 4.3. Stat/Transfer Interface
Figure 4.4. Retrieve Operator
Figure 4.5. Data Preparation Operators
Figure 4.6. Select Attributes
Figure 4.7. Visualization of the Data Distribution of Numeric Attributes
Figure 4.8. Visualization of the Data Distribution of Categorical Attributes
Figure 4.9. Attribute Missing Values
Figure 4.10. Filter Examples Operator
Figure 4.11. Generate Attribute Operator
Figure 4.12. Set Role Operator
Figure 4.13. Screenshot of Training Data and Testing Data
Figure 4.14. Modeling Process Operators
Figure 4.15. Cross-validation Process for Naïve Bayes
Figure 4.16. Model Operator Parameters for Naïve Bayes
Figure 4.17. Confidence Results for Naïve Bayes
Figure 4.18. Accuracy Performance Results for Naïve Bayes
Figure 4.19. Simple Attribute Distribution Chart for Naïve Bayes
Figure 4.20. Cross-validation Process for J48 Decision Tree
Figure 4.21. Model Operator Parameters for J48 Decision Tree
Figure 4.22. Confidence Results for J48 Decision Tree
Figure 4.23. Accuracy Performance Results for J48 Decision Tree
Figure 4.24. Tree Output of J48 Decision Tree
Figure 4.25. Tree Description of J48 Decision Tree
Figure 4.26. Cross-validation Process for Random Forest
Figure 4.27. Model Operator Parameters for Random Forest
Figure 4.28. Confidence Results for Random Forest
Figure 4.29. Accuracy Performance Results for Random Forest
Figure 4.30. Tree Output of Random Forest
Figure 4.31. Tree Description of Random Forest
Figure 4.32. ROC Curve for the Anemia Types
Figure 4.33. Visualization of Anemia Analysis by Gender
Figure 4.34. Visualization of Anemia Analysis by Age
Figure 4.35. Visualization of Anemia Analysis by Height and Weight
Figure 4.36. Visualization of Anemia Analysis by Race/Origin
Figure 4.37. Visualization of Anemia Analysis by Citizenship
LIST OF FORMULAS
Formula 2.1. Accuracy
Formula 2.2. Sensitivity
Formula 2.3. Precision