alzheimer's disease stage prediction using machine ...829245/fulltext01.pdf ·...

Master Thesis Computer Science September 2012

Alzheimer's Disease Stage Prediction using Machine Learning and Multi

Agent System

Ezedin Wangoria and Henok Wordoffa

School of Computing Blekinge Institute of Technology Campus GräsvikSE-371 79 Karlskrona Sweden

- 1 -

This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies.

Contact Information: Authors: Name: Ezedin Wangoria Address: Lindblomsvagen 96, 37233 Ronneby, Sweden E-mail: [email protected] Name: Henok Wordoffa Address: Lindblomsvagen 96, 37233 Ronneby, Sweden Email: [email protected]

University advisor: Prof. GuoHua Bai Email: [email protected]

School of Computing Website: : www.bth.se/com Blekinge Institute of Technology Phone: +46 455 38 50 00 SE-37179 Karlskrona Fax: +46 455 38 50 57 Sweden Email: [email protected]

mailto:[email protected]



http://www.bth.se/com


- 2 -

Abstract

Context : Alzheimer's disease is a memory impairment disease which mostly affects elderly people. Currently, about 4 million Americans and 5 million Europeans are affected by this disease. The occurrence of Alzheimer's disease is expected to quadruple by the year 2020. Alzheimer's disease cannot be cured or stopped its progression rather delay its progression. Early diagnosis of the disease helps the patients, the caregivers and health institutions to save time, cost and minimize patients suffering.

Objectives : In this thesis, different machine learning algorithms used for classification purpose are evaluated and various Alzheimer's disease diagnosis techniques are identified. Among these algorithms, a suitable classifier that has better classification accuracy on the National Alzheimer Coordinating Center (NACC) dataset is selected. This classifier is customized in order to make it compatible for the NACC dataset and to receive the new instance from the user. Then a multi agent system model is develop that can improve the classification accuracy.

Methods : Different research works are reviewed and experiments are conducted throughout this research work. A dataset for this research is obtained from National Alzheimer's Coordinating Center, university of Washington. Using this dataset, two experiments are conducted in WEKA. In the first experiment, the five candidate algorithms are compared to select the significant classifier for medical history and cognitive function data. For the second experiment, two datasets are used; a dataset contains Medical History (MH) with Cognitive Function (CF) data and a dataset that contains only medical history data to check in which dataset the selected classifier has better accuracy.

Results : From the first experiment, J48 classifier has a better stage prediction accuracy than the candidate algorithms with 61.12%. J48 is customized to classify a new instance received from the user and to improve the classification accuracy. Then the accuracy increase to 87.09% when the classifier's parameters are optimized. When the medical history and cognitive function data is experimented in WEKA separately, the classification accuracies of J48 on MH, CF and their combination datasets are 81.42%, 64.20% and 87.09% respectively. The agents simulation result showed that some misclassified instance by J48 algorithm can be corrected by multi agent system. The experimental results are presented in graphical format.

Conclusions : Hence we conclude that machine learning and agent system in combination can be used for Alzheimer's disease diagnosis and its stage prediction by extracting knowledge from a dataset which contains patients medical history and cognitive function data.

Keywords : Alzheimer's disease stage prediction, classification, multi agent system, machine learning

- 3 -

Acknowledgements

A heart full thanks to the University of Washington for giving us this big dataset for this thesis. We also want to thank Lilah Besser, research scientist of the National Alzheimer's Coordinating Center, for her support in the medical area and provide us useful materials for the dataset. Thanks for our advisor Professor Dr. GuoHua Bai for his feedbacks and comments. Finally, the experience of studying at the Blekinge Institute of Technology has been wonderful and for that we are thankful to the faculty and staffs at the institute.

- 4 -

ABSTRACT .......................................................................................................................................................... - 2 -

ACKNOWLEDGEMENTS ...................................................................................................................................... - 3 -

LIST OF TABLES .................................................................................................................................................. - 6 -

LIST OF FIGURES ................................................................................................................................................. - 7 -

CHAPTER 1 INTRODUCTION ............................................................................................................................... - 8 -

1.1 PROBLEM DEFINITION .................................................................................................................................... - 9 -

1.2 AIM AND OBJECTIVES ..................................................................................................................................... - 9 -

1.2.1 Aim ...................................................................................................................................................... - 9 -

1.2.2 Objective: ............................................................................................................................................ - 9 -

1.3 RESEARCH QUESTIONS .................................................................................................................................... - 9 -

1.4 RESEARCH APPROACH .................................................................................................................................. - 10 -

1.5 RESEARCH OUTCOME ................................................................................................................................... - 10 -

1.6 STRUCTURE OF THE THESIS ............................................................................................................................ - 10 -

CHAPTER 2 : BACKGROUND ............................................................................................................................. - 11 -

2.1 OVERVIEW ................................................................................................................................................. - 11 -

2.2 LITERATURE REVIEW..................................................................................................................................... - 15 -

CHAPTER 3 : RESEARCH METHODOLOGY ......................................................................................................... - 19 -

3.1 RESEARCH PROCESS ..................................................................................................................................... - 19 -

3.2 PROBLEM DEFINITION .................................................................................................................................. - 20 -

3.3 LITERATURE REVIEW..................................................................................................................................... - 20 -

3.4 RESEARCH QUESTIONS .................................................................................................................................. - 20 -

3.5 HYPOTHESIS ............................................................................................................................................... - 20 -

3.6 DATA PREPARATION .................................................................................................................................... - 21 -

3.7 DATA PREPARATION AND EXPERIMENTATION .................................................................................................... - 21 -

3.8 RESULT ANALYSIS AND DESCRIPTION ................................................................................................................ - 21 -

CHAPTER 4 : EXPERIMENTAL DESIGN ............................................................................................................... - 22 -

4.1 ALGORITHM EVALUATION AND SELECTION ........................................................................................................ - 22 -

4.2 DATA MINING TOOLS ................................................................................................................................... - 25 -

4.3 WEKA .................................................................................................................................................... - 25 -

4.4 DATA PREPARATION ..................................................................................................................................... - 26 -

4.5 DATA OVERVIEW ......................................................................................................................................... - 27 -

4.6 ATTRIBUTES EVALUATION AND SELECTION ........................................................................................................ - 27 -

4.6.1 Medical history ................................................................................................................................. - 27 -

4.6.2 Cognitive function ............................................................................................................................. - 28 -

4.7 DATA PREPROCESSING ................................................................................................................................. - 29 -

4.8 SIMULATION .............................................................................................................................................. - 31 -

CHAPTER 5 : EXPERIMENT RESULT AND ANALYSIS ........................................................................................... - 33 -

5.1 WEKA EXPERIMENTATION’S RESULT FOR MEDICAL HISTORY AND COGNITIVE FUNCTION DATA ..................................... - 33 -

5.2 WEKA EXPLORER INTERPRETATION ................................................................................................................ - 34 -

5.3 CUSTOMIZATION OF THE SELECTED ALGORITHM ................................................................................................. - 36 -

- 5 -

5.4 WEKA EXPERIMENTATION FOR MEDICAL HISTORY AND COGNITIVE FUNCTION DATA ................................................ - 38 -

5.5 Hypothesis ......................................................................................................................................... - 39 -

5.6 SIGNIFICANCE TEST ...................................................................................................................................... - 40 -

5.7 ACCURACY IMPROVEMENT USING AGENT SYSTEMS ............................................................................................. - 41 -

5.8 THE AGENT ARCHITECTURE ............................................................................................................................ - 45 -

5.8.1 Machine learning classifier (MLC) ..................................................................................................... - 46 -

5.8.2 Agent Classifier (AC) .......................................................................................................................... - 47 -

5.8.3 Efficiency evaluator (EE) ................................................................................................................... - 47 -

5.8.4 Classifier agents ................................................................................................................................ - 47 -

5.8.5 Classification history analyzer (CHA) ................................................................................................ - 48 -

5.8.6 Influential factors analyzer (IFA) ....................................................................................................... - 48 -

5.8.7 Online learner (OL) ............................................................................................................................ - 48 -

5.9 SIMULATION RESULT .................................................................................................................................... - 48 -

CHAPTER 6 : CONCLUSION AND FUTURE WORK ............................................................................................... - 52 -

REFERENCES ........................................................................................................................................................ - 53 -

APPENDIX A ............................................................................................................................................................ - 60 -

HOW TO USE THE SYSTEM .......................................................................................................................................... - 60 -

APPENDIX B ............................................................................................................................................................ - 62 -

- 6 -

List of Tables

Table 1 : Top 10 algorithms..................................................................................................................... - 23 -

Table 2 : Influential risk factors ............................................................................................................... - 29 -

Table 3 : Experiment data refinement proportion by class .................................................................... - 30 -

Table 4: MMSE result distribution in dataset by percent ....................................................................... - 31 -

Table 5: Sex distribution in dataset by percent ...................................................................................... - 31 -

- 7 -

List of Figures Figure 1 : Research work flow ................................................................................................................. - 19 -

Figure 2: Data preparation steps ............................................................................................................ - 27 -

Figure 3 : Comparison of 5 different algorithms ..................................................................................... - 33 -

Figure 4 : Algorithms comparison by classification accuracy percentage .............................................. - 34 -

Figure 5 : J48 classifier output ................................................................................................................ - 35 -

Figure 6 : Summary and confusion matrix of J48 classifier ..................................................................... - 37 -

Figure 7 : Comparison between original and modified J48 on classification accuracy .......................... - 37 -

Figure 8: A prototype for AD diagnosis ................................................................................................... - 38 -

Figure 9 : Classification accuracy of J48on MH, MMSE and combined datasets .................................... - 39 -

Figure 10 : T-test result of candidate algorithms .................................................................................... - 40 -

Figure 11 : T-test result of datasets ........................................................................................................ - 41 -

Figure 12 : Confusion matrix of customized J48 algorithm .................................................................... - 41 -

Figure 13 : classification simulation result J48 algorithm with instances that their age value is

continuously increasing .......................................................................................................................... - 43 -

Figure 14 : classification simulation result of J48 algorithm with instances that their MMSE value is

continuously increasing .......................................................................................................................... - 43 -

Figure 15 : AD stage progression assumption vs. the J48 algorithm classification result ...................... - 44 -

Figure 16: Multi agent architecture ........................................................................................................ - 46 -

Figure 17 : J48 classifier classification result with instances continuously decreasing in MMSE result . - 49 -

Figure 18 : pattern tracking agent classification result with instances continuously decreasing in MMSE

result ....................................................................................................................................................... - 50 -

Figure 19 : attribute analyzer agent classification result with instances continuously decreasing in MMSE

result ....................................................................................................................................................... - 50 -

Figure 20 : The prototype interface with input ...................................................................................... - 60 -

Figure 21 : The prototype interface with input and classification result ................................................ - 61 -

- 8 -

Chapter 1 Introduction In this modern era, since the overall life expectancy has increased, the rate of age related diseases occurrence has also increased. Alzheimer’s disease (AD) is an age related and common form of dementia which mostly affects elderly people [2]. AD is the permanent and progressive brain disease which slowly decrease memory, thinking, remembering, and reasoning skills [3]. There is a high possibility that Alzheimer disease increases progressively its' severity after the age of 65. Before AD becomes severe, it shows some warning signs like, poor judgment, changes in emotional behavior, difficulty to do familiar tasks, misplacing items, challenge in solving problems and inability to learn new things [2]. Some of risk factors associated with AD are smoking, hypertension, age, diabetes, obesity and others. During recent decades, the number of patients has dramatically growing specially on developed countries which have high life expectancy. Currently, around 5.4 million Americans and 5 million Europeans are affected by AD. The occurrence of AD is expected to quadruple by the year 2020[1] and estimated that, in 2050 one individual will develop the disease in every 30 seconds [2]. AD is being diagnosed using different techniques such as examination of the medical history, physical examination, laboratory test, neuropsychological or cognitive function testing and brain imaging scan. The diagnosis require specialist doctors such as Psychiatrists, Neurologists and Psychologists [3]. Several kinds of diagnosis may involve and it will likely take more than a day to diagnose AD. The patients have to visit hospitals periodically to make periodic checkups. Currently, AD is also diagnosed using machine learning and agent system [1]. Machine learning and agent system have made significant advances in the field of weather forecasting, robotics, search engines, natural language processing, speech recognition, medical diagnosis, and handwriting recognition. Machine learning, which is a core of Artificial Intelligence[3] [4], is rapidly growing new technology[5] which used to design and develop classifiers that allow computers to “learn” [6]. This technology intends to solve problems of inference and prediction based on the available data and these data are helpful for decision making for human or intelligent computer system like an agent System. An agent system is "a

computer system, situated in some environment, that is capable of flexible autonomous action in

order to meet its design objectives”[7]. An Important aspect of agents is their ability to offer intelligence with interaction [8]. Intelligent agents have reactive, proactive and sociability properties. Intelligent agents have reactive, proactive and sociability properties. Multi agent system is a system which consists multiple agents which are working together either cooperatively or competitively to achieve their design objective.

Various AD stage prediction has been proposed by many researchers using machine learning on different diagnosis techniques. These researches achieved their goal but some of them were working on AD using only diagnose with brain image scanning and the others require special devices that needs the healthcare center visit [1][3].

- 9 -

1.1 Problem definition European e-health study shows that in each year, there are around 2.4 Million unnecessary healthcare center visits by patients. Elderly people, including Alzheimer disease patients, are among those who are unnecessarily visiting the hospitals. AD patients can diagnosed using different techniques but physical examination, laboratory test, and brain imaging scan require the physical appearance of the patient to the medical center. This results a large number of AD patients that visit the healthcare institutions. The increase in the number of visitors create a workload on professionals, high number of patients’ queues and extra cost in terms of time and

money on both the patients and health institutions. On each visit, the healthcare centers register and stored massive patients' data for future follow up. Usually these massive data are used only when it is necessary to refer medical history of patients. Machine learning creates another possibility to diagnosis AD patients using the massive stored cognitive function and medical history data. In this research, machine learning and agent system will be combined with cognitive function and medical history data to diagnose patients based the existing 47,000 AD patients' data.

1.2 Aim and objectives

1.2.1 Aim

The aim of this research is to investigate a machine learning algorithms for Alzheimer disease stage prediction and integrate the algorithm with multi agent system for accuracy improvement.

1.2.2 Objective:

Identify and evaluate the existing AD diagnosis techniques Evaluate different machine leaning algorithm for classification purpose Identify a suitable machine learning classification algorithm to classify patient’s AD

stages using NACC dataset Customize and implement the selected machine learning algorithm. Develop multi agent system model to improve the classification accuracy

1.3 Research questions

Which machine learning algorithm is significantly better to classify NACC's medical health and cognitive function data for AD Diagnosis?

Does the combination or separation of medical history and cognitive function data affect the classification accuracy?

How to improve the classification accuracy using multi agent system?

- 10 -

1.4 Research approach In the previous section, there are three proposed research questions that need to be addressed. Depending on the nature of the questions, a combination of different research methods will be used. Literature review and experiments will be conducted in order to select a better machine learning algorithm to classify AD patient using the NACC's dataset. The literature review also will be done to identify different AD diagnosis techniques and AD risk factors. In the literature peer-reviewed articles and journals will be reviewed.

The dataset for this research is obtained from National Alzheimer's Coordinating Center (NACC), University of Washington. This dataset has more than 47,000 patients' full information. Based on the disease risk factors identified in literature review, relevant information will be extracted from the dataset. The efficiency of data mining significantly decline if the data contains noise, missing values, incomplete and inconsistent data. For this reason the data preparation will be done before the experiment is conducted. Then the significantly better algorithm will be identified by literature review and/or experiment. The final result of this process will enable to determine the severity level of the patient. The detailed research methodology is discussed in chapter 3.

1.5 Research outcome

The expected result from this research is:

A knowledge extracted from the NACC dataset to diagnose AD patients A suitable machine learning classification algorithm for NACC dataset A multi agent system model used for a better prediction accuracy

1.6 Structure of the thesis This thesis work is organized as follows: The first chapter gives introduction about AD, define the problem, raise the research questions, state the aim and objective of the research. The next chapter provides a detailed background on the existing different AD diagnosis techniques, machine learning and agent system. Related research works are also reviewed in literature review subsection of this chapter. In chapter 3, the research methodology used in this research is explained using diagram and with short description. In the next chapter, Experimental design, machine learning algorithms and tools are evaluated and selected. Alzheimer's disease risk factors are also selected from the dataset in this chapter. It followed by chapter 5 which is experimental result and analysis. In this chapter, the result from WEKA is presented and analyzed, the result is confirmed using Friedman's statistical testing and accuracy improvements using multi agent system are provided. Finally, the conclusion of this thesis work is presented and a future work is suggested in chapter 6.

- 11 -

Chapter 2 : Background

2.1 Overview In developed countries which have high life expectancy, the incidence of age related diseases rise exponentially with the increasing of age. AD is slowly progressive common form of dementia disorder which primarily affects the elderly population. This disease first affects the parts of the brain that control thought, memory and language. Elder people with AD may have trouble in remembering things which happened recently, names of people they knew, performing regular task and many more. Currently, about 4 million Americans and 5 million Europeans are affected by AD. The occurrence of AD is expected to quadruple by the year 2020[1] and it is estimated that, in 2050 one individual will develop the disease in every 30 seconds [2] . In America, AD is the one of the leading cause of death for elders around aged 65 years. Until these days, there is no cure for this disease. Many researchers are using various kinds of diagnosis and treatments to delay the stage progression. Even though there's no specific test that confirms presence of AD, early and accurate diagnosis has a major impact on the development of AD stage change. In order to distinguish AD from other causes of memory loss, doctors typically rely on either one of or the combination of historical data, physical exam, cognitive testing, laboratory studies and brain imaging [9]. A person's medical history or historical data is one part of the assessment which is done by taking the person's AD related risk factors information including family history of AD, smoking, alcohol, diabetes, hypertension, heart disease, obesity (BMI) and gender [1][10][11]. Physical examination is a source of assurance that the patient is as healthy as expected. During this examination, the physician check the blood pressure, temperature, pulse, lungs, heart beat and collect blood or urine samples for laboratory testing of the patient. The purpose of cognitive testing or neuropsychological testing is to investigate how the respondent understands the questions and provide accurate answer. The well-known among these testing techniques are MMSE (Mini Mental State Examination) and FAQ (Functional Activity Questionnaire). AD imaging is a brain diagnosis method to identify the person abnormalities. There are different kinds of brain imaging such as Magnetic resonance imaging (MRI), functional Magnetic resonance imaging (fMRI), Positron emission tomography (PET) and Single-photon emission computed tomography (SPECT). These diagnosis techniques help to classify the person as healthy or AD patient. The severity level of AD might differ between patients and it can be broadly divided into 5 stages: “No”, “Questionable”, “Mild”, “moderate” and “severe”. On this information era, dealing with the available huge amount of raw information is become one of the main problems. In order to process these mass of information and convert it into knowledge that people can use it, a potential data analysis technique, like machine learning, is necessary. Machine learning (ML), which is a core of Artificial Intelligence[3] [4], is rapidly growing new technology[5] which used to design and develop classifiers that allow computers to

- 12 -

“learn” [6]. This technology enables a computer to analyze different size of data and learn which information is the most relevant from the specific dataset. Machine learning has made significant advances in the field of weather forecasting, robotics, search engines, natural language processing, speech recognition, medical diagnosis, and handwriting recognition. ML aims to solve prediction and classification problems based on already existing data by learning the pattern. In ML there are four different ways of representing the structure: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning[6]. Among these, supervised and unsupervised learning are prominently used [12]. The main difference between these prominently used learning techniques is the availability of labeled examples or classified instances. Unlike in supervised learning, in unsupervised learning labeled examples don't be included [6]. Supervised learning, also called classification [11], aims to classify the new input efficiently and accurately. The classifier learns from a set of training data and corresponding classes in order to predict unlabeled instance[13][14]. Many researchers have been studying and using the most popular classification algorithms such as AdaBoost, C4.5, kNNs, decision trees, Naïve Bayes, neural networks and support vector machines (SVM) [15][16][17]. Supervised learning is applicable in weather forecasting, predicting risk factors of disease in medical field, classification of network packet in telecom and different other areas. Unsupervised learning, also called clustering [18][14], takes a set of objects as an input and discovers a pattern if some of the sequences share the same object type. There are different unsupervised learning algorithms such as k-means, agglomerative clustering and Gaussian mixture model. The good clustering will collect similar objects together from the given environment near the leaf levels of the hierarchy and defer merging dissimilar subsets until near the root of the hierarchy[13]. Since the space of possible patterns that needs to be discovered is much larger, it makes unsupervised learning harder than supervised learning[19].

Agent system is a computer system which is capable of making autonomous action on its environment aiming to achieve its design objective. Agents perceive their environment using sensors to identify what is happening in the environment. Based on the information acquired, the agents select and execute the appropriate action on the environment using their actuators. Reactivity, proactiveness and sociability are important behaviors of intelligent agents. Beside these behaviors, intelligent agents might be built having a learning behavior as an additional feature. Agent technology is a combined results of computer science technologies such as artificial intelligence, object oriented programming and others. An Important aspect of agents is their ability to offer intelligence with interaction [8].

Multi-agent system (MAS) is a system which consists multiple interacting agents that working together cooperatively or competitively to achieve their design objectives. Usually, multi agent

- 13 -

system is used to solve problems which are difficult to be solved using single agent. In other words, MAS is an appropriate choice when the expertise and/or the knowledge required for solving the problems are distributed. The interaction between agents is not simply exchanging data; rather it includes cooperation, coordination and negotiation between agents as we do in our life.

Agent systems have been used and suggested by researchers to solve different problems raised across different spectrum of life. Some of them are: to handle emergency cases at hospitals [20], to assist patients and healthcare providers [21][22], to enhance conversation and text processing [23] [24], for data mining in online social networks [25][26], for problem and solution modeling and for prediction [27][28][29], and for solving different problems in robotics[30], manufacturing[31],business and economics[32][33].

Agent architecture is a blueprint of an autonomous computer system (Agent). It specifies how the agents will be build. Its purpose is to show what component modules exist in the agent, how these modules works and how they interact to each other. The components and their interactions designated by boxes and arrows which indicate the data and control flows in the module. There are three types of agent architectures: symbolic/logical, reactive and hybrid architectures.

Deductive reasoning agent is a reasoning agent in which the agent behavior and its environment symbolically represented. The agent selects and executes the best actions by deducing these symbolic representations. Difficulty of translating the real world to symbolic description and complexity of symbolically representing real world entities and processes are the two key problems that should be solved in order to build deductive reasoning agent. Another reasoning agent type is practical reasoning agent. It is based on the way how the human beings reason. Deliberation and means-ends reasoning are the fundamental activities in human practical reasoning. Deliberating is the process of deciding what state of affairs wants to achieve and means end reasoning is how to achieve that state of affairs. BDI and PRS architectures are found in this category of agent architecture. The problem of practical reasoning approach is deliberation and means end reasoning have cost in terms of time.

Reactive agents perceive their environment and automatically react for changes occurred on their environment without reasoning about them. Reactive agents made a decision based on the present changes regardless of what happened in past. Simplicity, robustness against failure and computational traceability can be mentioned among the advantages of reactive agents. The fundamental problems of reactive agents are: it is harder to engineer the agent when its number of behavior increases, designing of learning agent is difficult using this approach and others. These show that neither completely reactive nor completely deliberative way of agent building is not enough solution. Due to this, many researchers suggest using hybrid approach which combines both deliberative and reactive agent behaviors. Hybrid agents have hierarchy of agent component layers. Turing machines and InteRRaP architecture follow this approach.

- 14 -

Turing machine is horizontal layered hybrid agent architecture. It has three layers: reactive layer, planning layer and modeling layer. The reactive layer gives immediate response for the changes occurred in the environment. The planning layer has contains the agent proactive behavior. The modeling layer represents world entities. The embedded control subsystem decides which layer to have a control on the agent.

InteRRaP is hybrid two-pass vertically layered agent architecture. It has three layers. The upper layer is responsible for social interaction of the agent, the middle layer is responsible for local planning and the lower layer is responsible for immediate reaction (reactivity). Each layer has associated knowledge base which contains appropriate world representation for the layer.

Belief-Desire-Intention (BDI) architecture is prominent agent design architecture. It is based on the analogy of human mental stances: Belief, Desires and Intention. The agent’s behavior

abstracted in terms of these three mental stances [34]. In this architecture, the interpreter plays a fundamental role. The interpreter updates the belief of the agent constantly based on the information collected from the environment. Based on the agent belief, an appropriate desire and plan to be executed will be selected then the selected plan will be executed to achieve the intention of the agent [35][29].

Procedural Reasoning System (PRS) is an agent architecture which consists explicitly represented mental stances those exist in BDI architecture. PRS architecture is the best established and durable implementation of BDI paradigm [36]. This architecture has been used for the implementation of major industrial multi-agent applications such as air-traffic control system at Sydney airport OASIS, SPOC business process management system and others.

Subsumption architecture was invented by Rodney Brooks to address the shortcomings of the above mentioned ways of agent constructions. This architecture doesn’t require a representation

of world views in the memory [37]. In this architecture, the agent dynamically reacts to the events which occurred on its surrounding, accordingly. This architecture is extremely effective in a situation where there is a limited memory capacity [38]. The agent behavior organized in the form of hierarchical layers. One advantage of this architecture is that the behavior of the agent can be easily changed by adding another behavior layer on the top layers[39].

- 15 -

2.2 Literature review Several researches have made study on different AD diagnosis data which is retrieved from MRI, fMRI, MMSE, PET, SPECT, demographic details and behavioral data. In [40], aimed to classify a person as AD patient or healthy using random forest algorithm. In order to meet their objective, the authors used a supervised learning that consists of five different steps. In the first step, the fMRI data are preprocessed to remove noise, missing values and errors from the dataset. In the next step, modeling of fMRI data is performed. Features, which are AD and healthy subjects, are extracted in the third step and then a subset of them is selected in the fourth step. Finally, a supervised random forests classification algorithm is applied on the data which consists of 41 subjects. The proposed method was evaluated using two different datasets. The first dataset consist healthy young, healthy old and demented subjects. The second dataset only consist healthy old and demented subjects. Another research [41] has been studied by the same authors on similar fMRI data in five steps but the random forest classification algorithm has been modified using majority and weighted voting schemes. In recent work[42] of these researchers similar method, stage and dataset with their 41 subjects had used aiming to provide a supervised method to assist the diagnosis and monitoring the progression of AD using information which extracted from fMRI experiment. Some researchers focus on retrieving different data from more than one diagnosis data for several reasons such as better classification and prediction accuracy. Sandhya Joshi et.al. [9] combined MMSE and FAQ tests in conjunction with four machine learning systems on a sample of 860 patients and control groups for their two goals i.e. to find an accuracy which save time and money in diagnosis of dementia and improving the sensitivity and specificity. Their model of classification of dementia states has four steps: data collection, data preprocessing, feature selection and classification. The objective of this model is to classify the various stages of dementia by applying a selected machine learning methods: C4.5, Naive Bayes and random forest classification. In addition to machine learning, neural network methods like multilayer perceptron was also applied for the evaluation of different stages of dementia. In their proposed system, the machine learning algorithms trained and tested on a sample of both MMSE and FAQ dataset separately. The algorithms learn a description of four dementia MMSE classes (severe, mild cognitive, moderate AD and normal) as well as three dementia FAQ classes (severe, mild cognitive AD and normal) from the training sample of 860 subjects of each MMSE and FAQ. These different classes depicts by numbers known as Clinical Dementia Rating (CDR) score. This score is used to determine the presence or absence of dementia and also to find out patient's dementia severity stage. The CDR score of 0 indicates "No Cognitive Impairment" and CDR score of, 0.5, 1, 2 and 3 shows "Questionable", "Mild", "Moderate" and "Severe dementia" respectively.

- 16 -

The author of[43] proposes an automated segmentation methods on MR brain volumes and classifier models for patient’s CDR score prediction. Their aim was predicting the CDR using Bayesian classifier on 371 subjects of MRI data. The classification was computed by the probability of each class given the particular instance composed by the attributes and then selecting the class label with the highest probability. The authors finally conclude that their idea is promising for diverse clinical information on computer-aided diagnosis application to accurately identify Alzheimer’s disease. A little more MRI data, that consists a cross-sectional collection of 218 subjects aged 18 to 59 years and 198 subjects aged 60 to 96 years, was used in[44]. The authors proposed an automated novel method to classify of a person as Alzheimer patient or healthy using MRI-scan data. This method combines independent component analysis and voxel of interest for classification with five steps. These steps are preprocessing of MRI data, segmentation of gray matter of the brain, decomposition using independent component analysis, extraction of voxel of interest, and classification by a support vector machine. After their experiment, the results shows that the proposed method is able to provide better classification results than other related works that presented in their paper. Having difficulties in organizing and planning things, exhibiting memory losses, forgetting names and words, making poor judgment, easily losing direction and having difficulties on speech are problems most of the time faced by AD patients on their early and moderate stages. Many researchers and computer scientists have proposed different computing methods and techniques to assist AD patients to enjoy their normal life or at least partial of it by persisting these problems.

Reminding patients for their medication or other activities and giving patients a hint to trigger their memory to remember forgotten things are form of assistances commonly found on systems proposed by computer scientists to assist AD patients[45][46][47]. In [47] the authors suggested using of Ambient Intelligence (AmI) to support AD patients. In this paper the authors proposed a mobile application which gives a memory trigger for AD patients to remind them sequence of steps to prepare simple day to day activities. Furthermore in this system, significant deviation from their normal daily activities will be notified for the caregivers. The proposed system consists of three components: patient terminal, caregivers’ station and server station. The patient

terminal, which is a multi agent system, is responsible to collect and send patient’s context information to the web server and to assist the patient based on the acquired information. On the web server, the data will be recorded, analyzed and processed. The caregiver gets information about the patients directly from the patient terminal or the server accordingly. Generally in the proposed system, the patient gets memory trigger assistance and get reminder for the task to be done. On the other hand, the caregivers benefited from the system by getting remote communication service with patients, up-to-date information about the patients whenever they required and alerting message when the patient is in dangerous situation.

- 17 -

The same ambient intelligence also used in[48] to assist AD patient, but in this research, it was used to assist the patients on another perspective i.e. predicting hazardous situations and giving physical and cognitive support for the patient. The authors developed and tested an intelligent system prototype to support elderly and AD patients. Multiple agents, mobile phone devices, Wi-Fi technologies and radio frequency identification (RFID) were integrated in the environment. The system had four kind of interacting agents: patient agents, doctor agents, nurse agents and manager agent. The agents’ built based on the concept of Belief, Desire and Intention (BDI) and special case-based reasoning. The agents have learning behavior and capable of working on dynamic environment. Each agent in the system interacts with each other to achieve its goal on the bases of its belief. Similarly, in [49] the risks associated with forgetfulness, which causes problem on AD patient, was identified and analyzed. The authors propose a multi agent system which uses different sensors to sense the patient environment to collect information and to detect if some risky situation is happening around the patient. The agents divide the risk situation into three level categories: low, medium and high. The system simulated using Java Agent Development (JADE) and tested on different risky scenarios. Based on the result, the system is capable of successfully evaluating risky situation.

Caregivers sometimes ought to control over daily life activities of their AD patients. If the caregivers able monitoring their patient remotely, it will alleviate much of the burdens from the shoulder of the caregivers. In [46], using intelligent multi agent system along with other techniques such as knowledge-based system, neural network, and chaotic analysis proposed for remote monitoring and supporting of people with dementia at their home. The authors proposed this system for remote monitoring of patient situations such as electrocardiogram (ECG) irregularities and congestive heart failure. Using telemedicine technologies, suggested by the authors for both remote observation and remote intervention of caregivers towards the patient.

In addition to indoor monitoring, peoples with dementia need also outdoor monitoring, tracking, navigation help to predefined location and sometimes they need emergency rescue service on the time of accident. In [50], a novel application is proposed for outdoor monitoring and assistance of people with dementia. The proposed system is a mobile application where the data about the patient and around their environment collected by sensors such as global positioning system (GPS), gyroscope, and accelerometer sensors. The collected data will be processed and the system will make informed decision when the patient is in hazardous situation. Moreover, the lost patient location can easily be identified using the proposed system. An intelligent algorithm to navigate the patient towards predefined locations such as health centers, shopping moles and others, planned to be included in the system. In [51], an integration of RFID, GPS and global system for mobile communications (GSM) used to implement an application having four monitoring scheme to assist patients: indoor patient monitoring, outdoor activity monitoring, emergency rescue and remote monitoring. The system is accessible through mobile phone, PDAs and PCs. When the patient push and release a button on the patient locater device, the caregivers get information about the activities of the patients. When the patients find themselves in a

- 18 -

position of emergency help needed, pressing emergency button on the patient locater device will be enough to request help; automatically the patient profile will be retrieved, the geographical position will be identified and handover to the concerned parties. In [45], the authors presented a user centered designing approach to develop cognitive prosthetic device which enhance AD patients’ safety feeling and to alleviate social problems they are facing. From the workshop and conducted interview which carried out in Amsterdam, Lulea and Belfast on patients and caregivers by psychologists, the authors identified that maintaining social contacts, remembering, performing daily life activities and enhancing patients’ safety feelings are main areas of cognitive support needed by the patients. The authors implemented a prototype of a system which incorporates these features. Patients’ memory impairment increases progressively through time [52]. Currently, it is impossible to cure memory impairment caused by dementia rather than delaying its progression [53]. Delaying the impairment could be one sort of assistance which could be achieved using computation techniques and technologies. In [54], the authors proposed a novel system which incorporate portable mini stationary bike to enable the patient to make physical exercise on it. The system also provides multiple choice question game in visual and interactive fashion. The game with combination of the physical exercise aimed to enhance some areas of patient’s brain

capacity such as patents’ memorizing capacity, judgment skill, recollection, matching capability and problem solving ability. While the patient driving the mini stationary bike, the system gives them multi choice question to answer it. In order to be successful in the game, the patient has to answer 30 questions within the time the animation on the screen moves from starting point to the end point. By doing so, the patient simultaneously carried out physical and cognitive exercises enjoyably, which possibly delays patient’s mental degradation. The aim of this research is to investigate a machine learning algorithms for Alzheimer disease stage prediction and integrate the algorithm with multi agent system for accuracy improvement. The diagnosis of AD patients carries out by their medical history and cognitive function data using machine learning and multi agent systems unlike any other research.

- 19 -

Chapter 3 : Research methodology Research is a systematic process used to discover a solution for a specific problem in a formal, structured and organized manner. The theoretical perspective of this process is called methodology. Methodology is "the systematic study of methods that are, can be, or have been

applied within a discipline" [55]. In this study, both quantitative and qualitative approaches are used. Qualitative approach has been used to search and understand what others done, to evaluate various machine learning algorithms and different AD diagnosis techniques. Quantitative research is used in order to compare the evaluated algorithms and select a significantly better algorithm. The algorithm selection is made through data collection, feature selection, data preparation and experiment in machine learning tool.

3.1 Research process

The research process of this thesis work has different steps; in the first step, related works which helps to formulate the research design are explored. Influential risk factors of the disease and a different machine learning classifiers are evaluated. Data preparation on the dataset has been done which is necessary for the experiment. On the next phase, the well known machine learning tool called WEKA[56] is used to compare the evaluated algorithms in order to select the algorithm with a better classification accuracy. On the last phase, MAS model is developed to improve the classification accuracy further and its effectiveness checked using simulation. The overall research methodology plan is shown in Figure 1.

Figure 1 : Research work flow

test against hypothesis

Data collection

and overview

attribute evaluation and selection

Preprocessing

Literature review

. Research Question

. Formulate Hypothesis

Problem Definition

Data Experimentation Experimental

result and analysis

Data preparation

- 20 -

3.2 Problem definition The increase in a number of visitors in the hospital and medical centers have created a workload on professionals, high number of patients’ queues and extra cost in terms of time and money on both the patients and health institutions. There is also a massive stored patients’ data at the

hospitals and healthcare centers that registered and stored for patients’ future follow up and treatment. Usually the data is only used when it is necessary to refer a specific patient's medical history. The collected data at university of Washington can be a good example and there is a need for usage of this NACC dataset to predict AD stage progression, minimize workload, and unnecessary visits.

3.3 Literature review In the literature review various research papers published on the defined problem area or related topics are revised. A number of medical and computing research papers are reviewed to formulate the research questions, to identify the existing AD diagnosis techniques and risk factors, to evaluate machine learning algorithms and tools.

3.4 Research questions The following three research questions are raised to fill the gap identified in the literature review.

Which machine learning algorithm is significantly better to classify NACC's medical health and cognitive function data for AD Diagnosis?

Does the combination or separation of medical history and cognitive function data affect the classification accuracy?

How to improve the classification accuracy using agent system? In order to answer these research questions literature review and experiments are conducted.

3.5 Hypothesis

Null Hypothesis H0_1: There is no significance difference between all candidate algorithms. Alternate Hypothesis H1_1: There is a significantly better algorithm in the classification of NACC dataset. Dependent Variable: accuracy Independent Variables: NACC dataset, candidate algorithms, and experimental environment Null Hypothesis H0_2: The separated existing diagnosis techniques , medical history or cognitive function, data are not significantly different in prediction accuracy. Alternate Hypothesis H1_2: The combined diagnosis techniques, medical history and cognitive function, data gives a better prediction accuracy. Dependent Variable: accuracy Independent Variables: NACC dataset, candidate algorithms, and experimental environment.

- 21 -

3.6 Data Preparation

For this study, the data for the experiment will be prepared through three stages: data collection and overview, attribute evaluation and selection, and data preprocessing. The data was collected from May, 2005 to August 2011 by National Alzheimer's Coordinating Center (NACC), University of Washington. The dataset is in CSV (Comma Separate Value) format. This dataset consists more than 47,000 instances and 396 attributes including noise, error and missing values which significantly decline the efficiency of data mining. For this reason, data preprocessing will be used to check for missing value, noise and inconsistent data.

3.7 Data preparation and Experimentation The collected dataset is preprocessed using MySQL workbench and MS excel VBA to clean the noise, error values and missing data which affects the classification quality. Then two different experiments are conducted using 10,000 refined instances of patients' data. The first experiment is conducted to select significantly better algorithm among the candidate algorithms. The second experiment is conducted using the same NACC dataset but having only the cognitive function data. The simulation is used to address the last research question.

3.8 Result analysis and description

The experimental and simulation results are presented in graphical and tabular format, interpreted and discussed.

- 22 -

Chapter 4 : Experimental Design

Nowadays, the prevalent availability of data storage devices with cheaper price increases the tendency of keeping large amount of data on storage devices for a long time. Usually organizations have large accumulation of data on their storage devices. Companies keep their customers day to day transaction; governmental institutions keep governmental information; health organizations keep patients’ information for future use. These data utilized by the owners

on different manner. Different algorithms and software programming instructions have been developed to enable easy retrieval and use of data. Usually, algorithms have intended for retrieval and presentation of actual recorded information. Retrieving and using the actual data is one aspect of data utilization. But by using data mining techniques, it is possible to capture the pattern from the stored data to generate further information beyond the actual recorded information.

4.1 Algorithm evaluation and selection Data Mining is the process of extracting meaningful patterns from data [57]. There are different algorithms developed for data mining such as visualization, machine learning and statistics. However, a data mining algorithm based on machine learning technique produces a pattern which is easily understandable [58]. “Machine learning is the study of computational methods

for improving performance by mechanizing the acquisition of knowledge from experience”[59]. For different data mining tasks, several algorithms have been developed. Currently, there are several data mining algorithms used in different scenario in data mining such as classification, clustering, association and others. In [60], the IEEE International Conference on Data Mining (ICDM) has identified the top 10 most influential and wildly used data mining algorithms. The selection process had three evaluation phases. In the first phase, the researchers asked to nominate most influential and widely used machine learning algorithm and give their justification along with its representative publication reference. The second phase was to remove those nominations that don’t have at least 50 citations. The final phase was researchers voting on those algorithms successfully pass through the first and second phase. Passing through all these phases, top 10 most influential and widely used algorithms have identified and shown in Table 1.

Rank Algorithm name Category

1 C4.5 Classification 2 k-means Clustering 3 SVM Classification 4 Apriori Association analysis 5 EM Statistical learning 6 PageRank Link mining

- 23 -

7 Adaboost Ensemble learning 8 kNN Classification 9 Naïve Bayes Classification 10 CART Classification Table 1 : Top 10 algorithms

As indicated in the Table 1, five of them used for classification and the rest algorithms used for statistical learning, clustering, association analysis, ensemble learning and link mining task. Short descriptions about each of these algorithms have given below.

C4.5

Among decision tree algorithms, ID3 and C4.5 algorithms are the most influential decision tree algorithms [61]. C4.5 is a descendant of ID3. It is simple, effective and commonly used algorithm for data mining purpose [62][60]. It is a classification algorithm which uses divide and conquer strategy to produce a decision tree for classification. This algorithm can provide high classification accuracy and it works better for classification of instances that their attributes have small number of possible values and objects which have not conflicting attributes [62].

k-means

k-means algorithm is one among the simplest unsupervised learning algorithms. It is an iterative clustering algorithm which classifies the given instances in to user defined number of clusters [60]. The number of partition is fixed and specified in advance. SVM

SVM is a statistical set of supervised learning algorithms. In SVM the candidate instance will be classified in to either of two possible classes. The classification is done by dividing examples into two classes which are separated by a clear gap between them. The gap between the categories is known as the optimal separating hyper plane. SVM basically used for two class instance classification but it is also used for multiclass problems [63]. Apriori

Identifying associations between data is difficult and sometimes it is a complicated task. Apriori is an algorithm which is mostly used for mining of association rule from the data sets. Association rule mining is a method to find out the association knowledge hided in the dataset [64]. Its’ approach is, first it identifies frequent item sets from the instances and then it generates

the association rule.

EM

Expectation–Maximization (EM) algorithm is an algorithm which is mostly used to solve data clustering problems. It works by finding iteratively likelihoods between instances in the dataset. EM is a commonly used and more appropriate choice when there is a missing values or the data

- 24 -

is not complete because of the limitations on observation process or on a situation when there is a missing of parameters [65].

PageRank

PageRank is an algorithm which is used for measuring relative importance of documents in the World Wide Web. Originally the aim of the algorithm was to analyze the relative importance of a research article based on the number articles refers it. After a while, this aim developed and changed from measuring the importance of documents to measuring of the importance of web documents or web page. In this algorithm, PageRank value of web document is depend on the number of links pointed to it [66]. Google search engine has been built using this algorithm [60].

AdaBoost.

Boosting is supervised ensemble learning classifying algorithm[67], which is a combination of weak learners, to create a single strong learning algorithm. It combines the output from different modules to classify the input instance. AdaBoost is widely used ensemble classifier. Due to the diversity of weak classifier algorithms it comprises, it performs well [67]. Moreover, many research studies indicate that Adaboost is comparatively less exposed for over fitting problems.

KNN (k-nearest neighbor algorithm)

KNN is a classification algorithm which classify incoming instance is classified based on the majority vote of its neighbors. It works by memorizing entire training dataset and calculating the distance between the given instance and the samples from the training set. It is the most popular algorithm which used for long time for pattern recognition, exploratory data analysis and data mining problems [68]. But KNN is not able to classify instances that their attributes don’t

exactly match with any of the training dataset instances [60].

Naïve Bayes

Naïve Bayes algorithm is supervised learning classifier algorithm. It works by calculating the probabilistic likelihood of the given instance with the reference of the training dataset. It assumes the attributes of the datasets are independent to each other [69]. It is easy, understandable and performs classification very well [60].

CART

Classification and regression trees (CART) is a classification machine learning algorithm which employees decision tree technique. It generates either regression tree or classification tree based on dependant variable. CART applies recursive partitioning mechanism in order to build the tree. The generated tree will be used in processing of new instance to classify it [70].

- 25 -

Among the top 10 algorithms, C4.5, kNN, Naïve Bayes, SVM, CART and AdaBoost are used for classification in data mining process. Classification algorithms are supervised learning algorithms used for class prediction of a given instances after analyzing the pattern of attribute values and their class in the training dataset. In many researches, classification algorithms used to assist different aspect of health science like patient diagnosis, treatment and assistance[71][72][73]. As indicated in[74], AD patients' mental impairment level can be classified into number of stages (classes). These stages indicate the severity level of the disease on the patient. On the other hand, the data acquired from National Alzheimer’s Coordinating

Center University of Washington contains large number of instances contains real data of AD patients’ diagnosis and their result. The above mentioned algorithms can be a candidate among those top 10 machine learning algorithms mentioned in [60]. Nevertheless, since CART has low computation efficiency of extraction rules and long rule length and unstable decision tree excluded from the candidacy [75]. To identify the best algorithm among the candidates with regards to its suitability with NACC dataset, further comparison using data mining tool is required.

4.2 Data mining tools

Data Mining is the process of automatic or semi-automatic extraction of useful unknown information and patterns from real world different sized and complex data[76]. There are 12 popular open source data mining software and application used to perform different data mining tasks such as classification and clustering. These systems, such as ADAM, TANAGRA, WEKA, KNIME, AlphaMiner, Databionic ESOM, Gnome Data Miner, Mining Mart, MLC++, Orange, Rattle and YALE, are developed in either C++, Java, Python, R or C++ and Python programming languages. Except TANAGRA and MLC++, all the other systems can operate on Linux, Mac and Windows platform. The frequency and time of latest updates are low and medium in Gnome Data Miner and ADAM respectively but the rest updates highly. These data mining software are released under GPL (General Public License) except that of KNIME, Mining Mart, TANAGRA, MLC++ and ADAM. Based on their activity, license, language and platform as shown above, most of the systems have similar characteristics. In real world, most of the data sources have different formats. In the view of data source and usability, AlphaMiner, Rattle, WEKA and YALE have more ability to access different application data format with better human interaction and interoperability but Rattle has less capable for data preprocessing[77]. Out of the selected three data mining, (AlphaMiner, WEKA and YALE) WEKA is widely used and popular platform for sharing algorithms[78][79] [80].

4.3 WEKA

WEKA, (Waikato Environment Knowledge Analysis), is an open source GPL data mining software package written in Java that provides a collection of machine learning algorithms for a data mining task and available in http://www.cs.waikato.ac.nz/ml/weka/. WEKA has recognized

http://www.cs.waikato.ac.nz/ml/weka/

- 26 -

as a landmark system in machine learning and achieved widespread acceptance within different academic areas and become a widely used tool for data mining research[56]. Since WEKA being placed on Source-Forge in April 2000 until 2007, it was downloaded more than 1.4 million times [81] but in the 6 months of 2007, there were the average of 21,152 downloads per month. The main WEKA GUI has four different interfaces: the Explorer, Experimenter, Knowledge Flow, and simple CLI (command line) mode to access WEKA. WEKA has two primary modes; data exploration (Explorer) and an experiment (Experimenter) mode [82]. The WEKA Explorer is an application designed for exploring the data. It has functionality of data management (loading data, feeding an algorithm, assigning training and testing data and others), classification and reporting classification results. There are six different tabs that used for various purposes: Preprocess (to choose and modify the dataset), classify (to train learning that perform classification or regression), cluster (to learn cluster for the dataset), associate, select attributes and visualize. The WEKA Experimenter is enables the comparison of algorithm performance based on different evaluation criteria and shows significantly better classifier using setup, run and analyze tabs. The setup helps to select the dataset, classifiers, cross validation and other options. The experiment mode allows large-scale experiments to be run with results stored in a dataset. The result from these two modes can be affected by noise, error and missing values. Hence, before running the NACC dataset on explorer and experimenter, the dataset must be pass through data preparation process.

4.4 Data preparation

Data Preparation (DP) is the critical part of predictive algorithm in successful projects [83]. DP is an important part of the mining process[84] and it is 60% more time consuming than the whole data mining process[85][86]. Data preparation or processing steps are tedious and needs making multiple passes over the data before any scientific mining can begin[87]. There are some important aspects of data preparation [86] and these are:

The real world data might be impure due to missing value (empty attribute value), noisy data (having error) and inconsistence data which DP helps to clean it

DP generates a smaller dataset than original and this can significantly improve the efficiency of data mining by selecting relevant attribute and reducing instances

DP also generates quality data by filling missed values, correcting errors and others The data for the experiment will be prepared through data collection and overview, attribute evaluation and selection, and data preprocessing as shown in Figure 2.

- 27 -

Figure 2: Data preparation steps

4.5 Data overview In this study, the experiment is conducted by the data that was collected from May, 2005 to August 2011 by National Alzheimer's Coordinating Center (NACC), University of Washington. The sampling method that has used is an opportunity sampling as the data is simply grabbed from the university. The file is in CSV (Comma Separate Value) format and the total number of records in the file has more than 47,000 instances and 396 attributes. Each instance represents a single person information that diagnosed AD. Selecting a relevant attribute from this dataset is a difficult task without the NACC Uniform Data Set (UDS) data element dictionary for Initial Visit Packet (IVP) manual.

4.6 Attributes evaluation and selection As mentioned above, the dataset in this study consists of more than three hundred attributes. Out of these attributes, some of them are risk factors for AD. Varieties of factors associated with the AD patients have been identified in different researches. For this research, the factors can be divided into two categories; Medical history and cognitive function.

4.6.1 Medical history

Medical History of a patient can be an interview or questionnaire conducted by physicians; it includes personal and behavioral information. This information contains important factors that affect the diagnosis of AD and helps to assess AD’s patients. This information assists to evaluate and select the relevant attributes from the NACC dataset for this experiment. The AD's risk factors need to be retrieved from the dataset are age, family history of AD, smoking, alcohol, diabetes, hypertension, heart disease, obesity (BMI) and female gender [1][10][11].

Data collection

and overview

Attribute evaluation

and selection

Preprocessing the

record

- 28 -

4.6.2 Cognitive function

Currently, doctors are using a variety of tools and techniques to assess an AD patient’s memory. MMSE (Mini Mental Status Examination) is a standard and widely used screening test for assessing cognitive mental status and also used as a research tool to examine cognitive disorders [88]. The MMSE test is divided into eight categories [89]. The sample MMSE questions are shown below.

1. Orientation of place a. What is the address of your house?

2. Orientation of time a. What will be the day after tomorrow?

3. Register and recall a. Give three words and ask the patient to rewrite it sequentially after Question 4

4. Attention and calculation a. Begin with 100 and count down by 7 then stop after 5 subtraction.

Based on NACC UDS coding guidebook, the risk factors mentioned in Medical history and Cognitive function are extracted from NACC dataset. Table 1 shows the risk factors, their attribute name in the dataset and their description.

Risk Factor Name Attribute Name in NACC

Dataset

Remark

Age VISITYR, VISITMO, BIRTHYR and BIRTHMO

The age will be calculated as the difference of patient’s visiting time

and birth date. i.e. (VISITYR_VISITMO) – (BIRTHYR_BIRTHMO)

Family history of AD MOMDEM and DADDEM Does / did subject’s mother/father

have dementia, as indicated by symptoms, history or diagnosis?

Smoking

TOBAC30 Has the subject smoked within the last 30 days?

TOBAC100 Has the subject smoked more than 100 cigarettes in his/her life?

SMOKYRS Total years smoked PACKSPER Average number of packs per day

smoked QUITSMOK If subject quit smoking, specify age

when last smoked

- 29 -

Alcohol

ALCOHOL Having a significant problem occurring either in work, driving, legal or social

Diabetes DIABETES presence of Diabetes Hypertension HYPERTEN History or presence of hypertension Heart disease CVHATT Heart attack or cardiac arrest Obesity (BMI) WEIGHT and HEIGHT Subject’s weight and height in lbs and

inch respectively. These measurements will convert into Kg and meter. (WEIGHT / HEIGHT 2)

Sex SEX Subject’s gender MMSE MMSE Total MMSE score CDRGLOB Class Table 2 : Influential risk factors

4.7 Data Preprocessing

Real world datasets are usually contains noise, missing values, incomplete and inconsistent data which significantly decline the efficiency of data mining. These datasets tend to be too large and high-dimensional. For this reason, these datasets are not directly suitable for data mining [90]. In order to clean these impure data, data preprocessing is an important step before classification is performed [91]. In data preprocessing, the data is checked for missing value, noise and inconsistent data. For instance, a patient's age attribute value must be in numeric format and the field might not be filled (Missing data) or the field might be filled with erroneous value like "ETH" (Noise) or might be filled with a number but with unrealistic values e.g. 520 (inconsistent). Missing values occur when no data value is stored for the attribute of the current instance; it is also known as incomplete data [84]. There are different approaches to handle missing values in the instance such as listwise deletion, pairwise deletion, weighting techniques, single imputation and multiple imputations. Noise is a random error or variance of a measured variable and there are many possible reasons for noisy data, such as measurement errors during the data acquisition, human and computer errors at data entry. Different noise handling (denoising) techniques, such as manual inspection, binning methods, clustering and outlier detection, have been used in data mining [90]. The actual dataset is CSV format which is compatible to run in WEKA. In data preprocessing phase, visualization of data with such format can be done using Microsoft Excel, WEKA explorer interface and using other kind of techniques such as writing SQL queries in any relational database management systems (RDBMS). Using Microsoft Excel for data visualization is time taking and tedious but it is suitable and easy tool for data filtering task. Similarly, WEKA explorer interface can also be used for data visualization purpose but its drawback is it loads the

- 30 -

entire dataset on the system main memory for data analysis. As a result of this, it will be difficult to use WEKA explorer interface for visualization of a large instance dataset by using limited processing capacity such as personal computers. The final refined learning dataset for this research has 10,000 instances with 17 attributes. It occupies a large memory space if it loads in the main memory once. In this case writing SQL queries in one of RDBMS for manual data visualization and manipulation is a better solution. MYSQL is an open source relational database management system which can be used for storage and manipulation of data. The data visualization is done by using it. Simple queries in MYSQL can give information about the different aspect of the dataset such as how many number of instances belongs to each classes, how many of the instances contain some specific value in some specific column and others. Furthermore, it is also possible to manipulate the dataset easily by specifying criteria in the queries.

In the preprocessing phase, primarily, instances which contain blank data and incorrect values are removed from the dataset by filtering them using Microsoft excel. After these instances removed the total number of instances remained in the dataset become 28314. It was mandatory step to refine the dataset further to assure fair proportional distribution of values and classes in the dataset. In addition to this, the processing capacity of the computers dedicated for this research is limited. Beside this, it is necessary to minimize the number of instances in the dataset to a number which is WEKA experiment can be done on the dataset using the processing capacity of computers dedicated for the research, i.e. personal computer. Considering these two points, the primarily refined data further refined to have 10000 instances which have fair distribution of classes and values. The final refined dataset contains number of classes proportional to the original and primarily refined dataset. This indicated in Table 3 .

Class Name Number of instances

in the original dataset

instances included

after primary

refinement

instances included

after final refinement

In number In percent In number

In percent In number

In percent

No Impairment 20,507 43.5% 11655 41.1% 4,100 41% Questionable Impairment

12,973 27.5% 8189 28.9% 2,900 29%

Mild Impairment 8,003 19.7% 5112 18.0% 1800 18% Moderate Impairment

3,823 8.1% 2228 7.8% 790 7.9%

Severe Impairment 1,833 3.8% 1131 3.9% 410 4.1% Table 3 : Experiment data refinement proportion by class

As indicated in the table, the classes are represented by similar percentage of instances throughout the transition of the dataset. This shows that each class in the original dataset exists with equivalent proportion as it was in the original dataset. The final refined dataset contains

- 31 -

10000 instances out of 28314 primarily refined instances which is 35.3% of them. According to this proportion, the final dataset has to hold around 35% of values which exist in primarily refined dataset columns to make the value distribution proportional to the previously refined datasets. To check this, each column examined for their values distribution with regards to the previous datasets columns. Table 4 and Table 5 show the distribution of values in MMSE and sex columns in primarily refined and finally refined dataset. Based on the analysis, most of the columns values distribution is between 30-40 percent which is not far from the ratio of number of instances in the final dataset vs. the number of instances in primarily and finally refined dataset. The following two tables show the distribution of values in MMSE and sex columns of primarily refined and finally refined datasets. As indicated in the tables the distribution of values in primarily refined dataset vs. final dataset is in between 30.2 and 41.5 percent. It is not much far from the ratio of number of instances in primarily refined dataset vs. finally refined dataset i.e. that is 35.3%.

MMSE Number of Instances in primarily refined dataset

Number of Instances in finally refined dataset

percentage

Between 0 and 10 1190 427 35.8% Between 11 and 20 3522 1205 34.2% Between 21 and 30 16497 5746 34.8% Physical problem 74 28 37.8% Behavioral problem 334 101 30.2% Other problem 253 77 30.4% Verbal refusal 6444 2416 37.4% Table 4: MMSE result distribution in dataset by percent

Sex Number of Instances in primarily refined dataset

Number of Instances in finally refined dataset

percentage

Male 11453 4754 41.5% Female 16861 5246 31.1% Table 5: Sex distribution in dataset by percent

4.8 Simulation A simulation is a process of imitating the operation of a real world process or a system. In order to look the success of agents in modifying or minimizing misclassified instance a simulation code is written. This simulation code feeds the agent a randomly selected instance. The simulation code receives the instance and reproduces other instances by changing various attributes of the original instance periodically. These generated instances classified using the machine learning classifier or/and the agent classifier. The classification results will be generated in excel sheet document. Then the graph can easily be generated from it to compare and contrast

- 32 -

between the classification results of the machine learning based classifier and agent based classifier. By comparing the difference between these outputs, the modification can be seen visually.

- 33 -

Chapter 5 : Experiment result and analysis

In the literature review, the most computational efficient algorithms in machine learning are discussed and filtered based on different criteria including attribute and class type. Among these top 10 algorithms[60], Naïve Bayesian, SVM (SMO), KNN (IBk), AdaBoost and J48 (the WEKA re-implementation of C4.5) are candidate for WEKA experiment to compare and select significantly better algorithm based on their classification accuracy. This machine learning tool provides an optimal or significantly better algorithm from those five. The experiment is conducted on a Toshiba laptop with Intel core i5 @ 2.4GHz processor and 8GB RAM. These five algorithms' accuracy is tested on the same dataset that has 10,000 instances of patient data and run on WEKA using their default values. To run this dataset on the Toshiba machine takes around 6hrs.

5.1 WEKA experimentation’s result for medical history and cognitive

function data

WEKA has two primary modes: data exploration (Explorer) and an experiment (Experimenter) mode [82]. The WEKA Experimenter is enables the comparison of algorithm performance based on different evaluation criteria and shows significantly better classifier using setup, run and analyze tabs. Using medical history and cognitive function data, the experiment has been taken and the following result is recorded.

Figure 3 : Comparison of 5 different algorithms

At the bottom of each algorithm in Figure 3 of test output section, there are 0s and 1s within the braces; for instance, (0/0/1) is below 56.66*. In the sequence (a/ b/ c), if the algorithm is

- 34 -

represented with (1/0/0) is better, (0/1/0) is the same as, or (0/0/1) is worse than the baseline or reference algorithm on the dataset used in the experiment. The annotation ‘*’ next to the accuracy percentage indicates that a specific result is statistically worse than the reference algorithm. Based on the experiment, J4.8 is better algorithm with the accuracy of 61.12% than support vector machine, Naive Bayesian, k-nearest neighbor and AdaBoost. The classification accuracy of each of algorithms is indicated in Figure 4.

Figure 4 : Algorithms comparison by classification accuracy percentage

5.2 WEKA explorer interpretation

One of the primary modes of WEKA is data exploration (Explorer). Explorer mode provides an easy access to all of WEKA’s data preprocessing, learning, attribute selection, and data visualization modules in an environment that encourages initial exploration of the data. A summary of J48 classification result with its default setting on NACC dataset for each class is shown in Figure 5.

0

10

20

30

40

50

60

70

J48 SVM Naïve Bayes KNN AdaBoost

- 35 -

Figure 5 : J48 classifier output

From the total number of 10,000 NACC's dataset instances, the algorithm correctly classify 6110 records and the rest are misclassified. The detailed accuracy section shows the TP Rate, FP Rate, Precision, Recall, F-Measure and ROC areas. The green rectangles in confusion matrix are called diagonal element which shows the number of correctly classified instances for each class. TP (True Positive) is a proportion example in the positive class that predicted as a positive among all examples and it is equivalent to Recall[92]. From the confusion matrix, the TP rate can be calculated as the diagonal element divided by the sum of the corresponding row. i.e (TP/ (TP+FN)). For instance; TP Rate for class 'No' is: TP Rate= 3362/4081=0.823817 ~ 0.824 This shows that the dataset has 4081 'No' class and out of these, 3362 are predicted as 'No' and the rest are misclassified to another class. FP (False Positive) refers to a proportion example that has been classified as class ‘a’ but it

belongs to another class. In the confusion matrix, the FP Rate is calculated as the column sum of

- 36 -

class ‘b’ minus the diagonal element, divided by the rows sums of all other classes other than class ‘b’ and the formula is FP= (FP/ (FP+TN)) [92]. For instance; FP rate for class 'Moderate' is: FP Rate = (the sum of class ‘b’ column – diagonal value) / (the rows sums of class ‘a’, ’c’,’d’, ’e’) = (754-362) / (1800+4081+2919+391) =392 / 9191 =0.042650 ~ 0.043 Precision, P, is the proportion of the examples which truly have class ‘x’ among all those which

were classified as class ‘x’. In the confusion matrix, this is the correctly classified element

divided by the sum over the corresponding column and is calculated as P=TP/ (TP+FP). For instance; precision of class ‘No’ is: P=3362/(103+7+3362+1163+3) =3362/4638 =0.724881 ~ 0.725

F-Measure F, defined as the harmonic mean of precision and recall, provides a useful summary score for the algorithm[93][94]. F-measure, from the confusion matrix, can be calculated as twice the product of precision and recall divided by the sum of precision and recall. F= (2*P*R) / (P+R) For instance; F-Measure of class ‘No’ is: F= (2*0.725*0.824) / (0.725+0.82) = 1.1948 / 1.549 =0.771336 ~ 0.771

5.3 Customization of the selected algorithm In WEKA, the percentage of correctly classified instances using NACC dataset by J48 algorithm is 61.12%. In order to make the algorithm compatible with the NACC dataset, the algorithm must be customized. Basically the modification of J48 classifier has two goals: classify the new instance received from the user and increase the classification accuracy percentage by optimizing the parameters. The customized code is written in java and this enables to receive the instance from the users as an input and to classify it with J48. By setting three optimizing parameters for J48, the accuracy is increased to 87.09%. These parameters are binary split, confidence factor and unprunned which help to perform pruning and to increase post pruning when the tree is build. The summary and confusion matrix for the new algorithm is shown in Figure 6.

- 37 -

Figure 6 : Summary and confusion matrix of J48 classifier

The following graph shows the comparison between original and modified J48 algorithm based on their classification accuracy for each class.

Figure 7 : Comparison between original and modified J48 on classification accuracy

Apart from the improvement gained from the customization, the algorithm implemented on a prototype application which receives the input from users with GUI. The application prototype is developed using Java programming language. Swing is a toolkit for creating a graphical user interfaces (GUI) in Java applications and applets [95]. The following figure shows the interface of the developed application prototype.

Mild Moderat

e No

Questionable

Severe Total

Accuracy

Original J48 51.34 45.36 82.1 41.33 65.18 61.12

Customized J48 86.89 83.93 94.32 78.11 86.19 87.09

0 10 20 30 40 50 60 70 80 90

100

Accuracy in %

Original and Customized J48

- 38 -

Figure 8: A prototype for AD diagnosis

Detail description of how to use the application is included in Appendix A.

5.4 WEKA experimentation for Medical History and cognitive function

data On the first experiment, J48 algorithm classify instances with a better accuracy than the candidate algorithms. The dataset used for the experiment had medical history and cognitive function (CF) data in combination. In this section, the CF data separated and only medical history (MH) data is used in the experiment to check whether the separation or combination of these two AD diagnosis techniques data affect the accuracy of the classification. Three datasets are used for the second experiment. Medical history, cognitive function and their combination dataset. After the experiment is conducted in WEKA, the results indicate that J48 is a better classification algorithm than the other candidates. Then WEKA Explorer is used with three different dataset and the optimized J48. The classification accuracies of MH, CF and their combination are 81.42%, 64.20% and 87.09% respectively. The summary of classifier output of each dataset shown in Appendix B. This implies the classification accuracy when both of the diagnosis data merged together is better than of using a single diagnosis technique data and shown in Figure 9 .

- 39 -

Figure 9 : Classification accuracy of J48on MH, MMSE and combined datasets

5.5 Hypothesis

Depending on the nature of research questions, the hypothesis is formulated for the first two research questions: RQ1 and RQ2. The hypothesis used to make comparisons between different groups, and bring clearness in the research The hypothesis is carried out as follows:

RQ1: Which machine learning algorithm is significantly better to classify NACC's

medical health and cognitive function data for AD Diagnosis? Null Hypothesis H0_1: There is no significance difference between all candidate algorithms to classify NACC's dataset. Alternate Hypothesis H1_1: There is a significantly better algorithm in the classification of NACC's dataset. Dependent Variable: accuracy Independent Variables: NACC dataset, candidate algorithms, and experimental environment

RQ2: Does the combination or separation of medical history and cognitive function data affect the classification accuracy?

Null Hypothesis H0_2: The separated diagnosis techniques, medical history or cognitive function, data are not significantly different in prediction accuracy. Alternate Hypothesis H1_2: The combined diagnosis techniques, medical history and cognitive function, data gives a better prediction accuracy. Dependent Variable: accuracy Independent Variables: NACC dataset, candidate algorithms, and experimental environment

MH MMSE Combined

Accuracy 81.42 64.2 87.09

81.42

64.2

87.09

0

20

40

60

80

100

Acc

ura

cy in

%

Classification accuracy on the three datasets

- 40 -

5.6 Significance test Statistical significance testing is used to confirm how likely a result that assumes the null hypothesis to be proven as true. In this section, the statistical tests that used for comparing the five classifiers on NACC dataset are explored. The main reason of conducting statistical test is to validate the results found on the experiments. The experiments in the WEKA has a 95% confidence level and in statistical testing a p-value less than 0.05 is considered as statistically significant[96][97]. To check the significance difference, the CSV file prepared which contains the accuracy of the five candidate algorithms in 10 fold cross validation. Most of the algorithms' accuracy in each fold is normally distributed and the rest are nearly normal. For testing significance difference between candidate algorithms' accuracy and diagnosis techniques data, two tail independent sample t-test is selected. In order to perform the testing, MS Excel 2007 program is used for calculating the mean, variance and T-Test.

Figure 10 : T-test result of candidate algorithms

The t-test result in Figure 10 showed that, there is no significant difference is found in Naive Bayesian and SVM while J48 is accepted against KNN and AdaBoost. For this reason, a null hypothesis, there is no significance difference between all candidate algorithms to classify

NACC's dataset, is rejected.

J48 Naïve

Bayesian SVM KNN AdaBoost

Accuracy in% 87.09 46.54 56.94 51.32 54.69

0

10

20

30

40

50

60

70

80

90

100

Acc

ura

cy i

n %

T-test result of candidate algorithms

0.0005

-0.0025

0.0005

-0.0005

0.0005

- 41 -

Figure 11 : T-test result of datasets

The t-test result in Figure 11 showed that, the separated diagnosis techniques data are significantly different in prediction accuracy. For this reason, the alternate hypothesis, the

combined diagnosis techniques, medical history and cognitive function, data gives a better

prediction accuracy, is accepted.

5.7 Accuracy improvement using agent systems The confusion matrix indicated below on Error! Not a valid bookmark self-reference., shows the distribution of classification result using J48 algorithm and their actual class in the dataset.

Figure 12 : Confusion matrix of customized J48 algorithm

J48Combined J48MH J48MMSE

Series1 87.09 81.42 64.2

0

10

20

30

40

50

60

70

80

90

100

Acc

ura

cy in

%

T-test result of J48 accuracies on three datasets

0.05

0.05

dataset

- 42 -

The distribution of the results categorized into three categories. The numbers under the rectangle in the middle represent the number of correctly classified instances, that means the actual class in the dataset and the class predicted by the classifier are the same. Instances in the diagonal elements are 8709 which are 87.09 percent of the total instances 10,000. The remaining 1291 instances are misclassified instances either classified above or below the correct severity level. The second part of the categorization of the results in confusion matrix is a set of classification results indicated by bold triangle below the rectangle on the diagram. These are instances that their class is over estimated by the classifier. For instance the 2nd value on the 1st row under the bold triangle ‘179’ is the number of instances which are classified as ‘questionable AD’ class

while their actual class are ‘No AD’. They are totally 477 instances which are 4.77 percent of the total dataset. The last part of categorization in the confusion matrix is the set of instances above the rectangle elements. It indicated by dashed triangle on the diagram. This shows the set of instances which are under estimated by the classifier that means their predicted class value is bellow the actual class. For instance the 1st value on the 2nd row ‘485’ represents the number of instances classified as ‘No AD’ class by the classifier while they are actually labeled as

‘Questionable AD’ on the dataset. These instances are 814 in total which means 8.14 percent of the total dataset. In general, 12.91% of the total instance is misclassified by the classifier either by overestimating or underestimating the class of any given instance. Improving or minimizing misclassified instances on either of the cases means in other words improving the accuracy of the classifier.

Progressive changes on of these factors will affect the stage of the patient progressively. The customized J48 algorithm is simulated using instances which contain a single starting random instance and instances which are generated by increasing various attributes of this instance. The result indicates that continues increase or decrease on single attribute of the instance doesn’t

result much change on the output class from the J48 algorithm except changes in hypertension, diabetes, smoke years and MMSE attributes. This shows that these are the influential and sensitive factors in AD stage progression. For instance as shown on Figure 13, despite of an increase made on age attribute value of the instance the AD stage stays constant mostly.

- 43 -

Figure 13 : classification simulation result J48 algorithm with instances that their age value is continuously increasing

On the other hand as indicated on Figure 14 a continues increase on MMSE attribute of a randomly selected instance results a fluctuated AD stage changes that ranges from ‘No’ stage to ‘Severe’ stage incrementally.

Figure 14 : classification simulation result of J48 algorithm with instances that their MMSE value is continuously increasing

In graphs bellow this section, the severity level of AD is represented using numbers: 1= No, 2=Questionable, 3=Mild, 4=Moderate, 5= severe AD and CDRGLOB represents predicted class.

MMSE result and AD stage of the patient have inverse relationship. If the patient’s performance on MMSE test is decrease, it is an indication that the patient is either progressing to the worst

- 44 -

stage than the previous AD stage or the patient is in a same stage as the previous stage but with the high probability to progress to the worst stage. Despite this fact the J48 classifier in the simulation classify the inserted instances in a way which contradict with this fact. As we can see it from Figure 15Figure 15 sometimes when the MMSE result decrease the stage become better instead of be the same as it was before or worse.

Figure 15 : AD stage progression assumption vs. the J48 algorithm classification result

The bold dashed line in Figure 15 marks the assumption of the inverse relationship between MMSE result and AD stages which is mentioned above. The output in this graph shows some irregularities which contradict with this assumption. When the patient MMSE result is 26 and 25 the AD stage was ‘questionable AD’ stage(2) but even if the patient MMSE result decreases to 24 the patient classified as ‘No AD’ stage(1). Such kind of irregularities which contradict the inverse or direct relationship between progression factors and the stages actually caused by the misclassification which are part of 15.23 % inaccuracy of the selected classifier. By doing such kind of crosschecking between the assumption and the output of the classifier it is possible to identify whether the under estimate misclassification is occurred or not. It is difficult to be sure about over estimated classifications because since all the attributes are factors for the progression, one or multiple attributes can be a reason for stage progression to the worst stage beyond the normal progression. But when the factors progress to the worst and the stage indicate better stage instead of staying at similar stage or going to the worst stages, this implies under estimate misclassification is occurred. The third research question is based on this assumption.

In order to decide whether the patient stage prediction transgress this assumption or not, it is not enough to compare the immediate previous classification record with the current classification record instead it is necessary to look at attributes of the multiple previous records and their stage classification. This is because the immediate previous instance might be an instance overestimated by the classifier and biases the decision. Instead it is necessary to look at the

- 45 -

patterns of multiple previous classified instances and judge whether the current classification result falls under the category of underestimated misclassified instances or not. This task requires making dynamic analysis and intelligence decision. Our third research question requests how agent system can be used to identify when such kind of misclassification is occurred and how to use multi agent system to suggest better prediction for misclassified instance instead of the J48 algorithm prediction.

Efficient decision making in agent system can be achieved by synthesizing decisions across a collection of dispersed interacting local decision making components. There are different ways used to make a better AD stage suggestion when the machine learning algorithm misclassifies a given instance such as looking at the pattern of previously classified instances’ stages and

calculating the next probable class, analyzing each attributes separately to estimate their effect on the class, by applying online machine learning techniques and others. Each of these mechanisms enables us to make a stage suggestion but combining all of these mechanisms gives us a better suggestion instead of using one of them. The local suggestions suggested by each technique, evaluated by analyzing the previous achievements of these techniques. The best suggestion among these alternate suggestions considered as a global suggestion and used instead of the class suggested by J48 algorithm. In this case, multi agent system consists multiple agents which implement the above mentioned mechanisms to suggest a better class for misclassified instance. In this multi agent system agent classifier will be responsible to receive unclassified instance and first it will forward it to machine learning classifier instance. After receiving the result from the machine learning classifier agent, the agent classifier will analyze the suggested class by the machine learning classifier with respect to the previously classified instances. If the agent classifier believes the instance is misclassified by the machine learning classifier, it will forward the instance towards other classifier agents seeking their class suggestion. The best suggestion among these classifier agents will be selected by consulting the efficiency evaluator which keeps the achievement record of the class suggesting agents. The architecture of this multi agent system is discussed in the following section.

5.8 The agent architecture

In this architecture which indicated in Figure 16Figure 16, multiple cooperative agents that use different techniques to suggest a better class for misclassified instances work together with the machine learning classifiers to improve the classification accuracy. The description of each components of the multi agent system is given below the figure.

- 46 -

Figure 16: Multi agent architecture

5.8.1 Machine learning classifier (MLC)

This is an agent which implements the selected machine learning algorithm. Its selection process is discussed above in

- 47 -

: Experiment result and analysis part. This agent receives the instance and classifies it based on the pattern in the dataset. The dataset contains 10,000 instances of real AD patients’ data. It responds

reactively for each classification requests from the agent classifier agent.

5.8.2 Agent Classifier (AC)

The instance from the user arrived to this agent first. It will forward the instance towards MLC agent which has 87.09% classification accuracy. The response from this agent will be analyzed by comparing it with previously classified instances which periodically recorded in classified instances database. If this agent convinced that the classification result from MLC agent is not correct according to AD stage progression assumption which mentioned above, it will announce the classification request for the rest of classifier agents using contract net protocol. The agent can apply either of the following two approaches. The first approach is broadcasting the request to the classifier agents and selecting the best suggestion by analyzing the suggestion itself and the previous achievement of the classifier that suggest the class. The second alternative is direct sending classification request to specific selected classifier agent based on its pervious achievement. Both of these approaches have advantage and disadvantage. The advantage in the 1st approach is the best suggestion will not select merely based on the classifier pervious performance instead the quality of the current suggestion will be considered. The disadvantage of this approach is it always requires a multicasting which is resource intensive. Moreover, it requires analyzing all suggestions from all classifiers. The advantage and disadvantage of 2nd approach is the reverse for the 1st approach. Passing all these processes the agent respond a better class for the instance received from the user.

5.8.3 Efficiency evaluator (EE)

This component calculates the efficiency of the classifier agents periodically. The reference for this calculation is the MLC agent which uses the selected machine learning algorithm. After the class suggested by any of the classifier agents taken as a global decision, it will be stored in the classified instances database and it will be compare with the next successful classification by the machine learning classifier agent. If the next successfully classified instance class by the machine learning classifier resembles more with the class currently suggested by the classifier agent, the efficiency score of the classifier agent in the efficiency evaluator database will be increase. If not, the score will be decrease.

5.8.4 Classifier agents

As indicated on the figure of the architecture, multiples of agents, which implements different classification techniques participate in suggesting alternate class for the misclassified instance. These classifier agents suggest a class for instance they receive them from the agent classifier using a technique which reflect their name.

- 48 -

5.8.5 Classification history analyzer (CHA)

This agent suggests a class for the given instance by looking the pattern of previously classified instances. All the possible class will be considered and their possibility will be calculated. For the beginning it starts by accepting the classification result of the machine learning classifier. Afterwards, the voting mechanism used to calculate the possibility of each possible class. Every time when any classification takes place, whether it is using the agent classifiers or using the MLC, the vote for the current selected class will be increase and the vote for the classes which are far from this class will be diminished. This helps to balance the influences of the most repeated suggestion and the influence of most recent suggestions on the voting.

5.8.6 Influential factors analyzer (IFA)

All the attributes of the instances are factors for the progression of AD stage. Some of them are significantly influential and some are less influential. This agent identifies the most influential factor among the attribute of instance and evaluate how extent each attributes individual value change will affect the class. Base on the combination of these evaluations, the agent will suggest what class should be for the new instance.

5.8.7 Online learner (OL)

Online learning is one branch of machine learning. Online learning algorithm learns through time incrementally by looking the pattern of previous predictions [98]. The instances and their classification result stored periodically in classified instances database. This accumulation of data will be used by the online learner agent to learn the pattern of the classification and to suggest a class for misclassified instances by the MLC.

5.9 Simulation result

The simulation result shows that, the agent classifiers can suggest a better class when the machine learning classifier class is bellow the assumption. The graph in Figure 17 shows the simulation result when only the machine learning classifier is used.

- 49 -

Figure 17 : J48 classifier classification result with instances continuously decreasing in MMSE result

The bold dashed line in Figure 17Figure 17 indicates the inverse relationship between the MMSE result and AD stage (class) progression. When the MMSE result decline the AD stage should deteriorate. But the classification output shown in the figure shows some irregularities that contradict with this assumption especially at the end of the graph. The purpose of having other agent classifiers was to avoid or to minimize such irregularities by suggesting a stage which coincides or near to this assumption.

In Figure 18, the implementation of an agent which combines the machine learning technique with the pattern tracking technique enables to minimize some of these irregularities. This agent tracks the pattern of previously classified instances and suggests a stage based on that observation.

- 50 -

Figure 18 : pattern tracking agent classification result with instances continuously decreasing in MMSE result

Similarly the implementation of another agent which combines the machine learning technique with instances attributes analysis technique able to deliver similar result. The agent in this simulation analyzes the change between the value of previous instance and the values of the current instance and suggests a class based on the size of the change. The output of the simulation of this agent is indicated in Figure 19.

Figure 19 : attribute analyzer agent classification result with instances continuously decreasing in MMSE result

- 51 -

As indicated on the above 3 figures some of misclassification replaced by other classes which not contradict with the assumption. The result of these agents also checked by using other instances that their multiple attributes values change simultaneously. The output shows similar result.

- 52 -

Chapter 6 : Conclusion and future work

This study used WEKA tool in order to address the first two research questions. In WEKA, more than 70 algorithms are available that are used for classification and regression. Experiment on WEKA and literature review used to answer the first research question. There are ten mostly used algorithms in data mining that used for classification, clustering, association and statistical analysis[60]. Five of them can be candidates to classify the NACC dataset which consist of 10,000 instances. To select a significant classifier that has a better accuracy, the same dataset has been used for the five algorithms. After the experiment, the accuracy percentage for J48, SVM, Naive Bayes, KNN and AdaBoost were 61.12, 56.66, 46.01, 51.07 and 54.80 respectively. Hence, J48 algorithm is significantly better to classify NACC's medical health and cognitive function data for AD Diagnosis

To address the second research question, another experiment in WEKA conducted. In this experiment, the medical history (MH) and cognitive function (CF) data of the NACC dataset was used. The result showed that J48 had better classification accuracy than the candidates. The detailed classification accuracy of the optimized J48 in the second experiment was conducte in WEKA Explorer. The classification accuracies of MH, CF and their combination datasets are 81.42%, 64.20% and 87.09% respectively. This implies that the classification accuracy when both medical history and cognitive function data merged together is better than of using only medical history data to diagnose AD.

In order to answer the third research question, both literature review and simulation experiment used. A multi agent system model was designed to improve the prediction accuracy achieved by the machine learning classifier. This multi agent system held a classifier agent that implements the selected machine learning algorithm and other agents which implement different techniques for suggesting a class when misclassification occurs. Agents which are part of the MAS were implemented and tested using a simulation code. The simulation result showed that the suggested MAS can be used to minimize misclassified instances.

The existing AD diagnosis are expensive, time taking and require the patient physical attendance in the hospitals for diagnosis and periodic checkups. Machine learning and multi agent system techniques combined in this study to diagnose AD and predict its stage. Combining a machine learning with MAS to diagnose AD patients using medical history and cognitive function data in NACC dataset is a new finding. This finding can be integrated in a web based application accessible on tablet PCs to enable the patients to make checkups remotely by themselves or by the help of their caregivers.

The stage prediction using machine learning and agent system research is in its early stage. There might be other possible alternative methods that can improve this thesis work. These ideas need to be researched and applied on different devices.

- 53 -

References

[1] S. Joshi, D. Shenoy, G. G. Vibhudendra Simha, P. L. Rrashmi, K. R. Venugopal, and L. M. Patnaik, “Classification of Alzheimer’s Disease and Parkinson’s Disease by Using Machine Learning and Neural Network Methods,” in Machine Learning and Computing (ICMLC), 2010 Second International Conference on, 2010, pp. 218 –222.

[2] “2011 Alzheimer’s disease facts and figures,” Alzheimer’s and Dementia, vol. 7, no. 2, pp. 208–244, Mar. 2011.

[3] R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger, “MLC++: a machine learning library in C++,” in Tools with Artificial Intelligence, 1994. Proceedings., Sixth International Conference on, 1994, pp. 740 –743.

[4] C. Drummond, “Machine learning as an experimental science,” in 2006 AAAI Workshop, July 16, 2006 - July 20, 2006, Boston, MA, United states, 2006, vol. WS-06–06, pp. 1–5.

[5] G. Holmes, A. Donkin, and I. H. Witten, “WEKA: a machine learning workbench,” in Proceedings of ANZIIS ’94 - Australian New Zealnd Intelligent Information Systems Conference, 29 Nov.-2 Dec. 1994, New York, NY, USA, 1994, pp. 357–61.

[6] J. Hua, “Study on the application of rough sets theory in machine learning,” in 2008 2nd International Symposium on Intelligent Information Technology Application, IITA 2008, December 21, 2008 - December 22, 2008, Shanghai, China, 2008, vol. 1, pp. 192–196.

[7] N. R. Jennings, K. Sycara, and M. Wooldridge, “A Roadmap of Agent Research and Development,” Autonomous Agents and Multi-Agent Systems, vol. 1, no. 1, pp. 7–38, Mar. 1998.

[8] J. Tweedale, N. Ichalkaranje, C. Sioutis, B. Jarvis, A. Consoli, and G. Phillips-Wren, “Innovations in multi-agent systems,” Journal of Network and Computer Applications, vol. 30, no. 3, pp. 1089–1115, Aug. 2007.

[9] S. Joshi, P. Deepa Shenoy, K. R. Venugopal, and L. M. Patnaik, “Evaluation of different stages of dementia employing neuropsychological and machine learning techniques,” in Advanced Computing, 2009. ICAC 2009. First International Conference on, 2009, pp. 154 –160.

[10] F. Richard and P. Amouyel, “Genetic susceptibility factors for Alzheimer’s disease,” European Journal of Pharmacology, vol. 412, no. 1, pp. 1–12, Jan. 2001.

[11] A. V. Suhanov, P. I. Pilipenko, A. D. Korczyn, A. Hofman, M. I. Voevoda, S. V. Shishkin, G. I. Simonova, Y. P. Nikitin, and V. L. Feigin, “Risk factors for Alzheimer’s disease in Russia: a case–control study,” European Journal of Neurology, vol. 13, no. 9, pp. 990–995, Sep. 2006.

[12] The Duy Bui, Duy Khuong Nguyen, and Tien Dat Ngo, “Supervising an unsupervised neural network,” in 2009 First Asian Conference on Intelligent Information and Database Systems, ACIIDS, 1-3 April 2009, Piscataway, NJ, USA, 2009, pp. 307–12.

[13] S. Chakrabarti, “Data mining for hypertext: a tutorial survey,” SIGKDD Explor. Newsl., vol. 1, no. 2, pp. 1–11, Jan. 2000.

[14] M. S. Mouchaweh, “Learning in Dynamic Environments: Application to the Identification of Hybrid Dynamic Systems,” in Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on, 2010, pp. 555 –560.

[15] R. Nock and F. Nielsen, “Bregman Divergences and Surrogates for Learning,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 11, pp. 2048 –2059, Nov. 2009.

[16] B.-F. Zhang, J.-S. Su, and X. Xu, “A Class-Incremental Learning Method for Multi-Class Support Vector Machines in Text Classification,” in Machine Learning and Cybernetics, 2006 International Conference on, 2006, pp. 2581 –2585.

[17] M.-L. Zhang and Z.-H. Zhou, “Adapting RBF Neural Networks to Multi-Instance Learning,” Neural Processing Letters, vol. 23, no. 1, pp. 1–26, 2006.

[18] M. Sayed Mouchaweh, “Semi-supervised classification method for dynamic applications,” Fuzzy Sets and Systems, vol. 161, no. 4, pp. 544–563, Feb. 2010.

- 54 -

[19] T. L. Bailey and C. Elkan, “Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization,” Machine Learning, vol. 21, no. 1, pp. 51–80, 1995.

[20] A. Aldea, B. Lopez, A. Moreno, D. Riano, and A. Valls, “A multi-agent system for organ transplant co-ordination,” in Artificial Intelligence in Medicine. 8th Conference on Artificial Intelligence in Medicine in Europe, AIME 2001. Proceedings, 1-4 July 2001, Berlin, Germany, 2001, pp. 413–16.

[21] F. Bergenti and A. Poggi, “Developing Smart Emergency Applications with Multi-Agent Systems,” International Journal of E-Health and Medical Communications, vol. 1, no. 4, pp. 1–13, Oct. 2010.

[22] J. F. De Paz, S. Rodriguez, J. Bajo, J. M. Corchado, and E. S. Corchado, “OVACARE: A multi-agent system for assistance and health care,” in 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2010, September 8, 2010 - September 10, 2010, Cardiff, United kingdom, 2010, vol. 6279 LNAI, pp. 318–327.

[23] K. Crockett, J. O’Shea, and Z. Bandar, “Goal orientated conversational agents: Applications to benefit society,” in 5th KES International Conference on Agent and Multi-Agent Systems: Technologies and Applications, KES-AMSTA 2011, June 29, 2011 - July 1, 2011, Manchester, United kingdom, 2011, vol. 6682 LNAI, pp. 16–25.

[24] J. O’Shea, Z. Bandar, and K. Crockett, “Using a Slim Function Word Classifier to Recognise Instruction Dialogue Acts,” in Agent and Multi-Agent Systems: Technologies and Applications, vol. 6682, J. O’Shea, N. Nguyen, K. Crockett, R. Howlett, and L. Jain, Eds. Springer Berlin / Heidelberg, 2011, pp. 26–34.

[25] R. Abdulrahman, D. Neagu, and D. Holton, “Multi Agent System for Historical Information Retrieval from Online Social Networks,” in Agent and Multi-Agent Systems: Technologies and Applications, vol. 6682, J. O’Shea, N. Nguyen, K. Crockett, R. Howlett, and L. Jain, Eds. Springer Berlin / Heidelberg, 2011, pp. 54–63.

[26] V. Podobnik and I. Lovrek, “An Agent-Based Platform for Ad-Hoc Social Networking,” in Agent and Multi-Agent Systems: Technologies and Applications, vol. 6682, J. O’Shea, N. Nguyen, K. Crockett, R. Howlett, and L. Jain, Eds. Springer Berlin / Heidelberg, 2011, pp. 74–83.

[27] T. Laroum and B. Tighiouart, “A multi-agent system for the modelling of the HIV infection,” in Agent and Multi-Agent Systems: Technologies and Applications. 5th KES International Conference, KES-AMSTA 2011, 29 June-1 July 2011, Berlin, Germany, 2011, pp. 94–102.

[28] C. Zanni-Merk, S. Almiron, and D. Renaud, “A multi-agents system for analysis and diagnosis of SMEs,” in Agent and Multi-Agent Systems: Technologies and Applications. 5th KES International Conference, KES-AMSTA 2011, 29 June-1 July 2011, Berlin, Germany, 2011, pp. 103–12.

[29] S. Telghamti and R. Maamri, “Towards a Predictive Fault Tolerance Approach in Multi-Agent Systems,” in Agent and Multi-Agent Systems: Technologies and Applications. 5th KES International Conference, KES-AMSTA 2011, 29 June-1 July 2011, Berlin, Germany, 2011, pp. 123–9.

[30] K. Kuakowski and T. Stepien, “Dynamic world model with the lazy potential function,” in 5th KES International Conference on Agent and Multi-Agent Systems: Technologies and Applications, KES-AMSTA 2011, June 29, 2011 - July 1, 2011, Manchester, United kingdom, 2011, vol. 6682 LNAI, pp. 190–199.

[31] J. Gascueña, E. Navarro, and A. Fernández-Caballero, “VigilAgent for the Development of Agent-Based Multi-robot Surveillance Systems,” in Agent and Multi-Agent Systems: Technologies and Applications, vol. 6682, J. O’Shea, N. Nguyen, K. Crockett, R. Howlett, and L. Jain, Eds. Springer Berlin / Heidelberg, 2011, pp. 200–210.

[32] K. Fuks and A. Kawa, “Virtual organization networking strategies - simulations experiments,” in Agent and Multi-Agent Systems: Technologies and Applications. 5th KES International Conference, KES-AMSTA 2011, 29 June-1 July 2011, Berlin, Germany, 2011, pp. 602–9.

[33] P. Golinska and M. Hajdul, “Multi-agent coordination mechanism of virtual supply chain,” in 5th KES International Conference on Agent and Multi-Agent Systems: Technologies and Applications,

- 55 -

KES-AMSTA 2011, June 29, 2011 - July 1, 2011, Manchester, United kingdom, 2011, vol. 6682 LNAI, pp. 620–629.

[34] A. Hector, F. Henskens, and M. Hannaford, “A software engineering process for BDI agents,” in 2nd KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications, KES-AMSTA 2008, March 26, 2008 - March 28, 2008, Incheon, Korea, Republic of, 2008, vol. 4953 LNAI, pp. 132–141.

[35] Yen-Tsung Pan and Men-Shen Tsai, “Development a BDI-based intelligent agent architecture for distribution systems restoration planning,” in 2009 15th International Conference on Intelligent System Applications to Power Systems (ISAP), 8-12 Nov. 2009, Piscataway, NJ, USA, 2009, p. 6 pp.

[36] M. D’Inverno, M. Luck, M. Georgeff, D. Kinny, and M. Wooldridge, “The dMARS architecture: A specification of the distributed multi-agent reasoning system,” Autonomous Agents and Multi-Agent Systems, vol. 9, no. 1–2, pp. 5–53, 2004.

[37] J. C. Oh, M. S. Tamhankar, and D. Mosse, “Design of Very Lightweight Agents for reactive embedded systems,” in Proceedings 10th IEEE International Conference and Workshop on the Engineering of Computer-Based Systems. ECBS 2003, 7-10 April 2003, Los Alamitos, CA, USA, 2003, pp. 149–58.

[38] F. Ganjeizadeh, “Robotics and Computer-Integrated Manufacturing: Preface,” Robotics and Computer-Integrated Manufacturing, vol. 27, no. 4, pp. 667–668, 2011.

[39] H. Nakashima and I. Noda, “Dynamic subsumption architecture for programming intelligent agents,” in Proceedings International Conference on Multi Agent Systems, 3-7 July 1998, Los Alamitos, CA, USA, 1998, pp. 190–7.

[40] E. E. Tripoliti, D. I. Fotiadis, and M. Argyropoulou, “A supervised method to assist the diagnosis of Alzheimer’s Disease based on functional Magnetic Resonance Imaging,” in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, 2007, pp. 3426 –3429.

[41] E. E. Tripoliti, D. I. Fotiadis, and M. Argyropoulou, “An automated supervised method for the diagnosis of Alzheimer #x2019;s disease based on fMRI data using weighted voting schemes,” in Imaging Systems and Techniques, 2008. IST 2008. IEEE International Workshop on, 2008, pp. 340 –345.

[42] E. E. Tripoliti, D. I. Fotiadis, and M. Argyropoulou, “A supervised method to assist the diagnosis and monitor progression of Alzheimer’s disease using data from an fMRI experiment,” Artificial Intelligence in Medicine, vol. 53, no. 1, pp. 35–45, Sep. 2011.

[43] F. L. Seixas, A. S. de Souza, A. Plastino, D. Saade, and A. Conci, “Clinical dementia rating score prediction based on MR segmentation,” in Bioinformatics and Biomeidcine Workshops, 2008. BIBMW 2008. IEEE International Conference on, 2008, pp. 113 –114.

[44] W. Yang, X. Li, and X. Huang, “Automatic classification of Alzheimer’s patients and age-matched healthy subjects using independent component analysis,” in Biomedical Engineering and Informatics (BMEI), 2011 4th International Conference on, 2011, vol. 1, pp. 151 –154.

[45] D. Zhang, M. Hariz, and M. Mokhtari, “Assisting elders with mild dementia staying at home,” in 6th Annual IEEE International Conference on Pervasive Computing and Communications, PerCom 2008, March 17, 2008 - March 21, 2008, Hong Kong, Hong kong, 2008, pp. 692–697.

[46] D. Hudson and M. Cohen, “Intelligent agents in home healthcare,” Annals of Telecommunications, vol. 65, no. 9, pp. 593–600, 2010.

[47] J. Wan, C. Byrne, G. M. P. O’Hare, and M. J. O’Grady, “OutCare: Supporting dementia patients in outdoor scenarios,” in 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2010, September 8, 2010 - September 10, 2010, Cardiff, United kingdom, 2010, vol. 6279 LNAI, pp. 365–374.

- 56 -

[48] J. M. Corchado, J. Bajo, and A. Abraham, “GerAmi: Improving healthcare delivery in geriatric residences,” Intelligent Systems, IEEE, vol. 23, no. 2, pp. 19–25, 2008.

[49] S. Nefti, U. Manzoor, and S. Manzoor, “Cognitive agent based intelligent warning system to monitor patients suffering from dementia using ambient assisted living,” in Information Society (i-Society), 2010 International Conference on, 2010, pp. 92 –97.

[50] V. Pigadas, C. Doukas, V. P. Plagianakos, and I. Maglogiannis, “Enabling constant monitoring of chronic patient using Android smart phones,” in Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA, 2011, pp. 63:1–63:2.

[51] Chung-Chih Lin, Ming-Jang Chiu, Chun-Chieh Hsiao, Ren-Guey Lee, and Yuh-Show Tsai, “Wireless health care service system for elderly with dementia,” IEEE Transactions on Information Technology in Biomedicine, vol. 10, no. 4, pp. 696–704, Oct. 2006.

[52] A. Serna, H. Pigot, and V. Rialle, “Modeling the progression of Alzheimer’s disease for cognitive assistance in smart homes,” User Modeling and User-Adapted Interaction, vol. 17, no. 4, pp. 415–438, 2007.

[53] M. Silveira and J. Marques, “Boosting Alzheimer Disease Diagnosis Using PET Images,” in 2010 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 2556 –2559.

[54] N. Chilukoti, K. Early, S. Sandhu, C. Riley-Doucet, and D. Debnath, “Assistive technology for promoting physical and mental exercise to delay progression of cognitive degeneration in patients with dementia,” in IEEE Biomedical Circuits and Systems Conference Healthcare Technology, BiOCAS2007, November 27, 2007 - November 30, 2007, Montreal, QC, Canada, 2007, pp. 235–238.

[55] I. Mrcela, V. Sunde, and Z. Bencic, “Method for selection of switches for power electronic converters,” in 2012 Proceedings of the 35th International Convention MIPRO, 2012, pp. 129 –134.

[56] C. Yun and J. Yang, “Experimental Comparison of Feature Subset Selection Methods,” in Seventh IEEE International Conference on Data Mining Workshops, 2007. ICDM Workshops 2007, 2007, pp. 367 –372.

[57] D. Kudenko and M. Grzes, “Knowledge-based reinforcement learning for data mining,” in 4th International Workshop on Agents and Data Mining Interaction, ADMI 2009, May 10, 2009 - May 15, 2009, Budapest, Hungary, 2009, vol. 5680 LNAI, pp. 21–22.

[58] B. Indranil and R. K. Mahapatra, “Business data mining-a machine learning perspective,” Information and Management, vol. 39, no. 2, pp. 211–25, Dec. 2001.

[59] P. Langley and H. A. Simon, “Applications of machine learning and rule induction,” Communications of the ACM, vol. 38, no. 11, pp. 54–64, 1995.

[60] Xindong Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Qiang Yang, H. Motoda, G. J. McLachlan, A. Ng, Bing Liu, P. S. Yu, Zhi-Hua Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Jan. 2008.

[61] Li Rui, Wei Xian-mei, and Yu Xue-wei, “The improvement of C4.5 algorithm and case study,” in 2009 Second International Symposium on Computational Intelligence and Design (ISCID 2009), 12-14 Dec. 2009, Piscataway, NJ, USA, 2009, vol. vol.2, pp. 190–2.

[62] W. Jia and L. Huang, “Improved C4.5 decision tree,” in International Conference on Internet Technology and Applications, ITAP 2010, August 21, 2010 - August 23, 2010, Wuhan, China, 2010, p. IEEE Wuhan Section; University of Wisconsin at La Crosse; Wuhan University of Technology; Wuhan University.

[63] Zhifeng Li and Xiaoou Tang, “Using support vector machines to enhance the performance of Bayesian face recognition,” IEEE Transactions on Information Forensics and Security, vol. 2, no. 2, pp. 174–80, Jun. 2007.

- 57 -

[64] Changsheng Zhang and Jing Ruan, “A modified Apriori algorithm with its application in instituting cross-selling strategies of the retail industry,” in 2009 International Conference on Electronic Commerce and Business Intelligence, ECBI, 6-7 June 2009, Piscataway, NJ, USA, 2009, pp. 515–18.

[65] N. Balakrishnan and M. H. Ling, “EM algorithm for one-shot device testing under the exponential distribution,” Computational Statistics and Data Analysis, vol. 56, no. 3, pp. 502–509, 2012.

[66] Z. Peng, X. Xiu, and Z. Ming, “An efficient improved strategy for the PageRank algorithm,” in International Conference on Management and Service Science, MASS 2011, August 12, 2011 - August 14, 2011, Wuhan, China, 2011, p. IEEE Wuhan Section; Wuhan University; Nankai University; Fuzhou University; University of Science and Technology Beijing.

[67] T.-K. An and M.-H. Kim, “A new Diverse AdaBoost classifier,” in 2010 International Conference on Artificial Intelligence and Computational Intelligence, AICI 2010, October 23, 2010 - October 24, 2010, Sanya, China, 2010, vol. 1, pp. 359–363.

[68] G. Bhattacharya, K. Ghosh, and A. S. Chowdhury, “An affinity-based new local distance function and similarity measure for kNN algorithm,” Pattern Recognition Letters, vol. 33, no. 3, pp. 356–363, 2012.

[69] M. J. Meena and K. R. Chandran, “Naive Bayes text classification with positive features selected by statistical method,” in 2009 First International Conference on Advanced Computing (ICAC 2009), 13-15 Dec. 2009, Piscataway, NJ, USA, 2009, pp. 28–33.

[70] S. Bianco, G. Ciocca, and C. Cusano, “Color constancy algorithm selection using CART,” in Computational Color Imaging. Second International Workshop, CCIW 2009, 26-27 March 2009, Berlin, Germany, 2009, pp. 31–40.

[71] D. Selvathi and V. S. Sharnitha, “Thyroid Classification and Segmentation in Ultrasound Images Using Machine Learning Algorithms,” in 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN 2011), 21-22 July 2011, Piscataway, NJ, USA, 2011, pp. 836–41.

[72] Y. Choi, A. S. Ralhan, and S. Ko, “A study on Machine Learning Algorithms for Fall Detection and Movement Classification,” in 2011 International Conference on Information Science and Applications (ICISA 2011), 26-29 April 2011, Piscataway, NJ, USA, 2011, p. 8 pp.

[73] C. K. Yoo and K. V. Gernaey, “Classification and diagnostic output prediction of cancer using gene expression profiling and supervised machine learning algorithms,” Journal of Chemical Engineering of Japan, vol. 41, no. 9, pp. 898–914, 2008.

[74] B. Reisberg, R. Doody, A. Stöffler, F. Schmitt, S. Ferris, and H. J. Möbius, “Memantine in moderate-to-severe Alzheimer’s disease,” New England Journal of Medicine, vol. 348, no. 14, pp. 1333–1341, 2003.

[75] Y. Zu-Qiao, X. Xiao-Hong, and G. Han-ping, “An Improved DM Algorithm Based on Rough Set Theory,” in International Conference on Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007, 2007, pp. 3097 –3100.

[76] F. Palacios, X. Li, and L. E. Rocha, “Data Mining based on CMAC Neural Networks,” in 2006 3rd International Conference on Electrical and Electronics Engineering, 2006, pp. 1 –4.

[77] X. Chen, Y. Ye, G. Williams, and X. Xu, “A survey of open source data mining systems,” Emerging Technologies in Knowledge Discovery and Data Mining, pp. 3–14, 2007.

[78] V. Sugumaran and K. I. Ramachandran, “Effect of number of features on classification of roller bearing faults using SVM and PSVM,” Expert Systems with Applications, vol. 38, no. 4, pp. 4088–4096, Apr. 2011.

[79] X. Xie, J. W. K. Ho, C. Murphy, G. Kaiser, B. Xu, and T. Y. Chen, “Testing and validating machine learning classifiers by metamorphic testing,” Journal of Systems and Software, vol. 84, no. 4, pp. 544–558, Apr. 2011.

- 58 -

[80] X. Xie, J. Ho, C. Murphy, G. Kaiser, B. Xu, and T. Y. Chen, “Application of metamorphic testing to supervised classifiers,” in Quality Software, 2009. QSIC’09. 9th International Conference on, 2009, pp. 135–144.

[81] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[82] M. A. Hall and G. Holmes, “Benchmarking attribute selection techniques for discrete class data mining,” Knowledge and Data Engineering, IEEE Transactions on, vol. 15, no. 6, pp. 1437–1447, 2003.

[83] Lian Yan, R. H. Wolniewicz, and R. Dodier, “Predicting customer behavior in telecommunications,” IEEE Intelligent Systems, vol. 19, no. 2, pp. 50–8, Mar. 2004.

[84] S. Zhang, “Shell-neighbor method and its application in missing data imputation,” Applied Intelligence, vol. 35, no. 1, pp. 123–133, 2011.

[85] G. Ilczuk and A. Wakulicz-Deja, “Data preparation for data mining in medical data sets,” in Transactions on Rough Sets VI, TRS, Tiergartenstrasse 17, Heidelberg, D-69121, Germany, 2007, vol. 4374 LNCS, pp. 83–93.

[86] S. Zhang, C. Zhang, and Q. Yang, “Data preparation for data mining,” Applied Artificial Intelligence, vol. 17, no. 5–6, pp. 375–381, 2003.

[87] R. Ramachandran, S. Graves, S. Movva, and X. Li, “Agent framework for intelligent data processing,” in 2004 IEEE International Geoscience and Remote Sensing Symposium Proceedings: Science for Society: Exploring and Managing a Changing Planet. IGARSS 2004, September 20, 2004 - September 24, 2004, Anchorage, AK, United states, 2004, vol. 1, pp. 164–167.

[88] T. Tamura, M. Tshji, Y. Higashi, M. Sekine, M. A. Kohdabashi, T. Fujimoto, and M. Mitsuyama, “New Computer-based Cognitive Function Test for the Elderly,” in Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE, 2006, pp. 692–694.

[89] S. Ramanujam, N. Purohit, and N. Bhoir, “Assessment of Cognitive Level for Alzheimer’s Using Neural Networks,” in Intelligent Systems, Modelling and Simulation (ISMS), 2011 Second International Conference on, 2011, pp. 13–18.

[90] T. Li, Q. Li, S. Zhu, and M. Ogihara, “A survey on wavelet applications in data mining,” SIGKDD Explor. Newsl., vol. 4, no. 2, pp. 49–68, Dec. 2002.

[91] T. A. Dickinson, J. White, J. S. Kauer, and D. R. Walt, “Current trends in `artificial-nose’ technology,” Trends in Biotechnology, vol. 16, no. 6, pp. 250–258, Jun. 1998.

[92] J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 233–240.

[93] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, “From contours to regions: An empirical evaluation,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 2294–2301.

[94] P. Arbelaez, “Boundary extraction in natural images using ultrametric contour maps,” in Computer Vision and Pattern Recognition Workshop, 2006. CVPRW’06. Conference on, 2006, pp. 182–182.

[95] M. Tucker, “Objective viewpoint: make your GUI swing,” Crossroads, vol. 6, no. 4, pp. 3–5, Jun. 2000.

[96] M. N. G. Malbrain, D. Chiumello, P. Pelosi, A. Wilmer, N. Brienza, V. Malcangi, D. Bihari, R. Innes, J. Cohen, P. Singer, A. Japiassu, E. Kurtop, B. De Keulenaer, R. Daelemans, M. Del Turco, P. Cosimini, M. Ranieri, L. Jacquet, P.-F. Laterre, and L. Gattinoni, “Prevalence of intra-abdominal hypertension in critically ill patients: a multicentre epidemiological study,” Intensive Care Medicine, vol. 30, no. 5, pp. 822–829, 2004.

- 59 -

[97] D. Wang and L. Jiang, “An Improved Attribute Selection Measure for Decision Tree Induction,” in Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007, 2007, vol. 4, pp. 654 –658.

[98] J. Li and X. Chen, “Online learning algorithms of direct support vector machine,” International Journal of Advancements in Computing Technology, vol. 3, no. 11, pp. 486–497, 2011.

- 60 -

Appendix A

This is a prototype which used for AD diagnosis using machine learning algorithms. In this prototype a

machine learning algorithm called J48 is implemented. Based on the pattern learned from the dataset

provided (10,000 instance of data), the algorithm diagnosis the patient and predict the stage of the

patient.

How to use the system

1. Run the program (jar file inside the dist folder named ‘ADAppl_1’). 2. Insert your data as indicated below. 3. Press classify button to see the diagnosis result.

Figure 20 : The prototype interface with input

4. The application use the J48 algorithm to classify the patient based on the pattern acquired from the dataset. The result of the diagnosis displayed on the right top after the label CLASS: .

- 61 -

Figure 21 : The prototype interface with input and classification result

- 62 -

Appendix B

Classification of J48 on MH dataset

Classification of J48 on CF dataset

- 63 -

Classification of J48 on both MH and CF dataset

alzheimer's disease stage prediction using machine ...829245/fulltext01.pdf ·...

Documents