‘oagait’: a decision support system for grading knee...

‘OAGAIT’: A DECISION SUPPORT SYSTEM FOR GRADING KNEE OSTEOARTHRITIS USING GAIT DATA

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF INFORMATICS

OF THE MIDDLE EAST TECHNICAL UNIVERSITY

BY

NĐGAR ŞEN KÖKTAŞ

IN PARTIAL FULLFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

IN THE DEPARTMENT OF INFORMATION SCIENCE

JANUARY 2008

Approval of the Graduate School of Informatics

______________________ Prof. Dr. Nazife BAYKAL

Director I certify that this thesis satisfies all the requirements as a thesis for the degree of Doctor of Philosophy.

______________________ Assoc. Prof. Dr. Yasemin YARDIMCI

Head of Department This is to certify that we have read this thesis and that in our opinion it is fully adequate, in scope and quality, as a thesis for the degree of Doctor of Philosophy.

______________________ Prof. Dr. Neşe YALABIK

Supervisor Examining Committee Members Prof. Dr. Volkan ATALAY (METU, CENG) _______________ Prof. Dr. Neşe YALABIK (METU, CENG) _______________ Assoc. Prof. Dr. Erkan MUMCUOĞLU (METU, II) _______________ Assoc. Prof. Dr. Sibel TARI (METU, CENG) _______________ Assoc. Prof. Dr. Güneş YAVUZER (Ankara Unv., Med. Fac.) _______________

iii

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last Name: Nigar ŞEN KÖKTAŞ

Signature:__________________________

iv

ABSTRACT

‘OAGAIT’: A DECISION SUPPORT SYSTEM FOR GRADING KNEE

OSTEOARTHRITIS USING GAIT DATA

Şen Köktaş, Nigar

Ph.D., Department of Information Systems

Supervisor: Prof. Dr. Neşe Yalabık

January 2008, 142 pages

Gait analysis is the process of collecting and analyzing quantitative information

about walking patterns of the people. Gait analysis enables the clinicians to

differentiate gait deviations objectively. Diagnostic decision making from gait

data only requires high level of medical expertise of neuromusculoskeletal system

trained for the purpose. An automated system is expected to decrease this

requirement by a ‘transformed knowledge’ of these experts.

This study presents a clinical decision support system for the detecting and

scoring of a knee disorder, namely, Osteoarthritis (OA). Data used for training

and recognition is mainly obtained through Computerized Gait Analysis software.

Sociodemographic and disease characteristics such as age, body mass index and

v

pain level are also included in decision making. Subjects are allocated into four

OA-severity categories, formed in accordance with the Kellgren-Lawrence scale:

“Normal”, “Mild”, “Moderate”, and “Severe”.

Different types of classifiers are combined to incorporate the different types of

data and to make the best advantages of different classifiers for better accuracy. A

decision tree is developed with Multilayer Perceptrons (MLP) at the leaves. This

gives an opportunity to use neural networks to extract hidden (i.e., implicit)

knowledge in gait measurements and use it back into the explicit form of the

decision trees for reasoning.

Individual feature selection is applied using the Mahalanobis Distance measure

and most discriminatory features are used for each expert MLP. Significant

knowledge about clinical recognition of the OA is derived by feature selection

process. The final system is tested with test set and a success rate of about 80% is

achieved on the average.

Keywords: Gait analysis, grading knee OA, combining classifiers, clinical

decision support systems

vi

ÖZ

‘OAGAIT’: YÜRÜYÜŞ VERĐLERĐ KULLANARAK DĐZ OSTEOARTRĐTĐ

DERECELENDĐRMESĐ ĐÇĐN BĐR KARAR DESTEK SĐSTEMĐ

Şen Köktaş, Nigar

Doktora, Bilişim Sistemleri

Tez Yöneticisi: Prof. Dr. Neşe Yalabık

Ocak 2008, 142 sayfa

Yürüyüş analizi insanların yürüyüş örüntüleri hakkında niceliksel bilgi toplama ve

analiz etme yöntemidir. Yürüyüş analizi klinik uzmanların yürüyüşteki sapmaları

objektif olarak ayırt etmelerini sağlar. Sadece yürüyüş verisi ile tanısal karar

vermek kas-sinir-iskelet sistemi hakkında ileri medikal uzmanlık gerektirir.

Otomatik bir sistemin uzmanların “dönüştürülmüş bilgileri” ile bu ihtiyacı

azaltacağı beklenmektedir.

Bu çalışma bir diz rahatsızlığı olan Osteoartrit’in, tespiti ve derecelendirilmesi

için tasarlanan bir klinik karar destek sistemini sunmaktadır. Öğrenme ve tanıma

için kullanılan veri bir Bilgisayarlı Yürüyüş Analizi yazılımı yolu ile toplanmıştır.

vii

Sosyodemografik ve yaş, vücut kitle endeksi ve ağrı seviyesi gibi hastalık

karakteristikleri de karar verme sürecine dahil edilmiştir. Kişiler Kellgren-

Lawrence ölçeğine göre dört OA-şiddet derecesine ayrılmıştır: “Normal”, “Hafif”,

“Orta” ve “Şiddetli”.

Farklı türlerde verileri kapsamak ve daha iyi doğruluk oranları için farklı

sınıflandırıcılar birleştirilmiştir. Yapraklarına Çok Katmanlı Algılayıcılar

yerleştirilen bir karar ağacı geliştirilmiştir. Bu yöntem, sinir ağlarını yürüyüş

ölçülerinde saklı bilgileri çıkarma ve bunları karar ağacının açık biçimine

sebeplendirme için geri bildirme fırsatı verir.

Mahalanobis Uzaklığı ölçüsünü kullanarak bireysel öznitelik seçme uygulanmış

ve her bir uzman Çok Katmanlı Algılayıcılar için en ayırt edici öznitelikler

kullanılmıştır. Öznitelik seçme sürecinde OA’nın klinik tanınması hakkında

önemli bilgiler çıkarılmıştır. Üretilen son sistem test verisi ile test edilmiş ve

ortalamada yaklaşık %80 başarı oranı elde edilmiştir.

Anahtar Kelimeler: Yürüyüş analizi, diz OA’sı derecelendirilmesi, birleşik

sınıflandırıcılar, klinik karar destek sistemleri

viii

DEDICATION

To my beloved husband

Ersoy KÖKTAŞ

ix

ACKNOWLEDGMENTS

I would like to express my deep and sincere gratitude to my supervisor, Prof. Neşe

YALABIK. Her understanding, encouraging and personal guidance have provided

a good basis for the present thesis. We have not only performed academic

researches together, but also enjoyed social activities like cooking, traveling,

going out, chatting and more. Her way of thinking, communicating and

motivating have been of great value for me during our eight-year togetherness

which I hope to be everlasting.

I owe special thanks to Prof. Volkan ATALAY and Assoc. Prof. Güneş

YAVUZER who have always welcomed me whenever I had questions throughout

the whole progress.

I am also grateful to my committee members Assoc. Prof. Erkan MUMCUOĞLU

and Assoc. Prof. Sibel TARI for reading the thesis and for their constructive

comments.

I am sincerely grateful to Professor Robert P.W. DUIN, from Delft University of

Technology Netherlands, for providing a research opportunity at his department

and for providing support and guidance during my stay in Delft.

I also want to thank to my friends, especially to Pınar ONAY DURDU, Habil

KALKAN, Fatih NAR, Serkan ŞIK, Burcu AKMAN and Sibel GÜLNAR for

their endless support and motivation during our lunches and coffee breaks. Special

thanks go to Bülent ÖZTÜRK for his being a gracious friend and for his

invaluable technical supports. I also would like to acknowledge my dear friends

Semra YILMAZ, Ahmet TAŞTEKĐN, Yalçın BOSTAN and Mine IŞIKSAL.

x

They have created a friendly atmosphere and filled my life with fun whenever I

needed.

I express thanks to my friends in Image Processing and Pattern Recognition

Laboratory, Đsmet YALABIK, Özge ÖZTĐMUR, Aykut ERDEM, Erkut ERDEM

and Oral DALAY for creating a warm and friendly environment since my first

day in the department.

Thanking my family never be enough; I express my special appreciation to my

parents and sisters for their encouragements and supports throughout this study

and whole my life.

Finally, thanks go to my beloved husband Ersoy KÖKTAŞ. Without his patience,

understanding and motivation it was impossible to finish this study.

xi

TABLE OF CONTENTS

ABSTRACT........................................................................................................... iv

ÖZ ......................................................................................................................... vi

DEDICATION ..................................................................................................... viii

ACKNOWLEDGMENTS ..................................................................................... ix

TABLE OF CONTENTS....................................................................................... xi

LIST OF TABLES ................................................................................................ xv

LIST OF FIGURES ............................................................................................ xvii

LIST OF ABBREVIATIONS............................................................................. xvii

CHAPTER

1. INTRODUCTION .............................................................................................. 1

1.1. GAIT ANALYSIS .................................................................................... 1

1.2. OSTEOARTHRITIS................................................................................. 3

1.3. LITERATURE SURVEY ABOUT GAIT CLASSIFICATION .............. 5

1.4. OBJECTIVES AND CONTRIBUTIONS OF THE STUDY................... 8

1.5. SUPPORTING PROJECT...................................................................... 10

1.6. ORGANIZATION OF THE WORK...................................................... 11

2. OSTEOARTHRITIS AND DATA PROPERTIES........................................... 14

2.1. OSTEOARTHRITIS............................................................................... 14

2.1.1. Causes .............................................................................................. 14

xii

2.1.2. Symptoms ........................................................................................ 15

2.1.3. Diagnosis ......................................................................................... 16

2.1.4. Treatment ......................................................................................... 18

2.2. DATA COLLECTION METHODS....................................................... 20

2.3. PROPERTIES OF GAIT DATA ............................................................ 23

2.4. DATA STORAGE.................................................................................. 31

3. PATTERN RECOGNITION METHODS ........................................................ 33

3.1. FEATURES AND DIMENSIONALITY REDUCTION ....................... 33

3.1.1. Feature Extraction............................................................................ 34

3.1.2. Feature selection .............................................................................. 37

3.2. CLASSIFIERS........................................................................................ 39

3.2.1. Tree Classifiers ................................................................................ 39

3.2.2. Neural networks............................................................................... 45

3.2.3. Combining Classifiers...................................................................... 49

3.2.4. Combination Schema....................................................................... 51

4. FEATURE SELECTION AND CLASSIFICATION (GRADING)................. 54

4.1. INTRODUCTION .................................................................................. 54

4.2. STATISTICAL ANALYSIS OF GAIT DATA: COMPARISON OF

FEATURE REDUCTION/SELECTION AND CLASSIFICATION

ALGORITHMS ...................................................................................... 55

4.2.1. Feature reduction and selection methods......................................... 56

4.2.2. Comparing Classifiers...................................................................... 61

4.2.3. Results of comparisons .................................................................... 63

4.3. COMBINING MLPS FOR GAIT CLASSIFICATION ......................... 64

4.3.1. Dataset Properties ............................................................................ 65

4.3.2. Classification by Combining MLPs................................................. 67

xiii

4.4. A DECISION TREE-MLP MULTICLASSIFIER FOR GRADING

KNEE OA ............................................................................................... 71

4.4.1. Preprocessing of data ....................................................................... 74

4.4.2. Feature Reduction and Selection ..................................................... 75

4.4.3. Combining classifiers ...................................................................... 79

4.4.4. Training and testing ......................................................................... 82

5. IMPLEMENTATION AND RESULTS........................................................... 85

5.1. INTRODUCTION .................................................................................. 85

5.2. CREATING THE DECISION TREE..................................................... 85

5.3. FEATURE REDUCTION AND SELECTION PROCESSES............... 90

5.4. IMPLEMENTATION OF MLPS ........................................................... 94

5.5. RESULTS ............................................................................................... 96

5.6. CLASSIFICATION EXAMPLES........................................................ 101

5.7. OVERALL ANALYSIS OF THE RESULTS...................................... 103

6. OAGAIT DECISION SUPPORT SYSTEM .................................................. 105

6.1. CLINICAL DECISION SUPPORT SYSTEMS................................... 105

6.2. PROPERTIES OF A GOOD CDSS ..................................................... 108

6.3. FEATURES OF OAGAIT.................................................................... 110

6.4. OAGAIT DATABASE......................................................................... 114

6.5. DATA FLOW DIAGRAMS................................................................. 119

6.6. EVALUATION OF OAGAIT AS A CDSS ......................................... 122

7. CONCLUSION AND FUTURE DIRECTIONS............................................ 125

7.1. PROPERTIES OF OAGAIT SYSTEM................................................ 125

7.2. EVALUATION OF THE GRADING ALGORITHM ......................... 126

7.3. LIMITATIONS OF THE STUDY........................................................ 127

7.4. SUGGESTIONS FOR FUTURE WORK............................................. 128

xiv

REFERENCES.................................................................................................... 130

APPENDICES .................................................................................................... 137

A. AN EXAMPLE TO EXCEL FILE OF A SUBJECT................................. 137

VITA ................................................................................................................... 140

xv

LIST OF TABLES

Table 1.1: Stages of the study and results ............................................................. 10

Table 2.1: Limits of the personal features............................................................. 24

Table 2.2: Properties of the used gait attributes.................................................... 28

Table 4.1: Mahalanobis Distance values of gait attributes ................................... 58

Table 4.2: An example to feature selection by distmaha function of PRTools..... 59

Table 4.3: Datasets after feature reduction and selection processes ..................... 59

Table 4.4: Dataset characteristics.......................................................................... 66

Table 4.5: Properties of MLPs used in experiment 1............................................ 68

Table 4.6: Properties of MLPs used in experiment 2............................................ 69

Table 4.7: Success rates (number and percentage) for combining rules............... 69

Table 4.8: Confusion matrices for used MLPs...................................................... 70

Table 4.9: Limits of the personal features............................................................. 74

Table 4.10: Levels of the selected gait attributes for each binary class case ........ 77

Table 4.11: Selection of gait attributes using MD values ..................................... 78

Table 4.12: Tenfold Crossvalidaton testing .......................................................... 84

Table 5.1: Tree construction algorithm................................................................. 86

Table 5.2: Pruning algorithm ................................................................................ 89

Table 5.3: Selected gait attributes for each two-class case ................................... 93

Table 5.4: Classification errors of MLPs .............................................................. 95

Table 5.5: Distribution of samples in decision tree............................................... 98

Table 5.6: Confusion matrix for combination (training data) ............................... 99

Table 5.7: Confusion matrix for combination (test data).................................... 100

Table 5.8: Confusion matrix for MLP4............................................................... 100

Table 5.9: Examples of the test set classification ............................................... 102

xvi

Table 6.1: Features of a good CDSS................................................................... 109

Table 6.2: Features of OAGAIT compared to the ones suggested in ................. 123

xvii

LIST OF FIGURES

Figure 1.1: XR image of a normal and OA affected knee joint ............................. 4

Figure 1.2: Flowchart for learning and classification phases................................ 12

Figure 1.3: Flowchart for feature reduction and selection processes.................... 13

Figure 2.1: OA of the knee ................................................................................... 18

Figure 2.2: Replacing knee operation .................................................................. 20

Figure 2.3: Data collection with stereo metric method (visible markers are

attached to body of the subject) ................................................................... 22

Figure 2.4: Data collection (a: gait analysis laboratory, b: a subject walking on the

platform)........................................................................................................ 25

Figure 2.5: 3D analysis of human body ................................................................ 26

Figure 2.6: VICON Software marker adjustment screen ...................................... 27

Figure 2.7: Calculation of kinetic variables ......................................................... 27

Figure 2.8: Examples to temporal gait attributes (data set C)............................... 30

Figure 2.9: Patient recording form........................................................................ 32

Figure 3.1: Averaging five consecutive time samples of two gait waveforms ..... 36

Figure 3.2: An example to decision tree classifier................................................ 40

Figure 3.3: Examples to a) balanced and b) unbalanced trees .............................. 42

Figure 3.4: A simple neuron model....................................................................... 46

Figure 3.5: An example to multilayer perceptron with one hidden layer. ............ 47

Figure 3.6: Examples to classification regions by one, two and three layer MLPs 48

Figure 3.7: Reasons why an ensemble classifier may be better than an individual

one................................................................................................................. 50

Figure 3.8: Approaches for building ensemble classifiers (differentiation at each

level may be considered as an approach)...................................................... 52

xviii

Figure 4.1: Error rates of the classifiers for the a) averaged datasets b) FFT

applied datasets ............................................................................................. 62

Figure 4.2: Learning curves for some classifiers .................................................. 63

Figure 4.3: Graphs of the gait data (healthy (a) and knee osteoarthritis (b)) ....... 66

Figure 4.4: MLP combination schemas for experiment 1 (a), and experiment 2 (b)

H/S: Healthy or sick, FV: Feature vector...................................................... 68

Figure 4.5: Flowchart for classifier design process .............................................. 73

Figure 4.6: Flowchart for feature reduction and selection processes.................... 76

Figure 4.7: Flowchart for classifier combining..................................................... 81

Figure 4.8: Proposed example combination in a simplified symbolic

representation. ............................................................................................... 84

Figure 5.1: An example to created decision tree (no pruning).............................. 88

Figure 5.2: Cost (classification error) versus tree size.......................................... 90

Figure 5.3: Averaging consecutive time samples for feature reduction................ 91

Figure 5.4: Dataset dimension versus misclassification rate (a. averaged datasets,

b. FFT applied datasets) ................................................................................ 92

Figure 5.5: ROC curves for MLPs ........................................................................ 96

Figure 5.6: Composed decision tree by prune level 3........................................... 97

Figure 6.1: Patient recording form...................................................................... 111

Figure 6.2: Patient tracking form ........................................................................ 112

Figure 6.3: WOMAC form.................................................................................. 113

Figure 6.4: Grading screen.................................................................................. 114

Figure 6.5: ER diagram for OAGAIT database .................................................. 115

Figure 6.6: Table: gait_data ................................................................................ 116

Figure 6.7: Table: Womac_results ...................................................................... 117

Figure 6.8: Table: time_dist ................................................................................ 118

Figure 6.9: Table: personal_info ......................................................................... 118

Figure 6.10: Table: measure_data ....................................................................... 119

Figure 6.11: Level 1 DFD of the OAGAIT for data recording........................... 120

Figure 6.12: DFD for patient tracking process ................................................... 121

Figure 6.13: DFD for patient grading process .................................................... 122

xix

LIST OF ABBREVIATIONS

OA Osteoarthritis

MRI Magnetic Resonance Imaging

NN Neural Network

FFT Fast Fourier Transform

CCD Charge Coupled Device

VCM Vicon Clinical Manager

GRF Ground Reaction Force

WOMAC Western Ontario and McMaster University Osteoarthritis Index

PCA Principle Component Analysis

DFT Discrete Fourier Transform

ANN Artificial Neural Network

MLP Multilayer Perceptron

CDSS Clinical Decision Support System

CDDSS Clinical Diagnostic Decision Support System

DSS Decision Support System

BMI Body Mass Index

DFD Data Flow Diagram

MD Mahalanobis Distance

XR Roentgen Rays

1

CHAPTER 1

1. INTRODUCTION

1.1. GAIT ANALYSIS

Gait analysis is the process of collecting and analyzing quantitative information

about walking patterns of the people. Gait analysis, when considered as an

automated system, is usually used for two major applications: human

identification and clinical applications.

Human identification is an important security issue. In most cases it is not so easy

to determine the identity of the person but many applications work well for some

special cases, such as gender classification [1], age classification [2] etc. In most

of these studies silhouettes are obtained from image sequences and required

features are gathered from these [3-10]. Since exact human id verification requires

more complex systems and huge amount of data, these studies are at their initial

phases [3-10].

The application of gait analysis in medicine is also a well-studied subject. There

are studies that have shown that the number of surgical procedures is reduced

after a three-dimensional (3-D) gait analysis [11-13]. Moreover, gait analysis is

important for orthopedists to develop a treatment plan or to track the improvement

of persons having gait problems (i.e. Parkinson, cerebral palsy, arthritis [13-20]).

Examples of the application area of clinical gait analysis include [11-15]:

2

• the assessment of orthopedic diseases’ progression to aid in the

determination of appropriate surgical or orthotic intervention

• the examination of the progression of neuromuscular disorders such as

Parkinson's or muscular dystrophy

• the quantification of the effects of surgery, that is, pre and post-operative

patterns

• the evaluation of the effectiveness of prosthetic joint replacement

• the examination of improvements in orthotic design and

• the quantification of changes in prosthetic design

Gait analysis enables the clinicians to differentiate gait deviations objectively. It

serves not only as a measure of treatment outcome, but also as a useful tool in

planning ongoing care of various neuromuskuloskeletal disorders such as cerebral

palsy, stroke, Osteoarthritis (OA), as a support to other approaches such as X-

rays, MRI, chemical tests etc. Gait process is realized in a ‘gait laboratory’ by the

use of computer-interfaced video cameras to measure the patient’s walking

motion by the use of electrodes placed on the skin to follow muscle activity, and

by the use of force platforms embedded in a walkway to monitor the forces and

torques produced between the patient and the ground. Resultant data (such as knee

angle/time) is tabulated in graphic/numerical forms by commercial software.

Storing gait data makes the comparison of patients to each other (i.e. normal and

patient gait classification) and to themselves (examining improvements of patient

by previous data) possible.

In addition to temporal changes of joint angles and ground reaction force data,

time-distance parameters of the gait such as velocity, cadence, stride length, step

length are recorded. It is not possible to detect the resultant biomechanical

musculoskeletal characteristics using other approaches such as the radiographic

(X-ray, computerized tomography and/or MRI) evaluations, which makes gait

analysis a preferable tool for many cases.

3

If the physician him/herself interprets the gait data for clinical decision making,

then it may be called a “non-automated procedure” and “automated procedure” if

it is interpreted partially by any kind of decision support software. Non

automated decision making from gait data only requires high level of expertise of

neuromusculoskeletal system trained for the purpose. An automated system is

expected to decrease this requirement by a ‘transformed knowledge’ of these

experts. This way, clinicians’ time is saved, and the probability of human errors is

decreased. Automated gait analysis in medicine may also be used as a consultative

and educational tool.

In this study, ‘Knee Osteoarthritis (OA)’ is chosen as an application. OA severity

levels are established through the Kellgren-Lawrence radiographic grading system

[21]. The decision support system showed here aims to guess the grade of the

illness without need of radiography which is a more expensive system and also

may have invasive side effects [22].

1.2. OSTEOARTHRITIS

OA is a disorder that affects joint cartilage and surrounding tissue that shows

itself by pain, stiffness and loss of function [22]

This disease occurs mostly because of cartilage deformations. Bone can overgrow

at the edges of the affected joint and bumps can be seen and felt. All the

components of the joint deteriorate in some ways and so alter the structure of the

joint. OA usually begins with one of a few joints and most often gradually

increase. Earliest symptom is the pain which is worsened by weight bearing and

relieved by rest. Stiffness is felt after some inactivity and lessens with movement.

As OA progresses, joint motion becomes restricted, and tenderness and crepitus

may appear.

The muscles surrounding and supporting the joint (such as knee) may stretch. So

the joint becomes unstable and stiff, and loses it range of motion. Touching or

moving the joint (particularly when standing, climbing stairs, or walking) can be

4

very painful. Figure 1.1 shows X-Ray (XR) images of a normal and an OA

affected knee joint. The narrowing of the sick joint can be clearly seen.

Figure 1.1: XR image of a normal and OA affected knee joint [22]

Diagnosis of the disease is made according to symptoms, physical examination

and XR images. XR images show evidence of OA especially in weight-bearing

joints such as hip and knee. Kellgren-Lawrence is a method used for radiological

assessment of OA [21]. According to this method OA is divided into five grades

as follows:

•••• Grade 0 indicates a definite absence of x-ray changes of OA.

•••• Grade 1, doubtful narrowing of joint space and possible outgrowth of the

bone;

•••• Grade 2, definite outgrowth of the bone and possible narrowing of joint

space;

•••• Grade 3, moderate multiple outgrowths, definite narrowing of joints space,

some sclerosis and possible deformity of bone contour;

•••• Grade 4, large outgrowths, marked narrowing of joint space, severe

sclerosis and definite deformity of bone contour.

5

Postural exercises for stretching and strengthening are advised for treatment of the

OA. Exercises may help maintain healthy cartilage, increase range of motion, and

strengthen surrounding muscles. Soft chairs, recliners, mattresses, and car seats

may worsen symptoms. Specific exercises may be needed for OA of the spine.

Exercises should include muscle strengthening and low impact aerobic exercises

(such as walking, swimming, and bicycle riding).

Physical therapy is another effective treatment method. Heat improves muscle

function by reducing stiffness and muscle spasm. Massage by trained therapists

and deep heat treatment may be useful. Cold may be applied to reduce pain.

Drugs are used to reduce the symptoms and thus allow more appropriate

exercises. If a sudden injury occurs the fluid inside the joint may be removed and

a form of cortisone may be injected directly to the joint. But this treatment is not a

long term relief. Another injection method is done by hyaluronate which is a

component of a normal joint fluid. This method may provide significant pain

relief for longer time periods.

The replacement of the damaged knee joint with an artificial joint is a surgical

operation applied for treatment of OA of the knee. Surgery may help when all

other treatments fail to relieve pain. It is usually very successful to improve

motion and decrease pain. Since the artificial joint does not last forever, joint

replacement should be considered when function becomes limited.

1.3. LITERATURE SURVEY ABOUT GAIT CLASSIFICATION

A large number of studies are conducted for gait classification. Since gait data is

high dimensional and complex, to design a complete decision support system may

require a combination of all available features. In non automated traditional

systems physicians make decisions about the illnesses by interpreting all available

data. On the other hand, most of the automated systems ignore history and

symptoms of the patient, such as age, pain grade, family history. There are some

known facts about the effects of some factors to cause or to develop OA [20-30],

such as occurring in the same frequency in both sexes before the age of 55 but

6

being more common in women after 55. Obesity places people (particularly

women) at increased risk for osteoarthritis because of increased weight on the

joints. Injury from different sources can also contribute to osteoarthritis. Repeated

minor injuries or a single injury to a joint may change the normal joint structure.

A genetic defect may promote breakdown of the protective architecture of

cartilage. Actually, in traditional non automated clinical decision making,

physicians listen to the patient and use this kind of qualitative information in

addition to the lab test. So this kind of non-numeric information should also be

included in the decision making process.

The aim of pattern recognition research for clinical gait analysis is to find ways to

support doctors in decision making and treating patients using gait data. Standard

movement patterns are produced for healthy walkers, and then these patterns

served as a baseline for examination of the walking pattern of patients’ gait with

abnormalities or diseases. The interpretations of quantitative gait data are

experimented by pattern recognition techniques before [31, 32]. Most popular of

these are neural networks (NNs) [33-40] and support vector machines (SVMs) [2,

41]. The use of NNs for experimental gait classification is not new. There are

studies in which NNs are trained by force platform data to distinguish ‘healthy’

from ‘pathological’ gait [34, 35, 40]. In addition to these, people are identified

among a few subjects (less than 10) by using joint angles as features [33, 36, 38,

39]. These studies produce reasonable results for NNs use in gait classification. A

few of these studies will be explained in more detail in following paragraphs.

Kohle et al. [34] categorized gait pathology based on the ground reaction forces.

They measured two successive ground reaction forces of 131 subjects having

various diseases like calcaneus fracture, and limb deficiencies. 94 normal

subjects’ data was also gathered. FFT coefficients of vertical components of the

two ground reaction forces were used as inputs to a standard network with one

hidden layer. The accuracy of this network in discriminating healthy from

pathological gait was 95%. This study summarized that simple two-category gait

classification with large number of input parameters is achievable with neural

networks.

7

A similar study conducted by Barton and Lees [36] extended the classification

problem to a three class case. A neural network with two hidden layers is used to

categorize maximum value of ground reaction forces into one of three categories:

healthy feet, pes cavus (a deformity of the foot characterized by an abnormally

high arch and hyperextension of the toes) and hallux (big toe) valgus. The

pressure patterns of 18 subjects were recorded and scaled to a size and normalized

to the interval [0, 1]. The number of inputs of network reached to 1316 which is

much more than the previous examples. The accuracy of the network was claimed

to vary between 77% and 100% based on the size of the train and test set. Since

the conditions classified here can be identified by routine medical exams, the

advantage of the proposed system was not clearly understandable. In second stage

of the study hip-knee joint angles of the eight healthy subjects were calculated via

a set of four reflective markers. They mentioned that hip–knee joint angle

diagrams are characteristic of a subject’s gait pattern and so could be used for

automated identification of gait patterns. Subjects were walked on a walking

platform under three different conditions; normal walking, simulated leg length

difference and simulated leg weight difference. The angles were normalized in

time; Fourier transformed and normalized to the interval [0, 1]. A neural network

again with two hidden layers is used and 83.3% average accuracy rate was

achieved.

In another study Lafuente et al. [40] used a standard feedforward neural network

with one hidden layer to classify four-category gait patterns. They collected data

from 148 subjects with ankle, knee or hip arthritis and 88 normal subjects without

limb pathology. The features consisted of cadence, velocity and five kinetic

magnitudes. A three layered network was trained by these inputs to discriminate

four classes and an accuracy of 80% was reached.

As a summary these studies have shown the potential for multi-category

classification of the gait patterns. However, if the aim of the classification is

medical usage, more detailed classification problems (i.e. grading of a disease)

with high dimensional and diverse data arise.

8

Unfortunately, as the dimension of the obtained data increases accuracies may

deteriorate. Recently, the focus has been on combining several classifiers and

getting a consensus of results for better accuracy [42, 43]. Today, combining

methods are preferred for many well known pattern recognition problems such as

character recognition, speech recognition [42-55] etc. On the other hand, decision

trees have been widely used for medical decision making processes [42, 56-62].

They have not been applied to gait data analysis but have demonstrated potential

in analyzing gait data [31]. These two classifiers neural networks and decision

trees have their specific advantages and disadvantages, and most of these

characteristics complement each other. Hence, advantages of both approaches

might be utilized if combined properly.

Automatic feature selection from many numerical gait parameters is another

subject that’s not studied well in medical applications. Actually, there are many

medical practices, testing the variations in the gait attributes which are caused by

the related disease [20-30]. Selection of attributes is usually done by using the

result of these studies by expert clinicians. However, the judgments may vary in

different experts leading to the different interpretations. Obviously, automated

selection lessens the dependence and the load on the experts and gives more

freedom to the researchers.

1.4. OBJECTIVES AND CONTRIBUTIONS OF THE STUDY

The main objective of this study is to design a decision support system to help

physicians by supplying accurate and practical ways to interpret the gait data and

further follow the progress of OA. Grading of the diseases is helpful for

physicians in treatment and operation plans. The treatment plans for knee OA are

made according to the grade of the illness [20, 21, 26-28]. This grading is usually

done by physicians using radiographic films of the knee. The patient is also

walked on the gait laboratory if available and the data is used as assistive to the

radiography results. Actually the classification of sick and normal subjects has

been studied before [18, 19, 34-39], but the grading of the OA is new. The

decision support system developed by this study is expected to be a supportive

9

tool for grading and treatment planning of knee OA. Moreover computerizing the

current system is expected to provide some additional benefits for gait laboratory

users. Since automated evaluation of the data does not require as much expertise

as manual evaluation, the usage of the gait laboratories may increase. In addition

the misdiagnosis occurring because of different comments of the different experts

are expected to be minimized.

Original contributions of the study are summarized below:

• The system supports a base for patient follow up in time. Gait patterns of

the subjects taken in different times can be compared so that more accurate

treatment plans are possible.

• Automated feature selection process reveals that gait measurements for

different parts of the body such as knee or hip to be more effective for

different scores of the OA. These results may be valuable for physicians

for effective decision making about treatment of OA and reasoning the

conclusion. Also, dependence and the load on the experts are reduced and

more freedom to the researchers is given by the automated feature

selection process.

• Another original contribution of the study is taking advantage of working

with a gait expert at each stage of the study. A physical medicine and

rehabilitation expert gives support to the study as both an expert and a

target user of the proposed decision support system. So, the results of the

data analysis process can be commented for further recognition of the

selected illness and helping the treatment.

• Since, all available various structured data is used in a hierarchical way for

design of the decision support system, it models the expert’s decision

making process well. The combined decision tree-MLP approach is also

expected to be applicable to similar type of medical decision making

processes, where both disease characteristics and clinical measurements

and tests are to be combined.

10

1.5. SUPPORTING PROJECT

This study is implemented as a research project supported by TÜBĐTAK. The

group is composed of physicians, computer scientists, PhD students and technical

personnel. The group communicated well at each stage of the study. Table 1.1

summarizes the milestones of the whole process.

Table 1.1: Stages of the study and results

Stages of the study Results

Problem definition

• Gait laboratory is seen and problems are listed

• Literature survey about previous studies are done

• Objectives of the study are determined

Requirements

analysis

• Hardware/software requirements of laboratory are determined (i.e. new PCs are bought, a database is created for easing data collection)

• A questionnaire is prepared for expert physicians to determine expected features of the system

• Face to face interviews are also implemented

Design

• Design of the system is made iteratively • Different feature reduction/selection and

classification methods are compared • New approaches are searched for better

classification (i.e. combining methods) • Results are discussed with expert,

suggested changes are made and iteration started again

11

Table 1.1 (cont)

Implementation

• Datasets are created by querying the database

• Preprocessing of the data (i.e. cleaning empty entries, converting non numeric features to numeric ones)

• Feature reduction/ selection methods are applied

• Two-class experiments are done by neural network methods

• Multi-class classification is done by combining decision trees and neural networks

Testing • System is tested by unseen data

After determining objectives of the study, requirements of the gait laboratory are

determined. The old PC is replaced with the new one, and then required software

is uploaded. Then, most important deficiency of the laboratory is determined as

need of a database. A complete database is created to keep all data together in one

system to ease the query processes. Some software interfaces are created to read

data from the current system and user friendly interfaces are supplied for

laboratory users.

1.6. ORGANIZATION OF THE WORK

The outline of the report is organized according to the steps of classifier design

process. A classifier topology which is similar to Mixture of Experts (MME)

approach based on decision trees and a number of Multilayer Perceptrons (MLPs),

each expert in separating two adjacent degrees is proposed. A scoring of the OA

(0-3), which specifies the degree of the disease, is used as the categories of the

classifier. Finally, the system is tested by unseen data and results are presented in

forward sections of this report. Figure 1.2 shows the stages in classifier design

process.

12

For better design of the classifier some different feature reduction and selection

methods are compared. Averaging method is selected for reduction and the

Mahalanobis Distance criterion is selected for feature selection. Further details

will be discussed in Chapter 4 of the report. Figure 1.3 summarizes the feature

reduction and selection processes.

Figure 1.2: Flowchart for learning and classification phases

Since the feature set is quite diverse new classification approaches like combining

classifiers are searched in the literature. Following the feature reduction and

selection processes a combination algorithm is created by using decision tree and

neural network approaches together.

13

Figure 1.3: Flowchart for feature reduction and selection processes

The remainder of the report is organized as follows. In the next section, data

collection and recording methods are discussed. Then in chapter 3 pattern

recognition approaches are summarized. In fourth chapter implementation and

analysis of the results are given. Detailed information about OAGAIT system is

discussed in chapter 5. Finally, conclusion and discussions are presented.

14

CHAPTER 2

OSTEOARTHRITIS AND DATA PROPERTIES

2.1. OSTEOARTHRITIS

OA is a disorder that affects joint cartilage and surrounding tissue that shows

itself by pain, stiffness and loss of function [20-23]. Although OA is mostly seen

in older people, it is not caused by the years of use. But, while the younger people

having few symptoms, the older ones develop significant disabilities [22].

2.1.1. Causes

Our joints are normally protected from wearing out by low friction levels,

provided by the cartilages between the bones. OA mostly begins with the

deformation of the cells that form the cartilage. Then, the cartilage may become

soft and cracks on the surface may be seen. Bone can overgrow at the edges of the

affected joint and bumps can be seen and felt. All the components of the joint

deteriorate in some ways and so alter the structure of the joint [20-22].

OA is classified into two groups; primary and secondary. If the cause of the

disease is not known, which is valid for most of the cases, it is called primary OA.

If the cause is another disease or condition like infection, deformity, injury, then it

is called secondary OA. Some people repetitively stress one joint because of their

15

jobs (i.e. coal miners, bus drivers) and so increase the risk of OA. Obesity may be

a major factor in the development of OA, particularly of the knee and especially in

women.

2.1.2. Symptoms

Usually, symptoms show themselves in one or a few joints at first. Most

commonly affected joints are hip, knee, fingers, neck, lower back and big toes.

Pain is the first symptom which usually caused by weight bearing activities.

Stiffness is another important symptom which is felt after some inactivity like

sleep [20-22, 25-30].

The affected joint may become less movable and it may be more difficult to

straighten or bend. The irregular cartilage surfaces cause joints to grind, grate, or

crackle when they are moved.

In some joints (such as the knee) the ligaments, which surround and support the

joint, may stretch. So the joint becomes unstable and stiff, and loses it range of

motion. Touching or moving the joint (particularly when standing, climbing stairs,

or walking) can be very painful.

For OA of the spine, the back pain is one of the most common symptoms.

Usually, damages of disks or joints in the spine cause only mild pain and stiffness.

However, OA in the neck or lower back can cause loss of sense, pain, and

weakness in an arm or leg. The overgrowth of bone may presses on the nerves

within the spinal canal or before they exit the canal to go to the legs. This leg pain

caused by this reason may be confused by the reduced blood supply to the legs.

OA may be stable for many years or may progress very rapidly, but most often it

progresses slowly after symptoms are seen. Many people develop some degree of

disability. In [22] some practical ways to live with OA are advised to the patients.

•••• Exercise affected joints gently (i.e. in a pool)

•••• Massage at and around affected joints (trained therapist would do it better)

16

•••• Apply a heating pad or a damp and warm towel to affected joints

•••• Maintain an appropriate weight (extra stress on joints may be dangerous)

•••• Use special equipment when necessary (for example, walker, neck collar,

or elastic knee support to protect joints from overuse)

•••• Wear well-supported shoes or athletic shoes

2.1.3. Diagnosis

The diagnosis is made according to characteristics of symptoms, physical

examination, and the XR images. The XR images of many people aged about 40

show some evidences of OA especially in weight-bearing joints such as the hip

and knee. However, XR is not very useful for detecting OA early because it does

not show changes in cartilage, which is where the earliest abnormalities occur.

Magnetic resonance imaging (MRI) can reveal early changes in cartilage, but it is

rarely used because of expensive cost. There are no blood tests for the diagnosis

of OA. But there are some researches about detection of hyaluronic acid within a

blood sample [22].

Kellgren-Lawrence is a method used for radiological assessment of OA. Kellgren

and Lawrence defined this scoring according to these radiological features [21]

• The formation of osteophytes on the joint margins or, in the case of the

knee joint, on the tibial spines.

• Periarticular ossicles; these were found chiefly in relation to the distal and

proximal interphalangeal joints.

• Narrowing of joint cartilage associated with sclerosis of subehondral bone.

• Small pseudocystic areas with sclerotic walls situated usually in the

subchondral bone.

• Altered shape of the bone ends, particularly in the head of femur.

17

According to these features OA is divided into five grades as follows:

• None

• Doubtful

• Minimal

• Moderate

• Severe

Grade 0 indicates a definite absence of x-ray changes of OA.

Grade 1, doubtful narrowing of joint space and possible osteophytic lipping;

Grade 2, definite osteophytes and possible narrowing of joint space;

Grade 3, moderate multiple osteophytes, definite narrowing of joints space, some

sclerosis and possible deformity of bone contour;

Grade 4, large osteophytes, marked narrowing of joint space, severe sclerosis and

definite deformity of bone contour.

Figure 2.1 shows XR images of knee joints affected by different grades of OA.

18

Figure 2.1: OA of the knee [21]

2.1.4. Treatment

Exercises like stretching, strengthening, and postural exercises may help maintain

healthy cartilage, increase range of motion, and strengthen surrounding muscles.

Exercises must be balanced with rest of painful joints, but immobilizing a joint

may make the joint worse. Using excessively soft chairs, recliners, mattresses, and

car seats may worsen symptoms; using car seats moved forward, straight-backed

chairs with relatively high seats, firm mattresses, and bed boards is often

recommended.

For osteoarthritis of the spine, specific exercises may help, and back supports may

be needed when pain is severe. Exercises should include muscle strengthening and

low impact aerobic exercises (such as walking, swimming, and bicycle riding).

19

The patient should try to continue his/her normal daily activities such as a hobby

or job.

Physical therapy, often with heat therapy can be helpful. Heat improves muscle

function by reducing stiffness and muscle spasm. Massage by trained therapists

and deep heat treatment may be useful. Cold may be applied to reduce pain.

Splints or supports (such as a cane, crutch, and brace) can protect specific joints

during painful activities. Shoe inserts (orthotics) may help reduce pain during

walking.

Drugs are used to supplement exercise and physical therapy. Drugs do not directly

alter the course of osteoarthritis; they are used to reduce symptoms and thus allow

more appropriate exercises.

If a joint suddenly becomes inflamed, swollen, and painful, most of the fluid

inside the joint may need to be removed and a special form of cortisone may be

injected directly into the joint. This treatment may provide only short-term relief,

and a joint treated with cortisone should not be used too often or damage may

result. A series of injections of hyaluronate (a component of normal joint fluid)

into the joint may provide significant pain relief in some people for longer periods

of time.

The knee replacement surgery is another treatment method which is applied when

the moves become limited. The damaged knee joint may be replaced with an

artificial joint. After a general anesthetic is given, ends of the thigh bone (femur)

and shinbone (tibia) are smoothed so that the parts of the artificial joint (prothesis)

can be attached more easily. One part of the artificial joint is inserted into the

thigh bone, and the other part into the shinbone and the parts are cemented in

place. Figure 2.2 shows the knee replacement operation [22].

20

Figure 2.2: Replacing knee operation [22]

Surgery may help when all other treatments fail to relieve pain. Some joints, most

commonly the hip and knee, can be replaced with an artificial joint. It is usually

very successful to improve motion and decrease pain. Therefore, joint

replacement should be considered when function becomes limited. Because the

artificial joint does not last forever, such surgery is often delayed as long as

possible in young people so the need for repeated replacements can be minimized.

A variety of methods that restore cells inside cartilage have been used in younger

people with OA to help cure small defects in cartilage. However, such methods

have not yet been proven valuable when cartilage defects are extensive, as

commonly occurs in older people.

2.2. DATA COLLECTION METHODS

Kaufman states five existing technologies for data collection for gait analysis

[12]:

• Electromechanical linkage method

• Stereo metric method

• Roentgen graphic method

• Accelerometer method

• Magnetic coupling method.

21

An exoskeleton apparatus is employed with the electromechanical linkage

method to measure joint motion. The primary disadvantage for this technique is

the cumbersome nature of the instrument and, to a lesser extent, cross coupling of

the sensor inputs and joint motion. The requirement for the exoskeleton

instrument affects the motion of young subjects making it unusable for clinical

measurement.

The stereo metric method is the most popular one currently used for clinical gait

analysis. It employs visible markers attached to the skin on rigid segments of the

body structure and tracks their motion using imaging equipment. This technique is

implemented using charge coupled device (CCD) cameras and frame-grabber

electronics to allow digital images to be captured as the subject moves within the

field of view. Digital image analysis allows the physical location of each marker

to be computed, using triangulation of the views from an array of camera systems.

This technique has minimal impact on the natural motion of the subject and

allows data capture without the need to tether the subject to the data acquisition

hardware. Figure 2.3 shows a laboratory collecting data with stereo metric

method.

A disadvantage of this approach is the increased image analysis complexity

resulting from tracking the apparent position of the markers in a two-dimensional

(2-D) image on a camera frame-to-frame basis and correlating the position of each

marker for the multiple camera positions. Occlusion of markers from the camera

field of view and false readings caused by reflection phantoms pose non-trivial,

unresolved complications in data capture. In addition, passive markers provide

unlabeled trajectory segments that must be manually identified and resolved. This

image analysis task requires a significant amount of time for the data gathering

process. A second major disadvantage is the reduction in resolution as the camera

system is altered to allow a larger field of view. The camera imaging sensors have

a fixed number of pixel elements and a compromise must be reached between

optical field of view and pixel element resolution size, limiting the clinical

measurement volume to approximately a single stride. It is not feasible to measure

22

gait patterns or variability with only one traversal of the instrument walkway.

Thus, multiple walking trials need to be collected, which may fatigue the subject.

Figure 2.3: Data collection with stereo metric method (visible markers are attached to body of the subject) [12]

The biplanar roentgen graphic method employs metal markers and x-ray films

for the measurement of static positions of a body joint. This approach is not

appropriate for the study of dynamic joint motion. Due to the use of ionizing

radiation, it also represents a potential health hazard to the subject.

The accelerometric approach employs sensors attached to the rigid areas of the

human subject that measure accelerations in three dimensions. Joint motion is

then derived through integration of the accelerometer waveforms given

appropriate initial conditions. Integration of the waveforms produces velocities for

each of the sensor locations. A second integration step provides the displacement

as a function of time. This technique can provide the kinematics motion

measurement desired but has been implemented with a tether to the subject for the

data acquisition; however, the tether affects the motion of the subject and

represents an undesirable feature. In addition, this approach requires an accurate

estimate of initial conditions, which is difficult to provide.

The magnetic coupling method employs a reference magnetic field source that

surrounds the subject and an array of magnetic field sensing elements attached to

the rigid segments of the subject.

23

Recent rapid developments in hardware technologies created an attractive

environment for image processors. This also created opportunities for gait

analysis using video sequences [3-10]. Beside above methods, a clinical gait

analysis might be limited to a video recording and the measurement of certain gait

stride and temporal parameters such as velocity, cadence, stride length, step length

and percentage of stance/swing. While the video record is a useful tool in

developing and substantiating visual impressions, it is inappropriate to measure

joint and segment gait kinematics directly from the videotape or monitor. They do

not give an indication of the cause of the gait abnormality and so have limited

value in clinical decision-making.

2.3. PROPERTIES OF GAIT DATA

Studies of biomechanical factors in OA are mainly focused on the knee joint. This

is primarily for two reasons. First, the knee is the most common joint affected by

OA. Second, the anatomy of the knee joint is relatively simpler and more

amenable to biomechanical modeling and noninvasive evaluation than other joints

[20]. Therefore, the data in this study are primarily from measurements on the

knee.

In this study, the gait data are collected by the gait experts in Ankara University

Faculty of Medicine, Department of Physical Medicine and Rehabilitation Gait

Laboratory (shown in Figure 2.4). Before gait analysis, all subjects gave informed

consent as advised by the Ethics Committee. The socio-demographic and clinical

characteristics of the patients were also collected in the lab before the patients are

walked. Electronic format of the form filled by the patients to gather these data is

shown in Figure 2.9.

Collected information other than gait is converted to numerical values before they

are used as features. For example, while weight and height attributes are not used

for classification purpose, body mass index (BMI), which is equal to weight

divided by the square of the height, is created as a new feature. Age of the subject

is calculated from date of birth, disease periods are converted to months as unit.

24

Pain and morning stiffness are numeric values between 0 and 10, family history is

a binary value indicating whether the same disease exist in family history or not.

Sex is another binary valued feature where 0 stands for women and 1 for men.

Then, the first subset of the data can be defined as:

A = age, BMI, pain, stiffness, period, history, sex

The max and min values of the non binary features of the subjects used in this

study are shown in Table 2.1.

Table 2.1: Limits of the personal features

Normal subjects Patients

Features Min max average min max average

age 19 63 43 41 80 60

BMI 18 46 27 20 49 32

pain 0 0 0 1 10 6,6

stiffness 0 0 0 1 10 5,2

period (year) 0 0 0 0 30 6,6

Subjects underwent gait analysis with the same protocol by one and the same

physician. Spatiotemporal and kinematic data were obtained from the Vicon 370

Motion Measurement and Analysis System. This system consisted of 5 video

cameras, a computer system for data acquisition, processing and analysis and a

data station. The experimental model idealized the lower extremity as a system of

rigid links with spherical joints. The joints were assumed to have a fixed axis of

rotation. Skeletal movement can be described using surface markers placed in

precise anatomical positions.

25

4.1.

Figure 2.4: Data collection (a: gait analysis laboratory, b: a subject walking on the platform)

All subjects were instructed to walk at a self selected speed along the walkway

and to practice until they could consistently and naturally make contact with both

of the force plates. Three acceptable trials were obtained for each foot and

averaged to yield representative values. Time-distance parameters of the gait are

gathered at the end of one cycle. So the second set of the data is composed of time

distance parameters:

26

B = Cadence, Walking Speed, Stride Time, Step Time, Single Support, Double

Support, Stride Length, Step Length

The kinetic and kinematic features of the gait are gathered by 3D analysis of the

human body. Figure 2.5 shows three planes of the human motion. Flexion-

extension data is taken in sagittal plane, valgus-varus and abduction-adduction

data is taken in frontal plane and rotation data is taken in transvers plane.

Figure 2.5: 3D analysis of human body

External retro-reflective markers, used for computer digitization, were placed on

each of the following anatomic locations: anterior superior iliac spine (ASIS),

sacrum, lateral thigh, joint line of the knee, lateral shank, calcaneus, lateral

malleolus and second metatarsal head. The 3-dimensional position of each

reflective marker was sampled 60 times a second. Markers were placed on the

bony prominences to minimize artifacts due to skin movement. On the other hand,

these locations provided anatomic reference points to locate internally the joint

center position of the hip, knee and ankle. The hip joint center was determined

using leg length, inter-ASIS distance and ASIS-greater trochanter distance

calculated by Vicon Clinical Manager (VCM) [20]. The knee center was located

at one-half the knee width medially along the knee flexion axis. The ankle joint

27

center was located at one-half the ankle medially along the ankle flexion axis.

Figure 2.6 shows an example to marker adjustment screen of VICON software.

Figure 2.6: VICON Software marker adjustment screen

Besides kinematic variables, ground reaction force (GRF) data is also gathered in

one gait cycle. Kinetic variables, which are also important for diagnosis, are

calculated by using GRF as shown in Figure 2.7.

Figure 2.7: Calculation of kinetic variables [19] Moment = GRF x Distance

Power = Moment x Angular velocity

28

Ground reaction forces (GRF) were collected using two force plates (Bertec,

Columbus, OH). GRF measurements were acquired simultaneously with a

measurement of the limb position. Time-to second vertical force peak and values

of first and second vertical force peaks were determined. To calculate the

moments, each segment of the limb (thigh, shank and foot) was assumed to be a

rigid body with a coordinate system chosen to coincide with the anatomic axes.

Moments producing flexion-extension, abduction-adduction and internal external

rotation at the knee joint were calculated. Angular velocity and acceleration

around the longitudinal axis were assumed to be negligible. All moments and

ground reaction forces were normalized to body weight and height permitting

comparison with other results in the literature. Table 2.2 summarizes the motion

planes, the anatomic levels and the types of the 33 kinematic gait attributes used

in this study.

Table 2.2: Properties of the used gait attributes (“x”: exists, “-”: not exists, flex: flexion, abd: abduction, rot: rotation

Joint rotation

angles Joint net moments Joint net powers

Motion plane

anatomic level

flex abd rot flex abd rot total flex abd rot

Pelvic x x x - - - - - - -

Hip x x x x x x x x x x

Knee x x x x x x x x x x

Ankle x x x x x x x x x x

Then the final subset of the data can be defined as the temporal changes of the

joint angles from four anatomical level and three motion planes (Set C) as below.

C = PelvicTilt, Pelvic Obliquity Knee Flexion, Knee Varus, …

29

Each of these attributes in C above is represented by a graph that contains 51

samples taken in equally spaced intervals for one gait cycle. So the attributes for a

given subject can be arranged as a 33-dimensional vector X as below:

X = [X(1), X(2),…X

(33)] where

X(i) = [X

(i)1, X

(i)2,…X

(i)51]

X(i)j is the value of the i

th gait attribute at jth time period of the gait cycle. Figure

2.8 shows examples of a graphical representation of the joint angle attributes.

30

Figure 2.8: Examples to temporal gait attributes (data set C): The data is reported in 2-D charts with the abscissa usually

defined as the percentage of the gate cycle and the ordinate displaying the gait parameter.

31

2.4. DATA STORAGE

As explained above one of the advantages of gait analysis for researchers and

medical experts is opportunity of data storage for comparison and other purposes.

Gait laboratories need comprehensive, user friendly database systems for efficient

data processing. Since each laboratory uses different software, hardware and

biomechanics models, it improved its own particular databases and information

systems. Standardization of data storage methods is important for opportunity of

transferring data between different systems and for creating an international,

easily understandable gait terminology. Some institutions have recognized this

need and founded some standardization societies. GCMAS (Gait and Clinical

Motion Analysis Society) centered in USA and ESMAC (European Society of

Movement Analysis for Adults and Children) centered in Europe are most known

of these societies. But unfortunately a widely used international gait standards

could not be created until today. On the other hand, usage of a standard file format

(c3d) for data storage is becoming widespread [14].

In our gait laboratory collected gait data can be saved as MS Excel file, so the

access and the transfer of the data become easier. These files show the time-

distance parameters of the gait and temporal changes of the joint angles and

graphs of them. An excel file of a patient is shown in Appendix A of the report as

an example. Since there was no database keeping all available data in the system

earlier, it used to be difficult to combine personal information and gait data of the

patients for processing. The patients are asked to fill a form about personal

information and to take WOMAC (Western Ontario and McMaster University

Osteoarthritis Index) questionnaire before walking in the laboratory. Both of

these forms were kept in paper files, so it was mandatory to create an electronic

environment to safely save them for further analysis.

A comprehensive database is designed to automate data collection and query

processes in the scope of the thesis study. Electronic interfaces are created for

entering this information to the database. When the database is opened the first

form to be filled by the user is the patient record form, which is shown in Figure

32

2.9. Detailed information about OAGAIT database system is given in fourth

chapter of the report.

Figure 2.9: Patient recording form

33

CHAPTER 3

3. PATTERN RECOGNITION METHODS

3.1. FEATURES AND DIMENSIONALITY REDUCTION

When dealing with high-dimensional data, one faces with the well-known

problem of “curse of dimensionality”. The curse of dimensionality is a term to

describe the problem caused by the exponential increase in volume associated

with adding extra dimensions to a (mathematical) space [72]. In pattern

recognition view, the idea of the curse of dimensionality is that high dimensional

data is difficult to work with for several reasons [64, 65]. Most importantly the

need of exponentially increasing number of training samples with dimension.

Also, adding more features can increase the noise, and hence the error. There may

not be enough samples to get good recognition results. So, dimensionality

reduction is a commonly used step before classification in pattern recognition,

especially when dealing with very high dimensional feature spaces. The original

feature space is mapped onto a new, reduced dimensionality space and the

examples to be used by pattern recognition algorithms are represented in that new

space. The mapping is usually performed either by selecting a subset of the

original features or/and by constructing some new features.

The dimensionality of the feature space may be reduced by the selection of

subsets of good features. Several strategies and criteria are possible for searching

34

good subsets. In addition to the improved computational speed, an increase in the

accuracy of the classification algorithms is also expected with reduced feature set.

[64].

Another way to reduce the dimensionality is to map the data on a linear or

nonlinear subspace. This is called linear or nonlinear feature extraction. It does

not necessarily reduce the number of features to be measured, but the advantage

of an increased accuracy may still be gained. Moreover, as lower dimensional

representations yield less complex classifiers better generalizations can be

obtained [64].

There are some significant difficulties in the design of automated medical

decision support systems because of the multidimensional and complex structure

of the clinical data. So, feature reduction and selection always become an

important part of the medical data analysis studies.

3.1.1. Feature Extraction

In most pattern recognition problems the number of samples is smaller than the

number of row features due to practical reasons. In that case the actual feature

space is mapped to another one having fewer dimensions by minimizing the

information lost. There are some widely used mapping algorithms in statistical

pattern recognition like FFT, PCA, wavelet etc.

The most commonly used method for the feature extraction in gait classification is

based on the estimation of parameters (peak values, ranges) as descriptors of the

gait patterns. In that case the classification is done according to the differences

between the class averages of the training set and the parameters of the new

subjects [19, 31]. This method is subjective [31] and neglects the temporal

information of the gait data. There are examples of using statistical feature

reduction techniques in gait analysis, such as Fast Fourier Transform (FFT) [34,

36, 37], Principle Component Analysis (PCA) [19, 24, 31], wavelet transform [32]

and averaging [38, 39].

35

FFT:

A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete

Fourier transforms (DFT) and its inverse [72]. FFTs are used in great variety of

applications like digital signal processing, solving partial differential equation,

and quick multiplication of large integers.

Fourier Transform maps a time series into the series of frequencies (their

amplitudes and phases) that compose the time series. Applications of Fourier

transforms in statistical pattern recognition and image processing include [72]:

• Filtering: Since taking Fourier transform of a function means to represent

it as the sum of sine functions, eliminating some high/low frequency

components and taking inverse Fourier transform produce an image

without noises.

• Image Compression: Since a filtered image contains less information than

a noisy image, encoding it requires fewer bits to represent than the original

image.

• Convolution and Deconvolution: Fourier transforms can be used to

efficiently compute convolutions of two sequences.

• Feature reduction for temporal features: FFT coefficients (which is

usually less than original number of time samples) are used for

classification [31, 34]

Let x0, ...., xN-1 be complex numbers. The DFT is defined by the formula

(Equation 3.1)

36

Evaluating these sums directly would take O(N2) arithmetical operations. An FFT

is an algorithm to compute the same result in only O(N log N) operations.

Since the inverse DFT is the same as the DFT, but with the opposite sign in the

exponent and a 1/N factor, any FFT algorithm can easily be adapted for it as well.

Averaging:

Averaging methods are similar to mean filtering methods in image processing.

Mean filtering is a simple, intuitive and easy to implement method of smoothing

and scaling images [71, 72]. The image is smoothed because the amount of

intensity variation between one pixel and the next is reduced. As an example, for

the scaling, to halve the size of the image each quad of four pixels is replaced by

one pixel with average of the four pixels. This simplest way of scaling images is

mostly recommended for downscaling.

-20

-10

0

10

20

30

40

50

60

1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49

-20

-10

0

10

20

30

40

50

60

1 2 3 4 5 6 7 8 9 10

Figure 3.1: Averaging five consecutive time samples of two gait waveforms

37

This property of the averaging makes it usable for time sample reduction of

temporal features. Figure 3.1 shows an example to averaging five consecutive

time samples of two gait waveforms. The dimension of the temporal features is

reduced from 51 to 10 by averaging method.

3.1.2. Feature selection

Automated selection of the gait attributes is not observed in gait classification

literature; medical experts select them or previous studies are taken as references.

Actually, there are many medical practices, testing the variations in the gait

attributes which are caused by the related illness [20-30]. Non-automated

selection of the gait attributes are done by using the result of these studies. But it

may not always be convenient to work with a medical expert for the feature

selection process. Also, the judgments may vary in different experts leading the

different interpretations of the classifiers. Obviously, automated selection lessens

the dependence and the load on the experts and gives more freedom to the

researchers.

Feature selection also helps people to acquire better understanding about their

data by telling them that which are the important features and how they are related

to each other.

The feature selection process is simple defined as follows: given a set of candidate

features, select a subset that performs the best by a given classifier. This

procedure can reduce the cost of recognition and in most cases provide better

classification accuracy. There are some criterion functions for assessing the

goodness of a feature subset. Mahalanobis distance is one of these functions [64,

68-70]. The selection of the criterion function is very important. If we know

which classifier will be used in the problem, then the best criterion is the correct

recognition rate of that classifier. However, it is computationally time consuming

and very difficult to estimate correct recognition rate of the classifier with a

limited number of training samples. This is one of the reasons why Mahalanobis

38

distance, which gives an upper bound of the Bayes error rate with a priori

probabilities of classes, is used in many feature selection processes [69].

In statistics, Mahalanobis distance is defined as distance measure based on

correlations between variables by which different patterns can be identified and

analyzed. It is a useful way of determining similarity of an unknown sample set to

a known one. It differs from Euclidean distance is that; it takes into account the

correlations of the data set and is scale-invariant, i.e. not dependent on the scale of

measurements [65, 72].

Formally, the Mahalanobis distance is defined as

(Equation 3.2)

for a multivariate vector

with mean

and covariance matrix Ρ whose (i, j) entry is the covariance

( )( )[ ].jjiiij EP µχµχ −−= (Equation 3.3)

where )( ii xE=µ is the expected value of the ith entry in the vector X

Mahalanobis distance can also be defined as dissimilarity measure between two

random vectors and of the same distribution with the covariance matrix Ρ:

39

(Equation 3.4)

If the covariance matrix is the identity matrix, the Mahalanobis distance reduces

to the Euclidean distance. If the covariance matrix is diagonal, then the resulting

distance measure is called the normalized Euclidean distance:

(Equation 3.5)

Where, σi is the standard deviation of the xi over the sample set.

3.2. CLASSIFIERS

Here, we will discuss only the ones that are used in this study.

3.2.1. Tree Classifiers

Decision Tree Classifiers are used successfully in many diverse areas such as

radar signal classification, character recognition, remote sensing, medical

diagnosis, expert systems, and speech recognition. Perhaps, the most important

feature of decision trees is their capability to break down a complex decision-

making process into a collection of simpler decisions, thus providing a solution

which is often easier to interpret [61].

As an example, the decision tree in Figure 3.2 is constructed to decide whether the

weather is convenient for playing tennis or not. The weather attributes are

outlook, temperature, humidity, and wind speed and the target classification is

“yes” or “no”.

40

Figure 3.2: An example to decision tree classifier

Comparing to other classification methods the most advantageous differences of

decision trees are [62]:

• They produce understandable tree-structures which clarify the reasoning of

the method (many other techniques lack this and are harder to interpret)

• They can show the problem in as a disjunction of the hypotheses.

• They can be faster in the average than many other approaches.

A classification tree or a decision tree is an example of a multistage decision

process. Instead of using the complete set of features, subsets are used at different

levels of the tree. Three important properties of the decision trees are [42]:

• Decision trees are instable classifiers. Means they are capable of

memorizing the training data so that small changes in data might create a

different structure tree. Instability can be an advantage when ensembles of

classifiers considered.

• Since decision process can be traced as a sequence of simple decisions,

tree classifiers can be defined as intuitive. Tree can capture a knowledge

41

base in a hierarchical way; most popular examples are botany, zoology and

medical diagnosis.

• Both quantitative and qualitative features are suitable for building decision

tree classifiers. Binary features and features with a small number of

categories are useful because the decision can be easily branched out.

Since decision trees are not based on the distances in the feature space

they are regarded as nonmetric methods for classification.

A decision tree construction starts with the root and continues separating the parts

of the data to child nodes which is called splitting the tree. Splitting into small

parts continue until a termination criterion is met. A termination criterion may be

that all objects be labeled correctly. In this case the tree has to be pruned to

prevent overtraining.

One can reach from root of the tree to the final class label by asking small

number of questions at nodes (root is the top node of the tree) of the tree.

Depending on the answer a branch is selected and the related child node is visited.

Another decision is made at next node and the process continues until reaching to

a leaf (a terminal node). This leaf shows a class label which can be repeated at

other nodes. If the same number of branches are visited to reach to every leaf of

tree then tree is called balanced. Otherwise it is called imbalanced.

Figure 3.3 shows examples to the balanced and unbalanced trees. Imbalanced trees

indicate that objects near the classification boundaries may need longer decision

chains than the others [42].

42

Figure 3.3: Examples to a) balanced and b) unbalanced trees

Splitting Criteria

Consider a c-class problem with Ω = w1, w2, ..., wc. Let Pj be the probability for

class wj at a node t. These probabilities can be calculated by the proportion of the

points from related class within the whole dataset at that node. The impurity of the

distribution of the class labels at t can be measured in different ways.

Entropy based Impurity:

∑=

−=c

j

jj PPti1

log)( (Equation 3.6)

According to this formula impurity takes its minimum value when only one class

label exists at the node (0 log 0 = 0). The most impure situation occurs when the

classes have uniform distribution. In that case i(t)=log c

Gini Impurity:

∑=

−=c

j

jPti1

21)( (Equation 3.7)

A

B B

x y x y

A

B B

x y x y C

x y

43

For the most pure case again i(t) = 0. The highest impurity in the case of uniform

distribution is i(t) = (c-1)/c. The Gini index can be defined as the expected

classification error when a random class label is chosen from the distribution of

the labels at t.

Misclassification Impurity:

max1)(1

j

c

jPti

=−= (Equation 3.8)

Misclassification impurity gives the expected error if the node was replaced by a

leaf and the chosen label was the corresponding label of the largest Pj .

Gain:

Assume that the tree is split into child nodes based on the feature X. Then the gain

in splitting the t is defined as:

∑∈

−=∆Xv

vv tit

ttiXti )()(),( (Equation 3.9)

If the features are binary then splitting is easier; try each one in turn and choose

the feature with highest gain. However is the features are multiple categories or

continues valued, then an optimal threshold to split node is have to be found.

Among the above methods Gini index is the mostly used one [65]. The choice of

the impurity index is not seem to be very important for the success of the tree

classifier [65]. The more important issues are stopping criteria and the pruning

methods.

44

Stopping Criterion

The tree construction can be continued until there are no impure nodes. But this

time overtraining may be a problem for the test dataset. So the training should be

stopped before reaching pure nodes. But if the splitting is stopped too early the

tree may be under-trained. In [65] some options are listed to avoid this problem:

• Use a validation set

• Set a small impurity-reduction threshold. When the greatest possible

reduction of impurity is less than or equal to this threshold stop splitting.

But the problem here is to determine this threshold value.

• Set a threshold values for the number of point at a node

• Use hypothesis testing to see whether a one more split is beneficial or not

Pruning methods

Sometimes early stopping can prevent further beneficial splits. This phenomenon

is called horizon effect [65]. To avoid this one can construct full tree and then

prune it to a smaller size. The aim of pruning is to optimize training error and the

size of the tree.

Reduced error pruning is the simplest pruning method. An additional training set

(pruning set) is used for a simple error check at all non-leaf node. A node is

replaced with a leaf and labeled to the majority class. The error of the tree on the

pruning set is calculated and compared to the error of the first tree. If the new

error is smaller than the previous one, the node is replaced with the leaf.

Otherwise the sub-tree is kept [63].

In pessimistic error pruning method the same dataset is used for both constructing

and pruning the tree. If the number of errors at node is smaller than the number of

errors with a complexity correction at sub-tree of that node, then the node is

replaced by a leaf [63].

45

In critical value pruning a critical value is set as a threshold. The tree is checked in

a bottom-up fashion. If a node has a gain in error rate smaller than the critical

value is replaced by leaf [63].

In [63] two more pruning methods are defined: Cost-complexity pruning and

error-based pruning. According to this study critical value pruning and error based

pruning have tendency towards over pruning where as reduced error pruning has

opposite trend. Also it is concluded that, using an aside pruning set does not

always work. Methods used the whole trading set to construct and prune the tree

are found to be more successful.

3.2.2. Neural networks

Artificial neural network (ANN), which is often called Neural Network (NN), is a

mathematical model created by inspiration of biological neural networks. A NN is

created by artificial neurons by connecting them to each other in different

fashions. The information flows through these connections and update the

structure of the networks. So NNs are defined as adaptive systems.

In pattern recognition view, NNs are non-linear statistical data modeling tools.

They can be used to model complex relationships between inputs and outputs or

to find patterns in data.

Neurons

The basic schema of a neuron is shown in Figure 3.4.

46

Figure 3.4: A simple neuron model

Let u = [u0,u1,…uq] Є Rq+1 be the input vector, v Є R be its output.

The vector w = [w1,w2,…wq]T Є Rq+1 is synaptic weights. Where

v = φ(ε) and ∑=

=q

0iiw iuε where φ is the activation function and ε is the net

sum.

The activation function may be hard-limit, linear or sigmoid function. The

sigmoid function is the most used one, because;

• It can model both linear and hard limit (threshold) functions. It is almost

linear near the origin and hard limited for large weights.

• It is differentiable, which is important for the training algorithms.

Perceptron

The simplest kind of neural network is a perceptron network, which consists of a

single layer of output nodes. The inputs are given directly to the outputs via

related weights. The sum of the products of the inputs and weights are calculated

and the output is produced according to threshold activation function.

w0

w1

wq

u0

u1

uq

Σ v

47

This one-neuron linear classifier can separate two classes. The weights are

initialized randomly and modified as each sample is subsequently presented to the

inputs of the perceptron. The modification occurs only if the current ample is

misclassified. Perceptron training has following properties [42]:

• If the classes are linearly separable the algorithm always converges in a

finite number of steps.

• If the classes are not linearly separable the algorithm will enter to a loop

and never converges.

Multilayer Perceptron

By connecting two or more perceptron one can construct a Multilayer Perceptron

(MLP). MLP has a feedforward structure, means all units in input layer and

hidden layers are submitted to the only higher layer. A generic example of a MLP

is shown in Figure 3.5.

Figure 3.5: An example to multilayer perceptron with one hidden layer.

The number of hidden layers and number of nodes is not limited, but there are lots

of studies to find best numbers. In late 80s it was shown that an MLP with two

48

hidden layers with threshold nodes can approximate any classification problem

[42, 65]. In F

Figure 3.6 classification regions that could be constructed by one, two and three

layers are shown [42]. Later, it is proven that even an MLP with single hidden

layer can approximate any function [42].

NN topolo

gy

Cla

ssific

ation

regio

n

Figure 3.6: Examples to classification regions by one, two and three layer MLPs [42]

Most common properties of MLPs are [42, 65]:

• The activation function at input layer is the identity (linear) function

• There are no connections between the nodes at same layer (feedforward)

• There no connection between the nodes at nonadjacent layers

• All the nodes at all hidden layers have the same activation function

Multi-layer networks use a variety of learning techniques; the most popular of

these is backpropagation algorithm. For training of the neural networks, the output

49

values are compared with the correct answer to compute the error function. Then

this error is fed back to the network by various methods. Algorithm adjusts the

weights of each connection in order to reduce the error some small amount. To

adjust weights a general method for non-linear optimization that is called gradient

descent is used. For this, the derivative of the error function with respect to

weights is calculated and the weights are updated to decrease the error. So the

activation function of the network applying backpropagation should be

differentiable. Repeating this process for a sufficiently large number of training

cycles the network will usually converge to some state where the error is small.

The network converged this final state is said that it has learned the target

function.

If a network is trained by very limited number of training samples it can overfit

the data. So the network can not perform well on test data set. Some special

methods should be applied to avoid overfitting. Other problems of network

training are speed of the convergence and stopping convergence in a local

minimum. This causes networks to take a non-optimum state. Decreasing or

increasing the number of hidden layers and nodes may prevent the local minima

problem. Rerunning the algorithm may also work because the weights will be

reinitialized to a different numbers.

3.2.3. Combining Classifiers

The concept of combining classifiers is proposed as a new direction for the

improvement of the performance of individual classifiers. These classifiers can be

based on a variety of classification methodologies, and could achieve better rates

than individual classifiers. The goal of classification result integration algorithms

is to generate more certain, precise and accurate system results. Dietterich [55]

provides an accessible and informal reasoning as shown in Figure 3.7 from

statistical, computational and representational viewpoints, of why ensembles can

improve results.

50

Figure 3.7: Reasons why an ensemble classifier may be better than an individual one [55]

• Statistically: Instead of selecting a single classifier one option may be use

them all and average their outputs. The new classifiers may not be better

than the single best classifier but the risk of selecting an inadequate

classifier is eliminated.

• Computational: Assuming the training process of each classifier start

somewhere in the space and end closer to the best one (f), combining them

may cause to better approximation than a single classifier.

• Representational: Training an ensemble of simple classifiers to achieve a

high accuracy is more straightforward than training a single more complex

classifier.

51

There are several ways of creating multiple classifier system. In [73] three broad

categories are defined.

Different feature spaces: This describes the combination of a set of classifiers,

each designed to use different feature spaces. For example in a person verification

application, several classifiers may be used for different sensor data like retina

scan, facial image etc.

Common feature spaces: This describes the combination of different classifiers

trained on the same feature space. The classifiers can differ from each other in

some ways.

• The classifiers may be of different type, for example nearest neighbor,

neural network, decision tree etc.

• They may be similar types but use different part of training set.

• They may be similar type but use different initialization parameters, for

example weight initialization of neural networks.

Repeated measurements: This category of combination is about different

classification of an object through repeated measurements.

3.2.4. Combination Schema

Combination schemas may be classified according to some characteristics

including, level of combination, structure, form of classifiers and training styles.

Level of combination

Combination may be done at different levels as suggested by Kuncheva [42] in

Figure 3.8.

52

Figure 3.8: Approaches for building ensemble classifiers (differentiation at each level may be considered as an approach) [42]

Data level: Raw measurements are given to the combiner that produces posterior

probabilities of class membership. This requires defining a classifier on all sensor

variables. Different datasets may be created by also different preprocessing

methods.

Feature level: Each of features may have its own techniques for reducing

dimension. Different classifiers may perform some local preprocessing on feature

subsets.

Classifier level: Using different base classifiers is a common approach in

combination schemas. Different base classifiers may be preferred due to various

structure of the feature set. Another approach may be differentiating the same

classifier by changing some parameters of it, for example changing initializing or

training parameters of a neural network.

Combination level: Most important issue in combining classifiers is the way

they are combined. According to [43, 71] there are two types of combining rules,

Combiner

D1 D1 D1

X

Dataset

Combination

level

Classifier level

Feature level

Data level

53

trained and fixed rules. Trained combiners are different than fixed combiners in

methods of producing final decision. After gathering outputs of the base

classifiers, they are used as an input vector of the combining classifier. The

training set is used for both training base classifiers and combining classifier.

The fixed combining rules make use of the fact that the outputs of the base

classifiers are not just numbers, but that they have a clear interpretation: class

labels, distances, or confidences. The confidence is sometimes interpreted or

generated by fuzzy class membership functions sometimes by class posterior

probabilities. Majority vote, product rule, sum rule, maximum rule are examples

of fixed rules.

For example if only labels are available a majority vote is used [44]. Sometimes

label ranking may be preferred. If continuous outputs like posterior probabilities

are gathered, a linear combination like sum or average may be used. Moreover, it

is possible to train a classifier with the output of another as new features [43].

The structure of a multiple classifier system may be discussed by three style [71].

In parallel combining, results from the base classifiers are passed to the combiner

together. In serial combining the base classifiers are invoked sequentially. In

hierarchical combining, the classifiers are combined in a hierarchy, with the

outputs of one base classifier are given to another one as inputs, in a similar

manner to decision trees.

54

CHAPTER 4

FEATURE SELECTION AND

CLASSIFICATION (GRADING)

4.1. INTRODUCTION

In this study the grading algorithm is designed as a combination of different types

of classifiers, which allows including all available data in decision making

process. Before finalizing this form of the grading algorithm many experiments

have been implemented by different classifiers, different feature

reduction/selection methods. The results of these experiments were helpful for

design of the final algorithm.

• In first trials, the gaits of 111 patients with 110 age-matched normal

subjects are compared. Two different feature reduction techniques, FFT

and averaging are compared by performances of some well known pattern

classifiers. The MD measure is used as an individual feature filtering

criterion and most discriminatory features are determined. Then a set of

linear and non-linear classifiers is tested by a ten-fold cross validation

approach. The details of this trial are given in Section 5.2.

• In second experiment, two popular methods of combining neural networks

are implemented for discrimination of normal and sick patterns. The

55

results of classifiers are compared with different output combining rules.

The details of this trial are given in Section 5.3.

• Finally, a decision tree MLP combination is implemented for grading of

the knee OA. Automated feature selection is used for composing datasets

to train MLPs responsible for discriminating neighbor classes. Last section

of this chapter is about design and implementation stages of this algorithm.

4.2. STATISTICAL ANALYSIS OF GAIT DATA: COMPARISON OF

FEATURE REDUCTION/SELECTION AND CLASSIFICATION

ALGORITHMS

The objective of this experiment is to compare the convenient methods for

preprocessing (feature reduction/selection), classification and further analysis

(such as learning curves of the classifiers) of the gait data. For this purpose two

feature reduction techniques (averaging and FFT) are compared by performances

of some well known pattern recognition classifiers. The MD method is used as an

individual feature selection criterion and the most discriminatory features are

determined automatically and the selected attributes are compared with the ones

suggested by previous OA classification studies to discuss the differences of

automated and non automated selection procedures. Next, a set of linear and non

linear classifiers is tested on datasets with different dimensionalities by a

crossvalidation approach. Finally, learning curves of some classifiers are

compared to discuss data size issues (the number of subjects in the training set)

for further studies.

The classification and feature selection algorithms are used from PRTools which

is a Matlab based toolbox for pattern recognition. PRTools supplies about 200

user routines for traditional statistical pattern recognition tasks. [67].

56

4.2.1. Feature reduction and selection methods

In this experiment, the data that were formerly collected in Ankara University

Faculty of Medicine gait laboratory from 110 normal and 111 OA patients are

used. All joint angles features from both kinetic and kinematic domains are

included in feature reduction and selection processes.

Data collection process produces a dataset for each subject, including 33 gait

attributes, each having 51 sample points in time, as explained before. Combining

these files into a complete dataset, we got 33 (attributes) x 51(time samples)

dimensional arrays for each subject. The final dataset is thereby composed of 221

subjects presented by 1653 points in feature space. Since the total number of the

features is too large relative to the number of subjects, most of the commonly

used classifiers will suffer from the curse of dimensionality [64, 65]. So, a

reduction in the number of features is needed before the classification process.

Two different reduction techniques are applied to the same dataset for

comparison. Six datasets of the different dimensionalities are composed by

averaging consecutive time sample points for reducing the size of the feature

vectors. Also, FFT is applied to each waveform and each attribute is represented

by one, five, ten or 25 FFT coefficients rather than 51 time samples. At the end of

the reduction process, ten datasets of the different dimensionality are created.

Most of the datasets still have a too high dimension, which forces elimination of

the redundant features. Since the term “features” here represent the time samples

of the gait attributes, they have no meaning by themselves. So the selection is

done by the gait attributes, not by the features. The MD measure is used as a

selection criterion (detailed information about MD based feature selection is given

in Section 3.1.2) Individual performances of the each gait attributes to

discriminate two classes are compared. Instead of the individual selection, a

forward or a backward selection can also be tried in order to avoid the selection of

the similar attributes. However, since the data is from different motion planes and

different anatomic levels of the body, to have similar attributes is not very

probable.

57

Table 4.1 summarizes how these datasets are created. The values on this table

represent the MD values produced by each gait attribute with shown number of

time samples. The marked values show the selected attributes for creating

corresponding dataset. All of one dimensional attributes are used for dataset

creation.

The number of selected attributes is limited to reach the best ratio of the number

of subjects and the number of features, which is suggested as one over five in

[64]. Except for the all-mean datasets, the dimensions of the datasets are fixed to

50, since it is reachable by integer number of the attributes for all datasets and

about the ideal ratio (which is 44,2 here). For feature selection distmaha property

of the PRTools to calculate MDs of classes in a dataset is used, as an example

code segment is shown in Table 4.2. Table 4.3 shows the properties of the created

new datasets by combining the selected best features and the MD values produced

by these new datasets.

58

Table 4.1: Mahalanobis Distance values of gait attributes

59

Table 4.2: An example to feature selection by distmaha function of PRTools

for i = 1 to #gait_attributes

1. x = featfft5 (:,:,i);

2. y = dataset (x, labs);

3. Dm = distmaha (y);

4. Dmfft5(i) = Dm (1,2);

End

1. x is an array composed by first 5 fft coefficients of the ith attribute

2. Convert x to dataset y by presenting class labels array labs

3. Dm is a 2x2 symmetric matrix where Dm(i,j) represent the Mahalanobis Dist. of classes are written to a one-dimensional array

4. Mahalanobis Distance of classes i and j of dataset y.

Here the naming procedure for datasets are based on the number of time samples

and feature reduction criteria, for example “BesFFT10” represent the dataset that

the time sample reduction is done by FFT algorithm and each attribute is

represented by 10 time samples.

Table 4.3: Datasets after feature reduction and selection processes

Composed dataset

# time samples to represent gait

attributes # selected gait attributes

Dimension of the dataset

Mahalanobis

Distance

Best51d 51 1 51 14.568

Best25d 25 2 50 18.273

Best10d 10 5 50 23.545

Best5d 5 10 50 19.804

Best2d 2 25 50 20.387

Best1d 1 33 33 14.577

BestFFT25 25 2 50 13.244

BestFFT10 10 5 50 15.589

BestFFT5 5 10 50 18.029

BestFFT1 1 33 33 10.625

60

As Table 4.3 shows, datasets with about the same dimensionality are composed of

different numbers of gait attributes represented by different numbers of time

samples. So at the end of the classification process it will be possible to discuss

whether the number of gait attributes is more important than the number of time

samples for classification accuracy.

Comparing two reduction techniques based on the MD criterion, averaged

datasets perform better than the corresponding FFT based dataset. While

composing these new datasets, the MD values of the attributes are ordered and the

required number of the best of them is added to the new dataset. So, while some

of the gait attributes may appear in many of the datasets, some may not. As the

table shows, two of ten datasets are created by including all attributes and eight of

them are created by selecting the best ones. The number of appearance times of

attributes in eight datasets are used to compare the discriminatory ability of them

for the classification of the gait patterns of OA patients (only 4 and above are

showed). Appearance times of the gait attributes in eight datasets are:

• KFlex (Knee Flexion): 7

• HMAbd (Hip Abduction Moment): 7

• KMFlex (Knee Flexion Moment): 5

• KRot (Knee Rotation): 4

• KMVal (Knee Valgus Moment): 4

In other trial of the study, the gait attributes were selected by an expert physician

who has suggested four knee-related attributes [38]. All of these four attributes are

appeared in most of the current datasets, too, and moreover three of them are

included in the above table as the most apparent attributes (KFlex, KMFlex,

KMVal). It can be concluded that our selection criterion approximates the expert

knowledge and so contributes to the validation of the approach.

61

4.2.2. Comparing Classifiers

As an initial study for classifier selection, s set of linear and nonlinear classifiers

is tested by a ten-fold crossvalidation method. PRTools [67] is used for classifier

construction. Total of nine classifiers are used by some adjustments, for more

information see also [64, 65, 67, 71].

1. Logistic Linear Classifier (loglc)

2. Support vector classifier (svc)

3. Linear Bayes Normal Classifier (ldc): log function is used for adjustment

4. Quadratic Bayes Normal Classifier (qdc)

5. Back-propagation trained feed-forward neural net classifier (bpxnc): 1

hidden layer with 5 nodes

6. Levenberg-Marquardt trained feed-forward neural net classifier (lmnc): 1

hidden layer with 5 nodes

7. Automaric radial basis SVM (rbsvc)

8. Parzen Classifier (parzenc): Datasets are scaled and log functions are used

9. Parzen density based classifier (parzendc): Datasets are scaled

Since density based classifiers (ldc, qdc, parzenc) suffer from a low numeric

accuracy in the tails of the distributions ‘log’ function is used to compute log-

densities of them. This is needed for overtrained or high dimensional classifiers.

Almost zero-density estimates may otherwise arise for many test samples,

resulting in a bad performance due to numerical problems. Loglc, and SVC, are

other linear classifiers added to set. Loglc is a linear classifier that maximizes the

likelihood criterion using the logistic (sigmoid) function. SVC is a linear support

vector classifier maximizing the distance between support vectors of two classes.

62

For neural network classifiers (lmnc, bpxnc), defaults for the numbers of hidden

layers (one) and hidden nodes (five) are used, and the optimization of weights is

done by the Matlab Neural Network Toolbox. Other nonlinear classifiers added to

set are rbsvc and parzendc. rbsvc is a support vector classifier having a radial

basis kernel. Parzendc is a density based classifier using different kernels for the

density estimated for each of the classes. Figure 4.1 shows the error rates of these

classifiers for the averaged and FFT based datasets, respectively.

0

0,05

0,1

0,15

0,2

0,25

best1d best2d best5d best10d best25d best51d

loglc

svc

ldclog

qdclog

bpxnc

lmnc

rbscv

parzenlog

parzendc

0

0,05

0,1

0,15

0,2

0,25

best1ffd best5fftd best10fftd best25ff td

loglc

svc

ldclog

qdclog

bpxnc

lmnc

rbscv

parzenlog

parzendc

Figure 4.1: Error rates of the classifiers for the a) averaged datasets b) FFT applied datasets

63

As can be seen in the figure the averaged datasets perform better than the ones

composed of FFT coefficients. One of the best datasets (best5d) is selected for

further analysis of the gait data.

Figure 4.2: Learning curves for some classifiers

4.2.3. Results of comparisons

The results of the experiment performed in this study are important for setting a

base for further study of OA grading. Starting from the beginning of the study two

different feature reduction techniques are compared first by the MD criterion and

then by performances of the classifiers. It can be observed that datasets created by

FFT techniques produced worse results in MD calculations and also in the

classification process. Even the datasets composed by averaging all time samples

produced reasonable classification rates, which makes clear how important

activities in the measured angles are for diagnosis. Temporal information of the

waveforms is not so significant for the classifier performances. Since FFT

coefficients represent the changes of the data in time, the higher error rates for

these datasets support the derived result. Also, the severe difference between the

performances of the datasets, best1d (all gait attributes with one time sample), and

64

best51d (one gait attribute with all time samples) shows that including more gait

attributes is more informative than including more time samples.

We found a high match between currently selected features on the basis of the

MD and the ones suggested by gait analysis expert. In the current study, besides

the knee features a hip related feature (HMAbd) appeared to perform as well as

the best knee related feature (KFlex). The high discriminatory ability of this

feature shows that the knee OA causes high variation in hip abduction moments as

much as in knee flexion moments of the patients. Moreover, the dataset selected

as the best one (best5d), includes data about the pelvic besides the hip and knee

related ones (PTilt, PRot, HFlex, HAbd, KFlex, KVal, KRot, FDor, HMAbd,

KMVal). To be able to find variation in all levels and motion planes of the

subjects automatic feature selection may be preferred.

Comparing the performances of the classifiers on the basis of the current number

of subjects, it may be concluded that nonlinear classifiers performed quite well

and better than the linear ones. We have compared the learning curves of the

classifiers to investigate whether more data might be helpful. Figure 4.2 shows,

Backpropagation Neural Network (bpxnc) and Parzen density based (parzendc)

classifiers converge faster than the others. Therefore, more data may increase the

performances of the linear classifiers more than the nonlinear ones. We have also

observed that high regularization prevents linear classifiers learning from more

data. Considering the training costs of the algorithms, linear classifiers with a

convenient regularization rate may be included in the further studies with more

data.

These experiments showed us that statistical pattern recognition algorithms

produce promising results for automated analysis of the gait data.

4.3. COMBINING MLPS FOR GAIT CLASSIFICATION

The objective of this part of study is to design a classification algorithm for

discrimination of normal and sick gait patterns. The accuracy of the proposed

system will be safeguarded by using all gait features. To be able to combine all

65

features in one classification system, combination methods are expected to be

most suitable. As our previous studies and similar studies proved MLP usage for

gait classification produces reasonable results.

As the dimension of the features and the size of the data increase same accuracies

may not be guaranteed. In similar pattern recognition studies this problem is tried

to be solved by combining classifiers. Combination of NNs [45-51] are widely

used today especially in speech recognition and character recognition studies and

they have showed an increase in the performance of the classifiers. In [48]

Sharkey made a comprehensive experiment to compare two different NNs

combining methods; modular and ensemble ones. She concluded that using an

entire set for training produces more accurate results than decomposing it. In this

study comparison of these two approaches are done in the context of gait

classification. There are also different approaches on combining outputs of

classifiers. In [43] the authors have comparative studies on efficiency of output

combination rules such as majority voting, sum, product, max., and min. rules. In

[43], they concluded that sum rule is superior to others in most of the cases.

A group of MLPs are used to classify the subjects as healthy or sick, using

temporal changes of knee joint angle and time-distance parameters as features.

Two different NNs combination methods are tried. In the first experiment data set

is decomposed into five different sets and five MLPs are trained and tested by

these sets. Then test set results are combined by sum, majority vote and max rules

to produce final class label. In the second experiment, entire data set is used to

train three different architectural MLPs and again outputs are combined by three

different rules and accuracy rates on test set are compared.

4.3.1. Dataset Properties

In this study, decision of which gait attributes to use is done by medical expert,

and four knee related attributes are selected; knee flexion, knee flexion moment,

knee valgus moment and total knee power graphs of which are shown in Figure

66

4.3. In addition, walking velocity, single support and step length are selected as

the time-distance parameters of the gait.

Figure 4.3: Graphs of the gait data (healthy (a) and knee osteoarthritis (b))

Each of joint angle related features are represented by a graph that contains 51

samples taken in equally spaced intervals in the time for gait cycle, which is the

time spent for one step. These points composed feature vectors which are used as

inputs of the related MLP. On the other hand, time-distance parameters are static

numerical values which are also used to train another MLP.

Table 4.4: Dataset characteristics

#TRAIN #TEST FEATURE VECTOR

(FV)

DATASET #SMP. H S H S

FV1 KFlex: Knee flexion/extension 51

FV2 KMFlex: Knee flexion/ extension moment

51

FV3 KMVal: Knee Valgus Moment 51

FV4 KPTot: Total Knee Power 51

FV5 Time-dist: Velocity, single support, step length

3

FV6 Entire set (all of above) 207

61 77 30 33

67

Before passing to classification phase data is cleaned by eliminating rows having

missing values. Finally, 91 healthy and 110 sick subjects’ data is prepared for

classification purpose and shared for training and testing purposes as shown in

Table 4.4 (H: healthy, S: Sick, SMP: Samples).

4.3.2. Classification by Combining MLPs

The basic classifier structure, used in this study is MLPs combination.

Weaknesses of each classifier are diminished by combining classifiers, and more

accurate results are expected. In [48], two methods are described for combining

multiple networks. The first one is the modular approach, in which the task is first

decomposed into several subtasks and a specialist network is then trained using

the inputs pertaining to the corresponding subtask. The second approach is the

ensemble one, in which each network is trained using the same inputs and

provides a different solution to the same task. Outputs from these networks are

combined to reach an integrated result. Complexity is an important issue to be

considered in this case. Differentiation among classifiers may be done by using

initial random weights, different topologies, and varying the input data.

As stated previously the final data that is used here has five feature vectors; four

for temporal changes of knee joint angle (KFlex, KMFlex, KMVal, KPTot) and

one for time-distance parameters. Before training all data sets are scaled to

interval [-1, 1]. Totally eight MLPs are trained using MATLAB neural network

toolbox. These MLPs are combined in different schemas for experiment 1 and

experiment 2 as shown in Figure 4.4. Table 4.5 and Table 4.6 show the topology

of each network and their individual success rates on test data. For the first five

networks number of hidden nodes and hidden layers are determined

experimentally.

68

Figure 4.4: MLP combination schemas for experiment 1 (a), and experiment 2 (b) H/S: Healthy or sick, FV: Feature vector

Experiment 1: Input data is decomposed in five sets composed of different feature

vectors. Five MLPs are trained by these input sets and then outputs of test set are

combined by three different combining rules to reach a final result. So, accuracy

of different combining rules is compared.

Table 4.5: Properties of MLPs used in experiment 1

#NODE NETWORK

input hidden1 hidden2

#MIS-CLASSIFIED

SUCCESS RATE (%)

MLP1 51 35 10 10 84

MLP2 51 35 10 8 87

MLP3 51 35 10 15 76

MLP4 51 35 10 18 71

MLP5 3 2 - 13 79

Experiment 2: Three different MLPs are trained by using the same composite

input set without any decomposition. Here, differentiation of each network is done

by different number of hidden layers and hidden nodes.

In both experiments different combination approaches are used, but in both cases

combining outputs of classifiers became and important issue. In this study three of

these rules, sum, majority vote and max rules, are experimented and results are

compared by success rates on test data set.

69

Table 4.6: Properties of MLPs used in experiment 2

#NODE NETWORK

input hidden1 hidden2

#MIS-CLASSIFIED

SUCCESS RATE (%)

MLP6 207 50 - 6 90

MLP7 207 150 40 6 90

MLP8 207 207 50 7 89

After training each network with corresponding input set, test data are presented

and the outputs are normalized to use them as posterior probabilities. Since tansig

function is used as the activation function in all layers of networks, outputs are in

interval [-1, 1]. To normalize an output, its absolute value is taken as posterior

probability, and its sign is taken as class label (i.e minus sign is for normal and

plus sign is for sick subject). Then, its 1-complement is recorded as posterior

probability of the other class. Thus, sum and max rules for combining outputs can

be applied.

For sum rule, created posterior probabilities are added up for two classes and

higher value determined the class label. In max rule, the network, producing the

maximum of posterior probabilities determined the class label and the others are

ignored. To find the majority vote, each networks’ output is converted to class

labels by applying a threshold and three agreeing classifiers determine the class

label of the test datum. Table 4.7 shows the obtained success rates on test set by

applying these combining rules.

Table 4.7: Success rates (number and percentage) for combining rules

Combined networks

MLP1-MLP5 MLP6-MLP8 Combining rule

#misclassified success rate (%)#misclassified success rate (%)

Sum 4 94 6 90

majority vote 5 92 6 90

Max 5 92 6 90

70

As seen in Table 4.6 MLPs used in second experiment produced better

performances which are expected. However, when the dimension of the dataset

increased, and so there are more parameters (like weights) to be tuned, a local

extreme of the error function is likely to be found. An ensemble of simple

classifiers might be better option for such problems [42]. Combining simple

classifiers require condition of diversity. Obviously combining identical

classifiers does not contribute to accuracy. To check whether our five MLPs have

identical classification results or not, we have tested whole dataset by

crossvalidation approach and seen that only % 1.5 of the subjects has been

misclassified by all classifiers, which proves the disagreement of the classifiers.

According to the test set results confusion matrices of the MLPs are created to see

where the misclassifications have occurred. Table 4.8 shows these confusion

matrices where positive (P) means sick subjects.

Table 4.8: Confusion matrices for used MLPs

Predicted

MLP Actual Negative (N) Positive (P)

N 21 9 MLP 1

P 1 32

N 28 2 MLP 2

P 6 27

N 26 4 MLP 3

P 11 22

N 29 1 MLP 4

P 17 16

N 26 4 MLP 5

P 9 24

71

These matrices proved that while some MLPs are successful at malfunction

detection, others are good at separating normal subjects. These properties of

simple classifiers support the idea that combining them increases the classification

accuracy. To test this idea, these MLPs are combined by different combining

rules. In this study three of these rules, sum, majority vote and max rules, are

experimented and results are compared by success rates on test data set.

According to these results, it can be concluded that the best individual

performance is produced by MLP6 and MLP7 in which entire data set is used for

training and testing purpose. However, as the dimension of the data and relatively

network size increase, complexity becomes an important drawback. Since it is

difficult to process a large set of data training time increases. However, smaller

MLPs which use only one feature vector produce less accurate results and

combining their outputs increase the accuracy reasonably.

In addition, combining outputs do not increase the accuracy in experiment 2 as

much as in the first one. Increasing the number of networks does not cause any

improvement after an optimum number, which is “three” in our experiment.

The combining rules show equal performance in experiment 2, but in experiment

1 sum rule is superior to others. Then, as complexities are considered combining

many small networks may be preferred when dealing with large dimensional data.

The confusion matrices suggest that further study is needed for the effectiveness

of the selected features.

4.4. A DECISION TREE-MLP MULTICLASSIFIER FOR GRADING

KNEE OA

This part of the study presents the ultimate algorithm behind OAGAIT clinical

decision support system for the detecting and grading of a knee OA. The objective

of this study is to design a classification algorithm to help physicians by

interpreting and further following the progress of OA. The accuracy of the

proposed system is expected to be improved compared to our previous work as

discussed above by using symptoms and history in addition to numeric gait data,

72

with a multi-classifier approach. In the previous studies, selected knee joint angle

features are used to train a single neural network, namely a 3 layer perceptron

(MP) and an 89% success rate was achieved for classification of healthy and sick

patterns [38]. In the next stage of the study, MLP’s with identical topology are

trained by different feature sets and the outputs are combined by fixed combining

rules. This time the success rate of the classifier reached to 94%, for again binary

classification [39], which suggested that the use of combination classifiers may be

used in further work.

Sociodemographic and disease characteristics such as age, body mass index and

pain level are also included in decision making. A grade of the OA (0-3, including

normal with grade zero) is sought. The grade of the disease is already determined

by radiographic methods.

Different types of classifiers are combined to incorporate the different types of

data and to make the best advantages of different classifiers for better accuracy. A

decision tree is developed with Multilayer Perceptrons (MLP) at the leaves. This

gives an opportunity to use neural networks to extract hidden (i.e., implicit)

knowledge in gait measurements and use it back into the explicit form of the

decision trees for reasoning. The approach is similar to the Mixture of Experts

method since different expert MLP’s are used for discriminating different grades

(our categories) of the disease. Individual feature selection criterion is used with

MD measure for feature selection and most discriminatory features are used for

each expert MLP.

Automatic feature selection from many numerical gait parameters is another

subject that’s not studied well before. In this experiment automatic feature

selection process produced some valuable results for further analysis of the

progress of OA.

The main stages of this experiment are shown by a flowchart in Figure 4.5. The

processes followed within these stages are given in detail in following sections.

73

Figure 4.5: Flowchart for classifier design process

74

4.4.1. Preprocessing of data

As explained in detail in chapter 2 of the report, the gait data is mainly composed

of three different sets of data. The first subset (Set A) of the data is about

symptoms and history of the subjects and defined as:

A = age, BMI, pain, stiffness, period, history, sex

In preprocessing steps, some of these variables are converted to numerical values

before they are used as features. For example, while weight and height attributes

are not used for classification purpose, body mass index (BMI), which is equal to

weight divided by the square of the height, is created as a new feature. Age of the

subject is calculated from date of birth, disease periods are converted to months as

unit. Pain and morning stiffness are numeric values between 0 and 10, family

history is a binary value indicating whether the same disease exist in family

history or not. Sex is another binary valued feature where 0 stands for women and

1 for men. The distribution of the subjects for these dataset are shown in Table

2.1 by summarizing min, max and average values of the features.

Table 4.9: Limits of the personal features

Normal subjects Patients

Features min max average min max average

Age 19 63 43 41 80 60

BMI 18 46 27 20 49 32

Pain 0 0 0 1 10 6,6

Stiffness 0 0 0 1 10 5,2

period (year) 0 0 0 0 30 6,6

75

The second set (Set B) of the data is composed of time distance parameters which

are gathered in one cycle of gait.

B = Cadence, Walking Speed, Stride Time, Step Time, Single Support, Double

Support, Stride Length, Step Length

The final subset of the data can be defined as the temporal changes of the joint

angles from four anatomical level and three motion planes (Set C) as below.

C = PelvicTilt, Pelvic Obliquity Knee Flexion, Knee Varus, …

These sets of data are grouped according to usage purpose of the data. First two

sets are combined for decision tree training and the third one is for MLPs training.

Before going to further steps the data sets are cleared by deleting the samples

having missing entries from the database. If possible, some missing values are

completed by using previous information of the subject.

4.4.2. Feature Reduction and Selection

The first and second sets of data (A and B) is combined and used for constructing

decision tree. Since the nature of the decision tree algorithms is based on the

selecting best feature and best split point in a top down fashion, an additional

feature selection is not applied to this set.

However at the leaves of the tree attributes from set C is used. Combining all

these attributes into a complete dataset, 33 (attributes) x 51(time samples)

dimensional arrays for each subject are obtained. Figure 4.6 shows the flowchart

for feature reduction and selection processes used in this experiment. The changes

in dimension of the datasets are shown on the lines by “dim” term to clarify the

process.

76

Figure 4.6: Flowchart for feature reduction and selection processes

77

A basic reduction technique is applied before selection as in previous trials. The

dimensions of all attributes are reduced from 51 to 5, by taking means of 10

consecutive time samples. The MD is used as a selection criterion.

Binary class datasets are created for selection of most discriminative gait

attributes. The reason for binary case is due to the expert MLP’s able to

discriminate successive categories, as will be explained later. Since the number of

samples in each case is about 130, the dimensions of the datasets are fixed to 30

by including 6 of attributes. Table 4.10 displays the classes that datasets include

and the levels of selected gait attributes for those datasets.

Table 4.10: Levels of the selected gait attributes for each binary class case (P: Pelvic, F: Foot,

H: Hip, K: Knee)

Attribute number Classes

1 2 3 4 5 6

0-1 F H K K P F

1-2 K K K H K K

2-3 H H F H H K

1-3 K K K P K K

0-2 F K H F K P

In the next stage, these gait attributes are used for creating input vector of the

related expert MLP. Multidimensional input vectors are created by combining six

best features to train MLPs. The selected feature list is revised by the expert

physician and the similar features are removed to prevent including highly

correlated features in the datasets. Then 6 of 33 features are selected for

composing datasets for each expert MLP. Stages of selection process are

summarized in Table 4.11.

78

Table 4.11: Selection of gait attributes using MD values

Calculate Mahalanobis distance matrix between classes in

dataset: the distance matrix between the class means,

sphered using the average covariance matrix of the per-class

centered data.

For i=1 to #gait_attributes

• Select ith gait attribute as feature set

• Create a two class dataset

• Calculate covariance matrix P

• Using the below formula find the Mahalanobis distances of class means (x, y)

• Write distance value to array Dm

End

• Sort Dm

• According to number of time samples of gait attribute select first 1,2,5 or 10 best attributes

• Create a new dataset with selected gait attributes

Automatic selection of these attributes gives some valuable information about

progress of the OA. For example while knee related attributes seen more

frequently in dataset composed of classes 1 and 2, hip related ones seem more

discriminative for classes 2 and 3. This shows that as the grade of the illness

increase hip angles are affected more. This kind of information is valuable for

clinical decision making and training physicians.

79

4.4.3. Combining classifiers

It is difficult to combine these features due to their diversified units, e.g.,

continuous variables, binary values, and discrete labels. Therefore, the

combination of multiple classifiers is a good solution for a problem involving a

variety of features. It is also important for an M.D. to understand how and/or why

the classifier makes its decisions rather than black-box solutions. Hence, a

classifier that’s able to do that should be aimed.

In this study a Mixture of Decision tree classifiers and Multilayer Perceptrons that

are experts for different regions of the feature space are used for classifying four

levels (0 to3) of the knee OA. In that aspect, the algorithm is similar to the

‘mixture of experts (ME)’ approach in the literature [45]. ME algorithm is based

on the principle of’ ‘divide and conquer’ in which a large, hard to solve problem

is divided into many smaller, easier to solve ones [42]. ME is a tree-structured

architecture for supervised learning and further for classification with the

participation of the experts in the final decision making. The ME architecture has

been proposed for neural networks [42]. The experts are neural networks, which

are responsible for a part of the feature space. The selector uses the output of

another neural network, namely ‘the gating network’. If the input to the gating

network is called X, then the output can be defined as a set of coefficients p1(x),

….,pL(x) where pi(x) is interpreted as the probability that expert Di is the most

competent expert to label the particular input x [17].

In this study it is aimed to train a decision tree with another subset y of the feature

space instead of training a gating network with the same input x. The output of the

decision tree is again a set of probabilities which are served to expert neural

network at that leaf as prior probabilities of the classes. The neural networks at

each leaf are the experts for classifying the subjects of type falling in that leaf.

Different experts are created by running the feature selection algorithm at each

leaf and designing different structured neural networks according to these selected

features. So one of the networks is responsible for categorization of, for example,

first and second classes, the other is responsible for third and fourth classes.

80

Decision tree classifiers are widely used for building classifier ensembles [42, 56,

58]. Binary features and features with small number of discrete values are

especially useful for the purpose since the decision can be easily branched out.

Since distance is not easy to formulate when the objects are described by

categorical or mixed-type features, the decision trees are regarded as nonnumeric

methods for classification [42, 59]. Using decision trees for clinical medical

decision making problems [59] is a popular approach since they are cost effective,

easy to implement and they have descriptive quality, which makes them

advantageous over black-box approaches such as ANN. It is difficult to

incorporate a neural network model into a computer system in an explicit form

since inner formulation and calculations are not shown to the user. In contrast,

once a decision tree model has been built, it can be converted to if…then…else

statements that can be implemented easily in most computer languages without

requiring an additional effort [59, 60].

The four grades of OA are defined by Kellgren grade which is based on the

radiographic assessment of the joint space narrowing. The accuracy of the

proposed system will be safeguarded by using all feature sets A, B, C above with

Kellgren grade-labeled subjects.

The learning and classification processes consist of two stages. In the first stage a

decision tree is trained by using data set A and B. In the second stage, the samples

falling at each leaf is analyzed for feature selection and an expert MLP is trained

by composed datasets using attributes from set C to classify the data into one of

the two categories 0-1, 1-2 etc. Figure 4.7 shows the steps of combining and

training of classifiers with a flowchart.

81

Figure 4.7: Flowchart for classifier combining

82

4.4.4. Training and testing

The training of our combined algorithm has two divisions, decision tree training

and training of MLPs, as shown in Figure 4.7. Decision tree training is different

than training of many other classifiers for which a topology is created first and

then training data is presented. Decision tree construction and training processes

can not be considered separately. Tree is constructed according to the training data

set by using some predefined criteria. These criteria like splitting and stopping

criteria are important for constructing best tree representing the training set and

avoiding overfitting.

The basic idea of tree learning is to choose a split among all the possible splits at

each node so that the resulting child nodes are pure enough. In our algorithm, only

univariate splits are considered. That is, each split depends on the value of only

one feature variable. All possible splits consist of possible splits of each feature. If

X is a nominal categorical feature of I categories, there are 2I-1 - 1 possible splits

for it. If X is an ordinal categorical or continuous feature with K different values,

there are K - 1 different splits on X. A classification tree is grown starting from the

root node by repeatedly using the following steps on each node.

1. Find each feature’s best split: For each feature, sort its values from the

smallest to the largest. For the sorted feature, go through each value from

top to examine each candidate split point (call it v, if x <= v, the case goes

to the left child node, otherwise, goes to the right.) to determine the best.

The best split point is the one that maximize the splitting criterion when

the node is split according to it.

2. Find the node’s best split: Among the best splits found in step 1, choose

the one that maximizes the splitting criterion.

3. Split the node: Split the node by using its best split found in step 2 if the

stopping rules are not satisfied.

83

At node, the best split is chosen to maximize a splitting criterion. This splitting

criterion may be Gini impurity, misclassification impurity etc. For classification

trees the impurity is defined with the Gini index of diversity [42].

Stopping and pruning criteria are also important for tree construction. Stopping

rules control if the tree growing process should be stopped or not. In this study the

number of samples in a node is restricted to be at least 10 so the node is not split

any more.

Pruning step of the tree construction is done by considering structure of our

combination. Means, we applied a method to prune the tree that each leaf has

samples from two classes. In next stage of the combination, MLPs responsible for

discriminating these two classes are replaced to the related leaf. This approach

changed the dimension of the problem from a recognizer for four-categories to

three recognizers with 2 categories, using expert neural networks for

discriminating neighbor classes.

The basic classifiers structure used in the leaves of the decision tree are MLPs.

Three- layered (one hidden layer) MLPs are trained by different input vectors.

These input vectors are trained by automatically selected gait attributes, different

for each leaf as mentioned above. So, they are assumed to be experts in the region

of the binary decision of the category.

The trained MLPs are placed at the leaves of the trained decision tree by class

matching. Figure 4.8 summarizes the proposed combination in a simplified form.

Where;

• Y = y1, y2, ... yn is the union of set A and B above

• T = t1, t2, ... tn is the set of corresponding threshold values for above,

used for composing tree.

• X= x1, x2, … xm is the set of datasets composed by selected attributes of

set C and presented to the expert networks

84

Figure 4.8: Proposed example combination in a simplified symbolic representation.

Implementation of the algorithm and analysis of the results are explained in detail

in next chapter of the report.

Testing of the MLPs is done by crossvalidation method. By this method, the set is

randomly permutated and divided in N (almost) equally sized parts. The classifier

is trained on N-1 parts and the remaining part is used for testing. This is rotated

over all parts. Average error rate and standard deviation is returned as the result.

In this experiment tenfold crossvaldiation is used a short pseudo code is given in

Table 4.12.

Table 4.12: Tenfold Crossvalidaton testing

For i=1 to iteration count

• Separate 1/10 of the samples for testing randomly

• Train the related MLP with rest of the samples

• Test the trained MLP with separated test samples

• Record the error to an array call Err End

• Produce the mean of values of Err and standard deviations as the result of the testing process

x1 x2 x3 xm

yi<ti

yj<tj yk<tk

Expert MLP

Expert MLP

Expert MLP

Expert MLP

85

CHAPTER 5

IMPLEMENTATION AND RESULTS

5.1. INTRODUCTION

This chapter is about the implementation of the classification algorithms and

results. As explained in previous chapter the classification algorithm is created by

combining a decision tree with a number of MLPs. First a decision tree is created

by using symptoms and history of the patients and time-distance parameters

together. Then MLPs are trained with different feature sets of gait data to make

them experts for discriminating different grades of the illness. Finally, trained

MLPs are replaced at the leaves of the tree and the algorithm is tested by unseen

data.

5.2. CREATING THE DECISION TREE

A data set having about 40 subjects from each category (0, 1, 2, 3) is composed

for training the decision tree. A tree is fitted to the composed dataset by using

treefit property of the MATLAB Statistical Toolbox. Decision tree is constructed

in top-down fashion with binary splits, where each node checks a numerical value

of a single feature. A termination criterion at a node could be that all objects be

labeled as belonging to the same category. Unfortunately, this is an ideal situation

where the number of samples and features should represent the problem perfectly,

86

which is not valid for the data at hand. Here it is aimed to continue until samples

from 2 categories are left. The steps of our tree construction algorithm are shown

in Table 5.1:

Table 5.1: Tree construction algorithm

1. Assign all objects to root node.

2. Split each feature at all its possible split points.

3. For each split point, split the parent node into two

child nodes by separating the samples with values

lower and higher than the split point for the

considered feature.

4. Select the feature and split point with the highest

reduction of impurity.

5. Perform the split of the parent node into the two

child nodes according to the selected split point.

6. Repeat steps 2–5, using each node as a new parent

node, until the tree has maximum size.

7. Prune the tree back using cross-validation to select

the optimal sized tree.

Since we are constructing a classification tree, Gini index impurity is used at step

4 of the algorithm. Consider a c-class problem with Ω = w1, w2, ..., wc. Let Pj

be the probability for class wj at a node t. Gini impurity is defined as

∑=

−=c

j

jPti1

21)( (Equation 5.1)

87

For the most pure case i(t) = 0. The highest impurity in the case of uniform

distribution is i(t) = (c-1)/c.

A set of possible stopping criteria are explained in Section 3.2 of the report. In

this implementation the method of “setting a threshold value for the number of

samples at a node” is used. The number of samples in each node is limited to 10.

An example tree before pruning is shown in Figure 5.1.

88

Figure 5.1: An example to created decision tree (no pruning)

89

Pruning is also applied when composing the decision tree. As explained in section

3.2, pruning is a significant step for decision tree construction to avoid overfitting

of the tree to training data. The tree is pruned based on an optimal pruning scheme

that first prunes branches giving less improvement in error cost. To determine the

best size of the tree, it is tested by crossvalidation approach as shown in Table 5.2:

Table 5.2: Pruning algorithm

1. Partition training data in “training” and “validation”

sets.

2. Build a complete tree from the “training” data.

3. Until accuracy on validation set decreases do:

a. For each non-leaf node, N, in the tree do:

b. Temporarily prune the subtree below N and

replace it with a leaf labeled with the current

majority class at that node.

c. Measure and record the accuracy of the pruned

tree on the validation set.

4. Permanently prune the node that results in the

greatest increase in accuracy on the validation set.

This algorithm pools the information from all subsamples to compute the cost for

the whole samples. Applying this method to our tree shown in Figure 5.1 we got a

graph showing the cost versus number of final nodes. The cost value in this graph

represents the misclassification rate for classification trees. The pruning of the tree

is done considering the optimum number of terminal nodes shown in Figure 5.2.

90

Figure 5.2: Cost (classification error) versus tree size

Cost graph hints that the number of terminal nodes (leaves) should be chosen

around 2-6 for minimizing the classification error. The tree shown in Figure 5.1 is

pruned 3 times by cost reduction algorithm, and once manually. The aim of

manual pruning is to leave samples from two categories at each leaf.

Unfortunately the desired situation of leaving only two categories at each leaf is

not satisfied fully. Those small numbers of samples at a leaf after pruning will be

ignored by the MLPs.

5.3. FEATURE REDUCTION AND SELECTION PROCESSES

As previously explained gait attributes are represented by 51 time samples.

Consecutive time samples are averaged before training, so each is represented by

five time samples as shown in Figure 5.3. In a previous trial two different feature

reduction techniques are compared first by the MD criterion and then by

performances of the classifiers (section 5.1). It is observed that datasets created by

FFT techniques produced worse results than averaged datasets in MD calculations

and also in the classification process. This study showed that number of time

91

samples and number of features should be optimized for better accuracies. In

Figure 5.4 graphs derived from this study are shown; dimension of the datasets are

represented by (#of_time_samples) x (#of_gait_attributes). Also, the optimum

ratio of the number of samples and feature size was tried to reach by limiting the

number of selected gait attributes to six.

Figure 5.3: Averaging consecutive time samples for feature reduction

MD based feature selection was implemented after reduction in time samples. The

automated feature selection method is compared with the manual feature selection

done by expert physician in a previous trial (section 5.3). A high match is

recognized between the automated selected features on the basis of the MD

criterion and the ones suggested by the gait analysis expert. This result

encouraged us for using MD criterion in further stages of the study.

92

0

0,02

0,04

0,06

0,08

0,1

0,12

1x33 2x25 5x10 10x5 25x2 51x1

dimension of the dataset

misclassification rate

0,085

0,09

0,095

0,1

0,105

0,11

0,115

0,12

1x33 5x10 10x5 25x2

dimension of the dataset

misclassification rate

Figure 5.4: Dataset dimension versus misclassification rate (a. averaged datasets, b. FFT applied datasets)

As a result of selection process the features shown in Table 5.3 are obtained.

Different attributes are effective at different stages of OA, as seen in the table. The

parts of the body that the gait attributes are related are important for deriving

information about progression of the knee OA.

Grades 0-1: Most discriminative gait attributes for classification of these grades

are from different parts of the body. This can be commented as in early grades of

the disease a severe deformation in knee joint does not exist.

Grades 1-2: Five of the selected gait attributes for discrimination of these classes

are related to knee joint. This proves that if the grade of the disease progress from

1 to 2, the knee joint is affected seriously. One hip related feature may be the

indicator of the early deformation of this joint.

93

Grades 2-3: The results of the selection process for these grades are really

remarkable. Since the number of hip related gait attributes is higher than the knee

related ones for these grade levels, it can be concluded that at advanced levels of

the disease different parts of the body other than knee are also affected. Medically,

it would be incorrect to say that the same disease is also seen in other joints of the

body in high grades but it started to affect the other joints. It is known that most

patients living with the same disease along time may change their postures to

compensate some bad effects of the disease like pain [20, 26, 29, 30]. That may be

why patients with high grade of OA have abnormal hip joint patterns.

Grades 0-2 and grades 1-3: Selected attributes for these grades are not used for

classification purpose but added here to comment about the progress of the

disease. It is seen that most of the attributes for grades 0-2 are common with the

ones for grades 0-1, which shows equal deformation in different parts of the body.

Similarly the attribute for grades 1-3 are common with the ones for grades 1-3

which shows deformation of the knee joint more than the others.

Table 5.3: Selected gait attributes for each two-class case (P: Pelvic, F: Foot, H: Hip, K: Knee, Flex: Flexion, M: Moment, Tot: Total, Dor: Dosrflexion, Rot: Rotation, Val: Valgus,

Obliq: Obliquity, Adb: Abduction)

Classes Selected Gait Attributes

0-1 F.M.Dor. H.Flex K.M.Flex K.Flex P.Tilt F.Rot

1-2 K.Flex K.M.Flex K.P.Flex H.P.Tot K.P.Tot K.Val

2-3 H.P.Abd H.Flex A.P.Dor H.Rot H.P.Flex K.Val

1-3 K.P.Flex K.P.Tot K.Flex P.Obliq K.M.Rot K.Val

0-2 F.M.Dor K.Flex H.Flex F.Dor K.Rot P.Tilt

94

5.4. IMPLEMENTATION OF MLPS

A set of classifiers are compared by same dimensional datasets. It is concluded

that nonlinear classifiers performed quite well and better than the linear ones.

Backpropagation Neural Network (bpxnc) and Radial Basis Support Vector

Machine (rbsvc) classifiers produced best generalization accuracy by almost all

datasets. Comparing learning rate of the classifiers it was concluded that more

data is needed to increase the performances of the linear classifiers. These results

formed the direction of the study towards using MLPs in further stages.

In second trial of the study combining classifiers approaches are investigated.

Different combination schemas for combining a group of MLPs are experimented.

MLPs are used to classify the subjects as healthy or sick, using temporal changes

of knee joint angle and time-distance parameters as features. Two different

combination methods are tried. In the first experiment five MLPs are trained by

different subsets of the feature space, and in second one three MLPs are trained by

entire feature set. Then the outputs are combined by sum, majority vote and max

rules to produce final class labels. These two experiments show that using entire

data set produces more accurate results than using decomposed data sets, but

complexity becomes an important drawback. However, when a proper combining

rule is applied to decomposed sets, results are more accurate than entire set. So,

for design of the final classification algorithm expert MLPs are trained by

different subsets of the feature set.

Almost 60 subjects from each category are used for composing datasets for MLP

training. Three datasets are created by the selected gait attributes for training three

MLPs. These MLPs are responsible for discrimination of classes 0-1, 1-2 and 2-3

respectively. These MLPs are trained by bpxnc property of the PRTools [27]. This

function creates a feedforward neural network and uses Backpropagation

algorithm for training. MLPs have three-layer structures with binary outputs. They

are tested by crossvalidation approach and an average error rate for each is

gathered as shown in Table 5.4.

95

Table 5.4: Classification errors of MLPs

MLP Classes (grades) Classification error

MLP1 0-1 % 11

MLP2 1-2 % 16

MLP3 2-3 % 21

These MLPs are also tested by receiver operating characteristic (ROC) curve,

which are shown in Figure 4 where x and y axis represent the error of the first and

second classes respectively. ROC curve is a graphical plot of the sensitivity vs. (1

- specificity) for a binary classifier system as its discrimination threshold is

varied, where;

negativesfalseofnumberpositivestrueofnumber

positivestrueofnumberysensitivit

______

___

+=

positivesfalseofnumbernegativestrueofnumber

negativestrueofnumberyspecificit

______

___

+=

The ROC can also be represented by plotting the fraction of false negatives vs. the

fraction of false positives as in Figure 5.5. For MLPs setting a threshold values for

posterior probability values determine a point on the ROC curve. Plotting these

points for each possible threshold value creates a curve.

96

Figure 5.5: ROC curves for MLPs

Error rates and ROC curves of MLPs show that discrimination of some classes are

difficult than the others. In ROC curves, the smaller area under curve (AUC)

shows better classifier. In graph we see that the AUC of MLP discriminating

classes 0-1 is the smallest and the one discriminating classes 2-3 is the greatest.

Then it can be commented that, as the grade of the illness increase the

discrimination power of the gait patterns decrease. Therefore, for discrimination

of these high grades of the disease some more details about these grades may be

added to the classification algorithm. For instance, ignored gait attributes may

also be contributed to the classification process if the number of samples

increases.

5.5. RESULTS

In the second stage of the classification process, expert MLPs are placed at the

leaves of the tree. Figure 5.6 shows an example of the composed tree pruned to

level 3 end corresponding MLPs at leaves of it.

97

Figure 5.6: Composed decision tree by prune level 3

Table 5.5 shows the number of samples at each leaf of the tree. The signed ones

represent the samples at unexpected leaf; means the MLP at that leaf is not

responsible for detection of that class. For example 2 subjects from class 3 placed

at leaf2, but MLP2 placed at that leaf is responsible for detection of classes 1-2.

Summing up the samples at unexpected leaves we got a %95 training error for

construction of decision tree.

98

Table 5.5: Distribution of samples in decision tree

Leaves: L1 L2 L3 L4 L5 L6

Used MLP:

MLP1 (0-1)

MLP2 (1-2)

MLP2 (1-2)

MLP3 (2-3)

MLP2 (1-2)

MLP3 (2-3)

Total Error

Class-0 40 0 0 0 0 0 0

Class-1 7 11 12 2 7 1 3

Class-2 0 14 8 11 2 5 0

Class-3 0 2 2 13 1 22 5

Total 47 27 22 26 10 28 8

To calculate the overall error rate of the combination, MLP errors and DT error

are used in a formulation. Following code segments summarizes the steps of

calculating error rates for detection of each level of the OA.

For level = 0 to 3

For leaf = 1 to #leaves

[ ] leafinMLPofrateerrorlevelfromsamples

leafinlevelfromsampleslevelError _____

__

____×=

∑∑

End

End

The error rate of the decision tree and MLPs are combined in a different fashion

by this formulation. The tree is constructed by equal number of samples (which is

99

40) from each level of the illness to objectively calculate the error rates.

According to above formulation, if a sample is sent to wrong leaf of the tree,

means there is no MLP to classify it, then the error rate is taken as 1 in that leaf.

Confusion matrices created by training and test data as shown in Table 5.6 and

Table 5.7 give more detailed information about success rate of the combination.

Table 5.6: Confusion matrix for combination (training data)

Estimated Classes

0 1 2 3 total error rate

0 39 1 0 0 40 0,025

1 1 34 2 3 40 0,15

2 0 1 37 2 40 0,075

Actual classes

3 0 5 2 33 40 0,175

total 40 41 41 38 160 0,10625

For creation of Table 5.6 data used for training of the combination is presented to

it and the numbers of misclassified samples are detected and almost %90

classification rate is achieved. Since the MLPs were trained with the same data the

contribution of MLP errors on this total error is very small. The reason for most of

these misclassified subjects is that they have been assigned to wrong expert MLP

in decision tree. This wrong assignment is most probable because of pain level

which is a subjective feature determined by subject himself. For example, a

subject from grade 1 can determine his pain level as 10 (max value for pain level)

while the other from grade 3 can say 3. These kind of subjective features are not

preferred in classification processes but experts and medical studies in literature

show that pain level is one of the important indicators of the selected illness, so

added to datasets here.

100

Table 5.7: Confusion matrix for combination (test data)

Estimated Classes


0 18 2 0 0 20 0,1

1 1 15 2 2 20 0,25

2 0 1 17 2 20 0,15

Actual Classes

3 0 2 4 14 20 0,3

total 19 20 23 18 80 0,2

Then the algorithm is tested by an unseen dataset composed of 20 samples and

classification rate of %80 is achieved. Since the system is tested by unseen data

the generalization rate reduced from %90 to %80. Because this time the errors

from MLPs are also effective in error calculation.

Finally, for comparing the success of our combination schema with a single

classifier a four-class neural network is trained. A three layered MLP, call it

MLP4, is created having 50 units in input, 10 units in hidden layer and 2 outputs

to discriminate four classes. The same feature reduction and selection processes

are applied to this new dataset, but the number of inputs is increased to 50.

Crossvalidation testing is applied during training. The estimated labels are

gathered as an output of crossvalidation algorithm. Table 5.8 shows the confusion

matrix for MLP4.

Table 5.8: Confusion matrix for MLP4

Estimated classes


0 27 7 5 1 40 0,325

1 3 24 7 6 40 0,4

2 3 8 21 8 40 0,475

Actual classes

3 3 5 12 20 40 0,5

Total 36 44 45 35 160 0,425

101

The classification rate of the MLP4 which is about %58, proved us that using

different expert for different part of the feature space and then combining them

produced reasonable better result than using a single multi-class classifier.

5.6. CLASSIFICATION EXAMPLES

In medical applications, physicians want to see some reasons for class

assignments rather than just learning the results that software produced. The

proposed combination is designed to produce class labels of the new subjects and

the reasons for this assignment. Table 5.9 shows examples of reasoning procedure

which is created by tracking the nodes of the tree in Figure 5.6 and checking

related gait attributes. The physician is able to see which controls are done to

subject for detecting grade of his disease by using this table. The affected gait

attribute field decreases the number of graphs from 33 to 6 that physician may

need to analyze. These are important for treatment planning and supplying

immediate feedback to the subject.

The misclassified examples are also added to the table to be able to discuss

reasons for wrong classification. The reason for most of these misclassified

subjects is that they have been assigned to wrong expert MLP in decision tree.

This wrong assignment is most probable because of pain level which is a

subjective feature determined by subject himself. For example, a subject from

grade 1 can determine his pain level as 10 (max value for pain level) while the

other from grade 3 can say 3. These kind of subjective features are not preferred

in classification processes but experts and medical studies in literature show that

pain level is one of the important indicators of the selected illness, so added to

datasets here.

102

Table 5.9: Examples of the test set classification

103

5.7. OVERALL ANALYSIS OF THE RESULTS

• Comparing accuracy rate of our algorithm with the manual grading done

by expert physicians it can be summarized that; the physicians can detect

the knee OA by evaluating gait data only. Physicians use radiographic

imaging techniques to grade severity of knee OA and use Kellgren-

Lawrence score as gold standard. There is no study about using gait data

for grading of the OA. This study represents a new approach for

estimating Kellgren-Lawrence grades of the subjects by using only gait

data with an accuracy rate of 80% which is an acceptable rate to use it as a

clinical test.

• The causes and compensatory effects of the knee OA has been searched by

some studies before [20, 26, 29]. They concluded that patients with OA of

the knee joint often adapt a gait for alleviating pain, so the motion of the

other joints may be affected [26]. But it is not known if gait adaptation is

mainly related to the severity of the disease [26]. Our data analysis process

indicated relations of severity of the OA and the joints affected by gait

adaptations. We proved the hypothesis of [26] that reduced motion of the

knee joint would be compensated by an increased motion of the hip joint.

• The classification success of the implemented combining classifier is

proved by comparing the generalization accuracy of it with a single

uncombined MLP. It can be concluded that our algorithm performed

significantly better than it and supplied some additional advantages like

reasoning etc. But, there are still some drawbacks about detection of high

grades of the disease. Actually, as the grade of the disease increase,

discriminating it from nearest grade becomes more difficult. Most

probable reason for this is patients’ progressive ability to compensate their

gaits by changing gait pattern of some joints. That is why hip joint

waveforms selected as more discriminative features than knee joint ones.

104

• If we discuss the misclassified examples in all levels of the disease, it is

seen that subjective features like pain contributes to misclassification rate

most. Since the high correlation between the severity of the OA of the

knee and pain level is detected by many studies [20, 26], it is included in

classification process.

105

CHAPTER 6

OAGAIT DECISION SUPPORT SYSTEM

6.1. CLINICAL DECISION SUPPORT SYSTEMS

Clinical (or diagnostic) decision support systems (CDSS) can be defined as

interactive computer programs assisting physicians and other health professionals

with decision making tasks [72].

The basic components of a CDSS include a medical knowledge and logical rules

derived from experts. There are many computer applications designed to be a

CDSS. Programs that perform database search or check drug interactions support

decisions, but usually they are not called CDSS. In [73] a CDSS is defined as a

program that supports a reasoning task, implemented behind the user interfaces

and based on the clinical data. For example, a program that takes the laboratory

results as inputs and generates a list of possible diseases is recognized as a clinical

diagnostic decision support system (CDDSS). General purpose programs

accepting clinical findings and generating diagnostic results are typical CDDSSs.

These programs use numerical, logical or artificial intelligence techniques to

convert clinical data to the information that a physician might use for diagnostic

reasoning.

The use of artificial intelligence in medicine started in the early 1970's and

produced a number of experimental systems [73]. INTERNIST I was one of the

106

first CDSSs, designed to support diagnosis. It was a rule-based expert system

designed at the University of Pittsburgh in 1974 for the diagnosis of complex

problems in general internal medicine. It uses a tree-structured database that links

diseases with symptoms. Most valuable product of the system was its medical

knowledge base which was used as a basis for successor systems. [74]

MYCIN was another rule-based expert system designed to diagnose and

recommend treatment for certain blood infections. Clinical knowledge in it is

represented as a set of IF-THEN rules. It was a goal-directed system, using a basic

backward chaining reasoning strategy. It was developed in Stanford University.

The EMYCIN (Essential MYCIN) expert system shell, employing MYCIN's

control structures was developed at Stanford in 1980. This domain-independent

framework was used to build diagnostic rule-based expert systems [73].

PIP, the Present Illness Program, was a system built by MIT and Tufts-New

England Medical Center in the 1970s. They gathered data and generated

hypotheses about disease processes in patients with renal disease [73].

The review studies to evaluate the effects of computer based CDSSs on physician

performance and patient outcomes have concluded that CDSSs developed in 70s

as summarized above can improve clinical performance for drug dosing,

preventing care and other aspects of medical care, but not too convincingly for

diagnosis. On the other hand, legal issues such as who would be responsible as a

result of misdiagnosis also prevented these systems to be commercially accepted.

In [73] factors that affect the acceptance and use of CDSSs in clinical practice are

defined as follows.

• Cost

• Degree of user acceptance prior to and after installation

• Ease of use

• Interoperability: Integration with existing systems (hardware, other

devices) and existing software programs (integration with patient record

and/or any relevant clinical terminologies)

107

• Ease of integration within organizational context and routine

• Legal and ethical issues

• User interface: design, structure, number of forms

• Style, manner of presentation of advice/ recommendations/ results to user

• Provision of evidence justifying advice and/or recommendations

• Involvement of local users during development phase

Today, positive aspects of medical experts about computer usage in clinical

applications, need of rapid access to recent information and need for time saving

increases the number of commercialized CDSSs. Other potential benefits of using

electronic CDSSs in clinical practice are grouped in three broad categories [76]:

1) Improved patient safety

a) Reducing medication errors

b) Improving medication and test ordering

2) Improved quality of care

a) Increasing clinicians time for patient care

b) Providing immediate feedback to the patient

c) Reducing variations in quality of care

d) Increasing application of clinical pathways and guidelines

e) Facilitating the use of up-to-date clinical evidence

f) Improving the clinical documentation and patient satisfaction

3) Improved efficiency in health care delivery

a) Reducing the cost by faster processing after initial capital cost

b) Reducing the test duplications

108

DXplain, QMR, ERA and ATHENA are good examples to commercialized

successful systems originating after 80s [73, 75, 77]. DXplain uses a set of clinical

findings (signs, symptoms, and laboratory data) to produce a ranked list of

diagnoses which might explain the clinical manifestations. It provides justification

for why each of these diseases might be considered, suggests what further clinical

information would be useful to collect for each disease, and lists what clinical

manifestations, if any, would be unusual or atypical for each of the specific

diseases [77]. DXplain includes 2,200 diseases and 5,000 symptoms in its

knowledge base. It is developed by Laboratory of Computer Science,

Massachusetts General Hospital, and Harvard Medical School.

QMR has a knowledge base composed of diseases, diagnoses, findings, disease

associations and lab information. It includes information about almost 700

diseases and more than 5,000 symptoms, signs, and labs. It was designed for 3

types of use: as an electronic textbook, as an intermediate level spreadsheet for the

combination and exploration of simple diagnostic concepts and as an expert

consultant program [75]. It is developed by the University of Pittsburgh and First

Databank in California in 1980.

The ATHENA DSS implements guidelines for hypertension, encourages blood

pressure control and recommends guideline-concordant choice of drug therapy. It

is designed to allow clinical experts to customize the knowledge base to

incorporate new evidence or to reflect local interpretations. It has a independent

database so can be integrated into a variety of electronic medical record systems.

6.2. PROPERTIES OF A GOOD CDSS

The implementation of effective CDSS is a challenging task that should involve

interactions between technologies and organizations. There are no obvious

solutions to guarantee success or to avoid failure in this complex process. There

are many factors to reduce errors or to improve health processes, so measuring the

effectiveness of decision support systems is difficult. So, evaluation studies of

CDSSs have typically aimed to measure the impact of a system on a limited part

109

of the process [73]. Evaluated systems are mostly designed for providing support

for diagnosis, disease management, drug management or preventive interventions.

Other evaluation topics have included the impact of a system on the quality of

decision making, impact on clinical actions, usability, integration with workflow,

the quality of the clinical advice offered. The cost effectiveness of CDSSs and

their ability to help improve clinical outcomes have been infrequently evaluated.

In a review of computer based systems, most (66%) significantly improved

clinical practice, but 34% did not [78]. There is little scientific evidence to

explain why systems succeed or fail. Some researchers have tried to identify the

system features most important for improving clinical practice by relying on

opinion of a limited number of experts. In [79] the authors systematically

reviewed the literature published up to 2003 to identify features of CDSSs critical

for improving clinical practice. Table 6.1 shows the 15 features of CDSSs derived

from this study.

Table 6.1: Features of a good CDSS [79]

General system features

Integration with charting or order entry system to support workflow integration

Use of a computer to generate the decision support

Clinician-system interaction features

Automatic provision of decision support as part of clinician workflow

No need for additional clinician data entry

Request documentation of the reason for not following CDSS recommendations

Provision of decision support at time and location of decision making

Recommendations executed by noting agreement

110

Table 6.1 (cont.)

Communication content features

Provision of a recommendation, not just an assessment

Promotion of action rather than inaction

Justification of decision support via provision of reasoning

Justification of decision support via provision of research evidence

Auxiliary features

Local user involvement in development process

Provision of decision support results to patients as well as providers

CDSS accompanied by periodic performance feedback

CDSS accompanied by conventional education

6.3. FEATURES OF OAGAIT

OAGAIT is designed as a CDSS for grading of the knee OA. On the other hand it

has significant differences from commercialized CDSSs explained in examples. It

has a small knowledge base and database about knee OA but not for all gait

disorders. Its implementation is done by a small amount of money as a part of

academic research project.

A grading method is implemented by a combined pattern recognition system and

embedded into the OAGAIT system. User friendly interfaces are designed for

different modules. The main objective of the OAGAIT is to help physicians for

grading of an already diagnosed disease to ease the treatment planning. Moreover,

some other functions are designed to guide gait analysis process. As discussed in

previous chapters, various structured data such as personal information, time-

distance parameters are collected in the gait laboratory for each subject. Some of

these are stored in paper files. So it used to be difficult to access and combine data

111

for processing. A complete database is integrated to OAGAIT system to keep it

together and to access easily when needed.

In our gait laboratory, the commercial software (VICON) used allows gait data to

be saved as MS Excel file. These files show the time-distance parameters of the

gait and temporal changes of the joint angles and their graphs. An excel file of a

patient report is shown in Appendix A as an example. Electronic interfaces are

created for entering this information to the database. When the database is opened

the first form to be filled by the user is the patient record form, which is shown in

Figure 6.1.

Figure 6.1: Patient recording form

Besides these, patient tracking forms are created to automate some often used

queries. For example Figure 6.2 shows the query results for a patient’s time

distance parameters and personal information. These types of query forms are

important for database user to learn the number and the date of the experiments of

112

the selected subject. Moreover, the physician can compare the changes in the gait

parameters by selecting the different experiments from the list.

Figure 6.2: Patient tracking form

Another form supplied by the database system is WOMAC entry form. This is a

validated test designed specifically for the assessment of lower extremity pain and

function in OA of the knee or hip [27, 29]. Figure 6.3 shows the WOMAC form

of the OAGAIT system.

113

Figure 6.3: WOMAC form

These forms and database tables are used for the main purpose of the system;

grading of the OA. An explanatory and easy to use grading screen is designed for

grading results of the system (Figure 6.4). The grading algorithm will be

explained in more detail in further sections of the report.

114

Figure 6.4: Grading screen

6.4. OAGAIT DATABASE

A database with five main tables is used for OAGAIT system. Figure 6.5 shows

these and their relations. The relations of the table are constructed by the used

database system automatically. These relations are important for consistency of

the data, and show the database's structure of how this data is arranged.

The ID entries of the tables represent the unique numbers assigned for the subjects

and construct the relations of all tables. For the tables containing information

115

about gait experiment the relation is also provided by experiment ID (expID)

entry.

Figure 6.5: ER diagram for OAGAIT database

Some of these tables are created for provision of integration of VICON system

and OAGAIT system, and some are designed for storing paper based data on a

computer based system. Detailed explanation of the tables and entries are

presented in this section.

Table “gait_data”: This table is the main table storing temporal variables of the

gait and provides integration of VICON system and OAGAIT system. It is

accessed by both patient tracking and grading functions. Patient record function

writes data to the table.

116

Figure 6.6: Table: gait_data

Entries: It has 38 entries as shown in Figure 6.6, 33 of those are related to

temporal gait variables and the rest is about experiment that the subject is walked.

The ID entry of this table is read from VICON system which assigns a numeric ID

to each subject during first visit. Experiment data (expDate) keeps the date of the

gait and used for deriving “age” feature of the subjects with date of birth entry of

another table. Experiment ID (expID) is another system assigned number for each

gait trial of the subject. So the experiments done in different times are stored by a

unique number and allow the patient tracking property of OAGAIT. Eorder is an

automatically generated number to show order of the time sample points for

temporal variables of gait. Side entry is a single character value (L stands for left

and R stands for right) shows which knee of the patient is affected by OA.

117

Table “womac_scores”: This table is responsible for keeping answers of the

subjects to the WOMAC questionnaire. The WOMAC form of the database

allows subject ID selection and writes his/her answers to table.

Figure 6.7: Table: Womac_results

Entries: This table has 20 entries as shown in Figure 6.7, 17 of which are related

to WOMAC questionnaire answers of the subjects. ID and expID entries are

primary key of the table. The entries from a1 to c17 are numbers taking values 0,

1, 2, 3, 4 or 5 according to answers. The system calculates the WOMAC scores of

the subjects by summing up these values and writes the results to “score” of this

table.

Table “time_dist”: Data of this table are entered by automatic file reading

function of the OAGAIT. The parameters are read from the VICON system and

written to the table by patient record function. It is accessed by patient tracking

and grading functions of the system.

118

Figure 6.8: Table: time_dist

Entries: This table stores the time distance parameters of the gait as shown in

Figure 6.8. It has 11 entries 8 of which are time distance parameters and first three

are about subject and experiment information as in other tables.

Table “personal_info”: This table is created for storing personal information of

the subjects which were saved in paper files before. Figure 6.9 shows the entries

and data types of the table.

Figure 6.9: Table: personal_info

Entries: This table has seven entries most of which are not used for grading or

tracking functions. Only the date of birth (dbirth) entry is used for deriving “age”

features of the subjects.

Table “measure data”: This table stores the measurements of the subjects that

are taken just before the gait. These measurements are used for calculation of

119

time-distance parameters and temporal variables of the gait by VICON system.

Left and right side of the subjects are measured separately and the initial

characters (“r” or “l”) of the entries represent these sides as shown in Figure 6.10.

The measurements are done by a laboratory expert who writes these numbers to a

paper form. Then this information is saved to the OAGAIT database by patient

record function.

Figure 6.10: Table: measure_data

Entries: This table has 20 entries, 17 of which are about measurements of the

subjects. The height and weight entries are used for calculation of BMI features of

the subjects. acheLevel and firstMovAche entries are used as pain and stiffness

features, respectively. These features together with BMI feature are used for

grading function of the OAGAIT system.

6.5. DATA FLOW DIAGRAMS

A data flow diagram (DFD) is a graphical representation of the "flow" of data

through an information system. A DFD is mostly used for the visualization of data

processing. A first level DFD is called context-level DFD which shows the

interaction between the system and the outside entities. This context-level DFD is

then "exploded" to show more detail of the system.

120

Figure 6.11 shows level 1 DFD of the designed decision support system.

Integration of VICON Clinical Manager and OAGAIT is also shown here.

Figure 6.11: Level 1 DFD of the OAGAIT for data recording

121

After recording the gait data to OAGAIT database, two main screens can be used

for tracking the patients and grading their illness. DFD for patients tracking

process is shown in Figure 6.12.

Figure 6.12: DFD for patient tracking process

Actually patient tracking is a database query function which enables the

comparison of two different gait experiments of the patient taken in different

times. So, the physician can analyze the recovery of the illness.

122

Figure 6.13: DFD for patient grading process

The grading process is mainly composed of two stages as seen in Figure 6.13.

Since the classification algorithm is created by combining two different

classifiers, these stages represent the testing of them by new gait data. Design of

the classification algorithm is detailed in further sections.

6.6. EVALUATION OF OAGAIT AS A CDSS

A comparison table is created for evaluation of OAGAIT system as a CDSS. The

features of a good CDSS suggested by Kuwamoto [79] are searched for OAGAIT

system and the existing features are explained shortly as shown in Table 6.2. It

can be seen that almost all features are supplied by OAGAIT system.

123

Table 6.2: Features of OAGAIT compared to the ones suggested in [79]

Features of a good CDSS Features of OAGAIT

General system features

Integration with charting or order entry system

to support workflow integration

Integration of OAGAIT and VICON system

Use of a computer to generate the decision

support

Fully computerized decision support

Clinician-system interaction features

Automatic provision of decision support as

part of clinician workflow

When the subject’s gait data is entered to the

database the grading info is automatically

displayed on the screen.

No need for additional clinician data entry All needed data is entered to the database

before processing

Request documentation of the reason for not

following CDSS recommendations

There is a additional notes entry in all forms

Provision of decision support at time and

location of decision making

The grading results are shown on the screen

just after the walking of subject

Recommendations executed by noting

agreement

Not applicable

124

Table 6.2 (cont.)

Communication content features

Provision of a recommendation, not just an

assessment

OAGAIT supply reasoning for the

assessment to help creation of treatment plans

Promotion of action rather than inaction The system should advise something rather

than a blank screen. OAGAIT produces most

probable two classes as a result rather than

“not classified” message.

Justification of decision support via provision

of reasoning

OAGAIT shows assessment stages to support

reasoning

Justification of decision support via provision

of research evidence

The decision tree property of OAGAIT

supplies research evidences for provision of

OA

Auxiliary features

Local user involvement in development

process

An expert physician is included in the

development process as both an knowledge

expert and end user

Provision of decision support results to

patients as well as providers

Physician is responsible for delivering the

results to the patients

CDSS accompanied by periodic performance

feedback

Not applicable

CDSS accompanied by conventional

education

A short training is given to the physicians

and/or other laboratory staff

125

CHAPTER 7

CONCLUSION AND FUTURE DIRECTIONS

7.1. PROPERTIES OF OAGAIT SYSTEM

Within the scope of this study, a CDSS was implemented to help physicians for

grading and further analysis of the knee OA. Main function of OAGAIT is to

interpret gait data almost as close as an expert’s. This interpretation is done by

using expert knowledge on gait and other features in pattern recognition. Main

features of the implemented CDSS can be summarized as following:

• OAGAIT supports the function of radiographic films for grading of OA

and/or other diseases.

• OAGAIT is a fully computerized system, so incorrect decisions by experts

as a result of non-experienced interpretations is minimized.

• It provides a base for patient follow up in time, which helps physicians to

make more accurate treatment plans.

• It provides a graphical representation (the decision tree) of the grading

process which brings a description to the decision in addition to

classification.

126

• OAGAIT has a complete gait database integrated with the data collection

software VICON. This database combined all new and old gait data in an

easy access and portable environment. Moreover, this database is

convenient to use for further studies about other diseases or integration to

other software systems.

• It has easy-to-use user interfaces, so a short training is enough for the

physicians and/or other laboratory staff

• It has a short processing time; the grading results are shown on the screen

just after walking of the subject. So immediate feedback to the physician

and the patient is supplied.

7.2. EVALUATION OF THE GRADING ALGORITHM

Combining classifiers produced premising results for many areas in pattern

recognition. Since we deal with a multi-class problem in this study, expert

classifiers for different classes are combined for better generalization accuracy.

The implemented combination schema is expected to increase success rates of

similar medical problems.

Since the classification algorithm was developed using a method similar to spiral

development methodology, the result of one stage is important for design of the

next one. This means, feature reduction/selection and classification algorithms are

selected in a series of successive trials of increasing complexity. Each stage of the

algorithm design produced valuable information about further recognition of the

selected disease, helping treatment plans.

A grading algorithm is created by combining decision trees and MLPs by

considering the results of previous experiments. The symptoms and history

information of the subjects in addition to gait data are also included in the

combination. This data is used to train a decision tree, which gives an opportunity

of the reasoning of the results. The gait data is used to train 3 different MLPs with

binary classifications that are used at the leaves of the decision tree. The feature

127

selection processes prior to decision tree and MLP training give us information

about relations between the grade of the illness and the affected body parts.

Namely, while the subjects with low grade of the disease (grade 1 or 2) have

deformation in knee joint, the ones with high grade of the disease (grade 3) have

more deformation in hip joint. Although we analyze a knee disease, we see that

other parts of the body may be affected and so data from these parts should also

be included in the classification processes. Deriving this kind of hidden

information in data provides better clinical recognition of the illness while

contributing the classification accuracy.

Comparing the accuracy of the implemented classifier with a single multi-class

one, it can be concluded that combining a set of binary classifiers produce better

results than a single multi-class one. Since the classes are not easily

distinguishable creating different experts for different subsets of the feature sets

produce better results. But still classification accuracy of the expert MLPs may be

improved. Especially for detection of third grade of the illness some more detailed

analysis may be helpful.

As a final comment, the results and analysis of the classifier both for accuracy and

descriptiveness produced satisfactory results and aimed to be used in gait

laboratories. The combined decision tree-MLP approach is also expected to be

applicable to similar type of medical decision making processes, where both

disease characteristics and clinical measurements and tests are to be combined.

7.3. LIMITATIONS OF THE STUDY

In pattern recognition studies the curse of dimensionality is a significant reason

for poor generalization ability of classifiers. In practice it is often observed that

the added features may degrade the performance of a classifier if the number of

training samples that are used to design the classifier is small relative to the

number of features. This generalization is valid for our study, too. Even though

the amount of collected data far exceeds the data size used by other studies, it is

still not fully satisfactory. If the number of samples was arbitrarily large then

128

more features would be included for both creating decision tree and training

MLPs. Most probable, including more features would produce better

generalization accuracies.

As said before the implemented combination algorithm may be used for detection

and grading of similar type of diseases. Because of time and budget restrictions

data collection process limited to only one disease. The algorithm could not be

tested by any other set of data.

Another limitation of the study was about data collection process. In most studies

OA is scaled to five grades (0-4) according to Kellgren-Lawrence radiographic

method. Since most of the fourth grade patients are not able to walk in laboratory

and treated in bed in orthopedics department of the hospitals, to collect their gait

data is difficult. Therefore, in this study data from fourth grade of the OA is

ignored in classification process.

7.4. SUGGESTIONS FOR FUTURE WORK

OAGAIT is currently proposed for use in Ankara University Medicine Faculty

Gait Laboratory. It may be installed to other gait laboratories and tested by many

experts in the future. Testing results may be used for improvement of the system.

If data from other diseases like cerebral palsy (CP) can be added to the database, it

will be preferred by more laboratories in the future. If system is used widely, a

web based collective database system may be designed to help the data sharing

between laboratories. Fast increase in number of samples will lead to design

different classifiers and further analysis of the diseases. Moreover, large amounts

of data may allow the data mining studies by which hidden knowledge in medical

data may be discovered.

In addition to the detection and grading of the diseases, patient follow up property

which is significant for clinical decision making may be added to the system. To

achieve this, the subjects should have gait data collected in determined time

intervals. Therefore a subject should be called to the gait laboratory after some

periods of time like 6 months, 1 year etc. For the design of this function,

129

extrapolation methods may be tried by including “time” parameter in the feature

set.

Combining pattern classifiers is one of the recent, popular research areas of

machine learning. Lots of new combining methods and approaches are

implemented everyday. In this study we could try some of them with existing

amount of data. However, some other feature reduction/selection methods or

different combination models may also be tried to optimize the classification

accuracies. Also combination of pattern recognition algorithms and image

processing methods may be tried for evaluating gait analysis data and XR images

for grading.

130

REFERENCES

1. L. Lee, W. Grimson, “Gait analysis for recognition and classification”, in Proceedings of the IEEE Conference on Face and Gesture Recognition, 2002, pp. 148-155

2. R. Begg,y, J. Kamruzzaman, "A Comparison of Neural Networks and Support Vector Machines for Recognizing Young-Old Gait Patterns”, Proceeding of IEEE TENCON Conference, 2003, pp. 354-358

3. C. Y. Yam, M. S. Nixon, J. N. Carter, “Automated person recognition by walking and running via model-based approaches”. Pattern

Recognition, vol. 37, pp. 1057-1072 , 2004

4. M. S. Nixon, J. N. Carter. "Advances in Automatic Gait Recognition," in Proceedings of the Sixth IEEE International Conference on Automatic

Face and Gesture Recognition, 2004, pp. 39-144

5. S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, K. W. Bowyer, “The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis”, IEEE Transactions On Pattern Analysis And Machine

Intelligence, vol. 27, pp. 162-177, 2005

6. L. Wang, W. Hu, T. Tan, “A New Attempt to Gait-Based Human Identification,” in Proceedings of the International Conference on

Pattern Recognition, 2002, pp. 115-118

7. R. Collins, R. Gross, J. Shi, “Silhouette-Based Human Identification from Body Shape and Gait,” in Proceedings of the International

Conference on Automatic Face and Gesture Recognition, 2002, pp. 366-371

8. L. Herda, P. Fua, R. Plankers, R. Boulic, D. Thalmann “Skeleton-Based Motion Capture for Robust Reconstruction of Human Motion”, in Computer Animation , 2000, pp. 77-82

131

9. R. Urtasun, P. Fua, “3D Tracking for Gait Characterization and Recognition”, in Proceedings of the Sixth IEEE International Conference

on Automatic Face and Gesture Recognition, 2004, pp. 17-22

10. A. Kale, A. Sundaresan, A. N. Rajagopalan, N. P. Cuntoor, R. Chowdhury, V. Krüger, R. Chellappa, ”Identification of Humans Using Gait”, IEEE Transactions On Image Processing, vol. 13, pp.1163-1173, 2004

11. R. B. Davis, P. A. DeLuca, M. J. Romness, “Clinical Gait Analysis and Its Role in Treatment Decision-Making”, Medscape General Medicine vol. 1, 1999

12. K. R. Kaufman, “SECTION FOUR: Future Directions in Gait Analysis”, RRDS Gait Analysis in the Science of Rehabilitation, May 2005, http://www.vard.org/mono/gait/kaufman.htm,

13. R. Baker, “Gait Analysis Methods in Rehabilitation”, Journal of Neuro-

Engineering and Rehabilitation, vol. 3, pp. 1-10, 2006

14. C. L Vaughan, B. L. Davis, J. C. O'Conner, Dynamics of Human Gait. Champaign, IL: Human Kinetics, 1991

15. G. Yavuzer, “The use of computerized gait analyses in the assessment of Neuromusculoskeletal disorders”, FTR Bil Der - J PMR Sci, vol. 2, pp. 43-45, 2007

16. S. Salarian, H. Russmann, F. J. G. Vingerhoets, C. Dehollain, Y. Blanc, P. R. Burkhard, K. Aminian, “Gait Assessment in Parkinson’s Disease: Toward an Ambulatory System for Long-Term Monitoring”, IEEE

Transactions On Biomedical Engineering, vol.51, pp. 1434-1443, 2004

17. R. E. Cook, I. Schneider, M.E. Hazlewood, S.J. Hillman, J.E. Robb, “Gait analysis alters decision-making in cerebral palsy”. J Pediatr Orthop, 23-3: pp.292-295, 2003

18. Salazar, A.J. De Castro, O.C. Bravo, R.J., “Novel approach for spastic hemiplegia classification through the use of support vector machines”, Proceedings of the 26th Annual International Conference of the

Engineering in Medicine and Biology Society, 2004, pp. 446-469

19. F. Dobson, M. E. Morris, R. Baker, H. K. Graham, “Gait classification in children with cerebral palsy: A systematic review”, Gait and Posture,

vol. 25, pp. 140-52, 2007

20. H. Gök, S. Ergin, G. Yavuzer, ”Kinetic and kinematic characteristics of gait in patients with medial knee arthritis” , Acta Orthop Scand, vol. 73, pp. 647–652, 2002

132

21. JH Kellgren, JS Lawrence, ‘Radiological assessment of osteoarthritis’, Ann Rheum Dis, vol. 16, pp. 494-501, 1957

22. Osteoarthritis (OA): Bone, Joint, and Muscle Disorders: Merck Manual

Home Edition October 2007, http://www.merck.com/mmhe/sec05/ch066/ch066a.html

23. Osteoarhritis- Risk Factors, December 2006 http://adam.about.com/reports/000035_4.htm

24. K. J. Deluzio, J. L. Astephen, “Biomechanical features of gait waveform data associated with knee osteoarthritis: An application of principal component analysis”, Gait and Posture, vol. 25, pp. 86-93, 2007

25. Kaufman, K., Hughes, C., Morrey, B., Morrey, M., An, K., “Gait characteristics of patients with knee osteoarthritis”, Journal of

Biomechanics,, vol. 34, pp. 907–915, 2001

26. Z. Bejek, R. Paroczai, A. Illyesi, L. Kocsis, R. M. Kiss, “Gait Parameters Of Patients With Osteoarthritis Of The Knee Joint”, Physical

Education and Sport, vol. 4, pp. 9-16, 2006

27. F Petersson, T. Boegard, T. Saxne, A. J. Silman, B. Svensson, “Radiographic osteoarthritis of the knee classified by the Ahlbäck and Kellgren & Lawrence systems for the tibiofemoral joint in people aged 35–54 years with chronic knee pain”, Annals of the Rheumatic Diseases,

vol. 56, pp. 493–496, 1997

28. N. Samancı, C.Kaçar, M. Sayın, T. Tuncer, “Primer Diz Osteoartritinde Metabolik, Endokrin Ve Sosyo-Kültürel Risk Faktörleri Ve Radyolojik Bulgularla Đlişkisi”, Romatizma, vol.18, 2003

29. L. Sharma, S. Cahue, J. Song, K. Hayes, Y. C. Pai, D. Dunlop, “Physical Functioning Over Three Years in Knee Osteoarthritis” Role Of Psychosocial, Local Mechanical, And Neuromuscular Factors”, Arthritis & Rheumatism, vol. 48, pp 3359–3370, 2003

30. K. S. Al-Zahrani, A. M. O. Bakheit, “A Study of the Gait Characteristics of Patients with Chronic Osteoarthritis of the Knee”, Disability and Rehabilitation, vol. 24, pp. 275-280, 2002

31. T. Chau, “A review of analytical techniques for gait data, Part 1: Fuzzy, statistical and fractal methods,” Gait and Posture, vol. 13, pp. 49-66, 2001.

32. T. Chau, “A review of analytical techniques for gait data, Part 2: Neural network and wavelet methods”, Gait and Posture, vol. 13, pp. 102-120, 2001.

133

33. G. Barton, P. Lisboa, A. Lees, S. Attfield, “Gait quality assessment using self-organising artificial neural networks”, Gait and Posture, vol. 25, pp. 374-379, 2007

34. M. Kohle , D. Merkl , J. Kastner, “Clinical gait analysis by neural networks: issues and experiences”, in Proceedings of the 10th IEEE

Symposium on Computer-Based Medical Systems 1997, pp.138

35. W. Wu, F. Su, Y. Cheng, Y. Chou, “Potential of the Genetic Algorithm Neural Network in the Assessment of Gait Patterns in Ankle Arthrodesis”, Annals of Biomedical Engineering, vol. 29, pp. 83–91, 2001

36. J. G. Barton, A. Lees, “An application of neural networks for distinguishing gait patterns on the basis of hip-knee joint angle diagrams” Gait and Posture, vol. 5, pp. 28-33, 1997

37. M. Kohle , D. Merkl, “Analyzing human gait patterns for malfunction detection”, in Proceedings of the ACM symposium on Applied computing, 2000, pp. 41-45

38. N. Şen Köktaş, N. Yalabık, “A Neural Network Classifier for Gait Analysis”, in Proceeding of International Symposium on Health

Informatics and Bioinformatics, 2005, pp. 174-179

39. N. Sen Koktas, N. Yalabik, G. Yavuzer, “Ensemble Classifiers for Medical Diagnosis of Knee Osteoarthritis Using Gait Data”, in Proceeding of IEEE International Conference on Machine Learning and

Applications, 2006, pp. 225-230

40. R. Lafuente, JM. Belda, J. Sanchez-Lacuesta, C. Soler, J. Prat J. “Design and test of neural networks and statistical classifiers in computer aided movement analysis: a case study on gait analysis”, Clinical

Biomechanics, vol. 13, pp. 216–20, 1997

41. R. K. Begg, M. Palaniswami, B. Owen, “Support Vector Machines for Automated Gait Classification”, IEEE Transactions On Biomedical

Engineering, vol. 52, pp. 828-838, 2005

42. L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004

43. J. Kittler, M. Hatef, R. P. W. Duin, J. Matas, “On Combining Classifiers”, IEEE Transactions on Pattern Analysis and Machine


44. R. Ranawana, V. Palade, “Multi classifier systems: Review and Roadmap for developers”, International Journal of Hybrid Intelligent

Systems, vol. 3, pp. 35-61, 2006

134

45. M. I. Jordan, R. A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm”, Neural Computation, vol. 6, pp. 181-214, 1994

46. J. Mao, “A case study on bagging, boosting and basic ensembles of neural networks for OCR”, in Proceeding of IJCNN, 1998, pp.1828-1833

47. J. Kim, J. Ahn., S. Cho, “Ensemble competitive learning neural networks with reduced input dimension”, International Journal of

Neural Systems, vol. 6, pp 133-42, 1995

48. A. J. C. Sharkey, “Types of multinet system”, in Proceedings of the

Third International Workshop on Multiple Classifier Systems, 2002, pp. 108-117

49. Z. H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all", Artificial Intelligence, vol. 137, pp.239-263, 2002

50. L. K. Hansen, P. Salamon, “Neural Network Ensembles”, IEEE

Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 993-1001, 1990

51. N. Ueda, “Optimal Linear Combination of Neural Networks for Improving Classification Performance”, IEEE Transactions on Pattern

Analysis and Machine Intelligence, vol. 22, pp. 207-215, 2000

52. G. Fumera, F. Roli, “A Theoretical and Experimental Analysis of Linear Combiners for Multiple Classifier Systems”, IEEE Transactions

on Pattern Analysis and Machine Intelligence, vol.27, pp.942-956, 2005

53. T. K. Ho, J. J. Hull, S. N. Srihari, “Decision Combination in Multiple Classifier Systems”, IEEE Transactions on Pattern Analysis and Machine


54. R. E. Banfield, L. O. Hall, K. W. Bowyeri, D. Bhadoria, W. P. Kegelmeyer, S. Eschrich, “A Comparison of Ensemble Creation Techniques” , in Proceeding of the International Conference on Multiple

Classifier Systems, 2004, pp. 223-232

55. T.G. Dietterich. “Ensemble methods in machine learning”, in Proceeding of Multiple Classifier Systems, 2000, pp. 1-15

56. A. C Stasis, E. N Loukis, S. A Pavlopoulos, D Koutsouris, “A multiple decision trees architecture for medical diagnosis: The differentiation of opening snap, second heart sound split and third heart sound”, Computational Management Science, vol. 1, pp. 245-274, 2004

135

57. S. Armand, E. Watelain, M. Mercier, G. Lensel, F. Lepoutre, “Linking clinical measurements and kinematic gait patterns of toe-walking using fuzzy decision trees”, Gait and Posture, vol. 25, pp. 475-484, 2006

58. R. Kohavi, “Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid”, in Proceeding of International Conference on

Knowledge Discovery and Data Mining, 1996, pp. 202-207

59. V. Podforelec, P. Kokol, “Evolutionary construction of medical decision trees”, in Proceedings of the 20th Annual International

Conference of the IEEE Engineering in Medicine and Biology Society, 1998, pp. 1202-1205

60. K. Tsujino, “Hybrid Knowledge Acqusition by Integrating Decision Trees and Neural Networks”, in Proceeding of IEEE International

Conference on Neural Networks, 1995, pp. 1379-1383

61. J. R. Quinlan, “Learning Decision Tree Classifiers”, ACM Computing

Surveys, vol. 28, pp. 71-72, 1996

62. Decision Tress and Data Mining, May, 2007, http://www.decisiontrees.net/

63. F. Esposito, D. Malerba and G. Semeraro, “A comparative analysis of methods for pruning decision trees”, IEEE Transactions on Pattern

Analysis and Machine Intelligence, vol. 19, pp. 476–491, 1997

64. Anil K. Jain , Robert P. W. Duin , Jianchang Mao, “Statistical Pattern Recognition: A Review”, IEEE Transactions on Pattern Analysis and

Machine Intelligence, vol. 22, pp. 4-37, 2000

65. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. John Wiley and Sons, New York, 2001

66. M. Skurichina, R.P.W Duin, “Combining Feature Subsets in Feature Selection”, in Proceeding of the International Conference on Multiple

Classifier Systems, 2005, pp. 165–175

67. R.P.W Duin, PRTOOLS (version 4). A Matlab toolbox for pattern recognition. Pattern Recognition Group, Delft University of Technology, February 2004

68. A.K. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,” IEEE Transactions on Pattern

Analysis and Machine Intelligence, vol.19, pp. 153-158, 1997

69. M. Kudo, J. Sklansky, “Comparison of algorithms that select features for pattern classiffers”, Pattern Recognition, vol. 33, pp. 25-41, 2000

136

70. M. Dash, H. Liu, “Feature selection for classification”, International

Journal of Intelligent Data Analysis, vol. 1, pp. 131-156, 1997

71. A. R. Webb, Statistical Pattern Recognition, Second Edition, Wiley, 2002

72. The free encyclopedia, January 2008, http://en.wikipedia.org

73. Decision Support Systems, January 2008, http://www.openclinical.org/dss.html

74. R. A. Miller, “Medical diagnostic decision support systems--past, present, and future: a threaded bibliography and brief commentary”, Journal of the American Medical Informatics Association, vol. 1, pp. 8-27, 1994

75. R. A. Miller, F.E. Masarie, “Use of the Quick Medical Reference (QMR) program as a tool for medical education”, Methods of

Information in Medicine, vol. 28, pp.340-5, 1989

76. E. Coiera. The Guide to Health Informatics (2nd Edition). Arnold, London, October 2003.

77. DXPlain, January 2008, http://lcs.mgh.harvard.edu/projects/dxplain.html

78. D. L. Hunt, R. B. Haynes, S. E. Hanna, K. Smith, “Effects of Computer-Based Clinical Decision Support Systems on Physician Performance and Patient Outcomes: A Systematic Review”, The Journal of the

American Medical Association, vol. 280, pp. 1339-1346, 1998

79. K. Kawamoto, “Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success”, BMJ, vol. 330, pp. 330-765, 2005

137

APPENDICES

APPENDIX A. An example to excel file of a subject

140

VITA

PERSONAL INFORMATION

Surname, Name: Şen Köktaş, Nigar

Nationality: Turkish (TC)

Date and Place of Birth: March 25, 1977, Rize

Marital Status: Married

Phone: + 90 312 210 5552

Fax: + 90 312 210 4745

E-mail: [email protected]

[email protected]

EDUCATION

Degree Institution Year of Graduation

MS METU Information Systems 2003

BS METU Mathematics Education 2000

WORK EXPERIENCE

Year Place Enrollment

2006- Present METU, Department of Computer Engineering Project Assistant

2000-2006 METU, Informatics Institute Research Assistant

141

PUBLICATIONS

1. N. S. Koktas, N. Yalabik, G. Yavuzer, R.P.W. Duin, “A Decision Tree-

MLP Multiclassifier For Grading Knee Osteoarthritis Using Gait

Analysis”, (in progress)

2. N. S. Koktas, N. Yalabik, G. Yavuzer, V. Atalay, E. Civek, "Combining

Decision Trees And Neural Networks For Grading Knee

Osteoarthritis”, Proceeding of International Symposium on Health

Informatics and Bioinformatics, 2007

3. N. Sen Koktas, R.P.W. Duin, “Statistical Analysis of Gait Data

Associated with Knee Osteoarthritis”, (in progress)

4. N. Sen Koktas, N. Yalabik, G. Yavuzer, “Ensemble Classifiers for

Medical Diagnosis of Knee Osteoarthritis Using Gait Data”,

Proceeding of IEEE International Conference on Machine Learning and

Applications, 2006

5. N. Sen Koktas, N. Yalabik, G. Yavuzer, “Combining Neural Networks

for Gait Classification”, Proceeding of Iberoamerican Congress on

Pattern Recognition, 2006

6. N. Sen Koktas, N. Yalabik, “A Neural Network Classifier for Gait

Analysis”, Proceeding of International Symposium on Health Informatics

and Bioinformatics, 2005

7. N. Sen Koktas, N. Yalabik, “Self-Examination for an Activity Planning

and Progress Following Tool”, Proceeding of the IASTED International

Conference on Web-based Education, 2004

8. N. Sen, N. Yalabik, “An Activity Planning and Progress Following Tool

for Self-Directed Distance Learning”, Computer and Information

Sciences, 2003

142

RESEARCH INTERESTS

• Pattern recognition

• Combining classifiers

• Feature selection approaches

• Neural networks

• Gait analysis

‘oagait’: a decision support system for grading knee...

Documents