optimization-based data mining approaches in neuroscience research panos m. pardalos university of...

108
Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Upload: dorothy-cameron

Post on 28-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Optimization-Based Data Mining Approaches in

Neuroscience Research

Panos M. Pardalos

University of Florida

Page 2: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Data Mining: “the practice of searching through large amounts of computerized data to find useful patterns or trends.”

Optimization: “An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically : the mathematical procedures (as finding the maximum of a function) involved in this.”

Merriam Webster Dictionary

Introduction

Page 3: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Introduction

The combination of data mining and optimization:

Find the “best” way to extract meaningful “patterns” from data.

Not always an easy task.

Page 4: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

How difficult Optimization can be?

Given integers N1,N2,…,Nk and M find a subset of N1,N2,…,Nk such that their sum is equal to M.

Can you find a better algorithm than of O(2k). Exponential complexity ?

Page 5: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Hard drive Cost

Approximately 1/10 cheaper every 5 years

Page 6: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Hard Drive Capacity

Approximately 10 times more every 5 years

Page 7: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Processing power

Number of transistors of a computer processor double every two years

Page 8: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

Handbook of Massive Data Sets, co-editors: J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

Page 9: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Main problems in data mining Data preprocessing

Dimensionality reduction

Feature selection

Regression

Clustering (Unsupervised learning)

Classification (Supervised Learning)

Semi-Supervised learning (between unsupervised and unsupervised)

Biclustering

Result Validation

Data Visualization/Representation

Biomedical Informatics is a challenging area with lots of these problems.

Page 10: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Agenda Research Background

Epilepsy Seizure Prediction

Sources of Data Electroencephalogram (EEG) Time Series

Dimensionality Reduction Chaos Theory

Feature Selection for Brain Monitoring Time Series Classification of Neuro-Physiological States Brain Clustering Brain Network Models Concluding Remarks

Page 11: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Facts About Epilepsy At least 2 million Americans and other 40-50

million people worldwide (about 1% of population) suffer from Epilepsy.

Epilepsy is the second most common brain disorder (after stroke) that causes recurrent seizures.

Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begin to discharge in a highly organized rhythmic pattern.

Page 12: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Epileptic Seizures

Seizures usually occur spontaneously, in the absence of external triggers.

Seizures cause temporary disturbances of brain functions such as motor control, responsiveness and recall which typically last from seconds to a few minutes.

Seizures may be followed by a post-ictal period of confusion or impaired sensorial that can persist for several hours.

Page 13: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

10-second EEGs: Seizure EvolutionNormal Pre-Seizure

Seizure Onset Post-Seizure

Page 14: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Why do we care? Based on 1995 estimates, epilepsy imposes an annual

economic burden of $12.5 billion* in the U.S. in associated health care costs and losses in employment, wages, and productivity.

Cost per patient ranged from $4,272 for persons** with remission after initial diagnosis and treatment to $138,602 for persons** with intractable and frequent seizures.

Page 15: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Current Epilepsy Treatment Pharmacological Therapy

Anti-Epileptic Drugs (AEDs) Mainstay of epilepsy treatment Approximately 25 to 30% remain unresponsive

Epilepsy Resective Surgery Require long-term invasive EEG monitoring to locate a

specific, localized part of the brain where the seizures are thought to originate

50% of pre-surgical candidates do not undergo respective surgery Multiple epileptogenic zones Epileptogenic zone located in functional brain tissue

Only 50-60% of surgery cases result in seizure free

Page 16: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Current Epilepsy Treatment Electrical Stimulation (Vagus nerve stimulator)

Parameters (amplitude and duration of stimulation) arbitrarily adjusted

As effective as one additional AED dose Side Effects

Seizure Prediction? Monitoring Unit? Forecasting Impending Seizures? Seizure Control? Deep Brain Stimulator?

Page 17: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Electroencephalogram (EEG) …is a traditional tool for evaluating the physiological

state of the brain. …offers excellent spatial and temporal resolution

to characterize rapidly changing electrical activity of brain activation

…captures voltage potentials produced by brain cells while communicating.

In an EEG, electrodes are implanted in deep brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

Page 18: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

Page 19: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Electrode Montage and EEGs

1 1

1 1

1 1

2 2

2 2

2 2

3 3

3 3

3 3

4 4

4 4

4 4

5 5

LTDRTD

LOF

LST

ROF

RST

1 1

1 1

1 1

2 2

2 2

2 2

3 3

3 3

3 3

4 4

4 4

4 4

5 5

LTDRTD

LOF

LST

ROF

RST

Page 20: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Scalp EEG Data Acquisition

Page 21: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Open Problems

Is the seizure occurrence random? If not, can seizures be predicted? If yes, are there seizure pre-cursors (in

EEGs) preceding seizures? If yes, what data mining techniques can be

used to indicate these pre-cursors? Does normal brain activity during differ from

abnormal brain activity?

Page 22: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Goals of Research Test the hypothesis that seizures are not a random process.

Demonstrate that seizures could be predicted Feature Selection to identify seizure pre-cursors (Statistical

Process Control) Demonstrate that normal and abnormal EEGs can be

differentiated Time Series Classification

Better understand the epileptogenic process – how seizures are initiated and propagated. Brain Clustering

Develop a closed-loop seizure control device (Brain Pacemaker)

Page 23: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Dimensionality Reduction

Chaos Theory

Page 24: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

EEGs with the Curse of Dimensionality The brain is a non-stationary system. EEG time series is non-stationary. With 200 Hz sampling, 1 hour of EEGs is

comprised of 200*60*60*30 = 21,600,000 data points = 43.2MB(assume 16-bit ASCI format) 1 day = 1.04GB 1 week = 7.28GB 20 patients ≈ 0.15TB

Kilobytes → Megabytes → Gigabytes → Terabytes

Page 25: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Data Transformation Using Chaos Theory

Measure the brain dynamics from time series: Stock Market Currency Exchanges (e.g., Swedish Kroner)

Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points.

Maximum Short-Term Lyapunov Exponent measure the average uncertainty along the local

eigenvectors and phase differences of an attractor in the phase space

measure the stability/chaoticity of EEG signals

Page 26: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Measure of Chaos

Time

EE

G V

olt

ag

e

Page 27: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

STLmax Profiles

Pre-Seizure Seizure Onset Post-Seizure

Page 28: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Hidden Synchronization Patterns

Page 29: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

By paired-T statistic:Per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are

1 2 60

1 2 60

{ max , max , , max }

{ max , max , , max },

i i i i

j j j j

L STL STL STL

L STL STL STL

1 2 60

1 1 2 2 60 60

{ , , , }

{ max max , max max , , max max }

ij i j ij ij ij

i j i j i j

D L L d d d

STL STL STL STL STL STL

Then, we calculate the average value, ,and the sample standard deviation, , of .

ijD

d̂ 2 60{ , , , }ij ij ij ijD d d d

The T-index between EEG signal epochs i and j is defined as ,ˆ

60

ij

ijd

DT

How similar are they?Statistics to quantify the convergence of STLmax

Page 30: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Statistically Quantifying the Convergence

Page 31: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Convergence of STLmax

Page 32: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Why Feature Selection?

Not every electrode site shows the convergence. Feature Selection: Select the electrodes that are most likely to

show the convergence preceding the next seizure.

Page 33: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Feature Selection

Quadratic Integer Programming with Quadratic Constraints

Page 34: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Optimization: We apply optimization techniques to find a group of

electrode sites such that … They are the most converged (in STLmax) electrode

sites during 10-min window before the seizure They show the dynamical resetting (diverged in

STLmax) during 10-min window after the seizure. Such electrode sites are defined as “critical electrode

sites”. Hypothesis:

The critical electrode sites should be most likely to show the convergence in STLmax again before the next seizure.

Optimization Problem

Page 35: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

x is an n-dimensional column vector (decision variables), where each xi represents the electrode site i. xi = 1 if electrode i is selected to be one of the critical electrode

sites. xi = 0 otherwise.

Q is an (nn) matrix, whose each element qij represents the T-index between electrode i and j during 10-minute window before a seizure.

b is an integer constant. (the number of critical electrode sites) D is an (nn) matrix, whose each element dij represents the T-

index between electrode i and j during 10-minute window after a seizure.

α = 2.662*b*(b-1), an integer constant. 2.662 is the critical value of T-index, as previously defined, to reject H0: “`two brain sites acquire identical STLmax values within 10-minute window”

Notation and Modeling

Page 36: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Multi-Quadratic Integer Programming

To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with … objective function to minimize the

average T-index among electrode sites

a linear constraint to identify the number of critical electrode sites

a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting

1

1

Problem :

Min f( )

s.t.

{0,1}, 1,...,

T

n

ii

T

i

P

x x Qx

x b

x Dx

x i n

Page 37: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Conventional Linearization Approach for Multi-Quadratic 0-1 Problem

2

i

For each product , we introduce new 0-1 variable ( ).

Note that for 0,1 .

The equivalent linear 0-1 problem is given by:

min

s.

i j ij i j

ii i i i

ij ijj

x x x x x i j

x x x x

q x

i

t.

, for , 1,..., ( )

, for , 1,..., ( )

1 , for , 1,..., ( )

ij i

ij j

i j ij

ij ijj

Ax b

x x i j n i j

x x i j n i j

x x x i j n i j

d x

2

{0,1},0 1, , 1,...,

Note that the number of continuous variables has been increased to ( ).

Note that this problem formulation is computationally inefficient as in

i ijx x i j n

O n

n

creases.

Page 38: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Consider the MQIP problem We proved that the MQIP program is EQUIVALENT to a MILP problem

with the SAME number of integer variables.

Theoretical Results:MILP formulation for MQIP problem

Problem :1

Min f( )s.t.

{0,1}, 1,...,

P

Tx x Qxb

Tx Dxx i ni

Ax

Problem :1

Min 0 (1)

(2) (1 ) (3) 0 (4)

(5) z '

P

Te sQx y s

by M x

Dx zTe

M x

Ax

z

(6)

, , 0, 0,1 (7)

where max ,

' max

s y z x

M q Qiji jM d Diji j

Equivalent

Page 39: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Empirical Results:Performance on Larger Problems

Page 40: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Hypothesis: The critical electrode sites should be most likely to show the

convergence in STLmax (drop in T-index below the critical value) again before the next seizure.

The critical electrode sites are electrode sites that are the most converged (in STLmax ) electrode sites during 10-

min window before the seizure show the dynamical resetting (diverged in STLmax ) during 10-min

window after the seizure Simulation:

Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T-index below the critical value) before the next seizure between the electrode sites, which are Critical electrode sites Randomly selected (5,000 times)

Hypothesis Testing - Simulation

Page 41: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Optimal VS Non-Optimal

Page 42: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Simulation - Results

Page 43: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Statistical Process Control:How to automate the system?

Page 44: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Select critical electrode sites after every subsequent seizure

EEG Signals

Give a warning whenT-index value drops below a critical value

Monitor the averageT-index of the critical electrodes

Continuously calculateSTLmax from multi-channel EEG.

ASWA

Automated Seizure Warning System

Page 45: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Data Characteristics

Page 46: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Performance Evaluation for ASWS

To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning.

Sensitivity =

False Prediction Rate = average number of false warnings per hour

seizures analyzed of #

seizures predicted accurately of #

Page 47: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Performance characteristics of automated seizure warningalgorithm with the best parameter-settings of training data set.

Training Results

Page 48: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

ROC curve (receiver operating characteristic) is used to indicate an appropriate trade-off that one can achieve between:

the false positive rate (1-Specificity, plotted on X-axis) that needs to be minimized

the detection rate (Sensitivity, plotted on Y-axis) that needs to be maximized.

RECEIVER OPERATING CHARACTERISTICS (ROC)

Page 49: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Test Results

Performance characteristics of automated seizure warning algorithm with the best parameter settings on testing data set.

Page 50: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Validation of the ASWS algorithm

Temporal Properties Surrogate Seizure Time Data Set 100 Surrogate Data Sets

Spatial Properties Non-Optimized ASWS – Selecting non-optimal

electrode sites 100 Randomly Selected Electrodes

Page 51: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Prediction Scores: Surrogate Data and Non-Optimized ASWS

Page 52: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Remarks Optimization as feature selection for brain monitoring Developed an online real-time seizure prediction system Tested on the dataset of

10 patients suffering from temporal lope seizures ~90 days (2100 hours) of EEG data 58 seizures

Seizure Prediction Predicting ~70% of temporal lobe seizures on average Giving a false alarm rate of ~0.16 per hour on average

What’s next?-fundamental questions on brain physiology

Page 53: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Time Series Classification I

Support Vector Machines with Dynamic Time Warping

Page 54: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Other Dynamical Measures:Phase Profiles

Page 55: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Other Dynamical Measures:Entropy H of Attractor

Page 56: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Classification of Physiological States

Page 57: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Support Vector Machines

From 1 electrode

Page 58: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Input

Standard SVM Input 30 electrodes, 30 data points, 3 dynamical

features = 2,700 features

Time Series SVM Input 30*29 data pairs, 3 dynamical features = 2,700 –

90 features

Page 59: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Dynamic Time Warping

Page 60: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Preliminary Data Set

132 5-minute epochs of pre-seizure EEGs 300 5-minute epochs of normal EEGs

Pre-seizure = 0-30 minutes before seizure Normal = 10 hours away from seizure

Page 61: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Metrics for Performance Evaluation

PREDICTED CLASS

ACTUALCLASS

Class=Yes Class=No

Class=Yes a b

Class=No c d

a: TP (true positive); b: FN (false negative);

c: FP (false positive); d: TN (true negative)

Page 62: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Sensitivity and Specificity Sensitivity measures the fraction of positive cases that are

classified as positive. Specificity measures the fraction of negative cases classified as

negative.

Sensitivity = TP/(TP+FN)Specificity = TN/(TN+FP)

Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize.

Maximize the probability of correctly classifying patient states.

False positive rate can be considered as 1-Specificity which one wants to minimize.

Page 63: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Leave-one-out Cross Validation

Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies.

K-fold cross validation: Divide all the data into k subsets of equal size. Train a classifier using k-1 groups of training data. Test a classifier on the omitted subset. Iterate k times.

Page 64: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Empirical Results

Page 65: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

 

Com

 

User

InterfaceTechnology

Multichannel

Data Acquisition

PatternRecognition

Initiate a variety oftherapies (e.g., electrical stimulation, drug injection)

VNS

Automated Seizure Prediction Paradigm

Drug

Feature Extraction/ Cluster Analysis

Page 66: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Related Patents

Multi-dimensional multi-parameter time series processing for seizure warning and predictionPatent 7,263,467 (Issued on August 28, 2007).

Optimization of Multi-dimensional Time Series Processing for seizure warning and predictionPatent 7,373,199 (Issued on May 13, 2008).

Optimization of spatio-temporal pattern processing for seizure warning and predictionPatent 7,461,045 (Issued on December 2, 2008).

Multi-dimensional dynamical analysisU.S. Utility Patent application filed on December 21, 2006, Serial No.: 11/339,606.

Closed-Loop State-Dependent Seizure Prevention SystemsU.S. Utility Patent application filed on December 19, 2006, Serial No.: 11/641,292.

Page 67: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Brain Network Models

Brain Connectivity Networks Based on fMRI Data

Page 68: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Certain neurological diseases are very difficult to diagnose at early stages

Functional Magnetic Resonance Imaging (fMRI) technique provides vast amount of information about structure and function of human brain, but there is lack of methods to analyse these data

Computational methods and algorithms based on mathematical models should be applied in order to find and recognize key patterns in this “ocean” of data

The Problem

Page 69: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Network models of human brain Partition of the brain into regions of interest Functional interconnections between regions

in brain

Network Models

Page 70: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Connectivity Networks

Page 71: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Blood flow level as an indicator of neuronal activity Representation of values of signal in spatial voxels as

2D and 3D images

MRI Data

Page 72: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

The measurements are being performed every 2 seconds over 6 minutes for each voxel of brain of size 2mm x 2mm x 2mm

The fMRI data is therefore a set of time series, corresponding to particular elementary volumes of the brain. In our data set each series contains 180 elements.

fMRI Data

Page 73: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

fMRI Data, Vector Representation

Z

X

Y0

(x, y, z)

Page 74: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Small world phenomenon first described by

Stanley Milgram in 1960.

“Six degrees of separation”

“Erdos number”

Small World Networks

Page 75: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Random graphs generally have property of low mean shortest path length and low clustering coefficient

Regular lattice has high mean shortest path and high cluster coefficient

Small world networks have low mean shortest path length while still high clustering coefficient

From Random Graph to Regular Lattice

Page 76: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Random Graph vs Regular Lattice

Page 77: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Small World Network

Page 78: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Characteristic path length

Clustering coefficient

Global efficiency

Nodal efficiency

Quantitative Measures of “Small World” Property

Page 79: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Brain connectivity networks possess small world properties

We predicted, that network characteristics, such as global and local efficiency values, would be decreased for people with Parkinson’s disease.

Brain Connectivity Networks

Page 80: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

How to define brain regions – nodes in the network?

Clustering problem Standard MNI template

Nodes in Connectivity Network

Page 81: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Signal Time Series Form Clusters

time

Page 82: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Each data set contains roughly 100 000 of time series, each of them consist of 180 elements

Efficient algorithms should be developed in order to solve this problem

Clustering Problem

Page 83: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Partition of the brain into 116 brain regions

Standard MNI Brain Atlas

Page 84: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Weighted graph with nodes corresponding to MNI brain regions

Weights of edges defined based on correlation between averaged neural activity over the regions

Edges in Connectivity Network

Page 85: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Neural activity Head movements during the MR session Respiratory and heart rhythms Noise

Signal Processing

Page 86: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Wavelet is a “small wave” Wavelet transform is a decomposition of initial

signal into linear combination of wavelets

Maximal Overlap Discrete Wavelet Transform

Page 87: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Time Series Decomposition by Wavelets

Page 88: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Inter-regional correlations in resting state fMRI data are particularly salient at frequencies below 0.1 Hz

Second scale wavelet coefficients correspond to 0.06 – 0.12 Hz frequency range

Wavelet Coefficients Correlation

Page 89: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Averaged over the regions signal vectors

Define level 2 wavelet coefficients of averaged signals , .

The connectivity between regions A and B is

Connectivity Strength

Page 90: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

For each time series S = {s1, s2, …, sn } of size n there is a corresponding point in n-dimensional space

For normal vectors x and y the distance between end points is equal to

Therefore, (1 – corr(x,y)) may serve as a measure of distance between time series

Definition of Distance Between Nodes

Page 91: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Geometrical Representation

x

y

0

x - y S = (s1, s2, …, sn)

Page 92: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

15 healthy controls, 14 Parkinson patients

Each network for each patient consist of 116 nodes

Data Set

Page 93: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Averaged Connectivity Networks

Control Parkinson

Page 94: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Global Network Efficiency Values

Control (1.85 +/- 0.57), Parkinson (1.12 +/- 0.55), independent t-test p-value = 0.0017

Page 95: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Top 30 Nodal Efficiency Values

Page 96: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Nodal Efficiency Plot

Red line – Control set, blue line - PD set

Page 97: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Parkinson’s brain network properties possess measurable alteration in comparison with healthy ones

Further research, in particular, different network model, may reveal the pattern in brain networks, which could be used as a diagnosis criteria

Discussion

Page 98: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Concluding Remarks

Overview of Epilepsy Research Applications of Data Mining and Optimization

Techniques Interplay between theory and application Feature Selection Time Series Classification Brain Clustering Brain Network Models

Page 99: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Related Patents

Sensor registration by global optimization proceduresPatent 7,653,513 (Issued January 26, 2010).

Atomic Magnetometer Sensor Array Magnetoencephalogram Systems and MethodUnited States Patent Application 20100219820 (Filed April 14, 2008)

Page 100: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

Handbook of Massive Data Sets, co-editors: J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

Page 101: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

“Feature Selection for Consistent Biclustering via Fractional 0-1 Programming” (with Stanislav Busygin and Oleg A. Prokopyev), Journal of Combinatorial Optimization, Volume 10, Number 1 (2005), pp. 7-21.

“Biclustering in Data Mining” (with S. Busygin, and O. Prokopyev), Computers & Operations Research, Volume 35, Issue 9 (2008), pp. 2964-2987.

“On Biclustering with Features Selection for Microarray Data Set” (with S. Busygin and O. Prokopyev), In (BIOMAT 2005) Proceedings of the International Symposium on Mathematical and Computational Biology (Edited by R. Mondaini & R. Dilao), World Scientific (2006), pp. 367-377.

“Biclustering: algorithms and applications in data mining and forecasting” (with P. Xanthopoulos, N. Boyko and N. Fan) In Encyclopedia of Operations Research and Management Science (accepted to appear) Wiley(2010).

“Clustering Challenges on Biological Networks” S. Butenko, W. A. Chaovalitwongse and P. M. Pardalos, World Scientific (2009).

Page 102: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Quantitative Neuroscience, co-editors: P.M. Pardalos, C. Sackellares, P. Carney, and L. Iasemidis, Kluwer Academic Publishers, (2004).

Biocomputing, co-editors: P.M. Pardalos and J. Principe, Kluwer Academic Publishers, (2002).

References

Page 103: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

New in 2010: Computational Neuroscience, co-editors:

W.A. Chaovalitwongse, P.M. Pardalos, P. Xanthopoulos (Eds.) Series: Springer Optimization and Its Applications , Vol. 38.

Page 104: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

Optimization in Medicine, Carlos Alves,, Panos M. Pardalos, Luis Vicente (Eds.), 2008

Page 105: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

References

Handbook of Optimization in Medicine, Panos M. Pardalos, Edwin H. Romeijn (Eds.), 2009

Page 106: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.

W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Electroencephalogram (EEG) time series classification: Applications in epilepsy , Annals of Operations Research, 148, 1 (2006), p 227-250.

Jicong Zhang, Petros Xanthopoulos ,Chang-Chia Liu, Panos M. Pardalos. Real-time differentiation of nonconvulsive status epilepticus from other encephalopathies using quantitative EEG analysis: A pilot study“, Epilepsia, 51, 2 (2010), pp. 243-250

W. Chaovalitwongse , P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20 (2-3): 383-394, 2005 .

L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Journal of Clinical Neurophysiology, 116 (3): 532-544, 2005.

P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)

W. Chaovalitwongse , P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Rank 5th in Top 25 Articles in Operations Research Letters)

Reference

Page 107: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Thank you for your attention!

Questions?

Page 108: Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Conference in 2011