detecting dementia from written and spoken language · 2020-03-07 · dementia from written and...

Detecting Dementia from Written and Spoken Language MSc Presentation by: Vaden Masrani Supervisor: Giuseppe Carenini December 20th, 2017


Post on 30-Jun-2020



Page 2

• Recent interest in the intersection of computational linguistics and clinical psychology

• Can we detect: • Depression from Facebook updates? • PTSD from tweets?

• Is it possible to detect dementia from interview transcripts?

• Why do we care?

Background


Page 3

• What is dementia? • A broad category of brain diseases that cause a decline in mental ability

• Types include Alzheimer’s disease (AD), vascular dementia, dementia with Lewy bodies, and others

• Alzheimer’s is one of the most costly diseases in developed countries

• 47 million people worldwide have dementia • One in nine people aged 65 and older has AD • $109 to $215 billion annually in the US

Background

Page 4

“Researchers believe that early detection of Alzheimer’s will be key to preventing, slowing and stopping the disease.”


Alzheimer’s Association. 2016 Alzheimer’s disease facts and figures. https://www.alz.org/documents_custom/2016-facts-and-figures.pdf, 2016.

Background


Page 5

• Current diagnosis is: • Expensive • Time-consuming • Invasive • Dementia often goes undiagnosed, especially in developing countries

• Symptoms: • A wide variety of symptoms severe enough to affect daily functioning • Aphasia: speech and language difficulty

Background

Page 6

Background
• The state of the art in detecting AD was achieved by Fraser et al. (2015)* • Introduced a large feature set (370 features) • Achieved state-of-the-art results on the DementiaBank dataset

• Our work builds upon Fraser et al. (2015): • Addresses the small-data issue • Introduces a new feature set • Extends the analysis to written samples

“Cookie Theft” picture from the Boston Diagnostic Aphasia Examination


*Fraser, Kathleen C., Jed A. Meltzer, and Frank Rudzicz. "Linguistic Features Identify Alzheimer’s Disease in Narrative Speech." Journal of Alzheimer's Disease 49.2 (2015): 407-422.

Page 7

A clock drawn by a person with hemispatial neglect

Study 1: New feature sets
• Fraser et al. did not use two feature sets: • One from the medical literature • One from the NLP literature

• We replicate their findings and show: • Measuring hemispatial neglect achieves a new state of the art • Discourse analysis has no effect on the DementiaBank dataset

Thalia Shoshana Field, Vaden Masrani, Gabriel Murray, Giuseppe Carenini. "Improving Diagnostic Accuracy Of Alzheimer's Disease From Speech Analysis Using Markers Of Hemispatial Neglect." Alzheimer's & Dementia: The Journal of the Alzheimer's Association 13.7 (2017): P157-P158.


Page 8

Study 2: Written Language
• All previous work has used spoken samples

• Written samples are harder to process, but… • There will be lots of data in the future

• We create a publicly available dataset of blog posts • Posts written by people with and without dementia • Allows for: • Analysis of changes in writing style as the disease progresses • Comparison between different pathologies

• We show it is possible to detect dementia from writing samples

Vaden Masrani, Gabriel Murray, Thalia Shoshana Field, Giuseppe Carenini. "Detecting Dementia through Retrospective Analysis of Routine Blog Posts by Bloggers with Dementia." BioNLP (2017).


Page 9

Study 3: Detecting Mild Cognitive Impairment

Vaden Masrani, Gabriel Murray, Thalia Shoshana Field, Giuseppe Carenini. "Domain Adaptation for Detecting Mild Cognitive Impairment." Canadian Conference on Artificial Intelligence (2017).

• MCI is difficult to detect • Very little data (n=37) • Less symptomatic than AD • Less studied than AD

• We show how to overcome the lack of data • Idea: Use AD data to detect MCI • Technique: domain adaptation • We compare two algorithms across a range of models


Page 10

Study 1: Hemispatial Neglect

Page 11

Hemispatial Neglect

A clock drawn by a person with hemispatial neglect

• Reduced awareness on one side of the visual field

• Often occurs as a result of brain damage

• Not blindness • Patients may not know they have hemispatial neglect


Page 12

Hemispatial Neglect

Eye movements of a patient with left-side hemispatial neglect. The patient was asked to search for the letter T among Ls.


Page 13

Hemispatial Neglect

Page 14

Hemispatial Neglect

Info-units in each division of the picture:

Halves
  Left: boy, girl, cookie, jar, stool, cupboard, steal, fall, kitchen
  Right: woman, exterior, sink, plate, dishcloth, water, window, dishes, curtains, wash, overflow, cupboard, kitchen

Strips
  Far-left: girl, cookie, jar, stool, cupboard, steal, kitchen, cupboard
  Center-left: boy, cookie, stool, steal, fall, kitchen, cupboard
  Center-right: woman, exterior, sink, plate, dishcloth, water, window, dishes, curtains, wash, overflow, kitchen, cupboard
  Far-right: exterior, window, dishes, curtains, kitchen, cupboard

Quarters
  NE: woman, exterior, plate, dishcloth, wash, window, curtains, kitchen
  SE: woman, sink, water, dishes, overflow, cupboard, kitchen
  NW: girl, cookie, jar, cupboard, steal, boy, cookie, kitchen
  SW: girl, stool, fall, cupboard, kitchen

Page 15

Hemispatial Neglect

$$\mathrm{attention}_i = \#\,\text{mentions of info-units in division } i$$

$$\mathrm{concentration}_i = \frac{\#\,\text{mentions of info-units in division } i}{\text{total words uttered}}$$

$$\mathrm{repetition}_i = \frac{\#\,\text{unique info-units mentioned in division } i}{\#\,\text{mentions of info-units in division } i}$$

$$\mathrm{perception}_i = \frac{\#\,\text{unique info-units mentioned in division } i}{\#\,\text{info-units in division } i}$$
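The four definitions can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: info-units are matched as exact lowercase word tokens (so "stealing" would not count as "steal" without lemmatisation), and `neglect_features` and the `HALVES` dictionary, copied from the halves info-unit lists, are hypothetical names.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical "halves" partition copied from the Cookie Theft info-unit lists;
# "cupboard" and "kitchen" belong to both halves.
HALVES = {
    "left": {"boy", "girl", "cookie", "jar", "stool", "cupboard", "steal", "fall", "kitchen"},
    "right": {"woman", "exterior", "sink", "plate", "dishcloth", "water", "window",
              "dishes", "curtains", "wash", "overflow", "cupboard", "kitchen"},
}


def neglect_features(transcript: str, partition: dict[str, set[str]]) -> dict[str, float]:
    """Attention, concentration, repetition and perception for each division.

    Assumes info-units are exact lowercase tokens (no lemmatisation,
    no multi-word units, no synonyms).
    """
    tokens = [t.strip(".,!?;:'\"").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]
    counts = Counter(tokens)
    total_words = len(tokens)

    features: dict[str, float] = {}
    for name, units in partition.items():
        mentions = sum(counts[u] for u in units)         # every mention counts
        unique = sum(1 for u in units if counts[u] > 0)  # distinct info-units seen
        features[f"attention_{name}"] = float(mentions)
        features[f"concentration_{name}"] = mentions / total_words if total_words else 0.0
        features[f"repetition_{name}"] = unique / mentions if mentions else 0.0
        features[f"perception_{name}"] = unique / len(units)
    return features
```

For example, "The boy is on the stool and the stool will fall." has 11 words and four mentions of three distinct left-half info-units, giving attention_left = 4, concentration_left = 4/11, repetition_left = 0.75 and perception_left = 3/9, while every right-half feature is 0. The strips and quarters partitions work the same way with a different dictionary.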

Page 16

Discourse Features
• In a coherent passage, each sentence has a clear relation to the next

• Discourse parsing is the task of segmenting a piece of text into EDUs* and then forming a discourse tree

* EDU = Elementary Discourse Unit


Page 17

Discourse tree for the two sentences: "But he added: 'Some people use the purchasers' index as a leading indicator, and some use it as a coincident indicator. But the thing it's supposed to measure - manufacturing strength - is missed altogether last month.'"


Page 18

Discourse Features
• Count of each discourse relation
• Depth of the discourse tree
• Mean number of EDUs per utterance
• Discourse relation type-to-token ratio
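A rough sketch of how these four features could be computed once a discourse parser has produced one tree per utterance. The `Node` class is a hypothetical stand-in for a real parser's output, and taking the maximum depth over utterances is an assumption; the slides do not say how depth is aggregated across a transcript.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical discourse-tree node: leaves are EDUs, internal nodes carry a relation label."""
    relation: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def tree_depth(node: Node) -> int:
    """A single EDU (leaf) has depth 1."""
    if not node.children:
        return 1
    return 1 + max(tree_depth(c) for c in node.children)


def count_edus(node: Node) -> int:
    if not node.children:
        return 1
    return sum(count_edus(c) for c in node.children)


def relation_labels(node: Node) -> list[str]:
    labels = [node.relation] if node.children and node.relation else []
    for child in node.children:
        labels.extend(relation_labels(child))
    return labels


def discourse_features(trees: list[Node]) -> dict[str, float]:
    """Features over one tree per utterance (assumes at least one tree)."""
    labels = [lab for t in trees for lab in relation_labels(t)]
    counts = Counter(labels)
    feats = {f"count_{rel}": float(n) for rel, n in counts.items()}
    feats["max_depth"] = float(max(tree_depth(t) for t in trees))
    feats["mean_edus_per_utterance"] = sum(count_edus(t) for t in trees) / len(trees)
    feats["relation_type_token_ratio"] = len(counts) / len(labels) if labels else 0.0
    return feats
```

With two utterances, one containing an Elaboration over a nested Contrast (three EDUs) and one containing a single Elaboration (two EDUs), this yields two Elaboration counts, one Contrast count, a maximum depth of 3, 2.5 EDUs per utterance and a type-to-token ratio of 2/3.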


Page 19

• Baseline: Fraser’s feature set • Psycholinguistic (5) • Vocabulary richness (4) • Acoustic (172) • Info-units (40) • Parts of speech (15) • Context-free-grammar rules (44) • Syntactic complexity (27) • Repetitiveness (5)

• 10-fold cross-validation (CV) • Feature selection as a preprocessing step
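A minimal sketch of 10-fold CV with feature selection done inside each fold. The slides do not name the selection method; ranking features by absolute Pearson correlation with the diagnosis on the training rows only is an assumption (the results slides score features by correlation with diagnosis). The point illustrated is that the held-out fold never influences which features are kept. `pearson`, `kfold` and `select_features` are hypothetical helpers.

```python
from __future__ import annotations

import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def kfold(n: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) for k roughly equal, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def select_features(X: list[list[float]], y: list[int],
                    train_idx: list[int], top_k: int) -> list[int]:
    """Column indices of the top_k features by |r| with the label, using training rows only."""
    ys = [y[i] for i in train_idx]
    scored = []
    for col in range(len(X[0])):
        xs = [X[i][col] for i in train_idx]
        scored.append((abs(pearson(xs, ys)), col))
    return [col for _, col in sorted(scored, reverse=True)[:top_k]]
```

Inside each fold, a classifier would then be trained on the selected columns of the training rows and scored on the same columns of the test rows.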


Methodology

A parse tree for “a child is stealing cookies”


Page 20

• Adding the halves features improves the best model (LogReg) by 2.2 percentage points of F-measure

• Strips features give the second-largest improvement

• Quadrants and discourse features have a negligible effect

halves -> hemispatial neglect features w/ halves partition
strips -> hemispatial neglect features w/ strips partition
quadrants -> hemispatial neglect features w/ quadrants partition

Results

Page 21

• Plot shows the change in performance from the baseline

• Quadratic terms don’t help all models

Results


Page 22
Results


Page 23

Results
• Features are scored based on their correlation with diagnosis

• A score of 1.0 -> the most correlated feature across all folds

• Perception of the right side is the most important feature (more than age!)
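One plausible reading of the 1.0-normalised score, sketched under assumptions: average each feature's absolute correlation with the diagnosis over the folds, then divide by the largest average so that the top feature scores exactly 1.0. The slides do not give the exact formula, and `normalized_feature_scores` is a hypothetical helper.

```python
from __future__ import annotations


def normalized_feature_scores(fold_scores: list[dict[str, float]]) -> dict[str, float]:
    """Average per-fold |correlation| for each feature, rescaled so the top feature scores 1.0.

    fold_scores: one dict per fold, mapping feature name -> |r| with diagnosis
    (every dict is assumed to contain the same feature names).
    """
    names = list(fold_scores[0])
    means = {n: sum(fs[n] for fs in fold_scores) / len(fold_scores) for n in names}
    top = max(means.values())
    return {n: (m / top if top > 0 else 0.0) for n, m in means.items()}
```

Under this reading, a feature scoring 0.77 is on average about three-quarters as correlated with diagnosis as the top feature.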

Page 24

Results
• Participants with dementia: • are less perceptive on the right side of the image

Page 25

Results
• Participants with dementia: • use more personal pronouns

Page 26

Results
• Participants with dementia: • use shorter words

Page 27

Results
• Participants with dementia: • are older

Page 28

Conclusion
• Hemispatial neglect features improve our ability to detect AD • On the DementiaBank dataset • More correlated with diagnosis than age • Try with and without quadratic cross terms

• Main negative result: • Discourse features have no effect on the DementiaBank dataset • May be due to the non-narrative structure of the responses

Page 29

Study 2: Written Samples

Page 30

Written Samples: Background
• Most work to date has used spoken language samples

• Detecting impairment from writing is more difficult: • No test-specific features • Not constrained to a single topic • The author can make revisions

• There will be lots of data in the future: • As the “internet generation” grows older • Today only 34% of seniors use social media

• Can we detect dementia from written language?


Page 31

Data set
• We developed a novel data set of blog posts: • Three blogs written by people with dementia • Three written by caretakers of people with dementia

• Will allow for: • Longitudinal analysis • Comparison across pathologies and demographics • Comparison between subtypes of dementia

• We perform a preliminary analysis: • Is automatic detection possible?


Page 32

Data set

Blog name               Posts  Words/post, mean (s.d.)  Start date  Diagnosis  Gender, age
living-with-alzhiemers  344    263.03 (140.28)          Sept 2006   AD         M, 72 (approx.)
creatingmemories        618    242.22 (169.42)          Dec 2003    AD         F, 61
parkblog-silverfox      692    393.21 (181.54)          May 2009    Lewy Body  M, 65
journeywithdementia     201    803.91 (548.34)          Mar 2012    Control    F, unknown
earlyonset              452    615.11 (206.72)          Jan 2008    Control    F, unknown
helpparentsagewell      498    227.12 (209.17)          Sept 2009   Control    F, unknown

Page 33

Methodology
• Features: • Part-of-speech (15), psycholinguistic (5), syntactic complexity (27), repetitiveness (5), vocabulary richness (4), context-free-grammar rules (44) • No info-unit or acoustic features • Feature selection included

• 9-fold CV: • Test fold -> posts from 1 dementia blog and 1 control blog • Training fold -> posts from the remaining 4 blogs

• Task: predict the class of an unseen blog post
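Pairing each of the three dementia blogs with each of the three control blogs gives 3 × 3 = 9 test folds, which accounts for the 9 in 9-fold CV. A small sketch of the split, using the blog names from the data set table; `blog_folds` is a hypothetical helper, and posts would then be assigned to train or test by the blog they come from.

```python
from itertools import product

DEMENTIA_BLOGS = ["living-with-alzhiemers", "creatingmemories", "parkblog-silverfox"]
CONTROL_BLOGS = ["journeywithdementia", "earlyonset", "helpparentsagewell"]


def blog_folds(dementia_blogs, control_blogs):
    """Yield (train_blogs, test_blogs); each test fold is one dementia blog plus one control blog."""
    all_blogs = dementia_blogs + control_blogs
    for d, c in product(dementia_blogs, control_blogs):
        test = [d, c]
        train = [b for b in all_blogs if b not in test]
        yield train, test
```

Holding out whole blogs, rather than random posts, keeps an author's own style from appearing on both sides of a split.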


Page 34

• All models beat the majority-class baseline

• KNN and LogReg have the best AUC

Results
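AUC can be read as the probability that a randomly chosen dementia post receives a higher score than a randomly chosen control post. A majority-class model gives every post the same score, so every pair ties and its AUC is 0.5. A minimal pairwise sketch (quadratic in the number of posts, which is fine for illustration); `auc` is a hypothetical helper, not the evaluation code used in the study.

```python
def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, a constant-score predictor such as `auc([0.5] * 4, [1, 0, 1, 0])` returns 0.5.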

Page 35

• Ablation analysis: • Remove a feature group and measure the change in performance

• All feature groups are important • Psycholinguistic features most of all

Results
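The ablation procedure can be sketched as a loop that removes one feature group at a time and re-runs the evaluation. `evaluate` stands for whatever scoring routine is used (for example, mean AUC over the 9 folds); it is a hypothetical callable supplied by the caller, as is `ablation` itself. A more negative change means the removed group mattered more.

```python
from __future__ import annotations

from typing import Callable


def ablation(groups: dict[str, list[str]],
             evaluate: Callable[[list[str]], float]) -> dict[str, float]:
    """Score change when each feature group is removed (negative = the group helped)."""
    all_features = [f for feats in groups.values() for f in feats]
    full_score = evaluate(all_features)
    deltas = {}
    for name, feats in groups.items():
        dropped = set(feats)
        remaining = [f for f in all_features if f not in dropped]
        deltas[name] = evaluate(remaining) - full_score
    return deltas
```

Because each group is removed on its own, the deltas do not capture interactions between groups; that would need removing combinations of groups.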

Page 36

Results
• SUBTLWord score is the most important feature • A measure of how frequently a word is used in daily life

• Pronoun use and word length also score highly • Same findings as in the spoken dataset
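The SUBTL word score can be approximated as the average corpus frequency of the words in a post. This sketch assumes a lexicon mapping lowercase words to frequency per million words (for example, loaded from a SUBTLEX-US-style word list) and simply skips unknown words; the slides do not say how out-of-vocabulary words or log scaling were handled, and `mean_word_frequency` is a hypothetical helper.

```python
def mean_word_frequency(text, freq_per_million):
    """Mean lexicon frequency of the words in `text`; words missing from the lexicon are skipped.

    `freq_per_million` maps lowercase words to frequency per million words.
    """
    words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
    freqs = [freq_per_million[w] for w in words if w in freq_per_million]
    return sum(freqs) / len(freqs) if freqs else 0.0
```

With an invented toy lexicon `{"the": 100.0, "cat": 10.0, "feline": 1.0}` (not real SUBTLEX values), "The cat." scores 55.0 and "The feline." scores 50.5: rarer words lower the score, so a higher score indicates a simpler vocabulary.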

Page 37

Results

• ↑ SUBTLWord score == ↓ Vocabulary • Red == dementia


Page 38

Conclusion
• Despite the difficulties associated with written language, it is possible to detect dementia from blog posts • All feature groups are significant • All models achieved an AUC above the baseline (0.50) • Bloggers with dementia use simpler language (SUBTL word score)

• Future work: • Control for education level • Topic clustering as a preprocessing step • Longitudinal analysis

Page 39

Study 3: Domain Adaptation

Page 40

Domain Adaptation: Background
• Mild Cognitive Impairment (MCI): • A precursor stage that may lead to an eventual dementia diagnosis • Potentially treatable • Less data available than for AD

• Domain adaptation: • Useful when you have lots of data from one domain and want to use it to improve performance in a second domain


Page 41

Methodology
• Idea: use AD data to improve MCI classification accuracy • We compare two domain adaptation algorithms against baselines: • Alg1: AUGMENT • Alg2: CORAL

• DementiaBank data: • Source: 257 AD, 201 control • Target: 43 MCI samples, 41 control

• 10-fold cross-validation • Only target data in the test folds
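The evaluation protocol above (source data always available for training, only target data ever held out) can be sketched as an index generator. The sample counts come from the slide (257 + 201 source, 43 + 41 target); the unstratified random split and the name `target_only_folds` are assumptions for illustration.

```python
import random


def target_only_folds(n_source, n_target, k=10, seed=0):
    """Yield (source_train, target_train, target_test) index lists for k folds.

    Source samples are always used for training; only target samples are
    ever held out for testing.
    """
    target_idx = list(range(n_target))
    random.Random(seed).shuffle(target_idx)
    folds = [target_idx[i::k] for i in range(k)]
    source_train = list(range(n_source))
    for i, test in enumerate(folds):
        target_train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield source_train, sorted(target_train), sorted(test)
```

For example, `list(target_only_folds(458, 84))` yields 10 folds; each keeps all 458 source indices for training and holds out 8 or 9 target samples.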


Page 42

Alg1: AUGMENT
• From the widely cited 2009 paper “Frustratingly Easy Domain Adaptation” by Hal Daumé III

• Augment the feature space by creating a shared, a target-only, and a source-only copy of each feature:


source data = Alzheimer’s + control, target data = Mild Cognitive Impairment + control
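A minimal sketch of the feature augmentation, using the block order shown in the worked example (shared, target only, source only); the order of the blocks is arbitrary as long as it is applied consistently. `augment` is a hypothetical helper, not code from the paper. Test samples, which come only from the target domain, are augmented with `domain="target"`.

```python
def augment(x, domain):
    """Map feature vector x into [shared | target-only | source-only]."""
    x = list(x)
    zeros = [0.0] * len(x)
    if domain == "source":   # AD + control samples
        return x + zeros + x
    if domain == "target":   # MCI + control samples, including every test sample
        return x + x + zeros
    raise ValueError(f"unknown domain: {domain!r}")
```

For example, `augment([71, 4.1], "source")` gives `[71, 4.1, 0.0, 0.0, 71, 4.1]`. A linear model can then put weight on the shared copy for patterns common to AD and MCI, and on the domain-specific copies for differences, which is why models that do not learn a weight vector are not expected to benefit.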

Page 43

Alg1: AUGMENT


• Imagine d = 2 features

  # of words   mean word length
  71           4.1

Page 44

Alg1: AUGMENT


• Imagine d = 2

  source (AD)
    71  4.1
    69  6.1
    64  2.0
    73  3.1
  target (MCI)
    60  3.3
    65  7.1

Page 45

Alg1: AUGMENT


• Imagine d = 2

  source (AD)
    71  4.1   0   0     71  4.1
    69  6.1   0   0     69  6.1
    64  2.0   0   0     64  2.0
    73  3.1   0   0     73  3.1
  target (MCI)
    60  3.3   60  3.3   0   0
    65  7.1   65  7.1   0   0

Page 46

Alg1: AUGMENT


• Imagine d = 2

                shared     target only  source only
  source (AD)   71  4.1    0   0        71  4.1
                69  6.1    0   0        69  6.1
                64  2.0    0   0        64  2.0
                73  3.1    0   0        73  3.1
  target (MCI)  60  3.3    60  3.3      0   0
                65  7.1    65  7.1      0   0

Page 47

Alg1: AUGMENT


• Important caveat! • The model must learn a weight vector for this to work • e.g., KNN and Naive Bayes are not expected to improve

Page 48

Alg2: CORAL (CORrelation ALignment)
• “CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions”*

• How? Transform the source data so that its covariance matrix matches the covariance matrix of the target data

*B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. arXiv preprint arXiv:1511.05547, 2015
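A sketch of CORAL in NumPy following the recipe described by Sun et al.: whiten the source features with the inverse square root of their covariance, then re-colour them with the square root of the target covariance. The regularisation term `lam * I` (Sun et al. add the identity, i.e. `lam=1`) and the helper names are choices made here, not details taken from the slides. With `lam=0` and full-rank covariances the aligned source covariance equals the target covariance exactly; with `lam > 0` the match is approximate. The aligned source data then replaces the raw source data when training the classifier.

```python
import numpy as np


def _sym_matrix_power(m: np.ndarray, p: float) -> np.ndarray:
    """Power of a symmetric positive-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 1e-12, None)  # guard against tiny negative eigenvalues from round-off
    return (v * w**p) @ v.T


def coral(source: np.ndarray, target: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Re-colour source features so their covariance matches the target's.

    Rows are samples, columns are features; lam * I is added to both
    covariance matrices for numerical stability.
    """
    d = source.shape[1]
    c_s = np.cov(source, rowvar=False) + lam * np.eye(d)
    c_t = np.cov(target, rowvar=False) + lam * np.eye(d)
    whitened = source @ _sym_matrix_power(c_s, -0.5)  # remove source correlations
    return whitened @ _sym_matrix_power(c_t, 0.5)     # impose target correlations
```

When the target set has fewer samples than features its covariance is rank-deficient, so a positive `lam` is needed in that case.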


Page 49

Alg2: CORAL (CORrelation ALignment)


Page 50

Results
• Alg1: AUGMENT improves upon all baselines

• A 5% increase over target_only

• Alg2: CORAL results in worse performance than target_only


Page 51

Results
• Alg1: AUGMENT fails to improve models that do not learn a weight vector: • KNN • Random Forests • Naive Bayes

Page 52

AD = Alzheimer’s Disease, MCI = Mild Cognitive Impairment

Conclusion
• Mild Cognitive Impairment is hard to detect: • Lack of data • Less symptomatic

• We can improve MCI detection with AD data using domain adaptation

• Alg1: AUGMENT increases F-measure by 5% • The model must learn a weight vector

• Main negative result: • Alg2: CORAL doesn’t work on the DementiaBank dataset • May be due to binary features

Page 53

Motivation
• Detecting dementia early is very important: • Key to finding a cure • Allows patients to receive support

• NLP + ML can be used to detect dementia from language samples

• Language samples are easy to collect • The application can be distributed easily to developing countries


“The Progression of Alzheimer's Through My Mom's Crocheting” - wuillermania

Page 54

Contributions
• We showed:

• Measuring hemispatial neglect improves accuracy

• It’s possible to automatically detect dementia from unstructured written text

• AD data can improve our ability to detect MCI using domain adaptation


Page 55

Contribution


• We released a new data set of blog posts • Will permit longitudinal analysis

• Main negative results: • Discourse features have no effect on the Cookie Theft data • Domain adaptation Alg2: CORAL performs poorly on the Cookie Theft data

Page 56

Future work


• Multiple source domains: • AD, vascular dementia, dementia with Lewy bodies • Cookie Theft, Narrative Retelling Task, blogs

• Extend the written data set: • Emails • Forum conversations • Presidential tweets

• Longitudinal analysis: • Track changes in writing style as the disease progresses

Page 57

Acknowledgements
• Dr. Giuseppe Carenini, University of British Columbia • Dr. Gabriel Murray, University of the Fraser Valley • Dr. Thalia Field, UBC Faculty of Medicine

Page 58

Thank you!

Page 59

Methodology: Feature Selection

Page 60

Contribution 1 Results

F-Measure     Baseline             Strips               Halves               Quarters             Discourse
LogReg        0.824 (0.798-0.850)  0.833 (0.801-0.866)  0.846 (0.813-0.878)  0.821 (0.784-0.859)  0.824 (0.798-0.850)
SVM           0.737 (0.688-0.786)  0.725 (0.683-0.766)  0.721 (0.678-0.764)  0.730 (0.693-0.767)  0.737 (0.688-0.786)
KNN           0.692 (0.654-0.729)  0.716 (0.674-0.759)  0.728 (0.669-0.788)  0.707 (0.675-0.739)  0.692 (0.654-0.729)
RandomForest  0.796 (0.761-0.832)  0.802 (0.759-0.845)  0.754 (0.717-0.792)  0.799 (0.765-0.834)  0.800 (0.763-0.836)
NaiveBayes    0.780 (0.746-0.814)  0.777 (0.747-0.806)  0.760 (0.732-0.789)  0.780 (0.750-0.810)  0.780 (0.746-0.814)
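Each cell gives a point estimate followed by an interval. The intervals are symmetric about their point estimates (for example 0.824 ± 0.026 for LogReg at baseline), which is consistent with, though not proof of, a normal-approximation 95% interval over cross-validation folds. The slides do not say how the intervals were computed, so the following is only a sketch under that assumption; `fold_ci` and the fold scores are hypothetical.

```python
import math
from statistics import mean, stdev

def fold_ci(scores, z=1.96):
    """Return (mean, lower, upper): a normal-approximation interval over fold scores."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    return m, m - z * se, m + z * se

# Hypothetical per-fold F-measures from 10-fold cross-validation (illustrative only).
folds = [0.80, 0.85, 0.83, 0.79, 0.86, 0.81, 0.84, 0.82, 0.80, 0.84]
m, lo, hi = fold_ci(folds)
print(f"{m:.3f} ({lo:.3f}-{hi:.3f})")  # same "point (low-high)" layout as the table
```

With only 10 folds, a Student-t critical value (about 2.26 for 9 degrees of freedom) in place of 1.96 gives a slightly wider, more conservative interval; it can be passed as `z`.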


61

Contribution 1 Results

Change in F-Measure  Strips                     Halves                     Quarters                   Discourse
LogReg               0.010 (-0.008 to 0.027)    0.022 (0.003 to 0.041)     -0.002 (-0.033 to 0.028)   0.000
SVM                  -0.012 (-0.058 to 0.034)   -0.016 (-0.084 to 0.053)   -0.007 (-0.034 to 0.020)   0.000
KNN                  0.025 (-0.013 to 0.062)    0.037 (-0.025 to 0.099)    0.016 (-0.013 to 0.045)    0.000
RandomForest         0.006 (-0.026 to 0.037)    -0.042 (-0.072 to -0.012)  0.003 (-0.011 to 0.017)    0.003 (-0.012 to 0.019)
NaiveBayes           -0.004 (-0.017 to 0.010)   -0.020 (-0.046 to 0.006)   -0.000 (-0.008 to 0.007)   0.000
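Each entry is a difference from the baseline F-measure on the previous slide (for example LogReg with Halves: 0.846 - 0.824 = 0.022). One plausible way to put an interval on such a change is to pair the two configurations fold by fold and compute the interval over the per-fold differences; an interval that contains 0, as most do here, means the change cannot be distinguished from no effect at the 95% level. This is an assumption about the method, and `paired_delta_ci` and its inputs are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_delta_ci(variant, baseline, z=1.96):
    """Mean per-fold change in a score, with a normal-approximation interval.

    Pairing by fold cancels fold-to-fold variation shared by both
    configurations, so the interval on the change can be narrower than
    either configuration's own interval.
    """
    if len(variant) != len(baseline):
        raise ValueError("need one score per fold for both configurations")
    deltas = [v - b for v, b in zip(variant, baseline)]
    m = mean(deltas)
    se = stdev(deltas) / math.sqrt(len(deltas))
    return m, m - z * se, m + z * se

# Hypothetical per-fold F-measures for a baseline and a feature variant.
baseline = [0.80, 0.85, 0.83, 0.79, 0.86, 0.81, 0.84, 0.82, 0.80, 0.84]
variant = [0.82, 0.86, 0.84, 0.79, 0.88, 0.82, 0.84, 0.84, 0.81, 0.85]
m, lo, hi = paired_delta_ci(variant, baseline)
print(f"{m:+.3f} ({lo:+.3f} to {hi:+.3f})")
```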


62

Contribution 1 Results

Change in F-Measure  Strips                     Halves                     Quarters                   Discourse
LogReg               0.010 (-0.008 to 0.027)    0.004 (-0.024 to 0.033)    -0.002 (-0.033 to 0.028)   0.000
SVM                  -0.012 (-0.058 to 0.034)   -0.019 (-0.052 to 0.014)   -0.007 (-0.034 to 0.020)   0.000
KNN                  0.025 (-0.013 to 0.062)    0.022 (-0.010 to 0.054)    0.016 (-0.013 to 0.045)    0.000
RandomForest         0.006 (-0.026 to 0.037)    0.004 (-0.028 to 0.037)    0.003 (-0.011 to 0.017)    0.003 (-0.012 to 0.019)
NaiveBayes           -0.004 (-0.017 to 0.010)   0.001 (-0.011 to 0.013)    -0.000 (-0.008 to 0.007)   0.000


63

Contribution 3 Results

Performance  Majority Class       LogReg               SVM                  KNN                  Random Forests       Naive Bayes
Accuracy     0.515 (0.471-0.560)  0.822 (0.795-0.848)  0.723 (0.674-0.772)  0.701 (0.654-0.749)  0.796 (0.759-0.833)  0.782 (0.746-0.818)
AUC          0.500                0.894 (0.867-0.921)  0.769 (0.741-0.796)  0.750 (0.712-0.787)  0.871 (0.838-0.903)  0.846 (0.810-0.882)
F-Measure    0.677 (0.638-0.716)  0.824 (0.798-0.850)  0.737 (0.688-0.786)  0.692 (0.654-0.729)  0.796 (0.761-0.832)  0.780 (0.746-0.814)
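The Majority Class column has a higher F-measure than accuracy (0.677 vs 0.515). Assuming the majority class is the positive (dementia) class, a constant "positive" predictor has recall 1 and precision equal to the positive rate p, so F1 = 2p / (1 + p). With p ≈ 0.515 this gives about 0.680, close to the reported 0.677; a small gap is plausible if scores are averaged over folds. A constant score also cannot rank any positive above any negative, which is why its AUC is 0.500. `constant_positive_f1` is a hypothetical helper.

```python
def constant_positive_f1(pos_rate):
    """F1 of a classifier that labels every sample positive.

    Recall is 1 (no positive is missed) and precision equals the share of
    positives, so F1 = 2 * P * R / (P + R) = 2 * pos_rate / (1 + pos_rate).
    """
    precision, recall = pos_rate, 1.0
    return 2 * precision * recall / (precision + recall)

print(round(constant_positive_f1(0.515), 3))
```

Applied to the later Contribution 3 slide (majority-class accuracy 0.629), the same formula gives about 0.772, against a reported 0.770.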


64

Contribution 2 Results

F-Measure       target only          source only          relabelled           AUGMENT              CORAL
Majority Class  0.283 (0.122-0.445)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)
LogReg          0.667 (0.505-0.829)  0.614 (0.445-0.783)  0.690 (0.577-0.803)  0.717 (0.562-0.871)  0.637 (0.487-0.786)
SVM             0.595 (0.472-0.719)  0.637 (0.487-0.786)  0.659 (0.520-0.798)  0.664 (0.533-0.796)  0.637 (0.487-0.786)

source data = Alzheimer’s + control, target data = MCI + control
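AUGMENT most likely refers to the feature-augmentation method of Daumé III ("Frustratingly Easy Domain Adaptation", 2007), in which every feature vector is copied into a shared block plus a domain-specific block, so one ordinary classifier can learn weights that transfer across domains alongside weights that apply to a single domain. The slides do not define it, so this is an assumption; `augment` and the toy matrices are hypothetical, with source = Alzheimer's + control and target = MCI + control as in the footnote.

```python
import numpy as np

def augment(X, domain):
    """Feature augmentation: source rows -> [x, x, 0], target rows -> [x, 0, x]."""
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])  # shared block + source-only block
    if domain == "target":
        return np.hstack([X, zeros, X])  # shared block + target-only block
    raise ValueError(f"unknown domain: {domain!r}")

# Toy feature matrices (hypothetical): 2 source rows, 1 target row, 2 features each.
X_src = np.array([[1.0, 2.0], [3.0, 4.0]])
X_tgt = np.array([[5.0, 6.0]])

# Stack both domains into one training set for any standard classifier.
X_train = np.vstack([augment(X_src, "source"), augment(X_tgt, "target")])
print(X_train)
```

At test time, target samples go through `augment(X, "target")` before prediction.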


65

Contribution 2 Results

F-Measure       target only          source only          relabelled           AUGMENT              CORAL
Majority Class  0.283 (0.122-0.445)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)
RandomForest    0.602 (0.503-0.701)  0.670 (0.534-0.806)  0.695 (0.558-0.832)  0.581 (0.478-0.684)  0.642 (0.490-0.794)
KNN             0.597 (0.429-0.764)  0.522 (0.347-0.696)  0.550 (0.372-0.729)  0.558 (0.357-0.760)  0.637 (0.487-0.786)
NaiveBayes      0.536 (0.362-0.710)  0.520 (0.396-0.643)  0.545 (0.422-0.668)  0.512 (0.425-0.599)  0.647 (0.502-0.791)

source data = Alzheimer’s + control, target data = MCI + control


66

Contribution 2 Domain Adaptation: CORAL

Whitening

source data = Alzheimer’s + control, target data = MCI + control


67

Contribution 2 Domain Adaptation: CORAL

Recolouring

source data = Alzheimer’s + control, target data = MCI + control
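The two CORAL slides name its two steps. In CORAL (CORrelation ALignment, Sun, Feng and Saenko, 2016), the source features are first whitened by multiplying with C_s^(-1/2), the inverse square root of their regularized covariance, and then recoloured by multiplying with C_t^(1/2), the square root of the target covariance; a classifier is trained on the transformed source data and applied to the target data unchanged, so no target labels are needed. The NumPy sketch below follows the published algorithm, including adding the identity to each covariance (`eps=1.0`); whether the thesis used the same settings is not stated, and `coral` and `_sym_power` are hypothetical names.

```python
import numpy as np

def _sym_power(C, p):
    """Raise a symmetric positive-definite matrix to the power p via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals**p) @ vecs.T  # V diag(vals**p) V^T

def coral(Xs, Xt, eps=1.0):
    """Align the second-order statistics of source features Xs to those of Xt."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)  # regularized source covariance
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)  # regularized target covariance
    Xs_white = Xs @ _sym_power(Cs, -0.5)   # whitening with the source covariance
    return Xs_white @ _sym_power(Ct, 0.5)  # recolouring with the target covariance

# Toy data (hypothetical): 3 features; target features are mixed and rescaled.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 3))
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
Xt = rng.normal(size=(150, 3)) @ mix
Xs_aligned = coral(Xs, Xt)
# Train a classifier on (Xs_aligned, source labels), then apply it to Xt as-is.
```

With `eps=0.0` the covariance of the aligned source data equals the target covariance exactly, which makes a quick sanity check; the regularizer keeps both matrices well conditioned when features are correlated or samples are few.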


68

Contribution 3 Results

Performance  Majority Class       LogReg               SVM                  KNN                  Random Forests       Naive Bayes
Accuracy     0.629 (0.583-0.674)  0.724 (0.677-0.770)  0.638 (0.595-0.681)  0.728 (0.687-0.769)  0.681 (0.617-0.745)  0.658 (0.595-0.721)
AUC          0.500                0.759 (0.689-0.829)  0.666 (0.613-0.719)  0.761 (0.714-0.807)  0.696 (0.599-0.792)  0.674 (0.574-0.775)
F-Measure    0.770 (0.736-0.804)  0.785 (0.743-0.827)  0.773 (0.739-0.806)  0.785 (0.746-0.823)  0.766 (0.719-0.813)  0.732 (0.680-0.783)