Multimodal behavior signal analysis and interpretation for young kids with ASD
![Page 1: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/1.jpg)
Presenter: Ming Li
SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University, China
1
Identifying Children with Autism Spectrum Disorder with a Machine Learning Framework
![Page 2: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/2.jpg)
2
Background
![Page 3: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/3.jpg)
Machine Learning
3
![Page 4: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/4.jpg)
Machine Learning
4
![Page 5: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/5.jpg)
Background
• Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder.
• The prevalence of ASD has risen sharply in the past decade: about 1 in 68 children in the United States now has ASD [1].
• Early intervention is so far the most effective way to treat ASD.
• The current ASD diagnostic procedure is both time-consuming and labor-intensive.
• Our main motivation is to analyze children's behavior with machine learning and to design a prediction system that makes early screening convenient.
5
[1] Prevalence of autism spectrum disorder among children aged 8 years: Autism and Developmental Disabilities Monitoring Network. Morbidity and Mortality Weekly Report.
![Page 6: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/6.jpg)
6
Key Symptoms of ASD
http://pages.samsung.com/ca/lookatme/English/
![Page 7: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/7.jpg)
• Restricted and repetitive behaviors[1].
• Social interaction and communication difficulties.
• Avoidance of eye contact with other people.
• …
7
Key Symptoms of ASD
[1] http://www.ninds.nih.gov/disorders/autism/detail_autism.htm#3082_2
![Page 8: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/8.jpg)
• Restricted and repetitive behaviors[1].
• Social interaction and communication difficulties.
• Avoidance of eye contact with other people.
• Based on the observations above, we hypothesize that children with ASD may scan faces differently when they look at them.
• Our work focuses on finding such abnormalities in face scanning patterns in order to predict ASD.
• Can we capture the eye movement information? Yes!
8
Key Symptoms of ASD
[1] http://www.ninds.nih.gov/disorders/autism/detail_autism.htm#3082_2
![Page 9: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/9.jpg)
9
Eye Tracking Device
http://www.tobiipro.com/product-listing/tobii-pro-tx300/
Input: An image on the screen.
Output: A series of eye gaze coordinates indicating where the person is looking on the displayed image.
![Page 10: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/10.jpg)
10
Related Work: Do individuals with and without autism spectrum disorder scan faces differently?
![Page 11: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/11.jpg)
Sequence of eye gaze coordinates
Sequence of eye gaze coordinates
Sequence of eye gaze coordinates
11
The Face Scanning Pattern Dataset
Li Yi et al., “Do individuals with and without autism spectrum disorder scan faces differently? A new multi-method look at an existing controversy,” Autism Research, 2014.
Eye tracking
![Page 12: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/12.jpg)
12
Subject Composition
Li Yi et al., “Do individuals with and without autism spectrum disorder scan faces differently? A new multi-method look at an existing controversy,” Autism Research, 2014.
Subject composition:
• 29 Chinese children with ASD, aged 4 to 11 years (ASD)
• 29 typically developing Chinese children matched on chronological age (TD-Age)
• 29 typically developing Chinese children matched on IQ (TD-IQ)
![Page 13: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/13.jpg)
13
Analysis with Area of Interest (AOI)
Li Yi et al., “Do individuals with and without autism spectrum disorder scan faces differently? A new multi-method look at an existing controversy,” Autism Research, 2014.
Manually partition each face image into different semantic regions.
Count the number of gazes falling into each individual area.
Significant differences between the ASD and non-ASD groups are found, especially for the right eye and the nose.
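The AOI counting step can be sketched as follows. This is a minimal illustration, not the study's procedure: the AOIs there were drawn by hand for each face image, whereas the rectangles, region names, and pixel values below are hypothetical.

```python
# Hypothetical rectangular AOIs as (x_min, y_min, x_max, y_max) in pixels.
# Real AOIs are manually drawn per face image; rectangles are a simplification.
AOIS = {
    "left_eye": (100, 120, 180, 160),
    "right_eye": (220, 120, 300, 160),
    "nose": (170, 160, 230, 240),
    "mouth": (150, 250, 250, 300),
}


def count_gazes_per_aoi(gazes, aois=AOIS):
    """Count how many gaze points fall into each AOI (first match wins)."""
    counts = {name: 0 for name in aois}
    counts["other"] = 0  # gaze points outside every AOI
    for x, y in gazes:
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

The per-region counts can then be compared between groups, which is the kind of analysis the AOI approach relies on.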
![Page 14: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/14.jpg)
14
Proposed ASD Prediction Framework
![Page 15: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/15.jpg)
15
Dataset Description
Eye tracking produces a sequence of eye gaze coordinates for each viewed image.
Dataset (sequences of 2D x-y coordinates), organized at three levels:
• Subject level: 87 participants
• Image level: 87*55 = 4785 instances
• Coordinate level: 87*55*180 ≈ 9*10^5 instances
![Page 16: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/16.jpg)
Set of image-wise features (4785 instances)
Trained Prediction Model
16
Dataset Description
Subject level: 87 participants
Image level: 87*55 = 4785 instances
Feature Representation
![Page 17: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/17.jpg)
17
Summary: From Gaze Coordinate to Image-wise Feature
• One feature vector is extracted from the gaze coordinates (180 points) belonging to one viewed image.
• For each subject in the dataset, the number of extracted feature vectors equals the number of viewed images (55).
• Each feature vector carries a binary label (ASD/non-ASD), determined by the group label of its subject.
• The ASD prediction model is trained on these labeled feature vectors.
![Page 18: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/18.jpg)
Obtain dictionary words with k-means:
• Algorithm: Perform k-means on eye-gaze coordinates.
• Assume N subjects (participants) are assigned for training.
• Input: The complete set of eye gaze coordinates from the N subjects.
• Output: K cluster centroids and their generated cells.
• Each cell is treated as a dictionary word.
18
BoW Feature Representation
Illustration of the cells projected onto one example image. Note that the same cell partition is used for all viewed images. From left to right: K = 16, 32, 48, 96.
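The dictionary-learning step can be sketched with a plain Lloyd-style k-means on 2D gaze points. This is an illustrative sketch only: the function names, the optional `init` argument, and the stopping rule are assumptions, not details taken from the slides.

```python
import random


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest_centroid(point, centroids):
    """Index of the dictionary word (cell) that a gaze point falls into."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, init=None, iterations=50, seed=0):
    """Return k centroids learned from the pooled training gaze points."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        updated = []
        for idx, members in enumerate(clusters):
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[idx])  # keep an empty cluster's centroid
        if updated == centroids:  # assignments no longer change the centroids
            break
        centroids = updated
    return centroids
```

Because k-means only finds a local optimum, the result depends on the initial centroids; in practice several restarts are common.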
![Page 19: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/19.jpg)
K-Means vs AOI
• The Area of Interest (AOI) approach needs manual segmentation for every face image.
• K-means is data-driven: it automatically groups the data based on coordinate locations and densities.
19
Dictionary words with k-means | Dictionary words with AOI
![Page 20: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/20.jpg)
Traditional hard BoW representation:
• Use hard membership to dictionary words: assign each gaze coordinate to its closest word.
Proposed soft BoW representation:
• Use soft membership to dictionary words.
• Soft membership to each cluster:
• Soft BoW histogram:
x_n: The n-th eye gaze coordinate on the image.
d_k: The centroid of the k-th cluster (dictionary word).
K: The number of dictionary words.
N: The number of gaze coordinates on one face image.
20
Improved BoW with Soft Membership
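The membership and histogram formulas are not reproduced in this transcript, so the sketch below shows one common way to realize the idea rather than the authors' exact definition. It assumes a Gaussian weighting of squared distances to the centroids, normalized over the K words, and an average over the N gaze points; the `sigma` parameter and the function names are hypothetical.

```python
import math


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def hard_bow(gazes, centroids):
    """Hard BoW: each gaze point votes only for its closest dictionary word."""
    hist = [0.0] * len(centroids)
    for g in gazes:
        k = min(range(len(centroids)), key=lambda j: sq_dist(g, centroids[j]))
        hist[k] += 1.0
    return [h / len(gazes) for h in hist]


def soft_bow(gazes, centroids, sigma=50.0):
    """Soft BoW: each gaze point spreads one unit of weight over all words.

    Assumed membership: exp(-||x_n - d_k||^2 / (2 sigma^2)), normalized over k.
    """
    hist = [0.0] * len(centroids)
    for g in gazes:
        dists = [sq_dist(g, d) for d in centroids]
        d_min = min(dists)  # subtracting the minimum avoids underflow to all zeros
        weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in dists]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    return [h / len(gazes) for h in hist]
```

As `sigma` shrinks, the soft histogram approaches the hard one; a larger `sigma` lets gaze points near a cell border contribute to both neighboring words.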
![Page 21: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/21.jpg)
A Support Vector Machine (SVM) is a linear classifier that learns the decision boundary with the largest separation margin.
Data and label
Decision Boundary:
Margin Size:
Objective:
In our problem: The inputs to the SVM are the labeled image-level BoW features.
21
Image-Level Prediction with SVM
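The formulas for the items listed on this slide are not reproduced in the transcript. For reference, the standard hard-margin linear SVM formulation, which is presumably what the slide showed, is given below; the symbols w, b, n and M are conventional notation and an assumption here.

```latex
% Data and label: n feature vectors with binary labels
\{(x_i, y_i)\}_{i=1}^{n}, \qquad y_i \in \{-1, +1\}

% Decision boundary: a hyperplane with normal vector w and offset b
w^\top x + b = 0

% Margin size
M = \frac{2}{\lVert w \rVert}

% Objective: maximizing M is equivalent to
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad y_i \,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
```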
![Page 22: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/22.jpg)
Kernel SVM
22
• The extracted features are not linearly separable.
• We can apply a kernel method to map the features into a high-dimensional kernel space where they become more linearly separable.
• Our work adopts the radial basis function (RBF) kernel SVM for its good performance.
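For reference, the RBF kernel has the standard form k(x, z) = exp(-gamma * ||x - z||^2). A minimal sketch follows; the function name and the default `gamma` are illustrative, and gamma is the same hyperparameter that is tuned together with C in the SVM.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel between two feature vectors of equal length."""
    sq_norm = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_norm)
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart; a larger gamma makes the decay faster and the decision boundary more local.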
![Page 23: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/23.jpg)
23
Subject level Score ensemble
: Viewed image number of a subject
Subject-Level Prediction:
Image-level Score
Subject-level Score
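The ensemble formula is not reproduced in this transcript. A natural reading is that the subject-level score aggregates the image-level SVM scores over all images the subject viewed; the sketch below assumes a plain average and a zero decision threshold, both of which are assumptions.

```python
from statistics import mean


def subject_score(image_scores):
    """Average the image-level scores of one subject (assumed ensemble rule)."""
    return mean(image_scores)


def predict_subject(image_scores, threshold=0.0):
    """Return +1 (ASD) if the subject-level score exceeds the threshold, else -1."""
    return 1 if subject_score(image_scores) > threshold else -1
```

Averaging over the viewed images (55 per subject in this dataset) smooths out noisy image-level decisions.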
![Page 24: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/24.jpg)
Framework
24
![Page 25: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/25.jpg)
Framework
25
Improved Dictionary learning
![Page 26: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/26.jpg)
Improved dictionary learning: Motivation
Instead of learning the dictionary words from both classes pooled together, we need to treat the two classes differently.
Heat maps of the ASD group (left) and the TD group (right)
![Page 27: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/27.jpg)
Improved dictionary learning
• Supervised Discriminative Mean-shift
• The estimator of the density difference computed at point x:
• where c_{d,1} and c_{d,2} are normalization constants for the positive and negative classes; y_i and y_j equal 1 when the class label is positive and -1 when it is negative.
• We can find the modes of this density difference by locating the points where the gradient equals zero.
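The estimator itself is not reproduced in this transcript, so the following is only a sketch of the general idea under stated assumptions: Gaussian kernels with bandwidth `h`, per-class averaging in place of the normalization constants, and plain gradient ascent in place of the mean-shift update. All names and parameter values are hypothetical.

```python
import math


def gaussian_weight(x, xi, h):
    d2 = (x[0] - xi[0]) ** 2 + (x[1] - xi[1]) ** 2
    return math.exp(-d2 / (2.0 * h * h))


def density_difference(x, pos, neg, h=1.0):
    """Kernel density of the positive class minus that of the negative class."""
    p = sum(gaussian_weight(x, xi, h) for xi in pos) / len(pos)
    n = sum(gaussian_weight(x, xj, h) for xj in neg) / len(neg)
    return p - n


def gradient(x, pos, neg, h=1.0):
    """Analytic gradient of density_difference with respect to x."""
    gx = gy = 0.0
    for points, sign in ((pos, 1.0), (neg, -1.0)):
        for xi in points:
            w = sign * gaussian_weight(x, xi, h) / (len(points) * h * h)
            gx += w * (xi[0] - x[0])
            gy += w * (xi[1] - x[1])
    return gx, gy


def ascend_to_mode(x, pos, neg, h=1.0, step=0.5, iterations=200):
    """Follow the gradient uphill until it (approximately) vanishes."""
    for _ in range(iterations):
        gx, gy = gradient(x, pos, neg, h)
        x = (x[0] + step * gx, x[1] + step * gy)
    return x
```

Starting points that converge to the same mode can be grouped into one dictionary word, so the resulting cells concentrate where the two groups' gaze densities differ most.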
![Page 28: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/28.jpg)
• Mean-shift results after different numbers of iterations.
Improved dictionary learning: Motivation
![Page 29: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/29.jpg)
29
Experiment
![Page 30: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/30.jpg)
30
Evaluation Method
Leave-One-Out Cross Validation
Suppose the dataset contains K subjects (participants) whose eye gaze data are recorded:
For k = 1 to K
    Testing data = eye gaze data from subject k.
    Training data = eye gaze data from the remaining K-1 subjects.
    Train the prediction model with the training data.
    Predict subject_score(k) with the testing data.
End
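The loop above can be sketched as a leave-one-subject-out split. The data layout (a dict keyed by subject id) and the `train_fn`/`score_fn` callables are hypothetical placeholders for the dictionary learning, SVM training and subject-level scoring steps.

```python
def leave_one_subject_out(data_by_subject):
    """Yield (held_out_id, training_data, testing_data) for every subject."""
    subjects = sorted(data_by_subject)
    for held_out in subjects:
        train = {s: data_by_subject[s] for s in subjects if s != held_out}
        yield held_out, train, data_by_subject[held_out]


def run_loocv(data_by_subject, train_fn, score_fn):
    """Train on K-1 subjects, score the held-out subject, repeat K times."""
    scores = {}
    for held_out, train, test in leave_one_subject_out(data_by_subject):
        model = train_fn(train)  # must see training subjects only
        scores[held_out] = score_fn(model, test)
    return scores
```

Splitting by subject rather than by image keeps all 55 images of the test subject out of training, so the model never sees data from the child it is evaluated on.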
Metrics used to quantitatively evaluate the prediction performance:
• Receiver operating characteristic (ROC) curve
• Area Under the ROC Curve (AUC)
• Accuracy
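Accuracy and AUC can be computed from the subject-level scores as sketched below. The label encoding (+1 for ASD, -1 for non-ASD) and the zero threshold are assumptions; the AUC uses the rank-based definition, i.e. the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of subjects whose thresholded score matches the label."""
    correct = sum((s > threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, AUC does not depend on a particular threshold, which is why the two metrics can rank methods differently.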
![Page 31: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/31.jpg)
Parameter Searching
31
An example of searching parameters with coarse (left) and fine (right) grid-searching in RBF kernel SVM
[Figure: two contour plots over log(gamma) (horizontal axis) and log(C) (vertical axis). Coarse grid (left): log(gamma) from -13 to 3, log(C) from -5 to 11, contour levels from 0.77 to 0.87. Fine grid (right): log(gamma) from -2 to 0, log(C) from 6 to 10, contour levels from 0.855 to 0.88.]
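A coarse-to-fine grid search of this kind can be sketched as follows. The coarse ranges follow the axis ranges of the figure; the base-2 logarithm, the step sizes, the width of the fine window, and `toy_evaluate` (a stand-in for the cross-validated score of an RBF SVM) are assumptions.

```python
import math
from itertools import product


def grid(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]


def grid_search(evaluate, log_c_values, log_gamma_values):
    """Return (best_score, log2_C, log2_gamma) over the given grid."""
    best = None
    for lc, lg in product(log_c_values, log_gamma_values):
        score = evaluate(C=2.0 ** lc, gamma=2.0 ** lg)
        if best is None or score > best[0]:
            best = (score, lc, lg)
    return best


def coarse_to_fine(evaluate):
    # Coarse pass over the ranges shown on the left panel.
    _, lc, lg = grid_search(evaluate, grid(-5, 11, 2), grid(-13, 3, 2))
    # Fine pass in a small window around the coarse optimum.
    return grid_search(evaluate, grid(lc - 2, lc + 2, 0.25), grid(lg - 2, lg + 2, 0.25))


def toy_evaluate(C, gamma):
    """Placeholder score peaking at log2(C) = 8.25, log2(gamma) = -1.5."""
    return -((math.log2(C) - 8.25) ** 2 + (math.log2(gamma) + 1.5) ** 2)
```

In a real run, `evaluate` would train and cross-validate the SVM for each (C, gamma) pair, so the coarse pass keeps the number of expensive evaluations manageable.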
![Page 32: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/32.jpg)
Face partitioning: AOI, K-means and Supervised Meanshift
AOI (baseline) | K-means clustering (our proposed method I) | Supervised Meanshift (our proposed method II)
32
Baseline 2: BoW with AOI Dictionary
![Page 33: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/33.jpg)
Baseline 2: BoW with AOI Dictionary
33
AUC and accuracy with different dictionary learning methods

| Dictionary Learning Method | AUC (%) | Accuracy (%) |
| --- | --- | --- |
| AOI | 91.20 | 83.91 |
| K-means | 89.63 | 88.51 |
| Supervised Meanshift | 94.53 | 89.66 |

Wenbo Liu, "Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning Framework", Autism Research, 2016.
![Page 34: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/34.jpg)
Result: Evaluation Summary
34
AUC and accuracy with different negative groups

| Comparison | AUC (%) | Accuracy (%) |
| --- | --- | --- |
| ASD vs. Non-ASD | 94.53 | 89.66 |
| ASD vs. TD-Age | 87.51 | 82.76 |
| ASD vs. TD-IQ | 90.25 | 86.21 |

AUC and accuracy for faces of different races

| Face images | AUC (%) | Accuracy (%) |
| --- | --- | --- |
| All Face Images | 94.53 | 89.66 |
| Same-Race Faces | 86.92 | 85.06 |
| Other-Race Faces | 96.91 | 90.80 |

Wenbo Liu, "Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning Framework", Autism Research, 2016.
![Page 35: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/35.jpg)
Ongoing work
• We hope to design a more robust system by looking into other sources of multimodal information, such as emotion, gaze, activity and speech.
35
![Page 36: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/36.jpg)
Data Example
36
![Page 37: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/37.jpg)
Data Collection
Dataset:
• Recordings of the diagnostic procedure in Autism Diagnostic Observation Schedule (ADOS) exams.
• 230 children were enrolled in the exam.
• M1: 115 participants
• M2: 82 participants
• M3: 33 participants
37
![Page 38: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/38.jpg)
Data collection
![Page 39: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/39.jpg)
Proven technique in our Lab (I)
• Emotion Recognition
Input Images Output Score
39
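| Input image | Surprised | Sad | Disgust | Worried | Anxious | Neutral | Happy | Angry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5.84e-06 | 1.14e-04 | 1.26e-04 | 1.51e-07 | 2.27e-03 | 3.52e-03 | 4.89e-02 | 9.44e-01 |
| 2 | 1.00e-02 | 2.37e-04 | 8.66e-02 | 3.96e-04 | 3.72e-01 | 5.05e-02 | 4.27e-02 | 4.36e-01 |
| 3 | 1.32e-08 | 1.42e-05 | 3.45e-03 | 1.59e-08 | 2.88e-04 | 4.17e-03 | 9.90e-01 | 1.46e-03 |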
![Page 40: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/40.jpg)
Proven technique in our Lab (III)
• Face Detection & Gaze Recognition
40
![Page 41: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/41.jpg)
• ADOS Data Speech Recognition Results
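doctor: 不仅 是 什么 啊 哦 好 你 在 哪 上学 上 幼儿园 对 呀 之前 的时候 是 在 哪家 幼儿园 嗯 警备区 幼儿园 是不是 哇 那 你们 那 幼儿园 听 起来 好像 很 厉害 的 样子 可以 介绍 一下 你们 那个 幼儿园 吧 就是说 你 那 幼儿园 是 什么 样子 的 嘛 嗯 嗯 嗯 嗯 真的 恩 恩 恩 恩 七月份 嗯 嗯 嗯 嗯 嗯 嗯 你 已经 报 了 名 的 是 吧 嗯 哼 怎么 说错 了
(Approximate English rendering of the recognizer output: not only, what is it, oh, okay, where do you go to school, kindergarten, right, which kindergarten were you at before, mm, the Garrison District kindergarten, is that right, wow, your kindergarten sounds really impressive, can you tell me about your kindergarten, I mean, what is your kindergarten like, mm mm mm mm, really, mm mm mm mm, July, mm mm mm mm mm mm, you have already registered, right, uh-huh, how did I say it wrong)
child: 我 不 认识 我 上 幼儿园 毕业 了 广州 警备区 会 接受 是 幼儿园 怎么 介绍 我 困扰 我 的 幼儿园 我 一年 之前 有 一个 坡 上 坡 上 不能 超过 五公里 我 超过 五公里 就 发 完 后 有 一个 保安 网 网上 他 保安 一般 拿 电报 他 什么 拿 拿 了 个 手枪 呀 而且 而且 有 一次 有 一次 我 是因为 有 有 一次 我 上 幼儿园
(Approximate English rendering of the recognizer output: I don't know, I graduated from kindergarten, Guangzhou Garrison District, will accept, is kindergarten, how to introduce, I, bother, my kindergarten, a year ago there was a slope, on the slope you cannot exceed five kilometers, I exceeded five kilometers, then after sending there was a security guard, net, online, he, security guards usually carry a telegram, he, what, carried a pistol, and, and, one time, one time I, because, one time I went to kindergarten)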
![Page 42: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/42.jpg)
Wearable microphone to record the child’s interaction with parents and therapists
• Description of data
Our audio database was collected from children who had been diagnosed with autism and stayed in hospital for a one-month rehabilitation treatment.
Each child wore an audio-recording wristband during the daytime to record speech data in a real environment.
We randomly selected 6 audio segments from the child-therapist interactions, with a total length of 120 minutes, as our diarization evaluation data. They come from 3 children: two boys (K1, K3) and one girl (K2).
![Page 43: Multimodal behavior signal analysis and interpretation for young kids with ASD](https://reader031.vdocuments.net/reader031/viewer/2022021919/587587711a28ab901c8b5085/html5/thumbnails/43.jpg)
Methods
• Voice activity detection
We simply use an energy-based VAD method due to its efficiency.
• Speaker segmentation
We employ the LSP feature and BIC metric to detect speaker changes.
• Speaker clustering
→ Introducing discriminative features (pitch, energy, phoneme duration).
→ Revised distance measure.
→ Agglomerative hierarchical clustering (AHC) plus early stop.
System overview.
Tianyan Zhou, Weicheng Cai, Huadi Zheng, Luting Wang, Xiaoyan Chen, Xiaobing Zou, Shilei Zhang, Ming Li, "A Pitch, Energy and Phoneme Duration Features based Speaker Diarization System for Autism Kids’ Real-Life Audio Data", ISCSLP 2016.
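An energy-based voice activity detector of the kind mentioned under Methods can be sketched as follows. The frame length, the decibel threshold, and the assumption that samples are floats in [-1, 1] are all illustrative choices, not parameters from the cited system.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def energy_vad(samples, frame_len=400, threshold_db=-30.0):
    """Flag each frame as speech (True) when its energy exceeds the threshold."""
    flags = []
    for energy in frame_energies(samples, frame_len):
        level_db = 10.0 * math.log10(energy + 1e-12)  # small offset avoids log(0)
        flags.append(level_db > threshold_db)
    return flags
```

At a 16 kHz sampling rate, 400 samples correspond to 25 ms frames. A fixed threshold is cheap to compute, which matches the efficiency argument above, but it is sensitive to background noise in real-life recordings.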