SCALING UP IMAGE ANNOTATION FOR DEEP LEARNING: STANDARDS, LABELS FROM TEXT, AND LEVERAGING MULTI-INSTITUTIONAL DATA
Daniel L. Rubin, MD, MS
Professor of Biomedical Data Science, Radiology, Medicine (Biomedical Informatics), and Ophthalmology (by courtesy)
Stanford University
Acknowledgements
Students, Post-docs, Residents, Staff, and Collaborators
– Bao Do
– Selen Bozkurt
– Assaf Hoogi
– Alfiia Galimzianova
– Imon Banerjee
– Christopher Re
– Sandy Napel
– Chris Beaulieu
– Darvin Yi
– Xuerong Xiao
– Carson Lam
– Blaine Rister
– Hersh Sagreiya
– Emel Alkim
– Ann Leung
– Matthew Lungren
– Jared Dunnmon
– David Conn
– Mete Akdogan
– Niranjan Balachandar
– Curt Langlotz
– Ted Leng
– Joelle Hallak
– Luis de Sisternes
– Zaid Nabulsi
– Michael Gensheimer
Funding Support
– NCI QIN grants U01CA142555, 1U01CA190214, 1U01CA187947
– Stanford-AstraZeneca Collaboration Grant
– NVIDIA Academic Hardware Grant Program
– Stanford Philips and GE BlueSky
Challenges to scaling up image annotation for deep learning
• Varying data/file formats for saving image annotations
• Difficulty leveraging free text radiology reports as a source for labels for images
• Hurdles to sharing data across institutions to build more robust AI models
Image annotations are crucial for AI
Figure labels: ROI1, ROI2; Detection, Segmentation; Classification, Diagnosis
Varying file formats for image annotations
• Regions of interest (ROIs) and image labels
◦ DICOM-PS
◦ Burned-in image
◦ Proprietary formats
• Clinical labels (diagnoses, findings, patient outcomes)
◦ EMR
◦ Spreadsheets
◦ Delimited files
◦ Proprietary formats
Lack of image annotation standards thwarts interoperability
Figure labels: Vendor 1, Vendor 2, Vendor 3, Vendor 4, 3D Slicer
Copyright © Daniel Rubin 2015
Annotation and Image Markup (AIM)
• XML schema to make the information that humans and machines see in images machine-accessible in a standard format
• Enables interoperability of this information across systems and computer applications
• Developed by National Cancer Imaging Program at NCI
• Harmonized/incorporated into DICOM-SR
Rubin DL, et al.: Medical Imaging on the Semantic Web: Annotation and Image Markup, AAAI 2008.
https://wiki.nci.nih.gov/display/AIM/Annotation+and+Image+Markup+-+AIM
Copyright © Daniel Rubin 2018
AIM captures annotations in XML
Copyright © Daniel Rubin 2017
Figure labels: QUALITATIVE, QUANTITATIVE
AIM annotations interoperate with other standards
• Anatomic Entity: Upper lobe of left lung (RID1327)
• Observation: Mass (RID3874)
• Characteristic: Microlobulated margin (RID5712)
• Geometric Shape: Polyline
• 2D coordinates: {(x,y), (x,y)….}
• Calculation: Largest diameter result: 2.8 cm
• Diagnosis: Lung cancer
Figure labels: DICOM SR (TID 1500), XML, HL7 CDA/FHIR
https://github.com/NCIP/annotation-and-image-markup/tree/master/AIMToolkit_v3.0.2_rv11/examples/ANIVATR
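To illustrate how the fields on this slide could be held in a machine-readable form, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`ImageAnnotation`, `AnatomicEntity`, `codeValue`, and so on) and the coordinate values are hypothetical placeholders chosen for readability. They do not follow the actual AIM schema, which is defined by the AIM toolkit linked above.

```python
import xml.etree.ElementTree as ET


def build_annotation(points):
    """Build a simplified, AIM-inspired XML annotation (not the real AIM schema)."""
    root = ET.Element("ImageAnnotation")  # hypothetical element name

    # Qualitative content: coded terms with the RID codes shown on the slide.
    ET.SubElement(root, "AnatomicEntity", codeValue="RID1327").text = "Upper lobe of left lung"
    ET.SubElement(root, "Observation", codeValue="RID3874").text = "Mass"
    ET.SubElement(root, "Characteristic", codeValue="RID5712").text = "Microlobulated margin"

    # Geometry: a polyline stored as a list of 2D coordinates.
    shape = ET.SubElement(root, "GeometricShape", type="Polyline")
    for x, y in points:
        ET.SubElement(shape, "Coordinate", x=str(x), y=str(y))

    # Quantitative content: one calculation result with its unit.
    ET.SubElement(root, "Calculation", name="Largest diameter", unit="cm").text = "2.8"
    ET.SubElement(root, "Diagnosis").text = "Lung cancer"
    return root


# Placeholder coordinates; the slide elides the real values.
annotation = build_annotation([(10.0, 12.5), (14.0, 18.0)])
xml_text = ET.tostring(annotation, encoding="unicode")
print(xml_text)
```

Keeping qualitative terms as coded values and quantitative results as typed fields is what lets another system parse the annotation without knowing which tool produced it.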
electronic Physician Annotation Device (ePAD)
• ePAD: free, open source Web-based image viewer and annotator
• AIM-compliant annotation; supports AIM templates
• Plugins for quantifying lesion features
Figure labels: Template, ROI, Values, Quantitative image features, Annotations linked to images, Qualitative image features
Rubin, Willrett, O'Connor, Hage, Kurtz, Moreira, Translational Oncology 7(1):23-35, 2014
http://epad.stanford.edu
AIM being used for public sharing of image annotations
• The Cancer Genome Atlas (TCGA) imaging projects
◦ Brain cancer
◦ Breast cancer
◦ Bladder cancer
• The Cancer Imaging Archive (TCIA)
• Quantitative Imaging Network (QIN) of NCI
Copyright © Stanford University 2018
Motivating challenges for needing to use free text reports
• Scarcity of annotated images: millions of images are needed to train a complex neural network
• Annotation is laborious, time-consuming, and expensive
• Radiology reports are associated with routine clinical images and could be leveraged
Radiological image annotation: leveraging clinical notes
• PACS contains millions of images “labeled” in the form of unstructured notes.
• Why not use the notes to annotate the images?
• Unstructured free text cannot be directly interpreted by a machine due to the ambiguity and subtlety of natural language.
• How can the semantic information be extracted from the clinical notes?
Figure labels: Radiologist’s note, CT image
Word embeddings to identify annotation labels from narrative text
• Unsupervised deep learning algorithms (e.g., word2vec) can learn a feature representation from text without requiring domain-specific knowledge to be supplied
Figure: Word embedding using deep learning (4,442 words) projected in two dimensions
Imon Banerjee, JDI 30:506-518, 2017
Ontocrawler: Generating domain dictionaries for annotation tasks
• Created an ontology crawler using SPARQL that grabs the sub-classes and synonyms of the domain-specific terms from NCBO BioPortal.
• Generates a focused dictionary for each domain of radiology.
• {‘apoplexy’, ‘contusion’, ‘hematoma’, ...} → ‘hemorrhage’
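A minimal sketch of how such a focused dictionary could be applied to report text: each synonym or sub-class is replaced by its parent term before further processing. The dictionary below contains only the three terms shown on the slide. The function name `map_to_domain_terms` and the sample sentence are hypothetical, and the SPARQL crawl of BioPortal itself is not shown.

```python
import re

# Focused dictionary: each synonym or sub-class maps to its parent concept.
# Only the three example terms from the slide are included here.
DOMAIN_DICTIONARY = {
    "apoplexy": "hemorrhage",
    "contusion": "hemorrhage",
    "hematoma": "hemorrhage",
}


def map_to_domain_terms(text, dictionary=DOMAIN_DICTIONARY):
    """Lowercase and tokenize the text, replacing dictionary terms with their parent concept."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [dictionary.get(token, token) for token in tokens]


# Hypothetical report sentence, for illustration only.
mapped = map_to_domain_terms("Small subdural hematoma, no contusion.")
print(mapped)
```

Collapsing variants onto one term shrinks the vocabulary the embedding has to learn, which matters when some synonyms are rare in the corpus.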
Intelligent word embedding pipeline
Word embedding + classification model
• Stores each word as a point in vector space
• Unsupervised, built just by reading a huge corpus
• Can be used as features to train a supervised model with a small subset of annotations
• Reusable/extensible to many text extraction use cases
Figure labels: Corpus, Word embedding, Document embedding, Classifier, Document classification (Positive, Negative)
Mikolov, Distributed representations of words and phrases and their compositionality
Imon Banerjee, In preparation
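The flow from word embedding to document embedding to classifier can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline from the cited work: the word vectors are hand-set 2-D placeholders rather than learned by word2vec, the document embedding is assumed to be a plain average of word vectors, and a nearest-centroid rule stands in for whatever classifier is actually used.

```python
from statistics import mean

# Hypothetical 2-D word vectors; a real pipeline would learn these from a
# large report corpus (e.g., with word2vec) rather than set them by hand.
WORD_VECTORS = {
    "hemorrhage": (0.9, 0.1),
    "acute": (0.8, 0.2),
    "no": (0.1, 0.9),
    "normal": (0.2, 0.8),
    "evidence": (0.5, 0.5),
}


def document_embedding(tokens):
    """Average the vectors of known words (one simple way to embed a document)."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in document")
    return tuple(mean(dim) for dim in zip(*vectors))


def train_centroids(labeled_docs):
    """Supervised step: compute one centroid per label from a few labeled documents."""
    by_label = {}
    for tokens, label in labeled_docs:
        by_label.setdefault(label, []).append(document_embedding(tokens))
    return {label: tuple(mean(dim) for dim in zip(*embs)) for label, embs in by_label.items()}


def classify(tokens, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    emb = document_embedding(tokens)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(emb, centroids[lab])))


# A small labeled subset (toy examples, not real reports).
training = [
    (["acute", "hemorrhage"], "Positive"),
    (["no", "hemorrhage"], "Negative"),
    (["normal"], "Negative"),
]
centroids = train_centroids(training)
print(classify(["evidence", "acute", "hemorrhage"], centroids))
print(classify(["no", "evidence", "hemorrhage"], centroids))
```

The unsupervised part (the word vectors) is fixed once, so only the small supervised step needs labeled reports, and it can be retrained for a different extraction task.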
Example 1: Head CT
• Task: Label intracranial hemorrhage based on radiology report
• Dataset:
◦ 10,000 CT reports from Stanford
◦ ~900 CT reports from UPMC
• Gold-standard annotation:
◦ Subset of 1,188 reports labeled independently by two radiologists (agreement ~0.98 kappa score)
• Classification labels:
◦ No intracranial hemorrhage
◦ Diagnosis of intracranial hemorrhage unlikely, though cannot be completely excluded
◦ Diagnosis of intracranial hemorrhage possible
◦ Diagnosis of intracranial hemorrhage probable, but not definitive
◦ Definite intracranial hemorrhage
Banerjee, Imon, Sriraman Madhavan, Roger Eric Goldman, and Daniel L. Rubin, AMIA Annual Symposium Proceedings, vol. 2017, p. 411. American Medical Informatics Association, 2017.
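Agreement between two annotators, as reported above, is commonly summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance: kappa = (p_observed - p_expected) / (1 - p_expected). The slide does not say which kappa variant was used; the sketch below assumes the unweighted two-rater form, and the labels are toy data rather than the study annotations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels for ten reports (hypothetical, not the study data).
rater_1 = ["none", "none", "definite", "possible", "none",
           "definite", "none", "probable", "none", "none"]
rater_2 = ["none", "none", "definite", "possible", "none",
           "definite", "none", "possible", "none", "none"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

With ordered labels like the five certainty levels above, a weighted kappa that penalizes near-misses less than distant disagreements is another option.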
Comparative performance
1. Out-of-box word2vec: without semantic mapping
2. Proposed model: with semantic mapping

| Classifier | Out-of-box word2vec Precision | Out-of-box word2vec Recall | Out-of-box word2vec F1-score | Proposed model Precision | Proposed model Recall | Proposed model F1-score |
|---|---|---|---|---|---|---|
| Random Forest | 87.59% | 89.17% | 87.78% | 88.64% | 90.42% | 89.08% |
| KNN (n = 10) | 86.73% | 88.90% | 87.47% | 88.60% | 89.91% | 88.88% |
| KNN (n = 5) | 87.52% | 88.65% | 87.74% | 88.54% | 89.62% | 88.76% |
| SVM (Radial kernel) | 63.98% | 79.96% | 71.07% | 64.19% | 80.09% | 71.25% |
| SVM (Polynomial kernel) | 62.40% | 78.97% | 69.70% | 63.25% | 79.49% | 70.43% |
Example 2: Chest CT
• Task: Label pulmonary embolism based on radiology report
• Dataset:
◦ 100k+ de-identified chest CT reports (Stanford and UPMC)
• Baseline comparison:
◦ Compare to published state-of-the-art rule-based method for PE extraction (PeFinder)
• Classification labels:
◦ PE acute (positive)
◦ PE present (positive)
◦ PE subsegmental only (negative)
Banerjee, Imon, Matthew C. Chen, Matthew P. Lungren, and Daniel L. Rubin. "Radiology report annotation using intelligent word embeddings: Applied to multi-institutional chest CT cohort." Journal of biomedical informatics 77 (2018): 11-20.
ROC curve measures
Figure panels: Stanford dataset, UPMC dataset
Example 3: Mammography
• Task: Label BI-RADS final assessment category based on findings of radiology report
• Dataset:
◦ 300K mammography reports
• Baseline comparison:
◦ Published rule-based information extraction method (J Biomed Inform 62:224-31, 2016)
• Classification labels:
◦ BI-RADS Class 0 - 6
Results: Comparison with a Rule-based method
*Rule-based system: J Biomed Inform. 62:224-31, 2016
Centralized approach to AI model development
Figure labels: AI Model, Legal issues, Intellectual Property
Big Data aggregation without data sharing
• No data sharing required
Figure labels: Initiating site; Site 1, Site 2, Site 3, each marked "P(Data|coefficients); Update parameters"; "Fit model with input parameters; return coefficients"; "Iterate…"
Courtesy Phil Lavori
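One possible reading of this diagram: the initiating site broadcasts the current coefficients, each site evaluates its own data under those coefficients and returns only summary quantities, and the initiating site updates the coefficients and iterates. The sketch below assumes a one-variable linear model fit by gradient descent on squared error (equivalent to maximizing a Gaussian likelihood). The slide does not specify the model or update rule, and the data, learning rate, and function names are hypothetical.

```python
def local_gradient(site_data, slope, intercept):
    """Run at a site: return gradient sums and a count, never the raw records."""
    g_slope = g_intercept = 0.0
    for x, y in site_data:
        error = (slope * x + intercept) - y
        g_slope += error * x
        g_intercept += error
    return g_slope, g_intercept, len(site_data)


def fit_without_sharing(sites, rounds=2000, learning_rate=0.05):
    """Run at the initiating site: aggregate per-site gradients and update."""
    slope, intercept = 0.0, 0.0
    for _ in range(rounds):
        # Each call stands in for a message to a site and its reply.
        results = [local_gradient(data, slope, intercept) for data in sites]
        total = sum(n for _, _, n in results)
        slope -= learning_rate * sum(g for g, _, _ in results) / total
        intercept -= learning_rate * sum(g for _, g, _ in results) / total
    return slope, intercept


# Toy data held at three sites, generated from y = 2x + 1 (illustrative only).
site_1 = [(0.0, 1.0), (1.0, 3.0)]
site_2 = [(2.0, 5.0), (3.0, 7.0)]
site_3 = [(4.0, 9.0)]
slope, intercept = fit_without_sharing([site_1, site_2, site_3])
print(round(slope, 3), round(intercept, 3))
```

Because the per-site gradient sums add up to the gradient on the pooled data, this procedure follows the same path as fitting on all records in one place, while only aggregates cross institutional boundaries.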
Alternative models for training distributed deep learning models
Figure panels (A to D) labeled: Centrally hosted, Ensemble single institution, Single weight transfer, Cyclical weight transfer
J Am Med Inform Assoc 25(8):945-954, 2018
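In the weight-transfer schemes, model weights rather than patient data move between institutions. The sketch below contrasts a single pass through the institutions with repeated cycles. It assumes a one-feature logistic model trained by stochastic gradient descent in place of a deep network, and it uses the same number of local epochs per visit in both schemes for simplicity. The institution data, epoch counts, and learning rate are hypothetical and are not the configuration of the cited paper.

```python
import math


def train_locally(weights, data, epochs, learning_rate=0.5):
    """Continue training a 1-feature logistic model on one institution's data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= learning_rate * (p - y) * x
            b -= learning_rate * (p - y)
    return w, b


def weight_transfer(institutions, cycles, epochs_per_visit):
    """Pass weights from institution to institution; cycles=1 is a single pass."""
    weights = (0.0, 0.0)
    for _ in range(cycles):
        for data in institutions:
            # Only the weights leave each institution, never its data.
            weights = train_locally(weights, data, epochs_per_visit)
    return weights


def accuracy(weights, data):
    """Fraction of examples whose predicted class matches the label."""
    w, b = weights
    return sum((w * x + b > 0) == (y == 1) for x, y in data) / len(data)


# Toy data: label is 1 when x > 0; each institution sees a different sample.
institutions = [
    [(-2.0, 0), (-1.5, 0), (1.5, 1)],
    [(-0.5, 0), (0.5, 1), (2.0, 1)],
    [(-1.0, 0), (1.0, 1), (0.2, 1)],
]
pooled = [example for data in institutions for example in data]

single = weight_transfer(institutions, cycles=1, epochs_per_visit=1)
cyclical = weight_transfer(institutions, cycles=20, epochs_per_visit=1)
print(accuracy(single, pooled), accuracy(cyclical, pooled))
```

Revisiting each institution repeatedly lets the model keep adjusting to every site's data instead of being dominated by whichever site it saw last.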
Cyclical weight transfer has similar performance to centrally-hosted training
Accuracy increases with number of collaborating institutions
Results based on having 4 institutions
Figure labels (panels A, B): Centrally hosted data, N = 6000 patients; Random classification
J Am Med Inform Assoc 25(8):945-954, 2018
Summary
Three challenges to scaling up image annotation for deep learning
◦ Varying data/file formats for saving image annotations
  – Image annotation standards (AIM) and tools (ePAD)
◦ Difficulty leveraging free text radiology reports as a source for labels for images
  – Word embeddings and classification models for information extraction
◦ Hurdles to sharing data across institutions to build more robust AI models
  – Distributed computation of deep learning models