
Upload: amia

Post on 14-Oct-2014



Machine Learning Approaches to Automatic BI-RADS Classification of Mammography Reports

Bethany Percha and Daniel Rubin

Program in Biomedical Informatics and Department of Radiology, Stanford University

Introduction

Clinical information is often recorded as narrative (unstructured) text.

This is problematic for both researchers and clinicians, as free text thwarts attempts to standardize language and ensure document completeness.

Natural language processing could be used to extract relevant information from unstructured text reports, but this requires the reports to be both complete and consistent.

A feedback system which extracts relevant information from text as it is being generated and prompts the physician to modify the report as needed would be useful, both in physician training and clinical practice.

Here we present our preliminary results in building a classification system that automatically assigns BI-RADS assessment codes to mammography reports.

0 Incomplete

1 Negative

2 Benign finding(s)

3 Probably benign

4 Suspicious abnormality

5 Highly suggestive of malignancy

6 Known biopsy-proven malignancy

Preprocessing

41,142 reports extracted from Stanford’s radTF database

38,665 were diagnostic mammograms (not specimen analyses or descriptions of biopsy procedures)

22,109 had BI-RADS codes (older reports frequently don’t have them) and were unilateral (single-breast) mammography reports

Each remaining report was then preprocessed (steps illustrated on the original poster).

After preprocessing, the reports were converted into feature vectors, where each feature was the number of times a given word stem appeared in a report. There were 2,216 unique stems.
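This bag-of-stems featurization can be sketched in a few lines of Python. The suffix-stripping stemmer below is a toy stand-in (the poster does not say which stemming algorithm was used), and the vocabulary and report text are illustrative, not drawn from the actual 2,216-stem vocabulary:

```python
from collections import Counter
import re

def crude_stem(word):
    # Toy suffix stripper standing in for a real stemmer (e.g. Porter);
    # the actual stemming algorithm is an assumption here.
    for suffix in ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def report_to_vector(text, vocabulary):
    # Lowercase, tokenize, stem, then count how often each vocabulary
    # stem occurs -- one count feature per stem, as on the poster.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(crude_stem(t) for t in tokens)
    return [counts[stem] for stem in vocabulary]

# Hypothetical mini-vocabulary and report text, for illustration only.
vocab = ["breast", "calcific", "featur", "nippl"]
report = ("Scattered calcifications in both breasts. "
          "No mammographic features of malignancy.")
vec = report_to_vector(report, vocab)  # -> [1, 1, 1, 0]
```

Note that the stems themselves ("featur", "nippl", etc.) are what the stemmer emits, which is why the poster's feature table lists truncated forms rather than whole words.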

Feature Ranking

The most informative features were chosen using chi-squared attribute evaluation. The most informative stems were:

Stem        Most Common Context                          Occurrences per report, by class
                                                           0    1    2    3    4    5    6

breast      (Many contexts.)                              4.2  1.9  3.8  4.6  5.7  6.9  7.7
featur      no mammographic features of malignancy        0.1  1.1  1.2  0.1  0.1  0.1  0.1
nippl       x cm from the nipple (Describing a mass.)     1.2  0.1  0.2  0.8  2.3  4.6  2.8
malign      no mammographic features of malignancy        0.1  1.1  1.2  0.1  0.1  0.3  0.3
evalu       incompletely evaluated                        1.0  0.0  0.0  0.1  0.1  0.1  0.2
incomplet   incompletely evaluated                        0.9  0.0  0.0  0.0  0.0  0.0  0.0
mammograph  no mammographic features of malignancy        0.3  1.5  1.8  0.7  0.9  1.7  1.2
stabl       stable post-biopsy change                     0.2  0.3  1.5  0.7  0.4  0.2  0.5
calcif      calcifications                                0.6  0.1  0.7  1.3  1.5  1.9  2.0
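Chi-squared attribute evaluation scores each stem by how far its distribution over classes departs from what independence would predict. A minimal sketch of one common formulation (binary term presence vs. class; the exact variant used on the poster is not stated):

```python
def chi_squared(term_counts_by_class, docs_by_class):
    # term_counts_by_class[c]: documents in class c containing the term;
    # docs_by_class[c]: total documents in class c.
    # Builds a 2 x K contingency table (term present/absent vs. class)
    # and sums (observed - expected)^2 / expected over its cells.
    total_docs = sum(docs_by_class)
    total_present = sum(term_counts_by_class)
    chi2 = 0.0
    for present, n_c in zip(term_counts_by_class, docs_by_class):
        for observed, row_total in ((present, total_present),
                                    (n_c - present, total_docs - total_present)):
            expected = row_total * n_c / total_docs
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2

# A term concentrated in one class scores high; an evenly spread term scores 0.
skewed = chi_squared([9, 1], [10, 10])   # -> 12.8
uniform = chi_squared([5, 5], [10, 10])  # -> 0.0
```

Ranking all 2,216 stems by this statistic and keeping the top scorers yields a feature list like the table above.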

Classification

Technique                                         % Accuracy
Naive Bayes                                       76.4
Multinomial Naive Bayes                           83.1
K-Nearest Neighbors (K=10)                        87.5
Support Vector Machines:
  LIBLINEAR (L2-norm, one-against-one)            89.3
  LIBLINEAR (multiclass, Crammer-Singer)          89.3
  LIBLINEAR-POLY2 (polynomial kernel, degree 2)   90.1

Accuracy was determined using 10-fold cross-validation.
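The 10-fold procedure can be sketched as below. The striped fold assignment and the `train_fn` callback are illustrative choices, not necessarily how the poster's experiments were run:

```python
def k_fold_accuracy(examples, labels, train_fn, k=10):
    # Split the data into k folds; train on k-1 folds, test on the
    # held-out fold, and average the per-fold accuracies.
    n = len(examples)
    fold_accs = []
    for i in range(k):
        test_idx = set(range(i, n, k))  # simple striped fold assignment
        train = [(x, y) for j, (x, y) in enumerate(zip(examples, labels))
                 if j not in test_idx]
        model = train_fn(train)  # returns a predict function
        test = [(examples[j], labels[j]) for j in sorted(test_idx)]
        correct = sum(1 for x, y in test if model(x) == y)
        fold_accs.append(correct / len(test))
    return sum(fold_accs) / k

def train_majority(train):
    # Trivial baseline "classifier" used only to exercise the harness.
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

ys = [1] * 20  # degenerate labels: a majority-class model is always right
acc = k_fold_accuracy(list(range(20)), ys, train_majority, k=10)  # -> 1.0
```

Swapping `train_fn` for each classifier in the table above reproduces the comparison protocol: every model sees the same folds, so the accuracies are directly comparable.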

Misclassification error did not decrease significantly with more training data (high bias). Including more features, such as bigrams, did not improve performance.

The final confusion matrix (entries are percentages; each row is a true class and sums to 100%) was:

True class   Classified as . . .
               0     1     2     3     4     5     6

0            93.7   2.3   3.1   0.1   0.8   0.0   0.0
1             0.4  93.6   5.9   0.1   0.0   0.0   0.0
2             0.9  11.1  87.1   0.1   0.6   0.0   0.1
3             7.1  21.1  49.1   9.7  12.6   0.0   0.3
4             8.5   3.7  10.6   0.6  75.9   0.0   0.7
5             0.0   0.0   0.0   0.0 100.0   0.0   0.0
6             4.9   4.9  24.6   0.8  27.9   0.0  36.9
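A row-normalized confusion matrix like this one can be computed with a short sketch (the labels below are hypothetical, chosen only to show the shape of the computation):

```python
def confusion_matrix_pct(true_labels, predicted_labels, classes):
    # Entry [i][j] is the percentage of documents whose true class is
    # classes[i] that the classifier assigned to classes[j];
    # each row therefore sums to 100 (up to rounding).
    idx = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[idx[t]][idx[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * v / total if total else 0.0 for v in row])
    return matrix

# Toy example: half of the true-class-0 documents are misclassified as 1.
m = confusion_matrix_pct([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])
# m == [[50.0, 50.0], [0.0, 100.0]]
```

Row normalization is what makes rare classes legible here: class 5's single 100.0 entry reflects very few reports, all sent to class 4, rather than a large absolute error count.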

Conclusions

Radiologists’ word choices are a good indicator of which BI-RADS class they choose, but the correspondence is not perfect, particularly for the higher BI-RADS values.

The development of training software for radiologists based on this approach could help them standardize their descriptions of images, and learn to better describe which specific features of the image cause them to place it in a given class.