Predicting Thyroid Disorder with Deep Neural Networks

Post on 12-Apr-2017

Category:

Data & Analytics

Deep Neural Networks Using Demographic Data to Predict Thyroid Disorder

Anaelia Ovalle, ShihYin Chen, Yasser Attiga, John LaGue

Table of Contents

I. Introduction to Project & Overall Goal

II. Introduction to Neural Networks & TensorFlow

III. Methods Used and Results

IV. Discussion

V. Concluding Thoughts

Thyroid Receptor Beta in Complex With Inhibitor

Thyroid Disease

Hyper & Hypothyroidism

Thyroiditis

Thyroid Cancer

Goal: Using solely demographic data, identify a subset of the population more likely to have a thyroid disorder

Machine Learning Algorithms

Neural Networks

A Neural Network or ‘Artificial’ Neural Network (ANN):

● A machine learning technique that attempts to mimic the behavior of human neurons.

● Differs from the more algorithmic approaches we have seen this semester.

● In this way, an ANN can extract patterns and uncover complex trends in a data set.

● Implemented for both supervised and unsupervised learning problems.

● Applications range from classifying fraudulent credit card transactions to recognizing YouTube images of cats.

● Or, more importantly, to health issues such as determining populations at risk for a certain disease.

● Yet each ANN must be tailored to its specific problem and, as our group discovered, the computation time and resources an ANN needs can increase rapidly with complexity.

Basic ANNs have only one or two hidden layers of neurons

DNNs have 3 or more hidden layers

Basic ANN

Deep Neural Network

● DNN can be far more effective at extracting hidden trends in a discombobulated data set

● Yet increased complexity does not necessarily mean increased network performance
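To make the difference between a basic ANN and a DNN concrete, here is a minimal pure-Python sketch of a forward pass through a fully connected network whose depth is set by a list of hidden layer sizes. The function names, the random weight initialization, and the layer sizes are illustrative assumptions, not the model used in the project.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, hidden_units, seed=0):
    """Create random weights per layer; hidden_units is e.g. [8] or [8, 8, 8]."""
    rng = random.Random(seed)
    sizes = [n_inputs] + list(hidden_units) + [1]  # single output unit (binary)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector through every layer and return a probability."""
    activations = list(x)
    for i, (weights, biases) in enumerate(layers):
        # ReLU on hidden layers, sigmoid on the final output layer.
        act = sigmoid if i == len(layers) - 1 else relu
        activations = [
            act(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


basic_ann = init_network(n_inputs=4, hidden_units=[8])        # 1 hidden layer
deep_net = init_network(n_inputs=4, hidden_units=[8, 8, 8])   # 3 hidden layers
sample = [0.2, 1.0, 0.0, 0.5]  # hypothetical, already-scaled feature vector
print(forward(basic_ann, sample), forward(deep_net, sample))
```

The only structural difference between the two networks is the length of the `hidden_units` list; the weights here are untrained, so the outputs carry no predictive meaning.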

TensorFlow from Google

● Open source machine learning software library for Python

● Much of our project involved running numerous models to determine important and optimal parameters

● hidden_units and activation_fn

tf.contrib.learn.DNNClassifier.__init__(hidden_units, feature_columns, model_dir=None, n_classes=2, weight_column_name=None, optimizer=None, activation_fn=relu, dropout=None, gradient_clip_norm=None, enable_centered_bias=False, config=None, feature_engineering_fn=None, embedding_lr_multipliers=None)
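A search over `hidden_units` and `activation_fn` can be organized as a simple grid. The sketch below only enumerates configurations; the candidate values and the `build_grid` helper are hypothetical (the slides do not list the values actually tried), and each configuration would still have to be passed to a classifier and trained.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
HIDDEN_UNITS_OPTIONS = [[10], [10, 10], [10, 20, 10]]
ACTIVATION_OPTIONS = ["relu", "tanh", "sigmoid"]


def build_grid(hidden_options, activation_options):
    """Return one configuration dict per (hidden_units, activation_fn) pair."""
    return [
        {"hidden_units": hidden, "activation_fn": activation}
        for hidden, activation in product(hidden_options, activation_options)
    ]


grid = build_grid(HIDDEN_UNITS_OPTIONS, ACTIVATION_OPTIONS)
for config in grid:
    print(config)
```

Keeping the grid as plain data makes it easy to log which configuration produced which result when many models are run.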

Unbalanced Data

Status               Number
No Thyroid Disease   701850
Thyroid Disease      45451

➢ Only 6.082% of the patients have thyroid disease.
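The arithmetic behind this figure, and the reason the imbalance matters, can be checked in a few lines. The "always negative" baseline is an illustration added here, not a model from the project: it shows that a classifier which never predicts thyroid disease would still be right about 93.9% of the time.

```python
no_disease = 701850
disease = 45451
total = no_disease + disease

# Share of patients with thyroid disease.
prevalence = disease / total

# Accuracy of a trivial model that always predicts "no thyroid disease".
baseline_accuracy = no_disease / total

print(f"prevalence: {prevalence:.3%}")
print(f"always-negative accuracy: {baseline_accuracy:.3%}")
```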

Sampling Methods for Imbalanced Data

● Downsampling:

○ Randomly drop instances from the over-represented class

● Upsampling with Bootstrap:

○ Randomly add copies of instances from the under-represented class

● Upsampling with SMOTE (Synthetic Minority Over-sampling Technique):

○ Create synthetic samples from the minority class instead of creating copies.

○ Depending on the amount of over-sampling required, neighbors are randomly chosen from the k nearest neighbors.

○ Select two or more similar instances and perturb an instance one attribute at a time by a random amount within the difference to the neighboring instances.
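The two random resampling strategies, downsampling and bootstrap upsampling, can be sketched with the standard library alone. The helper names and the toy 94/6 split are assumptions for illustration, and both helpers balance the classes exactly, which is only one possible target ratio.

```python
import random


def downsample(majority, minority, seed=0):
    """Randomly drop majority instances until both classes are the same size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    return kept, list(minority)


def upsample_bootstrap(majority, minority, seed=0):
    """Randomly add copies of minority instances (sampling with replacement)."""
    rng = random.Random(seed)
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(majority), list(minority) + extra


# Toy data: 94 negatives and 6 positives, roughly mirroring a 6% prevalence.
majority = [("neg", i) for i in range(94)]
minority = [("pos", i) for i in range(6)]

down_major, down_minor = downsample(majority, minority)
up_major, up_minor = upsample_bootstrap(majority, minority)
print(len(down_major), len(down_minor), len(up_major), len(up_minor))
```

Downsampling discards data from the larger class, while bootstrap upsampling repeats existing minority rows without adding new information.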

Sample Code: SMOTE sampling
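A minimal pure-Python sketch of SMOTE-style interpolation follows. It is not the project's code: the `smote` function, its parameters, and the toy points are assumptions. It draws a separate random amount for each attribute; using a single shared amount per synthetic sample is a common variant.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating toward nearby minority points.

    minority: list of numeric feature vectors (needs at least two points).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Move each attribute a random fraction of the way toward the neighbor.
        synthetic.append(
            [b + rng.random() * (n - b) for b, n in zip(base, neighbor)]
        )
    return synthetic


# Hypothetical minority-class points with two numeric attributes.
minority_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority_points, n_synthetic=6, k=2)
for point in new_points:
    print(point)
```

Because every synthetic value lies between a base point and one of its neighbors, the new samples stay inside the region already occupied by the minority class.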

Sample Code: Dynamic Neural Network

Results

● Sampling methods and activation functions:

○ 28 models

● Including a variety of hidden layers / nodes:

○ Limitless number of models

Sensitivity = TP / (TP + FN)

Precision = TP / (TP + FP)
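Both formulas translate directly into code. The confusion-matrix counts below are hypothetical and serve only to show the calculation.

```python
def sensitivity(tp, fn):
    """Share of actual positives the model finds: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 30, 70, 20
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"precision   = {precision(tp, fp):.2f}")
```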

Precision vs. Sensitivity

● Explored thresholds of 0.4, 0.5, and 0.65

● As precision improved, sensitivity decreased

Becomes a game of tug-of-war.

Do we want to reach more people with thyroid disorders or be more precise with the people we do reach?
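The trade-off can be reproduced on a toy example by sweeping the same three thresholds over a set of predicted probabilities. The scores and labels are invented for illustration and are not project results.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    return tp, fp, fn


def precision_and_sensitivity(scores, labels, threshold):
    tp, fp, fn = confusion_counts(scores, labels, threshold)
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    return prec, sens


# Hypothetical predicted probabilities and true labels (1 = thyroid disorder).
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.45, 0.42, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

results = []
for threshold in (0.4, 0.5, 0.65):
    prec, sens = precision_and_sensitivity(scores, labels, threshold)
    results.append((threshold, prec, sens))
    print(f"threshold={threshold}: precision={prec:.3f}, sensitivity={sens:.3f}")
```

On this toy data, raising the threshold increases precision and lowers sensitivity at each step, which is the tug-of-war described above.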

Introducing Lift!

● Lift compares the effectiveness of using a model versus selecting targets randomly

● Can be expressed in two ways:

○ Lift ratio: p / P

○ Percent increment: [(p - P) / P] * 100%

■ p = proportion of positives among the targets selected by the model

■ P = prevalence in the general population
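Both expressions of lift are one-line calculations. In the sketch below, P is taken from the class counts given earlier in the slides, while the value of p is a hypothetical placeholder rather than a project result.

```python
def lift_ratio(p, P):
    """Lift ratio: positive rate among model targets over population prevalence."""
    return p / P


def lift_percent(p, P):
    """Percent increment over random selection: [(p - P) / P] * 100."""
    return (p - P) / P * 100.0


P = 45451 / (701850 + 45451)  # prevalence from the slides' class counts
p = 0.15                      # hypothetical positive rate among model targets

print(f"lift ratio: {lift_ratio(p, P):.2f}")
print(f"percent increment: {lift_percent(p, P):.1f}%")
```

The two forms carry the same information: the percent increment equals (lift ratio - 1) * 100, so a lift ratio of 1 means the model does no better than random selection.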

Results: Lift

** Original RF = random forest classifier using original, unbalanced data

Implications

Risk factors for various thyroid disorders include:

1. Family history

2. Iodine intake deficiency

3. Age

4. Sex

5. Type 1 diabetes status

6. Radiation history

Concluding Thoughts...

● Addressing unbalanced data was key

● Chose parameters to tweak, such as activation functions and hidden nodes

● Analyzed different models for effectiveness

● Evaluated models based on precision, recall, and lift

● Demographic data can offer some insight into detecting thyroid disorders in a population

Thank You
