creating syndrome definitions using rstudio

29
Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International

Upload: stacia

Post on 06-Feb-2016

56 views

Category:

Documents


0 download

DESCRIPTION

Creating Syndrome Definitions Using RStudio. Tim Hopper. Data Scientist RTI International. Code Is Available Online. https://gist.github.com/tdhopper/d5939aaf74886143224e/raw/3ae883a25ef078a5edd2fcced0f0268b34be3d6b/Custom+Syndromes. Setup. # Connect to TarrantCounty_FP database - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Creating Syndrome Definitions Using RStudio

Creating Syndrome Definitions Using RStudio

Tim HopperData Scientist

RTI International

Page 3: Creating Syndrome Definitions Using RStudio

Setup

# Connect to TarrantCounty_FP database

# Credentials

USERNAME <- 'username'PASSWORD <- 'password'HOSTNAME <- 'data3.biosen.se'DBNAME <- 'TarrantCounty_FP'TABLE <- 'TC_Meaningful_Use_Base'

# Create database connectioncon <- dbConnect(dbDriver('MySQL'), user=USERNAME, password=PASSWORD,

host=HOSTNAME, dbname=DBNAME)

Page 4: Creating Syndrome Definitions Using RStudio

Example: Co-morbid Syndrome

We want to see the co-occurrence of influenza (influenza-like illness) and asthma.  Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 5: Creating Syndrome Definitions Using RStudio
Page 6: Creating Syndrome Definitions Using RStudio
Page 7: Creating Syndrome Definitions Using RStudio

Query for Asthma

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text, Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00' AND '2013-10-31 23:59:59'

AND (Diagnosis_Code LIKE '%493%')

Page 8: Creating Syndrome Definitions Using RStudio

Query for Influenza-Like Illness

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text, Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00' AND '2013-10-31 23:59:59'

AND (Diagnosis_Code LIKE '%487%' OR Diagnosis_Code LIKE '%488%' OR Diagnosis_Code LIKE '%V04.8%'OR Diagnosis_Code LIKE '%V0481%' OR Diagnosis_Code LIKE '%V06.6%'OR Diagnosis_Code LIKE '%V066%')

Page 9: Creating Syndrome Definitions Using RStudio

Run Query and Process Data

# Run Querydf.asthma <- dbGetQuery(con, query.asthma)df.ili <- dbGetQuery(con, query.ili)

# Add column naming each as a syndromedf.asthma$Syndrome <- 'ASTHMA'df.ili$Syndrome <- 'ILI'

# Combine these two data sets into one data.framedf <- rbind(df.asthma, df.ili)

# Format dates and add date column (without time)df$Visit_Date_Time <- ymd_hms(df$Visit_Date_Time)df$Visit_Date <- as.Date(df$Visit_Date_Time)

Page 10: Creating Syndrome Definitions Using RStudio

Create Summary Data Set

events.per.day.split <- ddply(df, .(Visit_Date, Syndrome), summarize, Number_of_Visits=length(Visit_Date))

############################################# Visit_Date Syndrome Number_of_Visits# 1 2013-02-01 ASTHMA 49# 2 2013-02-01 ASTHMA & ILI 2# 3 2013-02-01 ILI 5# 4 2013-02-02 ASTHMA 60# 5 2013-02-02 ILI 21# 6 2013-02-03 ASTHMA 89############################################

Page 11: Creating Syndrome Definitions Using RStudio

Visits Per Day by Syndrome

ggplot(events.per.day.split) + aes(Visit_Date, Number_of_Visits, color=Syndrome) + geom_line()

Page 12: Creating Syndrome Definitions Using RStudio

Create Summary Data Set

events.per.day <- ddply(df, .(Visit_Date), summarize, Number_of_Visits=length(Visit_Date))

################################ Visit_Date Number_of_Visits# 1 2013-02-01 513# 2 2013-02-02 396# 3 2013-02-03 428# 4 2013-02-04 409# 5 2013-02-05 580# 6 2013-02-06 391###############################

Page 13: Creating Syndrome Definitions Using RStudio

Visits Per Day by Syndrome

ggplot(events.per.day) + aes(Visit_Date, Number_of_Visits) + geom_line()

Page 14: Creating Syndrome Definitions Using RStudio

Example: New Syndrome

We want to see create a new syndrome to identify visits during which the patient had cough AND dizziness AND headache.  Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 15: Creating Syndrome Definitions Using RStudio

Query

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text,

Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00'

AND '2013-10-31 23:59:59'AND (Diagnosis_Code LIKE '%786.2%' OR

Diagnosis_Code LIKE '%7862%') AND (Diagnosis_Code LIKE '%780.4%' OR

Diagnosis_Code LIKE '%7804%') AND (Diagnosis_Code LIKE '%784.0%' OR

Diagnosis_Code LIKE '%7840%');

Page 16: Creating Syndrome Definitions Using RStudio

Run Query

# Run Querydf.sick <- dbGetQuery(con, query)

# Fix dates using lubridatedf.sick$Visit_Date_Time <- ymd_hms(df.sick$Visit_Date_Time)

# Create a month columndf.sick$Month <- month(df.sick$Visit_Date, label=T)

Page 17: Creating Syndrome Definitions Using RStudio

Run Query

ggplot(df.sick) + aes(Month) + geom_histogram()

Page 18: Creating Syndrome Definitions Using RStudio

Create Line Listing

write.csv(df.sick, 'sick.csv', quote=F, row.names=F)

# sick.csv:## Row_Number,Facility_City,Facility_State,Diagnosis_Code,D...# 1374852,Houston,TX,473.9:780.4:300.00:786.2:784.0:305.1:...# 1536525,Houston,TX,486:786.2:780.4:784.0:794.00:789.00:7...# 2100347,Rowlett,TX,780.4:784.0:786.2,NA,SCREENING - HA -...# 2189305,Rowlett,TX,780.4:784.0:786.2:V76.12,NA,SCREENING...# 3108090,Rowlett,TX,780.4:784.0:786.2:V76.12,NA,SCREENING...# 5887191,Rowlett,TX,786.2:780.1:780.4:784.0,NA,786.2:SEP:...# 7968958,Houston,TX,493.90:780.4:780.60:784.0:786.2:787.0...# 9197758,Houston,TX,493.90:780.4:780.60:784.0:786.2:787.0...

Page 19: Creating Syndrome Definitions Using RStudio

Example: Refined Age Groups

We want to see motor vehicle traffic accidents involving young people. We recombine the ages to the following groups: 0–15, 16–20, 21–25, 26–30, and 31–35 years. Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 20: Creating Syndrome Definitions Using RStudio
Page 21: Creating Syndrome Definitions Using RStudio
Page 22: Creating Syndrome Definitions Using RStudio

Query

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text,

Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00'

AND '2013-10-31 23:59:59' AND (Diagnosis_Code LIKE '%E81_%')AND Age <= 35;

Page 23: Creating Syndrome Definitions Using RStudio

Run Query

# Run Querydf.auto <- dbGetQuery(con, query)

# Fix dates using lubridatedf.auto$Visit_Date_Time <- ymd_hms(df.auto$Visit_Date_Time)

# Create a date columndf.auto$Visit_Date <- as.Date(df.auto$Visit_Date_Time)

Page 24: Creating Syndrome Definitions Using RStudio

Bin Ages

# Drop all rows where age is greater than 35 years or is undefined

df.auto <- df.auto[!is.na(df.auto$Age),]df.auto <- df.auto[df.auto$Age <= 35,]

# Bin ages

df.auto$Age_binned <- cut(df.auto$Age, breaks=c(0, 15, 20, 25, 30, 35), include.lowest=T)

Page 25: Creating Syndrome Definitions Using RStudio

Histogram of Visits by Age Group

ggplot(df.auto) + aes(Age_binned) + geom_histogram()

Page 26: Creating Syndrome Definitions Using RStudio

Create Summary Data Set

df.auto.daily.counts <- ddply(df.auto, .(Visit_Date, Age_binned), summarize, count=length(Chief_Complaint))

################################# Visit_Date Age_binned count# 1 2013-02-01 [0,15] 3# 2 2013-02-01 (15,20] 25# 3 2013-02-01 (20,25] 16# 4 2013-02-01 (25,30] 16# 5 2013-02-02 [0,15] 13# 6 2013-02-02 (15,20] 6###############################

Page 27: Creating Syndrome Definitions Using RStudio

Visits per Week by Ageggplot(df.auto.daily.counts)

+ aes(x = Visit_Date, y = count, color=Age_binned) + geom_line(size=2, alpha=.7)

Page 28: Creating Syndrome Definitions Using RStudio

Visits per Week by Age

ggplot(df.auto.daily.counts) + aes(x = Visit_Date, y = count, color=Age_binned) + geom_smooth(size=3, alpha=.7)

Page 29: Creating Syndrome Definitions Using RStudio

Questions?