A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare Technology Startup
TRANSCRIPT
Sandy Griffith
Twitter: @[email protected]
My story
Academic biostatistics → Healthcare tech
© 2016 Flatiron Health, Inc. Proprietary and confidential.
Our Mission
Flatiron’s mission is to serve cancer patients and our partners by dramatically improving treatment and accelerating research.
Flatiron Processes EHR Data At Scale
[Diagram: Electronic Health Record sources (demographics, diagnosis, visits, labs, e-prescribing, pathology reports, discharge notes, radiology reports, physician notes) and outside-practice sources (hospital, lab) pass through structured and unstructured data processing, turning standard EHR data into research-grade data.]
Rapidly Scaling
January 2015
Flatiron: ~140
Software Engineers: ~50
Quantitative Sciences team: 1
Now: We are a team of 262
We include…
All Flatiron data and tools are collaboratively built, implemented and maintained by a cross-disciplinary team that includes oncology, engineering, and quantitative sciences
We come from…
9 Medical oncologists and nurses
70 Software engineers
10 Quantitative scientists
5 Medical informaticists
+ more!
Primary Language: time of hire
Proficiency with R: time of hire
A decision point early on
Cultivate R culture
1. Internal R Package
2. User group
3. Slack channel
4. Trainings
5. Hiring
Proficiency with R
[Chart: proficiency with R at time of hire vs. now]
Now we have R users, but when should we use R?
Three scenarios:
1. R for prototyping → !R in production
2. R as a long-term solution
3. R and !R in parallel
R for prototyping → !R in production
Example: Linking external mortality data
Prototype
● One-time linkage
● Small cohort (tens of thousands)
● RecordLinkage R package
● Probabilistic linkage method using EM algorithm
Production
● Repeated daily at scale
● Large cohort (~5 million patients)
● Code maintained by different team
● Deterministic logic in SQL
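The prototype described above used the RecordLinkage R package with a probabilistic method fitted by the EM algorithm. As a rough illustration of that general idea (not the package's implementation and not Flatiron's code), the Python sketch below fits a two-class Fellegi-Sunter-style model to binary field-agreement patterns. The function name `fit_em`, the starting values, and the toy agreement patterns are all assumptions made for illustration.

```python
from math import prod


def fit_em(patterns, n_iter=50, eps=1e-6):
    """Fit a two-class (match / non-match) mixture to 0/1 agreement patterns.

    patterns: list of tuples of 0/1, one entry per compared field.
    Returns (p, m, u, posterior), where posterior(pattern) is the
    estimated probability that a record pair with that pattern is a match.
    """
    k = len(patterns[0])
    n = len(patterns)
    # Starting values are assumptions, not tuned settings.
    p, m, u = 0.1, [0.9] * k, [0.1] * k

    def clamp(x):
        # Keep probabilities away from exactly 0 or 1.
        return min(max(x, eps), 1.0 - eps)

    def likelihood(pattern, probs):
        # Fields are treated as conditionally independent within each class.
        return prod(q if g else 1.0 - q for g, q in zip(pattern, probs))

    def posterior(pattern):
        match = p * likelihood(pattern, m)
        non_match = (1.0 - p) * likelihood(pattern, u)
        return match / (match + non_match)

    for _ in range(n_iter):
        # E-step: probability that each pair is a match under current parameters.
        g = [posterior(pat) for pat in patterns]
        total = sum(g)
        # M-step: re-estimate the match share and per-field agreement rates.
        p = clamp(total / n)
        m = [clamp(sum(gi * pat[j] for gi, pat in zip(g, patterns)) / total)
             for j in range(k)]
        u = [clamp(sum((1.0 - gi) * pat[j] for gi, pat in zip(g, patterns)) / (n - total))
             for j in range(k)]

    return p, m, u, posterior


# Toy agreement patterns on three hypothetical fields (made-up counts).
toy_patterns = (
    [(1, 1, 1)] * 20 + [(1, 1, 0)] * 5
    + [(0, 0, 0)] * 150 + [(1, 0, 0)] * 30 + [(0, 1, 0)] * 15 + [(0, 0, 1)] * 5
)
p_hat, m_hat, u_hat, match_probability = fit_em(toy_patterns)
```

Pairs would then be classified by thresholding `match_probability(pattern)`; choosing those thresholds is the kind of parameter tuning that favors rapid iteration during prototyping.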
R for prototyping → !R in production
Example: Linking external mortality data
Why this made sense:
● Stable method: no longer needed rapid iteration
● Tuning parameters
● Similar performance, more transparency
● No R users on team that would be maintaining code
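To make the "deterministic logic in SQL" idea concrete, here is a minimal sketch of an exact-match linkage rule, run through Python's built-in `sqlite3` module. The table layout, the column names (`last_name`, `birth_date`, `ssn_last4`), and the matching rule itself are hypothetical; the slides do not say which fields or rules the production logic uses.

```python
import sqlite3


def link_deterministic(patients, deaths):
    """Return (patient_id, death_date) pairs whose linkage keys all match exactly.

    patients: iterable of (patient_id, last_name, birth_date, ssn_last4)
    deaths:   iterable of (last_name, birth_date, ssn_last4, death_date)
    All column names are hypothetical, chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients "
        "(patient_id TEXT, last_name TEXT, birth_date TEXT, ssn_last4 TEXT)"
    )
    conn.execute(
        "CREATE TABLE deaths "
        "(last_name TEXT, birth_date TEXT, ssn_last4 TEXT, death_date TEXT)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", patients)
    conn.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?)", deaths)
    # The rule is a plain join: a pair links only if every key agrees
    # after light normalization of the name (trim whitespace, ignore case).
    rows = conn.execute(
        """
        SELECT p.patient_id, d.death_date
        FROM patients AS p
        JOIN deaths AS d
          ON UPPER(TRIM(p.last_name)) = UPPER(TRIM(d.last_name))
         AND p.birth_date = d.birth_date
         AND p.ssn_last4 = d.ssn_last4
        ORDER BY p.patient_id
        """
    ).fetchall()
    conn.close()
    return rows
```

A rule like this has no fitted parameters, so anyone who reads SQL can see why a given pair linked or did not, which fits the transparency and maintainability points in the list above.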
R as a long-term solution
Example: Rmarkdown QA report
Early version (Jan 2015)
● Bash commands for extracting data run from R script using ETL tool
● R script run via command line
● Parameters in metafiles manually updated
● Runs a series of Rmd files and renders HTML output
Current version (April 2016)
● Linked to data pipeline maintained by software engineering
● Metafile generated dynamically
● Plotly survival curves
● Flatly bootstrap theme
● Plan to continue using R indefinitely
R as a long-term solution
Example: Rmarkdown QA report
Why this made sense:
● Mature product and team
● Quantitative science members remain embedded in team
● Strong support and collaboration with software engineering
● Requirements are dynamic: continued need for rapid prototyping
R and !R in parallel
Example: Some external collaborations
● Specific research questions
● 2 people code independently in Python/SQL and R
● Compare results
● Language sometimes incidental, more about 2 different perspectives
Why this made sense:
● High stakes or low error tolerance
● Complicated concepts
● Custom projects often involve novel problems
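The "compare results" step could be as simple as checking two independently produced sets of summary numbers against each other. The sketch below assumes each analyst exports results as a mapping from metric name to number; the function name `compare_results`, the tolerance, and the metric names in the usage note are hypothetical.

```python
import math


def compare_results(results_a, results_b, rel_tol=1e-6):
    """Return (metric, value_a, value_b) for every metric that disagrees.

    A metric disagrees if it is missing from one side or if the two
    numbers differ by more than the relative tolerance.
    """
    discrepancies = []
    for key in sorted(set(results_a) | set(results_b)):
        a = results_a.get(key)
        b = results_b.get(key)
        if a is None or b is None:
            # One implementation did not report this metric at all.
            discrepancies.append((key, a, b))
        elif not math.isclose(a, b, rel_tol=rel_tol):
            discrepancies.append((key, a, b))
    return discrepancies
```

For example, `compare_results({"n_patients": 1200}, {"n_patients": 1187})` would flag `n_patients` (made-up numbers). An empty list means the two implementations agree on every reported metric; any entry is a prompt for the two analysts to reconcile their reading of the question.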
Thank you
● Melissa Curtis
● Josh Kraut
● Kathi Seidl-Rathkopf
● Cindy Revol
● Rachael Sorg
● Jay Rughani
● Paul You
● Aracelis Torres
● Alphan Kirayoglu
● Ben Birnbaum
● Ann Jaskiw
● James Gippetti
Join our Team!
Drop me a note at [email protected], @sgrifter, or visit flatiron.com/careers