HPC at NIBR
Nick Holway
Scientific Computing Group, NIBR
HPC Advisory Council, Lugano
Twitter: @nickholway LinkedIn: https://www.linkedin.com/in/nickholway/
April 2017
Novartis Institutes for Biomedical Research (NIBR)
HPC means we can get better, more targeted drugs to patients quicker
Public
Today’s talk
1. Introducing Novartis
2. A very quick introduction to drug discovery
3. What our HPC looks like
4. Examples of how we use HPC to accelerate drug discovery
5. Outlook
Introducing Novartis
Novartis
• Innovative Medicines
– Pharmaceuticals business unit
– Oncology business unit
• Sandoz
• Alcon
• R&D
– Novartis Institutes for Biomedical Research: drug discovery and early development
• ~6,000 scientists / 7 sites globally
• ~90 New Molecular Entities
• ~400 research projects
• >500 ongoing clinical trials
• >400 computational scientists
Organised around prevalent Disease Areas
Figure: distribution of ~90 New Molecular Entities at NIBR, by disease area
• ATI: Autoimmunity, Transplantation & Inflammation
• CVM: Cardiovascular & Metabolism
• ID: Infectious Diseases
• IO: Immuno-Oncology
• MSD: Musculoskeletal
• NEURO: Neuroscience
• ONC: Oncology
• OPH: Ophthalmology
• RESP: Respiratory Diseases
• Other
A very quick introduction to drug discovery
What’s a drug?
• “A pharmaceutical drug is a drug used to diagnose, cure, treat, or prevent
disease” (https://en.wikipedia.org/wiki/Pharmaceutical_drug)
• Examples of some medicines and their “mechanisms”:
– Cancer: blocking cell division by disrupting DNA replication or the cells’ internal
skeletons
– Infectious diseases: disrupting bacterial cell walls
– Depression: preventing serotonin re-uptake in nerve cells in the brain
The path to a new medicine
• Discovery
– Target selection
– Drug research and design
– Preclinical research
• Investigational New Drug (IND) application submitted
• Clinical trials
– Proof of Concept: 5–15 patients
– Phase I: 20–100 healthy volunteers and/or patients
– Phase II: 100–500 patients
– Phase III: 1,000–5,000 patients
• NDA/BLA* submitted
• Evaluation
– Submission
– Review by regulatory authority
• Post-approval
– Phase IV: post-marketing surveillance and research
– Manufacturing
• Approval of one new medicine takes 10–15 years, while the number of compounds narrows from >10,000 to <250 to <5
*New Drug Application / Biologics Licence Application
What our HPC looks like
HPC at NIBR - Hardware
• x86 servers
– Intel Xeon CPUs
– 128–768 GB RAM
– FDR InfiniBand
– 10GigE
• Specialised nodes
– Nvidia GPUs
– ≥1 TB RAM
• Isilon storage
– CIFS/NFS
– 10GigE to Arista switches
• Lustre
– Scratch
HPC at NIBR - Software
• RHEL 6.x
• Univa Grid Engine for scheduling
• Software compilation & configuration
– EasyBuild
– Modules
– GCC, Intel, Nvidia compilers
• Languages: C++, Fortran, CUDA, Python, R, MATLAB
• Libraries: various MPI implementations, MKL, etc.
• The software stack is identical on Linux desktops and “scientific servers”
HPC at NIBR - Humans
• HPC is provided by the Scientific Computing Group (SciComp)
• Global team (Europe, USA, Asia)
• Complementary backgrounds and skills
– Sysadmins
– Mathematicians
– Scientists
• HPCwire award winners in 2014
• Other teams in NIBR Informatics provide storage, Linux servers, etc.
HPC at NIBR: Community
• We’ve worked very hard to build an interdisciplinary group of informatics
scientists to share knowledge
• Various activities
– Fortnightly informal talks
– Social events
– Deep Learning “bootcamp”
– 24hr virtual multi-site workshop (Shanghai -> California!)
• This started as a grassroots initiative and is now formally funded within the Company
HPC elsewhere in Novartis
• Today’s talk covers Research; however HPC is used elsewhere in the
Company for
– Modelling drug absorption, metabolism & excretion (PK/PD)
– Processing data from Clinical Trials
– Predicting where in the lungs inhaled drugs go (CFD)
• The cluster used for this work is much more tightly controlled and tested than
NIBR’s systems
Examples of how we use HPC to accelerate drug discovery
Using HPC in early drug discovery
• There are many different ways NIBR scientists use HPC
– Molecular dynamics
– NGS analysis
– Ligand-protein docking
– Image analysis
– Cryo-EM analysis
• Our usage is similar to that of a university with biology and chemistry departments
• In today’s talk I’ll focus on using HPC to accelerate phenotypic assays
Phenotypic assays
• Traditionally our scientists have used biochemical assays in early stage drug
discovery
– Assays use an isolated enzyme or protein and measure fluorescence etc.
– This tells us very little about the cells and how they react
• Increasingly, our scientists are using “phenotypic assays” based on cells grown in the lab
– Scientists can see the impact of their drug on an entire cell or population of
cells
Example: wound healing
Figure: cells imaged over 24 hrs (images from http://cellprofiler.org/examples/#Wound)
What is High Content Screening (HCS)?
• A method for identifying molecules which alter the phenotype of cells (e.g. cell shape, number) or of small organisms (e.g. malaria parasites)
• Using robotics & automated microscopes, a large number of potential drugs can be “screened” in a few hours or days
• Assays can generate a lot of data
– Videos
– Millions of images
– >600TB/yr for some HCS instruments
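For a rough sense of scale, the >600 TB/yr figure converts to the average data rate below. This assumes decimal units (1 TB = 10^12 bytes) and, unrealistically, a constant rate all year.

```latex
\frac{600\ \text{TB}}{365\ \text{days}} \approx 1.64\ \text{TB/day},
\qquad
\frac{600 \times 10^{12}\ \text{B}}{365 \times 86\,400\ \text{s}}
  = \frac{6 \times 10^{14}\ \text{B}}{3.1536 \times 10^{7}\ \text{s}}
  \approx 1.9 \times 10^{7}\ \text{B/s} \approx 19\ \text{MB/s}
```

This is only a yearly average. It says nothing about peak rates during an imaging run.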
Accelerating MND/ALS disease research with GPUs
In-vitro model for neuromuscular junctions
• Faulty junctions between motor neurons and muscle cells are implicated in
MND/ALS
• We’d like to create a drug which corrects this
• Motor neurons & myotube (muscle fibre) cells were “co-cultured” in a “plate” to which drug candidates were added
• Cells were imaged in real time to measure their contractility
• Contraction is very hard to see by eye and also hard to segment computationally
What do the cells look like?
Figure: I Hossain
Motion estimated with Optic Flow
Panels: different contracting regions; total area under contraction
Figure: I Hossain
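The talk does not say which optic-flow algorithm was used, or how the GPU version was built. As an illustration only, here is a minimal CPU sketch in Python/NumPy of one classic dense method, Horn–Schunck. It comes with a hypothetical `contraction_area` helper that turns the flow field into a "fraction of the image in motion" number, similar in spirit to the "total area under contraction" panel. The smoothness weight `alpha`, the iteration count, the motion `threshold` and the 4-neighbour averaging (a simplification of the original weighted kernel) are all assumptions, not details of the NIBR work.

```python
import numpy as np


def _neighbour_mean(f: np.ndarray) -> np.ndarray:
    """Average of the 4-connected neighbours, replicating edge pixels."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Dense optical flow (u, v) from frame1 to frame2 (Horn-Schunck, Jacobi iterations)."""
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    # Spatial derivatives averaged over both frames; temporal derivative by differencing.
    ix = (np.gradient(f1, axis=1) + np.gradient(f2, axis=1)) / 2.0
    iy = (np.gradient(f1, axis=0) + np.gradient(f2, axis=0)) / 2.0
    it = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + ix ** 2 + iy ** 2
    for _ in range(n_iter):
        u_bar = _neighbour_mean(u)
        v_bar = _neighbour_mean(v)
        # Pull the locally smoothed flow towards the brightness-constancy constraint.
        residual = (ix * u_bar + iy * v_bar + it) / denom
        u = u_bar - ix * residual
        v = v_bar - iy * residual
    return u, v


def contraction_area(frame1, frame2, threshold=0.1, **kwargs):
    """Hypothetical metric: fraction of pixels whose flow magnitude exceeds threshold."""
    u, v = horn_schunck(frame1, frame2, **kwargs)
    return float(np.mean(np.hypot(u, v) > threshold))


if __name__ == "__main__":
    yy, xx = np.mgrid[0:64, 0:64]
    blob = 100.0 * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 5.0 ** 2))
    shifted = np.roll(blob, 1, axis=1)      # synthetic "cell" moved one pixel right
    print(contraction_area(blob, blob))     # no motion: prints 0.0
    print(contraction_area(blob, shifted))  # part of the image is moving
```

Applied frame by frame to a video, a metric like this gives a contractility time series. Each pixel update is independent within an iteration, which is why methods of this kind map well onto GPUs.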
Impact of HPC
• A good joint project between bench scientists, lab automation experts &
informaticians
• 80x increase in throughput compared with CPUs
• NIBR scientists have access to a new method of monitoring myotube contractility
Deep learning for HCS image analysis
CNNs for HCS image analysis
• HCS analysis is traditionally performed using tools such as CellProfiler, Fiji or
commercial tools
• Deep Learning approaches are increasingly used for image analysis
• A team has investigated Convolutional Neural Networks for deriving phenotypes from images
• They used only the images’ pixel intensity values with no a priori knowledge
• They used public and Novartis datasets
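The published approach (Godinez et al., cited in this section) is a *multi-scale* network, meaning it looks at each image at several resolutions. The sketch below shows only that input-side idea, using NumPy: it builds an image pyramid from raw pixel intensities by repeated 2×2 average pooling. The number of scales and the pooling choice are assumptions for illustration. The actual architecture (layers, scales, and how the per-scale columns are combined) is described in the paper and its code and is not reproduced here.

```python
import numpy as np


def avg_pool_2x2(img: np.ndarray) -> np.ndarray:
    """Downsample by averaging non-overlapping 2x2 blocks (an odd last row/column is cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    # Element [a, b, c, d] of the reshaped array is x[2a + b, 2c + d].
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def image_pyramid(img, n_scales=3):
    """Return [full resolution, 1/2, 1/4, ...] versions of a single-channel image."""
    levels = [np.asarray(img, dtype=float)]
    for _ in range(n_scales - 1):
        levels.append(avg_pool_2x2(levels[-1]))
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cell_image = rng.random((64, 64))  # stand-in for raw pixel intensities
    for level in image_pyramid(cell_image, n_scales=4):
        print(level.shape, round(float(level.mean()), 4))
```

In a multi-scale CNN, each pyramid level would be fed to its own convolutional column. Because only pixel intensities go in, no hand-designed features are needed.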
Outcome
• The images were classified more accurately than with conventional methods
• This included tracking a response to drugs
• There was no need to design a custom processing pipeline
Interested in knowing more?
• This work has been published (including some code) in Godinez et al., “A multi-scale convolutional neural network for phenotyping high-content cellular images”, Bioinformatics, btx069, https://doi.org/10.1093/bioinformatics/btx069
Pushing HPC to non-technical scientists
Why bench scientists need HPC (and don’t realise it!)
• Bench scientists generally do not know how to programme or use the Linux
command line
• Many scientists’ data has grown too big to be processed on a single
workstation
• This means they have to wait a long time for their data to be processed, and they may also need to wait for an informatician to become available
• If you give scientists the tools to analyse their data at scale, they get their results sooner and the informaticians can focus on more complex tasks
Pushing HCS analysis to bench scientists
• Our scientists create pipelines in CellProfiler (http://cellprofiler.org/) using the normal GUI on their laptops
• The pipeline is then uploaded to a central server at each site
• The scientist can start an analysis run on the cluster from the same webpage that they use to visualise their images
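The talk does not describe how the web front end submits the run. One common pattern is a Grid Engine array job in which each task runs CellProfiler headless on a slice of the image sets; the sketch below assumes that pattern. The CellProfiler command-line options it uses are:

- `-c`: run headless
- `-r`: run the pipeline
- `-p`: pipeline file
- `-i`: image directory
- `-o`: output directory
- `-f` / `-l`: first and last image set

The `#$` lines are Grid Engine directives, and `SGE_TASK_ID` is the per-task index that Grid Engine sets for array jobs. The paths, job name, chunk size and helper names are hypothetical.

```python
import math
from typing import Tuple


def task_range(task_id: int, n_image_sets: int, chunk: int) -> Tuple[int, int]:
    """1-based (first, last) image sets handled by array task `task_id`.
    Mirrors the shell arithmetic written into the job script below."""
    first = (task_id - 1) * chunk + 1
    last = min(task_id * chunk, n_image_sets)
    return first, last


def array_job_script(pipeline: str, image_dir: str, out_dir: str,
                     n_image_sets: int, chunk: int = 50,
                     name: str = "cp_run") -> str:
    """Render a Grid Engine array-job script; each task runs CellProfiler headless
    on `chunk` consecutive image sets. All paths and defaults are illustrative."""
    n_tasks = math.ceil(n_image_sets / chunk)
    lines = [
        "#!/bin/bash",
        f"#$ -N {name}",         # job name
        "#$ -cwd",               # run in the submission directory
        f"#$ -t 1-{n_tasks}",    # one array task per chunk of image sets
        # Grid Engine sets SGE_TASK_ID (1..n_tasks) for each task.
        f"FIRST=$(( (SGE_TASK_ID - 1) * {chunk} + 1 ))",
        f"LAST=$(( SGE_TASK_ID * {chunk} ))",
        f"if [ \"$LAST\" -gt {n_image_sets} ]; then LAST={n_image_sets}; fi",
        f"OUT={out_dir}/task_$SGE_TASK_ID",  # separate output folder per task
        "mkdir -p \"$OUT\"",
        f"cellprofiler -c -r -p {pipeline} -i {image_dir} -o \"$OUT\" -f \"$FIRST\" -l \"$LAST\"",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical plate with 384 image sets, 50 per task -> 8 array tasks.
    print(array_job_script("plate1.cppipe", "/data/plate1", "/scratch/plate1", 384))
```

Saved as a shell script, the output would be submitted with `qsub`. Giving each task its own output folder avoids tasks overwriting each other's results.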
Screenshots of the GUI
Also (ab)using Jenkins
• Our scientists have automated cluster submission using the continuous integration tool, Jenkins, again with a web front end
• The work has been published: https://doi.org/10.1177/1087057116679993
• Source freely available at https://github.com/Novartis/Jenkins-LSCI
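Jenkins can start a parameterised job over HTTP through its `buildWithParameters` endpoint, which is one way a web front end can hand a cluster submission to Jenkins. The sketch below only builds such a request with Python's standard library and does not send it. The server URL, job name and parameter names are hypothetical, and nothing here is taken from Jenkins-LSCI. A real server will also require authentication (for example a user name and API token) before it accepts the call.

```python
import urllib.parse
import urllib.request


def build_trigger_request(jenkins_url: str, job: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to Jenkins' buildWithParameters endpoint."""
    query = urllib.parse.urlencode(params)  # e.g. PIPELINE=plate1.cppipe&PLATE=P0001
    url = f"{jenkins_url.rstrip('/')}/job/{urllib.parse.quote(job)}/buildWithParameters?{query}"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = build_trigger_request(
        "https://jenkins.example.org",                     # hypothetical server
        "hcs-cellprofiler",                                # hypothetical job name
        {"PIPELINE": "plate1.cppipe", "PLATE": "P0001"},   # hypothetical parameters
    )
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), once authentication is added.
```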
Outlook
HPC Trends
• GPUs / Intel Xeon Phi / FPGAs
– Deep learning
– Cryo-EM
• Real-time collection & processing of data from clinical trials
• Integration of “big data” technologies such as Apache Spark into HPC
• HPC in the cloud
– Currently most useful for bursting or embarrassingly parallel jobs
Thank you
Backup
HPC in the cloud
• NIBR have used Amazon EC2 for compute workloads
– Cycle Computing
• ISVs, e.g. DNAnexus
– NGS bioinformatics
Docking at scale in the cloud
• Ligand-protein docking is “to predict the position and orientation of a ligand (a
small molecule) when it is bound to a protein receptor or enzyme” (Wikipedia)
• Embarrassingly parallel - compute-heavy / data-light
• We used the cloud to screen 10 million molecules against a cancer target
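"Embarrassingly parallel, compute-heavy / data-light" means each molecule can be docked independently and only a small result (a score) comes back. The library can therefore be cut into chunks, each chunk scored by any worker, and the results merged at the end. For scale, 10 million molecules spread over the 10,000+ spot instances described in this section is on the order of 1,000 molecules per instance.

Below is a minimal sketch of that fan-out/merge pattern in Python. Several parts are assumptions rather than details of the NIBR/Cycle Computing run:

- `dock_score` is a hypothetical placeholder for a real docking program.
- Lower scores are treated as better, a convention many docking programs use.
- The chunk size and the top-k merge are illustrative choices.

```python
import heapq
import zlib
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterator, List, Tuple

Hit = Tuple[float, str]  # (score, compound id); lower score = better (assumed)


def dock_score(compound_id: str) -> float:
    """HYPOTHETICAL stand-in for a real docking run: a deterministic fake score."""
    return (zlib.crc32(compound_id.encode()) % 10_000) / 100.0


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Cut the library into independent work units (e.g. one per instance)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_chunk(chunk: List[str]) -> List[Hit]:
    """Work done by one worker; no communication with any other chunk is needed."""
    return [(dock_score(c), c) for c in chunk]


def screen(library: List[str], chunk_size: int = 1000, top_k: int = 10,
           map_fn: Callable = map) -> List[Hit]:
    """Fan chunks out via map_fn, then merge, keeping only the top_k best hits."""
    best: List[Hit] = []
    for results in map_fn(score_chunk, chunks(library, chunk_size)):
        best = heapq.nsmallest(top_k, best + results)
    return best


if __name__ == "__main__":
    library = [f"CMPD-{i:07d}" for i in range(20_000)]  # hypothetical compound IDs
    with ProcessPoolExecutor() as pool:  # local stand-in for many cloud instances
        for score, cid in screen(library, map_fn=pool.map, top_k=5):
            print(cid, score)
```

Only the small `(score, id)` tuples travel back to the merge step, which is what makes this pattern well suited to bursting onto many independent cloud instances.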
How we did it
• Cycle Computing’s software (CycleServer, CycleCloud)
• Over 10,000 EC2 spot instances
– Extensive benchmarking to select instance type
• Licence files rather than licence servers (licence servers cannot cope with the load)
• Proprietary compounds run in NIBR’s VPC, others in “public”
• See http://opensource.nibr.com/videos/aws-litster/ and
http://cyclecomputing.com/novartis-taps-cloud-hpc-for-faster-drug-discovery-better-science/
Where we’re going in the cloud
• “Cloud by default” for many non-HPC applications
• Clinical data (subject to “informed consent”)
• HPC where appropriate
– InfiniBand etc. for tightly coupled parallel jobs is usually unavailable
– Data locality is challenging