(c) Jason H. Moore 1
Accessible Artificial Intelligence
for Data Science
Jason H. Moore, PhD, FACMI
Edward Rose Professor of Informatics
Director, Institute for Biomedical Informatics
Senior Associate Dean for Informatics
Perelman School of Medicine
University of Pennsylvania
Philadelphia, PA, USA
@moorejh epistasis.org [email protected]
Atari and the University of Utah
DEC PDP-1 ~ 1960
Spacewar! by Steve Russell (MIT) ~ 1962
Pong
Nolan Bushnell (Atari)
~ 1972
Biomedical Informatics
Basic Science
• Bioinformatics
• Computational Biology
Clinical
• Clinical Informatics
• Clinical Research Informatics
• Consumer Health Informatics
Population
• Public Health Informatics
American Medical Informatics Association (AMIA)
Golden Era of Biomedical Informatics
Moore and Holmes, BioData Mining (2016)
Why?
• Big data
• High-performance computing
• Talented trainees
• Government recognition
• Industry recognition
• Patient recognition
• University investment
Golden Era of Biomedical Informatics
What next?
• Artificial intelligence
• Biomedical devices
• Data integration
• Data science
• Informatician scientists
• No-boundary thinking
• Visual analytics
Artificial Intelligence
History
Artificial Intelligence in Medicine
[Timeline figure: 1960 to 2020]
Artificial Intelligence
• Computers that plan, solve problems, and reason
• 1950s – Alan Turing – “Can machines think?”
Turing Test
Artificial Intelligence
• Top-down AI: Build a machine that mimics the mind
• Bottom-up AI: Neural networks, cellular systems as building blocks for intelligent machines
• AI coined at Dartmouth College in 1956
• "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
Shortliffe’s MYCIN – 1970s
[Diagram components: Patient Data, Rules, Consultation System, Explanation System, Knowledge Acquisition]
Shortliffe’s MYCIN
Never used in clinical practice
• Ethical and legal issues
• Standalone system
• Required physician entry (>30 mins)
Shortliffe’s MYCIN Book (1976)
Artificial Intelligence
Today
IBM Watson – 2010s
• Natural language processing
• Information retrieval
• Knowledge representation
• Automated reasoning
• Machine learning
• 200M pages – 4TB
• 2800 core threads
• 16TB RAM
• No internet
IBM Watson – 2010s
Artificial Intelligence
For Data Analytics
Data Analytics Pipeline
Stages: D, DI, FS, FC, ML, I, A, V
Big Data
D
http://www.kdnuggets.com/
Data Integration
DI Relational Database Graph Database
Michael Hunger – Neo4j
Feature Selection
FS
Ritchie – PLoS Genetics (2013)
Sohangir – J Soft Engin App (2013)
Feature Construction
FC
[Figure: pairs of features taking values 0, 1, 2 are combined through 3×3 grids into constructed features: (X1, X2) to Z1, (X3, X4) to Z2, (X5, X6) to Z3]
Z1 grid rows: 0 0 0 / 0 1 1 / 0 1 1
Z2 grid rows: 1 0 1 / 0 1 0 / 1 0 1
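One way to read the grids on this slide is as lookup tables: each pair of feature values (0, 1, or 2) selects a cell, and the cell value becomes the constructed feature. The sketch below is a minimal illustration under that assumption, not the author's implementation. The function name `construct_feature` and the sample columns are hypothetical. The table values are the Z1 and Z2 grids from the slide, and since both grids are symmetric the choice of row versus column feature does not change the result.

```python
# Lookup tables copied from the slide's Z1 and Z2 grids.
# Indexing as TABLE[first_feature][second_feature] is an assumption.
Z1_TABLE = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
Z2_TABLE = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def construct_feature(table, a, b):
    """Map paired feature values (each 0, 1, or 2) to one constructed value."""
    if len(a) != len(b):
        raise ValueError("feature columns must have the same length")
    return [table[i][j] for i, j in zip(a, b)]


# Hypothetical sample columns, one entry per observation.
x1 = [0, 1, 2, 1, 2]
x2 = [2, 1, 0, 2, 2]

z1 = construct_feature(Z1_TABLE, x1, x2)
print(z1)  # [0, 1, 0, 1, 1]
```

The constructed column (here `z1`) can then stand in for the original pair of features in the next pipeline stage.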
Machine Learning
ML
http://suanfazu.com/
Statistical and Biological Interpretation
I
Biological Validation
V
Talbot, Zebrafish (2014) dev.biologists.org
Clinical Application
A
M.D. Anderson
Why Artificial Intelligence?
[Figure: example pipelines assembled from Importance, PCA, Polynomial, DT, RF, and LR steps]
Accessible
Artificial Intelligence
PennAI
AI should be open, easy, and accessible
http://scikit-learn.org/
https://kaixhin.github.io/FGLab/
Controller: Future Gadget Lab
Database: MongoDB
ML Results -> Knowledge
Penn Machine Learning Benchmarks (PMLB)
BioData Mining 10:36 (2017)
Penn Machine Learning Benchmarks (PMLB)
Pacific Symposium on Biocomputing (2018)
Acknowledgments
• PennAI Team: Josh Cohen, Weixuan Fu, Paul Kopec, Bill La Cava, Randy Olson, Moshe Sipper, Sharon Tartarone, Heather Williams
• NIH grants R01s AI11679, LM012601, LM010098, UC4 DK112217
• epistasis.org, epistasisblog.org
• twitter.com: @moorejh
• PennAI.org