Data mining for Dummies
Melanie Ganz-Benjaminsen, Assistant Professor
Neurobiology Research Unit, Copenhagen University Hospital/Rigshospitalet
Department of Computer Science, University of Copenhagen
Melanie Ganz-Benjaminsen, PhD
NRU, Copenhagen University Hospital, Rigshospitalet
Who am I?
MSc in Physics
PhD in CS
PostDoc in USA
PostDoc at RH
Asst. Prof. at DIKU
Data mining
• process used to extract usable data from a larger set of “raw” data
• greatly exceeds the average data analysis you can do manually
Data science
From http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Machine learning
• think of machine learning as a means of building models of data
• mathematical models that help understand the data
• “learning” since there are parameters in the model that get tuned based on the available data
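As a minimal sketch of what "parameters that get tuned based on the available data" can mean, the example below fits a straight line to a handful of points by least squares. The data values and the function name `fit_line` are made up for illustration and are not taken from the talk.

```python
def fit_line(xs, ys):
    """Least-squares fit of the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that roughly follows y = 2x + 1 with some noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

# "Learning": the two model parameters are tuned to the data
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The model here has only two parameters; the same idea carries over to models with many more.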
Machine learning
• Supervised learning:
– Classification
– Regression
• Unsupervised learning:
– Dimensionality reduction
– Clustering
• Semi-supervised learning
Classification: Predicting discrete labels
Contextualization
People who suffered a stroke
Healthy controls
Models of existing data
Prediction on new data
Categorize/make risk profiles for new patients
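The two steps, building a model of existing data and then predicting on new data, can be sketched with a very simple classifier. This is only an illustration: the nearest-centroid method, the two features (age and systolic blood pressure) and all numbers are assumptions chosen for the example, not the model or data shown on the slides.

```python
import math


def fit_centroids(samples, labels):
    """Model of existing data: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Prediction on new data: the class whose centroid is closest."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical existing data: (age in years, systolic blood pressure in mmHg)
samples = [(72, 165), (68, 150), (80, 170), (75, 160),
           (35, 118), (42, 125), (50, 130), (29, 115)]
labels = ["stroke"] * 4 + ["healthy"] * 4

centroids = fit_centroids(samples, labels)

# Two hypothetical new patients
prediction_a = predict(centroids, (70, 158))
prediction_b = predict(centroids, (38, 120))
print(prediction_a, prediction_b)
```

In practice the features would usually be rescaled first so that no single variable dominates the distance.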
Too easy?
• The benefit of the machine learning approach is that it generalizes to much larger datasets with many more dimensions!
• More dimensions? That means more variables, e.g. gender, family history, etc.
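To make "more dimensions" concrete, the sketch below turns one patient record into a numeric feature vector, where every variable adds one dimension. The field names, the 0/1 encoding and the values are hypothetical.

```python
def encode(patient):
    """Turn one patient record into a numeric feature vector."""
    return [
        float(patient["age"]),
        float(patient["systolic_bp"]),
        1.0 if patient["gender"] == "female" else 0.0,  # binary encoding
        1.0 if patient["family_history"] else 0.0,      # yes/no as 1/0
    ]


# Hypothetical patient record
patient = {"age": 64, "systolic_bp": 142, "gender": "female", "family_history": True}

features = encode(patient)
print(len(features), features)
```

A model that works on feature vectors does not need to change when a variable is added; only the length of the vector grows.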
Real example - clustering
Images taken from Beliveau et al., JNS (2017)
Real example - clustering
K = 7 K = 18
Images courtesy of Vincent Beliveau
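The role of K, the number of clusters asked for, can be illustrated with a small k-means sketch on toy 2D points. Note the assumptions: k-means is used here only as a familiar clustering method and is not claimed to be the method behind the images, and the points and the deterministic initialisation are made up for the example.

```python
import math


def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points]


def kmeans(points, k, iterations=20):
    # Deterministic start: k evenly spaced points serve as initial centres
    step = len(points) // k
    centres = [list(points[i * step]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(points, centres)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centres), centres


# Toy data with three visually obvious groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.8, 1.1)]

labels_k3, _ = kmeans(points, k=3)
labels_k2, _ = kmeans(points, k=2)
print(labels_k3)
print(labels_k2)
```

The same data gets split differently depending on K, which is the choice contrasted in the two panels above.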
Recap: Machine learning
• is building mathematical models that help describe the relation between “input” and “output” data
– input can be age and blood pressure, and output stroke status
– or input can be 5-dimensional serotonin data at ca. 10,000 vertices of the cortex, and output the number of regions I want to cluster the cortical data into
• BUT mathematical models can be limited and need to be appropriate for your data
Bottom line?
High-dimensional clinical and epidemiological data & statistical models with computer power behind them (aka machine learning)
Personalized medicine?
KU Artificial Intelligence centre
• The Data Science Laboratory (DSL) acts as the entrance for researchers and students to the AI Centre.
• Its overall aim is to enhance the quality of data analyses in research carried out at SCIENCE.
Thank you for your attention!
Questions?
References
• Brown, M.S., 2014. Data mining for dummies. John Wiley & Sons.
• https://jakevdp.github.io/PythonDataScienceHandbook/
• http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
• Beliveau, V., Ganz, M., Feng, L., Ozenne, B., Højgaard, L., Fisher, P.M., Svarer, C., Greve, D.N. and Knudsen, G.M., 2017. A high-resolution in vivo atlas of the human brain's serotonin system. Journal of Neuroscience, 37(1), pp.120-128.
• Data Science lab: https://datalab.science.ku.dk/english/