big data versus small data analysis towards personalized medicine practice

2
MEETING ABSTRACT Open Access Big data versus small data analysis towards personalized medicine practice Mira Marcus-Kalish From EPMA-World Congress 2013 Brussels, Belgium. 20-21 September 2013 The new era in medical research and development, based on new approaches and advanced technologies, enabled profound insights, in various levels and degrees of sensi- tivity, on the human body physical and mental function- ing. The identification of these micro and macro environmental factors created a real need for new analy- sis tools and translational models. The big datacreated and harvested in health care, from the molecular to the physiological, imaging and environmental data, etc, cre- ated an inevitable growing need for sophisticated analysis tools. In 10 years each individual will be surrounded by a virtual cloud of billions of data pointsP4 medicine1. More than that there is a need to provide a broad understanding on the all factors interplay, based on com- bining all relevant data sources and utilizing all available analysis tools. New technologies are developed to har- vest, verify, control, transfer, align and transform data, as well as new innovative analysis and predictive targeted tools to meet these challenges, both in academia and industry. In parallel there is a growing need to new inno- vative and sophisticated tools to analyse small data. That is to enable reliable and responsible preventive, predictive and personalized medicine, in most cases where there is more data than equivalent records. New philosophical approaches, as well as innovative analysis tools, specially developed to meet these challenges, must be taken, based on the coupling of computer sciences, complexity theory, non linear dynamics, logic theory, etc. That is in compliance and in parallel to the statistical models such as generalized linear models or classification trees that are further endowed by new regularization methods, as well as recent developments in FDR (False Discovery Rate) testing emphasizing the hierarchical structure. A targeted data mining tools will be presented to com- ply with the real need to deal with all interplay of all fac- tors originated in different sources, such as: molecular, genetics, images, physiology, etc, presented in different scales. More than that the goal is to analyze small sets of data containing many parameters, including missing values. The presented patented rule discovery and predic- tion data mining tool, enables the simultaneous analysis of multilevel, multisource data (imaging, signals, categori- cal, numerical and descriptive data) as is, with no data manipulation, such as normalization, or neglecting part of the data. The goal is to relate to the whole data set as is, while handling missing data - which is one of the major needs in providing the whole system approach. The algorithm reveals the underlying rules while attaching a level of confidence to each rule, identifies the unexpected rules and detects the unexpected cases. Furthermore the tool was found to be less sensitive to over- fitting (small sets with many parameters). In summary the Data Mining and prediction tool is proven to reveal all If Than rules, all If than Not rules and a subset of If and Only If (necessary and sufficient conditions) tools, with no prior assumptions. Three detailed examples will be presented. The evaluation of the Epoetin adverse effects to assess long term risks and advancing towards better Epoetin driven treatment modalities, based on the survival analysis of dialysis patients and cardiovascular patients treated with EPO. The second example will deal with dementia and neuro- degenerative combined data taken from the hospital practice (MCI, AD and normal patients) to evaluate and optimize the various treatments and provide accurate predictions. The third example deals with the analysis of the progression of disease in ALS patients based on the patient s current disease status. The data mining tool was already successfully applied to other applications such as: autoimmune diseases, drug development, Correspondence: [email protected] Tel Aviv University, Israel Marcus-Kalish EPMA Journal 2014, 5(Suppl 1):A52 http://www.epmajournal.com/content/5/S1/A52 © 2014 Marcus-Kalish; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Upload: mira

Post on 24-Dec-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

MEETING ABSTRACT Open Access

Big data versus small data analysis towardspersonalized medicine practiceMira Marcus-Kalish

From EPMA-World Congress 2013Brussels, Belgium. 20-21 September 2013

The new era in medical research and development, basedon new approaches and advanced technologies, enabledprofound insights, in various levels and degrees of sensi-tivity, on the human body physical and mental function-ing. The identification of these micro and macroenvironmental factors created a real need for new analy-sis tools and translational models. The “big data” createdand harvested in health care, from the molecular to thephysiological, imaging and environmental data, etc, cre-ated an inevitable growing need for sophisticated analysistools. “In 10 years each individual will be surrounded bya virtual cloud of billions of data pointsP4 medicine”1. More than that there is a need to provide a broadunderstanding on the all factors interplay, based on com-bining all relevant data sources and utilizing all availableanalysis tools. New technologies are developed to har-vest, verify, control, transfer, align and transform data, aswell as new innovative analysis and predictive targetedtools to meet these challenges, both in academia andindustry. In parallel there is a growing need to new inno-vative and sophisticated tools to analyse “small data”.That is to enable reliable and responsible preventive,predictive and personalized medicine, in most caseswhere there is more data than equivalent records. Newphilosophical approaches, as well as innovative analysistools, specially developed to meet these challenges, mustbe taken, based on the coupling of computer sciences,complexity theory, non linear dynamics, logic theory, etc.That is in compliance and in parallel to the statisticalmodels such as generalized linear models or classificationtrees that are further endowed by new regularizationmethods, as well as recent developments in FDR (FalseDiscovery Rate) testing emphasizing the hierarchicalstructure.

A targeted data mining tools will be presented to com-ply with the real need to deal with all interplay of all fac-tors originated in different sources, such as: molecular,genetics, images, physiology, etc, presented in differentscales. More than that the goal is to analyze small sets ofdata containing many parameters, including missingvalues. The presented patented rule discovery and predic-tion data mining tool, enables the simultaneous analysisof multilevel, multisource data (imaging, signals, categori-cal, numerical and descriptive data) as is, with no datamanipulation, such as normalization, or neglecting partof the data. The goal is to relate to the whole data setas is, while handling missing data - which is one of themajor needs in providing the whole system approach.The algorithm reveals the underlying rules while attachinga level of confidence to each rule, identifies the unexpectedrules and detects the unexpected cases. Furthermore thetool was found to be less sensitive to over- fitting (smallsets with many parameters).In summary the Data Mining and prediction tool is

proven to reveal all If Than rules, all If than Not rulesand a subset of If and Only If (necessary and sufficientconditions) tools, with no prior assumptions. Threedetailed examples will be presented. The evaluation ofthe Epoetin adverse effects to assess long term risks andadvancing towards better Epoetin driven treatmentmodalities, based on the survival analysis of dialysispatients and cardiovascular patients treated with EPO.The second example will deal with dementia and neuro-degenerative combined data taken from the hospitalpractice (MCI, AD and normal patients) to evaluate andoptimize the various treatments and provide accuratepredictions. The third example deals with the analysis ofthe progression of disease in ALS patients based on thepatient s current disease status. The data mining toolwas already successfully applied to other applicationssuch as: autoimmune diseases, drug development,Correspondence: [email protected]

Tel Aviv University, Israel

Marcus-Kalish EPMA Journal 2014, 5(Suppl 1):A52http://www.epmajournal.com/content/5/S1/A52

© 2014 Marcus-Kalish; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, andreproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

personalized skin treatment, skin genetic data and bio-markers analysis, etc.

Published: 11 February 2014

Reference1. Hood L: The methods of fractal analysis of diagnostic images. Initial

clinical experience. Nat Biotechnol 2011, 29(3):191.

doi:10.1186/1878-5085-5-S1-A52Cite this article as: Marcus-Kalish: Big data versus small data analysistowards personalized medicine practice. EPMA Journal 2014 5(Suppl 1):A52.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Marcus-Kalish EPMA Journal 2014, 5(Suppl 1):A52http://www.epmajournal.com/content/5/S1/A52

Page 2 of 2