big data versus small data analysis towards personalized medicine practice

MEETING ABSTRACT Open Access

Big data versus small data analysis towardspersonalized medicine practiceMira Marcus-Kalish

From EPMA-World Congress 2013Brussels, Belgium. 20-21 September 2013

The new era in medical research and development, basedon new approaches and advanced technologies, enabledprofound insights, in various levels and degrees of sensi-tivity, on the human body physical and mental function-ing. The identification of these micro and macroenvironmental factors created a real need for new analy-sis tools and translational models. The “big data” createdand harvested in health care, from the molecular to thephysiological, imaging and environmental data, etc, cre-ated an inevitable growing need for sophisticated analysistools. “In 10 years each individual will be surrounded bya virtual cloud of billions of data pointsP4 medicine”1. More than that there is a need to provide a broadunderstanding on the all factors interplay, based on com-bining all relevant data sources and utilizing all availableanalysis tools. New technologies are developed to har-vest, verify, control, transfer, align and transform data, aswell as new innovative analysis and predictive targetedtools to meet these challenges, both in academia andindustry. In parallel there is a growing need to new inno-vative and sophisticated tools to analyse “small data”.That is to enable reliable and responsible preventive,predictive and personalized medicine, in most caseswhere there is more data than equivalent records. Newphilosophical approaches, as well as innovative analysistools, specially developed to meet these challenges, mustbe taken, based on the coupling of computer sciences,complexity theory, non linear dynamics, logic theory, etc.That is in compliance and in parallel to the statisticalmodels such as generalized linear models or classificationtrees that are further endowed by new regularizationmethods, as well as recent developments in FDR (FalseDiscovery Rate) testing emphasizing the hierarchicalstructure.

A targeted data mining tools will be presented to com-ply with the real need to deal with all interplay of all fac-tors originated in different sources, such as: molecular,genetics, images, physiology, etc, presented in differentscales. More than that the goal is to analyze small sets ofdata containing many parameters, including missingvalues. The presented patented rule discovery and predic-tion data mining tool, enables the simultaneous analysisof multilevel, multisource data (imaging, signals, categori-cal, numerical and descriptive data) as is, with no datamanipulation, such as normalization, or neglecting partof the data. The goal is to relate to the whole data setas is, while handling missing data - which is one of themajor needs in providing the whole system approach.The algorithm reveals the underlying rules while attachinga level of confidence to each rule, identifies the unexpectedrules and detects the unexpected cases. Furthermore thetool was found to be less sensitive to over- fitting (smallsets with many parameters).In summary the Data Mining and prediction tool is

proven to reveal all If Than rules, all If than Not rulesand a subset of If and Only If (necessary and sufficientconditions) tools, with no prior assumptions. Threedetailed examples will be presented. The evaluation ofthe Epoetin adverse effects to assess long term risks andadvancing towards better Epoetin driven treatmentmodalities, based on the survival analysis of dialysispatients and cardiovascular patients treated with EPO.The second example will deal with dementia and neuro-degenerative combined data taken from the hospitalpractice (MCI, AD and normal patients) to evaluate andoptimize the various treatments and provide accuratepredictions. The third example deals with the analysis ofthe progression of disease in ALS patients based on thepatient s current disease status. The data mining toolwas already successfully applied to other applicationssuch as: autoimmune diseases, drug development,Correspondence: [email protected]

Tel Aviv University, Israel

Marcus-Kalish EPMA Journal 2014, 5(Suppl 1):A52http://www.epmajournal.com/content/5/S1/A52

© 2014 Marcus-Kalish; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, andreproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

mailto:[email protected]

http://creativecommons.org/licenses/by/2.0

http://creativecommons.org/publicdomain/zero/1.0/

personalized skin treatment, skin genetic data and bio-markers analysis, etc.

Published: 11 February 2014

Reference1. Hood L: The methods of fractal analysis of diagnostic images. Initial

clinical experience. Nat Biotechnol 2011, 29(3):191.

doi:10.1186/1878-5085-5-S1-A52Cite this article as: Marcus-Kalish: Big data versus small data analysistowards personalized medicine practice. EPMA Journal 2014 5(Suppl 1):A52.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Marcus-Kalish EPMA Journal 2014, 5(Suppl 1):A52http://www.epmajournal.com/content/5/S1/A52

of 2

big data versus small data analysis towards personalized medicine practice

Documents