    Big Data, Big Knowledge: Big Data for Personalized Healthcare

    Marco Viceconti, Peter Hunter, and Rod Hose

    Abstract—The idea that the purely phenomenological knowledge that we can extract by analyzing large amounts of data can be useful in healthcare seems to contradict the desire of VPH researchers to build detailed mechanistic models for individual patients. But in practice no model is ever entirely phenomenological or entirely mechanistic. We propose in this position paper that big data analytics can be successfully combined with VPH technologies to produce robust and effective in silico medicine solutions. In order to do this, big data technologies must be further developed to cope with some specific requirements that emerge from this application. Such requirements are: working with sensitive data; analytics of complex and heterogeneous data spaces, including nontextual information; distributed data management under security and performance constraints; specialized analytics to integrate bioinformatics and systems biology information with clinical observations at tissue, organ and organism scales; and specialized analytics to define the physiological envelope during the daily life of each patient. These domain-specific requirements suggest a need for targeted funding, in which big data technologies for in silico medicine become the research priority.

    Index Terms—Big data, healthcare, virtual physiological human.


    THE birth of big data, as a concept if not as a term, is usually associated with a META Group report by Doug Laney entitled "3-D Data Management: Controlling Data Volume, Velocity, and Variety," published in 2001 [1]. Further developments now suggest that big data problems are identified by the so-called 5Vs: volume (quantity of data), variety (data from different categories), velocity (fast generation of new data), veracity (quality of the data), and value (in the data) [2].

    For a long time the development of big data technologies was inspired by business intelligence [3] and by big science (such as the Large Hadron Collider at CERN) [4]. But when, in 2009, Google Flu, simply by analyzing Google queries, predicted flu-like illness rates as accurately as the CDC's enormously complex and expensive monitoring network [5], some analysts started to claim that all the problems of modern healthcare could be solved by big data [6].

    Manuscript received September 29, 2014; revised December 14, 2014; accepted February 6, 2015. Date of publication February 24, 2015; date of current version July 23, 2015.

    M. Viceconti is with the VPH Institute for Integrative Biomedical Research and the Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield S1 3JD, U.K.

    P. Hunter is with the Auckland Bioengineering Institute, University of Auckland, 1010 Auckland, New Zealand.

    R. Hose is with the Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield S1 3JD, U.K.

    Digital Object Identifier 10.1109/JBHI.2015.2406883

    In 2005, the term virtual physiological human (VPH) was introduced to indicate a framework of methods and technologies that, once established, will make possible the collaborative investigation of the human body as a single complex system [7], [8]. The idea was quite simple:

    1) To reduce the complexity of living organisms, we decompose them into parts (cells, tissues, organs, organ systems) and investigate one part in isolation from the others. This approach has produced, for example, the medical specialties, where the nephrologist looks only at your kidneys and the dermatologist only at your skin; this makes it very difficult to cope with multiorgan or systemic diseases, to treat multiple diseases (so common in the ageing population), and in general to unravel systemic emergence due to genotype–phenotype interactions.

    2) But if we can recompose, with computer models, all the data and all the knowledge we have obtained about each part, we can use simulations to investigate how these parts interact with one another, across space and time and across organ systems.

    Though this may be conceptually simple, the VPH vision contains a tremendous challenge, namely, the development of mathematical models capable of accurately predicting what will happen to a biological system. To tackle this huge challenge, multifaceted research is necessary: around medical imaging and sensing technologies (to produce quantitative data about the patient's anatomy and physiology) [9]–[11], data processing to extract from such data information that in some cases is not immediately available [12]–[14], biomedical modeling to capture the available knowledge into predictive simulations [15], [16], and computational science and engineering to run huge hypermodels (orchestrations of multiple models) under the operational conditions imposed by clinical usage [17]–[19]; see also the special issue entirely dedicated to multiscale modeling [20].
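    The notion of a hypermodel, an orchestration of sub-models exchanging quantities across scales, can be illustrated with a deliberately minimal sketch. Both component models and their coupling below are hypothetical placeholders, not actual VPH models; the sketch only shows the orchestration pattern, in which each model's output feeds the other at every step until the coupled system settles.

```python
# Hypothetical two-scale hypermodel: an organ-scale model and a
# tissue-scale model are iterated in a loop, each consuming the
# other's latest output. All relations and constants are invented
# for illustration; real VPH hypermodels couple far richer models.

def organ_model(load, tissue_stiffness):
    # Organ scale: deformation produced by a load (toy linear relation).
    return load / tissue_stiffness

def tissue_model(deformation):
    # Tissue scale: stiffness adapts to the deformation it experiences.
    return 100.0 + 20.0 * deformation

def run_hypermodel(load, steps=10):
    """Orchestrate the two sub-models for a fixed number of exchanges."""
    stiffness = 100.0  # initial tissue stiffness (arbitrary units)
    deformation = 0.0
    for _ in range(steps):
        deformation = organ_model(load, stiffness)
        stiffness = tissue_model(deformation)
    return deformation, stiffness

deformation, stiffness = run_hypermodel(load=50.0)
print(f"deformation={deformation:.3f}, stiffness={stiffness:.1f}")
```

    With these toy relations the loop converges quickly to a self-consistent state; the orchestration logic, not the numbers, is the point.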

    But the real challenge is the production of that mechanistic knowledge: quantitative, defined over space and time and across multiple space–time scales, and capable of being predictive with sufficient accuracy. After ten years of research this has produced a complex impact scenario in which a number of target applications, where such knowledge was already available, are now being tested clinically. Some examples of VPH applications that have reached the clinical assessment stage are:

    1) The VPHOP consortium developed a multiscale modeling technology, based on conventional diagnostic imaging methods, that makes it possible, in a clinical setting, to predict for each patient the strength of their bones, how this strength is likely to change over time, and the probability that they will overload their bones during daily life. With these three predictions, the evaluation of the absolute risk of bone fracture in patients affected by osteoporosis will be much more accurate than any prediction based on external and indirect determinants, as it is in current clinical practice [21].

    2168-2194 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See standards/publications/rights/index.html for more information.

    2) More than 500 000 end-stage renal disease patients in Europe live on chronic intermittent haemodialysis treatment. A successful treatment critically depends on a well-functioning vascular access, a surgically created arteriovenous shunt used to connect the patient's circulation to the artificial kidney. The ARCH project aimed to improve the outcome of vascular access creation and long-term function with an image-based, patient-specific computational modeling approach. ARCH developed patient-specific computational models for vascular surgery that make it possible to plan such surgery in advance on the basis of the patient's data and to obtain a prediction of the vascular access function outcome, allowing an optimization of the surgical procedure and a reduction of associated complications such as nonmaturation. A prospective study is currently running, coordinated by the Mario Negri Institute in Italy; preliminary results on 63 patients confirm the efficacy of this technology [22].

    3) Percutaneous coronary intervention (PCI) guided by fractional flow reserve (FFR) is superior to standard assessment alone in treating coronary stenosis. FFR-guided PCI results in improved clinical outcomes, a reduction in the number of stents implanted, and reduced cost. However, FFR is currently used in few patients, because it is invasive and requires special instrumentation. A less invasive FFR would be a valuable tool. The VirtuHeart project developed a patient-specific computer model that accurately predicts myocardial FFR from angiographic images alone in patients with coronary artery disease. In a phase 1 study the method showed an accuracy of 97% when compared to standard FFR [23]. A similar approach, based on computed tomography imaging, is at an even more advanced stage, having recently completed a phase 2 trial [24].

    While these and some other VPH projects have reached the clinical assessment stage, quite a few other projects are still in the technological development or preclinical assessment phase. But in some cases the mechanistic knowledge currently available has simply turned out to be insufficient to develop clinically relevant models.

    So it is perhaps not surprising that recently, especially in the area of personalized healthcare (so promising but so challenging), some people have started to advocate the use of big data technologies as an alternative approach, in order to reduce the complexity involved in developing reliable, quantitative mechanistic knowledge.

    This trend is fascinating from an epistemological point of view. The VPH was born around the need to overcome the limitations of a biology founded on the collection of a huge amount of observational data, frequently affected by considerable noise, and boxed into a radical reductionism that prevented most researchers from looking at anything bigger than a single cell [25], [26]. Suggesting that we revert to a phenomenological approach, where a predictive model is supposed to emerge not from mechanistic theories but only from high-dimensional big data analysis, may be perceived by some as a step toward that empiricism the VPH was created to overcome.

    In the following, we will explain why the use of big data methods and technologies could actually empower and strengthen current VPH approaches, considerably increasing their chances of clinical impact on many difficult targets. But in order for that to happen, it is important that big data researchers are aware that, when used in the context of computational biomedicine, big data methods need to cope with a number of hurdles that are specific to the domain. Only by developing a research agenda for big data in computational biomedicine can we hope to achieve this ambitious goal.


    As engineers who have worked for many years in research hospitals, we recognize that clinical and engineering researchers share a similar mind-set. Both in traditional engineering and in medicine, the research domain is defined in terms of problem solving, not of knowledge discovery. The motto common to both disciplines is "whatever works."

    But there is a fundamental difference: engineers usually deal with problems related to phenomena on which there is a large body of reliable knowledge from physics and chemistry. When a good, reliable mechanistic theory is not available, engineers resort to empirical models, as long as these solve the problem at hand. But when they do this, they are left with a sense of fragility and mistrust, and they try to replace such models as soon as possible with theory-based mechanistic models, which are both predictive and explanatory.

    Medical researchers deal with problems for which there is a much less well-established body of knowledge; in addition, this knowledge is frequently qualitative or semiquantitative, and obtained in highly controlled experiments quite removed from clinical reality, in order to tame the complexity involved. Thus, not surprisingly, many clinical researchers consider mechanistic models too simplistic to be trusted, and in general the whole idea of a mechanistic model is looked upon with suspicion.

    But in the end "whatever works" remains the basic principle. In some VPH clinical target areas, where we can prove convincingly that our mechanistic models provide more accurate predictions than the epidemiology-based phenomenological models, penetration into clinical practice is happening. On the other hand, where our mechanistic knowledge is insufficient, the predictive accuracy of our models is poor, and models based on empirical/statistical evidence are still preferred.

    The true problem behind this story is the competition between two methods of modeling nature that are both effective in certain cases. Big data can help computational biomedicine to transform this competition into collaboration, significantly increasing the acceptance of VPH technologies in clinical practice.


    In order to illustrate this concept we will use as a guiding example the problem of predicting the risk of bone fracture in a woman affected by osteoporosis, a pathological reduction of her mineralized bone mass [27]. The goal is to develop predictors that indicate whether the patient is likely to fracture over a given time (typically the following ten years). If a fracture actually occurs in that period, this is the true value used to decide whether the outcome prediction was right or wrong.

    Because the primary manifestation of the disease is quite simple (reduction of the mineral density of the bone tissue), researchers not surprisingly found that, when such mineral density could be accurately measured in the regions where the most disabling fractures occur (hip and spine), the measurement was a predictor of the risk of fracture [28]. In controlled clinical trials, where patients are recruited so as to exclude all confounding factors, bone mineral density (BMD) strongly correlated with the occurrence of hip fractures. Unfortunately, when BMD is used as a predictor, especially over randomized populations, the accuracy drops to 60–65% [29]. Given that fracture is a binary event, tossing a coin would give us 50%, so this is considered not good enough.
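    The way such accuracy figures are scored can be made concrete with a small sketch. The cohort, the T-score threshold, and the outcomes below are entirely synthetic, chosen only to show how a binary fracture predictor is evaluated against observed outcomes and compared with the 50% chance baseline of a coin toss.

```python
import random

random.seed(0)

# Hypothetical cohort: (BMD T-score, ten-year fracture outcome 0/1).
# Values are synthetic, for illustration only.
cohort = [(-3.0, 1), (-2.6, 0), (-2.3, 1), (-2.1, 0), (-1.8, 1),
          (-1.5, 0), (-1.2, 1), (-1.0, 0), (-0.7, 0), (-0.4, 0)]

def accuracy(predict, patients):
    """Fraction of patients whose prediction matches the observed outcome."""
    hits = sum(1 for bmd, fractured in patients if predict(bmd) == fractured)
    return hits / len(patients)

def bmd_rule(bmd):
    # Predict fracture below a T-score threshold (scores 6/10 here).
    return 1 if bmd < -2.0 else 0

def coin(_bmd):
    # A coin toss ignores the data entirely; ~50% on average.
    return random.randint(0, 1)

print(f"BMD threshold rule: {accuracy(bmd_rule, cohort):.0%}")
print(f"coin toss:          {accuracy(coin, cohort):.0%}")
```

    On this invented cohort the threshold rule scores 60%, in the range reported for BMD, while the coin hovers around chance; the point is only the scoring scheme, not the numbers.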

    Epidemiologists run huge international, multicentre clinical trials in which fracture events are related to a number of observables; the data are then fitted with statistical models that provide phenomenological models capable of predicting the likelihood of fracture. The most famous, called FRAX, was developed by John Kanis at the University of Sheffield, U.K., and is considered by the World Health Organisation to be the reference tool for predicting the risk of fracture. The predictive accuracy of FRAX is comparable to that of BMD, but it seems more robust for randomised female cohorts [30].
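    The fitting step behind such phenomenological models can be sketched generically. The sketch below fits a logistic regression by stochastic gradient descent to invented observables (age in decades, BMD T-score, prior fracture); it illustrates the statistical approach only, and is in no way FRAX itself, which is fitted to far larger multicentre cohorts with more covariates.

```python
import math

# Hypothetical training data: ((age/10, BMD T-score, prior fracture),
# ten-year fracture outcome). All values are invented for illustration.
data = [((7.8, -2.9, 1), 1), ((6.9, -2.4, 0), 1), ((7.4, -2.6, 1), 1),
        ((6.1, -1.8, 0), 0), ((5.8, -1.2, 0), 0), ((6.5, -2.1, 1), 1),
        ((5.5, -0.9, 0), 0), ((7.1, -2.2, 0), 0), ((6.0, -1.5, 0), 0),
        ((7.6, -2.7, 1), 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Predicted fracture probability for a vector of observables."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Fit by plain per-sample gradient descent on the log-loss.
weights, bias, lr = [0.0, 0.0, 0.0], 0.0, 0.05
for _ in range(5000):
    for x, y in data:
        err = predict(weights, bias, x) - y
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err

for x, y in data:
    print(f"observed={y}  predicted risk={predict(weights, bias, x):.2f}")
```

    The fitted model is purely phenomenological: it relates observables to outcomes with no mechanistic account of why a bone fails, which is exactly the contrast drawn with the VPHOP approach below.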

    In the VPHOP project, one of the flagship VPH projects funded by the Seventh Framework Programme of the European Commission, we took a different approach: we developed a multiscale patient-specific model, informed by medical imaging and wearable sensors, and used this model to predict the actual risk of fracture at the hip and at the spine, essentially simulating ten years of the patient's daily life [18]. The results of the first clinical assessment, published only a few weeks ago, suggest that the VPHOP approach could increase the predictive accuracy to 80–85% [31]. Significant...