Challenges in Medical Imaging and the VISCERAL Model
Posted on 22-Feb-2017
Overview
• Systematic evaluations
  – Information retrieval, industrial challenges
• ImageCLEF
  – 2003-2016
• Challenges in medical imaging and Open Science
  – Conferences and platforms (Kaggle, TopCoder, …)
• VISCERAL
  – “Moving the algorithms to the data and not the data to the algorithms”
• Conclusions
Systematic evaluations
• 1960-: the Cranfield tests
  – Test collection, tasks, ground truth
  – Automatic indexing performed better than manual index terms
• 1992-: TREC (Text REtrieval Conference)
  – At NIST, Gaithersburg
  – Many different tasks over the years
• 1999-: CLEF and TRECVid as offspring of TREC
• Industrial performance benchmarks
  – TPC (1988), a common transaction processing framework
  – Supercomputer benchmarks (1993), Common Criteria, …
Cleverdon, C. W. (1960). ASLIB Cranfield research project on the comparative efficiency of indexing systems. ASLIB Proceedings, XII, 421-431.
ImageCLEF
• Benchmark on multimodal image retrieval
  – Run since 2003, medical task since 2004
  – Part of the Cross-Language Evaluation Forum (CLEF)
• Many tasks related to medical image retrieval
  – Image classification (modality, body part, …)
  – Image-based retrieval
  – Case-based retrieval (finding similar cases)
  – Compound figure separation
  – Caption prediction
  – …
• Many old databases remain available at imageclef.org
Henning Müller, Paul Clough, Thomas Deselaers, Barbara Caputo, ImageCLEF – Experimental Evaluation of Visual Information Retrieval, Springer, 2010.
ImageCLEF experiences
• Creating a community is important for good participation (many groups register to access the data)
  – Workshop to discuss results; evolution of the tasks over the years (attracting postgraduate students)
• Impact of the data sets can be high (see also TREC)
  – Overview articles are frequently cited, as are the best participant algorithms
• Large data sets cause problems in some countries
• Hard to make groups collaborate
  – e.g., to evaluate system components
• Little interactive evaluation of systems
• Not everything is fully reproducible
Open Science
• Initiatives to share data, tasks and tools
  – Not only experts, really everyone
  – A more efficient way to do science, no reimplementation
  – NIH and some journals push for open data and open access
• Data papers and executable papers
  – Full reproducibility, which is otherwise often not given
• Challenges as an important way to bring many people into the loop of data science
  – http://www.challenge.gov/
  – Kaggle, TopCoder, …
Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.
Challenges in Medical Imaging
• Grand Challenges in Medical Imaging
  – http://grand-challenge.org/
  – Including challenges and details on their impact and why they matter
• 2007: MICCAI workshop on liver segmentation
  – Including common data
• Now most conferences organize many challenge sessions in addition to workshops
  – MICCAI, ISBI, SPIE, ICPR, …
• Problem: most publications are still based on closed data sets that are small and impossible to verify
  – What if all were available on a secure infrastructure?
Platforms for ML challenges
• Kaggle
  – Much influence on machine learning challenges
  – A big commercial factor, offering prize money and also hiring good talent
  – Participants download data and submit results
• TopCoder
  – Use of code instead of result lists
  – 79,900,000 in prize money distributed
  – Almost 1 million members
• Many others exist in specific domains
  – Sage Bionetworks in the biomedical field
Challenges with challenges
• Getting a large number of participants and different techniques is hard, as there are many burdens
  – Only one can win the prize money in the end
• Same conditions for all (computation, bandwidth)
• How to distribute very large data sets?
• How to deal with confidential/restricted data?
  – Medical data, commercial data, forbidden data sets
• How to deal with quickly changing data?
  – Data of cell phone providers, Internet companies
• Reproducibility
  – Optimizations on test data, particularly when prizes are at stake
VISCERAL model
• VISCERAL: Visual Concept Extraction Challenge in Radiology
• “Bringing the algorithms to the data”
  – Data stored centrally, in our case in the cloud (which can be HIPAA compliant)
• Three types of challenges
  – Anatomy segmentation (3x), 20 organs
  – Retrieval challenge (2x), finding similar cases
  – Lesion detection challenge (2x), 5 organs
• Provide large, well-annotated data sets that can be shared long term
  – Challenging with IRB approval in three countries
[Architecture diagram: participant virtual machines in the Microsoft Azure cloud with access to the training and test data; the organiser runs the registration, annotation management and analysis systems; annotators (radiologists) work with locally installed annotation clients.]
Silver corpus (example: trachea)
• Executable code of all participants
  – Run it on new data, then perform label fusion
[Figure: four participant segmentations (Dice 0.85, 0.71, 0.84, 0.83) are fused into a silver-corpus segmentation (Dice 0.92).]
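The silver-corpus idea can be sketched with majority-vote label fusion and the Dice coefficient (Dice = 2|A∩B| / (|A|+|B|)). This is a minimal toy illustration, not VISCERAL code or data; the exact fusion method used in VISCERAL may differ.

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def majority_vote(masks):
    """Fuse binary segmentations: a voxel is foreground when more than
    half of the participant masks label it as foreground."""
    stacked = np.stack(masks)
    return (stacked.sum(axis=0) > len(masks) / 2).astype(np.uint8)

# Toy 1-D "volumes": ground truth and three imperfect segmentations
truth = np.array([0, 1, 1, 1, 1, 0, 0, 0])
seg_a = np.array([0, 1, 1, 1, 0, 0, 0, 0])  # misses one voxel
seg_b = np.array([0, 1, 1, 1, 1, 1, 0, 0])  # adds one extra voxel
seg_c = np.array([0, 0, 1, 1, 1, 0, 0, 0])  # misses one voxel

fused = majority_vote([seg_a, seg_b, seg_c])
print(dice(fused, truth))  # prints 1.0 -- fusion beats every single mask here
```

As on the slide, the fused mask scores higher (Dice 1.0 on this toy data) than any individual segmentation (about 0.86-0.89), which is what makes the silver corpus usable as approximate ground truth.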
Evaluation as a Service (EaaS)
• Evaluation via APIs, code, cloud, …
• Workshop in Sierre in March 2015
• Many aspects, viewpoints, interests
• White paper published on arXiv
• All comments are welcome!
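The EaaS idea can be sketched as follows: the organiser exposes an evaluation function that runs submitted participant code against test data that stays on the organiser's infrastructure, returning only aggregate metrics. All names below (`evaluate_submission`, `HIDDEN_TEST_DATA`, `my_classifier`) are hypothetical illustrations, not an actual EaaS or VISCERAL API.

```python
# Hypothetical sketch of Evaluation as a Service: participants submit an
# algorithm (a callable); the organiser applies it to hidden test data
# and returns only an aggregate score, never the raw data.
HIDDEN_TEST_DATA = [  # stays on the organiser's infrastructure
    ({"feature": 1.0}, 1),
    ({"feature": -2.0}, 0),
    ({"feature": 0.5}, 1),
]

def evaluate_submission(algorithm):
    """Run a participant's algorithm on the hidden test set and
    return accuracy only -- the data itself is never exposed."""
    correct = sum(
        1 for sample, label in HIDDEN_TEST_DATA
        if algorithm(sample) == label
    )
    return correct / len(HIDDEN_TEST_DATA)

# A participant "moves the algorithm to the data" by submitting code:
def my_classifier(sample):
    return 1 if sample["feature"] > 0 else 0

print(evaluate_submission(my_classifier))  # prints 1.0
```

This also addresses reproducibility: because the organiser holds both the code and the data, results can be recomputed, and test-set overfitting is harder than with download-and-submit challenges.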
Cloud-based evaluation
• Workshop at the Martinos Center, Boston, MA, November 2015
• How to run benchmarks on very large data sets in the cloud (reproducibility, motivation)
• Many different stakeholders
  – Scientists, infrastructure providers, companies, funding organizations
• Sustainability is a major challenge
• Interests of single persons vs. interests of a domain
  – Give credit to creators of data and tools
  – Nature Scientific Data
Coding4Cancer & others
• Challenge on cancer prediction (breast, lung)
• Prize money for the challenges
  – Code must be made open source to be eligible
• Commercial medical imaging challenges
  – Zebra Medical Vision
    • Large data sets available for research
    • Use of their infrastructure only, via Docker
  – RadLogic
    • Plug-in concept for algorithms from scientists
What is needed now?
• A long-term vision of how medical data analysis will develop and how data & tools can be shared
  – Moonshot initiative on cancer (Biden)
• International research infrastructures
  – Public-private partnerships to make them sustainable; how to share the costs is still unclear
  – Leaving data where it is produced, moving the code
• Incentives to share data and task environments
  – Those doing the major work should receive credit
  – More work falls on those preparing data & tasks
    • Annotating data, standard formats, supporting others
Conclusions
• Open Science is developing quickly
  – Potential advantages for all
• The medical domain is complicated, as the data require protection (the more data, the bigger the problem)
  – Particularly for genomics
  – Avoiding data duplication limits exposure
• Translational aspects also need to be taken into account (transferring code towards products)
  – Executable “papers” and available data should help
  – Objective performance comparison
• Challenges will be part of this ecosystem