Post on 22-Dec-2015
Building Global Scientific Computing Infrastructure through Lab, Academic and Industrial Collaborations
• Bill Hoffman - CTO/Founder
• Berk Geveci - Director of Scientific Computing
• Kitware Inc.
Thank You
• Thank you to the unsung heroes of Open Source: the scientific computing community and Gov. labs
• Google and Facebook would not be around without the open source infrastructure built in part by you DOE folks
Talk Overview
• About Kitware
• Why Open Source for Collaboration
• Successful Collaboration Platforms Supported by Kitware
• Suggestions for Future Directions
Kitware: the Company
• Founded in 1998
• Founders: 5 previous employees of GE Corporate Research
• Privately held, profitable from creation, no debt
– Revenues projected at $12 million in 2010
• ~$15 million if subcontractors included
– Principally consulting/grants, with support product revenue
• Approximately 90 employees; growing rapidly (30% in 2010)
– > 25 PhD
– Looking to hire 20 to 30 in 2011
Kitware Is
• A software company
• creating open-source collaboration platforms
• which are used globally for
– research
– teaching
– commercial application.
• This software is created by
– internationally recognized experts
– in extended communities
– using a rigorous, quality-inducing software development process.
Why Open Source?
• World-wide visibility
– Marketing (7.5 million web hits/month)
– Hiring
• Candidates have trained with the software
– Collaboration Platform
• Academic
• Research
• Commercial
– Distributed maintenance
• High quality base for products
– Commercial
– Proprietary
– Specialized
Why Open Source?
• Software licensing fees are minimal
– Support costs
– Consulting costs
• Software survives independent of any single company
– Community support
Open Source Is Ideal for Scientific Computing
• Open Science
• Authenticity (see what you get)
• Quality-inducing, agile, collaborative software process
• Scalability
• Business model
Open Science
• Reproducible
– Data (Open Data)
– Software / algorithms (Open Source)
– Publications (Open Access)
• Impediment-free Dissemination
– Results
– Research ideas
Example: Alzheimer’s Research
• From NY Times Article:
"The key to the Alzheimer’s project was an agreement as ambitious as its goal: not just to raise money, not just to do research on a vast scale, but also to share all the data, making every single finding public immediately, available to anyone with a computer anywhere in the world.
No one would own the data. No one could submit patent applications, though private companies would ultimately profit from any drugs or imaging tests developed as a result of the effort.
“It was unbelievable,” said Dr. John Q. Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania. “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be public immediately.”
Authenticity
• See what you get
• Try before you commit
• Access to outside, independent experts
– Avoid vendor lock-in
– Hire from the community
Agile Software Process
• Open source communities require extensive collaboration
– Distributed development and user communities
• Necessarily require agile processes
– Responsive to customer
– Responsive to technology changes
Scalability
• Scalable Software Development
– Eric Raymond, The Cathedral & The Bazaar: “open-source peer review is the only scalable method for achieving high reliability and quality.”
(assuming community size is big enough!)
Business Model
• Open source software
– Services and support
– Consulting
– Collaborative R&D
• Commercial products
– Value-added products
– Applications built on open source base
• Red Hat for scientific computing*
Successful Collaborations
• VTK
• ParaView
• ITK
• CMake
• Client Specific Work built on those tools
– ISP
– ERDC
VTK Development Team
• From Ohloh: Very large, active development team: Over the past twelve months, 66 developers contributed new code to VTK. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh.
and many others...
National Library of Medicine Segmentation and Registration Toolkit
Insight Toolkit (ITK)
• $15 million over 7 years
• Leading edge algorithms
• Open source software
• www.itk.org
CMake: huge impact, started with NLM
• 3000+ downloads per day from www.cmake.org
• Major Linux distributions and Cygwin provide CMake packages
• KDE, Second Life, Trilinos, Boost (experimentally), many others
KDE 2006 – Tipping Point!
CMake: Who Is Involved?
Users
• KDE
• Second Life
• ITK
• VTK
• ParaView
• Trilinos
• Scribus
• Boost (experimentally)
• MySQL
• LLVM
• many more
Supporters
• Kitware
• ARL
• National Library of Medicine
• Sandia National Labs
• Los Alamos National Labs
• NAMIC
• Commercial Customers
CDash - Trilinos (Multi-Package Dashboard)
http://trilinos-dev.sandia.gov/cdash/index.php
Main Project
Sub Projects
Genesis of ISP
ISP was developed beginning in 2008. It is a synthesis of three different tools:
• Midas for data archival and transmission
• VolView (modified) for the visualization and display core
• Lesion Sizing Toolkit for additional functionality
The resulting data archive and viewing application has been running 24/7 since 2008 and provides a means for readers to interactively explore the data of participating authors.
Lesion Sizing Toolkit
ISP (VolView Based)
MIDAS
Kitware SBIR History
• No stranger to SBIR funding
– First contract was an SBIR
– 16 Phase II's
• Funded many advances across our tools
– ParaView Web
– In-Situ analysis
– AMR Volume Rendering
– Higher Order Finite Element Visualization
• Tibbetts award for Image-Guided Surgery (IGSTK) Phase I and II STTRs
– recognizes companies that represent excellence in achieving the mission and goals of SBIR and STTR programs
The Need for Indoor Plumbing
• http://www.kitware.com/blog/home/post/78
• Rather than creating usable tools for scientists and engineers, often what is created are shiny toys with little practical use. Instead, as one of our collaborators Russ Taylor at UNC so aptly put it, we could use a lot more basic "indoor plumbing" to complement our bleeding-edge zero-G toilets with the latest bells and whistles.
2008 Rejected Proposal DOE Office of Science
• “CMake - The Next Generation Petascale Build Tool”
• Reviewers that wanted zero-G toilets
– “The proposed method is an evolutionary development beyond current state of the art. There seems to be very little novelty in the proposed approach.”
– “Of the four components of the petascale functionality only two are new, e.g., something beyond what is available with GNU make.”
– “If the DOE HPC centers feel the need for such an extended version of Cmake and Ctest it should probably be obtained in a acquisition process from this and/or competing vendors.”
Reviewers that wanted indoor plumbing
• “This is a good proposal to enhance CMake for petascale systems and to perform community outreach on it.”
• “The approach is very appropriate”
• “This proposal addresses the often-overlooked but important area of Application Build Tools. .... In large HPC software projects, which have to be portable to many complex systems, the build infrastructure and process are labor-intensive and often frustrating. The proposed changes seem to have reasonably low risk and significant benefit.”
Investment in Infrastructure has a huge payoff
• Allows your people to focus on the science
• Allows the “outsourcing” of cross platform maintenance
• Software Engineering is a “Solved Problem”?
Kitware’s view on Multi-Core
• Huge opportunity
• Different Platforms and Cross Compiling
• We have the expertise to help others achieve goals
• People will want our packages (ITK, VTK, ParaView and CMake) to take advantage of multi-core, and will collaborate with us and fund us to develop them
• Investment in open-source packages has a huge global impact