© 2012 IBM Corporation
Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel
Romeo Kienzler
Data Scientist and Architect, Postgraduate in Information Systems and Bioinformatics
IBM Innovation Center Zurich
https://www.ibm.com/developerworks/mydeveloperworks/profiles/user/RomeoKienzler
Outline
● Data Growth
● Data Growth in Life Science
● BigData in Life Science
● How to address BigData?
● Outlook
Data Growth
[Figure labels: Data AVAILABLE to an organization; data an organization can PROCESS; Missed opportunity]
100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 e-mails passed through the net, 80 % of them spam and viruses. => Filtering is becoming more and more important.
As much data was produced up to 2003 as between 2003 and now.
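To put the figures above on a common scale, the following sketch converts them into rates. It assumes a 365-day year and assumes the 80 % spam share applies to the quoted 247 x 10^9 e-mails; the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def per_second(count, period_seconds):
    """Average event rate for `count` events spread over `period_seconds`."""
    return count / period_seconds


# Figures quoted on the slide.
tweets_per_second = per_second(100e6, SECONDS_PER_DAY)
texts_per_second = per_second(6.1e12, SECONDS_PER_YEAR)
video_hours_per_day = 35 * 60 * 24        # 35 hours uploaded every minute
spam_emails = 0.80 * 247e9                # assumed: 80 % of the quoted e-mails

print(f"tweets per second:        {tweets_per_second:,.0f}")
print(f"text messages per second: {texts_per_second:,.0f}")
print(f"video hours per day:      {video_hours_per_day:,}")
print(f"spam/virus e-mails:       {spam_emails:.3e}")
```

On these assumptions, only about one in five of the quoted e-mails is worth keeping, which is the point behind the filtering remark.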
New Data Sources in Life Sciences
● DNA (RNA) Sequencing
● Next-Generation Sequencing
● DNA Transistor
● Imaging and Video
● Unstructured Text
SIIP (Strategic IP Insight Platform)
Integrated chemical, biological and textual search
Deep analytics on scientific literature and patents
Aggregation of worldwide patent data and scientific literature (30M+ documents) with ongoing updates
The challenge
● Store a huge amount of data
● Process a huge amount of data (incl. Search/Find)
● Don't consume too much energy
Use many Hard Drives - Limits
(*) Given a Disk Capacity of 25TB
300 Crashes per Day, Data Loss after two weeks
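The crash figure on this slide depends on inputs that are not part of this transcript (fleet size and per-drive failure rate). The sketch below only shows how such an estimate is commonly formed: expected failures per day are roughly the number of drives times the annualized failure rate divided by 365. The 1 EB data volume and the 3 % failure rate are hypothetical values chosen for illustration, not numbers from the talk; only the 25 TB drive capacity is taken from the slide.

```python
def drives_needed(total_bytes, drive_bytes):
    """Number of drives required to hold `total_bytes` (integer ceiling)."""
    return -(-total_bytes // drive_bytes)


def expected_failures_per_day(n_drives, annual_failure_rate):
    """Expected drive failures per day, assuming a constant failure rate."""
    return n_drives * annual_failure_rate / 365


def prob_any_failure(n_drives, annual_failure_rate, days):
    """P(at least one drive fails within `days`), assuming independent drives."""
    daily_failure_prob = annual_failure_rate / 365
    return 1 - (1 - daily_failure_prob) ** (n_drives * days)


TB = 10**12
total_data = 10**18            # hypothetical: 1 EB to store
drive_capacity = 25 * TB       # capacity given on the slide
afr = 0.03                     # hypothetical annualized failure rate

n = drives_needed(total_data, drive_capacity)
print(f"drives needed:              {n}")
print(f"expected failures per day:  {expected_failures_per_day(n, afr):.2f}")
print(f"P(any failure within 1 day): {prob_any_failure(n, afr, 1):.3f}")
```

The takeaway is independent of the exact inputs: once the fleet is large enough, drive failure is a daily event, so redundancy has to be designed in rather than treated as an exception.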
Separate the Signal From the Noise¹
¹http://www.ibmsystemsmag.com/power/businessstrategy/BI-and-Analytics/signal_noise/
Use many CPUs
Supercomputers in the past
➔ Weather
➔ Atom Bombs
➔ Science
➔ Crash Tests
Supercomputer in a Rack
➔ 18 TB main memory, 1008 CPU cores, 113 TFLOPS (for comparison, 1st place on the TOP500 list: 17590 TFLOPS in 2013, 71 TFLOPS in 2004)
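Using many CPUs usually means splitting the input into independent chunks, processing each chunk separately, and combining the partial results. The sketch below illustrates that pattern by counting G and C bases in a DNA string. It uses `ThreadPoolExecutor` from Python's standard library so that it runs anywhere; for CPU-bound work on many cores, `ProcessPoolExecutor` offers the same `map` interface. The function names and the GC-content task are illustrative assumptions, not taken from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(seq, chunk_size):
    """Cut `seq` into consecutive pieces of at most `chunk_size` characters."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


def count_gc(chunk):
    """Work done independently per chunk: count G and C bases."""
    return sum(1 for base in chunk if base in "GC")


def gc_fraction(seq, chunk_size=4, workers=4):
    """Fraction of G/C bases, computed chunk by chunk and then combined."""
    if not seq:
        return 0.0
    chunks = split_into_chunks(seq, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    # Combine step: partial counts simply add up.
    return sum(partial_counts) / len(seq)


print(gc_fraction("GGCCAATT"))
```

The pattern works here because the chunks do not depend on each other; algorithms that need information across chunk boundaries require extra care when parallelized.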
Example FPGA: IBM PureData
● Up to 1.28 PB storage
● Up to 10 racks
● Up to 500 GB/s throughput
● Up to 1120 FPGAs + 1120 Intel CPU cores / 960 hard drives
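As a rough feel for what these upper bounds mean together, the sketch below computes how long one full pass over the maximum storage would take at the maximum quoted throughput. It assumes decimal units (1 PB = 10^15 bytes) and that both "up to" figures hold at the same time; the 150 MB/s single-drive rate is a hypothetical comparison value, not a number from the talk.

```python
GB = 10**9
MB = 10**6


def full_scan_seconds(capacity_bytes, throughput_bytes_per_s):
    """Time to read every stored byte once at a sustained throughput."""
    return capacity_bytes / throughput_bytes_per_s


capacity = 1280 * 10**12  # 1.28 PB, decimal units assumed

appliance_seconds = full_scan_seconds(capacity, 500 * GB)
single_drive_seconds = full_scan_seconds(capacity, 150 * MB)  # hypothetical drive

print(f"appliance:    {appliance_seconds / 60:.1f} minutes")
print(f"single drive: {single_drive_seconds / 86400:.1f} days")
```

Under these assumptions a full scan takes roughly 43 minutes on the appliance, against roughly 99 days for a single drive reading sequentially.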
Example FPGA: Convey Computer
● Accelerates BWA by 15x
● Accelerates Smith-Waterman
Source: www.conveycomputer.com
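For readers unfamiliar with Smith-Waterman, here is a minimal software sketch of the local-alignment score computation. It is a plain reference version with a linear gap penalty, not the FPGA implementation referred to above; the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings `a` and `b`.

    Linear gap penalty; only the previous and current rows of the
    dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The running time grows with the product of the two sequence lengths, which is why this algorithm is a common target for hardware acceleration.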
Example: Cloud
● Managed Infrastructure
● Dynamic Provisioning
● Specialized HW
● SaaS
Source: www.basespace.illumina.com
Conclusion
● Main BigData Sources are Sequences and Plain Text
● Many others to come (e.g. Images and Videos)
● Store Data on many Commodity Hard Drives (Energy Problem not solved)
● Filter Signal from Noise
● Process Data on many CPUs
● Use specialized Hardware / CPUs
● Research into the performance of algorithms
Outlook
● Currently very heterogeneous infrastructures
● Trends:
  ● Virtualization
  ● Standardization
  ● Consumerization
● Limits
  ● Space
  ● Energy consumption
● What shall I do?
  ● RELAX
The future will be full of surprises
A battery-powered, pocket-size supercomputer?
Raspberry Pi
Parallella