
Upload: romeo-kienzler

Post on 30-Oct-2014


© 2012 IBM Corporation

Slide 1

Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel

Romeo Kienzler

Data Scientist and Architect, Pos. Graduate in Information Systems and Bioinformatics

IBM Innovation Center Zurich [email protected] https://www.ibm.com/developerworks/mydeveloperworks/profiles/user/RomeoKienzler

Slide 2

Outline

● Data Growth

● Data Growth in Life Science

● BigData in Life Science

● How to address BigData?

● Outlook

Slide 3

Data Growth

[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS; the gap is labeled "Missed opportunity"]

100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 e-mails passed through the net, 80 % of them spam and viruses. => Filtering is becoming more and more important.

As much data was produced up to 2003 as between 2003 and now.

Slide 4

New Data Sources in Life Sciences

● DNA (RNA) Sequencing

● Next-Generation Sequencing

● DNA Transistor

● Imaging and Video

● Unstructured Text

Slide 5

Data Growth in Life Sciences

Source: www.osehra.org

Slide 6

Data Growth in Life Sciences

Source: www.crops.org

Slide 7

Examples - NGS

Slide 8

Images and Videos

Source: www.phys.org

Slide 9

Examples – Text Analytics

Source: www.theglobalistreport.com

Slide 10

Watson

Slide 11

SIIB (Strategic IP Insight Platform)

Integrated chemical, biological and textual search

Deep analytics on scientific literature and patents

Aggregation of world wide Patent Data and scientific literature (30M+ docs) with ongoing updates

Slide 12

The challenge

● Store a huge amount of data

● Process a huge amount of data (incl. Search/Find)

● Don't consume too much energy

Slide 13

Use many Hard Drives

Slide 14

Use many Hard Drives

Slide 15

Use many Hard Drives - Limits

(*) Given a Disk Capacity of 25TB

300 Crashes per Day, Data Loss after two weeks
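
The arithmetic behind limits of this kind can be sketched in a few lines of Python. This is only an illustration: the archive size and the annual failure rate below are hypothetical placeholders, not the inputs behind the slide's figures, and failures are assumed to be independent and spread evenly over the year. Only the 25 TB disk capacity comes from the footnote above.

```python
TB = 10**12  # bytes per terabyte (decimal)


def drives_needed(total_bytes, drive_capacity_bytes):
    """Number of drives required to hold total_bytes (rounded up)."""
    return (total_bytes + drive_capacity_bytes - 1) // drive_capacity_bytes


def expected_failures_per_day(n_drives, annual_failure_rate):
    """Average drive failures per day, assuming independent failures
    spread evenly over a 365-day year."""
    return n_drives * annual_failure_rate / 365.0


# Hypothetical inputs: a 1 EB archive and a 3 % annual failure rate.
archive_bytes = 10**18
drives = drives_needed(archive_bytes, 25 * TB)
per_day = expected_failures_per_day(drives, 0.03)

print(drives)             # 40000
print(round(per_day, 1))  # 3.3
```

The expected number of failures grows linearly with the number of drives, so any storage system of this scale has to treat a failing drive as a routine event rather than an exception.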

Slide 16

Separate the Signal From the Noise¹

¹http://www.ibmsystemsmag.com/power/businessstrategy/BI-and-Analytics/signal_noise/

Slide 17

Store only what you need

Slide 18

Use many CPUs

Supercomputer before

➔ Weather

➔ Atom Bombs

➔ Science

➔ Crash Tests

Supercomputer in a Rack

➔ 18 TB Main Memory, 1008 CPU Cores, 113 TFLOPS (for comparison, no. 1 on the TOP500 list: 17590 TFLOPS in 2013, 71 TFLOPS in 2004)
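
Spreading work over many CPU cores can be illustrated on a single machine. The following Python sketch counts G and C bases in a list of reads, either serially or with a pool of worker processes. The function names and the choice of GC counting as the workload are illustrative assumptions, not something taken from the talk.

```python
from concurrent.futures import ProcessPoolExecutor


def count_gc(sequence):
    """Count G and C bases in one DNA sequence (case-insensitive)."""
    s = sequence.upper()
    return s.count("G") + s.count("C")


def total_gc_serial(sequences):
    """Sum the GC counts of all sequences on a single core."""
    return sum(count_gc(s) for s in sequences)


def total_gc_parallel(sequences, workers=None, chunksize=1000):
    """Sum the GC counts using a pool of worker processes.

    workers=None lets the pool pick a default based on the machine's
    core count; chunksize controls how many sequences are sent to a
    worker at a time.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_gc, sequences, chunksize=chunksize))
```

Both functions return the same total; for the made-up reads `["ACGT", "GGCC", "ATAT", "gcgc"]` that total is 10. The parallel variant should be called from under an `if __name__ == "__main__":` guard, and it only pays off when the input is large enough to outweigh the cost of shipping data to the worker processes.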

Slide 19

Slide 20

Use specialized CPUs: GPUs

Source: www.ethz.ch

Source: www.nvidia.com

Slide 21

Use specialized CPUs: FPGAs

Source: www.virtex.com

Slide 22

Example FPGA: IBM PureData

● Up to 1.28 PB Storage

● Up to 10 Racks

● Up to 500 GB/s Throughput

● Up to 1120 FPGA + 1120 Intel CPU Cores / 960 Hard Drives

Slide 23

Example FPGA: Convey Computer

● Accelerates BWA by 15x

● Accelerates Smith-Waterman

Source: www.conveycomputer.com
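
For readers unfamiliar with Smith-Waterman, the sketch below shows the textbook dynamic-programming recurrence for local alignment in plain Python. It only computes the best local alignment score with a linear gap penalty. The scoring parameters are illustrative defaults, and the code says nothing about how an FPGA implementation is organized.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Uses the classic recurrence with a linear gap penalty and keeps
    only two rows of the scoring matrix in memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # 8
```

Every cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That regular structure is what makes the algorithm a common target for hardware acceleration.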

Slide 24

Example: Algorithms

Source: www.biomedcentral.com/1471-2105/9/S2/S10

Slide 25

Example: Cloud

● Managed Infrastructure

● Dynamic Provisioning

● Specialized HW

● SaaS

Source: www.basespace.illumina.com

Slide 26

Conclusion

● Main BigData Sources are Sequences and Plain Text

● Many others to come (e.g. Images and Videos)

● Store Data on many Commodity Hard Drives (Energy Problem not solved)

● Filter Signal from Noise

● Process Data on many CPUs

● Usage of specialized Hardware / CPUs

● Research in performance of algorithms

Slide 27

Outlook

● Currently very heterogeneous infrastructures

● Trends:

    ● Virtualization

    ● Standardization

    ● Consumerization

● Limits

    ● Space

    ● Energy consumption

● What shall I do?

    ● RELAX

Slide 28

The future will be full of surprises

A battery-powered, pocket-size supercomputer?

Raspberry Pi

Parallella

Slide 29

Acknowledgements

Slides 14 – 16 & 21 were taken from a keynote speech by Axel Köster, IBM Germany