2013: trends from the trenches


Upload: chris-dagdigian

Post on 25-May-2015


DESCRIPTION

Slides from the 2013 "trends talk" as delivered annually at Bio-IT World Boston.

TRANSCRIPT

Page 1: 2013: Trends from the Trenches


Trends from the trenches. 2013 Bio IT World - Boston

Page 2: 2013: Trends from the Trenches


Some less aspirational title slides ...

Page 3: 2013: Trends from the Trenches


Trends from the trenches. 2013 Bio IT World Boston

Page 4: 2013: Trends from the Trenches


Trends from the trenches. 2013 Bio IT World Boston

Page 5: 2013: Trends from the Trenches


I’m Chris.

I’m an infrastructure geek.

I work for the BioTeam.

www.bioteam.net - Twitter: @chris_dag

Page 6: 2013: Trends from the Trenches

Who, What, Why ...


BioTeam

‣ Independent consulting shop

‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done

‣ 10+ years bridging the “gap” between science, IT & high performance computing

Page 7: 2013: Trends from the Trenches

Apologies in advance


If you have not heard me speak ...

‣ “Infamous” for speaking very fast and carrying a huge slide deck

• ~70 slides for 25 minutes about average for me

• Let me mention what happened after my Pharma HPC best practices talk yesterday ...

By the time you see this slide I’ll be on my ~4th espresso

Page 8: 2013: Trends from the Trenches


Why I do this talk every year ...

‣ BioTeam works for everyone

• Pharma, Biotech, EDU, Nonprofit, .Gov, etc.

‣ We get to see how groups of smart people approach similar problems

‣ We can speak honestly & objectively about what we see “in the real world”

Page 9: 2013: Trends from the Trenches

Listen to me at your own risk


Standard Dag Disclaimer

‣ I’m not an expert, pundit, visionary or “thought leader”

‣ Any career success entirely due to shamelessly copying what actual smart people do

‣ I’m biased, burnt-out & cynical

‣ Filter my words accordingly

Page 10: 2013: Trends from the Trenches


So why are you here? And before 9am!

Page 11: 2013: Trends from the Trenches


It’s a risky time to be doing Bio-IT

Page 12: 2013: Trends from the Trenches


Big Picture / Meta Issue

‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed

• Example: CCD sensor upgrade on that confocal microscopy rig just doubled storage requirements

• Example: The 2D ultrasound imager is now a 3D imager

• Example: Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs

‣ For the above examples, do you think IT was informed in advance?

Page 13: 2013: Trends from the Trenches

Science progressing way faster than IT can refresh/change

The Central Problem Is ...

‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure

• Bench science is changing month-to-month ...

• ... while our IT infrastructure only gets refreshed every 2-7 years

‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)


Page 14: 2013: Trends from the Trenches

The Central Problem Is ...

‣ The easy period is over

‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary

‣ That does not work any more; real solutions required


Page 15: 2013: Trends from the Trenches


The new normal.

Page 16: 2013: Trends from the Trenches

And a related problem ...

‣ It has never been easier or cheaper to acquire vast amounts of data

‣ Growth rate of data creation/ingest exceeds rate at which the storage industry is improving disk capacity

‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers

• ... ideally without punching holes in your firewall or consuming all available internet bandwidth
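A rough way to see why data movement is its own problem: estimate how long a dataset takes to cross a given link. This is a back-of-the-envelope sketch, not a benchmark. The function name, the 100 TB dataset size and the 70% link efficiency are all assumptions made for illustration.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move `terabytes` of data over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate that the transfer
    actually sustains; real values depend on protocol, latency and tuning.
    """
    bits = terabytes * 1e12 * 8                 # decimal terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency   # sustained bits per second
    return bits / usable_bps / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB dataset over three common link speeds.
    for gbps in (0.1, 1, 10):
        print(f"100 TB over {gbps} Gbps: {transfer_hours(100, gbps):.1f} hours")
```

Even at a full 10 Gbps with no overhead, 100 TB is most of a day on the wire, which is why sharing data between sites needs planning and not just a bigger pipe.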


Page 17: 2013: Trends from the Trenches

If you get it wrong ...

‣ Lost opportunity

‣ Missing capability

‣ Frustrated & very vocal scientific staff

‣ Problems in recruiting, retention, publication & product development


Page 18: 2013: Trends from the Trenches


Enough groundwork. Let’s Talk Trends*

Page 19: 2013: Trends from the Trenches


Topic: DevOps & Org Charts

Page 20: 2013: Trends from the Trenches


The social contract between scientist and IT is changing forever

Page 21: 2013: Trends from the Trenches


You can blame “the cloud” for this

Page 22: 2013: Trends from the Trenches


DevOps & Scriptable Everything

‣ On (real) clouds, EVERYTHING has an API

‣ If it’s got an API you can automate and orchestrate it

‣ “scriptable datacenters” are now a very real thing

Page 23: 2013: Trends from the Trenches


DevOps & Scriptable Everything

‣ Incredible innovation in the past few years

‣ Driven mainly by companies with massive internet ‘fleets’ to manage

‣ ... but the benefits trickle down to us little people

Page 24: 2013: Trends from the Trenches


DevOps will conquer the enterprise

‣ Over the past few years cloud automation/orchestration methods have been trickling down into our local infrastructures

‣ This will have significant impact on careers, job descriptions and org charts

Page 25: 2013: Trends from the Trenches

2013: Continue to blur the lines between all these roles


Scientist/SysAdmin/Programmer

‣ Radical change in how IT is provisioned, delivered, managed & supported

• Technology Driver: Virtualization & Cloud

• Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation

‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant

www.opscode.com
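The core idea behind configuration management can be sketched in a few lines: describe the desired state, compare it with the actual state, and apply only the difference, so that running it a second time changes nothing. This is a toy illustration in plain Python, not the API of Chef or any other real tool. The `converge` function and the service names are hypothetical.

```python
def converge(desired: dict, actual: dict) -> list:
    """Bring `actual` in line with `desired` and return the actions taken.

    Both arguments map a resource name to its state. `actual` is modified
    in place. A second call with the same `desired` returns an empty list,
    which is the idempotence property these tools rely on.
    """
    actions = []
    # Create or correct anything that differs from the desired state.
    for name, state in desired.items():
        if actual.get(name) != state:
            actions.append(f"set {name} -> {state}")
            actual[name] = state
    # Remove anything that is present but not wanted.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"remove {name}")
        del actual[name]
    return actions


if __name__ == "__main__":
    node = {"ntp": "stopped", "telnet": "running"}       # hypothetical node state
    wanted = {"ntp": "running", "scheduler": "installed"}
    print(converge(wanted, node))   # first run applies changes
    print(converge(wanted, node))   # second run has nothing to do
```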

Page 26: 2013: Trends from the Trenches

2013: Continue to blur the lines between all these roles


Scientist/SysAdmin/Programmer

‣ When everything has an API ...

‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely

‣ And by the way ...

‣ The APIs (‘knobs & buttons’) are accessible to all, not just the bearded practitioners sitting in that room next to the datacenter

Page 27: 2013: Trends from the Trenches

2013: Continue to blur the lines between all these roles


Scientist/SysAdmin/Programmer

‣ IT jobs, roles and responsibilities are going to change significantly

‣ SysAdmins must learn to program in order to harness automation tools

‣ Programmers & Scientists can now self-provision and control sophisticated IT resources

Page 28: 2013: Trends from the Trenches

2013: Continue to blur the lines between all these roles


Scientist/SysAdmin/Programmer

‣ My take on the future ...

• SysAdmins (Windows & Linux) who can’t code will have career issues

• Far more control is going into the hands of the research end user

• IT support roles will radically change -- no longer owners or gatekeepers

‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices

‣ Research will control the “what”, “when” and “how big”

Page 29: 2013: Trends from the Trenches


Topic: Facility Observations

Page 30: 2013: Trends from the Trenches


Facility 1: Enterprise vs Shadow IT

‣ Marked difference in the types of facilities we’ve been working in

‣ Discovery Research systems are firmly embedded in the enterprise datacenter

‣ ... moving away from “wild west” unchaperoned locations and mini-facilities

Page 31: 2013: Trends from the Trenches


Facility 2: Colo Suites for R&D

‣ Marked increase in use of commercial colocation facilities for R&D systems

• And they’ve noticed!

- Markley Group (One Summer) has a booth

- Sabey is on this afternoon’s NYGenome panel

‣ Potential reasons:

• Expensive to build high-density hosting at small scale

• Easier metro networking to link remote users/sites

• Direct connect to cloud provider(s)

• High-speed research nets only a cross-connect away

Page 32: 2013: Trends from the Trenches


Facility 3: Some really old stuff ...

‣ Final facility observation

‣ Average age of infrastructure we work on seems to be increasing

‣ ... very few aggressive 2-year refresh cycles these days

‣ Potential reasons

• Recession & consolidation still affecting or deferring major technology upgrades and changes

• Cloud: local upgrades deferred pending strategic cloud decisions

• Cloud: economic analysis showing stark truth that local setups need to be run efficiently and at high utilization in order to justify existence
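The utilization argument comes down to simple arithmetic: a local cluster costs the same whether it is busy or idle, so the cost per core-hour actually used rises as utilization falls. A minimal sketch follows. Every number in it is a made-up placeholder, not a quote from any vendor or cloud provider, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def local_cost_per_core_hour(capex: float, annual_opex: float, cores: int,
                             years: float, utilization: float) -> float:
    """Cost of each core-hour actually used on a local cluster."""
    total_cost = capex + annual_opex * years
    used_core_hours = cores * HOURS_PER_YEAR * years * utilization
    return total_cost / used_core_hours


def break_even_utilization(capex: float, annual_opex: float, cores: int,
                           years: float, cloud_price: float) -> float:
    """Utilization at which local cost per used core-hour equals `cloud_price`.

    A result above 1.0 means the local system cannot match the cloud price
    even when fully busy.
    """
    total_cost = capex + annual_opex * years
    return total_cost / (cores * HOURS_PER_YEAR * years * cloud_price)


if __name__ == "__main__":
    # Hypothetical 512-core cluster kept for 4 years.
    args = dict(capex=400_000, annual_opex=60_000, cores=512, years=4)
    for util in (0.2, 0.5, 0.9):
        cost = local_cost_per_core_hour(utilization=util, **args)
        print(f"utilization {util:.0%}: ${cost:.3f} per core-hour")
    print(f"break-even vs $0.10/core-hour: {break_even_utilization(cloud_price=0.10, **args):.0%}")
```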

Page 33: 2013: Trends from the Trenches


Facility 3: Virtualization

‣ Every HPC environment we’ve worked on since 2011 has included (or plans to include) a local virtualization environment

• True for big systems: 2k cores / 2 petabyte disk

• True for small systems: 96 core CompChem cluster

‣ Unlikely to change; too many advantages

Page 34: 2013: Trends from the Trenches


Facility 3: Virtualization

‣ HPC + Virtualization solves a lot of problems

• Deals with valid biz/scientific need for researchers to run/own/manage their own servers ‘near’ HPC stack

‣ Solves a ton of research IT support issues

• Or at least leaves us a clear boundary line

‣ Lets us obtain useful “cloud” features without choking on endless BS shoveled at us by “private cloud” vendors

• Example: Server Catalogs + Self-service Provisioning

Page 35: 2013: Trends from the Trenches


Topic: Compute

Page 36: 2013: Trends from the Trenches


Compute:

‣ Still feels like a solved problem in 2013

‣ Compute power is a commodity

• Inexpensive relative to other costs

• Far less vendor differentiation than storage

• Easy to acquire; easy to deploy

Page 37: 2013: Trends from the Trenches

Fat nodes are wiping out small and midsized clusters


Compute: Fat Nodes

‣ This box has 64 CPU Cores

• ... and up to 1TB of RAM

‣ Fantastic Genomics/Chemistry system

• A 256GB RAM version only costs $13,000*

‣ BioIT Homework:

• Go visit the Silicon Mechanics booth and find out the current cost of a box with 1TB RAM

Page 38: 2013: Trends from the Trenches

Possibly the most significant ’13 compute trend


Page 39: 2013: Trends from the Trenches

Defensive hedge against Big Data / HDFS


Compute: Local Disk is Back

‣ We’ve started to see organizations move away from blade servers and 1U pizza box enclosures for HPC

‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available

‣ Why? Hadoop & Big Data

‣ This is a defensive hedge against future HDFS or similar requirements

• Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play.

‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count

Page 40: 2013: Trends from the Trenches


Topic: Network

Page 41: 2013: Trends from the Trenches


Network:

‣ 10 Gigabit Ethernet still the standard

• ... although not as pervasive as I predicted in prior trend talks

‣ Non-Cisco options attractive

• BioIT homework: listen to the Arista talks and visit their booth.

‣ SDN still more hype than reality in our market

• May not see it until next round of large private cloud rollouts or new facility construction (if at all)

Page 42: 2013: Trends from the Trenches


Network:

‣ Infiniband for message passing in decline

• Still see it for comp chem, modeling & structure work; Started building such a system last week

• Still see it for parallel and clustered storage

• Decline seems to match decreasing popularity of MPI for latest generation of informatics and ‘omics tools

‣ Hadoop / HDFS seems to favor throughput and bandwidth over latency

Page 43: 2013: Trends from the Trenches


Topic: Storage

Page 44: 2013: Trends from the Trenches


Storage

‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments

‣ Most of my slides for last year’s trends talk focused on storage & data lifecycle issues

• Check http://slideshare.net/chrisdag/ if you want to see what I’ve said in the past

• Dag accuracy check: It was great yesterday to see DataDirect talking about the KVM hypervisor running on their storage shelves! I’m convinced more and more apps will run directly on storage in the future

‣ ... not doing that this year. The core problems and common approaches are largely unchanged and don’t need to be restated

Page 45: 2013: Trends from the Trenches


It’s 2013, we know what questions to ask of our storage

Page 46: 2013: Trends from the Trenches


Data like this lets us make realistic capacity planning and purchase decisions

NGS new data generation: 6-month window
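With a measured window of new-data generation in hand, the simplest planning question is how long the remaining capacity will last. A minimal sketch, assuming a flat average ingest rate taken from the observed window. The monthly figures and capacities below are invented for illustration and are not the data shown on the slide.

```python
def months_until_full(monthly_new_tb: list, used_tb: float, capacity_tb: float) -> float:
    """Months of headroom left at the average observed ingest rate.

    `monthly_new_tb` holds the new data generated in each observed month.
    Returns infinity if the observed average is zero or negative.
    """
    rate = sum(monthly_new_tb) / len(monthly_new_tb)   # average TB per month
    if rate <= 0:
        return float("inf")
    free_tb = max(capacity_tb - used_tb, 0)
    return free_tb / rate


if __name__ == "__main__":
    # Hypothetical 6-month window of NGS output, in TB per month.
    window = [10, 12, 8, 10, 11, 9]
    print(f"{months_until_full(window, used_tb=400, capacity_tb=520):.1f} months of headroom")
```

A flat average is the most forgiving assumption. If the window shows month-over-month growth, the real headroom is shorter.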

Page 47: 2013: Trends from the Trenches


Storage: 2013

‣ Advice: Stay on top of the “compute nodes with many disks” trends.

‣ HDFS, if suddenly required by your scientists, can be painful to deploy in a standard scale-out NAS environment

Page 48: 2013: Trends from the Trenches


Storage: 2013

‣ Object Storage is getting interesting

Page 49: 2013: Trends from the Trenches

Object Storage + Commodity Disk Pods


Storage: 2013

‣ Object storage is far more approachable

• ... used to see it in proprietary solutions for specific niche needs

• potentially on its way to the mainstream now

‣ Why?

• Benefits are compelling across a wide variety of interesting use cases

• Amazon S3 showed what a globe-spanning general purpose object store could do; this is starting to convince developers & ISVs to modify their software to support it

• www.swiftstack.com and others are making local object stores easy, inexpensive and approachable on commodity gear

• Most of your Tier1 storage and server vendors have a fully supported object store stack they can sell to you (or simply enable in a product you already have deployed in-house)

Page 50: 2013: Trends from the Trenches


Remember this disruptive technology example from last year?

Page 51: 2013: Trends from the Trenches


100 Terabytes for $12,000 (more info: http://biote.am/8p )

Page 52: 2013: Trends from the Trenches


Storage: 2013

‣ There are MANY reasons why you should not build that $12K backblaze pod

• ... done wrong you will potentially inconvenience researchers, lose critical scientific information and (probably) lose your job

‣ Inexpensive or open source object storage software makes the ultra-cheap storage pod concept viable

Page 53: 2013: Trends from the Trenches


Storage: 2013

‣ A single unit like this is risky and should only be used for well known and scoped use cases. Risks generally outweigh the disruptive price advantage

‣ However ...

‣ What if you had 3+ of these units running an object store stack with automatic triple location replication, recovery and self-healing?

• Then things get interesting

• This is one of the ‘lab’ projects I hope to work on in ’13
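The price of that safety is easy to work out. A rough sketch, assuming each pod holds 100 TB raw for $12,000 (the figure quoted earlier, treated here as raw capacity, which is an assumption) and that every object is stored three times. It ignores filesystem overhead, networking, power and staff time, and the function name is hypothetical.

```python
def usable_capacity_and_cost(pods: int, raw_tb_per_pod: float = 100.0,
                             cost_per_pod: float = 12_000.0, replicas: int = 3):
    """Return (usable TB, dollars per usable TB) for a replicated pod cluster."""
    if pods < replicas:
        raise ValueError("need at least one pod per replica")
    usable_tb = pods * raw_tb_per_pod / replicas   # each object is stored `replicas` times
    cost_per_usable_tb = pods * cost_per_pod / usable_tb
    return usable_tb, cost_per_usable_tb


if __name__ == "__main__":
    for n in (3, 6, 9):
        tb, dollars = usable_capacity_and_cost(n)
        print(f"{n} pods: {tb:.0f} TB usable at ${dollars:.0f} per usable TB")
```

Under these assumptions triple replication triples the hardware cost per usable terabyte, from $120 to $360, regardless of how many pods are deployed.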

Page 54: 2013: Trends from the Trenches


Storage: 2013

‣ Caveat/Warning

• The 2013 editions of “backblaze-like” enclosures mitigate many of the earlier availability, operational and reliability concerns

• Still an aggressive play that carries risk in exchange for a disruptive price point

‣ There is a middle ground

• Lots of action in the ZFS space with safer & more mainstream enclosures

• BioIT Homework: Visit the Silicon Mechanics booth and check out what they are doing with Nexenta’s Open Storage stuff.

Page 55: 2013: Trends from the Trenches


Topic: Cloud

Page 56: 2013: Trends from the Trenches


Can you do a Bio-IT talk without using the ‘C’ word?

Page 57: 2013: Trends from the Trenches


Cloud: 2013

‣ Our core advice remains the same

‣ What’s changed

Page 58: 2013: Trends from the Trenches

Core Advice


Cloud: 2013

‣ Research Organizations need a cloud strategy today

• Those that don’t will be bypassed by frustrated users

‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy

Page 59: 2013: Trends from the Trenches

Design Patterns


Cloud Advice

‣ You actually need three tested cloud design patterns:

‣ (1) To handle ‘legacy’ scientific apps & workflows

‣ (2) The special stuff that is worth re-architecting

‣ (3) Hadoop & big data analytics

Page 60: 2013: Trends from the Trenches

Legacy HPC on the Cloud


Cloud Advice

‣ MIT StarCluster

• http://web.mit.edu/star/cluster/

‣ This is your baseline

‣ Extend as needed

Page 61: 2013: Trends from the Trenches

“Cloudy” HPC


Cloud Advice

‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver

‣ This is where you have the most freedom

‣ Many published best practices you can borrow

‣ Warning: Cloud vendor lock-in potential is strongest here

Page 62: 2013: Trends from the Trenches

What you need to know


Hadoop & “Big Data”

‣ “Hadoop” and “Big Data” are now general terms

‣ You need to drill down to find out what people actually mean

‣ We are still in the period where senior leadership may demand “Hadoop” or “BigData” capability without any actual business or scientific need

Page 63: 2013: Trends from the Trenches

What you need to know


Hadoop & “Big Data”

‣ In broad terms you can break “Big Data” down into two very basic use cases:

1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The google search term here is “map reduce”

2. Data Stores: Hadoop is driving the development of very sophisticated “no-SQL” “non-Relational” databases and data query engines. The google search terms include “nosql”, “couchdb”, “hive”, “pig” & “mongodb”, etc.

‣ Your job is to figure out which type applies for the groups requesting “Hadoop” or “BigData” capability
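For anyone who has not met the "map reduce" pattern from the compute use case, here is the shape of it in miniature: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is a single-process toy in plain Python that counts k-mers in sequencing reads. It does not use Hadoop or any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_kmers(read: str, k: int):
    """Map step: emit (k-mer, 1) for every k-length window in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse one key's values into a single count."""
    return key, sum(values)


def count_kmers(reads, k: int) -> dict:
    """Run map, shuffle and reduce in sequence over all reads."""
    pairs = (pair for read in reads for pair in map_kmers(read, k))
    return dict(reduce_counts(key, vals) for key, vals in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_kmers(["ACGT", "CGTA"], k=2))
```

On a real cluster the map and reduce steps run in parallel on many nodes against data held on local disk, which is where the "compute nodes with many disks" trend comes from.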

Page 64: 2013: Trends from the Trenches

What has changed ...

Cloud: 2013

‣ Let’s revisit some of my bile from prior years

‣ “... private clouds: still utter crap”

‣ “... some AWS competitors are delusional pretenders”

‣ “... AWS has a multi-year lead on the competition”

Page 65: 2013: Trends from the Trenches

Private Clouds in 2013:

‣ I’m no longer dismissing them as “utter crap”

‣ Usable & useful in certain situations

‣ BioTeam positive experiences with OpenStack

‣ Hype vs. Reality ratio still wacky

‣ Sensible only for certain shops

• Have you seen what you have to do to your networks & gear?

‣ Still important to remain cynical and perform proper due diligence

Page 66: 2013: Trends from the Trenches

Non-AWS IaaS in 2013

‣ Three main drivers for BioTeam’s evolving IaaS practices and thinking for 2013:

‣ (1) Real world success with OpenStack & BT

‣ (2) Real world success with Google Compute

‣ (3) Real world multi-cloud DevOps

‣ Just to remain honest though:

• AWS still has multi-year lead in product, service and features

• ... and many novel capabilities

• But some of the competition has some interesting benefits that AWS can’t match

Page 67: 2013: Trends from the Trenches

BioTeam, BT & OpenStack

‣ We’ve been working with BT for a while now on various projects

‣ BT Cloud using OpenStack under the hood with some really nice architecture and operational features

‣ BioTeam developed a Chef-based HPC clustering stack and other tools that are currently being used by BT customers

• ... some of whom have spoken openly at this meeting

Page 68: 2013: Trends from the Trenches

BioTeam & Google Compute Engine

‣ We can’t even get into the preview program

‣ But one of our customers did

‣ ... and we’ve been able to do some successful and interesting stuff

• Without changing operations or DevOps tools our client is capable of running both on AWS and Google Compute

• For this client and a few other use cases we believe we can span both clouds or construct architectures that would enable fast and relatively friction-free transitions
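One common way to keep the tooling unchanged across clouds is to write it against a small provider interface and plug in one adapter per cloud. The sketch below shows only that pattern. It is not how the client's setup actually works, and the in-memory provider is a stand-in for real adapters: it calls no AWS, Google or OpenStack API. All class and function names are hypothetical.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """The minimal interface the cluster tooling is written against."""

    @abstractmethod
    def launch(self, name: str, cores: int) -> str:
        """Start one instance and return its identifier."""


class InMemoryProvider(CloudProvider):
    """Stand-in adapter that only records what would have been launched."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.instances = {}

    def launch(self, name: str, cores: int) -> str:
        instance_id = f"{self.prefix}-{len(self.instances) + 1}"
        self.instances[instance_id] = {"name": name, "cores": cores}
        return instance_id


def build_cluster(provider: CloudProvider, workers: int, cores_per_node: int = 8) -> list:
    """Provider-agnostic tooling: one head node plus `workers` worker nodes."""
    ids = [provider.launch("head", cores_per_node)]
    ids += [provider.launch(f"worker{i}", cores_per_node) for i in range(1, workers + 1)]
    return ids


if __name__ == "__main__":
    # The same build_cluster call runs unchanged against either stand-in.
    for provider in (InMemoryProvider("cloud-a"), InMemoryProvider("cloud-b")):
        print(build_cluster(provider, workers=2))
```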

Page 69: 2013: Trends from the Trenches

Wrapping up ...

Chef, AWS, OpenStack & Google

‣ 2012 was the 1st year we did real work spanning multiple IaaS cloud platforms or at least replicating workloads on multiple platforms

‣ We’ve learned a lot - I think this may result in some interesting talks at next year’s Bio-IT meeting

- By BioTeam and actual end-users

‣ What makes this all possible is the DevOps / Orchestration stuff mentioned at the beginning of this presentation.

Page 70: 2013: Trends from the Trenches


end; Thanks!

Slides: http://slideshare.net/chrisdag/