Spark Integration Into an Enterprise Stack: Open Source Successes & Challenges

Upload: lilia

Post on 25-Feb-2016



Page 1: Spark  Integration Into an Enterprise Stack

Spark Integration Into an Enterprise Stack
Open Source Successes & Challenges

Page 2: Spark  Integration Into an Enterprise Stack

Shark Integration: Challenges and Lessons Learnt

Zen-Empiricist
Director of WANdisco Bigdata Engineering
– In charge of delivering company’s enterprise grade NonStop Hadoop solution
ASF Hadoop, MRUnit committer
ASF Bigtop’s co-author
Spark/Shark contributor
Apply with caution: highly abrasive (according to most, now former, managers)


Konstantin (Cos) Boudnik
About the presenter

Page 3: Spark  Integration Into an Enterprise Stack


Most apparent characteristics:
– Fail-fast on your own dime
– Hard or impossible to control by authority (!)
– Resistant to political correctness bias (aka political bulls#$t)
– Creates huge competitive advantage

Resulting in:
– Highly successful projects
– Innovations up to the limit
– Technologically disruptive
– Rules the world (once matured)

Empirical evidence:
– Everything on the planet is “Powered by Linux”
– “Bad” news: Android market share will never double again
– Firefox is THE web-browser of the world

I ran out of the slide space and my time slot is limited...


Anarchy: ἀν + ἀρχός (an + arkhos) without a ruler
Open-source is a force of natural evolution

Page 4: Spark  Integration Into an Enterprise Stack


Open => anyone can do what they’re most interested in doing
Innovative => creates formats & standards as it goes; abandons them in passing
Stable => we’ll fix it in the next release
Backward compatible => we might break it, but we’ll fix it
Fault tolerant and, at least, highly available => if you configure the hell out of it
Configuration management => shell scripts or Python to generate configuration
Deployment management (packages and Puppet) => here’s your tarball
Supported (there’s a throat to choke) => “Gone fishing!”
Secure => a million eyeballs will find all your bugs in no time
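The “Configuration management” line above describes configuration produced by ad hoc scripts. As a rough sketch of what such a script tends to look like, assuming the Hadoop-style site-file layout (`<configuration>` containing `<property>` entries with `<name>` and `<value>`): the function name `render_site_xml` and the host name are hypothetical and do not come from the talk.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    # Sort keys so repeated runs produce identical, diff-friendly output.
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical per-environment values; only the property names are standard Hadoop keys.
settings = {
    "fs.defaultFS": "hdfs://namenode.example.com:8020",
    "dfs.replication": 3,
}
site_xml = render_site_xml(settings)
```

A script like this is quick to write, which is the point of the slide: it does nothing about change control, validation, or rollout, which is what the enterprise side means by the same term.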


I am not bashing open source: it is my bread & butter
What “open source” oftentimes is

Page 5: Spark  Integration Into an Enterprise Stack


Compatible with standards, scalable
Stable: feature set, release schedules, bug fixes, upgrades
Backward compatible with itself
Fault tolerant and, at least, highly available
Configuration management (you know your environment)
Deployment management (packages and Puppet)
Supported (there’s a throat to choke)
Secure
… and more


What “enterprise grade” really is
Let’s call a spade a spade

Page 6: Spark  Integration Into an Enterprise Stack


The devil is in the details
The goals are aligned. How about semantics?

Characteristic | Open Source | Enterprise
Open | Agile | Compatible with standards
Stable | Bugs get fixed; “works for me” | RHEL: not a single change since 1867
Innovative | We have all cool features | NaN
Backward compatible | Easy upgrade to next release; fixed on “trunk” | Year 2013: we have to run on JDK1.3
Fault tolerant & HA | Let’s restart damn thing | $100m/min in downtime costs
Configuration Mgmt | A script, or sketchy docs | Change of control, puppet, etc.
Deployment Mgmt | A tarball | Staging environments, long upgrade paths
Supported | mailto:[email protected] | A throat to choke

Page 7: Spark  Integration Into an Enterprise Stack


OpenJDK 7
– Guess what? Not everybody is in love with Larry Ellison

Hive 0.11’ish
– It is 3 light years ahead of Hive 0.9 and 5 light years behind an enterprise grade

Spark 0.8
– Hello Apache Incubation!

Shark 0.8’ish


What we have built
Case study: major telecom SI

Page 8: Spark  Integration Into an Enterprise Stack


What it implies for development and customers alike
What does the stack look like?

Page 9: Spark  Integration Into an Enterprise Stack


Memory leaks: JobConf held by ThreadLocal
Fixes that span multiple components
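The slide names this leak only in its title. To illustrate the general pattern (a heavyweight per-job object parked in thread-local storage on a long-lived pooled thread), here is a Python analogue built on `threading.local`. It is not the Hadoop or Shark code, and the source does not describe the fix that was actually applied; `JobConfLike`, `run_job`, and `demo` are hypothetical names used only for this sketch.

```python
import gc
import threading
import weakref
from concurrent.futures import ThreadPoolExecutor


class JobConfLike:
    """Hypothetical stand-in for a heavyweight per-job configuration object."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.payload = bytearray(1024 * 1024)  # pretend this is expensive state


# Per-thread storage, playing the role of a JVM ThreadLocal.
_local = threading.local()


def run_job(job_id, cleanup):
    """Run one job on a pooled thread and return a weak reference to its conf."""
    conf = JobConfLike(job_id)
    _local.conf = conf  # parked on the worker thread while the job runs
    try:
        return weakref.ref(conf)  # real job work would happen here
    finally:
        if cleanup:
            # Explicit cleanup, similar in spirit to Java's ThreadLocal.remove().
            del _local.conf


def demo():
    """Return (leaked_alive, cleaned_alive) for a single long-lived worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        leaked = pool.submit(run_job, 1, cleanup=False).result()
        gc.collect()
        leaked_alive = leaked() is not None  # still pinned by the idle worker

        cleaned = pool.submit(run_job, 2, cleanup=True).result()
        gc.collect()
        cleaned_alive = cleaned() is not None  # released once the job cleaned up
    return leaked_alive, cleaned_alive
```

Calling `demo()` shows the difference: without cleanup the object outlives its job for as long as the pooled thread stays alive, which is why this kind of fix tends to touch every component that shares the thread pool.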

Page 10: Spark  Integration Into an Enterprise Stack


それが何を意味している (Japanese: “What does that mean?”)


Semantic and toolset barriers between JVM languages
What does it mean?

Page 11: Spark  Integration Into an Enterprise Stack


Upstream components live their own lives oftentimes
Unsynchronized release trains

Page 12: Spark  Integration Into an Enterprise Stack


I want everything on the menu! NOW!
Impatient Customers

Page 13: Spark  Integration Into an Enterprise Stack


“Hold my beer!” (Famous last words)
What else can possibly go wrong?

Page 14: Spark  Integration Into an Enterprise Stack


Proper system integration
– Git & well-thought branching model
– ASF Bigtop as the integration point

Close collaboration with open source community
– All fixes and features are offered to appropriate projects; most are accepted

Tireless and careful back-porting
Continuous Integration and Delivery
Simplifying development where possible
– Switch from “org.apache.hive” to “edu.berkeley.cs.shark”
– Keep your version control system open

Education and expectations management
– “released” in open-source does not always mean “usable in the datacenter”


“What to do, what to do?” (r. Bender)
Lessons learnt & principles applied

Page 15: Spark  Integration Into an Enterprise Stack

Contact: Samantha Leggat | t: 925.396.1194 | [email protected]

WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583

Thank you
[email protected]
@c0sin