Automating Big Data with the Automic Hadoop Agent

Automic World 2015: Automating Big Data with the Hadoop Agent. Dave Kellermanns, Chief Automation Architect

Upload: automic-software

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Automating Big Data with the Automic Hadoop Agent

Automic World 2015

Automating Big Data with the Hadoop Agent

Dave Kellermanns, Chief Automation Architect

Page 2: Automating Big Data with the Automic Hadoop Agent

2 Property of Automic Software. All rights reserved

Page 3: Automating Big Data with the Automic Hadoop Agent


What is Big Data?

Every day, we create 2.5 quintillion bytes of data (a quintillion has 18 zeroes).

That is so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. Connecting all of these sources together is called the “Internet of Things”. The data itself is called

BIG DATA

Source: Forbes.com

Page 4: Automating Big Data with the Automic Hadoop Agent


Think you can avoid Big Data?

The Big Data technology and services market represents a fast-growing multibillion-dollar worldwide opportunity [...] that will grow at a 26.4% compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market [...]

IDC - 2015
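The growth figures in the IDC quote can be sanity-checked with the compound-growth formula, end = start × (1 + rate)^years. The sketch below is illustrative only: the slide does not state the forecast window, so `ASSUMED_YEARS` is an assumption, and the function name is made up for this example.

```python
# Compound annual growth: end_value = start_value * (1 + rate) ** years

def implied_start_value(end_value: float, rate: float, years: int) -> float:
    """Return the value `years` earlier, given a constant annual growth rate."""
    return end_value / (1.0 + rate) ** years

BIG_DATA_CAGR = 0.264    # 26.4% per year, from the IDC quote
MARKET_2018_BN = 41.5    # billions of USD in 2018, from the IDC quote
ASSUMED_YEARS = 4        # ASSUMPTION: the forecast window is not given on the slide

start = implied_start_value(MARKET_2018_BN, BIG_DATA_CAGR, ASSUMED_YEARS)

# "About six times the growth rate of the overall IT market" implies roughly:
overall_it_growth = BIG_DATA_CAGR / 6

print(f"Implied market size {ASSUMED_YEARS} years earlier: ${start:.1f}bn")
print(f"Implied overall IT market growth: {overall_it_growth:.1%}")
```

Changing `ASSUMED_YEARS` shows how sensitive the implied starting point is to the length of the forecast window.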

Page 5: Automating Big Data with the Automic Hadoop Agent


Why are companies focused on Big Data right now?

• Make better, more quantitative decisions

• Reach new levels of profits, efficiently

• Predict with unprecedented accuracy to influence business outcomes

• Deliver highly personalized customer experiences at massive scale

• Make new discoveries using massive amounts of data

• Recognize new revenue streams from digital exhaust

Page 6: Automating Big Data with the Automic Hadoop Agent


Where does Big Data fit into the Enterprise?

Page 7: Automating Big Data with the Automic Hadoop Agent


Simplifying user experience and procedures

• Big Data technologies must be integrated with more traditional data systems and sources

• Efficient Dev-Test-Prod change control needs to be implemented end-to-end

• Administration, development, operations, and analytics all need tools tailored to their roles to maximize

• Automation is a core requirement for making these complex systems accessible. It has to be easy to use and customizable

Page 8: Automating Big Data with the Automic Hadoop Agent


A conflict in the skillset of analysts vs data engineers

People running the data platform

<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-add-partition-searchevents-wf">
  <start to="hive-add-partition-searchevents" />
  <action name="hive-add-partition-searchevents" retry-max="1" retry-interval="1">
    <hive xmlns="uri:oozie:hive-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      ... ...
      <script>add_partition_hive_searchevents_script.q</script>
      <param>YEAR=${YEAR}</param>
      <param>MONTH=${MONTH}</param>
      <param>DAY=${DAY}</param>
      <param>HOUR=${HOUR}</param>
    </hive>
    <ok to="end" />
    <error to="fail" />
  </action>

<bundle-app name='BundleApp-LoadAndIndexTopCustomerQueries' xmlns='uri:oozie:bundle:0.2'>
  <controls>
    <kick-off-time>${jobStart}</kick-off-time>
  </controls>
  <coordinator name='CoordApp-LoadCustomerQueries' >
    <app-path>${coordAppPathLoadCustomerQueries}</app-path>
  </coordinator>
  <coordinator name='CoordApp-IndexTopQueriesES' >
    <app-path>${coordAppPathIndexTopQueriesES}</app-path>
  </coordinator>
</bundle-app>
....
<coordinator-app name="CoordApp-LoadCustomerQueries" frequency="${coord:days(1)}" start="${jobStart}" end="${jobEnd}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  ...
  <action>
    <workflow>
      <app-path>${workflowRoot}/hive-action-load-customerqueries.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
...
<coordinator-app name="CoordApp-IndexTopQueriesES" frequency="${coord:days(1)}" start="${jobStartIndex}" end="${jobEnd}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  ...
  <action>
    <workflow>

Automic helps to bridge the gap between the skillsets of the people who need the tool and the skillsets required to run the tool

People wanting data

Page 9: Automating Big Data with the Automic Hadoop Agent


Hadoop Open Source

“The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.”

“Open source as a development model promotes a universal access via a free license to a product's design or blueprint, and universal redistribution of that design or blueprint, including subsequent improvements to it by anyone”

Page 10: Automating Big Data with the Automic Hadoop Agent


Many people work on Hadoop

Page 11: Automating Big Data with the Automic Hadoop Agent


3 Releases of the Hadoop Platform

Page 12: Automating Big Data with the Automic Hadoop Agent


New capabilities keep on coming

Page 13: Automating Big Data with the Automic Hadoop Agent


APIs do change constantly

Page 14: Automating Big Data with the Automic Hadoop Agent


Configuration & Objects

Page 15: Automating Big Data with the Automic Hadoop Agent


Proven value for Data Automation

Improve Decisions

Business & Operational Intelligence

Data Warehousing

Big Data

Call centre performance

Hadoop Big Data automation

Data ingestion across IaaS

Fast Cognos Analytics delivery

POS data mining, ETL & MFT

Page 16: Automating Big Data with the Automic Hadoop Agent


Proven Value for Data Automation

Self-service platform for data scientists

We use Automic in our data center to define dependencies between various jobs between our data center and the cloud, and run them as ‘process flows’.

Automic ensures that the right data is delivered on time to Data Scientists. This requires approximately 6,000 jobs per day.

Ashi Sheth, Manager of Enterprise Services, Netflix
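The idea of defining dependencies between jobs and running them as a process flow can be sketched as a dependency graph that is resolved into an execution order. This is a minimal illustration using Python's standard library, not Automic's implementation; the job names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key maps to the set of jobs it depends on.
dependencies = {
    "ingest_datacenter_logs": set(),
    "ingest_cloud_events": set(),
    "join_and_clean": {"ingest_datacenter_logs", "ingest_cloud_events"},
    "publish_to_data_scientists": {"join_and_clean"},
}

def execution_order(deps):
    """Return the jobs in an order where every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

A scheduler would additionally handle timing, retries and failures; the sketch only covers the ordering step.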

Page 17: Automating Big Data with the Automic Hadoop Agent


Business Benefit to Netflix: To “Give Viewers What They Want”

• >50m subscribers
• >40 countries

Collect hundreds of terabytes of data daily (petabyte-scale)

Platform Engineers

… build templates and workflows using ONE Automation

… enable data scientists to perform all kinds of ad hoc analysis without having to deal with the complexity of the underlying data infrastructure

Data Scientists

… perform data-driven experiments and tests on a daily basis using Automic … and many other tools

Recommendation Engine

… to improve the quality of recommendations

… resulting in happy customers!

Page 18: Automating Big Data with the Automic Hadoop Agent


eBay relies on Automic

If Automic goes down, eBay loses 70% of its web traffic to Amazon

– Automic automates Hadoop for eBay, which provides all of their business intelligence for optimized SEO

– Automic moves data, schedules the MapReduce jobs, schedules the analytics and then pushes the output to Google

Page 19: Automating Big Data with the Automic Hadoop Agent


Automating eBay Data Warehouse Platforms

eBay DW environment

Teradata:
– Mozart: 2.6 PB used / 6.6 PB total
– Martini: 1.4 PB used / 8.5 PB total
– EDW concurrent queries: 500+

Singularity (eBay-specific TD):
– Vivaldi: 9.5 PB used / 16.9 PB total
– Davinci: 2.5 PB used / 3.4 PB total
– SG concurrent queries: 100+

Hadoop:
– Hadoop total: 71.5 PB used / 91.9 PB total
– Hadoop Ares: 29.5 PB used / 41.4 PB total
– Hadoop Apollo: 32.2 PB used / 37.8 PB total
– Hadoop Artemis: 9.8 PB used / 11.9 PB total
– Hadoop concurrent jobs running: 1000+

Source: http://www.slideshare.net/madananil/hadoop-at-ebay
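To compare how full each platform is, the used and total storage figures above can be turned into utilization ratios. The snippet is a small worked calculation on the slide's numbers, with all values taken to be in petabytes; the dictionary layout and function name are just for this illustration.

```python
# (used, total) storage per platform, in PB, as listed on the slide.
platforms = {
    "Teradata Mozart": (2.6, 6.6),
    "Teradata Martini": (1.4, 8.5),
    "Singularity Vivaldi": (9.5, 16.9),
    "Singularity Davinci": (2.5, 3.4),
    "Hadoop Ares": (29.5, 41.4),
    "Hadoop Apollo": (32.2, 37.8),
    "Hadoop Artemis": (9.8, 11.9),
    "Hadoop total": (71.5, 91.9),
}

def utilization(used: float, total: float) -> float:
    """Fraction of total storage that is in use."""
    return used / total

for name, (used, total) in platforms.items():
    print(f"{name:<20} {utilization(used, total):6.1%}")
```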

Page 20: Automating Big Data with the Automic Hadoop Agent


Automic’s Value to Big Data

• We help our customers get out of the scripting business by using Hadoop templates to abstract the APIs away from the user

• Current functionality can be extended by Automic and users alike and in turn distributed via Automic’s Marketplace, so there is no need to wait for vendors to catch up and release a new Agent for new APIs (think Falcon, Ranger, Knox, Ambari, Cloudbreak, etc.)

• Automic and its objects are distribution-agnostic: templates work with Hortonworks, Cloudera and MapR, and they can even help you transition between Hadoop distributions
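What "abstracting the APIs from the user" could look like in practice: a template function takes a few parameters and emits the kind of workflow XML shown on the skillset slide, so the analyst never edits XML by hand. This is a hedged sketch and not how Automic templates are implemented; the function name and the date values are hypothetical, and the output mirrors only the elements visible on that slide, so it is not validated against the Oozie schema.

```python
import xml.etree.ElementTree as ET

def hive_partition_workflow(name: str, script: str, params: dict) -> str:
    """Build an Oozie-style workflow document containing one Hive action."""
    wf = ET.Element(
        "workflow-app",
        {"xmlns": "uri:oozie:workflow:0.4", "name": f"{name}-wf"},
    )
    ET.SubElement(wf, "start", {"to": name})
    action = ET.SubElement(wf, "action", {"name": name})
    hive = ET.SubElement(action, "hive", {"xmlns": "uri:oozie:hive-action:0.4"})
    # Cluster endpoints stay as placeholders resolved at submission time.
    ET.SubElement(hive, "job-tracker").text = "${jobTracker}"
    ET.SubElement(hive, "name-node").text = "${nameNode}"
    ET.SubElement(hive, "script").text = script
    for key, value in params.items():
        ET.SubElement(hive, "param").text = f"{key}={value}"
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    return ET.tostring(wf, encoding="unicode")

# The user supplies only the values that change from run to run.
workflow_xml = hive_partition_workflow(
    name="hive-add-partition-searchevents",
    script="add_partition_hive_searchevents_script.q",
    params={"YEAR": "2015", "MONTH": "06", "DAY": "01", "HOUR": "09"},
)
print(workflow_xml)
```

Keeping the distribution-specific details inside the template function is what would let the same call work unchanged if the underlying platform were swapped.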

Page 21: Automating Big Data with the Automic Hadoop Agent


Contact

Dave Kellermanns, Chief Automation Architect

[email protected]

+1 (720) 440-2838

Page 22: Automating Big Data with the Automic Hadoop Agent

Thank you!