Crawl, Walk, Run: How to Get Started with Hadoop


Upload: inside-analysis

Post on 16-Jul-2015


Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

Crawl, Walk, Run: How to Get Started with Hadoop

Twitter Tag: #briefr The Briefing Room

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Topics

This Month: HADOOP ECOSYSTEM

February: DATA IN MOTION

January: ANALYTICS

The Up Sides of Disruption

….Splice Machine?

Analyst: William McKnight

William is President of McKnight Consulting Group. His clients have included 17 of the Global 2000, and many have gone public with their success stories. His team's implementations have won multiple Best Practices awards. William is an Entrepreneur of the Year Finalist, a frequent best practices judge and an expert witness. He has hundreds of articles and dozens of white papers in publication. He has also given numerous keynote presentations worldwide at major conferences, along with hundreds of public seminars and webinars. William’s experience ranges from taking his company to placement on the Inc. 500 and the Dallas 100 to selling a multi-million-dollar consulting firm. He is a passionate communicator and motivator, and a former IT VP of a Fortune 50 company.

Splice Machine

•  Splice Machine is a SQL-on-Hadoop database

•  The product is ACID-compliant and can power both OLAP and OLTP workloads

•  Splice Machine is built on Java-based Apache Derby and HBase/Hadoop
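As a rough illustration of what ACID compliance means for an OLTP workload, the sketch below uses Python's built-in sqlite3 module as a stand-in. It does not use Splice Machine or its API, and the `accounts` table, the account names, and the `transfer` helper are hypothetical. The point is only that a transfer which fails partway is rolled back as a whole.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: both updates above are undone together.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` here returns False and leaves both balances unchanged, which is the atomicity property the slide refers to.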

Guest: Rich Reimer

Rich Reimer, VP of Marketing and Product Management

Rich has over 15 years of sales, marketing and management experience in high-tech companies. Before joining Splice Machine, Rich worked at Zynga as the Treasure Isle studio head, where he used petabytes of data from millions of daily users to optimize the business in real time. Prior to Zynga, he was the COO and co-founder of a social media platform named Grouply. Before founding Grouply, Rich held executive positions at Siebel Systems, Blue Martini Software and Oracle Corporation as well as sales and marketing positions at General Electric and Bell Atlantic.

Perceptions & Questions

Analyst: William McKnight

Source: Intel

WHAT HAPPENS IN AN INTERNET MINUTE

FUELED BY DISRUPTIVE TECHNOLOGY FACTORS

Social Media

Cloud Computing

Mobile

Internet of Things

Big Data is the next Natural Resource

“We have for the first time an economy based on a key resource (Information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.”

- John Naisbitt

Transactional & Application Data
•  Volume
•  Structured
•  Throughput

Machine Data
•  Velocity
•  Structured
•  Ingestion

Social Data
•  Variety
•  Unstructured
•  Veracity

Enterprise Content
•  Variety
•  Unstructured
•  Volume

BIG DATA IS ADDITIVE TO EXISTING DATA

IF THIS WERE EASY, EVERYONE WOULD ALREADY BE LEVERAGING BIG DATA

“Big Data offers big business gains but hidden costs and complexity present barriers that most organizations will struggle with”

- The Cost of Big Data, Eric Savitz, Forbes 5/2012

•  Big data skills are in short supply

•  Custom built solutions lack integrated management

•  Companies need to get used to the open source nature of the software that is enhanced by committers

•  Requires integration effort within the existing analytic ecosystem

•  Big data will be less valuable per capita than other data

“What best describes your firm’s current usage/plans to adopt big data technologies and solutions?”

[Bar chart comparing high-performance organizations (>15% growth, N = 58) with the rest of organizations (<15% growth, N = 482) across four responses: planning to implement in more than 1 year; planning to implement in the next 12 months; implemented, not expanding; expanding/upgrading implementation. Values shown on the chart, in extraction order: 14%, 19%, 3%, 8%, 7%, 7%, 21%, 13%.]

Average performers are thinking about big data

Top performers are expanding their big data implementations

Source: Forrsights Strategy Spotlight: Business Intelligence And Big Data, Q4 2012 (603 global decision-makers involved in business intelligence, data management, and governance initiatives)

TOP PERFORMERS (GREATER THAN 15% ANNUAL GROWTH) REALIZE THEY NEED MORE

VEHICLES FOR BIG DATA

[Diagram spanning Last Year, This Year, and Next Year. Labeled components: Data Warehouse; Regional and Departmental Views; ADS; Applications & Engines; Operational Analytics & Hot Views; Data Marts (Independent, Dependent); Relational Data; Conformed Dimensions.]

THE EVER-EXPANDING DATA WAREHOUSE

•  Enterprise Data Warehouse users face huge annual upgrade expenses

•  To avoid this spend, organizations are looking for lower-cost alternatives

•  Moving data to tape is undesirable because the data goes offline and is unavailable for analytics

•  Moving infrequently used data to Hadoop is a cost-effective, online option that preserves the ability to query
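The archival idea in the bullets above can be sketched as a simple partition of warehouse tables by how recently they were accessed. This is a minimal illustration in plain Python, not a Hadoop or data warehouse API; the record fields, the table names, the `split_by_access_age` helper, and the 365-day threshold are all assumptions made for the example.

```python
from datetime import date, timedelta


def split_by_access_age(rows, today, max_idle_days=365):
    """Partition rows into (hot, cold) by days since last access.

    Rows accessed within `max_idle_days` stay hot (kept in the warehouse);
    the rest are cold (candidates for moving to a lower-cost online store).
    """
    cutoff = today - timedelta(days=max_idle_days)
    hot = [r for r in rows if r["last_accessed"] >= cutoff]
    cold = [r for r in rows if r["last_accessed"] < cutoff]
    return hot, cold


# Hypothetical access metadata for three warehouse tables.
rows = [
    {"table": "orders_2015", "last_accessed": date(2015, 6, 30)},
    {"table": "orders_2012", "last_accessed": date(2013, 1, 15)},
    {"table": "orders_2010", "last_accessed": date(2011, 3, 2)},
]

hot, cold = split_by_access_age(rows, today=date(2015, 7, 16))
```

In practice the threshold would be driven by query patterns and storage cost rather than a fixed number of days; the sketch only shows the selection step, not the data movement itself.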

Cost

QUESTIONS FOR SPLICE MACHINE

On the slide with the sad people overwhelming their RDBMS… how do we know when scale up has become cost prohibitive?

What data should get moved to the data warehouses and data marts and what data is fine left in the data lake?

Isn’t SQL-on-Hadoop SQL on HDFS?

How is Splice Machine, as a SQL-on-Hadoop solution, giving the ‘best of both worlds’?

How do you get data with schema into the flat files of HDFS without ‘data page’ style formatting?

Is the best advantage of SQL-on-Hadoop having the full transformation capabilities of ETL or ELT on the data?

Is a data lake the best ‘on-ramp’ to big data or is data archival off RDBMS?

Upcoming Topics

www.insideanalysis.com

This Month: HADOOP ECOSYSTEM

February: DATA IN MOTION

January: ANALYTICS

THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons and Wikipedia