Variety, Velocity and Volume

Meeting the performance challenges of Big Data in the enterprise

Matthew Aslett
Senior analyst, enterprise software
[email protected]

© 2011 by The 451 Group. All rights reserved

Post on 22-Aug-2020


Big Data, Total Data

What is it?

Current data management trends

What technologies are involved?

When to use them

The drivers behind emerging technology choices

451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.

The 451 Group

Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.

The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.

TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.

ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.

Coverage areas

Matthew Aslett Senior analyst, enterprise software

With The 451 Group since 2007

[email protected]

www.twitter.com/maslett

Commercial Adoption of Open Source (CAOS)

• Adoption by enterprises

• Adoption by vendors

Information Management

• Database

• Data warehousing

• Data caching

Data Management Trends

Current data management trends

The volume, variety and velocity of data are growing rapidly

Data processing capabilities have never been better

The value of data has never been better understood

The data deluge problem is also a big data opportunity

RISK OPPORTUNITY

What is Big Data?

More than just rising data volumes

Big Data ≠ Volume

What is Big Data?

Also variety of data types/sources and velocity of data updates

Big Data = Volume ± Variety ± Velocity

Current data management trends

The volume, variety and velocity of data are growing rapidly

Data processing capabilities have never been better

The value of data has never been better understood

‘Big Data’ covers a diverse set of products that can be applied to different problems

‘Big Data’ highlights the problem – volume/variety/velocity, and promises a solution – value, but doesn’t provide a path in between

RISK OPPORTUNITY

Total Data

What is Total Data?

Not just another name for Big Data

A concept defined by The 451 Group to describe new approaches to data management – beyond restrictive silos

Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques

Inspired by ‘Total Football’

What is Total Data?

Also the desire of the user to store and process all their data

Value = (Volume ± Variety ± Velocity) x Totality

What is Total Data?

Within tolerable time frames

Value = (Volume ± Variety ± Velocity) x Totality

[Figure labels: Time; Data volume/variety/velocity; Rate of query; Value of data]

What is Total Data?

Within tolerable time frames

Value = (Volume ± Variety ± Velocity) x Totality

Time

Total Data is making the most efficient use of existing and new data management resources to deliver value from data

The technologies deployed depend on which factor is most significant to the problem and the nature of the query

Technology choices

Application stack

[Diagram labels: Users, Application, Database, Hardware]

Traditional scalability

[Diagram labels: Users ×3, Application ×3, Database ×1, Hardware ×1]

Commodity hardware

[Diagram labels: Users ×3, Application ×3, Database ×1, Hardware ×8]

User explosion

[Diagram labels: Users ×8, Application ×3, Database ×1, Hardware ×8]

Application scalability

[Diagram labels: Users ×8, Application ×6, Database ×1, Hardware ×8]

Data management use cases

Database

Operational

Analytic

Data management use cases

Data management

real-time transaction and data ingestion

large scale data storage and analysis

Data management requirements

real-time transaction and data ingestion

large scale data storage and analysis

Data ingestion/analysis

• random reads and writes

• real-time

• low, predictable latency

• high performance

Data storage/analysis

• lower-cost storage

• large-scale analytics

• read heavy

• batch processing

Data management requirements

real-time transaction and data ingestion

large scale data storage and analysis

Data ingestion/analysis

• MySQL

• Data caching

• NoSQL, NewSQL databases

• Stream processing

Data storage/analysis

• Data warehouse/marts

• In-database analytics

• Online repository

• Hadoop

Emerging scalability

[Diagram labels: Users ×8, Application ×6, Hardware ×8, Data ingestion/analysis ×6, Data storage/analysis ×8]

Database SPRAIN

Scalability: Hardware economics

• NoSQL, NewSQL databases, Hadoop, cloud computing

Performance: Database limitations

• Data caching, in-memory, virtual machine, stream/event processing

Relaxed consistency: CAP Theorem

• NoSQL databases, cloud computing

Agility: Polyglot persistence

• Agile development, schema-free, non-relational, in-memory

Intricacy: Big data, total data

• Non-relational database, NoSQL, Hadoop, in-database analytics, memory storage

Necessity: Open source

• The failure of incumbent vendors to address emerging requirements

Emerging scalability

[Diagram labels: Users ×8, Application ×6, Hardware ×8, Data ingestion/analysis ×6, Data storage/analysis ×8, Virtual machine ×2]

Relevant reports

Total Data

Explaining the total data management approach to dealing with the impact of big data on the data management landscape

Coming late 2011

Including the growing Hadoop ecosystem and real-time

[email protected]

COMING LATE 2011

Gil Tene

CTO, Azul Systems

Enterprise Big Data

Java, Building Blocks and Performance Impacts

©2011 Azul Systems, Inc.

What Azul does for the Big Data space

• We provide Java runtimes that scale consistently

• We eliminate the Garbage Collection problem

• Our runtimes are elastic in nature

• Our runtimes are an ideal building block for Big Data

Many technologies support Big Data

• Operational databases / data warehouses/marts

• Data integration / data virtualization

• Business intelligence tools

• Hadoop / equivalent / alternative technology

• Relational / non-relational databases

• In-memory databases

• Data caching

• Stream / event processing

• In-database analytics

• Disk / memory storage

Value and infrastructure

Value inflection points, Data

[Figure labels: Data Value; Volume, Velocity, Variety; Value of data; Time]

Value = (Volume ± Variety ± Velocity) x Totality

Volume / Velocity / Variety: translation to underlying capacity metrics

• Higher Volume = larger data set sizes, amounts of state

• Higher Velocity = higher processing rate, throughput

• Higher Variety = more metadata, indexes, cross-linking

Timeliness

Inflection points – Timeliness

[Figure labels: Data Value; 1/t; Value of data; Time]

Value = (Volume ± Variety ± Velocity) x Totality

Real-world example of timeliness: affecting shopping decisions

I went [online] shopping for a camping trip last week

• Spent several hundred dollars: tent, sleeping bags, other items.

• Researched and shopped around for hours, across multiple sites

• Relevant ads that showed up within my decision window affected my shopping

• Ads that showed up a day later did not…

Big Data’s value

real-time transaction and data ingestion

large scale data storage and analysis

Interactive application

• random reads and writes

• real-time

• low, predictable latency

• high performance

Data storage/analysis

• lower-cost storage

• large-scale analytics

• read heavy

• batch processing

Big Data’s value: enhanced by timeliness

real-time transaction and data ingestion

large scale data storage and analysis

Interactive application

• random reads and writes

• real-time

• low, predictable latency

• high performance

Data storage/analysis

• lower-cost storage

• large-scale analytics

• read heavy

• batch processing?

[Figure labels: Increased Value; time scale of 5 seconds, 5 minutes, 5 hours, 1 day, 1 week]
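To make the time scale concrete, here is a minimal sketch that assumes the value of an answer falls off as 1/t, an assumption borrowed from the "1/t" label on the earlier timeliness slide. The deck gives no actual curve, so the function name `relative_value` and the 5-second baseline are purely illustrative.

```python
# Assumption: value is inversely proportional to delay (1/t).
# The slides only show a "1/t" axis label; no real curve is given.
DELAYS_SECONDS = {
    "5 seconds": 5,
    "5 minutes": 5 * 60,
    "5 hours": 5 * 60 * 60,
    "1 day": 24 * 60 * 60,
    "1 week": 7 * 24 * 60 * 60,
}


def relative_value(delay_seconds, baseline_seconds=5):
    """Value of a result relative to one delivered after baseline_seconds."""
    return baseline_seconds / delay_seconds


if __name__ == "__main__":
    for label, seconds in DELAYS_SECONDS.items():
        print(f"{label:>10}: {relative_value(seconds):.6f}")
```

Under this assumption a result delivered after 5 minutes retains one sixtieth of the value of one delivered after 5 seconds, which is one way to read the question mark against batch processing.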

Building blocks

Choice of building block size

Building block granularity

Inflection points – building block size

[Figure labels: Data Value; Building Block Size; Value of data; Time]

Value = (Volume ± Variety ± Velocity) x Totality

Building blocks

• There are value inflection points to building block size

─ At some size levels, orders-of-magnitude improvements occur

─ E.g. when entire index fits in each replica/memory space

─ E.g. when partition sizes are big enough

• Typical physical building block

─ 12-24 core, dual-socket servers

─ 24-96 GB DRAM

• Typical process level building block

─ A Java Virtual Machine (JVM)

─ 1-4 GB of memory, 1-2 cores

• Why the mismatch?

Productivity with a scale challenge

• Variety of languages and technologies in Big Data

• Java/JVM is the enterprise default, highly productive

• But…

• Building blocks are practically limited in size

• Main reason for the limitation: Garbage Collection

• Main Garbage Collection issue: GC pause times

• Imposes practical limits due to stability/responsiveness

• Physical building blocks broken into tens of tiny pieces
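The mismatch can be checked with simple arithmetic on the figures from the building blocks slide (24-96 GB of DRAM per server, 1-4 GB per JVM). The sketch below is illustrative only: the function name `jvms_per_server` is hypothetical, and it ignores operating system and off-heap overhead.

```python
def jvms_per_server(server_gb, jvm_gb):
    """How many JVM-sized pieces fit into one server's memory.

    Assumption: memory is the only constraint; OS and off-heap
    overhead are ignored.
    """
    return server_gb // jvm_gb


if __name__ == "__main__":
    # Server and JVM sizes are the ranges quoted on the slide.
    for server_gb in (24, 96):
        for jvm_gb in (1, 4):
            count = jvms_per_server(server_gb, jvm_gb)
            print(f"{server_gb} GB server / {jvm_gb} GB JVM = {count} instances")
```

Across these ranges the count runs from 6 to 96 instances per server, which is the fragmentation the slide describes.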

Critical infrastructure units

Critical unit example - HDFS

Source: Apache Hadoop documentation

Critical Infrastructure units

Size of critical infrastructure units drives cluster scale

• Central metadata nodes

• Central in-memory index nodes

• Central authoritative data nodes

• Graph-DB and dense relationship nodes

Summary: Azul’s Impact on Big Data

Break the building block size barrier

• Each instance can elastically fill an entire physical unit

• Expose/leverage value inflection points

• Remove/Expand cluster scale limitations

• Reduce/Consolidate instance counts

• Address tuning challenges

• Ensure predictable performance

For More Information:

Azul Systems Web site: www.azulsystems.com

Technical resources: ./resources

Zing trial: ./trial

451 Group: www.the451group.com

blogs.the451group.com/information_management/

Webinar replay: www.azulsystems.com/resources/webinars