Introduction to Apache Accumulo

Post on 19-Jul-2015

© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0

How to use this presentation

• Covered topics: Accumulo architecture, operational maintenance, fault handling

• Intended audience: developers, support staff, and PMs who are conversant in multi-component systems, e.g. those involved in web services.

• Presumes familiarity with RDBMS

• Expected running time: 40 - 60 minutes

• License: CC-BY-SA 2.0

• Please let me know if you find it useful and what it could use: busbey@cloudera.com

Introduction to Apache Accumulo

Scaling a web application made easier

Sean Busbey // Software Engineer

Let’s talk about Apache Accumulo…

But in the context of a specific use case

• I really like technology that solves a problem.

• Keep in mind that this won’t be exhaustive.

• YMMV; proofs of concept with metrics are better than slides.

Who am I?

• Apache Accumulo PMC

• Apache HBase committer

• Software Engineer on Cloudera’s storage team

That is to say, I work for a vendor and no longer have operational scale problems of my own.

We’ll focus on an application that enables conversations centered on cute cats.

Simple sharing model built with privacy controls

• User defines a group that may see their posting

• User posts a picture to a given group

• Members of the group may write short messages

Straightforward web architecture

Relational Data Model

• User table: will map user names to identifiers used elsewhere.

• Group table: will track ownership and descriptive name.

• Group membership table: will allow users to add and remove members.

Relational Data Model

• Topic table: tracks distribution group, owner, and topical image.

• Comment table: individual comments from users.

First growth: robustness

Second growth: application scale out

Scaling reads: what goes into this page?

Database reads eventually become a bottleneck

Scale by de-normalizing in favor of reads

Change to writes - original

Change to writes – de-normalized

Generally known as the fan-out pattern.

The trick is to not get crushed by the writes

• Each poster now does a write for each member of the group a post goes to.

• Removing access is now a much larger delete query.

• Most databases are geared toward few writes and many reads; are we screwed?
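The write amplification described above can be sketched with a small in-memory model. This is only an illustration, not the application's actual code; `GROUP_MEMBERS`, `timelines`, `post_to_group`, and `remove_access` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical group membership: group id -> member user ids.
GROUP_MEMBERS = {"cat-fans": ["alice", "bob", "carol"]}

# De-normalized store: one timeline per reader, so a page read touches one key.
timelines = defaultdict(list)


def post_to_group(author, group_id, image_url):
    """Fan out a post: one write per member of the target group."""
    writes = 0
    for reader in GROUP_MEMBERS[group_id]:
        timelines[reader].append(
            {"author": author, "group": group_id, "image": image_url}
        )
        writes += 1
    return writes


def remove_access(user, group_id):
    """Revoking access means deleting every fanned-out copy for that reader."""
    before = len(timelines[user])
    timelines[user] = [p for p in timelines[user] if p["group"] != group_id]
    return before - len(timelines[user])


writes = post_to_group("alice", "cat-fans", "http://example.com/mr-mean.jpg")
print(writes)  # one write per group member
```

Reads become a single lookup per reader, while the cost of every post and every access change grows with group size.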

Recall our access pattern

Basically one of these consumer boxes.

Lines up very well with sharding

• Divide the query space up by e.g. a hash of user id into n shards.

• Store a copy of the table on each shard, but just for user ids that hash to that shard.

• Reads and writes are spread across instances.
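A minimal sketch of hash-based shard selection, assuming a fixed shard count and plain dictionaries standing in for database instances. `NUM_SHARDS`, `shard_for`, and the helper functions are illustrative names, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # assumed, fixed shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard holds the same table, but only the rows for its own users.
shards = [dict() for _ in range(NUM_SHARDS)]


def write_timeline_entry(user_id, entry):
    """Route a write to the one shard that owns this user."""
    shards[shard_for(user_id)].setdefault(user_id, []).append(entry)


def read_timeline(user_id):
    """Route a read to the same shard; no other shard is consulted."""
    return shards[shard_for(user_id)].get(user_id, [])


write_timeline_entry("alice", "so grumpy")
```

A cryptographic hash is used here only because Python's built-in `hash()` for strings varies between processes. Note that changing `NUM_SHARDS` remaps most users, which is why this style of sharding is effectively static.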

Database shards Layout

What were the nice-to-haves for the RDBMS again?

• No longer leveraging relational data model.

• Now running, backing up, and failing over as many database instances as there are shards.

• Robustness in a shard has to be managed.

• Sharding is essentially static; adding more resources with growth is still painful.

Now we have some context for Accumulo.

Our goal is to end up with less operational overhead.

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

Accumulo-based App Layout

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

In Accumulo, you address cells rather than records

Key | Value

Keys are multi-dimensional

Key (Row, Column, Time) | Value

Keys are multi-dimensional

Key (Row, Column (Family, Qualifier, Visibility), Time) | Value

Accumulo doesn’t assume a schema

• All key and value components, except the timestamp, are byte[]

• The application is responsible for serialization

• It is common to use different serialization for the values in different columns.
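As a rough illustration of schema-free cells, the sketch below models a key as a tuple of byte strings plus an integer timestamp, and serializes two columns in two different ways. It is a toy model, not the Accumulo client API; `Key`, `encode_count`, and `encode_profile` are hypothetical names.

```python
import json
import struct
from typing import NamedTuple


class Key(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int  # the only component that is not a byte array


def encode_count(n: int) -> bytes:
    """One column might hold a fixed-width big-endian counter."""
    return struct.pack(">q", n)


def encode_profile(profile: dict) -> bytes:
    """Another column might hold UTF-8 JSON; the store only ever sees bytes."""
    return json.dumps(profile, sort_keys=True).encode("utf-8")


cells = {
    Key(b"user42", b"stats", b"posts", b"", 1): encode_count(7),
    Key(b"user42", b"profile", b"json", b"", 1): encode_profile({"name": "Mr. Mean"}),
}
```

Because nothing in the store records which encoding a column uses, the reading code has to know it and decode accordingly.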

Mapping records to cells

• Treat a row as a database record

  • Essentially each column is a record field

• Treat each cell as a database record

  • Need to uniquely identify each record

  • Useful if you generally need the whole row and not a subset of columns

  • Can then treat each row as a shard of database records.

Let’s use a concrete example.

Already know our reads are within a shard.

Mapping our data into cells

Key

• Row: reader id

• Column Family: discussion id

• Column Qualifier: comment order

• Visibility: group id

Value: author, image url, and comment
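The mapping above could be sketched as follows. The tab-separated value encoding and the zero-padded comment order are assumptions made for this illustration, and `Cell` and `comment_cells` are hypothetical names rather than anything from Accumulo.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: bytes         # reader id
    family: bytes      # discussion id
    qualifier: bytes   # comment order
    visibility: bytes  # group id
    value: bytes       # author, image url, and comment


def comment_cells(readers, discussion_id, order, group_id,
                  author, image_url, comment):
    """Build one cell per reader for a single comment (the fan-out write)."""
    value = "\t".join([author, image_url, comment]).encode("utf-8")
    # Zero-padded so that bytewise order matches numeric comment order.
    qualifier = f"{order:08d}".encode("ascii")
    return [
        Cell(reader.encode("utf-8"), discussion_id.encode("utf-8"),
             qualifier, group_id.encode("utf-8"), value)
        for reader in readers
    ]


cells = comment_cells(["alice", "bob"], "disc-1", 3, "cat-fans",
                      "carol", "http://example.com/mr-mean.jpg", "so grumpy")
```

Every reader's row receives its own copy of the comment, so a page read for one reader stays within that reader's row.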

We end up with something close to our original.

Note the use of visibility

Visibility enforcement

• At scan time, our application will pass in the groups for the current user.

• Accumulo will filter out any cells that don’t match those groups.

  • Group removal is a simple update in the group management system again.
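A simplified model of scan-time filtering is sketched below. It assumes every cell carries exactly one visibility label (its group id); real Accumulo visibility labels can be richer boolean expressions and are enforced by the server, which this toy does not attempt to reproduce. `visible_cells` and `GROUPS_FOR_USER` are hypothetical names.

```python
def visible_cells(cells, authorizations):
    """Keep only cells whose visibility label is in the caller's authorizations.

    Simplification: each cell carries exactly one label (its group id).
    """
    auths = set(authorizations)
    return [(key, value) for key, value in cells if key[3] in auths]


# (row, family, qualifier, visibility) -> value
cells = [
    (("alice", "disc-1", "00000001", "cat-fans"), "carol: so grumpy"),
    (("alice", "disc-2", "00000001", "dog-fans"), "dave: woof"),
]

# Hypothetical lookup in the group management system.
GROUPS_FOR_USER = {"alice": {"cat-fans"}}

page = visible_cells(cells, GROUPS_FOR_USER["alice"])
```

Removing alice from `cat-fans` in `GROUPS_FOR_USER` hides those cells on her next scan without deleting any of the fanned-out copies.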

Sparse column storage

• We are creating lots of columns: per discussion per group member.

• Accumulo only stores columns that exist in a given row.

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

All cells sorted according to key

• Total ordering based on lex-sort of raw byte arrays of key components.

• Time is sorted most-recent-first

• Reads are done on a contiguous range of cells.
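The ordering and range-read behaviour can be modelled with a sorted list and binary search. This is a sketch over in-memory tuples, not how Accumulo stores data on disk; `sort_key` and `scan_row` are hypothetical helpers.

```python
import bisect


def sort_key(cell_key):
    """Order bytewise by row, family, qualifier, visibility; newest time first."""
    row, family, qualifier, visibility, timestamp = cell_key
    return (row, family, qualifier, visibility, -timestamp)


cells = [
    ((b"bob", b"disc-1", b"00000001", b"cat-fans", 5), b"v3"),
    ((b"alice", b"disc-2", b"00000001", b"cat-fans", 7), b"v2"),
    ((b"alice", b"disc-1", b"00000001", b"cat-fans", 9), b"v1-new"),
    ((b"alice", b"disc-1", b"00000001", b"cat-fans", 4), b"v1-old"),
]
cells.sort(key=lambda kv: sort_key(kv[0]))
sorted_keys = [sort_key(key) for key, _ in cells]


def scan_row(row: bytes):
    """Return the contiguous slice of cells whose row equals `row`."""
    lo = bisect.bisect_left(sorted_keys, (row,))
    # row + b"\x00" is the smallest byte string that sorts after `row`.
    hi = bisect.bisect_left(sorted_keys, (row + b"\x00",))
    return cells[lo:hi]


alice_values = [value for _, value in scan_row(b"alice")]
```

Because the cells are sorted, a scan only needs a start and an end position and reads everything in between in order.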

When sorted, our data looks like this…

And the scan for a page is roughly…

Lexicoders

• Turning different kinds of data into sortable bytes is painful

• Accumulo ships implementations for several common Java types

• It also ships helpers for, e.g., reversing the sort order and building compound keys.
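To show why sortable encodings take some care, here is a sketch of the general idea in Python. It is not Accumulo's Lexicoder implementation (which is Java); `encode_long`, `decode_long`, and `reverse` are illustrative names, and the encoding is one common technique for 64-bit signed integers.

```python
import struct


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit int so bytewise order equals numeric order.

    Adding 2**63 flips the sign bit, which moves negatives below
    positives when the bytes are compared as unsigned values.
    """
    return struct.pack(">Q", (n + (1 << 63)) & 0xFFFFFFFFFFFFFFFF)


def decode_long(b: bytes) -> int:
    """Inverse of encode_long."""
    return struct.unpack(">Q", b)[0] - (1 << 63)


def reverse(encoded: bytes) -> bytes:
    """Invert every byte to reverse the sort order of fixed-width encodings."""
    return bytes(0xFF - x for x in encoded)


numbers = [3, -1, 10, 0, -20]
ascending = sorted(numbers, key=encode_long)
descending = sorted(numbers, key=lambda n: reverse(encode_long(n)))
```

A naive text encoding does not work: as bytes, `b"10"` sorts before `b"9"`.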

Inefficiencies in our data model

Key

• Row: reader id

• Column Family: discussion id

• Column Qualifier: comment order

• Visibility: group id

Value: author, image url, and comment

Two categories of data

Key

• Row: reader id

• Column Family: discussion id

• Column Qualifier: image

• Visibility: group id

Value: author, image url

Key

• Row: reader id

• Column Family: discussion id

• Column Qualifier: text

• Visibility: group id

Value: author, comment

And now our data looks like this

And the scan for a page covers less data

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

Our simplified diagram

Slightly less simplified

Back to the data model

Key (Row, Column (Family, Qualifier, Visibility), Time) | Value

Rows are grouped into Tablets

• Tablet is defined by a start and end row

• All cells for a given row must be in the same Tablet.

Tablets are assigned to Tablet Servers

• At any given point in time, a Tablet is serviced by a single Tablet Server

• That server is responsible for client reads and writes to all hosted Tablets

• Finding the proper server is handled by the Accumulo libraries

• Proper key design means I/O load gets spread across multiple machines
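A toy model of how a row is located: tablets are contiguous row ranges, and each tablet is hosted by exactly one server at a time. The split points, server names, and the convention that a tablet's end row is inclusive are assumptions made for this sketch; in practice the Accumulo client libraries do this lookup for you.

```python
import bisect

# Assumed split points: each tablet covers rows in (previous split, split].
SPLITS = [b"g", b"n", b"t"]  # 4 tablets: (-inf, g], (g, n], (n, t], (t, +inf)

# Hypothetical assignment: exactly one server per tablet at any point in time.
TABLET_SERVERS = ["ts1", "ts2", "ts1", "ts3"]


def tablet_for(row: bytes) -> int:
    """Index of the tablet whose row range contains `row`."""
    return bisect.bisect_left(SPLITS, row)


def server_for(row: bytes) -> str:
    """Server currently hosting the tablet that contains `row`."""
    return TABLET_SERVERS[tablet_for(row)]
```

For example, `server_for(b"alice")` resolves to the first tablet's server. If all rows started with the same prefix they would land in one tablet, which is why key design determines how evenly load spreads.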

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

Tablet assignment is not static

• Assignments tend to reach a steady state

• But can move in the event of new resources or failure

Remember our RDBMS scaling?

New RDBMS shard

1. Provision hardware for service

2. Rewrite data under new sharding

3. Update application services

• Doing this without an outage is hard work (and well paid if you can get it)

New Accumulo Tablet Server

1. Provision hardware for service

2. Add server to cluster

3. Tablets automatically migrate from busier nodes to new node

• No outage from client perspective.
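The migration step can be pictured with a simple count-based balancer. This is a deliberately naive sketch, assuming every tablet costs the same; it is not Accumulo's balancer, and `rebalance` is a hypothetical function.

```python
def rebalance(assignment, servers):
    """Move tablets from the busiest server to the least busy one
    until tablet counts differ by at most one. Returns the moves made."""
    load = {server: [] for server in servers}
    for tablet, server in assignment.items():
        load[server].append(tablet)
    moves = []
    while True:
        busiest = max(servers, key=lambda s: len(load[s]))
        idlest = min(servers, key=lambda s: len(load[s]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        tablet = load[busiest].pop()
        load[idlest].append(tablet)
        assignment[tablet] = idlest
        moves.append((tablet, busiest, idlest))
    return moves


assignment = {"t1": "ts1", "t2": "ts1", "t3": "ts1",
              "t4": "ts2", "t5": "ts2", "t6": "ts2"}
# ts3 has just been added to the cluster and hosts nothing yet.
moves = rebalance(assignment, ["ts1", "ts2", "ts3"])
```

Only the assignment changes; no data is re-sharded and the application's keys stay the same, which is the contrast with adding a new RDBMS shard.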

“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”

Accumulo PMC via https://accumulo.apache.org/

All distributed systems have communication failures

In the face of such a failure you can either

• remain available on remaining nodes to all clients

• provide a consistent view of updates to a subset of clients

Now you know the basics of CAP

Remember that you can’t give up partition tolerance

Remember our RDBMS robustness?

Accumulo is a CP system

• Tablet Servers ensure that updates have been written to a distributed write-ahead-log before acknowledging

• Tablet Server failures are automatically detected

• Newly assigned hosts for recovered Tablets then replay edits up to the last acknowledged write before serving new requests
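The log-then-acknowledge sequence and the replay on recovery can be sketched as below. A plain Python list stands in for the distributed write-ahead log, and the `TabletServer` class is a toy written for this illustration, not Accumulo code.

```python
class TabletServer:
    def __init__(self, wal):
        self.wal = wal    # shared, durable log (stands in for a distributed WAL)
        self.memory = {}  # in-memory state, lost if the server dies

    def write(self, key, value):
        self.wal.append((key, value))  # 1. log the edit durably first
        self.memory[key] = value       # 2. apply it in memory
        return "ack"                   # 3. only now acknowledge the client

    @classmethod
    def recover(cls, wal):
        """A newly assigned server replays the log before serving requests."""
        server = cls(wal)
        for key, value in wal:
            server.memory[key] = value
        return server


wal = []
old = TabletServer(wal)
old.write("alice/disc-1/1", "so grumpy")
old.write("alice/disc-1/1", "still grumpy")
del old  # simulate a crash: in-memory state is gone, the log survives
new = TabletServer.recover(wal)
```

Because the acknowledgement is sent only after the log append, every write a client saw acknowledged is present in the log and reappears after replay.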

Client write

Write goals

• Low latency ack

• Don’t lose acked writes in face of node failure

Client write (diagram, steps 1 to 3)

Recovery timing

• Time to detection is tunable, at the cost of increased network load

• Size of outstanding write-ahead logs

Client write (diagram, steps 1 to 4)

Accumulo-based App Layout

What’s the catch?

Gaps

• Still requires application updates to use API – no interactive SQL bindings*

• No Disaster Recovery – coming in next minor release

Thank you.

Mr. Mean photo from mockup is © 2004 Flickr user aznewbeginning; cc-by-sa 2.0 https://flic.kr/p/4uzdRc
