Introduction to Apache Accumulo
© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
How to use this presentation
• Covered topics: Accumulo architecture, operational maintenance, fault handling
• Intended Audience: Developers, supporters, PMs who are conversant in multi-component systems, e.g. those involved in web services.
• Presumes familiarity with RDBMS
• Expected running time: 40 - 60 minutes
• License: CC-BY-SA 2.0
• Please let me know if you find it useful and what it could use: [email protected]
Introduction to Apache Accumulo
Scaling a web application made easier
Sean Busbey // Software Engineer
Let’s talk about Apache Accumulo…
But in the context of a specific use case
• I really like technology that solves a problem.
• Keep in mind that this won’t be exhaustive.
• YMMV; proofs of concept with metrics are better than slides.
Who am I?
• Apache Accumulo PMC
• Apache HBase committer
• Software Engineer on Cloudera’s storage team
That is to say, I work for a vendor and no longer have operational scale problems of my own.
We’ll focus on an application that enables conversations centered on cute cats.
Simple sharing model built with privacy controls
• User defines a group that may see their posting
• User posts a picture to a given group
• Members of the group may write short messages
Straightforward web architecture
Relational Data Model
• User table: maps user names to identifiers used elsewhere.
• Group table: tracks ownership and descriptive name.
• Group membership table: allows users to add and remove members.
• Topic table: tracks distribution group, owner, and topical image.
• Comment table: individual comments from users.
First growth: robustness
Second growth: application scale out
Scaling reads: what goes into this page?
Database reads eventually become a bottleneck
Scale by de-normalizing in favor of reads
Change to writes - original
Change to writes – de-normalized
Generally known as the fan-out pattern.
The trick is to not get crushed by the writes
•Each poster now does a write for each member of the group a post goes to.
•Removing access is now a much larger delete query.
•Most databases are geared toward few writes and many reads; are we screwed?
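The write amplification described above can be sketched in a few lines. This is a minimal in-memory illustration, not code from the presentation: `group_members`, `feeds`, `post_to_group`, and `revoke_access` are hypothetical names, and plain dictionaries stand in for database tables.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the group table and the per-reader feeds.
group_members = {"cat-fans": ["alice", "bob", "carol"]}
feeds = defaultdict(list)  # reader id -> posts visible to that reader


def post_to_group(author, group, image_url):
    """Fan out: write one copy of the post for every member of the group."""
    writes = 0
    for reader in group_members[group]:
        feeds[reader].append({"author": author, "group": group, "image": image_url})
        writes += 1
    return writes


def revoke_access(reader, group):
    """Removing access means deleting every copy that was fanned out to the reader."""
    before = len(feeds[reader])
    feeds[reader] = [post for post in feeds[reader] if post["group"] != group]
    return before - len(feeds[reader])
```

One post to a three-member group costs three writes, and the cost of a revocation grows with the number of posts already fanned out to that reader.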
Recall our access pattern
Basically one of these consumer boxes.
Lines up very well with sharding
•Divide the query space up by e.g. a hash of user id into n shards.
•Store a copy of the table on each shard, but just for user ids that hash to that shard.
•Reads and writes are spread across instances.
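As a rough sketch of the hashing step, assuming four shards and SHA-256 as the hash (both are illustrative choices, not something the slides specify), a user id can be mapped to a shard like this. `NUM_SHARDS` and `shard_for` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to keep the mapping stable across runs.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With a modulo scheme like this one, changing `num_shards` remaps most user ids, which is one reason adding shards later is painful.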
Database shards Layout
What were the nice-to-haves for the RDBMS again?
• No longer leveraging relational data model.
• Now running, backing up, and failing over one database instance per shard.
• Robustness in a shard has to be managed.
• Sharding is essentially static; adding more resources with growth still painful.
Now we have some context for Accumulo.
Our goal is to end up with less operational overhead.
“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”
Accumulo PMC via https://accumulo.apache.org/
Accumulo-based App Layout
In Accumulo, you address cells rather than records
Key | Value
Keys are multi-dimensional
Key | Value
Key = Row | Column | Time
Column = Family | Qualifier | Visibility
Accumulo doesn’t assume a schema
•All key and value components, save time, are byte[]
•The application is responsible for serialization
•Common to use different serialization for the values in different columns.
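Since every component is just bytes, the application chooses an encoding per column. The sketch below is one possible choice and not anything Accumulo prescribes: JSON for a comment value and a fixed-width big-endian integer for a counter. All function names are hypothetical.

```python
import json
import struct


def encode_comment(author: str, comment: str) -> bytes:
    """Serialize a comment value as UTF-8 JSON (an assumed format)."""
    return json.dumps({"author": author, "comment": comment}, sort_keys=True).encode("utf-8")


def decode_comment(raw: bytes) -> dict:
    """Inverse of encode_comment."""
    return json.loads(raw.decode("utf-8"))


def encode_count(n: int) -> bytes:
    """Serialize a counter as a fixed-width, big-endian unsigned 64-bit integer."""
    return struct.pack(">Q", n)


def decode_count(raw: bytes) -> int:
    """Inverse of encode_count."""
    return struct.unpack(">Q", raw)[0]
```

The store never inspects these bytes, so readers and writers of a column have to agree on its format out of band.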
Mapping records to cells
•Treat a row as a database record
• Essentially each column is a record field
•Treat each cell as a database record
• Need to uniquely identify each record
• Useful if you generally need the whole row and not a subset of columns
• Can then treat each row as a shard of database records.
Let’s use a concrete example.
Already know our reads are within a shard.
Mapping our data into cells
Key                                                      | Value
Row       | Column Family | Column Qualifier | Visibility |
reader id | discussion id | comment order    | group id   | author, image url, and comment
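A minimal sketch of this mapping, using plain Python tuples rather than the Accumulo client API. `comment_cells` is a hypothetical helper, and the zero-padded four-digit comment order is an assumption made so that the qualifiers sort in posting order.

```python
def comment_cells(reader_id, discussion_id, group_id, image_url, comments):
    """Build one (key, value) cell per comment for a single reader.

    comments is a list of (author, text) pairs in posting order.
    The key is (row, column family, column qualifier, visibility).
    """
    cells = []
    for order, (author, text) in enumerate(comments):
        key = (reader_id, discussion_id, f"{order:04d}", group_id)
        value = {"author": author, "image_url": image_url, "comment": text}
        cells.append((key, value))
    return cells
```

Every cell in a discussion carries the same image url in its value, so the url is stored once per comment per reader.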
We end up with something close to our original.
Note the use of visibility
Visibility enforcement
•At scan time, our application will pass in the groups for the current user.
•Accumulo will filter any cells that don’t match those groups.
• Group removal is a simple update in the group management system again.
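The filtering idea can be illustrated with a toy scan. This is a simplified model, not Accumulo's implementation: it assumes each cell carries exactly one group id as its visibility label, as in the data model above, whereas Accumulo visibility labels can be boolean expressions. `Cell` and `scan` are hypothetical names.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str  # assumed to be a single group id
    value: str


def scan(cells, authorizations):
    """Return only the cells whose visibility label is among the caller's groups."""
    auths = frozenset(authorizations)
    return [cell for cell in cells if cell.visibility in auths]
```

Removing a user from a group only changes the set of authorizations passed in at scan time; no stored cells need to be deleted.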
Sparse column storage
•We are creating lots of columns: per discussion per group member.
•Accumulo only stores columns that exist in a given row.
All cells sorted according to key
• Total ordering based on lex-sort of raw byte arrays of key components.
• Time is sorted most-recent-first
• Reads are done on a contiguous range of cells.
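To make the ordering concrete, here is a small sketch that sorts cells the way the bullets describe and then reads one row as a contiguous slice. It models cells as tuples of `(row, family, qualifier, visibility, timestamp, value)`; `sort_key` and `scan_row` are hypothetical helpers, not part of any Accumulo API.

```python
import bisect


def sort_key(cell):
    """Byte-wise ascending on the key components, newest timestamp first."""
    row, family, qualifier, visibility, timestamp, _value = cell
    return (row, family, qualifier, visibility, -timestamp)


def scan_row(sorted_cells, row):
    """Return the contiguous slice of cells belonging to one row."""
    keys = [sort_key(cell) for cell in sorted_cells]
    lo = bisect.bisect_left(keys, (row,))
    # row + b"\x00" is the smallest byte string that sorts after `row`.
    hi = bisect.bisect_left(keys, (row + b"\x00",))
    return sorted_cells[lo:hi]
```

Python compares `bytes` lexicographically by unsigned byte value, which matches the lex-sort described above, and two binary searches are enough to find the whole row because its cells are adjacent.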
When sorted, our data looks like this…
And the scan for a page is roughly…
Lexicoders
• Turning different kinds of data into sortable bytes is painful
• Accumulo ships implementations for several common Java types
• Also for e.g. reversing the sort order and building compound keys.
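The problem lexicoders solve can be shown with a hand-rolled example. The functions below are not Accumulo's lexicoder classes, only a sketch of the same idea for signed 64-bit integers: shift the value so that byte order matches numeric order, and invert the bytes to reverse the order. The `reverse` helper is only valid for fixed-width encodings like this one.

```python
import struct

_OFFSET = 1 << 63  # shifts signed 64-bit values into the unsigned range


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer so byte-wise order equals numeric order."""
    return struct.pack(">Q", n + _OFFSET)


def decode_long(raw: bytes) -> int:
    """Inverse of encode_long."""
    return struct.unpack(">Q", raw)[0] - _OFFSET


def reverse(encoded: bytes) -> bytes:
    """Invert every byte so ascending byte order becomes descending value order."""
    return bytes(0xFF - b for b in encoded)
```

Decimal strings do not have this property: "10" sorts before "9" as text, which is the kind of bug a sortable encoding avoids.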
Inefficiencies in our data model
Key                                                      | Value
Row       | Column Family | Column Qualifier | Visibility |
reader id | discussion id | comment order    | group id   | author, image url, and comment
Two categories of data
Key                                                      | Value
Row       | Column Family | Column Qualifier | Visibility |
reader id | discussion id | image            | group id   | author, image url

Key                                                      | Value
Row       | Column Family | Column Qualifier | Visibility |
reader id | discussion id | text             | group id   | author, comment
And now our data looks like this
And the scan for a page covers less data
Our simplified diagram
Slightly less simplified
Back to the data model
Key | Value
Key = Row | Column | Time
Column = Family | Qualifier | Visibility
Rows are grouped into Tablets
• Tablet is defined by a start and end row
• All cells for a given row must be in the same Tablet.
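A tablet lookup can be sketched as a binary search over split points. The split values and the `tablet_for` helper are hypothetical, and the sketch assumes each tablet covers a range with an exclusive start row and an inclusive end row.

```python
import bisect

# Hypothetical split points: two splits divide the table into three tablets.
SPLITS = [b"g", b"p"]


def tablet_for(row: bytes, splits=SPLITS) -> int:
    """Return the index of the tablet whose row range contains `row`.

    Tablet i covers (splits[i-1], splits[i]]; the first tablet has no
    lower bound and the last tablet has no upper bound.
    """
    return bisect.bisect_left(splits, row)
```

Because the lookup depends only on the row, every cell of a row lands in the same tablet regardless of its column or timestamp.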
Tablets are assigned to Tablet Servers
• At any given point in time, a Tablet is serviced by a single Tablet Server
• That server is responsible for client reads and writes to all hosted Tablets
• Finding the proper server is handled by the Accumulo libraries
• Proper key design means I/O load gets spread across multiple machines
Tablet assignment is not static
• Assignments tend toward a steady state
• But can move in the event of new resources or failure
Remember our RDBMS scaling?
New RDBMS shard
1. Provision hardware for service
2. Rewrite data under new sharding
3. Update application services
• Doing this without an outage is hard work (and well paid if you can get it)
New Accumulo Tablet Server
1. Provision hardware for service
2. Add server to cluster
3. Tablets automatically migrate from busier nodes to new node
• No outage from client perspective.
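Step 3 can be pictured with a toy balancer. This is an illustration of the idea only, not Accumulo's balancing logic: it counts tablets per server and moves one tablet at a time from the busiest server to the least loaded one. `rebalance` and the server names are hypothetical, and every assigned server is assumed to be in the `servers` list.

```python
from collections import Counter


def rebalance(assignment, servers):
    """Move tablets from the busiest to the least loaded server until counts differ by at most one.

    assignment maps tablet id -> server name; a new dict is returned.
    """
    assignment = dict(assignment)
    while True:
        load = Counter({server: 0 for server in servers})
        load.update(assignment.values())
        busiest = max(servers, key=lambda s: load[s])
        idlest = min(servers, key=lambda s: load[s])
        if load[busiest] - load[idlest] <= 1:
            return assignment
        # Pick the lowest-numbered tablet on the busiest server and migrate it.
        tablet = next(t for t, s in sorted(assignment.items()) if s == busiest)
        assignment[tablet] = idlest
```

The application never names a server, so moving a tablet changes only the assignment map that clients consult.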
All distributed systems have communication failures
In the face of such a failure you can either
• remain available on remaining nodes to all clients
• provide a consistent view of updates to a subset of clients
Now you know the basics of CAP
Remember that you can’t give up partition tolerance
Remember our RDBMS robustness?
Accumulo is a CP system
• Tablet Servers ensure that updates have been written to a distributed write-ahead-log before acknowledging
• Tablet Server failures are automatically detected
• Newly assigned hosts for recovered Tablets then replay edits up until last ack before serving new requests
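The log-then-acknowledge sequence and the replay on recovery can be sketched as follows. This is a toy model under loud assumptions: a Python list stands in for the distributed write-ahead log, a dict stands in for the server's in-memory state, and the `TabletServer` class here is hypothetical rather than Accumulo code.

```python
class TabletServer:
    """Toy server that logs every update before acknowledging it."""

    def __init__(self, wal):
        self.wal = wal      # shared log that survives a server crash (assumed durable)
        self.memory = {}    # in-memory state, lost when the server dies

    def write(self, key, value):
        self.wal.append((key, value))  # 1. record the update in the log
        self.memory[key] = value       # 2. apply it in memory
        return "ack"                   # 3. acknowledge only after the log write

    @classmethod
    def recover(cls, wal):
        """Start a replacement server and replay all logged edits in order."""
        server = cls(wal)
        for key, value in wal:
            server.memory[key] = value
        return server
```

Any write that returned "ack" is in the log, so a replacement server that replays the log sees it even though the original server's memory is gone.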
Client write
Write goals
• Low latency ack
• Don’t lose acked writes in face of node failure
Client write (diagram: steps 1 through 3)
Recovery timing
• Tunable time to detection; faster detection increases network load
• Size of outstanding write ahead logs
Client write (diagram: steps 1 through 4)
Accumulo-based App Layout
What’s the catch?
Gaps
• Still requires application updates to use API – no interactive SQL bindings*
• No Disaster Recovery – coming in next minor release
Thank you.
Mr. Mean photo from mockup is © 2004 Flickr user aznewbeginning; cc-by-sa 2.0 https://flic.kr/p/4uzdRc