cloudcon east presentation
DESCRIPTION
Introduction to HiveDB presented at Cloudcon East.TRANSCRIPT
![Page 1: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/1.jpg)
Scaling with HiveDB
![Page 2: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/2.jpg)
The Problem• Hundreds of
millions of products
• Millions of new products every week
• Accelerating growth
![Page 3: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/3.jpg)
Scaling up is no longer an option.
![Page 4: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/4.jpg)
Requirements
➡ OLTP optimized➡ Constant response time is more
important than low-latency➡ Expand capacity online➡ Ability to move records➡ No re-sharding
![Page 5: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/5.jpg)
2 years agoClustered Non-relational Unstructured
Oracle RACMS SQL ServerMySQL Cluster
Teradata (OLAP)
HadoopMogileFS
S3
![Page 6: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/6.jpg)
TodayClustered Non-relational Unstructured
Oracle RACMS SQL ServerMySQL Cluster
DB2Teradata (OLAP)
HiveDB
HypertableHBase
CouchDBSimpleDB
HadoopMogileFS
S3
![Page 7: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/7.jpg)
System Storage Interface Partitioning Expansion Node Types Maturity
Oracle RAC Shared SQL TransparentNo
downtime Identical 7 years
MySQL Cluster Memory / Local SQL Transparent
Requires Restart Mixed 3 years
Hypertable Local HQL TransparentNo
downtime Mixed?
(released 2/08)
DB2 Local SQL Fixed HashDegraded Performan
ceIdentical
3 years (25 years
total)
HiveDB Local SQL Key-basedDirectory
No downtime Mixed 18 months
(+13 years!)
![Page 8: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/8.jpg)
Operating a distributed system is hard.
![Page 9: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/9.jpg)
Non-functional concerns quickly dominate functional concerns.
![Page 10: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/10.jpg)
ReplicationFail overMonitoring
![Page 11: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/11.jpg)
Our approach:minimize cleverness
![Page 12: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/12.jpg)
Build atop solid, easy to manage technologies.
![Page 13: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/13.jpg)
So we can get back to selling t-shirts.
![Page 14: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/14.jpg)
MySQL is fast.
![Page 15: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/15.jpg)
MySQL is well understood.
![Page 16: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/16.jpg)
MySQL replication works.
![Page 17: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/17.jpg)
MySQL is free.
![Page 18: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/18.jpg)
The same for JAVA
![Page 19: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/19.jpg)
![Page 20: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/20.jpg)
The simplest thing
➡ HiveDB is a JDBC Gatekeeper for applications.
➡ Lightest integration➡ Smallest possible implementation
![Page 21: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/21.jpg)
The eBay approach
➡ Pro: Internally consistent data➡ Con: Re-sharding of data required➡ Con: Re-sharding is a DBA
operation
![Page 22: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/22.jpg)
Partition by keyA simple bucketed partitioning scheme expressed in code.
![Page 23: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/23.jpg)
Partition by key
![Page 24: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/24.jpg)
Directory
➡ No broadcasting➡ No re-partitioning➡ Easy to relocate records➡ Easy to add capacity
![Page 25: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/25.jpg)
![Page 26: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/26.jpg)
Disadvantages
➡ Intelligence moved from the database to the application (Queries have to be planned and indexed)
➡ Can’t join across partitions➡ NO OLAP! (We consider this a
benefit.)
![Page 27: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/27.jpg)
Hibernate ShardsPartitioned Hibernate from Google
![Page 28: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/28.jpg)
Why did we build this thing again?
![Page 29: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/29.jpg)
Oh wait, you have to tell it where things are.
![Page 30: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/30.jpg)
![Page 31: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/31.jpg)
Benefits of Shards
➡ Unified data access layer➡ Result set aggregation across
partitions➡ Everyone in the JAVA-world knows
Hibernate.
![Page 32: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/32.jpg)
![Page 33: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/33.jpg)
![Page 34: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/34.jpg)
![Page 35: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/35.jpg)
![Page 36: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/36.jpg)
![Page 37: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/37.jpg)
![Page 38: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/38.jpg)
Working on simplifying it even further.
![Page 39: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/39.jpg)
![Page 40: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/40.jpg)
But does it go?
![Page 41: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/41.jpg)
Case Study: CafePress➡ Leader in User-generated
Commerce➡ More products than eBay
(>250,000,000)➡ 24/7/365
![Page 42: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/42.jpg)
Case Study: Performance Requirements
➡ Thousands of queries/sec➡ 10 : 1 read/write ratio➡ Geographically distributed
![Page 43: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/43.jpg)
Case Study:Test Environment
➡ Real schema ➡ Production-class hardware➡ Partial data (~40M records)
![Page 44: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/44.jpg)
[email protected] Modified on April 09 2007
CafePress HiveDB 2007Performance Test Environment
JMeter (100s of threads)
web s
erv
ice
(hiv
edb)
Dell 1950 / 2x2 Xeon
Tomcat 5.5
Hardware LB
Directory Partition 0 Partition 1
client.jar
data
bases
(mysql)
Dell 1950 / 2x2 Xeon
Tomcat 5.5
Dell 2950 / 2x2 Xeon
16GB, 6x72GB 15k
Dell 2950 / 2x2 Xeon
16GB, 6x72GB 15k
Dell 2950 / 2x2 Xeon
16GB, 6x72GB 15k
Dell 1950 / 2x2 Xeon
Tomcat 5.5
Dell 2950 / 2x2 Xeon
16GB, 6x72GB 15k
JMeter (100s of threads)
client.jar
Dell 2950 / 2x2 Xeon
16GB, 6x72GB 15k
load g
enera
tors
48GB backplane non-
blocking gigabit switch
100MBit switch
com
mand &
contr
ol
JMeter (1 thread)
client.jar
Measurement
Workstation
JMeter (no threads)
client.jar
Test Controller
Workstation
![Page 45: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/45.jpg)
Case Study:Performance Goals
➡ Large object reads: 1500/s
➡ Large object writes: 200/s
➡ Response time: 100ms
![Page 46: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/46.jpg)
Case Study:Performance Results
➡ Large object reads: 1500/sActual result: 2250/s
➡ Large object writes: 200/sActual result: 300/s
➡ Response time: 100msActual result: 8ms
![Page 47: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/47.jpg)
Case Study:Performance Results
➡ Max read throughputActual result: 4100/s(CPU limited in Java layer; MySQL <25% utilized)
![Page 48: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/48.jpg)
Case Study:Results➡ Billions of queries served➡ Highest DB uptime at CafePress➡ Hundreds of millions of updates
performed
![Page 49: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/49.jpg)
Show don’t tell!
![Page 50: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/50.jpg)
High Availability
➡ We don’t specify a fail over strategy.
➡ We don’t specify a backup or replication strategy.
![Page 51: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/51.jpg)
Fail Over Options
➡ Load balancer➡ MySQL Proxy➡ Flipper / MMM➡ HAProxy
![Page 52: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/52.jpg)
Replication
➡ You should use MySQL replication.➡ It really works.➡ You can add in LVM snapshots for
even more disastrous disaster recovery.
![Page 53: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/53.jpg)
The BottleneckThere is one problem with HiveDB. The directory is a scaling bottleneck.
![Page 54: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/54.jpg)
There’s no reason it has to be a database.
![Page 55: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/55.jpg)
MemcacheD
➡ Hive directory implemented atop memcached
![Page 56: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/56.jpg)
Roadmap
➡ MemcacheD directory➡ Simplified Hibernate support➡ Management web service➡ Command-line tools➡ Framework Adapters
![Page 57: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/57.jpg)
Call for developers!
➡ Grails (GORM)➡ ActiveRecord (JRuby on Rails)➡ Ambition (JRuby or web service)➡ iBatis➡ JPA➡ ... Postgres?
![Page 58: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/58.jpg)
Contributing
➡ Post to the mailing list http://groups.google.com/group/hivedb-dev
➡ Comment on our sitehttp://www.hivedb.org
➡ Submit a patch / pull requestgit clone git://github.com/britt/hivedb.git
![Page 59: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/59.jpg)
Photo Credits➡ http://www.flickr.com/photos/7362313@N07/1240245941/
sizes/o
➡ http://www.flickr.com/photos/99287245@N00/2229322675/sizes/o
➡ http://www.flickr.com/photos/51035555243@N01/26006138/
➡ http://www.flickr.com/photos/vgm8383/2176897085/sizes/o/
➡ http://t-shirts.cafepress.com/item/money-organic-cotton-tee/73439294
![Page 60: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/60.jpg)
Blobject
➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously.
➡ The hive can contain multiple versions of a serialized record.
➡ Compression
![Page 61: Cloudcon East Presentation](https://reader035.vdocuments.net/reader035/viewer/2022062511/54b78f864a7959b12c8b474c/html5/thumbnails/61.jpg)
Hadoop!
➡ Query and transform blobbed data asynchronously.