Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Founder CTO, ParElastic

Posted on 09-May-2015

DESCRIPTION

http://www.DatabaseMonth.com/database/parelastic-database-scalability

TRANSCRIPT

1. Scalability and database virtualization: how virtualizing your databases improves performance and lowers costs. New York City MySQL Meetup, October 3, 2013

2. What's this presentation about? Scalability and the database tier. What's the problem? How did we get here? Some proposed solutions. What are parallel databases? What's ParElastic? How do I get ParElastic? Q&A. Tweet this presentation #parelastic
October 3, 2013 | Scalability and the database tier | NYC MySQL Meetup

3. What is the scalability problem?

4. What is the scalability problem? It has many faces: connections and concurrency; data volume and retention period; databases and tenants; read vs. write. Your problem(s) may be more than one, and may change over time.

5. Connections and Concurrency. More [active] connections mean worse performance. Sizing your database.

6. Data Volume and Retention Period. A longer retention period means more data, and more data means worse performance. Progressive deterioration: all data in memory; all indexes in memory; not enough memory.

7. Databases and Tenants. A common paradigm in SaaS applications: each tenant's application instance has a database, with several databases on each database instance. More databases per instance mean worse performance. In one customer engagement we were informed that no more than 1000 tenants could be located on one database instance before performance became unacceptable.

8. Read vs. Write. Simple read (SELECT) queries could scale well: key-based lookups with favorable indexes. Things that cause heartburn: complex joins (with large data sets), sorts, aggregation. Reads are easier to scale than writes.

9. How did we get here? A brief history lesson.

10. How did we get here? [1] A combination of factors. Changes in application users and usage, driven by the Internet and mobile computing; news cycles are getting shorter. Economics: commodity computing is cheap and getting cheaper; solutions that can scale out win, others lose. Ability to leverage higher core densities: other databases do a better job at this than MySQL. MySQL would do great if you had a 20GHz processor ;)

11. How did we get here? [2] The evolution of the database management system: a battle between generalized and specialized. The Relational Database Management System (RDBMS) was designed for monolithic systems (SMP, Scale-Up). Applications evolve quickly! Databases respond slowly.

12. How did we get here? [3] Moore's Law: Scale-Up seemed like a fine answer, but there are limits.

13. How did we get here? [4] Database architectures traditionally were shared CPU/memory/disk, also known as Shared-Everything. But Shared-Everything doesn't scale, at least not for databases. A server costing twice as much doesn't always give you twice as much database power; you reach a point of diminishing returns.

14. How did we get here? [5] You can pay more, but you may not get more. Source: Amazon RDS TPC-C Benchmark. Md. Borhan Uddin, Bo He, Radu Sion, Cloud Computing Center, SUNY Stony Brook. Viewed online http://digitalpiglet.org/research/sion2010cloud-rds.pdf

15. Some proposed solutions

16. Some proposed solutions. Several strategies have been advocated: Cache, Cache, Cache; Get a bigger server [a.k.a. Scale-Up]; Sharding [a form of Scale-Out]; NoSQL or NewSQL [typically Scale-Out]; Replication and variants. We look at each one in more detail.

17. Cache, Cache, Cache! "That's easy! Do some caching!" caching (transitive verb): to cache. cache (noun): temporary computer storage used for quick retrieval of data in order to increase processing speed. Caching only addresses reads, not writes. Social media workloads are write-heavy, interactive and highly personalized.

18. Get a bigger server [Scale-Up]. "I will use a bigger database server." Can I even get a bigger server? What if m2.4xlarge isn't enough? Maybe I just have too much data? Maybe I have too many users?

19. Sharding [a form of Scale-Out]. "Sharding will solve my problem!" shard (noun): a piece or fragment of a brittle substance; broadly, a small piece or part. sharding (noun): (a) to make one's application brittle or fragmented; (b) to take one big problem and make many small problems; (c) to complicate an application while claiming to solve a scalability problem; (d) to decrease developer productivity; (e) a bad idea; (f) sharding library: a mechanism that attempts (unsuccessfully) to hide the bad taste of sharding.

20. NoSQL or NewSQL? "You need NoSQL or NewSQL!" Yes, I have to rewrite my application. Yes, not all queries will work. No, there's no standard query language. No, most do not have ACID guarantees; hell, some don't even guarantee durability. Yes, most are somewhat untried science experiments. More flavors than Ben & Jerry's ice cream [yes, really]. But all the cool kids are doing it!

21. Replication and variants. Replication-based solutions (typically called clustering): many copies of the data; distribute queries across the copies; keep the copies synchronized (like herding cats); write bottleneck. Read/write splitting: a single master (gets all the writes); many slaves (share the reads); unpredictable latency; write bottleneck.

22. What about MySQL Cluster? MySQL Cluster is a strange beast. For best results, you must use the NDB interface. It only supports the NDB storage engine. It is primarily a distributed in-memory key-value store that is ACID compliant and supports joins and things if you use the SQL interface, but no one tells you about the performance of this path! Published benchmarks all use FlexAsync, which talks directly to the NDB interface, and are read-only. For more details visit http://www.parelastic.com/blog/mysql-cluster-and-benchmarks, or stick around after the presentation and we can chat!

23. What are parallel databases?

24. What are parallel databases? A database architecture proposed in 1992¹. Very successfully applied to many database problems (Oracle Exadata, Netezza, Teradata, Greenplum, ...). An example of the Shared Nothing database paradigm².
¹ Parallel Database Systems: The future of high performance database processing [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]
² The Case for Shared Nothing [1986, Stonebraker, http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf]

25. How parallel databases execute queries. Image from Parallel Database Systems: The future of high performance database processing [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]

26. Benefits of parallel databases: linear improvement in reads; linear improvement in writes; better than linear improvement in joins; better than linear improvement in aggregation; better than linear improvement in sorts. For more details, refer to Parallel Database Systems: The future of high performance database processing [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]

27. Parallel Databases vs. Sharding
Parallel Database | Sharding
Database architecture | Application architecture
Application is data location agnostic | Application is data location aware
Application perceives a single database | Application perceives a collection of databases
Requires no application rewrites | Requires application rewrites
Application is not constrained by parallel database architecture | Application is constrained to the limitations of the sharding architecture
A parallel database handles any schema | Not all schemas are shardable

28. What is ParElastic? Hypervisor for databases

29. What is ParElastic? An approach to relational database virtualization. Addresses issues of scalability in relational databases. A parallel database architecture, built on standard MySQL or MySQL variant databases. Horizontal scalability. Elastic.

30. ParElastic: System Architecture. ParElastic architecture protected by US8214356, "Apparatus for elastic database processing with heterogeneous data".
10/7/2013 | Flex Your Database | ParElastic Database Virtualization Engine

31. Data Distribution: How it works. User data is distributed across multiple storage nodes. Queries are executed in parallel by some [or all] nodes. Multiple distribution models are supported: Range, Hash, Broadcast, Random. ParElastic guarantees co-location and query execution.

32. Storage Elasticity: How it works. A generational scheme: storage nodes are added over time, and each creates a new generation. It is unnecessary to migrate large amounts of data, a key drawback of sharding, which requires resharding. Storage elasticity protected by US8478790, US8386532 and other patents.

33. ParElastic: How It Works

34. ParElastic: Simple query processing example
    SELECT COUNT(*) FROM CUSTOMER;
    count(*)
    --------
    2771
    (1 row affected)

    PROVISION 1 DYNAMIC NODE
    ON DYNAMIC NODE CREATE TEMP TABLE T1 ( C INT );
    ON ALL STORAGE NODES SELECT COUNT(*) FROM CUSTOMER AND REDISTRIBUTE TO T1
    ON DYNAMIC NODE SELECT SUM(C) FROM T1;

35. ParElastic Performance Benefits. Connection scalability: ParElastic tier elasticity; have more or fewer ParElastic servers. Storage / data volume scalability: add ParElastic persistent nodes as data volumes increase; multiple machines working together. Workloads are variable: compute node elasticity; have more or fewer as required. Databases and tenants [SaaS applications]: ParElastic adaptive multi-tenancy. No application change. Queries are processed by, and data stored on, standard MySQL!

36. ParElastic Multi-Tenancy

37. ParElastic Concurrency [1]

38. ParElastic Concurrency [2]

39. ParElastic data ingest: one million rows/s! 15 storage nodes, 2 ParElastic servers. Tests conducted in Amazon Cloud. Native MySQL testing on an m1.xlarge server, standard MySQL, standard EBS volumes. The test driver was a c1.xlarge server, to provide sufficient CPU headroom to generate load. ParElastic was run with 5 and 15 persistent storage nodes, identically configured (m1.xlarge, standard MySQL, standard EBS volumes). The 15-node test employed two c1.xlarge test drivers. Best ParElastic performance was with 10 threads, 10 persistent storage nodes and an insert batch size of 5,000 tuples per insert batch. Best native MySQL performance was with 2 threads and a batch size of 10,000 tuples per insert batch.

40. What's the ParElastic overhead? [Diagram: test client, ParElastic server and mysqld instances on Machines 1 to 4.] Query time directly against mysqld: 15.72 ms. Query time through ParElastic: 17.03 ms. ParElastic overhead ~1.31 ms. Network RTT 0.35 ms.

41. Characterizing ParElastic Performance: a fixed cost (the overhead per query) and a variable cost for query processing. Consider this example, a simple COUNT query.

42. Some things to keep in mind. Horizontal scale-out benefits from being stateless (or at least having less state) and from adhering to a truly shared-nothing approach. Horizontal scale-out is impeded by complex or shared state, and by things that violate the shared-nothing paradigm.

43. What is ParElastic? An approach to relational database virtualization: "A Hypervisor for the Database Tier". Scale out database capacity across many servers. Effectively handle workloads too big for one server. Share this pool of database capacity among many applications. Efficiently allocate database capacity to workload. An elastic, multi-tenant, parallel database architecture, built on standard MySQL or MySQL variant databases. Horizontal scalability. Elastic.

44. Some target markets. Database virtualization: hypervisor for the database; reduce capex and simplify administration for development and test. SaaS enablement: simplified deployment of SaaS applications using multi-tenancy. High-volume database applications: high-traffic websites (e.g. social, e-commerce, online games); high-speed data ingest (e.g. click tracking, sensor arrays, mobile).

45. Where do I get ParElastic?

46. Getting ParElastic. For evaluations: available at no charge on Amazon Marketplace; preconfigured for evaluation purposes, not performance testing; runs completely on a single EC2 instance. For larger configurations, contact ParElastic. Email: info@parelastic.com. Twitter: @parelastic. Web: http://www.parelastic.com

47. Getting ParElastic. On the Amazon AWS Marketplace (aws.amazon.com/marketplace). Quick start guide and simple (tw...
