white paper: mongodb performance benchmarking with vmax
White Paper
EMC VMAX ALL FLASH WITH MONGODB MongoDB Performance Benchmarking with VMAX All Flash
EMC Solutions
Abstract
This white paper provides performance benchmarking results when deploying a MongoDB environment in an EMC® VMAX All Flash storage array. The paper details how MongoDB benefits from the advanced technical features of EMC VMAX All Flash systems.
April 2016
Copyright
Copyright © 2016 EMC Corporation. All rights reserved. Published in the USA.
Published April 2016
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC², EMC, FAST, PowerPath, SnapVX, SRDF, Symmetrix, TimeFinder, Unisphere, VMAX, VMAX3 and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
Part Number H15005
Contents
Executive Summary
Technology Overview
Solution Overview
Testing environment
Benefits of using VMAX and MongoDB
Conclusion
Executive Summary
Business case
As Big Data trends continue to evolve, customers must adopt next-generation technologies for their operational databases. Databases and storage systems must handle larger datasets with different formats and sources generated by web, mobile, social, and cloud applications. These data trends are challenging the capabilities of traditional relational databases that were never designed to address massive data growth, performance, scale, and real-time data modeling. Customers are looking for newer and more agile ways to conduct real-time analytics and make critical business decisions.
MongoDB offers the best of traditional databases, as well as the flexibility, scale, and performance required by today’s applications. MongoDB is one of the fastest growing NoSQL databases, with more than 10 million downloads and more than 2000 customers, and is implemented by over one-third of Fortune 100 companies.
MongoDB is deployed in large enterprise environments and key vertical markets where performance, scale, and availability are critical requirements. As traditional IT organizations adopt these newer technologies, they want to continue to use the trusted shared storage infrastructure that they have relied on for years with enterprise features such as high availability, advanced replication technologies, multi-tenancy, and security.
Solution overview
This solution shows how customers can use their existing storage area network (SAN) infrastructure with the trusted EMC VMAX enterprise data services platform for MongoDB. VMAX is an EMC flagship Tier 1 storage platform with industry-leading performance, scale, and density, and is implemented by more than 94 percent of the Fortune 50 companies. Customers can solve end-to-end operational challenges associated with direct-attached storage (DAS) by consolidating MongoDB with their existing mission-critical applications on VMAX. They can then take advantage of simplified scale-out architecture, high resiliency, guaranteed service level agreements (SLAs), and compelling total cost of ownership (TCO) savings.
Recommendations
With the traditional commodity/DAS architectures, resources are added in a completely linear fashion. Homogeneous servers are added with exact increments of CPU, memory, and storage. Because applications do not consume resources linearly, resources can become stranded. For instance, adding nodes for pure storage capacity results in underutilized CPU or memory. By scaling storage independently, you can make better use of compute resources and potentially save on the datacenter footprint, hardware costs, and software licenses.
Document purpose
This document provides a MongoDB 3.2 performance benchmarking reference for implementation on EMC VMAX All Flash storage arrays.
Audience
This document is intended for use by pre-sales personnel, sales engineers, and customers who want to understand the benefits of implementing a MongoDB environment on an EMC VMAX All Flash storage array.
We value your feedback!
EMC and the authors of this document welcome your feedback on the solution and the solution documentation. Contact [email protected] with your comments.
Authors: Harry Tu, Kecheng Bi, Praneetha Manthravadi, Kathleen McCarthy.
Technology Overview
VMAX All Flash
EMC VMAX All Flash is an enterprise data services platform that is well suited to solve the CIO challenge of embracing a modernized flash-centric data center and hybrid cloud, while simultaneously simplifying, automating, and consolidating IT operations. VMAX is the industry’s leading Tier 1 highly resilient, scalable, and agile platform with a complete set of rich software data services.
VMAX All Flash 450 arrays and 850 arrays use the latest 3D NAND Flash technology to consolidate high-demand transaction processing workloads and deliver consistent response times of less than 0.5 ms. VMAX All Flash systems come in appliance-like packaging that is easy to configure, deploy, and manage. Figure 1 shows the arrays.
Figure 1. VMAX All Flash models and scaling
For enterprises that require petabyte-level scale, VMAX All Flash is purpose-built to manage high-demand, heavy-transaction workloads easily while storing petabytes of vital data. The VMAX All Flash hardware design features the turbo-charged Dynamic Virtual Matrix architecture that enables extreme speed and consistent sub-millisecond response time.
VMAX delivers millions of IOPS at massive scale using up to 384 cores. VMAX uses advanced multi-core/multi-threading algorithms and a flash-optimized design to meet strict SLAs for high-demand OLTP, virtualized applications, and high-growth databases.
VMAX architecture is trusted for always-on availability with advanced fault isolation, robust data integrity checking, and proven non-disruptive hardware and software upgrades. Along with six-nines availability for 24x7 forever operations, VMAX uses SRDF® software, the gold standard for multi-site remote replication. Also, with EMC TimeFinder® SnapVX™ technology, users can create hundreds of snapshots for each workload to optimize decision support, application testing, and business analytics.
The arrays are built for easy management, extreme performance, and massive scalability in a small footprint that provides compelling TCO compared to DAS architectures.
HYPERMAX OS
VMAX All Flash uses the industry’s first open storage and hypervisor converged operating system, HYPERMAX OS, which combines industry-leading high availability, I/O management, quality of service, data integrity validation, storage tiering, and data security with an open application platform. HYPERMAX OS features the first real-time, non-disruptive storage hypervisor that manages and protects embedded data services by extending VMAX high availability to services that traditionally run external to the array. It also provides direct access to hardware resources to maximize performance and can be upgraded without disruption.
MongoDB
MongoDB Enterprise edition is a document-oriented database, which is designed for a broad array of modern applications. It is used by organizations of all sizes to power mission-critical operational applications where low latency, high throughput, and continuous availability are critical requirements of the system. MongoDB incorporates the innovations of a NoSQL database (scalability, performance, and data model flexibility) while maintaining the foundation of strong consistency, secondary indexes, and a rich query language that developers expect from traditional, relational databases.
MongoDB is built for scalability, performance, and high availability, scaling from single server deployments to large, complex, multi-site architectures.
Replica sets and sharding are two types of MongoDB clusters. MongoDB uses its native replication to maintain multiple copies of data across replica sets. A replica set is a group of MongoDB instances that maintain the same dataset. Replica sets help prevent downtime by detecting failures and automatically initiating failover, as shown in Figure 2.
Figure 2. MongoDB replication example
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
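The fault tolerance described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the usual MongoDB rule that a primary can only be elected by a majority of the voting members of a replica set; the function names are illustrative and are not part of any MongoDB API.

```python
def majority(members: int) -> int:
    """Votes needed to elect a primary in a set with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while the rest can still form a majority."""
    return members - majority(members)


# Print set size, majority size, and tolerated failures for two common sizes.
for n in (3, 5):
    print(n, majority(n), tolerated_failures(n))
```

Under this assumption, a three-member replica set such as the ones used later in this paper can lose one member and still elect a primary.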
MongoDB scales horizontally using sharding. Sharding splits data into ranges and uniformly distributes the shards across multiple computers, enabling even data distribution. Each shard is an independent database, and collectively, the shards make up a single, logical database as illustrated in Figure 3.
Figure 3. MongoDB sharding
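To illustrate how range-based sharding maps a document to a shard, here is a small conceptual sketch. It is not MongoDB code: the split points, shard names, and the shard_for function are hypothetical, and a real cluster manages chunk ranges and balancing internally.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key: keys below "h" map to shard0,
# keys from "h" up to (but not including) "q" map to shard1, the rest to shard2.
SPLIT_POINTS = ["h", "q"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(key: str) -> str:
    """Return the shard whose key range contains `key`."""
    return SHARDS[bisect_right(SPLIT_POINTS, key)]


for user in ("alice", "mongo", "zoe"):
    print(user, "->", shard_for(user))
```

Each shard holds only its own range, yet a client that routes through a lookup like this sees the three shards as one logical database.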
Solution Overview
MongoDB can use local attached storage to hold its data sets—a simple solution to meet the requirements of a general system.
The testing that is described in this white paper uses the EMC VMAX All Flash array to support a MongoDB sharding environment that can meet enterprise demands for performance, scalability, and data replication. This testing demonstrates VMAX All Flash features with SnapVX, VMware vSphere® Virtual Volumes™, and Data at Rest Encryption (D@RE). We validate the impact on the MongoDB performance when we enable these features. This performance benchmarking uses MongoDB databases running Yahoo! Cloud Serving Benchmark (YCSB) random I/O workloads. YCSB is an open source, extensible workload generator that is commonly used to compare performance for a set of desired workloads.
Testing environment
Figure 4 shows the hardware configuration that was used in the test environment.
Figure 4. Hardware configuration of the MongoDB testing environment
Table 1 describes the server configuration of the test environment.
Table 1. Server configuration
Component      Description
Server         Cisco Blade: UCSB-B200-M3
               Memory: 512 GB each
               CPU: Intel Xeon CPU E5-2670 0 @ 2.60 GHz
               HBA: Cisco VIC FCoE HBA
Multipath      EMC PowerPath®/VE 6.0 SP1 for VMware vSphere
Connectivity   Cisco MDS 9706 (8Gb FC)
Array          VMAX All Flash
Note: PowerPath was used in this case. Customers can choose either PowerPath or native multipathing; both work effectively with the underlying storage array.
Table 2 describes the software configuration of the test environment.
Table 2. Software configuration
Component      Description
OS             RHEL 7.2
ESXi           6.0 U1b
vCenter        6.0 U1b
Multipath      PowerPath/VE 6.0.1
MongoDB        Enterprise 3.2
YCSB           0.6.0
Benefits of using VMAX and MongoDB
The VMAX All Flash storage array and MongoDB together offer an extraordinary choice for customers who want these benefits:
Simplified storage management
Mixed workload consolidation
Ease of creation and restart of MongoDB copies
Advanced data replication and high availability
Consistent performance (refer to Benchmarking MongoDB performance)
Ability to scale storage independently from compute resources to allow better use of resources, such as CPU and memory
VMware vSphere Virtual Volumes
Data at Rest Encryption (D@RE)
Simplified storage management
EMC Unisphere® for VMAX is an intuitive management interface that allows IT managers to maximize human productivity by dramatically reducing the time that is required to provision, manage, and monitor VMAX All Flash storage assets. Unisphere 360 software aggregates and monitors up to 200 VMAX All Flash arrays across a single data center.
These steps demonstrate how easy it is to create a LUN and assign it to a host through Unisphere.
1. Log in to Unisphere and create hosts and port groups, as shown in Figure 5.
Figure 5. Configuring hosts and port groups before provisioning storage
2. Run the Provision Storage wizard to provision storage to hosts, as shown in Figure 6.
Figure 6. Provisioning storage for a host
Figure 7 shows the Unisphere for VMAX dashboard that is used to monitor the storage system status.
Figure 7. Unisphere for VMAX performance dashboard
Advanced data replication
SnapVX delivers instant point-in-time replicas of host devices that can be used to create gold copies, to test patches, for backup and recovery, for data warehouse refreshes, or any other process that requires parallel access to, or preservation of, primary storage devices.
SnapVX creates snapshots by storing changed tracks (deltas) directly in the Storage Resource Pool (SRP) of the source device. With SnapVX, you do not need to specify a target device and source/target pairs when you create a snapshot, but you can create links from the snapshot to one or more target devices. If there are multiple snapshots and the application must find a particular point-in-time copy for host access, you can link and relink until the correct snapshot is located. In HYPERMAX OS arrays, SnapVX supports up to 256 snapshots per source device (including any emulation mode snapshots). See Figure 8.
Figure 8. SnapVX snapshot
MongoDB, by using the always-consistent snapshots that are available with SnapVX, allows for easy creation of restartable MongoDB copies.
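The idea of a targetless snapshot that stores only changed tracks can be illustrated with a toy model. The sketch below is a generic copy-on-first-write scheme written for illustration only; it is not the SnapVX implementation, and the Volume class and its methods are hypothetical names.

```python
class Volume:
    """Toy source device: a dict of track number -> data, plus its snapshots."""

    def __init__(self, tracks):
        self.tracks = dict(tracks)
        self.snapshots = {}  # snapshot name -> {track: data preserved at snapshot time}

    def snapshot(self, name):
        # Creating a snapshot copies nothing; it starts as an empty delta set.
        self.snapshots[name] = {}

    def write(self, track, data):
        # Preserve the old contents once per snapshot before overwriting.
        for preserved in self.snapshots.values():
            if track not in preserved:
                preserved[track] = self.tracks.get(track)
        self.tracks[track] = data

    def read_snapshot(self, name, track):
        # Point-in-time view: preserved data if the track changed, else current data.
        preserved = self.snapshots[name]
        return preserved[track] if track in preserved else self.tracks.get(track)


vol = Volume({0: "a", 1: "b"})
vol.snapshot("gold")
vol.write(1, "B")
print(vol.read_snapshot("gold", 1), vol.tracks[1])
```

In this model the snapshot consumes space only for tracks that change after it is taken, which is the property that makes frequent point-in-time copies of a database volume practical.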
Benchmarking MongoDB performance
To generate useful comparison metrics, we used three of the standard YCSB random I/O workloads (A, B, and C), which are predefined within YCSB to simulate common I/O patterns in a NoSQL database environment.
Note: For more information about YCSB, refer to How to benchmark MongoDB with YCSB and How to run YCSB on MongoDB.
SnapVX provides low-impact snapshots for VMAX LUNs. We qualified this by using a stand-alone MongoDB instance with YCSB profile Workload A. Figure 9 shows the results of running a workload on a MongoDB environment both with and without SnapVX.
Figure 9. MongoDB low impact snapshots for VMAX LUN
[Figure 9 charts: "Workload A - Throughput Comparison" (ops/sec; Base versus SnapVX) and "Workload A - Latency Comparison" (us; average read and update latency, Base versus SnapVX), each over 1 hour of data at 10-second intervals.]
Testing environment configuration
The testing environment is composed of 16 virtual servers built on VMware ESXi servers:
A MongoDB sharded cluster with three shards, each shard being a three-member replica set
One MongoDB configuration server
Five YCSB client servers to perform stress testing, with Mongo instances on each YCSB client
Figure 10 shows the testing environment.
Figure 10. Test environment configuration
Workload A test case: Update heavy workload
This test case simulates an update-heavy workload that is a mix of 50 percent read operations and 50 percent write operations on a 1 TB dataset. Records were selected by using a random Zipfian distribution. An example of a real-world workload that mirrors this testing scenario is an application that tracks the activity of users at eCommerce sites and then personalizes digital advertisements based on their activity.
Figure 11 illustrates the Workload A test case results. The test results show an average throughput rate of 65129 operations per second, with an average read latency of 0.71 ms and an average update latency of 0.74 ms.
Figure 11. Workload A test results
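For readers unfamiliar with the access pattern used in Workload A, the following sketch generates a small operation stream with a 50/50 read/update mix and Zipfian record selection. It is a simplified stand-in for illustration, not YCSB itself: the function names, the skew constant of 0.99, and the record counts are assumptions chosen for the example.

```python
import random
from collections import Counter


def zipfian_weights(n_records, s=0.99):
    """Weight of each record, ranked from most to least popular (assumed skew s)."""
    return [1.0 / (rank ** s) for rank in range(1, n_records + 1)]


def generate_ops(n_ops, n_records, read_proportion, seed=42):
    """Return a list of (operation, record_index) pairs."""
    rng = random.Random(seed)
    weights = zipfian_weights(n_records)
    # Popular records (low index) are chosen far more often than unpopular ones.
    keys = rng.choices(range(n_records), weights=weights, k=n_ops)
    ops = ["read" if rng.random() < read_proportion else "update" for _ in range(n_ops)]
    return list(zip(ops, keys))


# Workload A style mix: 50 percent reads, 50 percent updates.
stream = generate_ops(n_ops=10_000, n_records=1_000, read_proportion=0.5)
print(Counter(op for op, _ in stream))
print(Counter(key for _, key in stream).most_common(3))
```

Changing read_proportion to 0.95 or 1.0 gives mixes in the style of Workloads B and C. The skewed key selection means that a small set of hot records receives a large share of the operations, while the long tail of the dataset is touched less often.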
Workload B test case: Read-mostly workload
This test case is read-intensive with minimal writes. It is set up as a mix of 95 percent read and 5 percent write operations on a 1 TB dataset.
Figure 12 illustrates the Workload B test case results. The test results show an average throughput of 87648 operations per second, with an average read latency of 0.76 ms and an average update latency of 0.80 ms.
Figure 12. Workload B test results
[Charts for Figures 11 and 12 ("Sharding - Workload A" and "Sharding - Workload B"): throughput (ops/sec), average read latency (us), and average update latency (us), plotted over 1 hour of data at 10-second intervals.]
Workload C test case: Read-only workload
This test case simulates a read-only workload, with no write I/O, on a 1 TB dataset. The dataset is distributed across, and accessed from, three MongoDB shard nodes. Because the dataset is larger than memory, the underlying storage system still received I/O requests during the test.
Figure 13 illustrates Workload C test case results. The test results show an average throughput of 94434 operations per second, with a sustained average of 0.75ms read latency, which is relatively low.
Figure 13. Workload C test results
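One way to relate the reported throughput and latency figures is Little's law, which states that the average number of operations in flight equals throughput multiplied by average latency. The sketch below applies it to the averages reported for Workloads A, B, and C. It is a rough sanity check rather than a measured result: it assumes the reported latencies are per-operation averages and that each run matched its nominal read/update mix.

```python
# (throughput in ops/sec, read share, read latency in ms, update latency in ms),
# taken from the averages reported for each workload. Workload C has no updates.
RESULTS = {
    "A": (65129, 0.50, 0.71, 0.74),
    "B": (87648, 0.95, 0.76, 0.80),
    "C": (94434, 1.00, 0.75, 0.0),
}


def implied_concurrency(throughput, read_share, read_ms, update_ms):
    """Little's law: operations in flight = throughput x mean latency."""
    mean_latency_s = (read_share * read_ms + (1 - read_share) * update_ms) / 1000.0
    return throughput * mean_latency_s


for name, result in RESULTS.items():
    print(name, round(implied_concurrency(*result), 1))
```

Under these assumptions, the results correspond to roughly 47, 67, and 71 operations in flight for Workloads A, B, and C respectively, which is a plausible order of magnitude for five multi-threaded YCSB clients.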
[Chart for Figure 13 ("Sharding - Workload C"): throughput (ops/sec) and average read latency (us), plotted over 1 hour of data at 10-second intervals.]
VMware vSphere Virtual Volumes
Virtual Volumes (VVOLs) is a key technology enabler that delivers a significantly new paradigm for how a virtualization administrator manages the underlying storage for virtual machines. This new paradigm is an important step in the VMware vision of a software-defined data center (SDDC) that delivers the quality of service expected by IT consumers. With vSphere Virtual Volumes, the management process moves from the LUN (data store) level to the virtual machine level. This level of granularity is critically important, because the virtual machine is the core component of a virtualized environment.
While VMware VVOLs simplify management and provide per-VM storage control, VMAX All Flash takes VVOL integration to a new level. The VMAX All Flash management paradigm, with radically simplified storage management, realizes the full value of VVOL storage policies. VMAX All Flash provides the highest levels of availability, data protection, and performance directly to the VM.
Planning storage for a MongoDB deployment in a virtualized environment is not an easy task, especially when mixed types of concurrent workloads run on the same data store. VMAX All Flash simplifies this planning. By using the vSphere Virtual Volumes Dashboard, a centralized location that is provided by Unisphere for VMAX to monitor and manage Virtual Volumes, a storage administrator can configure storage containers with different Service Level Objectives (SLOs) to meet the requirements of MongoDB and virtualization administrators, as shown in Figure 14.
Figure 14. Adding multiple storage resources with different SLOs to a storage container
As well as an easy-to-use dashboard, VMAX All Flash also delivers uncompromised performance with Virtual Volumes compared to traditional LUN provisioning. We qualified this performance with a stand-alone MongoDB deployment virtual machine based on vSphere Virtual Volumes and using YCSB Workload A. See Figure 15.
Figure 15. Workload A throughput and latency comparison
Data at Rest Encryption
D@RE provides built-in, hardware-based, on-array, back-end encryption for the VMAX family with no performance impact on the array. It protects information from unauthorized access when drives or arrays are removed from the customer data center. D@RE provides encryption on the back end by using SAS I/O modules that incorporate XTS-AES 256-bit data-at-rest encryption. These modules encrypt and decrypt data as it is being written to or read from a drive. All configured drives are encrypted, including data drives and spares. Also, all array data is encrypted, including Symmetrix® File System and Vault contents. Alternative encryption methods that are available today include costly third-party software that must be managed and can introduce performance degradation. Figure 16 illustrates the D@RE architecture.
Figure 16. D@RE architecture
With the VMAX D@RE feature, MongoDB data is protected against unauthorized access and against threats related to the physical removal of media. Based on our test that used a stand-alone MongoDB instance with YCSB profile Workload A, there was no significant impact on MongoDB performance after we enabled D@RE.
Figure 17 shows the results of running a workload on a MongoDB environment both with and without D@RE. We can see an average throughput rate of 65441 operations per second with D@RE, compared to 66686 operations per second without D@RE, a difference of about 1.9 percent. Also, there is a sustained average update latency of 0.48 ms.
Figure 17. D@RE test results
[Charts for Figure 17: "Workload A - Throughput Comparison" (ops/sec; Base versus D@RE) and "Workload A - Latency Comparison" (us; average read and update latency, Base versus D@RE), each over 1 hour of data at 10-second intervals.]
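The size of the throughput difference between the two D@RE runs can be checked with simple arithmetic. The sketch below uses the two average throughput values reported for this test (66686 operations per second without D@RE and 65441 with D@RE); the percent_change helper is an illustrative name, and the calculation says nothing about run-to-run variance, which the reported averages do not include.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Average throughput in ops/sec from the D@RE test: baseline run, then D@RE enabled.
throughput_delta = percent_change(66686, 65441)
print(f"Throughput change with D@RE enabled: {throughput_delta:.2f}%")
```

A change of this size, under 2 percent between two one-hour runs, is consistent with the conclusion that enabling D@RE has little effect on MongoDB throughput.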
Conclusion
Summary
Our test results demonstrate high throughput and low latency. When MongoDB is consolidated with a customer’s existing mission-critical applications on VMAX All Flash, customers can take advantage of these benefits:
Ease of management
Unisphere for VMAX provides a common user experience across storage platforms. It enables users to provision, manage, and monitor a VMAX All Flash environment easily. Unisphere provides a number of task-oriented dashboards that make monitoring and configuring a VMAX system intuitive.
By using VMware Virtual Volumes, virtualization administrators can easily manage the underlying storage on VMAX for any virtual server used by the MongoDB system.
High performance through flash-optimized design
VMAX offers flash drives as add-ons to traditional arrays, eliminating bottlenecks to deliver the highest performance and the lowest latency. In addition, EMC Fully Automated Storage Tiering (FAST®) technology and high-capacity NL-SAS drives reduce the cost of storing inactive, less-critical data.
Efficient data replication
VMAX TimeFinder SnapVX allows the user to create snapshots without the need for a target volume. Snapshots can then be linked to target volumes in either full-copy or no-copy mode, and the target volumes can be presented to the host server.
By leveraging SnapVX, users can easily create copies of MongoDB production data for backups, decision support, data warehouse refreshes, or any other process that requires parallel access to production data.
Advanced data encryption
With D@RE, data is encrypted on all drive types without a performance penalty. D@RE secures corporate data on drives both inside and outside the VMAX array, providing protection against data theft, which is a significant challenge faced by many enterprises today.