Rackspace Analytical Compute Grid (ACG)
DESCRIPTION
Rackspace’s Enterprise Business Intelligence group (EBI) was looking for a cost-effective way to support the reporting and information needs of its internal users, including business and operations personnel. It also needed to scale out new infrastructure to meet increasing business demands, house growing amounts of data, and customize data collection, while moving away from its legacy data warehouse solution. To do this, Rackspace built the Analytical Compute Grid (ACG) using Hadoop, Cassandra, and PostgreSQL on an OpenStack cloud. Read more about it in this presentation.
April 12, 2023
Analytical Compute Grid (ACG)
Elastic “Big Data” Infrastructure
by Natasha Gajic
Big Data on Open Cloud
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Rackspace’s EBI Environment
Current Environment
• Windows and Linux operating systems
• Oracle and Microsoft database solutions
• Microsoft and Oracle replication technology
• SSIS
• Informatica
• Dedicated servers
• Rapid data set growth
“Big Data” Problem
• Cost of purchasing additional licenses
• Time required to set up new hardware
• Increased demand for DBA resources
• System performance
• System scalability
• Capacity
Analytical Compute Grid (ACG) Features
• Host an ever-growing data set
• Quick data collection and retrieval
• Rapid scalability
• Ease of maintenance
• Provide a standard data access API
Analytical Compute Grid (ACG) Features
• Ability to provide a variety of storage types:
 • Columnar
 • Relational
 • HDFS
• Enable users to select the optimal storage type for the information collected
• Leverage Rackspace® Private Cloud powered by OpenStack® and open source technology
Analytical Compute Grid (ACG) Quality Attributes
ACG on Rackspace® Private Cloud powered by OpenStack®
High-Level Architecture
Image
Database Engine Selection
• Columnar – Cassandra
• Relational – PostgreSQL
• HDFS – Hadoop
Node
Controller
API
Indexing Structure
What is the ACG Indexing Structure?
• System entry point
• Set of pointers ultimately addressing database entities
Where is the Indexing Structure located?
• It is part of ACG, so it resides on the Open Cloud
• The ACG Controller manages the Indexing Structure
What does the ACG Indexing Structure enable?
• Splitting of large data sets across many instances
• Query parallelization
• Controlled data store size
• Optimal data store configuration
• Uniform access to data residing in various storage types
• System scalability, as it expands horizontally and vertically to address an ever-growing data set
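The deck describes the indexing structure only as an entry point made of pointers to database entities. As a rough illustration of that idea, the sketch below models it as a sorted list of key ranges, each pointing at one data store instance, which is enough to show how a range query can be split across instances of different storage types. The class names, the integer key ranges, and the node names are hypothetical; the actual ACG layout is not described.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical pointer from a half-open key range [low, high) to a data store instance."""
    low: int
    high: int
    storage_type: str  # e.g. "columnar", "relational", "hdfs"
    node: str          # hypothetical instance identifier


class IndexingStructure:
    """Hypothetical entry point mapping key ranges to data store instances."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e.low)
        self.lows = [e.low for e in self.entries]

    def lookup(self, key):
        """Return the entry whose range contains key, or None if no range covers it."""
        i = bisect.bisect_right(self.lows, key) - 1
        if i >= 0 and key < self.entries[i].high:
            return self.entries[i]
        return None

    def route(self, low, high):
        """Return every entry overlapping [low, high) so a query can be split across instances."""
        return [e for e in self.entries if e.low < high and low < e.high]


index = IndexingStructure([
    IndexEntry(0, 1000, "columnar", "cassandra-a"),
    IndexEntry(1000, 2000, "columnar", "cassandra-b"),
    IndexEntry(2000, 3000, "relational", "postgres-a"),
])
print(index.lookup(1500).node)                    # cassandra-b
print([e.node for e in index.route(900, 2100)])   # ['cassandra-a', 'cassandra-b', 'postgres-a']
```

Because callers only ever talk to the index, adding an instance means adding an entry rather than changing client code, which is one plausible reading of the "uniform access" and "horizontal scalability" points above.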
Quality Attributes
Quality Attributes - Performance
Rackspace® Private Cloud powered by OpenStack®
• Creates an ACG node in 30 seconds
• Creates ACG nodes concurrently
• Resizes ACG nodes by adding CPUs
ACG
Indexing structure and controlled data set size allow for:
• Quick data distribution
• Query parallelization
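Query parallelization over size-controlled partitions usually follows a scatter/gather pattern: send the same query to every partition at once, then merge the partial results. The sketch below shows that pattern with Python's `concurrent.futures`; the in-memory `PARTITIONS` dictionary and `query_partition` function are hypothetical stand-ins for real data store nodes, not part of ACG.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for data store instances, each holding a
# size-controlled slice of the data set as (timestamp, value) rows.
PARTITIONS = {
    "node-1": [(1, 10.0), (2, 12.0), (3, 11.0)],
    "node-2": [(4, 9.0), (5, 14.0)],
    "node-3": [(6, 13.0), (7, 8.0), (8, 15.0)],
}


def query_partition(node, min_value):
    """Run the same filter on one partition (a real system would send SQL or CQL to the node)."""
    return [row for row in PARTITIONS[node] if row[1] >= min_value]


def parallel_query(min_value, max_workers=4):
    """Scatter the query to every partition concurrently, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_partition, node, min_value) for node in PARTITIONS]
        partials = [f.result() for f in futures]
    # Merge step: concatenate partial results and restore a global ordering by timestamp.
    return sorted(row for part in partials for row in part)


print(parallel_query(12.0))  # [(2, 12.0), (5, 14.0), (6, 13.0), (8, 15.0)]
```

Keeping every partition small means each branch of the fan-out finishes in a similar, predictable time, so the slowest partition does not dominate the overall query.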
Quality Attributes – Availability
Rackspace® Private Cloud powered by OpenStack®
• Rapidly replaces failed ACG nodes
ACG
Deploys each data store's native availability mechanisms (replication, data distribution…)
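The deck only says ACG relies on each data store's native availability mechanisms. For Cassandra, one such mechanism is replica placement on a token ring. The sketch below loosely follows the idea behind Cassandra's SimpleStrategy (the first replica goes on the first node clockwise from the key's token, further replicas on the following nodes); it is a simplification with one token per node and MD5 hashing chosen for illustration, not Cassandra code, and the node names and positions are made up.

```python
import bisect
import hashlib

RING_SIZE = 2**32


def token(key):
    """Map a key to a position on the ring (MD5-based hashing, for illustration only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


class Ring:
    """Simplified token ring with one token per node."""

    def __init__(self, node_tokens, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((pos, node) for node, pos in node_tokens.items())
        self.positions = [pos for pos, _ in self.ring]

    def replicas(self, key):
        """First node clockwise from the key's token, then the next rf - 1 nodes (wrapping)."""
        start = bisect.bisect_left(self.positions, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def live_replicas(self, key, failed_nodes):
        """Replicas that can still serve the key while failed nodes are being replaced."""
        return [node for node in self.replicas(key) if node not in failed_nodes]


ring = Ring({"n1": 0, "n2": 2**30, "n3": 2**31, "n4": 3 * 2**30})
owners = ring.replicas("device-42")
print(owners, ring.live_replicas("device-42", {owners[0]}))
```

In this sketch, losing one node still leaves two copies of every key it held, which is what allows a failed node to be swapped for a fresh instance without losing data.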
Quality Attributes – Maintainability
Rackspace® Private Cloud powered by OpenStack®
Adding ACG nodes expands:
• Storage capacity
• CPU power
• Memory
No DBA or system administrator activity required
ACG
Controlled data set size enables:
• Optimal and stable data store configuration
• Reduced demand for managing data store objects
• Stable query execution plans
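One way to read "controlled data set size" is that no single data store may grow past a fixed limit; once it is full, new data goes to a freshly provisioned node instead of an ever-larger database. The sketch below assumes a row-count cap and a hypothetical `NodeManager` that provisions stores; the deck states neither the limit nor the mechanism, so both are illustrative only.

```python
from dataclasses import dataclass, field

MAX_ROWS_PER_STORE = 1000  # hypothetical cap; the deck gives no actual limit


@dataclass
class DataStore:
    name: str
    rows: int = 0


@dataclass
class NodeManager:
    """Hypothetical stand-in for the ACG Node Manager: provisions stores on new nodes."""
    stores: list = field(default_factory=list)

    def provision(self):
        store = DataStore(name=f"store-{len(self.stores) + 1}")
        self.stores.append(store)
        return store


def load(manager, row_count):
    """Place row_count new rows, opening a new store whenever the current one is full."""
    current = manager.stores[-1] if manager.stores else manager.provision()
    remaining = row_count
    while remaining > 0:
        if current.rows >= MAX_ROWS_PER_STORE:
            current = manager.provision()
        take = min(remaining, MAX_ROWS_PER_STORE - current.rows)
        current.rows += take
        remaining -= take
    return manager.stores


stores = load(NodeManager(), 2500)
print([(s.name, s.rows) for s in stores])
# [('store-1', 1000), ('store-2', 1000), ('store-3', 500)]
```

Since every store stays within the same size band, the same configuration and query plans keep working as data grows, which matches the stability benefits listed above.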
Quality Attributes – Flexibility
ACG
Variety of storage types:
• Columnar – Cassandra: time series data
• Relational – PostgreSQL: relational data
• HDFS – Hadoop: unstructured data
Ability to select the optimal storage type for each use case
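The mapping above can be written as a simple selection rule. The sketch below encodes only the three pairings the slide lists (time series to Cassandra, relational to PostgreSQL, unstructured to Hadoop); the function, its two boolean traits, and the example data set names are hypothetical, and the deck does not say how the ACG Rule Engine actually decides.

```python
from enum import Enum


class StorageType(Enum):
    COLUMNAR = "Cassandra"
    RELATIONAL = "PostgreSQL"
    HDFS = "Hadoop"


def select_storage(structured: bool, time_series: bool) -> StorageType:
    """Hypothetical selection rule mirroring the slide's data-kind-to-engine mapping."""
    if not structured:
        return StorageType.HDFS        # unstructured data -> HDFS (Hadoop)
    if time_series:
        return StorageType.COLUMNAR    # time series data -> columnar (Cassandra)
    return StorageType.RELATIONAL      # relational data -> PostgreSQL


# Hypothetical data sets described by two traits each
datasets = {
    "monitoring_checks": {"structured": True, "time_series": True},
    "customer_accounts": {"structured": True, "time_series": False},
    "support_ticket_text": {"structured": False, "time_series": False},
}
for name, traits in datasets.items():
    print(name, "->", select_storage(**traits).value)
```

A real rule set would likely weigh more traits (query patterns, retention, volume), but even this two-question version shows why the choice is made per use case rather than once for the whole warehouse.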
Quality Attributes – Usability
ACG
Standard interfaces:
• SQL language
• JDBC API
• ODBC
ACG Management Console
ACG Monitoring Console
Loader utility implementing:
• Bulk Loader
• Insert Loader
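The deck does not describe how the two loader modes differ internally. A common distinction, sketched below with Python's built-in `sqlite3` standing in for the ACG data stores, is row-at-a-time inserts versus batched inserts. The table layout, function names, and batch size are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical monitoring rows: (id, device, availability_percent)
ROWS = [(i, f"device-{i % 3}", 99.5) for i in range(1000)]
INSERT_SQL = "INSERT INTO metrics VALUES (?, ?, ?)"


def new_store():
    """In-memory SQLite database standing in for an ACG data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (id INTEGER, device TEXT, availability REAL)")
    return conn


def insert_loader(conn, rows):
    """Row-at-a-time loading: one statement and one commit per record."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def bulk_loader(conn, rows, batch_size=500):
    """Batched loading: one executemany call and one commit per batch of rows."""
    for start in range(0, len(rows), batch_size):
        conn.executemany(INSERT_SQL, rows[start:start + batch_size])
        conn.commit()


for loader in (insert_loader, bulk_loader):
    conn = new_store()
    loader(conn, ROWS)
    count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
    print(loader.__name__, count)
```

Batching cuts the number of statements and commits, which usually matters most for large initial loads; row-at-a-time inserts tend to suit small, incremental feeds.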
Current State
ACG Controller
• ACG Manager
• Rule Engine
• Node Manager
• ACG Management Console
• ACG Monitoring
Columnar Implementation
• Data Store Controller
• JDBC driver extended to work with supercolumns
• Loader integrated with Informatica
Relational Implementation
• Data Store Controller
• JDBC driver extended with distributed query rewrite
• Loader integrated with Informatica
• ODBC (in progress)
HDFS Implementation
• Will start soon
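The deck mentions a JDBC driver "extended with distributed query rewrite" without showing the rewrite rules. A textbook example of such a rewrite is turning `AVG` into per-node `SUM` and `COUNT`, because averaging the per-node averages is wrong when nodes hold different numbers of rows. The sketch below is Python rather than a JDBC driver, uses `sqlite3` in-memory databases as stand-in nodes, and its function names are hypothetical.

```python
import sqlite3


def rewrite_avg(table, column, where=""):
    """Rewrite SELECT AVG(column) into a per-node query returning partial aggregates.

    Identifiers are interpolated without escaping; illustration only.
    """
    clause = f" WHERE {where}" if where else ""
    return f"SELECT SUM({column}), COUNT({column}) FROM {table}{clause}"


def combine_avg(partials):
    """Merge (sum, count) pairs from every node into the global average."""
    total = sum(s for s, _ in partials if s is not None)  # SUM is NULL on an empty node
    count = sum(c for _, c in partials)
    return total / count if count else None


def make_node(values):
    """In-memory SQLite database standing in for one relational ACG node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (availability REAL)")
    conn.executemany("INSERT INTO metrics VALUES (?)", [(v,) for v in values])
    return conn


nodes = [make_node([10.0]), make_node([20.0, 20.0, 20.0])]
sql = rewrite_avg("metrics", "availability")
partials = [node.execute(sql).fetchone() for node in nodes]
print(combine_avg(partials))  # 17.5, whereas averaging the per-node averages gives 15.0
```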
Rackspace Use Case
• Subject:
 • A complex availability calculation that sources 3 months of monitoring data and creates 1 billion records in the initial calculation
• Environment 1
 • Data warehouse on a Microsoft SQL Server database
 • SSIS data loading
 • SQL Server with 24 CPUs and 250 GB RAM dedicated to the initial calculation
 • SQL Server stored procedure performed the calculation
 • Source and result stored in a traditional data warehouse structure
• Environment 2
 • ACG running two Cassandra clusters of 4 nodes each
 • Informatica with the Cassandra bulk loader
 • Each ACG node has 2 CPUs and 8 GB RAM
 • Java program running on an instance with 4 CPUs and 8 GB RAM
 • Source and result stored in a columnar structure suitable for time series data
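The deck does not publish the availability formula, only that it is complex. As a deliberately simplified illustration of why a time-ordered, per-device layout keeps the calculation straightforward, the sketch below computes the share of time a device was up from status samples. It assumes each sample's status holds until the next sample, and the row format and function names are hypothetical; it is also Python, whereas the deck's calculation was a Java program.

```python
from collections import defaultdict


def availability(samples, period_end):
    """Fraction of time a device was up, given (timestamp, is_up) samples sorted by time.

    Assumes each sample's status holds until the next sample (or period_end).
    """
    if not samples:
        return None
    up = 0
    boundaries = samples[1:] + [(period_end, None)]
    for (ts, is_up), (next_ts, _) in zip(samples, boundaries):
        if is_up:
            up += next_ts - ts
    total = period_end - samples[0][0]
    return up / total if total > 0 else None


def availability_by_device(rows, period_end):
    """Group (device, timestamp, is_up) rows into one time series per device, then compute each."""
    series = defaultdict(list)
    for device, ts, is_up in rows:
        series[device].append((ts, is_up))
    return {device: availability(sorted(s), period_end) for device, s in series.items()}


rows = [("web-1", 0, True), ("web-1", 60, False), ("web-1", 90, True), ("db-1", 0, True)]
print(availability_by_device(rows, period_end=120))  # {'web-1': 0.75, 'db-1': 1.0}
```

In a columnar store each device's samples can already sit together in time order, so the per-device grouping and sort above would largely come for free at read time.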
Rackspace Use Case - Result
• Calculation duration
 • Microsoft SQL Server: 5 days
 • ACG: 3.5 hours
• Storage size
 • Microsoft SQL Server: 500 GB
 • ACG: 20 GB
• Complexity of the calculation
 • A columnar data store is optimal for time series data. Sourcing from a columnar data store resulted in a relatively simple Java calculation process compared to the SQL Server stored procedure.
Rackspace Use Case - Conclusion
• Selecting the optimal data store for the use case resulted in:
 • Substantial performance improvement
 • Reduced storage demand
 • Simplified processes
 • Ability to process terabytes of data per day, close to real time and on demand
• Improved trending and reporting:
 • Enhanced support capabilities
 • Improved Rackspace customer experience
• Significant cost reduction
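The headline figures from the use case can be checked with simple arithmetic. The sketch only restates numbers given in the deck (5 days vs 3.5 hours, 500 GB vs 20 GB, and the hardware listed for each environment); it says nothing about cost, which the deck does not quantify.

```python
# Figures as stated in the deck
sql_server_hours = 5 * 24          # Environment 1: 5 days
acg_hours = 3.5                    # Environment 2: 3.5 hours
sql_server_gb, acg_gb = 500, 20    # storage size

# Environment 2 hardware: two Cassandra clusters of 4 nodes (2 CPUs, 8 GB RAM each)
# plus one 4-CPU, 8 GB RAM instance running the Java calculation
acg_cpus = 2 * 4 * 2 + 4
acg_ram_gb = 2 * 4 * 8 + 8

print(f"speedup: {sql_server_hours / acg_hours:.1f}x")       # speedup: 34.3x
print(f"storage reduction: {sql_server_gb / acg_gb:.0f}x")   # storage reduction: 25x
print(f"ACG: {acg_cpus} CPUs, {acg_ram_gb} GB RAM vs SQL Server: 24 CPUs, 250 GB RAM")
```

On these figures, ACG used fewer CPUs and far less RAM than the dedicated SQL Server (20 vs 24 CPUs, 72 GB vs 250 GB), so the speedup did not come from adding hardware.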
RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM
ACG UI