Real-Time Big Data Analytics for the Enterprise
SAP HANA* and the Intel® Distribution for Apache Hadoop* Software

White Paper
Intel® Distribution for Apache Hadoop*
Big Data

Executive Summary

Companies are using real-time big data analytics to reshape the competitive landscape in their industries. They do it by capturing, storing, and analyzing volumes and varieties of data that were previously unmanageable, and then extracting insights fast enough to support real-time business processes. What started with a few leading Internet companies has spread to finance, healthcare, government, manufacturing, retail, scientific research, and many other fields.

Yet implementing real-time big data analytics can be challenging, requiring IT organizations to implement mission-critical solutions based, at least in part, on open-source software that does not always meet enterprise requirements. Not only is integration complex, but IT organizations must establish security, compliance, and high availability from the ground up to ensure the system is up to the challenge of housing sensitive data and supporting revenue-generating business processes.

Intel and SAP have addressed these challenges to provide an enterprise-ready solution for real-time big data analytics. With SAP HANA* running on the latest Intel® Xeon® processor E7 family and the Intel® Distribution for Apache Hadoop* software running on the latest Intel Xeon processor E5 family, businesses can ingest, store, and analyze petabytes of polystructured data, and they can generate insights in fractions of a second to support real-time business processes.

This solution includes a rich set of data management and business intelligence tools for turning data into high-value insights that can be embedded into other applications and business processes. Just as importantly, the solution is designed to meet enterprise requirements of security, compliance, and high availability so businesses can confidently integrate sensitive data into their analytics environment.

This white paper discusses the value of performing real-time analytics using all available enterprise data and describes how Intel and SAP have overcome the inherent challenges to deliver an enterprise-ready solution.



Table of Contents

Executive Summary
Extending Real-Time Analytics to All Enterprise Data
Solving the Challenges of Big Data Integration
  Advanced Analytics across All Data Sets
  Industry-Leading Performance for Apache Hadoop
  Integrated Data Management
An Enterprise-Ready Platform
  End-to-End Security
  High Availability
  Enterprise-Class Manageability
SAP and Intel: A Shared Vision for Big Data Integration
SAP: Single Point of Contact for Service and Support
Conclusion


Extending Real-Time Analytics to All Enterprise Data

Advances in data analytics are changing the way businesses compete, enabling them to make faster and better decisions based on real-time analysis. Until recently, companies had to make tradeoffs between deep analysis of large data sets and fast time to results. Intel and SAP are eliminating the need to compromise with an analytics platform designed to deliver real-time query performance while acting on petabytes of both structured and unstructured data.

SAP HANA provides a real-time analytics platform using an in-memory database. Organizations can combine large data sets from their operational systems and other sources and perform complex queries in real time, typically in milliseconds. They can even use a single SAP HANA instance as a common foundation for all their applications, both transactional and analytical. This approach streamlines infrastructure and eliminates the physical and operational complexities of moving large amounts of data from operational systems to analytic systems. With these capabilities, SAP HANA answers the business challenge of delivering data-driven intelligence to support real-time business processes.

Big data introduces a new set of challenges. Companies generate enormous volumes of polystructured data from Web logs, sensors, call records, social network posts, emails, and many other sources. They need a cost-effective, massively scalable solution for capturing, storing, and analyzing this data. They also need to integrate their big data into their real-time analytics environment to maximize business value. For example, many companies want to analyze the clickstream trails of online customers in combination with historical purchasing patterns to deliver personalized offers and information. Deep analysis across diverse data sets can improve outcomes in such scenarios, but the results must arrive quickly enough to influence the online transaction.

Intel and SAP have collaborated to meet this challenge by integrating the Intel Distribution for Apache Hadoop (IDH) software with SAP HANA, SAP Data Services, and SAP Business Objects. The result is a real-time analytics platform designed to efficiently ingest, store, integrate, and analyze all enterprise data. The platform offers:

• Real-time analytics with cost-effective storage that can scale to petabytes, and potentially exabytes, of data.

• Transparent data integration and query federation, so advanced analytics can be applied across all data using SAP tools and familiar SQL-based programming models.

• Enterprise-class support for security, compliance, and manageability so businesses can realize the advantages of real-time big data analytics more quickly and with reduced cost and risk.


Solving the Challenges of Big Data Integration

SAP HANA is known for its unmatched query performance at scale. Intel collaborated with SAP engineers to help them optimize their in-memory processing platform to get maximum benefit from the hardware capabilities of the Intel Xeon processor E7 family, including its multicore architecture, large cache, large memory capacity and high-bandwidth I/O channels.

Based on these efforts, SAP HANA speeds query processing times by as much as 10,000 times¹ versus traditional data warehouse solutions. The latest Intel Xeon processor E7 v2 family delivers even greater performance benefits and can process much larger in-memory data sets. These new processors support three times more memory than previous-generation processors: up to 6 TB on a four-socket server and up to 12 TB on an eight-socket server. They also provide more cores, threads, and system bandwidth to enable up to 2x faster performance² for complex, ad hoc queries, compared to previous-generation SAP HANA platforms.
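As a quick check on the "up to 2x" figure, the two throughput scores reported in footnote 2 can be compared directly. The short sketch below uses only numbers quoted in this paper; the variable names are illustrative.

```python
# Throughput scores reported in footnote 2 (queries per hour).
baseline_qph = 110_061        # four Intel Xeon processors E7-4870
new_generation_qph = 218_406  # four Intel Xeon processors E7-4890 v2

speedup = new_generation_qph / baseline_qph
print(f"Speedup: {speedup:.2f}x")

# Memory ceilings quoted in the text (TB per server), and the
# per-socket capacity they imply.
max_memory_tb = {4: 6, 8: 12}
per_socket_tb = {sockets: tb / sockets for sockets, tb in max_memory_tb.items()}
print(per_socket_tb)
```

The measured ratio comes out just under 2, which is consistent with the "up to 2x" wording, and both server sizes imply the same capacity per socket.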

The distributed architecture of Apache Hadoop addresses very different requirements than SAP HANA. Hadoop enables query performance and data capacity to be scaled cost-effectively across tens to hundreds of standard, two-socket servers based on Intel Xeon processors and configured with direct-attached storage drives. This clustered architecture stores and processes data at a cost-per-terabyte that is far lower than traditional data warehousing systems.

Although Hadoop enables fast processing of massive data sets, queries typically take minutes to hours to complete. This creates challenges when integrating Hadoop into a real-time analytics environment. Intel and SAP address these challenges in two ways. First, IDH is highly optimized for performance on Intel® architecture (see sidebar). Second, Intel and SAP make it easy to generate queries that make efficient use of both platforms.

Advanced Analytics across All Data Sets

SAP HANA and SAP Business Objects provide comprehensive support for advanced analytics, including traditional SQL-based queries, dashboards, predictive analytics, planning, text mining, and more. In combination with IDH, these models can be applied transparently across the data stored in both platforms.

BI users and developers see data stored in IDH as an extension of the data stored in SAP HANA. The queries they generate are automatically federated, as appropriate, across the two platforms. For example, one part of a query might extract customer purchasing data from SAP HANA; another part might search associated Web server logs or call center data records in the Hadoop cluster. The results are then combined and further analyzed in SAP HANA to provide desired insights. As part of this query federation process, some components of the SQL queries generated by BI users and developers are automatically translated into MapReduce* applications that can run natively in Hadoop.

The separate parts of a federated query can be performed simultaneously. They can also be performed asynchronously, so that intermediate results from the Hadoop cluster are available as needed to support real-time processes in SAP HANA. Query performance statistics are provided, so developers can shape queries to address specific latency requirements.
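The federation pattern described above can be illustrated with a minimal Python sketch. It does not use any SAP HANA or Hadoop API: the two data sets, the function names, and the join rule are all hypothetical stand-ins, chosen only to show two partial queries running concurrently and being combined in a final step.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two platforms: a small in-memory table
# of purchases and a larger collection of raw Web log records.
PURCHASES = [
    {"customer": "c1", "total": 120.0},
    {"customer": "c2", "total": 45.5},
]
WEB_LOGS = [
    {"customer": "c1", "page": "/shoes"},
    {"customer": "c1", "page": "/boots"},
    {"customer": "c2", "page": "/hats"},
    {"customer": "c3", "page": "/home"},
]


def query_in_memory_store():
    """Part 1: purchasing totals per customer (the in-memory side)."""
    return {row["customer"]: row["total"] for row in PURCHASES}


def query_cluster():
    """Part 2: page views per customer (the cluster side)."""
    views = {}
    for record in WEB_LOGS:
        views[record["customer"]] = views.get(record["customer"], 0) + 1
    return views


def federated_query():
    """Run both parts concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        purchases_future = pool.submit(query_in_memory_store)
        views_future = pool.submit(query_cluster)
        purchases = purchases_future.result()
        views = views_future.result()
    # Final combination step: join on customer, keeping known purchasers.
    return {
        customer: {"total": total, "page_views": views.get(customer, 0)}
        for customer, total in purchases.items()
    }


result = federated_query()
print(result)
```

In this sketch the slower part simply blocks the final join. A design that needs intermediate results as they arrive, as the text describes, would consume each future as it completes instead.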

Industry-Leading Performance for Apache Hadoop*

The Intel® Distribution for Apache Hadoop* (IDH) software is optimized with the latest Intel® Xeon® processors, Intel® Solid-State Drives, and 10 gigabit Intel® Ethernet Adapters to deliver:

• Up to 30x higher performance than unoptimized Hadoop software running on legacy hardware.³

• Up to 2.6x faster performance than other open-source Hadoop distributions running on the same hardware platform.⁴

Additional optimizations within IDH help to improve performance for other key functions, such as MapReduce* job launches and Hive* queries. (Hive provides data-warehouse-like functionality for Hadoop environments and is a key component for integrating the Intel Distribution with SAP HANA*.)

These and other optimizations help to shorten query completion times. They also allow organizations to perform more queries in the time available, which provides greater agility and better utilization of the infrastructure.


Much of this functionality is supported through the SAP HANA Smart Data Access connector, which Intel and SAP have optimized for use with IDH (Figure 1). This connector supports data relocation as well as the creation of proxy tables within SAP HANA to simplify and accelerate data access and query execution.

Intel implemented a number of optimizations to improve query performance on Apache Hadoop. One example is hot replication, in which multiple replicas of frequently used data are automatically created to avoid contention. Suppose a company launches a popular new product, and the associated data is under continuous demand. Dozens or even hundreds of replicas can be generated so the data can be accessed and manipulated without bottlenecks.
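The idea behind hot replication can be sketched as a simple policy that raises the replica count of a data block as its read count grows. The paper does not describe the actual policy used in IDH, so the class name, thresholds, and cap below are assumptions made purely for illustration.

```python
from collections import Counter


class HotReplicator:
    """Toy model: add replicas for blocks that are read frequently."""

    def __init__(self, base_replicas=3, reads_per_extra_replica=100,
                 max_replicas=10):
        # All three parameters are illustrative assumptions.
        self.base_replicas = base_replicas
        self.reads_per_extra_replica = reads_per_extra_replica
        self.max_replicas = max_replicas
        self.read_counts = Counter()

    def record_read(self, block_id, count=1):
        """Note that a block has been read `count` more times."""
        self.read_counts[block_id] += count

    def replica_count(self, block_id):
        """Base replicas plus one per threshold of reads, up to the cap."""
        extra = self.read_counts[block_id] // self.reads_per_extra_replica
        return min(self.base_replicas + extra, self.max_replicas)


replicator = HotReplicator()
replicator.record_read("new_product", 450)
replicator.record_read("archive_2009", 5)
print(replicator.replica_count("new_product"))
print(replicator.replica_count("archive_2009"))
```

With these assumed settings the heavily read block ends up with more replicas than the rarely read one, so concurrent readers can be spread across more copies.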

Another performance-enhancing feature is caching. Frequently used data and intermediate query results are automatically stored in the in-memory database of SAP HANA, so they can be accessed almost instantly when needed.
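The effect of caching query results can be shown with a small memoization sketch. This is not how SAP HANA implements its cache; `run_on_cluster`, the sample query text, and the cache size are hypothetical, and the standard-library `functools.lru_cache` stands in for the in-memory store.

```python
import time
from functools import lru_cache

# Counts how often the slow path is actually taken.
CALLS = {"slow_backend": 0}


def run_on_cluster(query):
    """Hypothetical stand-in for a slow batch query."""
    CALLS["slow_backend"] += 1
    time.sleep(0.01)  # simulate latency
    return f"result of {query!r}"


@lru_cache(maxsize=128)
def cached_query(query):
    """Return a cached result when the same query has been seen before."""
    return run_on_cluster(query)


QUERY = "SELECT region, COUNT(*) FROM web_logs GROUP BY region"
first = cached_query(QUERY)
second = cached_query(QUERY)
print(first == second, CALLS["slow_backend"])
```

The second call is answered from memory, so the slow backend runs only once for the repeated query.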

With these and other optimizations, Intel and SAP help to make the integration between SAP HANA and IDH as seamless and as transparent as possible for BI users and developers.

Figure 1. The SAP HANA* Smart Data Access connector has been engineered and optimized by Intel and SAP to simplify and accelerate data sharing and query execution across both platforms. As a result, analysts can achieve fast query results across petabytes of structured and unstructured data.

[Figure 1 diagram: Real-Time Analytics with Big Data Integration. Labels in the diagram: big data sources (weather, market, and location data; Web, call, and sensor logs); the Intel® Distribution for Apache Hadoop* software with its open source components (HDFS*, YARN* + MapReduce*, Pig*, Mahout*, R*, Hive*, HCatalog*, Sqoop*, Oozie*, HBase*, ZooKeeper*, Flume*), connectors (ingest, export), and Intel® Manager for Apache Hadoop* software (deployment, configuration, monitoring, alerts, and security); SAP HANA Smart Data Access (real time; optimized for data relocation and for query federation and acceleration through proxy tables, hot replication, and caching); SAP Data Services (ETL); SAP HANA*; and SAP Business Objects (OLAP analysis, data mining, reporting). A legend distinguishes components with some Intel optimization from those with extensive Intel optimization.]


Integrated Data Management

SAP Data Services provides an integrated, enterprise-class platform for data integration, data quality, data profiling, and metadata management. System administrators can use it to load and manage data across both SAP HANA and IDH for SAP. They can also use it to manage data that has been loaded independently into the Hadoop cluster.

An Enterprise-Ready Platform

SAP HANA is engineered specifically to support mission-critical computing environments. Intel implements advanced security and reliability features in the Intel Xeon processor E7 family and related platform components, and works with SAP to ensure they are fully utilized throughout the SAP HANA solution stack.

Apache Hadoop, on the other hand, is an open-source software application that combines features and optimizations generated by many companies and individuals. This development model enables exceptionally fast innovation, which is evidenced by the rapid evolution of the Hadoop software ecosystem. However, because of this rapid evolution, there are gaps in most available Hadoop distributions, particularly with respect to security, availability, and manageability. These gaps have kept many businesses from deploying Hadoop in production environments.

Intel has worked to close those gaps in IDH. IDH includes the full open source solution stack, with all components pre-integrated and optimized to improve performance on Intel architecture. Intel also integrates a combination of open source and proprietary tools to provide a platform that addresses the requirements of enterprise deployments.

Figure 2. The Intel® Distribution for Apache Hadoop* includes extensive enhancements for enterprise-class security and compliance, and Intel is working on Project Rhino to establish a comprehensive security framework across the Hadoop* ecosystem. The goal is to provide a common authentication and authorization framework with integrated support for regulatory requirements in financial, healthcare, government, and e-commerce environments.

[Figure 2 diagram: Project Rhino: establishing comprehensive security for Apache Hadoop*. Component diagram of the Intel® Distribution for Apache Hadoop and Intel® Manager, with a legend distinguishing Intel proprietary components, Intel-optimized open source components, and components that include Intel security enhancements.]

End-to-End Security

IDH provides end-to-end security to protect data. Tools and capabilities include:

• Authentication and Access Control. IDH supports user authentication and role-based access controls. Queries generated in SAP Business Objects are authenticated just once for both SAP HANA and IDH, and IDH provides granular access controls for data and services. Users and queries can only access authorized data sets, which helps to protect sensitive data against both internal threats and external hackers.


• Fast, transparent data encryption. IDH uses Intel® Data Protection Technology with Advanced Encryption Standard New Instructions⁵ (AES-NI), which accelerates encryption and decryption performance by up to 19 times⁶, to enable strong data protection without compromising query performance. Data can be encrypted selectively and transparently, both in motion and at rest, to meet security and compliance requirements. Within IDH, transparent encryption is supported in Hive, Pig*, MapReduce, HBase*, and the Hadoop Distributed File System* (HDFS*).

• Governance. All database operations are logged across both SAP HANA and IDH and can be audited to verify that users only access authorized data sets and services. Reports and automated alerts help IT protect data and document compliance.

Intel is working to extend these and other security capabilities across the Hadoop ecosystem through an open source project called Project Rhino (Figure 2). The goal is to establish a comprehensive security framework for Hadoop that will help businesses address security issues and compliance protocols across a wide range of use cases in financial, healthcare, government, and e-commerce environments. Project Rhino will contribute code to the Apache Software Foundation so these capabilities will be freely available.

High Availability

Big data analytics are often used to improve outcomes in revenue-producing business processes, so high availability is important. SAP HANA provides integrated support for data replication and system failover to prevent downtime. Hadoop implements 3-way data replication by default, so that any data node in a cluster could fail without impacting service or data availability. However, the cluster NameNode and JobTracker servers, which are required in all Hadoop deployments, are potential single points of failure. IDH provides integrated support for high availability for both these critical servers. Intel is also working on the open source Project Ladon, which is designed to support disaster recovery of Apache Hadoop through multisite data replication.

Enterprise-Class Manageability

SAP HANA is typically delivered as an appliance for onsite deployments. All hardware and software is tightly integrated and optimized to simplify deployment and management. Apache Hadoop, on the other hand, is based on open source software that is designed to run on large numbers of off-the-shelf servers. Management can be complex in this more distributed computing environment, and the challenges increase as a cluster grows.

IDH includes Intel® Manager for Apache Hadoop software, which combines open source and proprietary tools to provide enterprise-level manageability, including:

• A user-friendly interface for managing access controls and for updating the system. Built-in wizards provide workflows and guidance to speed deployment, simplify upgrades, and improve results.

• Automatic cluster configuration and tuning, using the Intel® Active Tuner. Advanced machine-learning algorithms select the best setup based on workload characteristics to deliver optimized query performance quickly and with no need for complex manual tuning.

• Built-in monitoring, with a dashboard that provides a comprehensive view of the cluster and system health.

• Flexible extensibility, with an application programming interface (API) that allows third-party and custom applications to access the functions in Intel Manager for Apache Hadoop.

SAP and Intel: A Shared Vision for Big Data Integration

Intel and SAP continue to jointly engineer, optimize, and enhance the integration of SAP HANA and IDH. The companies are working together to integrate new functionality and to optimize software to derive maximum benefit from advances in hardware. Some objectives of this collaboration include:

• Simplified troubleshooting, so query failures can be identified, diagnosed, and fixed more quickly and efficiently. Future solutions will include built-in analytics for root-cause analysis.

• Enhanced data relocation, so data can be moved more quickly, flexibly, and transparently between the two platforms.

• Stronger security, by further improving integration and by providing more comprehensive, multilayered protections in both hardware and software.

Intel is also deeply involved in hundreds of open source projects to increase Hadoop performance and functionality, and the results of these efforts will continue to increase the capability and value of IDH. Many of these developments are also offered back to the open source community to help drive innovation and interoperability across the broader big data ecosystem.


SAP: Single Point of Contact for Service and Support

SAP HANA and IDH are available from SAP sales teams worldwide. SAP offers full support for the joint solution. SAP also offers comprehensive consulting services, from initial planning and assessment through implementation and ongoing optimization. The speed, scale, and flexibility of the platform go far beyond what has been possible in the past, and IT organizations can accelerate deployment by working with experts who have extensive experience with SAP HANA and Apache Hadoop.

Conclusion

SAP and Intel provide an optimized solution for real-time big data analytics based on SAP HANA and the Intel Distribution for Apache Hadoop. Using this joint solution, data and business analysts can combine the performance of in-memory analytics with the massive scalability of Apache Hadoop. As a result, they can store and analyze petabytes of polystructured data cost-effectively at the speeds needed to support real-time business processes.

Intel and SAP have worked closely together to optimize the combined platform to support fast, federated queries that tighten the seams between the two platforms and make it easier for BI users to get the results they want without worrying about the infrastructure. The solution is designed to support enterprise requirements for security, availability, and manageability, so IT organizations can integrate the platform into their data center while minimizing cost and risk.

Intel Distribution for Apache Hadoop: http://hadoop.intel.com

SAP Big Data: www.sap.com/bigdata

1. Source: Sikka, Vishal, SAP. “The Business Value of Speed! Lessons from 10,000X SAP HANA Performance Club.” August 2012. http://www.saphana.com/community/blogs/blog/2012/08/05/the-business-value-of-speed.

2. Source: Intel internal measurements November 2013. Configurations: Baseline 1.0x: Intel® E7505 Chipset using four Intel® Xeon® processors E7-4870 (4P/10C/20T, 2.4GHz) with 256GB DDR3-1066 memory scoring 110,061 queries per hour. Source: Intel Technical Report #1347. New Generation 2x: Intel® C606J Chipset using four Intel® Xeon® processors E7-4890 v2 (4P/15C/30T, 2.8GHz) with 512GB DDR3-1333 (running 2:1 VMSE) memory scoring 218,406 queries per hour. Source: Intel Technical Report #1347.

3. Source: TeraSort Benchmarks conducted by Intel in December 2012. Custom settings: mapred.reduce.tasks=100 and mapred.job.reuse.jvm.num.tasks=-1. Cluster configuration: One head node (name node, job tracker), 10 workers (data nodes, task trackers), Cisco Nexus* 5020 10 Gigabit switch. Performance measured using Iometer* with Queue Depth 32. Baseline worker node: SuperMicro SYS-1026T-URF 1U servers with two Intel® Xeon® processors X5690 @ 3.47 GHz, 48 GB RAM, 700 GB 7200 RPM SATA hard drives, Intel® Ethernet Server Adapter I350-T2, Apache Hadoop* 1.0.3, Red Hat Enterprise Linux* 6.3, Oracle Java* 1.7.0_05. Baseline storage: 700 GB 7200 RPM SATA hard drives, upgraded storage: Intel® Solid-State Drive 520 Series (the Intel® Solid-State Drive 520 Series is currently not validated for data center usage). Baseline network adapter: Intel® Ethernet Server Adapter I350-T2, upgraded network adapter: Intel® Ethernet Converged Network Adapter X520-DA2. Upgraded software in worker node: Intel® Distribution for Apache Hadoop* software 2.1.1. Note: Solid-state drive performance varies by capacity. More information: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html.

4. Source: Terasort Benchmarks conducted by Intel. Configuration details: One head node (name node, job tracker), 10 workers (data nodes, task trackers), Dual Intel® Xeon® processor [email protected] GHz, 32 cores per node, 7 x 1 TB dedicated data disks per node, 10 GbE network. System Swap turned off, Kernel Buffer Cache cleared before each performance test.

5. No computer system can provide absolute security. Requires an enabled Intel® processor and software optimized for use of the technology. Consult your system manufacturer and/or software vendor for more information.

6. Source: Intel Internal tests using OpenSSL 1.0.1c* encryption software to encrypt and decrypt a 1 GB text file, with and without AES-NI enabled. Server configuration: 4-socket server with 4 x Intel® Xeon® processor E5-2690 (32 core system, 1 core used in testing), 32 GB memory, CentOS 6.3* operating system, Apache Hadoop Distributed File System* (HDFS*) with namenode, datanode, and the test program all run on the same server, 240 GB Intel® Solid State Drive 320 Series storage. For details, see the Intel Solution Brief, “Fast, Low-Overhead Encryption for Apache Hadoop*.” http://hadoop.intel.com/pdfs/IntelEncryptionforHadoopSolutionBrief.pdf

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A “Mission Critical Application” is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL’S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS ,COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS’ FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

© 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Core, Xeon, Intel Inside, the Intel Inside logo, the Look Inside. logo, and Look Inside. are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Printed in USA 0214/MR/CMD/PDF Please Recycle 329774-001US
