About Hortonworks
Customer momentum: 800+ customers
Public company (NASDAQ: HDP)
The Leader in Connected Data Platforms
Hortonworks DataFlow for data in motion
Hortonworks Data Platform for data at rest
Partner for Customer Success
Leader in the open-source community, focused on innovation to meet enterprise needs
Unrivaled support & professional services
Founded in 2011
Original 24 architects, developers and operators of Hadoop from Yahoo!
850+ Employees
1,600+ Ecosystem Partners
Presenter
Presentation Notes
TALK TRACK When you choose a vendor, you choose a platform that will be with you for a long time. As the originators of both the Apache Hadoop and Apache NiFi technologies as well as the Open Enterprise Hadoop category, Hortonworks is uniquely positioned to help you transform your business with actionable intelligence. We are the only publicly traded pure-play Hadoop company. Our momentum is accelerating. We’ve added about half of our subscribers in the last two quarters. Those customers come to us for all the reasons I’ve described: our superior technology that evolves at the pace of open innovation, our proven model for partnering with our customers and our dedication to helping them succeed. I’d like to come back for a larger meeting and a use case workshop to start the journey together. Would next week work for you? [NEXT SLIDE] SOURCE: http://hortonworks.com/about-us/quick-facts/
TALK TRACK After all, every business is a data business. Tomorrow’s champions are already mastering the value of data and embracing an open approach. Hortonworks understands what these future champions need – we work arm in arm with hundreds of these open, innovative enterprises as they create products, services and intelligence. We support these companies that use our products, but more importantly, we partner with them to realize the future of data. This is a future without car crashes or medical accidents. In this future, trains never derail. There is no more money laundering, no more computer viruses, wasted food or dropped calls. That is the future of data. [NEXT SLIDE]
Blind Spots Block Your Ability to Use All the Data
[Diagram: data fragmented across Groups 1-4, alongside the Internet of Anything]
Fragmented Data-at-rest increases the cost of insight
Data-in-motion streams through your blind spots
Various data types and massive volumes
TALK TRACK Hortonworks customers want to harness the disruptive power of this data flood. They come to us because they know they have Big Data blind spots. They may have petabytes of data at rest, but it’s fragmented across many groups and storage platforms. It’s expensive and time consuming to locate, move and combine the fragmented data. They know that valuable data in motion is streaming by them, but they have no easy way to capture it or analyze it. At Hortonworks, we help our customers manage both data at rest and in motion. [NEXT SLIDE]
TALK TRACK Hortonworks is powering the future of data. Whether from data at rest or data in motion, we help our customers tap into all the data. We give the world’s leading companies and government agencies actionable intelligence to do things that were never before possible. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. [NEXT SLIDE]
Hortonworks® customers leverage our Connected Data Platforms to transform their industries – renovating their IT architectures and innovating with their Data in Motion or Data at Rest to power actionable intelligence through Modern Data Applications.
TALK TRACK At Hortonworks, we partner with our customers and guide them on their Journey to Actionable Intelligence. You can start your journey anywhere you want. You can renovate your IT architecture to reduce costs and boost functionality. Or you can innovate modern data applications that you use at your own company or sell on the open market. You can start with the most sophisticated use cases if your team is experienced, or you can build your expertise by beginning with less complex use cases that bring quick results. As you build your team’s expertise and comfort with Hortonworks Data Platform and Hortonworks DataFlow, you can then tackle more challenging aspects of your road map. We will help you plan the right path to meet your objectives. Now I’d like to tell you about one Hortonworks customer, Progressive Insurance, and their journey towards eliminating auto accidents. [NEXT SLIDE]
DISCUSSION STRATEGY
[For Business Prospects]
Focus questions:
• What business problems can we help you solve?
• Which use case would you like to tackle first?
• What type of challenge is most important: data discovery, building a single view or creating predictive analytics?
Calls to action:
• Recommend the Jumpstart package: http://hortonworks.com/services/jumpstart/
• Schedule a use case workshop and plan your journey across your most important use cases.
• Give them an industry-specific white paper to read.
[For IT Prospects]
Focus questions:
• Where do you face the most cost pressure to store and process data?
• Which use case would you like to tackle first: active archive, ETL offload or data enrichment?
Calls to action:
• Recommend that they download Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
• Schedule a use case workshop.
• Give them the EDW Optimization white paper to read: http://hortonworks.com/info/hadoop-and-a-modern-data-architecture/
TALK TRACK I’m about to go over the products, consulting and training that Hortonworks offers, and I want you to keep this image in mind. Remember: Hortonworks is powering the future of data. Whether from data at rest or data in motion, we help our customers tap into all the data. We give the world’s leading companies and government agencies actionable intelligence to do things that were never before possible. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. Here are just a few of the modern data apps that convert yesterday’s impossible challenges into today’s new products, cures, conveniences and life-saving innovations. These apps are either custom-built by our customers or they come off the shelf, created by Hortonworks or one of our ecosystem partners to solve a particular problem. Symantec and other cyber security leaders have built powerful apps to detect threats to digital information. Leading pharma, automotive, consumer electronics and packaged goods companies are building their factories of the future that use actionable intelligence to improve manufacturing yields. And age-old industries like automotive, agriculture and retail are taking connected data platforms on the road, through the field or to the cash register to do things that have never before been possible.
Capturing perishable insights from data in motion. Ensuring rich, historical insights on data at rest. Necessary for modern data applications. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. The Internet of Anything is doubling the amount of data in the world every 2 years. Connected Data Platforms deliver an open-architected solution to manage data, both in motion and at rest, empowering your organization to gain Actionable Intelligence delivered to your end users through Modern Data Apps. Hortonworks DataFlow (aka HDF) manages your data in motion—bringing it to where you need it for real-time analysis to capture perishable insights or into storage for historical analysis. Hortonworks Data Platform (aka HDP) stores the data at rest and provides historical insights through deep, detailed analysis of everything that’s already happened. Those historical insights from HDP help optimize your data ingest with HDF, which in turn optimizes your data at rest. This is how HDF, HDP, and Modern Data Applications deliver actionable intelligence to your end users. And Actionable Intelligence is the beating heart animating the Future of Data. [NEXT SLIDE]
Reality of Data in Motion: Complex, Chaotic, Messy
[Diagram: data acquired, stored, and processed/analyzed across many disconnected systems, with dataflows crisscrossing between them]
In reality, dataflows move all over. Data is moved and stored in multiple places – sometimes interim, sometimes long term. Data is processed in different places, and then moved again. Complicated, convoluted, messy.
• Traceability (Data Lineage)
• Prioritization of Resources
• Multi-Directional Flow
• Recoverability and Replay
• Transparency of DataFlow
• Scaling Down
• Enrichment/Transformation
• Unreliable Comms
GATHER
DELIVER
PRIORITIZE
Track from the edge through the datacenter
Typical Answer to Challenges: Add Systems…
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new systems to slow down or speed up data
• Add new topics to represent ‘stages of the flow’
…And Complexity
Complicated, messy, and takes weeks to months to move the right data into Hadoop
HDP HORTONWORKS DATA PLATFORM
Streamlined, Efficient, Easy
Talk Track: Easy button for data ingest with real-time, interactive visual control of dataflows “Data logistics” - like Fedex or UPS for transport and logistics of goods, but for data Accelerates ROI of big data by removing manual labor time and costs of data collection and transport by removing the need for any kind of coding - MONTHS to MINUTES Move data into HDP in 7 minutes or less (refers to this video), also Click Demo Maximize value of HDP/CDW/MAPR/Storm/Spark by making it easy to get data into it
Hortonworks DataFlow for Data in Motion Powered by Apache NiFi
TALK TRACK As part of the Hortonworks Connected Data Platforms, Hortonworks DataFlow manages data in motion. HDF is powered by the 100% open source Apache NiFi project which has its origins at the United States NSA. For managing data in motion, HDF is: real-time, integrated, secure and adaptive. [NEXT SLIDE]
Add and Adjust Data Sources to maximize the opportunity that you capture from perishable insights
Visually Trace the Data Path to manage the what, who, where and how around data in motion
Dynamically Adjust the Pipeline to match the dataflow with your bandwidth
TALK TRACK First, real-time. Hortonworks DataFlow provides real-time, interactive control of live data flows. This accelerates value from your big data solution, increases ROI, and allows you to capture insights that may be perishable. Either you act now or you forever lose the opportunity. With HDF, you have the ability to add and adjust data sources and also manage the connection between the sources and destination. You can visually trace the data path to determine if the data is valuable, trustworthy, and usable. And if your data needs change, you can dynamically adjust your data flow pipeline by changing which information to collect from the source or the priorities of data within the flow. That way you only collect what you need, when you need it. [NEXT SLIDE]
The Apache NiFi component of HDF is a data logistics platform that connects any data source to any destination. It provides a universal translation system, so to speak, allowing different systems to connect with each other by transforming and delivering previously incompatible data formats and protocols as usable, easily ingested data for analysis. Apache NiFi has been undergoing tremendous growth and community involvement. There are now 66 contributors and more than 130 processors – an increase of 30% since HDF was first released in September 2015. In six months, the community has grown to address a wide variety of needs – different data sources, transformations, file formats and types. New processors added since September include support for Elasticsearch, Splunk, Couchbase, Microsoft Event Hubs and Amazon S3. Details: For instance, one social media commentary stated: “Apache NiFi is a relatively new data processing system with a plethora of general-purpose processors and a point and click interface. You are going to love it!” In comparison, since December 2014 StreamSets has received code contributions from 11 people, whereas Apache NiFi has received contributions from more than 66 contributors.
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
• Powerful and reliable system to process and distribute data
• Directed graphs of data routing and transformation
• Web-based user interface for creating, monitoring, & controlling data flows
• Highly configurable – modify data flow at runtime, dynamically prioritize data
• Data provenance tracks data through the entire system
• Easily extensible through development of custom components
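The feature list above boils down to one core model: processors wired into a directed graph, with provenance recorded for every piece of data that passes through. NiFi flows are built in its web UI rather than in code, but the model itself can be sketched in a few lines of Python. All class and method names below are invented for illustration; they are not NiFi APIs.

```python
import uuid

class FlowFile:
    """A unit of data moving through the flow, carrying a provenance trail."""
    def __init__(self, content, attributes=None):
        self.id = str(uuid.uuid4())
        self.content = content
        self.attributes = attributes or {}
        self.provenance = []  # ordered (processor_name, event) records

class Processor:
    """A node in the directed flow graph: transform, record, forward."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other  # allows chaining: a.connect(b).connect(c)

    def process(self, flowfile):
        flowfile.content = self.transform(flowfile.content)
        flowfile.provenance.append((self.name, "processed"))
        for nxt in self.downstream:
            nxt.process(flowfile)

# Build a tiny three-stage flow: ingest -> enrich -> deliver
ingest  = Processor("GetData",   lambda c: c.strip())
enrich  = Processor("Uppercase", lambda c: c.upper())
deliver = Processor("PutData",   lambda c: c)
ingest.connect(enrich).connect(deliver)

ff = FlowFile("  sensor reading 42  ")
ingest.process(ff)
print(ff.content)                           # SENSOR READING 42
print([step for step, _ in ff.provenance])  # ['GetData', 'Uppercase', 'PutData']
```

The provenance list is what makes the "view statistics" and "data provenance" bullets possible: every hop is recorded per flowfile, so the full path can be reconstructed after the fact.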
Optimize Your Architecture
• Reduce cost and complexity with the most efficient data collection technologies
Assure Efficient Operations
• Via real-time control of data inputs, outputs, transportation and transformations
Rely on a Common Foundation
• Eliminating dependence on multiple customized systems
[Diagram: a common architecture – without Hortonworks DataFlow, ingest depends on a patchwork of scripts and messaging layers; with Hortonworks DataFlow, a single platform handles the flow]
TALK TRACK The second important characteristic of HDF is that it is integrated. That integration helps you move from slow, laborious ingest processes involving multiple engines and scripts to one seamless, efficient, bi-directional data ingest engine. Hortonworks DataFlow optimizes your architecture for faster, easier, data movement and control. HDF assures efficient operations to reduce costs and optimize your use of precious human resources. And it allows you to rely on a common foundation for all systems to interact with and a single system for users to learn, maintain and manage. [NEXT SLIDE]
End-to-End Security Apply security rules to encrypt, decrypt, filter and replace data from the point of collection at the jagged edge to its final destination
Granular Control and Sharing Move beyond role-based access and dynamically share an entire dataflow
Real-Time Traceability Rich metadata and contextual detail helps troubleshoot security issues and informs timely decisions
TALK TRACK Of course, you also need to know that your data is secure. Can you trust it? How do you ensure that the right people see the right data, when they need it? Hortonworks DataFlow comes with end-to-end security to encrypt, decrypt, filter and replace data from its origin at the jagged edge all the way through to its final destination. HDF allows granular control and sharing, so you can share appropriate bits of data with the right parties without creating inadvertent risks. And it gives you real-time traceability and provenance for the data, so you can see the data chain of custody. This gives you useful information on what should be discarded, or not even collected in the future. [NEXT SLIDE]
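The filter-and-replace rules described above can be pictured as a transformation applied to each record before it leaves a collection point: unauthorized fields never leave the edge, and sensitive values are masked in place. This is a conceptual Python sketch, not an HDF API; the helper name, field names and SSN-masking rule are all hypothetical.

```python
import re

# Matches a US Social Security Number such as 123-45-6789 (illustrative rule).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def apply_security_rules(record, authorized_fields):
    """Hypothetical edge-side rule: drop unauthorized fields, mask SSNs."""
    secured = {}
    for key, value in record.items():
        if key not in authorized_fields:
            continue  # filter: the field never leaves the collection point
        if isinstance(value, str):
            value = SSN_PATTERN.sub("***-**-****", value)  # replace in place
        secured[key] = value
    return secured

record = {"name": "Ada", "note": "SSN 123-45-6789 on file", "salary": 90000}
print(apply_security_rules(record, authorized_fields={"name", "note"}))
# {'name': 'Ada', 'note': 'SSN ***-**-**** on file'}
```

Applying the rule at the point of collection, rather than at the destination, is what "end-to-end" means here: sensitive data is protected before it ever enters the flow.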
Automated Bi-directional communication between source and destination adapts data flows automatically, according to current priorities
On-Demand Operational control to adapt to changing conditions and requirements
Scalable By incorporating data from any device—small machine sensors to enterprise data centers—HDF connects you to the broadest set of disparate data sources
TALK TRACK And finally, you need a data flow solution that can adapt to a broad range of demands. The original architects of Apache NiFi now work at Hortonworks. From the very beginning at the NSA, these architects designed it to meet the changing needs of an evolving data landscape. Hortonworks DataFlow can adapt automatically. If the connection is poor, HDF automatically prioritizes data and skinnies down what data is sent, and saves the rest for later. If the connection is good, it will automatically readjust and send on everything it has been holding on to. Hortonworks DataFlow can also adapt on demand. With its visual real-time interface, operators can manually start, stop, reroute, change, or adjust a data source. That change takes effect immediately. And HDF is scalable. Data can come from large scale enterprise servers or small JVM sensors. Big data or small data, HDF scales to support diverse quantities or types of data. [NEXT SLIDE]
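The automatic adaptation described above (send urgent data now, hold the rest until the link recovers) can be sketched with a priority queue and a bandwidth budget. This is a conceptual illustration of the behavior, not HDF's actual implementation; the class and its numbers are invented.

```python
import heapq

class AdaptiveSender:
    """Illustrative sketch: send the most urgent items that fit in the
    current bandwidth budget and hold everything else for later."""
    def __init__(self):
        self._held = []  # min-heap ordered by priority (0 = most urgent)

    def enqueue(self, priority, size, payload):
        heapq.heappush(self._held, (priority, size, payload))

    def send(self, bandwidth):
        """Send what fits in `bandwidth` units, most urgent first."""
        sent, remaining = [], bandwidth
        while self._held and self._held[0][1] <= remaining:
            _priority, size, payload = heapq.heappop(self._held)
            sent.append(payload)
            remaining -= size
        return sent

sender = AdaptiveSender()
sender.enqueue(0, 10, "alert")      # urgent, small
sender.enqueue(2, 80, "bulk-log")   # low priority, large
sender.enqueue(1, 30, "metrics")

print(sender.send(bandwidth=45))    # poor link: ['alert', 'metrics']
print(sender.send(bandwidth=100))   # link recovers: ['bulk-log']
```

On the constrained link, the large low-priority batch is held back automatically; once capacity returns, the backlog drains without any operator intervention, which is the "skinny down, then catch up" behavior the talk track describes.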
Hortonworks Data Platform for Data at Rest Powered by Open Enterprise Hadoop
Open
Interoperable
Ready
Central
TALK TRACK Now let’s move to the other side of Hortonworks Connected Data Platforms. Our company was founded on Hortonworks Data Platform’s unique ability to manage Big Data at rest. This is Open Enterprise Hadoop, a platform that is: 100% Open Source Centrally architected with YARN at its core Interoperable with existing technology and skills, AND Enterprise-ready, with data services for operations, governance and security [NEXT SLIDE]
Shared Big Data Platform across applications, business groups, functions and users
Centralized management and monitoring of Hadoop clusters
Automated Provisioning either on-premises or in the cloud, with the Cloudbreak API standing up clusters in minutes
Managed Services for high availability and consistent lifecycle controls, with dashboards and alerts
TALK TRACK YARN is the architectural center of Open Enterprise Hadoop. It:
• Coordinates cluster-wide services for operations, data governance and security
• Allocates resources among the diverse applications that process the data
• Maximizes your data ingest, by helping ingest all types of data
• Allows you to confidently extend your big data assets to the largest possible audience within your organization
Open Enterprise Hadoop provides consistent operations, with:
• Centralized management and monitoring of clusters through a single pane of glass
• Automated provisioning, either on-premises or in the cloud with the Cloudbreak API. You can manage one huge data lake, or spin up and spin down multiple clusters as needed. You choose.
• Managed services to make sure that your cluster is highly available.
[NEXT SLIDE]
SUPPORTING DETAIL Consistent Operations
Why this matters to our customers: From the launch of your first cluster, to the changing access patterns as more users come online, through expansion to contain the growing amounts of data—your operations team needs to keep Hadoop working to meet your business objectives.
Proof point: Hortonworks Data Platform includes Apache Ambari, Cloudbreak and Hortonworks SmartSense—a complete set of simplified tools to make the most of your investment in Open Enterprise Hadoop.
Citation: “At The Mobile Majority, we have been using Hortonworks Data Platform to optimize ad performance on behalf of our customers. We’re excited to look into Hortonworks SmartSense as a way to continuously optimize our HDP cluster as it grows over time,” said Cheolho Minale, vice president of technology. | http://hortonworks.com/blog/introducing-availability-of-hdp-2-3-part-3/
Data Management along the entire data lifecycle with integrated provenance and lineage capability
Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
TALK TRACK Open Enterprise Hadoop enables trusted governance, with: Data lifecycle management along the entire lifecycle Modeling with metadata, and Interoperable solutions that can access a common metadata store. [NEXT SLIDE] SUPPORTING DETAIL Trusted Governance Why this matters to our customers: As data accumulates in an HDP cluster, the enterprise needs governance policies to control how that data is ingested, transformed and eventually retired. This keeps those Big Data assets from turning into big liabilities that you can’t control. Proof point: HDP includes 100% open source Apache Atlas and Apache Falcon for centralized data governance coordinated by YARN. These data governance engines provide those mature data management and metadata modeling capabilities, and they are constantly strengthened by members of the Data Governance Initiative. The Data Governance Initiative (DGI) is working to develop an extensible foundation that addresses enterprise requirements for comprehensive data governance. The DGI coalition includes Hortonworks partner SAS and customers Merck, Target, Aetna and Schlumberger. Together, we assure that Hadoop: Snaps into existing frameworks to openly exchange metadata Addresses enterprise data governance requirements within its own stack of technologies Citation: “As customers are moving Hadoop into corporate data and processing environments, metadata and data governance are much needed capabilities. SAS participation in this initiative strengthens the integration of SAS data management, analytics and visualization into the HDP environment and more broadly it helps advance the Apache Hadoop project. This additional integration will give customers better ability to manage big data governance within the Hadoop framework,” said SAS Vice President of Product Management Randy Guard.” | http://hortonworks.com/press-releases/hortonworks-establishes-data-governance-initiative/
Comprehensive Security through a platform approach
Fine-Grained, Flexible Authorization controlling access based on roles or data tags
Encrypt Data at rest and in motion
Centralized Administration of security policies and user authentication
TALK TRACK And of course, enterprise-readiness means comprehensive security through a platform approach. This includes:
• Encryption of data at rest and in motion
• Centralized administration of policies for authentication
• Fine-grained authorization to control data access
These are the four pillars of the emerging solution category known as Open Enterprise Hadoop. But why is Hortonworks uniquely positioned to lead this category? HDF offers secure data flows from the point of origin through to destination. However, security is much more than just securing the data itself. There are layers of security involved: it is important to be able to decide in real time whether a person or a system is allowed to access a specific piece of data within a dataflow – represented by the pieces of the pie within the hexagon – along with the ability to trace the data chain of custody (provenance) from source to destination. Beyond that, to make security more seamless, HDF 1.2 supports centralized Kerberos authentication.
Comprehensive Security
Why this matters to our customers: Data is valuable, and like any other valuable asset, it must be secure from corruption or theft. Enterprises need easy, centralized tools for protecting their Big Data across their entire ecosystems.
Proof point: HDP provides comprehensive security through a platform approach, with Apache Ranger as a single pane of glass for security policy administration and Apache Knox protecting the perimeter. HDP can encrypt data at rest or in motion. These integrated components let security administrators administer policies, authenticate and authorize users, and protect the data from misuse.
Citation: “Chris Twogood, Teradata vice president of products and services, said in an interview with CRN that "security is obviously a very important component" of big data systems, and he praised Hortonworks for pushing security improvements in its software, especially the new encryption and authorization capabilities.” | Source: http://www.crn.com/news/applications-os/300077188/new-hortonworks-hadoop-release-offers-bulked-up-security-enhanced-data-governance-capabilities.htm
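Tag-based authorization of the kind described above reduces to a simple check: a role may access a resource only if one of its policies covers every tag attached to that resource. The sketch below is conceptual only; the policy structure and function name are invented for illustration and are not the Apache Ranger API.

```python
# Hypothetical tag-based policies: each maps a role to the data tags it may see.
POLICIES = [
    {"role": "analyst", "allowed_tags": {"public", "internal"}},
    {"role": "auditor", "allowed_tags": {"public", "internal", "pii"}},
]

def is_authorized(role, resource_tags):
    """Allow access only if some policy for this role covers every tag
    attached to the resource (set containment check)."""
    for policy in POLICIES:
        if policy["role"] == role and resource_tags <= policy["allowed_tags"]:
            return True
    return False

customer_table_tags = {"internal", "pii"}
print(is_authorized("analyst", customer_table_tags))  # False: pii not covered
print(is_authorized("auditor", customer_table_tags))  # True
```

The appeal of tag-based control is that the policy follows the data: tag a column "pii" once, and every role lacking that tag is denied everywhere the column appears, with no per-table policy needed.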
Pluggable Architecture supports Apache Hive, Pivotal HAWQ and other leading SQL engines
Familiar SQL Query Semantics enable transactions and SQL:2011 Analytics for rich reporting
Unprecedented Speed at Extreme Scale returns query results in interactive time, even as data sets grow to petabytes
TALK TRACK While Spark at Scale is new and hot, SQL is still the lingua franca of data analysis. HDP includes a tool for those millions of analysts: Apache Hive on YARN. It provides:
• A pluggable architecture for Hive and other SQL engines like Pivotal HAWQ
• Familiar semantics that enable transactions and rich reporting via SQL:2011 Analytics
• Unprecedented speed at extreme scale, returning query results in interactive time—even as the data set grows towards petabyte size.
[NEXT SLIDE]
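The SQL:2011 analytics mentioned above center on window functions. The query below shows the shape of such an analytic query (a running total per partition). It runs here against Python's built-in sqlite3, which also implements window functions (SQLite 3.25+), purely so the example is self-contained; on HDP the same query shape would be submitted to Hive, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# A SQL:2011-style analytic query: running total of sales per region.
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running
    FROM sales
    ORDER BY region, amount
""").fetchall()
print(rows)  # [('east', 100, 100), ('east', 300, 400), ('west', 200, 200)]
```

The `OVER (PARTITION BY ... ORDER BY ...)` clause is what distinguishes analytic SQL from plain aggregation: every input row survives into the output, each annotated with an aggregate computed over its window.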
Agile Analytics with Enterprise Spark 1.6 at Scale
SPARK ON YARN
[Diagram: Spark on YARN, with operations, security, governance and storage coordinated by HDP]
Powering Agile Analytics via Zeppelin data science notebooks and automation for most common analytics (including Geospatial analysis and entity resolution)
Seamless Data Access that brings together as many data types as possible
Unmatched Economics combining the speed of in-memory processing with HDP’s cost efficiencies at scale
Ready for the Enterprise with robust security, governance and operations coordinated centrally by Apache Hadoop and YARN
TALK TRACK Aside from enterprise readiness for operations, security and governance, YARN’s multi-tenancy makes HDP ready to access and process all the data. One of the hottest access modules is Apache Spark, which ships as an integrated component of HDP. Spark at Scale powers agile analytics for developers through data science notebooks and automation for the most common analysis scenarios, including support for geospatial analysis and entity resolution. With seamless data access, Spark has access to the widest possible array of data sources. The combination of Spark and Hadoop yields unmatched economics, pairing Spark’s in-memory processing with data stored at low cost in Hadoop. And Spark at Scale is rock solid at the core, with RDD sharing with the HDFS Memory Tier and YARN-based enhancements to Spark operations, governance and security.
Spark 1.6 Data Science Acceleration:
• 10x faster Spark Streaming
• Seamless data access via the Dataset API
• Automatic memory tuning
[NEXT SLIDE]