About Hortonworks
Customer momentum: 800+ customers
Public company (NASDAQ: HDP)
The Leader in Connected Data Platforms
Hortonworks DataFlow for data in motion
Hortonworks Data Platform for data at rest
Partner for Customer Success
Leader in the open-source community, focused on innovation to meet enterprise needs
Unrivaled support & professional services
Founded in 2011
Original 24 architects, developers and operators of Hadoop from Yahoo!
850+ Employees
1,600+ Ecosystem Partners
Presenter
Presentation Notes
TALK TRACK When you choose a vendor, you choose a platform that will be with you for a long time. As the originators of both the Apache Hadoop and Apache NiFi technologies as well as the Open Enterprise Hadoop category, Hortonworks is uniquely positioned to help you transform your business with actionable intelligence. We are the only publicly traded pure-play Hadoop company. Our momentum is accelerating. We’ve added about half of our subscribers in the last two quarters. Those customers come to us for all the reasons I’ve described: our superior technology that evolves at the pace of open innovation, our proven model for partnering with our customers and our dedication to helping them succeed. I’d like to come back for a larger meeting and a use case workshop to start the journey together. Would next week work for you? [NEXT SLIDE] SOURCE: http://hortonworks.com/about-us/quick-facts/
TALK TRACK After all, every business is a data business. Tomorrow’s champions are already mastering the value of data and embracing an open approach. Hortonworks understands what these future champions need – we work arm in arm with hundreds of these open, innovative enterprises as they create products, services and intelligence. We support these companies that use our products, but more importantly, we partner with them to realize the future of data. This is a future without car crashes or medical accidents. In this future, trains never derail. There is no more money laundering, no more computer viruses, wasted food or dropped calls. That is the future of data. [NEXT SLIDE]
Blind Spots Block Your Ability to Use All the Data
[Diagram: data fragmented across Groups 1-4, alongside the Internet of Anything]
Fragmented Data-at-rest increases the cost of insight
Data-in-motion streams through your blind spots
Various data types and massive volumes
TALK TRACK Hortonworks customers want to harness the disruptive power of this data flood. They come to us because they know they have Big Data blind spots. They may have petabytes of data at rest, but it’s fragmented across many groups and storage platforms. It’s expensive and time consuming to locate, move and combine the fragmented data. They know that valuable data in motion is streaming by them, but they have no easy way to capture it or analyze it. At Hortonworks, we help our customers manage both data at rest and in motion. [NEXT SLIDE]
TALK TRACK Hortonworks is powering the future of data. Whether from data at rest or data in motion, we help our customers tap into all the data. We give the world’s leading companies and government agencies actionable intelligence to do things that were never before possible. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. [NEXT SLIDE]
Hortonworks® customers leverage our Connected Data Platforms to transform their industries – renovating their IT architectures and innovating with their Data in Motion or Data at Rest to power actionable intelligence through Modern Data Applications.
TALK TRACK At Hortonworks, we partner with our customers and guide them on their Journey to Actionable Intelligence. You can start your journey anywhere you want. You can renovate your IT architecture to reduce costs and boost functionality. Or you can innovate modern data applications that you use at your own company or sell on the open market. You can start with the most sophisticated use cases if your team is experienced, or you can build your expertise by beginning with less complex use cases that bring quick results. As you build your team’s expertise and comfort with Hortonworks Data Platform and Hortonworks DataFlow, you can then tackle more challenging aspects of your road map. We will help you plan the right path to meet your objectives. Now I’d like to tell you about one Hortonworks customer, Progressive Insurance, and their journey towards eliminating auto accidents. [NEXT SLIDE]
DISCUSSION STRATEGY
[For Business Prospects]
Focus questions:
• What business problems can we help you solve?
• Which use case would you like to tackle first?
• What type of challenge is most important: data discovery, building a single view or creating predictive analytics?
Calls to action:
• Recommend the Jumpstart package: http://hortonworks.com/services/jumpstart/
• Schedule a use case workshop and plan your journey across your most important use cases.
• Give them an industry-specific white paper to read.
[For IT Prospects]
Focus questions:
• Where do you face the most cost pressure to store and process data?
• Which use case would you like to tackle first: active archive, ETL offload or data enrichment?
Calls to action:
• Recommend that they download Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
• Schedule a use case workshop.
• Give them the EDW Optimization white paper to read: http://hortonworks.com/info/hadoop-and-a-modern-data-architecture/
TALK TRACK I’m about to go over the products, consulting and training that Hortonworks offers, and I want you to keep this image in mind. Remember: Hortonworks is powering the future of data. Whether from data at rest or data in motion, we help our customers tap into all the data. We give the world’s leading companies and government agencies actionable intelligence to do things that were never before possible. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. Here are just a few of the modern data apps that convert yesterday’s impossible challenges into today’s new products, cures, conveniences and life-saving innovations. These apps are either custom-built by our customers or they come off the shelf, created by Hortonworks or one of our ecosystem partners to solve a particular problem. Symantec and other cyber security leaders have built powerful apps to detect threats to digital information. Leading pharma, automotive, consumer electronics and packaged goods companies are building their factories of the future that use actionable intelligence to improve manufacturing yields. And age-old industries like automotive, agriculture and retail are taking connected data platforms on the road, through the field or to the cash register to do things that have never before been possible.
Capturing perishable insights from data in motion. Ensuring rich, historical insights on data at rest. Necessary for modern data applications. Actionable intelligence means that you can capture perishable insights in real-time by analyzing data in motion. It means drilling into terabytes or petabytes of data at rest for historical insights. And, in turn, those historical insights help you tune your streaming analytics and data flows. Modern data applications live and breathe at the intersection between those Connected Data Platforms and the data they manage. Those are the innovative killer applications that deliver actionable intelligence for data discovery, a single view of the data or predictive analytics. The Internet of Anything is doubling the amount of data in the world every 2 years. Connected Data Platforms deliver an open-architected solution to manage data, both in motion and at rest, empowering your organization to gain Actionable Intelligence delivered to your end users through Modern Data Apps. Hortonworks DataFlow (aka HDF) manages your data in motion—bringing it to where you need it for real-time analysis to capture perishable insights or into storage for historical analysis. Hortonworks Data Platform (aka HDP) stores the data at rest and provides historical insights through deep, detailed analysis of everything that’s already happened. Those historical insights from HDP help optimize your data ingest with HDF, which in turn optimizes your data at rest. This is how HDF, HDP, and Modern Data Applications deliver actionable intelligence to your end users. And Actionable Intelligence is the beating heart animating the Future of Data. [NEXT SLIDE]
Reality of Data in Motion: Complex, Chaotic, Messy
[Diagram: data acquired, stored, and processed/analyzed across many disconnected systems, with dataflows crisscrossing between them]
In reality, dataflows move all over. Data is moved and stored in multiple places – sometimes interim, sometimes long term. Data is processed in different places, and then moved again. Complicated, convoluted, messy.
• Traceability (Data Lineage)
• Prioritization of Resources
• Multi-Directional Flow
• Recoverability and Replay
• Transparency of DataFlow
• Scaling Down
• Enrichment/Transformation
• Unreliable Comms
GATHER
DELIVER
PRIORITIZE
Track from the edge through the datacenter
Typical Answer to Challenges: Add Systems…
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new systems to slow down or speed up data
• Add new topics to represent ‘stages of the flow’
…And Complexity
Complicated, messy, and takes weeks to months to move the right data into Hadoop
HDP HORTONWORKS DATA PLATFORM
Streamlined, Efficient, Easy
Talk Track: Easy button for data ingest with real-time, interactive visual control of dataflows “Data logistics” - like Fedex or UPS for transport and logistics of goods, but for data Accelerates ROI of big data by removing manual labor time and costs of data collection and transport by removing the need for any kind of coding - MONTHS to MINUTES Move data into HDP in 7 minutes or less (refers to this video), also Click Demo Maximize value of HDP/CDW/MAPR/Storm/Spark by making it easy to get data into it
Hortonworks DataFlow for Data in Motion Powered by Apache NiFi
TALK TRACK As part of the Hortonworks Connected Data Platforms, Hortonworks DataFlow manages data in motion. HDF is powered by the 100% open source Apache NiFi project which has its origins at the United States NSA. For managing data in motion, HDF is: real-time, integrated, secure and adaptive. [NEXT SLIDE]
Add and Adjust Data Sources to maximize the opportunity that you capture from perishable insights
Visually Trace the Data Path to manage the what, who, where and how around data in motion
Dynamically Adjust the Pipeline to match the dataflow with your bandwidth
TALK TRACK First, real-time. Hortonworks DataFlow provides real-time, interactive control of live data flows. This accelerates value from your big data solution, increases ROI, and allows you to capture insights that may be perishable. Either you act now or you forever lose the opportunity. With HDF, you have the ability to add and adjust data sources and also manage the connection between the sources and destination. You can visually trace the data path to determine if the data is valuable, trustworthy, and usable. And if your data needs change, you can dynamically adjust your data flow pipeline by changing which information to collect from the source or the priorities of data within the flow. That way you only collect what you need, when you need it. [NEXT SLIDE]
The Apache NiFi component of HDF is a data logistics platform that connects any data source to any destination. It provides a universal translation system, so to speak, allowing different systems to connect with each other by transforming and delivering previously incompatible data formats and protocols as usable, easily ingested data for analysis. Apache NiFi has been undergoing tremendous growth and community involvement. There are now 66 contributors and more than 130 processors – an increase of 30% since HDF was first released in September 2015. In six months, the community has grown to address a wide variety of needs – different data sources, transformations, file formats and types. New processors added since September include support for Elasticsearch, Splunk, Couchbase, Microsoft Event Hubs and Amazon S3. Details: For instance, one social media commentary stated: “Apache NiFi is a relatively new data processing system with a plethora of general-purpose processors and a point and click interface. You are going to love it!” In comparison, since December 2014 StreamSets has received code contributions from 11 people, whereas Apache NiFi has received contributions from more than 66 contributors.
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
• Powerful and reliable system to process and distribute data
• Directed graphs of data routing and transformation
• Web-based user interface for creating, monitoring, & controlling data flows
• Highly configurable – modify data flow at runtime, dynamically prioritize data
• Data provenance tracks data through the entire system
• Easily extensible through development of custom components
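The feature list above boils down to one core model: processors wired into a directed graph, with provenance recorded for every piece of data that passes through. NiFi flows are built in its web UI rather than in code, but the model itself can be sketched in a few lines of Python. All class and method names below are invented for illustration; they are not NiFi APIs.

```python
import uuid

class FlowFile:
    """A unit of data moving through the flow, carrying a provenance trail."""
    def __init__(self, content, attributes=None):
        self.id = str(uuid.uuid4())
        self.content = content
        self.attributes = attributes or {}
        self.provenance = []  # ordered (processor_name, event) records

class Processor:
    """A node in the directed flow graph: transform, record, forward."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other  # allows chaining: a.connect(b).connect(c)

    def process(self, flowfile):
        flowfile.content = self.transform(flowfile.content)
        flowfile.provenance.append((self.name, "processed"))
        for nxt in self.downstream:
            nxt.process(flowfile)

# Build a tiny three-stage flow: ingest -> enrich -> deliver
ingest  = Processor("GetData",   lambda c: c.strip())
enrich  = Processor("Uppercase", lambda c: c.upper())
deliver = Processor("PutData",   lambda c: c)
ingest.connect(enrich).connect(deliver)

ff = FlowFile("  sensor reading 42  ")
ingest.process(ff)
print(ff.content)                           # SENSOR READING 42
print([step for step, _ in ff.provenance])  # ['GetData', 'Uppercase', 'PutData']
```

The provenance list is what makes the "view statistics" and "data provenance" bullets possible: every hop is recorded per flowfile, so the full path can be reconstructed after the fact.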
Optimize Your Architecture
• Reduce cost and complexity with the most efficient data collection technologies
Assure Efficient Operations
• Via real-time control of data inputs, outputs, transportation and transformations
Rely on a Common Foundation
• Eliminating dependence on multiple customized systems
[Diagram: a common architecture – without Hortonworks DataFlow, ingest depends on a patchwork of scripts and messaging layers; with Hortonworks DataFlow, a single platform handles the flow]
TALK TRACK The second important characteristic of HDF is that it is integrated. That integration helps you move from slow, laborious ingest processes involving multiple engines and scripts to one seamless, efficient, bi-directional data ingest engine. Hortonworks DataFlow optimizes your architecture for faster, easier, data movement and control. HDF assures efficient operations to reduce costs and optimize your use of precious human resources. And it allows you to rely on a common foundation for all systems to interact with and a single system for users to learn, maintain and manage. [NEXT SLIDE]
End-to-End Security Apply security rules to encrypt, decrypt, filter and replace data from the point of collection at the jagged edge to its final destination
Granular Control and Sharing Move beyond role-based access and dynamically share an entire dataflow
Real-Time Traceability Rich metadata and contextual detail helps troubleshoot security issues and informs timely decisions
TALK TRACK Of course, you also need to know that your data is secure. Can you trust it? How do you ensure that the right people see the right data, when they need it? Hortonworks DataFlow comes with end-to-end security to encrypt, decrypt, filter and replace data from its origin at the jagged edge all the way through to its final destination. HDF allows granular control and sharing, so you can share appropriate bits of data with the right parties without creating inadvertent risks. And it gives you real-time traceability and provenance for the data, so you can see the data chain of custody. This gives you useful information on what should be discarded, or not even collected in the future. [NEXT SLIDE]
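The filter-and-replace rules described above can be pictured as a transformation applied to each record before it leaves a collection point: unauthorized fields never leave the edge, and sensitive values are masked in place. This is a conceptual Python sketch, not an HDF API; the helper name, field names and SSN-masking rule are all hypothetical.

```python
import re

# Matches a US Social Security Number such as 123-45-6789 (illustrative rule).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def apply_security_rules(record, authorized_fields):
    """Hypothetical edge-side rule: drop unauthorized fields, mask SSNs."""
    secured = {}
    for key, value in record.items():
        if key not in authorized_fields:
            continue  # filter: the field never leaves the collection point
        if isinstance(value, str):
            value = SSN_PATTERN.sub("***-**-****", value)  # replace in place
        secured[key] = value
    return secured

record = {"name": "Ada", "note": "SSN 123-45-6789 on file", "salary": 90000}
print(apply_security_rules(record, authorized_fields={"name", "note"}))
# {'name': 'Ada', 'note': 'SSN ***-**-**** on file'}
```

Applying the rule at the point of collection, rather than at the destination, is what "end-to-end" means here: sensitive data is protected before it ever enters the flow.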
Automated Bi-directional communication between source and destination adapts data flows automatically, according to current priorities
On-Demand Operational control to adapt to changing conditions and requirements
Scalable By incorporating data from any device—small machine sensors to enterprise data centers—HDF connects you to the broadest set of disparate data sources
TALK TRACK And finally, you need a data flow solution that can adapt to a broad range of demands. The original architects of Apache NiFi now work at Hortonworks. From the very beginning at the NSA, these architects designed it to meet the changing needs of an evolving data landscape. Hortonworks DataFlow can adapt automatically. If the connection is poor, HDF automatically prioritizes data and skinnies down what data is sent, and saves the rest for later. If the connection is good, it will automatically readjust and send on everything it has been holding on to. Hortonworks DataFlow can also adapt on demand. With its visual real-time interface, operators can manually start, stop, reroute, change, or adjust a data source. That change takes effect immediately. And HDF is scalable. Data can come from large scale enterprise servers or small JVM sensors. Big data or small data, HDF scales to support diverse quantities or types of data. [NEXT SLIDE]
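The automatic adaptation described above (send urgent data now, hold the rest until the link recovers) can be sketched with a priority queue and a bandwidth budget. This is a conceptual illustration of the behavior, not HDF's actual implementation; the class and its numbers are invented.

```python
import heapq

class AdaptiveSender:
    """Illustrative sketch: send the most urgent items that fit in the
    current bandwidth budget and hold everything else for later."""
    def __init__(self):
        self._held = []  # min-heap ordered by priority (0 = most urgent)

    def enqueue(self, priority, size, payload):
        heapq.heappush(self._held, (priority, size, payload))

    def send(self, bandwidth):
        """Send what fits in `bandwidth` units, most urgent first."""
        sent, remaining = [], bandwidth
        while self._held and self._held[0][1] <= remaining:
            _priority, size, payload = heapq.heappop(self._held)
            sent.append(payload)
            remaining -= size
        return sent

sender = AdaptiveSender()
sender.enqueue(0, 10, "alert")      # urgent, small
sender.enqueue(2, 80, "bulk-log")   # low priority, large
sender.enqueue(1, 30, "metrics")

print(sender.send(bandwidth=45))    # poor link: ['alert', 'metrics']
print(sender.send(bandwidth=100))   # link recovers: ['bulk-log']
```

On the constrained link, the large low-priority batch is held back automatically; once capacity returns, the backlog drains without any operator intervention, which is the "skinny down, then catch up" behavior the talk track describes.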
Hortonworks Data Platform for Data at Rest Powered by Open Enterprise Hadoop
Open
Interoperable
Ready
Central
TALK TRACK Now let’s move to the other side of Hortonworks Connected Data Platforms. Our company was founded on Hortonworks Data Platform’s unique ability to manage Big Data at rest. This is Open Enterprise Hadoop, a platform that is: 100% Open Source Centrally architected with YARN at its core Interoperable with existing technology and skills, AND Enterprise-ready, with data services for operations, governance and security [NEXT SLIDE]
Shared Big Data Platform across applications, business groups, functions and users
Centralized management and monitoring of Hadoop clusters
Automated Provisioning either on-premises or in the cloud, with the Cloudbreak API standing up clusters in minutes
Managed Services for high availability and consistent lifecycle controls, with dashboards and alerts
TALK TRACK YARN is the architectural center of Open Enterprise Hadoop. It:
• Coordinates cluster-wide services for operations, data governance and security
• Allocates resources among the diverse applications that process the data
• Maximizes your data ingest, by helping ingest all types of data
• Allows you to confidently extend your big data assets to the largest possible audience within your organization
Open Enterprise Hadoop provides consistent operations, with:
• Centralized management and monitoring of clusters through a single pane of glass
• Automated provisioning, either on-premises or in the cloud with the Cloudbreak API. You can manage one huge data lake, or spin up and spin down multiple clusters as needed. You choose.
• Managed services to make sure that your cluster is highly available.
[NEXT SLIDE]
SUPPORTING DETAIL Consistent Operations
Why this matters to our customers: From the launch of your first cluster, to the changing access patterns as more users come online, through expansion to contain the growing amounts of data—your operations team needs to keep Hadoop working to meet your business objectives.
Proof point: Hortonworks Data Platform includes Apache Ambari, Cloudbreak and Hortonworks SmartSense—a complete set of simplified tools to make the most of your investment in Open Enterprise Hadoop.
Citation: “At The Mobile Majority, we have been using Hortonworks Data Platform to optimize ad performance on behalf of our customers. We’re excited to look into Hortonworks SmartSense as a way to continuously optimize our HDP cluster as it grows over time,” said Cheolho Minale, vice president of technology. | http://hortonworks.com/blog/introducing-availability-of-hdp-2-3-part-3/
Data Management along the entire data lifecycle with integrated provenance and lineage capability
Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
TALK TRACK Open Enterprise Hadoop enables trusted governance, with: Data lifecycle management along the entire lifecycle Modeling with metadata, and Interoperable solutions that can access a common metadata store. [NEXT SLIDE] SUPPORTING DETAIL Trusted Governance Why this matters to our customers: As data accumulates in an HDP cluster, the enterprise needs governance policies to control how that data is ingested, transformed and eventually retired. This keeps those Big Data assets from turning into big liabilities that you can’t control. Proof point: HDP includes 100% open source Apache Atlas and Apache Falcon for centralized data governance coordinated by YARN. These data governance engines provide those mature data management and metadata modeling capabilities, and they are constantly strengthened by members of the Data Governance Initiative. The Data Governance Initiative (DGI) is working to develop an extensible foundation that addresses enterprise requirements for comprehensive data governance. The DGI coalition includes Hortonworks partner SAS and customers Merck, Target, Aetna and Schlumberger. Together, we assure that Hadoop: Snaps into existing frameworks to openly exchange metadata Addresses enterprise data governance requirements within its own stack of technologies Citation: “As customers are moving Hadoop into corporate data and processing environments, metadata and data governance are much needed capabilities. SAS participation in this initiative strengthens the integration of SAS data management, analytics and visualization into the HDP environment and more broadly it helps advance the Apache Hadoop project. This additional integration will give customers better ability to manage big data governance within the Hadoop framework,” said SAS Vice President of Product Management Randy Guard.” | http://hortonworks.com/press-releases/hortonworks-establishes-data-governance-initiative/
Comprehensive Security through a platform approach
Fine-Grained, Flexible Authorization controlling access based on roles or data tags
Encrypt Data at rest and in motion
Centralized Administration of security policies and user authentication
TALK TRACK And of course, enterprise-readiness means comprehensive security through a platform approach. This includes:
• Encryption of data at rest and in motion
• Centralized administration of policies for authentication
• Fine-grained authorization to control data access
These are the four pillars of the emerging solution category known as Open Enterprise Hadoop. But why is Hortonworks uniquely positioned to lead this category? HDF offers secure data flows from the point of origin through to destination. However, security is much more than just securing the data itself. There are layers of security involved: it is important to be able to decide in real time whether a person or a system is allowed to access a specific piece of data within a dataflow – represented by the pieces of the pie within the hexagon – along with the ability to trace the data chain of custody (provenance) from source to destination. Beyond that, to make security more seamless, HDF 1.2 supports centralized Kerberos authentication.
Comprehensive Security
Why this matters to our customers: Data is valuable, and like any other valuable asset, it must be secure from corruption or theft. Enterprises need easy, centralized tools for protecting their Big Data across their entire ecosystems.
Proof point: HDP provides comprehensive security through a platform approach, with Apache Ranger as a single pane of glass for security policy administration and Apache Knox protecting the perimeter. HDP can encrypt data at rest or in motion. These integrated components let security administrators administer policies, authenticate and authorize users, and protect the data from misuse.
Citation: “Chris Twogood, Teradata vice president of products and services, said in an interview with CRN that "security is obviously a very important component" of big data systems, and he praised Hortonworks for pushing security improvements in its software, especially the new encryption and authorization capabilities.” | Source: http://www.crn.com/news/applications-os/300077188/new-hortonworks-hadoop-release-offers-bulked-up-security-enhanced-data-governance-capabilities.htm
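Tag-based authorization of the kind described above reduces to a simple check: a role may access a resource only if one of its policies covers every tag attached to that resource. The sketch below is conceptual only; the policy structure and function name are invented for illustration and are not the Apache Ranger API.

```python
# Hypothetical tag-based policies: each maps a role to the data tags it may see.
POLICIES = [
    {"role": "analyst", "allowed_tags": {"public", "internal"}},
    {"role": "auditor", "allowed_tags": {"public", "internal", "pii"}},
]

def is_authorized(role, resource_tags):
    """Allow access only if some policy for this role covers every tag
    attached to the resource (set containment check)."""
    for policy in POLICIES:
        if policy["role"] == role and resource_tags <= policy["allowed_tags"]:
            return True
    return False

customer_table_tags = {"internal", "pii"}
print(is_authorized("analyst", customer_table_tags))  # False: pii not covered
print(is_authorized("auditor", customer_table_tags))  # True
```

The appeal of tag-based control is that the policy follows the data: tag a column "pii" once, and every role lacking that tag is denied everywhere the column appears, with no per-table policy needed.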
Pluggable Architecture supports Apache Hive, Pivotal HAWQ and other leading SQL engines
Familiar SQL Query Semantics enable transactions and SQL:2011 Analytics for rich reporting
Unprecedented Speed at Extreme Scale returns query results in interactive time, even as data sets grow to petabytes
TALK TRACK While Spark at Scale is new and hot, SQL is still the lingua franca of data analysis. HDP includes a tool for those millions of analysts: Apache Hive on YARN. It provides:
• A pluggable architecture for Hive and other SQL engines like Pivotal HAWQ
• Familiar semantics that enable transactions and rich reporting via SQL:2011 Analytics
• Unprecedented speed at extreme scale, returning query results in interactive time—even as the data set grows towards petabyte size.
[NEXT SLIDE]
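The SQL:2011 analytics mentioned above center on window functions. The query below shows the shape of such an analytic query (a running total per partition). It runs here against Python's built-in sqlite3, which also implements window functions (SQLite 3.25+), purely so the example is self-contained; on HDP the same query shape would be submitted to Hive, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# A SQL:2011-style analytic query: running total of sales per region.
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running
    FROM sales
    ORDER BY region, amount
""").fetchall()
print(rows)  # [('east', 100, 100), ('east', 300, 400), ('west', 200, 200)]
```

The `OVER (PARTITION BY ... ORDER BY ...)` clause is what distinguishes analytic SQL from plain aggregation: every input row survives into the output, each annotated with an aggregate computed over its window.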
Agile Analytics with Enterprise Spark 1.6 at Scale
SPARK ON YARN
[Diagram: Spark on YARN, with operations, security, governance and storage coordinated by HDP]
Powering Agile Analytics via Zeppelin data science notebooks and automation for most common analytics (including Geospatial analysis and entity resolution)
Seamless Data Access that brings together as many data types as possible
Unmatched Economics combining the speed of in-memory processing with HDP’s cost efficiencies at scale
Ready for the Enterprise with robust security, governance and operations coordinated centrally by Apache Hadoop and YARN
TALK TRACK Aside from enterprise readiness for operations, security and governance, YARN’s multi-tenancy makes HDP ready to access and process all the data. One of the hottest access modules is Apache Spark, which ships as an integrated component of HDP. Spark at Scale powers agile analytics for developers through data science notebooks and automation for the most common analysis scenarios, including support for geospatial analysis and entity resolution. With seamless data access, Spark has access to the widest possible array of data sources. The combination of Spark and Hadoop yields unmatched economics, pairing Spark’s in-memory processing with data stored at low cost in Hadoop. And Spark at Scale is rock solid at the core, with RDD sharing with the HDFS Memory Tier and YARN-based enhancements to Spark operations, governance and security.
Spark 1.6 Data Science Acceleration:
• 10x faster Spark Streaming
• Seamless data access via the Dataset API
• Automatic memory tuning
[NEXT SLIDE]