tibco statistica big data analytics...big-data analytics to spark servers. datasheet | 2 a large...

3
TIBCO Statistica Big Data Analytics With version 13, TIBCO Statistica (currently in release 13.3) continues comprehensive support for popular open-source capabilities and tooling for handling unstructured data, big data, and in-memory, and in-database analytics. To be specific, Statistica supports: APACHE ® HADOOP ® /HDFS Statistica provides direct file import from HDFS, and data aggregation workflows can be made into a reusable template with Statistica Enterprise for repeatable analytic and data prep workflows. Statistica can also retrieve data from Apache Hive ® and Hive-on-Spark ® using Statistica Server and Statistica Monitoring and Alerting Server (MAS). STATISTICA BIG DATA PROCESSING ARCHITECTURE Statistica architecture for big data processing and in-memory computational frameworks is summarized in this graphic: Statistica Analytics User Server / Applications Tier Analytic Engines H20 Sparkling Water SparkML Data Virtualization Data Sources Statistica Server Historian Hadoop RDBMS IoT Cloud Flat Files Statistica Big Data Processing Architecture Statistica supports multiple workspace processing nodes that define computational pipelines for H20 (Sparkling Water) and/or Apache Spark . In short, analytic workflows deployed to Spark can be executed in Statistica using data managed in Spark as resilient distributed datasets (RDDs). Results can then be retrieved by Spark and published again to Statistica (either TIBCO Statistica Server or client). This workflow has the advantage of enabling version control of analytic and data prep workflows and steps executed in Statistica, while deploying big-data analytics to Spark servers.

Upload: others

Post on 15-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TIBCO Statistica Big Data Analytics...big-data analytics to Spark servers. DATASHEET | 2 A large number of analytics are supported, both for H20 and Spark ML, as well as for deep learning

TIBCO Statistica Big Data Analytics

With version 13, TIBCO Statistica™ (currently in release 13.3) continues comprehensive support for popular open-source capabilities and tooling for handling unstructured data, big data, and in-memory, and in-database analytics. To be specific, Statistica supports:

APACHE® HADOOP®/HDFS Statistica provides direct file import from HDFS, and data aggregation workflows can be made into a reusable template with Statistica Enterprise for repeatable analytic and data prep workflows. Statistica can also retrieve data from Apache Hive® and Hive-on-Spark® using Statistica Server and Statistica Monitoring and Alerting Server (MAS).

STATISTICA BIG DATA PROCESSING ARCHITECTUREStatistica architecture for big data processing and in-memory computational frameworks is summarized in this graphic:

Statistica Analytics User

Server / Applications Tier

Analytic Engines

H20SparklingWater

SparkML Data Virtualization

Data Sources

Statistica Server

Historian Hadoop RDBMS IoT Cloud Flat Files

Statistica Big Data Processing Architecture

Statistica supports multiple workspace processing nodes that define computational pipelines for H20 (Sparkling Water) and/or Apache Spark™. In short, analytic workflows deployed to Spark can be executed in Statistica using data managed in Spark as resilient distributed datasets (RDDs). Results can then be retrieved by Spark and published again to Statistica (either TIBCO Statistica™ Server or client). This workflow has the advantage of enabling version control of analytic and data prep workflows and steps executed in Statistica, while deploying big-data analytics to Spark servers.

Page 2: TIBCO Statistica Big Data Analytics...big-data analytics to Spark servers. DATASHEET | 2 A large number of analytics are supported, both for H20 and Spark ML, as well as for deep learning

DATASHEET | 2

A large number of analytics are supported, both for H20 and Spark ML, as well as for deep learning.

Spark and H20 Nodes

Statistica supports the common interfaces and scripting languages that are used by data scientists today, including Python, R, Scala, and C#. Custom nodes and programs can also be scripted in these languages, so the Statistica installation can be configured or customized with proprietary analytics pipelines.

Data Science Workflow with Open Source Code Node

Page 3: TIBCO Statistica Big Data Analytics...big-data analytics to Spark servers. DATASHEET | 2 A large number of analytics are supported, both for H20 and Spark ML, as well as for deep learning

DATASHEET | 3

Global Headquarters 3307 Hillview Avenue Palo Alto, CA 94304+1 650-846-1000 TEL +1 800-420-8450 +1 650-846-1005 FAXwww.tibco.com

TIBCO fuels digital business by enabling better decisions and faster, smarter actions through the TIBCO Connected Intelligence Cloud. From APIs and systems to devices and people, we interconnect everything, capture data in real time wherever it is, and augment the intelligence of your business through analytical insights. Thousands of customers around the globe rely on us to build compelling experiences, energize operations, and propel innovation. Learn how TIBCO makes digital smarter at www.tibco.com.©2017, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, and Statistica are trademarks or registered trademarks of TIBCO Software Inc. or its subsidiaries in the United States and/or other countries. Apache, Hadoop, Hive, Hive-on-Spark, and Spark are trademarks of The Apache Software Foundation in the United States and/or other countries. All other product and company names and marks in this document are the property of their respective owners and mentioned for identification purposes only.

11/30/17

Two Data Science Workflows (top) Used as Reusable Template (bottom)

In addition, Statistica provides point-and-click interfaces and implementations to the most effective machine learning algorithms, making it the ideal platform for managing advanced analytics throughout the enterprise. Analytics can be performed on Statistica or Spark servers using open-source tools or Statistica algorithms. Analytic workflows can be validated and managed using best practices for version control, electronic signatures, and audit trails, and analytic and data prep templates can be deployed to groups of users in a role-based environment. Results can be infused into business processes and/or delivered to self-service visualization tools and dashboards for interactive exploration and drill-down analytics.