

Dell EMC Streaming Data Platform
Version 1.0

Installation and Administration Guide
April 2020


Copyright © 2020 Dell Inc. or its subsidiaries. All rights reserved.

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS-IS." DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.

Dell Technologies, Dell, EMC, Dell EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA.

Dell EMC
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.DellEMC.com


CONTENTS

Figures 7
Tables 9

Chapter 1: Introduction to Dell EMC Streaming Data Platform 11
    Product summary 12
    Product components 12
        About Pravega 13
        About Apache Flink® 14
        Management plane 14
        Kubernetes cluster 15
    Supporting infrastructure components 15
    Features 16
        Product highlights 16
        Security features 18
        Additional features 19
    Interfaces 21
        Basic terminology 21
        User interface 22
        Grafana dashboards 23
        Apache Flink Web UI 24
        APIs 24
    What you get with Streaming Data Platform 25
    Use case examples 25
    Documentation list 26

Chapter 2: Site Prerequisites 27
    Installation prerequisites 28
    Prerequisite infrastructure 28
        Reference architecture 28
        Planning 30
        Build the infrastructure 30
    Compatibility and minimum version requirements 31
    Obtain and save the license file 31
    Deploy Dell EMC SRS Gateway 32
    Set up local DNS server 32
    Configure Isilon storage 33

Chapter 3: Installation 35
    About configuration values files 36
    Prepare configuration values files 36
    TLS configuration details 43
        Enable or disable TLS 43
        Configure TLS using certificates from Let's Encrypt 43
        Configure TLS using a private certificate 44
        Configure TLS using signed certificates from a well-known CA 45
    Prepare the working environment 46
    Set up UAA and required UAA admin account 48
    Required tasks for UAA federation 49
    Create the Kubernetes cluster 50
    Set up a Docker registry 52
    Obtain installation files 53
    Extract the installer tool 53
    Push images into the registry 54
    Remove SRS Gateway from the manifest (if needed) 54
    Run prereqs script 55
    Run pre-install script 56
    Run validate-values script 56
    Source control the configuration values files 57
    Apply and synchronize the configuration 57
    Run post-install script 59
    What's next 59

Chapter 4: Connections 61
    Configure connections for users 62
        Configure connections to master node 62
        Configure connections to the local DNS 63
        Alternative endpoint configuration for non-production deployments 63
    Web UI endpoints and logins 64
        Obtain connection URLs 64
        Connect and log in to web UI 65
    kubectl logins 66
        Log in to kubectl for cluster-admins 66
        Log in to kubectl for non-admin users 67
    User password changes 68
        Change password (UAA) 68
        Change password (Keycloak) 68

Chapter 5: Post-install Configuration and Maintenance 71
    Obtain default admin credentials 72
    Set up UAA federation 73
    Set up LDAP integration 74
    Enable periodic telemetry upload to SRS 75
    Add Pravega alerts to event collection 76
    Change applied configuration 78
    Uninstall applications 79
    Upgrade software 81
    Update the default password for SRS remote access 83

Chapter 6: Manage Projects, Scopes, and Users 85
    Naming requirements 86
    Manage projects 86
        About projects and clusters 86
        Create a project 87
        Create a project manually 88
        Add or remove project members 90
        List projects and view project contents 90
        Create Flink cluster 92
        Edit Flink cluster attributes 93
        Delete a cluster 94
        What's next with projects 94
    Manage scopes and streams 95
        About scopes and streams 95
        Stream access rules 96
        Create a scope independent of projects 96
        Add or remove scope members 96
        Create a stream independent of projects 97
        Stream configuration attributes 97
        Start and stop stream ingestion 99
        Monitor stream ingestion 99
    Manage users 100
        Summary of ways to add a new user 100
        Assign roles 103
        Assign admin role 106

Chapter 7: Monitor Health 107
    Monitor licensing 108
    Monitor or change SRS registration status 109
    Monitor and manage events 110
    Run health-check 111
    Monitor Pravega health 112
    Monitor stream health 112
    Monitor application health 112
    Logging 112

Chapter 8: Use Grafana Dashboards 113
    Grafana dashboards overview 114
    Connect to the Grafana UI 114
    Retention policy and time range 116
    Pravega System dashboard 117
    Pravega Operation dashboard 119
    Pravega Scope dashboard 124
    Pravega Stream dashboard 125
    Pravega Alerts dashboard 129
    Custom queries and dashboards 129
    InfluxDB data 130

Chapter 9: Kubernetes Resources 133
    Namespaces 134
    Components in the nautilus-system namespace 134
    Components in the nautilus-pravega namespace 135
    Components in project namespaces 135

Chapter 10: Authentication and Authorization 137
    Authentication overview 138
        About UAA federation 138
        About Keycloak 139
    Authorization 141
        Summary of roles 141
    Application authentication and authorization 142
    Additional Keycloak services 142

Appendix A: Configuration values file reference 145
    Overview 146
    Template of configuration values file 146

Appendix B: Summary of scripts 153
    Summary of scripts 154

Appendix C: Troubleshooting 157
    Log files 158
    Useful troubleshooting commands 158
        bosh commands 158
        pks commands 159
        helm commands 160
        kubectl commands 160
    FAQs 161
    Application connections when TLS is enabled 165
    Online and remote support 165

Appendix D: Installer command reference 167
    Overview 168
    decks-install apply 168
    decks-install config set 170
    decks-install push 170
    decks-install sync 171
    decks-install unapply 172


FIGURES

1. Streaming Data Platform software components 13
2. Streaming Data Platform and supporting infrastructure 16
3. Initial UI after login 23
4. Apache Flink Web UI 24
5. Reference architecture 29
6. Licensing information in the UI 108
7. SRS Gateway information in the UI 109
8. Events list in the UI 110
9. Time range on Grafana dashboards 116
10. UAA federation 139


TABLES

1. List of available interfaces 21
2. Streaming Data Platform documentation 26
3. Reference architecture descriptions for v1.0 29
4. Configure external connection and TLS 37
5. Configure NFS storage 39
6. Configure monitoring and licensing 39
7. Configure SRS 40
8. Configure Keycloak to UAA connection 41
9. Provide initial password values 42
10. Streaming Data Platform role names 141
11. PKS and UAA role names 141


CHAPTER 1

Introduction to Dell EMC Streaming Data Platform

This chapter describes the components and features of Dell EMC Streaming Data Platform.

- Product summary 12
- Product components 12
- Supporting infrastructure components 15
- Features 16
- Interfaces 21
- What you get with Streaming Data Platform 25
- Use case examples 25
- Documentation list 26


Product summary

Dell EMC Streaming Data Platform is an auto-scaling platform for ingesting, storing, and processing continuously streaming unbounded data in real time. The platform can process both real-time and collected historical data in the same application.

Streaming Data Platform ingests and stores streaming data from sources such as Internet of Things (IoT) devices, web logs, industrial automation, financial systems, live video, social media feeds, and applications, as well as event-based streams. It can process multiple data streams from multiple sources while ensuring low latencies and high availability.

The platform manages stream ingestion and storage, and it hosts the analytic applications that process the streams. It dynamically distributes data processing and analytical jobs across the available infrastructure, and it auto-scales storage resources in real time as the streaming workload changes.

Streaming Data Platform supports the concept of projects and project isolation, or multi-tenancy. Multiple teams of developers and analysts each have their own working environments, where their applications and streams are protected from access by anyone outside the team.

Streaming Data Platform integrates the following capabilities into one user-friendly software platform:

- Stream ingestion: The platform is an auto-scaling ingestion engine. It ingests all types of streaming data, including unbounded byte streams and event-based data, in real time.

- Stream storage: Elastic tiered storage provides instant access to real-time data, access to historical data, and effectively unlimited capacity.

- Stream processing: An embedded analytics engine enables real-time stream processing. The applications that you develop and run on Streaming Data Platform can process real-time and historical data, create and store new streams, send notifications to enterprise alerting tools, and send output to third-party visualization tools.

- Platform management: Integrated management provides data security, configuration, access control, resource management, an easy upgrade process, stream metrics collection, and health-monitoring features.

- Run-time management: A web-based user interface (UI) lets authorized users configure stream properties, view stream metrics, run applications, view job status, and monitor system health.

- Application development: APIs are included in the distribution. The web UI supports application deployment and artifact storage.

This platform provides the ability to ingest and store continuously streaming data and process that data in real time, in addition to or in combination with processing of historical data in the same stream.

Product components

Streaming Data Platform is a software-only platform consisting of three integrated components, plus supporting APIs and Kubernetes Custom Resource Definitions (CRDs). The product runs in a Kubernetes environment.

Pravega

Pravega is the stream store in Streaming Data Platform. It handles ingestion and storage for continuously streaming unbounded byte streams. Pravega is an Open Source Software project sponsored by Dell EMC.


Apache Flink®

Apache Flink is the embedded stream-processing engine in Streaming Data Platform. We distribute Docker images from the Apache Flink Open Source Software project.

Management platform

The management platform is Dell EMC proprietary software. It integrates the other components and adds security, performance, configuration, and monitoring features. It includes a web-based user interface for administrators, application developers, and data analysts.

Additional analytic engines

In addition to processing streaming data with the embedded Apache Flink engine, you may develop Pravega software connectors, or use community connectors, that integrate with other engines.

Note: Community contributions are not supported by Dell EMC.

Figure 1 Streaming Data Platform software components

About Pravega

The Open Source Pravega project was created specifically to support streaming applications that handle large amounts of continuously arriving data.

In Pravega, the stream is a core primitive. Pravega ingests unbounded streaming data in real time and coordinates permanent storage.

Pravega user applications are known as Writers and Readers. Pravega Writers are applications that use the Pravega API to collect streaming data, create the ingress streams, and connect to Streaming Data Platform. Streaming Data Platform ingests and stores the streams. Pravega Readers read data from the Pravega store.

Pravega streams are based on an append-only log data structure. By using append-only logs, Pravega rapidly ingests data into durable storage. Pravega handles all types of streams, including:

- Unbounded or bounded streams of data.

- Streams containing discrete events or a continuous stream of bytes.

- Sensor data, server logs, video streams, or any other type of information.

Pravega seamlessly coordinates a two-tiered storage system for each stream. Tier 1 temporarily stores the recently ingested tail of a stream. Tier 2 provides long-term storage. Streams can be configured with specific data retention periods.

An application, such as a Java program reading from an IoT sensor, writes data to the tail of the stream. Apache Flink applications can read from any point in the stream. Multiple applications can read and write the same stream in parallel. Elasticity, scalability, support for large volumes of streaming data, preserved ordering, and exactly-once semantics are the highlights of Pravega's design.
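The append-only log model described above can be sketched as a toy in Python. This is an illustration of the core idea only, not the Pravega API: writers may only append to the tail, data already written is immutable, and readers can start from the tail or replay from any historical offset.

```python
class AppendOnlyLog:
    """Toy model of an append-only stream log (illustrative, not Pravega)."""

    def __init__(self):
        self._events = []  # durable, ordered, immutable once written

    def append(self, event):
        """Writers may only add to the tail; existing data is never modified."""
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read_from(self, offset):
        """Readers can start at any historical offset, or at the tail."""
        return self._events[offset:]


log = AppendOnlyLog()
for reading in ("t=1 21.5C", "t=2 21.7C", "t=3 21.6C"):
    log.append(reading)

# A historical reader replays the whole stream from offset 0;
# a tail reader sees only data written after it attaches.
historical = log.read_from(0)
tail_offset = log.append("t=4 21.9C")
latest = log.read_from(tail_offset)
```

Because reads are just offsets into the same log, "real-time" and "historical" access use the same call, which is the uniform access paradigm the next paragraphs describe.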


Applications can access current and past data in a uniform fashion. The same paradigm (the same API call) accesses both real-time and historical data stored in Pravega. Applications can also wait for data associated with any arbitrary time in the future.

Specialized software connectors provide access to Pravega. For example, a Flink connector provides Pravega data to Flink jobs. Because Pravega is an Open Source project, community-contributed connectors can potentially connect it to any analytics engine.

Pravega is unique in its ability to handle unbounded streams of bytes. It is a high-throughput, auto-scaling real-time store that preserves key-based ordering of continuously streaming data and guarantees exactly-once semantics. It continuously tiers ingested data into Tier 2 storage for long-term retention.
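The key-based ordering guarantee can be illustrated with a simplified sketch (not Pravega's actual segment implementation): events are distributed across parallel segments by hashing a routing key, so all events that share a key land in the same segment and keep their relative write order, while events with different keys can be handled in parallel.

```python
import hashlib

NUM_SEGMENTS = 4  # illustrative parallelism; Pravega scales segments dynamically


def segment_for(routing_key: str) -> int:
    """Map a routing key to one of the parallel segments via a stable hash."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    return digest[0] % NUM_SEGMENTS


segments = {i: [] for i in range(NUM_SEGMENTS)}

# Interleaved events from two sensors; each sensor id is the routing key.
events = [("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 2),
          ("sensor-b", 2), ("sensor-a", 3)]

for key, seq in events:
    segments[segment_for(key)].append((key, seq))


def key_order(key):
    """Per-key sequence as observed in that key's segment (write order)."""
    return [seq for k, seq in segments[segment_for(key)] if k == key]
```

Whatever segment each key hashes to, `key_order("sensor-a")` yields the sensor's events in the order they were written, which is the per-key ordering property described above.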

For more information about Pravega, see http://www.pravega.io.

About Apache Flink®

Apache Flink is a high-throughput, stateful processing engine with precise control of time and state. It is an emerging market leader for stream processing.

Apache Flink provides a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It performs computations at in-memory speed and at any scale. The order of data is preserved during processing.

The Flink engine accommodates many types of stream processing models, including:

- Continuous data pipelines for real-time analysis of unbounded streams

- Batch processing

- Publisher/subscriber pipelines

The Streaming Data Platform distribution includes Apache Flink APIs that can process continuous streaming data, sets of historical batch data, or combinations of both.
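The stateful, windowed computation model that Flink brings can be illustrated with a toy tumbling-window count in plain Python. This shows the model only, not the Flink API: a real job would use the Flink DataStream API with the Pravega Flink connector, and the function and event names here are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # fixed, non-overlapping (tumbling) window size


def tumbling_window_counts(events):
    """Count (timestamp, key) events per tumbling window.

    The same logic handles an unbounded live stream (fed incrementally)
    or a bounded historical batch, which is the unification Flink provides.
    """
    counts = defaultdict(int)  # (window_start, key) -> event count
    for ts, key in events:
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical log-level events with second-granularity timestamps.
events = [(1, "error"), (4, "ok"), (9, "error"), (12, "error"), (19, "ok")]
result = tumbling_window_counts(events)
# Window [0, 10) holds three events; window [10, 20) holds two.
```

A production Flink job adds what this toy omits: distributed state backends, checkpointing for exactly-once results, and event-time watermarks for late data.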

For more information about Apache Flink, see https://flink.apache.org/.

Management plane

The Streaming Data Platform management plane coordinates the interoperating functions of the other components.

The management plane deploys and manages components in the Kubernetes environment; provides configuration and monitoring support; coordinates security, authentication, and authorization; and coordinates Flink application execution. The web-based UI provides a common interface for developers to upload and update application images, for project members to manage streams and Flink jobs, and for administrators to manage resources and user access.

Some of the features of the management plane are:

• Integrated data security, including TLS encryption, multi-level authentication, and role-based access control (RBAC).

• Project-based isolation for team members and their respective streams and applications.

• Flink cluster management.

• Pravega streams management.

• A DevOps-oriented platform that supports modern software development and delivery practices.

• A Kubernetes container environment, with all platform and user applications running in the Kubernetes cluster.

• Application monitoring and direct access to the Apache Flink Web UI.

Introduction to Dell EMC Streaming Data Platform

14 Dell EMC Streaming Data Platform Installation and Administration Guide

• For administrators, direct access to predefined Grafana dashboards for monitoring system resources and Pravega streams.

Kubernetes cluster

Streaming Data Platform is a set of Kubernetes applications running in the same Kubernetes cluster.

Streaming Data Platform runs in Kubernetes and therefore gains all the advantages of Kubernetes.

Supporting infrastructure components

The supporting infrastructure for Streaming Data Platform supplies storage and compute resources and the network infrastructure.

Streaming Data Platform is a software-only solution. The customer obtains the components for the supporting infrastructure independently.

For each Streaming Data Platform version, we describe reference architectures comprising specific products that are tested and verified. The reference architecture is an end-to-end enterprise solution for streaming analytic use cases. Dell EMC provides setup and configuration guidelines and requirements for the reference architecture.

The reference architecture components for the current version are listed in Reference architecture on page 28.

A general description of the supporting infrastructure components follows.

Reference Hardware

Streaming Data Platform runs on a reference hardware platform that provides high-powered processing, high availability, and elastic scale-out storage capacity for Tier 1 storage.

Reference software

This includes operational software for managing the nodes, the network, and the storage volumes.

Kubernetes container environment

Streaming Data Platform must run in a Kubernetes container environment. The container environment isolates projects, efficiently manages resources, and provides authentication and RBAC services.

Persistent Tier 2 storage

Tier 2 storage is required and is configured during installation. Tier 2 storage is either of the following:

• For production solutions with long-term storage requirements, an elastic scale-out storage solution is required.

• For testing and development, or for use cases where only temporary storage is needed, Tier 2 storage may be a file system. The solution can use a local mount or an NFS mount point.

The following figure shows the supporting infrastructure in context with Streaming Data Platform.

Figure 2 Streaming Data Platform and supporting infrastructure

Features

The following sections summarize Streaming Data Platform capabilities.

Product highlights

This section describes the major innovations and unique capabilities in Streaming Data Platform.

Enterprise-ready deployment

Streaming Data Platform is a cost-effective, enterprise-ready product. This software platform, running on a recommended reference architecture, is a total solution for processing and storing streaming data. With Streaming Data Platform, an enterprise can avoid the complexities of researching, testing, and creating an appropriate infrastructure for processing and storing streaming data. Our reference hardware and software infrastructure is scalable, secure, manageable, and verified. Dell EMC defines the infrastructure and provides guidance in setting it up. In this way, Streaming Data Platform dramatically reduces time to value for an enterprise.

Streaming Data Platform provides built-in support for a robust and secure total solution, including fault tolerance, easy scalability, and replication for data availability.

With Streaming Data Platform, Dell EMC provides the following deployment support:

• Recommendations for the underlying hardware and software infrastructure, verified and tested.

• Sizing guidance for compute and storage needs to handle your intended use cases.

• End-to-end guidance for setting up the reference infrastructure, including switching and network configuration (trunks, VLANs, management and data IP routes, and load balancers).

• Configuration requirements for the underlying software components (Pivotal Container Service and VMware) to align them with Streaming Data Platform requirements and installation defaults.

• A software image for Streaming Data Platform and API distributions for developers.

The result is an ecosystem ready to ingest and store streams, and ready for your developers to code and upload applications that process those streams.

Unbounded byte stream ingestion, storage, and analytics

Pravega was designed from the outset to handle unbounded byte stream data.

In Pravega, the unbounded byte stream is a primitive structure. Pravega stores each stream (any type of incoming data) as a single persistent stream, from ingestion to long-term storage, like this:

• Recent tail. The real-time tail of a stream exists on Tier 1 storage.

• Long-term. Most of a stream is stored on Tier 2 storage.

Applications use the same API call to access real-time data (the recent tail on Tier 1 storage) and all historical data on Tier 2 storage.

In an Apache Flink application, the basic building blocks are streams and transformations. Conceptually, a stream is a potentially never-ending flow of data records. A transformation is an operation that takes one or more streams as input and produces one or more output streams. In Flink, non-streaming data is treated internally as a stream.

By integrating these products, Streaming Data Platform creates a solution specifically optimized for processing unbounded streaming bytes, as well as bounded streams and more traditional static data.
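The streams-and-transformations model described above can be sketched in plain Python. This is a conceptual illustration only, not the Flink API; the function names (`source`, `transform_map`, `transform_filter`) are invented for the example.

```python
# Conceptual sketch: a stream is a potentially unbounded sequence of records,
# and a transformation consumes one stream and produces another.

def source(events):
    """An input stream: a finite iterable standing in for an unbounded one."""
    yield from events

def transform_map(stream, fn):
    """A map transformation: one output record per input record."""
    for record in stream:
        yield fn(record)

def transform_filter(stream, predicate):
    """A filter transformation: forwards only records matching the predicate."""
    for record in stream:
        if predicate(record):
            yield record

# Build a pipeline: sensor readings -> Celsius to Fahrenheit -> keep hot readings.
readings = source([20.0, 35.5, 41.2, 18.3])
fahrenheit = transform_map(readings, lambda c: c * 9 / 5 + 32)
hot = transform_filter(fahrenheit, lambda f: f > 100.0)
print(list(hot))
```

Because each stage is lazy, records flow through the whole pipeline one at a time, which is the essential property that lets the same transformation code handle both bounded and unbounded input.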

High throughput stream ingestion

Pravega enables the ingestion capacity of a stream to grow and shrink according to workload. During ingestion, Pravega splits a stream into partitions to handle a heavy traffic period, and then merges partitions when traffic is lighter. Splitting and merging occur automatically and continuously as needed. Throughout, Pravega preserves the order of data.

Apache Flink can be rescaled by the user while preserving exactly-once semantics. A Flink stream has one or more stream partitions, with operators and subtasks. The operator subtasks are independent of one another. They execute in different threads, and possibly on different machines or containers.
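The key idea behind partitioned ingestion can be illustrated with a plain-Python sketch. This is an assumed simplification, not Pravega's implementation: hashing a routing key to a segment sends all of one key's events to a single segment, so per-key order is preserved no matter how many segments exist.

```python
# Conceptual sketch of key-based partitioning (illustrative names, not the
# Pravega API): events are routed to segments by hashing their routing key.
import zlib

def route(key, num_segments):
    """Deterministically map a routing key to a segment index."""
    return zlib.crc32(key.encode()) % num_segments

def ingest(events, num_segments):
    """Distribute (key, value) events across segments, preserving per-key order."""
    segments = {i: [] for i in range(num_segments)}
    for key, value in events:
        segments[route(key, num_segments)].append((key, value))
    return segments

events = [("sensor-a", 1), ("sensor-b", 10), ("sensor-a", 2), ("sensor-a", 3)]
segments = ingest(events, 2)
# All of sensor-a's events land in one segment, still in arrival order.
for seg in segments.values():
    a_values = [v for k, v in seg if k == "sensor-a"]
    assert a_values == sorted(a_values)
```

Scaling up or down then amounts to changing the number of segments between routing decisions, which neither reorders nor drops any key's events.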

Exactly-once semantics

Pravega is designed with exactly-once semantics as a goal. Exactly-once semantics means that, in a given stream processing application, no event is skipped or duplicated with respect to the computations. Applications can be trusted to produce correct results every time.
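One common way to realize an exactly-once effect over retried (at-least-once) delivery is to deduplicate by event ID. The sketch below is a plain-Python illustration of that general idea, not Pravega's internal protocol; the class name is invented.

```python
# Conceptual sketch: track processed event IDs so a redelivered event is
# applied at most once, giving an exactly-once result.

class ExactlyOnceCounter:
    def __init__(self):
        self.seen_ids = set()   # would be durable checkpoint state in a real system
        self.total = 0

    def process(self, event_id, value):
        """Apply each event's value exactly once, even if delivered twice."""
        if event_id in self.seen_ids:
            return  # duplicate delivery after a retry: skip, result unchanged
        self.seen_ids.add(event_id)
        self.total += value

counter = ExactlyOnceCounter()
# Event 2 is delivered twice, e.g. after a network retry.
for event_id, value in [(1, 5), (2, 7), (2, 7), (3, 1)]:
    counter.process(event_id, value)
assert counter.total == 13  # the duplicate of event 2 was not double-counted
```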

Key-based guaranteed order

Pravega and Apache Flink guarantee key-based ordering. Information in a stream is generally keyed (for example, by sensor or other application-provided key). Streaming Data Platform guarantees that values for the same key are stored and processed in order. At the same time, the platform is free to scale the storage and processing across keys without concern for ordering.

The ordering guarantee supports use cases that require order for accurate results, such as in financial transactions.

Massive data volume

Pravega accommodates massive data ingestion. The reference architecture components ensure that processing and storage are backed by Dell EMC processing and storage solutions. The processing and storage reference hardware are easily scaled out by adding nodes.

Batch and publish/subscribe models supported

Both Pravega and Apache Flink support the more traditional batch and publish/subscribe pipeline models. Processing for these models includes all the advantages and guarantees described for the continuous stream models.

Pravega ingests and stores any type of stream, including:

• Unbounded byte streams, such as data streamed from IoT devices

• Bounded streams, such as movies and videos

• Unbounded append-type log files

• Event-based input, streaming or batched

In Apache Flink, all input is a stream. Apache Flink processes table-based input and batch input asa type of stream.

ACID-compliant transaction support

A Pravega transaction is part of Pravega's Writer API. The Writer can collect events, persist them, and decide later whether to commit them as a unit to a stream. When the transaction is committed, all data written to the transaction is atomically appended to the stream.

The Writer might be an Apache Flink or other application. As an example, an application might continuously process data and produce results, using a Pravega transaction to durably accumulate the results. At the end of a time window, the application might commit the transaction into the stream, making the results of the processing available for downstream processing. In the case of an error, the application can abort the transaction and the accumulated processing results disappear.

Combining transactions and other features of Pravega, developers can chain Flink jobs together, having one job's Pravega-based sink be the source for a downstream Flink job. In this way, an entire pipeline of Flink jobs can have end-to-end exactly-once semantics with guaranteed ordering of data processing.

In addition, applications can coordinate transactions across multiple streams, so that a Flink job can use two or more sinks to provide source input to downstream Flink jobs.

Pravega achieves ACID compliance as follows:

• Atomicity and Consistency are achieved in the basic implementation. A transaction is a set of events that is collectively either added into a stream (committed) or discarded (aborted) as a batch.

• Isolation is achieved because the transactional events are never visible to any readers until the transaction is committed into a stream.

• Durability is achieved when an event is written into the transaction and acknowledged back to the writer. Because transactions are implemented in the same way as stream segments, data written to a transaction is just as durable as data written directly to a stream.
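The commit/abort behavior described above can be modelled in a few lines of plain Python. This is a deliberately simplified sketch, not the Pravega API; the class and method names are invented for illustration.

```python
# Conceptual sketch of transactional writes: events written to a transaction
# are buffered invisibly, then appended to the stream atomically (commit) or
# discarded (abort).

class Stream:
    def __init__(self):
        self.events = []          # committed, reader-visible data

class Transaction:
    def __init__(self, stream):
        self.stream = stream
        self.buffer = []          # written but not yet visible to readers
        self.open = True

    def write(self, event):
        assert self.open, "transaction already completed"
        self.buffer.append(event)

    def commit(self):
        """Atomically append all buffered events to the stream."""
        assert self.open
        self.stream.events.extend(self.buffer)
        self.open = False

    def abort(self):
        """Discard all buffered events; the stream is untouched."""
        assert self.open
        self.buffer.clear()
        self.open = False

stream = Stream()
txn = Transaction(stream)
txn.write("result-1")
txn.write("result-2")
assert stream.events == []        # isolation: nothing visible before commit
txn.commit()
assert stream.events == ["result-1", "result-2"]  # atomic append
```

The two asserts mirror the Isolation and Atomicity bullets above: readers never observe a partially written transaction.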

Security Features

Access to Streaming Data Platform configurations and to data is strictly controlled.

Authentication

Authentication is required for all access to Streaming Data Platform data and platform components. Authentication secures against unauthorized changes to Pravega configurations, unauthorized access to the data it stores, and unauthorized access to uploaded applications.

Streaming Data Platform relies on Pivotal Container Service User Account and Authentication (PKS UAA) for authentication, to provide a single sign-on (SSO) experience across the data and control planes. The platform also supports federation with existing identity providers, such as an LDAP database.

RBAC Authorization

Authorization is handled by Kubernetes for most control-plane operations and by Keycloak for data-plane operations. Both use a role-based access control (RBAC) scheme for authorization.

RBAC controls access to:

• A Pravega scope; users are granted read or read/write access to all streams within a given scope.

• The associated Tier 2 storage for a stream.

• The projects that have access to stream data.

• The output generated by an analytics job.

TLS for external connections

Transport Layer Security (TLS) can optionally be enabled for external connections. Streaming Data Platform supports several certificate options. Let's Encrypt is incorporated into the platform and offers fully automated certificate handling. For certificates from well-known or enterprise certificate authorities (CAs), the installer generates the certificate signing requests (CSRs), and we provide a command-line tool for extracting the CSRs for submission to your CA.

Data security

Pivotal Container Service (PKS) protects Streaming Data Platform by restricting access to the Kubernetes cluster in which the platform is running. RBAC restricts access to resources, including data in streams, for both human users and applications.

Note: Streaming Data Platform cannot protect an enterprise from malicious code uploaded by authorized users. Your enterprise-specific internal procedures must mitigate that risk.

Network security

Streaming Data Platform supports perimeter security with public and private networks. The two network types are separated by ACLs on the network switches in the underlying infrastructure.

• Public networks are all network interfaces that are externally accessible to the Streaming Data Platform solution (for example, the API and UI), where traffic is authenticated and encrypted for security.

• Private networks carry all network traffic that is internal to the Streaming Data Platform solution, where traffic may or may not be authenticated and encrypted.

Note: This security plan guards against malicious attacks external to Streaming Data Platform. It does not, however, protect against users who have access to the platform and hence can run malicious code as part of a processing job. Your enterprise-specific internal procedures must mitigate that risk.

Project isolation

Groups of analysts work within a Streaming Data Platform project. A project defines and isolates resources for a specific analytic purpose, and RBAC authorization ensures that only users configured as project members can access project resources. A dedicated Kubernetes namespace exists per project, which enhances project isolation.

A project's resources are protected using RBAC implemented by Keycloak middleware. Project-level resources protected by RBAC include:

• Flink clusters and applications

• Flink checkpoint data

• Application program artifacts

• Pravega scopes

Additional features

Here are additional important capabilities in Streaming Data Platform Version 1.0.

Fault tolerance

The platform is fault tolerant in the following ways:

• All components use persistent volumes to store data. In Pivotal Container Service, persistent volumes are backed by vSAN volumes.

• Kubernetes abstractions organize containers in a fault-tolerant way. Failed pods restart automatically.

Data retention & data purge

Pravega includes the following ways to purge data, per stream:

• Manual trigger in an API call specifying a point in a stream beyond which data is purged.

• Automatic purge based on the size of the stream.

• Automatic purge based on time.
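The three purge mechanisms can be illustrated with a plain-Python sketch. This is an assumed simplification, not the Pravega retention API; a stream is modelled here as a list of (timestamp, payload) records, and the function names are invented.

```python
# Conceptual sketch of per-stream data purging: by explicit position, by
# stream size, and by record age.
import time

def purge_before_offset(records, offset):
    """Manual purge: drop everything before a chosen position in the stream."""
    return records[offset:]

def purge_by_size(records, max_records):
    """Size-based retention: keep only the newest max_records entries."""
    return records[-max_records:] if max_records else []

def purge_by_age(records, max_age_seconds, now=None):
    """Time-based retention: drop records older than max_age_seconds."""
    now = time.time() if now is None else now
    return [(ts, p) for ts, p in records if now - ts <= max_age_seconds]

records = [(100.0, "a"), (200.0, "b"), (300.0, "c"), (400.0, "d")]
print(purge_before_offset(records, 2))   # drop the two oldest records
print(purge_by_size(records, 1))         # keep only the newest record
print(purge_by_age(records, 150, now=400.0))  # keep records at most 150s old
```

In each case purging removes data only from the head (oldest end) of the stream, which matches the append-only model described earlier.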

Stream cuts for historical data processing

Historical stream processing can:

• Set a reading start point.

• Write specific offsets into the stream, such as a quarter end, and read those offsets.
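A stream cut can be thought of as a named position in a stream. The sketch below is a plain-Python illustration with invented names, not the Pravega StreamCut API; it shows marking a quarter-end position and later reading only the data written after it.

```python
# Conceptual sketch of stream cuts: record a labelled position in a stream so
# later readers can start processing from exactly that point.

class CutStream:
    def __init__(self):
        self.events = []
        self.cuts = {}            # cut name -> offset into the stream

    def append(self, event):
        self.events.append(event)

    def mark_cut(self, name):
        """Record the current end of the stream under a label."""
        self.cuts[name] = len(self.events)

    def read_from(self, cut_name):
        """Read historical data starting at a previously recorded cut."""
        return self.events[self.cuts[cut_name]:]

s = CutStream()
for e in ["jan", "feb", "mar"]:
    s.append(e)
s.mark_cut("q1-end")          # e.g. the end of a financial quarter
for e in ["apr", "may"]:
    s.append(e)
assert s.read_from("q1-end") == ["apr", "may"]
```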

Flink job management

Authorized users can monitor, start, stop, and restart Flink jobs from the Streaming Data Platform UI. The Flink savepoint feature permits a restarted job to continue processing a stream from where it left off, guaranteeing exactly-once semantics.

Monitoring and reporting

From the Streaming Data Platform UI, administrators can monitor the state of all projects and streams. Other users (project members) can monitor their specific projects. Monitoring includes:

• Dashboard views on the Streaming Data Platform UI show recent Pravega ingestion metrics, read and write metrics on streams, and Tier 2 storage metrics.

• For administrators, additional Pravega metrics are available on predefined Grafana dashboards. Administrators can see Pravega JVM statistics, and drill down to investigate stream throughputs and latency metrics.

• Heat maps of Pravega streams show segments as they are split and merged, to help with resource allocation decisions.

• Stream metrics include throughput, latencies, readers and writers per stream, and transactional metrics like commits and aborts.

• The Apache Flink Web UI monitors Flink jobs as they are running.

Logging

Kubernetes logging is implemented in all Streaming Data Platform components. The Pivotal Container Service (PKS) in the reference architecture provides enhanced logging features, including log collection and searching. For information, see https://docs.pivotal.io/pks/1-5/monitor.html.

Remote support

The Dell Technologies Secure Remote Services (SRS) and call home features are supported for Streaming Data Platform. These features require an SRS Gateway server configured to monitor the platform. Detected problems are forwarded to Dell Technologies as actionable alerts, and support teams can remotely connect to the platform to help with troubleshooting.

Event reporting

Services in Streaming Data Platform connect to the SRS Gateway to collect events and display them in the Streaming Data Platform UI. The UI offers search and filtering on the events, including a way to mark them as acknowledged.

Interfaces

Dell EMC Streaming Data Platform provides the following interfaces for developers, administrators, and data analysts.

Table 1 List of available interfaces

Streaming Data Platform User Interface: Configure and manage streams and analytic jobs. Upload analytic applications.

Grafana custom dashboards: Drill into metrics for Pravega.

Apache Flink Web User Interface: Drill into Flink job status.

Keycloak User Interface: Configure security features.

Pravega and Apache Flink APIs: Application development.

In addition, users may download the Kubernetes CLI (kubectl) for research and troubleshooting of the Streaming Data Platform cluster and its resources. This includes support for the Streaming Data Platform custom resource definitions (CRDs), such as projects and scopes.

Basic terminology

The following terms are basic to understanding the workflows supported by Streaming Data Platform.

Pravega scope

The Pravega concept for a collection of stream names. RBAC for Pravega operates at the scope level.

Pravega stream

A durable, elastic, append-only, unbounded sequence of bytes that has good performance and strong consistency. A stream is uniquely identified by the combination of its name and scope. Stream names are unique within their scope.

Pravega event

A collection of bytes within a stream. An event has identifying properties, including a routing key, so it can be referenced in applications.

Pravega Writer

A software application that writes data to a Pravega stream.

Pravega Reader

A software application that reads data from a Pravega stream. Reader groups support distributed processing.

Flink application

An analytic application that uses the Apache Flink API to process one or more streams. Flink applications may also be Pravega Readers and Writers, using the Pravega APIs for reading from and writing to streams.

Flink job

Represents an executing Flink application. A job consists of many executing tasks.

Flink task

A Flink task is the basic unit of execution. Each task is executed by one thread.

Project

A Streaming Data Platform concept. A project defines and isolates resources for a specific analytic purpose, enabling multiple teams of people to work within Streaming Data Platform in separate project environments.

Project member

A Streaming Data Platform user with permission to access the resources in a specific project.

Kubernetes environment

The underlying container environment in which all Streaming Data Platform services run. The Kubernetes environment is abstracted from the end-user's view. Administrators can access the Kubernetes layer for authentication and authorization settings, to research performance, and to troubleshoot application execution.

User Interface

The Dell EMC Streaming Data Platform provides the same user interface for all personas interacting with the platform.

The views and actions available to a user depend on that user's RBAC role. For example:

• Logins with the admin role see data for all existing streams and projects. In addition, the UI displays buttons that let them create projects, add users to projects, and create Pravega scopes. Those options are not visible to other users.

• Logins with specific project roles can see their projects and the streams, applications, and other resources that are associated with their projects.

Here is a view of the initial UI window that all users see when they first log in. An admin would see all metrics for all of the streams in the platform. Non-admin users would see metrics for the streams in their projects.

Figure 3 Initial UI after login

Grafana dashboards

Streaming Data Platform includes the collection, storage, and visualization of detailed Pravega metrics. By monitoring these dashboards, administrators can identify developing memory problems, stream-related inefficiencies, or problems with storage interactions, and drill into them.

In Streaming Data Platform, metrics reported by Pravega are stored in InfluxDB. InfluxDB is an open source database built specifically for storing time series data. Some of the Pravega metrics from InfluxDB are shown in the Streaming Data Platform UI. More detail is available on Grafana dashboards.

Grafana is an open source metrics visualization tool. The Streaming Data Platform installation deploys InfluxDB and Grafana, along with the following predefined Grafana dashboards:

• Pravega System Dashboard shows system-level metrics related to memory and threads.

• Pravega Operation Dashboard shows system-wide latencies and read/write throughputs.

• Pravega Scope Dashboard shows scope metrics, including some stream-level comparisons.

• Pravega Stream Dashboard shows stream-specific throughput, segment metrics, and transaction metrics.

• Pravega Alerts Dashboard shows metrics about alerts related to Pravega.

You can create your own custom Grafana dashboards as well, accessing any of the data stored in InfluxDB.

The Grafana dashboards are easily available to users with the admin role from a link on the Streaming Data Platform UI. The link is in the upper left corner of the Dashboard page.

Apache Flink Web UI

The Apache Flink Web UI shows details about the status of Flink jobs and tasks. This UI helps developers and administrators to verify Flink application health and troubleshoot running applications.

The Streaming Data Platform UI contains direct links to the Apache Flink Web UI in two locations:

• From the Analytics Project page, navigate to a project and then click a Flink Cluster name. The name is a link to the Flink Web UI, which opens in a new browser tab. It displays the Overview screen for the Flink cluster you clicked. From here, you can drill into status for all jobs and tasks.

Figure 4 Apache Flink Web UI

• From the Analytics Project page, navigate to a project and then click a Flink Cluster name. Continue navigating to the applications running in the cluster. Each application name is a link to a Flink Web UI page that shows the running Flink jobs in that application. These pages also open in a new browser tab.

APIs

The following developer resources are included in a Streaming Data Platform distribution.

Streaming Data Platform includes these application programming interfaces (APIs):

• Pravega APIs, required to create the following Pravega applications:

  – Writer applications, which write stream data into the Pravega store.

  – Reader applications, which read stream data from the Pravega store.

• Apache Flink APIs, used to create applications that process stream data.

Stream processing applications typically use both Pravega and Apache Flink APIs to read data from Pravega, process or analyze the data, and perhaps even create new streams that require writing into Pravega.

What you get with Streaming Data Platform

The Streaming Data Platform distribution includes the following software:

• Pravega data store and API

• Apache Flink framework, processing engine, and APIs

• Dell EMC Streaming Data Platform management plane software

• InfluxDB for storing Pravega metrics

• Grafana for presenting Pravega metrics

• Keycloak software

• A collection of scripts and tools, including the installer.

Use case examples

Following are some examples of streaming data use cases that Dell EMC Streaming Data Platform is especially designed to process.

Industrial IoT

• Detect anomalies and generate alerts.

• Collect operational data, analyze the data, and present results to real-time dashboards and trend analysis reporting.

• Monitor infrastructure sensors for abnormal readings that can indicate faults, such as vibrations or high temperatures, and recommend proactive maintenance.

• Collect real-time conditions for later analysis. For example, determine optimal wind turbine placement by collecting weather data from multiple test sites and analyzing comparisons.

Streaming Video

• Store and analyze streaming video from drones in real time.

• Surveillance and security monitoring.

• Serve on-demand video.

Automotive

• Process data from automotive sensors to support predictive maintenance.

• Detect and report on hazardous driving conditions based on location and weather.

• Provide logistics and routing services.

Financial

• Monitor for suspicious sequences of transactions and issue alerts.

• Monitor transactions for legal compliance in real-time data pipelines.

• Ingest transaction logs from market exchanges and analyze for real-time market trends.

Healthcare

• Ingest and save data from health monitors and sensors.

• Feed dashboards and trigger alerts for patient anomalies.

High-speed events

• Collect and analyze IoT sensor messages.

• Collect and analyze Web events.

• Collect and analyze logfile event messages.

Batch applications

Batch applications that collect and analyze data are supported.

Documentation list

Use these resources for more information.

Table 2 Streaming Data Platform documentation

Dell EMC Streaming Data Platform documentation:

• Dell EMC Streaming Data Platform Developer's Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2020/01/docu96951.pdf

• (This guide) Dell EMC Streaming Data Platform Installation and Administration Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2020/01/docu96952.pdf

• Dell EMC Streaming Data Platform Security Configuration Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2020/01/docu96953.pdf

• Dell EMC Streaming Data Platform Release Notes at https://support.emc.com/docu96954 (Note: You must log onto a Dell support account to access the release notes.)

Pravega concepts, architecture, use cases, and Pravega API documentation: Pravega Open Source Project documentation at http://www.pravega.io

Apache Flink concepts, tutorials, guidelines, and Apache Flink API documentation: Apache Flink Open Source Project documentation at https://flink.apache.org/

CHAPTER 2

Site Prerequisites

This chapter describes prerequisite hardware, networking, and software configurations that must be in place before you can install Streaming Data Platform.

• Installation prerequisites
• Prerequisite infrastructure
• Compatibility and minimum version requirements
• Obtain and save the license file
• Deploy Dell EMC SRS Gateway
• Set up local DNS server
• Configure Isilon storage

Installation prerequisites

The following items are prerequisites to using these installation instructions.

☐ Infrastructure setup: The underlying hardware, network infrastructure, Isilon cluster for Tier 2 storage, and supporting software must be configured and operational. See topics in Prerequisite infrastructure on page 28.

☐ PKS: Streaming Data Platform requires a Kubernetes environment. The supported reference architecture uses Pivotal Container Service (PKS). Other distributions of Kubernetes may also be compatible, but are not officially supported. See Prerequisite infrastructure on page 28. For PKS version information, see Compatibility and minimum version requirements on page 31.

☐ License: You must have a Streaming Data Platform license file. See Obtain and save the license file on page 31.

☐ Secure Remote Services (SRS) Gateway: Typical production deployments require a Dell EMC SRS Gateway. See Deploy Dell EMC SRS Gateway on page 32. Note: Dark sites and evaluation deployments may skip this prerequisite.

☐ Local Domain Name System (DNS) Server: Streaming Data Platform uses a local DNS server to resolve external connection requests (from your distributed users) to endpoints inside the Kubernetes cluster. Technically, a local DNS server is optional; however, convenient connections from external points require one. See Set up local DNS server on page 32.

☐ Configure Tier 2 storage: Tier 2 storage must be defined before installation. See Configure Isilon storage on page 33.

Prerequisite infrastructure

Before installing the Streaming Data Platform software, all infrastructure planning and setup must be completed.

Reference architecture

This section describes the tested and verified reference architecture for Streaming Data Platform 1.0.

The following figure shows Streaming Data Platform deployed on the verified reference architecture for 1.0. Brief explanations of the reference architecture components follow the figure.


Figure 5 Reference architecture

Table 3 Reference architecture descriptions for v1.0

Category: Underlying hardware

Verified components for 1.0:

l Dell EMC VxRail servers, with at least the number of nodes prescribed in our Starter Pack.

l Dell EMC 4048T-ON switches. A minimum of 2 switches is required for fault tolerance and high availability.

l A network interface node of customer choice.

Purpose: Stream processing, analytics processing, and Tier 1 storage

Category: Continuously expanding persistent storage

Verified components for 1.0:

l Isilon cluster, required for most production scenarios. An existing Isilon cluster with NFSv4 enabled and Gen5 hardware is acceptable. Isilon is a scale-out NAS with easily expandable storage capacity. Start with enough nodes to handle your initial stream storage use cases.

l A file system is acceptable for testing and other scenarios where data is short-lived.

Purpose: Tier 2 storage


Category: Underlying software

Verified components for 1.0:

l VMware ESXi: Operating system

l VMware vSAN: Storage

l vSphere with NSX-T: Storage volume management and networking

l VMware Harbor or other Docker registry: Image registry

l Pivotal Container Service: Kubernetes environment

Note: For a support matrix of required versions, see Compatibility and minimum version requirements on page 31.

Planning

Dell Technologies helps you size and plan an appropriate hardware infrastructure for a Streaming Data Platform implementation.

Sizing recommendations are based on your intended analytic use cases and predicted volume ofstreaming data.

Streaming Data Platform is designed as an easily scalable solution. For Streaming Data Platform 1.0, the reference architecture includes Dell EMC Isilon and Dell EMC VxRail servers. Both systems scale out easily by adding more nodes.

The reference architecture enables you to start with a smaller hardware setup and scale out progressively.

Build the infrastructure

The prerequisite hardware and software infrastructure must be configured before installing Streaming Data Platform.

Hardware prerequisites

A Streaming Data Platform customer must acquire the underlying hardware infrastructure.

When you have the hardware onsite, your Dell Technologies account team can provide recommendations and guidance for setting up and configuring the components to create the appropriate infrastructure required for Streaming Data Platform.

Software prerequisites

A Streaming Data Platform customer must acquire and install the underlying software that provides basic networking support and volume management for Tier 1 storage, properly configure the Isilon cluster for Tier 2 storage, and set up the Kubernetes environment.

The Dell Technologies account team can provide recommendations and guidance for installing and configuring the software to comply with Streaming Data Platform requirements.


Compatibility and minimum version requirements

The following component versions are supported for Streaming Data Platform V1.0.

Kubernetes

Streaming Data Platform requires a Kubernetes environment. The product is tested for compatibility with Pivotal Container Service (PKS) 1.5.1 and its included Kubernetes version.

Isilon

OneFS 8.2.x (minimum) cluster with NFSv4 enabled on Gen5 or later Isilon hardware.

SRS Gateway

SRS Gateway v3.38 or greater.

For more information:

l PKS Product Compatibility Matrix: https://docs.pivotal.io/resources/product-compatibility-matrix.pdf

l Enterprise PKS Release Notes: https://docs.pivotal.io/pks/1-5/release-notes.html

l For Enterprise PKS on vSphere: VMware Product Interoperability Matrices

Obtain and save the license file

Streaming Data Platform is a licensed product and requires the license file for installation. An evaluation license is available. The license file is an XML file.

Before you begin

l Obtain the license activation code (LAC) for Streaming Data Platform. Customers typically receive the LAC in an email when they purchase a license.

l Have your site's support.dell.com account credentials available.

Procedure

1. In a browser, search for Dell Software Licensing Center.

2. Log in using your support.dell.com account credentials.

3. In the Software Licensing Center (SLC), perform a search for your LAC number.

4. Follow the SLC wizards to assign your license to a system and activate the license.

5. Use the SLC feature to download the license file.

6. Save the license file to a location accessible to the PKS host machine, where you will install Streaming Data Platform.

7. Do not alter the license file in any way. Altering the file invalidates the signature, and the product will not be licensed.

Note: Be careful if you FTP the license file. Always transfer it in binary mode; ASCII-mode FTP may alter the file and invalidate the license.
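One way to catch an accidental alteration is to record a checksum before transferring the file and verify it again on the PKS host. The sketch below uses a stand-in file in place of the real XML license:

```shell
# Stand-in for the real XML license file (never edit the real one).
printf '<?xml version="1.0"?><license/>\n' > license.xml

# Record the checksum before transferring the file...
sha256sum license.xml > license.xml.sha256

# ...then, after copying both files to the PKS host, verify the transfer.
sha256sum -c license.xml.sha256   # prints: license.xml: OK
```

If the verification fails, re-transfer the original license file in binary mode rather than attempting to repair it.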

After you finish

In a later step, you will insert the entire XML license file contents into the configuration values file, in the dellemc-streamingdata-license section. See Prepare configuration values files on page 36 for details.


Deploy Dell EMC SRS Gateway

For typical production deployments, you must have a Dell EMC Secure Remote Services (SRS) Gateway at your site. Dark sites and sites with evaluation licenses may skip this step.

Features

The SRS Gateway supports the following features in Streaming Data Platform:

l Collection and forwarding of events to Dell technical support.

l Call home features to alert Dell EMC technical support of problems.

l Remote access to your Streaming Data Platform cluster by Dell EMC technical support for log collection and other troubleshooting activities, if you authorize such access.

Before you begin

l Obtain the Streaming Data Platform license file (see Obtain and save the license file on page 31) before registering and configuring the SRS Gateway.

l Have your site's support.dell.com account credentials available.

Deployment

Streaming Data Platform requires an SRS Gateway v3.38 or greater.

Follow standard Dell EMC procedures for deploying the SRS Gateway. See the Secure Remote Services 3.38 Installation Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2019/08/docu95325.pdf.

During the SRS Gateway deployment, you will use your support.dell.com account to register the SRS Gateway to the Dell EMC SRS backend services. At that time, record the following information about the SRS Gateway:

l IP address or Fully Qualified Domain Name of the SRS Gateway.

l Credentials (username and password) of the support.dell.com user account.

In a later step, you will provide the above connection information in the Streaming Data Platform configuration values file, in the srs-gateway: section. See Prepare configuration values files on page 36 for details.

Set up local DNS server

A local DNS server is required for seamless connections to Streaming Data Platform endpoints from external requests.

A local DNS server is not technically required to install Streaming Data Platform. However, in a production deployment, a local DNS server greatly improves the user experience: it resolves requests that originate outside the cluster to endpoints running inside the Kubernetes cluster. The installation and connection instructions later in this guide assume that you have a local DNS server set up.

You can set up the local DNS server in the vSphere environment where Streaming Data Platform will be installed, use another local DNS server elsewhere in your network, or use a cloud DNS service. Do not use the corporate DNS server.

The following paragraphs provide some guidelines for setting up the local DNS server.


Local DNS installed in the internal network (CoreDNS or BIND)

On the PKS host machine (where you will be installing Streaming Data Platform), add an entry into the /etc/hosts file that maps to the local DNS server. For example, see the coredns entry in the example below:

cat /etc/hosts
127.0.0.1 localhost
172.16.0.7 coredns.myserver.local coredns
10.240.125.247 test-infra7.myserver.local
10.240.124.5 harbor.myserver.local
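Adding the entry can be scripted idempotently. The sketch below uses hypothetical address and hostname values and works on a copy of /etc/hosts so it is safe to run; on the PKS host, apply the same change to /etc/hosts itself as root:

```shell
# Hypothetical DNS server address and name -- substitute your own values.
DNS_IP="172.16.0.7"
DNS_NAME="coredns.myserver.local"
ENTRY="${DNS_IP} ${DNS_NAME} coredns"

# Work on a copy so the sketch is safe to run anywhere.
cp /etc/hosts hosts.sketch

# Append the entry only if the hostname is not already present.
grep -qF "$DNS_NAME" hosts.sketch || printf '%s\n' "$ENTRY" >> hosts.sketch
grep "$DNS_NAME" hosts.sketch
```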

Cloud DNS

To use cloud DNS solutions, such as AWS Route53 or Google Cloud DNS, you must have an account with the cloud provider. You must know the account name and credentials for the account.

In a later step, you will provide the connection details for the local DNS server in the configuration values file, in the external-dns: section. See Prepare configuration values files on page 36 for details.

Configure Isilon storage

Using Isilon standard procedures, configure the Tier 2 storage on your Isilon cluster.

To use NFSv4.0 as an option for your Tier 2 storage, the Isilon cluster must be configured to enable NFSv4.0.

Gather the following information about your Tier 2 storage:

l Isilon cluster IP address

l Mount point

l Mount options

In a later step, you will provide the above information in the configuration values file, in the nfs-client-provisioner: section. See Prepare configuration values files on page 36 for details.


CHAPTER 3

Installation

This chapter is intended for administrators who install and maintain the Dell EMC Streaming Data Platform software in the Kubernetes cluster.

l About configuration values files............................................. 36
l Prepare configuration values files........................................... 36
l TLS configuration details.................................................... 43
l Prepare the working environment.............................................. 46
l Set up UAA and required UAA admin account................................... 48
l Required tasks for UAA federation........................................... 49
l Create the Kubernetes cluster................................................ 50
l Set up a docker registry..................................................... 52
l Obtain installation files.................................................... 53
l Extract the installer tool................................................... 53
l Push images into the registry................................................ 54
l Remove SRS Gateway from the manifest (if needed)............................ 54
l Run prereqs script........................................................... 55
l Run pre-install script....................................................... 56
l Run validate-values script................................................... 56
l Source control the configuration values files............................... 57
l Apply and synchronize the configuration..................................... 57
l Run post-install script...................................................... 59
l What's next.................................................................. 59


About configuration values files

Configuration values files contain configuration settings for Streaming Data Platform. These files are required input to the installation command.

Purpose

Streaming Data Platform configuration and deployment options must be planned for and specified in configuration files before the installation takes place. The installer tool uses the configuration values during the installation process.

Note: Some settings cannot be changed after installation, requiring an uninstall and reinstall.

Specifically, the configuration values serve the following purposes:

l Enable/disable features

l Set high-level customer-specific values such as server host names and required licensing files

l Set required secrets for component communication

l Configure required storage for the platform

l Configure features

l Override default values that the installer uses for sizing and performance resources

l Override default names that the installer uses for some components

Template location

See Template of configuration values file on page 146. The template contains most of the configuration settings used by the Streaming Data Platform installer.

File format

A configuration values file contains key-value pairs in YAML syntax. Spacing and indentation are important in YAML files.

The sections in the values file are named according to the component that they are configuring.For example, the section that contains configuration values for the SRS Gateway is named srs-gateway.

If you copy from the template, note that the entire template is commented out. Be sure to remove the # characters from the beginnings of lines to uncomment the sections that you copy.
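Because spacing is significant and the template ships fully commented out, a quick pre-flight check can catch tab characters, which are invalid for YAML indentation. A sketch using a stand-in values.yaml created for the demonstration:

```shell
# Create a stand-in values.yaml for the demonstration (substitute your
# real file when running the check for an installation).
printf 'global:\n  external:\n    host: "demo.example.com"\n    tls: true\n' > values.yaml

# YAML indentation must use spaces, never tabs.
TAB="$(printf '\t')"
if grep -n "$TAB" values.yaml; then
  echo "Found tab characters; replace them with spaces" >&2
else
  echo "Indentation check passed"
fi
```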

Multiple configuration values files

The Streaming Data Platform installer accepts a string of configuration value file names, separated by commas. Some sites prefer using one large file that contains all of the values, and others prefer a set of files. With multiple files, you can isolate sensitive values and separate permanent values from those that might require more frequent updates.

Prepare configuration values files

This procedure describes the values that are essential to a successful Streaming Data Platform installation. You may optionally add other values that you see documented in the templates or elsewhere throughout the documentation.

Procedure

1. Create one or more text files to hold the configuration values.

The installation command accepts multiple file names for values.


2. (Required) Set external connection values and configure TLS.

Copy the global: external: section from the template, or copy the following example:

global:
  external:
    # fqdn of this cluster, this has to be unique
    host: "<clustername>.abc-lab.com"
    tls: true
    certManagerIssuerName: letsencrypt-production

Table 4 Configure external connection and TLS

Name Description

host: The top-level domain name that you want to assign to the Streaming Data Platform master node. This value is visible to end users in the URLs they use to access the UI and other endpoints running in the cluster. The format is "<name>.<host-fqdn>", where:

l <name> is your choice.

l <host-fqdn> is the fully qualified domain name of the server hosting Streaming Data Platform.

For example, in xyz.desdp.example.com, xyz is <name> and desdp.example.com is the <host-fqdn>.

This field sets the top-level domain name (TLD) from the perspective of Streaming Data Platform. The product UI is served off of https://<TLD>, the Grafana UI is served off of https://grafana.<TLD>, and so on for the other endpoints.

For example, a TLD of xyz.desdp.example.com serves the UI off of https://xyz.desdp.example.com and Grafana off of https://grafana.xyz.desdp.example.com. The DNS server will have authority to serve records for *.xyz.desdp.example.com.

tls: Enables TLS. Set to true or false.

certManagerIssuerName:

Leave as is if you enable TLS and want to use certificates that are auto-generated by Let's Encrypt. See TLS configuration details on page 43 for other TLS options and how to configure them.
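As a quick sanity check of a planned host: value, the endpoint URLs and the wildcard DNS record it implies can be derived in the shell, shown here with the guide's xyz.desdp.example.com example:

```shell
# The guide's example host value -- substitute the TLD you plan to use.
TLD="xyz.desdp.example.com"
echo "UI endpoint:       https://${TLD}"
echo "Grafana endpoint:  https://grafana.${TLD}"
echo "DNS wildcard zone: *.${TLD}"
```

Your local DNS server must be authoritative for the wildcard zone printed on the last line.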

3. (Required) Configure connections to the local DNS server.

For reference, see Set up local DNS server on page 32 in the Site Prerequisites chapter.

The following examples show settings for three types of local DNS server. Copy one of the following external-dns: section examples as appropriate for your setup and supply the required values.

AWS Route53 option

external-dns:
  aws:
    credentials:


      secretKey: "<AWS Secret Access Key Value>"
      accessKey: "<AWS Secret Access Key ID>"

CoreDNS option

external-dns:
  provider: coredns
  coredns:
    etcdEndpoints: "http://10.243.NN.NNN:2379"
  extraArgs: ['--source=ingress','--source=service','--provider=coredns','--log-level=debug']
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: "<host>.<domain>"
  # How DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [<host>.<domain>]
  logLevel: debug

Bind option

external-dns:
  provider: rfc2136
  rfc2136:
    host: "10.243.NN.NNN"
    port: 53
    zone: "nautilus-lab-ns.lss.emc.com"
    tsigSecret: "ooDG+GsRmsrryL5g9eyl4g=="
    tsigSecretAlg: hmac-md5
    tsigKeyname: externaldns-key
    tsigAxfr: true
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: "<host>.<domain>"
  # How DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [<host>.<domain>]
  logLevel: debug

4. (Required) Configure connection to the NFS (Tier 2) storage.

For reference, see Configure Isilon storage on page 33 in the Site Prerequisites chapter. Streaming Data Platform uses NFS storage for two purposes:

l Storage for analytics projects, such as for application artifacts and checkpoint states.

l Tier 2 storage for Pravega streams.


Copy the nfs-client-provisioner: section from the template, or start with the following example:

nfs-client-provisioner:
  nfs:
    server: 1.2.3.4
    path: /data/path
    mountOptions:
      - nfsvers=4.0
      - sec=sys
      - nolock
  storageClass:
    archiveOnDelete: "false"

Table 5 Configure NFS storage

Name Description

nfs.server The NFS server hostname or address. For example, the Isilon cluster IP address.

nfs.path The NFS export path.

nfs.mountOptions The NFS mount options (in fstab format).

storageClass.archiveOnDelete Indicates whether to archive NFS data upon uninstallation. Values are:

l "false" does not save any data.

l "true" archives the data. However, this archive is not readable in a new installation of Streaming Data Platform.
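Because mountOptions uses fstab syntax, the settings above translate directly into a manual mount command, which is a convenient way to verify the export before installation. A sketch using the placeholder server and path from the example:

```shell
# Placeholder values from the example above -- substitute your own.
SERVER="1.2.3.4"
EXPORT_PATH="/data/path"
OPTS="nfsvers=4.0,sec=sys,nolock"

# Print the equivalent manual mount command; run it as root on a test
# host to confirm the Isilon export is reachable with these options.
echo "mount -t nfs -o ${OPTS} ${SERVER}:${EXPORT_PATH} /mnt/tier2"
```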

5. (Required) Provide the licensing file.

For reference, see Obtain and save the license file on page 31 in the Site Prerequisiteschapter.

Copy the monitoring:license: and the dellemc-streamingdata-license: sections from the template, or copy the following example:

monitoring:
  license:
    name: dellemc-streamingdata-license
dellemc-streamingdata-license:
  licensefile: |-
    <insert the entire XML license file here>
  product: streamingdata

Table 6 Configure monitoring and licensing

Name Description

monitoring: license: name:

The value must be dellemc-streamingdata-license.

This value enables accurate reporting of usage for Flink licensing. Flink usage monitoring is required.



dellemc-streamingdata-license: licensefile:

The entire XML license file must be copied here. Whenever the license changes, such as if you upgrade from an EVAL to a regular license, or you upgrade the license with additional cores or extended dates, copy your new license file here and reapply the configuration.

Note: Do not alter the license file in any way. Doing so will invalidate the signature, and the product will not be licensed.

product: The value must be streamingdata

6. Configure the SRS Gateway connection, or remove the SRS Gateway from the manifest.

You need to either:

l Provide the SRS Gateway connection information in the configuration values file. For reference, see Deploy Dell EMC SRS Gateway on page 32 in the Site Prerequisites chapter.

l Remove SRS Gateway deployment from the manifest.

The following sections describe each option.

Configure SRS Gateway connection information

Copy the srs-gateway: section from the template, or copy the following example:

srs-gateway:
  gateway:
    hostname: <FQDN or IP>
    port: <9443>
    login: <srs-username>:<srs-password>
    product: STREAMINGDATA

Table 7 Configure SRS

Name Description

hostname: The fully qualified domain name or IP address of your SRS Gateway.

port: The value must be 9443.

login: Your support.dell.com account credentials.

product: The value must be STREAMINGDATA.

Remove SRS Gateway deployment from the manifest

Note: You cannot complete this task until later in the installation process, after you unzip the installation files. The task is described in Remove SRS Gateway from the manifest (if needed) on page 54.

7. (Required for UAA federation) Enable Keycloak to UAA connections.


In the Keycloak section, define certificate information for the PKS API HTTPS endpoints.

Note: You cannot complete this section until later in the installation process. The documentation points you back here when appropriate. You will obtain the PKS API hostname and the PEM file in Required tasks for UAA federation on page 49.

Copy the keycloak: keycloak: egress: section from the template, or copy the following example:

keycloak:
  keycloak:
    egress:
      enabled: true
      trusted_certs_yaml:
        certs:
        - cn: "*.abb.lab.emc.com"
          pemdata: |
            -----BEGIN CERTIFICATE-----
            MIIDeTCCAmGgAwIBAgIUUkUmw/qiSaBxaMmJtalSDHLfzl8wDQYJKoZIhvcNAQEL
            BQAwHzELMAkGA1UEBhMCVVMxEDAOBgNVBAoMB1Bpdm90YWwwHhcNMTgwNzAyMjMz
            NDE4WhcNMjAwNzAyMjMzNDE4WjA7MQswCQYDVQQGEwJVUzEQMA4GA1UECgwHUGl2
            b3RhbDEaMBgGA1UEAwwRKi5lY3MubGFiLmVtYy5jb20wggEiMA0GCSqGSIb3DQEB
            AQUAA4IBDwAwggEKAoIBAQC0zhhOZiIn56yUNm4TxUjeLHJ8ixO5RPwExw/rkKMI
            SE7EscZEJheM/f085Bz3BNmaM9TQZDdcwNU6aElSrtIXAe32h8tX64K3yFA9pqWZ
            9tpn50skd0S8rWEWZBE8xjOL40p3cVjYJDCGPnwUfGDpAWcjSrH2bPrXsm1Ia5v5
            97F1tg29K9BfcGX11E7nthDKG8xNdpl+RnISHXG7TLumkvMSRPTsMQFdGXjp3uL/
            6cpKFx0eadytuZBSDzhEzAp4jU2UL1RFsGwt8QdoLcIekfRRMXux98mRuQEaki8B
            cA/Idpizt48MfpuxcOZP+hzp9BHwUUDX0YtWnOr33LXTAgMBAAGjgZAwgY0wHQYD
            VR0OBBYEFJ1cvl6ISQvj9J91V7p33Fh0AoTLMB8GA1UdIwQYMBaAFDSkYYznL+oi
            UpoK3ghNCVePwlAFMB0GA1UdJQQWMBQGCCsGAQUFBwMCBggrBgEFBQcDATAOBgNV
            HQ8BAf8EBAMCB4AwHAYDVR0RBBUwE4IRKi5lY3MubGFiLmVtYy5jb20wDQYJKoZI
            hvcNAQELBQADggEBAHZX8A5zkrF4tzW0Q1QybDF0AQPiYBdtYBm29UwgTxrtgMoz
            4Fh6JIRfX6TUnDXPBNiq2rWJ/DW/xtCr6fo4lkPuv2I8bqz3QpvzQclEpBmqiE5B
            JXEM/DblyeKtWvJ9wYBCBKg/QkuS54nWpSbbszDAfJxbvbGCqcynNYkNfXp8jlUH
            pTGlYg+Tar4kUYBJ2KYl7+8H3raVTXtFEb8tl1YUbsk/BXbUwGImzS21PYS+pHTQ
            tsYyAmlvFMwZYDiLikI9N74KMh7hCLh5vcH08LW3laBe9TPjzjjE8gEjzWFFEz45
            pnoW5GUzNIT2ogrBlc2Kl4H5rNzaccoYEDr5BE4=
            -----END CERTIFICATE-----

Table 8 Configure Keycloak to UAA connection

Name Description

enabled Set to true.



cn: The PKS API hostname with an asterisk substituted for the first component, enclosed in double quotes. For example, if the API hostname is pks-api.env1.local, the CN to provide here is "*.env1.local".

pemdata: The certificate (contents of the .pem file) copied in whole, as shown in the example.
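The cn: transformation described in the table can be expressed as a one-line shell substitution, shown here with the guide's example hostname:

```shell
# The guide's example PKS API hostname -- substitute your own.
API_HOST="pks-api.env1.local"

# Replace the first label with an asterisk to form the cn value.
CN="*.${API_HOST#*.}"
echo "$CN"   # prints: *.env1.local
```

Remember to enclose the resulting value in double quotes in the YAML file.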

8. (Required) Decide how to assign passwords for the default administrator accounts.

The passwords may be created in either of the following ways:

l You may skip this step and allow the installer to autogenerate passwords. After installation, see Obtain default admin credentials on page 72 to retrieve the generated values.

l You may provide the initial password values by adding a keycloak:keycloak: section into the configuration values file.

Copy the keycloak: section from the template, or copy the following example:

keycloak:
  keycloak:
    password: <master admin password>
    DESDPPassword: <nautilus realm admin password>

Table 9 Provide initial password values

Name Description

password: "" Add the desired password value enclosed in double quotes.

DESDPPassword: "" Add the desired password value enclosed in double quotes.

9. Save the values files in a secure location. You will specify the location of the configuration values files when you run the decks-install command to install Streaming Data Platform.

l The files are typically named values.yaml or similar, but that name is not required.

l Some secrets may be in plain text. For this reason, we recommend that you source-control the values files.

l The installer tool accepts multiple values file names in a comma-separated string with no spaces. Therefore, you can split the secrets into separate files and strictly control access to them.
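For illustration, here is a hypothetical split into a general file and a restricted secrets file, joined into the single comma-separated argument the installer expects (the file names and contents are examples only):

```shell
# Hypothetical file names and contents.
printf 'global:\n  external:\n    host: "demo.example.com"\n' > values-main.yaml
printf 'keycloak:\n  keycloak:\n    password: "changeme"\n'  > values-secrets.yaml

# Restrict access to the file holding secrets.
chmod 600 values-secrets.yaml

# One argument: comma-separated, no spaces.
VALUES_FILES="values-main.yaml,values-secrets.yaml"
echo "$VALUES_FILES"
```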


TLS configuration details

Streaming Data Platform supports Transport Layer Security (TLS) for external connections to the platform.

TLS is optional. The feature is enabled or disabled with a true/false setting in the configuration values file. You have the following certificate authority (CA) options.

Let's Encrypt

Let's Encrypt is an open certificate authority provided by the Internet Security Research Group (ISRG). It provides free digital certificates for HTTPS and TLS access. If you configure this option in Streaming Data Platform, then obtaining, securing, and renewing certificates is automated.

Private CA

You can generate a private key and certificate, add the certificate to a trust store, and make it available to the Streaming Data Platform installer.

Enterprise or well-known CA

For these options, you extract the Certificate Signing Requests (CSRs) from the Streaming Data Platform installer and send them to the CA. The CA issues the signed certificates, which you then need to install. Streaming Data Platform provides a cli-issuer tool to facilitate handling of CSRs and installing signed certificates.

The following sections provide configuration details for each of the TLS options.

Enable or disable TLS

Enable or disable TLS in the initial installation.

About this task

You cannot reconfigure the tls: {true | false} setting after initial installation. The change would require an uninstall and reinstall procedure. You may change the type of TLS certificates that you are using after initial configuration.

Procedure

1. In the configuration values file, set the following entry to either true or false.

global:external:tls: {true | false}

2. If tls: is true, also supply additional required information under global:external:tls:. The following sections describe how to configure the additional information for the different types of CAs.
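Expanded into the nesting used in the values file, the switch looks like the following sketch (shown here with TLS enabled):

```yaml
# Location of the TLS switch in the configuration values file
global:
  external:
    tls: true
```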

Configure TLS using certificates from Let's Encrypt

The Let's Encrypt certificates are the easiest to use because certificate generation is automated within the platform.

Procedure

1. In the configuration values file, configure the global section as follows:

global:
  external:
    # fqdn of this cluster, this has to be unique
    host: "<clustername>.abc-lab.com"
    tls: true
    certManagerIssuerName: letsencrypt-production

2. Configure the cert-manager-resources section as follows:

cert-manager-resources:
  certManagerSecrets:
    - name:
      value: <paste-secret-here>
  clusterIssuer:
    name: letsencrypt-production
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    acmeSecretKey: issuer-letsencrypt-dns-auth-secret
    solvers:
      - dns01: #see DNS section#

cert-manager:
  webhook:
    enabled: false

Configure TLS using a private certificate

This procedure generates a certificate on the command line using openssl and configures TLS to use the certificate.

Procedure

1. Generate a private key.

Here is an example using openssl.

openssl genrsa -out tls.key 2048

2. Create a certificate.

The following example uses openssl to create a certificate valid for 10 years with the signing option set:

openssl req -x509 -new -nodes -key tls.key -subj "/CN=<domain>" -days 3650 -reqexts v3_req -extensions v3_ca -out tls.crt
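Before moving on, you can sanity-check the generated certificate. This sketch assumes the tls.crt file produced by the command above:

```shell
# Print the subject and validity window of the new certificate
openssl x509 -in tls.crt -noout -subject -dates
```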

3. In the configuration values file, configure the global section as follows:

global:
  external:
    # fqdn of this cluster, this has to be unique
    host: "<clustername>.abc-lab.com"
    tls: true
    certManagerIssuerName: cli
    wildcardSecretName: cluster-wildcard-tls-secret
    certManagerIssuerGroup: my-group.dellemc.com


wildcardSecretName: Identifies the secret name.

certManagerIssuerGroup: Any value that identifies the issuer.

4. Configure the cert-manager-resources section as follows:

cert-manager-resources:
  certManagerSecrets:
    - name: tls.crt
      value: |
        <REDACTED>
    - name: tls.key
      value: |
        <REDACTED>

5. Add the tls.crt to the truststore in the Streaming Data Platform images.

Use the following command:

./decks-install config set registry <registry/repository>
./decks-install push --input decks-images-<version>.tar --ca-certs-dir <path-to-certs-directory>

Note: The certificate file name extension must be .crt. The decks-install push command runs update-ca-certificates and expects the .crt file extension. It ignores any other file extension.
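For example, if an internal CA bundle arrived with a .pem extension, copy it into your certificates directory under a .crt name before running the push. The file names here are hypothetical:

```shell
# decks-install push only picks up certificates whose names end in .crt
cp my-ca-bundle.pem ca-certs/my-ca-bundle.crt
```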

Configure TLS using signed certificates from a well-known CA

This procedure obtains a signed certificate from a well-known CA and configures TLS to use the certificate.

About this task

The Streaming Data Platform installer creates certificate signing requests (CSRs). We provide a tool to extract the CSRs from the cluster and then later, to import the signed certificates into the cluster.

The following process includes starting the installer so it can create the CSRs, stopping the installation to extract the CSRs, submitting the CSRs to the CA, installing the signed certificates, and finally, restarting the installation process.

Procedure

1. Extract the cli-issuer tool from the Streaming Data Platform installation files.

Extract the cli-issuer-<version>.zip archive and navigate into the expanded directory. There are three binary executables for different platforms, named cli-issuer-<platform>. For convenience, create a symlink or rename the one you will use to cli-issuer.

2. In the configuration values file, prepare the global section as follows:

global:
  external:
    # fqdn of this cluster, this has to be unique
    host: "<clustername>.abc-lab.com"
    tls: true
    certManagerIssuerName: cli
    wildcardSecretName: cluster-wildcard-tls-secret
    certManagerIssuerGroup: nautilus.dellemc.com

3. Start the installation using the decks-install apply command as described in Apply and synchronize the configuration on page 57.

4. In another window, monitor for CSR generation.

Enter the following command:

./bin/cli-issuer list -A --cluster <cluster-name> --insecure-skip-tls-verify

In the output, you are looking for messages that state Certificate signing request (CSR) is available.

./bin/cli-issuer list -A --cluster <cluster-name> --insecure-skip-tls-verify

NAMESPACE         NAME                                       SUBJECT                             STATUS    MESSAGE
nautilus-pravega  pravega-native-tls-certificate-763176170   *.pravega.cluster1.desdp.dell.com   Pending   Certificate signing request (CSR) is available
nautilus-system   wildcard-tls-certificate-80022116          *.cluster1.desdp.dell.com

5. When all CSRs are available, return to the install window and stop the installation by using CTRL-C.

6. Use the cli-issuer tool to extract the CSRs from the cluster.

The following command saves the CSR in desdp.csr.

./bin/cli-issuer export -A -f </path/to/desdp.csr>

7. Submit or upload the CSRs to your selected well-known CA or follow internal procedures for an enterprise CA.

8. When you receive the signed certificates from the CA, save them locally.

9. Use the cli-issuer tool to import the signed certificates into the cluster.

./bin/cli-issuer import -A -f </path/to/cert> -ca </path/to/ca-cert> -n <namespace>

10. Resume the install using the same command you used to start the install.

Prepare the working environment

Your working environment requires some software tools and interfaces before you can begin the installation steps. The working environment is the command line environment you will use to run the installer command. It could be your laptop or workstation, the PKS host machine, or a management node.

Procedure

1. Install the Pivotal Container Service CLI (pks).

You need pks commands to create the cluster for Streaming Data Platform, give other users access to it, and connect to the cluster later.

Instructions are at https://docs.pivotal.io/pks/1-5/installing-pks-cli.html.

It is best to match the version of the CLI that you download with the PKS version, although earlier CLI versions will work.

2. Install the native Kubernetes CLI (kubectl).

You need kubectl commands to interact directly with the Kubernetes cluster.

Instructions are at https://docs.pivotal.io/pks/1-5/installing-kubectl-cli.html.

It is important to match the version of the CLI that you download with the version of Kubernetes that comes with PKS.

3. Install the Helm version 2.13.1 client CLI.

The Streaming Data Platform installer tool uses Helm charts to install applications into the Kubernetes cluster. You must have the Helm client installed on the machine where PKS is installed, where you will install Streaming Data Platform.

Note: Helm version 2.13.1 is required. Other versions may be incompatible.

- Download Helm from its release page on GitHub: https://github.com/helm/helm/releases/tag/v2.13.1.

- Follow the installation instructions on the Helm website: https://helm.sh/docs/intro/install/.

4. Install a local Docker daemon and CLI.

See the Docker documentation at https://docs.docker.com/install/ to get the correct daemon for your local OS.

5. Install software required to run scripts that are included with the product.

Software and version                           Notes
Java Runtime Environment 1.8+                  JRE versions 1.8 or greater. JRE 1.9, for example, is acceptable.
Python2 and Python3                            Both are required.
Python modules: kubernetes, pyyaml, requests   An example package request is:
                                               pip2 install kubernetes pyyaml requests

6. Install a modern desktop browser such as Chrome, Firefox, or Edge for accessing the Streaming Data Platform UI.

7. Install PuTTY or other software that provides a way to connect to the intended Streaming Data Platform host machine and establish a Linux shell.


Set up UAA and required UAA admin account

The PKS User Authentication and Authorization (UAA) service is required by Streaming Data Platform. At least one UAA user account is required to create clusters in PKS.

About this task

This procedure does the following:

- Installs the UAA CLI (uaac).

- Configures UAA in the PKS environment.

- Creates one UAA user account with the required administrative roles.

Procedure

1. Using connection software such as PuTTY, log into the virtual machine where PKS was installed and establish a Linux shell there. You do not need root permissions.

2. Install the Cloud Foundry UAA CLI (uaac) and set up UAA.

You need uaac to create at least one cluster administrator for the Streaming Data Platform cluster.

a. Install and setup instructions are at https://docs.pivotal.io/pks/1-5/vsphere-configure-pks-users.html. Scroll down to Option 2: Connect through a Non-Ops Manager Machine. Follow the instructions.

b. Complete the instructions for Step 2: Log in to PKS as a UAA Admin.

This step uses an existing admin client (a service account) to log in:

uaac token client get admin -s <client secret>

c. Complete Step 3: Assign Enterprise PKS Cluster Scopes.

Note: In UAA, roles are called scopes.

3. Create at least one uaa user and assign administrative roles.

The new usernames must follow naming conventions described here.

Note: These usernames will eventually become Streaming Data Platform users with the admin role. All Streaming Data Platform user names must conform to the referenced conventions.

The uaa admin user that you create must be granted either the pks.clusters.admin or pks.clusters.manage scope.

Here are examples of the commands to add a new user and to grant a scope to the user.

uaac user add lab-admin --emails [email protected] --password password
uaac member add pks.clusters.admin lab-admin


Required tasks for UAA federation

UAA federation is required for production deployments. For test and development deployments, you may skip UAA federation and this procedure.

About this task

This procedure accomplishes two goals:

- Configures UAA to act as an OIDC provider for the Keycloak instance running in Streaming Data Platform.

- Obtains the UAA certificate required for communication between UAA and Keycloak.

Procedure

1. Log into PKS Ops Manager with a UAA user account that has the pks.clusters.admin role.

2. Click the Settings tab.

3. Scroll down in the list, and click UAA.

4. On the UAA configuration screen:

a. Ensure that Enable UAA as OIDC provider is checked.

b. Ensure that UAA OIDC Username Prefix is empty ( - ). Streaming Data Platform does not support prefixes.

c. Under Configure your UAA user account store..., choose either option:

- Choose Internal UAA to integrate with the PKS UAA store of users. With this option, all new users are added into UAA.

- Choose LDAP Server to integrate with an LDAP server.

d. If you choose LDAP Server, complete the LDAP server configuration fields that appear. For help, see the PKS UAA documentation.

e. Click Save.

5. On the Ops Manager banner, click the PKS API tab.

Obtain the following information on this tab:

a. Note the PKS API host name, stated on the page.

b. Download and save the certificate (a .pem file) from that page.

6. Use the information from the previous step to complete the keycloak: keycloak: egress: section in the configuration values file. See Prepare configuration values files on page 36.

- For the cn: field, use the PKS API host name, substituting an asterisk for the first component in the hostname. For example, if the API hostname is pks-api.env1.local, the CN in the configuration values file is *.env1.local.

- For the pem: field, use the contents of the downloaded pem file. Copy the entire downloaded certificate into the configuration values file.
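Putting the two fields together, the section might look like the following sketch. The host name and certificate body are placeholders, and the exact nesting is defined in Prepare configuration values files:

```yaml
keycloak:
  keycloak:
    egress:
      # PKS API host name with the first component replaced by an asterisk
      cn: "*.env1.local"
      # Entire contents of the downloaded .pem certificate
      pem: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
```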

7. You may log out of Ops Manager.


Create the Kubernetes cluster

This procedure creates a Kubernetes cluster for Streaming Data Platform and defines a default StorageClass in the cluster.

About this task

Streaming Data Platform must run in its own Kubernetes cluster. This procedure creates that cluster in the PKS environment.

A default StorageClass is a required feature for Streaming Data Platform. A default StorageClass supports dynamic (on-demand) provisioning of storage volumes, eliminating the need for cluster administrators to pre-provision storage.

PKS does not automatically define a default StorageClass for its clusters.

Procedure

1. Connect to the host machine where PKS is installed.

Log in directly or start an SSH session using PuTTY or a similar tool.

The login user does not matter. However, the owner of the installation files will become the user with default permission to run the installation scripts and tools. This user will own the manifests, the log files, and the tools. Ownership and permissions are platform-specific. System administrators can change ownership and permissions as needed to allow the correct set of users to run the installer tool and scripts.

For example, on a Linux system, if a user logs in as root and transfers the installation files with SCP or unzips the files, then all subsequent steps in this installation procedure and any subsequent reconfigurations would require the root user or the use of sudo.

2. Log into PKS.

pks login -a pks-api.<server> -u <username> -p <password> --ca-cert <location-of-cert>

For example:

pks login -a pks-api.server.local -u lab-admin -p password --ca-cert /var/folders/rv/xxxxxxxxxxx

3. Create a Kubernetes cluster.

Use the procedure here (in the Pivotal documentation) and the notes below to create a Kubernetes cluster.

a. In the step that provides a user with access to the cluster you are creating, make note of the user credentials. You will need them later to access the cluster.

b. In the step that runs the pks create-cluster command, use the following hints to determine appropriate parameter values:

- PLAN-NAME is the plan for your cluster. Run pks plans to list your available plans. The plans were created during PKS installation. You should see a list similar to the following:

  - POC - This is a minimal plan for proof of concept and testing.

  - SMALL - This is a small plan for getting started.

  - MEDIUM - This plan has additional nodes for larger throughput.

- (Optional) WORKER-NODES is the number of worker nodes in the cluster. This value will depend on the Starter Pack that you have purchased.

- (Optional) NETWORK-PROFILE-NAME is the NSX-T network profile to use for the cluster. See Using Network Profiles (NSX-T Only) in the Pivotal documentation for more information.

An example cluster create command follows:

pks create-cluster cluster1 -e cluster1.myserver.local -p medium -n 3 [-wait]

The -wait flag waits for the cluster to be successfully created before returning control. It may take up to 20 minutes to create the cluster.

4. You can monitor the cluster creation with the following command if you did not use the -wait flag.

pks cluster cluster1

Wait until Last Action State is succeeded.

5. Add the cluster's master IP address into the /etc/hosts file on the installation machine.

The output from the previous step includes the following information:

Kubernetes Master Host: cluster1.myserver.local
Kubernetes Master IP: 10.247.113.44

a. Edit the /etc/hosts file.

b. Add an entry for the master IP and host. For example:

10.247.113.44 cluster1.myserver.local

6. Get credentials for the cluster you just created:

pks get-credentials <cluster-name>

The pks get-credentials command also puts you into the cluster context.

7. Provision a default StorageClass.

You must define a default StorageClass in PKS. Use the procedure in the Pivotal documentation and the following notes to define a default StorageClass. Be sure to access the Pivotal documentation that matches your Pivotal release.

a. Download the StorageClass specification for vSphere. This assumes that your site is using the Streaming Data Platform reference architecture, which includes vSphere.


b. Follow all steps in the PKS documentation to create a default StorageClass. For example, create a vsanDatastore.
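For illustration only, a default StorageClass for vSphere typically has the following shape. The authoritative specification comes from the Pivotal documentation; the class name, provisioner, and datastore shown here are assumptions:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: thin
  annotations:
    # This annotation marks the class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/vsphere-volume
parameters:
  datastore: vsanDatastore
```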

Results

Streaming Data Platform creates all other required resources, including a persistent volume claim (PVC).

Set up a docker registry

This procedure configures a docker registry.

About this task

A container registry is required to hold images for Streaming Data Platform. The registry must be available to the Kubernetes cluster and to your local machine.

You can use any container registry. Our recommendation is Harbor, the docker registry that comes with PKS. The following procedure assumes that you installed Harbor.

Procedure

1. Log in to the Harbor GUI and authenticate. See the PKS documentation for information.

2. Create a Project (the Harbor terminology for a repository) to hold Streaming Data Platform images.

a. Click New Project.

b. Name the project.

c. Set the Access Level to Public.

d. Click OK.

3. Download the certificate for communicating with Harbor.

a. In the list of Projects, click your new project name.

b. On the project page, click the Repositories tab and then click Registry Certificate.

The certificate is downloaded to your system.

4. In your local environment, add the Harbor certificate to your docker trusted store.

For example, on Linux:

Create a directory for the Harbor domain under /etc/docker/certs.d/. Then copy the certificate into the directory.

$ mkdir -p /etc/docker/certs.d/harbor.<install-server>.local/
$ cp ~/nautilus/harbor-certs/ca-<install-server>.crt /etc/docker/certs.d/harbor.<install-server>.local/ca.crt

5. On your local machine, log in to docker using the docker CLI:

$ docker login <install-server>/<repository>

For example:

$ docker login docker.myserver.local/sdp


Provide your Harbor credentials at the prompts.

Obtain installation files

The Streaming Data Platform installation files are available from the Dell online support website.

Procedure

1. In a web browser, navigate to the following web page:

https://www.dell.com/support/home/product-support/product/streaming-data-platform-family/overview

Note: Log in using your Dell EMC support account.

2. Use links on the page to download the Streaming Data Platform installation files.

3. Copy or transfer the files to the installation host machine (the VM where PKS is installed).

Consider carefully the user name you are using for this step. The owner of the installation files will become the user with default permission to run the installation scripts and tools. This user will own the manifests, the log files, and the tools.

For example, on a Linux system, if a user logs in as root and transfers the installation files with SCP or unzips the files, then all subsequent steps in this installation chapter and any subsequent reconfigurations would require the root user or the use of sudo.

Ownership and permissions are platform-specific. System administrators can change ownership and permissions as needed to allow the correct set of users to run the installer tool and scripts.

Extract the installer tool

The installer distribution includes several executables for different operating systems.

Procedure

1. Unzip the decks-installer-<version>.zip file.

2. List the directory contents.

You will see distributions for different OS environments.

3. Create a link to the executable that is suitable for your working OS.

a. Name the link decks-install.

b. Set executable permissions on the link.

For example, on Linux:

$ link decks-install-linux-amd64 decks-install
$ chmod +x decks-install

On Windows, create an alias.


Push images into the registry

This procedure uploads the installation images into the docker registry.

About this task

The last step in this procedure (uploading images to the registry) can take some time (up to an hour) to complete.

Procedure

1. Navigate to the location where you extracted and created a link to the Streaming Data Platform installer tool.

2. Configure the installer to use the registry:

$ ./decks-install config set registry <registry/repository>

where <registry/repository> identifies the registry server and the repository (Project name in Harbor) you created for Streaming Data Platform images.

3. Verify the configured registry path:

$ ./decks-install config list

4. Push required images to the registry.

$ ./decks-install push --input <path/to/tar/images> [--ca-certs-dir <LOCAL_DIR_TO_CERTIFICATE>]

where:

- <path/to/tar/images> is the path to the decks-images-<version>.tar file included in the original set of installation files. This is a separate file, not part of the zip file that was extracted previously.

Note: Do not extract the contents from decks-images-<version>.tar. The installer tool works directly with the .tar file.

- The optional --ca-certs-dir option injects a custom certificate into each image. Use this option if your company requires a certificate bundle for security purposes within your internal network.

This push operation may take up to an hour to complete.

Remove SRS Gateway from the manifest (if needed)

Most installations should skip this task. This task is intended only for dark sites or sites using an evaluation license. In those scenarios, an SRS Gateway is not deployed, and you must remove entries from the manifest that reference it.

Procedure

1. Navigate to the unzipped decks-install folder.

2. Edit manifests/production/kustomization.yaml.


3. Comment out the following two lines:

../serviceability-licensing

../serviceability
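Commented out, the two entries might look like the following sketch. The surrounding structure of the kustomization file is assumed; only the two serviceability lines come from the manifest:

```yaml
bases:
  # ...other entries remain unchanged...
  # - ../serviceability-licensing
  # - ../serviceability
```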

4. Save the file.

Run prereqs script

The prereqs.sh script ensures that your local environment and the Kubernetes cluster have all the tools needed for a successful installation. The script also installs the required Tiller version into the cluster.

About this task

Run this script before running the decks-install apply command the first time (or the first time on a new local machine). You can run the script at any time. It does the following:

- Checks your local environment for the required tools and versions of those tools.

- Checks the Streaming Data Platform cluster for a default storage class definition.

- Checks the Streaming Data Platform cluster for the required version of the Tiller server. Installs it if needed, with all required roles and permissions.

Note: The script installs Tiller v2.13.1. Other versions may be incompatible with Streaming Data Platform.

Procedure

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script.

$ ./scripts/prereqs.sh

The script by default obtains the Tiller v2.13.1 files from its official GitHub repository.

If your site does not have Internet access, an alternate way to run this script obtains the files from the Streaming Data Platform images that you just loaded into the Docker registry.

$ ./scripts/prereqs.sh -tr=<registry/repository>

where registry identifies your docker registry and repository is the repository name you created for Streaming Data Platform. For example:

$ ./scripts/prereqs.sh -tr=harbor.myserver.local/library

3. Check the script output.

If it reports any errors about incorrect minimum versions of components, you must correct the condition before proceeding with Streaming Data Platform installation.


After you finish

The script installs Tiller in the kube-system namespace. If you install it in a different namespace, the decks-install apply command must always include the --tiller-namespace parameter to indicate the namespace.

Run pre-install script

The pre-install.sh script must be run one time before installing Streaming Data Platform.

About this task

This script creates credentials required for the internal communication of Streaming Data Platform components. It creates a values.yaml file containing these credentials. This yaml file is a required input to every execution of the decks-install apply command. The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply.

Procedure

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the pre-install.sh script.

$ ./scripts/pre-install.sh

3. Verify that the script ran successfully and that the values.yaml file exists.

- The output shows the pathname of the generated values.yaml file. It exists in a directory called pre-install. For example, scripts/pre-install/values.yaml.

- The output also shows a passwd file, which you may safely delete.

- The output initially shows results from Pravega indicating user names and passwords that will look unfamiliar. You can ignore this output. This script is, in essence, replacing hardcoded Pravega credentials with secure credentials.

4. Consider renaming the generated values.yaml file and moving it to the same location where all the other configuration values files are stored.

For example, rename values.yaml to preinstall.yaml.

Results

The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply, along with all of your other configuration values files.

Run validate-values script

The validate-values.py script is part of the installation process. It should also be run before any reapply of the configuration values files.

About this task

This script reads the configuration values files provided to it and validates the values against certain criteria. For example, it validates the values used for external connectivity and serviceability, in addition to many other validations.


Procedure

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script, providing file path names for all of the configuration values files that you plan to use in the actual installation command.

Note the following:

- Separate the file path names with spaces.

- The yaml file generated by the pre-install script is required.

- When the same field is included in more than one of the values files, the value in the right-most file overrides the values in the left-most files in the list.

For example:

$ ./scripts/validate-values.py preinstall.yaml values.yaml
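To illustrate the override behavior with a hypothetical field set in both files:

```yaml
# preinstall.yaml (listed first)
global:
  example-field: "first"

# values.yaml (listed last) -- this value takes effect
global:
  example-field: "second"
```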

3. If the script indicates errors in your configuration values files, edit the files to correct the problems and rerun the script.

Source control the configuration values files

We recommend using your enterprise source control procedures to protect the configuration values files.

Access to the configuration values must be limited and protected for the following reasons:

- The values files are your record of your current configuration. To adjust the configuration, you edit the current configuration values and reapply them.

Note: Values are not carried over internally. Every reapply of the configuration uses the values that are provided in the values files that you use in the command.

- A running record of changes that were made to the configuration might be useful for research purposes when you are fine-tuning some of the operational values.

- The values files may contain secrets.

Apply and synchronize the configuration

This procedure installs the Streaming Data Platform applications into the Kubernetes cluster.

Before you begin

The following items must be accessible to the installer.

Item                        Description

manifest bundle             The manifest bundle is an artifact delivered in the root of the
                            installer zip file, under manifests/. You can move it elsewhere if
                            needed. The files in the directory are Kubernetes Kustomize files.
                            For an initial installation of Streaming Data Platform, you should
                            not need to alter the manifest bundle contents.

Helm charts                 The Helm chart directory is an artifact delivered in the root of the
                            installer zip file, under charts/. You can move it elsewhere if
                            needed. For an initial installation of Streaming Data Platform, you
                            should not need to alter the Helm chart contents.

configuration values files  See Prepare configuration values files on page 36.

Procedure

1. If you are not already logged in, log in to PKS as cluster-admin for the Streaming Data Platform cluster.

pks login -a pks-api.<server> -u <username> -p <password> --ca-cert <location-of-cert>

For example:

pks login -a pks-api.server.local -u lab-admin -p password --ca-cert /var/folders/rv/xxxxxxxxxxx

2. Get credentials for the Streaming Data Platform cluster.

For the cluster-admin, this step also sets your context to the cluster.

pks get-credentials <cluster-name>

3. If you are not already logged in, log in to Docker:

docker login

Follow the prompts for credentials.

4. Change directories to the location where you set up the link to the extracted installer executable.

5. Run the decks-install apply command.

$ ./decks-install apply --kustomize <path/to/manifest-bundle-directory/> \
    --repo <path/to/charts/directory/> \
    --values <path/to/preinstall.yaml>,<path/to/values.yaml>[,<path/to/additional/values/file>]

The --values parameter accepts one or more yaml files in a comma-separated list with no spaces between the file names.

For example:

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ --values preinstall.yaml,values.yaml

The decks-install apply command does the following:


• Installs the applications and resources described in the manifest bundle into the cluster, in a pending state.

• By default, starts the synchronization process that reconciles the state of each application to the desired final state according to the Helm charts.

See decks-install apply on page 168 for additional command options.

6. Monitor the command output to ensure that the synchronization process completes successfully.

The pods cycle through the Running state until all are completed. Some pods may report errors for a short time; this is normal. Allow Kubernetes to handle restarts and let the synchronization process run to completion.

7. When processing stops, verify that all pods report a status of Completed or Succeeded.

It is normal for the synchronization process to end before all pods are in the Completed or Succeeded state. If this occurs, you can continue synchronization using the following command.

decks-install sync --kustomize <path/to/manifest-bundle-directory/> --repo <path/to/charts/directory/>

It is safe to run the decks-install sync command at any time.
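The verification in steps 6 and 7 can be sketched as a filter over kubectl output. The listing below is simulated so the filter itself can be shown offline; against a live cluster, pipe kubectl get pods --all-namespaces --no-headers into the same awk expression:

```shell
# Simulated `kubectl get pods --all-namespaces --no-headers` output:
# namespace, name, ready, status, restarts, age.
pods='nautilus-system   pod-a   1/1   Running     0   5m
nautilus-system   pod-b   0/1   Completed   0   5m
nautilus-pravega  pod-c   0/1   Succeeded   0   4m'

# Count pods that are not yet Running, Completed, or Succeeded.
echo "$pods" | awk '$4 != "Running" && $4 != "Completed" && $4 != "Succeeded" {n++} END {print n+0}'
# prints 0 for the sample above; a nonzero count means synchronization is still cycling
```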

Run post-install script
Run this script after running the decks-install apply command.

About this task

This script confirms that your latest run of the decks-install apply command left the cluster in a healthy state. It invokes the health check script.

Procedure

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Change directories into scripts/, and run the script.

Note: You must run this script from within the scripts directory.

$ cd scripts
$ ./post-install.sh

3. If the script indicates errors in your cluster, fix the issues reported, rerun decks-install apply, and then rerun this script.

What's next
See the following sections of this guide for post-installation administrator responsibilities.

• User Connections describes how to obtain connection endpoints and log on to the User Interface.


• Monitor Health on page 107 and Use Grafana Dashboards on page 113 describe various ways to make sure that Streaming Data Platform is healthy and efficient.

• Manage Projects, Scopes, and Users on page 85 describes how to create, maintain, and delete users, projects, and scopes.

• Authentication and Authorization describes how to change default passwords, add new users, and configure access rights for users.

• Post-install Configuration and Maintenance on page 71 describes optional post-installation configuration tasks. It also describes how to change the configuration by either reapplying the values file or uninstalling and reinstalling.


CHAPTER 4

Connections

This chapter describes Streaming Data Platform connection configurations and procedures. It includes information for administrators and for project members.

• Configure connections for users........................................................................................... 62
• Web UI endpoints and logins................................................................................................. 64
• kubectl logins........................................................................................................................ 66
• User password changes.........................................................................................................68


Configure connections for users
These topics are intended for administrators to configure resolvable connections for all users.

Configure connections to master node
Use this procedure in production deployments to configure connections for kubectl.

About this task

Users of kubectl require a connection to the Streaming Data Platform cluster's master node.

Procedure

1. Log in if needed. See Log in to kubectl for cluster-admins on page 66.

2. Get the IP address for the master node.

pks cluster <clustername>

3. Get the master node's host name from the configuration values file, in the global:external:host field.

This value is the top-level domain name (TLD) from the perspective of Streaming Data Platform. The product UI is served off of https://<TLD>. The Grafana UI is served off of https://grafana.<TLD>, and so on for the other endpoints.

For example, a TLD of xyz.desdp.example.com serves the UI off of https://xyz.desdp.example.com and Grafana off of https://grafana.xyz.desdp.example.com. The DNS server has authority to serve records for *.xyz.desdp.example.com.

Note: The master node TLD must not be the same as the cluster's TLD. The cluster's TLD is returned by the pks cluster <cluster-name> command.
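As a sketch of the naming scheme above, the endpoint URLs can be derived mechanically from the configured host value. The TLD below is the example from this section; the Grafana and Keycloak subdomain names match the ingress hosts shown later in this chapter:

```shell
# Derive the user-facing endpoint URLs from the platform TLD (example value).
TLD="xyz.desdp.example.com"
printf 'UI:       https://%s\n' "$TLD"
printf 'Grafana:  https://grafana.%s\n' "$TLD"
printf 'Keycloak: https://keycloak.%s\n' "$TLD"
```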

4. Add the routing entry to configure connection access.

Do either of the following:

Option: (Recommended) Users add the entry to /etc/hosts
    Each user who needs access can add the master node's IP and host name entry into their /etc/hosts file. Be sure to start each IP address on a new line, in position 1, in the form ip hostname. For example:

    10.247.114.85 xyz.desdp.example.com

    On Windows, the location of the hosts file is C:\Windows\System32\drivers\etc\hosts. Save the edited file, open a cmd window, and issue the following command:

    ipconfig /flushdns

Option: Administrator adds the entry to the Corporate DNS
    This process can be difficult in some environments.

Configure connections to the local DNS
Use this procedure to configure connections to endpoints in Streaming Data Platform.

About this task

User-facing endpoints include the Streaming Data Platform UI, the Grafana dashboards, the Keycloak Administration Console, and various entry points related to projects. For each analytic project created in the platform, multiple project-related endpoints are generated.

As a prerequisite, a local DNS was set up specifically for resolving the above entry points from external requests. Streaming Data Platform is configured to connect to this local DNS, and it maintains the routing entries.

Users need to connect to the local DNS. If the local DNS was set up on vSphere specifically for Streaming Data Platform, some configuration is required to enable user connections to that DNS.

Procedure

1. Get the routing information for the local DNS server that was configured for Streaming Data Platform.

2. Configure the environment so that users can reach the local DNS server.

Do either of the following:

Option: Administrator adds the entry to the Corporate DNS
    This option is recommended as the most convenient for all users.

Option: Users configure the DNS connection in their local environment
    Each user can configure the local DNS's IP and host name as appropriate for their OS. For example, some Linux OSs use a resolv.conf file for this purpose. Windows environments may have specific network settings.

Alternative endpoint configuration for non-production deployments
Use this procedure in development or test deployments to configure resolution of external requests for endpoints in the Streaming Data Platform cluster.

About this task

Developers can add entries into their environment's /etc/hosts file.

Procedure

1. Log in to PKS and set context to the Streaming Data Platform cluster.

Connections

Dell EMC Streaming Data Platform Installation and Administration Guide 63

Page 64: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

2. Get the information for /etc/hosts.

The following command extracts the endpoints and their addresses in the correct format for the hosts file.

$ kubectl get ing --all-namespaces | awk '{print $4" "$3}'

10.247.114.85 grafana.cluster1.desdp.example.com
10.247.114.85 pravega-controller.cluster1.desdp.example.com
10.247.114.85 pravega-controller-api.cluster1.desdp.example.com
10.247.114.85 keycloak.cluster1.desdp.example.com
10.247.114.85 cluster1.desdp.example.com
10.247.114.85 repo.project1.cluster1.desdp.example.com
10.247.114.85 flink1.project1.cluster1.desdp.example.com

In this example:

• desdp.example.com is the top-level domain name (TLD) for the cluster. It was specified in the configuration values file, in the external:dns:host field.

• cluster1.desdp.example.com is the connection point for the UI. The name cluster1 was specified in the configuration values file.

• The last two entries are related to a project named project1. Developers can choose the entries for the projects that they are authorized to work in.
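The field swap performed by the command in step 2 can be exercised offline against simulated ingress rows (sample hosts and addresses; the header row is omitted here, so no header line appears in the output):

```shell
# Simulated `kubectl get ing --all-namespaces` rows; the columns are
# NAMESPACE, NAME, HOSTS, ADDRESS, PORTS, AGE.
ing='nautilus-system   nautilus-ui   cluster1.desdp.example.com           10.247.114.85   80,443   8d
nautilus-pravega  grafana       grafana.cluster1.desdp.example.com   10.247.114.85   80,443   8d'

# Same swap as in step 2: print ADDRESS then HOSTS, which is the hosts-file format.
echo "$ing" | awk '{print $4" "$3}'
```

Each output line is ready to paste into /etc/hosts as described in step 3.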

3. Edit your local /etc/hosts file, copying and pasting the IP and host information from the previous step into the file.

When adding entries, be sure to start each IP address on a new line, in position 1.

Following are notes for Windows platforms.

• The file location on Windows is C:\Windows\System32\drivers\etc\hosts.

• After saving the file, you may need to open a cmd window and issue the ipconfig /flushdns command.

Web UI endpoints and logins
The following topics describe how to list the Streaming Data Platform endpoint URLs and how to log on to the UI.

Obtain connection URLs
Authorized cluster administrators can obtain the connection URLs using kubectl.

Before you begin

You must be a cluster administrator for the Streaming Data Platform cluster.

Procedure

1. Log in to kubectl as cluster admin. See Log in to kubectl for cluster-admins on page 66.

2. Configure DNS to connect to the master host IP in the cluster.

Configure the corporate DNS server with the IP and host name obtained in the previous step.

Connections

64 Dell EMC Streaming Data Platform Installation and Administration Guide

Page 65: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

Note: Alternatively, you can ask each user to add this information into their local hosts file.

3. List access points into the cluster by running kubectl get ingress --all-namespaces.

For example:

kubectl get ingress --all-namespaces
NAMESPACE          NAME                       HOSTS                                                       ADDRESS           PORTS     AGE
my-project         my-flinkcluster            my-flinkcluster.my-project.test-psk.nautilus-lab-test.com   10.240.125.14...  80, 443   6d
my-project         repo                       repo.my-project.test-psk.nautilus-lab-test.com              10.240.125.14...  80, 443   6d
nautilus-pravega   nautilus-pravega-grafana   grafana.test-psk.nautilus-lab-test.com                      10.240.125.14...  80, 443   8d
nautilus-pravega   pravega-controller         pravega-controller.test-psk.nautilus-lab-test.com           10.240.125.14...  80        8d
nautilus-pravega   pravega-controller-api     pravega-controller-api.test-psk.nautilus-lab-test.com       10.240.125.14...  80        8d
nautilus-system    keycloak                   keycloak.test-psk.nautilus-lab-test.com                     10.240.125.14...  80, 443   8d
nautilus-system    nautilus-ui                test-psk.nautilus-lab-test.com                              10.240.125.14...  80, 443   8d

All of the values in the HOSTS column are valid access points for authorized users.

In the NAME column, locate nautilus-ui, and take note of the value in the HOSTS column. This is the URL for external connections to the User Interface, and is the value specified in the configuration values file.

For example, from the list above, users can connect from external locations with the following URL:

https://test-psk.nautilus-lab-test.com

Connect and log in to the web UI
The Streaming Data Platform User Interface is a web interface, available for external connections over HTTPS.

Procedure

1. Type the URL of the Streaming Data Platform User Interface in a web browser.

Obtain the URL from your Platform Administrator or see Obtain connection URLs on page 64.

The Streaming Data Platform login window appears.

2. Using one of the following credential sets, provide a Username and Password.


• If your platform administrator provided credentials, use those.

• For environments where an existing Identity Provider is integrated, use your enterprise credentials.

3. Click Log In.

If your user name and password are valid, you are authenticated to Streaming Data Platform. You will see one of the following windows:

If you see the Dashboard section of the Streaming Data Platform UI:
    The dashboard appears if the username has authorizations associated with it. It shows streams and metrics that you are authorized to view.

If you see a welcome message asking you to see your Administrator:
    The welcome message appears if there are no authorizations associated with the username. You are authenticated but not authorized to see any data. Ask an Administrator to provide authorizations to your username.

4. If you see the welcome message directing you to an Administrator, ask for one of the following authorizations:

Purpose: View and administer all projects and streams.
    Authorization request: Platform Administrator.

Purpose: View or develop applications or streams within a project.
    Authorization request: Project member.

Purpose: View or develop applications or streams within a project that does not yet exist.
    Authorization request: A Platform Administrator must create the new project, and then add you as a member.

Purpose: View or develop streams in a scope that is independent of any project.
    Authorization request: Scope member.

Purpose: View or develop streams in a scope that is independent of any project, and does not yet exist.
    Authorization request: A Platform Administrator must create the new scope, and then add you as a member.

kubectl logins
These topics describe how to log in to Streaming Data Platform on the command line.

Log in to kubectl for cluster-admins
The following login procedure is for Streaming Data Platform admin users to manage the cluster.

About this task

Note: A Streaming Data Platform admin user may successfully log in using the non-admin login procedure described in Log in to kubectl for non-admin users on page 67. However, that procedure does not obtain the cluster-admin role binding; the user is prohibited from cluster administrative tasks, such as listing pods in the cluster or creating new resources.

Procedure

1. Log in to PKS with a UAA account that has cluster-admin role.

pks login -a pks-api.<server> -u <username> -p <password> --ca-cert <location-of-cert>

For example:

pks login -a pks-api.server2.local -u lab-admin -p password --ca-cert /var/folders/rv/xxxxxxxxxxx

2. Get credentials.

pks get-credentials <cluster-name>

The pks get-credentials command gets your kubeconfig file, sets context to the cluster, and performs the cluster-admin role binding.

You can now issue kubectl commands.

Log in to kubectl for non-admin users
The following login procedure is for Streaming Data Platform project members to use kubectl commands to manage their project resources.

Procedure

1. Log in to the cluster using your Streaming Data Platform account.

• The account is a UAA account if UAA federation is enabled.

• The account is a Keycloak account otherwise.

pks get-kubeconfig -u <username> -p <password> <cluster-name> --skip-ssl-validation -a pks-api.<hostname>.local

This command logs in to PKS, gets your kubeconfig file, and sets context to the cluster.

For example:

sdp-ui> pks get-kubeconfig -u chrisx -p password blue-blade --skip-ssl-validation -a pks-api.nightshift.local

Fetching kubeconfig for cluster blue-blade and user chrisx.
You can now use the kubeconfig for user chrisx:
$ kubectl config use-context blue-blade

sdp-ui>


User password changes
These topics describe how users change passwords associated with their Streaming Data Platform usernames.

Change password (UAA)
Use this procedure to change a password or other profile attributes in UAA.

About this task

When UAA federation is enabled, password changes must be made in UAA.

The UAA credentials are used for authentication for all operations in the Streaming Data Platform UI and kubectl.

Note: The password change is not recognized by Gradle. Developers who use Gradle to check in application artifacts to Maven should consider also changing their password in Keycloak to keep the password values aligned. Gradle uses a shadow account in Keycloak. Check-ins to Maven from the Streaming Data Platform UI are authenticated with UAA credentials.

Procedure

1. In a browser, go to the Cloud Foundry UAA UI.

2. On the sign in screen, type your user name and existing password, and click the Reset password link on the screen.

3. On the pop-up menu, click Account Settings.

4. On the Account Settings screen, click Change password.

5. Complete the next screen and save.

6. To verify the password change, log in to the Streaming Data Platform UI. Be sure to click the UAA button on the login screen.

Change password (Keycloak)
Use this procedure to change a password or other profile attributes in the local Keycloak system.

About this task

When UAA federation is not enabled, users should make password changes in Keycloak.

Procedure

1. Log in to the Streaming Data Platform UI using the username whose password or other profile attributes you want to change.

2. In the banner, click the User icon.

3. Verify that the username at the top of the menu is the username whose profile you want to change.


4. Choose Edit Account.

5. To change the password, complete the password-related fields.

6. Edit other fields in the profile if needed.

7. Click Save.

8. To verify the password change:

a. Click the User icon and choose Logout.

b. Log back in using the new password.


CHAPTER 5

Post-install Configuration and Maintenance

This chapter describes configuration procedures that are performed after installation. This includes required procedures, optional procedures, software upgrades, and maintenance such as changes to the installed configuration.

• Obtain default admin credentials........................................................................................... 72
• Set up UAA federation............................................................................................................73
• Set up LDAP integration......................................................................................................... 74
• Enable periodic telemetry upload to SRS............................................................................... 75
• Add Pravega alerts to event collection................................................................................... 76
• Change applied configuration................................................................................................. 78
• Uninstall applications..............................................................................................................79
• Upgrade software................................................................................................................... 81
• Update the default password for SRS remote access............................................................ 83


Obtain default admin credentials
The installation process creates two default administrator accounts.

About this task

The two accounts are:

Keycloak nautilus realm administrator (user name: desdp)
    Authorized to create new Pravega scopes and new Analytic Projects, and has wildcard access to all scopes and projects and all associated resources within the Streaming Data Platform. Access to those resources is granted for both the Kubernetes cluster and the Streaming Data Platform UI. Prevented from logging in to the Keycloak Administrator Console.

Keycloak master realm administrator (user name: admin)
    Authorized to log in to the Keycloak Administration Console and create new users in Keycloak.

Password values for the above accounts either were or were not configured in the configuration values file. If you are not sure, consult the owner of the configuration values files used during installation. If passwords were not configured, the installer automatically generated them and inserted them into secrets.

• If password values were specified in the configuration values file, use those values.

• If password values were not specified, use this procedure to obtain the secrets and extract the passwords.

Procedure

1. Log in to the cluster on the kubectl command line, as described in Log in to kubectl for cluster-admins on page 66.

2. Obtain the auto-generated password for the desdp user in the nautilus realm:

kubectl get secret keycloak-desdp -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode

3. Obtain the auto-generated password for the admin user in the master realm:

kubectl get secret keycloak-http -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode
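The decode half of these commands can be illustrated without a cluster: the jsonpath query returns the secret's data field base64-encoded, and base64 --decode recovers the plaintext. The value below is invented for the sketch, not a real default password:

```shell
# A secret value as it would appear in .data.password: base64-encoded.
encoded=$(printf 'S3cretPass' | base64)
echo "$encoded"

# Recover the plaintext, as steps 2 and 3 do.
printf '%s' "$encoded" | base64 --decode
echo
```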

4. Verify that you can log in to both the Keycloak Administrator Console and the Streaming Data Platform UI.

See Obtain connection URLs on page 64.

5. (Optional) You may discard the two secrets that contain the passwords after you have verified that you can log in to both Keycloak realms.

The two Kubernetes secrets that contain the admin and desdp user passwords are created only once, at install time. Modifications of the user accounts (such as changing their passwords, deleting them, or renaming them) and product upgrades do not update these secrets. They are used only as an initial means to retrieve the passwords for bootstrapping purposes. You may safely discard the secrets.

Set up UAA federation
UAA federation is required for production deployments. For test and development deployments, you may skip this step.

Before you begin

1. All of the steps in Required tasks for UAA federation on page 49 must be completed successfully.

2. All of the Streaming Data Platform installation steps must be completed. That is, the cluster is established and the decks-install apply command executed successfully.

3. The configuration values files used in the latest run of the decks-install apply command must have included the certificate information for the PKS UAA connection with Keycloak in the keycloak:keycloak:egress section, as described in Prepare configuration values files on page 36.

4. You must be able to log into PKS with a username that has cluster-admin role.

5. You must be able to log in to Streaming Data Platform with a username that has admin role. The installed default admin user is acceptable.

About this task

This procedure establishes a trust between the Streaming Data Platform cluster's Keycloak instance and the cluster's PKS UAA instance. It also adds the UAA cluster admin user as a shadow user in Keycloak. The shadow user has admin role in the nautilus realm.

Procedure

1. Navigate to the unzipped location of the product's distribution files.

2. Locate the ./scripts/post-install-idp-federation/vars.env.template file in the distribution, and copy it to create a custom vars.env file. Save it to $HOME/desdp/idp.

3. Edit the $HOME/desdp/idp/vars.env file, providing values for the variables contained within it.

Comments in the file explain each variable and how to set the values.

4. Log into kubectl as cluster-admin. See Log in to kubectl for cluster-admins on page 66.

5. Get credentials for the Streaming Data Platform cluster.

6. Export the installer shell image from the distribution into your local image registry and launch an installer shell.

For example:

nautilus-dist $ export INSTALLER_SHELL_IMAGE=devops-repo.isus.emc.com:8116/nautilus/decks-installer:0.14
nautilus-dist $ ./localshell.sh run

7. At the shell prompt, run federate.sh.


For example:

root@me: /opt/dellemc-stream-data-platform# ./federate.sh

This script does the following:

• In UAA, the script creates a Keycloak client.

• In Keycloak, the script configures UAA as an identity provider.

8. Connect to Streaming Data Platform. See Obtain connection URLs on page 64.

9. Verify that the login screen now contains the UAA button on the right side.

Set up LDAP integration
You can integrate an existing enterprise identity provider into Streaming Data Platform.

Before you begin

Complete all steps to configure UAA federation for Streaming Data Platform. See Set up UAA federation on page 73.

Procedure

1. Log into PKS Ops Manager.

2. Click the Settings tab.

3. Scroll down in the list, and click UAA.

4. On the UAA configuration screen:

a. Ensure that Enable UAA as OIDC provider is checked.

b. Under Configure your UAA user account store..., choose LDAP Server.

c. Complete the LDAP server configuration fields that appear. For help, see the PKS documentation.


d. Click Save.

5. Connect to Streaming Data Platform. See Obtain connection URLs on page 64.

6. Verify that an LDAP username can log in and be authenticated.

The new user will not have authorization to view or do anything until the user is made a member of a scope or a project.

7. To log out, click the User icon on the right side of the Streaming Data Platform banner, and choose Log out.

Enable periodic telemetry upload to SRS
By default, the Streaming Data Platform deployment does not upload any information to SRS. You must enable this feature manually.

About this task

Read the following E-EULA before proceeding with the changes.

TELEMETRY SOFTWARE NOTICE

If you are acting on behalf of a U.S. Federal Government agency or if Customer has an express written agreement in place stating that no remote support shall be performed for this machine, please stop attempting to enable the Software and contact your sales account representative.

By continuing to install this Software, you acknowledge that you understand the information stated below and accept it.

Privacy

Dell, Inc and its group of companies may collect, use and share information, including limited personal information from our customers in connection with the deployment of this telemetry software ("Software"). We will collect limited personal data when you register the Software and provide us with your contact details such as name, contact details and the company you work for. For more information on how we use your personal information, including how to exercise your data subject rights, please refer to our Dell Privacy Statement which is available online at www.dell.com/learn/policies-privacy-country-specific-privacy-policy.

Telemetry Software

This Software gathers system information related to this machine, such as diagnostics, configurations, usage characteristics, performance, and deployment location (collectively, "System Data"), and it manages the remote access and the exchange of the System Data with Dell Inc. or its applicable subsidiaries (together, "Dell"). By using the Software, Customer consents to Dell's connection to and remote access of the machine and acknowledges that Dell will use the System Data transmitted to Dell via the Software as follows ("Permitted Purposes"):

• remotely access the machine and Software to install, maintain, monitor, remotely support, receive alerts and notifications from, and change certain internal system parameters of this machine and the Customer's environment, in fulfillment of applicable warranty and support obligations;

• provide Customer with visibility to its actual usage and consumption patterns of the machine;

• utilize the System Data in connection with predictive analytics and usage intelligence to consult with and assist Customer, directly or through a reseller, to optimize Customer's future planning activities and requirements; and

• "anonymize" (i.e., remove any reference to a specific Customer) and aggregate System Data with that from machines of other Customers and use such data to develop and improve products.

Post-install Configuration and Maintenance

Dell EMC Streaming Data Platform Installation and Administration Guide 75


Customer may disable the Software at any time, in which case all the above activities will stop. Customer acknowledges that this will limit Dell's ability and obligations (if any) to support the machine.

The Software does not enable Dell or their service personnel to access, view, process, copy, modify, or handle Customer's business data stored on or in this machine. System Data does not include personally identifiable data relating to any individuals.

Procedure

1. Read the E-EULA above before proceeding with the changes.

2. Log in, if needed. See Log in to kubectl for cluster-admins on page 66.

3. Edit the Streaming Data Platform SRS configuration:

$ kubectl edit srsgateways streamingdata -n nautilus-system

4. In the editor, change the spec: section as follows:

From:

spec:
  configUpload:
    disable: true

To:

spec:
  configUpload:
    disable: false

5. In the editor, save the changes.

Results

Immediately, a new Kubernetes cronjob is created to run every 12 hours and upload configuration information to SRS. To verify:

$ kubectl get cronjobs -n nautilus-system
NAME                          SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
monitoring                    */10 * * * *   False     0        9m33s           19h
streamingdata-config-upload   * */12 * * *   False     0        <none>          62m
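If you want to script this check (for example, in a post-install health script), the tabular kubectl output can be parsed by exploiting the fact that a cron schedule is always five tokens. This is an illustrative sketch, not product tooling; the sample text is the output shown above, and in practice you would capture it with subprocess.

```python
# Sketch: verify the upload cronjob by parsing `kubectl get cronjobs` output.
# Illustrative only (not product tooling). In practice, capture the text with
# subprocess.run(["kubectl", "get", "cronjobs", "-n", "nautilus-system"],
# capture_output=True, text=True).stdout; the sample below is from the guide.

def parse_cronjobs(output: str) -> dict:
    """Map cronjob name -> fields. A cron schedule is always five tokens, so
    each row splits as: name, 5 schedule tokens, suspend, active, last, age."""
    jobs = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        tokens = line.split()
        jobs[tokens[0]] = {
            "schedule": " ".join(tokens[1:6]),
            "suspend": tokens[6],
            "active": tokens[7],
            "last_schedule": tokens[8],
            "age": tokens[9],
        }
    return jobs

sample = """\
NAME                          SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
monitoring                    */10 * * * *   False     0        9m33s           19h
streamingdata-config-upload   * */12 * * *   False     0        <none>          62m
"""
jobs = parse_cronjobs(sample)
assert "streamingdata-config-upload" in jobs
print(jobs["streamingdata-config-upload"]["schedule"])  # * */12 * * *
```

A `last_schedule` of `<none>` simply means the job has not run yet; it will run on the next schedule boundary.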

Add Pravega alerts to event collection

You may add the alerts that are generated by the Grafana Pravega Alerts Dashboard to the event data that is displayed on the Streaming Data Platform Events screen and to the data that is uploaded to Dell EMC by SRS.

Procedure

1. Log on to the Streaming Data Platform User Interface as an admin user.

2. On the Dashboard page, click the Pravega metrics link located above the top left corner of the screen.

The Grafana UI opens.


3. At the top of the page, click the Home dropdown menu, and choose the Pravega Alerts Dashboard.

4. From the Pravega Alerts Dashboard, click the Save Dashboard icon in the Grafana banner.

5. On the Save Dialog that appears, click Save.

6. On the Confirmation dialog that appears, click Override.

Note: The Override selection is required.

7. Wait about 10 to 20 seconds, until the color changes on the heart icons on each of the panels in the dashboard. The colors change from white to green.

8. To verify rules creation, click the Alerting icon (bell-shaped icon) in the left banner of the Grafana window, and choose Alert Rules.

A list of alert rules appears.

9. To verify communication with SRS components, click the Notification Channels tab and make sure that the kahm-notifier is listed.

This is the notifier that passes alerts to the event collecting service (kahm).

Results

If Pravega becomes unhealthy and alerts are generated based on these rules, those alerts are included on the System > Events screen in the Streaming Data Platform UI. The alerts are also included in the uploads to Dell EMC, if the SRS upload feature is enabled. See Enable periodic telemetry upload to SRS on page 75.


Change applied configuration

Some configuration values can be changed after installation by changing and reapplying the values files.

Before you begin

- Consult with Dell EMC support personnel about the values that you want to change. Some values cannot or should not be changed using this procedure.

- You must be a PKS user with the cluster-admin role on the Streaming Data Platform cluster.

- Although not technically required, typically you want to obtain the values files that were used for the last configuration and edit those files with the changes you want to make. Values are never carried over from the existing configuration.

About this task

To change the current configuration, you run the decks-install apply command to reapply edited configuration values to the cluster.

Every time you run the decks-install apply command, the entire configuration is reconstructed using the installer's default values and any override values you supply in the values files. If a value is not supplied in the specified configuration values files, the installer's default value is used.

Note: Values from the current configuration are not carried over into the new configuration. For this reason, the recommended way to change a configuration is to edit the previous version of the configuration values files. By starting with the previous configuration values, you ensure that all currently configured desirable values remain the same, and the only differences in the new configuration will be the changes you are currently making in the files.

Note: If the original installation used multiple values files, be sure to specify all of the files in this reapply procedure.

While it is possible to use kubectl or other Kubernetes tools to patch the resources running on the cluster, this is not recommended. When you use tools outside of the installer and its values files, you have no record of the current configuration. The next decks-install apply command overrides whatever changes you made using other tools.

Kubernetes handles the reconfiguration. The administrator does not need to manually stop or restart anything. The changed configuration is applied across the cluster as fast as the Kubernetes reconcile loop can apply it. The results may take some time to complete.

Depending on which values you change for which components, some services may be restarted for reconfiguration. As a result, there may be short outages. For example, if a configuration change causes some Pravega components to restart, then Pravega stream ingestion could stop processing while the reconfiguration occurs.

Use the following procedure to change the configuration.

Procedure

1. Prepare the configuration files, remembering that the new configuration is entirely reconstructed from the values files that you provide.

See Configuration values file reference on page 145 for configuration values descriptions.

2. Edit the files with the changes you want to make.

3. Run the validate-values.py script.


a. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

b. Run the script, providing file path names for all of the values files that you plan to use in the decks-install apply command.

You are required to include the yaml file generated by the pre-install script. Separate the file path names with spaces. When the same field is included in more than one of the values files, the value in the right-most file overrides the values in files to its left.

For example:

$ ./scripts/validate-values.py preinstall.yaml values.yaml

c. If the script indicates errors in your configuration values files, edit the files to correct the problems and rerun the script.
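The overlay semantics described above can be pictured as a recursive dictionary merge. The following is an illustrative sketch, not the installer's source code: real values files are YAML, plain dicts stand in for their parsed contents, and the field names are hypothetical.

```python
# Illustrative sketch (not the installer's source code) of the precedence
# rule: the configuration is rebuilt from installer defaults, then each
# values file is overlaid left to right, so the right-most file wins for any
# field that appears in more than one file. Field names are hypothetical.

def overlay(base: dict, override: dict) -> dict:
    """Deep-merge override onto base; override wins on conflicting keys."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged

installer_defaults = {"pravega": {"replicas": 3, "tier2": "nfs"}}
preinstall = {"registry": "harbor.example.com"}   # hypothetical field
values = {"pravega": {"replicas": 5}}             # hypothetical field

config = installer_defaults
for values_file in (preinstall, values):  # same order as on the command line
    config = overlay(config, values_file)

assert config == {"pravega": {"replicas": 5, "tier2": "nfs"},
                  "registry": "harbor.example.com"}
```

This is why omitting a values file from a reapply silently reverts its fields to the installer defaults: nothing from the running configuration participates in the merge.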

4. Log in to the cluster. See Log in to kubectl for cluster-admins on page 66.

5. Run the decks-install apply command.

The --values option points to the configuration values files. You are required to include the yaml file that was generated by the pre-install script.

For example:

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ --values pre-install.yaml,values.yaml

6. Run the post-install script.

a. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

b. Run the script.

$ ./scripts/post-install.sh

c. If the script indicates errors in your cluster, fix the issues reported, rerun decks-install apply, and then rerun this script.

Uninstall applications

Use the decks-install unapply command to uninstall specified platform applications and their associated resources from the Streaming Data Platform cluster. These are applications mentioned in the Kubernetes manifests.

Before you begin

Consult with Dell EMC support personnel about your intended outcomes before uninstalling applications from the Streaming Data Platform cluster.

WARNING: If you need to delete the Flink or Pravega application, be aware that existing Flink or Pravega data will be marked for deletion as well.


- If Flink is listed for removal, you must first delete all existing Flink Projects and associated Flink applications from the Streaming Data Platform Analytics page. Delete Projects first, and then delete applications. To perform these deletions, use either of the following methods:

  - In the Streaming Data Platform user interface, navigate to the Analytics page and use the Delete buttons (available to authorized users) for Projects and for applications in the Flink clusters.

  - The /scripts/project/clean_projects.sh script, supplied with the distribution, deletes all projects.

- If the Pravega application is listed for removal, be aware that existing streams in Pravega will not be readable by a newly installed Pravega instance. Even if the nfs-client-provisioner.storageClass.archiveOnDelete setting is "true" in the current configuration, the archived data will not be readable by a new installation of the Pravega application.

About this task

The decks-install unapply command marks applications for removal from the Kubernetes cluster, based on a specified manifest bundle. One reason to perform an unapply for an application is to prepare to reinstall it with a different set of configuration values.

To uninstall all Streaming Data Platform applications and resources from the cluster, so that you can start over with a new installation, use the decks-install unapply command with the same manifest that was used for the installation.

Note: Only resources that were initially created by Streaming Data Platform are removed. Other resources are not affected by the uninstall procedure.

Procedure

1. Edit the manifest bundle.

To uninstall all Streaming Data Platform applications and resources from the cluster, so that you can start over with a new installation, specify the same manifest bundle that you used with the decks-install apply command.

To uninstall only a few selected applications, create a new manifest bundle, using the original one as a guide. Also see https://kustomize.io/ for syntax requirements.

2. Run the decks-install unapply command.

$ ./decks-install unapply --kustomize <path/to/uninstall-manifest-bundle>

For example:

$ ./decks-install unapply --kustomize ./unapplymanifest/

The decks-install unapply command does the following:

- Marks applications and resources in the provided manifest bundle for deletion, in a pending state.

- By default, starts the synchronization process, which reconciles the cluster to the desired terminal state. An optional parameter can defer the synchronization.

See decks-install unapply on page 172 for optional command parameters.

3. Check to ensure that the synchronization completes successfully.


4. If the synchronization procedure fails for whatever reason, use the following command to start it again. It is safe to restart the synchronization procedure at any time.

$ ./decks-install sync --kustomize <path/to/uninstall-manifest-bundle>

Upgrade software

This procedure upgrades an existing deployment of Streaming Data Platform to a new version of Streaming Data Platform software.

Procedure

1. Download the new distribution files from https://www.dell.com/support/home/product-support/product/streaming-data-platform-family/overview.

2. Download the release notes for the new version using a link on the site referenced in the previous step.

3. Verify that your current version is appropriate for upgrade to the new release.

a. Log in, if needed. See Log in to kubectl for cluster-admins on page 66.

b. Get the version of your current deployment.

kubectl get configmaps global-metadata -n nautilus-system -o yaml

c. Check the release notes.

4. Unzip the new decks-installer-<new-version>.zip file into a different directory from your original installer extraction.

5. Push the new software images to the Docker registry.

$ ./decks-install push --input <path/to/tar/images> [--ca-certs-dir <LOCAL_DIR_TO_CERTIFICATE>]

where:

- <path/to/tar/images> is the path to the decks-images-<version>.tar file.
  Note: Do not extract the contents from decks-images-<version>.tar. The installer tool works directly with the .tar file.

- The optional --ca-certs-dir option injects a custom certificate into each image. Use this option if your company requires a certificate bundle for security purposes within your internal network.

This push operation may take up to an hour to complete.

6. Run the pre-upgrade script.

a. Navigate to the folder where you unzipped the newly downloaded decks-installer-<version>.zip file.

b. Change directories into /scripts, and run the script.


Note: You must run this script from within the scripts directory.

$ cd scripts
$ ./pre-upgrade.sh

c. If the script indicates problems in the cluster, fix the issues reported and rerun this script before proceeding. You should not attempt to upgrade an unhealthy cluster.

Output should look similar to the following:

---- Running Pre-Upgrade Health Check ----
Log file is present at ./health_check-2019-12-11-10-34-53.log
Starting health check...
- Checking pod health
  - Checking pod health for namespace : nautilus-system
    - Checking pod/container state
      - All pods/containers seem healthy
    - Checking container restarts
      - No containers have high restart counts
  - Checking pod health for namespace : catalog
    - Checking pod/container state
      - All pods/containers seem healthy
    - Checking container restarts
      - No containers have high restart counts
  - Checking pod health for namespace : nautilus-pravega
    - Checking pod/container state
      - All pods/containers seem healthy
    - Checking container restarts
      - No containers have high restart counts
- Checking pravega cluster health
  - Pravega-cluster state is healthy
- Check for failed helm deployments
  - No failed helm deployments were detected
- Check Tier2
  - Tier2 check succeeded

7. Run the decks-install apply command.

- Use the newer location for the manifests and charts.

- Use all of the same configuration values files that you used the last time you ran decks-install apply.

Here is an example:

$ ./decks-install apply --kustomize v1.2/manifests/ --repo v1.2/charts/ --values abc/preinstall.yaml,values2.yaml,values3.yaml

8. Run the post-upgrade script.

a. Navigate to the folder where you unzipped the decks-installer-<new-version>.zip file.

b. Change directories into /scripts, and run the script.


Note: You must run this script from within the scripts directory.

$ cd scripts
$ ./post-upgrade.sh

c. If the script indicates errors in your cluster, fix the issues reported, rerun the decks-install apply command, and rerun this script.

Update the default password for SRS remote access

Use this procedure to change the streamingdata service pod default password for Dell support services to use for remote access.

Procedure

1. Log in, if needed. See Log in to kubectl for cluster-admins on page 66.

2. Locate the streamingdata-remote-access service pod:

## kubectl get pod | grep streamingdata-remote-access
streamingdata-remote-access-bd565d8ff-7fwfr   1/1   Running   0   3d2h

3. If the pod is running and configured, update the password:

## kubectl exec -it streamingdata-remote-access-bd565d8ff-7fwfr passwd root
Changing password for root
New password:
Retype password:
passwd: password for root changed by root

4. Store the password in a secure location.

Dell EMC support services will need this password to remotely access the Streaming Data Platform cluster to provide support.
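The product does not dictate how you choose the new password. As a convenience, the following sketch (not product tooling, an illustrative assumption) generates a random password with the Python standard library; adjust the length and character set to your organization's password policy.

```python
# Convenience sketch (not product tooling): generate a random replacement
# password using Python's secrets module. Adjust length and alphabet to
# match your organization's password policy.
import secrets
import string

def generate_password(length: int = 16) -> str:
    alphabet = string.ascii_letters + string.digits + "!@#%^*-_"
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Require at least one lowercase letter, uppercase letter, and digit.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)):
            return candidate

print(generate_password())
```

Whatever generator you use, record the result in your password vault before running the passwd command above, since Dell EMC support cannot recover it for you.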


CHAPTER 6

Manage Projects, Scopes, and Users

Administrators create, delete, and configure the major platform resources. These are Flink projects, Pravega scopes, and platform users.

- Naming requirements (page 86)
- Manage projects (page 86)
- Manage scopes and streams (page 95)
- Manage users (page 100)


Naming requirements

These requirements apply to user names and resource names in Streaming Data Platform.

User name and scope name requirements

User names and scope names must conform to the following Pravega naming conventions:

- The characters allowed in scope names and user names are: digits (0-9), lowercase letters (a-z), and the hyphen (-).

- The names must start and end with an alphanumeric character (not a hyphen).

Note: These requirements affect the user names that will become project and scope members or admins in Streaming Data Platform, regardless of the registry in which they are defined (Keycloak, UAA, or an LDAP database).

Other resource names

All other resource names must conform to Kubernetes naming conventions:

- The characters allowed in names are: digits (0-9), lowercase letters (a-z), the hyphen (-), and the period (.).

- Names can be up to 253 characters long.
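The two rule sets above can be captured as regular expressions for pre-validating names in provisioning scripts. This is an illustrative sketch only; Streaming Data Platform performs its own validation when resources are created.

```python
# Sketch of the naming rules above as regular expressions, for pre-validating
# names in scripts. Illustrative only; the platform validates names itself.
import re

# Pravega scope and user names: digits, lowercase letters, and hyphens;
# must start and end with an alphanumeric character.
PRAVEGA_NAME = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def is_valid_k8s_name(name: str) -> bool:
    """Other resource names: also allow periods, up to 253 characters.
    (Kubernetes convention likewise requires alphanumeric first/last chars.)"""
    return (len(name) <= 253
            and re.fullmatch(r"[a-z0-9]([a-z0-9.-]*[a-z0-9])?", name) is not None)

assert PRAVEGA_NAME.match("my-scope-1")
assert not PRAVEGA_NAME.match("-bad")     # cannot start with a hyphen
assert not PRAVEGA_NAME.match("No.Caps")  # no uppercase or periods in Pravega names
assert is_valid_k8s_name("team.alpha-project")
```

Checking names this way before creating projects or scopes avoids failed creations later, since project names are reused as Kubernetes namespaces and Pravega scopes.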

Manage projects

This section defines the Streaming Data Platform concept of projects and describes administrative tasks for managing them.

About projects and clusters

Projects are basic resources in Streaming Data Platform. Clusters are Apache Flink clusters defined within projects.

Projects

All processing capabilities are contained within projects. Projects provide support for multiple teams working on the same platform, while isolating each team's resources from the others. Project members can collaborate in a secure way. Resources for each project (each team) are managed separately.

A Streaming Data Platform project is a Kubernetes custom resource of kind Project. The Project resource is a Kubernetes namespace enhanced with the following resources and services:

- Maven repository: Stores Flink job artifacts.

- Zookeeper cluster: Allows fault-tolerant Flink clusters.

- Project storage: A persistent volume claim (PVC) that provides storage shared between Flink clusters.

- Pravega credentials: Allow analytic jobs to communicate with Pravega.

- Pravega scope: Represents a top-level construct under which users can create streams. The Pravega credentials are configured to have access to this scope.

Analytic teams, consisting of developers and data analysts, work within projects. Each project has its own Maven repo, its own set of Flink clusters for analytic processing, its own scope and streams, and its own set of project members. Only project members (and platform administrators) are authorized to view the assets in the project, access the streams, upload application artifacts, and run Apache Flink jobs. Project isolation is one way that Streaming Data Platform implements data protection and isolation of duties.

A platform administrator creates analytic projects, with either of the following methods:

- Using the web interface. This is the quickest and most convenient method.

- Using Kubernetes commands and a resource file. Use this method if the default configurations employed by the web interface do not satisfy the project team's needs.

A platform administrator also adds users to the project as project members. Project members upload application artifacts, configure and run Apache Flink jobs, and create and manage streams.

Flink clusters

A Flink cluster provides the compute capabilities for the Apache Flink jobs in a project. Streaming Data Platform easily deploys fault-tolerant Flink clusters within a project namespace.

In Streaming Data Platform, a Flink cluster is a custom Kubernetes resource added into a project. A project can have one or multiple Flink clusters to process the project's applications.

A platform administrator or project member creates and manages Flink clusters, with either of the following methods:

- Using the web interface. This is the quickest and most convenient method.

- Using Kubernetes commands and a resource file. Use this method if the default configurations employed by the web interface do not satisfy the project team's needs.

Create a project

Use this procedure to create a new project using the Streaming Data Platform web interface.

Procedure

1. Log in to the Streaming Data Platform UI with a username that has admin role.

2. Click the Analytics icon.

The Analytic Projects table appears.

3. Click Create Project at the top of the table.

4. In the Name field, type a name that conforms to Kubernetes naming conventions.

The project name is used for the following:

- The project name in the Streaming Data Platform UI.

- The Kubernetes namespace for the project.

- A local Maven repository for hosting artifacts for applications defined in the project.

- The project-specific Pravega scope.

- Security constructs that allow any Flink applications created in the project to have access to all the Pravega streams in the project-specific scope.


5. In the Description field, optionally provide a short phrase to help identify the project.

The description appears on the Projects page, which lists all of the projects defined in the cluster.

6. In the Volume Size field, provide the size of the persistent volume claim (PVC) to create for the project.

The default size is 10 GB.

7. Click Save.

The new project appears in the Analytic Projects table in a Deploying state. It may take a few minutes for the system to create the underlying resources for the project and change the state to Ready.

Create a project manually

Use this procedure to create a project on the command line. With this method, you can alter more of the configuration settings.

About this task

In this procedure, you first create a new namespace and then you add a resource of kind Project to that namespace. Project is a custom Kubernetes resource.

Note: The following rules are important:

- The names of the namespace and the Project resource must match.

- Only one Project resource can exist in a namespace.

Procedure

1. On the command line, log on to the Streaming Data Platform Kubernetes cluster as platform_admin.

2. Create a new Kubernetes namespace, using the name you want to use for the project.

$> kubectl create namespace <project-name>

where <project-name> conforms to the Kubernetes naming conventions.

3. Create a yaml file that defines a new resource of kind Project.

a. Copy the following resource file as a template.

apiVersion: nautilus.dellemc.com/v1alpha1
kind: Project
metadata:
  name: <project-name>
spec:
  maven:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs
  zookeeper:
    size: 3

b. Edit the following values in the yaml file.

- apiVersion: The Streaming Data Platform API version.

- metadata: name: The name that you assigned to the namespace. This name and the namespace name must be the same.

- spec: maven: persistentVolumeClaim: resources: requests: storage: The size of PVC storage for the Maven repository that will hold artifacts related to this project. The default used if you create a Project on the web UI is 10Gi. Increase this value if a large number of artifacts is expected for this project.

- spec: storage: persistentVolumeClaim: resources: requests: storage: The size of PVC storage shared between all Flink clusters that will be created in the project. This space stores all checkpoints and savepoints. Configure this value based on the expected Flink state size. The default used if you create a Project on the web UI is 10Gi.

- zookeeper: size: The number of nodes in the Zookeeper cluster. Zookeeper is used by all Flink clusters to provide high availability. Under typical conditions, a setting of 3 is sufficient.

c. Check that the syntax is valid yaml and save the file.

4. Apply the resource file.

$> kubectl create -n <project-name> -f <file-name>.yaml

5. Check the project for readiness.

$> kubectl describe Project <project-name> -n <project-name>

The output is similar to the following. The Status: Ready: flag changes to true when the project resource is ready for use. This may take several minutes while the framework prepares all of the supporting infrastructure for the project.

$> kubectl describe Project myproject -n myproject
Name:         myproject
Namespace:    myproject
Labels:       <none>
Annotations:  <none>
API Version:  nautilus.dellemc.com/v1alpha1
Kind:         Project
Metadata:
  ...
Status:
  Ready:  true
  Status:
    Maven:      true
    Zookeeper:  true
Events:  <none>
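If you script project creation, you can poll for readiness by capturing the describe output (for example, with subprocess) and checking the Ready flag. This is an illustrative sketch, not product tooling, and the exact output layout may vary between releases.

```python
# Sketch for scripting the readiness wait: pull the Ready flag out of the
# `kubectl describe Project` text (captured, e.g., with subprocess.run).
# Illustrative only; output layout may vary between releases.

def project_is_ready(describe_output: str) -> bool:
    for line in describe_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip() == "Ready":
            return value.strip() == "true"
    return False

sample = """\
Name:         myproject
Namespace:    myproject
Status:
  Ready:  true
"""
assert project_is_ready(sample)
```

In a provisioning script, you would call this in a loop with a sleep between attempts until it returns True or a timeout is reached.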

Add or remove project members

Developers and data analysts must be members of a project to create, modify, or view the streams and applications within the project.

Before you begin

The username that will be added to the project must already exist as a Streaming Data Platform username.

About this task

The admin users can always access all projects.

Note: Never add admin users as project members.

Procedure

1. Log in to the Streaming Data Platform UI with a username that has admin role.

2. Navigate to Analytics > <project_name> > Members.

A table of existing project members appears, with a textbox for entering a new username in the table header.

3. To add a member, type a new member's username in the Username textbox, and click Add Member.

Note: Do not add admin users as project members.

The username appears in the refreshed table of members.

4. To remove a member, locate the member name in the table, and click Remove in that row.

List projects and view project contents

Administrators can view summary information about all projects. Other users can view information only about the projects of which they are members.

Procedure

1. Click the Analytics icon.

The project table lists the projects that your logon user credentials give you permission to view.

2. Click a project name to drill into that project.


The project dashboard appears.

The dashboard shows the following information:

- Number of Flink clusters defined

- Number of task slots used / number of task slots available

- Number of applications defined in the project

- Number of application artifacts uploaded within the project

- (For admins only) Number of members assigned to the project

- Number of streams associated to the project

- Number of Flink jobs running, finished, cancelled, and failed

Under the icons, a Messages section shows Kubernetes event messages pertaining to the project, if any are available.

Note: Kubernetes events are short-lived. They only remain for one hour.

3. To drill further into aspects of the project, click the tabs along the top of the dashboard:

- Flink Clusters: View the list of clusters defined in the project. If you are an administrator, click this tab to create a new cluster, edit a cluster configuration, or delete a cluster.

- Apps: View the list of applications defined in the project. Administrators and project members can use this tab to create new applications, remove applications, edit application configurations, and update artifacts.

- Artifacts: View artifacts uploaded in the project's Maven repository. Administrators and project members can upload, update, and remove artifacts.

- Members: Administrators can view project members, add members, and remove members. For users logged in as a project member, this tab does not appear on the screen.


Create Flink cluster

Use this procedure to create a Flink cluster within a project.

Before you begin

- The project must already exist.

- Determine if the cluster requires a custom configuration. If so, a project member can create a Flink cluster using the Kubernetes command line.

- For more about Flink clusters, see the Dell EMC Streaming Data Platform Developer's Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2020/01/docu96951.pdf.

About this task

Custom configurations are required in these situations:

Custom Flink images for processing

The project developers may want to process their applications using a custom Flink image. Streaming Data Platform supports 1.6.1, 1.6.2, and 1.6.3 out of the box. Any other image would be considered custom.

The supported versions are prebuilt Apache Flink images that come with Streaming Data Platform. They are standard Apache Flink images enhanced with custom classpath JAR files to provide seamless integration with Pravega streaming storage.

Custom labels for scheduling

The project developers may want to use Flink cluster labels to assign jobs to specific Flink clusters for processing. Streaming Data Platform uses the Flink image version number to assign the cluster to a job. For more flexibility, developers can incorporate labels.
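For the command-line path, a custom Flink cluster would be defined as a Kubernetes custom resource. The manifest below is only an illustrative sketch: the apiVersion, kind, and field names are assumptions for illustration, not the documented schema. Verify the CRD installed in your cluster (for example, with kubectl get crd) before adapting it.

```yaml
# Hypothetical FlinkCluster manifest -- apiVersion, kind, and field names
# are assumptions; verify against the CRD installed in your cluster.
apiVersion: flink.nautilus.dellemc.com/v1beta1
kind: FlinkCluster
metadata:
  name: custom-cluster
  namespace: my-project          # assumption: the project name maps to a namespace
  labels:
    tier: high-memory            # custom label used for job scheduling
spec:
  imageRef:
    name: my-custom-flink        # a custom Flink image (anything beyond 1.6.1-1.6.3)
  taskManager:
    replicas: 2
    numberOfTaskSlots: 2
```

A project member would apply such a manifest with kubectl apply -f while logged in to the cluster.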

Procedure

1. Log in to the Streaming Data Platform UI with a username that has the admin role or is a member of the project for which you are adding a Flink cluster.

2. Navigate to Analytics > project-name.

3. Click Create Flink Cluster.

4. Complete the cluster configuration screen. The following attributes are configurable.

General

Name: Type a name for the cluster. The name must conform to Kubernetes naming conventions.

Flink Version: Choose the Flink image version that this cluster will support. Streaming Data Platform assigns Flink jobs to a cluster for processing based on a matching image version number. This means the following:

- To support multiple Flink versions in the project, you need multiple Flink clusters, each configured with a different Flink image.

- If all applications in the project use the same Flink version, you are limited to one Flink cluster for processing, unless the project developers use custom labels for scheduling.

See the introduction to this procedure for more information.

Task Manager

Number of Replicas: Enter the number of Apache Flink Task Managers you want in the cluster. A standalone Flink cluster consists of at least one JobManager (the master process) and one or more Task Managers (worker processes) that run on one or more machines. You can configure multiple Task Managers here.

Number of Task Slots: Enter the number of task slots per Task Manager (at least one). Each Task Manager is a JVM process that can execute one or more subtasks in separate threads. The number of task slots controls how many tasks a Task Manager accepts.

Memory: Configure memory in MB.

Heap: Configure heap in MB.

CPU: Configure the number of cores to assign for processing in this Flink cluster.

Local Storage

Volume Size: Configure the local storage size.

Note: For more information about Flink Task Managers and task slots, see the Apache Flink documentation.
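As a quick arithmetic check of how replicas and task slots combine (the values below are illustrative only, not recommendations), the cluster's total parallel capacity is the product of the two:

```shell
# Illustrative values only, not recommendations.
replicas=3       # Task Managers in the cluster
task_slots=4     # task slots per Task Manager

# Total task slots available for Flink jobs in this cluster:
total_slots=$(( replicas * task_slots ))
echo "total task slots: $total_slots"
```

A Flink job's maximum parallelism on this cluster is bounded by that total.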

5. Click Save.

In the Flink Cluster view that appears, the cluster State is initially Deploying. After a few seconds, the State changes to Ready.

Results

The cluster is now ready for project members to create applications and upload related artifacts.

Edit Flink cluster attributes

You can change the number of replicas (Apache Flink Task Managers) in a Flink cluster.

Procedure

1. Log in to the Streaming Data Platform UI with a username that has the admin role or is a member of the cluster's project.

2. Click Analytics > project-name > Flink Clusters.

3. Locate the cluster and click Edit in the row's Action column.

4. Edit the Number of Replicas. This is the only Flink Cluster attribute that you can change on the UI.

5. Click Save.


Delete a cluster

You can delete a cluster and then define a new one within a project without having to remove any applications.

Before you begin

- Check for impact on applications that are currently running in the cluster.

About this task

Deleting a cluster does not delete any Flink applications associated with that cluster. If you delete the cluster while applications are in a Running state, processing stops for those applications and their state changes to Scheduling. When the cluster is replaced, the Running state resumes.

Procedure

1. Log in to the Streaming Data Platform UI with a username that has the admin role or is a member of the cluster's project.

2. Navigate to Analytics > project-name.

3. On the project dashboard, click the Flink Clusters tab.

4. Identify the row for the cluster you want to delete, and click Delete in the Action column.

A popup menu presents two buttons: Delete and Cancel.

5. Confirm that you want to delete the cluster by clicking Delete.

Click Cancel to keep the cluster.

What's next with projects

A project is created. This section describes what happens next.

Depending on their role in the organization, users have various interests and responsibilities toward projects.

- Administrators maintain the project's member list. A project team usually consists of developers and data analysts. Platform administrators have access to all projects by default.

- Administrators should also monitor resources associated with application processing and stream storage. They may also monitor stream ingestion and scaling.

- Developers typically upload their application artifacts, choose or create the required streams associated with the project, and run and monitor applications. They may monitor streams as well.

- Data analysts may run and monitor applications. They may also need to monitor or analyze metrics for the streams in the project.

More information

The Dell EMC Streaming Data Platform Developer's Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2020/01/docu96951.pdf describes how to add Flink applications to Streaming Data Platform, associate streams with applications, and start, stop, restart, and monitor applications. It also describes how to use the embedded Flink UI.

Monitor Health on page 107 and Use Grafana Dashboards on page 113 describe administrative tasks for ensuring that adequate storage and processing resources are available to handle stream volume and analytic jobs.


Manage scopes and streams

This section defines Pravega scopes and streams and describes administrative tasks for managing them.

About scopes and streams

Scopes and streams are Pravega constructs that administrators must define and configure in Streaming Data Platform.

About scopes

A Pravega scope provides a namespace for a collection of streams. A stream's full name consists of scope-name/stream-name. Therefore, a stream name must be unique within its scope.

Operationally in Streaming Data Platform, a scope identifies a Kubernetes service-instance in addition to the Pravega scope. When you create a scope in Streaming Data Platform, a new service-instance of the same name is also created. The service-instance provides the means for access control to the scope and its streams through Keycloak.

About streams

A Pravega stream is an unbounded stream of bytes or stream of events. Pravega writer APIs write the streaming data to the Pravega store.

Before running applications that write or read a stream in Streaming Data Platform, the stream must be defined and configured in the system. These functions are accomplished by creating a stream in the UI. Users with the platform_admin role or project members can create streams in the UI.

Creating scopes and streams

The Streaming Data Platform UI supports two ways to create new scopes and streams:

- Create scopes and streams in the Pravega section of the UI:

  - Users with the platform_admin role can create new scopes. Each scope has a list of members (users with access rights to the scope) associated with it.

  - Users with the platform_admin role can create new streams within these scopes. The configured scope members have access rights to all of the streams in the scope.

  - Scope members who are also project members can then select these predefined streams when defining applications. The selection occurs in the Analytics section of the UI.

  - Streams defined this way may be referenced by applications in a project.

- Create scopes and streams in the Analytics section of the UI:

  - Users with the platform_admin role create projects. The system automatically creates a new scope for each new project. The scope name is the same as the project name. Scopes created this way automatically appear in the list of scopes in the Pravega section.

  - Within a project, a platform_admin adds members to the project and creates Flink clusters. Then project members define applications and associate streams with the applications. They can select existing streams or define new ones for the project. Streams defined this way automatically appear in the list of streams under the project scope in the Pravega section.

  - By definition, streams defined this way (within a project-specific scope) can only be referenced by applications in that project.


Stream access rules

Streams are protected resources. Access to streams is protected by scope membership or project membership, as follows:

- Scope membership — If the scope was created in the Pravega section of the UI, it has a list of members associated with it. Only those members and all platform_admin users can access the streams in the scope.

- Project membership — If the scope was created under a project and application in the Analytics section of the UI, it has project members associated with it. Only project members and admin users can access the streams in the scope.

By access, we mean permission to perform the following activities:

- Adding a stream to an application in the Analytics section of the UI

- Viewing stream metrics on dashboards

- Editing stream attributes, deleting a stream, or defining new streams

- Adding owners to the scope or deleting a scope

Create a scope independent of projects

Use this procedure to define a scope that is not associated with an analytic project.

Before you begin

Log in to Streaming Data Platform as a user with the platform_admin role.

Procedure

1. Click the Pravega icon in the banner.

A list of existing scopes appears.

2. Click Create Scope.

3. In the Name field, type a name that conforms to Kubernetes naming conventions.

4. Click Save.

The new scope appears in the Pravega Scopes table. The new entry includes the following available actions:

- Edit — Use this action to add or remove members of the scope.

- Delete — Use this action to remove the Pravega scope and its service-instance from the system.

Add or remove scope members

Use this procedure to edit the list of members associated with a scope.

Before you begin

The username that will be added to the scope must already exist as a Streaming Data Platform username.

About this task

Scope members are non-administrative users who have access rights to the scope. This procedure applies only to scopes that are independent of projects. Access to project scopes is controlled by project membership.


Procedure

1. Log in to the Streaming Data Platform UI with a username that has admin role.

2. Click the Pravega icon.

3. To add a member to the scope, locate the scope in the list, click Edit in its row, and provide the username.

Scopes that are associated with projects do not have an Edit action.

4. To remove a member from the scope, locate the username in the list and click Remove in its row.

Create a stream independent of projects

Use this procedure to define a stream that is not associated with an analytic project.

Procedure

1. Log in to the Streaming Data Platform UI with a username that has admin role.

2. Click the Pravega icon in the banner.

A list of existing scopes appears.

3. Click a scope.

A list of existing streams in that scope appears.

4. Click Create Stream.

5. Complete the configuration screen as described in Stream configuration attributes on page 97.

Red boxes indicate errors.

6. When there are no red boxes, click Save.

The new stream appears in the Pravega Streams table. The new entry includes the following available actions:

- Edit — Use this action to change the stream configuration.

- Delete — Use this action to remove the stream from the scope.

Stream configuration attributes

The following tables describe stream configuration attributes, including segment scaling attributes and retention policy.

General

Name: Identifies the stream. The name must be unique within the scope and conform to Kubernetes naming conventions. The stream's identity is scope-name/stream-name.

Scope: The scope field is preset based on the scope you selected on the previous screen and cannot be changed.


Segment Scaling

A stream is divided into segments for processing efficiency. Segment scaling controls the number of segments used to process a stream.

There are two scaling types:

- Dynamic — With dynamic scaling, the system determines when to split and merge segments for optimal performance. Choose dynamic scaling if you expect the incoming data flow to vary significantly over time. This option lets the system automatically create additional segments when data flow increases and decrease the number of segments when the data flow slows down.

- Static — In static scaling, the number of segments is always the configured value. Choose static scaling if you expect a uniform incoming data flow.

You can edit any of the segment scaling attributes at any time. It takes a few minutes for changes to affect segment processing. Scaling is based on recent averages over various time spans, with cool-down periods built in.

Dynamic scaling

Trigger: Choose one of the following as the trigger for scaling action:

- Incoming Data Rate — Looks at incoming bytes to determine when segments need splitting or merging.

- Incoming Event Rate — Looks at incoming events to determine when segments need splitting or merging.

Minimum number of segments: The minimum number of segments to maintain for the stream.

Segment Target Rate: Sets a target processing rate for each segment in the stream.

- When the incoming rate for a segment consistently exceeds the specified target, the segment is considered hot, and it is split into multiple segments.

- When the incoming rate for a segment is consistently lower than the specified target, the segment is considered cold, and it is merged with its neighbor.

Specify the rate as an integer. The unit of measure is determined by the trigger choice:

- MB/sec when Trigger is Incoming Data Rate. Typical segment target rates are between 20 and 100 MB/sec. You can refine your target rate after performance testing.

- Events/sec when Trigger is Incoming Event Rate. Settings depend on the size of your events, calculated with the MB/sec guidelines above in mind.

To determine an optimal segment target rate (either MB/sec or events/sec), consider the needs of the Pravega writer and reader applications:

- For writers, you can start with a setting and watch latency metrics to make adjustments.

- For readers, consider how fast an individual reader thread can process the events in a single stream. If individual readers are slow and you need many of them to work concurrently, you want enough segments so that each reader can own a segment. In this case, lower the segment target rate, basing it on the reader rate and not on the capability of Pravega. Be aware that the actual rate in a segment may exceed the target rate by 50% in the worst case.

Scaling Factor: Specifies how many colder segments to create when splitting a hot segment. The scaling factor should be 2 in nearly all cases. The only exception is if the event rate can increase 4 times or more in 10 minutes; in that case, a scaling factor of 4 might work better. Enter a value higher than 2 only after performance testing shows problems.

Static scaling

Number of segments: Sets the number of segments for the stream. The number of segments used for processing the stream does not change over time, unless you edit this attribute. The value can be increased or decreased at any time. We recommend starting with 1 segment and increasing only when the segment write rate is too high.

Retention Policy

The toggle button at the beginning of the Retention Policy section turns retention policy On or Off. It is Off by default.

- Off — (Default) The system retains stream data indefinitely.

- On — The system discards data from the stream automatically, based on either time or size.

Retention Time: Set the Days attribute to the number of days to retain data. Stream data older than Days is discarded.

Retention Size: Set the MBytes attribute to the number of MBytes to retain. The remainder at the older end of the stream is discarded.
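As a worked example of size-based retention (the ingest rate is hypothetical, not a recommendation), the MBytes setting needed to keep roughly one week of data can be estimated from the average ingest rate:

```shell
# Hypothetical sizing: how large a size-based retention policy would need
# to be to hold roughly 7 days of data at a 10 MB/sec average ingest rate.
rate_mb_sec=10
days=7

mbytes=$(( rate_mb_sec * 86400 * days ))   # 86400 seconds per day
echo "retention size: $mbytes MBytes"
```

If the actual ingest rate varies, size the policy for the expected peak sustained rate rather than the average.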

Start and stop stream ingestion

Stream ingestion is controlled by native Pravega applications or Flink applications.

The Streaming Data Platform UI creates and deletes scope and stream entities, and monitors various aspects of streams. The UI does not control stream ingestion.

Monitor stream ingestion

You can monitor performance of stream ingestion and storage statistics using the Pravega stream page in the Streaming Data Platform UI.

Procedure

1. Log in to the Streaming Data Platform UI as a user with the admin role or as a member of the scope or project whose streams you want to monitor.


2. Navigate to Pravega > <scope-name> > <stream-name>.

This page shows:

- Ingestion rates

- General stream parameter settings

- Segment heat charts, which show segments that are hotter or colder than the current trigger rate for segment splits. For streams with a fixed scaling policy, the colors on the heat chart can indicate ingestion rates. The redder the segment, the higher the ingestion rate.

Manage users

Administrators manage appropriate user access to Streaming Data Platform and data.

Summary of ways to add a new user

The method for adding a new user depends on whether UAA federation is enabled. UAA federation is required in production deployments.

If UAA federation is enabled, choose one of the following ways to add a new user:

- On the Pivotal UI, use the UAA page.

- On the command line, use the UAA CLI (uaac).

- If UAA is configured to use an LDAP provider, add the new user into the LDAP database.

If UAA federation is not enabled, different actions are required to provide access to the UI and the kubectl command line:

- For UI access, add a local user to Keycloak using the Keycloak dashboard.

- For kubectl command line access, add the new local user with the kcadm command line tool.

Add a new user into UAA using the Pivotal UI

Cluster administrators can create new usernames in UAA.

Procedure

1. Log on to the Pivotal UI as a user with the uaa_admin role.

2. In the left column, click UAA.

3. Click Add User and complete the form, including the creation of an initial password.

Note: The user name must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 86.

Note: Users can change their passwords later.

Manage Projects, Scopes, and Users

100 Dell EMC Streaming Data Platform Installation and Administration Guide

Page 101: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

4. To verify the new user, click Register or Add at the end of the form.

You are immediately logged in and authenticated as the new user. A message appears asking you to see your System Administrator for authorization (role) assignments.

a. Enter the Streaming Data Platform UI endpoint in your browser.

b. On the login screen that appears, click the UAA button.

c. Log in using the new username and password.

d. Verify that a message appears stating that the user is authenticated and should see the administrator for permissions to view data.

e. Click the User icon in the upper right and choose Logout.

Results

The new username exists in the PKS UAA registry. However, an admin can see the new user on the Keycloak dashboard.

The new user can authenticate but does not have any authorizations to view or perform any actions in the platform.

Add a new user into UAA on the command line

With UAA federation, administrators can add new UAA users on the command line using the UAA CLI (uaac).

Before you begin

- You must have downloaded the UAA CLI.

- You must have cluster-admin credentials.

Procedure

1. Log in to kubectl. See Log in to kubectl for cluster-admins on page 66.

2. Create a new UAA user.

Use the uaac user add command to create a new username. The user name must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 86.

For example:

uaac user add lab-admin --emails [email protected] --password password

The new user is visible in the UAA registry.

3. To assign roles on the command line, see Assign roles using uaac on page 104.

Add a new local user on the Keycloak UI

Without UAA federation, administrators must add users who need access to the Streaming Data Platform UI into Keycloak. Use the cluster's Keycloak dashboard.

Procedure

1. In a browser window, go to the Keycloak endpoint in the Streaming Data Platform cluster.


If the Streaming Data Platform UI is open, you can prepend keycloak. to the UI endpoint. For example, http://keycloak.sdp.lab.myserver.com. Otherwise, see Obtain connection URLs on page 64.
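The prepend step can be sketched in shell. The hostname below is the guide's example endpoint, not a real server:

```shell
# Example hostname from the guide; substitute your own cluster's UI endpoint.
ui_host="sdp.lab.myserver.com"

# Prepend "keycloak." to the UI hostname to form the Keycloak endpoint.
keycloak_endpoint="http://keycloak.${ui_host}"
echo "$keycloak_endpoint"
```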

2. Click Administration Console.

3. Log in using the Keycloak administrator username (admin) and password.

See Obtain default admin credentials on page 72.

4. Click Manage > Users.

5. On the Users screen, click Add User on the right.

6. Complete the form.

Note: The user name must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 86.

7. Optionally click the Credentials tab to create a simple initial password for the new user.

Create a new password. Keep Temporary enabled to prompt the user to change the password on the next login.

8. To authorize the new user to perform actions and see data, an admin can make the user a member of projects or scopes.

Add a new local user on the Keycloak command line

Without UAA federation, administrators must add users who need command line access to the Streaming Data Platform Kubernetes cluster into Keycloak. Use the Keycloak CLI (kcadm).

Procedure

1. Log in to kubectl. See Log in to kubectl for cluster-admins on page 66.

2. Authenticate to Keycloak as an administrator:

kcadm.sh config credentials --server $CLUSTER_KEYCLOAK/auth --realm <realm-name> --user <user-name> --password <password>

where:

- $CLUSTER_KEYCLOAK is set to the endpoint for the Keycloak instance running in the Streaming Data Platform cluster.

- <realm-name> is the main Streaming Data Platform realm. The default value used in the installer is nautilus.

- <user-name> is a Streaming Data Platform user with the admin role.

- <password> is the password for the admin user.

For example:

kcadm.sh config credentials --server $CLUSTER_KEYCLOAK/auth --realm nautilus --user nautilus --password password

3. Use the kcadm tool to create a new user.


Enter the following to see the syntax of commands for this tool.

kcadm.sh create

Note: The user name must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 86.

Note: Users can change their passwords later.

The new user will be visible in Keycloak.

4. To assign roles, see Assign roles using kcadm on page 105.

Assign roles

Administrators assign authorization roles to users through actions on the Streaming Data Platform UI or on the command line.

A new username can authenticate immediately, but needs administrator action to gain authorizations to view or perform any actions in Streaming Data Platform. The following summary shows how an administrator assigns roles to users.

If UAA federation is enabled, choose either of the following ways to assign roles. Either way provides authorizations that are effective for both UI and kubectl access:

- On the Streaming Data Platform UI, click the Add member buttons on Project or Scope screens, and then add the username to the member list. The user will have access to project or scope resources on the UI and the command line.

- Use uaac commands to assign project or scope roles.

If UAA federation is not enabled, different actions are required to provide authorizations that are effective for the UI and for the kubectl command line:

- For UI access, use the Add member buttons on the Project and Scope screens in the Streaming Data Platform UI.

- For kubectl command line access, use the kcadm.sh tool to assign roles to users.

Assign roles on the Streaming Data Platform UI

When UAA federation is enabled, this set of procedures provides access to projects and scopes on both the UI and the kubectl command line. When UAA federation is not enabled, this set of procedures provides access to projects and scopes on the UI only.

- To view or develop applications or streams in an existing project, a user needs project membership. See Add or remove project members on page 90.

- To view or develop applications or streams for a new project, create the new project, and then add members to it. See Create a project on page 87.

- To view or develop streams in an existing scope that is independent of any project, a user needs scope membership. See Add or remove scope members on page 96.

- To view or develop streams in a new scope that is independent of any project, create the new scope, and then add members to it. See Create a scope independent of projects on page 96.

- To view and administer all projects and streams, a user needs the administrator role, which cannot be assigned on the UI. See Assign admin role on page 106.

Why not use the Keycloak UI for role assignments?

Although it is possible for administrators to perform role assignments with the Keycloak UI, that method is not appropriate for the following reasons:

- Role assignments performed on the Keycloak interface enable authorizations for activities on the Streaming Data Platform UI only. The authorizations do not extend to actions beyond the UI, such as using kubectl on the cluster or running applications that use the API.

- Role assignments performed as described above, using the Streaming Data Platform UI or, in the case of authorizing a new platform administrator, kubectl, apply to all interactions with the platform. This provides a more consistent experience for administrators and developers.

- The procedures listed above are the least error-prone and easiest authorization methods.

Administrators may use the Keycloak UI to verify role assignments and troubleshoot authorization problems.

Assign roles using uaac

When UAA federation is enabled, this procedure provides an alternate method for assigning roles to users. The assigned roles provide access to projects and scopes on the UI and on the kubectl command line.

Procedure

1. Log in to kubectl. See Log in to kubectl for cluster-admins on page 66.

2. Identify the rolename to assign. Rolenames are aligned with project and scope names.

You can get currently defined rolenames using kubectl in another window:

kubectl get rolenames

3. Assign a role to an existing username.

uaac member add <rolename> <username>


For example, the following command gives the user joeg access to the project named test1.

uaac member add project-test1 joeg

Assign roles using kcadm

Use this procedure when UAA federation is not enabled, to give users kubectl command-line access to projects and scopes.

Before you begin

l You must log into PKS with a username that has cluster-admin role.

l The user name that you are assigning a role to must already exist in Keycloak.

Procedure

1. Log in to kubectl. See Log in to kubectl for cluster-admins on page 66.

2. Authenticate to Keycloak as an administrator:

kcadm.sh config credentials --server $CLUSTER_KEYCLOAK/auth --realm <realm-name> --user <user-name> --password <password>

where:

l $CLUSTER_KEYCLOAK is set to the endpoint for the Keycloak instance running in the Streaming Data Platform cluster.

l <realm-name> is nautilus.

l <user-name> is a Streaming Data Platform user with admin role.

l <password> is the password for the admin user.

For example:

kcadm.sh config credentials --server $CLUSTER_KEYCLOAK/auth --realm nautilus --user nautilus --password password

3. Identify the rolename to assign. Rolenames are aligned with project and scope names.

You can list currently defined rolenames in another window using kubectl:

kubectl get rolenames

4. Assign a role to a user name:

kcadm.sh add-roles --realm <realm-name> --rolename <rolename> --uusername <user-name>

where:

l <realm-name> is nautilus.

l <rolename> is the role being assigned.


l <user-name> is the user that you are assigning the role to.

For example, the following command gives the user joeg permission to access resources for the project named testproject:

kcadm.sh add-roles --realm nautilus --rolename project-testproject --uusername joeg

Assign admin role

An administrator can assign the admin role to another user on the command line using the Keycloak CLI.

Before you begin

l You must log into PKS with a username that gets credentials with the Streaming Data Platform admin role.

l The user name that you are assigning a role to must already exist in Keycloak or UAA. See Summary of ways to add a new user on page 100.

About this task

WARNING Users with admin role have permission to view any data and perform any activity within the Streaming Data Platform cluster. Assign this role with care.

Procedure

1. Follow the instructions for Assign roles using kcadm on page 105. In the last step, use admin as the rolename.

For example:

kcadm.sh add-roles --realm nautilus --rolename admin --uusername lab-admin

2. To verify, ensure that the new admin can view all projects and scopes on the Streaming Data Platform UI.


CHAPTER 7

Monitor Health

Administrators can monitor platform health as well as job and stream performance.

l Monitor licensing................................................................................................................. 108
l Monitor or change SRS registration status.......................................................................... 109
l Monitor and manage events................................................................................................. 110
l Run health-check.................................................................................................................. 111
l Monitor Pravega health........................................................................................................ 112
l Monitor stream health.......................................................................................................... 112
l Monitor application health.................................................................................................... 112
l Logging................................................................................................................................. 112


Monitor licensing

The Streaming Data Platform user interface shows licensing status.

To view the status of your Streaming Data Platform licenses, log onto the UI and navigate to Settings > License.

Figure 6 Licensing information in the UI

Note: If you installed the product with an evaluation license, no licenses are listed.

The following table describes information on the License screen.

Section Field Name Description

Header Entitlement SWID The Streaming Data Platform product Software ID.

Instance SWID The Secure Remote Services (SRS) activation Software ID.

Body Name Two types of licenses are tracked within the Streaming Data Platform product license:

l Streaming Flink Cores — Tracks the number of Kubernetes cores dedicated to the Flink analytic engine.

l Streaming Platform Cores — Tracks the number of Kubernetes cores dedicated to other processing within the platform.

Note: Kubernetes cores are not necessarily the same as physical cores.

Type Licenses for Streaming Data Platform are subscription licenses.

Start Date The date the license was obtained.

End Date The date the subscription ends. On this date, you will begin to receive warning events about an expired license. Contact Dell EMC to renew a subscription.

Grace Period The date the grace period ends. On this date, you will begin to see only critical events collected on the events screen. The product will not shut down. Dell EMC will contact you about subscription renewal.

Quantity Shows the number of cores in your subscription.


Section Field Name Description

Usage Tracks usage of the cores in each category. In the Flink Cores category, usage thresholds may apply, above which you may need to increase the number of cores in the subscription.

Note: The product does not shut down because of an expired subscription. However, if you upload an expired license or alter the license file, the signature is invalidated and your product will not be licensed.

Note: Be careful when performing network transfers of the license file with tools such as FTP. To avoid any signature changes when using FTP, use the binary option.

Monitor or change SRS registration status

You can monitor or change the status of the SRS registration on the UI.

The UI provides a way to disable SRS registration and connectivity to Dell EMC, which may be convenient for maintenance activities.

To view or change SRS status, log onto the Streaming Data Platform UI and navigate to System > SRS Gateway.

Figure 7 SRS Gateway information in the UI

Field Description

FQDN/IP Fully qualified domain name or IP of the SRS Gateway.

Port Configured port for communication with the SRS Gateway.

Instance SWID The Software ID of the SRS license.

Product The product name that is licensed for connection with SRS.

Registered Whether the Dell EMC backend systems have registered this SRS.

Test Dial Home Results The results of the last dial home test.

Test Dial Home Time The time of the last dial home test.


Field Description

Actions l Test — Test the dial home feature. Dial home connects the SRS Gateway at the customer site with the Dell EMC support systems and allows support teams to connect remotely.

l Disable — Disable connectivity to the SRS Gateway. The events continue to queue and will be delivered when the feature is enabled.

l Enable — Enable SRS Gateway connectivity.

Monitor and manage events

The Streaming Data Platform UI displays collected events and provides convenient features for managing events.

To show collected events, log into the UI and navigate to System > Events. The events are messages collected from the Streaming Data Platform applications and their associated k8s resources.

Figure 8 Events list in the UI

Filtering messages by type

You can filter the events that appear in your view by type. Select a type in the Type dropdown box.


Acknowledging and managing events

You can mark an event as Acknowledged, which can help to separate events that you can safely ignore from those that need action. To acknowledge an event, click the Acknowledge button.

You can filter events by whether or not they are acknowledged by making a selection in the Acknowledged dropdown box.

By combining the type and acknowledged filters, you can, for example, display only critical events that are unacknowledged.

Searching events by text strings

Use the Search Text box to filter by a text search. The search operates on the Component, App Name, and Reason fields. Wildcards are not supported.

Run health-check

This script checks the state of various components in the Streaming Data Platform cluster. It may be run at any time after Streaming Data Platform is installed.

Procedure

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script.

$ ./scripts/health-check.py

The output looks similar to the following:

Starting health check...
- Checking pod health
  - Checking pod health for namespace : nautilus-system
    - Checking pod/container state
    - All pods/containers seem healthy
    - Checking container restarts
    - No containers have high restart counts
  - Checking pod health for namespace : longevity-0
    - Checking pod/container state
    - All pods/containers seem healthy
    - Checking container restarts
    - No containers have high restart counts
  - Checking pod health for namespace : catalog
    - Checking pod/container state
    - All pods/containers seem healthy
    - Checking container restarts
    - No containers have high restart counts
  - Checking pod health for namespace : nautilus-pravega
    - Checking pod/container state
    - All pods/containers seem healthy
    - Checking container restarts
    - No containers have high restart counts
- Checking pravega cluster health
  - Pravega-cluster state is healthy
- Check for failed helm deployments
  - No failed helm deployments were detected
- Check Tier2
  - Tier2 check succeeded


Monitor Pravega health

The dashboards provide information about Pravega operations in the platform.

Monitor for the following issues concerning Pravega:

Network issues

The first dashboard on the Streaming Data Platform Dashboards page shows current streaming throughput. A slow or stopped throughput without an obvious reason might indicate a network issue.

Adequate memory

The Grafana Pravega System dashboard shows various memory-related metrics. Note, however, that none of the Grafana metrics data is kept longer than one month.

Monitor stream health

Monitor streams for the following issues.

Hot streams

Monitor the stream-specific dashboards for unusual fluctuations.

Pravega storage

In Grafana, on the Pravega Alerts dashboard, there are Pravega metrics that identify problems with Pravega interacting with storage.

Monitor application health

Monitor Flink and Pravega application status.

Logging

Streaming Data Platform generates all of the standard logs in native Kubernetes.

Users with the cluster-admin role on the Streaming Data Platform cluster can access all of the system logs using native Kubernetes commands.

PKS offers log accumulation and monitoring tools. See the PKS documentation here: https://docs.pivotal.io/pks/1-5/monitor.html.


CHAPTER 8

Use Grafana Dashboards

Grafana deployed as part of Streaming Data Platform comes with a preinstalled plugin that monitors Pravega metrics. This chapter describes how to access and use the Grafana dashboards.

l Grafana dashboards overview............................................................................................... 114
l Connect to the Grafana UI.................................................................................................... 114
l Retention policy and time range............................................................................................ 116
l Pravega System dashboard.................................................................................................. 117
l Pravega Operation Dashboard.............................................................................................. 119
l Pravega Scope dashboard.................................................................................................... 124
l Pravega Stream dashboard.................................................................................................. 125
l Pravega Alerts dashboard.................................................................................................... 129
l Custom queries and dashboards.......................................................................................... 129
l InfluxDB Data....................................................................................................................... 130


Grafana dashboards overview

The Grafana dashboards show metrics about the operation and efficiency of Pravega.

A metrics stack consisting of InfluxDB and Grafana is deployed with Pravega in the same Kubernetes namespace (nautilus-pravega). InfluxDB is an open source database product. Grafana is an open source metrics visualization tool.

The InfluxDB instance deployed in Streaming Data Platform contains a preconfigured pravega database. The database is defined with four retention policies and a set of continuous queries to move aggregated data from the shorter retention policies to the longer ones.

Pravega reports metrics automatically and continuously into InfluxDB. Streaming Data Platform adds processes to continuously aggregate and delete the metrics according to the defined retention policies. The result is a self-managing database.
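The roll-up flow described above can be pictured as an InfluxQL continuous query. The following is only an illustrative sketch: the measurement name stream_write_bytes and the query name are hypothetical; only the pravega database and the retention policy names come from this guide, and the queries actually defined in the product may differ.

```sql
-- Illustrative sketch only: roll the 10-second points held in the two_hour
-- retention policy up into 1-minute averages stored in the one_day policy.
-- "stream_write_bytes" is a hypothetical measurement name.
CREATE CONTINUOUS QUERY "cq_roll_up_one_day" ON "pravega"
BEGIN
  SELECT mean("value") AS "value"
  INTO "one_day"."stream_write_bytes"
  FROM "two_hour"."stream_write_bytes"
  GROUP BY time(1m), *
END
```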

Grafana is installed into the cluster with a plugin application for Pravega. The plugin contains predefined dashboards that visualize the collected Pravega metrics. The predefined dashboards are:

Dashboard Description

Pravega Alerts Monitors the health of Pravega in the cluster.

Pravega System Dashboard Shows details about heap and non-heap memory, buffer memory, garbage collection memory, and threads.

Pravega Operation Dashboard Shows various operational latencies and read/write throughputs.

Pravega Scope Dashboard Shows scope total throughput rates, throughput by stream, and maximum per-segment rates.

Pravega Stream Dashboard Shows stream-specific throughput, segment metrics, and transaction metrics.

You may create additional customized dashboards using any of the metrics stored in InfluxDB.

Some of the Pravega metrics are shown in the Streaming Data Platform UI, on the main Dashboard and the Pravega Stream pages. Administrators can inspect the reported data in more detail on the Grafana dashboards. By monitoring the dashboards, administrators can identify developing storage and memory problems and help identify stream-related inefficiencies or drill into problems.

The dashboards are available only to users with admin role.

Connect to the Grafana UI

The Grafana dashboards are available to Streaming Data Platform users with the admin role.

Procedure

1. Choose one of the following ways to access the Grafana dashboards:

l If you are already logged onto the Streaming Data Platform UI as an admin, click the Dashboards icon and then click the Grafana Dashboards link:

Note: The link appears only for admin users.

l Use the Grafana endpoint URL in your browser. See Obtain connection URLs on page 64. On the login screen that appears, enter your Streaming Data Platform admin credentials.

The Grafana Home Dashboard appears. Under Installed Apps, notice that the Pravega Monitoring app is installed.


2. To switch to a different dashboard, click the title of the current dashboard in the upper left corner. A dropdown list of all available dashboards appears.

In the screen above, the current dashboard name is Home. Clicking Home shows the following:

Click a dashboard name to display that dashboard.

3. Most of the dashboards have controls, in the form of dropdown menus, that let you fine-tune the data shown.

For example, most dashboards have a Retention control that lets you choose the retention policy from which to pull the data.


Retention policy and time range

On the Pravega dashboards, the time range and retention policy settings work together to define the data that is displayed.

Time range

The time range control is a standard Grafana feature. In any dashboard banner, click the clock icon on the right side of the banner to display the time range choices. Click a range under Quick ranges to select it, or define your own range under Custom range.

Figure 9 Time range on Grafana dashboards

Retention

The retention control is specific to Streaming Data Platform. It selects the aggregation level of the data to display. The following table shows the internally defined retention policies and associated aggregation levels.

Retention policy Aggregation level Description

two_hour Original metrics reported by Pravega every 10 seconds

The original 10-second metrics are deleted after 2 hours. Use with time ranges that are between 10 seconds and 2 hours. If you want to examine metrics older than 2 hours, use one of the other retention choices.

one_day 1-minute periods, aggregated from the 10-second metrics.

The 1-minute aggregated metrics are deleted after 1 day. Use with time ranges that are between 1 minute and 1 day.

one_week 30-minute periods, aggregated from the 1-minute metrics.

The 30-minute metrics are deleted after 1 week. Use with time ranges that are between 30 minutes and 1 week.

one_month 3-hour periods, aggregated from the 30-minute metrics.

The 3-hour aggregated metrics are deleted after 1 month. Use with time ranges that are between 3 hours and 1 month.

Interactions between time range and retention

Some time range and retention combinations may not show any data. If the time range specified is less than the aggregation period in the retention choice, the combination results in no data. As examples:


l The two_hour retention choice shows data that exists in the database for a maximum of two hours. A time range of Last 12 hours can only show data for the last two hours.

l The one_week retention choice shows data in 30-minute periods. A time range of Last 5 minutes does not show any data. Any range of 30 minutes or less will not show any data. A time range of Last month can only show data for the last week.

l The one_month retention choice shows data in 3-hour periods. A time range of Last hour does not show any data. Any range of 3 hours or less does not show any data. A time range of Last year can only show data for the last month.
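The interaction rules above reduce to comparing the requested time range against the retention policy's aggregation period and maximum data age. The following Python sketch illustrates the table's logic; it is not code shipped with the product.

```python
from datetime import timedelta

# Aggregation period and maximum data age for each retention policy,
# taken from the retention table above.
POLICIES = {
    "two_hour":  (timedelta(seconds=10), timedelta(hours=2)),
    "one_day":   (timedelta(minutes=1),  timedelta(days=1)),
    "one_week":  (timedelta(minutes=30), timedelta(weeks=1)),
    "one_month": (timedelta(hours=3),    timedelta(days=30)),
}

def data_shown(policy: str, time_range: timedelta) -> bool:
    """Return False when the requested range is shorter than the
    policy's aggregation period, so no data points can appear."""
    period, _max_age = POLICIES[policy]
    return time_range > period

def visible_window(policy: str, time_range: timedelta) -> timedelta:
    """Return the portion of the requested range that can actually hold
    data, capped at the policy's maximum data age."""
    _period, max_age = POLICIES[policy]
    return min(time_range, max_age)
```

For example, data_shown("one_week", timedelta(minutes=5)) is False, matching the Last 5 minutes example above, and visible_window("two_hour", timedelta(hours=12)) is two hours, matching the Last 12 hours example.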

Pravega System dashboard

The Pravega System Dashboard shows the JVM metrics for Pravega controllers and segment stores, one host container at a time.

Controls

This dashboard contains the following controls.

host

Choose the reporting container.

retention

Choose a retention policy, which controls the aggregation periods of the displayed data.

retention aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 116 for more information.

Description

This dashboard has three sections that you can expand and contract using the dropdown arrows.

The Totals section shows the memory usage by the host JVM for heap and non-heap areas.

Note: Watch for Used or Committed memory approaching the Max memory. If this happens, you might need to tweak the Pravega deployment parameters. Either increase the memory per container or increase the number of component replicas, as your K8s environment permits.


The GC section of the dashboard shows garbage collector metrics.

The Threads section shows thread counts and states.


Pravega Operation Dashboard

The Pravega Operation Dashboard shows the operational metrics for the multiple moving parts involved in Pravega operations, including segment stores and Bookkeeper.

Controls

This dashboard contains the following controls.

host

Choose a specific segment store or choose All.

retention

Choose a retention policy, which controls the aggregation periods of the displayed data.

retention aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 116 for more information.

Description

This dashboard has six sections that you can expand and contract with dropdown arrows.


The Current Latency Stats section shows the current values of different levels of Read/Write latencies. The values in this section are color coded and turn red if their value goes above 50 ms.

Note: Monitoring for red values will quickly catch problems.

The Throughput Rates section shows the total throughput rates for Tier 1 (Bookkeepers) and Tier 2 (NFS) storage. For more information about Pravega tiered storage, see this section in the Pravega documentation. This section includes both the user-created streams and the system streams needed for Pravega operation.

The Segmentstore - Segment Latencies section reports Tier 1 read/write latencies and read/write error latencies.

The latency graphs show percentile groups, as follows:


Legend indicator Percentile

p0.1 10th percentile

p0.5 50th percentile

p0.9 90th percentile

p0.99 99th percentile

p0.999 99.9th percentile

p0.9999 99.99th percentile
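The legend notation converts mechanically to a percentile: strip the leading p and multiply by 100. A small Python helper illustrating the convention (not part of the product):

```python
def legend_to_percentile(indicator: str) -> float:
    """Convert a latency-graph legend indicator such as 'p0.99'
    into a percentile value, e.g. 'p0.99' -> 99.0."""
    if not indicator.startswith("p"):
        raise ValueError(f"unexpected legend indicator: {indicator!r}")
    # Round to suppress floating-point noise, e.g. 0.999 * 100.
    return round(float(indicator[1:]) * 100, 4)
```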

The Segmentstore - Storage Latencies section shows Read/Write latencies and errors for Tier 2 storage.

Note: Monitoring these metrics can provide hints about communication problems with Tier 2 storage.


The Segmentstore - Container Latencies section shows metrics for Pravega segment container entities (not to be confused with the Docker containers running the segment stores).


The Segmentstore - Bookkeeper section contains Bookkeeper client metrics. The native Bookkeeper metrics are not available here.


Pravega Scope dashboard

The Pravega Scope dashboard shows the total throughput rates and maximum per-segment rates for user streams in a Pravega scope.

Controls

This dashboard contains the following controls.

scope

Choose the scope name that you want to see metrics for.

stream type

Choose to show metrics for system streams, user-defined streams, or all streams.

retention

The retention choice defines the aggregation level of the displayed data. The default retention is two_hour, which shows data in 10-second intervals.

Also choose a compatible time range.

See Retention policy and time range on page 116 for more information.

Description

This dashboard has three sections that you can expand and contract using the dropdown arrows.

l Write bytes

l Read bytes

l Write events

All three sections are organized in a similar way.

The panels on the left show individual throughput rates for each stream in the scope, plus a total for the scope.


Note: These charts show which streams have high load and which ones do not have any load.

The panels on the right show the write or read rate for the segment with the highest rate within the scope.

Note: If you see something alarming at the scope level, you can drill down into the problem on the Pravega Stream dashboard.

Pravega Stream dashboard

The Pravega Stream dashboard shows details about specific streams.

Controls

This dashboard contains the following controls.


stream

Choose a stream name within the selected scope. When the scope selection changes, the stream dropdown menu is repopulated with the appropriate stream names.

stream type

Choose to show metrics for system streams, user-defined streams, or all streams.

scope

Choose a scope name.

retention

Choose a retention policy, which controls the aggregation periods of the displayed data.

retention aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 116 for more information.

Description

This dashboard contains a row of metrics followed by five sections that you can expand and contract using the dropdown arrows.

The row of metrics at the top shows the latest values available for this stream in the chosen retention policy. For example, if you choose one_month retention, the values can be as old as three hours because the data points are aggregated only every three hours for that retention policy.

The Segments section shows the number of segments, segment splits, and segment merges over time.

Note: The Pravega controller reports these metrics. When no changes are happening, the controller does not report metrics, and this could be reflected in the charts if no metrics are reported during the selected time period. You can always view the current metrics on the Stream page in the Streaming Data Platform UI. Those metrics are collected using the REST API rather than relying on reported metrics from the controller. Another advantage of the Streaming Data Platform UI's Stream page is the heat charts for stream segments. Those are not available in Grafana.


The following three sections appear next.

l Write Bytes

l Read Bytes

l Write Events

These sections are all organized in the same way. The panels on the left show totals for the stream. The panels on the right show maximum segment rates.

Note: Inspecting the maximum per-segment rate is complementary to using the heat charts in the Streaming Data Platform UI.


The Transactions section appears last. This section contains data only if the stream performs transactional writes.

Note: In the left panel, monitor the number of aborted transactions. Too many aborted transactions could indicate a networking problem or a problem in the business logic of the Flink or Pravega application.


Pravega Alerts dashboard

The Pravega Alerts dashboard reports critical and warning conditions from various checks on Pravega health.

Each chart includes an information icon in its upper left corner. Hover your cursor over the icon to view a description of the condition that the chart is monitoring.

Custom queries and dashboards

You can create custom queries or new dashboards using any data in InfluxDB.

Queries

You can explore the Pravega metrics available in InfluxDB by creating ad-hoc queries. This feature gives you a quick look at metrics without having to define an entire dashboard.

l Click the Explore icon on the left panel of the Grafana UI.

l For datasource, choose pravega-influxdb.

l Create your query against any measurement available in the database.

You cannot save these queries. To save a query, create a custom dashboard.
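As a starting point, an ad-hoc query against the per-segment write-rate measurement might look like the following (the Rate field and the scope, stream, and segment tags are described in the Calculated rates section later in this chapter; the scope and stream values are placeholders):

```sql
-- Per-segment write rate (bytes/s) for one stream over the last hour
SELECT "Rate"
FROM "segmentstore_segment_write_bytes"
WHERE "scope" = 'myscope' AND "stream" = 'mystream' AND time > now() - 1h
GROUP BY "segment"
```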

Custom Dashboards

You may create new, custom dashboards from any data available in the pravega-influxdb datasource. See the next section for an introduction to the metrics structure.

If you want to customize the predefined dashboards, we strongly recommend that you save the changes as custom dashboards rather than overwriting the original ones. You are logged in as a Grafana Editor, which enables you to edit and overwrite the dashboards.

Note: If you overwrite the original dashboards, your changes will be lost if the Pravega App plugin is updated in a subsequent Streaming Data Platform release.


InfluxDB Data

This section provides an overview of the metrics stored in InfluxDB.

Pravega metrics

Pravega metrics are stored in InfluxDB according to the naming conventions described in the MetricsNames.java file, with periods (.) replaced by underscores (_). For example, segmentstore.segment.write_bytes is stored as segmentstore_segment_write_bytes. All metrics are tagged with their host, which is the Pravega pod reporting the metric.
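The period-to-underscore convention can be reproduced with a simple transformation, which is handy when mapping a metric name from MetricsNames.java to its InfluxDB measurement name:

```shell
# Convert a Pravega metric name to its InfluxDB measurement name
# by replacing every period with an underscore.
metric="segmentstore.segment.write_bytes"
measurement=$(echo "$metric" | tr '.' '_')
echo "$measurement"   # prints: segmentstore_segment_write_bytes
```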

Some of the metrics are tagged with scope, stream, segment, or container (if applicable).

For more information about Pravega metrics, see Pravega documentation.

The original metrics reported by Pravega and described in the MetricsNames.java file are prefixed with pravega_. Most of the metrics shown on the Grafana dashboards do not have that prefix because they represent some kind of aggregation over the original Pravega metrics. For example, typical metrics used in the dashboards are rates that are calculated from the originally reported counts.

Calculated rates

In addition to the original Pravega metrics, the database contains some pre-calculated rates to enable faster InfluxDB queries for common questions.

Segment Read/Write rates are tagged with scope, stream, and segment. They are stored in the following measurements with the Rate field:

segmentstore_segment_read_bytes
segmentstore_segment_write_bytes
segmentstore_segment_write_events

Stream-level Read/Write rate aggregates are tagged with scope and stream and stored in the following:

segmentstore_stream_read_bytes
segmentstore_stream_write_bytes
segmentstore_stream_write_events

Global Read/Write rate aggregates over all segments, streams, and scopes are tagged with the segmentstore instance in the host tag. They are stored in the following:

segmentstore_global_read_bytes
segmentstore_global_write_bytes
segmentstore_global_write_events

Pravega Tier2 Read/Write rates are available as storage rates:

segmentstore_storage_read_bytes
segmentstore_storage_write_bytes


Bookkeeper client write rate is stored here:

segmentstore_bookkeeper_write_bytes

Transactional rates are available at the stream level, tagged with scope and stream. They are reported only if transactional writes are happening on the stream.

controller_transactions_aborted
controller_transactions_created
controller_transactions_committed

There are also two gauges for transactions:

controller_transactions_opened
controller_transactions_timedout
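The pre-calculated rates above can be queried directly. For example, a query against the global write-rate measurement (tagged with the segmentstore instance in the host tag, as described above) might look like this sketch:

```sql
-- Average global write throughput (bytes/s) per segmentstore pod,
-- in 5-minute buckets over the last 6 hours
SELECT MEAN("Rate")
FROM "segmentstore_global_write_bytes"
WHERE time > now() - 6h
GROUP BY "host", time(5m)
```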


CHAPTER 9

Kubernetes Resources

This chapter describes the Kubernetes resources in the Streaming Data Platform cluster. Understanding the architecture of Kubernetes components deployed in the cluster can help with troubleshooting and support tasks.

l Namespaces.........................................................................................................................134
l Components in the nautilus-system namespace...................................................................134
l Components in the nautilus-pravega namespace................................................................. 135
l Components in project namespaces.....................................................................................135


Namespaces

A Streaming Data Platform cluster contains the following namespaces.

kube-system

Contains the default system software.

pks-system

Contains the PKS system software.

catalog

Contains the service catalog for the cluster.

nautilus-system

Contains Streaming Data Platform software.

nautilus-pravega

Contains the Pravega store and Pravega software.

Project-specific namespaces

Each user-created project has its own namespace. The namespace name is project-<projectname>.

Components in the nautilus-system namespace

The nautilus-system namespace contains components to support Streaming Data Platform functions.

Components in nautilus-system

Subsystem       Name                     Description
Core            Cert Manager             Provisions and manages TLS certs
Core            External DNS             Dynamically registers DNS names for platform services and ingress connections
Core            Metrics Operator         Manages the InfluxDB and Grafana metrics stack
Core            Nautilus UI              Provides the web UI for managing the platform
Core            NFS Client Provisioner   Provisions persistent volumes within the configured NFS server
Core            NGINX Ingress            Ingress controller and load balancer
Security        Keycloak                 Provides identity and access management for applications and services
Security        Keycloak-webhook         Injects Keycloak credentials into relevant pods
Security        Keycloak-broker          Handles Keycloak roles and Keycloak clients
Flink services  Flink-operator           Runs the Flink engine in the cluster
Flink services  Project-operator         Manages projects in Flink
Serviceability  DECKS                    Manages SRS registration, call-home, and licensing
Serviceability  KAHM                     Provides event and health management services
Serviceability  Monitoring               Provides monitoring of resource usage

CRDs in nautilus-system

The nautilus-system namespace defines the following CRDs. Their operators are included in the list of components above.

l ProjectSystem

l Project

l FlinkCluster

l FlinkApplication

l FlinkImage

Components in the nautilus-pravega namespace

The nautilus-pravega namespace contains components to support Pravega functions within Streaming Data Platform.

Components in nautilus-pravega

Component name      Description
pravega-operator    Manages Pravega clusters.
pravega-cluster     Pravega software.
pravega-broker      Provisions Pravega scopes.
zookeeper-operator  Manages ZooKeeper clusters for segments.

CRDs in nautilus-pravega

l PravegaCluster

l ZookeeperCluster

l InfluxDB

l Grafana

Components in project namespaces

Each analytic project has a dedicated Kubernetes namespace.

A project's namespace name is project-<projectname>. For example, a project that you create with the name test has a namespace name of project-test.
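The naming convention above can be sketched as a small helper (the function name is an illustrative assumption, not part of the product):

```shell
# Derive the Kubernetes namespace for a Streaming Data Platform project,
# following the project-<projectname> convention described above.
project_namespace() {
  echo "project-$1"
}

project_namespace test   # prints: project-test
```

You could then inspect a project's resources with a command such as `kubectl get all -n "$(project_namespace test)"`.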

For additional information about project namespaces, see Manage projects on page 86.


CHAPTER 10

Authentication and authorization

This chapter provides high-level background information about authentication and authorization in Streaming Data Platform. This information is intended for troubleshooting and high-level understanding of the security mechanisms that control user access to the platform and its resources. For detailed administrator procedures, see the following:

l Post-install Configuration and Maintenance on page 71 includes procedures for UAA federation and LDAP integration.

l Manage Projects, Scopes, and Users on page 85 includes procedures for adding new users and assigning roles.

l Authentication overview...................................................................................................... 138
l Authorization ....................................................................................................................... 141
l Application authentication and authorization....................................................................... 142
l Additional Keycloak services ............................................................................................... 142


Authentication overview

Streaming Data Platform uses PKS User Account and Authentication (UAA) services for authentication. PKS UAA federation coordinates lower level authentication services provided by Keycloak and Kubernetes native mechanisms.

PKS protects access to the cluster

Streaming Data Platform runs in the Pivotal Container Service (PKS) environment. PKS provides top-level protection of clusters in its environment through its own authentication service. The Kubernetes cluster in which Streaming Data Platform is running is protected by PKS UAA.

UAA federation coordinates authentication

For production deployments, Streaming Data Platform requires federation with PKS UAA. The federation provides enhanced and coordinated authentication services and improves the user experience.

With UAA federation, all user authentication, for both the UI and the kubectl command line, is delegated to UAA. Streaming Data Platform uses the OIDC protocol to connect both Keycloak and the Kubernetes cluster to the UAA provider.

For development and test deployments, you may forego the UAA federation. For such deployments, administrators must create user names separately for access to the UI and the kubectl command line and must manually assign roles in both spheres.

Optional LDAP integrations

Integration with existing LDAP identity providers is supported through the UAA federation.

Keycloak handles external requests

Within the product, under the UAA level, Keycloak provides an open source Identity and Access Management solution. Both human users and applications require access to Streaming Data Platform from outside of the cluster.

Keycloak processes authentication requests originating external to the Kubernetes cluster.

Kubernetes handles internal requests

Security between cluster components is tightly integrated with Kubernetes native mechanisms. Authentication requests internal to the Kubernetes cluster are handled by Kubernetes authentication integration methods, such as impersonation. Internal requests include users accessing the cluster on the command line with kubectl as well as communication between platform components.

About UAA federation

With UAA federation, both Keycloak and the K8s cluster connect to UAA over OIDC. They both use UAA as the common authentication and authorization authority.

The following figure shows these relationships.


Figure 10 UAA federation

1. The Streaming Data Platform UI delegates authentication to Keycloak.

2. Keycloak is connected in a federation to UAA, over OIDC.

3. The UI backend forwards credentials to other services running in the cluster, such as Maven, Pravega, Grafana, and Flink.

4. The kubectl instance is also connected to UAA over OIDC.

5. The kubectl users are authenticated in UAA.

About Keycloak

Keycloak is open source middleware for identity and access management. Keycloak is a proven industry solution for use with Kubernetes clusters.

The Streaming Data Platform installation deploys and configures an instance of Keycloak in the Kubernetes cluster. In this documentation, the term Keycloak refers to the Keycloak instance running in the Streaming Data Platform Kubernetes cluster.

Keycloak performs the following functions for Streaming Data Platform:

l Authenticates and authorizes users — Keycloak handles requests from human users of the Streaming Data Platform user interface.

l Authenticates and authorizes clients — In Keycloak, clients are non-human requestors. This includes Streaming Data Platform backend processes and your custom applications. Keycloak secures these applications and services. Streaming Data Platform uses a Keycloak Client Adapter for OpenID to securely connect clients to Keycloak. Keycloak uses JSON web tokens (JWT) and OAuth2 to authenticate and authorize clients.

l Integrates with PKS UAA — A required step for deploying production Streaming Data Platform environments is to configure Keycloak to federate with PKS UAA. See Set up UAA federation on page 73 for instructions.

Note: You can integrate with other authentication directories, such as LDAP, through UAA. See Set up LDAP integration on page 74 for instructions.

Keycloak Administration Console

You can access the Keycloak Administration Console in the Streaming Data Platform cluster.

See Obtain connection URLs on page 64.


This console is not intended as a major tool for Streaming Data Platform administrators, but rather as a troubleshooting and testing resource. The dashboard can be useful to administrators for:

l Verifying the existence of users and roles

l Researching or verifying roles assignments

l In development and test deployments, creating new user accounts for the Streaming Data Platform UI.

Note: In production environments, do not use the Keycloak dashboard to create users or to create or assign roles. Doing so is not supported. Furthermore, while those actions provide the desired access to users on the Streaming Data Platform UI, they do not propagate down to the Kubernetes level. For example, assigning a role on the Keycloak Administration Console results in authorizations for the user in the Streaming Data Platform UI, but that user is blocked from viewing those same resources when using kubectl commands.

Keycloak resources

Although the Keycloak Administration Console shows all of the native Keycloak resources, Streaming Data Platform only uses a subset of them.

Streaming Data Platform uses the following Keycloak resources.

Realms

A core concept in Keycloak is a realm. A realm secures and manages metadata for a set of users and registered OAuth2 clients.

Streaming Data Platform uses the following two realms, which are created and configured by the installer tool.

l master realm — This is the root realm, hosting users such as the installation administrator.

l nautilus realm — This is where all Streaming Data Platform identities, local and federated, reside.

Users

Users are human users.

In development and test environments, administrators can create new local users on this Console. Local users can authenticate to the Streaming Data Platform UI. Project-specific roles are assigned to a user when an administrator makes the user a project member. Local users typically do not have access to the cluster on the kubectl command line; they may have that access only if an administrator manually assigns roles to the user name in the cluster.

In production systems, new users must be defined in UAA. Project-specific roles are assigned to a user when an administrator makes the user a project member. Federated users automatically have access to their assigned project resources on both the UI and the kubectl command line.

Clients

Clients are applications and services (nonhuman users) that need access to cluster resources. Clients are:

l Internal clients and service accounts used by the platform.

l Service accounts used by Analytics projects. User applications (Flink applications) run in this context.

Roles

Roles are Kubernetes RBAC rolenames used for authorization.

Authentication and authorization

140 Dell EMC Streaming Data Platform Installation and Administration Guide

Page 141: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

Rolenames are created in Keycloak when projects are created. If UAA federation is enabled, the rolenames are available to the K8s authorization model so that a user on the kubectl command line has the same role-based authorizations on the command line and on the UI.

Authorization

Authorization in Streaming Data Platform is controlled by role-based access control (RBAC). Kubernetes handles authorization for most control-plane operations, and Keycloak handles data-plane authorization.

Authorization to access project resources

Authorizations are based on Analytic Projects. Users gain access to the resources in specific projects by becoming members of the project. Administrators assign users to projects on the UI. See Add or remove project members on page 90. Users can belong to more than one project.

Each project has its own Kubernetes namespace. Resources in a project include the Maven repo and the project's Pravega scope. Pravega authorizations are at the scope level; project members can access all of the streams in the project's Pravega scope.

Independent scopes

For scopes that are independent of any project, users gain access to the streams in the scope by becoming members of the scope. Administrators assign users to scopes on the UI. See Add or remove scope members on page 96. Unlike project membership, scope membership does not equate to an RBAC rolename. The member usernames are resources in the namespace.

Admin role

The admin role in Streaming Data Platform has access to all projects and all scopes in the platform.

Summary of roles

This section describes the authorization rolenames used within Streaming Data Platform.

Table 10 Streaming Data Platform Rolenames

rolename                      Description
admin                         Users with complete wild-card access to all projects and all Pravega scopes. One user with this role is created by the installation. You may create additional usernames with the admin role.
project-<projectname>-member  Role assigned to users who are project members or to applications that are registered with Keycloak. These rolenames are created whenever a new project is created. The role provides authorization to access resources for a project. Access includes but is not limited to: Maven artifacts, Pravega scopes in the project's namespace, and Flink clusters defined in the project.

Table 11 PKS and UAA rolenames

rolename                                   Description
uaa-admin                                  UAA user who can create other UAA users, create clients, and assign roles.
pks.clusters.admin or pks.clusters.manage  UAA user who can create K8s clusters and automatically become cluster-admin for them.
cluster-admin                              Users with complete access to all resources within the Streaming Data Platform cluster. If you follow recommended procedures for creating users and assigning roles, the admin users in the previous section become cluster-admin users in the cluster.

Application authentication and authorization

Keycloak handles authentication and authorization for Streaming Data Platform applications.

Applications in Streaming Data Platform may be Pravega readers and writers and Flink applications.

An application registers with Keycloak and receives credentials. To gain access, the application uses OAuth2 technology supported by Keycloak to submit credentials and to receive and submit tokens that include assigned roles.

Applications register with Keycloak with REST API calls. A registered application is represented in the cluster with a Kubernetes service account. Keycloak assigns credentials to the service account and creates a JSON blob containing all required information for authentication.

To authenticate, an application must have the JSON blob. The blob includes the location of Keycloak and the credentials.

When a client presents the JSON blob, Keycloak attempts to authenticate it and, if successful, responds with a token. Tokens are short-lived. When one expires, another must be obtained.

The application presents the token to access services, such as access to Pravega. The token contains the roles for the client.

You can research the following details on the Keycloak dashboard:

l To see credentials for a registered application, navigate to Clients > <client-name> > Credentials.

l To see roles assigned to an application, navigate to Clients > <client-name> > Service Account Roles.

l To see and download the JSON blob that a client needs for authentication, navigate to Clients > <client-name> > Installation and set Format Option to Keycloak OIDC JSON.
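The downloaded Keycloak OIDC JSON has roughly the shape shown below. All values here are illustrative placeholders, not real credentials, and the exact fields depend on your Keycloak client configuration:

```json
{
  "realm": "nautilus",
  "auth-server-url": "https://keycloak.example.com/auth",
  "ssl-required": "external",
  "resource": "my-client",
  "credentials": {
    "secret": "00000000-0000-0000-0000-000000000000"
  }
}
```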

Additional Keycloak services

In addition to servicing the UI frontend and applications, Keycloak also performs authentication and authorization services for the UI backend, Pravega scopes, and Flink clusters.

How Keycloak services Apache Flink applications

Apache Flink applications need authorization to access Pravega. With the help of a service broker, Keycloak performs authentication of accounts and authorization for resources.


How Keycloak services Pravega

Authorization is performed at the Pravega scope level. Users are granted read or read/write access to all streams within a given scope.

Streaming Data Platform implements a Pravega plugin for Keycloak. The plugin maps Keycloak permissions on Pravega scopes to native Pravega authorizations.

Keycloak authorization on a scope    Pravega permission
write                                READ_UPDATE
read                                 READ
empty                                NONE

How Keycloak services the UI backend

User actions on the Streaming Data Platform UI require access to services such as:

l Maven

l InfluxDB

l pravega-controller REST API

l K8s APIs

The Kubernetes impersonation methodology is used to send authentication and authorization information to these services. Each service processes the information using its own authentication and authorization methods.


APPENDIX A

Configuration values file reference

This reference describes all configuration attributes for Streaming Data Platform.

l Overview .............................................................................................................................146
l Template of configuration values file................................................................................... 146


Overview

Most Streaming Data Platform installation attributes have default values. You can override any of the defaults in a configuration file.

Some attributes do not have easily predictable default values, so they most likely need overriding. Examples are:

l User-supplied passwords and certificates

l Connection information to Tier 2 storage

l License file

To provide the above values and any other attribute values that you want to override, create one or more .yaml files. Using YAML format, include sections for which you want to supply override values.

You specify the location of one or more configuration value files on the command line in the decks-install apply command.
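For example, a minimal override file that supplies NFS connection information (following the nfs-client-provisioner section of the template below) might look like this; the server and path values are placeholders for your environment:

```yaml
# custom-values.yaml -- example override file (placeholder values)
nfs-client-provisioner:
  nfs:
    server: nfs.example.com
    path: /exports/sdp
```

You would then reference the file when running decks-install apply, for example with a values option such as `decks-install apply ... --values ./custom-values.yaml`; the option name shown here is an assumption, so see the installation procedure for the exact command syntax.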

Template of configuration values file

The following template shows all installation attributes.

#global:# external:# host: "" #tld that services will use# tls: false# ingress:# annotations:# kubernetes.io/ingress.class: nginx# kubernetes.io/tls-acme: "true"catalog:# image:

zookeeper-operator:# image:# repository:# tag:

metrics-operator:# image:# repository:# tag:# "srsNotifier": "streamingdata-srs"

nautilus-ui:# image:# repository:# tag:

#nginx-ingress:# controller:# image:# repository:# tag:# config:# proxy-buffer-size: "128k"# proxy-buffers: "4 256k"# fastcgi-buffers: "16 16k"# fastcgi-buffer-size: "32k"# podAnnotations:# ncp/ingress-controller: true

Configuration values file reference

146 Dell EMC Streaming Data Platform Installation and Administration Guide

Page 147: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

# defaultBackend:# image:# repository# tag:

keycloak: keycloak:# image:# repository:# tag:

# admin password hard coded to password for now. Before GA, either: # prompt the user for a password during install OR blank this out and let # the keycloak chart generate a random one (it outputs a kubectl command to retrieve # the secret)# password: "..."# DESDPPassword: "..."

keycloak-injection-hook:# image:# repository:# tag:

keycloak-service-broker:# image:# repository:# tag:

nfs-client-provisioner:# image:# repository:# tag:# nfs:# server:# path:# mountOptions:# - nfsvers=4.0# - sec=sys# - nolock# storageClass:# archiveOnDelete: "false"

project-operator:# image:# repository:# tag:# mavenImage:# repository:# tag:# zkImage:# repository:# tag:

flink-operator:# image:# repository:# tag:# flinkImage:# repository:# tag_1_7_2:

pravega-operator:# image:# repository:# tag:

Configuration values file reference

Dell EMC Streaming Data Platform Installation and Administration Guide 147

Page 148: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

pravega-service-broker:# image:# repository:# tag:

pravega-cluster:# pravega_debugLogging: false# pravega_version: 0.5.0-2269.6f8a820-0.9.0-019.007be9f# credentialsAndAcls: base64 encoded password file: https://github.com/pravega/pravega-tools/blob/2c2dcb327a289f1f861deb96e23c2bf29e6b7f6c/pravega-cli/src/main/java/io/pravega/tools/pravegacli/commands/admin/PasswordFileCreatorCommand.java # pravega_security pravega.client.auth.token & credentialsAndAcls are coupled, if one changes, the other must pravega_options:# log.level: "DEBUG"

pravega_container_count: 48 # expect it to be 8 containers per segmentstore if not reducing memory

segment_store_jvm_options: # - "-XX:MaxDirectMemorySize=8g"

bookkeeper_options:# useHostNameAsBookieID: "true"

bookkeeper_jvm_options:# memoryOpts: # - "-Xms2g"# - "-XX:MaxDirectMemorySize=8g"

pravega_storage:# tier2: # size: 250Gi# class_name: "nfs"# ledger: # size: 250Gi# class_name: <ledger-storage-class># journal: # size: 250Gi# class_name: <journal-storage-class># index: # size: 10Gi# class_name: <index-storage-class># cache: # size: 100Gi# class_name: <cache-storage-class># zookeeper: # size: 20Gi# class_name: <zookeeper-storage-class>

pravega_replicas:# controller: 1# segment_store: 3# zookeeper: 3# bookkeeper: 4

pravega_resources: controller:# limits:# cpu: 500m# memory: 1Gi# requests:# cpu: 250m# memory: 512Mi segment_store: # limits:

Configuration values file reference

148 Dell EMC Streaming Data Platform Installation and Administration Guide

Page 149: Dell EMC Streaming Data Platform Installation and ... · Build the infrastructure ... Monitor stream ingestion.....99 Manage users ... distribute Docker images from the Apache Flink

#      cpu: "1"
#      memory: 2Gi
#    requests:
#      cpu: 500m
#      memory: 1Gi
  bookkeeper:
#    limits:
#      cpu: "1"
#      memory: 2Gi
#    requests:
#      cpu: 500m
#      memory: 1Gi

pravega_security:
#  TOKEN_SIGNING_KEY: "..."
#  pravega.client.auth.method: "Basic"
#  pravega.client.auth.token: "..."  # note, if this changes credentialsAndAcls needs to change
#  autoScale.tokenSigningKey: "..."
#  AUTHORIZATION_ENABLED: "true"
#  autoScale.authEnabled: "true"

pravega_externalAccess:
#  enabled: true
#  type: LoadBalancer | NodePort

zookeeper_image:
#  repository:
#  tag:

bookkeeper_image:
#  repository:

pravega_image:
#  repository:

grafana_image:
#  repository:
#  tag:

influxdb_image:
#  repository:
#  tag:

grafana_notifiers:
#  kahm:
#    namespace: nautilus-system
#    username: <kahm-username>
#    password: <kahm-password>

external-dns-resources:
#  externalDNSSecrets:
#    - name:
#      value: |
#        {
#          ....
#        }

external-dns:
#  see https://github.com/helm/charts/blob/8ab12e10303710ea3ad9d771acdd69d7658b7f47/stable/external-dns/values.yaml

cert-manager-resources:
#  certManagerSecrets:
#    - name:
#      value: |
#        {
#          ....
#        }


#  clusterIssuer:
#    name:
#    server:
#    email:
#    acmeSecretKey:
#    solvers:
#    # you can specify multiple solvers using labels and selectors. see:
#    # https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/index.html
#      - dns01:
#          clouddns:
#            serviceAccountSecretRef:
#              name:
#              key:
#            project:
#      - dns01:
#          route53:
#            # hosted zone id taken from route53 "Hosted Zone Details"
#            hostedZoneID:
#            region:
#            accessKeyID:
#            secretAccessKeySecretRef:
#              name:  # TODO need to put this key above in certManagerSecrets
#              key:   # TODO need to put this key above in certManagerSecrets

cert-manager:
#  see https://github.com/jetstack/cert-manager/blob/v0.8.0/deploy/charts/cert-manager/values.yaml

# Serviceability:

decks:
#  see https://github.com/EMCECS/charts/tree/master/decks

srs-gateway:
#  see https://github.com/EMCECS/charts/tree/master/srs-gateway

kahm:
#  see https://github.com/EMCECS/charts/tree/master/kahm

dellemc-streamingdata-license:
#  see https://github.com/EMCECS/charts/tree/master/dellemc-license

monitoring:
#  image:
#    repository: devops-repo.isus.emc.com:8116/nautilus/monitoring
#    tag: latest
#    pullPolicy: Always

#  license:
#    name: dellemc-streamingdata-license
#    namespace: nautilus-system

#  schedule: "*/10 * * * *"

#  subjects:
#    - name: Streaming Flink Cores
#      code: STRM_FLINK_CORES
#      uom_code: ZC
#      uom_name: Individual CPU Cores
#      niceName: Flink
#      selectors:
#        - component=taskmanager
#        - component=jobmanager
#    - name: Streaming Platform Cores
#      code: STRM_CORES
#      uom_code: ZC
#      uom_name: Individual CPU Cores
#      niceName: Platform


#  namespaces:
#    - nautilus-system
#    - nautilus-pravega
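To make this reference concrete, the following is a hypothetical minimal override file that uncomments only a few of the settings listed above. All values shown are illustrative, not recommendations; keys that are not overridden keep their defaults.

```yaml
# Hypothetical override file passed to decks-install apply --values.
pravega-cluster:
  pravega_options:
    log.level: "DEBUG"

pravega_externalAccess:
  enabled: true
  type: LoadBalancer
```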


APPENDIX B

Summary of scripts

This appendix contains a summary of the scripts that are delivered with the product. The scripts are in the unzipped contents of the decks-installer-<version>.zip file, under the /scripts folder.

• Summary of scripts..............................................................................................................154


Summary of scripts

The following scripts are included with Streaming Data Platform.

Software requirements for some scripts

Most scripts have software requirements for your local machine. See Prepare the working environment on page 46.

federate.sh

This script is part of the setup for UAA federation. It is not a standalone script. See Set up UAA federation on page 73 for context and run instructions.

health-check.py

This script may be run at any time after Streaming Data Platform is installed. It checks the state of various components in the Streaming Data Platform cluster and generates a summary as output.

See Run health-check on page 111.

post-install.sh

Run this script after running the decks-install apply command. It confirms that your latest run of decks-install apply left the cluster in a healthy state, and it invokes the health check script. You may run this script at any time.

See Run post-install script on page 59. Also see Change applied configuration on page 78.

post-upgrade.sh

Run this script after upgrading Streaming Data Platform with a new distribution of manifests and charts. It confirms that the cluster was upgraded properly and is healthy, and it runs the health checks.

See Upgrade software on page 81.

prereqs.sh

Run this script before running the decks-install apply command for the first time (or the first time on a new local machine). You can run the script at any time. It does the following:

• Checks your local environment for the required tools and versions of those tools.

• Checks the Streaming Data Platform cluster for a default storage class definition.

• Checks the Streaming Data Platform cluster for the required version of the Tiller server, and installs it if needed, with all required roles and permissions.

Note: The script installs Tiller v2.13.1. Other versions may be incompatible with Streaming Data Platform.

See Run prereqs script on page 55.

pre-install.sh

This script must be run once before installing Streaming Data Platform.

It creates credentials required for the internal communication of Streaming Data Platform components and generates a values.yaml file that must be included with every run of the decks-install apply command.

See Run pre-install script on page 56.


pre-upgrade.sh

This script must be run before upgrading the Streaming Data Platform version with a new distribution of manifests and charts. The script ensures that the environment is healthy, including running the health checks. You should not upgrade an unhealthy cluster.

See Upgrade software on page 81.

validate-values.py

This script is part of the installation and configuration change processes. Use it before running decks-install apply. It reads the configuration values files and checks the values against certain criteria. For example, it validates the values used for external connectivity and serviceability.

See Run validate-values script on page 56. Also see Change applied configuration on page 78.
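Taken together, a typical first installation runs these scripts around the decks-install apply command in roughly the following order. This is an illustrative sketch; paths and arguments depend on your environment and on the detailed procedures referenced above.

```
./scripts/prereqs.sh                        # check tools, storage class, Tiller
./scripts/pre-install.sh                    # one time; generates values.yaml
./scripts/validate-values.py <values-files> # check values before the apply
./decks-install apply ...                   # install the platform
./scripts/post-install.sh                   # confirm the cluster is healthy
```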


APPENDIX C

Troubleshooting

This appendix contains troubleshooting aids.

• Log files............................................................................................................................... 158
• Useful troubleshooting commands....................................................................................... 158
• FAQs.................................................................................................................................... 161
• Application connections when TLS is enabled...................................................................... 165
• Online and remote support...................................................................................................165


Log files

This section describes how to obtain useful logs.

Get installation logs

To track installation progress, you can monitor the installation logs. From the installation folder, look at decks-install.logs.

Get pod logs for a namespace

List pod names with:

kubectl get pods --all-namespaces

Get information about a pod with:

kubectl describe pod <pod name> -n <namespace>

For example:

kubectl describe pod keycloak-service-broker-797849c678-52pnl -n nautilus-system

Get the logs for a pod in a namespace:

kubectl logs <pod name> -n <namespace>

Useful troubleshooting commands

This section introduces CLI commands that can help you get started with researching problems in a Streaming Data Platform deployment.

• bosh commands on page 158

• pks commands on page 159

• helm commands on page 160

• kubectl commands on page 160

bosh commands

Use bosh commands to manage the PKS deployment.

The following bosh commands are useful when getting started with troubleshooting Streaming Data Platform deployments. For descriptions of all bosh commands and their syntax, see the bosh documentation at https://www.bosh.io/docs/cli-v2/.

bosh login
Log in to a bosh shell.

bosh status
Get status for the bosh director.


bosh vms
List all running VMs.

bosh deployments
List deployments.

• Use bosh -d <deployment> vms to list VMs in a specific deployment.

• Use bosh -d <deployment> ssh <vm-id> to get a bosh shell in a specific deployment and VM.

bosh instances
List bosh instances.

bosh tasks
List all tasks.

• Use bosh tasks --recent to limit the output to recent tasks.

• Use bosh task <task-id> to get details about a specific task.

bosh locks
List bosh locks. The bosh interface uses locks on resources so that only one user can make updates. If a task fails or is canceled, associated locks may still exist.

bosh events
List recent events. Events report information about system and user actions. See https://www.bosh.io/docs/events/ for more information, including how to filter the output.

pks commands

Use pks commands to manage the PKS environment in which your Streaming Data Platform cluster exists.

The following pks commands are useful when getting started with troubleshooting Streaming Data Platform deployments. For descriptions of all pks commands and their syntax, see the Pivotal documentation at https://docs.pivotal.io/pks/1-6/cli/.

Note: Change the version number in the documentation link to match your installed PKS version.

pks clusters
List all running clusters.

pks cluster <cluster-name>
Show cluster details.

pks get-credentials <cluster-name>
Get kubectl credentials for a cluster.

pks plans
List available PKS plans.


helm commands

Use helm commands to manage the Kubernetes packages that are installed in your cluster. You can also check the current helm version number.

The following helm commands are useful when getting started with troubleshooting Streaming Data Platform deployments. For descriptions of all helm commands and their syntax, see https://helm.sh/docs/.

helm version
Shows the client and server versions for Helm and Tiller.

helm ls --all
Lists all the releases that are installed in the cluster.

helm ls --all --short
Generates an abbreviated output of the previous command.

kubectl commands

Use kubectl commands to investigate Kubernetes resources in the cluster.

The following kubectl commands and flags are useful for troubleshooting Streaming Data Platform deployments. For descriptions of all kubectl commands and their syntax, see https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands.

Common flags

The following flags apply to many commands.

--all-namespaces
Applies a command, such as kubectl get, to all namespaces rather than to a named namespace.

-o yaml
Outputs a YAML-formatted API object.
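For example, the two flags can be combined with a get command to dump every service definition in the cluster (this assumes kubectl is configured against a running cluster):

```
kubectl get services --all-namespaces -o yaml
```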

Useful commands

kubectl config use-context <cluster-name>
Switch your current command-line context to the named cluster. Use this command when you are logged in to two clusters in the same session.

kubectl cluster-info
Display addresses of the master and services with the label kubernetes.io/cluster-service=true.

kubectl api-resources ...
Print supported API resources. Some useful flags in this command are:

• --verbs=[verbs] to limit the output to resources that support the specified verbs.

• --namespaced={true | false} to include or exclude namespaced resources. If false, only non-namespaced resources are returned.

• -o {wide | name} to indicate an output format.


The following example displays all resources that support the kubectl list command. It includes namespaced resources and shows the output in the shortened name format.

kubectl api-resources --verbs=list --namespaced=true -o name

kubectl get ...
List resources of the specified resource type. Some useful kubectl get commands for SDP are:

kubectl get pods
kubectl get pods --all-namespaces
kubectl get services
kubectl get deployments
kubectl get deployment <deployment-name>
kubectl get nodes
kubectl get events
kubectl get storageclass
kubectl get serviceaccounts

kubectl describe ...
Show details of a specific resource or group of resources.

kubectl logs ...
Display logs for a container in a pod or specified resource. If the pod has only one container, the container name is optional.

For example:

kubectl logs <pod name> -n <namespace>

kubectl exec ...
Run a command in a container.

kubectl attach ...
Attach to a process running inside a container. For example, you might want to get output from the process.

kubectl run ...
Create a deployment or a job by running a specified image.

FAQs

These frequently asked questions (FAQs) include common installation conditions and operational observations.

My installation does not have all the components installed.

If you invoked the installer with the decks-install apply command, the decks-install sync command is safe to use to resume an existing installation. The command tries to install the remaining components. You can run the decks-install sync command more than once.


My uninstall failed. I still see components in helm list.

If the decks-install unapply command fails, try it again. If the rerun does not solve your problem, you can manually remove the charts using helm del --purge --no-hooks <chart-name>. Then retry the decks-install unapply command. This command is necessary in the uninstall process because it deregisters the custom resource definitions that are used with the product. Finally, after decks-install unapply runs successfully, delete the namespaces manually using kubectl delete ns catalog nautilus-system nautilus-pravega.
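As a sketch, the manual recovery sequence described above looks like the following; the chart name is a placeholder, and the unapply arguments depend on your installation:

```
helm del --purge --no-hooks <chart-name>    # repeat for each leftover chart
decks-install unapply ...                   # rerun until it succeeds
kubectl delete ns catalog nautilus-system nautilus-pravega
```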

My pods are showing that they cannot pull the images.

For sites that do not have access to image repositories and registries, the installer comes with a tarball of Docker images. You can register those images in your local registry, where the pods can pull them successfully.

My pods are showing as running but the user interface displays errors.

The running status indicates the readiness of the pod. The status does not indicate anything about errors concerning the applications or services running within the pod. To research errors, check pod logs with the kubectl logs command. The logs include a timestamp that you can correlate with the logs in the other pods to sequence together the chain of actions.

Is there a way to see events for a specific project or application?

Yes. All projects have their corresponding Kubernetes namespace. You can refine kubectl get events to get events only from the namespace that corresponds to a project. Also, the user interface lists system events and logs that correspond to a single application.
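For example, assuming a project named myproject (a hypothetical name), its events can be listed from the matching namespace:

```
kubectl get events -n myproject
```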

My logs show intermittent failures but my pods are all healthy.

Check to see that your applications and services are reachable by both names and IP addresses. This check holds true for the ingress, proxy, load balancer, and all the pods. DNS records, text entries, and registrations must be accurate. The DNS provider may have listings of entries that were created during installation. Check with your system administrator or cloud services provider. You can also ping the pods and connect to the services from the containers to ensure that they are reachable within the cluster.

My pods complain that the volume mounts failed.

Delete the pod. Deleted pods come back, and volume mounts are refreshed.
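For example (pod and namespace names are placeholders); a pod managed by a deployment or stateful set is recreated automatically after deletion:

```
kubectl delete pod <pod-name> -n <namespace>
```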

My Keycloak service broker complains that Keycloak is not available.

Most likely, Keycloak is running, but is not resolving by name. Check to see that the Keycloak endpoint is accessible from the keycloak-service-broker. Otherwise, uninstall and reinstall the product.

My user interface does not install. I get a 503 service unavailable or 404 default backend error.

The user interface is not installed properly. You can use the helm chart to delete the user interface and install it again. Otherwise, uninstall and reinstall the product. A 404 error in the user interface implies that the ingress named nautilus-ui in the nautilus-system namespace is not set up properly. When you uninstall and reinstall the product, the ingress is set up automatically.


My ingress and services do not have IP addresses.

This condition may occur if the public IP pool is exhausted. For example, the NSX-T environment defines a public IP address pool. To check for this issue, run the following command, and look for the value pending in the public IP column.

kubectl get svc -n nautilus-system

A network administrator may be able to resolve this issue.

My DNS records are not showing, or the installer is not adding these records.

1. The external-dns credentials for the DNS provider may be incorrect, or you may have exceeded the rate limit. For example, with a Route53 DNS service provider, there is a rate limit of 5 requests per second per account. To research the issue, check the logs for the external-dns pod.

2. Ensure that unique values are used for txtOwnerId and domainFilters in the external-dns configuration. If the same values are used across clusters and policy is set to sync, a new cluster could overwrite all entries with the same txtOwnerId.

txtOwnerId: "<hostname>.<domain>"
## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only).
## If the sync policy is used, make sure that txtOwnerId is unique to the cluster; using <hostname>.<domain> ensures uniqueness.
policy: sync
domainFilters: [<hostname>.<domain>]

My DNS records are showing, but the DNS records are not propagated.

Run nslookup keycloak.<domain>. If it resolves, see if it is resolving from the pod network as well. Start the dnstools pod using this command: kubectl run -it --rm --image infoblox/dnstools dnstools. From the dnstools pod spawned above, run nslookup keycloak.<domain>. If it is not resolving from the dnstools pod, contact an administrator to look into the network configuration of the cluster.

The cert-manager does not issue the certificates.

If the certificate issuer is Let's Encrypt, and you see an entry for Keycloak in the ingress with kubectl get ingress -n nautilus-system, then check if the ingress has a certificate. If the certificate is issued, checking the keycloak-tls certificate should produce output like the following:

kubectl get secret keycloak-tls -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: C=US, O=Let's Encrypt, CN=Let's Encrypt Authority X3
CA Issuers - URI:http://cert.int-x3.letsencrypt.org/

If the output looks like the following instead, then check the cert-manager pod logs for error messages. A limit may exist, such as 50 certificates per week per domain.

Issuer: O=cert-manager,CN=cert-manager.local
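The extraction pattern in these certificate checks is ordinary shell: pull a field with jsonpath, base64-decode it, and grep the result. The decode step can be tried locally on any base64 string; the mock value below is a stand-in for real secret data (it is not from an actual cluster):

```shell
# Mock stand-in for the base64 value a jsonpath query would return;
# decodes to the string "issuer: test-ca".
echo "aXNzdWVyOiB0ZXN0LWNh" | base64 --decode
# prints: issuer: test-ca
```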

If you are frequently installing and uninstalling the product, remember to reuse the hostname. Certificate reissues do not count towards the certificate limit. If the certificate was reissued already several times, and you see a message like the following, then use another domain that has not reached the certificate limit.

urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains: .com: see https://letsencrypt.org/docs/rate-limits/" "key"="nautilus-system/nautilus-ui-tls-2473631805"

If you are using self-signed certificates, you would see the following message instead, and you can use the same steps as above.

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: CN = Self-Signed CA

Is Keycloak ready with TLS?

Check whether you can connect to Keycloak:

kubectl get ingress keycloak -n nautilus-system
openssl s_client -showcerts -servername <endpoint> -connect <ipaddress>:443

Then, check that the certificate in the secret is the same as the certificate that Keycloak returns:

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode
openssl s_client -showcerts -servername keycloak.cluster1.desdp.dell.com -connect 10.247.114.101:443

My pods are in the ContainerCreating state.

Check whether the issue is local to the pod. For example, the following message indicates that nautilus-ui is waiting for its secrets. It waits until the keycloak-service-broker starts servicing the service instance requests.

MountVolume.SetUp failed for volume "nautilus-ui" : secrets "nautilus-ui" not found

Check the keycloak-service-broker logs. Eventually, after a timeout, the keycloak-service-broker starts servicing and creates the secrets.

The cert-manager is not renewing certificates when the LetsEncrypt provider is used.

In certain cases, when cert-manager cannot connect to LetsEncrypt, certificate orders to LetsEncrypt get stuck in the pending state indefinitely. If you see that expired certificates are not renewing, delete the orders that are in a pending state. The cert-manager then creates new orders automatically and completes the certificate renewal process.

kubectl get orders -n nautilus-system
kubectl delete order <ordername> -n nautilus-system


A reader job fails on a Flink cluster.

When you see that a reader job is not progressing (no longer reading), check if it is stuck on UnknownHostException. Some of the symptoms you may see are:

1. Flink Dashboard shows jobs in a Restarting status instead of Running.

2. Flink job manager continuously throws UnknownHostException with the message "Temporary failure in name resolution".

When the job manager is in this state, it does not recover from exceptions even after you fix the DNS issues that might have caused the resolution errors.

Here is a workaround example:

kubectl get sts -n longevity-0 | grep jobmanager
longevity-0-jobmanager   1/1   8d
kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=0
kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=1

After using the workaround, check the logs and the Flink dashboard to see if jobs have a status of Running. Here is an example kubectl command that checks the logs:

kubectl logs -f longevity-0-jobmanager -n longevity-0 -c server

Application connections when TLS is enabled

This section describes TLS-related connection information in Pravega and Flink applications.

When TLS is enabled in Streaming Data Platform, Pravega applications must use a TLS endpoint to access the Pravega datastore. The URI used in the Pravega client application, in the ClientConfig class, must start with:

tls://:443

If the URI starts with tcp://, the application fails with a javax.net.ssl.SSLHandshakeException error.

To obtain the Pravega ingress endpoint, run the following command.

kubectl get ing pravega-controller -n nautilus-pravega

The HOST column in the output shows the Pravega endpoint.

Online and remote support

The Dell Technologies Secure Remote Services (SRS) and call home features are available for the Streaming Data Platform. These features require an SRS Gateway server configured on-site to monitor the platform. The Streaming Data Platform installation process configures the connection to the SRS Gateway.

Detected problems are forwarded to Dell Technologies as actionable alerts, and support teams can remotely connect to the platform to help with troubleshooting.


Online Support:

https://www.dell.com/support

Telephone Support:

United States: 800-782-4362 (800-SVC-4EMC)

Canada: 800-543-4782

Worldwide: +1-508-497-7901


APPENDIX D

Installer command reference

The Streaming Data Platform installer tool is a command line executable that installs and uninstalls applications and resources in a Kubernetes cluster, and configures the cluster.

• Overview..............................................................................................................................168
• decks-install apply............................................................................................................... 168
• decks-install config set........................................................................................................ 170
• decks-install push................................................................................................................ 170
• decks-install sync................................................................................................................. 171
• decks-install unapply............................................................................................................ 172


Overview

The installer implements the following commands.

decks-install apply

Applies a given manifest bundle (with optional overrides) to a remote Kubernetes cluster.

decks-install unapply

Unapplies a manifest bundle from a Kubernetes cluster.

decks-install sync

Starts a reconciliation loop between applications and Helm releases.

decks-install config list

Lists the current configuration values.

decks-install config set

Sets a config value.

decks-install push

Pushes an image bundle to a registry.

Prerequisites

The installer tool runs in the Kubernetes shell environment, outside of the Kubernetes cluster. The user must have Kubernetes administrator privileges.

Note: If you follow the step-by-step installation instructions, you will meet all of the following requirements for using the installer tool.

• The Kubernetes cluster must exist.

• You must have direct or network access to the Kubernetes cluster.

• You must have authentication access rights to the Kubernetes cluster.

• Tiller and kubectl must be installed. Tiller must be running with a binding to the cluster-admin role.

• A default registry must be configured. The installer applies the default registry pathname to any unqualified image names in the application manifest, producing a path name of registry-path/image-name.

• The decks-install push command must be used first, to push images to the default registry. Then you can use the other decks-install commands.

decks-install apply

Applies the custom resource definitions (CRDs) and applications specified in a manifest bundle to the Streaming Data Platform Kubernetes cluster. By default, this command also starts the synchronization process that installs the Helm charts for each application.

Syntax

decks-install apply --kustomize <manifest-bundle-dir> --repo <helm-chart-dir>
    [--values <path-to-values.yaml>,<additional-values-file.yaml>,...]
    [--dry-run] [--skip-sync] [--simple-output] [--tiller-namespace <namespace>]

Options

--kustomize <manifest-bundle-dir>

Required. Specifies the location of the manifest bundle. Include the slash to indicate a directory. For example:

--kustomize ./manifests/

The manifest bundle is an artifact delivered in the root of the installer zip file, under manifests/. Manifest files must conform to the Kubernetes Kustomize format.

--repo <helm-charts-dir>

Required. Specifies the location of the Helm charts directory. Include the slash to indicate a directory. For example:

--repo ./charts/

The charts are artifacts delivered in the root of the installer zip file, under charts/.

--values <path/to/values.yaml>,<path/to/values2.yaml>[,...]

Specifies the pathnames of configuration values files. Separate multiple file names withcommas and no spaces.

Note: Streaming Data Platform requires a configuration file to define required attributes.

--dry-runCurrently not used.

--skip-syncIf specified, prevents the synchronization process from starting. You can start thesynchronization process later, using the decks-install sync command.The apply step adds CRDs and applications to the cluster in a pending state. Thesynchronization step reconciles the applications to the desired state.

--simple-output

Displays logs on standard output and standard error, which are typically the command-line terminal. If this flag is omitted, the command writes logs to decks-install.log and decks-install.stderr.

--tiller-namespace <namespace>

Required if Tiller is installed in a namespace other than the default namespace (kube-system). Tiller installation is a prerequisite to running the decks-install commands. See Create the Kubernetes cluster on page 50 for Tiller installation instructions.
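A typical first-time installation, run from the directory where the installer zip file was extracted, might look like the following sketch. The values file name is an assumption; substitute the configuration file you prepared for your site.

```shell
# Run from the extracted installer directory (assumed layout: ./manifests/, ./charts/).
# sdp-values.yaml is a hypothetical site configuration file name.
./decks-install apply \
    --kustomize ./manifests/ \
    --repo ./charts/ \
    --values ./sdp-values.yaml
```

Because --skip-sync is omitted, the synchronization process starts immediately after the apply step completes.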


decks-install config set

Sets a configuration value.

Usage

This command uses a key-value pair to set a configuration value for an installation setting.

Syntax

decks-install config set key value

Options

key

A configuration field name. See the configuration file template.

value

The setting value.

Example 1 Set the registry

The following example sets the container registry.

$ ./decks-install config set registry gcr.io/stream-platform/reg

decks-install push

Pushes an image bundle to a configured container registry.

Usage

Configure the registry using the decks-install config set command.

The image bundle (a tar archive) for Streaming Data Platform contains several large images. The push operation may take hours to complete.

Syntax

decks-install push --input <tar-file>

Options

--input <tar-file>

A .tar file of images. Do not extract files. The installer expects the .tar file as input.

--registry <server/path>

Installer command reference

The registry server and path to which the images are pushed.
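The registry is normally set once with decks-install config set, after which only the image bundle needs to be specified. A sketch, reusing the registry value from the earlier example; the bundle file name is an assumption (use the .tar file delivered in your installer zip):

```shell
# images.tar is a hypothetical bundle file name. Do not extract it;
# the installer expects the .tar file itself as input.
# The push can take hours for large bundles.
./decks-install config set registry gcr.io/stream-platform/reg
./decks-install push --input ./images.tar
```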

decks-install sync

Synchronizes the Kubernetes cluster to the desired terminal state.

Usage

Synchronization consists of installing, upgrading, reconfiguring, or uninstalling components in the cluster as needed to match the desired state for each application. The synchronization procedure ends when all components are installed, configured, or removed in accordance with the desired configuration, as recorded from previously applied or unapplied configurations. Synchronization usually takes a few minutes.

A synchronization process begins automatically after you use the decks-install apply or decks-install unapply command. If the synchronization process fails for any reason, use the decks-install sync command to resume the process.

It is safe to restart the synchronization process at any time. Be sure to specify the correct manifest bundle and Helm charts directory in the restart command.

Syntax

decks-install sync --kustomize <manifest-bundle> [--repo <helm-charts-dir>]

Options

--kustomize <manifest-bundle>

Required. Specifies the path and directory name of the manifest bundle that describes the applications and resources to synchronize. Include the trailing slash to indicate a directory. For example:

--kustomize ./manifests/

--repo <helm-charts-dir>

Required to synchronize application installations. Specifies the path and directory name of the Helm charts directory. Include the trailing slash to indicate a directory. For example:

--repo ./charts/
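If a synchronization started by apply or unapply fails or is interrupted, it can be resumed as follows. The paths are assumptions based on the installer's default layout; use the same manifest bundle and charts directory that were used for the original command.

```shell
# Resume a failed or interrupted synchronization. Must reference the
# same manifest bundle and Helm charts directory used by the apply.
./decks-install sync --kustomize ./manifests/ --repo ./charts/
```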


decks-install unapply

Marks applications for removal from the Kubernetes cluster and starts the synchronization process. Use this command to uninstall Streaming Data Platform from a cluster.

Syntax

decks-install unapply --kustomize <manifest-bundle-dir> --repo <helm-chart-dir> [--dry-run] [--skip-sync] [--simple-output] [--tiller-namespace <namespace>]

Usage

Use this command if you need to start over with a completely new Streaming Data Platform installation due to corruption or a major system failure.

If you run decks-install unapply against the same manifest bundle used for installation, it uninstalls all Streaming Data Platform components and deletes all Streaming Data Platform data. When command execution completes, you have an empty Kubernetes cluster. You can then start over with a new installation into that cluster.

WARNING: In Streaming Data Platform V1.0, this process deletes all user data that was ingested into Pravega.

Options

--kustomize <manifest-bundle-dir>

Required. Specifies the location of the manifest bundle that defines the applications to uninstall. Include the trailing slash to indicate a directory. For example:

--kustomize ./manifests/

Manifest files must conform to the Kubernetes Kustomize format.

--repo <helm-charts-dir>

Specifies the location of the Helm charts directory to reconcile with. Include the slash toindicate a directory. For example:

--repo ./charts/

The charts are artifacts originally delivered in the root of the installer zip file, under charts/.

--dry-run

Currently not used.

--skip-sync

If specified, prevents the synchronization process from starting. You can start the synchronization process later using the decks-install sync command. The apply step adds CRDs and applications to the cluster in a pending state; the synchronization step reconciles the applications to the desired state.


--simple-output

Displays logs on standard output and standard error, which are typically the command-line terminal. If this flag is omitted, the command writes logs to decks-install.log and decks-install.stderr.

--tiller-namespace <namespace>

Required if Tiller is installed in a namespace other than the default namespace (kube-system). Tiller installation is a prerequisite to running the decks-install commands. See Create the Kubernetes cluster on page 50 for Tiller installation instructions.
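A complete uninstall, run against the same manifest bundle used for installation, might look like the following sketch (paths assume the installer's default layout). Remember that in V1.0 this also deletes all ingested Pravega data.

```shell
# WARNING: removes all Streaming Data Platform components and data,
# leaving an empty Kubernetes cluster.
./decks-install unapply --kustomize ./manifests/ --repo ./charts/
```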
