securing the data-to-information pipeline · 2016-05-09 · securing the data-to-information...

Securing the data-to-information pipeline Critical security capabilities for a Connected Information Platform




2 | www.paxata.com

© Paxata, Inc. All rights reserved. The Paxata logo and brand trademarks used herein are owned by Paxata. Other company and product names used herein may be trademarks of their respective owners. CJ0115-0416

Abstract

Paxata is the first self-service data preparation solution designed to support the needs of both non-technical business analysts and the world’s most demanding IT teams. Business analysts trust the intuitive self-service user interface to transform business data into usable information with comprehensive data integration, data quality, enrichment, governance, and collaboration. IT administrators trust the scalability and security of the Paxata platform, which also leverages their existing investments.

Whether you are already underway with a Paxata proof of concept or are just gathering vendor capabilities to address an RFP, security should absolutely be a critical decision point. This document addresses how the Paxata self-service data preparation solution provides comprehensive security for the enterprise with identity and access management, encryption for data at rest and in motion, and complete data governance with auditing, versioning, and lineage.


Table of contents

Abstract
Paxata enterprise security infrastructure
Identity and access management
    Authentication options
        Username and password authenticated using LDAP
        PKI certificate-based authentication using LDAP
        SSO login with SAML
    User and group provisioning
        Manual provisioning
        Dynamic provisioning
    Two-level authorization model
    Granular access control to resources
    Dynamic updates
    Multi-tenancy
    Inter-component authentication and access
Encryption
    Credentials
    Data encryption
        Encryption at rest
        Encryption in motion
Governance
    Accountability and permission-based access
    Auditability and transparency
    Lineage and repeatability
Summary


Overview

This document addresses how the Paxata self-service data prep solution provides comprehensive security for the enterprise in these areas:

• Identity and access management
  º Centralized authentication with Lightweight Directory Access Protocol (LDAP) or SmartCard (Public Key Infrastructure / PKI)
  º Single Sign On (SSO) support using SAML
  º Unified role-based authorization and resource-level granular permissions
  º Multi-tenancy for targeted deployment
  º Kerberos for inter-system authentication and SSL support
• Encryption
  º Encryption of credentials
  º Encryption of data in motion and at rest
• Comprehensive data governance with auditing, versioning and lineage

Paxata enterprise security infrastructure

The design of the Paxata enterprise-grade secure architecture affects the entire data lifecycle:

• Who can access data
• What actions these people are allowed to take on the data
• How data is stored at rest
• How data moves securely within the system
• How data moves out of the system

The following sections discuss how the security infrastructure addresses each of these parts of the data lifecycle.

Identity and access management

The Paxata platform implements enterprise security standards for user identification, authentication (accessing the application), and authorization (interacting with data).

Administrators must identify people to secure access to resources. Identifying any one person in a large organization is complex for a number of reasons: multiple offices, different geographical locations, diverse access levels, and employee churn.

Everyone else expects quick access to resources and applications to get the job done without technology getting in the way.

To alleviate these burdens, enterprises invest significant resources in unified identity and access management. Organizations expect to leverage their existing infrastructure by integrating with industry standards such as Lightweight Directory Access Protocol (LDAP), Security Assertion Markup Language (SAML), and public key infrastructure (PKI).


Authentication options

The Paxata platform supports the most widely used and trusted authentication protocols:

• Username and password authenticated using LDAP
• Certificate-based PKI authentication using LDAP
• SAML SSO authentication

Username and password authenticated using LDAP

The Paxata self-service data preparation solution enforces username and password authentication using LDAP, the recognized standard for central directories of large enterprises.

When anyone attempts to access the Paxata data preparation platform either through the browser or using the REST API, the platform verifies the user’s credentials against the organization’s LDAP directory service. The platform retrieves a user’s group affiliations from the directory service during every login attempt. A user’s access and entitlements are never out-of-date, which avoids risks faced by locally-installed desktop applications. Customers can administer all aspects of identity and access management using their existing IT investments.
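The LDAP login sequence can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not Paxata's implementation: `InMemoryDirectory` and `login` are hypothetical names, the in-memory class stands in for a real LDAP server, and the SHA-256 digest is for the demo only. A real integration would use an LDAP client library over LDAPS, searching for the user with a service account and then binding as the user to check the password.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class InMemoryDirectory:
    """Hypothetical stand-in for an LDAP directory service."""
    passwords: Dict[str, str] = field(default_factory=dict)    # user -> SHA-256 hex (demo only)
    groups: Dict[str, Set[str]] = field(default_factory=dict)  # group -> member usernames

    def user_exists(self, username: str) -> bool:
        # Search performed with the service account
        return username in self.passwords

    def bind(self, username: str, password: str) -> bool:
        # Bind as the user: the directory, not the application, checks the password
        digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
        return hmac.compare_digest(digest, self.passwords.get(username, ""))

    def groups_of(self, username: str) -> Set[str]:
        # Group lookup performed with the service account
        return {g for g, members in self.groups.items() if username in members}


def login(directory: InMemoryDirectory, username: str, password: str,
          required_group: str) -> Set[str]:
    """Return the user's current groups, or raise PermissionError."""
    if not directory.user_exists(username):
        raise PermissionError("unknown user")
    if not directory.bind(username, password):
        raise PermissionError("invalid password")
    groups = directory.groups_of(username)  # read fresh on every login, never cached
    if required_group not in groups:
        raise PermissionError("user is not in the group that grants access")
    return groups
```

Because `groups_of` runs on every call to `login`, removing a user from the access group in the directory denies their next login without any change on the application side.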

PKI certificate-based authentication using LDAP

Some organizations authenticate all users of all internal systems using public key infrastructure (PKI) certificates. PKI authentication consists of a mutual TLS authentication between browser and server, followed by identity verification with a directory service such as LDAP. For organizations that require PKI authentication, the Paxata platform enforces these high security standards required by government and financial institutions.

When certificate-based authentication is in place, accessing the Paxata self-service application no longer prompts users for a username and password. Instead, the web browser prompts users to select a certificate, with support for Smart Cards (CAC) and user PIN numbers.
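For PKI logins, the server side of the mutual TLS handshake can be sketched with Python's `ssl` module. The helper names and file paths are hypothetical, and using the certificate's commonName as the identity is an assumption (some smart-card certificates identify users through other fields). The sketch only shows that the server requires a client certificate and then derives a name that would next be looked up in LDAP.

```python
import ssl
from typing import Optional


def make_mtls_server_context(certfile: Optional[str] = None,
                             keyfile: Optional[str] = None,
                             client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a trusted certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: the browser must present a cert
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the server's own certificate and key
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that issues user certs
    return ctx


def identity_from_peercert(cert: dict) -> str:
    """Return the subject commonName from SSLSocket.getpeercert() output."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    raise ValueError("client certificate has no commonName")
```

After the handshake, `identity_from_peercert(tls_sock.getpeercert())` yields the name whose existence and group membership the directory check then confirms; no password is exchanged at any point.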

SSO login with SAML

Organizations embrace some form of single sign-on authentication (SSO) to enable employees to easily and securely access many services with a single credential (username/password).

A popular standard for SSO is the Security Assertion Markup Language (SAML). SAML is an XML-based, open-standard data format for exchanging authentication and authorization data between two parties through a standard web browser. Paxata supports SAML 2.0 identity providers (IdP), such as Ping Identity and a growing list of others. With SAML, the Paxata platform relies on the identity provider to authenticate the user and never has a copy of the user's password.
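The checks a service provider makes on a SAML 2.0 assertion can be sketched as below. This is an illustration only: `check_assertion` is a hypothetical helper, and it deliberately skips XML signature verification, which any real service provider must perform (normally through a maintained SAML library) before trusting the assertion. The sketch never sees a password, only the IdP's statement about who the user is and which attributes, such as groups, they have.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def _parse_time(value: str) -> datetime:
    # SAML timestamps are UTC xs:dateTime values such as "2016-05-09T12:05:00Z";
    # this sketch does not handle fractional seconds.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def check_assertion(xml_text: str, expected_issuer: str, audience: str,
                    now: datetime) -> dict:
    """Check issuer, validity window, and audience; return NameID and attributes.

    WARNING: XML signature verification is intentionally omitted. A real
    service provider must verify the IdP's signature before trusting anything.
    """
    root = ET.fromstring(xml_text)
    if root.findtext("saml:Issuer", namespaces=NS) != expected_issuer:
        raise PermissionError("unexpected issuer")

    conditions = root.find("saml:Conditions", NS)
    if conditions is None:
        raise PermissionError("missing Conditions")
    not_before = conditions.get("NotBefore")
    not_on_or_after = conditions.get("NotOnOrAfter")
    if not_before and now < _parse_time(not_before):
        raise PermissionError("assertion not yet valid")
    if not not_on_or_after or now >= _parse_time(not_on_or_after):
        raise PermissionError("assertion expired")

    audiences = [a.text for a in conditions.iterfind(
        "saml:AudienceRestriction/saml:Audience", NS)]
    if audience not in audiences:
        raise PermissionError("assertion not addressed to this service provider")

    attributes = {
        attr.get("Name"): [v.text for v in attr.iterfind("saml:AttributeValue", NS)]
        for attr in root.iterfind("saml:AttributeStatement/saml:Attribute", NS)
    }
    return {"name_id": root.findtext("saml:Subject/saml:NameID", namespaces=NS),
            "attributes": attributes}
```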

[Figure: LDAP, PKI, and SAML login flows between the web browser, Paxata, the LDAP server, and (for SAML) the SAML IdP. LDAP: the browser sends a username/password login request; Paxata checks with the LDAP server whether the user exists, whether the password is valid, and whether the user is a member of the required group(s). PKI: the browser presents a user certificate; Paxata checks with the LDAP server whether the user exists and is a member of the required group(s). SAML: Paxata redirects the login request to the SAML IdP, which checks the user credentials against an LDAP server and redirects back to the service provider (SP).]


User and group provisioning

Application administrators can choose how to provision users and groups for the Paxata data preparation platform. Manual provisioning is available in the administrative user interface or through REST API requests. For organizations that use LDAP or SAML authentication, dynamic provisioning is automatic. These choices accommodate the needs of both small and large organizations.

Manual provisioning

Manual provisioning means that an application administrator must create a user before that user can log in to the Paxata platform. Only an application administrator has permission to create a user. Manual provisioning is available in the user interface or through the REST API. The initial password must comply with the password complexity rules, which administrators can configure.
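Administrator-configurable password complexity rules might look like the sketch below. The rule set and the names `PasswordPolicy` and `policy_violations` are hypothetical; the whitepaper does not list Paxata's actual rules.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PasswordPolicy:
    """Hypothetical complexity rules an administrator could configure."""
    min_length: int = 12
    require_upper: bool = True
    require_lower: bool = True
    require_digit: bool = True
    require_symbol: bool = False


def policy_violations(password: str, policy: PasswordPolicy) -> List[str]:
    """Return human-readable rule violations; an empty list means compliant."""
    problems = []
    if len(password) < policy.min_length:
        problems.append(f"must be at least {policy.min_length} characters")
    if policy.require_upper and not re.search(r"[A-Z]", password):
        problems.append("must contain an uppercase letter")
    if policy.require_lower and not re.search(r"[a-z]", password):
        problems.append("must contain a lowercase letter")
    if policy.require_digit and not re.search(r"\d", password):
        problems.append("must contain a digit")
    if policy.require_symbol and not re.search(r"[^A-Za-z0-9]", password):
        problems.append("must contain a symbol")
    return problems
```

Returning every violation at once, rather than failing on the first, lets the administrative user interface show the whole list when an initial password is rejected.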

Administrators can configure a new group with one or more users. This group identity does not have to correspond to any existing group within the organizational directory. For example, a new manually created group could represent a cross-organizational group that evaluates Paxata for testing purposes, even if this is not a group defined in a central organization directory.

Dynamic provisioning

Dynamic user provisioning means that users are automatically added upon successful first-time authentication. When people join the organization, they need immediate access to corporate systems and internal data stores. Any time lost is money lost. Similarly, when people leave the organization, their access must be revoked to avoid serious security breaches.

These scenarios are simple with LDAP/SAML integration and dynamic provisioning. An administrator can grant access to Paxata by adding the user to a designated group in their directory service. To revoke or reduce access, administrators only need to remove group or role affiliation in the central directory.

A user’s group and role affiliations are read from the directory service during every successful login. Administrators can confidently enforce group and role access from their LDAP/SAML server.
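Dynamic provisioning and the per-login group refresh can be sketched as follows, assuming the LDAP or SAML server has already authenticated the user and reported their groups. `User`, `UserStore`, and `on_successful_login` are hypothetical names. The key design choice is that the directory's group list replaces the stored one rather than merging with it, so a removal in the central directory takes effect at the next login.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class User:
    username: str
    groups: FrozenSet[str] = frozenset()


@dataclass
class UserStore:
    """Hypothetical local user table kept by the application."""
    users: Dict[str, User] = field(default_factory=dict)

    def on_successful_login(self, username: str, directory_groups: Set[str],
                            access_group: str) -> User:
        """Provision on first login and refresh groups on every login.

        `directory_groups` is what the LDAP/SAML server reported for this login.
        """
        if access_group not in directory_groups:
            # Access was never granted, or was revoked, in the central directory
            raise PermissionError(f"{username!r} is not in {access_group!r}")
        user = self.users.get(username)
        if user is None:
            user = User(username)  # dynamic provisioning on first login
            self.users[username] = user
        user.groups = frozenset(directory_groups)  # replace, never merge
        return user
```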

LDAP: “The Lightweight Directory Access Protocol (LDAP) [RFC4510] is a powerful protocol for accessing directories. It offers means of searching, retrieving, and manipulating directory content and ways to access a rich set of security functions.”

Internet Engineering Task Force (IETF)

SAML: “To enable a business user to access many services with only one username/password combination, you can implement some form of single sign-on authentication (SSO). A popular standard for SSO is a standard called Security Assertion Markup Language (SAML). SAML is an XML-based, open-standard data format for exchanging authentication and authorization data between parties. SSO using SAML is similar to integrating with a centralized Active Directory system and configuring account provisioning.”

Paxata Administration Guide


Two-level authorization model

Within the Paxata platform, users interact with datasets and projects. Organizations care about two types of access controls:

• What actions can a user perform? Functional permissions define and restrict user actions. Functional permissions are granted to users through customizable roles. Examples of functional permissions are: import new datasets, view projects, edit projects, export datasets to local file. Typically, the functional permissions are defined by administrators using a central directory such as LDAP or SAML.

• What resources can a user access? Resource permissions allow fine-grained access to individual datasets and projects. An example resource permission is the right to view a specific dataset. Resource permissions are granted to users individually or through group membership.

Both roles and groups can be managed locally or through a remote directory service such as LDAP or SAML.

To perform a task, a user must have both functional and resource permissions. The two-level authorization model balances a user’s need for information access with the IT team’s need for governance and control.
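A minimal sketch of the two-level check, assuming hypothetical in-memory tables for roles, group membership, and resource grants (`AuthzModel` and the permission names are illustrative, not Paxata's API). An action succeeds only if one of the user's roles grants the functional permission and a user or group grant covers the specific resource.

```python
from dataclasses import dataclass, field


@dataclass
class AuthzModel:
    """Hypothetical in-memory tables for the two authorization levels."""
    role_permissions: dict = field(default_factory=dict)  # role -> functional perms
    user_roles: dict = field(default_factory=dict)        # user -> roles
    user_groups: dict = field(default_factory=dict)       # user -> groups
    resource_grants: dict = field(default_factory=dict)   # (resource, perm) -> principals

    def has_functional(self, user: str, action: str) -> bool:
        # Level 1: does any of the user's roles allow this kind of action?
        return any(action in self.role_permissions.get(role, set())
                   for role in self.user_roles.get(user, set()))

    def has_resource(self, user: str, resource: str, perm: str) -> bool:
        # Level 2: is this resource granted to the user or to one of their groups?
        principals = {f"user:{user}"} | {f"group:{g}" for g in self.user_groups.get(user, set())}
        return bool(principals & self.resource_grants.get((resource, perm), set()))

    def can(self, user: str, action: str, resource: str, perm: str) -> bool:
        # Both levels must pass; nothing is allowed by default.
        return self.has_functional(user, action) and self.has_resource(user, resource, perm)
```

Because group membership is read on every call, removing a user from a group revokes the access that group provided on the very next action.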

For example, a decentralized organization could define the following access-control model:

• Tenant administrator manages functional permissions for all users using LDAP.
• Individual user actions are secured by the tenant administrator.
• For each project, the tenant administrator designates a project lead, who manages resource permissions within each team of individual users.

This model works well in many large organizations because the project leads are the most familiar with their data and user responsibilities.

Granular access control to resources

There is no default access in the system granted to any users. Users gain access to actions and resources using explicit permissions granted by groups, roles, and resource permissions.

Users can only grant permissions that they already have. Additionally, any sharing of permissions requires an additional permission to share permissions. No one can ever elevate their own privileges, which prevents a class of serious security vulnerabilities. The only user that can grant all permissions in the system is the superuser.
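The granting rules above can be expressed as a small check. This is a sketch with hypothetical names (`Grants`, the `share` permission string, the `superuser` account name): nothing is allowed by default, a granter needs the permission to share, and a granter can only hand out permissions they already hold, which is what rules out self-elevation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

SHARE = "share"  # hypothetical name for the "permission to share permissions"


@dataclass
class Grants:
    """Hypothetical store of explicit permissions: (user, resource) -> perms."""
    perms: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    superuser: str = "superuser"

    def has(self, user: str, resource: str, perm: str) -> bool:
        if user == self.superuser:
            return True  # the only account that holds every permission
        return perm in self.perms.get((user, resource), set())  # no default access

    def grant(self, granter: str, grantee: str, resource: str, perm: str) -> None:
        if not self.has(granter, resource, SHARE):
            raise PermissionError("granter lacks the permission to share permissions")
        if not self.has(granter, resource, perm):
            # You can only pass on what you already hold, so no one can
            # grant themselves anything new (no self-elevation).
            raise PermissionError("granter can only grant permissions they hold")
        self.perms.setdefault((grantee, resource), set()).add(perm)
```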

Dynamic updates

Each time a user logs in, the platform verifies the user’s group and role membership. With LDAP or SAML configuration enabled, the Paxata server contacts the LDAP or SAML server for the correct set of group and role assignments.

Analyst: A business analyst would likely have functional permissions to search for, view, read, and modify projects and datasets. She would also have resource permissions to the specific projects she and her colleagues are working with and resource permissions to the specific datasets involved in those projects.

IT: An application administrator would likely have functional permissions like access to the REST API and the ability to modify and add user permissions.

Data curator: A data curator would likely have functional permissions to add new datasets, view and edit metadata, and share datasets. He would also have resource permissions for the datasets that he has uploaded to the Paxata Data Library.


Dynamic updates enforce a secure computing environment with centralized enterprise administration. For example, if an administrator removes access to a dataset for a specific group, a user action on that dataset fails immediately if it required privileges from that group assignment.

The Paxata platform verifies that the appropriate assigned permissions are granted for any action. By using groups and roles to define application and data access, administrators can tightly restrict access based on unique business requirements.

Multi-tenancy

Multi-tenancy refers to an administrator running multiple logical instances of a particular software application on shared infrastructure. One tenant typically represents an autonomous operational group of users. An example of a tenant would be a department, a subsidiary, or a branch office. Large enterprises today depend on multi-tenancy to make large software deployments cheap and easy to manage.

Because the Paxata platform supports multi-tenancy, tenants within the Paxata platform can reflect a real organizational hierarchy. Each distinct tenant contains its own datasets and projects. A user can never access the data, projects, or configurations of other tenants. The tenant boundary cleanly partitions and segregates confidential data for each tenant within the Paxata platform. Misapplied permissions do not become catastrophic security breaches across teams.

Each tenant maintains its own authentication settings, such as authenticating with native internal authentication, a SAML server, or an LDAP server. Large organizations can enforce diverse authentication requirements by department, such as different LDAP servers for each geography or subsidiary.
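A sketch of the tenant boundary and per-tenant authentication settings, using hypothetical tenant names, hosts, and field names. In this model the boundary check runs before any role, group, or resource check, so a misapplied permission inside one tenant cannot expose another tenant's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    auth_method: str       # hypothetical values: "native", "ldap", or "saml"
    auth_server: str = ""  # LDAP URL or IdP URL; empty for native authentication


@dataclass(frozen=True)
class Resource:
    tenant: str
    name: str


# Each tenant carries its own authentication settings (placeholder hosts).
TENANTS = {
    "emea": Tenant("emea", "ldap", "ldaps://ldap.emea.example.com"),
    "americas": Tenant("americas", "saml", "https://idp.americas.example.com"),
    "eval-lab": Tenant("eval-lab", "native"),
}


def check_tenant_boundary(user_tenant: str, resource: Resource) -> None:
    """Runs before any role, group, or resource check: no cross-tenant access."""
    if resource.tenant != user_tenant:
        raise PermissionError(
            f"{resource.name!r} belongs to tenant {resource.tenant!r}, not {user_tenant!r}")
```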

Administrators can customize other settings by tenant, some of which affect security. For example, an administrator can limit the type or quantity of data for a tenant to support only the required workflow for each team. Organizations might also configure other settings by tenant, such as defining hardware capacity by team so processing performance goes where it is most critical.

Inter-component authentication and access

Modern enterprise applications integrate with other components of an IT environment such as databases, file systems, and LDAP directories. The Paxata platform complies with standard security requirements to authenticate with any necessary IT components on the network. A typical deployment of the Paxata platform connects to a MongoDB database, one or more HDFS data stores, LDAP/SAML directories, and an Apache Spark cluster.

Multi-tenancy: In this example a Spark cluster with 8 nodes is a shared resource for 3 teams in an organization. Each of these teams has its own secured data store. The Paxata application server can translate requests from each organization to access the appropriate data and use a pool of Spark nodes for data transformation. In this case, the Marketing team’s data preparation requests access to the Marketing data and 4 Spark nodes. The Finance team’s data preparation requests access to the Finance data and 4 different Spark nodes in the cluster. The nodes are allocated dynamically to reflect requests as necessary.

[Figure: The Paxata Application Server connects the HR, Marketing, and Finance teams to their own data stores (HR Drive, Marketing Drive, Finance Drive) and to a shared pool of Spark nodes.]


HDFS

The Paxata platform integrates with HDFS for library storage and supports Kerberos-based authentication. For import and export from HDFS, the Paxata platform supports multiple authentication styles: fixed username, pass-through username, and mapped usernames. Optionally, administrators can configure Kerberos HDFS user impersonation. The HDFS server enforces data boundaries for impersonated users without the platform passing separate Kerberos credentials for each user. Paxata platform users see only the data they would normally see under their own user ID.
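The three HDFS user styles, plus impersonation, can be sketched as below. The enum values, the default account name, and the helper names are hypothetical; hostnames and paths are placeholders. `webhdfs_open_url` illustrates Hadoop's proxy-user mechanism through the WebHDFS `doas` parameter: the service authenticates once with its own Kerberos credentials and names the user to act as, and HDFS then applies that user's permissions (the cluster must allow this through its `hadoop.proxyuser.*` settings).

```python
from enum import Enum
from typing import Dict, Optional
from urllib.parse import quote, urlencode


class HdfsUserMode(Enum):
    """Hypothetical names for the three styles described above."""
    FIXED = "fixed"                # every request runs as one service account
    PASS_THROUGH = "pass_through"  # the Paxata username is used unchanged
    MAPPED = "mapped"              # a lookup table maps Paxata users to HDFS users


def resolve_hdfs_user(mode: HdfsUserMode, paxata_user: str,
                      fixed_user: str = "paxata",
                      mapping: Optional[Dict[str, str]] = None) -> str:
    if mode is HdfsUserMode.FIXED:
        return fixed_user
    if mode is HdfsUserMode.PASS_THROUGH:
        return paxata_user
    if mode is HdfsUserMode.MAPPED:
        if not mapping or paxata_user not in mapping:
            raise PermissionError(f"no HDFS mapping for {paxata_user!r}")
        return mapping[paxata_user]
    raise ValueError(f"unsupported mode: {mode}")


def webhdfs_open_url(namenode_url: str, path: str, as_user: str) -> str:
    """WebHDFS read URL that asks HDFS to act as `as_user` (proxy-user impersonation).

    The request itself is authenticated with the service's own Kerberos
    credentials; HDFS then enforces `as_user`'s file permissions.
    """
    return f"{namenode_url}/webhdfs/v1{quote(path)}?" + urlencode({"op": "OPEN", "doas": as_user})
```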

[Figure: Authentication and access within the Paxata architecture. Components: LDAP or SAML; Data Prep Application Web Services; Distributed Processing Engine (Parallel In-Memory Pipelined Data Prep Engine, powered by Intellifusion™); Data Management in Hadoop Distributed File System.]

MongoDB

MongoDB is a robust NoSQL database with wide adoption in enterprise organizations. The Paxata platform stores metadata in MongoDB. The platform supports three forms of authentication with MongoDB:

• Challenge-response (username/password)
• X.509 certificate
• Kerberos

LDAP/SAML

The Paxata platform itself authenticates to LDAP or SAML servers using simple username and password, with support for SSL (which is recommended).

Apache Spark

The Paxata platform uses Apache Spark for data processing. Spark enforces all data access authorization rules for each requesting user. For scheduled automation, the authorization rules are enforced at computation time on behalf of a specific user. The Paxata application supports the Apache Hadoop YARN resource manager. Spark can authenticate with YARN using Kerberos.
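The three MongoDB authentication options listed above map onto standard connection-string settings. This sketch only builds the URI; the host, database name, and helper name are hypothetical. `SCRAM-SHA-1` and `SCRAM-SHA-256` are MongoDB's current challenge-response mechanisms, `MONGODB-X509` and `GSSAPI` (Kerberos) authenticate against the `$external` database, and an X.509 client also needs its certificate (for example via the `tlsCertificateKeyFile` option). Older servers and drivers spell the TLS option `ssl=true`.

```python
from urllib.parse import quote_plus


def mongo_uri(host: str, mechanism: str, username: str = "", password: str = "",
              db: str = "paxata") -> str:
    """Build a TLS-enabled MongoDB connection string for one auth mechanism."""
    if mechanism.startswith("SCRAM-"):
        # Challenge-response: username/password, verified against `db`
        creds = f"{quote_plus(username)}:{quote_plus(password)}@"
        source = db
    elif mechanism in ("MONGODB-X509", "GSSAPI"):
        # X.509 and Kerberos identities live in the virtual $external database;
        # for X.509 the username (certificate subject) may be omitted
        creds = f"{quote_plus(username)}@" if username else ""
        source = "$external"
    else:
        raise ValueError(f"unsupported mechanism: {mechanism}")
    return (f"mongodb://{creds}{host}/{db}"
            f"?authMechanism={mechanism}&authSource={source}&tls=true")
```

Percent-encoding the credentials with `quote_plus` keeps characters such as `@` in a password or Kerberos principal from breaking the URI.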



“The Advanced Encryption Standard (AES) specifies a FIPS-approved cryptographic algorithm that can be used to protect electronic data. The AES algorithm is a symmetric block cipher that can encrypt (encipher) and decrypt (decipher) information. Encryption converts data to an unintelligible form called ciphertext; decrypting the ciphertext converts the data back into its original form, called plaintext.”

Federal Information Processing Standards Publication 197, November 26, 2001

Encryption

Encryption is critical for all enterprise software that deals with data. Data and credentials must be encrypted both at rest and in motion. When an organization's credentials or personally identifiable information (PII) are compromised, the consequences are severe.

A modern enterprise cannot be too careful in protecting identity assets. There are two main ways identity assets become compromised: a lack of strong encryption or a lack of good identity and access management. Paxata takes both risks seriously.

Credentials

The Paxata platform stores two types of credentials:

• Credentials that need to be decrypted for outbound connections. These credentials are encrypted using strong encryption (AES-128). Encryption keys are stored using a standard Java keystore for cryptographic keys and certificates.

• The Paxata platform supports native authentication for organizations with no LDAP or SAML infrastructure. Credentials are stored using a one-way cryptographic hash algorithm HMAC-SHA256.

All encryption and hashing algorithms support long key length (256 characters) and addition of cryptographic salt that can be managed by the administrator.
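Storing native-authentication passwords as a salted, one-way HMAC-SHA256 digest can be sketched with the standard library. This sketch uses PBKDF2-HMAC-SHA256, which iterates HMAC-SHA256 to slow down brute-force attempts; whether Paxata iterates the hash is not stated, and the iteration count is an assumed value to tune. Encryption of outbound credentials (AES-128 with keys in a Java keystore) is not shown.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); PBKDF2 iterates HMAC-SHA256 over the salted password."""
    if salt is None:
        salt = os.urandom(32)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

Only the salt and digest are stored; verifying a login recomputes the digest, so the plaintext password never needs to be kept or decrypted.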

All credentials are transmitted using TLS 1.2 when TLS is enabled. For example, if the Paxata platform is set up to authenticate to the MongoDB database using challenge-response and TLS encryption is enabled, then credentials are encrypted before they are sent to MongoDB.

Data encryption

Encryption at rest

Encrypting data is a more complex problem than encrypting credentials because of the large volume of information. Data encryption must be both effective and efficient.

The Paxata platform stores a great deal of data in the Paxata Data Library, which is typically kept in a dedicated folder on an HDFS cluster. There are two basic choices for data encryption:

• OS-level encryption of the data store: OS-level encryption has improved dramatically in the last decade thanks to the speed of modern CPUs and ongoing performance tuning in modern operating systems.

• Hadoop HDFS encryption: With HDFS encryption, everything written to an HDFS encryption zone is encrypted. Simple direct access to physical storage volumes is not enough to access the data. HDFS administrators can restrict directory permissions for the Paxata Data Library such that only the user ID used by the Paxata platform can access the data. The Paxata platform uses a Kerberos access ticket to access the data. Without access to the Paxata credentials/keytab, no one else can access the secure encrypted data used by the Paxata platform.

Both encryption options are widely adopted across the modern enterprise. When combined with strong access management policy, administrators can secure data at rest without multiple layers of encryption.

Encryption in motion

In the Paxata platform, datasets move between the Paxata application server, the Paxata Data Library (usually in HDFS), and a Spark cluster. If these three components are co-located in the same firewall partition, no other systems can eavesdrop on communication between them. For import and export, datasets move between the Paxata application server and various data stores, such as other HDFS servers or databases. Data also moves between the Paxata platform and the Paxata web application or REST API clients.

Paxata supports secured transmission of data with these components:

• With MongoDB, metadata can be accessed using TLS 1.2 connections.
• With a web browser, the Paxata platform transmits all communication using a TLS 1.2 connection.
• With other import/export systems, the Paxata platform supports the secured connection option that those environments support.
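On the client side of any of these connections, enforcing TLS 1.2 or later with certificate and hostname verification looks roughly like this in Python. The helper names are hypothetical and the sketch is not Paxata code; `negotiated_tls_version` needs network access and is shown only as a usage example.

```python
import socket
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context: verify the server and refuse anything below TLS 1.2."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables both checks; they are restated here
    # so that the security-relevant settings are visible in one place.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to host:port and report the negotiated protocol version."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. "TLSv1.2" or "TLSv1.3"
```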

Governance

Governance is as important as accuracy, for both internal best practices and external regulatory auditing. Data governance starts with defining owners of data assets. Clearly defined data custodians are then held accountable for access and integrity of data resources throughout the data lifecycle. It is critical that organizations track data assets throughout the data lifecycle in a fully auditable and transparent manner. Failing compliance reviews can result in severe consequences for the entire organization. When people make mistakes or if security breaches happen, it is important to trace the cause to fix the problem. Further, when people establish good practices, it is important to be able to repeat them precisely and reliably. Every action and resource in the Paxata platform is fully governed through accountability, auditability, transparency, lineage, and repeatability.

Accountability and permission-based access

As discussed in the identity and access management sections, the Paxata platform’s strict resource and functional permissions protect every asset. Permission-based access extends across the entire platform: the Data Library and its associated export log, all projects, and automation configurations. Beyond access, every resource has associated immutable metadata that includes its owner. Every resource is associated with a specific person for complete accountability:

• Every imported dataset, and each subsequent version, records who uploaded it to the Data Library
• Every export request for any dataset, whether it succeeded or failed, is tracked in the export log
• Every project is associated with the person who created or modified it
• Every version of every project is associated with the collaborator who produced it

Knowing who is responsible for any resource is only one part of the puzzle. Complete governance requires additional metadata that tracks changes over time for every resource; not just who did it, but what happened, and when.

Auditability and transparency

The Paxata platform treats every dataset and project as an auditable asset. Every dataset has immutable metadata that tracks who imported it, how it got there from the original source, which columns and records were imported, and when it was fully parsed into the Data Library. New uploaded versions become unique versions with separate metadata about their origin. Original data is untouched by subsequent versions. Distinct versions of both data and metadata are crucial for audits. Additionally, data curators and analysts explore and understand their Data Library with this metadata.

Data preparation steps that transform data do not modify the original dataset. Instead, new versions of the dataset are created with their own metadata and lineage to the original source. Administrators or other authorities can confidently audit the metadata for each dataset, which is critical for compliance audits.

Projects are the mechanism to transform, prepare, and explore data. Every action to the project is recorded as a step within a project. When a user makes any changes to the project, the previous version is preserved forever, and a new immutable time-stamped project version is created. Each project version includes a complete set of the steps and the state of the dataset at that moment. Everyone with permissions to view the project can audit the full history of the project and the details of any prior version. The complete and versioned history of every project helps analysts trace through their actions for repeatability and accountability. Teams can fix mistakes by finding and reverting back to the safe version.
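The append-only project versioning described above can be sketched as follows. `ProjectVersion` and `ProjectHistory` are hypothetical classes, and treating a revert as a new version that copies an earlier version's steps is an assumption consistent with the rule that previous versions are preserved forever.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ProjectVersion:
    number: int
    author: str
    created_at: datetime
    steps: Tuple[str, ...]  # the complete list of steps at this version


class ProjectHistory:
    """Hypothetical append-only version history for one project."""

    def __init__(self) -> None:
        self._versions: List[ProjectVersion] = []

    def commit(self, author: str, steps: Sequence[str]) -> ProjectVersion:
        version = ProjectVersion(len(self._versions) + 1, author,
                                 datetime.now(timezone.utc), tuple(steps))
        self._versions.append(version)  # earlier versions are never modified
        return version

    def revert(self, author: str, number: int) -> ProjectVersion:
        # A revert is itself a new version that copies an earlier version's
        # steps, so the mistaken version stays in the audit trail.
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no version {number}")
        return self.commit(author, self._versions[number - 1].steps)

    @property
    def versions(self) -> Tuple[ProjectVersion, ...]:
        return tuple(self._versions)  # read-only snapshot for auditors
```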

The Paxata platform tracks every action that makes data available to downstream systems. The export log visually displays a complete list of every exported data file and request to expose data through Hive and Impala. This list includes immutable metadata about when, where, and who is associated with each action. A functional permission controls access to view the export log, which ensures only administrators and auditors see potentially sensitive dataset names and metadata. To guarantee security and accountability, all user access to the export log is read-only.
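A minimal sketch of an export log that records both successful and failed exports and exposes only a read-only view behind a functional permission. `ExportEvent`, `ExportLog`, and the `view_export_log` permission name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ExportEvent:
    when: datetime
    who: str
    dataset: str
    destination: str  # e.g. a file path or a Hive/Impala table name
    succeeded: bool


class ExportLog:
    """Hypothetical append-only export log with a permission-gated, read-only view."""

    def __init__(self) -> None:
        self._events: List[ExportEvent] = []

    def record(self, who: str, dataset: str, destination: str, succeeded: bool) -> None:
        # Failed export attempts are logged as well as successful ones
        self._events.append(
            ExportEvent(datetime.now(timezone.utc), who, dataset, destination, succeeded))

    def view(self, viewer_permissions: Set[str]) -> Tuple[ExportEvent, ...]:
        if "view_export_log" not in viewer_permissions:  # hypothetical permission name
            raise PermissionError("viewing the export log requires a functional permission")
        return tuple(self._events)  # immutable tuple of frozen records
```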

Lineage and repeatability

Data governance within the Paxata platform is implemented through two different notions of lineage. Column lineage is an auditable list of the transformations applied to a column. Dataset lineage lets anyone with access to the dataset trace its source, such as an external source or a specific Paxata project. The full life cycle of every column of data is traceable. Users with the appropriate access permissions can review every action in a list of data transformations within a project. The original dataset is not directly modified. Instead, a project transforms data with an editable series of transformations.

The list of transformations is called the Step Editor. Teams can reuse a single step or a set of steps on different datasets. This means that one project’s Step Editor is a reusable template for future transformations on new incoming versions of datasets, or on datasets with a similar structure. An analyst can apply transformations to a new dataset without having to write any code. The Paxata platform applies existing transformations to the new data and immediately generates a new AnswerSet, which is the project output.

As individuals and teams build and maintain a project, it can be difficult to remember what changed. Every updated set of transformations in a project creates a new version, and users with the right access level can track changes over time. This prevents mistakes in general, and enables meaningful quality control. If a recent change caused problems, such as accidental deletion of transformation steps, users with writable access can revert the project to a previous version of the transformation steps. Versioning maintains traceability.

Within a set of steps, any individual transformation step can be parameterized. For example, a data quality check that is applied to a specific column of data can also be repeated and applied to any other column in the same dataset. This allows a transformation to be repeated across multiple columns of one or more datasets.
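Parameterizing a step can be pictured as a function that takes the column name as an argument and returns the step. The sketch below is hypothetical: `fill_missing` stands in for "a data quality check", and treating `None` or an empty string as missing is an assumption made only for this example.

```python
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def fill_missing(column: str, default: object) -> Step:
    """Hypothetical parameterized data quality step.

    Replaces None or empty-string values in `column` with `default`.
    The column name is a parameter, so one definition can be reused
    on any column."""
    def step(rows: list[Row]) -> list[Row]:
        return [
            {**r, column: default if r.get(column) in (None, "") else r[column]}
            for r in rows
        ]
    return step


def run(steps: list[Step], rows: list[Row]) -> list[Row]:
    for step in steps:
        rows = step(rows)
    return rows


# The same check, repeated across several columns of one dataset.
customers = [
    {"city": "", "country": "FR"},
    {"city": "Lyon", "country": None},
]
steps = [fill_missing(col, "unknown") for col in ("city", "country")]
cleaned = run(steps, customers)
```

Each call to `fill_missing` captures its own `column` value, so the generated steps stay independent even though they share one definition.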

The automation module of the Paxata data preparation platform enhances the reusability of data preparation projects. It gives analysts an intuitive interface for reusing projects or importing new versions of data into the Data Library. Trusted analysts can schedule automated data integration with repeatable extracts from databases, web services, and HDFS. IT teams can use these analyst-defined data integration and data transformation templates as virtualized views of data. Because these projects reflect relevant organizational needs, IT teams can use them to update or replace existing, out-of-date ETL processes.

As with all resources, only trusted analysts and administrators can control automation. Dedicated fine-grained permissions govern who can edit, view, and run automated processes. Additionally, each automation schedule records the user who requested it. The Paxata platform re-checks that user’s permissions on the project and dataset each time the job runs, which prevents automation jobs from running if the initiating user’s privileges have been reduced or revoked. Repeatable data integration and data transformations are coupled with the complete lineage and permission-controlled access that is present across the entire Paxata platform.
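The run-time permission re-check can be sketched as follows. The permission model here is an assumption for illustration only: `Acl` maps a (user, resource) pair to a set of allowed actions, and the action names `run` and `read`, like `ScheduledJob` and `run_job`, are hypothetical. The key point is that the job stores only the requesting user and checks that user's current grants when it executes, not when it was scheduled.

```python
from dataclasses import dataclass


@dataclass
class Acl:
    """Hypothetical permission store: (user, resource) -> set of actions."""
    grants: dict[tuple[str, str], set[str]]

    def allows(self, user: str, resource: str, action: str) -> bool:
        return action in self.grants.get((user, resource), set())


@dataclass(frozen=True)
class ScheduledJob:
    requested_by: str  # metadata captured when the schedule was created
    project: str
    dataset: str


def run_job(job: ScheduledJob, acl: Acl) -> str:
    """Re-check the requester's permissions at run time, not only at scheduling time."""
    if not acl.allows(job.requested_by, job.project, "run"):
        return "skipped: run permission on project revoked"
    if not acl.allows(job.requested_by, job.dataset, "read"):
        return "skipped: read permission on dataset revoked"
    return f"ran {job.project} on {job.dataset}"


acl = Acl({("ana", "sales_prep"): {"view", "run"}, ("ana", "orders"): {"read"}})
job = ScheduledJob("ana", "sales_prep", "orders")
first = run_job(job, acl)                 # permissions intact: the job runs

acl.grants[("ana", "orders")].discard("read")
second = run_job(job, acl)                # read access revoked: the job is skipped
```

Storing the requester rather than a snapshot of their permissions is what makes revocation take effect on the very next scheduled run.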

The Paxata self-service data preparation solution builds governance deeply into design, storage, and management. Administrators and data curators can rely on strong permission-based access, auditability, transparency, lineage, and repeatability.



Summary

Just as data is critical to any modern organization, securing data and application access is equally critical. Paxata delivers self-service data preparation to enterprise customers with the security features that modern organizations demand. Paxata is committed to delivering enterprise-grade security with industry-standard authentication, granular resource authorization, and comprehensive governance.

The Paxata platform maximizes IT investments by supporting industry security standards. Administrators can centrally manage authentication across team boundaries, geographies, time zones, and departments. Administrators have the flexibility to choose how to authenticate access to the Paxata self-service data preparation platform, how to deploy it, and how to configure it for every team. Alongside their other mission-critical applications, information workers access Paxata projects and data quickly and securely. Multi-tenancy is critical for the modern enterprise to manage large and diverse data platforms. Multi-tenant deployments save money, reduce risk, and enforce strong security boundaries. Furthermore, the Paxata platform provides comprehensive governance with accountability, dataset and project versioning, fine-grained step recording, detailed lineage, and complete repeatability for both column-level transformations and entire projects.

If you have questions about how Paxata delivers the security that your organization demands, please contact us at [email protected].

[Figure: Paxata’s Connected Information Platform (self-service, comprehensive, intelligent). Layers: User Experience (multi-user, multi-device, adaptive) with the Data Library UI, REST API, and Data Prep UI; Connected Information Management Application Web Services; Elastic Parallel In-Memory Pipelined Data Prep Engine powered by IntelliFusion. Labeled capabilities: comprehensive data quality, contextual semantic enrichment, transparent data governance, agile data collaboration, automated data integration, Semantic Catalog, automation, machine learning, scale-out, semantics, optimizer, columnar store, WLM, caching, operators. Data sources: packaged apps, RDBMS, data warehouse, XML docs, flat files, web services, third-party/external data, Hadoop/big data. Analytics and applications: predictive analytics, operational analytics, packaged applications, discovery and visualization.]