
Page 1

INTERNAL

Heet Rajesh Palod
August 03, 2020

Data Schema Registry
SAP iXp Project 2020

Page 2

INTERNAL © 2020 SAP SE or an SAP affiliate company. All rights reserved.

Setting the Agenda

• Why?

• Objectives & Goals

• Solution: Challenges & Outcome

• Business Impact

Page 3

Why? What’s the problem?

Page 4

What’s the problem?

SAP Concur has thousands of data schemas, at rest and in motion, across various systems. This creates the following problems for developers working with data:

• Lack of shared understanding of the privacy, security, and compliance requirements for storing and processing data.

• Lack of shared understanding of how data objects relate to each other.

Page 5

Objectives & Goals

Page 6

Objectives & Goals

• Collaborate and partner with different teams to gather requirements.

• Identify the pain points and overcome them by charting strategies and actionable solutions.

• Design and propose a new data modeling application that enables data sharing through a centralized, queryable source of schema knowledge.

• Develop an end-to-end application that automates decisions about handling data correctly as it is stored in, and moves between, systems.

• Minimize resource consumption (time, memory, effort, etc.).

Page 7

Solution: Challenges & Outcome

Page 8

Graph-Knowledge Powered Solution…

Challenge: Should we use a relational database? An RDBMS demands multiple tables with multiple foreign keys. Nested SQL queries and complex joins can become unwieldy when navigating through the data, and can perform poorly as the data grows over time.

Solution: Amazon Neptune, a fully managed graph database service, uses graph structures to represent and store data:

• nodes (data entities),
• edges (relationships), and
• properties.
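The sketch below makes the contrast concrete by modeling a small property graph in memory. Vertices and edges carry properties. Finding how two data objects are related becomes a breadth-first walk along adjacency lists, rather than one SQL join per hop. It is an illustration only, not Amazon Neptune's API. The class, its methods, the vertex names and the edge labels are all hypothetical.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Tiny in-memory property graph; a hypothetical illustration, not the Neptune API."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        self.out_edges = defaultdict(list)  # vertex id -> [(edge label, target id)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def path(self, start, goal):
        """Shortest chain of (from, label, to) hops from start to goal, or None.

        Each hop just follows an adjacency list; in a relational schema the
        same question needs one join per hop."""
        parent = {start: None}  # visited vertex -> (previous vertex, edge label)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                hops = []
                while parent[current] is not None:
                    prev, label = parent[current]
                    hops.append((prev, label, current))
                    current = prev
                return hops[::-1]
            for label, nxt in self.out_edges[current]:
                if nxt not in parent:
                    parent[nxt] = (current, label)
                    queue.append(nxt)
        return None


g = PropertyGraph()
g.add_vertex("employee", label="Domain Object", name="Employee")
g.add_vertex("report", label="Domain Object", name="Expense Report")
g.add_vertex("payment", label="Domain Object", name="Payment")
g.add_edge("report", "submitted_by", "employee")
g.add_edge("payment", "settles", "report")
print(g.path("payment", "employee"))
# [('payment', 'settles', 'report'), ('report', 'submitted_by', 'employee')]
```

In Neptune itself, the same question would be expressed as a graph query (for example, in Gremlin) against the cluster endpoint. This sketch only mimics the traversal idea.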

Page 9

Schema-managed Solution…

Challenge: How to manage and store the schema?

Solution: Model schema data as a graph, with data fields as vertices and relationships as edges.

[Figure: example schema graph. It has two domain objects (label = Domain Object), Employee and Employee Report, and three fields (label = Field): employee_id, employee_name and report_id. Edges are labeled belongs_to and contains. Example properties: the Employee vertex has name = Employee, source = employee_db, id = 2344-2123-12212342. The employee_id vertex has name = employee_id, data_type = long, classification = PII, id = 2323-3343-34343355.]
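The example schema in the figure could be held and queried roughly as follows. This minimal Python sketch uses plain dictionaries. The Employee and employee_id properties come from the figure. The other ids, the direction of the belongs_to edges (field to domain object) and the helper functions are hypothetical, and only belongs_to edges are modeled.

```python
# Vertices keyed by id. The Employee and employee_id entries use the values shown
# in the figure; the other ids are hypothetical placeholders.
vertices = {
    "2344-2123-12212342": {"label": "Domain Object", "name": "Employee",
                           "source": "employee_db"},
    "2323-3343-34343355": {"label": "Field", "name": "employee_id",
                           "data_type": "long", "classification": "PII"},
    "field-employee-name": {"label": "Field", "name": "employee_name"},
    "object-employee-report": {"label": "Domain Object", "name": "Employee Report"},
    "field-report-id": {"label": "Field", "name": "report_id"},
}

# Edges as (from id, edge label, to id); assumed direction: Field -belongs_to-> Domain Object.
edges = [
    ("2323-3343-34343355", "belongs_to", "2344-2123-12212342"),
    ("field-employee-name", "belongs_to", "2344-2123-12212342"),
    ("field-report-id", "belongs_to", "object-employee-report"),
]


def field_vertices(object_name):
    """Property dicts of the Field vertices that belong_to the named domain object."""
    return [
        vertices[src]
        for src, label, dst in edges
        if label == "belongs_to"
        and vertices[dst]["label"] == "Domain Object"
        and vertices[dst]["name"] == object_name
    ]


def fields_of(object_name):
    """Sorted field names of a domain object."""
    return sorted(v["name"] for v in field_vertices(object_name))


def classified_fields(object_name, level):
    """Sorted names of the object's fields whose classification equals level."""
    return sorted(v["name"] for v in field_vertices(object_name)
                  if v.get("classification") == level)


print(fields_of("Employee"))                  # ['employee_id', 'employee_name']
print(classified_fields("Employee", "PII"))   # ['employee_id']
```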

Page 10

RESTful API-Tier Solution…

Challenge: How to manage data interaction?

Solution: A RESTful API layer on top of the graph database lets users query and retrieve the data as and when they need it.

• PubSub/Data Platform engineer: can query the data classification level of a particular field or set of fields to decide:
    o Who can the data be shared with?
    o How can the data be shared with them?
    o What compliance responsibilities does the consumer take on by receiving this data?

• Privacy expert: can add or update the data classification of any data field.

• Data owner: can add or update relationships and other metadata about datasets.
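Here is a minimal sketch of the role-based access the API tier describes. It uses only the Python standard library, with in-memory dictionaries standing in for the graph database. The endpoint paths and role identifiers are assumptions for illustration, not the project's actual API. So is the choice to let every role read classifications.

```python
import re

# In-memory stand-ins for data held in the graph database (hypothetical values).
classifications = {"employee_id": "PII"}
metadata = {}


def get_classification(field, body):
    return {"field": field, "classification": classifications.get(field)}


def put_classification(field, body):
    classifications[field] = body["classification"]
    return {"field": field, "classification": classifications[field]}


def put_metadata(obj, body):
    metadata.setdefault(obj, {}).update(body)
    return {"object": obj, "metadata": metadata[obj]}


ALL_ROLES = {"data_platform_engineer", "privacy_expert", "data_owner"}

# (HTTP method, path pattern, roles allowed, handler); paths and roles are hypothetical.
ROUTES = [
    ("GET", r"/fields/([^/]+)/classification", ALL_ROLES, get_classification),
    ("PUT", r"/fields/([^/]+)/classification", {"privacy_expert"}, put_classification),
    ("PUT", r"/objects/([^/]+)/metadata", {"data_owner"}, put_metadata),
]


def handle(role, method, path, body=None):
    """Dispatch a request and return (HTTP status, response dict).

    Returns 403 if the route exists but the role may not use it, 404 if no route matches."""
    for route_method, pattern, roles, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if match and route_method == method:
            if role not in roles:
                return 403, {"error": "forbidden"}
            return 200, handler(match.group(1), body or {})
    return 404, {"error": "not found"}


print(handle("data_platform_engineer", "GET", "/fields/employee_id/classification"))
# (200, {'field': 'employee_id', 'classification': 'PII'})
print(handle("data_platform_engineer", "PUT", "/fields/employee_id/classification",
             {"classification": "Public"}))
# (403, {'error': 'forbidden'})
```

In a deployed service, these handlers would sit behind a web framework that authenticates the caller's role. Each handler would issue a graph query rather than read or write a dictionary.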

Page 11

Secured Solution… In line with AWS Migration Directive

Challenge: Is our solution secure?

Solution:

• An Amazon Neptune DB cluster can only be created in an Amazon Virtual Private Cloud (Amazon VPC).

• Its endpoints are only accessible within that VPC, usually from an Amazon Elastic Compute Cloud (Amazon EC2) instance running in that VPC.

• Security is therefore manageable: access is confined to the VPC.

Page 12

Technology Stack

Page 13

Business Impact

Page 14

Business Impact (1) – Speed, Storage and Security

Challenges
• Lack of understanding of how data objects are related.
• Slow query processing leads to a slow data attestation rate.
• Lack of security.

Solutions
• Purpose-built to store and navigate data objects and relationships.
• Graph queries boost processing speed and hence improve the data attestation rate.
• Running on AWS, inside a VPC, facilitates better security.

Page 15

Business Impact (2) – Data Governance and Effort

Reactive Approach
• Data governance-related activities are performed manually, time and again.
• Demands human effort and is hence prone to human error.

Proactive Approach
• Automate data governance-related activities to keep up with governments' changing data compliance laws and regulations.
• Reduces manual effort by >50%.

Page 16

Business Impact (3) – Data, Metadata and Structure

Challenges
• Contains duplicated data.
• Schema-less structure with no metadata, and hence requires a lot of memory storage.

Solutions
• De-duplicates data by maintaining a graph.
• The schema structure allows storing metadata and re-using existing data, and hence saves memory storage by >70%.
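One way a graph can de-duplicate schema data is to upsert vertices by a natural key. A field that appears in several schemas is then stored once and simply gains another edge. The Python sketch below assumes, for simplicity, that fields are keyed by name alone. A real registry would likely need a richer key, for example one that includes the source system. The class and method names are hypothetical.

```python
class SchemaGraph:
    """Hypothetical sketch: vertices are upserted by a natural key, so a field
    referenced by several schemas is stored once and linked by extra edges."""

    def __init__(self):
        self.vertices = {}  # key -> property dict
        self.edges = set()  # (from key, edge label, to key); a set ignores repeats

    def upsert_vertex(self, key, props):
        # Re-use the vertex if the key already exists, merging in any new metadata.
        self.vertices.setdefault(key, {}).update(props)
        return key

    def ingest_schema(self, object_name, fields):
        """Add one schema: a domain object plus {field name: field properties}."""
        obj = self.upsert_vertex(("Domain Object", object_name), {"name": object_name})
        for field_name, props in fields.items():
            # Simplifying assumption: a field is identified by its name alone.
            field = self.upsert_vertex(("Field", field_name), {"name": field_name, **props})
            self.edges.add((field, "belongs_to", obj))


g = SchemaGraph()
g.ingest_schema("Employee", {"employee_id": {"data_type": "long"}, "employee_name": {}})
g.ingest_schema("Employee Report", {"report_id": {}, "employee_id": {"data_type": "long"}})
print(len(g.vertices), len(g.edges))  # 5 4
```

Here employee_id is referenced by both schemas but stored as a single vertex with two belongs_to edges. The graph therefore holds 5 vertices instead of 6.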

Page 17

Thank You.

Heet Rajesh Palod
Software Developer, SAP iXp Intern
Seattle, WA

Contact information:
[email protected]
(206)-697-6374