IBsolution Academy – Webinar
BO – Data Services & Information Steward
12.11.2015, Goran Deliyski, IBsolution GmbH
This webinar is suitable for:
• application developers
• data consultants
• database administrators
• project managers
• data management solution architects
What you will learn:
• Basic concepts of SAP Data Services and how it works
• How to accomplish a duplicate check task, step by step
• Performance optimization techniques
Your host
Goran Deliyski
IBsolution Bulgaria
Data Services & Information Steward Consultant
Agenda
• Introduction to ETL
• Get to know SAP Data Services
• SAP Data Services Designer features and functionality
• Information Steward data profiling capabilities
• Performance optimization techniques
• Demo
• Questions and Answers
Introduction to ETL (extract, transform, load)
Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes.
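To make the three ETL phases concrete, here is a minimal sketch in plain Python. It is not Data Services code; the sample data and the function names `extract`, `transform` and `load` are assumptions made only for this illustration.

```python
import csv
import io

# Hypothetical source data; a real job would read a file or a database table.
RAW = "id,name\n1, alice \n2,BOB\n"

def extract(text):
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types, trim whitespace, normalize casing."""
    return [{"id": int(r["id"]), "name": r["name"].strip().title()} for r in rows]

def load(rows, target):
    """Load: append the cleaned rows to the target (a list stands in for a table)."""
    target.extend(rows)
    return len(rows)

target_table = []
loaded = load(transform(extract(RAW)), target_table)
```

In Data Services the same three phases are modeled graphically as sources, transforms and targets inside a data flow.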
SAP Data Services Designer Features and Functionality
Data Services object types
Projects (single use)
Jobs
Work flows
Data flows
Scripts
Sources and targets
File formats
Main transformations and their characteristics
Transformation categories:
• Data Integrator
• Data Quality
• Platform
• Text Data Processing
Main transformations and their characteristics
Query Transformation
• The most frequently used transformation
• Example: an already configured and mapped Query transformation inside a data flow
• Double-click it to open the editor
Query Transformation characteristics
• The Query transform is used to map source and target columns
• Numerous functions can be applied
• Select only unique rows
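The three characteristics above (column mapping, applying functions, selecting unique rows) can be sketched in plain Python. This is only an analogy to what the Query transform does; the column names and the `query` helper are hypothetical.

```python
# Hypothetical source rows; the second row is an exact duplicate of the first.
source_rows = [
    {"CUST_ID": 1, "FIRST": "anna", "LAST": "meier"},
    {"CUST_ID": 1, "FIRST": "anna", "LAST": "meier"},
    {"CUST_ID": 2, "FIRST": "ben", "LAST": "schulz"},
]

def query(rows):
    """Map source columns to target columns, apply a function, keep unique rows."""
    seen = set()
    output = []
    for row in rows:
        # Mapping: CUST_ID -> ID; function: concatenate and upper-case the name.
        mapped = (row["CUST_ID"], f'{row["FIRST"]} {row["LAST"]}'.upper())
        if mapped not in seen:  # equivalent of selecting only distinct rows
            seen.add(mapped)
            output.append({"ID": mapped[0], "FULL_NAME": mapped[1]})
    return output

result = query(source_rows)
```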
Query Transformation characteristics
• More than one source can be joined with a Query transformation
• The join condition is specified in the ‘From’ tab
• Filter conditions can be applied in the ‘Where’ tab
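As a rough analogy, joining two sources and then filtering the result looks like this in plain Python. The tables, column names and the `join_and_filter` helper are assumptions for illustration; in the Designer the same logic is configured in the ‘From’ and ‘Where’ tabs rather than written as code.

```python
# Two hypothetical sources.
customers = [{"id": 1, "name": "Anna"}, {"id": 2, "name": "Ben"}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250},
    {"order_id": 11, "customer_id": 2, "amount": 40},
    {"order_id": 12, "customer_id": 1, "amount": 90},
]

def join_and_filter(customers, orders, min_amount):
    """Inner join on customers.id = orders.customer_id, then filter by amount."""
    by_id = {c["id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"])  # join condition
        if customer is None:
            continue  # inner join: drop orders without a matching customer
        if order["amount"] >= min_amount:  # filter condition
            joined.append({
                "name": customer["name"],
                "order_id": order["order_id"],
                "amount": order["amount"],
            })
    return joined

big_orders = join_and_filter(customers, orders, min_amount=100)
```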
Main transformations and their characteristics
Case transformation
• The Case transformation separates input data rows into multiple output data sets
Numerous outputs are possible:
- 2 targets
- 3 or more targets
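The routing idea can be sketched in plain Python as follows. The labels, conditions and the `case_split` helper are hypothetical, and the sketch assumes each row goes to the first matching output only, with unmatched rows collected in a default output.

```python
def case_split(rows, rules, default="OTHER"):
    """Route each row to the output of the first rule whose condition is true."""
    outputs = {label: [] for label, _ in rules}
    outputs[default] = []
    for row in rows:
        for label, condition in rules:
            if condition(row):
                outputs[label].append(row)
                break
        else:
            # No rule matched: send the row to the default output.
            outputs[default].append(row)
    return outputs

rows = [{"id": 1, "country": "DE"}, {"id": 2, "country": "FR"}, {"id": 3, "country": "US"}]
rules = [
    ("GERMANY", lambda r: r["country"] == "DE"),
    ("FRANCE", lambda r: r["country"] == "FR"),
]
targets = case_split(rows, rules)
```

Here two rules plus the default produce three targets; adding more rules adds more outputs.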
Main transformations and their characteristics
Merge transformation
• The Merge transform combines incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets.
• The Merge transform performs a union. All sources must have the same schema, including:
• Number of columns
• Column names
• Column data types
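A minimal sketch of this behavior in plain Python: the schema is modeled as a tuple of (column name, data type) pairs, and the `merge` helper is hypothetical. The sketch assumes that duplicate rows are kept, as a SQL UNION ALL would keep them.

```python
def merge(*datasets):
    """Union data sets that share one schema; reject any schema mismatch."""
    schemas = [schema for schema, _ in datasets]
    # Comparing the tuples checks column count, names and data types at once.
    if any(schema != schemas[0] for schema in schemas[1:]):
        raise ValueError("all inputs must share the same schema")
    merged = []
    for _, rows in datasets:
        merged.extend(rows)  # duplicates are kept (assumption of this sketch)
    return schemas[0], merged

SCHEMA = (("id", "int"), ("name", "varchar"))
north = (SCHEMA, [(1, "Anna")])
south = (SCHEMA, [(2, "Ben"), (1, "Anna")])

schema, rows = merge(north, south)
```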
Main transformations and their characteristics
Data Cleanse Transformation
The Data Cleanse transform is used to perform parsing and standardizing.
• Parsing identifies individual data elements and breaks them down into their component parts. It rearranges data elements in a single field or moves multiple data elements from a single data field to multiple discrete fields.
• Standardization includes business rules around formats, abbreviations, acronyms, punctuation, greetings, casing, order, and pattern matching – all examples of elements you can control to meet your business requirements.
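To illustrate the difference between parsing and standardization, here is a deliberately simple sketch in plain Python. The title dictionary, the output field names and the `cleanse_name` helper are assumptions; the real transform relies on much richer cleansing rules than this.

```python
# Hypothetical standardization rules for titles.
TITLES = {"dr": "Dr.", "prof": "Prof.", "mr": "Mr.", "ms": "Ms."}

def cleanse_name(raw):
    """Parse one free-text name field into discrete, standardized fields."""
    tokens = raw.replace(",", " ").split()
    title = ""
    # Parsing: detect a leading title and move it into its own field.
    if tokens and tokens[0].lower().rstrip(".") in TITLES:
        # Standardization: replace the title with its agreed abbreviation.
        title = TITLES[tokens.pop(0).lower().rstrip(".")]
    # Standardization: consistent casing for the name parts.
    first = tokens[0].capitalize() if tokens else ""
    last = " ".join(t.capitalize() for t in tokens[1:])
    return {"TITLE": title, "FIRST_NAME": first, "LAST_NAME": last}

parsed = cleanse_name("dr anna MEIER")
```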
Data Cleanse Transformation
Three tabs:
• Input: map the fields you want to standardize and/or parse
• Options: define the standardization logic you want to apply to a whole column
• Output: contains numerous output columns
Data Cleanse Transformation
Use case example
• Source table content
• Selected Output fields
• Result of standardization and parsing
Main transformations and their characteristics
Match transformation
• Match criteria:
Match criteria refer to the fields you want to match on. You can use criteria options to specify business rules for matching on each of these fields. They allow you to control how close to exact the data needs to be for it to be considered a match.
• Match score
• No match score
• Contribution
• Output tab
• Input tab
• Options tab
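One way to picture how a match score, a no-match score and per-criterion contributions could interact is the simplified weighted-scoring sketch below. It is an assumption-laden illustration, not the Match transform's actual algorithm: the thresholds, the weights, the "suspect" band and the use of `difflib.SequenceMatcher` as the similarity function are all choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical criteria: each field contributes a share of the total score.
CRITERIA = [
    {"field": "name", "contribution": 0.6},
    {"field": "city", "contribution": 0.4},
]
MATCH_SCORE = 85      # at or above this weighted score: treat as a match
NO_MATCH_SCORE = 50   # at or below this weighted score: treat as no match

def similarity(a, b):
    """Similarity of two strings on a 0-100 scale (case-insensitive)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def weighted_score(rec_a, rec_b):
    """Sum of per-field similarities, weighted by each field's contribution."""
    return sum(
        c["contribution"] * similarity(rec_a[c["field"]], rec_b[c["field"]])
        for c in CRITERIA
    )

def classify(rec_a, rec_b):
    score = weighted_score(rec_a, rec_b)
    if score >= MATCH_SCORE:
        return "match"
    if score <= NO_MATCH_SCORE:
        return "no match"
    return "suspect"  # in between: needs review or further criteria

a = {"name": "Anna Meier", "city": "Berlin"}
b = {"name": "anna meier", "city": "BERLIN"}
c = {"name": "Q", "city": "Z"}
```

Raising the match score makes the duplicate check stricter; shifting contribution toward one field makes that field dominate the decision.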
SAP Information Steward
Data Insight use cases
Data quality analysis in the form of:
• Data profiling
• Validation rules: a type of business rule that checks whether the data complies with business constraints and requirements
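As a sketch of what a validation rule does, the following plain-Python example checks one hypothetical business constraint (a postal code must consist of exactly five digits) and reports the share of rows that pass. The rule, the column name and the `apply_rule` helper are assumptions for illustration, not Information Steward syntax.

```python
import re

def rule_postal_code(value):
    """Hypothetical rule: the value must be exactly five digits."""
    return value is not None and re.fullmatch(r"\d{5}", value) is not None

def apply_rule(rows, column, rule):
    """Apply a rule to one column; return the pass rate and the failed rows."""
    failed = [row for row in rows if not rule(row.get(column))]
    passed = len(rows) - len(failed)
    score = passed / len(rows) if rows else 1.0
    return score, failed

rows = [
    {"id": 1, "zip": "69115"},
    {"id": 2, "zip": "6911"},    # too short
    {"id": 3, "zip": "10115"},
    {"id": 4, "zip": "80331"},
]
score, failed = apply_rule(rows, "zip", rule_postal_code)
```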
SAP Information Steward
Create new rule
• Parameters and Conditions
• Rule binding
Performance Optimization
ETL performance is about efficiency: the ETL and database engines process data faster when they perform fewer costly operations.
Let the database do the hard work!
• Push-down operations
Applicable to database sources and targets.
To optimize performance, the software pushes down as many SELECT operations as possible to the source database and combines as many operations as possible into one request to the database.
Operations that can be pushed down within the SELECT:
Aggregations, distinct rows, filtering, joins, ordering, projection, functions
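The effect of a push-down can be illustrated with Python's built-in `sqlite3` module standing in for the source database. The table and data are hypothetical. Both variants produce the same totals, but the second lets the database filter and aggregate, so only the small result set crosses the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 50), ("south", 70), ("south", 5)],
)

# Without push-down: fetch every row, then filter and aggregate in the ETL engine.
totals_local = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    if amount >= 10:
        totals_local[region] = totals_local.get(region, 0) + amount

# With push-down: filtering, aggregation and ordering happen in one SELECT.
totals_pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount >= 10 GROUP BY region ORDER BY region"
))
```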
Performance Optimization
Improving throughput
• Use caching as much as possible to limit the number of times the system must access the database
• Bulk load to the target
• Minimize extracted data
• Increase the degree of parallelism
• Change the array fetch size
• Increase the rows per commit
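The ideas behind array fetch size and rows per commit can be sketched with Python's built-in `sqlite3` module standing in for source and target. In Data Services these are configuration settings rather than code; the tables, the row count and the single `BATCH_SIZE` used for both fetching and committing are assumptions of this sketch.

```python
import sqlite3

BATCH_SIZE = 1000  # plays the role of both array fetch size and rows per commit

# Hypothetical source with 2,500 rows.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE src (id INTEGER, val TEXT)")
src.executemany("INSERT INTO src VALUES (?, ?)", ((i, f"v{i}") for i in range(2500)))
src.commit()

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE tgt (id INTEGER, val TEXT)")
tgt.commit()

cursor = src.execute("SELECT id, val FROM src")
commits = 0
while True:
    batch = cursor.fetchmany(BATCH_SIZE)  # fetch many rows per round trip
    if not batch:
        break
    tgt.executemany("INSERT INTO tgt VALUES (?, ?)", batch)
    tgt.commit()                          # one commit per batch, not per row
    commits += 1

row_count = tgt.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
```

Larger batches mean fewer round trips and fewer commits, at the cost of more memory per batch and more work lost if a batch fails.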
IBsolution Academy Certificate
Individual certificate for every attendee:
• Watch the webinar
• Take the multiple choice test
• Answer at least 8 out of 10 questions correctly
Take the test: http://bit.ly/1Hq6qkH