sql 2012 dqs

48
Data Quality Services @LynnLangit

Upload: lynn-langit

Post on 25-Dec-2014

1.949 views

Category:

Technology


0 download

DESCRIPTION

deck from DevTeach V

TRANSCRIPT

Page 1: SQL 2012 DQS

Data Quality Services

@LynnLangit

Page 2: SQL 2012 DQS

Breakthrough Insights = Better BIP

ower

Vie

w

New

sem

antic

serv

er m

odel

for S

SA

S

Dat

a Q

ualit

y S

ervi

ces

Mas

ter D

ata

Ser

vice

s

Col

umn-

stor

e

Inde

x (1

0x-

100x

fast

er)

Sem

antic

S

earc

h &

Fi

leta

ble

Page 3: SQL 2012 DQS

What is Data Quality Services?

A set of tools and services that allow domain experts to improve Data

Quality• Produces result set with suggested improvements• Does NOT change source data

Page 4: SQL 2012 DQS

Why Use DQS?

• Manually define, match , cleanseSME input

• Programmatically “”, then manually approve

• Can ‘learn’

Machine Cleansing

• Can incorporate 3rd party data• Can integrate with other data processes

(SSIS)Integration

Page 5: SQL 2012 DQS

When to use DQS (scenarios)Issue Detail

Completeness Is all information present?

Conformity Is all data in the correct format?

Consistency Do values represent the same meaning?

Accuracy Do data objects represent their real-world values?

Validity Do data values fall within acceptable ranges?

Duplication Are there multiple copies of the same data?

Page 6: SQL 2012 DQS

DQS Architecture

Page 7: SQL 2012 DQS

Installing DQS

SQL Server 2012

BI Edition

Enterprise edition

Not installed by default

Client / Server / SSIS task

Grant 1 of 3 DQS roles on the DQS_Main db

Make your data accessible for SQL operations

Enable TCP/IP for remote DQS

Post Install

Must run ‘DQS Server Installer’ post SQL Install

Do MDS integration

DQS CU1

Page 8: SQL 2012 DQS

DQS Components on SQL Server 2012

Page 9: SQL 2012 DQS

Data Quality Services client interface

Page 10: SQL 2012 DQS

How to Use DQS?

List of Basic Steps• Create/Refine/Use a Knowledge Base• Perform a Data Quality Evaluation• Generate output (results)

• List of Components• DQS Server• DQS Client(s)

Page 11: SQL 2012 DQS

How to Use DQS? Start with the KB

Knowledge Bases

• Can use included KB• Can refine included KB• Can create KB from

source data• Can manually create

KB

Page 12: SQL 2012 DQS

Parts of DQS KB – Domain Management

Page 13: SQL 2012 DQS

Adding Domain Values

• Correct• Error• Invalid

Page 14: SQL 2012 DQS

More on Domain Values

• Link as synonyms• Set as leading value

Page 15: SQL 2012 DQS

Regular or Composite Domains

Page 16: SQL 2012 DQS

More about KB Domain Management

• Domain Properties – Description, Language…• Reference Data – relate to 3rd party data• Domain Rules – RegEx/length, etc…rule-based• Domain Values – shows substitute values• Term-Based Relations – common word corrections

Page 17: SQL 2012 DQS

Parts of DQS KB – Knowledge Discovery

Page 18: SQL 2012 DQS

Parts of DQS – Knowledge Discovery – 1/2

Step two – Running Discovery

Page 19: SQL 2012 DQS

Parts of DQS KB – Knowledge Discovery – 2/2

Step three – Correcting Values

Page 20: SQL 2012 DQS

DQS KB – Creating a Matching Policy – 1/3

• Select data to be matched for each domain

Page 21: SQL 2012 DQS

DQS KB – Creating a Matching Policy – 2/3

• Create matching rules per domains• Similar

• set similarity score, when matching score < 60• For numbers, set threshold (% or int)• For dates, set threshold (DD, MM or YY)

• Exact – identical values (score of 100)• Configure Weight, must sum to 100

• Can configure Prerequisites

Page 22: SQL 2012 DQS

DQS KB – Creating a Matching Policy -3/3• Test matching rules per domains

• Click ‘Start’• Review ‘Matching Results’ tabs to compare one or more results

Page 23: SQL 2012 DQS

Matching – See Results

Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates

Page 24: SQL 2012 DQS

More Matching Output

Page 25: SQL 2012 DQS

Using the DQS KB to do Data Cleaning

• Create or Open a Data Quality Project• Map the DQS KB to the new data• Perform Cleansing• Manage / View Results• Export corrected results

Page 26: SQL 2012 DQS

DQS Project -- Cleansing

Page 27: SQL 2012 DQS

DQS Cleaning in Process…

Page 28: SQL 2012 DQS

DQS Cleaning complete

Page 29: SQL 2012 DQS

DQS Cleaning – Manage Results

Page 30: SQL 2012 DQS

DQS Output file Information

Export file column names (with option to include "Data and Cleansing Info“)

XXX_Source - original source column  value XXX_Output - clean column  value XXX_Reason - reason column value was either valid or invalid XXX_Confidence - column confidence percentage returned by the DQS

server algorithms XXX_Status - column processing status (i.e. Correct, New, Invalid, etc.) 

Page 31: SQL 2012 DQS

DQS Administration - General

Page 32: SQL 2012 DQS

DQS Administration – Reference Data

Page 33: SQL 2012 DQS

DQS Administration - Logging

Page 34: SQL 2012 DQS

DQS Integration

List of Integration Points• API? – not at this time• SSIS task• MDS (Master Data Management)

Page 35: SQL 2012 DQS

DQS Cleansing Task in SSIS

Page 36: SQL 2012 DQS

DQS Cleansing Task in SSIS - mapping

For each input column define columns for • Source – contains input values• Output – contains correct or

corrected or invalid output values• Status – contains auto suggest,

correct, invalid or new

Page 37: SQL 2012 DQS

Running Package Status

Page 38: SQL 2012 DQS

DQS SSIS Task

Page 39: SQL 2012 DQS

DQS SSIS Task Complex Example

Page 40: SQL 2012 DQS

What is Master Data Management?

Defining MDS• Central repository for data• Rule-based• Can work with DQS

Page 41: SQL 2012 DQS

Types of Data Quality Projects

• Exact matches (WHERE = WHERE <> WHERE IN)• LIKE (%string matching)

T-SQL scripts (boolean match)

• CONTAINSFull-text matching

(semantic word match)

• SEMANTICSIMIALARITIESTABLESemantic Search (semantic phrase match)

• List belowSSIS tasks - (transactional,

multi-valued matching)

• Knowledge Base - rules/matches• Data Quality project - clean / correct dataDQS (KB matching)

• Versioned Entities, Attributes and RulesMDS (One view of truth)

Page 42: SQL 2012 DQS

New since RC0

• Use knowledge import from projects back to your knowledge base (KB) with Cleanse2KB

• Use the Office speller as part of the DQS client• Use Composite Domain rules

to correct values to detect rules violations

• Import values from Excel import values together with their synonyms

• Use unstructured composite domain values? KB parsing is a new feature that takes advantage of your knowledge for a

more accurate parsing

• Modify server log settings through the client UI

Page 43: SQL 2012 DQS

Performance Information

Page 45: SQL 2012 DQS

Next Steps

• Install DQS

• Create a KB

• Try out Data Cleansing

Page 46: SQL 2012 DQS

Related Session(s)

• SQL BI SQL 366 - Understanding Analysis Services in SQL Server 2012 SQL 422 – Integrating Spreadsheets with Enterprise Data SQL 245 - Why Data Warehousing Projects Fail

Page 47: SQL 2012 DQS

www.TeachingKidsProgramming.org

Do a Recipe Teach a Kid (Ages 10 ++)Microsoft SmallBasic Free Courseware (recipes)

Page 48: SQL 2012 DQS

Keep up with Data

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me• To help build your BI/Big Data

solution• To teach your team next gen BI

with SQL Server 2012