Data Quality Services in SQL Server 2012

Upload: grant-holly

Post on 24-Dec-2014

DESCRIPTION

DQS in SQL Server 2012 gives your users tools to help you (yes, you, the developer) validate and cleanse your data in the ETL queue.

TRANSCRIPT

Page 1: Data Quality Services in SQL Server 2012

Data Quality Services

in SQL Server 2012 Data Tools

Page 2: Data Quality Services in SQL Server 2012

A bit about me

Hi, I’m Grant Holly
I’m from Portland, OR
I love the outdoors, good beers, mathematics, metal working, and I’m a huge SQL geek

Page 3: Data Quality Services in SQL Server 2012

Pictures! + Relevant XKCD

I SQL

Page 4: Data Quality Services in SQL Server 2012

What I do

Technical training and consulting in:
DB administration
DB development
DB performance tuning
ETL / SSIS
SSAS cubes and tabular models
SSRS reports

Page 5: Data Quality Services in SQL Server 2012

How to get ahold of me

Email: [email protected]
Phone (US): 360.355.7861

Page 6: Data Quality Services in SQL Server 2012

What are we doing today?

What is this DQS thing?
Why not just keep doing what we’re doing?
OK, so why use it?
You mean other people can help me enforce data quality?
I’m sold! Let’s set this bad boy up!
Wait, I have a question.

Page 7: Data Quality Services in SQL Server 2012

What is “Data Quality?”

Businesses need reliable data
Reliability means both accuracy and availability
Quality issues: invalid data, inconsistent format, duplication, etc.
Especially important as the number and diversity of data sources increase

Page 8: Data Quality Services in SQL Server 2012

What is Data Quality Services?

New to SQL Server 2012 Data Tools (BIDS)
Designed to ease data stewardship
Designed to be usable by end users
Spread the data validation work (love?) out
Integrates into the ETL process

Page 9: Data Quality Services in SQL Server 2012

How does it work?

“Domain based” focus
Uses knowledge bases
Removes erroneous or improperly formatted data
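
To make the "domain plus knowledge base" idea concrete, here is a minimal sketch in Python. It is not DQS code and does not use any DQS API: the `Domain` class, the `State` domain, and the sample values are all hypothetical, and the three status labels are just names picked for illustration. It only shows the general pattern of checking a value against what a domain knows and either accepting it, fixing it, or flagging it.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for one knowledge-base domain (e.g. 'State')."""
    name: str
    valid_values: set[str]
    # Known bad spellings (lowercased) mapped to the value they should become.
    corrections: dict[str, str] = field(default_factory=dict)

    def cleanse(self, raw: str) -> tuple[str, str]:
        """Return (value, status); status is 'correct', 'corrected', or 'invalid'."""
        value = raw.strip()
        if value in self.valid_values:
            return value, "correct"
        fixed = self.corrections.get(value.lower())
        if fixed is not None:
            return fixed, "corrected"
        # Unknown to the domain: keep the value but flag it.
        return value, "invalid"


# Illustrative domain contents, not taken from any real knowledge base.
state = Domain(
    name="State",
    valid_values={"OR", "WA", "CA"},
    corrections={"oregon": "OR", "ore.": "OR", "wash": "WA"},
)

results = [state.cleanse(v) for v in [" OR", "Oregon", "Wash", "Orgon"]]
for value, status in results:
    print(f"{value!r:10} {status}")
```

Growing the knowledge base over time would, in this sketch, amount to adding entries to `valid_values` and `corrections` as users review the values flagged as invalid.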

Page 10: Data Quality Services in SQL Server 2012

How does it work?

Choose from reference knowledge bases
Explore a sample of data
Data stewards make the call on “liners”
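
One way to picture the steward's role is as a triage step: values the tool is sure about get handled automatically, and the borderline ones are queued for a person. The sketch below assumes that reading of the slide. It uses Python's standard `difflib` for string similarity, and the two cut-off values, the city list, and the status names are hypothetical choices for illustration, not DQS settings.

```python
from difflib import SequenceMatcher

KNOWN_CITIES = ["Portland", "Seattle", "Olympia"]  # hypothetical domain values
AUTO_CORRECT = 0.9  # hypothetical: at or above this, fix without asking
SUGGEST = 0.6       # hypothetical: at or above this, ask a data steward


def best_match(raw: str) -> tuple[str, float]:
    """Return the closest known value and its similarity score in [0, 1]."""
    scored = [
        (SequenceMatcher(None, raw.lower(), city.lower()).ratio(), city)
        for city in KNOWN_CITIES
    ]
    score, candidate = max(scored)
    return candidate, score


def triage(raw: str) -> str:
    """Classify a raw value by how confidently it maps to a known value."""
    _, score = best_match(raw)
    if score == 1.0:
        return "correct"
    if score >= AUTO_CORRECT:
        return "auto-corrected"
    if score >= SUGGEST:
        return "needs steward review"
    return "invalid"


for raw in ["Portland", "Portlnd", "Portlund", "Xyz"]:
    print(f"{raw:10} {triage(raw)}")
```

Moving the two cut-offs closer together sends fewer values to the steward; moving them apart sends more.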

Page 11: Data Quality Services in SQL Server 2012

A picture!

Page 12: Data Quality Services in SQL Server 2012

How does it work?

Can find and fix duplicated data
Duplicates evaluated based on policies and thresholds
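
A rough way to think about "policies and thresholds": a policy says which fields count and how much, and a threshold says how similar two records must be before they are treated as duplicates. The Python sketch below illustrates that idea only. The field weights, the 0.85 threshold, the record layout, and the `example.com` addresses are all made up for the example, and the similarity measure is `difflib` from the standard library rather than whatever DQS uses internally.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical matching policy: field name -> weight (weights sum to 1.0).
POLICY = {"name": 0.5, "email": 0.3, "city": 0.2}
MATCH_THRESHOLD = 0.85  # hypothetical minimum score to call two records duplicates


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities, as defined by POLICY."""
    return sum(weight * similarity(r1[f], r2[f]) for f, weight in POLICY.items())


def find_duplicates(records: list[dict]) -> list[tuple[int, int]]:
    """Return id pairs whose weighted score meets the threshold."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if match_score(a, b) >= MATCH_THRESHOLD
    ]


records = [
    {"id": 1, "name": "Grant Holly", "email": "grant@example.com", "city": "Portland"},
    {"id": 2, "name": "Grant Holy", "email": "grant@example.com", "city": "Portland"},
    {"id": 3, "name": "Dana Smith", "email": "dana@example.com", "city": "Seattle"},
]

print(find_duplicates(records))
```

Comparing every pair is fine for a sample like this but grows quadratically with the number of records, so a real tool would need to narrow the candidate pairs first.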

Page 13: Data Quality Services in SQL Server 2012

Another picture!

Page 14: Data Quality Services in SQL Server 2012

What did we do / are we doing in place of DQS?

Lookups
PL/SQL / T-SQL
Scripting

Page 15: Data Quality Services in SQL Server 2012

How can DQS help?

Dedicated tool for validation
Interface aimed at end-users
Offload work from ETL stream

Page 16: Data Quality Services in SQL Server 2012

Cleansing data

Validating data
Knowledge bases start out pretty good!
Knowledge bases grow over time with user input
Less complicated than scripting

Page 17: Data Quality Services in SQL Server 2012

De-duplicating data

Define policies
Define thresholds
Much less overhead than using lookups!

Page 18: Data Quality Services in SQL Server 2012

Reference Data Services

Can integrate trusted 3rd party data
Data used as a reference to check against
Can be built into knowledge bases
Works with Azure Marketplace (if you’re into that kind of thing)

Page 19: Data Quality Services in SQL Server 2012

Automation through SSIS

DQS Cleansing transform in SSIS
Can be used in-line with other ETL packages
Implement DQS on data sources or destinations
Automate DQS with SQL Server Agent

Page 20: Data Quality Services in SQL Server 2012

What’s the catch?

Interface feels a little “1.0”
Requires a DQS database to be set up
Admins still have to define knowledge bases and policies / thresholds for de-duplication

Page 21: Data Quality Services in SQL Server 2012

Requirements

DQS client install
DQS database connected to clients
User training
Stakeholder collaboration

Page 22: Data Quality Services in SQL Server 2012

Recap

DQS gives stewards tools to put quality-validated data in-line with your ETL process

Specialized tool to offload validation and de-duplication

Can cleanse and de-duplicate data
Can be in-lined with DQS transform in SSIS
Can be automated through SQL Server Agent

Page 23: Data Quality Services in SQL Server 2012

OK, real questions