
November 10th, 2011

DQS BOOTCAMP

DAVID FAIBISH, SENIOR PROGRAM MANAGER

SQL SERVER DATA QUALITY SERVICES

Microsoft

SQL Server 2012

Our Day Together …

8:15 – 9:00 DQS Overview (David)
9:00 – 11:45 Knowledge Mgmt & Cleansing (Sharon)
11:45 – 12:45 Lunch
12:45 – 14:30 Matching (Gadi)
14:30 – 15:00 SSIS (David)
15:00 – 15:45 Customer Stories & Market Opportunities (David)
16:00 – 16:30 MDS/DQS Integration (Andi)
16:30 – 17:00 Summary, Feedback and Q&A (Yossi)
17:00 – BYOD & DIY (with help)

DATA QUALITY 101

What is Data Quality

Data Quality represents the degree to which data is suitable for business use

Data Quality is built through People + Technology + Processes

Bad Data, Bad Business

Why Data Quality is Important

Top 3 impediments

Source: Information Week Reports, 2011

Why Data Quality is Important

Top Barrier for BI

Source: Information Week Reports, 2011

Why Data Quality is Important

DQ is MDM top driver

Source: Information Week Reports, 2011

DQ Market – A Brief Overview

Demand is on the rise. The overall market size for DQ software in 2010 was $800M, a 12.6% increase over 2009. Forecasted growth is 16% per year over the next five years.

- Gartner, 2011

It’s not only the breadth of functional capabilities. Focus on the business user. Leverage your business resources.

- Gartner, 2011

Business process: for data quality (and MDM) initiatives to succeed, they need to integrate with existing business processes

Vendor share chart, values as extracted: 20.1%, 15.9%, 15.2%, 13.0%, 5.3%, 30.4%. Legend: SAS Institute, IBM, Informatica, SAP, QAS, Other Vendors.

Data Integration market ($2.6B in 2009). Source: Gartner

Common Data Quality Issues

Data Quality Issue | Question | Sample Data Problem

Standard | Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system

Complete | Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999

Accurate | Does the data accurately represent reality or a verifiable source? | A supplier is listed as ‘Active’ but went out of business six years ago

Valid | Do data values fall within acceptable ranges? | Salary values should be between 60,000 and 120,000

Unique | Does data appear several times? | Both John Ryan and Jack Ryan appear in the system: are they the same person?
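
The Standard, Complete and Valid rows above can be expressed as simple rule checks. The following is a minimal sketch in plain Python, not DQS functionality: the field names, the mapping of the numeric gender codes 0/1/2 onto M/F/U, and the helper `check_record` are assumptions made for illustration. Accuracy and uniqueness are left out because they need an external reference source or record matching.

```python
# Illustrative rule checks; names, mappings and thresholds are assumptions.
# The 0/1/2 -> M/F/U mapping is hypothetical: the slide only says two systems differ.
GENDER_MAP = {"M": "M", "F": "F", "U": "U", "0": "M", "1": "F", "2": "U"}
PLACEHOLDER_ZIPS = {"99999"}          # placeholder value mentioned on the slide
SALARY_RANGE = (60_000, 120_000)      # acceptable range mentioned on the slide


def check_record(rec):
    """Return a list of (issue, message) pairs found in one record."""
    issues = []

    # Standard: the gender code must belong to one of the known coding schemes.
    if rec.get("gender") not in GENDER_MAP:
        issues.append(("standard", f"unknown gender code {rec.get('gender')!r}"))

    # Complete: required values must be present and must not be placeholders.
    if not (rec.get("last_name") or "").strip():
        issues.append(("complete", "last name is blank"))
    if rec.get("zip") in PLACEHOLDER_ZIPS:
        issues.append(("complete", "zip code is a placeholder"))

    # Valid: numeric values must fall within the acceptable range.
    salary = rec.get("salary")
    low, high = SALARY_RANGE
    if salary is None or not (low <= salary <= high):
        issues.append(("valid", f"salary {salary!r} outside {low}-{high}"))

    return issues


records = [
    {"last_name": "Ryan", "gender": "M", "zip": "10022", "salary": 75_000},
    {"last_name": "", "gender": "1", "zip": "99999", "salary": 45_000},
]
report = [check_record(r) for r in records]
```

Here the first record passes every rule, while the second is flagged for a blank last name, a placeholder zip code and an out-of-range salary.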

DQ Issues and DQ Dimensions

Before

Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | (blank) | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After

Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Before

Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | (blank) | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | NY | NY

After

Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | (blank) | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | NY | NY | 2

DQ dimensions illustrated: Completeness, Accuracy, Conformity, Consistency, Uniqueness
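
The Cluster column in the second example can be reproduced with a very simple matching rule. The sketch below is not how DQS matching works internally. It assumes a token-based Jaccard similarity on normalized addresses, a hypothetical abbreviation table, and a threshold of 0.55 chosen only so that this sample separates cleanly.

```python
import re

# Hypothetical normalization tables; real matching knowledge would be far richer.
ABBREVIATIONS = {"drive": "dr", "avenue": "ave", "street": "st"}
NOISE_TOKENS = {"unit", "apt", "suite"}


def address_tokens(address):
    """Lowercase, strip punctuation, unify abbreviations, drop unit designators."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return {ABBREVIATIONS.get(t, t) for t in tokens if t not in NOISE_TOKENS}


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(addresses, threshold=0.55):
    """Give each address the cluster of the first earlier address it matches."""
    token_sets = [address_tokens(a) for a in addresses]
    labels = []
    for i, tokens in enumerate(token_sets):
        label = None
        for j in range(i):
            if jaccard(tokens, token_sets[j]) >= threshold:
                label = labels[j]
                break
        if label is None:
            label = max(labels, default=0) + 1  # open a new cluster
        labels.append(label)
    return labels


addresses = [
    "545 S Valley View Drive # 136",
    "545 Valley View ave unit 136",
    "545 S Valley View Dr",
    "545 Valley Drive St.",
]
clusters = cluster(addresses)  # [1, 1, 1, 2], matching the Cluster column above
```

The greedy assignment is order-dependent and never merges two existing clusters, which is acceptable for a four-row illustration but not for production matching.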

Components of Data Quality Solutions

Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.

Matching: Identifying, linking or merging related entries within or across sets of data.

Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.

Monitoring: Tracking and monitoring the state of quality activities and the quality of data.
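
As an illustration of the profiling component, the following sketch computes two basic statistics per column: completeness (the share of non-blank values) and the number of distinct values. It is plain Python written for this transcript, not a DQS API, and the function name `profile` and the sample rows are assumptions.

```python
def profile(rows):
    """Return completeness and distinct-value counts for every column."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    summary = {}
    for col in columns:
        # Treat missing keys, None and whitespace-only strings as blank.
        values = [(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        summary[col] = {
            "completeness": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
        }
    return summary


rows = [
    {"name": "John Doe", "zip": "", "state": "New York"},
    {"name": "Jane Doe", "zip": "10023", "state": "NY"},
]
stats = profile(rows)
```

For these two rows the zip column is 50% complete, and the state column shows two distinct spellings, which hints at a conformity problem worth investigating.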

INTRODUCE DQS

AlwaysOn Reliable Secondaries

FileTable

ColumnStore Index

15k Partitions

SQL Server Data Tools

Power View

BI Semantic Model

Data Quality Services

Full-text Search Performance

Distributed Replay

Reporting Alerts

ODBC Driver for Linux

Statistical Semantic Search

Windows Server Core Support

Multiple Secondaries

Availability Groups

Default Schema for Windows Groups

T-SQL Enhancements

Full Globe Spatial

SSMS to Windows Azure Platform

PowerPivot Enhancements

Master Data Management Excel Add-in

PowerShell 2.0 Support

PHP & Java Connectivity

SQL Audit for All Editions

CDC Support for SSIS

New SSIS Design Surface

Online Operation Enhancements

Multi-site Clustering

Unstructured Data Performance

Resource Governor Enhancements

Database Recovery Advisor

HA for StreamInsight

Flexible Failover Policy

Extended Events Enhancements

Contained Database Authentication

SharePoint Active Directory Support

SQL Server Express LocalDB

User-defined Audit

Audit Filtering

Audit Resilience

FTS Support for Czech & Greek

AlwaysOn Connection Director

Ad Hoc Reporting

SSIS Troubleshooting

SSIS Package Management

T-SQL Debugger Enhancements

Spatial 2D Support

Unstructured Data Performance

Key Points - DQS

High quality data is critical to effective business intelligence and to business activities

DQS is an on-premises Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties through Azure DataMarket

Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data

Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling data stewards to easily improve the quality of their data

Microsoft’s DQS Solution Concepts

Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB)

Semantics: Data Domains capture the semantics of your data

Knowledge Discovery: Acquires additional knowledge the more you use it

Open and Extendible: Add user-generated knowledge & 3rd party reference data providers

Easy to use: User experience designed for increased productivity
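
To make the knowledge-driven idea concrete, here is a toy model of a data domain that holds valid values and known corrections and gains knowledge as it is used. This is a conceptual sketch only: the `Domain` class, its methods and the status labels are hypothetical and do not represent the DQS knowledge base format or any DQS API.

```python
class Domain:
    """Toy data domain: valid values plus known corrections (hypothetical model)."""

    def __init__(self, name, valid_values, corrections=None):
        self.name = name
        # Keys are lowercased so lookups ignore letter case.
        self.valid = {v.lower(): v for v in valid_values}
        self.corrections = {k.lower(): v for k, v in (corrections or {}).items()}

    def cleanse(self, value):
        """Return (output value, status) for one input value."""
        key = value.strip().lower()
        if key in self.valid:
            return self.valid[key], "correct"
        if key in self.corrections:
            return self.corrections[key], "corrected"
        return value, "new"  # unknown value: left for a data steward to review

    def learn(self, wrong, right):
        """Record a steward's decision so the same error is fixed next time."""
        self.corrections[wrong.strip().lower()] = right


city = Domain("City", ["New York", "Poughkeepsie"], {"NY": "New York"})
before = city.cleanse("Poughkeepsy")        # not yet known to the domain
city.learn("Poughkeepsy", "Poughkeepsie")   # knowledge added through use
after = city.cleanse("Poughkeepsy")         # now corrected automatically
```

The same domain object can be reused across projects, which mirrors the "build once, reuse" idea: knowledge captured in one cleansing run improves every later run.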

DQS Process

Diagram labels: Build, Use, Knowledge Management, DQ Projects, Discover / Explore Data, Manage Knowledge, Connect, Correct & standardize, Match & De-dupe, Knowledge Base, Enterprise Data, Reference Data, Cloud Services, Integrated Profiling, Notifications, Progress Status

DQS Architecture

Diagram labels: DQ Clients, DQS UI, Knowledge Discovery and Management, Interactive DQ Projects, Data Exploration, SSIS DQ Component, MDS Excel Add in, Future Clients – Excel, Dynamics

DQ Server, DQ Engine, Knowledge Discovery, Data Profiling & Exploration, Cleansing, Matching, Reference Data

DQ Projects Store, DQ Active Projects, Common Knowledge Store, MS Data Domains, Local Data Domains, Knowledge Base Store, Published KBs

Reference Data Services, Reference Data Sets, 3rd Party / Internal, MS DQ Domains Store, Azure Market Place, Categorized Reference Data, Categorized Reference Data Services

Reference Data API (Browse, Get, Update…), RD Services API (Browse, Set, Validate…)

DQS Empowers the users

Define, Manage, Coordinate, Measure, Continuously Improve, Control and Monitor

With DQS, the information worker (IW) or data expert can get actively involved in Data Quality initiatives

DQS Value Proposition

Knowledge-Driven
• Rich semantic Knowledge Base
• Continuous improvement as knowledge is discovered
• Build once, reuse for multiple DQ improvements

Open and Extendible
• Focus on cloud-based Reference Data
• User-generated knowledge
• Integration with SSIS and MDS

Easy to use
• Focus on productivity and user experience
• Designed for business users
• Out-of-the-box knowledge (DQ content)

Resources

Learning

Sessions On-Demand & Community: www.microsoft.com/teched

Microsoft Certification & Training Resources: www.microsoft.com/learning

Resources for IT Professionals: http://microsoft.com/technet

Resources for Developers: http://microsoft.com/msdn

Connect. Share. Discuss.: http://northamerica.msteched.com

Additional DQS Resources

DQS Blog: Tips, tricks and guidance on best practices for using DQS, courtesy of the DQS team

DQS Movies: A set of getting started movies for an easy introduction to DQS

DQS Forum: Come participate in DQS related discussions in our DQS forum on MSDN

Available here: blogs.msdn.com/b/dqs

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.