TRANSCRIPT
November 10th, 2011
DQS BOOTCAMP
DAVID FAIBISH, SENIOR PROGRAM MANAGER
SQL SERVER DATA QUALITY SERVICES
Microsoft
SQL Server 2012
Our Day Together …
8:15 – 09:00 | DQS Overview | David
9:00 – 11:45 | Knowledge Mgmt & Cleansing | Sharon
11:45 – 12:45 | Lunch
12:45 – 14:30 | Matching | Gadi
14:30 – 15:00 | SSIS | David
15:00 – 15:45 | Customer Stories & Market Opportunities | David
16:00 – 16:30 | MDS/DQS Integration | Andi
16:30 – 17:00 | Summary, Feedback and Q&A | Yossi
17:00 - | BYOD & DIY (With help)
DATA QUALITY 101
What is Data Quality
Data Quality represents the degree to which the data is suitable for business use
Data Quality is built through People + Technology + Processes
Bad Data Bad Business
Top 3 impediments
Source: Information Week Reports, 2011
Why Data Quality is Important
Top Barrier for BI
Source: Information Week Reports, 2011
Why Data Quality is Important
DQ is MDM top driver
Source: Information Week Reports, 2011
Why Data Quality is Important
DQ Market – A Brief Overview
Demand is on the rise. The overall market size for DQ software in 2010 was $800M, a 12.6% increase over 2009. Forecasted growth is 16% per year over the next five years.
- Gartner, 2011
It’s not only the breadth of functional capabilities. Focus on the business user. Leverage your business resources.
- Gartner, 2011
Business process: for data quality (and MDM) initiatives to succeed, they need to integrate with the existing business processes
Figure: vendor share pie chart. Vendors: SAS Institute, IBM, Informatica, SAP, QAS, Other Vendors. Shares as extracted: 20.1%, 15.9%, 15.2%, 13.0%, 5.3%, 30.4%.
Data Integration market ($2.6B in 2009). Source: Gartner
Common Data Quality Issues
Data Quality Issue | Sample Data Problem
Standard: Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete: Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999
Accurate: Does the data accurately represent reality or a verifiable source? | A supplier is listed as ‘Active’ but went out of business six years ago
Valid: Do data values fall within acceptable ranges? | Salary values should be between 60,000 and 120,000
Unique: Does the same data appear several times? | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
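The Standard, Complete and Valid rows above can be read as simple per-record rules. The following is a minimal sketch of such rules in Python, not how DQS implements them. The record layout, the field names and the choice of M/F/U as the agreed gender code set are assumptions made for illustration; the 99999 placeholder and the salary range come from the sample problems above.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"last_name": "Ryan", "zip": "10022", "gender": "M", "salary": 85000},
    {"last_name": "", "zip": "99999", "gender": "1", "salary": 40000},
]

ALLOWED_GENDER = {"M", "F", "U"}   # assumed single agreed code set (Standard)
PLACEHOLDER_ZIPS = {"99999"}       # filler value that hides missing data
SALARY_RANGE = (60_000, 120_000)   # acceptable range from the table (Valid)


def check_record(rec):
    """Return a list of issue labels found in one record."""
    issues = []
    if rec["gender"] not in ALLOWED_GENDER:
        issues.append("standard: gender code")
    if not rec["last_name"].strip():
        issues.append("complete: last name")
    if rec["zip"] in PLACEHOLDER_ZIPS or not re.fullmatch(r"\d{5}", rec["zip"]):
        issues.append("complete: zip code")
    low, high = SALARY_RANGE
    if not low <= rec["salary"] <= high:
        issues.append("valid: salary range")
    return issues


for rec in records:
    print(rec["last_name"] or "(blank)", check_record(rec))
```

Accuracy and uniqueness are harder to express as single-record rules: the first needs a trusted reference source, the second needs a comparison across records.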
DQ Issues and DQ Dimensions
Before
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | (blank) | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Before
Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | (blank) | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | NY | NY

After
Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | (blank) | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | NY | NY | 2

DQ dimensions: Completeness, Accuracy, Conformity, Consistency, Uniqueness
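The cluster assignment in the second example can be approximated with a small similarity rule. The sketch below is a deliberately simple stand-in, not the DQS matching algorithm: it blocks on city, normalizes address tokens, and compares them with Jaccard similarity against the first record of each cluster. The abbreviation map, the noise-word list and the 0.5 threshold are assumptions chosen for this sample, and names are ignored entirely.

```python
import re

# Assumed normalization knowledge; a real knowledge base would be far richer.
ABBREVIATIONS = {"dr": "drive", "ave": "avenue", "st": "street"}
NOISE = {"unit"}

records = [
    {"name": "John Smith", "address": "545 S Valley View Drive # 136", "city": "Anytown"},
    {"name": "Margaret & John smith", "address": "545 Valley View ave unit 136", "city": "Anytown"},
    {"name": "Maggie Smith", "address": "545 S Valley View Dr", "city": "Anytown"},
    {"name": "John Smith", "address": "545 Valley Drive St.", "city": "NY"},
]


def address_tokens(address):
    """Lowercase, drop punctuation and noise words, expand abbreviations."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    return {ABBREVIATIONS.get(w, w) for w in words if w not in NOISE}


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(rows, threshold=0.5):
    """Assign each row the id of the first cluster whose pivot it matches."""
    pivots = []  # (cluster_id, city, tokens) of the first row in each cluster
    labels = []
    for row in rows:
        tokens = address_tokens(row["address"])
        city = row["city"].lower()
        for cluster_id, pivot_city, pivot_tokens in pivots:
            if city == pivot_city and jaccard(tokens, pivot_tokens) >= threshold:
                labels.append(cluster_id)
                break
        else:
            cluster_id = len(pivots) + 1
            pivots.append((cluster_id, city, tokens))
            labels.append(cluster_id)
    return labels


labels = cluster(records)
for row, label in zip(records, labels):
    print(label, row["name"], "|", row["address"])
```

Comparing only against each cluster's first record keeps the sketch short; it also means the result depends on record order, which a production matcher would need to address.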
Components of Data Quality Solutions
Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.
Matching: Identifying, linking or merging related entries within or across sets of data.
Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
Monitoring: Tracking and monitoring the state of quality activities and the quality of data.
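Profiling, as described above, amounts to summarizing each column of a source before deciding what to fix. The following is a minimal sketch in Python of two common profile measures, completeness and distinct values. The function name `profile`, the report keys and the sample rows are hypothetical and are not part of DQS.

```python
from collections import Counter

# Illustrative rows only; values echo the kinds of problems shown earlier.
rows = [
    {"name": "John Doe", "zip": "", "state": "New York"},
    {"name": "Jane Doe", "zip": "10023", "state": "NY"},
    {"name": "John Doe", "zip": "10022", "state": "NY"},
]


def profile(rows):
    """Return per-column completeness, distinct count and most common value."""
    report = {}
    for col in rows[0].keys():
        values = [row.get(col, "") for row in rows]
        filled = [v for v in values if v not in (None, "")]
        report[col] = {
            "completeness": len(filled) / len(values),
            "distinct": len(set(filled)),
            "most_common": Counter(filled).most_common(1)[0][0] if filled else None,
        }
    return report


report = profile(rows)
for col, stats in report.items():
    print(col, stats)
```

A low completeness figure points at a Complete issue, while an unexpectedly high distinct count on a column such as state hints at a Standard issue (for example "New York" and "NY" side by side). Running the same profile on a schedule and comparing results is one simple way to approach monitoring.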
INTRODUCE DQS
AlwaysOn Reliable Secondaries
FileTable
ColumnStore Index
15k Partitions
SQL Server Data Tools
Power View
BI Semantic Model
Data Quality Services
Full-text Search Performance
Distributed Replay
Reporting Alerts
ODBC Driver for Linux
Statistical Semantic Search
Windows Server Core Support
Multiple Secondaries
Availability Groups
Default Schema for Windows Groups
T-SQL Enhancements
Full Globe Spatial
SSMS to Windows Azure Platform
PowerPivot Enhancements
Master Data Management Excel Add-in
PowerShell 2.0 Support
PHP & Java Connectivity
SQL Audit for All Editions
CDC Support for SSIS
New SSIS Design Surface
Online Operation Enhancements
Multi-site Clustering
Unstructured Data Performance
Resource Governor Enhancements
Database Recovery Advisor
HA for StreamInsight
Flexible Failover Policy
Extended Events Enhancements
Contained Database Authentication
SharePoint Active Directory Support
SQL Server Express LocalDB
User-defined Audit
Audit Filtering
Audit Resilience
FTS Support for Czech & Greek
AlwaysOn Connection Director
Ad Hoc Reporting
SSIS Troubleshooting
SSIS Package Management
T-SQL Debugger Enhancements
Spatial 2D Support
Key Points - DQS
High quality data is critical to effective business intelligence and to business activities
DQS is an on-premises Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties through Azure DataMarket
Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data
Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling data stewards to easily improve the quality of their data
Microsoft’s DQS Solution Concepts
Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB)
Semantics: Data Domains capture the semantics of your data
Knowledge Discovery: Acquires additional knowledge the more you use it
Open and Extendible: Add user-generated knowledge & 3rd party reference data providers
Easy to use: User experience designed for increased productivity
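To make the knowledge-driven idea concrete, here is a toy sketch in Python of a domain that holds known values and learns corrections as it is used. It is a hypothetical illustration only and does not reflect the DQS API or its internal model: the `Domain` class, its `learn` and `cleanse` methods and the status labels are all invented for this example.

```python
class Domain:
    """Toy stand-in for a knowledge-base domain (hypothetical, not the DQS API)."""

    def __init__(self, name, valid_values):
        self.name = name
        self.valid = set(valid_values)
        self.corrections = {}  # known bad or synonym value -> leading value

    def learn(self, raw, leading):
        """Record user-approved knowledge so later runs can reuse it."""
        self.valid.add(leading)
        self.corrections[raw.strip().lower()] = leading

    def cleanse(self, raw):
        """Return (value, status) for one raw input."""
        key = raw.strip().lower()
        for value in self.valid:
            if value.lower() == key:
                return value, "correct"
        if key in self.corrections:
            return self.corrections[key], "corrected"
        return raw, "new"


city = Domain("City", ["New York", "Poughkeepsie"])

# First pass: the misspelling is unknown to the domain.
print(city.cleanse("Poughkeepsy"))

# A data steward approves the fix; the knowledge is kept for later runs.
city.learn("Poughkeepsy", "Poughkeepsie")
print(city.cleanse("Poughkeepsy"))
print(city.cleanse("new york"))
```

The point of the sketch is the reuse: once a correction is stored in the domain, every later cleansing run benefits from it without further manual work.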
DQS Process
Build
Use
DQ Projects
Knowledge Management
Match & De-dupe
Correct & standardize
Manage Knowledge
Connect
Enterprise Data
Reference Data
Cloud Services
Integrated Profiling
Notifications
Progress Status
Knowledge Base
Discover / Explore Data
DQS Architecture
Matching
Reference Data
DQ Clients
DQS UI
DQ Server
DQ Projects Store
Common Knowledge Store
Knowledge Base Store
DQ Engine
3rd Party / Internal
MS DQ Domains Store
Reference Data Services
Reference Data Sets
SSIS DQ Component
DQ Active Projects
MS Data Domains
Local Data Domains
Published KBs
Knowledge Discovery
Data Profiling & Exploration
Cleansing
Knowledge Discovery and Management
Interactive DQ Projects
Data Exploration
Azure Market Place
Categorized Reference Data
Categorized Reference Data Services
Reference Data API (Browse, Get, Update…)
RD Services API (Browse, Set, Validate…)
MDS Excel Add-in
Future Clients: Excel, Dynamics
DQS Empowers the users
Define, Manage, Coordinate, Measure, Continuously Improve, Control and Monitor
With DQS, the information worker (IW) / data expert can get actively involved in data quality initiatives
DQS Value Proposition
Knowledge-Driven
• Rich semantic Knowledge Base
• Continuous improvement as knowledge is discovered
• Build once, reuse for multiple DQ improvements
Open and Extendible
• Focus on cloud-based Reference Data
• User-generated knowledge
• Integration with SSIS and MDS
Easy to use
• Focus on productivity and user experience
• Designed for business users
• Out-of-the-box knowledge (DQ content)
Resources
Connect. Share. Discuss.
http://northamerica.msteched.com
Learning
Sessions On-Demand & Community: www.microsoft.com/teched
Microsoft Certification & Training Resources: www.microsoft.com/learning
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Additional DQS Resources
DQS Blog
Tips, tricks and guidance on best practices for using DQS, courtesy of the DQS team
Available here: blogs.msdn.com/b/dqs
DQS Movies
A set of getting-started movies for an easy introduction to DQS
DQS Forum
Come participate in DQS-related discussions in our DQS forum on MSDN
Available Here
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.