Data Quality Services in SQL Server 2012

Introduction

Microsoft SQL Server 2012 Data Quality Services (DQS) is a powerful tool that can help your IT teams (especially QA and testers) detect and prevent data quality issues that affect your customers.

  • Data Quality Services (DQS) is a new knowledge-driven data cleansing solution delivered in SQL Server 2012
  • Supports building Knowledge Bases that support correction and de-duplication of data
  • Knowledge in a Knowledge Base can improve over time and can be reused by many solutions
  • Integration Services (SSIS) 2012 includes a DQS Cleansing transform
  • The Master Data Services (MDS) Excel Add-in can leverage DQS

Business Scenarios

  • Create and manage DQS Knowledge Bases
  • Discover knowledge from the organization’s data
  • Explore and integrate with 3rd party reference data
  • Correct, de-duplicate and standardize data
  • Monitor and control data quality processes

DQS Components

  • DQS consists of a Data Quality Server and the Data Quality Client
  • The Data Quality Server is a SQL Server instance feature that consists of three SQL Server catalogs with data-quality functionality and storage
  • The Data Quality Client is a standalone Windows application that allows data stewards, data experts, and IT Professionals to perform data quality operations, and supports:
    • Creating and maintaining Knowledge Bases
    • Creating and maintaining Data Quality Projects
    • Monitoring data quality activities
    • Managing system settings

Knowledge Bases

  • Rationale: to cleanse data you need knowledge about it
  • The Knowledge Base is a data repository of knowledge that enables professionals to understand their data and maintain its integrity
  • Knowledge in a Knowledge Base is maintained in Domains, each of which is specific to a data field
  • Domains capture the semantics of the data
  • Domains can use online Azure reference data
    • Online DataMarketReference Data Service
    • Direct Online 3rd Party Reference Data Services
  • Processes include:
    • Domain Management – to define domains
    • Knowledge Discovery – to learn domain values
    • Matching Policy – to identify potential duplicates and non-matches

Data Quality Project

  • A Data Quality Project is a means of using a Knowledge Base to improve the quality of source data by performing data cleansing and data matching activities
  • Created and managed in the Data Quality Client
  • Results can be exported to a SQL Server table or CSV file
  • Two types:
    • Cleansing Activity – processed data is categorized as new, invalid, corrected, and correct
    • Matching Activity – used to prevent data duplication by identifying exact and approximate matches

DQS Cleansing Transform

  • Implements data cleansing in an SSIS 2012 data flow
  • Configuration involves:
    • Defining a connection to a Data Quality Server
    • Selecting a knowledge base
    • Mapping input columns to domains
    • Selecting advanced statistical columns
  • The output includes the original data and corrected data, together with status

For more Microsoft Technical Training information visit www.directionstraining.com or call 1-855-575-8900.