Data Quality
Apr 6, 2019
Varinder S Sembhi

Data is like air: it is all around you, but you won’t notice its quality until it becomes hard to breathe.

An organization with bad or questionable data quality has a problem: it is hard to make sound decisions based on poor-quality data. In such organizations, confidence in data is generally low. According to a Forbes/KPMG study, 84% of CEOs are worried about poor strategic decisions being made on the basis of bad data. To make matters worse, the problem grows more severe year over year as data volumes expand, on average 40% annually, often driven by digital transformation. As with air quality, data quality should not be allowed to degrade past the point of no return, leaving the organization to suffer.

A typical data quality management process

Quality needs to be measured at various points along the entire data supply chain to ensure an adequate level of quality for consumers. As data flows through an organization it is replicated multiple times, adding complexity to data quality measurement.

Perfect data quality is rarely a viable objective. Data quality can be measured along several dimensions, and not every dimension matters for every data element. The importance and use of each data element play an essential role, with the overall goal of increasing consumers’ confidence in the data. The most common dimensions are listed below:

  • Accuracy: veracity of data relative to its authoritative source (e.g. decimal precision, information distorted in flow)
  • Completeness: availability of required data attributes (e.g. a missing data element)
  • Conformity: alignment of content with standards (e.g. date formats, ISO codes)
  • Consistency: compliance with required formats, values or definitions (e.g. US vs. EU standards, imperial vs. metric units)
  • Coverage: availability of required data records (e.g. data for Eastern Canada is missing)
  • Timeliness: data represents current conditions and is available for use (e.g. a credit rating that is not updated, a report from last quarter)
  • Uniqueness: each record or attribute is recorded only once (e.g. natural-key to surrogate-key relationships)
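To make these dimensions concrete, here is a minimal sketch of how a few of them might be measured in plain Python. The record fields, reference code set and freshness window are illustrative assumptions, not a standard implementation:

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative records; field names are hypothetical.
records = [
    {"id": 1, "country": "CA", "amount": "100.00", "as_of": date.today()},
    {"id": 2, "country": "XX", "amount": None,     "as_of": date.today() - timedelta(days=120)},
    {"id": 1, "country": "US", "amount": "50.5",   "as_of": date.today()},
]

ISO_CODES = {"CA", "US", "GB"}  # conformity reference set (abridged)

def completeness(recs, field):
    """Share of records where the field is populated."""
    return sum(r[field] is not None for r in recs) / len(recs)

def conformity(recs, field, allowed):
    """Share of records whose value is in the allowed set."""
    return sum(r[field] in allowed for r in recs) / len(recs)

def uniqueness(recs, key):
    """Share of key values that occur exactly once."""
    counts = Counter(r[key] for r in recs)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

def timeliness(recs, field, max_age_days=90):
    """Share of records refreshed within the freshness window."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return sum(r[field] >= cutoff for r in recs) / len(recs)
```

Each function returns a score between 0 and 1, which makes scores comparable across dimensions when they are later rolled up into a dashboard.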

Quality measurement is only the first step in the process. Improving data quality requires a coordinated effort through a data remediation process. The typical steps are as follows:

  • Identify critical data elements (CDEs) that are important and require a certain level of data quality. A CDE might be critical because of a regulatory requirement or an essential business need.
  • Define specifications across appropriate dimensions and determine the tolerance level.
  • Document the data lineage of critical data elements (CDEs) from golden source to consumer.
  • Measure data quality as per defined specification at different points in the supply chain and report exceptions.
  • Engage the remediation team to fix the data quality issue (stop the immediate bleeding), then find the root cause of the inferior quality and fix it permanently (if applicable).
  • Publish a dashboard with consolidated quality measurements across all relevant quality dimensions.
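Two of the steps above, defining specifications with tolerance levels and reporting exceptions from a measurement run, can be sketched as follows. The CDE names, tolerances and scores are invented for illustration:

```python
# Hypothetical specifications: each CDE carries a quality dimension
# and a tolerance (the minimum passing score).
specs = [
    {"cde": "customer_email", "dimension": "completeness", "tolerance": 0.98},
    {"cde": "country_code",   "dimension": "conformity",   "tolerance": 0.99},
]

# Scores as they might arrive from a measurement run at one control point.
measured = {"customer_email": 0.95, "country_code": 0.995}

def report_exceptions(specs, measured):
    """Return every CDE whose measured score falls below its tolerance."""
    return [
        {**spec, "score": measured[spec["cde"]]}
        for spec in specs
        if measured[spec["cde"]] < spec["tolerance"]
    ]

for exc in report_exceptions(specs, measured):
    print(f'{exc["cde"]}: {exc["dimension"]} {exc["score"]:.3f} '
          f'below tolerance {exc["tolerance"]:.2f}')
```

The exception list is what would feed the remediation team, while the full set of scores feeds the published dashboard.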

The challenge arises when these steps are followed only partially, in isolation, and the resulting information is not fully shared with consumers. To build a robust data culture, every report produced should carry a label with a data quality confidence score. This allows consumers to ask critical questions when using the data:

  • What is the source of this data and how far is it from the golden source?
  • What are the quality metrics used?
  • Is this data governed and certified?
  • When was quality last measured?

Publishing a dashboard along with datasets can answer these questions and increase confidence in the data. Here is an example of such a dashboard.

Data Quality Dashboard
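One simple way to roll per-dimension scores into the single confidence score suggested above is a weighted average attached to the dataset as a label. The weights, dataset name and label fields here are assumptions; a real DQF would define them per CDE:

```python
def confidence_score(scores, weights=None):
    """Weighted average of per-dimension scores on a 0-100 scale."""
    weights = weights or {dim: 1.0 for dim in scores}  # equal weights by default
    total = sum(weights[dim] for dim in scores)
    return round(100 * sum(scores[dim] * weights[dim] for dim in scores) / total, 1)

scores = {"accuracy": 0.97, "completeness": 0.92, "timeliness": 0.88}

# Hypothetical label published alongside the dataset, answering the
# consumer questions above (source distance, metrics, last measurement).
label = {
    "dataset": "quarterly_sales",           # illustrative dataset name
    "confidence": confidence_score(scores),
    "dimensions_measured": sorted(scores),
    "golden_source_hops": 2,                # distance from the golden source
    "last_measured": "2019-04-01",
}
```

A label like this travels with the dataset, so consumers see the confidence score and its provenance without having to hunt for the dashboard.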

Data Quality Function (DQF)

The data quality effort should be standardized within the organization to avoid repetitive work. The Data Quality Function is a center of excellence (COE) that helps business teams achieve their data quality goals. The DQF is typically part of the Data Governance Office or the Chief Data Office and works closely with the governance team. In the process described above, the governance team takes the lead on activities such as capturing metadata on CDEs, identifying golden sources and documenting data lineage. Key functions of the DQF are as follows:

  • Define the objectives of the Data Quality Function and get buy-in
  • Identify, assign and educate teams about the different roles, and provide training
  • Provide a set of standards for defining the criticality of data elements
  • Provide guidelines for establishing data quality measurement rules, including control points
  • Help prioritize data sources for data quality measurement
  • Provide tools to automate the measurement process and the remediation workflow
  • Automate regular reporting of data quality metrics
  • Audit and govern the overall data quality process


Data quality management is a proactive and continuous effort. The DQF is crucial to jump-starting the quality measurement journey. The transformation can be overwhelming and requires organizational support along with the right processes and technologies. The DQF also has a role to play in coaching the organization on the importance of data quality, and as such needs to develop coaching skills. If you would like to learn more about starting your own DQF or discuss other ways of managing data quality, you can reach me at