What are the three key factors when assessing data quality?
Data drives business decisions that determine how well business organizations perform in the real world. Vast volumes of data are generated every day, but not all data is reliable in its raw form to drive a mission-critical business decision.
Today, data has a credibility problem. Business leaders and decision makers need to understand the impact of data quality. In this article, we will discuss:
Let’s get started!
What is data quality?
Data quality refers to the characteristics that determine the reliability of information to serve an intended purpose (in business, this often includes planning, decision making, and operations). Put another way, it is the utility of data as a function of the attributes that determine its fitness and reliability for the intended use. These attributes, expressed as metrics, KPIs, and other qualitative or quantitative requirements, may be subjective and justifiable only for a specific set of use cases and context.
If that feels unclear, that’s because data is perceived differently depending on the perspective. After all, the way you define a quality dinner, for instance, may be different from the way a Michelin-starred chef does. Consider data quality from these perspectives:
To understand the quality of a dataset, a good place to start is the degree to which it matches a desired state. For example, a dataset that is free of errors, consistent in its format, and complete in its features may meet all the requirements or expectations that determine data quality. (Understand how data quality compares to data integrity.)
Data quality in the enterprise
Now let’s discuss data quality from a standards perspective, as it is widely used, particularly in the domains of:
Let’s first look at the definition of ‘quality’ according to the ISO 9000:2015 standard:
Quality is the degree to which inherent characteristics of an object meet requirements.
We can apply this definition to data and the way it is used in the IT industry. In the domain of database management, the term ‘dimensions’ describes the characteristics or measurable features of a dataset. The quality of data is also subject to extrinsic factors, such as availability and compliance. So, here’s a holistic and standards-based definition of quality data in big data applications:
Data quality is the degree to which dimensions of data meet requirements.
It’s important to note that the term ‘dimensions’ does not refer to the categories used in datasets. Instead, it refers to the measurable features that describe particular characteristics of the dataset. By comparing these characteristics with the desired state of the data, you can understand and quantify data quality in measurable terms. For instance, some of the common dimensions of data quality are:
DAMA-NL provides a detailed list of 60 Data Quality Dimensions, available in PDF.
Why quality data is so critical
OK, so we get what data quality is. Now, let’s look at why you need it:
How to measure data quality
Now that you know what you expect from your data, and why, you’re ready to get started with measuring data quality.
Data profiling
Data profiling is a good starting point for measuring your data. It’s a straightforward assessment that involves looking at each data object in your system and determining whether it’s complete and accurate. This is often a preliminary measure for companies that use existing data but want to adopt a data quality management approach.
Data Quality Assessment Framework
A more intricate way to assess data is with a Data Quality Assessment Framework (DQAF). The DQAF process flow starts out like data profiling, but the data is measured against specific qualities of good data. These are:
Using these core principles about good data as a baseline, data engineers and data scientists can analyze data against their own real standards for each. For instance, a unit of data being evaluated for timeliness can be looked at in terms of the range of best to average delivery times within the organization.
Data quality metrics
There are a few standardized ways to analyze data, as described above. But it’s also important for organizations to come up with their own metrics with which to judge data quality. Here are some examples of data quality metrics:
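As one illustration of metrics an organization might define for itself, the sketch below computes a completeness ratio and a timeliness ratio over a handful of records. Everything in it is an assumption made for the example: the sample records, the field names `email` and `updated`, the helper names `completeness` and `timeliness`, and the 30-day freshness threshold. It is a minimal sketch, not a standard DQAF implementation.

```python
from datetime import date

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"id": 3, "email": "li@example.com", "updated": date(2023, 6, 1)},
    {"id": 4, "email": "sam@example.com", "updated": None},
]


def completeness(rows, field):
    """Share of rows in which `field` is present (not None)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows whose `field` date is at most `max_age_days` old."""
    if not rows:
        return 0.0
    fresh = sum(
        1
        for row in rows
        if row.get(field) is not None
        and (as_of - row[field]).days <= max_age_days
    )
    return fresh / len(rows)


email_completeness = completeness(records, "email")
freshness = timeliness(records, "updated", date(2024, 1, 31), 30)
print(f"email completeness: {email_completeness:.2f}")
print(f"updated within 30 days: {freshness:.2f}")
```

With these sample rows, three of four emails are present and two of four records were updated within the window. Rows with a missing date count as not fresh here, which is a design choice; an organization could equally exclude them from the denominator.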
(Learn more about dark data.)
How to enforce data quality
Data quality management (DQM) is a principle in which all of a business’ critical resources (people, processes, and technology) work harmoniously to create good data. More specifically, data quality management is a set of processes designed to improve data quality in order to achieve pre-defined business outcomes. Data quality requires a foundation to be in place. Its core pillars include the following:
Getting started
If you are like many organizations, it’s likely that you are just getting settled in with big data. Here are our recommendations for implementing a strategy that focuses on data quality:
DQM roles & responsibilities
An organization committed to ensuring its data is high quality should consider making the following roles part of its data team:
Leverage technology
Data quality solutions can make the process easier. Leveraging the right technology for an enterprise organization will increase efficiency and data quality for employees and end users.
Improving data quality: best practices
Data quality can be improved in many ways, and the result depends on how you’ve selected, defined, and measured the quality attributes and dimensions. In a business setting, there are many ways to measure and enforce data quality. IT organizations can take the following steps to ensure that data quality is objectively high and that the data used to train models produces a profitable business impact:
Finally, identify and understand the patterns, insights, and abstractions hidden within the data instead of deploying models that churn raw data into predefined features with limited relevance to real-world business objectives.
What are the factors used to assess data quality?
There are five traits that you'll find within data quality: accuracy, completeness, reliability, relevance, and timeliness. Accuracy, for example, asks: is the information correct in every detail?
What are the 3 dimensions of information quality?
Accuracy refers to the quality of the data. Availability describes the information in the data made available to the analyst. Relevance refers to the relevance of the data to the analysis goal: whether the data contains the required variables in the right form and whether they are drawn from the population of interest.
What are the three critical components for determining data quality?
There are five components that ensure data quality: completeness, consistency, accuracy, validity, and timeliness. When each of these components is properly addressed, the result is high-quality data.
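Two of the components named above, validity and consistency, can be checked mechanically once a rule is written down. The sketch below is only an illustration under stated assumptions: the customer rows, the field names, the five-digit `ZIP_FORMAT` rule, and the helper names `validity` and `inconsistent_ids` are all hypothetical, and a real rule set would come from the organization's own requirements.

```python
import re

# Hypothetical customer rows; field names and rules are illustrative only.
rows = [
    {"id": "C1", "country": "US", "zip": "94105"},
    {"id": "C2", "country": "US", "zip": "9410"},   # fails the format rule
    {"id": "C1", "country": "CA", "zip": "94105"},  # C1 already seen as US
]

# Assumed validity rule: a zip value is exactly five digits.
ZIP_FORMAT = re.compile(r"\d{5}")


def validity(rows):
    """Share of rows whose zip matches the assumed five-digit format."""
    if not rows:
        return 0.0
    valid = sum(1 for row in rows if ZIP_FORMAT.fullmatch(row["zip"]))
    return valid / len(rows)


def inconsistent_ids(rows):
    """Ids that appear with more than one country value."""
    countries_by_id = {}
    for row in rows:
        countries_by_id.setdefault(row["id"], set()).add(row["country"])
    return sorted(
        key for key, countries in countries_by_id.items() if len(countries) > 1
    )


print(f"zip validity: {validity(rows):.2f}")
print(f"ids with conflicting countries: {inconsistent_ids(rows)}")
```

Validity is reported as a ratio so it can be tracked over time, while consistency is reported as a list of offending ids so that someone can follow up on each conflict.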
What are the three characteristics of high quality data?
Five primary characteristics of high-quality data: accuracy, completeness, validity, consistency, and timeliness.