The data quality problem is still ubiquitous, putting critical business processes at risk.
In the 21st century, data plays the same role that oil played in the 18th century. It drives the digital economy, proving to be a valuable asset for governments, businesses, and society alike. This puts unprecedented pressure on software developers to ensure the highest quality of data, thus avoiding inefficiencies and unlocking previously unimaginable insights.
However, when it comes to data quality management, there are no universal criteria: each organization should choose its own set of metrics depending on, among other things, the specifics of the software product it is building and the workflows this product aims to address.
Irrelevant or inaccurate data might significantly undermine any data-driven process, resulting in compromised decisions as well as legal and financial damages.
Have a look at how companies around the world are affected by less-than-optimal data:
• 77% of companies believe that inaccurate and incomplete data affects the bottom line.
• 6% of annual business revenue is lost because of poor-quality data.
• 40% of business initiatives fail because of insufficient data quality.
• 41% of companies think inconsistent data prevents them from maximizing ROI.
• Only 16% of companies report a significant impact from predictive analytics.
• Bad data costs the US an estimated $3 trillion per year.
In any case, an all-around evaluation of your data can help you avoid these pitfalls and undo the negative impact of poor-quality data. Start with the databases used in the product as well as the metrics the development team uses to assess the quality of your applications.
If the thought of doing that all by yourself makes you uncomfortable, there is another option: turn to a product assessment platform such as TETRA™ for help.
Benefits of taking control of data quality
Large organizations operating massive amounts of information depend heavily on data consistency. Maintaining high-quality data helps avoid duplicate mailings, makes customer data easy to analyze, and enables different departments to stay on the same page.
Verified and reliable data also helps business users and stakeholders make informed decisions.
Intetics could go on with examples, but the common thread is that quality data lets companies elevate their processes and take advantage of valuable opportunities.
The global standard for data quality evaluation
How do you evaluate the quality of datasets across the organization? The international standard ISO/IEC 25012:2008 defines a set of data quality characteristics. Using this standard, organizations can assess the current quality of their data and identify areas for improvement.
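To make the standard less abstract, a few of its characteristics can be expressed as automatable checks. The sketch below is a minimal illustration in Python: the characteristic names follow ISO/IEC 25012, but the checks themselves, the pattern, and the column names (email, updated_at) are assumptions for demonstration only.

```python
# A minimal sketch: mapping a few ISO/IEC 25012 characteristics to simple,
# automatable checks. Column names and the email pattern are hypothetical.
import pandas as pd

def completeness(df: pd.DataFrame, column: str) -> float:
    """Share of non-missing values in a column (the 'completeness' characteristic)."""
    return 1.0 - df[column].isna().mean()

def syntactic_accuracy(df: pd.DataFrame, column: str, pattern: str) -> float:
    """Share of values matching an agreed format, e.g. a simple email pattern."""
    return df[column].astype(str).str.fullmatch(pattern).mean()

def currentness(df: pd.DataFrame, column: str, max_age_days: int) -> float:
    """Share of records updated within the allowed time window."""
    age = pd.Timestamp.now() - pd.to_datetime(df[column])
    return float((age <= pd.Timedelta(days=max_age_days)).mean())

users = pd.DataFrame({
    "email": ["a@example.com", "not-an-email", None],
    "updated_at": ["2024-05-01", "2023-01-15", "2024-06-20"],
})
print(completeness(users, "email"))
print(syntactic_accuracy(users, "email", r"[^@\s]+@[^@\s]+\.\w+"))
print(currentness(users, "updated_at", max_age_days=365))
```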
Data quality management process
In practice, follow these seven steps to verify the quality of business data and maintain the quality of all datasets.
1. Define data quality metrics
Choosing which data quality metrics to monitor and which KPIs to set depends on the specifics of the product. Use the above-mentioned ISO/IEC 25012 standard as a point of reference and choose the most suitable set of metrics for the project. If you are still undecided, try to glean some historical insight into what has undermined data quality before, or take a trial-and-error approach.
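One practical way to pin these decisions down is to record the chosen metrics and their acceptance thresholds as a small, machine-readable configuration that later steps can refer back to. The sketch below is purely illustrative: the dataset names, metric names, and thresholds are assumptions, not values prescribed by the standard.

```python
# A minimal sketch of a data quality "contract": which metrics are tracked
# per dataset and what score counts as acceptable. All names and numbers
# here are illustrative assumptions.
DATA_QUALITY_KPIS = {
    "users": {
        "completeness.email": 0.99,   # at most 1% missing emails
        "uniqueness.user_id": 1.00,   # no duplicate identifiers
        "validity.birth_date": 0.98,  # dates parse and fall in a sane range
    },
    "orders": {
        "completeness.amount": 1.00,
        "consistency.currency": 0.999,  # matches the agreed currency codes
    },
}

def is_acceptable(dataset: str, metric: str, measured: float) -> bool:
    """Compare a measured score against the agreed threshold."""
    return measured >= DATA_QUALITY_KPIS[dataset][metric]
```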
2. Perform data profiling
Review all your data sources (the various modules of your application) and catalog all types of data to be analyzed. Think of tables and data fields such as ‘account name’ and ‘password’, as well as the relevant characteristics of this data: whether a field is mandatory, whether it accepts text, numbers, or dates, and so on.
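A basic profile can be produced automatically once the data is accessible for analysis. The sketch below assumes the data can be loaded into a pandas DataFrame; the accounts table and its columns are hypothetical.

```python
# A minimal profiling sketch, assuming the application data can be exported
# to a pandas DataFrame. Dedicated profiling tools can build on the same idea.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Catalog each field: type, share of missing values, distinct values, an example."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_ratio": df.isna().mean(),
        "unique_values": df.nunique(),
        "example": df.apply(lambda s: s.dropna().iloc[0] if s.notna().any() else None),
    })

accounts = pd.DataFrame({
    "account_name": ["alice", "bob", None],
    "password": ["x7#...", "q2$...", "p9!..."],
    "created_at": pd.to_datetime(["2024-01-02", "2024-02-10", "2024-03-15"]),
})
print(profile(accounts))
```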
3. Analyze selected data
At this stage, check, both manually and with the help of automation tools, whether the way the product currently captures, stores, and processes data matches the expected patterns specified during data profiling (see the previous step). This analysis can reveal costly errors, such as missing values or values outside the required range.
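Automated checks of this kind can be scripted against the expectations captured during profiling. The sketch below assumes a pandas DataFrame and hypothetical rules (mandatory account_name and email fields, an age range of 13 to 120); a dedicated validation framework could play the same role.

```python
# A minimal sketch of automated checks against the profiled expectations.
# The rules and column names are assumptions for illustration.
import pandas as pd

def find_issues(df: pd.DataFrame) -> list[str]:
    issues = []
    # Mandatory fields must not contain missing values.
    for column in ("account_name", "email"):
        missing = int(df[column].isna().sum())
        if missing:
            issues.append(f"{column}: {missing} missing value(s)")
    # Values must stay inside the required range.
    bad_age = df[(df["age"] < 13) | (df["age"] > 120)]
    if not bad_age.empty:
        issues.append(f"age: {len(bad_age)} value(s) outside the 13-120 range")
    return issues

users = pd.DataFrame({
    "account_name": ["alice", None],
    "email": ["a@example.com", "b@example.com"],
    "age": [34, 7],
})
print(find_issues(users))
```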
4. Create the dashboard and report
Prepare a detailed report that lets stakeholders see the big picture and tech leads come up with an action plan based on the results of the analysis. While the former might be satisfied with a visualized dashboard for high-level insights, the latter will need detailed information on which data requires fixing: for example, a field that, according to the documentation, is supposed to validate input but doesn’t, or input that is not limited as the documentation requires (for instance, users under a certain age aren’t supposed to be able to register an account in the system).
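The same analysis results can feed both audiences. The sketch below assumes the check results are collected in a small table: an aggregated pass rate per dataset for the dashboard, and a list of failed checks for the action plan. The file names and metrics are illustrative.

```python
# A minimal reporting sketch: one summary table for stakeholders and a
# detailed issue list for tech leads. Metrics, numbers, and file names
# are illustrative; many teams feed the same data into a BI dashboard.
import pandas as pd

checks = pd.DataFrame([
    {"dataset": "users",  "metric": "completeness.email",  "measured": 0.97, "threshold": 0.99},
    {"dataset": "users",  "metric": "validity.birth_date", "measured": 0.92, "threshold": 0.98},
    {"dataset": "orders", "metric": "completeness.amount", "measured": 1.00, "threshold": 1.00},
])
checks["passed"] = checks["measured"] >= checks["threshold"]

# High-level view: share of passing checks per dataset, for the dashboard.
summary = checks.groupby("dataset")["passed"].mean().rename("pass_rate")
summary.to_frame().to_csv("dq_summary.csv")

# Detailed view: every failed check, for the action plan.
checks[~checks["passed"]].to_csv("dq_failures.csv", index=False)
```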
5. Perform root cause analysis
You can do away with data pitfalls if you track down the initial problems and address them at the root level. Trace erroneous data fields and the characteristics they fail to comply with. It might turn out that you kicked off the project with inadequate or incomplete requirements or made mistakes in configuring the database.
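Grouping detected issues by where they originate often makes the root cause visible. The sketch below assumes each issue has been tagged with a source module; the sources and issue kinds are hypothetical.

```python
# A minimal sketch of grouping detected issues by their suspected origin,
# so recurring problems point to a root cause (e.g., a missing validation
# rule in one module). Sources and issue kinds are hypothetical.
from collections import Counter

issues = [
    {"field": "age",   "source": "mobile signup form", "kind": "out_of_range"},
    {"field": "age",   "source": "mobile signup form", "kind": "out_of_range"},
    {"field": "email", "source": "CSV import",         "kind": "missing"},
]

by_origin = Counter((i["source"], i["kind"]) for i in issues)
for (source, kind), count in by_origin.most_common():
    print(f"{count}x {kind} issue(s) originating from: {source}")
# A cluster around one source suggests the validation rule is missing there,
# rather than in the database itself.
```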
6. Fix data quality issues
When you have uncovered quality issues, you need to fix them and implement solutions that can prevent further disruptions at any stage of data processing. Taking preventive measures is less costly and more efficient than dealing with low-quality data later in the project.
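A typical preventive measure is to validate data at the point of entry so that invalid records never reach the database. The sketch below illustrates the idea for a hypothetical registration flow; the rules mirror the earlier checks and are assumptions for illustration.

```python
# A minimal sketch of a preventive fix: validate input before it is stored,
# instead of cleaning bad records afterwards. Field names and rules are
# assumptions for a hypothetical registration flow.
from datetime import date
from typing import Optional

MIN_AGE = 13  # assumed business rule: users under this age cannot register

def validate_registration(account_name: Optional[str], birth_date: date) -> list[str]:
    """Return a list of validation errors; an empty list means the record is accepted."""
    errors = []
    if not account_name:
        errors.append("account_name is mandatory")
    age_years = (date.today() - birth_date).days // 365  # rough age estimate
    if age_years < MIN_AGE:
        errors.append(f"users under {MIN_AGE} cannot register")
    return errors

# Reject the record at the boundary so nothing invalid ever reaches the database.
print(validate_registration(None, date(2020, 1, 1)))
```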
7. Continue regular data quality control
At least once a year, verify the state of your data, revisit the profiling results, and review the changing requirements for the data. Go step by step through the process described above, reveal new issues, and fix them. The testing frequency depends on the scale and complexity of your project and the resources you have.
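Recurring checks are easiest to sustain when they are wrapped in a single entry point that a scheduler (cron, CI, or an orchestrator) can trigger. The sketch below assumes such a setup; run_checks() is a placeholder standing in for the profiling and analysis steps above.

```python
# A minimal sketch of a recurring quality run, meant to be triggered by
# whatever scheduler the team already uses (cron, CI, an orchestrator).
# run_checks() is a placeholder for the profiling and analysis steps above.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)

def run_checks() -> dict:
    # Placeholder: re-run the profiling (step 2) and analysis (step 3) here.
    return {"users": {"pass_rate": 0.96}, "orders": {"pass_rate": 1.0}}

def scheduled_run(report_path: str = "dq_report.json") -> None:
    results = run_checks()
    payload = {"ran_at": datetime.now(timezone.utc).isoformat(), "results": results}
    with open(report_path, "w") as fh:
        json.dump(payload, fh, indent=2)
    logging.info("Data quality run finished at %s", payload["ran_at"])

if __name__ == "__main__":
    scheduled_run()
```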
Conclusion
To avoid the most notorious data issues, such as inconsistency, inaccuracy, and incompleteness, set up a well-oiled quality monitoring process. Your best bet is to follow the data quality management best practices from this article or to turn to experts who can assess the state of your data and recommend solutions to any problem.