John Doe*, Vice President of Fraud Operations at a leading financial services institution, was disappointed with the latest Fraud Resolution report. The effort metrics for detecting fraudulent claims showed no improvement over the previous report, despite assurances from the Analytics team that introducing Big Data technologies would reduce these metrics by at least 50%. For an organization that had invested heavily in Big Data technologies, the impact on ROI seemed minimal. To find a fix, John called a meeting with Rani*, Program Manager of the Analytics program.
During the tempestuous meeting that followed, John learned that the Analytics team had been unable to proceed past the pilot stage of the project due to a number of data-related issues. Rani described a long list of challenges preventing her team from achieving its goals. These stemmed primarily from poor data quality in the source systems, which caused problems with data processing: non-compliant, inconsistent, inaccurate and untimely data arriving from an unmanageable plethora of sources, ranging from databases to large flat files. Because these issues were being detected late in the data lifecycle, Rani’s team struggled to find a strategy to overcome them.
The one thing that was now clear was that the team needed a data assurance strategy to ensure the quality of data that was being pulled into the system was as close to the source as possible. The assurance strategy needed to be comprehensive as well as easy to implement, especially given the volume and variety of data that needed to be tested.
Data wellness is critical throughout the Data Analytics lifecycle. To ensure this wellness, data quality issues such as incompleteness, incorrectness and untimeliness need to be caught as close to the source of the data as possible. This calls for an assurance process built on a Test strategy that expedites the detection and weeding out of bad data early in the lifecycle.
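The idea of catching completeness, correctness and timeliness issues near the source can be illustrated with a minimal sketch. The field names, rules and thresholds below are hypothetical, chosen purely for illustration; an actual implementation would derive them from the enterprise's own data contracts.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule set for illustration: required fields and a
# maximum acceptable record age (the timeliness threshold).
REQUIRED_FIELDS = ("claim_id", "amount", "filed_at")
MAX_AGE = timedelta(days=30)

def check_record(record, now=None):
    """Return a list of data-quality issues found in a single record."""
    now = now or datetime.now(timezone.utc)
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")
    # Correctness: a claim amount must be a positive number.
    amount = record.get("amount")
    if amount is not None and (
        not isinstance(amount, (int, float)) or amount <= 0
    ):
        issues.append("invalid:amount")
    # Timeliness: the record must not be older than MAX_AGE.
    filed_at = record.get("filed_at")
    if isinstance(filed_at, datetime) and now - filed_at > MAX_AGE:
        issues.append("stale:filed_at")
    return issues

def partition(records, now=None):
    """Split incoming records into clean rows and quarantined rows,
    so bad data is weeded out before it enters downstream stages."""
    clean, quarantined = [], []
    for rec in records:
        issues = check_record(rec, now)
        (quarantined if issues else clean).append((rec, issues))
    return clean, quarantined
```

Running such checks at ingestion, rather than downstream, is what keeps bad data from propagating through the rest of the lifecycle.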
The Test strategy should also enable the enterprise to dissociate itself from the underlying technology and focus solely on creating value. This requires a methodology that accurately identifies, verifies and validates all the key activities in the data lifecycle. For maximum effectiveness, the methodology needs to be measurable and repeatable, and should address the various technical and process-related challenges.
Retailers, too, are working on ways to wiggle out of data messiness as they cut through the clutter of business complexities. Lucie Hall*, Chief Data Officer of a leading multinational retailer, was at a loss to explain to the CEO why her team was, once again, not going to deliver the bi-weekly Consumer Behavior Report required by the Strategy, Sales and Marketing teams worldwide. Despite a fruitful recruiting drive that had brought in the best data scientists, and a successful data integration with various social media platforms, her team could not release the report on time. Lucie was constantly reminded that company management had spared no expense in setting up a world-class infrastructure and team, and thus expected to see quick results.
As Lucie took stock of the situation, she realized the first key challenge was the exceedingly long time and effort required to ensure data quality and value throughout the data processing lifecycle. Although the team had integrated every data source correctly, validation of the data at each stage was still performed manually because of the large number of tools used for any particular task.
The other challenge was the time the team spent creating, executing and maintaining a suite of scripts and algorithms that did not integrate well with the tools used at various stages of the lifecycle. Furthermore, these scripts and algorithms lacked uniformity in design and construction, which made the overall system a nightmare for the team to manage.
Lucie realized that the lack of a holistic assurance approach to data lifecycle management was at the root of all these problems. The assurance process required an effective Test strategy to test both the data and the technology on which the work was performed. This in turn required a robust automation strategy that stitched together the various tools and workflows the team used. Such a strategy would not only eliminate inefficiencies within the team, but also expedite the report-building process.
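The stitching idea above can be sketched as a pipeline where each stage's output is validated before the next stage runs, so a failure is reported at the stage where it occurs rather than surfacing late. The stage names and transforms here are hypothetical placeholders, not the retailer's actual workflow.

```python
def run_pipeline(data, stages):
    """Run each (name, transform, validate) stage in order; stop at
    the first stage whose validation fails, recording where."""
    result = {"failed_at": None, "output": data}
    for name, transform, validate in stages:
        data = transform(data)
        if not validate(data):
            result["failed_at"] = name
            result["output"] = data
            return result
    result["output"] = data
    return result

# Illustrative stages: ingest raw strings, parse to numbers, aggregate.
# Each stage carries its own validation gate.
stages = [
    ("ingest", lambda rows: [r.strip() for r in rows],
     lambda rows: all(rows)),                      # no empty rows
    ("parse", lambda rows: [float(r) for r in rows],
     lambda vals: all(v >= 0 for v in vals)),      # no negative values
    ("aggregate", lambda vals: sum(vals),
     lambda total: total > 0),                     # non-trivial total
]
```

For example, `run_pipeline(["1.0", " 2.5 "], stages)` completes all three stages, while `run_pipeline(["1.0", "   "], stages)` stops at the ingest gate, pinpointing the failure instead of letting a blank row flow downstream.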