Abstract
Over the past decade, we have witnessed unprecedented technological disruption in application and data functions, which now enable enterprise-wide digital transformation initiatives that empower customers and colleagues. This evolution, coupled with an avalanche of data, creates a tremendous opportunity to establish data-driven enterprises, while simultaneously posing challenges around managing, governing, monitoring, and improving the value of data. Enterprises need to re-imagine how they deliver data and analytics programs, as current processes often fall short of meeting the rapidly evolving demands of the business. A DataOps framework acts as an antidote to these data-related challenges, enabling a shift towards agile and reliable delivery of data and analytics programs. This paper discusses the significance of DataOps and how to leverage it to alleviate problems faced in data analytics programs. It also provides a framework for implementing DataOps successfully in enterprises.
Challenges abound in the data value chain
A typical data value chain comprises the following stages: Acquire, Process, Publish, Consume, and Act. Challenges arise at every stage of this value chain (Ref1).
Figure 1 illustrates several key statistics on enterprise data value chains (Ref2).
Figure 1: Enterprise data value chain statistics
Six steps to leveraging DataOps to mitigate challenges
DataOps (see Figure 2) brings together the best software engineering and data engineering tools and methodologies, coupled with cultural changes and monitoring controls, to create trust in data and accelerate analytics delivery. It puts analytics at the heart of an enterprise. DataOps acts as a bridge between data providers and consumers by facilitating bidirectional communication flow to improve the quality of the data value chain.
DataOps combines Agile, DevOps, and statistical process controls to improve delivery efficiency and increase the value extracted from the data value chain.
Figure 2: DataOps architecture
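To make the statistical process control element concrete, here is a minimal sketch (not from the paper) that applies three-sigma control limits to a pipeline metric such as daily row counts; the metric and values are hypothetical.

```python
"""Minimal sketch of statistical process control (SPC) applied to a data
pipeline metric. Assumes daily row counts are available as a list; any
value outside mean +/- 3 standard deviations is flagged for review."""
from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits from historical observations."""
    mu, sd = mean(history), stdev(history)
    return mu - sigmas * sd, mu + sigmas * sd

def check_metric(history, latest):
    """Flag the latest observation if it falls outside the control limits."""
    lower, upper = control_limits(history)
    return {"latest": latest, "lower": lower, "upper": upper,
            "in_control": lower <= latest <= upper}

if __name__ == "__main__":
    # Hypothetical daily row counts from previous pipeline runs
    daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_150]
    print(check_metric(daily_row_counts, latest=7_300))  # out of control -> investigate
```

The same pattern extends to any metric the DataOps function chooses to track, such as data quality pass rates or pipeline run times.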
Let’s delve into what it takes to implement a successful enterprise-wide DataOps strategy (Ref3). The six key steps are as follows:
#1 Establish the DataOps function with a culture of deeper collaboration
It is critical to establish the DataOps function with senior enterprise stakeholders, with representation from both business and IT. Define the operating model, establish KPIs across the data value chain that are pertinent to the DataOps function and to the enterprise as a whole, track them throughout the DataOps implementation, and continuously refine them to further increase the value of data.
It is important to establish an enterprise-focused strategy. Key stakeholders, such as the Chief Information Officer (CIO), Chief Technology Officer (CTO), Chief Data Officer (CDO), Chief Digital Officer, Chief Analytics Officer (CAO), Chief Data Architect, Chief Data Scientist and Head of Business Functions and Finance representatives, must be included.
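One hedged way to make such KPIs trackable is to capture them as versioned, machine-readable definitions; the KPI names and targets in the sketch below are purely illustrative.

```python
"""Hedged sketch: representing data value chain KPIs as versioned,
machine-readable definitions. KPI names and targets are hypothetical."""
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str        # e.g. "time_to_publish_hours"
    stage: str       # Acquire / Process / Publish / Consume / Act
    target: float    # agreed target value
    direction: str   # "lower_is_better" or "higher_is_better"

    def met(self, observed: float) -> bool:
        """Check whether the observed value meets the target."""
        if self.direction == "lower_is_better":
            return observed <= self.target
        return observed >= self.target

KPIS = [
    Kpi("time_to_publish_hours", "Publish", target=24, direction="lower_is_better"),
    Kpi("data_quality_pass_rate", "Process", target=0.98, direction="higher_is_better"),
]

if __name__ == "__main__":
    print(KPIS[0].met(18), KPIS[1].met(0.95))  # True, False
```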
#2 Leverage/set up Enterprise-level Agile and DevOps capabilities
Most modern enterprises have either built or are in the process of building Agile and DevOps capabilities. Data & Analytics teams should therefore join forces with these teams and leverage the enterprise’s Agile and DevOps capabilities in their own delivery.
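For example, data quality checks can run in the same CI pipelines as application code; the pytest-style sketch below is a hedged illustration in which the fixture file and column names are hypothetical.

```python
"""Hedged sketch: automated data checks that can run in an existing CI
pipeline (e.g. via pytest). The fixture file and column names are hypothetical."""
import csv

def load_rows(path):
    """Load a small CSV sample published by the pipeline for testing."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def test_customer_ids_are_unique():
    rows = load_rows("data/customers_sample.csv")   # hypothetical test fixture
    ids = [r["customer_id"] for r in rows]
    assert len(ids) == len(set(ids)), "Duplicate customer_id values found"

def test_no_missing_email():
    rows = load_rows("data/customers_sample.csv")
    assert all(r["email"].strip() for r in rows), "Blank email values found"
```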
#3 Automate the provisioning of data, analytics, and AI infrastructure
One critical principle of DataOps is the ability to scale IT infrastructure in an agile manner to meet rapidly evolving business requirements. Many commercial and open-source tools are available to automate infrastructure. Regardless of the hosting environment (cloud, on-premise, or hybrid), enterprises should rely on infrastructure as code to set up, configure, and scale Data & Analytics platform services. Version-control this code just as you would application or analytics code, and automate security and compliance requirements as well.
Examples of data infrastructure automation include provisioning storage and compute as code, automated configuration of platform services, and automated enforcement of security and compliance controls.
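As a hedged illustration, the Python sketch below provisions and secures a storage bucket for a raw data layer using boto3; the bucket name and tags are hypothetical, and in practice a declarative infrastructure-as-code tool such as Terraform, Pulumi, or CloudFormation, kept under version control, would typically play this role.

```python
"""Hedged sketch: scripted provisioning of a storage bucket for a raw data
layer on AWS with boto3. Names and tags are hypothetical; declarative IaC
kept under version control is the more typical choice."""
import boto3

def provision_raw_layer(bucket_name: str, region: str = "eu-west-1"):
    s3 = boto3.client("s3", region_name=region)
    # Create the bucket for the raw data layer
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    # Tag the bucket so the platform layer and owner are discoverable
    s3.put_bucket_tagging(
        Bucket=bucket_name,
        Tagging={"TagSet": [
            {"Key": "data-layer", "Value": "raw"},
            {"Key": "owner", "Value": "dataops"},
        ]},
    )
    # Block public access by default to satisfy security requirements
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

if __name__ == "__main__":
    provision_raw_layer("example-enterprise-raw-zone")  # hypothetical bucket name
```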
#4 Establish multi-layered data architecture to support a variety of analytical needs
Modern data platforms are complex and must serve varied needs, so it is important to design your data platform in alignment with business priorities to support myriad data processing and consumption patterns. One proven design pattern is a multi-layered architecture (raw, enriched, reporting, analytics, sandbox, etc.), with each layer serving a different purpose and data increasing in value as it moves through the layers.
It is also important to establish owners for the different layers. Register data assets across the layers to support enterprise data discovery initiatives, and set up data quality controls at each layer to create assurance and trust in the data. Put appropriate access controls in place so that data providers and consumers can safely share and access data and insights. Containerize these capabilities so they can be scaled and reused across analytical engagements.
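A minimal sketch of this layering, with illustrative layer names and owners, might register the layers and enforce a simple rule that data is promoted to the next layer only when its quality checks pass.

```python
"""Hedged sketch: a multi-layered data architecture expressed as a simple
registry, with a promotion rule that requires quality checks to pass before
a dataset moves to the next layer. Layer names and owners are illustrative."""
from typing import Optional

LAYERS = ["raw", "enriched", "reporting", "analytics"]

LAYER_OWNERS = {
    "raw": "ingestion-team",
    "enriched": "data-engineering",
    "reporting": "bi-team",
    "analytics": "data-science",
}

def next_layer(current: str) -> Optional[str]:
    """Return the next layer in the chain, or None if already at the end."""
    idx = LAYERS.index(current)
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None

def promote(dataset: str, current: str, quality_checks_passed: bool) -> str:
    """Promote a dataset to the next layer only if its quality checks pass."""
    target = next_layer(current)
    if target is None:
        raise ValueError(f"{dataset} is already in the final layer")
    if not quality_checks_passed:
        raise ValueError(f"{dataset} failed quality checks; cannot leave {current}")
    return target

if __name__ == "__main__":
    print(promote("customer_events", "raw", quality_checks_passed=True))  # enriched
```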
#5 Build data value chain orchestration pipelines
Orchestration plays a pivotal role in stitching together the data flows from one layer to another to bring “ideas to operationalization.” Leverage containerization capabilities to ensure that the sub-components of these orchestration pipelines are scalable and reusable across the enterprise.
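As one hedged example of such orchestration, the sketch below wires the acquire, process, and publish stages of the value chain into a daily Apache Airflow DAG; the task callables are placeholders for real, typically containerized, pipeline steps.

```python
"""Hedged sketch: orchestrating acquire -> process -> publish stages as a
daily Apache Airflow DAG. Task callables are placeholders for real pipeline
steps (e.g. containerized jobs triggered per stage)."""
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def acquire():   # placeholder: pull data from source systems into the raw layer
    print("acquiring data")

def process():   # placeholder: cleanse and enrich data into the next layer
    print("processing data")

def publish():   # placeholder: publish curated data for consumption
    print("publishing data")

with DAG(
    dag_id="data_value_chain_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_acquire = PythonOperator(task_id="acquire", python_callable=acquire)
    t_process = PythonOperator(task_id="process", python_callable=process)
    t_publish = PythonOperator(task_id="publish", python_callable=publish)

    t_acquire >> t_process >> t_publish
```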
DataOps supports pipelines across the entire data value chain, from acquisition and processing through to publication and consumption.
#6 Define and implement a holistic monitoring and alerting framework
Build a comprehensive monitoring and alerting framework to continuously measure how each stage of your data value chain responds to change. Socialize these KPIs with the DataOps function to take the right course of action, and build reusable artifacts where possible.
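A minimal, hedged sketch of such a framework follows: each stage reports a few metrics against thresholds and breaches are routed to an alert channel; the metric names, thresholds, and webhook URL are hypothetical.

```python
"""Hedged sketch: a simple monitoring-and-alerting loop over data value chain
metrics. Thresholds, metric names, and the alert sink are hypothetical."""
import json
import urllib.request

# Hypothetical thresholds per stage metric: ("max"/"min", limit)
THRESHOLDS = {
    "acquire.freshness_minutes": ("max", 60),
    "process.quality_pass_rate": ("min", 0.98),
    "publish.latency_minutes": ("max", 30),
}

def breaches(observed: dict) -> list:
    """Return metrics whose observed values breach their thresholds."""
    out = []
    for metric, (kind, limit) in THRESHOLDS.items():
        value = observed.get(metric)
        if value is None:
            continue
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            out.append((metric, value, limit))
    return out

def alert(breached, webhook_url="https://example.invalid/alerts"):  # hypothetical sink
    """Post breached metrics to an alert channel monitored by the DataOps function."""
    payload = json.dumps({"breaches": breached}).encode()
    req = urllib.request.Request(webhook_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

if __name__ == "__main__":
    observed = {"acquire.freshness_minutes": 95, "process.quality_pass_rate": 0.99}
    print(breaches(observed))  # [('acquire.freshness_minutes', 95, 60)]
```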
Benefits of DataOps
DataOps is the future of data management
Given the rapid and constant changes in data, enterprises need a comprehensive solution to bring together every part of a business into one pipeline. That’s what DataOps enables. It drives companies to use data more efficiently, leveraging the right tools, technologies, and skill-sets.
With better end-to-end data pipeline visibility, automated orchestration, higher quality, and faster cycle times, DataOps enables data analytics groups to better communicate and coordinate their activities. DataOps is the antidote organizations have long needed for their data value chain challenges, and it will become a critical discipline for those who want to thrive in the new-age data landscape.
References
Ravi Varanasi
Partner, Data Analytics and AI, Wipro
Ravi Varanasi has more than two decades of experience in data, analytics, cloud, architecture, innovation, and thought leadership. He brings a mix of experience from major banks and consultancies, working in various capacities across business functions such as pensions, investments, IT operations, global standards, commercial banking, wealth management, payments, and anti-money laundering.
Dilip Maringanti
Partner, Data Analytics and AI, Wipro
Dilip Maringanti has worked with global financial institutions and retailers in setting up their data strategies and leading many data transformation engagements. He specializes in providing strategy, advisory, and architecture services in multi-cloud, data, and AI spaces.