How to calculate the ROI for data observability
On the one hand, businesses are more data-driven than ever before. On the other, data pipelines are increasingly complex and error-prone. Is it time to invest in data observability?
What is data observability?
Analogous to observability in software engineering, data observability refers to the practice of instrumenting your data systems to give a comprehensive view of what is going on in each component of your data stack at any given time.
Convincing your organization that you need a data observability solution
Building a data observability practice in your organization often requires upfront investments – in engineering hours, process changes, and the purchase of technical solutions. Often, before leadership is willing to commit, they’ll want to understand the return on investment (“ROI”). Data teams looking to invest in data observability will need to prove that better quality, fresher data maps directly to increased revenue and/or cost savings.
Calculating ROI
ROI is a generic performance metric that measures the efficiency of an investment: the return relative to its cost. It’s especially helpful when comparing multiple potential investments.
There are two components to calculating ROI:
- calculating the return
- calculating the initial investment/cost
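Once both components are in hand, the formula itself is simple. A minimal sketch in Python (the $150,000 return and $80,000 cost are illustrative figures, taken from the case study later in this post):

```python
def roi(annual_return: float, cost: float) -> float:
    """ROI as a fraction: (return - cost) / cost."""
    return (annual_return - cost) / cost

# e.g. a $150,000 annual return on an $80,000 investment:
print(f"{roi(150_000, 80_000):.1%}")  # 87.5%
```

The hard part, of course, is not the division but producing defensible numbers for the two inputs, which is what the rest of this post is about.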
Since you’re trying to justify an investment to improve data, make sure that your argument is data-driven. This means starting with the most quantifiable impact: how will implementing data observability either increase revenue or decrease costs?
Here are a few examples of “pathways” that bad data might take to affect the company bottom line:
- If a data outage impacts a company’s machine learning models, the loss of revenue can be significant. For example, an outage that prevents Uber’s surge pricing algorithm from updating could cost millions in lost revenue over the course of a single hour.
- Data quality issues can incur direct costs. For example, if the format of customer names and addresses is not validated, multiple mailers might be sent to the same customer, creating waste.
- Data quality issues eat into developer productivity. Even without accounting for opportunity cost, the time engineers spend chasing down data reliability issues they shouldn’t have to maps directly to salaries and equity compensation.
To ensure that you’re quantifying the potential return in a comprehensive, methodical way, rather than adding up random impacts, we recommend the following steps to calculate return.
Step 1: Identify all specific business issues within a company
Some examples here might include:
- Users are registering for “new user” promo codes more than once.
- Fraud detection is not catching fraudulent users.
- Analytics dashboard showing orders is not up to date.
Step 2: Determine the cost of these specific business issues
The respective answers here might be:
- Cost of users using “new user” promo codes when they should not be allowed to: $100,000/year
- Cost of fraudulent users: $200,000/year
- Cost of inadequate inventory in different locations due to lack of up-to-date analytics dashboard: $300,000/year
Step 3: Determine whether bad data is at the root of the issue
The respective answers here might be:
- Yes, because there’s no validation on new user names or emails so there are duplicate entries of a single user in the database
- Yes, because there’s missing data
- Yes, because there’s often a delay in the transformation of data
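For the first issue, even a lightweight check can confirm the diagnosis. A minimal sketch, assuming user records are dictionaries with hypothetical `id` and `email` fields, that flags duplicate registrations by normalized email:

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivially different forms compare equal."""
    local, _, domain = email.strip().lower().partition("@")
    return f"{local}@{domain}"

def find_duplicates(users):
    """Return (first_id, duplicate_id) pairs of users sharing a normalized email."""
    seen, dupes = {}, []
    for user in users:
        key = normalize_email(user["email"])
        if key in seen:
            dupes.append((seen[key], user["id"]))
        else:
            seen[key] = user["id"]
    return dupes

users = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": " Ada@Example.com "},
]
print(find_duplicates(users))  # [(1, 2)]
```

In practice this kind of check would run against the production users table as a scheduled data quality test, not an in-memory list; the list here is just a stand-in.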
Step 4: Set data SLAs to improve the quality of the data
The respective answers here might be:
- The users database table must be deduplicated; all future writes must be standardized in format, and checked against existing entries.
- Missing training data must be interpolated.
- Max delay from orders data being produced in Shopify and orders data at rest in Snowflake should be 24 hours. This should allow for timely inventory projections.
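The freshness SLA in the last example can be checked mechanically. A sketch of such a check, assuming the warehouse load timestamp is available; in a real pipeline `last_loaded_at` would come from warehouse metadata (e.g. the max load timestamp of the orders table in Snowflake), not a hard-coded value:

```python
from datetime import datetime, timedelta, timezone

# From the SLA above: orders data in the warehouse may lag production by at most 24h.
FRESHNESS_SLA = timedelta(hours=24)

def freshness_breached(last_loaded_at: datetime, now: datetime) -> bool:
    """True if the warehouse copy is older than the SLA allows."""
    return now - last_loaded_at > FRESHNESS_SLA

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(freshness_breached(now - timedelta(hours=30), now))  # True (SLA breached)
print(freshness_breached(now - timedelta(hours=3), now))   # False (within SLA)
```

A data observability tool essentially automates checks like this one across every table, and alerts when an SLA is breached.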
Step 5: Determine the updated cost of the issue to the business
The respective answers here might be:
- This should eliminate duplicate “new user” promo redemptions entirely: savings of $100,000/year.
- This should bring the false negative rate down from 4% to 2%: savings of $100,000/year.
- This should cut the leftover inventory percentage by 50%: savings of $150,000/year.
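Each estimate above is just the issue’s current annual cost (from Step 2) scaled by the expected reduction. A sketch using the figures above:

```python
def savings(annual_cost: float, reduction: float) -> float:
    """Expected annual savings from reducing an issue's cost by `reduction` (0..1)."""
    return annual_cost * reduction

print(savings(100_000, 1.0))          # duplicate promo codes eliminated: 100000.0
print(savings(200_000, (4 - 2) / 4))  # false negatives halved (4% -> 2%): 100000.0
print(savings(300_000, 0.5))          # leftover inventory down 50%: 150000.0
```

The reduction fractions are the estimates that deserve the most scrutiny from leadership, so document where each one comes from.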
Less quantifiable metrics
While things like engineering time and software outages can be more or less mapped to dollars and cents, there are other potential “returns” for data observability that are less quantifiable but arguably even more significant. These include:
- The ability to make good business decisions on trustworthy data
- Reduced PR and legal risk
- Improved employee morale and retention
Our recommendation is that you do not attempt to include these “soft” metrics in the quantitative calculation, as you would have to make potentially ungrounded estimates. However, you can include a qualitative writeup of them along with your final ROI report. This provides decision makers with an additional data point if they’re on the fence, and allows them to value the soft impact as they choose.
Calculating Investment/Cost
In addition to determining the return, data teams will also need to calculate the cost. A simple strategy for determining the cost is to examine three categories:
People: the cost of data engineers to whom the issue will be assigned.
Process: the cost of hiring, training, and general change management.
Technology: data observability tool purchase, implementation, and maintenance as well as infrastructure like servers or databases.
When evaluating all of these categories, it is important to consider both short- and long-term costs.
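Summing the three categories gives the cost side of the ROI calculation. A sketch with illustrative figures (the per-category amounts are assumptions, chosen to add up to the $80,000 implementation cost used in the case study below):

```python
# Hypothetical first-year cost estimate across the three categories.
costs = {
    "people": 30_000,      # engineering hours to instrument pipelines
    "process": 10_000,     # hiring, training, and change management
    "technology": 40_000,  # tool purchase, implementation, and infrastructure
}
total_cost = sum(costs.values())
print(total_cost)  # 80000
```

For a multi-year view, repeat the estimate per year: people and technology costs typically recur, while much of the process cost is one-time.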
Case Study
Let’s say that you are an e-commerce brand with the business issues described above. For each issue, let’s determine the ROI of a data observability tool:
Issue: Users are registering for “new user” promo codes more than once
- Potential savings after observability tool implementation: $100,000
- Implementation cost: $80,000
- Net savings: $20,000
- ROI: 25%
Issue: Fraud detection is not catching fraudulent users
- Potential savings after observability tool implementation: $100,000
- Implementation cost: $80,000
- Net savings: $20,000
- ROI: 25%
Issue: Analytics dashboard showing orders is not up to date
- Potential savings after observability tool implementation: $150,000
- Implementation cost: $80,000
- Net savings: $70,000
- ROI: 87.5%
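The per-issue arithmetic above can be reproduced in a few lines, which also makes it easy to rerun as estimates change. A sketch using the case-study figures (the issue labels are shorthand for the issues above):

```python
# Case-study figures: expected annual savings and implementation cost per issue.
issues = {
    "duplicate promo codes":  {"savings": 100_000, "cost": 80_000},
    "missed fraud":           {"savings": 100_000, "cost": 80_000},
    "stale orders dashboard": {"savings": 150_000, "cost": 80_000},
}

results = {}
for name, i in issues.items():
    net = i["savings"] - i["cost"]
    results[name] = {"net": net, "roi": net / i["cost"]}
    print(f"{name}: net ${net:,}, ROI {results[name]['roi']:.1%}")
# duplicate promo codes: net $20,000, ROI 25.0%
# missed fraud: net $20,000, ROI 25.0%
# stale orders dashboard: net $70,000, ROI 87.5%
```

Note that a single tool purchase would likely serve all three issues at once, so charging the full $80,000 against each issue separately is conservative; a combined calculation would show a substantially higher ROI.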
Conclusion
Before companies invest in data observability, they will often want to calculate the ROI. They can do this by enumerating business issues, determining their data quality roots, and then setting SLAs that will ameliorate these issues. In arguments made to decision-makers, the quantitative ROI can be supplemented by the “intangible” effects of data quality improvements, such as improved developer morale and better business decision making.