Thought leadership
October 25, 2022

The onion, the carrot, and the “alert budget”

Kyle Kirwan

Did you know you already have a data quality budget? It might sound surprising, but you do.

I don’t mean a financial budget. I mean the total amount of time your team will spend in 2023 either fixing data issues or hardening your data pipelines against potential failure modes.

Every data team will spend some amount of time on the former (at least until the Metaverse delivers us from the realities of data engineering), and hopefully many will also allocate time to the latter. You can think of this budget in other terms: attention span, capacity, or focus time.

So when you introduce a data observability product to your stack, be thoughtful about how you choose to roll it out. The rollout will impact how your team’s attention is directed, and therefore how your budget is allocated.

Two alert budget rollout options

Let’s say you have 1000 tables in your warehouse. We’ll call the first rollout style the “onion” approach.

With this “onion” approach, your strategy is to monitor all 1000 tables for the simplest attributes, like freshness and volume. Imagine peeling off just one or two layers from an onion: you get broad, 360-degree visibility across your warehouse.

What happens to your team’s attention with this approach? They might be alerted to look at any table, anywhere, on any given day, to resolve a freshness or volume issue. Your organization should only roll out data monitoring this way if it uses data very evenly, with only a few “hotspots” of regularly referenced data. You trade depth for breadth.

Let’s call the second option the “carrot” approach.

In this approach, you apply more fine-grained monitoring to a smaller number of tables. As opposed to the onion approach, where you’re lightly monitoring everything, you prioritize the highest-use tables in your organization. You leave the rest alone.

Imagine a tall, narrow carrot in which all 1000 tables are stacked on top of each other. The top 200 tables, at the top of the carrot, are highly to moderately important; the remaining 800 are infrequently used or unimportant. You monitor those top 200 tables for freshness and volume, plus outliers, duplicates, and formatting problems. Your team will have more time available to spend improving the reliability of these 200 tables, at the expense of missing freshness and volume issues on the other 800.
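To make the contrast concrete, here’s a minimal sketch of what each rollout might look like as a monitoring configuration. This is purely illustrative: the table names are placeholders, the attribute names anticipate the list of seven attributes later in this post, and no particular observability product’s config format is implied.

```python
# Illustrative only: not the config format of any real observability tool.
SHALLOW_ATTRIBUTES = ["freshness", "volume"]
DEEP_ATTRIBUTES = SHALLOW_ATTRIBUTES + [
    "missing_values", "duplicates", "distributions", "outliers", "formatting",
]

# Placeholder table names; in practice these come from your warehouse catalog.
all_tables = [f"table_{i:04d}" for i in range(1000)]
top_tables = all_tables[:200]  # the 200 highest-use tables

# "Onion": every table gets the two shallow checks.
onion_config = {table: SHALLOW_ATTRIBUTES for table in all_tables}

# "Carrot": only the top 200 tables, but with all seven checks.
carrot_config = {table: DEEP_ATTRIBUTES for table in top_tables}
```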

Which approach do you choose?

Why do you need to make this tradeoff at all? Shouldn’t you just monitor everything as deeply as you can?

No: you’d run out of your alert budget very quickly! Your time and attention are limited, and your data observability system needs to focus on the issues that impact data stakeholders (either internal ones like execs, or external ones like customers).

You have to be wise about spending your “alert budget.” You can loosely conceptualize the budget you have with this formula:

(# Tables) x (# Attributes per table) = (Total Attributes)

If your observability system tracks seven attributes:

  • Freshness
  • Volume
  • Missing values
  • Duplicates
  • Distributions
  • Outliers
  • Formatting problems

That gives you 7000 attributes that might draw your attention at any time (1000 tables x 7 attributes).

Your rollout strategy — onion or carrot — will determine which of the 7000 attributes you care about tracking.

There’s another important factor to consider: any alerting system has a non-zero false positive rate, whether it comes from an ML anomaly detector’s errors or from an erroneous set of manual threshold rules.

With a 2% false positive rate, the broad-rollout scenario (the “onion”) produces around 40 useless alerts each day (1000 tables x 2 attributes x 0.02 = 40), spread across all 1000 tables. The narrow-rollout scenario (the “carrot”) produces around 28 useless alerts per day (200 tables x 7 attributes x 0.02 = 28), concentrated on your top 200 tables.
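If you want to play with these numbers, here’s the same arithmetic as a small Python sketch. The table counts, attribute counts, and the 2% false positive rate are the example values from this post; substitute your own.

```python
# Alert-budget arithmetic from the examples above. All inputs are
# illustrative: swap in your own table counts and observed false positive rate.

def expected_daily_false_positives(tables: int, attributes: int, fp_rate: float) -> float:
    """(# tables) x (# attributes per table) x (false positive rate)."""
    return tables * attributes * fp_rate

FP_RATE = 0.02  # assumed 2% false positive rate

# "Onion": 1000 tables, 2 shallow attributes (freshness, volume).
onion = expected_daily_false_positives(1000, 2, FP_RATE)   # -> 40.0

# "Carrot": 200 tables, all 7 attributes.
carrot = expected_daily_false_positives(200, 7, FP_RATE)   # -> 28.0

print(f"Onion rollout:  ~{onion:.0f} useless alerts/day")
print(f"Carrot rollout: ~{carrot:.0f} useless alerts/day")
```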

Attention is precious, and your team can only respond to a limited number of alerts each day. So take care with your “alert budget.” When it comes to data observability tools, you can choose between a broad and shallow “onion” approach or a narrow and deep “carrot” approach. There’s no right answer, but either way, make sure you’re safeguarding your team’s attention and focus.

