January 3, 2023

So you've implemented metadata metrics...now what?

Metadata metrics are a great first step in your data observability journey, but they are not the end of it.

KIT WETZLER

Metadata metrics are a great first step in your data observability journey, but they are not the end of it. Column-level metrics provide deeper information about your data and can be used to make better assessments.

What are metadata metrics?

Metadata metrics are broad freshness and volume metrics meant to tell you if a data pipeline has succeeded or not. As you can see from the table below, they mostly have to do with whether a table has been updated and/or read from.

| Metadata Metric Name | API Name | Description |
| --- | --- | --- |
| Hours since last load | HOURS_SINCE_LAST_LOAD | The number of hours since an INSERT, COPY, or MERGE was performed on a table. It is suggested as an autometric once per table. |
| Rows inserted | ROWS_INSERTED | The number of rows added to the table via INSERT, COPY, or MERGE statements in the past 24 hours. It is suggested as an autometric once per table. |
| Read queries | COUNT_READ_QUERIES | The number of SELECT queries issued on a table in the past 24 hours. It is suggested as an autometric once per table. |

With Bigeye, metadata metrics are available for deployment from the moment you connect your data warehouse: Bigeye scans your existing query logs to automatically track these three metrics across every table. This makes metadata metrics a cornerstone of Bigeye’s T-shaped monitoring philosophy, which recommends that you track fundamentals across all your data while applying deeper monitoring on the most critical datasets, such as those used for financial planning, machine learning models, and executive-level dashboards.
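
To make the three metadata metrics concrete, here is a minimal Python sketch that derives them from a list of query-log records. It is an illustration only, not Bigeye's implementation: the `QueryLogEntry` record, its fields, and the `metadata_metrics` function are hypothetical, and real warehouse query logs have different shapes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueryLogEntry:
    """Hypothetical, simplified query-log record (not a real warehouse schema)."""
    table: str
    statement: str          # e.g. "INSERT", "COPY", "MERGE", "SELECT"
    finished_at: datetime
    rows_written: int = 0


# Statement types that count as a "load" in the metric descriptions.
LOAD_STATEMENTS = {"INSERT", "COPY", "MERGE"}


def metadata_metrics(log, table, now):
    """Compute the three metadata metrics for one table as of `now`."""
    window_start = now - timedelta(hours=24)
    entries = [e for e in log if e.table == table]
    loads = [e for e in entries if e.statement in LOAD_STATEMENTS]

    # Hours since the most recent load, or None if the table was never loaded.
    last_load = max((e.finished_at for e in loads), default=None)
    hours_since = None
    if last_load is not None:
        hours_since = (now - last_load).total_seconds() / 3600

    return {
        "HOURS_SINCE_LAST_LOAD": hours_since,
        # Rows written by load statements in the past 24 hours.
        "ROWS_INSERTED": sum(
            e.rows_written for e in loads if e.finished_at >= window_start
        ),
        # SELECT queries issued in the past 24 hours.
        "COUNT_READ_QUERIES": sum(
            1 for e in entries
            if e.statement == "SELECT" and e.finished_at >= window_start
        ),
    }


now = datetime(2023, 1, 3, 12, 0)
log = [
    QueryLogEntry("orders", "INSERT", now - timedelta(hours=30), 500),
    QueryLogEntry("orders", "MERGE", now - timedelta(hours=6), 120),
    QueryLogEntry("orders", "SELECT", now - timedelta(hours=2)),
    QueryLogEntry("orders", "SELECT", now - timedelta(hours=26)),
    QueryLogEntry("users", "COPY", now - timedelta(hours=1), 40),
]
metrics = metadata_metrics(log, "orders", now)
```

For the `orders` table, only the MERGE from six hours ago and one SELECT fall inside the 24-hour window, so the older INSERT and SELECT are ignored.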

What don't metadata metrics provide?

Metadata metrics do not provide column-level information on things that go wrong, for example:

  • If you loaded blank values into a column that never had blank values
  • If you loaded dates into a column where there have never been dates before
  • If a transform went wrong, and you ended up with the wrong values in different columns than you expected
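
The first case above is easy to sketch in Python. In this hypothetical example, two batches have the same row count, so a volume metric sees nothing unusual, while a column-level blank rate catches the problem. The `blank_rate` function and the sample rows are assumptions made up for illustration.

```python
def blank_rate(rows, column):
    """Fraction of rows whose value in `column` is missing, None, or a blank string."""
    if not rows:
        return 0.0
    blanks = 0
    for row in rows:
        value = row.get(column)
        if value is None or (isinstance(value, str) and not value.strip()):
            blanks += 1
    return blanks / len(rows)


# Hypothetical batches: same number of rows, very different column contents.
yesterday = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
today = [
    {"id": 5, "email": "e@example.com"},
    {"id": 6, "email": ""},
    {"id": 7, "email": None},
    {"id": 8, "email": "h@example.com"},
]

same_volume = len(yesterday) == len(today)
rate_yesterday = blank_rate(yesterday, "email")
rate_today = blank_rate(today, "email")
```

Both batches load four rows, so the table looks healthy from a metadata point of view, yet half of today's `email` values are blank.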

I’ve implemented metadata metrics. Now what?

Once you’ve set up broad coverage of all your tables with metadata metrics, the next step is to drill down further with column-level metrics.

How do I turn on column-level metrics?

With Bigeye, it’s simple to implement column-level monitoring.

1. Bigeye recommends column-level metrics with Autometrics

When you first connect your data warehouse (and whenever a new table is added to the warehouse), Bigeye begins profiling your data to understand what it looks like. Bigeye can then generate Autometrics for the table based on the content of each of the columns. For example:

  • If you’ve got three values in the column, it’s probably an enum
  • If you have no duplicate values, maybe you never want the column to have any duplicates
  • Maybe it looks like an ID column, which means you’ll want to check for duplicates
  • If the column is full of strings, maybe it’s really a datetime column

Depending on these heuristics, Bigeye suggests a set of metrics for each table. You can find these suggestions on the **Autometrics** tab of the table's catalog page.  
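
As a rough illustration of how profiling heuristics like these could work, here is a minimal Python sketch. It is not Bigeye's logic: the `suggest_metrics` function, the `enum_max_distinct` cutoff, and the suggestion names are all hypothetical, and real profiling would need to be far more careful.

```python
from datetime import datetime


def _parses_as_datetime(text):
    """True if `text` is an ISO 8601 date or datetime string."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def suggest_metrics(values, enum_max_distinct=5):
    """Suggest checks for one column from a sample of its values.

    The suggestion names are made up for this sketch.
    """
    non_null = [v for v in values if v is not None]
    suggestions = []
    if not non_null:
        return suggestions

    distinct = set(non_null)

    # A handful of repeated values looks like an enum.
    if len(distinct) <= enum_max_distinct and len(non_null) > len(distinct):
        suggestions.append("allowed_values_check")

    # No duplicates so far: perhaps an ID column that should stay unique.
    if len(distinct) == len(non_null):
        suggestions.append("duplicate_check")

    # Strings that all parse as dates: perhaps a datetime column stored as text.
    if all(isinstance(v, str) for v in non_null) and all(
        _parses_as_datetime(v) for v in non_null
    ):
        suggestions.append("datetime_format_check")

    return suggestions


status = suggest_metrics(["open", "closed", "open", "pending", "closed", "open"])
ids = suggest_metrics([101, 102, 103, 104, 105, 106])
dates = suggest_metrics([
    "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04",
    "2023-01-05", "2023-01-06", "2023-01-06",
])
```

Here the status column gets an enum-style suggestion, the ID column gets a duplicate check, and the date strings get a format check. Crude rules like these will misfire on small samples, which is one reason suggestions are something you review rather than something applied blindly.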

For more details on which metrics are available as Autometrics, and the criteria data must match in order for them to be suggested on a given column, review Bigeye's Available Metrics.

In addition to the metrics themselves, Bigeye also generates auto-thresholds, which are automatic thresholds calculated from historical patterns.

Auto-thresholds free you from having to manually set, tune, and update potentially thousands of sets of thresholds. Thanks to Bigeye’s anomaly detection engine, these thresholds are also dynamic – they adapt to business changes, seasonality, and your feedback.

For example, suppose a data issue notification fires, but you decide that the data batch in question is actually fine. You can tell Bigeye that the underlying data state is tolerable or that the alert was a false positive. Bigeye takes this feedback into account so that similar behavior does not trigger an alert in the future.
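
To give a feel for thresholds derived from history and adjusted by feedback, here is a minimal Python sketch. It uses a plain mean plus or minus `k` standard deviations rule, which is a stand-in chosen for simplicity and not Bigeye's anomaly detection engine; it does not model seasonality. The function names and the sample history are hypothetical.

```python
from statistics import mean, stdev


def auto_threshold(history, k=3.0):
    """Return (lower, upper) bounds: mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True if `value` falls outside the bounds derived from `history`."""
    lower, upper = auto_threshold(history, k)
    return value < lower or value > upper


def accept_as_normal(history, value):
    """Feedback step: treat a flagged value as acceptable by adding it to the history."""
    return history + [value]


# Hypothetical daily values of some metric, e.g. rows inserted.
history = [100, 102, 98, 101, 99, 100, 103, 97]
bounds_before = auto_threshold(history)

new_value = 110
flagged_before = is_anomalous(new_value, history)

# The user marks the alert as a false positive, so the bounds widen.
history_after = accept_as_normal(history, new_value)
bounds_after = auto_threshold(history_after)
flagged_after = is_anomalous(new_value, history_after)
```

With this history the bounds start at roughly 94 to 106, so a value of 110 is flagged. Once that value is accepted as normal, the upper bound moves past 110 and the same value no longer triggers an alert.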

2. Turn on the column-level metrics that you actually care about!

Unlike other data observability vendors, Bigeye allows you to pick and choose which column-level metrics you want to enable on which tables, rather than forcing you to enable all of them on all tables.

This allows the data team to avoid alert fatigue by focusing on the columns that are important to them and ignoring the ones that are not.

