Product

Tracking data quality performance with the new Bigeye dashboard

The new Bigeye dashboard provides instant insight into the health of data pipelines so you can spot trends to help prioritize and manage data quality at scale.

Kendall Lovett

We’re excited to announce the availability of the all-new Bigeye Dashboard. This new and improved dashboard gives Bigeye users immediate insight into the health of their data pipelines and the quality of their data over time.

“The new Bigeye dashboard allows us to better set and monitor data quality goals across our data team and demonstrate to executive leadership how our performance is improving over time.”

Nick Heidke, Principal Data Architect, SimpleRose

The Bigeye dashboard gives customers valuable insight into three key areas: monitoring coverage, reliability and quality scores, and issue response time. Users can also apply time frame and data source filters to explore the data and perform additional analysis.

Monitoring coverage

The monitoring coverage section provides insight into both the number of tables currently monitored by Bigeye and the percentage of tables covered by various data quality dimensions. This helps customers track Bigeye usage over time and find potential gaps in coverage that they may want to invest in. For example, customers may expect to have uniqueness checks on at least 50% of their tables. If they see that this number is currently only 30%, they can take steps to increase it in Bigeye.

Reliability & quality scores

Customers can use the pipeline reliability and data quality charts to see trends in their data operations and understand if performance is improving or declining over time. This is especially useful for communicating progress to data executives.

Bigeye monitors the reliability of data pipelines by tracking the freshness and volume of data inserted into tables. If data doesn’t arrive on time (freshness) or doesn’t include the expected number of rows (volume), Bigeye flags this as a pipeline reliability issue. The pipeline reliability score reflects the percentage of tables that are “healthy,” i.e., those without any open freshness or volume issues. The score is also displayed week over week so teams can observe performance over time.
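To make the scoring concrete, here is a minimal sketch of that calculation in Python. The `Table` structure and field names are illustrative assumptions, not Bigeye’s API:

```python
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    open_freshness_issues: int = 0  # data arrived later than expected
    open_volume_issues: int = 0     # row counts missed expectations

def reliability_score(tables: list[Table]) -> float:
    """Percent of tables that are "healthy": no open freshness or volume issues."""
    if not tables:
        return 100.0
    healthy = sum(
        1 for t in tables
        if t.open_freshness_issues == 0 and t.open_volume_issues == 0
    )
    return 100.0 * healthy / len(tables)

tables = [Table("orders"), Table("users", open_freshness_issues=1), Table("events")]
print(f"{reliability_score(tables):.1f}%")  # 66.7%
```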

The Bigeye dashboard also monitors the health of data quality checks, or “metrics”, over time. Users can see aggregated insights into the health of data across 70 out-of-the-box checks for dimensions like uniqueness, completeness, validity, distribution, and more.

  • Pipeline reliability: Detects whether tables are updating on time and with the expected volume of data. Example metrics: Freshness, Volume, Read queries.
  • Uniqueness: Detects when schema and data constraints are breached. Example metrics: Distinct (#), Duplicates (#).
  • Completeness: Detects when there are missing values in datasets. Example metrics: Null (#, %), Empty String (#, %), NaN (#, %).
  • Distribution: Detects changes in the numeric distribution of values, including outliers, variance, skew, and more. Example metrics: Min, Max, Average, Variance, Skew.
  • Validity: Detects whether data is formatted correctly and represents a valid value. Bigeye offers validity metrics across a number of categories. Example metrics: String Length (Min, Max, Avg), UUID, SSN, Phone, Zip code, Email, State code.

For a full list of available metrics, visit the docs.
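As a rough illustration of what a few of these checks compute (a sketch, not Bigeye’s implementation), here are completeness and uniqueness metrics calculated over a sample column:

```python
import pandas as pd

df = pd.DataFrame({"email": ["a@x.com", "a@x.com", None, "b@x.com"]})

null_pct = df["email"].isna().mean() * 100    # Completeness: Null (%)
duplicates = df["email"].duplicated().sum()   # Uniqueness: Duplicates (#)
distinct = df["email"].nunique(dropna=True)   # Uniqueness: Distinct (#)

print(f"null%={null_pct:.0f} duplicates={duplicates} distinct={distinct}")
# null%=25 duplicates=1 distinct=2
```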

Issue response time

To focus on and improve data quality, data teams need an easy way to measure and communicate performance. The issue response section of the dashboard provides a helpful starting point: it tracks the number of issues closed by each Bigeye user, the average time to resolution, and how many issues have been interacted with. This allows customers to set simple yet meaningful goals or SLAs and communicate their performance to business and data leaders across the organization.
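For illustration, the response metrics above could be derived from issue records like the following. The record shape here is a hypothetical example, not Bigeye’s data model:

```python
from datetime import datetime, timedelta

issues = [
    {"opened": datetime(2023, 5, 1, 9),  "closed": datetime(2023, 5, 1, 17), "closed_by": "dana"},
    {"opened": datetime(2023, 5, 2, 8),  "closed": datetime(2023, 5, 4, 8),  "closed_by": "sam"},
    {"opened": datetime(2023, 5, 3, 12), "closed": None,                     "closed_by": None},  # still open
]

closed = [i for i in issues if i["closed"] is not None]

# Average time to resolution across closed issues
avg_resolution = sum((i["closed"] - i["opened"] for i in closed), timedelta()) / len(closed)

# Issues closed per user
closed_per_user: dict[str, int] = {}
for i in closed:
    closed_per_user[i["closed_by"]] = closed_per_user.get(i["closed_by"], 0) + 1

print(avg_resolution)    # 1 day, 4:00:00
print(closed_per_user)   # {'dana': 1, 'sam': 1}
```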

Ready to see the new Bigeye dashboard in action? Schedule a demo.

Resource | Monthly cost ($) | Number of resources | Time (months) | Total cost ($)
Software/Data engineer | $15,000 | 3 | 12 | $540,000
Data analyst | $12,000 | 2 | 6 | $144,000
Business analyst | $10,000 | 1 | 3 | $30,000
Data/product manager | $20,000 | 2 | 6 | $240,000
Total cost | | | | $954,000
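Each row’s total follows from monthly cost × number of resources × time; for example, $15,000 × 3 × 12 = $540,000 for software/data engineers, and the rows sum to $954,000.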
Role | Goals | Common needs
Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness + volume monitoring, schema change detection, lineage monitoring
Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other (sometimes subtle) issues that could affect their analysis or machine learning models. | Freshness monitoring, completeness monitoring, duplicate detection, outlier detection, distribution shift detection, dimensional slicing and dicing
Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things, without spending hours writing tons of pipeline tests. | Lineage monitoring, ETL blue/green testing
Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools, anomaly detection, custom business metrics, dimensional slicing and dicing
Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools, reporting and insights
