September 14, 2023

Enabling self-serve data quality with Bigeye

Liz Elfman

Many data teams aim for "self-service", with analysts who have direct access to data and tools. But in practice, access alone doesn't always enable frictionless self-service.

Without proper data quality monitoring and governance, "self-service" wastes time and muddies the clarity that analytics are supposed to bring. Enabling comprehensive data quality capabilities for analysts is easier said than done.

It requires an advanced, streamlined system for constantly tuning alerts, debugging root causes, and resolving issues. In practice, that work often gets kicked back to the data platform and engineering teams, creating bottlenecks. So how can organizations make self-service data quality attainable?

The key is using technology to remove the manual toil from quality management and accelerate issue resolution. This is where Bigeye comes in.

How Bigeye enables self-serve data quality

Bigeye was designed to make it easy for data teams to monitor their data quality, set SLAs, and get notified of any issues before they turn into downstream problems. The following capabilities are key to enabling self-service data quality.

Metadata metrics

Bigeye's Metadata Metrics provide instant observability across your entire data warehouse by automatically tracking key operational metrics. They require zero manual configuration and are enabled as soon as you connect your data source. They work by scanning existing query logs in your data warehouse to monitor metrics like:

Time since the last table refresh
Rows inserted per day
Number of queries per day

This allows you to quickly detect common data pipeline issues like stale data, irregular data loads, and changes in table usage, without any additional load on your warehouse.
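As a rough illustration of how metrics like these can be derived from query logs alone, the sketch below computes them from a small in-memory log. The `QueryLogEntry` record, its fields, and the sample values are hypothetical; they are not Bigeye's internal format or any warehouse's actual log schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueryLogEntry:
    """Hypothetical, simplified query-log record (not a real warehouse schema)."""
    table: str
    kind: str            # "INSERT" or "SELECT"
    rows_affected: int
    ran_at: datetime


def metadata_metrics(log, table, now):
    """Derive freshness, volume, and usage metrics for one table from log entries."""
    entries = [e for e in log if e.table == table]
    writes = [e for e in entries if e.kind == "INSERT"]
    reads = [e for e in entries if e.kind == "SELECT"]

    # Time since the last table refresh, in hours (None if never written).
    last_write = max((e.ran_at for e in writes), default=None)
    hours_since_refresh = (
        (now - last_write).total_seconds() / 3600 if last_write is not None else None
    )

    # Rows inserted per calendar day.
    rows_per_day = Counter()
    for e in writes:
        rows_per_day[e.ran_at.date()] += e.rows_affected

    # Number of read queries per calendar day.
    queries_per_day = Counter(e.ran_at.date() for e in reads)

    return {
        "hours_since_refresh": hours_since_refresh,
        "rows_inserted_per_day": dict(rows_per_day),
        "queries_per_day": dict(queries_per_day),
    }


now = datetime(2023, 9, 14, 12, 0)
log = [
    QueryLogEntry("orders", "INSERT", 1200, now - timedelta(hours=30)),
    QueryLogEntry("orders", "INSERT", 900, now - timedelta(hours=6)),
    QueryLogEntry("orders", "SELECT", 0, now - timedelta(hours=2)),
    QueryLogEntry("orders", "SELECT", 0, now - timedelta(hours=1)),
]
metrics = metadata_metrics(log, "orders", now)
```

Because everything is computed from log entries that already exist, nothing in this sketch queries the table itself, which is the property that keeps this kind of monitoring cheap.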

Metadata metrics are a key part of Bigeye's "T-shaped monitoring" approach:

Broad coverage across all data via Metadata Metrics
In-depth monitoring on critical tables using suggested and custom metrics

Bigconfig templates

Bigconfig is Bigeye’s configuration-as-code tool that allows easy setup of comprehensive data monitoring across warehouses. It uses a simple YAML syntax to define tags, metrics, and monitoring collections.

Bigconfig empowers self-service data quality in two key ways:

Efficient monitoring setup

Analysts can instantiate monitoring for common data types like IDs, amounts, and timestamps with just a few lines of configuration. Tags identify field patterns across tables. Metrics apply checks like uniqueness and distributions to tagged columns. This removes the manual work of configuring every single column.
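To make the tag-then-metric idea concrete, here is a minimal Python sketch of the concept. It is not Bigconfig syntax: the `TAGS` structure, the wildcard patterns, and the check names are all assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical tags: each one pairs column-name patterns with checks to apply.
TAGS = {
    "ids": {"patterns": ["*_id"], "checks": ["uniqueness", "percent_null"]},
    "amounts": {"patterns": ["*amount*", "*_total"], "checks": ["min", "max"]},
    "timestamps": {"patterns": ["*_at", "*_date"], "checks": ["freshness"]},
}


def plan_monitoring(columns, tags):
    """Map each fully qualified column to the checks implied by matching tags."""
    plan = {}
    for column in columns:
        name = column.rsplit(".", 1)[-1]  # match on the bare column name
        for tag in tags.values():
            if any(fnmatchcase(name, pattern) for pattern in tag["patterns"]):
                plan.setdefault(column, []).extend(tag["checks"])
    return plan


columns = [
    "sales.orders.order_id",
    "sales.orders.amount_usd",
    "sales.orders.created_at",
    "sales.orders.notes",
]
plan = plan_monitoring(columns, TAGS)
```

A new table whose columns follow the same naming conventions picks up the same checks without anyone configuring it column by column, while columns such as `notes` that match no tag are simply left out of the plan.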

Custom business logic

Bigconfig makes it easy to define custom metrics using SQL snippets. Analysts can build specialized checks for business data without coding entire scripts. For example, validate expected values in JSON data, flag payment amounts outside of ranges, etc.
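The payment-range example can be sketched as a single SQL snippet that returns one number to track over time. The version below runs against an in-memory SQLite database purely so it is self-contained; the `payments` table, its columns, and the range bounds are hypothetical, and the way the snippet would be registered in Bigconfig is not shown.

```python
import sqlite3

# Throwaway in-memory table standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO payments (amount) VALUES (?)",
    [(19.99,), (250.00,), (-5.00,), (125000.00,)],
)

# Hypothetical custom metric: fraction of payments outside an expected range.
OUT_OF_RANGE_SQL = """
    SELECT AVG(CASE WHEN amount < :low OR amount > :high THEN 1.0 ELSE 0.0 END)
    FROM payments
"""


def out_of_range_rate(conn, low, high):
    """Return the share of rows whose amount falls outside [low, high]."""
    (rate,) = conn.execute(OUT_OF_RANGE_SQL, {"low": low, "high": high}).fetchone()
    return rate


rate = out_of_range_rate(conn, low=0.0, high=10000.0)
```

Expressing the check as a metric (a rate) rather than a pass/fail rule means it can be charted and monitored for drift like any other metric.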

Other benefits include:

Organizing metrics into collections for clear organization and ownership
Adjusting monitoring as needs change by updating the config file
Deploying monitoring across environments using Infrastructure-as-Code tools
Letting metrics adapt over time through anomaly detection

With Bigconfig, analysts can set up and evolve data quality monitoring themselves. By coding metrics instead of rules, Bigeye removes friction from self-service governance. This means faster observability and accelerated delivery of analytics.

Autothresholds

A key challenge with manual threshold-based alerts is that they require constant tuning as data patterns change. With autothresholds, Bigeye removes this friction by analyzing historical data to calculate dynamic thresholds that adapt as trends evolve. Analysts don't have to manually define rigid thresholds that go stale.
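The general idea of a threshold that follows the data can be sketched in a few lines. This is a deliberately simple stand-in (mean plus or minus a multiple of the standard deviation over a recent window), not Bigeye's actual autothreshold algorithm; the function names, `window`, and `k` are assumptions.

```python
from statistics import mean, stdev


def auto_thresholds(history, window=14, k=3.0):
    """Compute (low, high) bounds from the most recent `window` observations."""
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two observations to estimate spread")
    center = mean(recent)
    spread = stdev(recent)
    return center - k * spread, center + k * spread


def is_alert(value, history, window=14, k=3.0):
    """Flag a new value that falls outside the current dynamic bounds."""
    low, high = auto_thresholds(history, window, k)
    return not (low <= value <= high)


# Daily row counts hovering around 1,000.
history = [1000, 1010, 990, 1005, 995, 1000, 1002]
low, high = auto_thresholds(history)

# After a legitimate step change to around 2,000, the bounds move with the data.
shifted = history + [2000, 2010, 1990, 2005, 1995, 2000, 2002]
still_normal = not is_alert(2003, shifted, window=7)
```

A fixed bound tuned for the first week would have fired on every load after the step change; recomputing the bounds from recent history avoids that retuning work.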

Anomaly detection

Bigeye uses advanced anomaly detection techniques to identify unexpected changes or unusual patterns in data. This is a key capability for enabling effective self-service data quality.

Simple threshold-based alerts require manual setup and tuning. With potentially thousands of metrics to monitor, this doesn't scale. Bigeye's anomaly detection is automated, adapting to changing data patterns over time.

Additionally, accurate anomaly detection finds subtle issues missed by basic methods. It understands trends and seasonality in data. Bigeye uses multiple statistical models to minimize false positives and false negatives.

The system also learns from user feedback, improving over time. Excluding anomalies from baselines prevents noisy data from skewing detection.
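The point about excluding anomalies from baselines is easy to demonstrate. The sketch below uses a modified z-score based on the median and median absolute deviation, a common robust technique swapped in here for illustration; it is not a description of the statistical models Bigeye uses, and the function names and sample series are hypothetical.

```python
from statistics import median


def robust_z(value, baseline):
    """Modified z-score: distance from the median, scaled by the MAD."""
    med = median(baseline)
    mad = median([abs(x - med) for x in baseline])
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def detect(series, threshold=3.5, min_baseline=5):
    """Return indices of anomalous points, keeping them out of the baseline."""
    baseline, anomalies = [], []
    for index, value in enumerate(series):
        if len(baseline) >= min_baseline and abs(robust_z(value, baseline)) > threshold:
            anomalies.append(index)   # flagged points never enter the baseline
        else:
            baseline.append(value)
    return anomalies


series = [100, 102, 98, 101, 99, 100, 500, 101, 97, 103]
anomalies = detect(series)
```

Because the spike at index 6 is never added to the baseline, the values that follow it are still judged against normal behavior instead of against a baseline inflated by the outlier.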

Easy root cause detection

When anomalies are found, Bigeye provides insights for rapid investigation:

Root cause analysis shows upstream data lineage and related issues.
Impact analysis reveals how downstream data may be affected.
Timelines, graphs, and sample queries assist with debugging.

This means analysts can easily diagnose and resolve many data issues without engineering support. If needed, they can clearly describe the problem to route to the right team.
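Both root cause analysis and impact analysis come down to walking a lineage graph in opposite directions. The sketch below shows that traversal on a toy graph; the table and dashboard names and the `LINEAGE` structure are hypothetical and do not reflect how Bigeye stores lineage.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its values.
LINEAGE = {
    "raw.stripe_charges": ["staging.charges"],
    "raw.stripe_refunds": ["staging.refunds"],
    "staging.charges": ["marts.revenue"],
    "staging.refunds": ["marts.revenue"],
    "marts.revenue": ["dashboard.weekly_revenue"],
}


def reachable(start, edges):
    """Breadth-first search: every node reachable from `start` along `edges`."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in edges.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(edges):
    """Flip edge direction so the same search walks upstream instead."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Impact analysis: what sits downstream of a broken staging table?
impact = reachable("staging.refunds", LINEAGE)

# Root cause candidates: what sits upstream of a suspicious mart?
candidates = reachable("marts.revenue", invert(LINEAGE))
```

Here an issue in `staging.refunds` is shown to reach the revenue mart and its dashboard, while an anomaly in `marts.revenue` narrows the search to the four tables that feed it.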

BI integrations

Bigeye integrates directly with Tableau to provide visibility into report usage and data lineage. This empowers analysts to quickly trace data issues impacting business intelligence. For example, tables used in Tableau are mapped to backend warehouse sources. Analysts can trace report failures back to underlying data issues through interactive lineage graphs.

Bigeye also displays popularity metrics for Tableau reports and dashboards. Analysts get visibility into the most accessed visualizations to prioritize critical data flows.

With these capabilities, Bigeye provides the baseline data quality and trust needed for advanced self-service analytics. Users have the flexibility to define rules and get alerts tailored to their specific needs, while also benefiting from organization-wide standards, governance, and reliability. Let's walk through some examples.

Financial data

Many companies rely on Stripe transaction data to make key business decisions around revenue, refunds, fraud, and more. Bigeye makes it easy to monitor critical aspects of Stripe data quality and integrity. Its Stripe Bigconfig template provides out-of-the-box monitoring collections for:

General data integrity metrics

Unique transaction IDs across tables
Non-null ID fields
Proper join integrity between transaction tables

Balance transaction integrity

Valid currency codes
Reasonable transaction amounts
Matching amounts for charges and refunds

Accurate sales data metrics

Number of successful sales and refunds
Chargebacks and disputed transactions
Invoice totals that match charges
Fee amounts in expected ranges

Custom business metrics

Revenue totals by currency, product, geo
Refund rates by product line
Average subscription fee over time
Expected JSON fields and values
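To show what a few of these integrity checks look like in practice, here is a small Python sketch over hand-written rows. The field names, the sample data, and the abbreviated currency list are hypothetical, and "matching amounts" is interpreted here as "a refund never exceeds its charge", which is an assumption. The actual Stripe Bigconfig template is not reproduced.

```python
from collections import Counter

VALID_CURRENCIES = {"usd", "eur", "gbp"}  # illustrative subset only

charges = [
    {"id": "ch_1", "amount": 5000, "currency": "usd"},
    {"id": "ch_2", "amount": 1200, "currency": "eur"},
    {"id": "ch_2", "amount": 1200, "currency": "eur"},   # duplicate ID
    {"id": None, "amount": 800, "currency": "usd"},      # missing ID
    {"id": "ch_3", "amount": 900, "currency": "xx"},     # invalid currency
]
refunds = [
    {"id": "re_1", "charge_id": "ch_1", "amount": 5000},
    {"id": "re_2", "charge_id": "ch_3", "amount": 1500},  # exceeds its charge
]


def integrity_report(charges, refunds):
    """Summarize ID, currency, and refund-amount problems in the sample rows."""
    ids = [c["id"] for c in charges]
    present = [i for i in ids if i is not None]
    duplicate_ids = sorted(i for i, n in Counter(present).items() if n > 1)

    invalid_currencies = sorted(
        {c["currency"] for c in charges if c["currency"] not in VALID_CURRENCIES}
    )

    # Refunds for unknown charges are compared against 0, so they are flagged too.
    charge_amount = {c["id"]: c["amount"] for c in charges if c["id"] is not None}
    over_refunded = [
        r["id"] for r in refunds
        if r["amount"] > charge_amount.get(r["charge_id"], 0)
    ]

    return {
        "duplicate_ids": duplicate_ids,
        "null_id_count": len(ids) - len(present),
        "invalid_currencies": invalid_currencies,
        "over_refunded": over_refunded,
    }


report = integrity_report(charges, refunds)
```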

CRM data

Another common data source that people want to monitor is CRM data, e.g. from HubSpot. Bigeye’s HubSpot Bigconfig template makes it easy to set up checks such as:

Validating that primary keys like company IDs and contact IDs are unique and not null, and that relationships between tables, such as contacts to companies, stay intact.
Validating that key fields like industry, lead source, conversion events, tech stack, segments, and deal stage/type contain the expected values and fall within the expected ranges.
If integrating external enrichment data, validating that these fields stay in sync.
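Two of these checks, cross-table integrity and enrichment sync, can be sketched with plain set and dictionary comparisons. Everything in the example is hypothetical: the record shapes, the field names, and the sample values are invented for illustration and are not taken from HubSpot's schema or Bigeye's template.

```python
companies = [{"company_id": 1}, {"company_id": 2}]
contacts = [
    {"contact_id": 10, "company_id": 1},
    {"contact_id": 11, "company_id": 3},   # points at a company that does not exist
    {"contact_id": 12, "company_id": 2},
]

# Industry per company as stored in the CRM and in an external enrichment feed.
crm_industry = {1: "software", 2: "retail"}
enriched_industry = {1: "software", 2: "ecommerce"}


def orphaned_contacts(contacts, companies):
    """Contacts whose company_id has no matching row in companies."""
    known = {c["company_id"] for c in companies}
    return [c["contact_id"] for c in contacts if c["company_id"] not in known]


def out_of_sync(crm_values, enriched_values):
    """Keys present in both sources whose values disagree."""
    shared = crm_values.keys() & enriched_values.keys()
    return sorted(k for k in shared if crm_values[k] != enriched_values[k])


orphans = orphaned_contacts(contacts, companies)
mismatched = out_of_sync(crm_industry, enriched_industry)
```

Tracked as counts over time, `len(orphans)` and `len(mismatched)` become metrics that can alert when a sync job starts dropping or overwriting records.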

Final thoughts

High-performance self-service analytics relies on trusted, high-quality data. Otherwise, analysts spend their time fighting data issues rather than driving insights. Bigeye provides a flexible, collaborative platform that empowers users to take control of their data quality. The result is happier analysts, faster insights, and analytics that scale across the business.

Try out Bigeye's self-service approach to data quality by requesting a demo today.
