December 13, 2021

The complete data quality workflow: Introducing Issues and Dashboard

Bigeye Staff

With the release of Issues and Dashboard, Bigeye provides a complete data quality workflow: it starts with a smarter way to prioritize and resolve data quality issues and ends with a high-level view of data quality. When you resolve and document data quality problems in Issues, your resolution metadata powers Dashboard, where you can see detection and resolution time trends, identify problem areas, and get a holistic view of the health of your data.

Bigeye already provides great ways to communicate and align on data quality with stakeholders – through data quality SLAs and integrations with data catalogs like Alation. Now, Issues and Dashboard ensure that there is seamless communication and understanding between the people resolving data quality issues and those managing the data team.

To get a better sense of how Issues and Dashboard create a complete data quality workflow, let’s take a look at a typical day-in-the-life of a data team leveraging the Bigeye data observability platform.

Starting with easier prioritization

A data engineer on the data team receives alerts from Bigeye and logs in to see that Bigeye has detected three new data quality Issues: the count is too high on order.id, the max is too high on event_logs.load_time, and the null (%) is too high on payments.charge_amount. Each Issue could be important, but which one should they start on?
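To make a metric like null (%) concrete, here is a minimal sketch of how such a value could be computed and flagged. It is not Bigeye's implementation: the function name, the sample values, and the fixed 10% threshold are all assumptions for illustration, and a platform with anomaly detection would typically learn the expected range rather than use a hard-coded limit.

```python
from typing import Optional, Sequence


def null_percent(values: Sequence[Optional[float]]) -> float:
    """Return the percentage of entries that are None (hypothetical helper)."""
    if not values:
        return 0.0
    nulls = sum(1 for v in values if v is None)
    return 100.0 * nulls / len(values)


# Made-up sample standing in for payments.charge_amount.
charge_amounts = [19.99, None, 5.00, None, 42.50, 7.25, None, 12.00]

pct = null_percent(charge_amounts)

# Assumed fixed threshold; a simplification of anomaly-based detection.
THRESHOLD_PERCENT = 10.0
is_too_high = pct > THRESHOLD_PERCENT
```

In this made-up sample, three of the eight values are missing, so the check flags the column.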

Rather than guessing which is most significant, the data engineer asks their manager to help prioritize their efforts. The engineering manager checks Dashboard in Bigeye, sees that the Issue on order.id affects a business-critical SLA, and directs the data engineer to start there.
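The prioritization rule the manager applies can be sketched in a few lines. This is an illustrative model only: the `Issue` class, its fields, and the ranking order are assumptions, not Bigeye's data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Issue:
    """Hypothetical record for a detected data quality Issue."""
    column: str
    metric: str
    sla: Optional[str] = None        # SLA the column belongs to, if any
    business_critical: bool = False  # whether that SLA is business-critical


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Order Issues: business-critical SLA first, then any SLA, then the rest."""
    def rank(issue: Issue) -> int:
        if issue.sla and issue.business_critical:
            return 0
        if issue.sla:
            return 1
        return 2

    # sorted() is stable, so Issues with equal rank keep their original order.
    return sorted(issues, key=rank)


issues = [
    Issue("event_logs.load_time", "max"),
    Issue("payments.charge_amount", "null (%)"),
    Issue("order.id", "count", sla="order health", business_critical=True),
]

ordered = prioritize(issues)
```

With the three Issues from the scenario, the order.id Issue moves to the front because it is the only one tied to a business-critical SLA.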

The engineering manager also sees that their data quality coverage is quite low at only 24% and asks the data engineer to have Bigeye monitor more metrics. With Bigeye, data quality coverage can be easily increased by either enabling Autometrics on specific tables with a few clicks or telling Bigeye to automatically deploy metrics across multiple tables at once.
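As a rough illustration of what a coverage figure like 24% could mean, the sketch below treats coverage as the share of items that have at least one metric monitoring them. That definition is an assumption (the post does not say how Bigeye calculates coverage), and the counts are made up to match the percentages in the scenario.

```python
def coverage_percent(monitored: int, total: int) -> float:
    """Share of items with at least one metric, as a percentage (assumed definition)."""
    if total == 0:
        return 0.0
    return 100.0 * monitored / total


# Hypothetical counts: 6 of 25 items monitored before, 18 of 25 after.
before = coverage_percent(6, 25)
after = coverage_percent(18, 25)
```

Under these assumed counts, monitoring twelve more items raises coverage from 24% to 72%.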

Leading to smarter resolution

The data engineer now digs into the count issue with order.id. With Issues, they can see that it’s related to a transformation error that occurred a few days prior.

Once the fix is tested, the data engineer marks the alert as a “good alert,” helping to reinforce Bigeye’s anomaly detection. The data engineer then closes the Issue and documents the source of the problem and the fix, ensuring that anyone who encounters the Issue in the future will be able to resolve it quickly.

Finishing with a high-level understanding of data quality

From Dashboard, the engineering manager, and anyone else on the data team, can see that the order.id issue has been resolved. The order health SLA also reflects the resolution.

The engineering manager can also see that data quality coverage has increased to 72% and that 12 Issues have been resolved in the last month. This tells the manager that the team has been resolving Issues quickly and now has broader coverage of its data.

If you’d like to learn more, check out the product pages for Issues and Dashboard. Or, if you’re ready to see how Bigeye can help your team address data quality and create more reliable data pipelines, we’d love to give you a demo.
