The complete data quality workflow: Introducing Issues and Dashboard
With the release of Issues and Dashboard, Bigeye provides a complete data quality workflow – starting with a smarter way to prioritize and resolve data quality issues and ending with a high-level view of data quality. When you resolve and document data quality problems in Issues, that resolution metadata powers Dashboard, giving you a top-level view of data quality: detection and resolution time trends, problem areas, and the overall health of your data.
Bigeye already provides great ways to communicate and align on data quality with stakeholders – through data quality SLAs and integrations with data catalogs like Alation. Now, Issues and Dashboard ensure that there is seamless communication and understanding between the people resolving data quality issues and those managing the data team.
To get a better sense of how Issues and Dashboard create a complete data quality workflow, let’s take a look at a typical day-in-the-life of a data team leveraging the Bigeye data observability platform.
Starting with easier prioritization
A data engineer on the data team receives alerts from Bigeye and logs in to see that Bigeye has detected three new data quality Issues: the count is too high on order.id; the max is too high on event_logs.load_time; and the null (%) is too high on payments.charge_amount. Each issue could be important, but which one should they start on?
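To make those three alerts concrete, here is a minimal sketch of the kinds of checks that could sit behind them. The SQL, the fixed thresholds, the DB-API-style connection, and the run_checks() helper are illustrative assumptions for this example, not Bigeye's implementation – in practice, Bigeye's anomaly detection flags unexpected values rather than relying on hand-set bounds like these.

```python
# Illustrative only: rough SQL equivalents of the three alerts above.
# Thresholds and the run_checks() helper are hypothetical, not Bigeye's API.

CHECKS = {
    # Alert 1: row count on order.id is unexpectedly high
    "order.id count": (
        'SELECT COUNT(id) FROM "order"',  # "order" quoted: reserved word in SQL
        lambda v: v <= 1_000_000,         # hypothetical upper bound
    ),
    # Alert 2: max load_time on event_logs is unexpectedly high
    "event_logs.load_time max": (
        "SELECT MAX(load_time) FROM event_logs",
        lambda v: v <= 30.0,              # hypothetical ceiling, in seconds
    ),
    # Alert 3: null (%) on payments.charge_amount is unexpectedly high
    "payments.charge_amount null (%)": (
        "SELECT 100.0 * SUM(CASE WHEN charge_amount IS NULL THEN 1 ELSE 0 END)"
        " / COUNT(*) FROM payments",
        lambda v: v <= 1.0,               # hypothetical max null percentage
    ),
}

def run_checks(conn):
    """Run each query and report whether its value would trip an alert."""
    for name, (sql, within_threshold) in CHECKS.items():
        value = conn.execute(sql).fetchone()[0]
        print(f"{name}: {value} -> {'ok' if within_threshold(value) else 'ALERT'}")
```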
Rather than guessing which is most significant, the data engineer asks their manager to help prioritize. The engineering manager checks their Dashboard in Bigeye, sees that the issue affecting order.id is tied to a business-critical SLA, and directs the data engineer to start there.
The engineering manager also sees that their data quality coverage is quite low at only 24% and asks the data engineer to have Bigeye monitor more metrics. With Bigeye, data quality coverage can be easily increased by either enabling Autometrics on specific tables with a few clicks or telling Bigeye to automatically deploy metrics across multiple tables at once.
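As a rough illustration of what a coverage figure like 24% represents, the sketch below treats coverage as the share of metrics currently monitored out of the metrics that could be deployed. The table names and counts are made up for the example and are not how Bigeye necessarily computes the number.

```python
# Hypothetical counts of monitored vs. deployable metrics per table;
# coverage here is simply the ratio of the two, expressed as a percentage.
monitored_metrics = {"order": 4, "event_logs": 2, "payments": 2}
candidate_metrics = {"order": 12, "event_logs": 10, "payments": 11}

coverage = 100 * sum(monitored_metrics.values()) / sum(candidate_metrics.values())
print(f"data quality coverage: {coverage:.0f}%")  # ~24% with these made-up numbers
```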
Leading to smarter resolution
The data engineer now digs into the count issue with order.id. With Issues, they can see that it’s related to a transformation error that occurred a few days prior.
They correct the transformation, and once the fix is tested, they mark the alert as a “good alert,” helping to reinforce Bigeye’s anomaly detection. The data engineer then closes the Issue and documents both the source of the problem and the fix, ensuring that anyone who runs into the same problem in the future can resolve it quickly.
Finishing with a high-level understanding of data quality
From Dashboard, the engineering manager and anyone else on the data team can see that the order.id issue has been resolved. The order health SLA also reflects the resolution.
The engineering manager can also see that data quality coverage has increased to 72% and that 12 Issues have been resolved over the last month, showing at a glance that the team has been resolving Issues quickly and that monitoring now covers much more of their data.
If you’d like to learn more, check out the product pages for Issues and Dashboard. Or, if you’re ready to see how Bigeye can help your team address data quality and create more reliable data pipelines, we’d love to give you a demo.