September 15, 2022

Data observability: Build or buy?

Companies looking to invest in data infrastructure tooling often face an age-old conundrum: do we build it or do we buy it? Let's explore.

Kyle Kirwan

Companies looking to invest in data infrastructure tooling often face an age-old conundrum: do we build it or do we buy it? In this blog post, we’ll walk you through your options. When it comes to data observability, there’s no one-size-fits-all right answer, but there are indicators that one particular choice will best suit your team.

The data observability landscape

Over the past few years, data infrastructure startups have proliferated – data warehouses, data transformation, and data observability tools. That means buyers like yourself have a wealth of options to choose from in the marketplace.

On the other hand, it’s rare for a purchased solution to fit your organization’s needs right out of the box. So build your own, using all of those talented engineers and data experts at your disposal! You’ll have control and flexibility - but you’ll have to spend precious engineering resources and project planning time.

It might come as no surprise that in general, we believe organizations should buy rather than build. That’s not just because we like acquiring new customers (although of course we do). It’s also because data tools solve problems that fall outside most organizations’ core businesses.

If you’re a fintech startup, are you in the business of catching a data outage? Probably not; it’s a commoditized service. As a fintech company, you gain no extra plaudits for building that solution on your own. Look to the data experts to do it for you, just as others look to you to process payments or exchange money across international borders. Play to your strengths.    

Setting up the decision framework

Mark Grover runs the data catalog startup Stemma, and he previously served as a product manager at Lyft. He suggests that, before any quantitative cost/benefit analysis, you set expectations for what a successful project looks like. After that, look at the reality of what it’ll take, whether that’s in-house engineering resources or integrations, to actually make the project a success.

“Knowing those two things early and having that clear dialogue is super important. As we all know, disappointment occurs when expectations don’t match reality,” Grover said.

In terms of measuring success, there are two metrics typically used:

  • Adoption – a certain number of people are using the product
  • NPS (Net Promoter Score) – a certain number of people love the product
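Both metrics are easy to compute once you have usage counts and survey responses. Here is a minimal sketch, assuming a standard 0-10 "would you recommend this tool?" survey where 9-10 counts as a promoter and 0-6 as a detractor. The function names and the sample numbers are hypothetical, for illustration only.

```python
def adoption_rate(active_users: int, target_users: int) -> float:
    """Share of the intended audience that actually uses the tool."""
    if target_users <= 0:
        raise ValueError("target_users must be positive")
    return active_users / target_users


def net_promoter_score(ratings: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6) on a 0-10 survey."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical numbers: 30 of 40 intended users are active,
# and ten of them answered the survey.
ratings = [10, 9, 9, 8, 7, 10, 6, 3, 9, 8]
print(adoption_rate(active_users=30, target_users=40))  # 0.75
print(net_promoter_score(ratings))  # 30.0
```

Agreeing on the target values up front (say, 75% adoption and an NPS above 30) is what turns these numbers into the expectations Grover describes.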

With expectations defined and circumstances considered, you can now move on to project planning. What is at your disposal? Is it time (“Need this done by Q2”) or engineers (“We have a team of four engineers”)? While you calculate, be aware that even if you’re buying and not building, implementation may require that you “pony up some resources and organizational muscle,” Grover says.

The major considerations

1. Is the observability technology core to your business?

Is the technology core to your business or critical to your business success? If not, lean toward a “buy” approach. If, on the other hand, you’re a large business with special data needs, or if what you’re building is relevant to your core product, lean into a “build” approach.

Take Uber, which built its own mapping system. The bespoke “build” approach made sense because maps are a chokepoint for its core business. If Uber were cut off from Google’s mapping API, the business could suffer and never recover. It made sense to build an in-house mapping system that Uber could rely on to stay consistent.

However, a business that uses maps as a “nice-to-have” need not build its own complex mapping system from scratch.

2. Does the tool’s framework fit the organization, or will the organization need to adapt?

When you buy a data tool off the shelf, it often comes with a certain framework. For example, the tool might assume that all your data is going into a cloud data warehouse. It was designed to function optimally when that is the case.

If your data isn’t going into a cloud data warehouse, tools that run on top of data warehouses won’t be a fit for you.

3. What is the cost and time-to-value?

Sure, it’s possible to manually check dashboards and build custom queries, but all of these activities take time. We estimate that building a custom solution in-house can cost hundreds of thousands of dollars (see below) and take months to build.

Not to mention the strain on your precious data and engineering teams. Building in-house observability means redirecting their time from customer problems to internal goings-on. And unless you suffer from the very rare issue of bored and underworked technical teams - you’d probably prefer to keep their focus on customer-facing issues!

4. Does the vendor have an experienced, communicative, and flexible team?

With a great vendor, you may be able to reap some of the benefits of customization, without having to build them in-house. If your data observability vendor has a track record of supporting scaling, complex companies, they may be the perfect partner for you. Ask about their support, training, customer success, and integrations. Do they have a roster of enterprise customers?

If there are features that are important for your organization, ask about them during the purchase decision-making process: “It’s fair to ask, I need feature ABC, I need compliance XYZ, when can we put that in?” Grover said.

What does their product roadmap look like in the near future? You might also benefit from feature requests that the vendor gets from their entire customer base, even when it's not something you would have thought about on your own.

Estimating custom solution costs

While it’s not an exact science, you can estimate whether to build or buy by considering your holistic investment. That investment goes beyond the dollars you spend. In general, you want to understand the time-to-value of a custom solution. What's the length of time it will take to start seeing an ROI on your investment?

For a straight build-versus-buy comparison, also try to quantify some or all of the following considerations:

| Consideration | Build bespoke data observability in-house | Buy and implement data observability software |
| --- | --- | --- |
| Resources | Internal resources: team (project and product managers, engineers, analysts) × time (weeks to rollout) × cost (salaries) | IT resources for implementation, any customization, training, integrations, and maintenance |
| Maintenance | Upgrades, bug fixes, new feature development, scaling | Software maintenance costs |
| Hardware | Servers, database licenses, storage | Customizations, integrations |
| Training and onboarding | Change management, knowledge transfer, onboarding stewardship | Onboarding, support |
| Other fees | Development, hosting | Support, additional users, volume |
| Opportunity costs | Taking time away from customer-focused development | Missing out on super personalized, customized features |

In addition, look at the advantages of building versus buying, and how much they’re worth to you. When it comes to building, can you put a dollar amount on any of the following factors?

  • Personalization - Bespoke builds allow your company to build the perfect tool for your business needs, from the ground up. They’ll fit around your organization’s unique process and value proposition.
  • Quick scaling - With an in-house build, you can get around otherwise rigid design patterns in third-party software. Your software will be customized around your unique workflows and, as a result, may be able to scale faster.
  • Owning IP - Is a data observability solution going to add value to your core business, if you own the IP and the product roadmap? It very well might, if you work in an adjacent space.

Can you put a dollar amount on any of the following advantages of buying the third-party data observability platform?

  • A finished product - The product has already been built, tested, deployed into the market, and presumably already has customers that have provided feedback. The team has been in place to respond to feature requests. You don’t need to plan for any of that; it’s been done for you.
  • Faster time to value - Your business has all sorts of competing priorities, but a third-party platform will keep your data observability as a priority. After all, you’re a valuable customer. You may want to trade in bespoke features and flexibility for faster time-to-production and time-to-implementation.
  • Support resources - When you purchase a third-party platform, you purchase all of the customer support and resources that come with it. From training to troubleshooting to onboarding and knowledge bases, it’s all there for you. If you’re experiencing a challenge, odds are your vendor has seen it before.

So the holistic picture is important, but really, you're probably mostly concerned with your bottom line. How much time, and how much money exactly, will this cost you? Use the framework below to estimate, under these assumptions:

  • We’ll assume you're a mid-sized business with technical teams of 5+ engineers, business analysts, and project/product managers.
  • This tool will have a build phase, an implementation phase, and a maintenance phase.
  • In the build phase, the data and/or software engineer resources will be scoping and building out the main product. You will also likely have data analysts and business analysts conducting research, speaking with adjacent teams, and gathering requirements. Here, we’ll assume a 9 to 12 month commitment.
  • In the implementation phase, you can assume that data and/or product managers will be overseeing implementation across the data infrastructure, plus coordinating any integrations and downstream business impacts. Here, we’ll assume a 3 to 6 month commitment, from the beginning to the middle of the project.
  • In the maintenance phase, data and/or product managers will oversee timely upgrades, consistent functionality, and productive business value. This phase will likely be ongoing, but we’ll assume an upfront push from the middle of the project toward the end, through implementation and beyond.

Put another way, your costs might look something like what follows:

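| Resource | Monthly cost | Number of resources | Time (months) | Total cost |
| --- | --- | --- | --- | --- |
| Software / data engineer | $15,000 | 3 | 12 | $540,000 |
| Data analyst | $12,000 | 2 | 6 | $144,000 |
| Business analyst | $10,000 | 1 | 3 | $30,000 |
| Data / product manager | $20,000 | 2 | 6 | $240,000 |
| Total cost | | | | $954,000 |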
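If you want to plug in your own salaries and team sizes, the arithmetic behind the table is simply monthly cost × number of resources × months, summed across roles. Here is a small sketch that reproduces the figures above. The `Role` class and `build_cost` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    monthly_cost: int  # dollars per person per month
    headcount: int     # number of people in this role
    months: int        # how long they are committed to the project

    @property
    def total(self) -> int:
        return self.monthly_cost * self.headcount * self.months


# Figures taken from the table above; swap in your own.
ROLES = [
    Role("Software / data engineer", 15_000, 3, 12),
    Role("Data analyst", 12_000, 2, 6),
    Role("Business analyst", 10_000, 1, 3),
    Role("Data / product manager", 20_000, 2, 6),
]


def build_cost(roles: list[Role]) -> int:
    """Total staffing cost of the in-house build across all roles."""
    return sum(role.total for role in roles)


for role in ROLES:
    print(f"{role.name:<26} ${role.total:>9,}")
print(f"{'Total cost':<26} ${build_cost(ROLES):>9,}")
```

Note that this only covers people. Hardware, hosting, and opportunity costs from the earlier comparison would come on top.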

In short, if you do not have close to $1 million and around one year’s time to build out a bespoke solution, you may want to consider a “buy” option. Teams that choose to buy have typically calculated that data engineering time is too valuable to spend building a non-customer-facing tool that takes more than a year to start delivering ROI.
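To see when (or whether) a build pays for itself relative to buying, you can compare cumulative costs month by month. The sketch below is a deliberate simplification: it treats the $954,000 build estimate as an upfront cost, and every other number (maintenance, implementation, subscription) is a hypothetical placeholder you would replace with your own quotes.

```python
from typing import Optional


def cumulative_cost(upfront: float, monthly: float, months: int) -> float:
    """Total spend after `months`, given a one-time cost and a recurring cost."""
    return upfront + monthly * months


def breakeven_month(
    build_upfront: float,
    build_monthly: float,
    buy_upfront: float,
    buy_monthly: float,
    horizon_months: int = 120,
) -> Optional[int]:
    """First month in which building has cost no more than buying, if any."""
    for month in range(1, horizon_months + 1):
        build = cumulative_cost(build_upfront, build_monthly, month)
        buy = cumulative_cost(buy_upfront, buy_monthly, month)
        if build <= buy:
            return month
    return None  # building never catches up within the horizon


BUILD_UPFRONT = 954_000  # build estimate from this post
BUILD_MONTHLY = 20_000   # hypothetical: ongoing maintenance staffing
BUY_UPFRONT = 50_000     # hypothetical: implementation and onboarding
BUY_MONTHLY = 40_000     # hypothetical: subscription and support

print(breakeven_month(BUILD_UPFRONT, BUILD_MONTHLY, BUY_UPFRONT, BUY_MONTHLY))  # 46
```

With these placeholder numbers, building only becomes cheaper after 46 months. If the recurring cost of buying is lower than the cost of maintaining the build, the function returns `None`, meaning the build never breaks even.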

Considering open source solutions

Open source solutions are often billed as a middle ground between buy and build. There are some obvious advantages to relying on them:

  • Open source enables technology agility - offering crowd-sourced, varying ways to solve problems. You have a lot of agency to create capabilities yourself based on the collective wisdom of the knowledge base.
  • Open source is cost-effective - Open source solutions are typically cheaper than proprietary solutions. They might make sense for super budget-conscious organizations who don’t have any wiggle room on tooling spend.
  • The open source community prioritizes security - Commercial open source solutions have a good record when it comes to information security. The community prioritizes information security and in general, vendors have high responsiveness rates for identifying and fixing problems.

Some other, not-so-positive considerations to keep in mind include:

  • Open source solutions are not free - While the code itself is free, open source solutions need to be deployed and managed – this means AWS server costs and engineering time. Keeping your open source solution up to date, secure, and working well can quickly become a very expensive endeavor.
  • Open source solutions don’t offer complete control or flexibility - While you can certainly choose to fork the code at any point and build off the open-source solution, in reality, you’re often dependent on the road map of contributors, and can be left out in the cold if key contributors decide to leave the project. Maintainers of even popular open source projects are often unpaid, which means you can’t always expect prompt customer support or bug fixes.

Building and buying: Two case studies

Buy: Perpay

Perpay, a smaller fintech company of only fifty employees, needed a reverse ETL tool to move its data from its online services into its marketing tool.

The team initially decided to build, but integrating with the various data source APIs and maintaining the infrastructure to handle back-offs, respect rate limits, and so on ended up being too cumbersome.

Ultimately, the team bought a reverse ETL tool, Census, which had a built-in integration with the marketing tool Perpay used.
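To give a sense of the plumbing involved, here is a minimal sketch of just one piece an in-house integration has to own: retrying a rate-limited API call with exponential back-off. This is not Perpay's code. The `RateLimited` exception and `call_with_backoff` helper are hypothetical names, and a real integration would also need to handle auth, pagination, partial failures, and per-API quirks.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when an API asks the caller to slow down."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimited until attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                # Respect the server's hint when it provides one.
                delay = err.retry_after
            else:
                # Otherwise back off exponentially, capped, with random jitter.
                cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = random.uniform(0, cap)
            sleep(delay)
    raise AssertionError("unreachable")


# Usage with a fake request that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky_sync() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited(retry_after=0.01)
    return "synced"


print(call_with_backoff(flaky_sync))  # synced
```

Multiply this by every source and destination API, each with its own limits and error formats, and the maintenance burden behind the decision to buy becomes easier to see.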

Build: Lyft

Lyft was looking for a data catalog and data governance product to more effectively organize the massive amounts of data that the organization was generating. There were three options under evaluation:

  • buying off-the-shelf data catalog solutions from Alation or Collibra
  • adopting Atlas, an open source solution
  • building their own data catalog tool

Ultimately, Lyft decided to build their own tool, because the existing off-the-shelf solutions required a certain philosophy about organizing data that did not fit with Lyft’s at the time.

Changing course

Regular re-evaluation can help keep companies nimble and allow them to let go of data tooling that has become outdated. A common scenario, for example, is that in an initial evaluation, the company decides that off-the-shelf solutions are not sufficiently mature or are missing certain features. But over the course of the next few years, the third-party solution improves, to the point where it eventually makes sense to switch over.

Cost/benefit analyses should not only happen at the start of the project but should continue throughout implementation and maintenance. Once a year, companies should update their list of requirements, and determine whether the tooling they have, either bought or built, is fulfilling them. While pivoting away from legacy built solutions can be a difficult, even emotional process, this sort of objective performance evaluation makes it part of the natural growth process of a company.

| Role | Goals | Common needs |
| --- | --- | --- |
| Data engineers | Overall data flow. Data is fresh and operating at full volume. Jobs are always running, so data outages don't impact downstream systems. | Freshness + volume monitoring; schema change detection; lineage monitoring |
| Data scientists | Specific datasets in great detail. Looking for outliers, duplication, and other, sometimes subtle, issues that could affect their analysis or machine learning models. | Freshness monitoring; completeness monitoring; duplicate detection; outlier detection; distribution shift detection; dimensional slicing and dicing |
| Analytics engineers | Rapidly testing the changes they’re making within the data model. Move fast and not break things, without spending hours writing tons of pipeline tests. | Lineage monitoring; ETL blue/green testing |
| Business intelligence analysts | The business impact of data. Understand where they should spend their time digging in, and when they have a red herring caused by a data pipeline problem. | Integration with analytics tools; anomaly detection; custom business metrics; dimensional slicing and dicing |
| Other stakeholders | Data reliability. Customers and stakeholders don’t want data issues to bog them down, delay deadlines, or provide inaccurate information. | Integration with analytics tools; reporting and insights |