DRE Con 2022: That's a wrap
The Data Reliability Engineering Conference (DRE Con) hosted its worldwide virtual conference for the second consecutive year on May 25th and 26th. Here's what happened.
Over two days, a multitude of experts discussed all things data reliability engineering: the tools, processes, and culture of bringing engineering best practices to the world of data.
With hands-on workshops, presentations, live Q&As, and plenty of networking, the conference brought practitioners together to learn and share. The data reliability community has seen staggering growth over the past few years, and we were delighted to see all of the engagement, connection, and lightbulb moments happening at DRE Con.
If you missed out, you can watch the action on-demand below. Read on for a recap and make sure to tune in to any talks or workshops that intrigue you!
DRE Con 2022 was organized around the 7 principles of Data Reliability Engineering (DRE), with speaker keynotes and workshops bookending the day.
Kicking things off on Day 1, we had Shailvi Wakhlu, the senior director of data at Strava, the largest sports community in the world. She leads a large team of analysts and engineers. She broke down what makes data good or bad, why we should care, and how we solve for the highest quality data. We particularly enjoyed her concise and actionable data sanity checklist, which is useful for teams of all shapes and sizes:
- What is the problem?
- Where does it occur?
- When does it occur?
- Why is it happening?
- Who caused it and who can fix it?
- How will it be fixed?
Under the first principle, Embrace Risk, we heard from two experienced speakers. Christianna Clark is the managing supervisor and lead machine learning and data engineer at Methods+Mastery. She talked about embracing the “imperfect” and building risk into your data reliability planning. She shared some interesting thoughts and experiences from her storied career, like what she believes are the three qualities the best engineers embody. She told us, “It’s impossible to build something perfect and as a matter of fact – you shouldn’t.” We also heard from Miriah Peterson, the data reliability engineer at Tailscale. In her talk, “Where we are going, we don’t need standards”, she focused on contextualizing and pinpointing the problems in the data pipeline, before we leap into action and start building fixes. She noted, “Engineers are pushed to create, get to production, and get things working. We don’t always take the time to understand.”
Under the second principle, Set Standards, we heard from two speakers who focused on how to define, share, measure, and trust the levels of data quality we work with. Sudhir Tonse, engineering leader at DoorDash, talked to us about the challenges of and approaches to achieving high-quality data standards in large organizations. The first step, according to him, is understanding the data ecosystem and all of the personas within it, and he walked us through how to start doing that.
Our other speaker, Scott Shi, is the founder of ZettaBlock. He spent two years building Uber’s safety data report, which included managing and classifying billions of support tickets. In 2019, nearly 4 million Uber trips happened every day in the US — more than 45 rides every second. Only 0.0003% of trips had a report of a critical safety incident. So how did Scott distill and simplify this volume of data? He shared his wisdom and insights from that process.
The third principle of DRE, Reduce Toil, had two speakers. Peter Fishman, the co-founder of Mozart Data, talked all about reducing toil in the pipeline, where “solving problems brings the prize of more problems.” From baseball to Ralph Waldo Emerson, he wove in a lot of real-life examples and use cases for reducing toil and freeing up teams to understand their data.
We also heard from Randy Pitcher, a solutions architect from dbt Labs. With Randy, we walked through a live build of a tested and automated dev -> test -> prod promotion strategy using dbt. We focused on removing the toil and fragility of manual environment promotion, as well as the basics of automating quality checks and making change reversions easier.
Then, to round out the first day, our co-founder and CTO Egor Gryaznov walked us through the process of building out our own data observability solution on Snowflake. Participants helped each other troubleshoot and asked questions in real time, and we walked away with a practical application of what we’d learned throughout the day.
Day 2 started off under the fourth principle of DRE, Monitor Everything. Our speaker Harish Srigiriraju is a senior technical product manager at Verizon. We got to borrow his expertise as he walked us through the most common monitoring approaches and challenges faced by data and machine learning teams. He recommended actionable ways to focus the monitoring process: access controls, automated email alerts, comparisons of actuals vs. targets, and advanced analytics for KPIs.
Meanwhile, Chris Handy and Dan Lynn from Crux dissected the ways you can manage data quality when it’s outside your control. External data is the lifeblood of great decision-makers. As we look to ML and AI to guide those decisions, how do we make sure to avoid “garbage in, garbage out” data practices?
The fifth principle of DRE is Use Automation. Speaker Kevin Kho, a community engineer at Prefect, is an expert in creating code paths to handle data failures. He walked us through basic workflow orchestration functionality, highlighting features that help data teams handle failure with grace and agility.
Linda Liu, head of data at HyreCar, gave us a deep dive into data analytics, and how to drive business utilization and end user value. There are so many pieces of the data puzzle that can positively impact business: reduced silos, faster insights, increased utilization, an improved data culture, and more. We walked away with tons of ideas and insights on how to achieve some of those aims.
Moving along to our sixth principle of DRE, our next two talks fell under the umbrella of Control Releases. Speaker Pavani Rangavahjula is a senior data engineer at Ecobee. Safe data pipeline releases protect teams from so much: uncertain behaviors, unwanted disruptions, and instability. But how do you ensure safety? Pavani walked us through the process. What does the future look like for data pipeline safety? Among other things: automated integration testing, fixed templates for dataflow jobs, standard image testing, and beyond.
Additionally, our co-founder and CTO Egor Gryaznov and Loc Nguyen from Mayan walked us through ensuring that data across different tables and transformations stays reliable. To increase confidence through data CI, it takes powerful data reliability tools. We got a firsthand look at some of them in this talk.
The last principle of DRE is Maintain Simplicity. Along those lines, Segun Adelowo is a lead machine learning engineer at Interswitch, which is Nigeria’s largest payment processor. He brought us through the biggest data-centric challenges of a global payment processor, and how innovations in databases, ETL services, and beyond help solve them. What does it take to get good data in the payment processing space? Management buy-in, agreed-upon SLAs, data dictionaries and service catalogs, and more.
We also had the pleasure of hearing from Glen-Erik Cortes, an engineering manager at Royal Caribbean. According to his presentation, deployment of machine learning solutions remains an industry challenge, with nearly 80% of projects never making it into production. But ML Ops makes it easier for ML teams to deliver solutions and get them out the door. In this talk he covered accelerating ML solutions with sandboxed environments, version control, and modular design.
We wrapped up the day with an AMA with Kyle Kirwan of Bigeye, Miriah Peterson of Tailscale, and Jerry Shen of OpenSea. Our audience had the chance to pose any questions about data reliability, observability, and tying data back to the business. It was a lively chat that covered a lot of ground: the future of data engineering, monitoring versus observability, how to build an observability platform from scratch, and getting into data science as a career. The panelists noted that “all data engineers are self-taught” and that it’s an exciting space: more and more data is becoming public, and if you can analyze and make use of that data, plenty of opportunities lie ahead, both for your career and your organization.
If you missed out on the action, you can check out the sessions on-demand here. We’re so grateful to all the speakers and attendees that made this event happen. We’ll see you at #DRECon2023!