Thought leadership
June 25, 2024

Conference Recap: Key Trends from Snowflake and Databricks

A recap of key trends from the 2024 Snowflake and Databricks conference season.

Kyle Kirwan

Attended by thousands in San Francisco, the 2024 Snowflake and Databricks conferences were a melting pot of innovation, with both companies making significant announcements that will shape the future of data management, AI, and analytics.

Here, we'll recap the key highlights from both conferences.

Snowflake's Big Announcements: Polaris, Cortex, and More

Polaris Catalog Integration with Iceberg

One of the major announcements from Snowflake was the introduction of the Polaris catalog, which integrates with Apache Iceberg. This new feature allows users to access Iceberg data directly through Snowflake and other engines. Snowflake plans to open-source Polaris within 90 days, promoting a more open and collaborative approach to catalog metadata. This interoperability is a game-changer: data can be stored in Iceberg format and accessed by various compute engines like Spark, Snowflake, and Trino.
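To make that interoperability concrete, here's a minimal PySpark sketch of reading an Iceberg table through a REST-based catalog like Polaris. The catalog URI, namespace, and table name are hypothetical placeholders, and the exact Polaris endpoint format wasn't part of the announcement, so treat the configuration as an assumption rather than a recipe.

```python
from pyspark.sql import SparkSession

# A minimal sketch of cross-engine Iceberg access through a REST catalog
# such as Polaris. The catalog URI, namespace, and table name are
# hypothetical placeholders; substitute your own deployment's values.
spark = (
    SparkSession.builder
    .appName("polaris-iceberg-sketch")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2",
    )
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://<catalog-host>/api/catalog")
    .getOrCreate()
)

# Because the table's metadata lives in the shared catalog, Trino or
# Snowflake could read the same Iceberg table without copying the data.
spark.sql("SELECT * FROM polaris.analytics.orders LIMIT 10").show()
```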

Cortex AI Suite

AI was a central theme at the Snowflake conference, with the introduction of Cortex, a suite of services designed to simplify AI and ML operations. Cortex includes features like chatbot creation offered as managed Snowflake services, providing a seamless interface for AI tasks. The live demo showcased how easily AI applications can be built with minimal SQL, highlighting Snowflake's commitment to making AI accessible within its ecosystem.
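For a flavor of what "minimal SQL" looks like, here's a short Python sketch that calls SNOWFLAKE.CORTEX.COMPLETE, the documented entry point for Cortex's hosted LLMs. The connection parameters and prompt are placeholders.

```python
import snowflake.connector

# A sketch of calling a Cortex LLM function from SQL via the Python
# connector. Connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
)
cur = conn.cursor()

# A single SQL expression is enough to run a hosted model on a prompt.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'mistral-large', 'Summarize the key themes of Snowflake Summit 2024.')"
)
print(cur.fetchone()[0])
conn.close()
```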

Governance and Observability with Horizon

Snowflake also emphasized data governance and observability through its Horizon suite. This set of tools includes features for labeling, lineage, privacy, and security, all integrated with the Polaris catalog. The ability to automate tagging and enforce governance policies directly within Snowflake ensures comprehensive data management. Additionally, the interoperability with external tools like Jira, Slack, and email for alerting enhances the observability capabilities of Snowflake.
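For a sense of how tag-based governance works in practice, the sketch below uses Snowflake's standard tag DDL, driven from Python. The database, schema, table, and column names are all hypothetical.

```python
import snowflake.connector

# A sketch of Horizon-style governance tagging using Snowflake's tag DDL.
# All object names here are hypothetical.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>"
)
cur = conn.cursor()

# Define a tag once; masking and access policies can then key off it.
cur.execute(
    "CREATE TAG IF NOT EXISTS governance.tags.pii "
    "ALLOWED_VALUES 'email', 'phone'"
)

# Attach the tag to a sensitive column so it is tracked through lineage.
cur.execute(
    "ALTER TABLE analytics.crm.customers "
    "MODIFY COLUMN email_address SET TAG governance.tags.pii = 'email'"
)
conn.close()
```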

NVIDIA Integration with the NeMo Retriever Framework

In a significant move, Snowflake announced the integration of NVIDIA's NeMo Retriever framework, enhancing its AI capabilities. This integration aids in the efficient tokenization and embedding of unstructured data, making it easier to build retrieval-based AI applications. The collaboration with NVIDIA means Snowflake users can leverage powerful AI models and tools directly within the Snowflake environment.

Databricks' Innovations: Unity Catalog, Mosaic AI, and More

Unity Catalog Goes Open Source

Databricks made a bold statement by open-sourcing Unity Catalog during their keynote. This move aims to eliminate fragmented governance and promote interoperability. Unity Catalog serves as a multimodal governance layer, supporting data, ML models, and AI within a single catalog. This comprehensive approach ensures seamless integration and management of diverse data assets.
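As a hedged illustration of what an open catalog interface enables, the sketch below lists catalogs from a locally running open-source Unity Catalog server. The localhost port and API path follow the OSS project's examples; verify both against the version you deploy.

```python
import requests

# A sketch against the open-source Unity Catalog server's REST API.
# The port and API path follow the OSS project's examples and should
# be checked against your deployed version.
BASE = "http://localhost:8080/api/2.1/unity-catalog"

resp = requests.get(f"{BASE}/catalogs")
resp.raise_for_status()

# Any engine or tool that speaks this API sees the same catalogs,
# which is the interoperability point of open-sourcing it.
for catalog in resp.json().get("catalogs", []):
    print(catalog["name"])
```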

Mosaic AI: Democratizing AI

Mosaic AI was another highlight, showcasing Databricks' commitment to making AI accessible across organizations. Mosaic includes an agent framework for building AI applications, a model training toolkit, and governance features. The live demo featuring a Shutterstock image model built on Mosaic illustrated the platform's potential to leverage proprietary data effectively.

Lakeflow: Simplifying Data Pipelines

Databricks introduced Lakeflow, a GUI interface for creating data pipelines natively within Databricks. This tool simplifies the process of building and managing data pipelines, from extraction and loading (via point-and-click CDC connectors) to transformation and orchestration. The integration of AI-powered suggestions further enhances the efficiency and accuracy of pipeline creation.

AI/BI: Low-Code Analytics with Genie

Databricks also announced AI/BI, a low-code data visualization tool with AI agents for building visualizations. The feature includes Genie, an AI-driven natural language querying tool that learns from and adapts to user queries. The focus on making data visualization and analysis more intuitive and accessible aligns with the broader trend of democratizing data and AI tools.

Key Takeaways and Trends

AI and ML Integration

Both Snowflake and Databricks are heavily investing in AI and ML integration, emphasizing the importance of making these technologies accessible and efficient within their platforms. The focus on simplifying AI operations and integrating AI tools directly into data platforms is a clear indication of where the industry is headed. The era of enterprise AI is here, and organizations should leverage these tools to stay competitive.

Data Governance and Interoperability

Data governance remains a critical theme, with both companies enhancing their governance features. The move towards open-source catalogs and interoperability ensures that organizations can manage their data more effectively while avoiding vendor lock-in. This trend towards comprehensive, integrated governance solutions is likely to continue.

Vertical Integration and Ecosystem Expansion

Both Snowflake and Databricks are expanding their ecosystems, integrating more features and tools to provide a comprehensive data management solution. This vertical integration strategy aims to make their platforms a one-stop shop for all data needs, from storage and compute to AI and governance.

The Rise of Small Language Models

An emerging trend highlighted at the Databricks conference was the focus on small language models for specific tasks. This approach contrasts with the pursuit of AGI and emphasizes the practical application of AI for targeted use cases. This trend is likely to gain traction as organizations seek efficient and effective AI solutions. Small language models can provide significant value with lower computational costs and faster deployment times.

Conclusion

The announcements at the 2024 Snowflake and Databricks conferences have set the stage for the next wave of data technology. With significant advancements in AI integration, data governance, and ecosystem expansion, both companies are pushing the boundaries of what is possible.

