Snowflake Summit 2022 Recap – Top Features and Products

With our first Snowflake Summit behind us, our team came away genuinely amazed and energized by everything the conference offered. We had the opportunity to meet some fantastic attendees and watch excellent speaker sessions, and we were blown away by the many products built on top of Snowflake. More than anything, Snowflake continues to impress by announcing new features, big and small.

In case you missed the Summit and could not attend any of the live streams, I have compiled a list of some of the highlights from our week-long adventure into the Data Cloud.

Feature Announcements

One thing I love about Snowflake is its commitment to continue providing the best experience possible while helping companies of all sizes solve everyday problems with increasing ease. Here are some prominent features that Snowflake showed off during their Basecamp expo.

Snowpark for Python

I have always been envious of Java and Scala users for their early access to Snowpark; theirs were the first languages that Snowflake natively integrated into its engine. As a heavy Python user with years of experience in the language, I was excited when Snowflake announced that Python would be added as a natively supported integration.

Python support had been in Private Beta, which closed before I could get in on the action, and the development team was pretty tight-lipped about what the final release would entail. Beyond the Python API and UDF support, Snowflake announced support for vectorized UDFs, table functions, and stored procedures. Coupled with the announcement a few months back of Anaconda integration, Snowflake has quickly become a significant player in the data science space.

Introducing Snowpark for Python means you can do Machine Learning (ML) modeling and training, eliminate the overhead of managing versions and library dependencies, and use native Snowsight worksheets. In short, all of the tools you need to build and deploy rapidly.
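To make the vectorized UDF idea concrete, here is a minimal sketch in the Snowpark for Python style. The conversion function itself is ordinary pandas and runs anywhere; the registration call shown in the comment assumes an active Snowpark session, and the names in it (`session`, `c_to_f`, `readings`) are illustrative rather than taken from the announcement.

```python
import pandas as pd

# Vectorized UDFs receive batches of rows as pandas Series instead of
# one scalar at a time, so the transformation is plain pandas code.
def celsius_to_fahrenheit(temps: pd.Series) -> pd.Series:
    return temps * 9 / 5 + 32

# With an active Snowpark session (credentials omitted), registration
# would look roughly like this; all names here are hypothetical:
#
#   from snowflake.snowpark.types import PandasSeriesType, FloatType
#   session.udf.register(
#       celsius_to_fahrenheit,
#       name="c_to_f",
#       input_types=[PandasSeriesType(FloatType())],
#       return_type=PandasSeriesType(FloatType()),
#   )
#
# After registration, SELECT c_to_f(temp_c) FROM readings would run
# this pandas code inside Snowflake, one batch per call.

print(celsius_to_fahrenheit(pd.Series([0.0, 100.0])).tolist())
```

Because the function body is just pandas, you can unit test it locally before ever touching a warehouse.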

Streamlit Integration

In March of this year, Snowflake announced its intent to acquire Streamlit for $800M. The tool gives Python developers a powerful, fast way to transform Python scripts into beautiful web applications; an app can go live in as little as 2-3 days and with just a few lines of Python code. Merging these two technologies to behave natively opens the doors for the data community to build awe-inspiring applications for both public and internal consumption.

Want to see the power of Streamlit and Snowpark for Python? Below is a sample application that shows COVID-19 cases by county for California:

import streamlit as st
import pandas as pd
import snowflake.connector

# Create a new Snowflake Connection
snow_conn = snowflake.connector.connect(**st.secrets["snowflake"])

# Generate California county list from COVID dataset
counties = pd.read_sql("SELECT DISTINCT area FROM open_data.vw_cases ORDER BY area ASC;", snow_conn)

# Allow the user to select a county
# Allow the user to select a county
county_option = st.selectbox('Select a county:', counties['AREA'])

# Determine the case counts for the last 30 days in the selected county
covid_cases = pd.read_sql("SELECT date day, SUM(cases) cases FROM open_data.vw_cases WHERE date > dateadd('days', -30, CURRENT_DATE()) AND area = %(option)s GROUP BY day ORDER BY day ASC;", snow_conn, params={"option": county_option})

covid_cases = covid_cases.set_index(['DAY'])

# Generate a line chart for the cases
f"Daily Cases in {county_option} for the last 30 days"
st.line_chart(covid_cases)

Amazing! You can quickly create a beautiful line chart with such a small amount of code. Our team is excited to incorporate this functionality into our projects; it helps power our “real data, real fast” approach to getting our clients faster insights.

Native Application Framework

Snowflake has dubbed this feature the easiest way to build, distribute, and use applications in the Data Cloud. Developers can now build applications natively on Snowflake and distribute them through the Snowflake Marketplace, allowing them to be monetized and deployed directly to a customer’s Snowflake account.

I think this is the next big step for the Snowflake ecosystem. Not to compare apples to oranges, but anyone who has been a long-time user of Salesforce may recognize this as a similar move that propelled Salesforce to the forefront of the cloud-platform ecosystem. Will the Snowflake Marketplace become the next “AppExchange” in the cloud economy? I think we are well on our way.

Account Replication

A long-time feature of Snowflake, which our developers use frequently, is the ability to replicate data across regions. Snowflake has upped the ante by extending replication support to warehouses, users, roles, and more. Pipeline replication was also announced, allowing you to replicate a pipeline to another region without duplicating the data. While the use cases continue to stack up, the biggest are multi-environment support (which has the potential to supercharge CI/CD) and quick, easy failover to a disaster recovery site.
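As a rough sketch of what the expanded replication can look like in practice, the statements below set up a failover group that bundles account objects for replication to a secondary account. The account, database, and group names are all invented for illustration; with a live snowflake.connector connection you would send each statement through cursor.execute on the appropriate account.

```python
# Illustrative DDL for account replication via a failover group.
# All names (myorg, prod_fg, analytics_db, dr_account) are hypothetical.
replication_setup = [
    # On the source account: bundle account objects for replication.
    """CREATE FAILOVER GROUP prod_fg
           OBJECT_TYPES = USERS, ROLES, WAREHOUSES, DATABASES
           ALLOWED_DATABASES = analytics_db
           ALLOWED_ACCOUNTS = myorg.dr_account""",
    # On the target (DR) account: create a replica of the group,
    "CREATE FAILOVER GROUP prod_fg AS REPLICA OF myorg.prod_account.prod_fg",
    # refresh it on whatever cadence you need,
    "ALTER FAILOVER GROUP prod_fg REFRESH",
    # and, in a disaster, promote the replica to primary.
    "ALTER FAILOVER GROUP prod_fg PRIMARY",
]

# In a real pipeline each statement runs on the right account; here we
# just print the first line of each for a quick overview.
for stmt in replication_setup:
    print(stmt.splitlines()[0])
```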

Apache Iceberg Support

Snowflake has supported external tables since 2019 but has recently expanded this support to include Apache Iceberg. Iceberg has seen a massive uptick in customer adoption by addressing the various issues with object stores, and the community behind the product has continued to grow at exponential rates. Iceberg is open-source and was developed by Netflix before being donated to the Apache Software Foundation.

Apache Iceberg opens developers to a world of native data migrations and query-in-place workloads, and it expands the horizons for data compliance. Data can now stay in a compliant Iceberg instance and be queried directly within Snowflake to enrich it and create a more vibrant analytical picture of your data.

Unistore – Hybrid Tables At Work

Unistore is a workload that delivers a modern approach to working with transactional and analytical data together in a single platform. Hybrid tables were created to power Unistore and to give application developers common transactional capabilities. With interoperability support, you can join standard and hybrid tables in your queries to level up your analytics.
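To illustrate the interoperability point, here is a hedged sketch: a hybrid table that takes transactional writes, joined in the same statement against a standard analytical table. Every table and column name is made up, and the statements are shown as strings you would run through your usual connection.

```python
# Hypothetical Unistore example; all object names are illustrative only.
unistore_example = [
    # Hybrid tables require a primary key and are tuned for fast
    # single-row reads and writes.
    """CREATE HYBRID TABLE orders (
           order_id INT PRIMARY KEY,
           customer_id INT,
           total NUMBER(10, 2))""",
    # Interoperability: join the hybrid table against a standard table
    # in a single analytical query.
    """SELECT c.region, SUM(o.total) AS revenue
       FROM orders o
       JOIN analytics.public.customers c
         ON c.customer_id = o.customer_id
       GROUP BY c.region""",
]

for stmt in unistore_example:
    print(stmt.splitlines()[0])
```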

Say goodbye to data silos and say hello to a new world of insights.


Governance

Two new governance features were announced during the Summit. The first was Cost Governance, which allows you to assign a resource group to Snowflake objects to track costs for your customers or for other departments within your organization. Many data teams support numerous end users, both external customers and internal departments, and have historically struggled to track cost consumption properly. As a result, those teams often take a huge budget hit because they cannot accurately allocate costs in the form of internal and external chargebacks.

The second was tag-based data masking. Developers can assign tags to columns and associate those tags with a masking policy. Tag-based masking makes it easy to maintain data privacy and quickly apply policies to new columns across your Data Cloud.
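The flow reads naturally as a short sequence of statements: create a tag, define a masking policy once, bind the policy to the tag, and from then on simply tag columns. The sketch below uses invented object names and a deliberately crude policy; each string would be run through your normal connection.

```python
# Hypothetical tag-based masking setup; all object names are invented.
masking_setup = [
    # 1. Create a tag to mark sensitive columns.
    "CREATE TAG governance.tags.pii",
    # 2. Define a masking policy once.
    """CREATE MASKING POLICY governance.policies.mask_pii AS (val STRING)
           RETURNS STRING ->
           CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
                ELSE '*****' END""",
    # 3. Bind the policy to the tag.
    "ALTER TAG governance.tags.pii SET MASKING POLICY governance.policies.mask_pii",
    # 4. From now on, tagging a column is enough to mask it.
    "ALTER TABLE crm.public.customers MODIFY COLUMN email SET TAG governance.tags.pii = 'email'",
]

for stmt in masking_setup:
    print(stmt.splitlines()[0])
```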

Notable Booths

Snowflake native support is excellent, but the Expo hall showcased some exceptional talent and products built for the data cloud. I tried to stop by all the booths to get a feel for what was being offered. While not an exhaustive list of everything that was there, these were some of my favorite demos that I attended.


Habu

Habu positions itself as the “data clean room” you never knew you were missing. Their product was built with data privacy and regulations in mind, and its mission is to empower companies to utilize and understand data fully, collaborate with others, and thrive in a privacy-first era.

Why does Habu stand out to us? As a leader in M&A projects and the creators of one of the most robust data migration methodologies, we know that going into any project kick-off, there is a very high risk that data privacy concerns will handcuff our team. The biggest of these is the need to keep data within approved systems. Habu can potentially remove the complicated middle-server architecture to meet company compliance by introducing a data clean room where we can quickly and safely move data from a legacy system to Snowflake.


Datorios

Datorios tames complex data pipelines by simplifying the ETL process with a robust library of native connectors and support for Python-based APIs. Data transformations are a struggle for any data developer, and many tools are bulky, hard to learn, and carry a high price tag.

What I liked most about Datorios was the easy-to-use interface, how quickly I could pick up on what the demo operator was showing, and a very reasonable price tag.


Bodo

Bodo supercharges your Python code by introducing parallel computing for data analytics. Their scaling functionality lets you develop in a small prototype environment and scale to endless production potential without rewriting your code or pulling in packages with heavy overhead. Perhaps most striking, initial benchmarks put Bodo ahead of Snowflake in performance, outpacing PySpark by more than 20x at a tenth of the cost of running on AWS EC2.


Modelbit

Any data developer will tell you that they are envious of the rest of the programming world when it comes to rapid development and deployment. Programmers have long had a wealth of DevOps tools at their fingertips supercharging their CI/CD processes, while data developers have been largely forgotten. Several DataOps platforms are popping up in the market, some built explicitly for Snowflake, but Modelbit stood out from the crowd.

In short, Modelbit allows you to train your Machine Learning models and deploy them all within the cloud. Think of it like MLOps, and you’ll be close to accurate. What stands out about their platform is how easy it is to use! It is a three-step setup. Don’t believe me? Head over to their website and check them out.


Snowflake and their Built for Snowflake partners have outdone themselves. There was so much to learn from this event, and we didn’t even dive into the fantastic sessions by Snowflake customers on how they are using the data cloud to power their business. The products and features go far beyond data warehousing and show how Snowflake is driving the art of the possible in the Intelligence Economy.

Other Articles

What is the Intelligence Economy?
5 Simple But Powerful Python Scripts For Cleaning Data
7 ways analytics provides value to your business