Why is Snowflake so expensive?

Stas Sajin · Published in Dev Genius · Aug 16 · 9 min read

I was recently reviewing Snowflake's quarterly financial statements and came to the realization that their product is so well loved that they could stop adding new customers and still get double-digit revenue growth. This chart from Clouded Judgement really highlights how sticky the product is: a customer that joined a year ago and spent $1 is paying well over $1.70 a year later. Snowflake doesn't yet have an expansive catalog of products to upsell or cross-sell, so the majority of this new revenue is driven by increased usage, low churn, and because ... Snowflake's pricing model doesn't scale.

[0] Clouded Judgement: this figure shows that Snowflake has very high net revenue retention, indicating that once you start using the platform, you keep expanding your usage.

Imagine the outcome of this conversation between a performance engineer working at Snowflake and RevOps management.

* Engineer: "I figured out a way to improve query performance by 20%."
* RevOps: "Great, we're not going to use that..."

Snowflake has no incentive to ship a code change that makes things 20% faster, because that can correspond to a 10-20% drop in short-term revenue. In typical Innovator's Dilemma fashion, Snowflake instead prioritizes an ever larger menu of compute options, like Snowpark and data apps built on Streamlit, that will bleed your organization dry. This is the natural course of a usage-based SaaS model, and many other companies have their own non-scalable pricing structures. For example, what person at Google thought it was a good idea to let BigQuery perform a full table scan when running this query?

-- most databases read about 10 rows of data for this query,
-- while BigQuery will scan (and bill for) the full table
select * from table limit 10

The choice to scan all the data might seem dumb, but it's not when you realize that they charge per GB of data scanned, and it's in their interest to leave optimization gremlins in.

What could Snowflake do better?

As an investor, I expect Snowflake to show amazing profitability and record-breaking revenue numbers. As an engineer, if Snowflake continues on the current path of ignoring performance, I expect them to lose share to the open-source community or some other competitor, eventually walking down the path of Oracle and Teradata. Here are a few things I think they can do to stay relevant in five years.

Disclose Hardware Specs

Snowflake charges you based on your consumption. You're not buying any specific hardware; instead, you pay as you go using virtual warehouse credits that have no hardware definition. For folks with technical or engineering backgrounds, this is a red flag. Whether your query runs on a machine with an SSD or a spinning disk, low or high RAM, a slow or fast CPU, and high or low network bandwidth has a measurable impact on performance. Snowflake is very secretive about its hardware, and when I interacted with its sales team during a migration from Redshift, I could never get any SLAs on query performance, nor any information on hardware specs. This is distinctly different from Redshift, Firebolt, and Databricks, which are far more transparent and give you more flexibility to tune performance through hardware choices. The lack of transparency can also lead to bad incentives, where Snowflake could quietly revert to less optimal machinery under internal pressure to improve its margins.
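To make the abstraction concrete, here is roughly what provisioning looks like (a minimal sketch; the warehouse name and settings are made up). The only knob you get is a T-shirt size that maps to a published credit rate; nothing in the DDL tells you which CPU, memory, disk, or network profile backs it.

-- Hypothetical warehouse definition: the only "spec" is a size that maps
-- to a credit rate, not to any disclosed hardware profile.
create warehouse if not exists analytics_wh
  warehouse_size = 'LARGE'        -- bills 8 credits/hour; hardware undisclosed
  auto_suspend = 60               -- suspend after 60 idle seconds to save credits
  auto_resume = true
  initially_suspended = true;

Compare this with Redshift, where node types such as ra3.4xlarge come with published vCPU, memory, and storage figures.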
Not adopting benchmarks

Several months ago, Databricks published a study highlighting that they outperform Snowflake on the TPC-DS benchmark, to which Snowflake posted a rebuttal. Snowflake's statement on benchmarks was very clear:

"In the same way that we had clarity about many things we wanted to do, we also had conviction about what we didn't want to do. One such thing was engaging in benchmarking wars and making competitive performance claims divorced from real-world experiences. This practice is simply inconsistent with our core value of putting customers first."

This is a very shortsighted statement and contradicts why data warehouses became so popular and what customers actually care about. Data warehouses became popular precisely because they delivered on performance promises: you no longer had to wait days or weeks for reporting to be done. Benchmarks, even though they can be gamed, can be divorced from reality, and can suffer from reproducibility and generalizability problems, are starting points for driving discussions around performance. If I pick a technology that will cost my employer north of $1M/year, I will always insist on seeing some numbers. I hope Snowflake reverses its stance and starts publishing benchmarks as part of its release notes. An industry that will collectively be worth north of $1T desperately needs closure on benchmarking.

Optimizer gremlins

If you work long enough with Snowflake, you'll find several gremlins that lead to vast performance degradation. I'll give you an example that I experienced. Micro-partition pruning is disabled when the predicate comes from a sub-query, even if the sub-query returns a constant. A query like the one below will not benefit from pruning even if you partition (cluster) on created_hour, yet queries like this are all over the place if you build incremental models or use import CTEs.

-- this query will perform a full table scan even if you
-- partition (cluster) on created_hour
select count(*)
from table1
where created_hour > (select max(hour) from table2)

If you use dbt, you should really be mindful of this issue because this query pattern is very common and can represent north of 50% of query costs. I almost never see these types of gremlins in open-source databases (if they exist, they get fixed), yet they are ever present in enterprise data warehouses. This only makes me wonder whether there is a performance optimization team that has a voice within Snowflake.
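Once you know the gremlin is there, the workaround is simple. A minimal sketch, assuming Snowflake session variables and the same hypothetical tables as above: resolve the scalar in its own statement so the filter becomes a literal constant the pruner can use.

-- Resolve the watermark first, so the planner sees a constant...
set max_hour = (select max(hour) from table2);

-- ...then filter on the constant; micro-partition pruning can now kick in.
select count(*)
from table1
where created_hour > $max_hour;

In dbt, the same effect can be achieved by resolving the watermark in a separate statement (for example, via a run_query call in the model's Jinja) instead of inlining the sub-select.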
Improve the workload manager to increase throughput

The query workload manager in Snowflake is inefficient. Let's imagine you have one warehouse and two users. One user submits a compute-intensive query while the other submits a small query that scans very little data. The workload management software in Snowflake seems to allocate all of its resources in a FIFO (first in, first out) pattern. I really hope this is not the case, but after staring dozens of times at query queues, I think it is primarily FIFO. If the large query runs first, you leave the second user hanging in a queue, massively decreasing throughput. A better option would be to pause the big query and reallocate resources to the smaller one, or to split capacity so that 80% goes to the big query and 20% to the small query.

Snowflake recommends two other solutions instead:

* Scale your warehouse out or up, so the second user does not suffer from huge queue delays. This forces more spending.
* Or use the newly announced query accelerator, which allows you to scale on a per-query basis. This again forces more spending if you don't have good data governance in place.

From my days of administering Redshift, I can definitely say that Redshift's workload manager was a lot smarter at preventing massively inefficient queries and users from bringing your system down and forcing you to scale out. In an era where ML is so advanced and we have so much metadata on queries and tables, it's surprising to me that Snowflake has not built something that manages query workloads intelligently. If I can predict that a query will take a while to run based on the number of columns required, the joins present, and the size of the tables involved, why can't a workload manager do that with ML? I imagine this idea has floated around internally at Snowflake, but because it is not a revenue driver, it was not on anyone's roadmap.

Not providing observability to monitor and reduce costs

Snowflake provides very limited tooling to narrow down your biggest cost drivers or to set per-role/per-user query timeouts and budget constraints. For example, I would like to do the following:

* Have a list of users and systems that are responsible for most of my costs. Am I dealing with users that have little knowledge of SQL, perform cross joins, and leave their queries to time out? Do I have ETL jobs that folks forgot existed, yet that capture a large share of my costs? This level of cost attribution is not possible today unless you start tagging queries and build tools on top of Snowflake (a rough version of this attribution is sketched after this list). Out of the box you only know the total cost of a warehouse, which forces you to split workloads into multiple warehouses to get more granular attribution.
* Have the ability to specify per-user/per-role timeouts or constraints. Today this is still limited to a per-warehouse basis, limiting your ability to control costs, since it forces a scale-out/up pattern.
* Have tight integration with PagerDuty/Slack/email to alert users that they are over budget, or to help them understand that their query patterns are not sustainable.

I think Snowflake will likely add better observability, or it will be created through custom tooling; nonetheless, those features don't exist today, and many companies find that their Snowflake bills are unexpectedly large.
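As a stopgap, you can approximate per-user attribution yourself from the ACCOUNT_USAGE views. The sketch below apportions each warehouse's metered credits by each user's share of execution time in that warehouse over the last 30 days; this is a community-style approximation rather than an official cost model, since idle time and concurrency are ignored.

-- Rough per-user cost attribution for the last 30 days. Credits are
-- apportioned by each user's share of execution time per warehouse,
-- which ignores idle time and concurrency, so treat it as an estimate.
with usage as (
    select
        warehouse_name,
        user_name,
        sum(execution_time) / 1000 as exec_seconds
    from snowflake.account_usage.query_history
    where start_time >= dateadd('day', -30, current_timestamp())
    group by 1, 2
),
credits as (
    select
        warehouse_name,
        sum(credits_used) as credits_used
    from snowflake.account_usage.warehouse_metering_history
    where start_time >= dateadd('day', -30, current_timestamp())
    group by 1
)
select
    u.user_name,
    u.warehouse_name,
    round(
        c.credits_used * u.exec_seconds
        / nullif(sum(u.exec_seconds) over (partition by u.warehouse_name), 0),
        2
    ) as estimated_credits
from usage u
join credits c using (warehouse_name)
order by estimated_credits desc
limit 20;

Pair this with query tags (for example, alter session set query_tag = 'etl:some_model', where the tag value is up to you) if you want attribution at the job level rather than the user level.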
What could companies that use Snowflake do better?

Snowflake, like any SaaS or IaaS product, is often expensive because it's a tool that gets misused. Although I think Snowflake owns a share of the blame for performance and scalability, some of the blame lies with the consumer.

Not negotiating

We have 5-6 very good data warehouse alternatives: Redshift, Databricks, Firebolt, BigQuery, and likely a few other enterprise offerings. Yet it is surprising how little training most companies have in negotiating and renegotiating vendor contracts or in pushing for heavily discounted pricing. You should not pay the same rate as your usage increases.

Lack of education

On the topic of misuse, if you have the right monitoring in place on your infrastructure, you'll generally find an interesting pattern: 5% of your internal users drive 95% of your costs. Let me give you some examples:

* Almost every person I know learned SQL on the job and made their fair share of mistakes, including writing queries with cross joins, not filtering on the right predicates, selecting all columns, or pulling massive amounts of data into R and Python because they prefer those languages. Only through a trial-by-fire process do users learn how to work more efficiently. Today I would rather pay $25 and ask users to spend a few days on the online SQL classes from DataCamp, so they learn the craft in a more efficient and scalable way.
* When my previous company used Looker, I noticed that some folks would just keep adding dimensions and measures, on the order of 100+, without reflecting on the number of joins happening in the background. Then they would run a query that would inevitably hit our internal timeouts. This led us to change our Looker policies so that creators of dashboards had to belong to user groups that knew SQL. A BI tool is not a replacement for education unless you're a passive consumer of dashboards.

Generally speaking, finding that 5% of users and simply letting them know that what they are doing is not sustainable can make a meaningful impact on costs. For example, you could set up a Slack bot that periodically alerts users who consume a lot of resources, or simply provide teams with Slack reports on cost attribution. Or, if you have a large organization, you could dedicate a team of 2-3 people to optimize overall infrastructure costs using a Cloud FinOps framework.
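Until better first-party tooling shows up, Snowflake's resource monitors are the closest thing to a built-in budget guardrail. A minimal sketch (the warehouse name and quota are made up) that warns as a warehouse approaches its monthly credit budget and suspends it once the budget is exhausted:

-- Hypothetical monthly budget guardrail for a single warehouse.
create or replace resource monitor analytics_budget
  with credit_quota = 500              -- monthly credit budget (made-up number)
  frequency = monthly
  start_timestamp = immediately
  triggers
    on 80 percent do notify            -- warn account admins before the limit
    on 100 percent do suspend;         -- stop new queries once it is exhausted

alter warehouse analytics_wh set resource_monitor = analytics_budget;

Note that this still operates at the warehouse level rather than per user or role, and the notifications go to account admins rather than to the offending user, which is exactly the observability gap described above; a Slack bot fed by the ACCOUNT_USAGE views remains the practical workaround.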
Should you still use Snowflake?

If you're a small company with a budget of less than $500k/year for warehouse costs, I don't think you have a choice but to use an enterprise tool. The number of features a tool like Snowflake offers, the usability, and the opportunity cost of not using data to drive better product development push you into a corner where you have to go with a buy decision. If your yearly costs are above the $500k mark, it's useful to consider the benefits of an off-ramp.

I expect the ecosystem to change over the next five years, with both big and small companies going open source. We have DuckDB, ClickHouse, Pinot, Trino, Dremio, Druid, Iceberg, Doris, StarRocks, Delta Lake, and likely a few other open-source alternatives that are already used in companies large and small. The challenge is that some of these tools have not yet reached a point of high popularity or ease of developer experience. As things consolidate and the options narrow down, and as more data engineers become familiar with a core set of tools, it will be a lot cheaper to go with open source.

In the meantime, for Snowflake to stay relevant, it needs to ensure that its performance engineering team is completely independent of the teams responsible for revenue drivers, so it can truly reinvent itself.