[HN Gopher] Why is Snowflake so expensive
       ___________________________________________________________________
        
       Why is Snowflake so expensive
        
       Author : eyeball
       Score  : 302 points
       Date   : 2022-08-22 13:41 UTC (9 hours ago)
        
 (HTM) web link (blog.devgenius.io)
 (TXT) w3m dump (blog.devgenius.io)
        
       | manassolanki wrote:
       | Snowflake is expensive if not monitored properly, on top of that
       | they provide minimal observability. There are some good features
       | like auto suspend and auto resume for cost savings but still
       | there is scope for optimisation. For example, they will charge
       | you for a minimum of 1 minute even if your query runs for only
       | 2 seconds.
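
A minimal sketch of the billing floor described above, assuming per-second billing with a 60-second minimum each time a warehouse resumes. The function name and the `idle_before_suspend_s` parameter are hypothetical, introduced only for illustration.

```python
def billed_seconds(query_s, idle_before_suspend_s=0, minimum_s=60):
    """Seconds billed for one resume -> query -> suspend cycle.

    Assumes per-second billing with a minimum charge each time the
    warehouse resumes; idle time before auto-suspend is billed too.
    """
    return max(query_s + idle_before_suspend_s, minimum_s)


for query_s in (2, 45, 90):
    print(query_s, "->", billed_seconds(query_s))
# 2 -> 60
# 45 -> 60
# 90 -> 90
```

Under these assumptions a 2-second query on a freshly resumed warehouse is billed like a 60-second one, so batching short queries into one warm window is cheaper than waking the warehouse for each.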
        
       | wsostt wrote:
       | Snowflake is so expensive that Capital One has developed a
       | toolkit for managing your instance.
       | 
       | https://www.capitalone.com/software/solutions/
        
         | marymac wrote:
         | I'd love to talk to someone who has tried this out - I think
         | it's called Slingshot
        
       | glenjamin wrote:
       | I don't know if this is the case at Snowflake, but there are
       | similar seemingly misaligned incentives with CircleCI's build-
       | seconds-based pricing model.
       | 
       | However, the generally accepted wisdom there was that improving
       | performance had always led to more builds being run - and so
       | still came out as a net-positive. This had happened a bunch of
       | times as we upgraded CPUs or storage drivers or the version -
       | there'd be a short term drop in direct revenue, but then it would
       | bounce back quickly as people took advantage of being able to do
       | more stuff in the same amount of time.
       | 
       | I'm told the revenue and finance people were pretty concerned the
       | first time it happened though!
        
         | idoh wrote:
         | I work at Circle, but not on this specifically, and echo the
         | same experience. This
         | (https://en.wikipedia.org/wiki/Khazzoom-Brookes_postulate)
         | was cited last week in a meeting that I was in, for example.
        
         | morelisp wrote:
         | I would guess that this is less likely to be true of Snowflake
         | than CircleCI.
         | 
         | Most dev teams are underinvested in CI. That is, if you queried
         | some random team, they'd probably have a dozen ideas for tests
         | or processes they'd like to write/run if they had the
         | resources, most of which would provide some real value - the
         | ideas likely coming from some previous actual bugs that hit
         | prod.
         | 
         | Most BI teams are _overinvested_ in data. They have way more
         | than is valuable. Large scale analysis is mostly exploratory
         | and speculative, and rarely yields results. Any induced usage
         | is more from fear they might throw away the magic bits than
         | real value being unlocked by better efficiency. (And I think
         | this is probably necessarily true. Any BI process that gets to
         | the point the data is clear and regularly actionable also gets
         | operationalized and right-sized through a more normal dev
         | process.)
        
       | epberry wrote:
       | > Not providing observability to monitor and reduce costs
       | 
       | Vantage just launched this - https://www.vantage.sh/blog/vantage-
       | launches-snowflake-suppo.... The problems the author describes
       | are almost exactly what we heard from customers:
       | 
       | - list of users/queries that are the most expensive
       | 
       | - alerts and notifications for costs
       | 
       | - query timeout. Not something a third party can do but there is
       | an interesting 'query tagging' feature for snowflake which
       | Vantage supports.
        
       | mejakethomas wrote:
       | It's not expensive.
       | 
       | What it can do, successfully, with three engineers was previously
       | impossible with dozens.
       | 
       | What IS expensive is not being careful with it.
        
         | marymac wrote:
         | THIS. Apply the correct guardrails and learn to optimize.
        
       | carlineng wrote:
       | [Disclaimer: former Snowflake employee]
       | 
       | Snowflake is not expensive because of perverse incentives, which
       | is the primary claim of the article. It is expensive because it
       | is a highly differentiated and very sticky product.
       | 
       | As others have mentioned, competition is the ultimate incentive
       | to work on performance. Every dollar of Snowflake revenue is a
       | dollar of revenue that Amazon, Google, Microsoft and Databricks
       | are fighting for.
        
         | mejakethomas wrote:
         | This, 100%.
         | 
         | It eats/consolidates formerly-disparate costs around the org.
         | Because it's so good.
         | 
         | Which makes it look expensive.
        
         | discodave wrote:
         | > Every dollar of Snowflake revenue is a dollar of revenue that
         | Amazon, Google, Microsoft and Databricks are fighting for.
         | 
         | This is true, but misses one detail...
         | 
         | Snowflake _runs in the cloud_ so every dollar of Snowflake
         | revenue is roughly $0.40^1 of Amazon/Google/Microsoft revenue
         | anyway.
         | 
         | ^1: Snowflake's gross margin is in the range of 50-60%
         | https://www.macrotrends.net/stocks/charts/SNOW/snowflake/gro...
        
         | klysm wrote:
         | They aren't exclusive. They also have perverse incentives to
         | leave optimization gremlins in, even if they are very low
         | hanging fruit to remove. They also have the incentive to not
         | document them well.
        
           | daniel-cussen wrote:
           | Oh like injecting jitter so there's no consistency in
           | measurement?
        
       | teej wrote:
       | From what I can tell, the author is incorrect about the example
       | given in "Optimizer gremlins". I tested an example on my own data
       | and micro-partition pruning was active.
       | 
       | The issue with dbt models in Snowflake is that if you ever
       | perform a full-refresh and don't sort it, you ruin any natural
       | clustering that arises from an incremental model. I've run into
       | this issue many times. Auto-clustering gets too expensive at
       | scale and Snowflake doesn't give you much guidance on
       | alternatives.
        
       | toto444 wrote:
       | The competition is tough in the data warehousing industry, if
       | Snowflake is expensive people will know. Current customers may
       | not leave but it's going to be harder for them to get new
       | customers.
        
         | KingOfCoders wrote:
         | Everyone seems expensive (Looker seems to be the most
         | expensive), and vendors are hard to compare. When evaluating
         | some of them for a migration project, they would not let us run
         | performance tests with our data to compare them and make a
         | decision (paid).
        
       | buremba wrote:
       | I believe they need to focus on the performance at least nowadays
       | because both Databricks & BigQuery are also great products and
       | they push Snowflake in terms of feature-parity and performance.
       | 
       | That being said, Snowflake is also pushing for the marketplace
       | model where you publish your app natively to move your code where
       | the customer's environment is. If they become successful, the
       | performance might not be one of the incentives for the
       | companies to go with Snowflake and the switching cost might be
       | higher as companies will move more of their business logic
       | embedded in the system.
        
       | jmacd wrote:
       | Retrospectively, this is very similar to how most SaaS behaved
       | when per user per month billing was first introduced. There were
       | almost never any actual limits on the number of users you could
       | add to the software, but you purchased a license for a certain
       | number. Occasionally your account would be audited and you would
       | be billed for the overage. It was always a significant penalty.
       | The same was true for CPU based licenses for things like IIS, SQL
       | Server, Oracle, etc.
        
       | 0xbadcafebee wrote:
       | > Snowflake has no incentive to push a code change that makes
       | things 20% faster because that can correspond to 10-20% drop in
       | short-term revenue.
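
A back-of-the-envelope check of the quoted figure, assuming billing is proportional to runtime and nothing else changes. Both function names are hypothetical. Reading "20% faster" as 1.2x throughput, the same workload is billed about 16.7% less; reading it as 20% shorter runtime gives exactly 20%.

```python
def runtime_saved(throughput_gain):
    """Fraction of runtime saved when the same work runs (1 + gain)x faster."""
    return 1 - 1 / (1 + throughput_gain)


def relative_revenue(throughput_gain, usage_growth=0.0):
    """Billed compute relative to before, for usage-based pricing."""
    return (1 + usage_growth) / (1 + throughput_gain)


print(round(runtime_saved(0.20), 3))            # 0.167
print(round(relative_revenue(0.20), 3))         # 0.833
print(round(relative_revenue(0.20, 0.20), 3))   # 1.0
```

The last line shows the offsetting effect raised elsewhere in the thread: if customers respond to the speed-up by running 20% more work, billed compute returns to where it started.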
       | 
       | If they improve performance they can lower the cost to customers,
       | which will make the product more attractive to prospective
       | customers. But if they are already swimming in cash they may not
       | feel the need to gain more customers.
       | 
       | Only threats prompt companies to improve things. Threat of a
       | competitor, threat of losing all their money, threat of bad PR,
       | threat of regulation, threat to the stock price, etc.
       | 
       | I see this every day in companies that don't care about managing
       | their cloud costs. They waste money like crazy because they
       | literally don't care if they lose money, because some exec
       | doesn't care, or they got enough funding until the next round,
       | etc. A couple years later another exec asks why the CISO/CTO is
       | spending so much money without any ROI, and then everybody has to
       | stop everything they're doing to shave pennies off cloud costs.
       | 
       | Companies run by individual executives are insane. I don't
       | understand why people allow companies to be run this way. I think
       | a co-op where employees could be active participants in the
       | running of the company would allow for more sane decision-making.
        
       | cedricd wrote:
       | I'm glad the author also points out how customer (mis)use can
       | blow up data warehouse costs too. No matter how efficient
       | Snowflake could get, using the warehouse too much or with
       | unnecessary queries will ultimately have a larger impact.
       | 
       | The trend in the data space currently is for usage to increase --
       | as more companies adopt dbt they're running more and more
       | prebuilt (materialized views) queries on a scheduled basis,
       | rather than on demand. This is overall a good thing in that data
       | is becoming easier to manage and use, but it does come at an
       | increase in warehousing costs.
       | 
       | I think eventually the pendulum will swing back to tools that
       | help optimize warehouse usage, as long as they allow for the same
       | increase in productivity as dbt (disclosure - I work for one such
       | company)
        
       | alberth wrote:
       | This is all much simpler than the post makes it sound.
       | 
       | It's usage-based pricing and customers are using more of it.
       | 
       | > a customer that joins a year ago and spends $1 is paying out
       | well over $1.7 a year later
       | 
       | The entire article is based on this 1.7x "net dollar expansion"
       | statement.
       | 
       | After integrating Snowflake, customers have found value in using
       | Snowflake and are using _more of it_ 1 year later.
       | 
       | Since Snowflake is billed on usage, that explains the net-dollar
       | expansion.
        
       | imwillofficial wrote:
       | It's easy to point out ways that leaving in foot guns looks predatory.
       | But that's not always the case.
       | 
       | I work for AWS in billing, and the way we calculate bills is to
       | try to get the customer the maximum discount.
       | 
       | Things like calculating savings plan coverage from smallest to
       | largest to maximize utilization, or turning on Reserved Instance
       | sharing by default within an org.
       | 
       | I would say that the seemingly gouging behavior is more often
       | than not technical or time constraints.
        
       | msluyter wrote:
       | Some of these complaints seem fair to me, some not as much. tl;dr
       | -- Snowflake requires a fair bit of knowledge/effort to use
       | optimally.
       | 
       | I spent a number of months last year focused on lowering
       | Snowflake spend. In the process I learned a ton about Snowflake
       | and gained a fair amount of respect for the product. Respect as
       | in "this is really great" as well as respect as in "I need to be
       | on guard here or I'm going to get hurt."
       | 
       | I think my biggest misconception at the outset was thinking of
       | Snowflake like it's a relational database. It's not. Or rather,
       | it is with a large number of caveats. Snowflake doesn't have
       | b-tree indexes -- rather it has "clustering keys," which are sort
       | of like coarse grained indexes that colocate data in
       | micropartitions, allowing queries to do micropartition pruning. If
       | you have a well clustered table and you're filtering on your
       | clustering keys, things will be great. But if not, or, for
       | example you have to do multi-table joins on non-clustered
       | columns, you'll suffer. So unless you have search optimization
       | enabled (which costs more!), you have to retrain yourself away
       | from "oh, just add an index here or there to make things fast"
       | type of thinking you may have had working with Postgres or
       | whatnot.
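
A toy model of the pruning idea described above, not Snowflake's implementation. It assumes each micropartition keeps min/max metadata for a column and that a range filter can skip any partition whose range cannot match. The partition size and helper names are hypothetical.

```python
import random


def make_partitions(values, rows_per_partition):
    """Split rows into fixed-size chunks and record each chunk's min/max."""
    parts = []
    for i in range(0, len(values), rows_per_partition):
        chunk = values[i:i + rows_per_partition]
        parts.append({"min": min(chunk), "max": max(chunk)})
    return parts


def partitions_scanned(parts, lo, hi):
    """Count partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for p in parts if p["max"] >= lo and p["min"] <= hi)


values = list(range(10_000))
clustered = make_partitions(values, 100)          # rows arrive sorted

shuffled = values[:]
random.Random(0).shuffle(shuffled)                # rows arrive in random order
unclustered = make_partitions(shuffled, 100)

print(partitions_scanned(clustered, 2_500, 2_599))    # 1
print(partitions_scanned(unclustered, 2_500, 2_599))  # close to all 100
```

With sorted input the filter touches a single partition; with shuffled input almost every partition's range overlaps the filter, which is the full-scan behaviour the comment warns about.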
       | 
       | Regarding the author's complaints about lack of observability, I
       | generally found it pretty easy to analyze what was going on via
       | the query_history table. And the built in query analyzer is quite
       | helpful. We did add tags to our dbt runs, which was pretty easy,
       | and I wrote a handful of queries to find like the most expensive
       | dbt models. It wasn't really that hard.
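
A sketch of the kind of per-tag roll-up described above, run over hypothetical sample rows rather than a live account. The field names mirror the `query_tag` and `total_elapsed_time` (milliseconds) columns that I believe Snowflake's query history exposes, but treat them, and the tag values, as assumptions. Elapsed time is only a proxy for credits spent.

```python
from collections import defaultdict

# Hypothetical rows, shaped like an export of the query history.
rows = [
    {"query_tag": "dbt:orders", "total_elapsed_time": 120_000},
    {"query_tag": "dbt:customers", "total_elapsed_time": 15_000},
    {"query_tag": "dbt:orders", "total_elapsed_time": 60_000},
    {"query_tag": "dbt:sessions", "total_elapsed_time": 300_000},
    {"query_tag": "dbt:customers", "total_elapsed_time": 5_000},
]


def most_expensive_tags(rows, top_n=3):
    """Sum elapsed milliseconds per tag and return the largest totals first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["query_tag"]] += row["total_elapsed_time"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


for tag, elapsed_ms in most_expensive_tags(rows, top_n=2):
    print(tag, elapsed_ms / 1000, "s")
# dbt:sessions 300.0 s
# dbt:orders 180.0 s
```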
       | 
       | That said, dbt in particular provides a number of foot guns wrt
       | Snowflake. Subqueries, as the author mentions, are one. We created
       | some custom dbt macros to do things like instead of `select *
       | from foo where x in (select * from blah)` -- if blah was small --
       | do a query on blah and write the query using a literal list, like
       | `select * from foo where x in ('a', 'b', 'c', 'etc...')`.
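
The rewrite described above can be sketched outside dbt as follows. This is not the commenter's macro: it is a Python stand-in that uses SQLite so it runs anywhere, and `sql_literal`, `inline_small_subquery`, and the `max_values` cutoff are hypothetical names chosen for illustration.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling embedded single quotes."""
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def inline_small_subquery(conn, outer_template, lookup_sql, max_values=1000):
    """Run lookup_sql first and splice its values into the outer query."""
    values = [row[0] for row in conn.execute(lookup_sql)]
    if not values or len(values) > max_values:
        # Empty or large lookups keep the original subquery form.
        return outer_template.format(in_list=lookup_sql)
    return outer_template.format(in_list=", ".join(sql_literal(v) for v in values))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foo (x text, payload integer);
    create table blah (x text);
    insert into foo values ('a', 1), ('b', 2), ('z', 3), ('o''k', 4);
    insert into blah values ('a'), ('o''k');
""")

template = "select x, payload from foo where x in ({in_list}) order by payload"
rewritten = inline_small_subquery(conn, template, "select x from blah")
print(rewritten)
print(conn.execute(rewritten).fetchall())
```

The trade-off is an extra round trip for the lookup in exchange for a filter the planner sees as constants, which only makes sense while the lookup table stays small.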
       | 
       | Another issue we discovered is that in dbt it's trivial to create
       | views. But we found that if views get too deeply nested,
       | Snowflake can't adequately do predicate pushdown. So big stacks
       | of views on views are suboptimal.
       | 
       | Another interesting one was tests. Dbt makes it trivial to
       | perform null or uniqueness checks against a column. We found we
       | were spending a lot on those tests that simply were doing
       | something like `select * from blah where col is null`. On non-
       | cluster key columns or complex views, these were causing full
       | table scans. We took a number of steps to mitigate those issues.
       | (Combining queries; changing where we did these checks in the
       | dag). The way tests are scheduled is problematic as well. One
       | "long pole" test will keep your warehouse up and using credits
       | even after the other 99.9% of the tests have completed. After
       | some analysis we separated long pole tests from the others and
       | put them on different warehouses.
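
A rough model of the long-pole effect described above. It assumes all tests start together and run fully in parallel, that a warehouse bills until its slowest test finishes, and that the long-pole test takes the same time on a smaller warehouse. That last assumption is mine and is not stated in the comment. The credit rates follow the usual doubling per warehouse size; the test durations are made up.

```python
# Credits per hour by warehouse size (assumed standard rates).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}


def warehouse_credits(durations_s, size, minimum_s=60):
    """Credits for a warehouse that stays up until its slowest test ends."""
    if not durations_s:
        return 0.0
    uptime_s = max(max(durations_s), minimum_s)
    return uptime_s / 3600 * CREDITS_PER_HOUR[size]


short_tests = [20] * 999      # hypothetical: 999 tests of 20 seconds each
long_pole = [1800]            # hypothetical: one 30-minute test

shared = warehouse_credits(short_tests + long_pole, "L")
split = warehouse_credits(short_tests, "L") + warehouse_credits(long_pole, "XS")

print(shared)            # 4.0
print(round(split, 3))   # 0.633
```

In this model the saving comes from letting the large warehouse suspend after the short tests finish, while the slow test occupies a cheaper one.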
       | 
       | I could go on and on, actually, but I think that provides a taste
       | of some of the complexities involved. Like almost any tool, you
       | have to really understand it to use it effectively. But it's all
       | too easy for, say, analysts, who may be blissfully unaware of the
       | issues above, to write really poorly performing SQL on Snowflake.
        
       | NonNefarious wrote:
        
         | forbiddenlake wrote:
         | It makes perfect sense if you know that Snowflake is a
         | product/company. It just needs to be capitalized (and the
         | trailing question mark restored).
        
           | wink wrote:
           | If only there was the possibility to link the first
           | occurrence of a word to an external URL on a website.
        
             | NonNefarious wrote:
             | Or add a descriptive phrase to a headline. Heaven forbid.
        
           | NonNefarious wrote:
           | What an asinine excuse. "It makes sense if it makes sense to
           | you." And "snowflake" wasn't capitalized, so it wasn't a
           | proper name. And even if it were (as it is now, having been
           | fixed after I posted the above complaint), it would be just
           | another douchily obscure headline on HN. If you're too lazy
           | to say WTF you're talking about in a headline, don't burst
           | into tears when you're called out on it. And, oh man, you're
           | not even the OP... even more pathetic.
           | 
           | It's depressing to see insecure infants infecting HN with
           | Reddit-style tantrums just because somebody said something
           | mildly critical. If you're too gutless to demand better, at
           | least STFU when others do.
        
       | benjaminwootton wrote:
       | The monthly bill does make me wince, but Snowflake of course
       | includes all server and compute costs, no installation, initial
       | configuration or upgrades etc. It's genuine SaaS.
       | 
       | It's also very simple to manage and optimise so less DBA or
       | DevOps type manpower.
       | 
       | Then of course you can perfectly right size your instances and
       | pay by the second for compute and by the byte for storage.
       | 
       | Expensive, but lower TCO than alternate approaches I suspect.
        
         | Keyframe wrote:
         | _It's also very simple to manage and optimise so less DBA or
         | DevOps type manpower.
         | 
         | Then of course you can perfectly right size your instances and
         | pay by the second for compute and by the byte for storage._
         | 
         | These two are connected vessels.
        
         | jeffwask wrote:
         | Yeah...100%. It's expensive til you try running a data
         | warehouse yourself and have to hire in to support it.
         | 
         | Like any other service there are scale points where it no
         | longer makes sense but for most smaller orgs it's still a
         | bargain over DIY
        
           | nojito wrote:
           | We did a cost analysis and found databricks and BQ to be
           | cheaper than a similar snowflake build out.
           | 
           | I think people are falling into a trap of not considering
           | costs because "it takes care of everything".
        
             | dominotw wrote:
             | > cost analysis and found databricks and BQ to be cheaper
             | than a similar snowflake build out.
             | 
             | Wouldn't this mean snowflake has priced their product not
             | competitively? Why would they do that if it's so obvious
             | that everyone would just save money from switching to DB?
             | 
             | > I think people are falling into a trap
             | 
             | This is their product strategy? to take advantage of
             | gullible businesses falling into their trap.
             | 
             | Surely building a whole business around customers falling
             | into a trap has to backfire at some point.
        
       | georgewfraser wrote:
       | The core claim of this article, that Snowflake doesn't implement
       | optimizations that would reduce usage, is not true. Search
       | optimized tables, partitioned tables, and per-second billing are
       | all counterexamples.
        
       | [deleted]
        
       | [deleted]
        
       | darksaints wrote:
       | > We have 5-6 very good open-source data warehouse alternatives.
       | We have Redshift, DataBricks, Firebolt, BigQuery, and likely a
       | few other enterprise offerings, yet it is surprising how little
       | training most companies have in negotiating and re-negotiating
       | vendor contracts or in pushing for heavily discounted pricing.
       | 
       | Small nit: Redshift isn't open source. I would also add
       | Clickhouse, Citus, and TimescaleDB as majorly capable open source
       | technologies with commercial offerings in this space.
        
       | falcolas wrote:
       | Snowflake is a bit generic to easily find - and the article has
       | no hyperlinks - anybody have a one sentence summary?
       | 
       | EDIT: There it is: https://www.snowflake.com/
       | 
       | Data warehousing, basically.
        
         | thesandlord wrote:
         | It's a data warehouse, like Google BigQuery or AWS Redshift /
         | Athena
        
       | rsweeney21 wrote:
       | This is a great example of misaligned incentives.
       | 
       | Another example of misaligned incentives is LinkedIn. LinkedIn
       | charges $3/message. The more messages sent on their platform, the
       | more money they make. They are not incentivized to help sales or
       | recruiters target the right people. It can be a cash cow in the
       | short term, but it creates a negative experience for your users.
       | 
       | The fact that it has worked for so long is a testament to how
       | strong network effects are.
       | 
       | In the case of Snowflake, high switching costs will protect them
       | for a while.
        
       | twawaaay wrote:
       | Snowflake is not expensive. Snowflake is super cheap, _IF_ you
       | know what it is for and how to use it. Compared to if you had to
       | solve the problem on your own.
       | 
       | The best way to describe Snowflake is that it is a brute force
       | method to run complex queries without creating indexes.
       | 
       | If you have a more traditional database, you will notice you need
       | to set up indexes to be able to get anything from it in finite
       | time. What if you don't know the indexes upfront? What if you
       | want your users to be able to ask arbitrary queries and get
       | answers before bedtime?
       | 
       | That's what Snowflake is for. It automates using _ENORMOUS_
       | amount of hardware to get your query executed fast, very
       | inefficiently.
       | 
       | It is not for free though. That inefficiency will cause a lot of
       | resources used for queries. It is meant for those few queries
       | when your users try to get some insight into your data and you
       | can't predict indexes beforehand. Sometimes this is exactly what
       | you want, like when you let your data people in to figure stuff
       | out. Or when you have very rare functionality that allows the
       | user to build their own queries -- which you should avoid like
       | hell (and there are tricks to make it index pretty well) but
       | can't always avoid.
       | 
       | For everything else, whenever you can predict your indexes, you
       | always want to use more traditional database that can be very
       | efficient on queries properly supported by indexes.
       | 
       | The issue is a lot of people try to use Snowflake as a database
       | or to support frequently executing queries of the same kind. This
       | is bad and it will cost you.
        
         | danielmarkbruce wrote:
         | Materialized views help with this. It might not be perfect, but
         | it isn't as bad as you say.
        
         | ssalka wrote:
         | > The issue is a lot of people try to use Snowflake as a
         | database or to support frequently executing queries of the same
         | kind. This is bad and it will cost you.
         | 
         | It seems totally natural to expect these use cases to be well-
         | supported & cost-efficient. That they're not I think is likely
         | to be misunderstood by a great many people, even technical
         | folks.
        
         | JustLurking2022 wrote:
         | Honestly, in the financial world, I think the value proposition
         | may be less about anything to do with the query capabilities
         | and more about the permissions model. Making it simple to
         | provide clients with visibility into their data in a structured
         | way that doesn't involve shoveling around text files (with
         | numerous formatting gremlins to worry about) is a huge win in
         | and of itself.
        
         | zurfer wrote:
         | It is fair to criticize that some workloads on Snowflake are
         | expensive.
         | 
         | What I found however is that Snowflake is indeed super cheap if
         | we look at Total Cost of Ownership (TCO). Compared with other
         | cloud data warehouses it is even easy to control costs
         | (warehouse size with autosuspend and resource monitors).
         | 
         | I work with many Snowflake customers and the biggest cost they
         | are concerned with is usually training users so they don't
         | shoot themselves (wrong joins, external programs "pinging" the
         | service, ...).
         | 
         | Snowflake is mainly expensive because of usage, not because of
         | bad query optimization.
         | 
         | (Co-Founder at https://www.sled.so/)
        
       | awinder wrote:
       | I think the main metric that this is built on may be too coarse
       | to derive the meaning that the article does. There's conjecture
       | that what's driving this is more querying over the same dataset
       | (more streamlit dashboards) but it could just as easily be
       | expanding usage inside of companies. That's what's going on at my
       | company right now, more teams using snowflake, more data being
       | pushed in to replace existing workflows, etc.
       | 
       | I'm also not sure I understand the dig at streamlit dashboards.
       | If you're running hardware and introduce new read workflows,
       | eventually you'll need more read replicas and you'll pay more for
       | it. Maybe you can argue that snowflake is doing this at a higher
       | cost but the metric data is not available in the sources to make
       | that claim.
        
       | pykello wrote:
       | (I am not affiliated with Keebo, although I had a recruiting
       | meeting with them earlier this year)
       | 
       | FWIW, Keebo (https://keebo.ai/) tries to solve this problem &
       | reduce your Snowflake bill by using Data Learning techniques. It
       | can be configured to return exact results or approximate results.
        
         | not-my-account wrote:
         | It is always interesting seeing companies building up on the
         | products / services of other companies. Kinda like TurboTax
         | built on the IRS, these "children" (is there a better term?)
         | companies are quite dependent on the "parent" company not
         | changing or improving its product / service.
         | 
         | I don't see AWS changing so dramatically that companies like
         | DataBricks are put in hot water (but I could be wrong), but I
         | could see Snowflake improving its product due to competition,
         | putting Keebo in a tough situation.
        
           | morelisp wrote:
           | By the time I reached this comment I counted no fewer than
           | five completely separate links to offerings to help reduce
           | your Snowflake bill. For something that is already a focused
           | SaaS product, I have to say that starts to smell a bit.
        
       | hobs wrote:
       | I am like 95% sure that the MAX issue he mentions is wrong - I
       | just modified some windowing function based approaches to the one
       | he mentions and it's several OOM faster because of partition
       | elimination.
       | 
       | Nonetheless I agree with the basic points of the article.
        
       | jwie wrote:
       | You would think they would be saving (and charging the customer!)
       | a bundle not enforcing constraints on their tables.
       | 
       | I'd be very interested to hear the Snowflake side of this
       | decision, but to the customer it's simply unforgivable to have
       | cosmetic constraints on a database.
        
         | dominotw wrote:
         | Because snowflake doesn't build foreign key indexes. Imagine
         | clickstream data where every insert is being checked against an
         | index of customers. This isn't a typical usecase for big data
         | warehouses.
        
           | jwie wrote:
           | I understand that. But why have constraints that don't do
           | anything?
        
             | atwebb wrote:
             | Metadata
             | 
             | Tools and scripts can work off of it, design decisions are
             | documented, suggestions can be made, inferences can be made
             | (some dangerous, some not).
             | 
             | Why tag S3 objects if it doesn't enforce a schema? Maybe a
             | bad analogue but I'm going quick right now :).
        
             | evtx wrote:
             | There are plenty of reasons why MPP databases allow the
             | definition of constraints but don't enforce them. I'll list
             | two: 1) BI tools can use them to optimize joins 2) Data
             | modeling tools can use them to reverse engineer models
             | without having to pattern match the keys.
             | 
             | That said, Snowflake does support constraints if you use
             | hybrid tables (a preview feature announced at their last
             | conference).
        
         | veeti wrote:
         | Do you really need functional constraints in an OLAP database?
         | Surely such validations already exist wherever your data is
         | coming from.
        
           | Foobar8568 wrote:
           | Ohohoh yeah sure, you mean application based constraints? Or
           | an Entity-attribute-value base application ? What about
           | documents?
        
         | marcinzm wrote:
         | Do you have any data on the pricing of distributed databases
         | that do support proper foreign key constraints? And how it
         | stacks against Snowflake pricing?
        
       | flyinglizard wrote:
       | Where does all this data go? It's processed and then what? Sent
       | to decision makers? Used to run automated processes?
       | 
       | I'm genuinely curious and would appreciate anyone who could show
       | a real life example of this kind of pipeline where data is
       | accumulated, then processed, then turned into revenue at the
       | other end.
       | 
       | I've implemented systems that do this but my experience is that
       | accumulating data is (too) easy, processing it in a meaningful
       | way is slightly more challenging but ultimately driving positive
       | business processes according to this data, which require a lot of
       | friction with employees (training, procedures, maintenance,
       | support) is the most difficult part.
        
         | lysecret wrote:
         | Same experience. I think the most interesting and most public
         | example of such a pipeline is Google/ building a search index.
         | This is also where a lot of the methods originally came from.
         | Nowadays a lot of this will be used to build recommendation
         | systems / feature pipelines for ML.
        
           | frankbinette wrote:
           | These examples are a bit too advanced. Think of simple
           | descriptive statistics, which are still so important yet not
           | as sexy as ML/DL/AI. ML is great, but the main usage behind
           | these data technologies is still simple business
           | intelligence.
           | 
           | Every business in every market needs to understand what is
           | going on with their processes. How many sales did I do
           | yesterday, last week, last month, compared to last year, in
           | which stores, what is the average basket amount, customers
           | buy what with what, what size t-shirt do I sell the most,
           | etc.
        
         | frankbinette wrote:
         | Seems like you kind of answered your own question... this data
         | is used for business intelligence purposes.
        
       | jjfoooo4 wrote:
       | This is a kind of poor engineering writing in which the author
       | finds a product to not be tailored to his precise tastes and
       | concludes it is because the company is user hostile and/or
       | doomed.
       | 
       | The bit about Snowflake not being incentivized to care about
       | costs is trivially untrue. The rest of the article perceives
       | trade offs as simple feature gaps.
       | 
       | For example, Snowflake gives the user more latitude to distribute
       | workloads among "warehouses" than other offerings. With poor
       | distribution the author will experience the workload provisioning
       | issues he describes.
        
       | beoberha wrote:
       | I disagree with the assertion that Snowflake has no incentive to
       | improve performance. While I don't work for Snowflake, I work for
       | a competitor and we're constantly looking to improve performance
       | to make customers happy.
       | 
       | For the exact reason that the article claims Snowflake wouldn't
       | innovate, I'd assert that they would. If they are expensive and
       | slow, and a competitor is faster and cheaper, eventually they
       | will see business move to the competitor. We see it all the time.
        
         | wpietri wrote:
         | Could you say more about the relative market position of your
         | two companies?
         | 
         | I don't know the market at all, but Snowflake is certainly
         | large and successful (IPOed in 2020, $50bn market cap). I could
         | readily imagine that a company doing so well might not feel the
         | incentive to improve very strongly. Or that they might see
         | themselves more as a sales/marketing-led company than one where
         | technical quality is a key driver. Whereas you folks as a
         | challenger would have a lot more incentive to differentiate
         | yourselves.
        
           | beoberha wrote:
           | You could probably google my username and find out, but I'll
           | say we're bigger than Snowflake and are very much entrenched
           | in the enterprise database market :)
        
         | PaulWaldman wrote:
         | Churn for these services takes a long time. They are "sticky"
         | and have the baggage of enterprise agreements. With the
         | switching costs never being zero, if SLAs are being met, it's
         | exceedingly difficult to switch vendors.
         | 
         | Alternatively there is a faster impact on new sign-ups when
         | falling behind competitors on costs and benchmarks.
        
           | tomnipotent wrote:
           | > have the baggage of enterprise agreements
           | 
           | Snowflake lets you roll into pay-as-you-go after a contract
           | expires.
        
           | danielmarkbruce wrote:
           | They are all out to get new logos. They spent about $800m on
           | S&M TTM v $1.4 bill rev. They aren't milking their customer
           | base for cashflow.
           | 
           | And large customers are moving to them in droves.
        
           | dominotw wrote:
           | Their stock price is pegged at new customer acquisition. They
           | signed up over 6k new customers last qtr. This is one of
           | their top stats that they present to investors.
        
           | beoberha wrote:
           | I worded it poorly, but I don't necessarily mean a full
           | exodus from the platform. In my experience, large enterprises
           | have a lot of workloads running on different technologies
           | (for whatever reasons) and the migration to cloud is a multi-
           | year effort. If someone is just dipping their toe into
           | Snowflake with easy-to-migrate workloads (which is very
           | likely given their relative age in the market) and sees
           | performance and cost issues with those workloads, they may be
           | hesitant to migrate the bigger ones and use that as leverage
           | to get Snowflake to improve.
        
           | cs702 wrote:
           | Exactly. For enterprise customers in particular, replacing a
           | SaaS tool that's deeply intertwined with many internal
           | systems is about as easy and convenient as it is for a
           | homeowner to rip out his/her home's existing HVAC system to
           | replace it with a newer, more efficient one. No one ever
           | wants to do _that_ -- unless there's absolutely no other
           | choice.
        
       | cs702 wrote:
       | Great article. On the surface, it's about Snowflake. At a deeper
       | level, the article is about the perverse incentives motivating
       | SaaS businesses to do seemingly dumb, inefficient things and
       | avoid seemingly obvious optimizations by default.
       | 
       | Many SaaS businesses are perfectly happy to let customers shoot
       | themselves in the foot if it generates more revenue. The BigQuery
       | example (presently, by default, `select * from table limit 10`
       | obediently scans the entire table at _your_ expense!) is spot-on.
       | 
       | As the article so well puts it, every SaaS company has a vested
       | financial interest "to leave optimization gremlins in."
        
         | kolinko wrote:
         | in case of BigQuery it makes sense though - they use map reduce
         | on distributed clusters, so there is no easy way to stop after
         | 10 results are found
        
           | JimmyAustin wrote:
           | It's pretty easy to limit the number of results returned by
           | each partition to 10, then have that further reduced to 10
           | total during the reduce step.
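A minimal sketch of the per-partition limit described above, in Python. It assumes a toy in-memory model: `CountingPartition`, `map_limit`, `reduce_limit` and `limited_select` are hypothetical names, and nothing here reflects how BigQuery is actually implemented. It only applies to a `LIMIT` without `ORDER BY`, where any 10 rows are an acceptable answer.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class CountingPartition:
    """Hypothetical partition that records how many rows were actually read."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.rows_read = 0

    def scan(self) -> Iterator[int]:
        # Lazily yield rows so a consumer can stop early.
        for row in self.rows:
            self.rows_read += 1
            yield row


def map_limit(partition: CountingPartition, limit: int) -> List[int]:
    # Map step: stop scanning this partition once `limit` rows are produced.
    return list(islice(partition.scan(), limit))


def reduce_limit(partials: Iterable[List[int]], limit: int) -> List[int]:
    # Reduce step: concatenate the partial results and keep only `limit` rows.
    merged = (row for partial in partials for row in partial)
    return list(islice(merged, limit))


def limited_select(partitions: List[CountingPartition], limit: int) -> List[int]:
    # Each partition contributes at most `limit` rows before the final cut.
    partials = [map_limit(p, limit) for p in partitions]
    return reduce_limit(partials, limit)
```

With P partitions, this reads at most P * limit rows in total instead of every row in the table.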
        
         | danielmarkbruce wrote:
         | It's a terrible article. The author misunderstands
         | _competition_ and how much it drives products in this area.
         | Snowflake is incentivized to make their product better on every
         | dimension. If Snowflake doesn't improve, customers will leave in
         | droves - _like when they moved to Snowflake_.
         | 
         | In practice, as has been pointed out in other comments, they do
         | improve their performance (for competitive reasons) and it does
         | cost them money when they do it.... They did it a couple qtrs
         | ago and left $97 mill on the table.
         | 
         | https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
        
           | rurp wrote:
           | There are many degrees of optimization and clearly there's
           | _some_ cost to bad performance, but Snowflake still has a
           | massive perverse incentive to not spend too much effort on
           | improving performance. If Snowflake is like every software
           | company I've ever been involved with there are many
           | competing projects at any given time and direct revenue
           | impact is a big factor in what gets prioritized.
           | 
           | My own experience with Snowflake absolutely backs up the
           | article's point. At my work we routinely encounter abysmal
           | performance for certain types of queries, due to a flaw on
           | Snowflake's side. We have had numerous talks with them and
           | there is no question that they have an issue, but they have
           | shown absolutely no urgency to fix it. Their recommendation
           | is that _we_ spend more money to work around the problem on
           | their end.
        
             | geoduck14 wrote:
             | >At my work we routinely encounter abysmal performance for
             | certain types of queries, due to a flaw on Snowflake's
             | side.
             | 
             | Do tell! I'm a current Snowflake customer, I'd like to know
             | what to look out for.
        
           | uoaei wrote:
           | I don't think it misunderstands business competition. In fact
           | it understands the concept of competition very well, and
           | develops an insightful critique into the perverse incentives
           | that are borne from competition.
           | 
           | It benefits no one except for a couple thousand people to so
           | blatantly play their customers in this way. In fact, it's
           | worse, as it incentivizes that same behavior of other market
           | actors in the space.
        
             | danielmarkbruce wrote:
             | What exactly in the article suggests the author understands
             | the pressure of competition on incentives?
             | 
             | The author states that Snowflake are not incentivized to
             | increase performance due to short term revenue concerns but
             | doesn't mention they are also incentivized to do the
             | opposite from a competitive perspective. The result is
             | incomplete enough that it ends up being flat wrong with
             | respect to the behavior that the company actually engages
             | in.
             | 
             | The author missed the fact Snowflake did the very thing
             | he/she suggested they were incentivized not to do,
             | recently, at a cost of $97 million. The CEO explained _why_
             | they are doing it and how they are _actually_ incentivized.
             | I don't know how the article could miss the mark by more
             | than it has. The company literally does the opposite of
             | what he/she suggested. It's not like they are the only one
             | either, AWS has a history of reducing prices. Why? Once
             | again, competition.
        
               | morelisp wrote:
               | > The CEO explained why they are doing it and how they
               | are actually incentivized.
               | 
               | The CEO explained why he thinks it's a good long term
               | plan... but for now, they get money i.e. are _actually
               | incentivized_ by slow code. The CEO's incentives are
               | theoretical ones.
               | 
               | And the market, which ultimately controls whether the CEO
               | gets to continue that plan or not, did not seem to agree
               | it was a good plan.
        
               | danielmarkbruce wrote:
               | By this reasoning, everyone would shirk at work. If you
               | think incentives only act over short time horizons, I
               | don't know how you explain an enormous amount of human
               | behavior.
               | 
               | The market didn't even understand it. Most of the people
               | trading equities, especially around earnings
               | announcements, don't know what a data warehouse is or
               | what matters in that market. All they saw was "miss".
        
               | morelisp wrote:
               | I didn't say the CEO was wrong or that long-term thinking
               | is bad! I said the _actual incentives_ are still
               | misaligned. (I mean, a lot of people _do_ shirk at work,
               | and it even works out well for them.)
               | 
               | I think you have a weird and probably not useful
               | definition of "actual" if "monthly revenue" is not actual
               | but "projected monthly revenue two years from now" is
               | actual. (Or maybe I've just lived in Germany too long.)
        
               | danielmarkbruce wrote:
               | You are right, I've used the word "actual" incorrectly.
               | What I should have said was "net". Ie, both short term
               | and long term revenue incentivize behavior and in this
               | case the net result was increasing performance, ie long
               | term incentive > short term incentive.
        
           | AdamProut wrote:
           | We regularly benchmark the "big 3" Cloud Data warehouses -
           | Redshift, Snowflake and Big Query at SingleStore. Their
           | performance is very close to the same (within 10-20%) on most
           | benchmarks on reasonable sized data sets (10s of TB).
           | 
           | I agree if the performance of one of them fell behind the
           | others for any prolonged period of time the cost to the
           | laggard in market share would be much much worse than the short
           | term revenue gain of "being slow on purpose".
        
           | simo7 wrote:
           | The main flaw of the article is not controlling for product
           | category.
           | 
           | I suspect most data warehouses have similar NDRs.
           | 
           | In many companies a data warehouse is the place where you
           | dump all your data and let everyone run poorly written
           | programs against it.
           | 
           | Add to that poor engineering culture in data teams (often
           | led by non-technical people) and costs are bound to
           | skyrocket.
        
           | didgetmaster wrote:
           | While I think it is definitely in a company's best long term
           | interest to implement features that benefit its customers; it
           | might not be in the best interest of those who are currently
           | running the company.
           | 
           | We have seen many, many examples of executives who are
           | willing to sacrifice the future of the company to get a
           | personal short-term gain. Jacking up the revenues (or slashing
           | costs) in ways that alienate customers is a great strategy
           | when you plan to jump off with your golden parachute in a
           | couple years when all your stock options vest.
        
             | jjfoooo4 wrote:
             | Sure but to not even mention churn as something Snowflake
             | is worried about is pretty silly. With the funding
             | environment taking a dramatic turn they (and every other
             | SaaS company) are going to be deeply concerned about price
             | competition and churn
        
             | danielmarkbruce wrote:
             | Agreed. But a good article should have shown an example
             | rather than a counter example. Intel might have been a good
             | example. A good article would have shown the competing
             | incentives at play rather than a single incentive.
        
         | thehappypm wrote:
         | I worked at a BigQuery shop and they have a terrific feature
         | where right next to the "Run query" button there is an estimate
         | of the cost of the query, in bytes. It becomes extremely
         | obvious when a query is a full table scan.
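A rough sketch, in Python, of turning a bytes-scanned estimate like the one described above into a dollar figure and a budget check. The function names are hypothetical, and the price per TiB is a caller-supplied assumption, not a quoted rate; check the vendor's current pricing before relying on any number.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Convert an estimated scan size into an approximate on-demand cost."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TIB * usd_per_tib


def check_budget(bytes_scanned: int, usd_per_tib: float, max_usd: float) -> float:
    """Return the estimated cost, or raise if it exceeds the given budget."""
    cost = estimate_query_cost(bytes_scanned, usd_per_tib)
    if cost > max_usd:
        raise RuntimeError(
            f"estimated cost ${cost:.2f} exceeds budget ${max_usd:.2f}"
        )
    return cost
```

A guard like `check_budget` could sit in front of programmatic queries, where no GUI estimate is visible before the query runs.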
        
         | dcow wrote:
         | I was thinking about this too. Why don't SaaS companies just
         | force price increases to offset their broken pricing model?
         | Nobody would care, you're paying the same you were paying
         | yesterday. If you're still the best in class product with
         | sticky features people will stay. If not and you're competing,
         | then you have the opportunity to reduce the price in the future
         | or simply not increase it and let users see lower bills which
         | might also retain them.
        
         | makk wrote:
         | > As the article so well puts it, every SaaS company has a
         | vested financial interest "to leave optimization gremlins in."
         | 
         | It depends on the time scale. A SaaS optimizing for, say, a 1-3
         | year financial return will see their interests through a
         | different lens than one optimizing for a multi-decade return.
         | Leaving optimization gremlins in isn't aligned with customers'
         | interests in the long run, so the customers will eventually
         | find alternatives if the SaaS doesn't eventually align itself
         | with customers.
        
           | smugma wrote:
           | "As an investor, I expect Snowflake to show amazing
           | profitability and record-breaking revenue numbers. As an
           | Engineer, if Snowflake continues on the current path of
           | ignoring performance, I expect them to lose share to the
           | open-source community or some other competitor, eventually
           | walking down the path of Oracle and Teradata. Here are a few
           | things I think they can do to stay relevant in five years."
        
             | danielmarkbruce wrote:
             | The point is incentives.
        
         | Aulig wrote:
         | It feels like these companies haven't found the right value
         | metric to price along. Ideally it should align with the value
         | the customer receives.
        
           | altdataseller wrote:
           | But that's almost impossible to measure by Snowflake. How
           | would they know how much more revenue you earned because you
           | use Snowflake?
        
             | Rastonbury wrote:
             | I don't think their customers could quantify it if they
             | tried (and i'm not implying Snowflake doesn't give value,
             | it probably does but how does a company attribute it)
        
           | polskibus wrote:
           | Only competition can enforce this. The article ideally
           | demonstrates the problems with monopolies and vendor lock-in.
        
             | danielmarkbruce wrote:
             | Snowflake is nowhere near a monopoly, and plenty of
             | customers have moved from other vendors (Teradata, Netezza,
             | etc) to Snowflake - showing that vendor lock-in is not as
             | strong as it might seem.
        
           | carimura wrote:
           | Close. Product pricing is based on a variety of _perceived_
           | factors (value, cost of change, risk of loss, etc.)
        
         | deepGem wrote:
         | Wow their statement about not participating in benchmarking
         | wars is alarming. In this day and age, when benchmarking tools
         | are so inexpensive and almost everything is very transparent,
         | why not participate.
         | 
         | Or even better engage with a neutral third party such as Jepsen
         | to get on an even playing field and duke it out.
        
           | [deleted]
        
           | danielmarkbruce wrote:
           | Because their value prop isn't being #1 on benchmarks. It's
           | about
           | 
           | * being easy to manage
           | * being able to scale up and down compute so you can get good
           |   performance without having to keep a bunch of machines
           |   running.
        
           | datavirtue wrote:
           | Because their business is providing a solution that IT failed
           | to. The large cost, which the business was already accustomed
           | to from previous IT attempts, pales in comparison to the
           | additional costs of doing it themselves.
           | 
           | It's like the cloud in general, the cost is high but so is
           | the hype. When all that dust settles over the coming years
           | the business will start shopping on price. They will then
           | realize they have been locked in to some extent and will need
           | to start wriggling loose of the lock-in.
        
           | lokar wrote:
           | Benchmark results rarely predict actual application perf. You
           | need to run your own queries against your own data. Do a real
           | POC.
        
         | twistedpair wrote:
         | FWIW, BigQuery tables can be configured to require a partition
         | filter clause [0] in the SQL query, so that you cannot shoot
         | yourself in the foot like that. Now if they'd just make an
         | Organization Policy to let you turn it on by default for all
         | new tables.
         | 
         | [0] https://cloud.google.com/bigquery/docs/querying-partitioned-...
        
           | cs702 wrote:
           | Yes. That's exactly the OP's point: It's up to _you_ to
           | remember to do the extra work necessary to avoid shooting
           | yourself in the foot by default.
        
         | tluyben2 wrote:
         | Funny that most people here advocate aws while they have tons
         | and tons of foot shooting tools that cost people 1000s of usd
         | all the time. And we just accept it. Like if you want to kill a
         | complex cluster with one api call or button click, it won't let
         | you for xyz; that's not because they cannot, it's because you
         | will just let it be and that makes money.
        
         | soheil wrote:
         | If that lowers the barrier to entry without having expert level
         | knowledge to know what a full table scan even means why not?
         | Instead of hiring a dba maybe you could hire an intern instead
         | and happily eat the cost of Snowflake.
        
           | florbo wrote:
           | It doesn't lower barriers to entry, it's contrary to logical
           | expectations for someone unfamiliar with how BQ works. If the
           | query is limited to 10 results you wouldn't expect it to scan
           | all 2 trillion of your records. Granted there are numerous
           | warnings in the GUI for these types of things but make this
           | mistake in Python and you're none the wiser.
        
             | soheil wrote:
             | Wait are you saying the BQ db engine is not following
             | logical expectations? You do realize a "limit" clause
             | doesn't prevent a full table scan in all cases, right?
        
               | horsawlarway wrote:
               | and that db expert you just recommended against hiring
               | could surely tell you that... The intern won't.
        
           | kalimoxto wrote:
           | I think the point of the article is that an optimizer doesn't
           | affect the barrier to entry at all, but adding it would save
           | end users quite a bit of money. So they don't do it because
           | end users' money is revenue for Snowflake/Alphabet
        
             | soheil wrote:
             | If you could just add an optimizer why doesn't the db
             | engine just do that?
        
               | whimsicalism wrote:
               | Take a step back and reread the article and the comments
               | you are replying to.
        
         | spmurrayzzz wrote:
         | Well said. I'd also add a cynical note that the recurring
         | revenue model is incentivized to keep the gremlins around not
         | just because of the impact to metered costs, but also because
         | off-ramping is that much more difficult once engineers
         | implement workaround/solutions to mitigate the impact of those
         | smells.
         | 
         | Just another way that vendor lock-in occurs (intentionally or
         | otherwise).
        
         | scarface74 wrote:
         | Standard disclaimer: I work at AWS in consulting and could
         | easily be accused of drinking the Kool Aid.
         | 
         | Everyone from consultants, SAs, Sales, support etc is
         | constantly working toward getting customers to "optimize" their
         | spend. Of course any business wants you to give them more
         | money. But, none of us are pushed to get them to spend money on
         | services or methods to do things inefficiently.
         | 
         | I specifically work in consulting specializing in "application
         | modernization". That means most of my implementations are cheap
         | and I'm constantly spending time making sure my implementation
         | is cheap as possible and still meet the requirements. I first
         | noticed this attitude from AWS when I was working for a
         | startup.
         | 
         | This isn't just with AWS. I spent years working in enterprise
         | shops and saw the same attitude working with Microsoft.
         | 
         | I can't speak for any other large organizations - AWS and
         | Microsoft are the only two I've worked with as either a
         | customer or employee where there was huge spending on
         | infrastructure or software.
         | 
         | Now I could easily get started about my opinion of Oracle from
         | the customer standpoint. But I won't.
        
       | benreesman wrote:
       | Alright I'll bite finally. What do these companies do? Neither
       | Snowflake's front-facing website, nor the Wikipedia article, nor
       | this post tell me why people pay all this money.
       | 
       | I know a bit about the effort involved in chucking around 100
       | petabyte datasets, and there are numerous niches a SaaS could
       | fill in there, but it's very murky from the outside.
        
         | Croftengea wrote:
         | I was wondering the same thing. This sums it up pretty well I
         | guess:
         | 
         | > The best way to describe Snowflake is that it is a brute
         | force method to run complex queries without creating indexes.
         | 
         | (https://news.ycombinator.com/item?id=32554072)
        
           | benreesman wrote:
           | Column stores on DFS are without a doubt tricky beasts. It's
           | a very rich field technically.
           | 
           | I guess I'm trying to get a read on whether their core
           | competency / moat is distributed columnar query technology or
           | sales/support/marketing.
        
             | colinmhayes wrote:
             | Snowflake is slower and more expensive than competitors.
             | I'd say its moat is mostly that it's extremely easy to set
             | up and start using without technical support. If you've
             | just got a small team and no one wants to do data
             | engineering snowflake makes that possible, or at least much
             | easier. Most users are generally happy, and they've
             | followed the cloud playbook of making it hard to switch
             | off, so even when teams have scaled to the level where
             | secondary indexes and data support staff makes sense the
             | team is still happy with snowflake.
        
           | joelthelion wrote:
           | But why not create indexes? I mean, I understand why
           | sometimes you don't want an index. But building an
           | entire warehouse around the idea of "no indexes", really ?
        
             | [deleted]
        
             | idunno246 wrote:
             | these tend to be for one-off analytical queries. you want
             | every user with flag X >10 joined against five other tables
             | each with similar filters. you don't know ahead of time
             | what that query is, your analyst thought of it this
             | morning, so you can't make indices ahead of time. and it'll
             | never run again so you don't need to take the performance
             | hit keeping an index. and someone has to decide which
             | indices to keep, but app engineers aren't best utilized
             | figuring out indices for analysts.
             | 
             | not needing indices is nice, but the bigger selling feature
             | for me is if you have many services, and each service's data
             | is in the warehouse, you can join against them all together.
        
             | buttaphingas wrote:
             | It's all around the ethos of ease of use. Snowflake does a
             | lot of smarts in the background so that you don't have the
             | overhead of managing indexes. And not just indexes, there
             | is just less human intervention required overall compared
             | to something like Teradata or even a modern lakehouse.
             | 
             | That said, they've kind of introduced it with the Search
             | Optimization Service, which is like an index across the
             | whole table for fast lookups, but even that is
             | automatically maintained on your behalf.
        
             | benreesman wrote:
             | My experience with "Big Data" is pretty dated, 5 years at
             | least. At that time I think a good cutoff for "big data"
             | might have been like a petabyte +/- a factor of 10
             | depending on your gear. I imagine now even 1PB is probably
             | pretty mild by "big data" standards.
             | 
             | But once you're up in that "I can't even fit this in a
             | 4-8U sled" territory (whatever it is in a given decade)
             | you're probably doing some kind of map/reduce thing, so
             | there's a strong incentive to have a column-major layout.
             | If you can periodically sort by some important column so
             | much the better (log2 n binary search), but mostly you've
             | got a bunch of mappers (which you work hard to get locality
             | on relative to the DFS replicas where the disks live, maybe
             | on the same machine, maybe in the same top-of-rack switch
             | or whatever) zipping through different columns or column
             | sets and producing eligible conceptual "rows" to go into
             | your "shuffle/sort/reduce" pipeline to deal with joins and
             | sorts and stuff like that.
             | 
             | I don't know how Google does it, but I think most everyone
             | else started with something like the Hadoop ecosystem and
             | many with something like Hive/HQL to give a SQL-like way to
             | express that job, especially for ad-hoc queries (long-
             | lived, rarely changing overnight jobs might get optimized
             | into some lower-level representation).
             | 
             | Around the time I was getting out of that game, Spark was
             | starting to get really big, which was due to some
             | combination of RAM getting really abundant and just kind of
             | a re-think on what was by then a pretty old cost model. I
             | have no idea what people are doing now.
             | 
             | I'd love it if someone with up-to-date knowledge about how
             | this stuff works these days chimed in.
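The column-major layout with a periodically sorted column, mentioned above, can be sketched in a few lines of Python. This is a toy in-memory illustration under stated assumptions: the table is a dict of equal-length lists already sorted by one key column, and `matching_row_range` and `select_where_equal` are hypothetical names, not any engine's API.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List

# Column-major table: one list per column, all of the same length.
Table = Dict[str, List[int]]


def matching_row_range(sorted_column: List[int], value: int) -> range:
    """Binary-search a sorted column for the rows equal to `value`."""
    start = bisect_left(sorted_column, value)   # first index with column >= value
    end = bisect_right(sorted_column, value)    # first index with column > value
    return range(start, end)


def select_where_equal(
    table: Table, sort_key: str, value: int, output_column: str
) -> List[int]:
    """Read only the slice of `output_column` whose sort key equals `value`."""
    rows = matching_row_range(table[sort_key], value)
    return table[output_column][rows.start:rows.stop]
```

Because the key column is sorted, the lookup costs O(log n) comparisons, and only the requested output column is touched; other columns are never read.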
        
         | DebtDeflation wrote:
         | Snowflake is a data warehouse in the cloud. In the past,
         | companies would have spent a fortune on Oracle or Teradata
         | licenses and a fortune on on-prem hardware to run it on. Now
         | they spend it on Snowflake and run it on AWS, etc. Same story
         | as with any SaaS product - cheap and easy to get started, only
         | pay for what you use, but over time the costs........get big.
        
       | mritchie712 wrote:
       | I predict[0] we'll see more people choosing Clickhouse over
       | Snowflake in the next 5 years. Clickhouse will get reasonably
       | feature compatible with Snowflake and give people a better escape
       | hatch if they want to self-host their data stack. Clickhouse, Inc
       | is building a cloud product that abstracts away the complexity
       | and there are already companies like Altinity that will spin up a
       | cluster for you in minutes.
       | 
       | 0 - https://blog.luabase.com/clickhouse-for-data-nerds/
        
         | ramesh31 wrote:
         | Isn't Clickhouse a hosted SQL DBMS? Not really comparable to a
         | cloud data lake.
         | 
         | Snowflake/Databricks scales infinitely across cloud object
         | stores like S3. Clickhouse is run as a single (or sharded)
         | process that uses the local file system like any other SQL
         | database, and requires volume provisioning as your data scales.
         | It also has a fixed run cost (EC2 or wherever it's hosted)
         | versus an "on-demand" model where read clusters are spun up to
         | run queries against static objects that have no fixed cost
         | other than storage pricing.
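
The fixed-cost versus on-demand trade-off in the comment above comes down to a break-even calculation. A minimal sketch follows; the hourly rates are made-up numbers, not real EC2 or Snowflake prices, and storage cost is assumed to be the same on both sides and left out.

```python
def monthly_cost_fixed(hourly_rate, hours_in_month=730):
    """An always-on server is billed for every hour, busy or idle."""
    return hourly_rate * hours_in_month


def monthly_cost_on_demand(hourly_rate, busy_hours):
    """On-demand compute is billed only while queries are running."""
    return hourly_rate * busy_hours


def break_even_busy_hours(fixed_hourly, on_demand_hourly, hours_in_month=730):
    """Busy hours per month at which both models cost the same."""
    return fixed_hourly * hours_in_month / on_demand_hourly


# Hypothetical rates: $1/hour always-on vs. $4/hour on-demand.
threshold = break_even_busy_hours(1.0, 4.0)
```

Below the threshold the on-demand model is cheaper; above it the always-on server wins, which is one way to frame the trade-off being argued here.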
        
           | morelisp wrote:
           | ClickHouse can access non-local storage without issue (or at
           | least, with only issues for some of them - HDFS and S3 seem
           | to work fine, I've had less luck with real-time Kafka). I'm
           | not sure how well it scales horizontally for such uses; you
           | can hack something up with macros that isn't too painful but
           | there may also be better options.
           | 
           | However, it's probably not a great pick if you're already
           | struggling with the operations side of things, which seems to
           | be the main selling point for services like Snowflake.
        
           | [deleted]
        
           | hodgesrm wrote:
           | ClickHouse only has fixed run cost if you configure it that
           | way. We run ClickHouse clusters in AWS / GCS using block
           | storage in our cloud platform. You can scale VMs up and down
           | vertically in minutes, and scale horizontally in the same
           | amount of time. The model works great for SaaS use cases that
           | require constant response at all times and scale over days or
           | weeks rather than minutes. Real-time analytic apps that show
           | tenant dashboards or generate recommendations for users on
           | ecommerce sites have this characteristic.
           | 
           | I don't think there's really a right or wrong answer here,
           | just trade-offs.
           | 
           | Disclaimer: I work on Altinity.Cloud, a platform for managed
           | ClickHouse
        
           | KingOfCoders wrote:
           | In which way not comparable?
        
             | nycdatasci wrote:
             | From the article: "JOIN's are also not nearly as performant
             | as in other cloud data warehouses." This seems like a
             | pretty significant limitation.
        
               | morelisp wrote:
               | That's... literally comparing them. The comparison for
               | some use cases might not be favorable for ClickHouse, but
               | they're comparable.
               | 
               | (IMO the slowness of ClickHouse joins has been
               | overstated, especially since its many-column table
               | support is so good you'll probably be fine joining on
               | insert instead.)
        
               | mritchie712 wrote:
               | Yes, this is one major hurdle they need to overcome, but
               | I think they'll (Clickhouse Inc + the community) pull it
               | off. It's a current weakness but by no means unsolvable.
        
         | SnowHill9902 wrote:
         | Clickhouse is incredible software. It only feels a little
         | foreign when coming from Postgres (e.g. some CamelCase terms).
        
           | mritchie712 wrote:
           | Yeah, the CamelCase throws me too, especially since it's
           | mixed in with snake_case (e.g. date_trunc[0])
           | 
           | 0 - https://clickhouse.com/docs/en/sql-reference/functions/date-...
        
             | zX41ZdbW wrote:
             | camelCase - native functions
             | 
             | SQL_STYLE_CASE - compatible functions
        
       | kjw wrote:
       | _" Snowflake has no incentive to push a code change that makes
       | things 20% faster because that can correspond to 10-20% drop in
       | short-term revenue. In a typical Innovator's Dilemma, Snowflake
       | prioritizes other things that generate an ever larger menu of
       | compute options, like Snowpark and data apps built on Streamlit,
       | that will bleed your organization dry."_
       | 
       | This is not true. Snowflake has done just that - it has
       | continuously improved performance resulting in reduced credit
       | consumption and revenue from customers on a unit compute/storage
       | basis. And it has negatively impacted their revenues and stock
       | price. Snowflake's incentive is to strengthen their competitive
       | position and to hopefully generate more long-term revenue from
       | their customers.
       | 
       | When guiding for 2022 revenue, the CFO forecast a $97 million
       | shortfall resulting from product improvements. Snowflake stock
       | dropped immediately after.
       | 
       | See Q4 transcript --
       | https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
       | 
       |  _" Similarly, phased throughout this year, we are rolling out
       | platform improvements within our cloud deployments. No two
       | customers are the same, but our initial testing has shown
       | performance improvements ranging on average from 10% to 20%. We
       | have assumed an approximately $97 million revenue impact in our
       | full-year forecast, but there is still uncertainty around the
       | full impact these improvements can have. While these efforts
       | negatively impact our revenue in the near term, over time, they
       | lead customers to deploy more workloads to Snowflake due to the
       | improved economics."_
       | 
       | Also see the Bloomberg article --
       | https://www.bloomberg.com/news/articles/2022-03-02/snowflake....
       | 
       |  _" Snowflake Inc., a software company that helps businesses
       | organize data in the cloud, dropped the most ever in a single day
       | Thursday after projecting that annual product sales growth would
       | slow from its previous triple-digit-percentage pace.
       | 
       | Executives said improvements to the company's data storage and
       | analysis products will let customers get the same results by
       | spending less, which will hurt revenue in the short term, but
       | attract more clients in the future.
       | 
       | "The full-year impact of that next year is quite significant,"
       | Chief Executive Officer Frank Slootman said on a conference call
       | Wednesday after the results were released. But "when customers
       | see their performance per credit get cheaper, they realize they
       | can do other things cheaper in Snowflake and they move more data
       | into us to run more queries.""_
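
The link between a performance improvement and short-term revenue under time-based billing can be worked through directly. This sketch assumes "X% faster" means throughput rises by X% (so the same work takes 1/(1+X) of the time); if it is read instead as "runtime falls by X%", the billed time simply drops by X%. The function names are illustrative only.

```python
def credit_drop_from_speedup(speedup):
    """Fractional drop in billed time when throughput rises by `speedup`.

    The same workload now takes 1 / (1 + speedup) of the original time.
    """
    return 1 - 1 / (1 + speedup)


def extra_workload_to_break_even(speedup):
    """Fractional growth in workload that restores the old billed time."""
    return speedup


drop_10 = credit_drop_from_speedup(0.10)  # roughly 9% less billed time
drop_20 = credit_drop_from_speedup(0.20)  # roughly 17% less billed time
```

Under this reading, a 10% to 20% speedup cuts billed time by roughly 9% to 17% for an unchanged workload, and customers would have to run 10% to 20% more work for the vendor's revenue to recover.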
        
       | stassajin wrote:
       | I'm the author of the article. Didn't expect it to blow up. Let
       | me clarify a few points:
       | 
       | 1. I like Snowflake and I think they brought several innovations
       | to the field: instant scale out/up, time travel, unstructured
       | data query support.
       | 
       | 2. Snowflake obviously makes innovations and performance
       | improvements, otherwise they would not be the market leader they
       | are. But I also suspect that they make just enough performance
       | improvements to stay on par and then use the vendor lock-in
       | features to make switching hard.
       | 
       | My argument is that their rate of performance innovation has
       | gone down considerably, and Databricks, Firebolt, and open source
       | alternatives just seem more attractive from a cost/performance
       | ratio. I agree that Snowflake is still the best data warehouse to
       | start with if you have 100k, but not if you truly plan for a
       | multi-year horizon and your usage expands.
       | 
       | - Redshift also brought a lot of innovation that allowed people
       | to execute analytical queries 100x-1000x faster than any OLTP
       | that existed out there. I've used Redshift for four years and
       | they kept ignoring performance and features until Snowflake came
       | out. All of a sudden because of competitor pressure, they put
       | more effort into the product to maintain and gain market share.
       | My hope is that Snowflake finds a solution to their innovator's
       | dilemma, since competitors are hot on their tails.
       | 
       | - Some people point out that 70% usage growth just shows that
       | Snowflake is useful. Nobody disagrees with that. The issue is
       | that the majority of companies don't experience 70% revenue
       | growth to keep up with the growth in costs. At some point, you
       | have to clamp down on costs, which means that you have to look
       | for alternatives to run things more efficiently.
        
         | mejakethomas wrote:
         | Totally agree with Redshift sentiments. It's been lovely seeing
         | BigQuery and Redshift step their game up over the past 1.5yrs,
         | because they really should have been doing certain things for
         | many years prior.
         | 
         | Re: Firebolt, I don't consider it to be in the same class as
         | Snowflake whatsoever (even though their advertising seems to
         | indicate otherwise). Snowflake is like a very powerful swiss
         | army knife. Firebolt is good for a very specific (dare I say
         | niche?) workload but falls all over itself for the vast
         | majority of data org needs.
        
       | datadisruptor wrote:
       | [disclaimer: comment written by one of cofounders of iomete - a
       | YC-backed startup - active in the same market as Snowflake]
       | 
       | I think Snowflake is (still) expensive because it is a venture-
       | backed enterprise software company and goes through a typical
       | trajectory...
       | 
       | Story goes like this: founders are product-driven and first
       | movers -> find PMF -> need VC funding -> VCs only fund enterprise
       | software ventures with 70%+ gross margins and high retention
       | rates -> product/service gets priced to achieve these metrics ->
       | VCs happy to fund sales & marketing machine needed to obtain
       | sales growth, nobody cares about profitability until after IPO ->
       | startup is everyone's darling until ~2 years after IPO.
       | 
       | Then: economic crisis hits, customers become more price
       | sensitive, competition intensifies. Plus now management is
       | exposed to quarterly pressure of financial markets to deliver on
       | top-line and margin expectations.
       | 
       | Meanwhile a bunch of startups are building (lower priced)
       | alternatives. Perhaps not as mature or feature-rich as Snowflake,
       | but good enough for 80% of use cases that Snowflake covers.
       | 
       | Therefore the assertion that Snowflake is not optimizing their
       | product sounds a bit crazy to me. It would be optimizing for
       | short-term gain, while jeopardizing its reputation as the leader
       | in the space. Obtaining excessive margins through excessive
       | pricing only works under monopolistic conditions or if they had a
       | truly distinctive product. Neither is the case imo. Also, it's
       | early days. Not exactly sure what Snowflake's market share is,
       | but I bet it is < 5%.. so they haven't locked in everyone yet...
       | 
       | I bet that Snowflake will be forced to compete "also on price" in
       | the next five years because free enterprise is a powerful thing.
       | The title of the article could be "Why Snowflake is (still)
       | expensive but will get more affordable over the next few years"..
        
       | wiradikusuma wrote:
       | So, what is Snowflake? (I assume it's snowflake.com) From
       | Googling it looks like Google's BigQuery. So it's a DB?
        
       | brianwawok wrote:
       | Ran into the same exact thing at CircleCI.
       | 
       | Me: My builds are really slow
       | 
       | CircleCI: Here are a few very low effort answers
       | 
       | Me: git checkout is taking literally 60 seconds, but it takes 3
       | seconds locally, why?
       | 
       | CircleCI: Mumble Mumble.
       | 
       | They charge per minute, so why would they care if builds are
       | slow? Was about a year of this getting worse and worse, till I
       | finally cancelled the service last week and built my own server
       | in my basement.
       | 
       | I now get 200% faster builds, and the hardware payback time is
       | not very long (6 months of my CircleCI bill?).
       | 
       | I think it's a huge red flag anytime the metric you care about is
       | something that being "worse" makes the provider more money.
        
         | icedchai wrote:
         | At a previous startup, we dumped CircleCI and switched to
         | Jenkins on our own EC2 instance. We had a lot fewer problems.
         | (This was way back in 2016, I'm sure things have improved now.)
        
           | brianwawok wrote:
           | Yup!
           | 
           | I ended up doing TeamCity over Jenkins, but they do the same
           | thing.
           | 
           | Amazing how fast a 32C / 64T EPYC server in my basement can
           | be..
        
             | icedchai wrote:
             | I can only imagine! I have a 3950X here (16C / 32T) with
             | 128 gigs of RAM and it is incredible. Total overkill for a
             | home lab though.
        
         | sremani wrote:
         | From the front page of CircleCI:
         | __________________________________________________
         | Industry-leading speed. As soon as you think it, you can
         | deliver it. Your developers' time is too important to waste.
         | No other CI/CD platform takes performance as seriously as we
         | do. Your pipelines should accelerate your business, not slow
         | you down.
         | __________________________________________________
         | 
         | Rule of thumb: Anyone talking about their honesty is not
         | honest.
        
           | thexumaker wrote:
           | We did the same thing but with self-hosted runners with
           | GitHub Actions.
           | 
           | https://github.com/philips-labs/terraform-aws-github-runner
           | 
           | philips-labs has some good resources for scaling this up as
           | well.
        
         | Fatnino wrote:
         | Those app rental scooters that are littered around city
         | centers: you pay for distance as well as for time. And that's
         | why they don't go very fast.
        
           | gkoberger wrote:
           | No, they're legally limited to 15 MPH for safety. You also
           | don't pay for distance, just time. Not everything is a
           | conspiracy theory.
           | 
           | SF: https://www.williamweisslaw.com/sf-e-scooter-laws/
           | NYC: https://www1.nyc.gov/html/dot/html/bicyclists/ebikes.shtml#:....
        
           | CrazyStat wrote:
           | Also because going fast on those things is fucking dangerous,
           | especially when (like most people riding them) you're not
           | wearing a helmet.
        
             | djbusby wrote:
             | And have been drinking (it's what I saw a lot of)
        
             | wpietri wrote:
             | Absolutely. I don't have a problem with most scooter
             | owners, but here in a city with a lot of tourism, the
             | rental scooters are often a menace. I was waiting outside
             | of a restaurant and took half a step back to let some
             | people through. I brushed against something moving fast,
             | and it was some tall yahoo going downhill at max speed on a
             | scooter. If I'd taken a full step back, somebody would have
             | needed medical treatment.
             | 
             | Learning how to ride one of those in a city takes time,
             | practice, and thought. Which you will surely get if you buy
             | one. But apparently not so for the rentals.
        
           | SkyMarshal wrote:
           | In that particular case another reason could be that they
           | quite reasonably don't want you going very fast for safety
           | and liability concerns.
        
         | mikewhy wrote:
         | I love that CircleCI flaunts its speed compared to other
         | providers, meanwhile we can clearly see the CircleCI steps take
         | the longest in our builds.
         | 
         | Not to mention the constant failures.
        
         | hangonhn wrote:
         | 100% and not just in tech: when a party's incentives aren't
         | aligned with yours, you'll often find yourself getting little
         | help or even working in opposition to each other. We recently
         | experienced this with filing for a health-related insurance
         | claim. My wife wondered why they kept losing stuff, not doing
         | what they promised, or asking for more paperwork. I kept
         | explaining to her that while not necessarily malicious, they
         | have very little incentive to improve that department.
         | 
         | Always try to find partners or counterparties who win when you
         | do as well. I know we don't always have that luxury but
         | sometimes a little headache initially is better than being
         | stuck with someone who works in opposition to you in the long
         | run.
         | 
         | Thanks so much for sharing your story. We are in the process of
         | outsourcing some of our Jenkins functionality and these stories
         | are useful to hear.
        
         | josephcsible wrote:
         | > They charge per minute, so why would they care if builds are
         | slow?
         | 
         | It's worse than just not caring: they have a direct financial
         | incentive to make sure your builds are as slow as you'll
         | tolerate.
        
       | KingOfCoders wrote:
       | I have no Snowflake experience, but some limited BigQuery
       | experience. And it's very easy for a small company to get to
       | $100k/year bills without massive data.
        
         | mejakethomas wrote:
         | Completely agree. Currently staring at 700k+ BigQuery costs
         | annually and accomplished MUCH more with Snowflake at the same
         | price.
        
         | tootie wrote:
         | Anytime your cloud spend with a single vendor starts to get out
         | of hand, you just call and negotiate. If you make a multi-year
         | commitment, they'll apply a substantial discount. Also,
         | $100k/yr is still cheap compared to the cost of developers. Not
         | just in terms of actual price tag, but risk management because
         | a SaaS won't quit for a better offer.
        
           | dotopotoro wrote:
           | So you don't need developers when you use SaaS?
        
             | yazaddaruvala wrote:
             | If you need to hire 1 more developer at $100k to help
             | maintain your data warehouse or pay $100k for Snowflake or
             | BQ, it's a no-brainer to use SaaS.
             | 
             | Also humans cost more than their salary: Recruiting,
             | management, benefits, attrition, vacation, the risk that
             | they are just not capable.
             | 
             | A human will also cost you more year over year (raises,
             | promotions, etc), SaaS will typically cost you less year
             | over year (optimizations, negotiations, competition, etc).
        
         | dominotw wrote:
         | they should switch to flat rate billing capped at slots they
         | are willing to pay for.
        
       | spullara wrote:
       | Snowflake increases performance all the time and their customers
       | just use more of it.
        
       | wigster wrote:
        
       | ramesh31 wrote:
       | I'm of the mind that Snowflake and Databricks are losing their
       | value prop now that Delta Lake is open source and Iceberg is
       | maturing. What's to stop me from rolling my own Spark clusters
       | and just using one of those? Is anyone doing this?
        
         | nemothekid wrote:
         | > _What's to stop me from rolling my own Spark clusters and
         | just using one of those? Is anyone doing this?_
         | 
         | Ops. Unless your core competency is running reports and spark
         | nodes, it's probably cheaper to outsource the management of
         | Spark and friends than to hire people to make sure it's always
         | up and running. To be fair, I haven't touched Spark in many
         | years, but having to page someone who is good enough at Spark
         | to debug why a job stopped at 3am isn't fun.
        
           | ramesh31 wrote:
           | >Ops. Unless your core competency is running reports and
           | spark nodes, it's probably cheaper to outsource the
           | management of Spark and friends than to hire people to make
           | sure it's always up and running.
           | 
           | I think as an end user I would absolutely agree on this
           | point. But many companies use Databricks as part of their
           | automated backend systems that they resell to customers. The
           | cost per "DBU" unit is astronomical for the amount of raw
           | compute in use. It feels a bit like running a restaurant
           | where you serve takeout.
        
           | nojito wrote:
           | I can spin up and down 100+ node clusters on the 4 largest
           | cloud providers at will.
           | 
           | What ops am I missing?
        
           | joshhart wrote:
           | [Disclaimer: Databricks employee] There's also a lot of value
           | in DBSQL, Unity Catalog (data management), and serverless for
           | autoscaling, all of which can save money compared with just
           | running raw Spark. But if you want to operate Spark yourself,
           | cool, do it. We're happy for that; it builds the base of
           | Spark committers over time and increases the quality of our
           | products.
        
         | eximius wrote:
         | You'll find plenty of the customer base of Databricks used to
         | run their own clusters.
         | 
         | It's a tradeoff. It might cost fewer dollars but more time. The
         | time and expertise to run their own clusters effectively is not
         | something every org has or wants to invest in.
        
           | buttaphingas wrote:
           | And to get the very best price for those clusters you'd
           | need to commit to the CSP for three years!
           | 
           | Would love to know the TCO trade-off between procuring,
           | securing and deploying on your own clusters vs having them
           | managed via SaaS.
        
       | throw8383833jj wrote:
       | it all comes down to the cost of switching and willingness of
       | users to switch. the higher the cost of switching the higher you
       | can make your product's price. Otherwise, with an extremely low
       | cost of switching, the cost will ultimately be driven to near
       | zero as more and more competitors enter the landscape.
        
       | YouWhy wrote:
       | I often analyze tools as reduction from the space of problems x
       | resources to the space of outcomes.
       | 
       | Let's consider Snowflake in this paradigm
       | 
       | - Problems: analytics on data that is not laid out in a way
       | that's directly accessible for analysts.
       | 
       | - Resources: SQL analysts, few or no competent data engineers,
       | spare cash
       | 
       | - Outcomes: run analytics at an industrial scale without
       | requiring competent engineers or DevOps.
       | 
       | Since Snowflake's optimal client gets very easily locked in, it
       | follows that saving said client's money is not something even
       | the client would care about.
        
       | [deleted]
        
       | dstola wrote:
       | "optimization gremlin" = dark-pattern to take as much money away
       | from you as possible
        
       | tablespoon wrote:
       | > RevOps management
       | 
       | And now "XxxOps" is a meaningless buzzword.
        
       | [deleted]
        
       | dboreham wrote:
       | Because someone needs a new boat?
        
       | shrimalpreeti wrote:
       | [Disclaimer: I work for a company that offers a Snowflake Cost
       | Optimizer product] We're an open-source monitoring & alerting
       | tool and many of our users were using it to set alerts on their
       | warehousing (Snowflake) costs. The problem is particularly bad
       | with Snowflake due to its lack of query level attribution of
       | costs and no in-built features for monitoring or recommendations
       | on improvements. We're building a Snowflake Cost Optimizer
       | (https://www.chaosgenius.io/snowflake-cost-optimizer.html) and
       | are hearing the same feedback from our customers as the author
       | mentions. Snowflake is definitely coming up with features towards
       | better cost transparency but I wonder if it's too little too
       | late.
        
         | evtx wrote:
         | In my experience Snowflake is very receptive to enhancement
         | requests. If you feel Snowflake should be doing something
         | better for surfacing optimizations, I'd ask them.
         | 
         | That said, I'm not sure your comment is fully accurate:
         | 
         | 1) "lack of query level attribution of costs": Snowflake doesn't
         | charge per query, so there can't be default query level
         | attribution of cost. Snowflake charges by second of warehouse
         | use. But you CAN easily see which queries ran on which
         | warehouse and allocate costs back to them using your own
         | criteria (by query second, usually better than by number of
         | queries).
         | 
         | 2) "no in-built features for monitoring": Snowflake has built-in
         | cost monitoring dashboards:
         | https://docs.snowflake.com/en/user-guide/cost-overview.html
         | And resource monitors:
         | https://docs.snowflake.com/en/user-guide/resource-monitors.h...
         | 
         | That said, I'm sure improvements could be made. Ask for them.
         | There must be a market for this because Capital One and
         | Acceldata and others offer similar solutions for optimization
         | recommendations.
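
The "allocate costs back by query second" approach described in the comment above can be sketched as a pro-rata split. This is a sketch under stated assumptions, not Snowflake's own method: the input shapes, warehouse names and team labels are hypothetical, and in practice the credits and per-query execution times would come from the account's usage and query history data.

```python
from collections import defaultdict


def allocate_by_query_seconds(warehouse_credits, queries):
    """Split each warehouse's billed credits across teams by query seconds.

    warehouse_credits: {warehouse: credits billed in the period}
    queries: iterable of (warehouse, team, execution_seconds)
    """
    seconds = defaultdict(lambda: defaultdict(float))
    for warehouse, team, secs in queries:
        seconds[warehouse][team] += secs

    allocation = defaultdict(float)
    for warehouse, per_team in seconds.items():
        total = sum(per_team.values())
        if total == 0:
            continue  # no measurable query time to allocate against
        billed = warehouse_credits.get(warehouse, 0.0)
        for team, secs in per_team.items():
            allocation[team] += billed * secs / total
    return dict(allocation)


# Hypothetical period: two warehouses, two teams.
credits = {"ETL_WH": 10.0, "BI_WH": 4.0}
queries = [
    ("ETL_WH", "data-eng", 300.0),
    ("ETL_WH", "analytics", 100.0),
    ("BI_WH", "analytics", 50.0),
]
shares = allocate_by_query_seconds(credits, queries)
```

Because billing is per second of warehouse uptime rather than per query, idle time between queries is spread across teams in the same proportion as their query seconds; that is a design choice, and other criteria are equally defensible.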
        
           | mejakethomas wrote:
           | This. Snowflake introspection five years ago looks very, very
           | different than today. Mostly due to enhancement requests.
        
       ___________________________________________________________________
       (page generated 2022-08-22 23:01 UTC)