[HN Gopher] Why is Snowflake so expensive
___________________________________________________________________
Why is Snowflake so expensive
Author : eyeball
Score : 302 points
Date : 2022-08-22 13:41 UTC (9 hours ago)
(HTM) web link (blog.devgenius.io)
(TXT) w3m dump (blog.devgenius.io)
| manassolanki wrote:
| Snowflake is expensive if not monitored properly, and on top of
| that they provide minimal observability. There are some good
| features like auto suspend and auto resume for cost savings, but
| there is still scope for optimisation. For example, they will
| charge you for a minimum of 1 minute even if your query runs for
| only 2 seconds.
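| A minimal sketch of that arithmetic. It assumes Snowflake's
| documented model for standard warehouses: per-second billing with
| a 60-second minimum each time a warehouse starts or resumes (not
| per query), and credits per hour that double with each size step.
| The short size labels and the `auto_suspend_seconds` parameter are
| illustrative, and rounding to whole seconds is ignored.

```python
# Credits per hour for standard warehouses; each size step doubles the rate.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
MIN_BILLED_SECONDS = 60  # minimum charged each time the warehouse resumes


def billed_seconds(query_seconds: float, auto_suspend_seconds: float = 0) -> float:
    """Seconds billed for one resume: the work plus idle time until
    auto-suspend, but never less than the 60-second minimum."""
    active = query_seconds + auto_suspend_seconds
    return max(active, MIN_BILLED_SECONDS)


def billed_credits(query_seconds: float, size: str, auto_suspend_seconds: float = 0) -> float:
    """Convert billed seconds into credits for a given warehouse size."""
    seconds = billed_seconds(query_seconds, auto_suspend_seconds)
    return seconds / 3600 * CREDITS_PER_HOUR[size]


# A 2-second query on a warehouse that resumes, runs it, then suspends.
print(billed_seconds(2))
# Fifteen such queries, each arriving after the warehouse has suspended again.
print(15 * billed_seconds(2))
```

| Under these assumptions the first line bills 60 seconds for 2
| seconds of work, and the second bills 900 seconds for 30 seconds of
| work. Routing short queries to a warehouse that is already running
| avoids paying the minimum repeatedly; the trade-off is the idle
| time billed until auto-suspend kicks in.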
| wsostt wrote:
| Snowflake is so expensive that Capital One has developed a
| toolkit for managing your instance.
|
| https://www.capitalone.com/software/solutions/
| marymac wrote:
| I'd love to talk to someone who has tried this out - I think
| it's called Slingshot
| glenjamin wrote:
| I don't know if this is the case at Snowflake, but there are
| similar seemingly misaligned incentives with CircleCI's build-
| seconds-based pricing model.
|
| However, the generally accepted wisdom there was that improving
| performance had always led to more builds being run - and so it
| still came out as a net positive. This had happened a bunch of
| times as we upgraded CPUs or storage drivers or the version -
| there'd be a short term drop in direct revenue, but then it would
| bounce back quickly as people took advantage of being able to do
| more stuff in the same amount of time.
|
| I'm told the revenue and finance people were pretty concerned the
| first time it happened though!
| idoh wrote:
| I work at Circle, but not on this specifically, and echo the
| same experience. This
| (https://en.wikipedia.org/wiki/Khazzoom-Brookes_postulate) was
| cited last week in a meeting that I was in, for example.
| morelisp wrote:
| I would guess that this is less likely to be true of Snowflake
| than CircleCI.
|
| Most dev teams are underinvested in CI. That is, if you queried
| some random team, they'd probably have a dozen ideas for tests
| or processes they'd like to write/run if they had the
| resources, most of which would provide some real value - the
| ideas likely coming from some previous actual bugs that hit
| prod.
|
| Most BI teams are _overinvested_ in data. They have way more
| than is valuable. Large scale analysis is mostly exploratory
| and speculative, and rarely yields results. Any induced usage
| is more from fear they might throw away the magic bits than
| real value being unlocked by better efficiency. (And I think
| this is probably necessarily true. Any BI process that gets to
| the point the data is clear and regularly actionable also gets
| operationalized and right-sized through a more normal dev
| process.)
| epberry wrote:
| > Not providing observability to monitor and reduce costs
|
| Vantage just launched this -
| https://www.vantage.sh/blog/vantage-launches-snowflake-suppo....
| The problems the author describes are almost exactly what we
| heard from customers:
|
| - list of users/queries that are the most expensive
|
| - alerts and notifications for costs
|
| - query timeout. Not something a third party can do, but there is
| an interesting 'query tagging' feature for Snowflake which
| Vantage supports.
| mejakethomas wrote:
| It's not expensive.
|
| What it can do, successfully, with three engineers was previously
| impossible with dozens.
|
| What IS expensive is not being careful with it.
| marymac wrote:
| THIS. Apply the correct guardrails and learn to optimize.
| carlineng wrote:
| [Disclaimer: former Snowflake employee]
|
| Snowflake is not expensive because of perverse incentives, which
| is the primary claim of the article. It is expensive because it
| is a highly differentiated and very sticky product.
|
| As others have mentioned, competition is the ultimate incentive
| to work on performance. Every dollar of Snowflake revenue is a
| dollar of revenue that Amazon, Google, Microsoft and Databricks
| are fighting for.
| mejakethomas wrote:
| This, 100%.
|
| It eats/consolidates formerly-disparate costs around the org.
| Because it's so good.
|
| Which makes it look expensive.
| discodave wrote:
| > Every dollar of Snowflake revenue is a dollar of revenue that
| Amazon, Google, Microsoft and Databricks are fighting for.
|
| This is true, but misses one detail...
|
| Snowflake _runs in the cloud_, so every dollar of Snowflake
| revenue is roughly $0.40^1 of Amazon/Google/Microsoft revenue
| anyway.
|
| ^1: Snowflake's gross margin is in the range of 50-60%
| https://www.macrotrends.net/stocks/charts/SNOW/snowflake/gro...
| klysm wrote:
| The two aren't mutually exclusive. They also have perverse incentives to
| leave optimization gremlins in, even if they are very low
| hanging fruit to remove. They also have the incentive to not
| document them well.
| daniel-cussen wrote:
| Oh like injecting jitter so there's no consistency in
| measurement?
| teej wrote:
| From what I can tell, the author is incorrect about the example
| given in "Optimizer gremlins". I tested an example on my own data
| and micro-partition pruning was active.
|
| The issue with dbt models in Snowflake is that if you ever
| perform a full-refresh and don't sort it, you ruin any natural
| clustering that arises from an incremental model. I've run into
| this issue many times. Auto-clustering gets too expensive at
| scale and Snowflake doesn't give you much guidance on
| alternatives.
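| A toy model of the effect described above, under loud
| simplifications: real micro-partitions hold roughly 50-500 MB of
| uncompressed data rather than a fixed 1,000 rows, and Snowflake
| keeps more per-partition metadata than the min/max used here. The
| sketch only shows why load order matters. When rows arrive in date
| order (incremental runs), each partition covers a narrow date
| range; a shuffled full refresh makes every partition span the whole
| range, so a date filter can prune almost nothing.

```python
import random

ROWS_PER_PARTITION = 1_000  # toy stand-in for Snowflake's size-based partitions


def micro_partitions(values):
    """Split rows, in load order, into fixed-size partitions and keep only
    each partition's (min, max), the metadata a pruning step consults."""
    chunks = (values[i:i + ROWS_PER_PARTITION]
              for i in range(0, len(values), ROWS_PER_PARTITION))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def partitions_scanned(meta, lo, hi):
    """Partitions whose [min, max] overlaps `day BETWEEN lo AND hi`;
    every other partition can be skipped without being read."""
    return sum(1 for pmin, pmax in meta if pmax >= lo and pmin <= hi)


# Hypothetical events table: 100 days with 1,000 rows per day.
days = [day for day in range(100) for _ in range(1_000)]

incremental = micro_partitions(days)                 # appended day by day
shuffled = days[:]
random.Random(42).shuffle(shuffled)                  # full refresh, no ORDER BY
full_refresh = micro_partitions(shuffled)
sorted_refresh = micro_partitions(sorted(shuffled))  # full refresh, ORDER BY day

for name, meta in [("incremental", incremental),
                   ("unsorted full refresh", full_refresh),
                   ("sorted full refresh", sorted_refresh)]:
    print(f"{name}: {partitions_scanned(meta, 90, 99)} of {len(meta)} partitions scanned")
```

| In dbt terms, the analogue would be an `order by` on the filter
| column in the model's final select before a full refresh; whether
| that beats auto-clustering on cost depends on the table.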
| toto444 wrote:
| The competition in the data warehousing industry is tough; if
| Snowflake is expensive, people will know. Current customers may
| not leave but it's going to be harder for them to get new
| customers.
| KingOfCoders wrote:
| Everyone seems expensive (Looker seems to be the most
| expensive), and vendors are hard to compare. When evaluating
| some of them for a migration project, they would not let us run
| performance tests with our data to compare them and make a
| decision (paid).
| buremba wrote:
| I believe they need to focus on performance, at least nowadays,
| because Databricks and BigQuery are also great products and they
| push Snowflake in terms of feature parity and performance.
|
| That being said, Snowflake is also pushing for a marketplace
| model where you publish your app natively, moving your code to
| where the customer's environment is. If they are successful,
| performance might no longer be one of the main reasons companies
| go with Snowflake, and the switching cost might be higher as
| companies embed more of their business logic in the system.
| jmacd wrote:
| Retrospectively, this is very similar to how most SaaS behaved
| when per user per month billing was first introduced. There were
| almost never any actual limits on the number of users you could
| add to the software, but you purchased a license for a certain
| number. Occasionally your account would be audited and you would
| be billed for the overage. It was always a significant penalty.
| The same was true for CPU-based licenses for things like IIS, SQL
| Server, Oracle, etc.
| 0xbadcafebee wrote:
| > Snowflake has no incentive to push a code change that makes
| things 20% faster because that can correspond to 10-20% drop in
| short-term revenue.
|
| If they improve performance they can lower the cost to customers,
| which will make the product more attractive to prospective
| customers. But if they are already swimming in cash they may not
| feel the need to gain more customers.
|
| Only threats prompt companies to improve things. Threat of a
| competitor, threat of losing all their money, threat of bad PR,
| threat of regulation, threat to the stock price, etc.
|
| I see this every day in companies that don't care about managing
| their cloud costs. They waste money like crazy because they
| literally don't care if they lose money, because some exec
| doesn't care, or they got enough funding until the next round,
| etc. A couple years later another exec asks why the CISO/CTO is
| spending so much money without any ROI, and then everybody has to
| stop everything they're doing to shave pennies off cloud costs.
|
| Companies run by individual executives are insane. I don't
| understand why people allow companies to be run this way. I think
| a co-op where employees could be active participants in the
| running of the company would allow for more sane decision-making.
| cedricd wrote:
| I'm glad the author also points out how customer (mis)use can
| blow up data warehouse costs too. No matter how efficient
| Snowflake could get, using the warehouse too much or with
| unnecessary queries will ultimately have a larger impact.
|
| The trend in the data space currently is for usage to increase --
| as more companies adopt dbt, they're running more and more
| prebuilt queries (materialized views) on a scheduled basis,
| rather than on demand. This is overall a good thing in that data
| is becoming easier to manage and use, but it does come with an
| increase in warehousing costs.
|
| I think eventually the pendulum will swing back to tools that
| help optimize warehouse usage, as long as they allow for the same
| increase in productivity as dbt (disclosure - I work for one such
| company)
| alberth wrote:
| This is all much simpler than the post makes it sound.
|
| It's usage-based pricing and customers are using more of it.
|
| > a customer that joins a year ago and spends $1 is paying out
| well over $1.7 a year later
|
| The entire article is based on this 1.7x "net dollar expansion"
| statement.
|
| After integrating Snowflake, customers have found value in using
| Snowflake and are using _more of it_ 1 year later.
|
| Since Snowflake is billed on usage, that explains the net-dollar
| expansion.
| imwillofficial wrote:
| It's easy to point out ways in which leaving foot guns in looks
| predatory. But that's not always the case.
|
| I work for AWS in billing, and the way we calculate bills is to
| try to get the customer the maximum discount.
|
| Things like calculating savings plan coverage from smallest to
| largest to maximize utilization, or turning Reserved Instance
| sharing on by default within an org.
|
| I would say that the seemingly gouging behavior is more often
| than not technical or time constraints.
| msluyter wrote:
| Some of these complaints seem fair to me, some not as much. tl;dr
| -- Snowflake requires a fair bit of knowledge/effort to use
| optimally.
|
| I spent a number of months last year focused on lowering
| Snowflake spend. In the process I learned a ton about Snowflake
| and gained a fair amount of respect for the product. Respect as
| in "this is really great" as well as respect as in "I need to be
| on guard here or I'm going to get hurt."
|
| I think my biggest misconception at the outset was thinking of
| Snowflake like it's a relational database. It's not. Or rather,
| it is with a large number of caveats. Snowflake doesn't have
| b-tree indexes -- rather it has "clustering keys," which are sort
| of like coarse-grained indexes that colocate data in
| micro-partitions, allowing queries to do micro-partition pruning. If
| you have a well clustered table and you're filtering on your
| clustering keys, things will be great. But if not, or, for
| example you have to do multi-table joins on non-clustered
| columns, you'll suffer. So unless you have search optimization
| enabled (which costs more!), you have to retrain yourself away
| from "oh, just add an index here or there to make things fast"
| type of thinking you may have had working with Postgres or
| whatnot.
|
| Regarding the author's complaints about lack of observability, I
| generally found it pretty easy to analyze what was going on via
| the query_history table. And the built in query analyzer is quite
| helpful. We did add tags to our dbt runs, which was pretty easy,
| and I wrote a handful of queries to find like the most expensive
| dbt models. It wasn't really that hard.
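| A sketch of the kind of "most expensive dbt models" query
| described above, written in Python over rows assumed to have been
| exported from the `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` view
| (`QUERY_TAG`, `TOTAL_ELAPSED_TIME` in milliseconds, `WAREHOUSE_SIZE`).
| The `dbt:<model>` tag format, the size labels and the sample rows
| are hypothetical. Charging each query its elapsed time times the
| warehouse rate is only a heuristic: concurrent queries share one
| running warehouse, so per-tag totals need not match the actual bill.

```python
from collections import defaultdict

# Credits per hour by warehouse size for standard warehouses (labels assumed).
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16}


def rank_by_estimated_credits(rows, top_n=3):
    """Sum estimated credits per query tag and return the top_n tags.

    Each row is (query_tag, total_elapsed_ms, warehouse_size). Untagged
    queries are grouped together so they still show up in the ranking.
    """
    totals = defaultdict(float)
    for tag, elapsed_ms, size in rows:
        hours = elapsed_ms / 3_600_000
        totals[tag or "<untagged>"] += hours * CREDITS_PER_HOUR[size]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical sample rows.
rows = [
    ("dbt:orders", 1_800_000, "Medium"),    # 30 min on Medium
    ("dbt:orders", 900_000, "Medium"),      # 15 min on Medium
    ("dbt:sessions", 3_600_000, "Small"),   # 60 min on Small
    ("dbt:customers", 600_000, "X-Small"),  # 10 min on X-Small
    (None, 360_000, "Large"),               # 6 min on Large, no tag set
]

for tag, credits in rank_by_estimated_credits(rows):
    print(f"{tag}: ~{credits:.2f} credits")
```

| Grouping untagged queries under one key makes it visible how much
| spend the tagging scheme is not yet covering.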
|
| That said, dbt in particular provides a number of foot guns wrt
| Snowflake. Subqueries, as the author mentions, are one. We created
| some custom dbt macros so that, instead of `select * from foo
| where x in (select * from blah)`, if blah was small, we'd query
| blah first and write the query using a literal list, like
| `select * from foo where x in ('a', 'b', 'c', ...)`.
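| The macros themselves were dbt (Jinja) code that isn't shown; this
| Python sketch only illustrates the idea: fetch the small lookup
| table first, then inline its values as a quoted literal list. The
| function name and the `max_values` cap are hypothetical, and it
| assumes `table` and `column` are trusted identifiers, since only
| the values are escaped (by doubling single quotes).

```python
def in_list_query(table: str, column: str, values: list[str], max_values: int = 1_000) -> str:
    """Rewrite `x in (select ... from blah)` as `x in ('a', 'b', ...)`
    once the (small) result of the subquery is known."""
    if len(values) > max_values:
        raise ValueError("lookup too large to inline; keep the subquery")
    if not values:
        # `in ()` is not valid SQL, and an empty subquery matches nothing.
        return f"select * from {table} where false"
    # Escape embedded single quotes by doubling them, per standard SQL.
    literals = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"select * from {table} where {column} in ({literals})"


print(in_list_query("foo", "x", ["a", "b", "c"]))
```

| Whether the literal list actually beats the subquery depends on the
| optimizer and the data, which is presumably why it was only done
| when blah was small.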
|
| Another issue we discovered is that in dbt it's trivial to create
| views. But we found that if views get too deeply nested,
| Snowflake can't adequately do predicate pushdown. So big stacks
| of views on views are suboptimal.
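| One way to spot deep view stacks before they bite is to walk the
| model dependency graph and count how many views sit on top of each
| other. The sketch uses a hypothetical graph and a hypothetical
| depth threshold of 3; the comment doesn't say at what depth
| pushdown started to fail. In this model, materializing an
| intermediate model as a table resets the count, since queries read
| it directly instead of expanding its SQL.

```python
# Hypothetical dbt-style project: model -> (materialization, upstream models).
# Assumes the graph is acyclic, as dbt requires.
MODELS = {
    "raw_events":   ("table", []),
    "stg_events":   ("view",  ["raw_events"]),
    "int_sessions": ("view",  ["stg_events"]),
    "int_users":    ("view",  ["int_sessions"]),
    "mart_daily":   ("view",  ["int_users", "stg_events"]),
    "mart_summary": ("table", ["mart_daily"]),
}
MAX_VIEW_DEPTH = 3  # hypothetical threshold


def view_depth(model: str) -> int:
    """Length of the longest chain of views that gets expanded inline
    when `model` is queried; a table stops the chain."""
    materialization, upstream = MODELS[model]
    if materialization != "view":
        return 0
    return 1 + max((view_depth(dep) for dep in upstream), default=0)


too_deep = [name for name in MODELS if view_depth(name) > MAX_VIEW_DEPTH]
print(too_deep)
```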
|
| Another interesting one was tests. Dbt makes it trivial to
| perform null or uniqueness checks against a column. We found we
| were spending a lot on those tests that simply were doing
| something like `select * from blah where col is null`. On non-
| cluster key columns or complex views, these were causing full
| table scans. We took a number of steps to mitigate those issues.
| (Combining queries; changing where we did these checks in the
| DAG.) The way tests are scheduled is problematic as well. One
| "long pole" test will keep your warehouse up and using credits
| even after the other 99.9% of the tests have completed. After
| some analysis we separated long pole tests from the others and
| put them on different warehouses.
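| "Combining queries" could look something like this sketch: one
| statement that computes every not-null and uniqueness count in a
| single pass over the table, where dbt's generic tests would each
| run as a separate query. It uses Snowflake's `COUNT_IF` aggregate;
| the function, table and column names are hypothetical. The combined
| check passes when every count in the returned row is 0.

```python
def combined_checks_query(table: str, not_null: list[str], unique: list[str]) -> str:
    """Build a single query that returns one row of failure counts."""
    exprs = [f"count_if({col} is null) as {col}__nulls" for col in not_null]
    # count(col) ignores NULLs, so the difference counts repeated non-null values.
    exprs += [f"count({col}) - count(distinct {col}) as {col}__dupes" for col in unique]
    return f"select {', '.join(exprs)} from {table}"


def failed_checks(result_row: dict) -> list[str]:
    """Names of the checks whose failure count is non-zero."""
    return [name for name, count in result_row.items() if count]


print(combined_checks_query("orders", ["id", "customer_id"], ["id"]))
```

| Here one scan replaces three. The trade-off is that a failure no
| longer maps one-to-one onto a separate test result, so the column
| aliases have to carry that information instead.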
|
| I could go on and on, actually, but I think that provides a taste
| of some of the complexities involved. Like almost any tool, you
| have to really understand it to use it effectively. But it's all
| too easy for, say, analysts, who may be blissfully unaware of the
| issues above, to write really poorly performing SQL on Snowflake.
| NonNefarious wrote:
| forbiddenlake wrote:
| It makes perfect sense if you know that Snowflake is a
| product/company. It just needs to be capitalized (and the
| trailing question mark restored).
| wink wrote:
| If only there was the possibility to link the first
| occurrence of a word to an external URL on a website.
| NonNefarious wrote:
| Or add a descriptive phrase to a headline. Heaven forbid.
| NonNefarious wrote:
| What an asinine excuse. "It makes sense if it makes sense to
| you." And "snowflake" wasn't capitalized, so it wasn't a
| proper name. And even if it were (as it is now, having been
| fixed after I posted the above complaint), it would be just
| another douchily obscure headline on HN. If you're too lazy
| to say WTF you're talking about in a headline, don't burst
| into tears when you're called out on it. And, oh man, you're
| not even the OP... even more pathetic.
|
| It's depressing to see insecure infants infecting HN with
| Reddit-style tantrums just because somebody said something
| mildly critical. If you're too gutless to demand better, at
| least STFU when others do.
| benjaminwootton wrote:
| The monthly bill does make me wince, but Snowflake of course
| includes all server and compute costs, no installation, initial
| configuration or upgrades etc. It's genuine SaaS.
|
| It's also very simple to manage and optimise so less DBA or
| DevOps type manpower.
|
| Then of course you can perfectly right size your instances and
| pay by the second for compute and by the byte for storage.
|
| Expensive, but lower TCO than alternate approaches I suspect.
| Keyframe wrote:
| _It's also very simple to manage and optimise so less DBA or
| DevOps type manpower.
|
| Then of course you can perfectly right size your instances and
| pay by the second for compute and by the byte for storage._
|
| These two are connected vessels.
| jeffwask wrote:
| Yeah...100%. It's expensive til you try running a data
| warehouse yourself and have to hire in to support it.
|
| Like any other service there are scale points where it no
| longer makes sense but for most smaller orgs it's still a
| bargain over DIY
| nojito wrote:
| We did a cost analysis and found databricks and BQ to be
| cheaper than a similar snowflake build out.
|
| I think people are falling into a trap of not considering
| costs because "it takes care of everything".
| dominotw wrote:
| > cost analysis and found databricks and BQ to be cheaper
| than a similar snowflake build out.
|
| Wouldn't this mean Snowflake has priced its product
| uncompetitively? Why would they do that if it's so obvious
| that everyone would just save money by switching to DB?
|
| > I think people are falling into a trap
|
| Is this their product strategy? To take advantage of
| gullible businesses falling into their trap?
|
| Surely building a whole business around customers falling
| into trap has to backfire at some point.
| georgewfraser wrote:
| The core claim of this article, that Snowflake doesn't implement
| optimizations that would reduce usage, is not true. Search
| optimized tables, partitioned tables, and per-second billing are
| all counterexamples.
| darksaints wrote:
| > We have 5-6 very good open-source data warehouse alternatives.
| We have Redshift, DataBricks, Firebolt, BigQuery, and likely a
| few other enterprise offerings, yet it is surprising how little
| training most companies have in negotiating and re-negotiating
| vendor contracts or in pushing for heavily discounted pricing.
|
| Small nit: Redshift isn't open source. I would also add
| ClickHouse, Citus, and TimescaleDB as highly capable open source
| technologies with commercial offerings in this space.
| falcolas wrote:
| Snowflake is a bit generic to easily find - and the article has
| no hyperlinks - anybody have a one sentence summary?
|
| EDIT: There it is: https://www.snowflake.com/
|
| Data warehousing, basically.
| thesandlord wrote:
| It's a data warehouse, like Google BigQuery or AWS Redshift /
| Athena
| rsweeney21 wrote:
| This is a great example of misaligned incentives.
|
| Another example of misaligned incentives is LinkedIn. LinkedIn
| charges $3/message. The more messages sent on their platform, the
| more money they make. They are not incentivized to help sales or
| recruiters target the right people. It can be a cash cow in the
| short term, but it creates a negative experience for your users.
|
| The fact that it has worked for so long is a testament to how
| strong network effects are.
|
| In the case of Snowflake, high switching costs will protect them
| for a while.
| twawaaay wrote:
| Snowflake is not expensive. Snowflake is super cheap, _IF_ you
| know what it is for and how to use it. Compared to if you had to
| solve the problem on your own.
|
| The best way to describe Snowflake is that it is a brute force
| method to run complex queries without creating indexes.
|
| If you have a more traditional database, you will notice you need
| to set up indexes to be able to get anything from it in finite
| time. What if you don't know the indexes upfront? What if you
| want your users to be able to ask arbitrary queries and get
| answers before bedtime?
|
| That's what Snowflake is for. It automates using an _ENORMOUS_
| amount of hardware to get your query executed fast, but very
| inefficiently.
|
| It is not free though. That inefficiency means queries use a lot
| of resources. It is meant for those few queries
| when your users try to get some insight into your data and you
| can't predict indexes beforehand. Sometimes this is exactly what
| you want, like when you let your data people in to figure stuff
| out. Or when you have very rare functionality that allows the
| user to build their own queries -- which you should avoid like
| hell (and there are tricks to make it index pretty well) but
| can't always avoid.
|
| For everything else, whenever you can predict your indexes, you
| always want to use a more traditional database that can be very
| efficient on queries properly supported by indexes.
|
| The issue is a lot of people try to use Snowflake as a database
| or to support frequently executing queries of the same kind. This
| is bad and it will cost you.
| danielmarkbruce wrote:
| Materialized views help with this. It might not be perfect, but
| it isn't as bad as you say.
| ssalka wrote:
| > The issue is a lot of people try to use Snowflake as a
| database or to support frequently executing queries of the same
| kind. This is bad and it will cost you.
|
| It seems totally natural to expect these use cases to be well-
| supported & cost-efficient. That they're not I think is likely
| to be misunderstood by a great many people, even technical
| folks.
| JustLurking2022 wrote:
| Honestly, in the financial world, I think the value proposition
| may be less about anything to do with the query capabilities
| and more about the permissions model. Making it simple to
| provide clients with visibility into their data in a structured
| way that doesn't involve shoveling around text files (with
| numerous formatting gremlins to worry about) is a huge win in
| and of itself.
| zurfer wrote:
| It is fair to criticize that some workloads on Snowflake are
| expensive.
|
| What I found however is that Snowflake is indeed super cheap if
| we look at Total Cost of Ownership (TCO). Compared with other
| cloud data warehouses it is even easy to control costs
| (warehouse size with autosuspend and resource monitors).
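| For readers who haven't used them: a resource monitor sets a credit
| quota and fires actions at percentage thresholds of that quota,
| configured in SQL with `CREATE RESOURCE MONITOR` and clauses like
| `ON 75 PERCENT DO NOTIFY`. SUSPEND lets running queries finish
| before suspending; SUSPEND_IMMEDIATE cancels them. The Python below
| only simulates that decision logic with hypothetical thresholds; it
| is not Snowflake's API.

```python
# Hypothetical triggers: (percent of quota, action), mirroring the shape of
# a resource monitor's "ON <n> PERCENT DO <action>" clauses.
TRIGGERS = [(75, "NOTIFY"), (100, "SUSPEND"), (110, "SUSPEND_IMMEDIATE")]


def fired_actions(credit_quota: float, credits_used: float, triggers=TRIGGERS) -> list[str]:
    """Every action whose threshold has been reached in the current period."""
    used_pct = 100 * credits_used / credit_quota
    return [action for threshold, action in triggers if used_pct >= threshold]


for used in (50, 80, 105, 120):
    print(used, fired_actions(100, used))
```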
|
| I work with many Snowflake customers and the biggest cost they
| are concerned with is usually training users so they don't
| shoot themselves (wrong joins, external programs "pinging" the
| service, ...).
|
| Snowflake is mainly expensive because of usage, not because of
| bad query optimization.
|
| (Co-Founder at https://www.sled.so/)
| awinder wrote:
| I think the main metric that this is built on may be too coarse
| to support the conclusion the article draws from it. There's conjecture
| that what's driving this is more querying over the same dataset
| (more streamlit dashboards) but it could just as easily be
| expanding usage inside of companies. That's what's going on at my
| company right now, more teams using snowflake, more data being
| pushed in to replace existing workflows, etc.
|
| I'm also not sure I understand the dig at streamlit dashboards.
| If you're running hardware and introduce new read workflows,
| eventually you'll need more read replicas and you'll pay more for
| it. Maybe you can argue that snowflake is doing this at a higher
| cost but the metric data is not available in the sources to make
| that claim.
| pykello wrote:
| (I am not affiliated with Keebo, although I had a recruiting
| meeting with them earlier this year)
|
| FWIW, Keebo (https://keebo.ai/) tries to solve this problem &
| reduce your Snowflake bill by using Data Learning techniques. It
| can be configured to return exact results or approximate results.
| not-my-account wrote:
| It is always interesting seeing companies building up on the
| products / services of other companies. Kinda like TurboTax
| built on the IRS, these "children" (is there a better term?)
| companies are quite dependent on the "parent" company not
| changing or improving its product / service.
|
| I don't see AWS changing so dramatically that companies like
| DataBricks are put in hot water (but I could be wrong), but I
| could see Snowflake improving its product due to competition,
| putting Keebo in a tough situation.
| morelisp wrote:
| By the time I reached this comment I counted no fewer than
| five completely separate links to offerings to help reduce
| your Snowflake bill. For something that is already a focused
| SaaS product, I have to say that starts to smell a bit.
| hobs wrote:
| I am like 95% sure that the MAX issue he mentions is wrong - I
| just modified some windowing function based approaches to the one
| he mentions and it's several orders of magnitude faster because of partition
| elimination.
|
| Nonetheless I agree with the basic points of the article.
| jwie wrote:
| You would think they would be saving (and charging the customer!)
| a bundle not enforcing constraints on their tables.
|
| I'd be very interested to hear the Snowflake side of this
| decision, but to the customer it's simply unforgivable to have
| cosmetic constraints on a database.
| dominotw wrote:
| Because snowflake doesn't build foreign key indexes. Imagine
| clickstream data where every insert is being checked against an
| index of customers. This isn't a typical usecase for big data
| warehouses.
| jwie wrote:
| I understand that. But why have constraints that don't do
| anything?
| atwebb wrote:
| Metadata
|
| Tools and scripts can work off of it, design decisions are
| documented, suggestions can be made, inferences can be made
| (some dangerous, some not).
|
| Why tag S3 objects if it doesn't enforce a schema? Maybe a
| bad analogue but I'm going quick right now :).
| evtx wrote:
| There are plenty of reasons why MPP databases allow the
| definition of constraints but don't enforce them. I'll list
| two: 1) BI tools can use them to optimize joins 2) Data
| modeling tools can use them to reverse engineer models
| without having to pattern match the keys.
|
| That said, Snowflake does support constraints if you use
| hybrid tables (a preview feature announced at their last
| conference).
| veeti wrote:
| Do you really need functional constraints in an OLAP database?
| Surely such validations already exist wherever your data is
| coming from.
| Foobar8568 wrote:
| Ohohoh yeah sure, you mean application-based constraints? Or
| an entity-attribute-value based application? What about
| documents?
| marcinzm wrote:
| Do you have any data on the pricing of distributed databases
| that do support proper foreign key constraints? And how it
| stacks against Snowflake pricing?
| flyinglizard wrote:
| Where does all this data go? It's processed and then what? Sent
| to decision makers? Used to run automated processes?
|
| I'm genuinely curious and would appreciate anyone who could show
| a real life example of this kind of pipeline where data is
| accumulated, then processed, then turned into revenue at the
| other end.
|
| I've implemented systems that do this but my experience is that
| accumulating data is (too) easy, processing it in a meaningful
| way is slightly more challenging but ultimately driving positive
| business processes according to this data, which requires a lot of
| friction with employees (training, procedures, maintenance,
| support) is the most difficult part.
| lysecret wrote:
| Same experience. I think the most interesting and most public
| example of such a pipeline is Google building a search index.
| This is also where a lot of the methods originally came from.
| Nowadays a lot of this will be used to build recommendation
| systems / feature pipelines for ML.
| frankbinette wrote:
| These examples are a bit too advanced. Think of simple
| descriptive statistics which is still so important yet not
| sexy as ML/DL/AI. ML is great, but the main usage behind
| these data technologies is still simple business
| intelligence.
|
| Every business in every market needs to understand what is
| going on with their processes. How many sales did I do
| yesterday, last week, last month, compared to last year, in
| which stores, what is the average basket amount, customers
| buy what with what, what size t-shirt do I sell the most,
| etc.
| frankbinette wrote:
| Seems like you kind of answered your own question... this data
| is used for business intelligence purposes.
| jjfoooo4 wrote:
| This is a kind of poor engineering writing in which the author
| finds a product to not be tailored to his precise tastes and
| concludes it is because the company is user hostile and/or
| doomed.
|
| The bit about Snowflake not being incentivized to care about
| costs is trivially untrue. The rest of the article perceives
| trade offs as simple feature gaps.
|
| For example, Snowflake gives the user more latitude to distribute
| workloads among "warehouses" than other offerings. With poor
| distribution the author will experience the workload provisioning
| issues he describes.
| beoberha wrote:
| I disagree with the assertion that Snowflake has no incentive to
| improve performance. While I don't work for Snowflake, I work for
| a competitor and we're constantly looking to improve performance
| to make customers happy.
|
| For the exact reason that the article claims Snowflake wouldn't
| innovate, I'd assert that they would. If they are expensive and
| slow, and a competitor is faster and cheaper, eventually they
| will see business move to the competitor. We see it all the time.
| wpietri wrote:
| Could you say more about the relative market position of your
| two companies?
|
| I don't know the market at all, but Snowflake is certainly
| large and successful (IPOed in 2020, $50bn market cap). I could
| readily imagine that a company doing so well might not feel the
| incentive to improve very strongly. Or that they might see
| themselves more as a sales/marketing-led company than one where
| technical quality is a key driver. Whereas you folks as a
| challenger would have a lot more incentive to differentiate
| yourselves.
| beoberha wrote:
| You could probably google my username and find out, but I'll
| say we're bigger than Snowflake and are very much entrenched
| in the enterprise database market :)
| PaulWaldman wrote:
| Churn for these services takes a long time. They are "sticky"
| and have the baggage of enterprise agreements. With the
| switching costs never being zero, if SLAs are being met, it's
| exceedingly difficult to switch vendors.
|
| Alternatively there is a faster impact on new sign-ups when
| falling behind competitors on costs and benchmarks.
| tomnipotent wrote:
| > have the baggage of enterprise agreements
|
| Snowflake lets you roll into pay-as-you-go after a contract
| expires.
| danielmarkbruce wrote:
| They are all out to get new logos. They spent about $800m on
| S&M (TTM) vs. $1.4 billion in revenue. They aren't milking their customer
| base for cashflow.
|
| And large customers are moving to them in droves.
| dominotw wrote:
| Their stock price is pegged at new customer acquisition. They
| signed up over 6k new customers last qtr. This is one of
| their top stats that they present to investors.
| beoberha wrote:
| I worded it poorly, but I don't necessarily mean a full
| exodus from the platform. In my experience, large enterprises
| have a lot of workloads running on different technologies
| (for whatever reasons) and the migration to cloud is a multi-
| year effort. If someone is just dipping their toe into
| Snowflake with easy-to-migrate workloads (which is very
| likely given their relative age in the market) and see
| performance and cost issues with those workloads, they may be
| hesitant to migrate the bigger ones and use that as leverage
| to get Snowflake to improve.
| cs702 wrote:
| Exactly. For enterprise customers in particular, replacing a
| SaaS tool that's deeply intertwined with many internal
| systems is about as easy and convenient as it is for a
| homeowner to rip out his/her home's existing HVAC system to
| replace it with a newer, more efficient one. No one ever
| wants to do _that_ -- unless there's absolutely no other
| choice.
| cs702 wrote:
| Great article. On the surface, it's about Snowflake. At a deeper
| level, the article is about the perverse incentives motivating
| SaaS businesses to do seemingly dumb, inefficient things and
| avoid seemingly obvious optimizations by default.
|
| Many SaaS businesses are perfectly happy to let customers shoot
| themselves in the foot if it generates more revenue. The BigQuery
| example (presently, by default, `select * from table limit 10`
| obediently scans the entire table at _your_ expense!) is spot-on.
|
| As the article so well puts it, every SaaS company has a vested
| financial interest "to leave optimization gremlins in."
| kolinko wrote:
| in case of BigQuery it makes sense though - they use map reduce
| on distributed clusters, so there is no easy way to stop after
| 10 results are found
| JimmyAustin wrote:
| It's pretty easy to limit the number of results returned by
| each partition to 10, then have that further reduced to 10
| total during the reduce step.
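The per-partition limit followed by a final cut can be sketched in plain Python, with lists standing in for partitions. The helper names are hypothetical, and the sketch assumes there is no `ORDER BY`, so any n rows are a valid answer (with an `ORDER BY`, each partition would have to return its own top n by the sort key before the final cut).

```python
from itertools import chain, islice


def limit_partition(rows, n):
    """Map side: a worker stops after emitting at most n rows."""
    return list(islice(rows, n))


def distributed_limit(partitions, n):
    """Reduce side: trim the union of per-partition results to n rows total."""
    partial = [limit_partition(p, n) for p in partitions]
    # The reducer sees at most n * len(partitions) rows, not the whole table.
    return list(islice(chain.from_iterable(partial), n))


partitions = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
print(distributed_limit(partitions, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Whether this saves money depends on the engine actually stopping its scan early; under bytes-scanned billing the column data may still be read in full.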
| danielmarkbruce wrote:
| It's a terrible article. The author misunderstands
| _competition_ and how much it drives products in this area.
| Snowflake is incentivized to make their product better on every
| dimension. If Snowflake doesn't improve, customers will leave in
| droves - _like when they moved to Snowflake_.
|
| In practice, as has been pointed out in other comments, they do
| improve their performance (for competitive reasons) and it does
| cost them money when they do it. They did it a couple of quarters
| ago and left $97 million on the table.
|
| https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
| rurp wrote:
| There are many degrees of optimization and clearly there's
| _some_ cost to bad performance, but Snowflake still has a
| massive perverse incentive to not spend too much effort on
| improving performance. If Snowflake is like every software
| company I've ever been involved with, there are many
| competing projects at any given time and direct revenue
| impact is a big factor in what gets prioritized.
|
| My own experience with Snowflake absolutely backs up the
| article's point. At my work we routinely encounter abysmal
| performance for certain types of queries, due to a flaw on
| Snowflake's side. We have had numerous talks with them and
| there is no question that they have an issue, but they have
| shown absolutely no urgency to fix it. Their recommendation
| is that _we_ spend more money to work around the problem on
| their end.
| geoduck14 wrote:
| >At my work we routinely encounter abysmal performance for
| certain types of queries, due to a flaw on Snowflake's
| side.
|
| Do tell! I'm a current Snowflake customer, I'd like to know
| what to look out for.
| uoaei wrote:
| I don't think it misunderstands business competition. In fact
| it understands the concept of competition very well, and
| develops an insightful critique into the perverse incentives
| that are borne from competition.
|
| It benefits no one except for a couple thousand people to so
| blatantly play their customers in this way. In fact, it's
| worse, as it incentivizes that same behavior of other market
| actors in the space.
| danielmarkbruce wrote:
| What exactly in the article suggests the author understands
| the pressure of competition on incentives?
|
| The author states that Snowflake are not incentivized to
| increase performance due to short term revenue concerns but
| doesn't mention they are also incentivized to do the
| opposite from a competitive perspective. The result is
| incomplete enough that it ends up being flat wrong with
| respect to the behavior that the company actually engages
| in.
|
| The author missed the fact Snowflake did the very thing
| he/she suggested they were incentivized not to do,
| recently, at a cost of $97 million. The CEO explained _why_
| they are doing it and how they are _actually_ incentivized.
| I don't know how the article could miss the mark by more
| than it has. The company literally does the opposite of
| what he/she suggested. It's not like they are the only one,
| either: AWS has a history of reducing prices. Why? Once
| again, competition.
| morelisp wrote:
| > The CEO explained why they are doing it and how they
| are actually incentivized.
|
| The CEO explained why he thinks it's a good long term
| plan... but for now, they get money i.e. are _actually
| incentivized_ by slow code. The CEO's incentives are
| theoretical ones.
|
| And the market, which ultimately controls whether the CEO
| gets to continue that plan or not, did not seem to agree
| it was a good plan.
| danielmarkbruce wrote:
| By this reasoning, everyone would shirk at work. If you
| think incentives only act over short time horizons, I
| don't know how you explain an enormous amount of human
| behavior.
|
| The market didn't even understand it. Most of the people
| trading equities, especially around earnings
| announcements, don't know what a data warehouse is or
| what matters in that market. All they saw was "miss".
| morelisp wrote:
| I didn't say the CEO was wrong or that long-term thinking
| is bad! I said the _actual incentives_ are still
| misaligned. (I mean, a lot of people _do_ shirk at work,
| and it even works out well for them.)
|
| I think you have a weird and probably not useful
| definition of "actual" if "monthly revenue" is not actual
| but "projected monthly revenue two years from now" is
| actual. (Or maybe I've just lived in Germany too long.)
| danielmarkbruce wrote:
| You are right, I've used the word "actual" incorrectly.
| What I should have said was "net". I.e., both short-term
| and long-term revenue incentivize behavior, and in this
| case the net result was increasing performance, i.e. the
| long-term incentive > the short-term incentive.
| AdamProut wrote:
| We regularly benchmark the "big 3" Cloud Data warehouses -
| Redshift, Snowflake and BigQuery at SingleStore. Their
| performance is very close to the same (within 10-20%) on most
| benchmarks on reasonable sized data sets (10s of TB).
|
| I agree: if the performance of one of them fell behind the
| others for any prolonged period of time, the cost to the
| laggard in market share would be much worse than the
| short-term revenue gain of "being slow on purpose".
| simo7 wrote:
| The main flaw of the article is not controlling for product
| category.
|
| I suspect most data warehouses have similar NDRs.
|
| In many companies a data warehouse is the place where you
| dump all your data and let everyone run poorly written
| programs against it.
|
| Add to that a poor engineering culture in data teams (often
| led by non-technical people) and costs are bound to
| skyrocket.
| didgetmaster wrote:
| While I think it is definitely in a company's best long term
| interest to implement features that benefit its customers; it
| might not be in the best interest of those who are currently
| running the company.
|
| We have seen many, many examples of executives who are
| willing to sacrifice the future of the company to get a
| personal short-term gain. Jacking up revenues (or slashing
| costs) in ways that alienate customers is a great strategy
| when you plan to jump off with your golden parachute in a
| couple of years when all your stock options vest.
| jjfoooo4 wrote:
| Sure but to not even mention churn as something Snowflake
| is worried about is pretty silly. With the funding
| environment taking a dramatic turn they (and every other
| SaaS company) are going to be deeply concerned about price
| competition and churn
| danielmarkbruce wrote:
| Agreed. But a good article should have shown an example
| rather than a counter example. Intel might have been a good
| example. A good article would have shown the competing
| incentives at play rather than a single incentive.
| thehappypm wrote:
| I worked at a BigQuery shop and they have a terrific feature
| where right next to the "Run query" button there is an estimate
| of the cost of the query, in bytes. It becomes extremely
| obvious when a query is a full table scan.
| dcow wrote:
| I was thinking about this too. Why don't SaaS companies just
| force price increases to offset their broken pricing model?
| Nobody would care, you're paying the same you were paying
| yesterday. If you're still the best in class product with
| sticky features people will stay. If not and you're competing,
| then you have the opportunity to reduce the price in the future
| or simply not increase it and let users see lower bills which
| might also retain them.
| makk wrote:
| > As the article so well puts it, every SaaS company has a
| vested financial interest "to leave optimization gremlins in."
|
| It depends on the time scale. A SaaS optimizing for, say, a 1-3
| year financial return will see their interests through a
| different lens than one optimizing for a multi-decade return.
| Leaving optimization gremlins in isn't aligned with customers'
| interests in the long run, so the customers will eventually
| find alternatives if the SaaS doesn't eventually align itself
| with customers.
| smugma wrote:
| "As an investor, I expect Snowflake to show amazing
| profitability and record-breaking revenue numbers. As an
| Engineer, if Snowflake continues on the current path of
| ignoring performance, I expect them to lose share to the
| open-source community or some other competitor, eventually
| walking down the path of Oracle and Teradata. Here are a few
| things I think they can do to stay relevant in five years."
| danielmarkbruce wrote:
| The point is incentives.
| Aulig wrote:
| It feels like these companies haven't found the right value
| metric to price along. Ideally it should align with the value
| the customer receives.
| altdataseller wrote:
| But that's almost impossible for Snowflake to measure. How
| would they know how much more revenue you earned because you
| use Snowflake?
| Rastonbury wrote:
| I don't think their customers could quantify it if they
| tried (and I'm not implying Snowflake doesn't give value;
| it probably does, but how does a company attribute it?)
| polskibus wrote:
| Only competition can enforce this. The article ideally
| demonstrates the problems with monopolies and vendor lock-in.
| danielmarkbruce wrote:
| Snowflake is nowhere near a monopoly, and plenty of
| customers have moved from other vendors (Teradata, Netezza,
| etc) to Snowflake - showing that vendor lock-in is not as
| strong as it might seem.
| carimura wrote:
| Close. Product pricing is based on a variety of _perceived_
| factors (value, cost of change, risk of loss, etc.)
| deepGem wrote:
| Wow, their statement about not participating in benchmarking
| wars is alarming. In this day and age, when benchmarking tools
| are so inexpensive and almost everything is very transparent,
| why not participate?
|
| Or even better engage with a neutral third party such as Jepsen
| to get on an even playing field and duke it out.
| danielmarkbruce wrote:
| Because their value prop isn't being #1 on benchmarks. It's
| about:
|
| * being easy to manage
| * being able to scale compute up and down so you can get good
| performance without having to keep a bunch of machines running.
| datavirtue wrote:
| Because their business is providing a solution that IT failed
| to provide. The large cost, which the business was already
| accustomed to from previous IT attempts, pales in comparison
| to the additional costs of doing it themselves.
|
| It's like the cloud in general, the cost is high but so is
| the hype. When all that dust settles over the coming years
| the business will start shopping on price. They will then
| realize they have been locked in to some extent and will need
| to start wriggling loose of the lock-in.
| lokar wrote:
| Benchmark results rarely predict actual application perf. You
| need to run your own queries against your own data. Do a real
| POC.
| twistedpair wrote:
| FWIW, BigQuery tables can be configured to require a partition
| filter clause [0] in the SQL query, so that you cannot shoot
| yourself in the foot like that. Now if they'd just make an
| Organization Policy to let you turn it on by default for all
| new tables.
|
| [0] https://cloud.google.com/bigquery/docs/querying-partitioned-...
| cs702 wrote:
| Yes. That's exactly the OP's point: It's up to _you_ to
| remember to do the extra work necessary to avoid shooting
| yourself in the foot by default.
| tluyben2 wrote:
| Funny that most people here advocate AWS, which has tons
| and tons of foot-shooting tools that cost people thousands of
| USD all the time. And we just accept it. For example, if you
| want to kill a complex cluster with one API call or button
| click, it won't let you, for xyz reasons; that's not because
| they can't, it's because you will just let it be and that makes
| money.
| soheil wrote:
| If that lowers the barrier to entry, so you don't need expert-level
| knowledge of what a full table scan even means, why not?
| Instead of hiring a dba maybe you could hire an intern instead
| and happily eat the cost of Snowflake.
| florbo wrote:
| It doesn't lower barriers to entry, it's contrary to logical
| expectations for someone unfamiliar with how BQ works. If the
| query is limited to 10 results you wouldn't expect it to scan
| all 2 trillion of your records. Granted there are numerous
| warnings in the GUI for these types of things but make this
| mistake in Python and you're none the wiser.
| soheil wrote:
| Wait are you saying the BQ db engine is not following
| logical expectations? You do realize a "limit" clause
| doesn't prevent a full table scan in all cases, right?
| horsawlarway wrote:
| and that db expert you just recommended against hiring
| could surely tell you that... The intern won't.
| kalimoxto wrote:
| I think the point of the article is that an optimizer doesn't
| affect the barrier to entry at all, but adding it would save
| end users quite a bit of money. So they don't do it because
| end users' money is revenue for Snowflake/Alphabet
| soheil wrote:
| If you could just add an optimizer why doesn't the db
| engine just do that?
| whimsicalism wrote:
| Take a step back and reread the article and the comments
| you are replying to.
| spmurrayzzz wrote:
| Well said. I'd also add a cynical note that the recurring
| revenue model is incentivized to keep the gremlins around not
| just because of the impact to metered costs, but also because
| off-ramping is that much more difficult once engineers
| implement workaround/solutions to mitigate the impact of those
| smells.
|
| Just another way that vendor lock-in occurs (intentionally or
| otherwise).
| scarface74 wrote:
| Standard disclaimer: I work at AWS in consulting and could
| easily be accused of drinking the Kool Aid.
|
| Everyone from consultants, SAs, Sales, support etc is
| constantly working toward getting customers to "optimize" their
| spend. Of course any business wants you to give them more
| money. But, none of us are pushed to get them to spend money on
| services or methods to do things inefficiently.
|
| I specifically work in consulting specializing in "application
| modernization". That means most of my implementations are cheap
| and I'm constantly spending time making sure my implementation
| is as cheap as possible and still meets the requirements. I first
| noticed this attitude from AWS when I was working for a
| startup.
|
| This isn't just with AWS. I spent years working in enterprise
| shops and saw the same attitude working with Microsoft.
|
| I can't speak for any other large organizations - AWS and
| Microsoft are the only two I've worked with as either a
| customer or employee where there was huge spending on
| infrastructure or software.
|
| Now I could easily get started about my opinion of Oracle from
| the customer standpoint. But I won't.
| benreesman wrote:
| Alright I'll bite finally. What do these companies do? Neither
| Snowflake's front-facing website, nor the Wikipedia article, nor
| this post tell me why people pay all this money.
|
| I know a bit about the effort involved in chucking around 100
| petabyte datasets, and there are numerous niches a SaaS could
| fill in there, but it's very murky from the outside.
| Croftengea wrote:
| I was wondering the same thing. This sums it up pretty well, I
| guess:
|
| > The best way to describe Snowflake is that it is a brute
| force method to run complex queries without creating indexes.
|
| (https://news.ycombinator.com/item?id=32554072)
| benreesman wrote:
| Column stores on DFS are without a doubt tricky beasts. It's
| a very rich field technically.
|
| I guess I'm trying to get a read on whether their core
| competency / moat is distributed columnar query technology or
| sales/support/marketing.
| colinmhayes wrote:
| Snowflake is slower and more expensive than competitors.
| I'd say its moat is mostly that it's extremely easy to set
| up and start using without technical support. If you've
| just got a small team and no one wants to do data
| engineering snowflake makes that possible, or at least much
| easier. Most users are generally happy, and they've
| followed the cloud playbook of making it hard to switch
| off, so even when teams have scaled to the level where
| secondary indexes and data support staff makes sense the
| team is still happy with snowflake.
| joelthelion wrote:
| But why not create indexes? I mean, I understand why
| sometimes you don't want an index. But building an
| entire warehouse around the idea of "no indexes"? Really?
| idunno246 wrote:
| These tend to be for one-off analytical queries. You want every
| user with flag X > 10 joined against five other tables, each
| with similar filters. You don't know ahead of time what that
| query is (your analyst thought of it this morning), so you
| can't make indices ahead of time. And it'll never run again,
| so you don't need to take the performance hit of keeping an
| index. And someone has to decide which indices to keep, but
| app engineers aren't best utilized figuring out indices for
| analysts.
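A rough comparison-count model of that trade-off: a one-off query pays for one full scan, while an index pays about n*log2(n) to build plus about log2(n) per lookup. The function names are hypothetical, and the model ignores I/O, memory, and index maintenance on writes; it only shows how many repeats it takes before building an index wins.

```python
import math


def scan_cost(n, queries):
    """Full scan: every query touches every row."""
    return n * queries


def index_cost(n, queries):
    """Build a sorted index once (~n*log2 n), then ~log2 n per lookup."""
    return n * math.log2(n) + queries * math.log2(n)


def break_even_queries(n):
    """Smallest number of repeated lookups for which building the index is cheaper."""
    k = 1
    while index_cost(n, k) >= scan_cost(n, k):
        k += 1
    return k


print(break_even_queries(1_000_000))  # 20
```

In this toy model a million-row table needs about 20 repeats of the same lookup before the index pays for itself, which an analyst's one-off query never reaches.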
|
| Not needing indices is nice, but the bigger selling point for me
| is that if you have many services, and each service's data is
| in the warehouse, you can join them all together.
| buttaphingas wrote:
| It's all around the ethos of ease of use. Snowflake does a
| lot of smarts in the background so that you don't have the
| overhead of managing indexes. And not just indexes, there
| is just less human intervention required overall compared
| to something like Teradata or even a modern lakehouse.
|
| That said, they've kind of introduced it with the Search
| Optimization Service, which is like an index across the
| whole table for fast lookups, but even that is
| automatically maintained on your behalf.
| benreesman wrote:
| My experience with "Big Data" is pretty dated, 5 years at
| least. At that time I think a good cutoff for "big data"
| might have been like a petabyte +/- a factor of 10
| depending on your gear. I imagine now even 1PB is probably
| pretty mild by "big data" standards.
|
| But once you're up in that "I can't even fit this in a
| 4-8U sled" territory (whatever it is in a given decade)
| you're probably doing some kind of map/reduce thing, so
| there's a strong incentive to have a column-major layout.
| If you can periodically sort by some important column so
| much the better (log2 n binary search), but mostly you've
| got a bunch of mappers (which you work hard to get locality
| on relative to the DFS replicas where the disks live, maybe
| on the same machine, maybe in the same top-of-rack switch
| or whatever) zipping through different columns or column
| sets and producing eligible conceptual "rows" to go into
| your "shuffle/sort/reduce" pipeline to deal with joins and
| sorts and stuff like that.
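A single-machine sketch of the column-major layout plus periodic sort, using Python's `bisect` for the O(log n) search on the sorted column. It does not model the DFS, mapper locality, or the shuffle/sort/reduce stage; the table and helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table stored column-major: one list per column, aligned by row index.
TABLE = {
    "ts": [3, 1, 4, 1, 5, 9, 2, 6],
    "user": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80],
}


def sort_by(columns, key):
    """Periodic maintenance: reorder every column by one important column."""
    order = sorted(range(len(columns[key])), key=columns[key].__getitem__)
    return {name: [col[i] for i in order] for name, col in columns.items()}


def range_scan(columns, key, lo, hi, project):
    """Binary-search the sorted key column, then read only the projected columns."""
    start = bisect_left(columns[key], lo)
    end = bisect_right(columns[key], hi)
    return list(zip(*(columns[name][start:end] for name in project)))


sorted_table = sort_by(TABLE, "ts")
print(range_scan(sorted_table, "ts", 2, 4, ["ts", "amount"]))  # [(2, 70), (3, 10), (4, 30)]
```

The `user` column is never touched by this query, which is the main payoff of a column-major layout for wide analytical tables.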
|
| I don't know how Google does it, but I think most everyone
| else started with something like the Hadoop ecosystem and
| many with something like Hive/HQL to give a SQL-like way to
| express that job, especially for ad-hoc queries (long-
| lived, rarely changing overnight jobs might get optimized
| into some lower-level representation).
|
| Around the time I was getting out of that game, Spark was
| starting to get really big, which was due to some
| combination of RAM getting really abundant and just kind of
| a re-think on what was by then a pretty old cost model. I
| have no idea what people are doing now.
|
| I'd love it if someone with up-to-date knowledge about how
| this stuff works these days chimed in.
| DebtDeflation wrote:
| Snowflake is a data warehouse in the cloud. In the past,
| companies would have spent a fortune on Oracle or Teradata
| licenses and a fortune on on-prem hardware to run it on. Now
| they spend it on Snowflake and run it on AWS, etc. Same story
| as with any SaaS product - cheap and easy to get started, only
| pay for what you use, but over time the costs........get big.
| mritchie712 wrote:
| I predict[0] we'll see more people choosing Clickhouse over
| Snowflake in the next 5 years. Clickhouse will get reasonably
| feature compatible with Snowflake and give people a better escape
| hatch if they want to self-host their data stack. Clickhouse, Inc
| is building a cloud product that abstracts away the complexity
| and there are already companies like Altinity that will spin up a
| cluster for you in minutes.
|
| 0 - https://blog.luabase.com/clickhouse-for-data-nerds/
| ramesh31 wrote:
| Isn't Clickhouse a hosted SQL DBMS? Not really comparable to a
| cloud data lake.
|
| Snowflake/Databricks scales infinitely across cloud object
| stores like S3. Clickhouse is run as a single (or sharded)
| process that uses the local file system like any other SQL
| database, and requires volume provisioning as your data scales.
| It also has a fixed run cost (EC2 or wherever it's hosted)
| versus an "on-demand" model where read clusters are spun up to
| run queries against static objects that have no fixed cost
| other than storage pricing.
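The fixed-versus-on-demand difference can be put as a break-even calculation. The prices below are hypothetical placeholders, not any vendor's rates, and the model ignores storage (both models pay for it), minimum billing increments, and autoscaling granularity.

```python
HOURS_PER_MONTH = 730  # roughly 24 * 365 / 12


def monthly_cost_fixed(instance_per_hour, hours=HOURS_PER_MONTH):
    """Always-on cluster: billed every hour whether or not queries run."""
    return instance_per_hour * hours


def monthly_cost_on_demand(busy_hours, compute_per_hour):
    """Compute exists only while queries run, typically at a higher hourly rate."""
    return busy_hours * compute_per_hour


def break_even_busy_hours(instance_per_hour, compute_per_hour, hours=HOURS_PER_MONTH):
    """Busy hours per month at which both models cost the same."""
    return monthly_cost_fixed(instance_per_hour, hours) / compute_per_hour


print(break_even_busy_hours(2.0, 8.0))  # 182.5
```

With these made-up numbers, the always-on box wins once queries keep compute busy for more than about 182 hours a month; below that, paying a premium only while queries run is cheaper.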
| morelisp wrote:
| ClickHouse can access non-local storage without issue (or at
| least, with only issues for some of them - HDFS and S3 seem
| to work fine, I've had less luck with real-time Kafka). I'm
| not sure how well it scales horizontally for such uses; you
| can hack something up with macros that isn't too painful but
| there may also be better options.
|
| However, it's probably not a great pick if you're already
| struggling with the operations side of things, which seems to
| be the main selling point for services like Snowflake.
| hodgesrm wrote:
| ClickHouse only has fixed run cost if you configure it that
| way. We run ClickHouse clusters in AWS / GCS using block
| storage in our cloud platform. You can scale VMs up and down
| vertically in minutes, and scale horizontally in the same
| amount of time. The model works great for SaaS use cases that
| require constant response at all times and scale over days or
| weeks rather than minutes. Real-time analytic apps that show
| tenant dashboards or generate recommendations for users on
| ecommerce sites have this characteristic.
|
| I don't think there's really a right or wrong answer here,
| just trade-offs.
|
| Disclaimer: I work on Altinity.Cloud, a platform for managed
| ClickHouse
| KingOfCoders wrote:
| In which way not comparable?
| nycdatasci wrote:
| From the article: "JOIN's are also not nearly as performant
| as in other cloud data warehouses." This seems like a
| pretty significant limitation.
| morelisp wrote:
| That's... literally comparing them. The comparison for
| some use cases might not be favorable for ClickHouse, but
| they're comparable.
|
| (IMO the slowness of ClickHouse joins has been
| overstated, especially since its many-column table
| support is so good you'll probably be fine joining on
| insert instead.)
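A minimal sketch of "joining on insert", with plain Python dicts standing in for tables: dimension attributes are copied into each wide fact row at write time, so later aggregations need no JOIN. The table and column names are hypothetical, and this is not how ClickHouse itself implements ingestion.

```python
from collections import defaultdict

# Hypothetical dimension table, small enough to keep in memory as a lookup.
USERS = {
    1: {"country": "DE", "plan": "pro"},
    2: {"country": "US", "plan": "free"},
}


def enrich(event, dims):
    """Copy dimension columns into the fact row once, at insert time."""
    dim = dims.get(event["user_id"], {})
    return {
        **event,
        "user_country": dim.get("country"),  # None if the user is unknown
        "user_plan": dim.get("plan"),
    }


incoming = [
    {"user_id": 1, "amount": 30},
    {"user_id": 2, "amount": 12},
    {"user_id": 3, "amount": 5},  # no matching user
]
wide_rows = [enrich(e, USERS) for e in incoming]

# Later "query": group by a dimension column without any JOIN.
totals = defaultdict(int)
for row in wide_rows:
    totals[row["user_country"]] += row["amount"]
print(dict(totals))  # {'DE': 30, 'US': 12, None: 5}
```

The trade-off is that rows keep the dimension values as of insert time; if a user's plan changes later, old rows aren't updated unless you rewrite them.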
| mritchie712 wrote:
| Yes, this is one major hurdle they need to overcome, but
| I think they'll (Clickhouse Inc + the community) pull it
| off. It's a current weakness but by no means unsolvable.
| SnowHill9902 wrote:
| Clickhouse is incredible software. It only feels a little
| foreign when coming from Postgres (e.g. some CamelCase terms).
| mritchie712 wrote:
| Yeah, the CamelCase throws me too, especially since it's
| mixed in with snake_case (e.g. date_trunc[0])
|
| 0 - https://clickhouse.com/docs/en/sql-reference/functions/date-...
| zX41ZdbW wrote:
| camelCase - native functions
|
| SQL_STYLE_CASE - compatible functions
| kjw wrote:
| _" Snowflake has no incentive to push a code change that makes
| things 20% faster because that can correspond to 10-20% drop in
| short-term revenue. In a typical Innovator's Dilemma, Snowflake
| prioritizes other things that generate an ever larger menu of
| compute options, like Snowpark and data apps built on Streamlit,
| that will bleed your organization dry."_
|
| This is not true. Snowflake has done just that - it has
| continuously improved performance resulting in reduced credit
| consumption and revenue from customers on a unit compute/storage
| basis. And it has negatively impacted their revenues and stock
| price. Snowflake's incentive is to strengthen their competitive
| position and to hopefully generate more long-term revenue from
| their customers.
|
| The CFO forecasted a $97 million shortfall when guiding
| for 2022 revenue resulting from product improvements. Snowflake
| stock dropped immediately after.
|
| See Q4 transcript -- https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
|
| _" Similarly, phased throughout this year, we are rolling out
| platform improvements within our cloud deployments. No two
| customers are the same, but our initial testing has shown
| performance improvements ranging on average from 10% to 20%. We
| have assumed an approximately $97 million revenue impact in our
| full-year forecast, but there is still uncertainty around the
| full impact these improvements can have. While these efforts
| negatively impact our revenue in the near term, over time, they
| lead customers to deploy more workloads to Snowflake due to the
| improved economics."_
|
| Also see the Bloomberg article --
| https://www.bloomberg.com/news/articles/2022-03-02/snowflake....
|
| _" Snowflake Inc., a software company that helps businesses
| organize data in the cloud, dropped the most ever in a single day
| Thursday after projecting that annual product sales growth would
| slow from its previous triple-digit-percentage pace.
|
| Executives said improvements to the company's data storage and
| analysis products will let customers get the same results by
| spending less, which will hurt revenue in the short term, but
| attract more clients in the future.
|
| "The full-year impact of that next year is quite significant,"
| Chief Executive Officer Frank Slootman said on a conference call
| Wednesday after the results were released. But "when customers
| see their performance per credit get cheaper, they realize they
| can do other things cheaper in Snowflake and they move more data
| into us to run more queries.""_
| stassajin wrote:
| I'm the author of the article. Didn't expect it to blow up. Let
| me clarify a few points:
|
| 1. I like Snowflake and I think they brought several innovations
| to the field: instant scale out/up, Time Travel, and unstructured
| data query support.
| 2. Snowflake obviously makes innovations and performance
| improvements, otherwise they would not be the market leader
| they are. But I also suspect that they make just enough
| performance improvements to be on par and then use the
| vendor lock-in features to make switching hard.
|
| My argument is that their rate of performance innovation has
| considerably gone down and Databricks, Firebolt, and open source
| alternatives just seem more attractive from a cost/performance
| ratio. I agree that Snowflake is still the best data-warehouse to
| start with if you have 100k, but not if you truly plan for a
| multi-year horizon and your usage expands.
|
| - Redshift also brought a lot of innovation that allowed people
| to execute analytical queries 100x-1000x faster than any OLTP
| database that existed out there. I've used Redshift for four years and
| they kept ignoring performance and features until Snowflake came
| out. All of a sudden because of competitor pressure, they put
| more effort into the product to maintain and gain market share.
| My hope is that Snowflake finds a solution to their innovator's
| dilemma, since competitors are hot on their tails.
|
| - Some people point out that 70% usage growth just shows that
| Snowflake is useful. Nobody disagrees with that. The issue is
| that the majority of companies don't experience 70% revenue
| growth to keep up with the growth in costs. At some point, you
| have to clamp down on costs, which means that you have to look
| for alternatives to run things more efficiently.
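The compounding problem can be made concrete. The 2% starting share and the 20% revenue growth below are hypothetical; only the 70% usage-growth figure comes from the comment above, and treating it as cost growth is an assumption.

```python
def cost_to_revenue_ratio(initial_ratio, cost_growth, revenue_growth, years):
    """Warehouse spend as a share of revenue after `years` of compounding."""
    return initial_ratio * ((1 + cost_growth) / (1 + revenue_growth)) ** years


for year in range(4):
    share = cost_to_revenue_ratio(0.02, cost_growth=0.70, revenue_growth=0.20, years=year)
    print(year, f"{share:.2%}")
```

With these numbers the warehouse's share of revenue nearly triples in three years (from 2% to about 5.7%), which is the point at which the cost clampdown the comment describes tends to start.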
| mejakethomas wrote:
| Totally agree with Redshift sentiments. It's been lovely seeing
| BigQuery and Redshift step their game up over the past 1.5yrs,
| because they really should have been doing certain things for
| many years prior.
|
| Re: Firebolt, I don't consider it to be in the same class as
| Snowflake whatsoever (even though their advertising seems to
| indicate otherwise). Snowflake is like a very powerful swiss
| army knife. Firebolt is good for a very specific (dare I say
| niche?) workload but falls all over itself for the vast
| majority of data org needs.
| datadisruptor wrote:
| [disclaimer: comment written by one of cofounders of iomete - a
| YC-backed startup - active in the same market as Snowflake]
|
| I think Snowflake is (still) expensive because it is a venture-
| backed enterprise software company and goes through a typical
| trajectory...
|
| Story goes like this: founders are product-driven and first
| movers -> find PMF -> need VC funding -> VCs only fund enterprise
| software ventures with 70%+ gross margins and high retention
| rates -> product/service gets priced to achieve these metrics ->
| VCs happy to fund sales & marketing machine needed to obtain
| sales growth, nobody cares about profitability until after IPO ->
| startup is everyone's darling until ~2 years after IPO.
|
| Then: economic crisis hits, customers become more price
| sensitive, competition intensifies. Plus now management is
| exposed to quarterly pressure of financial markets to deliver on
| top-line and margin expectations.
|
| Meanwhile a bunch of startups are building (lower priced)
| alternatives. Perhaps not as mature or feature-rich as Snowflake,
| but good enough for 80% of use cases that Snowflake covers.
|
| Therefore the assertion that Snowflake is not optimizing their
| product sounds a bit crazy to me. It would be optimizing for
| short-term gain, while jeopardizing its reputation as the leader
| in the space. Obtaining excessive margins through excessive
| pricing only works under monopolistic conditions or if they had a
| truly distinctive product. Neither is the case, IMO. Also, it's
| early days. Not exactly sure what Snowflake's market share is,
| but I bet it is < 5%.. so they haven't locked in everyone yet...
|
| I bet that Snowflake will be forced to compete "also on price" in
| the next five years because free enterprise is a powerful thing.
| The title of the article could be "Why Snowflake is (still)
| expensive but will get more affordable over the next few years"..
| wiradikusuma wrote:
| So, what is Snowflake? (I assume it's snowflake.com) From
| Googling it looks like Google's BigQuery. So it's a DB?
| brianwawok wrote:
| Ran into the same exact thing at CircleCI.
|
| Me: My builds are really slow
|
| CircleCI: Here are a few very low effort answers
|
| Me: git checkout is taking literally 60 seconds, but it takes 3
| seconds locally, why?
|
| CircleCI: Mumble Mumble.
|
| They charge per minute, so why would they care if builds are
| slow? Was about a year of this getting worse and worse, till I
| finally cancelled the service last week and built my own server
| in my basement.
|
| I now get 200% faster builds, and the hardware payback time is
| not very long (6 months of my CircleCI bill?).
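The payback estimate is a simple division; a slightly fuller version subtracts what the home server costs to run. The dollar figures are hypothetical.

```python
def payback_months(hardware_cost, monthly_saas_bill, monthly_running_cost=0.0):
    """Months until a one-off hardware purchase beats a recurring CI bill."""
    monthly_savings = monthly_saas_bill - monthly_running_cost
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays back
    return hardware_cost / monthly_savings


print(payback_months(3000, 500))  # 6.0
print(payback_months(3000, 500, monthly_running_cost=50))
```

Power, cooling, and your own maintenance time belong in `monthly_running_cost`; leaving them out makes the estimate optimistic.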
|
| I think it's a huge red flag anytime the metric you care about is
| something that being "worse" makes the provider more money.
| icedchai wrote:
| At a previous startup, we dumped CircleCI and switched to
| Jenkins on our own EC2 instance. We had a lot fewer problems.
| (This was way back in 2016, I'm sure things have improved now.)
| brianwawok wrote:
| Yup!
|
| I ended up doing TeamCity over Jenkins, but they do the same
| thing.
|
| Amazing how fast a 32C / 64T EPYC server in my basement can
| be..
| icedchai wrote:
| I can only imagine! I have a 3950X here (16C / 32T) with
| 128 gigs of RAM and it is incredible. Total overkill for a
| home lab though.
| sremani wrote:
| From the front page of CircleCI.
| __________________________________________________ Industry-
| leading speed As soon as you think it, you can deliver it. Your
| developers' time is too important to waste. No other CI/CD
| platform takes performance as seriously as we do. Your
| pipelines should accelerate your business, not slow you down.
| __________________________________________________
|
| Rule of thumb: Anyone talking about their honesty is not
| honest.
| thexumaker wrote:
| We did the same thing but with self hosted runners with
| github actions.
|
| https://github.com/philips-labs/terraform-aws-github-runner
|
| philips-labs has some good resources for scaling this up as
| well.
| Fatnino wrote:
| Those app rental scooters that are littered around city
| centers: you pay for distance as well as for time. And that's
| why they don't go very fast.
| gkoberger wrote:
| No, they're legally limited to 15 MPH for safety. You also
| don't pay for distance, just time. Not everything is a
| conspiracy theory.
|
| SF: https://www.williamweisslaw.com/sf-e-scooter-laws/
| NYC: https://www1.nyc.gov/html/dot/html/bicyclists/ebikes.shtml#:....
| CrazyStat wrote:
| Also because going fast on those things is fucking dangerous,
| especially when (like most people riding them) you're not
| wearing a helmet.
| djbusby wrote:
| And have been drinking (it's what I saw a lot of).
| wpietri wrote:
| Absolutely. I don't have a problem with most scooter
| owners, but here in a city with a lot of tourism, the
| rental scooters are often a menace. I was waiting outside
| of a restaurant and took half a step back to let some
| people through. I brushed against something moving fast,
| and it was some tall yahoo going downhill at max speed on a
| scooter. If I'd taken a full step back, somebody would have
| needed medical treatment.
|
| Learning how to ride one of those in a city takes time,
| practice, and thought. Which you will surely get if you buy
| one. But apparently not so for the rentals.
| SkyMarshal wrote:
| In that particular case another reason could be that they
| quite reasonably don't want you going very fast for safety
| and liability concerns.
| mikewhy wrote:
| I love that CircleCI flaunts its speed compared to other
| providers, meanwhile we can clearly see the CircleCI steps take
| the longest in our builds.
|
| Not to mention the constant failures.
| hangonhn wrote:
| 100% and not just in tech: when a party's incentives aren't
| aligned with yours, you'll often find yourself getting little
| help or even working in opposition to each other. We recently
| experienced this when filing a health-related insurance
| claim. My wife wondered why they kept losing stuff, not doing
| what they promised, or asking for more paperwork. I kept
| explaining to her that while not necessarily malicious, they
| have very little incentive to improve that department.
|
| Always try to find partners or counterparties who win when you
| do as well. I know we don't always have that luxury but
| sometimes a little headache initially is better than being
| stuck with someone who works in opposition to you in the long
| run.
|
| Thanks so much for sharing your story. We are in the process of
| outsourcing some of our Jenkins functionality and these stories
| are useful to hear.
| josephcsible wrote:
| > They charge per minute, so why would they care if builds are
| slow?
|
| It's worse than just not caring: they have a direct financial
| incentive to make sure your builds are as slow as you'll
| tolerate.
| KingOfCoders wrote:
| I have no Snowflake experience, but some limited BigQuery
| experience. And it's very easy for a small company to get to
| $100k/year bills without massive data.
| mejakethomas wrote:
| Completely agree. Currently staring at 700k+ BigQuery costs
| annually and accomplished MUCH more with Snowflake at the same
| price.
| tootie wrote:
| Anytime your cloud spend with a single vendor starts to get out
| of hand, you just call and negotiate. If you make a multi-year
| commitment, they'll apply a substantial discount. Also,
| $100k/yr is still cheap compared to the cost of developers. Not
| just in terms of actual price tag, but risk management because
| a SaaS won't quit for a better offer.
| dotopotoro wrote:
| So you don't need developers when you use SaaS?
| yazaddaruvala wrote:
| If you need to hire 1 more developer at $100k to help
| maintain your data warehouse or pay $100k for Snowflake or
| BQ, it's a no-brainer to use SaaS.
|
| Also, humans cost more than their salary: Recruiting,
| management, benefits, attrition, vacation, the risk that
| they are just not capable.
|
| A human will also cost you more year over year (raises,
| promotions, etc), SaaS will typically cost you less year
| over year (optimizations, negotiations, competition, etc).
| dominotw wrote:
| they should switch to flat-rate billing capped at the slots they
| are willing to pay for.
| spullara wrote:
| Snowflake increases performance all the time and their customers
| just use more of it.
| wigster wrote:
| ramesh31 wrote:
| I'm of the mind that Snowflake and Databricks are losing their
| value prop now that Delta Lake is open source and Iceberg is
| maturing. What's to stop me from rolling my own Spark clusters
| and just using one of those? Is anyone doing this?
| nemothekid wrote:
| > _What's to stop me from rolling my own Spark clusters and
| just using one of those? Is anyone doing this?_
|
| Ops. Unless your core competency is running reports and Spark
| nodes, it's probably cheaper to outsource the management of
| Spark and friends than to hire people to make sure it's always
| up and running. To be fair, I haven't touched Spark in many
| years, but having to page someone who was good enough at Spark
| to debug why a job stopped at 3am isn't fun.
| ramesh31 wrote:
| >Ops. Unless your core competency is running reports and
| spark nodes, it's probably cheaper to outsource the
| management of Spark and friends than to hire people to make
| sure it's always up and running.
|
| I think as an end user I would absolutely agree on this
| point. But many companies use Databricks as part of their
| automated backend systems that they resell to customers. The
| cost per "DBU" unit is astronomical for the amount of raw
| compute in use. It feels a bit like running a restaurant
| where you serve takeout.
| nojito wrote:
| I can spin up and down 100+ node clusters on the 4 largest
| cloud providers at will.
|
| What ops am I missing?
| joshhart wrote:
| [Disclaimer: Databricks employee] There's also a lot of value
| in DBSQL, Unity Catalog (data management), and serverless for
| autoscaling, all of which can save money compared to just
| running raw Spark. But if you want to operate Spark yourself,
| cool, do it. We're happy about that; it builds the base of Spark
| committers over time and increases the quality of our
| products.
| eximius wrote:
| You'll find plenty of the customer base of Databricks used to
| run their own clusters.
|
| It's a tradeoff. It might cost less money but more time. The
| time and expertise to run their own clusters effectively is not
| something every org can or desires to do.
| buttaphingas wrote:
| And to get the very best price for those clusters, you'd
| need to commit to the CSP for three years!
|
| Would love to know the TCO trade-off between procuring,
| securing and deploying on your own clusters vs having them
| managed via SaaS.
| throw8383833jj wrote:
| It all comes down to the cost of switching and the willingness
| of users to switch. The higher the cost of switching, the higher
| you can make your product's price. Otherwise, with an extremely
| low cost of switching, the price will ultimately be driven to
| near zero as more and more competitors enter the landscape.
| YouWhy wrote:
| I often analyze tools as a reduction from the space of problems
| x resources to the space of outcomes.
|
| Let's consider Snowflake in this paradigm:
|
| - Problems: analytics on data that is not laid out in a way
| that's directly accessible for analysts.
|
| - Resources: SQL analysts, few or no competent data engineers,
| spare cash
|
| - Outcomes: run analytics at an industrial scale without
| requiring competent engineers or DevOps.
|
| Since Snowflake's optimal client gets locked in very easily, it
| follows that saving that client money is not something even
| the client would care about.
| [deleted]
| dstola wrote:
| "optimization gremlin" = dark pattern to take as much money away
| from you as possible
| tablespoon wrote:
| > RevOps management
|
| And now "XxxOps" is a meaningless buzzword.
| [deleted]
| dboreham wrote:
| Because someone needs a new boat?
| shrimalpreeti wrote:
| [Disclaimer: I work for a company that offers a Snowflake Cost
| Optimizer product] We're an open-source monitoring & alerting
| tool and many of our users were using it to set alerts on their
| warehousing (Snowflake) costs. The problem is particularly bad
| with Snowflake due to its lack of query level attribution of
| costs and no in-built features for monitoring or recommendations
| on improvements. We're building a Snowflake Cost Optimizer
| (https://www.chaosgenius.io/snowflake-cost-optimizer.html) and
| are hearing the same feedback from our customers as the author
| mentions. Snowflake is definitely coming up with features towards
| better cost transparency but I wonder if it's too little too
| late.
| evtx wrote:
| In my experience, Snowflake is very receptive to enhancement
| requests. If you feel Snowflake should be doing something
| better for surfacing optimizations, I'd ask them.
|
| Also, I'm not sure your comment is fully accurate:
|
| 1) "lack of query level attribution of costs": Snowflake doesn't
| charge per query, so there can't be default query-level
| attribution of cost. Snowflake charges by second of warehouse
| use. But you CAN easily see which queries ran on which
| warehouse and allocate costs back to them using your own
| criteria (by query seconds, usually better than by number of
| queries).
|
| 2) "no in-built features for monitoring": Snowflake has built-in
| cost monitoring dashboards:
| https://docs.snowflake.com/en/user-guide/cost-overview.html and
| resource monitors:
| https://docs.snowflake.com/en/user-guide/resource-monitors.h...
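|
| A minimal Python sketch of the "allocate by query seconds"
| approach, assuming you have already exported per-warehouse
| credit totals and per-query execution seconds (for example from
| Snowflake's ACCOUNT_USAGE views; the export step is not shown).
| The function name and the dict keys are hypothetical, and idle
| warehouse time ends up spread across queries in proportion to
| their runtime:

```python
from collections import defaultdict


def allocate_credits(warehouse_credits, queries):
    """Split each warehouse's credits across its queries by execution seconds.

    warehouse_credits: {warehouse_name: credits billed for the period}
    queries: list of dicts with hypothetical keys
             "query_id", "warehouse", "seconds"
    Returns {query_id: allocated credits}.
    """
    # Total execution seconds per warehouse (the denominator).
    total_seconds = defaultdict(float)
    for q in queries:
        total_seconds[q["warehouse"]] += q["seconds"]

    allocation = {}
    for q in queries:
        total = total_seconds[q["warehouse"]]
        credits = warehouse_credits.get(q["warehouse"], 0.0)
        # Proportional share of the warehouse's bill; idle/auto-suspend
        # time is implicitly spread over queries by runtime.
        allocation[q["query_id"]] = credits * q["seconds"] / total if total else 0.0
    return allocation


if __name__ == "__main__":
    usage = allocate_credits(
        {"REPORTING_WH": 10.0},
        [
            {"query_id": "q1", "warehouse": "REPORTING_WH", "seconds": 30},
            {"query_id": "q2", "warehouse": "REPORTING_WH", "seconds": 70},
        ],
    )
    print(usage)  # {'q1': 3.0, 'q2': 7.0}
```

| Credits billed to a warehouse that ran no queries in the period
| are left unallocated here; how to charge those back is a policy
| choice.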
|
| That said, I'm sure improvements could be made. Ask for them.
| There must be a market for this, since Capital One, Acceldata
| and others offer similar solutions for optimization
| recommendations.
| mejakethomas wrote:
| This. Snowflake introspection five years ago looks very, very
| different than today. Mostly due to enhancement requests.
___________________________________________________________________
(page generated 2022-08-22 23:01 UTC)