[HN Gopher] Launch HN: Baselit (YC W23) - Automatically Reduce Snowflake Costs
___________________________________________________________________
Launch HN: Baselit (YC W23) - Automatically Reduce Snowflake Costs
Hey HN! We are Baselit (https://baselit.ai/), a tool that
automatically optimizes Snowflake costs. Here's a demo video:
https://www.youtube.com/watch?v=Ls6VRzBQ-pQ.

Snowflake is one of the most widely used data warehouses today. It
abstracts the underlying compute infrastructure into "warehouses" -
compute units with t-shirt sizes (X-Small, Small, Medium, etc.). In
general, if you want to lower your data processing costs, the only
lever you have is to process less data (i.e. query optimization). But
Snowflake's warehouse abstraction adds an extra dimension along which
you can optimize: minimizing the compute you use to process that same
data (i.e. warehouse optimization). Baselit automates Snowflake
warehouse optimization for you.

While we were working on another idea last year (AI for SQL
generation), users frequently told us that Snowflake costs had become
a top concern and that cost optimization was now a business priority.
Every few months they would manually look for opportunities to cut
costs (removing workloads or optimizing queries) - a time-consuming
process. We decided to build a solution that automates cost
optimization and complements the manual effort of data teams.

There are two key components of Baselit:

1. Automated agents that cut down on warehouse idle time. This happens
in one of two ways: cache optimization (when to suspend a warehouse
vs. letting it run idle) and cluster optimization (optimally spinning
down clusters). You can easily find out how much these agents can save
you - here's a SQL query you can run on your Snowflake account that
will calculate your savings: https://baselit.ai/docs/savings-estimate
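As a rough illustration of the idea (a minimal sketch, not Baselit's
actual savings query), something along these lines compares billed
warehouse hours against hours in which queries actually ran, using
Snowflake's standard ACCOUNT_USAGE views:

    -- Sketch: fraction of billed warehouse hours with no queries at
    -- all over the last 30 days, per warehouse.
    WITH active_hours AS (
        SELECT warehouse_name,
               COUNT(DISTINCT DATE_TRUNC('hour', start_time)) AS busy_hours
        FROM snowflake.account_usage.query_history
        WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
        GROUP BY warehouse_name
    ), metered AS (
        SELECT warehouse_name,
               COUNT(DISTINCT DATE_TRUNC('hour', start_time)) AS billed_hours,
               SUM(credits_used_compute) AS credits
        FROM snowflake.account_usage.warehouse_metering_history
        WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
        GROUP BY warehouse_name
    )
    SELECT m.warehouse_name,
           m.credits,
           -- hours billed but idle: a crude upper bound on what
           -- tighter suspend/spin-down policies could reclaim
           ROUND(1 - COALESCE(a.busy_hours, 0) / m.billed_hours, 2)
               AS idle_hour_ratio
    FROM metered m
    LEFT JOIN active_hours a USING (warehouse_name)
    ORDER BY m.credits DESC;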
2. An Autoscaler that lets you create custom scaling policies for
multi-cluster warehouses based on SLAs. Snowflake's default policies
(Economy and Standard) are not cost-optimal in most cases, and they
don't give you any control. One use case for Autoscaler is efficiently
merging several warehouses into one multi-cluster warehouse, with a
custom scaling policy that is optimal for a particular type of
workload. In Autoscaler you can set a parameter called "Allowed
Queuing Time" that controls how fast a new cluster should spin up. For
example, if you are merging transformation workloads, you might set a
higher queuing time; Baselit will slow down cluster spin-up, ensuring
all clusters run at high utilization, and you'll see a reduction in
costs.
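For contrast, this is all Snowflake exposes natively for a
multi-cluster warehouse - two stock scaling policies and no
queuing-time knob (warehouse name and settings below are
illustrative):

    -- Native multi-cluster configuration: SCALING_POLICY can only be
    -- STANDARD or ECONOMY; there is no way to express an SLA such as
    -- "queue for up to N seconds before spinning up another cluster".
    CREATE WAREHOUSE IF NOT EXISTS transform_wh
      WAREHOUSE_SIZE    = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY    = 'ECONOMY'
      AUTO_SUSPEND      = 60;  -- seconds idle before auto-suspend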
We've built a bunch of other features that help optimize Snowflake
costs: a dbt optimization feature that automatically picks the right
warehouse size for dbt models through constant experimentation, a
"cost lineage", spend views by teams/roles/users, and automatic
recommendations from scanning Snowflake metadata.
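The dbt feature is presumably built on dbt-snowflake's per-model
warehouse override (the post doesn't say how it's implemented); a
minimal sketch with hypothetical model and warehouse names:

    -- models/marts/orders_daily.sql: pin one expensive model to a
    -- larger warehouse via dbt-snowflake's config override
    {{ config(
        materialized = 'table',
        snowflake_warehouse = 'TRANSFORM_WH_LARGE'
    ) }}

    select
        order_date,
        count(*) as order_count
    from {{ ref('stg_orders') }}  -- hypothetical staging model
    group by order_date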
Due to the nature of our product (access to Snowflake metadata
required), we haven't made Baselit self-serve yet. We invite you to
run our savings query (https://baselit.ai/docs/savings-estimate) and
find out your potential savings. And if you'd like to learn more about
any of our features and get a live demo, you can book one here:
https://calendly.com/baselit-sahil/baselit-demo

We'd love to read your feedback and ideas on Snowflake optimization!
Author : sahil_singla
Score : 28 points
Date : 2024-05-08 15:54 UTC (7 hours ago)
| iknownthing wrote:
| Does it use AI?
| sahil_singla wrote:
| No AI yet - all the algorithms under the hood are deterministic,
| though we are considering tinkering with LLMs for query
| optimization as part of our roadmap.
| candiddevmike wrote:
| What happened to your other idea?
| mritchie712 wrote:
| not OP, but for us, LLMs just aren't good enough yet to write
| analytical SQL queries (and they may never be good enough using
| pure SQL). Some more context here:
| https://news.ycombinator.com/item?id=40300171
| sahil_singla wrote:
| +1. We came to a similar conclusion when we were working on
| this idea.
| datadrivenangel wrote:
| Productizing cost optimization experience! Great to see more
| options in this space, as so many companies are surprised by their
| cloud costs.
|
| For the warehouse size experimentation, how do you value
| processing time?
| sahil_singla wrote:
| We optimize warehouse sizes for a dbt project as a whole. Users
| can set a maximum project runtime as one of the parameters for
| experimentation. The optimization honors this max runtime while
| tuning warehouse sizes for individual models.
| mritchie712 wrote:
| We (https://www.definite.app/) were also working on AI for SQL
| generation. I can see why you pivoted - it doesn't really work! Or
| at least not well enough to displace existing BI solutions.
|
| edit: the context below is mostly irrelevant to Snowflake cost
| optimization, but relevant if you're interested in the AI for SQL
| idea...
|
| I'm pretty hard-headed though, so we kept going with it, and the
| solution we've found is to run the entire data stack for our
| customers. We do ETL, spin up a warehouse (DuckDB), a semantic
| layer (cube.dev), and BI (dashboards / reports).
|
| Since we run the ETL, we know exactly what all the data means
| (e.g. we know what each column coming from Stripe really means).
| All this metadata flows into our semantic layer.
|
| LLMs aren't great at writing SQL, but they're really good at
| writing semantic layer queries. This is for a couple of reasons:
|
| 1. better defined problem space (you're not feeding the LLM
| irrelevant context from a sea of tables)
|
| 2. the query format is JSON, so we can better control the LLM's
| output
|
| 3. the context is richer (e.g. instead of table and column names,
| we can provide rich, structured metadata)
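|
| For instance, a semantic layer query in Cube's JSON format looks
| roughly like this (measure and dimension names are illustrative):
|
|     {
|       "measures": ["orders.count", "orders.total_amount"],
|       "dimensions": ["orders.status"],
|       "timeDimensions": [{
|         "dimension": "orders.created_at",
|         "granularity": "month",
|         "dateRange": "last 6 months"
|       }]
|     }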
|
| This also solves the Snowflake cost issue from a different
| angle... we don't use it. DuckDB has the performance of Snowflake
| for a fraction of the cost. It may not scale as well, but 99% of
| companies don't need the sort of scale Snowflake pitches.
| iknownthing wrote:
| Kind of surprised to hear that given the number of companies
| I've seen pitching natural language to SQL queries.
| michaelmior wrote:
| How does this differ from Keebo?
|
| https://keebo.ai/
| sahil_singla wrote:
| We are different from Keebo in the way we approach warehouse
| optimization. Keebo seems to dynamically change the size of a
| warehouse - we have found that to be somewhat risky, especially
| when it's downsizing. Performance can take a big hit in this
| case. So we've approached this problem in two ways:
|
| 1. Route queries to the right-sized warehouse instead of
| changing the size of a particular warehouse itself. This is
| part of our dbt optimizer module. This ensures that performance
| stays within acceptable limits while optimizing for costs.
|
| 2. Baselit's Autoscaler optimally manages the scaling out of a
| multi-cluster warehouse depending on the load, which is more
| cost-effective than upsizing the warehouse.
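|
| (In Snowflake terms, routing boils down to choosing the warehouse
| per session or statement - an illustrative sketch with hypothetical
| warehouse names:)
|
|     -- each statement runs on the warehouse most recently selected
|     -- in the session
|     USE WAREHOUSE dbt_wh_small;   -- cheap models run here
|     -- ...
|     USE WAREHOUSE dbt_wh_large;   -- heavy models run here
|     -- ...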
| kwillets wrote:
| I was thinking about an AI to feed you the proper Snowflake sales
| pitch each time a query runs expensive or fails a benchmark. At
| my previous org it could replace several headcount.
| ukd1 wrote:
| How does this differ from https://espresso.ai ?
| sahil_singla wrote:
| They are more focused on query optimization whereas we do
| warehouse optimization. We lean towards warehouse optimization
| because it can be completely hands-off.
| ukd1 wrote:
| cool - they're kinda complementary?
| sahil_singla wrote:
| Yeah, kind of.
|
| Though I'm not exactly sure how their product works, I saw
| from the landing page that it's broadly focused on query
| optimization.
|
| We've done a lot of experimentation with query
| optimizations, both with and without LLMs, and we don't
| think it's possible to build a fully automated solution.
| However, a workflow solution might be feasible.
| karamazov wrote:
| Chiming in, I'm one of the founders of Espresso AI - we do both
| query optimization and warehouse optimization, both of which
| are hands-off. In particular we're beta-testing a fully-
| automated solution for query optimization (it's taken a lot of
| engineering!).
|
| Based on the responses here I think we're a superset of where
| Baselit is today, but I could be wrong.
| mustansirm wrote:
| Not a Snowflake user, but I'm curious as to your business model.
| What barriers are there to prevent Snowflake from reverse
| engineering your work and including it as part of their native
| experience? Is the play here an eventual acquisition?
| jaggederest wrote:
| It has been my experience working on similar projects for
| cutting down e.g. AWS spend that the primary billers often have
| a really hard time accepting or incorporating bill-reducing
| features. All their incentives are geared toward increased
| spend, regardless of the individual preferences of any members
| of the company, so that inertia is really hard to overcome.
| sahil_singla wrote:
| That resonates with what we have heard from our customers.
| sahil_singla wrote:
| Our belief is that building a good optimization tool is not
| aligned with Snowflake's interests. Instead they seem to be
| more focused on enabling new use cases and workloads for their
| customers (their AI push, for example, with Cortex). On the
| other hand, helping Snowflake users cut down costs is our
| singular focus.
| bluelightning2k wrote:
| It's not really in their interests?
| fock wrote:
| Or to phrase it differently: what kind of market is this, where
| big companies are herded into tarpits of SaaS that apparently
| have exactly the same problems as running it the old way did
| (namely inefficient use of resources)? Except now you pay some
| symbiotic start-up instead of hiring some generic
| performance person.
___________________________________________________________________
(page generated 2024-05-08 23:00 UTC)