[HN Gopher] Show HN: Serverless OLAP with Seafowl and GCP
___________________________________________________________________
Show HN: Serverless OLAP with Seafowl and GCP
Hello HN! I'm an engineer at Splitgraph and recently started
learning Rust so I could make my first contribution to Seafowl [0],
an early-stage analytical database. Along the way I figured out a
database hosting hack on GCP and wanted to share it with HN: a way
to get "true" scale-to-zero database hosting that could be useful
for certain side projects or spiky traffic situations.

A recurring problem I've faced with side projects is needing
Postgres but not wanting to deploy or maintain yet another
instance. So when I learned that GCP's "always free" tier includes
serverless (Cloud Run) [1], I got curious whether I could run a
database on it. Most classic databases aren't a great fit for
serverless, but Seafowl separates compute, storage and catalog (the
catalog is a single SQLite file of metadata) [2].

Last month I was able to add GCS bucket compatibility to Seafowl,
which let me mount the catalog via gcsfuse (an adapter that
attaches GCS buckets to the local filesystem). The upshot: although
FUSE adds HTTP requests to container startup, init time stays
comparatively quick, even on cold starts, because only the single
catalog SQLite file needs to be fetched.

With this approach you get a URL you can query directly from your
frontend if you want, e.g. fetch() can send SELECT * ... queries
straight from your users' browsers. You could plot a graph from a
static React frontend, or the observablehq.com editor, with no
persistent backend needed. So at times when nobody's using your
app, 100% of your stack can scale to zero, with obvious cloud spend
advantages. And even if you exceed the free tier limits,
pay-as-you-go pricing means there's a good chance you'll come out
ahead on hosting costs anyway.
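
To make the frontend angle concrete, here's a minimal sketch of a
browser-side query. The Cloud Run URL is a placeholder, and the /q
endpoint plus newline-delimited JSON response shape are from my
memory of the Seafowl docs [0], so double-check there before
relying on it:

    // Query a Seafowl instance on Cloud Run straight from the browser.
    // SEAFOWL_URL is made up; the endpoint path and JSONL response
    // shape are my recollection of the HTTP API -- verify in the docs.
    const SEAFOWL_URL = "https://my-seafowl-abc123-uc.a.run.app/q";

    async function runQuery(sql: string): Promise<Record<string, unknown>[]> {
      const resp = await fetch(SEAFOWL_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: sql }),
      });
      if (!resp.ok) throw new Error(`Query failed: ${resp.status}`);
      // One JSON object per line, one line per result row.
      const text = await resp.text();
      return text
        .trim()
        .split("\n")
        .filter(Boolean)
        .map((line) => JSON.parse(line));
    }

    // e.g. feed the rows into a chart in a static React page or an
    // Observable notebook
    runQuery("SELECT * FROM my_table LIMIT 100").then(console.log);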

NB: Seafowl is an early-stage project, so it's not really suitable
if you need transactions or fast single-row writes. Otherwise, this
could be a nice way to get free database hosting at a big-three
cloud provider, especially for read-only analytical reporting
queries. Feedback and suggestions are appreciated. Hope it helps
you! More details are available if you want [3].

[0] https://seafowl.io/docs/getting-started/introduction

[1] https://cloud.google.com/run/pricing#cpu-requests

[2] Neon is another interesting project that separates compute and
storage: https://neon.tech/blog/architecture-decisions-in-neon
One issue I observed was a noticeably longer startup time vs this
FUSE approach, which I believe may be related to Postgres
connection setup time/roundtrips. Looking forward to trying Neon
again in future.

[3] https://www.splitgraph.com/blog/deploying-serverless-seafowl
Author : paws
Score : 44 points
Date : 2023-06-06 16:18 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| aleda145 wrote:
| I use Seafowl hosted on Cloud Run for a side project with Swedish
| real estate data. Around a million rows, and Seafowl works great!
|
| One killer feature (aside from scaling to zero) is that the
| queries can be constructed as GET requests. That means we can
| cache the query results with Cloudflare.
|
| I have it exposed here if you want to write some SQL and check it
| out live: https://bostadsbussen.se/sold/query
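|
| Roughly what constructing one of those GET requests looks like
| from the browser. The hash-in-the-URL plus query-in-a-header
| convention is from my memory of the Seafowl docs, and the base URL
| is a placeholder, so double-check the docs before copying:
|
|     // Cache-friendly GET: a SHA-256 of the SQL goes into the URL,
|     // so Cloudflare (or any CDN) can key its cache on the URL.
|     const BASE = "https://your-seafowl.example.com";
|
|     async function cachedQuery(sql: string): Promise<Record<string, unknown>[]> {
|       const bytes = new TextEncoder().encode(sql);
|       const digest = await crypto.subtle.digest("SHA-256", bytes);
|       const hash = [...new Uint8Array(digest)]
|         .map((b) => b.toString(16).padStart(2, "0"))
|         .join("");
|       const resp = await fetch(`${BASE}/q/${hash}`, {
|         // The query text itself travels in a header (name from
|         // memory -- verify in the docs).
|         headers: { "X-Seafowl-Query": encodeURIComponent(sql) },
|       });
|       const text = await resp.text();
|       return text
|         .trim()
|         .split("\n")
|         .filter(Boolean)
|         .map((line) => JSON.parse(line));
|     }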
| paws wrote:
| Glad to hear you've had good results! Yes, @mildbyte did a
| great job making Seafowl comply with HTTP cache semantics (i.e.
| ETags/Cache-Control), and it should work well for both CDNs and
| browsers. When building Open Data Monitor [0] I certainly
| observed some nice speed-ups.
|
| For those interested in how caching works (i.e. if your dataset
| is public it could be an easy win), more info is in the docs [1].
|
| [0] https://open-data-monitor.splitgraph.io/week/2023-05-22
|
| It's a Socrata scraper that renders diffs of public/government
| datasets.
|
| [1] https://seafowl.io/docs/getting-started/tutorial-fly-io/part...
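|
| A quick way to sanity-check that your own instance is serving
| cacheable responses (standard HTTP, nothing Seafowl-specific
| beyond it setting the headers; the URL and header below are
| placeholders):
|
|     // Inspect the caching headers on a query response. Browsers
|     // and CDNs act on these (ETag revalidation, Cache-Control
|     // lifetimes) automatically; this just makes them visible.
|     (async () => {
|       const resp = await fetch(
|         "https://your-seafowl.example.com/q/<query-hash>",
|         { headers: { "X-Seafowl-Query": encodeURIComponent("SELECT 1") } },
|       );
|       console.log(resp.headers.get("ETag"), resp.headers.get("Cache-Control"));
|     })();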
___________________________________________________________________
(page generated 2023-06-07 23:01 UTC)