[HN Gopher] Show HN: Arc - high-throughput time-series warehouse...
___________________________________________________________________
Show HN: Arc - high-throughput time-series warehouse with DuckDB
analytics
Hi HN, I'm Ignacio, founder at Basekick Labs. Over the past months
I've been building Arc, a time-series data platform designed to
combine very fast ingestion with strong analytical queries.

What Arc does:
- Ingests via a binary MessagePack API (fast path)
- Accepts Line Protocol for compatibility with existing tools
  (like InfluxDB; I'm an ex-Influxer)
- Stores data as Parquet with hourly partitions
- Queries via the DuckDB engine using SQL

Why I built it: many systems force you to trade retention,
throughput, or complexity. I wanted something where ingestion
performance doesn't kill your analytics.

Performance & benchmarks so far:
- Write throughput: ~1.88M records/sec (MessagePack, untuned) on
  my M3 Pro Max (14 cores, 36 GB RAM)
- ClickBench on AWS c6a.4xlarge: 35.18 s cold, ~0.81 s hot (43/43
  queries succeeded)

In those runs, caching was disabled to match benchmark rules;
enabling the cache in production gives ~20% faster repeated
queries.

I've open-sourced the Arc repo so you can dive into the
implementation, benchmarks, and code. Would love your thoughts,
critiques, and use-case ideas. Thanks!
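[Ed.: for context, InfluxDB's Line Protocol is a plain-text format
of the shape `measurement,tag=value field=value timestamp`. A
minimal parser sketch of that format (my illustration, not Arc's
actual code; it ignores escaping rules and some value types):]

```python
def parse_line(line):
    """Parse one InfluxDB-style Line Protocol line into
    (measurement, tags, fields, timestamp).

    Illustrative only: skips escape handling and boolean fields.
    """
    head, _, rest = line.partition(" ")
    measurement, *tag_parts = head.split(",")
    tags = dict(p.split("=", 1) for p in tag_parts)
    field_str, _, ts = rest.rpartition(" ")
    if not field_str:                  # no trailing timestamp
        field_str, ts = ts, None
    fields = {}
    for p in field_str.split(","):
        k, v = p.split("=", 1)
        if v.startswith('"'):          # string field
            fields[k] = v.strip('"')
        elif v.endswith("i"):          # integer field, e.g. 8i
            fields[k] = int(v[:-1])
        else:                          # float field
            fields[k] = float(v)
    return measurement, tags, fields, int(ts) if ts else None
```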
Author : ignaciovdk
Score : 16 points
Date : 2025-10-07 16:40 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| leakycap wrote:
| Did you consider confusion with the Arc browser and still go with
| the name, or were you calling this Arc first and decided to just
| stick with it?
| ignaciovdk wrote:
| Hey, good question!
|
| I didn't really worry about confusion since this isn't a
| browser, it's a completely different animal.
|
| The name actually came from "Ark", as in something that stores
| and carries, but I decided to go with Arc to avoid sounding too
| biblical.
|
| The deeper reason is that Arc isn't just about ingestion; it's
| designed to store data long-term from systems like InfluxDB,
| Timescale, or Kafka, using Parquet and S3-style backends that
| scale economically while still letting you query everything with
| SQL.
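[Ed.: the hourly Parquet partitioning described above can be
sketched as follows; the exact path layout here is my assumption,
not Arc's actual scheme:]

```python
from datetime import datetime, timezone

def hourly_partition(measurement, ts):
    """Map a record's UTC timestamp to an hourly object-store
    prefix (hypothetical layout, not Arc's real one)."""
    return f"{measurement}/{ts:%Y/%m/%d/%H}/data.parquet"

path = hourly_partition(
    "cpu", datetime(2025, 10, 7, 16, 40, tzinfo=timezone.utc))
```

DuckDB's `read_parquet` accepts globs over such prefixes, so a
whole hour or day of files can be scanned with a single SQL query.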
| nozzlegear wrote:
| Didn't that browser get mothballed by its devs?
| bl4kers wrote:
| The browser is dead anyway
| simlevesque wrote:
| I'll try this right now. I'm looking to self-host duckdb because
| MotherDuck is way too expensive.
| ignaciovdk wrote:
| Awesome, would love to hear what you think once you try it out!
|
| If it's not too much trouble, feel free to share feedback at
| ignacio [at] basekick [dot] net.
| Nesco wrote:
| Arc Browser, Arc Prize, Arc Institute and now the Arc Warehouse
|
| I'm afraid "Arc" has become too fashionable this decade, and
| using it might decrease brand visibility.
| whalesalad wrote:
| > Arc Core is designed with MinIO as the primary storage backend
|
| Noticing that all the benchmarking is being done with MinIO,
| which I presume is also running alongside/locally, so there is
| no network latency and it will be roughly as fast as whatever
| underlying disk it's operating from.
|
| Are there any benchmarks for using _actual_ S3 as the storage
| layer?
|
| How does Arc decide what to keep hot and local? TTL based?
| Frequency of access based?
|
| We're going to be evaluating ClickHouse with this sort of hot
| (local), cold (S3) configuration soon
| (https://clickhouse.com/docs/guides/separation-storage-comput...)
| but would like to evaluate other platforms if they are relevant.
| ignaciovdk wrote:
| Hey there, great questions.
|
| The benchmarks weren't run on the same machine as MinIO, but on
| the same network, connected over a 1 Gbps switch, so there's a
| bit of real network latency, though still close to local-disk
| performance.
|
| We've also tried a true remote setup before (compute around
| ~160 ms away from AWS S3). I plan to rerun that scenario soon
| and publish the updated results for transparency.
|
| Regarding "hot vs. cold" data, Arc doesn't maintain separate
| tiers in the traditional sense. All data lives in the
| S3-compatible storage (MinIO or AWS S3), and we rely on caching
| for repeated query patterns instead of a separate local tier.
|
| In practice, Arc performs better than ClickHouse when using S3
| as the primary storage layer. ClickHouse can scan faster in
| pure analytical workloads, but Arc tends to outperform it on
| time-range-based queries (typical in observability and IoT).
|
| I'll post the new benchmark numbers in the next few days; they
| should give a clearer picture of the trade-offs.
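[Ed.: the "caching for repeated query patterns" mentioned above can
be illustrated with a toy memoization layer. This is illustrative
only: run_query is a stand-in for a real engine call, and Arc's
actual cache design is not described in this thread.]

```python
import functools
import time

def run_query(sql):
    # Stand-in for a real scan over Parquet objects (hypothetical).
    time.sleep(0.01)  # simulate I/O and scan cost
    return f"result of {sql}"

@functools.lru_cache(maxsize=128)
def cached_query(sql):
    # Identical SQL strings hit the cache; new text falls through
    # to run_query. Repeated dashboards/queries benefit most.
    return run_query(sql)
```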
___________________________________________________________________
(page generated 2025-10-07 23:01 UTC)