[HN Gopher] Jepsen: Bufstream 0.1
___________________________________________________________________
Jepsen: Bufstream 0.1
Author : aphyr
Score : 172 points
Date : 2024-11-12 14:12 UTC (8 hours ago)
(HTM) web link (jepsen.io)
(TXT) w3m dump (jepsen.io)
| pmdulaney wrote:
| What is this software used for? Instrumentation? Black boxes?
| smw wrote:
| It's a Kafka clone. Kafka is a durable queue, mostly.
| sureglymop wrote:
| What is a durable queue and why is it needed (instead of a
| traditional relational db)?
| cyberax wrote:
| RDBs suck for many-to-many high-availability messaging.
| baq wrote:
| Jepsen is for making you cry if you didn't know they're testing
| the database you develop. Of course, those are tears of joy,
| because having jepsen's attention is an achievement in itself.
| SahAssar wrote:
| You describe jepsen as if it's BDSM for databases, which
| might not be too far off.
| refset wrote:
| _> The Kafka transaction protocol is fundamentally broken and
| must be revised._
|
| Ouch. Great investigation work and write-up, as ever!
| c2xlZXB5Cg1 wrote:
| Not to be confused with https://www.warpstream.com/
| perezd wrote:
| Correct. WarpStream doesn't even support transactions.
| jcgrillo wrote:
| Neither does any other Kafka protocol implementation,
| evidently ;)
| akshayshah wrote:
| Zing! Can't lose if you don't play ;)
| williamdclt wrote:
| I'm very surprised by this:
|
| > [with the default enable.auto.commit=true] Kafka consumers may
| automatically mark offsets as committed, regardless of whether
| they have actually been processed by the application. This means
| that a consumer can poll a series of records, mark them as
| committed, then crash--effectively causing those records to be
| lost
|
| That's never been my understanding of auto-commit; that would
| be a crazy default, wouldn't it?
|
| The docs say this:
|
| > when auto-commit is enabled, every time the poll method is
| called and data is fetched, the consumer is ready to
| automatically commit the offsets of messages that have been
| returned by the poll. If the processing of these messages is not
| completed before the next auto-commit interval, there's a risk of
| losing the message's progress if the consumer crashes or is
| otherwise restarted. In this case, when the consumer restarts, it
| will begin consuming from the last committed offset. When this
| happens, the last committed position can be as old as the auto-
| commit interval. Any messages that have arrived since the last
| commit are read again. If you want to reduce the window for
| duplicates, you can reduce the auto-commit interval
|
| I don't find it amazingly clear, but overall my understanding
| from this is that offsets are committed _only_ if the processing
| finishes. Tuning the auto-commit interval helps with duplicate
| processing, not with lost messages, as you'd expect for at-least-
| once processing.
| aphyr wrote:
| It is a little surprising, and I agree, the docs here are not
| doing a particularly good job of explaining it. It might help
| to ask: if you don't explicitly commit, how does Kafka know
| when you've processed the messages it gave you? It doesn't! It
| assumes any message it hands you is instantaneously processed.
|
| Auto-commit is a bit like handing someone an ice cream cone,
| then immediately walking away and assuming they ate it.
| Sometimes people drop their ice cream immediately after you
| hand it to them, and never get a bite.
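The semantics aphyr describes can be sketched with a toy model (illustrative only; `ToyConsumer` and its fields are invented for this sketch, not the real Kafka client API): offsets returned by `poll` are treated as committed regardless of whether the application ever processed them, so a crash mid-batch loses records.

```python
# Toy model of Kafka-style auto-commit (illustrative only; not the real
# client). Offsets returned by poll() are marked committed regardless of
# whether the application actually processed them.

class ToyConsumer:
    def __init__(self, log):
        self.log = log            # the partition's records
        self.position = 0         # next offset to fetch
        self.committed = 0        # offset the group would restart from

    def poll(self, max_records=2):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        # auto-commit: the client considers these records "done" as soon
        # as they are handed to the application
        self.committed = self.position
        return batch

log = ["a", "b", "c", "d"]
consumer = ToyConsumer(log)
processed = []

batch = consumer.poll()           # returns ["a", "b"], commits offset 2
processed.append(batch[0])        # app processes "a"...
# ...then crashes before processing "b"

# A replacement consumer resumes from the committed offset:
consumer2 = ToyConsumer(log)
consumer2.position = consumer.committed
processed.extend(consumer2.poll())  # returns ["c", "d"]

print(processed)                  # ['a', 'c', 'd'] -- "b" is lost
```

In this model "b" was handed out, marked committed, and never durably processed, which is exactly the window the report describes.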
| dangoodmanUT wrote:
| This, it has no idea that you processed the message. It
| assumes processing is successful by default which is
| cosmically stupid.
| williamdclt wrote:
| > if you don't explicitly commit, how does Kafka know when
| you've processed the messages it gave you?
|
| I did expect that auto-commit still involved an explicit
| commit. I expected that it meant that the consumer side would
| commit _after_ processing a message/batch _if_ it had been >=
| autocommit_interval since the last commit. In other words,
| that it was a functionality baked into the Kafka client
| library (which does know when a message has been processed by
| the application). I don't know if it really makes sense, I
| never really thought hard about it before!
|
| I'm still a bit skeptical... I'm pretty sure (although not
| positive) that I've seen consumers with autocommit being
| stuck because of timeouts that were much greater than the
| autocommit interval, and yet retrying the same message in a
| loop
| aphyr wrote:
| Here's a good article from New Relic on the problem, if
| you'd like more detail: https://newrelic.com/blog/best-
| practices/kafka-consumer-conf...
|
| Or here, you can reproduce it yourself using the Bufstream
| or Redpanda/Kafka test suite. Here's a real quick run I
| just dashed off. You can watch it skip over writes: https:/
| /gist.github.com/aphyr/1af2c4eef9aacde7f08f1582304908...
|
| lein run test --enable-auto-commit --bin
| bufstream-0.1.3-rc.12 --time-limit 30 --txn --final-time-
| limit 1/10000
| justinsaccount wrote:
| Weird, I would have guessed that it auto commits the previous
| batch when it polls for the next batch, meaning it would be
| like:
|
|     loop:
|         messages = poll()  # poll returns new messages and
|                            # commits previous batch
|         process(messages)
|
| but it sounds like it "poll returns new messages and
| immediately commits them."
| williamdclt wrote:
| Information on the internet about this seems unreliable,
| confusing and contradictory... It's crazy for something so
| critical, especially when it's enabled by default.
| jakewins wrote:
| Auto commit has always seemed super shady. Manual commit I
| have assumed is safe though - something something vector
| clocks - and it'd be really interesting to know if that trust
| is misplaced.
|
| What is the process and cost for having you do a Jepsen test
| for something like that?
| aphyr wrote:
| You'll find lots about the Jepsen analysis process here:
| https://jepsen.io/services/analysis
| th0ma5 wrote:
| It is my understanding that the reason for this is high-
| performance situations. You have some other system that can
| figure out if something failed, but with this feature you can
| move the high-water mark so that you don't have to redo as
| much. If you get the timing right and there is a failure, you
| can assume that when you restart you'll be getting some stuff
| that you already processed. The problem is when processing
| hasn't finished before the auto-commit. It is meant to happen
| well after processing, in my reading, but it does certainly
| seem like there's a contradiction: it auto-commits, but only
| stuff from so many milliseconds before the auto-commit time?
| kevstev wrote:
| It is splitting hairs in some sense, but the key concept here
| is that just because a message was delivered to the Kafka
| client successfully does not mean it was processed by the
| application.
|
| You will have to explicitly ack if you want that guarantee. For
| a concrete example, let's say all you do with a message is write
| it to a database. As soon as that message is in your client
| handler callback, that message is ack'ed. But you probably only
| want that ack to happen after a successful insert into the DB.
| The most likely scenario here to cause unprocessed messages is
| that the DB is down for whatever reason (maybe a network link
| is down, or k8s or even a firewall config now prevents you from
| accessing), and at some point during this your client goes
| down, maybe by an eng attempting a restart to see if the
| problem goes away.
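The manual-ack pattern kevstev describes can be sketched with a toy model (illustrative names, not the real Kafka API): commit an offset only after the durable write succeeds, so a crash causes redelivery of the in-flight record rather than loss -- the classic at-least-once trade-off.

```python
# Toy at-least-once consumer (illustrative; not the real Kafka API):
# commit an offset only after the record has been durably processed,
# e.g. inserted into a database.

def run_consumer(log, db, committed, crash_before_commit_at=None):
    """Process records starting at `committed`; optionally simulate a
    crash after the durable write at a given offset but before its
    commit. Returns the new committed offset."""
    position = committed
    while position < len(log):
        record = log[position]
        db.append(record)                     # durable write first...
        if position == crash_before_commit_at:
            return committed                  # crash: offset not committed
        committed = position + 1              # ...then commit the offset
        position = committed
    return committed

log = ["a", "b", "c"]
db = []

# First run crashes after writing "b" but before committing its offset:
committed = run_consumer(log, db, committed=0, crash_before_commit_at=1)
# A restarted consumer resumes from the last committed offset and
# re-handles "b":
committed = run_consumer(log, db, committed)

print(db)  # ['a', 'b', 'b', 'c'] -- a duplicate, but nothing lost
```

This is the inverse of the auto-commit failure mode: you trade silent loss for possible duplicates, which the DB insert can usually deduplicate with an idempotency key.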
| necubi wrote:
| I can maybe give some justification for why this feature
| exists. It's designed for synchronous, single-threaded
| consumers which do something like this:
|
|     loop {
|         1. Call poll
|         2. Durably process the messages
|     }
|
| I think a point of confusion here is that the auto-commit check
| happens on the next call to poll--not asynchronously after the
| timeout. So you should only be able to drop writes if you are
| storing the messages without durably processing them (which
| includes any kind of async/defer/queues/etc.) before calling
| poll again.
|
| (I should say--this is the documented behavior for the Java
| client library[0]--it's possible that it's not actually the
| behavior that's implemented today.)
|
| The Kafka protocol is torn between being high-level and low-
| level, and as a result it does neither particularly well. Auto
| commit is a high-level feature that aims to make it easier to
| build simple applications without needing to really understand
| all of the moving pieces, but obviously can fail if you don't
| use it as expected.
|
| I'd argue that today end users shouldn't be using the Kafka
| client directly--use a proper high level implementation that
| will get the details right for you (for data use cases this is
| probably a stream processing engine, for application use cases
| it's something like a durable execution engine).
|
| [0]
| https://kafka.apache.org/32/javadoc/org/apache/kafka/clients...
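necubi's commit-on-next-poll reading can also be sketched as a toy model (again illustrative, not the real client): the auto-commit check runs at the top of `poll`, committing the previous batch. That is safe for a synchronous loop, but dropping records into an async queue before re-polling still loses them on a crash.

```python
# Toy model of commit-on-next-poll auto-commit (illustrative; not the
# real client): the interval check runs inside poll(), so the *previous*
# batch is committed at the top of the next loop iteration.

import collections

class ToyConsumer:
    def __init__(self, log):
        self.log = log
        self.position = 0
        self.committed = 0
        self.last_returned = 0    # end offset of the previous batch

    def poll(self, max_records=2):
        # auto-commit check: commit everything returned by the
        # previous poll before fetching more
        self.committed = self.last_returned
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        self.last_returned = self.position
        return batch

log = ["a", "b", "c", "d"]
pending = collections.deque()     # async hand-off: NOT durably processed
consumer = ToyConsumer(log)

batch1 = consumer.poll()          # ["a", "b"]; nothing committed yet
pending.extend(batch1)            # queued for async work, not yet durable
batch2 = consumer.poll()          # ["c", "d"]; commits batch1's offsets
# Crash here: "a" and "b" are committed but only sat in a queue, so a
# restarted consumer (resuming from offset 2) never sees them again.
print(consumer.committed)         # 2
```

Under this model a strictly synchronous process-then-poll loop never commits unprocessed records, matching the documented Java-client behavior necubi cites.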
| --
| bobnamob wrote:
| > We would like to combine Jepsen's workload generation and
| history checking with Antithesis' deterministic and replayable
| environment to make our tests more reproducible.
|
| For those unaware, Antithesis was founded by some of the folks
| who worked on FoundationDB - see
| https://youtu.be/4fFDFbi3toc?si=wY_mrD63fH2osiU- for some of
| their handiwork.
|
| A Jepsen + Antithesis team up is something the world needs right
| now, specifically on the back of the Horizon Post Office scandal.
|
| Thanks for all your work highlighting the importance of db safety
| Aphyr
| bobnamob wrote:
| Furthermore, I'm aware of multiple banks currently using Kafka.
| One would hope that they're not using it in their core banking
| system given Kyle's findings
|
| Maybe they'd be interested in funding a Jepsen ~attack~
| experiment on Kafka
| kevstev wrote:
| As someone who was very deep into Kafka in the not too
| distant past, I am surprised I have no idea what you are
| referring to- can you enlighten me?
| diggan wrote:
| Read the "Future Work" section in the bottom of the post
| for the gist, and also section 5.3.
| kevstev wrote:
| I see. I never trusted transactions and advised our app
| teams to not rely on them, at least without outside
| verification of them.
|
| The situation is actually far worse with any client
| relying on librdkafka. Some of this has been fixed, but
| my company found at least a half dozen bugs/incomplete
| features in librdkafka, mostly around retryable errors
| that were sometimes ignored, sometimes caused exceptions,
| and other times just straight hung clients.
|
| Despite our company leaning heavily on Confluent to force
| librdkafka to get to parity with the Java client, it was
| always behind, and in general we started adopting a
| stance of not implementing any business critical
| functions on any feature implemented in the past year or
| major release.
| EdwardDiego wrote:
| Yeah, Confluent has really dropped the ball on librdkafka
| dev of late. :(
| philprx wrote:
| I didn't find the GitHub project for bufstream... Any clue?
| aphyr wrote:
| Ack, pardon me. That should be fixed now!
| xiasongh wrote:
| Found this on their website https://github.com/bufbuild/buf
| bpicolo wrote:
| They seem to have pivoted from protobuf tools to kafka
| alternatives. I don't think bufstream is OSS (yet). Or at
| least, they have very much de-emphasized their original
| offering on their site.
| perezd wrote:
| Nope! We're still heavily investing in scaling Protobuf. In
| fact, our data quality guarantees built into Bufstream are
| powered by Protobuf! This is simply an extension of what we
| do...Connect RPC, Buf CLI, etc.
|
| Don't read too much into the website :)
| bpicolo wrote:
| Good to know. Good proto tooling is still high value :)
| mdaniel wrote:
| I don't think bufstream itself is open source but there's
| https://github.com/bufbuild/bufstream-demo which may be close
| to what you want _(but is also unlicensed, bizarrely)_
| perezd wrote:
| That's correct. Bufstream is not open source, but we do have
| a demo that you can try. I've asked the team to include a
| proper LICENSE file as well. Thanks for catching that!
| diggan wrote:
| > While investigating issues like KAFKA-17754, we also
| encountered unseen writes in Kafka. Owing to time constraints we
| have not investigated this behavior, but unseen writes could be a
| sign of hanging transactions, stuck consumers, or even data loss.
| We are curious whether a delayed Produce message could slide into
| a future transaction, violating transactional guarantees. We also
| suspect that the Kafka Java Client may reuse a sequence number
| when a request times out, causing writes to be acknowledged but
| silently discarded. More Kafka testing is warranted.
|
| Seems like Jepsen should do another Kafka deep-dive. Last time
| was in 2013 (https://aphyr.com/posts/293-call-me-maybe-kafka,
| Kafka version 0.8 beta) and seems like they're on the verge of
| discovering a lot of issues in Kafka itself. Things like "causing
| writes to be acknowledged but silently discarded" sounds very
| scary.
| aphyr wrote:
| I would love to do a Kafka analysis. :-)
| jwr wrote:
| I'm still hoping Apple (or Snowflake) will pay you to do an
| analysis of FoundationDB...
| tptacek wrote:
| I do too, but doesn't FDB already do a lot of the same kind
| of testing?
| kasey_junk wrote:
| They are famous for doing simulation testing.
| https://antithesis.com/ Have recently brought to market a
| simulation testing product.
| SahAssar wrote:
| I think they do similar testing, and therefore it might
| be even more interesting to read what Kyle thinks of
| their different approaches to it.
| jwr wrote:
| Yes. But going through Jepsen and surviving is different.
| Gives an entirely new reputation to a database.
| kasey_junk wrote:
| I don't think aphyr would disagree with me when I say
| that FDB's testing regime is the gold standard and Jepsen
| is trying to get there, not the other way around.
| aphyr wrote:
| I'm not sure. I've worked on a few projects now which
| employed simulation testing and passed, only to discover
| serious bugs using Jepsen. State space exploration and
| oracle design are hard problems, and I'm not convinced
| there's a single, ideal path for DB testing that subsumes
| all others. I prefer more of a "complete breakfast"
| approach.
| kasey_junk wrote:
| Has your opinion changed on that in the last few years? I
| could have sworn you were on record as saying this about
| foundation in the past but I couldn't find it in my
| links.
| diggan wrote:
| Not that I'm a Kafka user, but I greatly appreciate your
| posts, so thank you :)
|
| Maybe Kafka users should do a crowdfund for it if the
| companies aren't willing. Realistically, what would the goal
| of the crowdfund have to be for you to consider it?
| monksy wrote:
| I would love to read your Kafka analysis
| didip wrote:
| Has Kyle reviewed NATS Jetstream? I wonder what he thinks of it.
| aphyr wrote:
| I have not yet, though you're not the first to ask. Some folks
| have suggested it might be... how do you say... fun? :-)
| speedgoose wrote:
| If you are looking for fun targets, may I suggest KubeMQ too?
| Its author claims that it's better than Kafka, Redis and
| RabbitMQ. It's also "kubernetes native" but the open source
| version refuses to start if it detects kubernetes.
| SahAssar wrote:
| > It's also "kubernetes native" but the open source version
| refuses to start if it detects kubernetes.
|
| I thought you were kidding, but this is crazy.
| https://github.com/kubemq-io/kubemq-community/issues/32
|
| And it seems like you cannot even see pricing without
| signing up or contacting their sales:
| https://kubemq.io/product-pricing/
| mathfailure wrote:
| This is just pure gold of an anecdote :-)
| Bnjoroge wrote:
| has warpstream been reviewed?
| aphyr wrote:
| Nope. You'll find a full list of analyses here:
| https://jepsen.io/analyses
| Kwpolska wrote:
| I'm looking at the product page [0] and wondering how those two
| statements are compatible:
|
| > Bufstream runs fully within your AWS or GCP VPC, giving you
| complete control over your data, metadata, and uptime. Unlike the
| alternatives, Bufstream never phones home.
|
| > Bufstream pricing is simple: just $0.002 per uncompressed GiB
| written (about $2 per TiB). We don't charge any per-core, per-
| agent, or per-call fees.
|
| Surely they wouldn't run their entire business on the honor
| system?
|
| [0] https://buf.build/product/bufstream
| c0balt wrote:
| Based on the introduction
|
| > As of October 2024, Bufstream was deployed only with select
| customers.
|
| my assumption would be that an honor system might be doable. They
| are exposing themselves to risk of abuse of course but it might
| be a worthy trade off for getting certain clients on board.
| perezd wrote:
| That's correct. We hop onto Zoom calls with our customers on
| an agreed cadence, and they share a billing report with us to
| confirm usage/metering. For enterprise customers
| specifically, it works great. They don't want to violate
| contracts, and it also gives us a natural check-in point to
| ensure things are going smoothly with their deployment.
|
| When we say fully air-gapped, we mean it!
| mathfailure wrote:
| A program is either open source or not. When its sources
| aren't available, one should never trust "it doesn't phone
| home" claims.
___________________________________________________________________
(page generated 2024-11-12 23:01 UTC)