[HN Gopher] Jepsen: Bufstream 0.1
       ___________________________________________________________________
        
       Jepsen: Bufstream 0.1
        
       Author : aphyr
       Score  : 172 points
       Date   : 2024-11-12 14:12 UTC (8 hours ago)
        
 (HTM) web link (jepsen.io)
 (TXT) w3m dump (jepsen.io)
        
       | pmdulaney wrote:
       | What is this software used for? Instrumentation? Black boxes?
        
         | smw wrote:
          | It's a Kafka clone. Kafka is a durable queue, mostly.
        
           | sureglymop wrote:
           | What is a durable queue and why is it needed (instead of a
           | traditional relational db)?
        
             | cyberax wrote:
             | RDBs suck for many-to-many high-availability messaging.
        
         | baq wrote:
         | Jepsen is for making you cry if you didn't know they're testing
         | the database you develop. Of course, those are tears of joy,
          | because having Jepsen's attention is an achievement in itself.
        
           | SahAssar wrote:
            | You describe Jepsen as if it's BDSM for databases, which
           | might not be too far off.
        
       | refset wrote:
       | _> The Kafka transaction protocol is fundamentally broken and
       | must be revised._
       | 
       | Ouch. Great investigation work and write-up, as ever!
        
       | c2xlZXB5Cg1 wrote:
       | Not to be confused with https://www.warpstream.com/
        
         | perezd wrote:
         | Correct. WarpStream doesn't even support transactions.
        
           | jcgrillo wrote:
           | Neither does any other Kafka protocol implementation,
           | evidently ;)
        
             | akshayshah wrote:
             | Zing! Can't lose if you don't play ;)
        
       | williamdclt wrote:
       | I'm very surprised by this:
       | 
       | > [with the default enable.auto.commit=true] Kafka consumers may
       | automatically mark offsets as committed, regardless of whether
       | they have actually been processed by the application. This means
       | that a consumer can poll a series of records, mark them as
       | committed, then crash--effectively causing those records to be
       | lost
       | 
        | That's never been my understanding of auto-commit; that would be
        | a crazy default, wouldn't it?
       | 
       | The docs say this:
       | 
       | > when auto-commit is enabled, every time the poll method is
       | called and data is fetched, the consumer is ready to
       | automatically commit the offsets of messages that have been
       | returned by the poll. If the processing of these messages is not
       | completed before the next auto-commit interval, there's a risk of
       | losing the message's progress if the consumer crashes or is
       | otherwise restarted. In this case, when the consumer restarts, it
       | will begin consuming from the last committed offset. When this
       | happens, the last committed position can be as old as the auto-
       | commit interval. Any messages that have arrived since the last
       | commit are read again. If you want to reduce the window for
       | duplicates, you can reduce the auto-commit interval
       | 
       | I don't find it amazingly clear, but overall my understanding
       | from this is that offsets are committed _only_ if the processing
       | finishes. Tuning the auto-commit interval helps with duplicate
       | processing, not with lost messages, as you'd expect for at-least-
       | once processing.
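        | 
        | For reference, these are the knobs in question. A minimal sketch
        | of a consumer config using the Java client (broker address and
        | group id are made up for illustration):
        | 
        |     import java.util.Properties;
        |     import org.apache.kafka.clients.consumer.KafkaConsumer;
        |     import org.apache.kafka.common.serialization.StringDeserializer;
        | 
        |     Properties props = new Properties();
        |     props.put("bootstrap.servers", "localhost:9092"); // illustrative
        |     props.put("group.id", "example-group");           // illustrative
        |     props.put("key.deserializer", StringDeserializer.class.getName());
        |     props.put("value.deserializer", StringDeserializer.class.getName());
        |     // The defaults under discussion: the client commits offsets
        |     // itself, roughly every auto.commit.interval.ms, during poll().
        |     props.put("enable.auto.commit", "true");      // the default
        |     props.put("auto.commit.interval.ms", "5000"); // the default
        |     KafkaConsumer<String, String> consumer =
        |         new KafkaConsumer<>(props);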
        
         | aphyr wrote:
         | It is a little surprising, and I agree, the docs here are not
         | doing a particularly good job of explaining it. It might help
         | to ask: if you don't explicitly commit, how does Kafka know
         | when you've processed the messages it gave you? It doesn't! It
         | assumes any message it hands you is instantaneously processed.
         | 
         | Auto-commit is a bit like handing someone an ice cream cone,
         | then immediately walking away and assuming they ate it.
         | Sometimes people drop their ice cream immediately after you
         | hand it to them, and never get a bite.
        
           | dangoodmanUT wrote:
           | This, it has no idea that you processed the message. It
           | assumes processing is successful by default which is
           | cosmically stupid.
        
           | williamdclt wrote:
           | > if you don't explicitly commit, how does Kafka know when
           | you've processed the messages it gave you?
           | 
           | I did expect that auto-commit still involved an explicit
           | commit. I expected that it meant that the consumer side would
           | commit _after_ processing a message/batch _if_ it had been >=
           | autocommit_interval since the last commit. In other words,
           | that it was a functionality baked into the Kafka client
           | library (which does know when a message has been processed by
           | the application). I don't know if it really makes sense, I
           | never really thought hard about it before!
           | 
           | I'm still a bit skeptical... I'm pretty sure (although not
           | positive) that I've seen consumers with autocommit being
           | stuck because of timeouts that were much greater than the
           | autocommit interval, and yet retrying the same message in a
           | loop
        
             | aphyr wrote:
             | Here's a good article from New Relic on the problem, if
             | you'd like more detail: https://newrelic.com/blog/best-
             | practices/kafka-consumer-conf...
             | 
             | Or here, you can reproduce it yourself using the Bufstream
             | or Redpanda/Kafka test suite. Here's a real quick run I
             | just dashed off. You can watch it skip over writes: https:/
             | /gist.github.com/aphyr/1af2c4eef9aacde7f08f1582304908...
             | 
              | lein run test --enable-auto-commit --bin bufstream-0.1.3-rc.12
              | --time-limit 30 --txn --final-time-limit 1/10000
        
           | justinsaccount wrote:
           | Weird, I would have guessed that it auto commits the previous
           | batch when it polls for the next batch, meaning it would be
            | like
            | 
            |     loop:
            |         # poll returns new messages and commits previous batch
            |         messages = poll()
            |         process(messages)
           | 
            | but it sounds like it's actually "poll returns new messages
            | and immediately commits them."
        
             | williamdclt wrote:
             | Information on the internet about this seems unreliable,
             | confusing and contradictory... It's crazy for something so
             | critical, especially when it's enabled by default.
        
           | jakewins wrote:
           | Auto commit has always seemed super shady. Manual commit I
           | have assumed is safe though - something something vector
           | clocks - and it'd be really interesting to know if that trust
           | is misplaced.
           | 
           | What is the process and cost for having you do a Jepsen test
           | for something like that?
        
             | aphyr wrote:
             | You'll find lots about the Jepsen analysis process here:
             | https://jepsen.io/services/analysis
        
         | th0ma5 wrote:
          | It is my understanding that the reason this exists is high-
          | performance situations. You have some other system that can
          | figure out if something failed, but with this feature you can
          | move the high-water mark so that you don't have to redo as much.
          | If you got the timing right and there is a failure, you can go
          | ahead and assume that when you restart you'll be getting some
          | stuff that you already processed. The problem is when processing
          | isn't finished before the auto-commit. It is meant to be done
          | well after processing, in my reading of it, but it does
          | certainly seem like there's a contradiction: it should auto-
          | commit, but only records from so many milliseconds before the
          | auto-commit time?
        
         | kevstev wrote:
          | It is splitting hairs in some sense, but the key concept here
          | is that just because the message was delivered to the Kafka
          | client successfully does not mean it was processed by the
          | application.
         | 
          | You will have to explicitly ack if you want that guarantee. For
          | a concrete example, let's say all you do with a message is write
          | it to a database. As soon as that message is in your client
          | handler callback, that message is ack'ed. But you probably only
          | want that ack to happen after a successful insert into the DB.
          | The most likely scenario here to cause unprocessed messages is
          | that the DB is down for whatever reason (maybe a network link
          | is down, or k8s or even a firewall config now prevents you from
          | accessing it), and at some point during this your client goes
          | down, maybe because an eng attempts a restart to see if the
          | problem goes away.
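          | 
          | A rough sketch of that pattern with the Java client (manual
          | commits; props is a config like the one sketched upthread with
          | enable.auto.commit=false, the "orders" topic and insertIntoDb
          | are hypothetical stand-ins, and imports are omitted):
          | 
          |     try (KafkaConsumer<String, String> consumer =
          |              new KafkaConsumer<>(props)) {
          |         consumer.subscribe(List.of("orders"));
          |         while (true) {
          |             ConsumerRecords<String, String> records =
          |                 consumer.poll(Duration.ofMillis(500));
          |             for (ConsumerRecord<String, String> r : records) {
          |                 insertIntoDb(r); // may throw if the DB is down
          |             }
          |             // Ack only after the whole batch is durably stored.
          |             consumer.commitSync();
          |         }
          |     }
          | 
          | If the insert throws, nothing gets committed and the same
          | records are re-delivered after a restart: at-least-once, with
          | possible duplicates rather than silent loss.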
        
         | necubi wrote:
         | I can maybe give some justification for why this feature
         | exists. It's designed for synchronous, single-threaded
          | consumers which do something like this:
          | 
          |     loop {
          |         1. Call poll
          |         2. Durably process the messages
          |     }
         | 
         | I think a point of confusion here is that the auto-commit check
         | happens on the next call to poll--not asynchronously after the
         | timeout. So you should only be able to drop writes if you are
         | storing the messages without durably processing them (which
         | includes any kind of async/defer/queues/etc.) before calling
         | poll again.
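          | 
          | For illustration, the shape that can drop writes under auto-
          | commit is roughly this (Java client; workerQueue is a
          | hypothetical in-memory handoff, imports omitted):
          | 
          |     while (true) {
          |         ConsumerRecords<String, String> records =
          |             consumer.poll(Duration.ofMillis(500));
          |         for (ConsumerRecord<String, String> r : records) {
          |             workerQueue.offer(r); // handed off, not yet durable
          |         }
          |         // The next poll() may auto-commit these offsets even if
          |         // the workers haven't finished; crash here or in a
          |         // worker and the records are still marked as consumed.
          |     }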
         | 
         | (I should say--this is the documented behavior for the Java
         | client library[0]--it's possible that it's not actually the
         | behavior that's implemented today.)
         | 
         | The Kafka protocol is torn between being high-level and low-
         | level, and as a result it does neither particularly well. Auto
         | commit is a high-level feature that aims to make it easier to
         | build simple applications without needing to really understand
         | all of the moving pieces, but obviously can fail if you don't
         | use it as expected.
         | 
          | I'd argue that today end users shouldn't be using the Kafka
          | client directly--use a proper high-level implementation that
          | will get the details right for you (for data use cases this is
          | probably a stream processing engine, for application use cases
          | it's something like a durable execution engine).
         | 
         | [0]
         | https://kafka.apache.org/32/javadoc/org/apache/kafka/clients...
        
       | bobnamob wrote:
       | > We would like to combine Jepsen's workload generation and
       | history checking with Antithesis' deterministic and replayable
       | environment to make our tests more reproducible.
       | 
       | For those unaware, Antithesis was founded by some of the folks
       | who worked on FoundationDB - see
       | https://youtu.be/4fFDFbi3toc?si=wY_mrD63fH2osiU- for some of
       | their handiwork.
       | 
       | A Jepsen + Antithesis team up is something the world needs right
       | now, specifically on the back of the Horizon Post Office scandal.
       | 
        | Thanks for all your work highlighting the importance of db
        | safety, Aphyr
        
         | bobnamob wrote:
         | Furthermore, I'm aware of multiple banks currently using Kafka.
         | One would hope that they're not using it in their core banking
         | system given Kyle's findings
         | 
         | Maybe they'd be interested in funding a Jepsen ~attack~
         | experiment on Kafka
        
           | kevstev wrote:
           | As someone who was very deep into Kafka in the not too
           | distant past, I am surprised I have no idea what you are
           | referring to- can you enlighten me?
        
             | diggan wrote:
             | Read the "Future Work" section in the bottom of the post
             | for the gist, and also section 5.3.
        
               | kevstev wrote:
               | I see. I never trusted transactions and advised our app
               | teams to not rely on them, at least without outside
               | verification of them.
               | 
               | The situation is actually far worse with any client
               | relying on librdkafka. Some of this has been fixed, but
               | my company found at least a half dozen bugs/uncompleted
               | features in librdkafka, mostly around retryable errors
               | that were sometimes ignored, sometimes caused exceptions,
               | and other times just straight hung clients.
               | 
               | Despite our company leaning heavily on Confluent to force
               | librdkafka to get to parity with the Java client, it was
               | always behind, and in general we started adopting a
               | stance of not implementing any business critical
               | functions on any feature implemented in the past year or
               | major release.
        
               | EdwardDiego wrote:
               | Yeah, Confluent has really dropped the ball on librdkafka
               | dev of late. :(
        
       | philprx wrote:
       | I didn't find the GitHub project for bufstream... Any clue?
        
         | aphyr wrote:
         | Ack, pardon me. That should be fixed now!
        
         | xiasongh wrote:
         | Found this on their website https://github.com/bufbuild/buf
        
           | bpicolo wrote:
            | They seem to have pivoted from protobuf tools to Kafka
           | alternatives. I don't think bufstream is OSS (yet). Or at
           | least, they have very much de-emphasized their original
           | offering on their site.
        
             | perezd wrote:
             | Nope! We're still heavily investing in scaling Protobuf. In
             | fact, our data quality guarantees built into Bufstream are
             | powered by Protobuf! This is simply an extension of what we
             | do...Connect RPC, Buf CLI, etc.
             | 
             | Don't read too much into the website :)
        
               | bpicolo wrote:
               | Good to know. Good proto tooling is still high value :)
        
         | mdaniel wrote:
         | I don't think bufstream itself is open source but there's
         | https://github.com/bufbuild/bufstream-demo which may be close
         | to what you want _(but is also unlicensed, bizarrely)_
        
           | perezd wrote:
           | That's correct. Bufstream is not open source, but we do have
           | a demo that you can try. I've asked the team to include a
           | proper LICENSE file as well. Thanks for catching that!
        
       | diggan wrote:
       | > While investigating issues like KAFKA-17754, we also
       | encountered unseen writes in Kafka. Owing to time constraints we
       | have not investigated this behavior, but unseen writes could be a
       | sign of hanging transactions, stuck consumers, or even data loss.
       | We are curious whether a delayed Produce message could slide into
       | a future transaction, violating transactional guarantees. We also
       | suspect that the Kafka Java Client may reuse a sequence number
       | when a request times out, causing writes to be acknowledged but
       | silently discarded. More Kafka testing is warranted.
       | 
       | Seems like Jepsen should do another Kafka deep-dive. Last time
       | was in 2013 (https://aphyr.com/posts/293-call-me-maybe-kafka,
        | Kafka version 0.8 beta), and it seems like they're on the verge of
       | discovering a lot of issues in Kafka itself. Things like "causing
       | writes to be acknowledged but silently discarded" sounds very
       | scary.
        
         | aphyr wrote:
         | I would love to do a Kafka analysis. :-)
        
           | jwr wrote:
           | I'm still hoping Apple (or Snowflake) will pay you to do an
           | analysis of FoundationDB...
        
             | tptacek wrote:
             | I do too, but doesn't FDB already do a lot of the same kind
             | of testing?
        
               | kasey_junk wrote:
                | They are famous for doing simulation testing. Antithesis
                | (https://antithesis.com/) has recently brought a
                | simulation testing product to market.
        
               | SahAssar wrote:
               | I think they do similar testing, and therefore it might
               | be even more interesting to read what Kyle thinks of
               | their different approaches to it.
        
               | jwr wrote:
               | Yes. But going through Jepsen and surviving is different.
               | Gives an entirely new reputation to a database.
        
               | kasey_junk wrote:
                | I don't think aphyr would disagree with me when I say
               | that FDB's testing regime is the gold standard and Jepsen
               | is trying to get there, not the other way around.
        
               | aphyr wrote:
               | I'm not sure. I've worked on a few projects now which
               | employed simulation testing and passed, only to discover
               | serious bugs using Jepsen. State space exploration and
               | oracle design are hard problems, and I'm not convinced
               | there's a single, ideal path for DB testing that subsumes
               | all others. I prefer more of a "complete breakfast"
               | approach.
        
               | kasey_junk wrote:
               | Has your opinion changed on that in the last few years? I
               | could have sworn you were on record as saying this about
               | foundation in the past but I couldn't find it in my
               | links.
        
           | diggan wrote:
           | Not that I'm a Kafka user, but I greatly appreciate your
           | posts, so thank you :)
           | 
           | Maybe Kafka users should do a crowdfund for it if the
           | companies aren't willing. Realistically, what would the goal
           | of the crowdfund have to be for you to consider it?
        
           | monksy wrote:
           | I would love to read your Kafka analysis
        
       | didip wrote:
       | Has Kyle reviewed NATS Jetstream? I wonder what he thinks of it.
        
         | aphyr wrote:
         | I have not yet, though you're not the first to ask. Some folks
         | have suggested it might be... how do you say... fun? :-)
        
           | speedgoose wrote:
           | If you are looking for fun targets, may I suggest KubeMQ too?
           | Its author claims that it's better than Kafka, Redis and
           | RabbitMQ. It's also "kubernetes native" but the open source
           | version refuses to start if it detects kubernetes.
        
             | SahAssar wrote:
             | > It's also "kubernetes native" but the open source version
             | refuses to start if it detects kubernetes.
             | 
             | I thought you were kidding, but this is crazy.
             | https://github.com/kubemq-io/kubemq-community/issues/32
             | 
             | And it seems like you cannot even see pricing without
             | signing up or contacting their sales:
             | https://kubemq.io/product-pricing/
        
               | mathfailure wrote:
               | This is just pure gold of an anecdote :-)
        
       | Bnjoroge wrote:
       | has warpstream been reviewed?
        
         | aphyr wrote:
         | Nope. You'll find a full list of analyses here:
         | https://jepsen.io/analyses
        
       | Kwpolska wrote:
       | I'm looking at the product page [0] and wondering how those two
       | statements are compatible:
       | 
       | > Bufstream runs fully within your AWS or GCP VPC, giving you
       | complete control over your data, metadata, and uptime. Unlike the
       | alternatives, Bufstream never phones home.
       | 
       | > Bufstream pricing is simple: just $0.002 per uncompressed GiB
       | written (about $2 per TiB). We don't charge any per-core, per-
       | agent, or per-call fees.
       | 
       | Surely they wouldn't run their entire business on the honor
       | system?
       | 
       | [0] https://buf.build/product/bufstream
        
         | c0balt wrote:
         | Based on the introduction
         | 
         | > As of October 2024, Bufstream was deployed only with select
         | customers.
         | 
          | my assumption would be that an honor system might be doable.
          | They are exposing themselves to a risk of abuse, of course, but
          | it might be a worthy trade-off for getting certain clients on
          | board.
        
           | perezd wrote:
           | That's correct. We hop onto Zoom calls with our customers on
           | an agreed cadence, and they share a billing report with us to
           | confirm usage/metering. For enterprise customers
           | specifically, it works great. They don't want to violate
           | contracts, and it also gives us a natural check-in point to
           | ensure things are going smoothly with their deployment.
           | 
           | When we say fully air-gapped, we mean it!
        
         | mathfailure wrote:
          | A program is either open source or not. When its sources aren't
          | available, one should never trust "it doesn't phone home"
          | claims.
        
       ___________________________________________________________________
       (page generated 2024-11-12 23:01 UTC)