[HN Gopher] Issues we've encountered while building a Kafka based data processing pipeline
       ___________________________________________________________________
        
       Issues we've encountered while building a Kafka based data
       processing pipeline
        
       Author : poolik
       Score  : 141 points
       Date   : 2021-10-18 09:22 UTC (13 hours ago)
        
 (HTM) web link (sixfold.medium.com)
 (TXT) w3m dump (sixfold.medium.com)
        
       | mfateev wrote:
        | temporal.io provides a much higher-level abstraction for building
        | asynchronous microservices. It allows one to model async
        | invocations as synchronous blocking calls of any duration
        | (months, for example). And the state updates and queueing are
        | transactional out of the box.
       | 
        | Here is an example using the TypeScript SDK:
        | 
        |     async function main(userId, intervals) {
        |       // Send reminder emails, e.g. after 1, 7, and 30 days
        |       for (const interval of intervals) {
        |         await sleep(interval * DAYS);
        |         // can take hours if the downstream service is down
        |         await activities.sendEmail(interval, userId);
        |       }
        |       // Easily cancelled when user unsubscribes
        |     }
       | 
       | Disclaimer: I'm one of the creators of the project.
        
       | LgWoodenBadger wrote:
       | I didn't quite follow their explanation for why producing to
       | Kafka first didn't/wouldn't work for them (db state potentially
       | being out of sync requiring continuous messaging until fixed).
        
         | anentropic wrote:
         | it's a chicken and egg problem
         | 
          | you can either send a kafka message but potentially fail to
          | commit the db transaction (i.e. an event is published for which
          | the action did not actually occur), or commit the db
          | transaction and potentially fail to send the kafka message
         | 
         | it sounds like they implemented something like the
         | Transactional Outbox pattern
         | https://microservices.io/patterns/data/transactional-outbox....
         | 
         | i.e. you use the db transaction to also commit a record of your
         | intent to send a kafka message - you can then move the actual
         | event sending to a separate process and implement at-least-once
         | semantics
         | 
         | This is the job queueing system they described in the article
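          | 
          | A rough sketch of the write side of that pattern, assuming
          | node-postgres ("pg") and a hypothetical outbox table (a
          | separate relay then publishes those rows to Kafka):
          | 
          |     import { Pool } from "pg";
          | 
          |     const pool = new Pool();
          | 
          |     // Commit the business change and the intent-to-publish
          |     // together, so neither can happen without the other. A
          |     // separate relay reads the outbox and produces to Kafka
          |     // with at-least-once semantics.
          |     async function placeOrder(orderId: string, payload: object) {
          |       const client = await pool.connect();
          |       try {
          |         await client.query("BEGIN");
          |         await client.query(
          |           "INSERT INTO orders (id, payload) VALUES ($1, $2)",
          |           [orderId, JSON.stringify(payload)]
          |         );
          |         await client.query(
          |           `INSERT INTO outbox (topic, key, value)
          |            VALUES ($1, $2, $3)`,
          |           ["orders", orderId, JSON.stringify(payload)]
          |         );
          |         await client.query("COMMIT");
          |       } catch (err) {
          |         await client.query("ROLLBACK");
          |         throw err;
          |       } finally {
          |         client.release();
          |       }
          |     }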
        
           | LgWoodenBadger wrote:
           | Their solution seems like a "produce to Kafka first" but with
           | extra steps.
           | 
           | Regarding:
           | 
           |  _When we produce first and the database update fails
           | (because of incorrect state) it means in the worst case we
           | enter a loop of continuously sending out duplicate messages
           | until the issue is resolved_
           | 
            | I don't understand where either 1) the incorrect state or 2)
            | the need to continuously send duplicate messages comes from.
           | 
           | Regarding:
           | 
           |  _The Job might still fail during execution, in which case
           | it's retried with exponential backoff, but at least no
           | updates are lost. While the issue persists, further state
           | change messages will be queued up also as Jobs (with same
           | group value). Once the (transient) issue resolves, and we can
           | again produce messages to Kafka, the updates would go out in
           | logical order for the rest of the system and eventually
           | everyone would be in sync._
           | 
           | This is the part that is equivalent to Kafka-first, except
           | with all the extra steps of a job scheduling, grouping,
           | tracking, and execution framework on top of it.
        
       | fafle wrote:
        | The issue of running a transaction that spans multiple
        | heterogeneous systems is usually solved with a two-phase commit
        | (2PC). The "jobs" abstraction from the article looks similar to
        | the "coordinator" in 2PC. The article does not talk about how
        | they achieve fault tolerance in case the "job" crashes in between
        | the two transactions. Postgres supports the XA standard, which
        | might help with this. Kafka does not support it.
        
         | jgraettinger1 wrote:
         | I can't speak to their solution, but when solving an equivalent
         | problem within Gazette, where you desire a distributed
         | transaction that includes both a) published downstream
         | messages, and b) state mutations in a DB, the solution is to 1)
         | write downstream messages marked as pending a future ACK, and
         | 2) encode the ACK you _intend_ to write into the checkpoint
         | itself.
         | 
         | Commit the checkpoint alongside state mutations in a single
         | store transaction. Only then do you publish ACKs to all of the
         | downstream streams.
         | 
         | Of course, you can fail immediately after commit but before you
         | get around to publishing all of those ACKS. So, on recovery,
         | the first thing a task assignment does is publish (or re-
         | publish) the ACKs encoded in the recovered checkpoint. This
         | will either 1) provide a first notification that a commit
         | occurred, or 2) be an effective no-op because the ACK was
         | already observed, or 3) roll-back pending messages of a
         | partial, failed transaction.
         | 
         | More details:
         | https://gazette.readthedocs.io/en/latest/architecture-exactl...
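          | 
          | In very hand-wavy TypeScript (all names hypothetical, not the
          | Gazette API; the link above describes the real protocol), the
          | recovery step is roughly:
          | 
          |     type Ack = { stream: string; txnId: number };
          |     type Checkpoint = { intendedAcks: Ack[] };
          | 
          |     // On recovery, first re-publish the ACKs recorded in the
          |     // recovered checkpoint. Readers either learn for the first
          |     // time that the commit happened, ignore a duplicate ACK, or
          |     // roll back pending messages of a transaction that never
          |     // committed.
          |     async function recover(
          |       checkpoint: Checkpoint,
          |       publishAck: (ack: Ack) => Promise<void>
          |     ): Promise<void> {
          |       for (const ack of checkpoint.intendedAcks) {
          |         await publishAck(ack);
          |       }
          |       // ...then resume: stage pending messages, commit the next
          |       // checkpoint together with state mutations in one store
          |       // transaction, and only afterwards publish its ACKs.
          |     }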
        
         | nitwit005 wrote:
         | The solution I've seen is to write the message you want to send
         | to the DB along with the transaction, and have some separate
         | thread that tries to send the messages to Kafka.
         | 
         | Although, from various code bases I've seen, a lot of people
         | just don't seem to worry about the possibility of data loss.
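          | 
          | A bare-bones sketch of that relay, assuming kafkajs, node-
          | postgres and a hypothetical outbox table; it is at-least-once,
          | so consumers must tolerate the occasional duplicate:
          | 
          |     import { Kafka } from "kafkajs";
          |     import { Pool } from "pg";
          | 
          |     const pool = new Pool();
          |     const producer = new Kafka({
          |       clientId: "outbox-relay",
          |       brokers: ["localhost:9092"],
          |     }).producer();
          | 
          |     // Publish pending rows, then delete them. A crash between
          |     // send() and DELETE just re-sends the row on the next pass.
          |     async function relayOnce() {
          |       const { rows } = await pool.query(
          |         "SELECT id, topic, key, value FROM outbox ORDER BY id LIMIT 100"
          |       );
          |       for (const row of rows) {
          |         await producer.send({
          |           topic: row.topic,
          |           messages: [{ key: row.key, value: row.value }],
          |         });
          |         await pool.query("DELETE FROM outbox WHERE id = $1", [row.id]);
          |       }
          |     }
          | 
          |     async function main() {
          |       await producer.connect();
          |       for (;;) {
          |         await relayOnce();
          |         await new Promise((r) => setTimeout(r, 1000)); // poll every second
          |       }
          |     }
          | 
          |     main().catch(console.error);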
        
         | eternalban wrote:
         | IIRC ~2 decades ago we were dequeueing from JMS, updating
         | RDBMS, and then enqueuing all under the cover of JTA (Java
         | Transaction API) for atomic ops.
         | 
         | https://docs.oracle.com/en/middleware/fusion-middleware/12.2...
         | 
         | Using a very broad definition of 'noSQL' approach that would
         | include solutions like Kafka, the issue becomes clear: A 2PC or
          | 'distributed transaction manager' approach a la JTA comes with a
         | performance/scalability cost -- arguably a non-issue for most
         | companies who don't operate at LinkedIn scale (where Kafka was
         | created).
        
           | zeckalpha wrote:
           | And MySQL didn't yet support transactions!
        
       | jpgvm wrote:
       | What you want is called Apache Pulsar. Log when you need it, work
       | queue when you need that instead.
       | 
        | As for ensuring transactional consistency, that isn't so bad: you
        | can use a table to track offset inserts, making sure you verify
        | against it before you update consumer offsets (or the Pulsar
        | subscription if you go that route).
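        | 
        | A sketch of that idea with node-postgres and a hypothetical
        | processed_offsets table (unique on topic/partition/offset):
        | 
        |     import { Pool } from "pg";
        | 
        |     const pool = new Pool();
        | 
        |     // Apply a message at most once on the DB side: the offset row
        |     // and the state change commit together, and a duplicate
        |     // delivery hits the unique constraint and is skipped.
        |     // Returns false for duplicates.
        |     async function applyOnce(
        |       topic: string,
        |       partition: number,
        |       offset: string,
        |       payload: string
        |     ): Promise<boolean> {
        |       const client = await pool.connect();
        |       try {
        |         await client.query("BEGIN");
        |         const res = await client.query(
        |           `INSERT INTO processed_offsets (topic, "partition", "offset")
        |            VALUES ($1, $2, $3) ON CONFLICT DO NOTHING`,
        |           [topic, partition, offset]
        |         );
        |         if (res.rowCount === 1) {
        |           await client.query(
        |             "INSERT INTO events (payload) VALUES ($1)",
        |             [payload]
        |           );
        |         }
        |         await client.query("COMMIT");
        |         return res.rowCount === 1;
        |       } catch (err) {
        |         await client.query("ROLLBACK");
        |         throw err;
        |       } finally {
        |         client.release();
        |       }
        |     }
        | 
        | Only once that commits would you ack the message to Pulsar (or
        | advance the Kafka consumer offset).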
        
       | jgraettinger1 wrote:
       | If you're in the Go ecosystem, Gazette [0] offers transactional
       | integrations [1] with remote DB's for stateful processing
       | pipelines, as well as local stores for embedded in-process state
       | management.
       | 
       | It also natively stores data as files in cloud storage. Brokers
       | are ephemeral, you don't need to migrate data between them, and
       | you're not constrained by their disk size. Gazette defaults to
       | exactly-once semantics, and has stronger replication guarantees
       | (your R factor is your R factor, period -- no "in sync
       | replicas").
       | 
       | Estuary Flow [2] is building on Gazette as an implementation
       | detail to offer end-to-end integrations with external SaaS & DB's
       | for building real-time dataflows, as a managed service.
       | 
       | [0]: https://github.com/gazette/core [1]:
       | https://gazette.readthedocs.io/en/latest/consumers-concepts....
       | [2]: https://github.com/estuary/flow
        
         | ivanr wrote:
         | Small suggestion: If Gazette is ready for a wider adoption, it
         | may be useful to bump it up to 1.0 as a signal of confidence.
        
       | anotherhue wrote:
       | I ran a few dozen kafka clusters at MegaCorp in a previous life.
       | 
       | My answer to anyone who asks for kafka: Show me that you can't do
       | what you need with a beefy Postgres.
        
         | afandian wrote:
          | I had a great time with Kafka for prototyping. Being able to
          | push data from a number of places, have multiple consumers able
          | to connect, go back and forth through time, add and remove
          | independent consumer groups. It ran in pre-production very
          | reliably too, for years.
         | 
         | But for a production-grade version of the system I'm going with
         | SQL and, where needed, IaC-defined SQS.
        
         | capableweb wrote:
         | > My answer to anyone who asks for kafka: Show me that you
         | can't do what you need with a beefy Postgres.
         | 
         | ...
         | 
         | You could probably hack any database to perform any task, but
         | why would you? Use the right tool for the right task, not one
         | tool for all tasks.
         | 
         | If the right tool is a relational database, then use
         | Postgres/$OTHER-DATABASE
         | 
          | If the right tool is a distributed, partitioned and replicated
         | commit log service, then use Kafka/$OTHER-COMMIT-LOG
         | 
          | Not sure why people get so emotional about the technologies they
         | know the best. Sure you could hack Postgres to be a replicated
         | commit log, but I'm sure it'll be easier to just throw in Kafka
         | instead.
        
         | ThinkBeat wrote:
         | This is exactly how I feel about it.
         | 
          | A while back I was on a team building a non-critical, low
          | volume application. It just involves people sending a message,
          | basically (there is more to it).
          | 
          | The consultants said we had to use Kafka because the messages
          | could come in really fast.
         | 
         | I said we should stick with Postgres.
         | 
          | No, they said, we really need Kafka to be able to handle this.
         | 
          | Then I went and spun up Postgres on my work laptop (nothing
          | special), and got a loaner to act as a client. I simulated
          | about 300% more traffic than we had any chance of getting. It
          | worked fine (it did tax my poor work laptop, though).
         | 
         | No, we could not risk it, when we use Kafka we are safe.
         | 
         | Took it to management, Kafka won since Buzzword.
         | 
          | Now of course we have to write a process to feed the data into
          | Postgres. After all, it's what everything else depends on.
        
           | sparsely wrote:
           | There are various Kafka to Postgres adaptors. Of course, now
           | you're running 3 bits of software, with 3 bottlenecks,
           | instead of just 1.
        
           | jghn wrote:
           | People tend to not realize how big "at scale" problems really
           | are. Instead anything at a scale at the edge of their
           | experience is "at scale" and they reach for the tools they've
           | read one is supposed to use in those situations. It makes
           | sense, people don't know what they don't know.
           | 
           | And thus we have a world where people have business needs
           | that could be powered by my low end laptop but solutions
           | inspired by the megacorps.
        
             | jerf wrote:
             | The GHz aspect of Moore's law died over a decade ago, and I
             | suppose it's fair to say most other stuff has also slowed
             | down, but if you've got a job that is embarrassingly
             | parallel, which a lot of these "big data" jobs are, people
             | badly underestimate how much progress there has been in the
             | server space even so in the last 10 years if they're not
             | paying attention. What was "big data" in 2011 can easily be
             | "spin up a single 32-core instance with 1TB RAM for a
             | couple of hours" in 2021. Even beyond the "big data" that
             | my laptop comfortably handles.
             | 
             | I'm slowly wandering into a data science role lately, and
             | I've been dealing with teams who are all kinds of concerned
             | about whether or not we can handle the sheer, overwhelming
             | volume of their (summary) data. "Well, let's see, how much
             | data are we talking about?" "Oh, gosh, we could generate
             | 200 or 300 megabytes a day." (Of uncompressed JSON.) Well,
             | you know, if I have to bust out a second Raspberry Pi I'll
             | be sure to charge it to your team.
             | 
             | The funny thing is that some of these teams have the
             | experience that they ought to know better. They are
             | legitimately running cloud services with dozens of large
             | nodes continually running at high utilization and chewing
             | through gigabytes of whatever per _second_. In their own
             | worlds they would absolutely know that a couple hundred
              | megabytes is _nothing_. They'll often have known places in
             | their stack where they burn through a few hundred megabytes
             | in internal API calls or something unnecessarily, and it
             | will barely rise to the level of a P3 bug, quite
             | legitimately so. But when they start thinking in terms of
             | (someone else's) databases it's like they haven't updated
             | their sense of size since 2005.
        
             | yongjik wrote:
             | You are not thinking enterprisey enough. Everything is Big
             | Scale if you add enough layers, because all these overheads
             | add up, to which the solution is of course more layers.
        
           | i_like_waiting wrote:
            | But can you stream data from, let's say, MS SQL directly into
            | Postgres? The easiest way I found is Kafka; I would love some
            | simple Python script instead.
        
             | BiteCode_dev wrote:
              | Stream? Unless you have really high traffic, put that in a
              | Python script while loop that regularly checks for new
              | rows, and it will be fine.
              | 
              | If you want to get fancy, DBs now have pub/sub.
              | 
              | There are use cases for stream replication, but you need
              | way more data than 99% of businesses have.
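              | 
              | For low volumes that loop can be as simple as this sketch
              | (the source query is left abstract; the destination uses
              | node-postgres):
              | 
              |     import { Pool } from "pg";
              | 
              |     const dest = new Pool();
              | 
              |     type Row = { id: number; data: string };
              | 
              |     // Poll the source for rows newer than the last one
              |     // copied and insert them into Postgres. ON CONFLICT
              |     // makes re-copies harmless.
              |     async function pollLoop(
              |       fetchAfter: (id: number) => Promise<Row[]>
              |     ) {
              |       let lastId = 0;
              |       for (;;) {
              |         for (const row of await fetchAfter(lastId)) {
              |           await dest.query(
              |             `INSERT INTO replicated (id, data)
              |              VALUES ($1, $2) ON CONFLICT (id) DO NOTHING`,
              |             [row.id, row.data]
              |           );
              |           lastId = row.id;
              |         }
              |         await new Promise((r) => setTimeout(r, 5000));
              |       }
              |     }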
        
           | dtech wrote:
           | Were you advocating an endpoint + DB or different apps
           | directly writing into a shared DB? The latter is not really a
            | good idea for numerous reasons. Kafka is a - potentially
            | overkill - replacement for REST or whatever, not for your DB.
        
         | ceencee wrote:
         | I see posts like this a lot, and it makes me wonder what the
         | heck you were using Kafka for that Postgres could handle, yet
         | you had dozens of clusters? I question if you actually ever
         | used Kafka or just operated it? Sure anyone can follow the
         | "build a queue on a database pattern" but it falls over at the
         | throughputs that justify Kafka. If you have a bunch of trivial
         | 10tps workloads, of course a distributed system is overkill.
        
         | silisili wrote:
         | Can you kinda high level the setup/processes for making
         | Postgres a replacement for Kafka? I've not attempted such a
         | thing before, and wonder about things like
         | expiration/autodeletion, etc. Does it need to be vacuumed
         | often, and is that a problem?
        
         | gilbetron wrote:
         | What velocity have you achieved with postgres, in terms of #
         | messages/sec where messages could range between 1KB-100KB in
         | size?
        
           | lima wrote:
           | Yup. That's what Kafka excels at and where it scales to
           | throughput way beyond Postgres.
           | 
           | But there are many many projects where Kafka is used for low
           | value event sourcing stuff where a SQL DB could be easier.
        
         | superyesh wrote:
         | >My answer to anyone who asks for kafka: Show me that you can't
         | do what you need with a beefy Postgres.
         | 
          | Sorry, that's just a clickbait-y statement. I love Postgres,
          | but try handling 100-500k rps of data coming in from various
          | sources reading and writing to it. You are going to get
          | bottlenecked on how many connections you can handle; you will
          | end up throwing PgBouncers on top of it.
          | 
          | Eventually you will run out of disk and start throwing more in.
          | 
          | Then you end up in VACUUM hell, all while having a single
          | point of failure.
         | 
          | While I agree Kafka has its own issues, it is an amazing tool
          | for a real scale problem.
        
           | fernandotakai wrote:
           | at least for me, every system has its place.
           | 
            | i love postgresql, but i would not use it to replace a
           | rabbitmq instance -- one is an RDBMS, the other is a
           | queue/event system.
           | 
           | "oh but psql can pretend to be kafka/rabbitmq!" -- sure, but
           | then you need to add tooling to it, create libraries to
           | handle it, and handle all the edge cases.
           | 
            | with rmq/kafka, there are already a bunch of tools to handle the
           | exact case of a queue/event system.
        
           | dwohnitmok wrote:
           | I think anotherhue would agree that half a million write
           | requests per second counts as a valid answer to "you can't do
           | what you need with a beefy Postgres," but that is also a
           | minority of situations.
        
             | IggleSniggle wrote:
             | It's just hard to know what people mean when they say "most
             | people don't need to do this." I was sitting wondering
             | about a similar scale (200-1000 rps), where I've had issues
             | with scaling rabbitmq, and have been thinking about whether
             | kafka might help.
             | 
             | Without context provided, you might think: "oh, here's
             | somebody with kafka and postgres experience, saying that
             | postgres has some other super powers I hadn't learned about
             | yet. Maybe I need to go learn me some more postgres and see
             | how it's possible."
             | 
             | It would be helpful for folks to provide generalized
             | measures of scale. "Right tool for the job," sure, but in
             | the case of postgres, it often feels like there are a lot
             | of incredible capabilities lurking.
             | 
             | I don't know what's normal for day-to-day software
             | engineers anymore. Was the parent comment describing
             | 100-500 rps really "a minority of situations?" I'm sure it
              | is for most _businesses_. But is it "the minority of
             | situations" that _software engineers_ are actively trying
             | to solve in 2021? I have no clue.
        
               | Serow225 wrote:
               | that seems like an awfully low number to be running into
                | issues with RabbitMQ?
        
           | doliveira wrote:
           | Yeah, once you do have to scale a relational database you're
           | in for a world of pain. Band-aid after band-aid... I very
           | much prefer to just start with Kafka already. At the very
           | least you'll have a buffer to help you gain some time when
           | the database struggles.
        
         | dionian wrote:
         | Honest question, how do you expire content in Postgres? Every
         | time I start to use it for ephemeral data I start to wonder if
         | I should have used TimescaleDB or if I should be using
         | something else...
        
           | LoriP wrote:
           | You likely already know that TimescaleDB is an extension to
           | PostgreSQL, so you get everything you'd get with PostgreSQL
           | plus the added goodies of TimescaleDB. All that said, you can
           | drop (or detach) data partitions in PostgreSQL (however you
           | decide to partition...) Does that not do the trick for your
           | use case, though? https://www.postgresql.org/docs/10/ddl-
           | partitioning.html
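            | 
            | For example, something along these lines (a sketch; the table
            | and partition names are made up):
            | 
            |     import { Pool } from "pg";
            | 
            |     const pool = new Pool();
            | 
            |     // Expire old data by dropping a whole monthly partition at
            |     // once -- no per-row DELETEs and no big vacuum afterwards.
            |     async function dropMonth(partition: string) {
            |       await pool.query(`ALTER TABLE events DETACH PARTITION ${partition}`);
            |       await pool.query(`DROP TABLE ${partition}`);
            |     }
            | 
            |     // e.g. dropMonth("events_2021_09");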
           | 
           | Transparency: I work for Timescale
        
         | mvc wrote:
         | Record an event log and reliably connect it to a variety of 3rd
         | party sinks using off-the-shelf services
        
         | sparsely wrote:
         | I've also had much more success with "queues in postgres" than
         | "queues in kafka". I'm sure the use cases exist where kafka
         | works better, but you have to architect it so carefully, and
         | there are so many ways to trip up. Whereas most of your team
          | probably already understands an RDBMS and you're probably already
         | running one reliably.
        
         | lmilcin wrote:
         | Just because you can do it with Postgres doesn't mean it is the
         | best tool for the job.
         | 
         | Sometimes the restrictions placed on the user are as important.
         | Kafka presents a specific interface to the user that causes
         | users to build their applications in certain way.
         | 
         | While you can replicate almost all functionality of Kafka with
         | Postgres (except for performance, but hardly anybody needs as
         | much of it), we all know what we end up with when we set up
         | Postgres and use it to integrate applications with each other.
         | 
          | If developers had discipline they could of course create
          | tables with appendable logs of data, marked with a partition,
          | that consumers could process from with basically the same
          | guarantees as with Kafka.
         | 
         | But that is not how it works in reality.
        
         | bsaul wrote:
          | using a sql db for push/pop semantics feels like using a hammer
          | to squash a bug... How would you model queues & partitions with
          | ordering guarantees with pg?
        
           | throwaway81523 wrote:
           | With transactions, and stored procedures if that helps ;).
           | Redis also seems well suited to the use cases I've seen for
           | Kafka. Kafka must have capabilities beyond those use cases,
           | and I've sometimes wondered what they are.
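            | 
            | A common sketch of that is a jobs table drained with SELECT
            | ... FOR UPDATE SKIP LOCKED, e.g. with node-postgres:
            | 
            |     import { Pool } from "pg";
            | 
            |     const pool = new Pool();
            | 
            |     // Claim the oldest available job without two workers
            |     // grabbing the same row; SKIP LOCKED skips rows another
            |     // transaction holds. (Strict per-key ordering needs extra
            |     // bookkeeping on top of this.)
            |     async function workOne(handle: (payload: string) => Promise<void>) {
            |       const client = await pool.connect();
            |       try {
            |         await client.query("BEGIN");
            |         const { rows } = await client.query(
            |           `SELECT id, payload FROM jobs
            |            ORDER BY id
            |            LIMIT 1
            |            FOR UPDATE SKIP LOCKED`
            |         );
            |         if (rows.length) {
            |           const job = rows[0];
            |           await handle(job.payload);
            |           await client.query("DELETE FROM jobs WHERE id = $1", [job.id]);
            |         }
            |         await client.query("COMMIT");
            |       } catch (err) {
            |         await client.query("ROLLBACK");
            |         throw err;
            |       } finally {
            |         client.release();
            |       }
            |     }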
        
             | [deleted]
        
           | anotherhue wrote:
           | sql has many conveniences for doing so, it wouldn't be much
           | work.
           | 
           | > using a hammer to squash a bug..
           | 
           | Agreed - but Kafka is a much much bigger hammer. SES/Az
           | Queues are also good choices.
        
         | mrweasel wrote:
         | We work with a client who has requested a Kafka cluster. They
          | can't really say why, or what they need it for, but they now
         | have a very large cluster which doesn't do much. I know why
         | they want it, same reason why they "need" Kubernetes. So far
         | they use it as a sort of message bus.
         | 
         | It's not that there's anything wrong with Kafka, it's a very
         | good product and extremely robust. Same with Kubernetes, it has
          | its uses and I can't fault anyone for having it as a
         | consideration.
         | 
         | My problem is when people ignore how capable modern servers
         | are, and when developers don't see the risks in building these
         | highly complex systems, if something much simpler would solve
         | the same problem, only cheaper and safer.
        
         | aqme28 wrote:
         | > Show me that you can't do what you need with a beefy
         | Postgres.
         | 
         | I've found this question very useful when pitched any esoteric
         | database.
        
         | alephu5 wrote:
         | Why Postgres? Why not redis, rabbitMQ or even Kafka itself?
        
           | hardwaresofton wrote:
           | Not those other tools because you can't achieve Postgres
           | functionality with those other tools generally.
           | 
           | Postgres can pretend to be Redis, RabbitMQ, and Kafka, but
           | redis, RabbitMQ, and Kafka would have a hard time pretending
           | to be Postgres.
           | 
           | Postgres has the best database query language man has
           | invented so far (AFAIK), well reasoned persistence and
            | semantics, and more recently partitioning features to boot
            | and lots of addons to support different use cases. Despite all
           | this postgres is mostly Boring Technology (tm) and easily
           | available as well as very actively developed in the open,
            | with an enterprise base that does consulting first and usually
           | upstreams improvements after some time (2nd Quadrant, EDB,
           | Citus, TimescaleDB).
           | 
           | The other tools win on simplicity for some (I'd take managing
           | a PostgreSQL cluster over RMQ or Kafka any day), but for
            | other things, especially feature-wise, Postgres (and its
           | amalgamation of mostly-good-enough to great features) wins
           | IMO.
        
             | LoriP wrote:
             | Your comment on Boring Technology (tm) made me smile... At
             | Timescale as you'll see in this blog we are happy to
             | celebrate the boring, there's an awful lot to be said for
             | it :) https://blog.timescale.com/blog/when-boring-is-
             | awesome-build...
             | 
             | For transparency: I'm Timescale's Community Manager
        
         | sumtechguy wrote:
         | I have done both patterns.
         | 
         | Kafka has its use case. Databases have theirs. You can make a
         | DB do what kafka does. But you also add in the programming
         | overhead of getting the DB semantics correct to make an event
          | system. When I see people saying 'let's put the DB into kafka' I
         | make the exact same argument. You will spend more time making
         | kafka act like a database and getting the semantics right.
         | Kafka is more of a data/event transportation system. A DB is an
         | at rest data store that lets you manipulate the data. Use them
         | to their strengths or get crushed by weird edge cases.
        
           | tengbretson wrote:
           | I'd argue that having the event system semantics layered on
           | top of a sql database is a big benefit when you have an
           | immature product, since you have an incredibly powerful
           | escape hatch to jump in and fire off a few queries to fix
           | problems. Kafka's visibility for debugging is pretty poor in
           | my experience.
        
             | sumtechguy wrote:
             | My issues typically with layering an event system on top of
              | a db are replication and ownership of that event. Kafka
             | makes some very nice guarantees about giving best attempt
             | to make sure only one process works on something at a time
             | inside of a consumer group. You have to build a system in
             | the db using locks and different things that are poor
             | substitutes.
             | 
             | If you are having trouble debugging kafka you could use a
             | connector to put the data into the database/file to also
             | debug, or a db streamer. You can also use the built in cli
             | tools to scroll along. I have had very good luck with using
             | both of those to find out what is going wrong. Also kafka
             | will basically by default keep all the messages for the
             | past 7 days so you can play it back if you need to by
              | moving the consumer offsets. If you are trying to use kafka
             | like a db and change messages on the fly you will have a
             | bad time of it. Kafka is meant to be a here is something
             | that happened and some data. Changing that data after the
             | fact would in the kafka world be another event. In some
             | types of systems that is a very desirable property (such as
             | audit heavy cultures, banks, medical, etc). Now also kafka
              | can be a real pain if you are debugging and mess up a schema
             | or produce a message that does not fit into the consumer.
             | Getting rid of that takes some poor cli trickery when in a
             | db it is a delete call.
             | 
              | Also kafka is meant for distributed, event based worker
              | systems (typically some sort of microservice style
              | system). If you are early on you are more than likely not
             | building that yet. Just dumping something into a list in a
             | table and polling on that list is a very effective way for
             | something that is early on or maybe even forever. But once
             | you add in replication and/or multiple clients looking at
             | that same task list table you will start to see the issues
             | quickly.
             | 
              | Use an event system like a db system and yes, it will feel
              | broken. Also vice versa. You can do it. But like you are
              | finding out, those edge cases are a pain and make you feel
              | like 'bah, it's easier to do it this way'. In some cases yes.
             | In your case you have a bad state in your event data. You
             | are cleaning it up with some db calls. But what if instead
             | you had an event that did that for you?
        
           | BiteCode_dev wrote:
            | Well, if you want events, you have lighter alternatives:
            | Redis pub/sub, Crossbar... Even RabbitMQ is lighter.
           | 
           | If you need queuing, you have libs like celery.
           | 
           | You don't need to go full kafka
        
             | emerongi wrote:
             | Redis - it's probably in your stack already and fills all
             | the use-cases.
        
       | bsaul wrote:
        | Very interested to hear how people here overcome the limits of
        | kafka for ordered event delivery in the real world, and what
        | those limits were.
        
         | luxurytent wrote:
          | I feel that if you're using Kafka and expect guaranteed
          | ordering, then you're using the wrong tool. At best you have
          | guaranteed ordering per partition, but then you've tied your
          | ordering/keying strategy to the number of partitions you've
          | enabled ... which may not be ideal.
         | 
         | But, that's speaking from my light experience with it. I'm also
         | curious if there's a better way :-)
        
         | orobinson wrote:
         | At lower data volumes (<10,000 events per minute) it's
         | perfectly feasible to just use single partition topics and then
         | ordered event delivery is no problem at all. If a consuming
          | service has processing times that mean horizontal scaling is
         | necessary then the topic can be repartitioned into a new topic
         | with multiple partitions and the processing application can
         | handle sorting the outputted data to some SLA.
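          | 
          | Creating such a topic is a one-liner with, say, kafkajs (names
          | made up):
          | 
          |     import { Kafka } from "kafkajs";
          | 
          |     // One partition means one total order for the whole topic.
          |     async function createOrderedTopic() {
          |       const admin = new Kafka({
          |         clientId: "admin",
          |         brokers: ["localhost:9092"],
          |       }).admin();
          |       await admin.connect();
          |       await admin.createTopics({
          |         topics: [
          |           { topic: "events", numPartitions: 1, replicationFactor: 3 },
          |         ],
          |       });
          |       await admin.disconnect();
          |     }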
        
         | BFLpL0QNek wrote:
         | It depends on what the events are, how they are structured.
         | 
         | You get guaranteed ordering at the partition level.
         | 
         | Items are partitioned by key so you also get guaranteed
         | ordering for a key.
         | 
         | If you have guaranteed ordering for a key you can't get total
         | ordering across all keys but you can get eventual consistency
         | across the keys.
         | 
         | Ultimately if you want ordering you have to design around being
         | eventually consistent.
         | 
          | I don't read a lot of papers but Leslie Lamport's Time, Clocks,
          | and the Ordering of Events in a Distributed System gave me a
          | lot of insight into the constraints.
         | https://lamport.azurewebsites.net/pubs/time-clocks.pdf
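          | 
          | E.g. with kafkajs, using an entity id as the key keeps all of
          | that entity's events in one partition and therefore in order
          | (names here are illustrative):
          | 
          |     import { Kafka } from "kafkajs";
          | 
          |     const producer = new Kafka({
          |       clientId: "shipment-svc",
          |       brokers: ["localhost:9092"],
          |     }).producer();
          | 
          |     // Same key => same partition => consumers see this
          |     // shipment's events in the order they were produced.
          |     async function publishEvent(shipmentId: string, event: object) {
          |       await producer.send({
          |         topic: "shipment-events",
          |         messages: [{ key: shipmentId, value: JSON.stringify(event) }],
          |       });
          |     }
          | 
          |     async function main() {
          |       await producer.connect();
          |       await publishEvent("shipment-42", { status: "delivered" });
          |     }
          | 
          |     main().catch(console.error);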
        
           | sumtechguy wrote:
            | For kafka the default is to assign messages to partitions
            | round robin. A hash of the key can let you direct the work
            | to particular partitions. Each partition guarantees
            | ordering. Also only
           | one consumer in a consumer group can remove an item from a
           | partition at a time. No two consumers in a consumer group
           | will get the same message.
        
             | tiew9Vii wrote:
              | It's round robin if no key is specified; otherwise it uses
              | a murmur2 hash of the key, so the partition for a key is
              | always deterministic.
             | 
             | Just checking the docs it appears the round robin is no
             | longer true after Confluent Platform 5.4. After 5.4 it
             | looks like if no key specified the partition is assigned
             | based on the batch being processed.
             | 
             | > If the key is provided, the partitioner will hash the key
             | with murmur2 algorithm and divide it by the number of
             | partitions. The result is that the same key is always
             | assigned to the same partition. If a key is not provided,
             | behavior is Confluent Platform version-dependent:...
             | 
             | https://docs.confluent.io/platform/current/clients/producer
             | ....
        
         | csours wrote:
         | put a timestamp in the message. use a conflict free replicated
         | data type
        
           | sethammons wrote:
           | If you need a guaranteed ordering, timestamps and distributed
           | systems are not friends. See logical / vector clocks.
        
         | jgraettinger1 wrote:
         | Not for Kafka, but we are building Flow [1] to offer
         | deterministic ordering, even across multiple logical and
         | physical partitions, and regardless of whether you're back-
         | filling over history or processing in real-time.
         | 
         | This ends up being required for one of our architectural goals,
         | which is fully repeatable transformations: You must be able to
         | model a transactional decision as a Flow derivation (like "does
          | account X have funds to transfer Y to Z?"), and if you create a
         | _copy_ of that derivation months later, get the exact same
         | result.
         | 
         | Under the hood (and simplifying a bit) Flow always does a
         | streaming shuffled read to map events from partitions to task
         | shards, and each shard maintains a min-heap to process events
         | in their ~wall-time order.
         | 
         | This also avoids the common "Tyranny of Partitioning", where
         | your upstream partitioning parallelism N also locks you into
         | that same task shard parallelism -- a big problem if tasks
         | manage a lot of state. With a read-time shuffle, you can scale
         | them independently.
         | 
         | [1]: https://github.com/estuary/flow
        
       | HelloNurse wrote:
        | The only time I used Kafka, it was involuntary (included for the
        | sake of fashion in some complicated IBM product, where it hid
        | among WebSphere, DB2 and other bigger elephants), and it ran my
        | server out of disk space because, due to a bug, ridiculously
        | massive temporary files weren't erased. Needless to say, I wasn't
        | impressed: just one more hazard to worry about.
        
         | kitd wrote:
         | _due to a bug_
         | 
         | Data retention time is Kafka config 101. Are you sure it was a
         | bug?
        
           | geodel wrote:
            | Considering how half-assed Kafka is in general, and that it
            | needs code changes in all clients when Kafka servers are
            | upgraded, it is very likely that the user hit a Kafka bug.
        
             | kitd wrote:
             | Citation needed.
             | 
             | New server versions are protocol backwards compatible so
             | I'm not sure what you're referring to.
             | 
             | Ofc, if you downgraded a server without changing the
             | client, that may cause problems, but tbh that's hardly
             | Kafka's fault.
        
       | sam0x17 wrote:
       | Couldn't the "state" issue be solved simply by enclosing the
       | database save and kafka message send in the same database
       | transaction block and only doing the kafka send if it reaches
       | that part of the code?
        
       ___________________________________________________________________
       (page generated 2021-10-18 23:01 UTC)