[HN Gopher] Understanding Kafka with Factorio (2019)
       ___________________________________________________________________
        
       Understanding Kafka with Factorio (2019)
        
       Author : pul
       Score  : 119 points
       Date   : 2023-07-13 15:00 UTC (8 hours ago)
        
 (HTM) web link (ruurtjan.medium.com)
 (TXT) w3m dump (ruurtjan.medium.com)
        
       | jrm4 wrote:
       | I was 100% expecting something about the writer. :)
        
         | golergka wrote:
         | One morning, when Gregor Samsa woke from troubled dreams, he
         | found himself transformed into a piece of iron ore on a moving
         | conveyor belt.
        
           | tobbe2064 wrote:
           | That one made me laugh so much I annoyed my wife
        
         | geodel wrote:
         | Bad software is way more popular than a good writer.
        
         | lukasLansky wrote:
         | Both Franz Kafka and Factorio come from Prague after all.
        
       | mavu wrote:
       | I clicked this and was immeasurably disappointed that the article
       | talks about Apache Kafka, and not the author Kafka and how to
       | understand his work with Factorio.
       | 
       | That would be a much much better article.
        
         | politician wrote:
         | Franz Kafka's work is known for its themes of existential
         | anxiety, guilt, and isolation, and exploring these themes
         | within the context of a game like Factorio offers a unique
         | perspective. At first glance, Factorio, a game primarily about
         | resource management, automation, and industrial progression,
         | might seem incongruous with the metaphysical despair often
         | present in Kafka's literature. However, if we delve a bit
         | deeper, we can draw some intriguing parallels.
         | 
         | The player's isolation in an alien world in Factorio mirrors
         | the sense of alienation and loneliness experienced by many of
         | Kafka's characters. There's a constant struggle against the
         | environment, similar to the struggle of Kafka's characters
         | against unseen, overpowering bureaucracies or societal
         | pressures. The game's relentless push for automation,
         | efficiency, and progression can be likened to the systemic and
         | impersonal processes that Kafka's characters often find
         | themselves trapped in. This relentless pursuit of progress,
         | with no clear end or purpose, can create a sense of existential
         | dread similar to the one found in Kafka's works, such as "The
         | Metamorphosis" or "The Trial".
         | 
         | Factorio, like Kafka's narratives, presents an absurd world.
         | The player is left to automate an entire industrial complex on
         | an alien planet, battling local fauna and managing resources,
         | all to eventually create a spaceship to escape. However, once
         | escaped, the player simply starts anew on another world, a
         | Sisyphean task much like the endless, futile labors Kafka's
         | characters often face. Through this lens, Factorio could serve
         | as a vehicle for understanding the bleak, surreal worlds Kafka
         | creates, and the existential dilemmas his characters endure.
        
           | RhodesianHunter wrote:
           | I'm not sure how to feel about the fact that we're starting
           | to see ChatGPT responses to questions like this in forums.
        
       | dang wrote:
       | Related:
       | 
       |  _Understanding Kafka with Factorio (2019)_ -
       | https://news.ycombinator.com/item?id=29304414 - Nov 2021 (72
       | comments)
       | 
       |  _Understanding Kafka with Factorio_ -
       | https://news.ycombinator.com/item?id=20362179 - July 2019 (84
       | comments)
       | 
       | (Reposts are fine after a year or so; links to past threads are
       | just to satisfy extra-curious readers)
        
       | morelisp wrote:
       | Cute, but over years of explaining it I think any explanation of
       | Kafka that presents it as a queue is bound to leave the reader
       | with more misaligned expectations than when they started (while
       | also making them think they learned something, which can be even
       | more dangerous). To keep the Factorio-esque framing, move the
       | consumers, not the messages.
        
         | kentm wrote:
         | Agreed. There's an important difference between things like
         | Kafka & Kinesis vs RabbitMQ & SQS. The latter are conceptually
         | queues, and the former are conceptually logs. Logs and queues
         | can both be used in many of the same use cases, but it's
         | important to understand how they are different.
        
           | Terr_ wrote:
           | i.e.: Items are frequently removed from queues which have a
           | shared "next item", while logs usually just get longer and
           | each consumer is responsible for keeping track of their own
           | progress or positions.
           | 
           | It's harder to think of factory-game analogies for logs,
           | since they involve copying without altering the original
           | sequence. It would have to involve some kind of moving non-
           | destructive sensor or object-cloner mechanic.
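
A minimal Python sketch of the distinction drawn above, assuming nothing about Kafka's real API: `Log`, `LogConsumer`, and the item names are hypothetical and exist only for illustration. A queue hands out a shared "next item" and removes it, while a log is only appended to and each consumer tracks its own offset.

```python
from collections import deque


class Log:
    """Append-only sequence; reading never removes anything."""

    def __init__(self):
        self._records = []

    def __len__(self):
        return len(self._records)

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset):
        # Return the record at `offset`, or None if the reader has caught up.
        return self._records[offset] if offset < len(self._records) else None


class LogConsumer:
    """Each consumer owns its position; the log itself is untouched."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        record = self._log.read(self.offset)
        if record is not None:
            self.offset += 1
        return record


# Queue: one shared "next item"; taking it removes it for everyone.
queue = deque(["ore", "plate", "gear"])
queue_a = queue.popleft()
queue_b = queue.popleft()

# Log: both consumers can see every record, each at its own pace.
log = Log()
for item in ["ore", "plate", "gear"]:
    log.append(item)
fast, slow = LogConsumer(log), LogConsumer(log)
fast_seen = [fast.poll(), fast.poll(), fast.poll()]
slow_seen = [slow.poll()]
```

After this runs, the queue has one item left and each removed item went to a single taker, while the log still holds all three records and the two consumers sit at different offsets.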
        
       | lakomen wrote:
       | Idk... maybe it's because I'm self taught and have been coding
       | since the age of 11, but I don't find the indirect approach
       | helpful; quite the opposite.
       | 
       | I believe that's why OO is so popular, people who only know the
       | object way of thinking, who have difficulties with the virtual
       | and abstract like OO and condemn the pragmatic approach.
        
       | cubefox wrote:
       | Semi related: Are other people also annoyed by how many projects
       | are using names of completely unrelated famous things? I expected
       | to read some wild association between a game and Franz Kafka, but
       | no, it's about a streaming platform which happens to be named
       | "Kafka". This is getting seriously annoying when you google for
       | some, e.g. historical, term and then your search results are
       | littered by some completely unrelated software/IT project which
       | reuses the name for no reason in particular. "Factorio" is
       | actually an example of how to do better: Just make up your own
       | word!
        
         | radiator wrote:
         | Yes, it should have been Apache Kafka in this case.
        
         | thomastjeffery wrote:
         | It would help tremendously if the title was changed to
         | "Understanding _Apache_ Kafka with Factorio".
        
         | the_af wrote:
         | Even worse, Kafka and Apache Kafka have almost no meaningful
         | connection. According to Wikipedia, the author was reading
         | (Franz) Kafka, who was a writer, and the software system is
         | "optimized for writing". (Franz) Kafka wasn't "optimized" for
         | anything, so this is just whimsical naming. It could just as
         | well have been named Apache Hemingway, or Apache Tolstoy.
         | 
         | Whimsical naming is ok, but can also be confusing and annoying.
        
           | titanomachy wrote:
           | Wow, that's so unrelated. I think I'd tacitly assumed that it
           | was named that because adopting an event-driven architecture
           | results in byzantine and overly complicated software, like
           | the bureaucracies in Kafka's novels.
        
             | bee_rider wrote:
             | I'd be shocked if your explanation wasn't the real one.
             | "Optimized for writing" sounds like the sort of
             | justification you give when your project with a sarcastic
             | self-deprecating name becomes surprisingly successful.
             | 
             | If they wanted to name it after a fast, efficient famous
             | writer it would be Apache Hemingway.
             | 
             | Kafka died before most of his writing was published and
             | most of it was destroyed, which doesn't seem like something
             | the software would want to be associated with, right?
        
               | the_af wrote:
               | Your comment is on point. However...
               | 
               | ... if you read Jay Kreps's introduction to logs [1] (in
               | the Kafka sense of logs) you can see that while he has a
               | nice sense of humor [2], he felt pretty good about the
               | Log abstraction and about Kafka. In no sense do I get the
               | feeling he thought he was creating a kludge or something
               | bad -- or "kafkaesque". Judging by the article, it might
               | as well have been named Apache Tolstoy!
               | 
               | [1] https://engineering.linkedin.com/distributed-
               | systems/log-wha...
               | 
               | [2] "'Each working data pipeline is designed like a log;
               | each broken data pipeline is broken in its own way.' --
               | Count Leo Tolstoy (translation by the author)"
        
           | lbarrett wrote:
           | "Kafkaesque" suggests people standing in very long queues. It
           | doesn't seem unreasonable as a name for very long queue
           | software.
        
           | cubefox wrote:
           | I think many such names are mostly chosen because they kind
           | of sound nice and they have a prestigious association.
        
       | [deleted]
        
       | marginalia_nu wrote:
       | > Vertical scaling -- a bigger, exponentially more expensive
       | server
       | 
       | This is in practice not true at all. Vertical scaling is
       | typically a sublinear cost increase (up to a point, but that
       | point is a ridiculous beast of a machine), since you're
       | (typically) upgrading just the CPU and/or just the RAM or just
       | the storage; not all of them at once.
       | 
       | There are instances where you can get nearly 10x the machine for
       | 2x the cost.
        
         | geodel wrote:
         | The idea is don't let the logic come in the way of promoting
         | "web scale" software.
        
         | morelisp wrote:
         | The kind of server you'd run Kafka on tends to already be
         | pretty far up the curve. I don't think I can get 10x our
         | default broker for 20x the cost. Maybe 100x the cost. (I could
         | probably get 2x it for 2x the cost but once you value HA the
         | practical inflection point starts below the actual cost
         | intersection.)
        
         | teawrecks wrote:
         | For small consumer products sure, but we're talking at the
         | extreme end of performance and physical capabilities. Sure you
         | can get a 2Ghz CPU for ~2x the price of a 200Mhz CPU, but how
         | much are you going to pay for a 6.0Ghz CPU vs 5.0Ghz? 6.1Ghz vs
         | 6.0Ghz?
        
           | marginalia_nu wrote:
           | You can go from an 8C/16T Epyc 7xxx series CPU to a 32C/64T
           | CPU and not even double the cost.
        
             | fluoridation wrote:
             | That's more like horizontal scaling, though. You get more
             | throughput (transactions per second) but not lower latency
             | (seconds per transaction). Though it may be more cost-
             | effective to have a single 32-core machine than two 16-core
             | machines.
        
               | marginalia_nu wrote:
               | I disagree with this definition of horizontal scaling. If
               | you're moving to a bigger computer rather than more
               | computers, then you're scaling vertically and not
               | horizontally.
               | 
               | (and fwiw, wikipedia agrees with this definition:
               | https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_...
               | )
        
               | fluoridation wrote:
               | Then it sounds like you have a disagreement of
               | terminology with FTA, since the article is using the
               | terms like I am. Vertical scaling means increasing the
               | serial performance of the system, and horizontal scaling
               | means increasing the parallel performance of the system.
               | In this sense, vertical scaling past a certain point does
               | indeed get exponentially more expensive, while horizontal
               | scaling almost always scales linearly in cost, or better.
        
               | dekhn wrote:
               | The terms are used loosely and it doesn't make a lot of
               | sense to argue about the definitions.
               | 
               | I think it's true to say that vertical scaling normally
               | is done by increasing the RAM and CPU of a single machine
               | with a single address space and switch/bus. While
               | horizontal scaling is normally adding more machines
               | (additional address spaces and switch/bus).
               | Historically this is because RAM to CPU performance
               | (throughput and latency) in a single address space and
               | bus greatly exceeds the performance of any NIC connecting
               | machines with distinct address spaces and busses. And it
               | mostly ignores effects like the performance costs of
               | swapping/paging when you don't have enough RAM.
               | 
               | I haven't really seen many systems where horizontal
               | scaling is truly linear, unless the problem is
               | embarrassingly parallel, like serving static content.
        
               | fluoridation wrote:
               | Note that I was referring to scaling of _cost_, not of
               | performance. If your application parallelizes ideally,
               | then in the worst case your cost will scale linearly,
               | because you just add more machines and increase your
               | power consumption by
               | new_machine_count/previous_machine_count. It's possible
               | adding more processors in the same address space
               | increases the cost by an amount below
               | new_core_count/previous_core_count, in which case the
               | cost scales better than linearly.
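
The cost argument above can be sketched as a comparison of two growth ratios. This is a toy calculation, not a pricing model: `scaling_class` is a hypothetical helper and every price below is invented for illustration.

```python
def scaling_class(old_units: int, new_units: int,
                  old_cost: int, new_cost: int) -> str:
    """Classify cost growth relative to capacity growth.

    `units` can be machines or cores; `cost` is in any consistent currency.
    Comparing new_cost/old_cost against new_units/old_units is done by
    cross-multiplication so that integer inputs avoid float rounding.
    """
    cost_side = new_cost * old_units
    capacity_side = new_units * old_cost
    if cost_side < capacity_side:
        return "sublinear"
    if cost_side == capacity_side:
        return "linear"
    return "superlinear"


# Hypothetical: going from 2 to 4 identical machines at 1000 each.
more_machines = scaling_class(2, 4, 2000, 4000)

# Hypothetical: 4x the cores in one box for less than 2x the price.
more_cores = scaling_class(16, 64, 1000, 1900)

# Hypothetical: 2x the serial speed for 5x the price.
faster_core = scaling_class(1, 2, 1000, 5000)
```

With these made-up numbers, adding identical machines comes out linear, adding cores within one machine comes out sublinear, and paying a premium for serial speed comes out superlinear.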
        
               | marginalia_nu wrote:
               | What I'm commenting on is this phrasing from the article
               | 
               | > Vertical scaling -- a bigger, exponentially more
               | expensive _server_
               | 
               | > Horizontal scaling -- distribute the load over more
               | _servers_
        
               | teawrecks wrote:
               | Ok, I see where the lay person would get confused on
               | this. In the context of this article, every core is what
               | Wikipedia calls a "node". There is no difference between
               | a single 32C CPU and 4x 8C CPUs except for their ability
               | to share memory faster. Both are similarly defined as
               | horizontal scaling in the context of this article. You're
               | not going to finish a single workload any faster, but
               | you're going to increase the throughput of finishing
               | multiple workloads in parallel.
               | 
               | The fact that AMD chooses to package the "nodes" together
               | on one die vs multiple doesn't change that.
        
               | marginalia_nu wrote:
               | The wikipedia article qualifies what it means with
               | vertical scaling
               | 
               | > typically involving the addition of CPUs, memory or
               | storage to a single computer.
        
               | teawrecks wrote:
               | This is one of those times when I feel like you just
               | didn't read anything I typed. So... I'm just gonna let
               | you be confidently incorrect.
        
             | teawrecks wrote:
             | The article defines vertical scaling as using faster
             | conveyer belts (serial performance) and horizontal scaling
             | as using more conveyer belts (parallel performance).
             | 
             | So your example of adding more CPU cores would be
             | horizontal scaling, while using a faster core would be
             | vertical. Vertical scaling has diminishing returns.
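
A toy model of the belt analogy, under the assumption that a belt is fully described by lane count, per-lane speed, and length. `BeltSetup` and all numbers are hypothetical and are not taken from the article or from Factorio.

```python
from dataclasses import dataclass


@dataclass
class BeltSetup:
    lanes: int            # parallel belts (or cores)
    items_per_sec: float  # speed of a single belt
    length_items: float   # belt length, measured in item slots

    def throughput(self) -> float:
        # Items delivered per second across all lanes.
        return self.lanes * self.items_per_sec

    def latency(self) -> float:
        # Seconds for one item to travel the belt; lane count does not matter.
        return self.length_items / self.items_per_sec


base = BeltSetup(lanes=1, items_per_sec=10.0, length_items=100.0)
faster = BeltSetup(lanes=1, items_per_sec=20.0, length_items=100.0)  # "vertical"
wider = BeltSetup(lanes=2, items_per_sec=10.0, length_items=100.0)   # "horizontal"
```

In this model, doubling belt speed doubles throughput and halves latency, whereas doubling the lane count doubles throughput but leaves latency unchanged.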
        
           | Sohcahtoa82 wrote:
           | Think cores instead of clock speeds.
           | 
           | In the case of cloud instances, doubling cores is frequently
           | less than 100% more expensive.
        
             | The_Colonel wrote:
             | Increasing core count is not really vertical scaling. It's
             | a hybrid between vertical and horizontal scaling, having
             | some characteristics of both. It also tops out quite early
             | (especially its cost-effectiveness for many use cases, but
             | there's an absolute upper limit as well).
        
             | morelisp wrote:
             | https://aws.amazon.com/msk/pricing/ prices scale linearly
             | with CPU beginning with m5.large, and I wouldn't really
             | want to run a production Kafka on anything less than
             | m5.xlarge. (They do at least keep linearly scaling all the
             | way up.) Speculating wildly, I could probably have run some
             | of our real clusters on the equivalent of a 8xlarge, but of
             | course 32 core systems were not widely available at that
             | time. The cluster I run today, even a hypothetical 48xlarge
             | would struggle.
             | 
             | YMMV for non-managed stuff, but really, you can only bump
             | cores like 3 times realistically, 4 if you started really
             | shitty, before you start getting into special pricing
             | brackets.
        
         | dekhn wrote:
         | Disagree- typically vertical scaling is lumpy, and even worse-
         | CPU and RAM upgrades are typically not linear, because you're
         | limited by the number of slots/sockets and the manufacturers
         | intentionally charge higher (exponentially) prices for the
         | largest RAM and fastest CPUs.
        
           | defendBanana wrote:
           | With clouds this is not true anymore. They are exactly
           | linear. If you ask for a smaller node they are simply
           | provisioning a chunk of a larger machine anyway.
           | 
           | There is a point where the exponential pricing starts, but
           | that point is much further out than most people expect.
           | Probably ~100CPU, ~1TB RAM, >50Gbps network etc.
        
           | vegabook wrote:
           | If they charge these big numbers more it's precisely because
           | they're trying to capture some of the embarrassingly better
           | value you get from vertical scaling. It's a testament to
           | vertical scaling's effectiveness that they _can_ do so.
        
             | foota wrote:
             | Sure, but by doing so they consume the effectiveness?
        
               | dekhn wrote:
               | No, because you pay a fixed cost to get higher
               | performance and then benefit through the whole lifetime
               | of the product (I'm assuming you are purchasing
               | rationally and keep your machines loaded at 75% or
               | better, and your software is not egregiously wasteful).
        
           | morelisp wrote:
           | Kafka is also a system that can make pretty good general use
           | of more CPUs and more storage, but doesn't have much need for
           | RAM. Tying the CPU and RAM together whether by CPU model or
           | cloud vendor offerings is annoying if you're trying to scale
           | only vertically.
        
             | defendBanana wrote:
             | Kafka can keep a decent bit of data in RAM using file
             | system pages. Often times you end up wasting CPUs on kafka
             | nodes, not memory, I think.
             | 
             | https://docs.confluent.io/platform/current/kafka/deployment
             | ....
        
               | morelisp wrote:
               | I find that if you are seeking lots of consumers around
               | large topics no amount of RAM is really sufficient, and
               | if you are mostly sticking to the tails like a regular
               | Kafka user, even 64GB is usually way more than enough.
               | 
               | CPU isn't usually a problem until you start using very
               | large compactions, and then suddenly it can be a massive
               | bottleneck. (Actually I would love to abuse more RAM here
               | but log.cleaner.dedupe.buffer.size has a tiny maximum
               | value!)
               | 
               | Kafka Streams (specifically) is also configured to
               | transact by default, even though most applications aren't
               | written to be able to actually benefit from that. If you
               | run lots of different consumer services this results in
               | burning a lot of CPU on transactions in a "flat
               | profile"-y way that's hard to illustrate to application
               | developers since each consumer, individually, is
               | relatively small - there's just thousands of them.
        
         | KRAKRISMOTT wrote:
         | Also beyond a certain point, it makes sense to go straight to
         | dedicated bare metal. The AWS tax is not worth paying if your
         | workload is mostly fixed, somewhat fault tolerant (i.e. failed
         | hardware on the weekends can be replaced on Monday without
         | major interruption to business operations), and CPU bound. Get
         | a high end machine on Hetzner and put everything behind a VPN
         | or API auth and you will save more than 50% in spending.
        
           | RhodesianHunter wrote:
           | I haven't found this to be true generally unless your
           | workloads are _truly_ completely static, which I've never
           | actually experienced.
           | 
           | Given what engineers at this level cost, their costs per hour
           | dealing with all of the nonsense clouds handle for you
           | (networking, storage, elastic scaling, instant replacement of
           | faulty servers, load balancing, yadda yadda) end up being
           | higher than whatever tax you're paying for using the cloud.
           | 
           | Economies of scale are real.
        
       | haswell wrote:
       | I recently started playing Factorio, and I kept thinking that
       | this is what "low code" integration/automation tools should look
       | like. Developer tooling with extremely clear visuals, obvious
       | dataflow, endless combinations into which the rigidly defined
       | components can be assembled to do exactly what they do.
       | 
       | As opposed to so many takes on "flow based" programming, which
       | present some imperfect nodal representation of the program, but
       | rarely can the user make sense of what's going on by seeing stuff
       | moving around as the thing executes.
       | 
       | And by the way, be sure you're ready to sink some time in if
       | you're curious about this game...it's just too good, and I've had
       | to consciously reduce the time I'm spending, because I could just
       | keep optimizing...building...expanding...optimizing...it's built
       | in the shape of the reward center of my brain.
        
         | gjulianm wrote:
         | > And by the way, be sure you're ready to sink some time in if
         | you're curious about this game...it's just too good, and I've
         | had to consciously reduce the time I'm spending, because I
         | could just keep
         | optimizing...building...expanding...optimizing...it's built in
         | the shape of the reward center of my brain.
         | 
         | I feel the same. It scratches so many itches. I made the
         | mistake of installing Space Exploration mod after the first
         | run... so many hours invested now.
        
           | lsaferite wrote:
           | SE is such a deep hole to fall into. I feel for you.
        
           | throitallaway wrote:
           | SE is such a slog but I must keep going!
        
         | lsaferite wrote:
         | > it's built in the shape of the reward center of my brain.
         | 
         | This is a very accurate description of how I feel about
         | factorio. Thanks, I'm going to use this going forward.
        
         | ineedasername wrote:
         | >it's built in the shape of the reward center of my brain
         | 
         | Yes, it's like a distillation of the feeling I get from the
         | most enjoyable parts of my job, risking productivity loss from
         | real world responsibilities until the complexity rises high
         | enough to require project organization and advanced planning of
         | tasks that are the least enjoyable parts of my job, along the
         | lines of:
         | 
         | "Crap, I have a sudden urgent need to deal with enemies
         | creeping out from beyond my radar range that will push back
         | operationalization of my proof-of-concept production pipeline.
         | I'd estimate 3 man hours are required to perform a one-off fix
         | on the enemies & radar expansion, maybe 5 to automate long-
         | term... damnit I need a break, let me VPN into work to
         | decompress from Factorio stress."
        
           | jay_kyburz wrote:
           | Yes, but it also embodies the worst parts of my job, which is
           | that there is always more to do and the work is never ending.
        
           | cortesoft wrote:
           | I wonder who the real world equivalent of biters are...
           | sales?
        
             | oconnor663 wrote:
             | rad hits
        
             | mettamage wrote:
             | Security issues in software
        
       | not_your_mentat wrote:
       | And now I think of everything I do in Factorio models.
        
       | AbraKdabra wrote:
       | It's always good to see Cracktorio on the frontpage of HN,
       | hopefully someone will make a similar article showing the
       | similarities between Factorio and drugs.
        
         | mistermann wrote:
         | Or human culture as a video game in general, and humans as
         | semi-rational, semi-aware characters in the video game, except
         | some of the characters are special in that their job is to
         | deceive and exploit other characters, including constructing
         | illusions indistinguishable from reality of who is good and who
         | is bad, what is true and what is not, what we should do and
         | should not do, etc. And to make it all even more exciting, not
         | all of the illusionists are aware of their actual, often hybrid
         | role in the big scheme of things.
         | 
         | Maybe Shakespeare or some famous philosophers would have seen
         | this angle, were video games to exist in their era. Unfortunate
         | timing I guess.
        
       | BSEdlMMldESB wrote:
       | what about understanding our own imperialistic civilization?
       | 
       | crash in planet (or new continent?) and proceed to expand while
       | decimating native life?
       | 
       | in any case, good game.
        
         | golergka wrote:
         | That's what all life forms do -- consume everything they can
         | and reproduce as much as they can. So, doing this _is_ life.
        
           | hutzlibu wrote:
           | That's an oversimplification. If you look closer, you might
           | find that most living things are mostly living in symbiosis
           | with their surroundings. Your human body is an example of
           | cooperation of simple life forms. And when those living cells
           | of your body consume everything they can and reproduce as much
           | as they can - then this is called cancer.
        
             | golergka wrote:
             | Those statements do not contradict each other. Life forms
             | consume and reproduce as much as they can, and they also
             | live in cooperation and symbiosis. The latter can often be
             | the most effective strategy to consume and reproduce as
             | much as one wants.
        
               | BSEdlMMldESB wrote:
               | yes, agree
               | 
               | the issue then, is how to decide what "as much as we can"
               | even means; if we do "as much as we can" but this kills
               | us, then it will have turned out that we indeed could not
               | do that much
        
               | hutzlibu wrote:
               | "That's what all life forms do -- consume everything they
               | can and reproduce as much as they can"
               | 
               | Well, if you take this statement literally, then I think
               | they do contradict each other - see my cancer example.
               | But in the way you apparently meant it, then no, there is
               | no contradiction.
        
             | Terr_ wrote:
             | Survivorship bias: Mutual benign coexistence and symbiosis
             | within the present are rooted in massacres of the past.
        
       ___________________________________________________________________
       (page generated 2023-07-13 23:00 UTC)