[HN Gopher] Understanding Kafka with Factorio (2019)
___________________________________________________________________
Understanding Kafka with Factorio (2019)
Author : pul
Score : 119 points
Date : 2023-07-13 15:00 UTC (8 hours ago)
(HTM) web link (ruurtjan.medium.com)
(TXT) w3m dump (ruurtjan.medium.com)
| jrm4 wrote:
| I was 100% expecting something about the writer. :)
| golergka wrote:
| One morning, when Gregor Samsa woke from troubled dreams, he
| found himself transformed into a piece of iron ore on a moving
| conveyor belt.
| tobbe2064 wrote:
| That one made me laugh so much I annoyed my wife
| geodel wrote:
| Bad software is way more popular than a good writer.
| lukasLansky wrote:
| Both Franz Kafka and Factorio come from Prague after all.
| mavu wrote:
| I clicked this and was immeasurably disappointed that the article
| talks about Apache Kafka, and not the author Kafka and how to
| understand his work with Factorio.
|
| That would be a much much better article.
| politician wrote:
| Franz Kafka's work is known for its themes of existential
| anxiety, guilt, and isolation, and exploring these themes
| within the context of a game like Factorio offers a unique
| perspective. At first glance, Factorio, a game primarily about
| resource management, automation, and industrial progression,
| might seem incongruous with the metaphysical despair often
| present in Kafka's literature. However, if we delve a bit
| deeper, we can draw some intriguing parallels.
|
| The player's isolation in an alien world in Factorio mirrors
| the sense of alienation and loneliness experienced by many of
| Kafka's characters. There's a constant struggle against the
| environment, similar to the struggle of Kafka's characters
| against unseen, overpowering bureaucracies or societal
| pressures. The game's relentless push for automation,
| efficiency, and progression can be likened to the systemic and
| impersonal processes that Kafka's characters often find
| themselves trapped in. This relentless pursuit of progress,
| with no clear end or purpose, can create a sense of existential
| dread similar to the one found in Kafka's works, such as "The
| Metamorphosis" or "The Trial".
|
| Factorio, like Kafka's narratives, presents an absurd world.
| The player is left to automate an entire industrial complex on
| an alien planet, battling local fauna and managing resources,
| all to eventually create a spaceship to escape. However, once
| escaped, the player simply starts anew on another world, a
| Sisyphean task much like the endless, futile labors Kafka's
| characters often face. Through this lens, Factorio could serve
| as a vehicle for understanding the bleak, surreal worlds Kafka
| creates, and the existential dilemmas his characters endure.
| RhodesianHunter wrote:
| I'm not sure how to feel about the fact that we're starting
| to see ChatGPT responses to questions like this in forums.
| dang wrote:
| Related:
|
| _Understanding Kafka with Factorio (2019)_ -
| https://news.ycombinator.com/item?id=29304414 - Nov 2021 (72
| comments)
|
| _Understanding Kafka with Factorio_ -
| https://news.ycombinator.com/item?id=20362179 - July 2019 (84
| comments)
|
| (Reposts are fine after a year or so; links to past threads are
| just to satisfy extra-curious readers)
| morelisp wrote:
| Cute, but over years of explaining it I think any explanation of
| Kafka that presents it as a queue is bound to leave the reader
| with more misaligned expectations than when they started (while
| also making them think they learned something, which can be even
| more dangerous). To keep the Factorio-esque framing, move the
| consumers, not the messages.
| kentm wrote:
| Agreed. There's an important difference between things like
| Kafka & Kinesis vs RabbitMQ & SQS. The latter are conceptually
| queues, and the former are conceptually logs. Logs and queues
| can both be used in many of the same use cases, but it's
| important to understand how they are different.
| Terr_ wrote:
| i.e.: Items are frequently removed from queues which have a
| shared "next item", while logs usually just get longer and
| each consumer is responsible for keeping track of their own
| progress or positions.
|
| It's harder to think of factory-game analogies for logs,
| since they involve copying without altering the original
| sequence. It would have to involve some kind of moving non-
| destructive sensor or object-cloner mechanic.
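|
| A minimal in-memory sketch of that difference, in Python. The
| class and method names (Queue, Log, take, poll) are made up for
| illustration and are not Kafka's or RabbitMQ's API; a real log
| is also partitioned and persisted, which this ignores.

```python
from collections import deque


class Queue:
    """Queue semantics: consumers share one 'next item'; reading removes it."""

    def __init__(self):
        self._items = deque()

    def publish(self, item):
        self._items.append(item)

    def take(self):
        # Removing the item means no other consumer will ever see it.
        return self._items.popleft() if self._items else None


class Log:
    """Log semantics: append-only; each consumer tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of the next record to read

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._records):
            return None  # this consumer is caught up
        self._offsets[consumer] = offset + 1  # only the consumer's position moves
        return self._records[offset]

    def __len__(self):
        return len(self._records)


queue, log = Queue(), Log()
for ore in ["iron", "copper", "coal"]:
    queue.publish(ore)
    log.append(ore)

queue_reads = [queue.take(), queue.take()]    # items are gone once taken
log_reads_a = [log.poll("a"), log.poll("a")]  # consumer "a" advances its offset
log_reads_b = [log.poll("b")]                 # consumer "b" still starts at the beginning
```

| Reading from the log moves the consumer, not the messages:
| after both consumers have polled, the log still holds all three
| records, while the queue has only one item left.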
| lakomen wrote:
| Idk... maybe it's because I'm self taught and have been coding
| since the age of 11, but I don't find the indirect approach
| helpful; quite the opposite.
|
| I believe that's why OO is so popular: people who only know the
| object way of thinking, and who have difficulty with the virtual
| and abstract, like OO and condemn the pragmatic approach.
| cubefox wrote:
| Semi related: Are other people also annoyed by how many projects
| are using names of completely unrelated famous things? I expected
| to read some wild association between a game and Franz Kafka, but
| no, it's about a streaming platform which happens to be named
| "Kafka". This is getting seriously annoying when you google for
| some, e.g. historical, term and then your search results are
| littered by some completely unrelated software/IT project which
| reuses the name for no reason in particular. "Factorio" is
| actually an example of how to do better: Just make up your own
| word!
| radiator wrote:
| Yes, it should have been Apache Kafka in this case.
| thomastjeffery wrote:
| It would help tremendously if the title was changed to
| "Understanding _Apache_ Kafka with Factorio".
| the_af wrote:
| Even worse, Kafka and Apache Kafka have almost no meaningful
| connection. According to Wikipedia, the author was reading
| (Franz) Kafka, who was a writer, and the software system is
| "optimized for writing". (Franz) Kafka wasn't "optimized" for
| anything, so this is just whimsical naming. It could just as
| well have been named Apache Hemingway, or Apache Tolstoy.
|
| Whimsical naming is ok, but can also be confusing and annoying.
| titanomachy wrote:
| Wow, that's so unrelated. I think I'd tacitly assumed that it
| was named that because adopting an event-driven architecture
| results in byzantine and overly complicated software, like
| the bureaucracies in Kafka's novels.
| bee_rider wrote:
| I'd be shocked if your explanation wasn't the real one.
| "Optimized for writing" sounds like the sort of
| justification you give when your project with a sarcastic
| self-deprecating name becomes surprisingly successful.
|
| If they wanted to name it after a fast, efficient famous
| writer it would be Apache Hemingway.
|
| Kafka died before most of his writing was published and
| most of it was destroyed, which doesn't seem like something
| the software would want to be associated with, right?
| the_af wrote:
| Your comment is on point. However...
|
| ... if you read Jay Kreps's introduction to logs [1] (in
| the Kafka sense of logs) you can see that while he has a
| nice sense of humor [2], he felt pretty good about the
| Log abstraction and about Kafka. In no sense do I get the
| feeling he thought he was creating a kludge or something
| bad -- or "kafkaesque". Judging by the article, it might
| as well have been named Apache Tolstoy!
|
| [1] https://engineering.linkedin.com/distributed-systems/log-wha...
|
| [2] "'Each working data pipeline is designed like a log;
| each broken data pipeline is broken in its own way.' --
| Count Leo Tolstoy (translation by the author)"
| lbarrett wrote:
| "Kafkaesque" suggests people standing in very long queues. It
| doesn't seem unreasonable as a name for very long queue
| software.
| cubefox wrote:
| I think many such names are mostly chosen because they kind
| of sound nice and they have a prestigious association.
| [deleted]
| marginalia_nu wrote:
| > Vertical scaling -- a bigger, exponentially more expensive
| server
|
| This is in practice not true at all. Vertical scaling is
| typically a sublinear cost increase (up to a point, but that
| point is a ridiculous beast of a machine), since you're
| (typically) upgrading just the CPU and/or just the RAM or just
| the storage; not all of them at once.
|
| There are instances where you can get nearly 10x the machine for
| 2x the cost.
| geodel wrote:
| The idea is: don't let logic get in the way of promoting
| "web scale" software.
| morelisp wrote:
| The kind of server you'd run Kafka on tends to already be
| pretty far up the curve. I don't think I can get 10x our
| default broker for 20x the cost. Maybe 100x the cost. (I could
| probably get 2x it for 2x the cost but once you value HA the
| practical inflection point starts below the actual cost
| intersection.)
| teawrecks wrote:
| For small consumer products sure, but we're talking at the
| extreme end of performance and physical capabilities. Sure you
| can get a 2GHz CPU for ~2x the price of a 200MHz CPU, but how
| much are you going to pay for a 6.0GHz CPU vs 5.0GHz? 6.1GHz vs
| 6.0GHz?
| marginalia_nu wrote:
| You can go from an 8C/16T Epyc 7xxx series CPU to a 32C/64T
| CPU and not even double the cost.
| fluoridation wrote:
| That's more like horizontal scaling, though. You get more
| throughput (transactions per second) but not lower latency
| (seconds per transaction). Though it may be more cost-
| effective to have a single 32-core machine than two 16-core
| machines.
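|
| The throughput/latency distinction can be put into numbers
| with a toy model. This is a sketch under strong assumptions:
| transactions are independent, parallelism is ideal, and the
| function names and the 0.5 s service time are invented for the
| example.

```python
def latency_seconds(service_time, speedup=1.0):
    """Seconds to finish one transaction on a single core."""
    return service_time / speedup


def throughput_per_second(service_time, cores=1, speedup=1.0):
    """Transactions per second across all cores (ideal parallelism assumed)."""
    return cores * speedup / service_time


SERVICE_TIME = 0.5  # hypothetical: one transaction takes 0.5 s on the baseline core

base = (latency_seconds(SERVICE_TIME), throughput_per_second(SERVICE_TIME))

# Four times as many cores: more transactions per second, same wait per transaction.
more_cores = (
    latency_seconds(SERVICE_TIME),
    throughput_per_second(SERVICE_TIME, cores=4),
)

# One core that is four times faster: both latency and throughput improve.
faster_core = (
    latency_seconds(SERVICE_TIME, speedup=4.0),
    throughput_per_second(SERVICE_TIME, speedup=4.0),
)
```

| In this model both upgrades reach the same throughput, but only
| the faster core shortens the time a single transaction takes.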
| marginalia_nu wrote:
| I disagree with this definition of horizontal scaling. If
| you're moving to a bigger computer rather than more
| computers, then you're scaling vertically and not
| horizontally.
|
| (and fwiw, wikipedia agrees with this definition:
| https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_... )
| fluoridation wrote:
| Then it sounds like you have a disagreement of
| terminology with FTA, since the article is using the
| terms like I am. Vertical scaling means increasing the
| serial performance of the system, and horizontal scaling
| means increasing the parallel performance of the system.
| In this sense, vertical scaling past a certain point does
| indeed get exponentially more expensive, while horizontal
| scaling almost always scales linearly in cost, or better.
| dekhn wrote:
| The terms are used loosely and it doesn't make a lot of
| sense to argue about the definitions.
|
| I think it's true to say that vertical scaling normally
| is done by increasing the RAM and CPU of a single machine
| with a single address space and switch/bus. While
| horizontal scaling is normally adding more machines
| (additional address spaces and switch/bus).
| Historically this is because RAM to CPU performance
| (throughput and latency) in a single address space and
| bus greatly exceeds the performance of any NIC connecting
| machines with distinct address spaces and busses. And it
| mostly ignores effects like the performance costs of
| swapping/paging when you don't have enough RAM.
|
| I haven't really seen many systems where horizontal
| scaling is truly linear, unless the problem is
| embarrassingly parallel, like serving static content.
| fluoridation wrote:
| Note that I was referring to scaling of _cost_ , not of
| performance. If your application parallelizes ideally,
| then in the worst case your cost will scale linearly,
| because you just add more machines and increase your
| power consumption by
| new_machine_count/previous_machine_count. It's possible
| adding more processors in the same address space
| increases the cost by an amount below
| new_core_count/previous_core_count, in which case the
| cost scales better than linearly.
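|
| As a worked example of that cost argument, here is a small
| sketch. The helper names and the prices are hypothetical,
| chosen only to illustrate the ratios; they are not real list
| prices for any CPU or instance type.

```python
def horizontal_cost_ratio(previous_machine_count, new_machine_count):
    """Worst case for an ideally parallel workload: cost grows with machine count."""
    return new_machine_count / previous_machine_count


def scales_better_than_linearly(previous_cost, new_cost,
                                previous_core_count, new_core_count):
    """True when cost grows by less than the core count does."""
    cost_ratio = new_cost / previous_cost
    core_ratio = new_core_count / previous_core_count
    return cost_ratio < core_ratio


# Going from 2 to 4 identical machines doubles the cost.
doubled = horizontal_cost_ratio(2, 4)

# Hypothetical prices: a 16-core part at 1000 and a 64-core part at 1800.
# Cost grows 1.8x while the core count grows 4x.
sublinear = scales_better_than_linearly(1000, 1800, 16, 64)
```

| With these made-up numbers, adding machines is exactly linear
| in cost, while adding cores within one machine is sublinear.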
| marginalia_nu wrote:
| What I'm commenting on is this phrasing from the article
|
| > Vertical scaling -- a bigger, exponentially more
| expensive _server_
|
| > Horizontal scaling -- distribute the load over more
| _servers_
| teawrecks wrote:
| Ok, I see where the lay person would get confused on
| this. In the context of this article, every core is what
| Wikipedia calls a "node". There is no difference between
| a single 32C CPU and 4x 8C CPUs except for their ability
| to share memory faster. Both are similarly defined as
| horizontal scaling in the context of this article. You're
| not going to finish a single workload any faster, but
| you're going to increase the throughput of finishing
| multiple workloads in parallel.
|
| The fact that AMD chooses to package the "nodes" together
| on one die vs multiple doesn't change that.
| marginalia_nu wrote:
| The wikipedia article qualifies what it means with
| vertical scaling
|
| > typically involving the addition of CPUs, memory or
| storage to a single computer.
| teawrecks wrote:
| This is one of those times when I feel like you just
| didn't read anything I typed. So... I'm just gonna let
| you be confidently incorrect.
| teawrecks wrote:
| The article defines vertical scaling as using faster
| conveyer belts (serial performance) and horizontal scaling
| as using more conveyer belts (parallel performance).
|
| So your example of adding more CPU cores would be
| horizontal scaling, while using a faster core would be
| vertical. Vertical scaling has diminishing returns.
| Sohcahtoa82 wrote:
| Think cores instead of clock speeds.
|
| In the case of cloud instances, doubling cores is frequently
| less than 100% more expensive.
| The_Colonel wrote:
| Increasing core count is not really vertical scaling. It's
| a hybrid between vertical and horizontal scaling, having
| some characteristics of both. It also tops out quite early
| (especially its cost-effectiveness for many use cases, but
| there's an absolute upper limit as well).
| morelisp wrote:
| https://aws.amazon.com/msk/pricing/ prices scale linearly
| with CPU beginning with m5.large, and I wouldn't really
| want to run a production Kafka on anything less than
| m5.xlarge. (They do at least keep linearly scaling all the
| way up.) Speculating wildly, I could probably have run some
| of our real clusters on the equivalent of an 8xlarge, but of
| course 32 core systems were not widely available at that
| time. For the cluster I run today, even a hypothetical
| 48xlarge would struggle.
|
| YMMV for non-managed stuff, but really, you can only bump
| cores like 3 times realistically, 4 if you started really
| shitty, before you start getting into special pricing
| brackets.
| dekhn wrote:
| Disagree: typically vertical scaling is lumpy, and even worse,
| CPU and RAM upgrades are typically not linear, because you're
| limited by the number of slots/sockets and the manufacturers
| intentionally charge (exponentially) higher prices for the
| largest RAM and fastest CPUs.
| defendBanana wrote:
| With clouds this is not true anymore. They are exactly
| linear. If you ask for a smaller node they are simply
| provisioning a chunk of a larger machine anyway.
|
| There is a point where the exponential pricing starts, but
| that point is way further out than most people expect. Probably
| ~100 CPUs, ~1TB RAM, >50Gbps network etc.
| vegabook wrote:
| If they charge these big numbers more it's precisely because
| they're trying to capture some of the embarrassingly better
| value you get from vertical scaling. It's a testament to
| vertical scaling's effectiveness that they _can_ do so.
| foota wrote:
| Sure, but by doing so they consume the effectiveness?
| dekhn wrote:
| No, because you pay a fixed cost to get higher
| performance and then benefit through the whole lifetime
| of the product (I'm assuming you are purchasing
| rationally and keep your machines loaded at 75% or
| better, and your software is not egregiously wasteful).
| morelisp wrote:
| Kafka is also a system that can make pretty good general use
| of more CPUs and more storage, but doesn't have much need for
| RAM. Tying the CPU and RAM together whether by CPU model or
| cloud vendor offerings is annoying if you're trying to scale
| only vertically.
| defendBanana wrote:
| Kafka can keep a decent bit of data in RAM using file
| system pages. Oftentimes you end up wasting CPUs on Kafka
| nodes, not memory, I think.
|
| https://docs.confluent.io/platform/current/kafka/deployment
| ....
| morelisp wrote:
| I find that if you are seeking lots of consumers around
| large topics no amount of RAM is really sufficient, and
| if you are mostly sticking to the tails like a regular
| Kafka user, even 64GB is usually way more than enough.
|
| CPU isn't usually a problem until you start using very
| large compactions, and then suddenly it can be a massive
| bottleneck. (Actually I would love to abuse more RAM here
| but log.cleaner.dedupe.buffer.size has a tiny maximum
| value!)
|
| Kafka Streams (specifically) is also configured to
| transact by default, even though most applications aren't
| written to be able to actually benefit from that. If you
| run lots of different consumer services this results in
| burning a lot of CPU on transactions in a "flat
| profile"-y way that's hard to illustrate to application
| developers since each consumer, individually, is
| relatively small - there's just thousands of them.
| KRAKRISMOTT wrote:
| Also beyond a certain point, it makes sense to go straight to
| dedicated bare metal. The AWS tax is not worth paying if your
| workload is mostly fixed, somewhat fault tolerant (i.e. failed
| hardware on the weekends can be replaced on Monday without
| major interruption to business operations), and CPU bound. Get
| a high end machine on Hetzner and put everything behind a VPN
| or API auth and you will save more than 50% in spending.
| RhodesianHunter wrote:
| I haven't found this to be true generally unless your
| workloads are _truly_ completely static, which I've never
| actually experienced.
|
| Given what engineers at this level cost, their costs per hour
| dealing with all of the nonsense clouds handle for you
| (networking, storage, elastic scaling, instant replacement of
| faulty servers, load balancing, yadda yadda) end up being
| higher than whatever tax you're paying for using the cloud.
|
| Economies of scale are real.
| haswell wrote:
| I recently started playing Factorio, and I kept thinking that
| this is what "low code" integration/automation tools should look
| like. Developer tooling with extremely clear visuals, obvious
| dataflow, endless combinations into which the rigidly defined
| components can be assembled to do exactly what they do.
|
| As opposed to so many takes on "flow based" programming, which
| present some imperfect nodal representation of the program, but
| rarely can the user make sense of what's going on by seeing stuff
| moving around as the thing executes.
|
| And by the way, be sure you're ready to sink some time in if
| you're curious about this game...it's just too good, and I've had
| to consciously reduce the time I'm spending, because I could just
| keep optimizing...building...expanding...optimizing...it's built
| in the shape of the reward center of my brain.
| gjulianm wrote:
| > And by the way, be sure you're ready to sink some time in if
| you're curious about this game...it's just too good, and I've
| had to consciously reduce the time I'm spending, because I
| could just keep
| optimizing...building...expanding...optimizing...it's built in
| the shape of the reward center of my brain.
|
| I feel the same. It scratches so many itches. I made the
| mistake of installing the Space Exploration mod after the first
| run... so many hours invested now.
| lsaferite wrote:
| SE is such a deep hole to fall into. I feel for you.
| throitallaway wrote:
| SE is such a slog but I must keep going!
| lsaferite wrote:
| > it's built in the shape of the reward center of my brain.
|
| This is a very accurate description of how I feel about
| factorio. Thanks, I'm going to use this going forward.
| ineedasername wrote:
| >it's built in the shape of the reward center of my brain
|
| Yes, it's like a distillation of the feeling I get from the
| most enjoyable parts of my job, risking productivity loss from
| real world responsibilities until the complexity rises high
| enough to require project organization and advanced planning of
| tasks that are the least enjoyable parts of my job, along the
| lines of:
|
| "Crap, I have a sudden urgent need to deal with enemies
| creeping out from beyond my radar range that will push back
| operationalization of my proof-of-concept production pipeline.
| I'd estimate 3 man hours are required to perform a one-off fix
| on the enemies & radar expansion, maybe 5 to automate long-
| term... damnit I need a break, let me VPN into work to
| decompress from Factorio stress."
| jay_kyburz wrote:
| Yes, but it also embodies the worst parts of my job, which is
| that there is always more to do and the work is never ending.
| cortesoft wrote:
| I wonder who the real world equivalent of biters are...
| sales?
| oconnor663 wrote:
| rad hits
| mettamage wrote:
| Security issues in software
| not_your_mentat wrote:
| And now I think of everything I do in Factorio models.
| AbraKdabra wrote:
| It's always good to see Cracktorio on the frontpage of HN,
| hopefully someone will make a similar article showing the
| similarities between Factorio and drugs.
| mistermann wrote:
| Or human culture as a video game in general, and humans as
| semi-rational, semi-aware characters in the video game, except
| some of the characters are special in that their job is to
| deceive and exploit other characters, including constructing
| illusions indistinguishable from reality of who is good and who
| is bad, what is true and what is not, what we should do and
| should not do, etc. And to make it all even more exciting, not
| all of the illusionists are aware of their actual, often hybrid
| role in the big scheme of things.
|
| Maybe Shakespeare or some famous philosophers would have seen
| this angle, were video games to exist in their era. Unfortunate
| timing I guess.
| BSEdlMMldESB wrote:
| what about understanding our own imperialistic civilization?
|
| crash in planet (or new continent?) and proceed to expand while
| decimating native life?
|
| in any case, good game.
| golergka wrote:
| That's what all life forms do -- consume everything they can
| and reproduce as much as they can. So, doing this _is_ life.
| hutzlibu wrote:
| That's an oversimplification. If you look closer, you might
| find that most living things are mostly living in symbiosis
| with their surroundings. Your human body is an example of
| cooperation of simple life forms. And when those living cells
| of your body consume everything they can and reproduce as much
| as they can, then this is called cancer.
| golergka wrote:
| Those statements do not contradict each other. Life forms
| consume and reproduce as much as they can, and they also
| live in cooperation and symbiosis. The latter can often be
| the most effective strategy to consume and reproduce as
| much as one wants.
| BSEdlMMldESB wrote:
| yes, agree
|
| the issue then, is how to decide what "as much as we can"
| even means; if we do "as much as we can" but this kills
| us, then it will have turned out that we indeed could not
| do that much
| hutzlibu wrote:
| "That's what all life forms do -- consume everything they
| can and reproduce as much as they can"
|
| Well, if you take this statement literally, then I think
| they do contradict each other - see my cancer example.
| But in the way you apparently meant it, no, there is
| no contradiction.
| Terr_ wrote:
| Survivorship bias: Mutual benign coexistence and symbiosis
| within the present are rooted in massacres of the past.
___________________________________________________________________
(page generated 2023-07-13 23:00 UTC)