[HN Gopher] Kubernetes StatefulSets are Broken
       ___________________________________________________________________
        
       Kubernetes StatefulSets are Broken
        
       Author : mjg235
       Score  : 94 points
       Date   : 2022-08-12 14:31 UTC (8 hours ago)
        
 (HTM) web link (www.plural.sh)
 (TXT) w3m dump (www.plural.sh)
        
       | aeyes wrote:
       | > Manually edit the StatefulSet volume claim with the new storage
       | size and add a dummy pod annotation to force a rolling update
       | 
       | If the PVC size changes Kubernetes automatically does an online
       | resize of the filesystem:
       | https://kubernetes.io/blog/2022/05/05/volume-expansion-ga/
       | 
       | This has been possible for 2-3 years if you had the flag enabled.
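        | 
        | A minimal sketch of that (the StorageClass and PVC names here
        | are placeholders):
        | 
        |     # the StorageClass must allow expansion
        |     kubectl patch storageclass gp3 \
        |       -p '{"allowVolumeExpansion": true}'
        |     # bumping the request triggers an online volume and
        |     # filesystem resize
        |     kubectl patch pvc data-mydb-0 --type merge \
        |       -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'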
        
         | arianvanp wrote:
         | Problem is that you don't create PVCs yourself usually but the
         | statefulset manages them based on a template.
        
         | uberduper wrote:
         | Online resizing of pv has been possible for quite a while. The
         | complaint from the author is that you can do this for any
         | existing pv that was created by the sts pvc definition, but
         | scaling up the sts will create a new pod and pvc using the
         | original spec. Altering that spec in the sts manifest is
         | unpleasant when it really shouldn't be.
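          | 
          | The usual dance, since volumeClaimTemplates are immutable, is
          | something like this (a sketch; "mydb" and the manifest file
          | are placeholder names):
          | 
          |     # drop the StatefulSet object but leave its pods and PVCs
          |     # in place
          |     kubectl delete statefulset mydb --cascade=orphan
          |     # re-apply a manifest with the larger storage request in
          |     # volumeClaimTemplates
          |     kubectl apply -f mydb-statefulset.yaml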
        
         | sweaver wrote:
         | How consistent have the results been? I used this in the past
         | with some flakiness...
        
           | jeppesen-io wrote:
            | Not OP, but I've been doing it for a few years. Can't say
            | I've ever seen an issue - including with the Prometheus
            | Operator, which the article mentions, a few times as well.
        
       | 0xbadcafebee wrote:
       | If you need stateful files in a microservice, just mount a
       | network filesystem. Yes, it's slow. Yes, it's buggy. Yes, it's
       | not very portable. Yes, there are locking issues. But if you
       | really need stateful files in a microservice, something is
       | already fucked up. Network filesystems are a simple and cheap way
       | to add that functionality and more. You get 'infinite' storage,
       | redundancy, persistence, shared volumes, intra-cluster data
       | availability, centralized data lifecycle management, etc, and no
       | need for more custom Kubernetes logic just to persist some files.
       | It's a shitty solution that is actually good enough.
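        | 
        | To make that concrete, an existing NFS export is just a PV plus
        | a PVC (the server, path and size below are placeholders):
        | 
        |     kubectl apply -f - <<'EOF'
        |     apiVersion: v1
        |     kind: PersistentVolume
        |     metadata:
        |       name: shared-nfs
        |     spec:
        |       capacity:
        |         storage: 1Ti
        |       accessModes: ["ReadWriteMany"]
        |       nfs:
        |         server: 10.0.0.5   # placeholder NFS server
        |         path: /exports/app
        |     ---
        |     apiVersion: v1
        |     kind: PersistentVolumeClaim
        |     metadata:
        |       name: shared-nfs
        |     spec:
        |       accessModes: ["ReadWriteMany"]
        |       storageClassName: ""
        |       volumeName: shared-nfs
        |       resources:
        |         requests:
        |           storage: 1Ti
        |     EOF
        |     # then any number of pods can mount the "shared-nfs" claim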
        
         | mjg235 wrote:
          | I actually think k8s is (or can be) a decent solution for
          | non-critical, low-cost databases if you're confident about
          | navigating the toolchain. The one big gap is storage resizing,
          | which is part of why we're pointing it out.
        
         | oceanplexian wrote:
          | NFS doesn't have to be slow and buggy, it's just that people try
         | to set it up on a buggy cloud provider, and get burned. Tons of
         | massive, mission-critical bare metal VMWare infra is backed by
         | NFS and something like a NetApp. The one I used to admin years
         | ago was connected via redundant 40Gb ethernet and it could
         | easily push many GB/s at very consistent and low latency, and
         | would fail over instantly without interruption.
        
         | spmurrayzzz wrote:
          | I'd double down even more on your latter comment of "just
          | don't use k8s" for stateful needs as it's a minefield. But
          | mounting a network fs is just gonna create a whole host of
          | headaches when they decide to use the mount with anything that
          | doesn't have native io fencing etc.
        
       | MrStonedOne wrote:
        
       | openplatypus wrote:
       | Agreed, we started with Kafka, PostgreSQL and one other service
       | in K8s with StatefulSets.
       | 
        | It was painful. We quickly realized that rolling upgrades often
        | got blocked, the reason being that persistent volume claims
        | sometimes get stuck.
        | 
        | And this is just one of a few problems.
       | 
       | Since we moved databases and stateful services outside
       | Kubernetes, everything got faster and more reliable.
        
         | powerhour wrote:
         | I don't have a lot of experience with production k8s so I
         | haven't seen too many issues. I have seen PVCs get stuck
         | though. Recently, I was unable to do a rolling restart because
         | I messed up and used zonal disks on a regional cluster. The
         | scheduler would assign pods to nodes in the "wrong" zone and
         | then the disks could never be attached properly.
         | 
         | The fix was to add scheduler hints that moved the pods to the
         | correct zone as I should have done in the beginning. (On first
         | deployment, the disk is created on the correct node and thus
         | zone, so it all seemed to work).
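          | 
          | Roughly this sort of hint, as a sketch -- the names and zone
          | are placeholders, and older clusters use the
          | failure-domain.beta.kubernetes.io/zone label instead:
          | 
          |     kubectl patch statefulset mydb --type merge -p '
          |     spec:
          |       template:
          |         spec:
          |           nodeSelector:
          |             topology.kubernetes.io/zone: us-central1-a
          |     '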
        
           | dilyevsky wrote:
            | 9/10 times the stuckness is a bug in the particular CSI
            | driver or node image that you use. Those are fixable. The
            | zonal scheduling thing I believe was fixed a while ago.
        
             | powerhour wrote:
             | Ah, I've been on 1.19 for a while so I'm definitely behind.
             | I'm using GKE regional clusters and as such cannot do zero
             | downtime upgrades to the control plane (they break DNS) so
             | it's risky upgrading.
        
               | dilyevsky wrote:
               | Hm the topology-aware provisioning should be fixed since
               | at least 1.12/13 and is working well on our self-managed
               | regional clusters. I'd bug GKE support for this...
               | 
               | Just curious - how does GKE control plane upgrade break
               | DNS?
        
         | moreira wrote:
         | As someone who's been eyeing Kubernetes as a learning
         | experience, that actually is really disheartening to hear. If
         | you're being forced to manage your database outside of
         | Kubernetes, what are you using Kubernetes for? Running your
         | web/worker servers?
        
           | anchochilis wrote:
           | My mid-sized shop uses managed offerings for MySQL and
           | Postgres but we run several stateful workloads, including
           | ElasticSearch and MongoDB, in GKE.
           | 
           | I personally haven't experienced the issues the OP is
           | describing. It might be that StatefulSets and PVCs are more
            | stable now than when OP tried them. Of course using
           | managed/hosted K8s makes a big difference.
        
             | openplatypus wrote:
             | We use managed K8s. And we dropped StatefulSet maybe 4
             | months ago.
             | 
              | There might have been a way to support our use case better.
              | But this was a managed service and we couldn't easily tweak
              | every setting. But also, we didn't want to.
              | 
              | Our goal in using Kubernetes is to make difficult stuff
              | easy, not to make easy stuff difficult. And given our
              | skillset and capacity, moving stateful services outside K8s
              | was easier, cheaper and more reliable.
        
           | openplatypus wrote:
           | Microservices, automations, APIs.
           | 
            | K8s is a great way to easily deploy and scale services. As we
            | leverage Kafka, a lot of applications fall back on Topics and
            | KTables for storage.
           | 
           | Kubernetes with jobs, cron jobs, operators, secrets etc is
           | effectively an Operating System for the cloud.
           | 
           | Definitely worth the investment to learn and practice.
        
           | [deleted]
        
           | ohthehugemanate wrote:
            | If you're interested in it as a learning experience, this is
            | a great setup. Start by putting the stateless parts of your
            | application into k8s, to get the normal flow. Then try to
            | move your database in. Stateful workloads are tricky and
            | maybe impossible to really do well in k8s, but there are a
            | lot of options that almost get you there, and maybe get you
            | to "good enough." And the learning you will get is fantastic
            | for understanding lots of core k8s capabilities.
        
           | waynesonfire wrote:
           | stop asking questions ... here have some of this koolaid.
           | it's great. it's not complex _jedi hand wave_ and it works
           | for me.
        
       | Spivak wrote:
       | I cannot understand why people in this thread are saying people
       | who want state are holding it wrong. If you are saying that k8s
       | falls down on state -- literally the only hard problem -- then
        | why are you bothering to take on the complexity of k8s? What are
        | you running on k8s that is stateless and can't be served by ec2
        | instances in auto scaling groups? State management is the killer
        | feature for k8s; it manages all the complexity and presents you
        | with a fiction where stateful apps can be written like they're
        | actually stateless.
        
         | mjg235 wrote:
         | People are definitely sleeping on k8s' capability in managing
         | stateful workloads. We run various datastores on it all the
         | time, and it's quite reliable for a lot of use cases (largely
         | because the underlying cloud block stores it's orchestrating
         | are incredibly robust).
         | 
          | That said, there are warts that should be removed, but that's
          | not surprising.
        
         | oceanplexian wrote:
          | The vast majority of developers don't understand what
          | "stateless" means, since if you want to get technical there is
          | no such thing as a stateless system.
         | 
         | You might have drained requests off of a pod, but it still has
         | a "state", it still has things cached in memory, it might still
         | have connections open to some outside entity (Database,
         | whatever), and the developer, not K8s, is responsible for
         | catching those signals, cleaning up things in memory,
         | gracefully terminating connections, handing off in-flight
         | workloads to another pod, etc. Even in the upper echelons of
          | the tech industry I see a very small minority of developers
          | actually aware of all the things that can make stateless
          | workloads stateful. Which is OK for something where the stakes
          | are low, but if you're a DBA or a systems-oriented person
          | you'll see people make these (very wrong) assumptions all the
          | time and recoil in horror.
        
       | singron wrote:
       | I just looked at the scripts for the last StatefulSet thing I
       | did, and it has the exact workaround in the apply script. I.e.
       | recreate the ss, edit the pvcs, and restart the rollout when
       | there is a volume size change.
       | 
       | K8s has really been an experience of encountering 2-4 year old
       | bugs (closed by the stale bot of course) and missing features. We
       | accumulate increasingly more workarounds and additional pieces of
       | complexity to deal with this. Everything is always changing, but
       | somehow still staying the same.
        
         | mjg235 wrote:
          | Yeah it's wild that it's been a pain point for so long without
          | a properly supported solution.
        
       | CodesInChaos wrote:
       | Sparse block storage is one of the cloud features I miss most.
       | What I'd expect:
       | 
       | * You can create a volume, but pay only for in-use blocks.
       | 
       | * You create a volume bigger than what you'll ever need => no
       | resizing required (At least 16 TiB, but possibly even something
       | crazy like an exabyte)
       | 
       | * There is some kind of "trim" operation that marks a block as
       | free
       | 
       | * Ideally it's possible to choose the maximum block index (size
       | of block-volume) independently from the maximum number of in-use
       | blocks (for cost control)
        
         | benlivengood wrote:
         | Sparse block storage is achievable with s3backer and similar
         | tools which treat bucket/object storage as arbitrarily large
         | volumes.
         | 
         | Practically, sizing thin-provisioned filesystems too large
         | wastes a lot of space on filesystem metadata.
         | 
         | All operating systems and clouds that I've worked with support
         | online volume and filesystem resizing which is as close to the
         | semantics of thin-provisioning as I've ever needed. I wasn't
         | actually aware that StatefulSets don't resize volumes as
         | smoothly as desired but I haven't had a problem manually
         | deleting pods to get the filesystem on a PVC to grow (the way
         | it works for Deployments).
        
         | AtlasBarfed wrote:
         | AWS has elastic filesystem, but the perf is crap from what I
         | understand.
         | 
         | I haven't used it, so your mileage is a total unknown, but our
         | "big data" folks like it.
         | 
          | AWS won't change to your desired model; they mint money on
          | underutilized resources (and hidden I/O and network charges).
        
         | lifty wrote:
         | Great feature, but cloud providers are incentivised to make it
         | difficult for people to save money. This would instantly
         | vaporise a big chunk of revenue, because people would not have
         | to pay for pre-provisioned storage.
        
           | waynesonfire wrote:
            | Yep, the cloud long term is a terrible investment. Due to its
            | sticky nature, innovation in technology is seldom passed down
            | to consumers and instead is pocketed by the provider.
            | 
            | I guarantee you cloud providers internally overprovision
            | their storage.
        
           | pgwhalen wrote:
           | > Great feature, but cloud providers are incentivised to make
           | it difficult for people to save money.
           | 
           | Every business is incentivized this way. Competition is what
           | pushes prices down, of which there is plenty in the cloud
           | industry (though maybe still not as much as we would like).
        
           | thinkmassive wrote:
           | AWS EBS volumes used by a PVC can be expanded without
           | downtime by increasing the request on the PVC. The only
           | prerequisite is that the StorageClass has
           | allowVolumeExpansion=true
           | 
           | Edit: Now that I actually read TFA I see it mentions most CSI
           | drivers already provide this functionality. They provide a
           | workaround similar to what I'm sure most people use, and I
           | agree this functionality seems like it could be handled by
           | StatefulSet.
        
           | advisedwang wrote:
           | Maybe one of the lower tier providers should use it as a
           | differentiator.
        
             | djbusby wrote:
             | Why add a low margin feature to your product/service?
        
               | bo0tzz wrote:
               | Because it draws in customers that will then also use
               | your higher margin features.
        
               | discodave wrote:
               | Amazon historically has gone after 'low margin'
                | businesses. The cloud, at 25% margins (for AWS), is
                | 'low margin' compared to traditional software sales.
        
         | dpedu wrote:
         | It's not at the block level, but Amazon's EFS checks most of
         | these boxes.
        
           | topspin wrote:
           | > Amazon's EFS
           | 
           | No snapshots. No support for ACLs. Poor performance. High
           | cost.
        
           | chrsig wrote:
            | I read a story on HN recently about someone having left a
           | `nohup yes > /some/efs/mount/file.txt` type command running,
           | and not noticing until the bill came due...
           | 
           | since then, I'm totally on board with pre-provisioned
           | storage. at the very least, some way to put an upper bound on
           | volume size (maybe efs has that? not sure.)
        
             | switch007 wrote:
             | Oh wow that's horrific. That will write gigabytes in a few
             | seconds on an SSD.
             | 
             | Unsurprisingly, AWS EFS does not support NFS quotas.
        
             | kqr wrote:
             | For production software, you should _always_ have an upper
             | limit on _everything_.
             | 
             | There is always a point where you can tell in advance that
             | it doesn't make sense to keep going. It is never sensible
             | to literally go on forever with anything -- it can only
             | break stuff in annoying ways when it runs into real world
             | limitations (like financial ones in that case.)
             | 
             | I sound aggressive about this because it's such a common
             | mistake. It always goes something like,
             | 
             | "Why does this list have to be unbounded?"
             | 
             | "Well, we don't want to give the user an error because it's
             | full."
             | 
             | "Okay, but does it really need to support 35,246,953
             | instances?"
             | 
             | "Sure, why not?"
             | 
             | "How long would the main interaction with the system take
             | if you stress it to that level?"
             | 
             | "Oh, I don't know, at that level it might well take 20
             | minutes."
             | 
             | "And the clients usually timeout after?..."
             | 
             | "5 seconds."
             | 
             | "Would the user rather wait for 20 minutes and then get a
             | response that might be outdated by that time, or get an
             | error right away?"
             | 
             | "They may well prefer the error at that point."
             | 
             | "So let's go backwards from that. Will it ever make sense
             | to support more than 250,000 instances?"
             | 
             | "That corresponds to the five second timeout and then some.
             | I guess that's fine in practise..."
             | 
             | It's not that hard!
        
               | chrsig wrote:
               | 100% agreed.
               | 
               | Often limits need to be addressed at the product level.
                | At some point at my $work we started pushing back hard
                | enough at product to say that we're placing a hard limit on
               | _every_ entity. What the limit is can be negotiated, re-
               | evaluated, and changed -- but changes to it need to be
               | intentional and done with consideration to operational
               | impact.
               | 
                | My personal pet peeve is when I'm dealing with a client
               | library that doesn't expose some sort of timeout.
               | 
               | I'd also add unbounded queues to the list above as a
               | subtle place where the lack of a limit can really cause
               | production issues. Everything may look like it's working
               | fine until you realize that you've got a 4gb process with
               | a giant buffer that'll take forever to drain.
        
             | purplerabbit wrote:
             | Terrifying. Please link the story if you can find it!
        
               | chrsig wrote:
               | luckily it was recent and unique -- easy for google to
               | find :)
               | 
               | here it is: https://news.ycombinator.com/item?id=32175328
        
         | dmitryminkovsky wrote:
         | Where is such block storage available? Seems fantastical...?
        
           | Nullabillity wrote:
           | Ceph's RBD basically works like this (but it's a self-hosted
           | software package, not IaaS), so you still need to manage the
           | storage pool yourself.
        
           | topspin wrote:
           | "Where is such block storage available? Seems
           | fantastical...?"
           | 
           | SAN vendors have been selling devices that deliver thin
           | provisioned network block devices you can snapshot, clone,
           | resize dynamically and replicate in a cluster for about 20
           | years now. Third parties can do all of this in cloud
           | environments as well. See Netapp Cloud Volumes. You could
           | roll your own using Ceph, or even a ghetto version with a
           | subset of these capabilities with thin provisioned LVM+iSCSI,
           | DRBD etc.
           | 
           | You should be able to point and click a thin provisioned,
           | dynamically expandable, snapshot-able, clone-able, replicate-
           | able pay as you go network block device with whatever degree
           | of performance, availability and capacity you wish to pay
           | for. The fact that you can't is a function of the business
           | model and mentality of IaaS cloud operators; they don't want
           | your old fashioned 'stateful' workloads. Neither does
           | kubernetes. So you're swimming against the current.
        
       | powerhour wrote:
       | This is more a criticism of operators than StatefulSets. The
       | promise of operators is easier automated management of your
       | resources. If the operator can't do basic things like resizing
       | disks, it's flawed and needs to be fixed.
       | 
       | However, I've always been suspicious of operators, so maybe my
       | bias is leaking here. Kubernetes is already a level of
       | abstraction and adding another seems risky and unnecessary.
        
         | dmead wrote:
          | I agree with this. I manage a monorepo of 10s of
          | terraform/terragrunt modules, most of which are deployed to a
          | k8s cluster.
          | 
          | The whole point of infrastructure as code is that your git repo
          | is an audit log of what's changed in your infra. Once you leave
          | it up to operators or persistent volume claim templates (or
          | whatever else makes side-effectful changes without being told)
          | you're throwing all that niceness of control out the window.
        
           | raffraffraff wrote:
           | Infra as code can't actively manage things like database
           | clusters, handling promotion of a secondary, automatic
           | deployment of new replicas, scaling. Using it to deploy RDS
           | Aurora doesn't count. My impression is that operators are
           | designed to handle those "active" decisions in real-time, and
           | that's outside the realm of terraform, GitOps etc.
        
             | dmead wrote:
              | Indeed, and if the k8s api were so precisely laid out that
              | you could look at a log of side-effect changes that came as
              | the result of scalers, operators, templates etc, that would
              | be one thing, but we aren't there yet.
              | 
              | Right now what I have is "oh the cluster is in a different
              | state (there is drift) than the code, let's be a detective
              | to see why this changed".
        
           | encryptluks2 wrote:
           | A Git repo can still be a good audit log even if you use
           | templates, especially if you're adding a step to transform
           | files with each push.
        
         | shp0ngle wrote:
         | And now imagine it's all in a Helm chart. And using kustomize
         | magic
         | 
         | So you start debugging the helm chart, and kustomize, and the
         | operator.
         | 
         | Not fun.
        
           | mkl95 wrote:
           | Your service is now crippled by a bunch of spaghetti APIs on
           | top of the stuff that was crippling it before. You are cloud-
           | scale!
        
             | code_biologist wrote:
             | Now that you've hit cloud scale, engineering leadership can
             | make the case for budget for an ops hire to keep the thing
             | working. Bigger headcount is always a plus.
        
               | mkl95 wrote:
               | Also make sure that Dev and Ops are separate departments
               | that are not allowed to work together. Devops? We don't
               | do that here. BTW you need to fix that 15-second query
               | ASAP.
        
               | ok_dad wrote:
               | > Bigger headcount is always a plus.
               | 
               | You can't make a baby in 1 month with 9.5 women.
        
         | nithril wrote:
          | That holds as long as the operator is really well designed, but
          | that is rarely the case. I switched from helm to bare k8s
          | manifests for the same reason: helm/operator can be helpful,
          | but not when the maintainer introduces a breaking change (that
          | does not relate to the service itself) that needs a
          | delete/redeploy of the k8s resources.
        
           | encryptluks2 wrote:
            | To be fair Helm is just a template layer for manifests, so it
            | would take quite a bit of effort to break things... especially
            | if you're using a project's official Helm charts.
        
             | sieabahlpark wrote:
        
           | bavell wrote:
           | Yep, this is also the reason I don't use helm in any of my
           | clusters. Too much magic and extra dependencies for a
           | marginal (IMO) benefit.
        
         | debarshri wrote:
         | Best operators are the operators that you don't have to deal
         | with. The moment you have to investigate stuff, life is hell.
        
         | mjg235 wrote:
          | The flaw is definitely most acute when it is combined with
          | operators, but storage resizes would be vastly more accessible
          | for less capable devs if they could be done declaratively via
          | the spec instead of via a painful workaround.
        
         | uberduper wrote:
         | An operator could trivially create the pvc ahead of any sts
         | scaling op. The sts will simply look for an existing pvc by a
         | predictable name and use it rather than create a new one,
         | regardless of whether the pre-created pvc matches the spec of
         | what the sts defines.
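          | 
          | Sketch of that, assuming a claim template named "data" on an
          | sts named "mydb" that currently has 3 replicas -- the next
          | ordinal's PVC is pre-created with the bigger size, then the
          | sts is scaled:
          | 
          |     kubectl apply -f - <<'EOF'
          |     apiVersion: v1
          |     kind: PersistentVolumeClaim
          |     metadata:
          |       name: data-mydb-3   # <template>-<sts>-<ordinal>
          |     spec:
          |       accessModes: ["ReadWriteOnce"]
          |       storageClassName: gp3   # placeholder
          |       resources:
          |         requests:
          |           storage: 200Gi
          |     EOF
          |     kubectl scale statefulset mydb --replicas=4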
        
       | ec109685 wrote:
        | This is fundamentally untrue: "But, Kubernetes was originally
       | intended to act as a container orchestration platform for
       | stateless workloads, not stateful applications."
       | 
       | Even from the early days, there were constructs to allow stateful
       | applications to be built on top of it.
       | 
       | They have made it easier to run stateful applications, but to say
       | it wasn't designed to support stateful applications is incorrect:
       | https://kubernetes.io/blog/2016/08/stateful-applications-usi...
       | 
       | "Stateful applications such as MySQL, Kafka, Cassandra, and
       | Couchbase for a while, the introduction of Pet Sets has
       | significantly improved this support."
        
       | koprulusector wrote:
       | These are valid points. I've personally needed to do things like
       | delete the StatefulSet, change volume size, update/change pvc,
       | update StatefulSet, reapply. I just took it as a given and hadn't
       | thought much about the ergonomics.
       | 
       | It's really not that big of a deal and probably a ten minute
       | process, tops. That said, OP makes a good point that mechanisms
       | to handle this kind of stuff exist in CSI, so why not leverage /
       | take advantage of that versus pissing in the wind (especially
        | when you run into a CRD that wants to undo the above work while
       | you're in the process)?
        
         | smarterclayton wrote:
         | At the time we designed the StatefulSet, the resize behavior
         | hadn't been designed / finalized.
         | 
         | Now that it has, it's really a matter of someone having the
          | time and patience to drive it through. I don't know whether
         | someone has sorted out all the design implications (like what
         | happens if someone adds new resource requests, or what happens
         | if a new resource request is rejected, and how that gets
         | reported to the user), but it's a small change that needs a
         | fair amount of design thought, which tends to be hard to drive
         | through.
        
       | zoomzoom wrote:
       | Generally, from teams we talk to at Coherence
       | (withcoherence.com), it seems to be a mistake to try to run
       | stateful workloads in Kubernetes. As much as the cost savings
       | compared to managed services like AWS RDS or GCP Cloud SQL are
       | attractive, the configuration time and maintenance are just not
       | worth it. Our perspective is that container orchestration tools
       | should be used to manage stateless workloads, and that stateful
       | workloads should delegate to managed services where volume
       | reliability, scalability, and backup can be handled with purpose-
       | built tools.
        
       | de6u99er wrote:
        | There's been a pending pull request that solves this problem
        | since the end of June.
       | 
       | https://github.com/kubernetes/enhancements/pull/3412
        
       | AtlasBarfed wrote:
        | k8s is made for stateless servers. But "stateful sets" have
        | always seemed like a bolt-on.
       | 
       | Also, for important "big data" stores with replication, I don't
       | want to trust all-in-one third party operators, which is what all
       | K8S operators seem to aspire to be. Your data is important, and
       | you should evaluate all your data operations use cases yourself.
        
         | MrStonedOne wrote:
        
       | lokar wrote:
        | This totally misunderstands the motivation for stateful sets.
        | The only thing they offer is the stable per-task names. They are
        | a reimplementation of jobs in Borg, where you always have stable
        | task numbers. It's really more for sharded services than ones
        | with persistent state.
        
         | bogomipz wrote:
         | From the latest documentation[1]:
         | 
         | StatefulSets are valuable for applications that require one or
         | more of the following.
         | 
         | Stable, unique network identifiers.
         | 
         | Stable, persistent storage.
         | 
         | Ordered, graceful deployment and scaling.
         | 
         | Ordered, automated rolling updates.
         | 
         | [1]
         | https://kubernetes.io/docs/concepts/workloads/controllers/st...
        
           | Filligree wrote:
           | Which confuses me, frankly.
           | 
           | When do you _not_ need automated rolling updates? Or stable
            | network identifiers? I'm sure there are cases, but it seems
           | like supporting them should be the default -- are
           | statefulsets somehow expensive?
        
             | bogomipz wrote:
             | >"When do you not need automated rolling updates? Or stable
             | network identifiers?"
             | 
             | Most of the time with stateless apps.
             | 
              | In regards to automated rolling updates, the key word there
              | is "ordered." It will always select the same ordinal index
              | first (reverse order, starting with the highest ordinal).[1]
              | This is different from the regular deployment strategy of
              | "rolling update", which is also one by one but will select
              | any pod to whack first.
              | 
              | Regular pods don't have stable network identifiers as they
              | are inherently unstable. For regular pods, stable network
              | identifiers are provided by either the clusterIP or a load
              | balancer service.
             | 
             | [1] https://kubernetes.io/docs/tutorials/stateful-
             | application/ba...
        
             | lokar wrote:
              | Consider a job of N identical API servers. Your request can
              | go to any of them. To update we could just add new tasks
              | and take away old tasks. It's a gradual update, but not
              | "rolling" in the sense that each task name has a single
              | live instance at a time.
        
             | mjg235 wrote:
             | A basic stateless web app behind a load balancer can
             | tolerate unstable network identifiers as long as you
             | automate registering/deregistering the new names, which
             | kubernetes services do very well.
             | 
             | Stateful apps can be a bit more exceptional because they
             | often have client libraries that require you to hardcode
             | network addresses via connection strings and other tooling
             | that makes that glue automation more tricky and worth the
             | added guarantee around consistent naming.
        
             | lokar wrote:
              | If it were named today I'm sure it would be called a "non-
              | fungible set" :)
        
           | lokar wrote:
            | Persistent storage is its own API, you don't need to use it
            | with a stateful set. All of the other properties are implied
            | by stable task names.
            | 
            | And I would guess the doc is a bit of after-the-fact
            | justification for a design choice already made.
        
             | mdaniel wrote:
             | > Persistent storage is its own API, you don't need to use
              | it with a stateful set. All of the other properties are
             | implied by stable task names.
             | 
             | I believe that's only partially true, since even if one has
             | a PVC in something like a Deployment (the antipattern I see
             | *a lot* in folks new to k8s trying to run mysql or
             | whatever), then "k scale --replicas=10 deploy/my-deploy"
             | will not allocate _new_ PVCs for the new pods
             | 
             | That's why I think StatefulSets are valuable as their own
             | resource type: they represent _future_ storage needs
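              | 
              | (For comparison, a minimal sketch of that shape -- names
              | and image are placeholders:)
              | 
              |     kubectl apply -f - <<'EOF'
              |     apiVersion: apps/v1
              |     kind: StatefulSet
              |     metadata:
              |       name: mysql
              |     spec:
              |       serviceName: mysql
              |       replicas: 3
              |       selector:
              |         matchLabels: {app: mysql}
              |       template:
              |         metadata:
              |           labels: {app: mysql}
              |         spec:
              |           containers:
              |           - name: mysql
              |             image: mysql:8
              |             env:
              |             - name: MYSQL_ALLOW_EMPTY_PASSWORD
              |               value: "yes"
              |             volumeMounts:
              |             - name: data
              |               mountPath: /var/lib/mysql
              |       volumeClaimTemplates:   # one PVC per replica
              |       - metadata:
              |           name: data
              |         spec:
              |           accessModes: ["ReadWriteOnce"]
              |           resources:
              |             requests:
              |               storage: 20Gi
              |     EOF
              |     # scaling to 10 replicas now provisions 10 PVCs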
        
             | mcronce wrote:
             | Persistent storage _tied to a specific member of the
              | deployment_ is a StatefulSet feature. That's likely what
              | they mean by "stable", although I really wish they were
              | more clear about it.
        
             | bogomipz wrote:
             | Your comment makes no sense. Persistent storage is one
             | component of a stateful set. You are not even using correct
             | Kubernetes terminology, I have never once seen the term
             | "stable task name."
        
         | vorpalhex wrote:
         | Incorrect. An STS changes the deployment/rollout mechanisms
         | away from blue-green.
        
         | sleepybrett wrote:
         | You, in fact, are misunderstanding the motivation of stateful
         | sets.
        
       | nullwarp wrote:
        | I have completely given up on anything related to stateful
        | applications in Kubernetes. Most of the time it can work okay,
        | but given the amount of headaches we've experienced with it we've
        | completely abandoned using Kubernetes for anything that isn't
        | stateless and in need of dynamic scaling.
        
       | linuxftw wrote:
        | Sometimes operators interfere with what you're trying to do, as
        | the article points out with the Prometheus operator. Pro tip:
        | Scale down the operator, make your changes, put everything back
        | the way the operator expects it, then scale the operator back up.
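        | 
        | (Sketch -- the deployment name and namespace vary by install:)
        | 
        |     kubectl -n monitoring scale deploy prometheus-operator \
        |       --replicas=0
        |     # ...edit the StatefulSet / PVCs it manages...
        |     kubectl -n monitoring scale deploy prometheus-operator \
        |       --replicas=1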
        
       ___________________________________________________________________
       (page generated 2022-08-12 23:02 UTC)