[HN Gopher] Kubernetes' dirty endpoint secret and Ingress
___________________________________________________________________
Kubernetes' dirty endpoint secret and Ingress
Author : richardfey
Score : 72 points
Date : 2022-03-22 20:34 UTC (2 hours ago)
(HTM) web link (philpearl.github.io)
(TXT) w3m dump (philpearl.github.io)
| motoboi wrote:
  | There is a bug in GKE that causes 502s when using a CNI network.
|
  | The bug is triggered by port names in ingresses. Use port numbers
  | and you should be good to go.
| TurningCanadian wrote:
| Check out
|
| https://kubernetes.github.io/ingress-nginx/user-guide/nginx-...
|
| if you're running nginx. Consider setting it to true instead of
| the default (false)
|
| ---
|
| By default the NGINX ingress controller uses a list of all
| endpoints (Pod IP/port) in the NGINX upstream configuration.
|
| The nginx.ingress.kubernetes.io/service-upstream annotation
| disables that behavior and instead uses a single upstream in
| NGINX, the service's Cluster IP and port.
|
  | This can be desirable for things like zero-downtime deployments.
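  |
  | Roughly, on an Ingress it looks something like this (names and
  | host are hypothetical; the annotation itself is the real one):
  |
  |     apiVersion: networking.k8s.io/v1
  |     kind: Ingress
  |     metadata:
  |       name: my-app                # hypothetical
  |       annotations:
  |         # route to the Service's ClusterIP instead of pod endpoints
  |         nginx.ingress.kubernetes.io/service-upstream: "true"
  |     spec:
  |       ingressClassName: nginx
  |       rules:
  |       - host: app.example.com     # hypothetical
  |         http:
  |           paths:
  |           - path: /
  |             pathType: Prefix
  |             backend:
  |               service:
  |                 name: my-app      # hypothetical Service
  |                 port:
  |                   number: 80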
| nhoughto wrote:
| ah good tip, I don't care about session affinity or custom
| balancing algos so that works. I'd imagine running in GKE or
| AWS you would also avoid the DNAT / conntrack overhead as pods
| by default use a routable VPC IP instead of a magic CNI IP.
| Would have to test that though.
|
| Quote from related issue:
|
  | The NGINX ingress controller does not use Services to route
| traffic to the pods. Instead it uses the Endpoints API in order
| to bypass kube-proxy to allow NGINX features like session
| affinity and custom load balancing algorithms. It also removes
| some overhead, such as conntrack entries for iptables DNAT.
| AaronBBrown wrote:
| This is a design flaw in Kubernetes. The article doesn't really
| explain what's happening though. The real problem is that there
| is no synchronization between the ingress controller (which
| manages the ingress software configuration, e.g. nginx from the
| Endpoints resources), kube-proxy (which manages iptables rules
| from the Endpoints resource), and kubelet (which sends the
  | signals to the container). A preStop hook with a sleep equal to an
  | acceptable timeout will handle 99%+ of cases (and the cases it
  | doesn't will have exceeded your timeout anyhow). Things become
  | more complicated when there are sidecar containers (say an envoy
  | or nginx routing to another container in the same pod); that often
  | requires shenanigans such as a shared emptyDir{} volume that is
  | watched (with fsnotify or similar) for socket files to be closed,
  | to ensure requests are fully completed.
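  |
  | A minimal sketch of that preStop pattern (image and numbers are
  | illustrative, not a prescription):
  |
  |     apiVersion: v1
  |     kind: Pod
  |     metadata:
  |       name: web                     # hypothetical
  |     spec:
  |       # must cover the sleep plus the app's own shutdown time
  |       terminationGracePeriodSeconds: 45
  |       containers:
  |       - name: app
  |         image: example/app:latest   # hypothetical image
  |         lifecycle:
  |           preStop:
  |             exec:
  |               # keep serving while ingress/iptables configs catch up
  |               command: ["sleep", "30"]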
| kodah wrote:
| I mean, technically, you can recreate this scenario on a single
| host as well. Send a sigterm to an application and try to swap
| in another instance of it.
|
| System fundamentals are at the heart of that problem: SIGTERM
| is just what it is, it's a signal and an application can choose
| to acknowledge it and do something or catch it and ignore it.
| The system also has no way of knowing what the application
| chose to do.
|
  | All that to say, I'm not sure it's as much of a _flaw_ in
  | Kubernetes as it's just the way systems work, and Kubernetes is
  | reflecting that.
| lolc wrote:
| In my view it is a clear flaw that the signal to terminate
| can arrive while the server is still getting new requests.
| Being able to steer traffic based on your knowledge of the
| state of the system is one of the reasons why you'd want to
| set up an integrated environment where the load-balancer and
| servers are controlled from the same process.
|
| The time to send the signal is entirely under control of the
| managing process. It could synchronize with the load-balancer
| before sending pods the term signal, and I'm unclear why this
| isn't done.
| spullara wrote:
  | Tomcat had similar behavior when I was using it, except it would
  | bind the listener before it was ready to serve traffic, with
  | similar results.
| zeckalpha wrote:
| (2019)
| richardfey wrote:
  | It is still 100% applicable (AFAIK) and informative; with (2019)
  | in the title, won't readers think it's not relevant anymore?
| gscho wrote:
| Wait until you find out that kubernetes secrets aren't actually
| secrets but base64 encoded strings.
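  |
  | For illustration, the data field of a Secret is just base64, not
  | encryption (name and value are made up):
  |
  |     apiVersion: v1
  |     kind: Secret
  |     metadata:
  |       name: db-credentials          # hypothetical
  |     type: Opaque
  |     data:
  |       # "cGFzc3dvcmQxMjM=" is simply base64 for "password123"
  |       password: cGFzc3dvcmQxMjM=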
| twalla wrote:
  | Encryption at rest for secrets can be enabled; the base64 thing
  | is more of an artifact of how JSON serialization works with
  | byte arrays.
| notwedtm wrote:
  | I think K8S secrets get a bad rap. They are not intended to be
  | secret in the sense that they are "kept from prying eyes by
  | default". The Secret object is simply a first-class citizen,
  | differentiated from a ConfigMap in a way that allows distinct
  | ACLs.
  |
  | Most organizations I know will still use something like
  | ExternalSecret for source control and then populate the Secret
  | with the values once in the cluster, as an object with very few
  | access points.
| gscho wrote:
  | I think calling it a secret when it isn't gave it a bad rap.
| The last time I looked at the documentation it didn't even
| clearly describe that it is not a secure object (that may
| have changed recently). Why call it a secret when it is not
| even close to one? I guess thing-to-store-secrets-if-you-use-
| rbac was too long.
| dharmab wrote:
| They're not necessarily strings. You can put binary data in the
| data field, which is why it is base64.
|
| You can also configure the apiserver/etcd to encrypt specific
| keyspaces, such as the secrets/ key space.
| zaat wrote:
  | It is in your hands (the version where it became available has
  | been end of life for more than a year, basically forever in
  | Kubernetes terms); maybe they will change the default too. At
  | least there's a fine bold warning box in the docs.
|
| https://kubernetes.io/docs/tasks/administer-cluster/encrypt-...
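  |
  | Roughly, what the linked docs describe is an EncryptionConfiguration
  | handed to kube-apiserver via --encryption-provider-config (the key
  | below is only a placeholder):
  |
  |     apiVersion: apiserver.config.k8s.io/v1
  |     kind: EncryptionConfiguration
  |     resources:
  |       - resources:
  |           - secrets
  |         providers:
  |           # aescbc encrypts Secrets at rest in etcd
  |           - aescbc:
  |               keys:
  |                 - name: key1
  |                   # placeholder: base64 of a random 32-byte key
  |                   secret: <base64-encoded-32-byte-key>
  |           # identity is the fallback for reading old plaintext data
  |           - identity: {}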
| nhoughto wrote:
  | Is this true of the OOTB GKE nginx ingress? Hard to tell: by 'load
  | balancer' do they mean the nginx ingress reverse proxy?
|
| I can imagine the delay between updating the GCP global load
| balancer service from GKE would be much higher than nginx-ingress
| reacting to changes in pod health/endpoints.
|
  | Either way, I guess the takeaway is that there is a race between
  | endpoints being updated and those updates propagating; it seems
  | like that isn't handled as perfectly as people assume, and it
  | likely gets worse with node contention and Kube API performance
  | problems.
| cyberpunk wrote:
  | By load balancer they mean the internal Kubernetes Service object
  | that a given Ingress uses as its backing service.
| mad_vill wrote:
| the issues I see with kubernetes ingress are more related to an
| ingress pod going down than the upstream.
| cyberpunk wrote:
| What controller are you using? I've absolutely smashed nginx
| and the aws elb controllers and never seen them flinch...
| blaisio wrote:
| Yes! I think this is a really under-reported issue. It's
| basically caused by kubernetes doing things without confirming
| everyone responded to prior status updates. It affects every
  | ingress controller, and it also affects Services of type
  | LoadBalancer, and there isn't a real fix. Even if you add a
  | timeout in the preStop hook, that still might not handle it 100%
  | of the
| time. IMO it is a design flaw in Kubernetes.
| [deleted]
| LimaBearz wrote:
  | Not defending the situation, but with a preStop hook, at least in
  | the case of APIs, k8s can handle it 100%; it's just messy.
|
| We have a preStop hook of 62s. 60s timeouts are set in our
| apps, 61s is set on the ALBs (ensuring the ALB is never the
| cause of the hangup), and 62s on the preStop to make sure
| nothing has come into the container in the last 62s.
|
  | Then we set a terminationGracePeriodSeconds of 60 just to make
  | sure it doesn't pop off too fast. This gives us 120s where nothing
  | happens and anything in flight can get to where it's going.
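  |
  | A rough sketch of that timeout layering (numbers are illustrative;
  | the grace period countdown includes the preStop sleep, so in this
  | sketch it is sized to cover the sleep plus the app's 60s drain):
  |
  |     apiVersion: v1
  |     kind: Pod
  |     metadata:
  |       name: api                      # hypothetical
  |     spec:
  |       # covers the 62s preStop sleep plus the app's 60s timeout
  |       terminationGracePeriodSeconds: 125
  |       containers:
  |       - name: api
  |         image: example/api:latest    # hypothetical image
  |         lifecycle:
  |           preStop:
  |             exec:
  |               # 62s > 60s app timeout and 61s ALB timeout, so
  |               # in-flight requests drain before SIGTERM arrives
  |               command: ["sleep", "62"]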
| rifelpet wrote:
| There's a KubeCon North America talk (also from 2019) that goes
| into more detail on this very issue including some additional
| recommendations
|
| https://youtu.be/0o5C12kzEDI?t=57
___________________________________________________________________
(page generated 2022-03-22 23:00 UTC)