Post AafqXGtApEwNIoPYuG by szbalint@x0r.be
Post #AafqXC0mxV3qAyw4a8 by szbalint@x0r.be
2023-10-11T21:50:57Z
0 likes, 0 repeats
Assuming people have deployed, e.g., HAProxy, there are about a dozen different timeout options you can specify to deal with edge cases. When you have enough clients, those edge cases start to matter quite a lot. So, assuming you've configured all those timeouts in HAProxy, are you safe? No. What the HAProxy docs do not sufficiently emphasize is that you also need to set a number of kernel/TCP-level settings for HAProxy to behave reasonably.
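As a starting point, here is a minimal Python sketch that prints the system-wide defaults for the kernel TCP settings the rest of this thread is about. It assumes a Linux host. The helper name `read_tcp_sysctls` is hypothetical. The sysctl names are standard Linux ones, and individual sockets (HAProxy's included) can override them with per-socket options.

```python
from pathlib import Path

# Linux-only: these sysctls live under /proc/sys/net/ipv4/.
SYSCTLS = (
    "tcp_keepalive_time",    # idle seconds before the first keepalive probe
    "tcp_keepalive_intvl",   # seconds between keepalive probes
    "tcp_keepalive_probes",  # unanswered probes before an idle connection is dropped
    "tcp_retries2",          # retransmissions of unacked data before giving up
)


def read_tcp_sysctls(root="/proc/sys/net/ipv4"):
    """Return {name: int or None}; None means the value could not be read."""
    values = {}
    for name in SYSCTLS:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):  # missing file (e.g. not Linux) or bad content
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"net.ipv4.{name} = {value}")
```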
Post #AafqXDFMMoJK0SlDvs by tim@union.place
2023-10-11T21:55:08Z
0 likes, 0 repeats
@szbalint What's funny is that (years ago) even hardware load balancers like F5's products didn't handle these things correctly out of the box, and you had to go in and tune them yourself, even though that was literally the intended value-add of the specialized product. It's amazing to me how bad the tech industry is at some of these things.
Post #AafqXESVrOQTlXvF4a by szbalint@x0r.be
2023-10-11T21:54:21Z
0 likes, 0 repeats
For example, what happens if you set a client timeout of 300s in HAProxy and a server timeout of, say, 600s? Does this mean that if the connection between the end-user client and HAProxy breaks, it will always be detected as a client timeout? No. Let's say it's an HTTP/2 connection both between the client and HAProxy and between HAProxy and the server. The client downloads an image but suddenly drops off the net (complete packet loss, without graceful connection termination).
Post #AafqXGtApEwNIoPYuG by szbalint@x0r.be
2023-10-11T21:58:15Z
0 likes, 0 repeats
Will HAProxy act on the client timeout? No. The connection is still active, and HAProxy merely feeds the connection's outgoing send buffer. There is data pending there, so HAProxy thinks the client is just slow (which is not a timeout). Unfortunately, by default Linux keeps retransmitting unacknowledged data for around 900+ seconds before giving up (net.ipv4.tcp_retries2 = 15). But how would this show up as a server timeout? If the protocol is HTTP/2, then let's say that HAProxy distributes…
Post #AafqXItbMcTHWicvqq by szbalint@x0r.be
2023-10-11T22:01:50Z
0 likes, 0 repeats
…two requests to the same server. When the client is very slow (or gone without a trace), HAProxy can't read more from the receive buffer of the connection between HAProxy and the server: where would it put that data? So if HAProxy made two requests to the same server and the first response takes ages to transfer, then due to head-of-line blocking it can't read the second HTTP response. The server timeout applies to this, and you see server failures in your metrics.
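The head-of-line blocking described above can be illustrated with a deliberately simplified Python model. This is a hypothetical toy, not how HAProxy actually buffers HTTP/2. It ignores HTTP/2 per-stream flow-control windows and assumes the proxy reads frames off the shared backend connection strictly in order. It also uses a fixed per-client buffer limit (`client_buffer_limit`, an invented parameter). Stream 1's client has vanished, and every other client reads instantly.

```python
from collections import deque


def simulate_hol(frames, client_buffer_limit, dead_stream=1):
    """Toy model of one multiplexed proxy-to-server connection (hypothetical).

    frames: list of (stream_id, size) in the order the server sent them.
    The client behind dead_stream never drains its buffer; every other
    client reads instantly. Returns the ids of streams whose response was
    read completely off the backend connection.
    """
    remaining = {}
    for sid, size in frames:
        remaining[sid] = remaining.get(sid, 0) + size

    pending = deque(frames)  # shared receive buffer, read strictly in order
    stuck_bytes = 0          # bytes queued for the vanished client
    while pending:
        sid, size = pending[0]
        if sid == dead_stream:
            if stuck_bytes + size > client_buffer_limit:
                break  # nowhere to put the data: stop reading the connection
            stuck_bytes += size
        pending.popleft()
        remaining[sid] -= size
    return {sid for sid, left in remaining.items() if left == 0}


if __name__ == "__main__":
    frames = [(1, 16384)] * 4 + [(3, 1000)]
    print(simulate_hol(frames, client_buffer_limit=32768))  # set(): stream 3 is stuck too
```

In this ordering, stream 3's small response sits behind stream 1's data, which has nowhere to go. From the proxy's point of view, the server connection stops making progress. That is how a vanished client can surface as a server timeout.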
Post #AafqXKxvfV7ZwifPsG by szbalint@x0r.be
2023-10-11T22:05:13Z
0 likes, 0 repeats
This makes sense: HAProxy should not keep responses idle for longer than the server timeout allows. However, in this case the failure was indirectly caused by the client. Therefore it's super important both to set TCP keepalives to a relatively low value (e.g. 60s instead of the Linux default of 2h) and to set the TCP user timeout (tcp-ut in the HAProxy config) above the point at which the TCP keepalive probes (and retries) would finish.
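At the socket level, this advice maps to standard Linux socket options. The Python sketch below assumes Linux, where `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT` and `TCP_USER_TIMEOUT` are exposed by Python's `socket` module. It enables keepalive with the 60s idle time from the post and sets the user timeout a little above the point where keepalive would give up. The probe interval, probe count and `margin_s` values are illustrative assumptions, not figures from the thread, and the function names are hypothetical. In HAProxy itself you would use its configuration keywords (such as `tcp-ut`) instead; check the configuration manual for your version.

```python
import socket


def keepalive_give_up_s(idle_s, intvl_s, cnt):
    """Seconds of silence after which keepalive declares an idle peer dead."""
    return idle_s + intvl_s * cnt


def configure_dead_peer_detection(sock, idle_s=60, intvl_s=10, cnt=6, margin_s=10):
    """Enable TCP keepalive and TCP_USER_TIMEOUT on a Linux TCP socket.

    Keepalive knobs are in seconds; TCP_USER_TIMEOUT is in milliseconds.
    Returns the user timeout that was set, in seconds.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, cnt)
    # Put the user timeout a little above the keepalive give-up point.
    user_timeout_s = keepalive_give_up_s(idle_s, intvl_s, cnt) + margin_s
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_s * 1000)
    return user_timeout_s


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(configure_dead_peer_detection(s))  # 130 (60 + 10 * 6 + 10)
```

The tcp(7) man page notes that when TCP_USER_TIMEOUT is combined with keepalive, it overrides keepalive in deciding when to close the connection. That is why the user timeout is chosen relative to `idle + intvl * cnt` rather than independently.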
Post #AafqXNEfEFhCzCVoEi by szbalint@x0r.be
2023-10-11T22:06:33Z
0 likes, 0 repeats
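(TCP keepalives only cover some of the TCP connection-timeout scenarios. Probes are only sent while the connection is idle, so a connection with unacknowledged data in flight, like the stalled download above, is governed by the retransmission timeout instead. This is why you also need the TCP user timeout.) There you go: now you know more about HAProxy edge-case scenarios than many SREs.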
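The ~900-second figure for `tcp_retries2 = 15` can be reproduced with a short Python sketch. It follows the Linux kernel documentation's "hypothetical timeout": the retransmission timeout starts at the 200 ms minimum, doubles after each unanswered retransmission, and is capped at 120 s. Real connections start from an RTO derived from the measured round-trip time, so the result is a lower bound. The function name is hypothetical.

```python
def retransmit_give_up_s(retries2=15, rto_min_s=0.2, rto_max_s=120.0):
    """Approximate seconds before Linux gives up on unacknowledged data.

    The RTO starts at rto_min_s and doubles after every unanswered
    retransmission, capped at rto_max_s. The connection is aborted once
    the timer for the last of the retries2 retransmissions also expires.
    """
    total = 0.0
    rto = rto_min_s
    for _ in range(retries2 + 1):  # the original send plus each retransmission
        total += rto
        rto = min(rto * 2, rto_max_s)
    return total


if __name__ == "__main__":
    print(round(retransmit_give_up_s(), 1))  # 924.6 with the default of 15
```

With the default of 15, this gives 924.6 s. For over 15 minutes, the stalled connection from the example is neither idle (so keepalive never fires) nor closed. A TCP user timeout cuts that window short.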