[HN Gopher] Researchers discover major roadblock in alleviating ...
___________________________________________________________________
Researchers discover major roadblock in alleviating network
congestion
Author : rntn
Score : 80 points
Date : 2022-08-04 09:27 UTC (13 hours ago)
(HTM) web link (news.mit.edu)
(TXT) w3m dump (news.mit.edu)
| stingraycharles wrote:
| Does anyone have a link to the paper? I've been working with
| various congestion control / QoS algorithms over the past two
| years as a hobby, and there are plenty of new developments going
| on in recent years. I'm curious which algorithms they studied,
| and what the actual roadblock is, because I'm sceptical they
| weren't just looking for a great punch line for an article (e.g.
| perhaps the problem is more theoretical than practical).
| Deathmax wrote:
| I'm guessing it's http://people.csail.mit.edu/venkatar/cc-starvation.pdf
| [deleted]
| mhandley wrote:
| The paper is about delay-based congestion control algorithms such
| as BBR, Vegas, etc. From their conclusion:
|
| "We offer three key conclusions for CCA designers, who should
| model (or better estimate) non-congestive delays explicitly in
| delay-convergent CCAs. First, to utilize the link efficiently, a
| CCA must maintain a queue that is larger than the non-congestive
| delay on the path; second, this alone is not enough to avoid
| starvation, but in addition the variation in the queueing delay
| in steady state must also be greater than one-half of the delay
| jitter; and third, if we have a prior upper bound on sending
| rate, we may be able to avoid starvation while also reducing the
| queueing delay variation."
|
| So what non-congestive delays are the problem? A key one is the
| way modern WiFi aggregates multiple packets into AMPDUs for
| transmission to reduce the number of medium acquisitions it needs
| to perform. When WiFi is under heavy load this gives good
| throughput, at the expense of per-packet jitter. If I understand
| correctly, their conclusions mean delay-based congestion control
| loses the signal it needs to converge when the target queue size
| is similar to the non-congestive jitter. Many of us working on
| congestion control have wondered about this in the context of
| BBR, and it's great that this paper formalizes the issue.
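|
| To make those conditions concrete, here is how I read (1) and (2)
| as a back-of-the-envelope check -- my own paraphrase in Python,
| with made-up parameter names, not code from the paper:
|
|     # All values in milliseconds; purely illustrative.
|     def can_avoid_starvation(target_queue_delay, queue_delay_var,
|                              non_congestive_delay, jitter):
|         # (1) efficiency: the standing queue must exceed the
|         #     non-congestive delay on the path
|         efficient = target_queue_delay > non_congestive_delay
|         # (2) no starvation: steady-state queueing-delay variation
|         #     must exceed half the non-congestive delay jitter
|         fair = queue_delay_var > jitter / 2
|         return efficient and fair
|
|     # a ~5 ms queue target with ~2 ms of allowed variation on a
|     # WiFi path with 8 ms of aggregation delay and 10 ms of jitter
|     # fails both tests:
|     print(can_avoid_starvation(5, 2, 8, 10))  # False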
|
| The implication is that such delay-based schemes need to add
| perhaps a few tens of milliseconds to their queue delay targets
| to avoid potential starvation. Doesn't seem like a real
| roadblock, as per the article title, but the desire to reduce
| latency further does perhaps increase the incentive for more
| ECN/AQM research. It's perfectly possible to return ECN
| congestion marking before a queue starts to build, so this
| latency cost isn't fundamental.
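|
| To illustrate the "mark before a queue builds" point, a toy
| marking rule (my own sketch, not PIE/CoDel or any deployed AQM;
| the threshold value is arbitrary): set ECN CE once the estimated
| queueing delay crosses a small threshold, long before buffers
| fill or packets need to be dropped.
|
|     MARK_THRESHOLD_MS = 1.0  # start marking at ~1 ms of queue
|
|     def should_mark_ce(queue_bytes, drain_rate_bps):
|         # estimated sojourn time of a byte arriving right now
|         queueing_delay_ms = queue_bytes * 8 / drain_rate_bps * 1000
|         return queueing_delay_ms > MARK_THRESHOLD_MS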
| vadiml wrote:
| Given that the root cause of the problem with current CC
| algorithms is their inability to discriminate between
| congestion-induced and jitter-induced delays, the obvious
| solution would be to implement some method of reporting jitter
| state back to the source. Maybe some kind of ICMP packet or IP
| option.
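|
| Purely as a sketch of what such a report might carry (no such
| ICMP type or IP option exists today; the field layout below is
| invented for illustration):
|
|     import struct
|
|     def pack_jitter_report(hop_id, jitter_us):
|         # 32-bit hop identifier + 32-bit measured non-congestive
|         # jitter at that hop, in microseconds
|         return struct.pack("!II", hop_id, jitter_us)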
| urthor wrote:
| Not sure that will work in real world scenarios.
|
| How would you know they're honestly reporting jitter?
| foobiekr wrote:
| Someone should coin this as a law: "All problems on the
| internet will be solved with more bandwidth."
|
| Everything else will turn out to be a configuration burden, a
| control-plane load problem, a security issue, hard to debug
| (and, after the fact, nearly impossible to debug), and so on.
|
| Bandwidth and NPUs with designed-in minimal latency are easy to
| meter, easy to measure, easy to deploy, easy to implement, and
| so on. They have very predictable behaviors.
|
| Reality is that we are entering a phase where networks can be
| vastly simplified. MPLS is going, SR is here for the time
| being, QoS is dying, multicast dead, SDWAN is going to be a
| thing for a few more years then dead, and so on.
| jdthedisciple wrote:
| At least in TCP, can't jitter be detected and measured using
| ACKs?
|
| I thought some algorithms even do that already.
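|
| (For reference, TCP senders do keep a variation estimate from
| ACK timing: the SRTT/RTTVAR estimator of RFC 6298, sketched
| below with a function name of my own. Whether that variation
| reflects congestion or just jitter is the open question, as the
| reply below notes.)
|
|     ALPHA, BETA = 1 / 8, 1 / 4   # gains from RFC 6298
|
|     def update_rtt_estimate(srtt, rttvar, sample):
|         # fold one new RTT sample into the running estimates;
|         # RTTVAR is updated first, using the old SRTT
|         rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
|         srtt = (1 - ALPHA) * srtt + ALPHA * sample
|         return srtt, rttvar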
| toast0 wrote:
| You can measure jitter, but you don't know how much of the
| delay was due to congestion, and how much was due to other
| factors.
|
| Something mentioned elsewhere in the thread is that WiFi
| physical layers may wait to send small packets until they have
| a few to aggregate, or until a timeout. Other systems that
| aggregate packets may do something similar.
|
| On time division multiplexed systems, it may take measurable
| time to wait for an assigned slot, even if the channel isn't
| congested. Some packets would get lucky and have a short wait
| and others would have a longer wait.
|
| This would be challenging to signal as the delay is added at
| a hop-by-hop level, but whether the delay is significant
| enough to signal is unknowable at that level. Maybe you could
| ask all hops to add to a field indicating milliseconds of
| delay (or some other fixed increments), but I don't think
| there's room for that in existing fields and you'd have a
| heck of a time getting meaningful support. ECN took how long
| to be usable on the internet, because some middleboxes would
| drop connections with it enabled, to say nothing of devices
| actually flagging congestion.
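|
| A toy sketch of that hypothetical "every hop adds its delay"
| field (no such IP field exists; this just shows why a single
| accumulator gives you a path total but not which hop added the
| delay, or why):
|
|     def accumulate_path_delay(per_hop_delay_ms):
|         total_ms = 0.0
|         for hop_delay in per_hop_delay_ms:
|             total_ms += hop_delay  # each hop adds its residence time
|         return total_ms            # the receiver only sees the sum
|
|     # 12 ms total -- one congested hop, or four jittery ones?
|     print(accumulate_path_delay([0.2, 8.0, 0.3, 3.5]))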
| signa11 wrote:
| rfc-4689 (iirc) defines jitter as the fluctuation in
| forwarding delay between 2 consecutive received packets in a
| stream.
|
| not sure how this would work for TCP ACKs, which can be
| cumulative. moreover, any approach that does this measurement
| must account for both delayed and lost/corrupt packets.
|
| imho, only a true realtime jitter measurement would do,
| anything else would be a crude approximation, and might
| result in the same flaws as before...
|
| [edit]: examples of the '...anything else...' mentioned above
| might be an inter-arrival histogram, where the receiver relies
| on packets being transmitted at a fixed cadence; in that case,
| lost/corrupted packets would badly skew the computed numbers.
| another approach might be post-processing (after packet
| capture), where limited buffer space might prove to be the
| achilles heel, etc.
| vadiml wrote:
| In VoIP applications, correct jitter estimation is very
| important, and the RTP/RTCP protocols do it pretty well.
| Of course, each RTP packet carries a timestamp, which
| simplifies the task.
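|
| For reference, the estimator RTP receivers use is the
| interarrival jitter from RFC 3550: compare the spacing of
| arrivals against the spacing of the sender's timestamps and
| smooth with a 1/16 gain. A sketch (units simplified to seconds
| instead of RTP timestamp ticks):
|
|     def update_rtp_jitter(jitter, prev_arrival, prev_ts,
|                           arrival, ts):
|         # D = (R_j - R_i) - (S_j - S_i): difference in transit
|         # time between two consecutive packets
|         d = (arrival - prev_arrival) - (ts - prev_ts)
|         return jitter + (abs(d) - jitter) / 16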
| [deleted]
| stingraycharles wrote:
| Isn't this already somewhat there, the ECN bit of TCP?
| skyde wrote:
| Instead of using the TCP connection as the unit of bandwidth,
| ISPs use the link/circuit. Think of it like configuring a
| guaranteed minimum bandwidth for a VNET.
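|
| A minimal sketch of "guaranteed minimum per circuit" (made-up
| numbers, not any ISP's actual scheduler): every circuit first
| gets min(demand, guarantee), then any spare capacity is split
| among circuits that still want more.
|
|     def allocate(capacity, circuits):
|         # circuits: {name: (guaranteed_rate, demanded_rate)}
|         alloc = {n: min(g, d) for n, (g, d) in circuits.items()}
|         spare = capacity - sum(alloc.values())
|         hungry = [n for n, (_, d) in circuits.items()
|                   if d > alloc[n]]
|         for n in hungry:
|             alloc[n] += spare / len(hungry)  # single even split
|         return alloc
|
|     print(allocate(100, {"a": (30, 80), "b": (30, 10),
|                          "c": (20, 20)}))
|     # -> {'a': 70.0, 'b': 10, 'c': 20}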
| bryanrasmussen wrote:
| If you cannot avoid starvation, can you detect that starvation
| has happened to a particular user? I suspect the answer is no,
| because if you could detect which user is starved, surely you
| could take bandwidth from those with a surfeit, etc.
| gz5 wrote:
| Flow control and QoS were critical 20 years ago. I helped build
| a global VoIP network, and some of our patents covered dynamic
| routing across multiple ASes...they were critical.
|
| Now, we (a different company) have similar real-time algorithms,
| and the algorithms see far fewer problems (mainly across the
| backbones of AWS, Azure, Oracle, IBM, and Alibaba).
|
| I suspect this is due to more bandwidth, more routes and better
| network optimization from the ISPs (we still see last mile issues
| but those are often a result of problems which better flow
| control algorithms usually can't completely solve).
|
| Curious if ISP engineers can give a more expert view on the
| current state of the _need_ or _impact_ of better flow control in
| middle-mile and/or last-mile situations?
| phkahler wrote:
| Not a networking guy. I'm curious if packets have priorities,
| and if so does everyone get greedy and claim to be high
| priority? They talk about delay reduction in the article, but a
| lot of the internet bandwidth today seems to be video which
| doesn't need to have low latency once it's got a bit in the
| receive buffer. It just seems like gamers' packets should be
| prioritized for delay while streaming stuff should (maybe) be
| prioritized for bandwidth, possibly with changing priority
| depending on how far ahead of the viewer the buffer is. Not sure
| where regular web traffic would fit in this - probably low
| delay?
| uluyol wrote:
| Packet priorities are not respected on the public internet,
| but organizations do make use of them internally.
|
| Public clouds that operate global networks can typically send
| video streams at low/mid priority all the way to the "last
| mile" ISP by peering with so many networks and running massive
| WANs internally. So they get most of the benefits of
| prioritization, even though the internet doesn't support it.
| kkielhofner wrote:
| "The internet" is "best effort". What this really means is
| all of the networks that make up the internet don't pay
| attention to the DSCP marks[0] in the IP header.
|
| In reality almost all large networks (internal and internet
| traffic handling) use MPLS[1] or some variant to
| tunnel/encapsulate different types of traffic and handle
| priority that way while not paying attention to whatever DSCP
| markings users can arbitrarily set. MPLS (in most cases) is
| invisible to the end user so the carrier can do their own QoS
| while not allowing customer configuration to impact it.
|
| If "the internet" cared about DSCP you would definitely see
| the situation you're describing where everyone would just
| mark their traffic highest priority. Note you can still mark
| it, it's just that no one cares or respects it.
|
| On your network and queues you can definitely use DSCP and
| 802.1p[2] (layer 2 - most commonly ethernet) to prioritize
| traffic. The thing here is that you need equipment end to end
| (every router, switch, etc.) that's capable of parsing these
| headers and adjusting queueing accordingly.
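|
| The marks themselves have to be set by the sender. A minimal
| example of tagging a UDP socket with DSCP EF (46) from Python on
| Linux (address and payload are placeholders; whether anything
| honors the mark is entirely up to your switches and routers):
|
|     import socket
|
|     DSCP_EF = 46  # "Expedited Forwarding", the usual voice class
|     sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
|     # DSCP occupies the top 6 bits of the old TOS byte
|     sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
|     sock.sendto(b"voice frame", ("192.0.2.10", 5004))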
|
| As if this isn't complicated enough, in the case of the
| typical edge connection (a circuit from an ISP) you don't
| have direct control of inbound traffic - when it gets to you
| is just when it gets to you.
|
| Unless you use something like ifb[3], in which case you can
| kind of fake ingress queuing by way of wrapping it through
| another interface that effectively makes the traffic look
| like egress traffic. All you can really do here is introduce
| delay and/or drop packets, which for TCP traffic will most
| commonly trigger TCP congestion control, causing the
| transmitting side to back off because it thinks it's sending
| data too fast for your link.
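|
| Rough sketch of the ifb trick on Linux (interface names eth0 and
| ifb0 and the 50mbit rate are placeholders; needs root, and exact
| tc filter syntax varies a bit between iproute2 versions):
|
|     import subprocess
|
|     def run(cmd):
|         subprocess.run(cmd.split(), check=True)
|
|     run("ip link add ifb0 type ifb")
|     run("ip link set dev ifb0 up")
|     # redirect everything arriving on eth0 through ifb0...
|     run("tc qdisc add dev eth0 handle ffff: ingress")
|     run("tc filter add dev eth0 parent ffff: matchall "
|         "action mirred egress redirect dev ifb0")
|     # ...where it can be shaped as if it were egress traffic
|     run("tc qdisc add dev ifb0 root tbf rate 50mbit burst 32k "
|         "latency 50ms")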
|
| UDP doesn't have congestion control, but in practice that just
| means it's implemented higher in the stack. Protocols like
| QUIC have their own congestion control that in many cases
| effectively behaves like TCP's. The difference here is that
| the behavior is left to the implementation, as opposed to
| being at the mercy of the kernel/C lib/wherever else TCP is
| implemented.
|
| Clear as mud, right?
|
| The good news is many modern end-user routers just kind of
| handle this with things like FQ-CoDel, etc.
|
| [0] - https://en.wikipedia.org/wiki/Differentiated_services
|
| [1] - https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching
|
| [2] - https://en.wikipedia.org/wiki/IEEE_P802.1p
|
| [3] - https://wiki.linuxfoundation.org/networking/ifb
| sitkack wrote:
| Thank you! I learned a ton.
| kkielhofner wrote:
| No problem. I'm happy I've had experience with all of
| this and learned it. I'm happier I don't have to deal
| with it on a daily basis anymore!
| foobiekr wrote:
| Honestly, QoS at the internet level was never really a big
| thing outside of special cases like VoIP. Network gear vendors
| tried like hell, desperately, from around 1998 onward to
| convince everyone to do DiffServ, QoS, etc., plus DPI, because
| they thought they could charge more by having complex features
| and "be more than a dumb pipe."
|
| The situation now is that bandwidth is plentiful. A lot has
| changed in 20 years.
|
| 100G and 400G are now quite cheap - most of the COGS for a
| given router is the optics, not the chassis, control plane or
| NPU, and optics has been, on and off, a pretty competitive
| space.
|
| Plus, almost all the traffic growth has been in cache-friendly
| content - non-video/audio/sw image growth has been modest and
| vastly outpaced by those. Not just cache friendly but layered
| cache friendly. Of the modern traffic types, only realtime
| audio-video like Zoom is both high volume and sensitive to
| latency and congestion. That's a small component; it's often
| (but not always) either hosted or has a dedicated network of
| PoPs that do early handoff, and so on, so your typical backbone
| is now mostly carrying CDN cache misses.
| vlovich123 wrote:
| And 20 years of properly tuning congestion control algorithms.
| Don't underestimate how much BBR and the fight against
| bufferbloat in the early 2000s did to improve the quality of
| TCP stacks.
| irrational wrote:
| Like GI Joe says, Knowing is half the battle. Now that we know
| about the problem, hopefully someone(s) can devise a solution.
| bornfreddy wrote:
| Spoiler: they have only shown that _existing_ algorithms can't
| always avoid starvation, not that such algorithms don't exist.
| Sakos wrote:
| From the article:
|
| > While Alizadeh and his co-authors weren't able to find a
| traditional congestion control algorithm that could avoid
| starvation, there may be algorithms in a different class that
| could prevent this problem. Their analysis also suggests that
| changing how these algorithms work, so that they allow for
| larger variations in delay, could help prevent starvation in
| some network situations.
| bornfreddy wrote:
| Yes. The title is misleading, though; these are not
| roadblocks, just imperfections of existing solutions.
| davidgrenier wrote:
| Proudly, I thought I had identified a pun.
| headsoup wrote:
| Roadblocks can be removed...
| sixbrx wrote:
| Roadblocks don't block hypothetical flying cars either
| though.
___________________________________________________________________
(page generated 2022-08-04 23:01 UTC)