[HN Gopher] Hunting down the stuck BGP routes
       ___________________________________________________________________
        
       Hunting down the stuck BGP routes
        
       Author : bswinnerton
       Score  : 187 points
       Date   : 2021-04-21 10:31 UTC (12 hours ago)
        
 (HTM) web link (blog.benjojo.co.uk)
 (TXT) w3m dump (blog.benjojo.co.uk)
        
       | michaelbuckbee wrote:
       | I keep running into situations where I think this is the
       | problem that's occurring (a stuck route). While I'd certainly
       | love to be able to diagnose this, would it even matter? There's
       | no recourse I can take as an end user, is there?
        
       | navanchauhan wrote:
       | This reminds me of when YouTube was down for a lot of the world
       | after Pakistan banned YouTube and one of the country's telecom
       | companies forgot to switch off its BGP route (if that's the
       | correct terminology).[0] Half as Interesting made a nice YouTube
       | video about it.[1]
       | 
       | [0] https://www.cnet.com/news/how-pakistan-knocked-youtube-
       | offli...
       | 
       | [1] https://www.youtube.com/watch?v=K9gnRs33NOk
        
         | john37386 wrote:
         | This is less likely to happen these days, as BGP route origins
         | can now be validated against an open registry like RPKI:
         | https://www.arin.net/resources/manage/rpki/
         |
         | I don't know how widely it's deployed or which companies use
         | it, but I doubt that YouTube is as vulnerable today as it was
         | in 2008. BGP security has a lot of traction these days and is
         | an interesting topic to follow.
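A minimal sketch of RPKI route origin validation, following the Valid/Invalid/NotFound outcomes defined in RFC 6811. The `Roa` type, the `validate_origin` function, and all prefixes and AS numbers are hypothetical (documentation ranges), and a real validator would load ROAs from a relying-party cache rather than a hard-coded list. It shows why a more-specific prefix announced from the wrong origin AS, the shape of the 2008 YouTube incident, comes out Invalid:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Hypothetical stand-in for a validated ROA: prefix, maxLength, origin AS."""
    prefix: str
    max_length: int
    origin_as: int


def validate_origin(route_prefix: str, origin_as: int, roas: list) -> str:
    """Classify an announcement as Valid, Invalid or NotFound (RFC 6811 style)."""
    route = ipaddress.ip_network(route_prefix)
    covering = []
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA only covers routes of the same address family inside its prefix.
        if roa_net.version == route.version and route.subnet_of(roa_net):
            covering.append(roa)
    if not covering:
        return "NotFound"  # no ROA says anything about this address space
    for roa in covering:
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "Valid"
    return "Invalid"  # covered, but wrong origin AS or more specific than maxLength


# Documentation prefixes and private-use ASNs only; not real holders.
ROAS = [Roa("198.51.100.0/24", 24, 64500)]

print(validate_origin("198.51.100.0/24", 64500, ROAS))  # legitimate origin -> Valid
print(validate_origin("198.51.100.0/25", 64501, ROAS))  # more-specific hijack -> Invalid
print(validate_origin("192.0.2.0/24", 64501, ROAS))     # no covering ROA -> NotFound
```

This only helps if networks actually drop Invalid routes, and it checks the origin AS only, not the rest of the AS path.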
        
       | teh_klev wrote:
       | From the article:
       | 
       | > _With the current "default free zone" containing around
       | 1,000,000 routes_
       | 
       | Back in ~1998 I was tasked with building a route
       | collector/looking glass machine for an internet exchange point
       | (sadly defunct). I remember the day we switched the collector on
       | and acquired "all the routes": there were ~98,000 of them, and you
       | could've knocked me over with a feather. It was like looking into
       | the Total Perspective Vortex. Having been out of that game for
       | many years now, I'd no idea we were up to 1M routes...wow. At one
       | of the RIPE conferences I attended back then, there was much
       | concern about the rapidly increasing size of the global routing
       | table and whether vendors could build hardware powerful enough to
       | keep up.
       | 
       | For anyone interested the route collector was built on FreeBSD
       | (3.0 I think) and Zebra[0].
       | 
       | And finally, what a cracking blog, especially stuff like this:
       | 
       | https://blog.benjojo.co.uk/post/eve-online-bgp-internet
       | 
       | [0]: https://en.wikipedia.org/wiki/GNU_Zebra
        
         | bogomipz wrote:
         | Interesting historical perspective, thanks. And to put that
         | growth into more recent perspective, in the last 7 years we
         | have had a "512K day" and a "768K day". See:
         | 
         | https://cumulusnetworks.com/blog/768k-day-importance-adaptab...
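A small sketch of the arithmetic behind those names, assuming "512K" and "768K" mean binary thousands of IPv4 forwarding entries, i.e. fixed default allocations on some older hardware: 512 * 1024 = 524,288 and 768 * 1024 = 786,432. The function name and the 600,000 sample size are illustrative; 98,000 and 1,000,000 are the figures quoted above.

```python
# Assumption: the "K" in these names is binary (1024), as with TCAM sizes.
THRESHOLDS = {"512K": 512 * 1024, "768K": 768 * 1024}


def thresholds_exceeded(ipv4_routes: int) -> list:
    """Return the names of the fixed FIB limits a table of this size would overflow."""
    return [name for name, limit in THRESHOLDS.items() if ipv4_routes > limit]


for size in (98_000, 600_000, 1_000_000):
    print(size, thresholds_exceeded(size))
```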
        
         | iso1631 wrote:
         | Just looked at my router in Docklands: 833,000 IPv4 routes,
         | from 1.0.0.0/24 to 223.255.64.0/18.
         |
         | 32,528 of them are in the 103.0.0.0/8 range, but on the other
         | hand 21.0.0.0/8 is advertised once, with no subnets at all
         | (same with 26, 28, 30, 33 and 73). I don't have a route for
         | 9.0.0.0/8 aside from 9.9.9.0/24.
         | 
         | Only 108,000 IPv6 routes.
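To reproduce that kind of per-/8 breakdown from your own table, a sketch like this works on a list of prefix strings. The function name and sample prefixes are hypothetical (a real run would read a RIB dump), and any prefix shorter than a /8 is simply counted under its first octet:

```python
import ipaddress
from collections import Counter


def routes_per_slash8(prefixes: list) -> Counter:
    """Count IPv4 routes by first octet, e.g. how many fall inside 103.0.0.0/8."""
    counts: Counter = Counter()
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net.version != 4:
            continue  # IPv6 routes would be tallied separately
        # The top 8 bits of the network address identify the covering /8.
        counts[int(net.network_address) >> 24] += 1
    return counts


# Hypothetical sample table.
sample = ["103.1.2.0/24", "103.200.0.0/22", "21.0.0.0/8", "9.9.9.0/24", "2001:db8::/32"]
counts = routes_per_slash8(sample)
print(counts[103], counts[21], counts[9])
```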
        
           | icedchai wrote:
           | Crazy! I first started working with BGP in 1996. I remember
           | there being under 30,000 routes... total.
        
           | tialaramex wrote:
           | Two very different things are going on to cause the smaller
           | number of IPv6 routes.
           | 
           | One of course is that some Autonomous Systems don't advertise
           | IPv6, either they have no globally routable IPv6 or they only
           | achieve IPv6 via a tunnel and so that's only advertised via
           | their tunnel provider.
           | 
           | But more importantly, many Autonomous Systems only need to
           | advertise one prefix in IPv6 because it's big enough. Even if
           | your needs grow, the allocations were deliberately sparse
           | (because we'd done this before and because IPv6 addresses are
           | plentiful), so your RIR can give you the adjacent addresses,
           | meaning you still only need one route entry for your larger
           | space.
           |
           | With IPv4, a provider may find itself advertising hundreds or
           | even sometimes thousands of routes for the same Autonomous
           | System, since the addresses they need to advertise aren't
           | contiguous.
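The aggregation point can be illustrated with Python's standard `ipaddress.collapse_addresses`, which merges adjacent prefixes into the fewest covering ones. The helper name and the documentation-range prefixes are illustrative assumptions, and each call must use a single address family (mixing IPv4 and IPv6 raises `TypeError`):

```python
import ipaddress


def minimal_route_entries(prefixes: list) -> list:
    """Merge adjacent or overlapping prefixes into the fewest possible entries."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# IPv6: an allocation grown into the adjacent block still aggregates to one route.
print(minimal_route_entries(["2001:db8::/48", "2001:db8:1::/48"]))
# IPv4: scattered blocks held by one AS cannot be aggregated.
print(minimal_route_entries(["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]))
```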
        
       | john37386 wrote:
       | Nice article on the basic functionality of the Internet
       | backbone. I really like the animations explaining the article
       | with nice pictures. In short, BGP has a bug that potentially
       | created a huge outage in August 2020. The proposed fix is to
       | improve the BGP protocol with a new feature. It's not easy,
       | because it's the backbone of the internet. Let's see where this
       | goes.
        
         | yardstick wrote:
         | Is a protocol change necessary here? Keepalives are already
         | sent... and they would be held up if the TCP window hit 0? At
         | which point the BGP/TCP session can be terminated and re-
         | established.
        
           | ZiiS wrote:
           | I think the argument is if _your_ keep-alives are held up
           | then currently you wait on _them_ terminating the session. If
           | they are malicious or just not working well they may not do
           | this.
        
             | 0xEFF wrote:
             | You can see the window size is zero though, so I think GP
             | is suggesting sending a TCP reset or something similar.
             | 
             | Maybe this isn't a good option because it would have too
             | many undesirable side effects?
        
           | vbernat wrote:
           | The RFC proposes to change the BGP finite-state machine, not
           | the protocol.
        
           | iso1631 wrote:
           | Like you, I don't see how a protocol change is required; just
           | an update to the RFC to say implementations SHOULD time out
           | the connection if the send window stays at zero. That said, I
           | haven't read the specs with a toothcomb, and perhaps there's
           | something about how you MUST NOT drop the connection if
           | you're getting keepalives?
           | 
           | Get Cisco and Juniper to implement it and that's 75% of LINX
           | covered at least; I assume other exchanges have a similar
           | equipment makeup.
           | 
           | It seems reasonable behaviour to me.
           | 
           | It doesn't prevent the problem of a malicious BGP peer, of
           | course, but we know that already: if they choose to ignore
           | your messages (while happily keeping your send window open)
           | but continue to send keepalives, you're equally screwed.
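A minimal sketch of the timeout iso1631 describes, assuming the BGP speaker can poll how many bytes it managed to write and how many are still queued for the peer. `SendHoldTimer`, `observe` and the 180 s timeout are hypothetical names and values, not any vendor's implementation or the text of the RFC under discussion:

```python
class SendHoldTimer:
    """Hypothetical local check: give up on a peer that stops draining our data.

    The standard BGP hold timer only watches what we *receive*; this watches
    whether anything we queued for the peer is actually leaving the socket.
    """

    def __init__(self, timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s
        self.last_progress = now

    def observe(self, bytes_written: int, bytes_queued: int, now: float) -> bool:
        """Record one poll of the socket; return True if the session should be reset."""
        if bytes_written > 0 or bytes_queued == 0:
            # Data moved, or there was nothing to send: not stuck.
            self.last_progress = now
            return False
        # Data is waiting, but nothing has been written since the last progress.
        return now - self.last_progress >= self.timeout_s


timer = SendHoldTimer(timeout_s=180.0, now=0.0)
print(timer.observe(bytes_written=4096, bytes_queued=512, now=10.0))  # progress made
print(timer.observe(bytes_written=0, bytes_queued=512, now=100.0))    # stuck for 90 s
print(timer.observe(bytes_written=0, bytes_queued=512, now=190.0))    # stuck for 180 s
```

Because it only looks at local socket progress, it does not depend on the peer tearing the session down, which is ZiiS's concern. As noted above, a peer that keeps accepting data while ignoring it defeats this check, and one that drains a few bytes now and then keeps resetting it.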
        
             | iforgotpassword wrote:
             | If you don't put it in the RFC then you'll end up with five
             | different solutions to this problem from five different
             | vendors, and a nice 5x5 matrix of new hilarious edge cases
             | when these are talking to each other and something wonky is
             | happening to the TCP session.
        
           | eqvinox wrote:
           | BGP Keepalives are not request-reply, they are simple
           | scheduled transmissions. Which means even if one side is not
           | reading, it may still be sending keepalives. So the other
           | side keeps the session open, despite its own keepalives
           | sitting in its send queue.
           | 
           | Also, any valid BGP message resets the hold timer, so the
           | reading side just needs to occasionally pop something off the
           | full queue and process it. Which, say, if you're swapping to
           | hell and back, can still get done. (That's assuming its
           | scheduling even gets around to killing sessions on hold-time
           | expiry. It might not be expiring anything anymore, for reasons
           | of floating face-down in the river.)
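A discrete-time sketch of that point: a receiver's hold timer only cares about gaps between incoming messages, so a peer with an independent keepalive thread, or a swamped reader that pops one message occasionally, never trips it. The function name and message timings are hypothetical; the 90 s hold time and a keepalive interval of one third of it are the values RFC 4271 suggests.

```python
def hold_timer_expires(receive_times: list, hold_time: float, end: float) -> bool:
    """Return True if the gap between received BGP messages ever reaches hold_time.

    Any KEEPALIVE or UPDATE restarts the hold timer, so only the gaps matter,
    not whether the sender is actually keeping up with our UPDATEs.
    """
    last = 0.0
    for t in sorted(receive_times) + [end]:
        if t - last >= hold_time:
            return True
        last = t
    return False


HOLD = 90.0  # keepalives typically go out every HOLD / 3 seconds

# Overloaded peer whose separate keepalive thread still fires every 30 s.
print(hold_timer_expires([float(t) for t in range(30, 600, 30)], HOLD, 600.0))
# A swamped reader that still processes one queued message every 80 s.
print(hold_timer_expires([float(t) for t in range(80, 600, 80)], HOLD, 600.0))
# Only a stall longer than the hold time gets the session torn down.
print(hold_timer_expires([float(t) for t in range(100, 600, 100)], HOLD, 600.0))
```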
        
       | sigmonsays wrote:
       | Interesting read. It's interesting that this is a bug in the
       | specification and not the implementation, since BGP is so old.
       | We must not hit this case often.
        
         | eqvinox wrote:
         | It's actually becoming more frequent due to BGP speakers being
         | increasingly multithreaded. In the olden days, if you were
         | overloaded, that also meant no more keepalives being sent. With
         | the power of multithreading you can now simultaneously be
         | overloaded and still send keepalives! :D
        
           | NovemberWhiskey wrote:
           | Ah yes; that's one of my favorites - health check returning
           | 200s instantaneously; actual service is a black hole.
        
             | jschwartzi wrote:
             | One of many problems that are only solvable with a software
             | watchdog.
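A minimal sketch of such a watchdog, assuming the worker thread can call a hook whenever it finishes a unit of real work. `ProgressWatchdog` and its methods are hypothetical; the idea is that a health check (or a BGP keepalive sender) reports on the worker's progress rather than on its own liveness:

```python
from __future__ import annotations

import threading
import time


class ProgressWatchdog:
    """Health derived from recent work done, not from a thread merely being alive."""

    def __init__(self, max_stall_s: float) -> None:
        self.max_stall_s = max_stall_s
        self._lock = threading.Lock()
        self._last_progress = time.monotonic()

    def record_progress(self) -> None:
        """Called by the worker each time it finishes a unit of real work."""
        with self._lock:
            self._last_progress = time.monotonic()

    def healthy(self, now: float | None = None) -> bool:
        """True if the worker made progress within the last max_stall_s seconds."""
        current = time.monotonic() if now is None else now
        with self._lock:
            return current - self._last_progress < self.max_stall_s


watchdog = ProgressWatchdog(max_stall_s=30.0)
# A keepalive or health-check thread would consult this before claiming "alive".
if watchdog.healthy():
    print("send keepalive")
```

The 30 s stall limit is arbitrary; it should be comfortably shorter than whatever timer the other side uses to declare you dead.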
        
           | EricE wrote:
           | > With the power of multithreading you can now simultaneously
           | be overloaded and still send keepalives! :D
           | 
           | Now that's progress!
           | 
           | Reminds me of some of the naive comments in a few of the
           | recent posts on HN about programming multithreading. It's not
           | as easy as just adding more threads.
           | 
           | Or sending in more trains :)
           | https://www.youtube.com/watch?v=-hyttagGsz0
        
       | anticristi wrote:
       | Thinking out loud: When I read the BGP spec, I got the feeling
       | that it was optimized for reduced churn. As the Internet routing
       | table grew and it was uncertain whether router CPU power would
       | keep up, the architects of the Internet wanted to avoid extra
       | BGP exchanges.
       | 
       | However, now it seems like the Internet is facing new challenges
       | and a different trade-off might make sense. Why not add a "valid
       | until" attribute on each route? The originating router would have
       | to re-announce a new route every 24 hours. Failure to propagate
       | the update at any point would automatically withdraw it. Of
       | course, re-announcing 1M routes every day might be a lot, but at
       | this point it feels worth considering.
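A toy sketch of the "valid until" idea, assuming a hypothetical per-route expiry attribute and the 24-hour lifetime suggested above. This is not part of standard BGP, and `ExpiringRib` and its methods are illustrative names only:

```python
from __future__ import annotations

from dataclasses import dataclass

DAY_S = 24 * 3600  # the 24-hour re-announce interval from the comment


@dataclass
class Route:
    prefix: str
    next_hop: str
    valid_until: float  # hypothetical "valid until" attribute, in seconds


class ExpiringRib:
    """Toy RIB where every route disappears unless it is re-announced in time."""

    def __init__(self) -> None:
        self.routes: dict[str, Route] = {}

    def announce(self, prefix: str, next_hop: str, now: float, lifetime: float = DAY_S) -> None:
        # A fresh announcement (or re-announcement) pushes the deadline forward.
        self.routes[prefix] = Route(prefix, next_hop, now + lifetime)

    def withdraw(self, prefix: str) -> None:
        self.routes.pop(prefix, None)

    def expire(self, now: float) -> list[str]:
        """Drop and return every prefix whose validity has lapsed."""
        stale = [p for p, r in self.routes.items() if r.valid_until <= now]
        for p in stale:
            del self.routes[p]
        return stale


rib = ExpiringRib()
rib.announce("198.51.100.0/24", "192.0.2.1", now=0)
rib.announce("203.0.113.0/24", "192.0.2.1", now=0)
rib.announce("198.51.100.0/24", "192.0.2.1", now=80_000)  # only this one is refreshed
print(rib.expire(now=90_000))  # the unrefreshed route has lapsed
```

A route whose withdrawal gets lost somewhere along the path then lingers for at most one lifetime. On cost: averaged over a day, refreshing 1,000,000 routes works out to about 1,000,000 / 86,400 ≈ 11.6 announcements per second per session, though real refreshes would likely be burstier.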
        
       | vlovich123 wrote:
       | I wonder if a robust consensus algorithm might be a better
       | investment than a timeout. I would imagine there are other bugs
       | in BGP implementations, so having a routing table that trends
       | towards eventual consistency regardless of the starting point
       | might be more robust than just focusing on this one corner
       | case. It might be a more intrusive change, though, and hard to
       | get middleware to roll out such a change?
        
         | anticristi wrote:
         | Given the size and complexity of the Internet, it might be
           | worth considering making BGP tolerant to Byzantine failures.
        
           | benlivengood wrote:
           | There isn't a really useful metric for failure, though. Not
           | every prefix of every AS needs to be reachable from every
           | other. Unlike consensus problems where everyone wants to
           | agree on the same state it's sufficient for the Internet to
           | be in a working state where each AS has enough routes that
           | they care about, and BGP is pretty good at achieving that.
        
         | benlivengood wrote:
         | There isn't actually a consensus to be formed on the Internet.
         | Communities, local configuration, etc. cause BGP routers to
         | make local decisions about routes to advertise and re-advertise
         | that aren't going to be part of a consensus.
        
       ___________________________________________________________________
       (page generated 2021-04-21 23:01 UTC)