[HN Gopher] Hunting down the stuck BGP routes
___________________________________________________________________
Hunting down the stuck BGP routes
Author : bswinnerton
Score : 187 points
Date : 2021-04-21 10:31 UTC (12 hours ago)
(HTM) web link (blog.benjojo.co.uk)
(TXT) w3m dump (blog.benjojo.co.uk)
| michaelbuckbee wrote:
| So I keep running into situations where I think this is the
| problem that's occurring (a stuck route). While I'd certainly
| love to be able to diagnose this, would it even matter? There's
| no recourse I can take as an end user, is there?
| navanchauhan wrote:
| This reminds me of when YouTube was down for much of the world
| after Pakistan banned YouTube and one of the country's telecom
| companies forgot to switch off their BGP route (if that's the
| correct terminology).[0] Half as Interesting made a nice
| YouTube video about it.[1]
|
| [0] https://www.cnet.com/news/how-pakistan-knocked-youtube-
| offli...
|
| [1] https://www.youtube.com/watch?v=K9gnRs33NOk
| john37386 wrote:
| This is less likely to happen these days, as BGP routes can now
| be validated through an open registry like RPKI:
| https://www.arin.net/resources/manage/rpki/
|
| I'm not aware of how popular it is or which companies are using
| it, but I doubt that YouTube is as vulnerable today as it was in
| 2008. BGP security has a lot of traction these days and is an
| interesting topic to follow.
| teh_klev wrote:
| From the article:
|
| > _With the current "default free zone" containing around
| 1,000,000 routes_
|
| Back in ~1998 I was tasked with building a route
| collector/looking glass machine for an internet exchange point
| (sadly defunct). I remember the day we switched the collector on
| and acquired "all the routes": there were ~98,000 of them, and
| you could've knocked me over with a feather. It was like looking
| into the Total Perspective Vortex. Having been out of that game
| for many years now, I had no idea we were up to 1M routes...wow.
| At one of the RIPE conferences I attended back then there was
| much concern about the rapidly increasing size of the global
| routing table and whether vendors could build hardware powerful
| enough to keep up.
|
| For anyone interested the route collector was built on FreeBSD
| (3.0 I think) and Zebra[0].
|
| And finally, what a cracking blog, especially stuff like this:
|
| https://blog.benjojo.co.uk/post/eve-online-bgp-internet
|
| [0]: https://en.wikipedia.org/wiki/GNU_Zebra
| bogomipz wrote:
| Interesting historical perspective, thanks. And to put that
| growth in more recent terms: in the last 7 years we have had
| a "512K" day and a "768K" day. See:
|
| https://cumulusnetworks.com/blog/768k-day-importance-adaptab...
| iso1631 wrote:
| Just looked at my router in docklands, 833,000 IPv4 routes,
| from 1.0.0.0/24 to 223.255.64.0/18
|
| 32,528 of them are in the 103.0.0.0/8 range, but on the other
| hand 21.0.0.0/8 is advertised exactly once, with no subnets at
| all (same with 26, 28, 30, 33, and 73). I don't have a route
| for 9.0.0.0/8 aside from 9.9.9.0/24.
|
| Only 108,000 IPv6 routes.
| icedchai wrote:
| Crazy! I first started working with BGP in 1996. I remember
| there being under 30,000 routes... total.
| tialaramex wrote:
| Two very different things are going on to cause the smaller
| number of IPv6 routes.
|
| One of course is that some Autonomous Systems don't advertise
| IPv6, either they have no globally routable IPv6 or they only
| achieve IPv6 via a tunnel and so that's only advertised via
| their tunnel provider.
|
| But more importantly, many Autonomous Systems only need to
| advertise one prefix in IPv6 because it's big enough. Even if
| your needs grow: because we'd done this before, and because
| IPv6 addresses are plentiful, the allocations were made
| deliberately sparse - so your RIR can give you the adjacent
| addresses, meaning you still only need one route entry for
| your larger space.
|
| With IPv4, a provider may find itself advertising hundreds or
| sometimes even thousands of routes for the same Autonomous
| System, since the addresses it needs to advertise aren't
| contiguous.
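|
| A toy illustration of that aggregation, using Python's
| ipaddress module and made-up documentation prefixes: scattered
| IPv4 blocks stay separate routes, while an IPv6 allocation
| grown into the adjacent block collapses back into one prefix:
|
|     import ipaddress
|
|     # Non-contiguous IPv4 blocks: nothing merges, four routes stay four.
|     v4 = [ipaddress.ip_network(p) for p in
|           ("192.0.2.0/24", "198.51.100.0/24",
|            "203.0.113.0/24", "192.0.5.0/24")]
|     print(list(ipaddress.collapse_addresses(v4)))  # still 4 prefixes
|
|     # IPv6: the RIR hands out the adjacent space, so the two /33
|     # halves collapse into the original /32 -- one route covers it all.
|     v6 = [ipaddress.ip_network("2001:db8::/33"),
|           ipaddress.ip_network("2001:db8:8000::/33")]
|     print(list(ipaddress.collapse_addresses(v6)))  # [2001:db8::/32]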
| john37386 wrote:
| Nice article on the basic workings of the Internet backbone. I
| really like the animations illustrating it. In short, BGP has a
| bug that potentially contributed to a huge outage in August
| 2020. The proposed fix is to improve the BGP protocol with a new
| feature. It's not easy, because it's the backbone of the
| internet. Let's see where this will go.
| yardstick wrote:
| Is a protocol change necessary here? Keepalives are already
| sent... and they would be held up if the TCP window hit 0, at
| which point the BGP/TCP session could be terminated and re-
| established.
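|
| To make the idea concrete (not a proposed fix): on Linux you can
| at least observe the symptom from the sending side by checking
| how much data is stuck in the kernel send buffer. A minimal
| sketch, with drop_session() as a hypothetical hook that tears
| the BGP session down:
|
|     import fcntl, socket, struct, termios, time
|
|     def unsent_bytes(sock: socket.socket) -> int:
|         # TIOCOUTQ: bytes still sitting in the kernel send buffer.
|         raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ,
|                           struct.pack("I", 0))
|         return struct.unpack("I", raw)[0]
|
|     def watch_peer(sock, hold_time=90, drop_session=lambda s: None):
|         stuck_since = None
|         while True:
|             if unsent_bytes(sock) > 0:
|                 stuck_since = stuck_since or time.monotonic()
|                 if time.monotonic() - stuck_since > hold_time:
|                     drop_session(sock)  # peer isn't draining our updates
|                     return
|             else:
|                 stuck_since = None      # queue drained, peer is reading
|             time.sleep(5)
|
| (This can't distinguish a zero window from a merely slow peer,
| which is one reason the proposal targets the BGP state machine
| rather than TCP itself.)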
| ZiiS wrote:
| I think the argument is that if _your_ keep-alives are held up,
| then currently you wait on _them_ terminating the session. If
| they are malicious or just not working well, they may not do
| this.
| 0xEFF wrote:
| You can see the window size is zero though, so I think GP
| is suggesting sending a TCP reset or something similar.
|
| Maybe this isn't a good option because it would have too
| many undesirable side effects?
| vbernat wrote:
| The RFC proposes to change the BGP finite-state machine, not
| the protocol.
| iso1631 wrote:
| Like you, I don't see how a change in protocol is required -
| just an update to the RFC to say an implementation SHOULD time
| out the connection if the send window stays at zero. That said,
| I haven't read the specs with a fine-tooth comb, and perhaps
| there's something about how you MUST NOT drop the connection if
| you're getting keepalives?
|
| Get Cisco and Juniper to implement it and that's 75% of LINX
| covered at least; I assume other exchanges have a similar
| equipment makeup.
|
| It seems like reasonable behaviour to me.
|
| It doesn't prevent the problem of a malicious BGP peer, of
| course, but we know that already - if they choose to ignore
| your messages (while happily keeping their window open) but
| continue to send keepalives, you're equally screwed.
| iforgotpassword wrote:
| If you don't put it in the RFC then you'll end up with five
| different solutions to this problem from five different
| vendors, and a nice 5x5 matrix of new hilarious edge cases
| when these are talking to each other and something wonky is
| happening to the TCP session.
| eqvinox wrote:
| BGP keepalives are not request-reply; they are simple
| scheduled transmissions. Which means that even if one side is
| not reading, it may still be sending keepalives. So the other
| side keeps the session open, despite its own keepalives
| sitting in its send queue.
|
| Also, any valid BGP message resets the hold timer, so the
| reading side just needs to occasionally pop something off the
| full queue and process it. Which, say, if you're swapping to
| hell and back, can still get done. (Assuming its scheduler
| even gets around to killing things on hold-time expiry. It
| might just not be expiring anything anymore, on account of
| floating face-down in the river.)
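|
| Roughly, the timer behaviour looks like this (a simplified
| sketch of RFC 4271's keepalive and hold timers, with
| send_keepalive / try_receive / drop_session as placeholder
| hooks, not any particular implementation):
|
|     import time
|
|     HOLD_TIME = 90
|     KEEPALIVE_INTERVAL = HOLD_TIME // 3
|
|     def run_session(send_keepalive, try_receive, drop_session):
|         last_sent = last_heard = time.monotonic()
|         while True:
|             now = time.monotonic()
|             # Keepalives go out on a schedule, regardless of whether
|             # the peer is reading -- they may just queue up in TCP.
|             if now - last_sent >= KEEPALIVE_INTERVAL:
|                 send_keepalive()
|                 last_sent = now
|             # ANY valid message from the peer (UPDATE or KEEPALIVE)
|             # resets the hold timer; it need not be a keepalive.
|             if try_receive():
|                 last_heard = now
|             # The session only drops when the hold timer expires.
|             if now - last_heard > HOLD_TIME:
|                 drop_session()
|                 return
|             time.sleep(1)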
| sigmonsays wrote:
| Interesting read. It's interesting that this is a bug in the
| specification and not the implementation, since BGP is so old.
| We must not hit this case often.
| eqvinox wrote:
| It's actually becoming more frequent due to BGP speakers being
| increasingly multithreaded. In the olden days, if you were
| overloaded, that also meant no more keepalives being sent. With
| the power of multithreading you can now simultaneously be
| overloaded and still send keepalives! :D
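|
| A toy illustration of that failure mode (all names made up):
| the keepalive thread happily keeps the session looking alive
| even while the update-processing thread is wedged:
|
|     import threading, time
|
|     def keepalive_loop(send_keepalive, interval=30):
|         while True:
|             send_keepalive()        # cheap; never waits on the worker
|             time.sleep(interval)
|
|     def update_loop(read_update, process_update):
|         while True:
|             # If this wedges (swapping, a lock, full-table churn),
|             # unread updates pile up in the buffers -- but the
|             # keepalive thread above never notices.
|             process_update(read_update())
|
|     # threading.Thread(target=keepalive_loop, args=(send_fn,)).start()
|     # threading.Thread(target=update_loop, args=(read_fn, proc_fn)).start()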
| NovemberWhiskey wrote:
| Ah yes; that's one of my favorites - health check returning
| 200s instantaneously; actual service is a black hole.
| jschwartzi wrote:
| One of many problems that are only solvable with a software
| watchdog.
| EricE wrote:
| > With the power of multithreading you can now simultaneously
| be overloaded and still send keepalives! :D
|
| Now that's progress!
|
| Reminds me of some of the naive comments in a few recent HN
| posts about multithreaded programming. It's not as easy as
| just adding more threads.
|
| Or sending in more trains :)
| https://www.youtube.com/watch?v=-hyttagGsz0
| anticristi wrote:
| Thinking out loud: when I read the BGP spec, I got the feeling
| that it was optimized for reduced churn. As the Internet routing
| table grew and future increases in router CPU power were
| uncertain, the architects of the Internet wanted to avoid extra
| BGP exchanges.
|
| However, now it seems like the Internet is facing new challenges
| and a different trade-off might make sense. Why not add a "valid
| until" attribute to each route? The originating router would
| have to re-announce each route every 24 hours. Failure to
| propagate the update at any point would automatically withdraw
| it. Of course, re-announcing 1M routes every day might be a lot,
| but at this point it feels worth considering.
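|
| Back-of-the-envelope for what that would mean on the receiving
| side (the attribute and the numbers are invented here for
| illustration): 1,000,000 routes re-announced every 24 hours is
| roughly 1,000,000 / 86,400, i.e. ~12 extra updates per second
| for a full feed, and stuck routes simply age out:
|
|     import time
|
|     VALIDITY = 24 * 3600          # hypothetical "valid until" window
|     routes = {}                   # prefix -> expiry timestamp
|
|     def on_announce(prefix):
|         routes[prefix] = time.monotonic() + VALIDITY
|
|     def expire_stale():
|         now = time.monotonic()
|         for prefix in [p for p, exp in routes.items() if exp < now]:
|             del routes[prefix]    # a stuck route withdraws itself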
| vlovich123 wrote:
| I wonder if a robust consensus algorithm might be a better
| investment than a timeout. I would imagine there are other bugs
| in BGP implementations, so a routing table that trends towards
| eventual consistency regardless of the starting point might be
| a more robust solution than just focusing on this one corner
| case. It might be a more intrusive change, though, and hard to
| get middleware to roll out such a change.
| anticristi wrote:
| Given the size and complexity of the Internet, it might be
| worth considering making BGP tolerant to Byzantine failures.
| benlivengood wrote:
| There isn't really a useful metric for failure, though. Not
| every prefix of every AS needs to be reachable from every
| other. Unlike consensus problems, where everyone wants to
| agree on the same state, it's sufficient for the Internet to
| be in a working state where each AS has enough of the routes
| it cares about, and BGP is pretty good at achieving that.
| benlivengood wrote:
| There isn't actually a consensus to be formed on the Internet.
| Communities, local configuration, etc. cause BGP routers to
| make local decisions about routes to advertise and re-advertise
| that aren't going to be part of a consensus.
___________________________________________________________________
(page generated 2021-04-21 23:01 UTC)