[HN Gopher] Cloudflare servers don't own IPs anymore so how do t...
___________________________________________________________________
Cloudflare servers don't own IPs anymore so how do they connect to
the internet?
Author : jgrahamc
Score : 213 points
Date : 2022-11-25 14:16 UTC (8 hours ago)
(HTM) web link (blog.cloudflare.com)
(TXT) w3m dump (blog.cloudflare.com)
| danrl wrote:
| As an industry, we are bad at deprecating old protocols like
| IPv4. This is a genius hack for a problem we only have because
| IPv6 is not yet adopted widely enough for serving legacy-IP users
| to become a droppable liability for the business. The ROI is still
| high enough for us to "innovate" here. I applaud the solution but
| mourn the fact that we still need this.
|
| I guess ingress is next, then? Two layers of Unimog to achieve
| stability before TCP/TLS termination maybe.
| dopylitty wrote:
| I've been thinking a lot about this in my own enterprise and
| I've increasingly come to the conclusion that IP itself is the
| wrong abstraction for how the majority of modern networked
| compute works. IPv6, as a (quite old itself) iteration on top
| of IPv4 with a bunch of byzantine processes and acronyms tacked
| on, is solving the wrong problem.
|
| Originally IP was a way to allow discrete physical computers in
| different locations owned by different organizations to find
| each other and exchange information autonomously.
|
| These days most compute actually doesn't look like that. All my
| compute is in AWS. Rather than being autonomous it is
| controlled by a single global control plane and uniquely
| identified within that control plane.
|
| So when I want my services to connect to each other within AWS
| why am I still dealing with these complex routing algorithms
| and obtuse numbering schemes?
|
| AWS knows exactly which physical hosts my processes are running
| on and could at a control plane level connect them directly.
| And I, as someone running a business, could focus on the higher
| level problem of 'service X is allowed to connect to service Y'
| rather than figuring out how to send IP packets across
| subnets/TGWs and where to configure which ports in NACLs and
| security groups to allow the connection.
|
| Similarly my ISP knows exactly where Amazon and CloudFlare's
| nearest front doors are so instead of 15 hops and DNS
| resolutions my laptop could just make a request to Service X on
| AWS. My ISP could drop the message in AWS' nearest front door
| and AWS could figure out how to drop the message on the right
| host however they want to.
|
| I know there's a lot of legacy cruft and also that there are
| benefits of the autonomous/decentralized model vs central
| control for the internet as a whole but given the centralized
| reality we're in, especially within the enterprise, I think
| it's worth reevaluating how we approach networking and whether
| the continuing focus on IP is the best use of our time.
| ec109685 wrote:
| The IP addresses you see as an AWS customer aren't the same
| used to route packets between hosts. That said, there's a
| huge amount of commodity infrastructure built up that
| understands IP addresses and routing layers, so unless a new
| scheme offers tremendous benefits, it won't get adoption.
|
| At least from a security perspective, though, IP ACLs are
| falling out of favor relative to service-based identities, which is a
| good thing.
|
| You can see how AWS internally does networking here:
| https://m.youtube.com/watch?v=ii5XWpcYYnI
| wpietri wrote:
| > my laptop could just make a request to Service X on AWS
|
| I was looking for the "just" that handwaves away the
| complexity and I was not disappointed.
|
| How do you imagine your laptop expressing a request in a way
| that it makes it through to the right machine? Doing a
| traceroute to amazon.com, I count 26 devices between me and
| it. How will those devices know which physical connection to
| pass the request over? Remember that some of them will be
| handling absurd amounts of traffic, so your scheme will need
| to work with custom silicon for routing as well as doing ok
| on the $40 Linksys home unit. What are you imagining that
| would be so much more efficient that it's worth the enormous
| switching costs?
|
| I also have questions about your notion of "centralization".
| Are you saying that Google, Microsoft, and other cloud
| vendors should just... give up and hand their business to
| AWS? Is that also true for anybody who does hosting,
| including me running a server at home? If so, I invite you to
| read up on the history of antitrust law, as there are good
| reasons to avoid a small number of people having total
| control over key economic sectors.
| dopylitty wrote:
| > How do you imagine your laptop expressing a request in a
| way that it makes it through to the right machine? Doing a
| traceroute to amazon.com, I count 26 devices between me and
| it. How will those devices know which physical connection
| to pass the request over?
|
| That's my whole point. You're thinking of it from an IP
| perspective where there are individual devices in some
| chain and they all need to autonomously figure out a path
| from my laptop to AWS. The reality is every device between
| me and AWS is owned by my ISP. They know exactly which
| physical path ahead of time will get a message from my
| laptop to AWS. So why waste all the time on the IP
| abstraction?
|
| > I also have questions about your notion of
| "centralization". Are you saying that Google, Microsoft,
| and other cloud vendors should just... give up and hand
| their business to AWS?
|
| AWS is just an example. Realistically a huge amount of
| traffic on the internet is going to 6 places and my ISP
| already has direct physical connections to those places.
| Maintaining this complex and byzantine abstraction to
| figure out how to get a message from my laptop to compute
| in those companies' infrastructure should not be necessary.
|
| And in general the more important part is within AWS' (or
| Microsoft's or enterprise X's) network: why waste time on IP
| when the network owner knows exactly which host every
| compute process is running on?
|
| Instead of thinking of an enterprise network as a set of
| autonomous hosts that need to figure out a path between
| each other think of it as a set of processes running on the
| same OS (the virtual infrastructure). Linux doesn't need to
| do BGP to figure out how to connect two processes, so why
| does your network?
| scarmig wrote:
| > The reality is every device between me and AWS is owned
| by my ISP. They know exactly which physical path ahead of
| time will get a message from my laptop to AWS.
|
| None of these are true.
| akira2501 wrote:
| > Rather than being autonomous it is controlled by a single
| global control plane and uniquely identified within that
| control plane.
|
| By default, sure. You can easily bring your own IPs into AWS
| and use them instead, and I don't think it's hard to imagine
| the pertinent use cases and risk management this brings.
| mike256 wrote:
| Wouldn't it be better if all those big CDNs just switched off
| IPv4 and forced the sleeping ISPs to enable IPv6? Maybe we should
| introduce some IPv6-only days as a first step...
| subarctic wrote:
| Pretty interesting article. TLDR: they're now using anycast for
| egress, not just ingress.
|
| Each data center has a single IP for each country code (so that
| they can make outgoing requests that are geolocated in any
| country). In order to achieve that, they have a /24 or larger
| range for each country, and announce it from all their data
| centers, and then they route the traffic over their backbone to
| the appropriate data center for that IP.
|
| Then in the data center, they share the single IP across all
| their servers by giving each server a range of TCP/UDP port space
| (instead of doing stateful NAT).
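|
| A back-of-the-envelope sketch of that port slicing in Python
| (the slice size and port range here are my own illustrative
| numbers, not Cloudflare's):
|
|     # Hypothetical constants; the real range and slice size differ.
|     EPHEMERAL_START = 32768   # assumed start of the shared range
|     EPHEMERAL_END = 65536     # exclusive
|     SLICE_SIZE = 2048         # assumed ports owned by each server
|
|     def port_slice(server_index):
|         """Contiguous source-port range owned by one server."""
|         start = EPHEMERAL_START + server_index * SLICE_SIZE
|         if start + SLICE_SIZE > EPHEMERAL_END:
|             raise ValueError("port space of this egress IP exhausted")
|         return range(start, start + SLICE_SIZE)
|
|     def owner(port):
|         """Inverse map: which server owns a given port?"""
|         return (port - EPHEMERAL_START) // SLICE_SIZE
|
|     # Server 3 egresses from ports 38912-40959 of the shared IP;
|     # return packets to those ports go straight to server 3.
|     assert owner(port_slice(3)[0]) == 3
|
| Because the mapping is plain arithmetic, routing a return packet
| needs no per-connection state, which is what makes it different
| from stateful NAT.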
| ec109685 wrote:
| It's not a single IP address per data center. Otherwise they'd
| only be able to make 64k simultaneous egress connections, and
| their scheme of different IP addresses per "geo" and product
| wouldn't work.
| Terretta wrote:
| I quite like what CloudFlare has done here.
|
| There's a fourth way to resolve this, that works for the core use
| case, is less engineering, and was in production 20 years ago,
| but I can't fit it within the margins of this comment box.
|
| // CF's approach has additional feature advantages though.
| [deleted]
| dfawcus wrote:
| What they describe sounds a lot like a distributed static RSIP
| scheme.
|
| https://en.wikipedia.org/wiki/Realm-Specific_IP
|
| With port ranges rather than being 'leased', being allocated on
| the the basis of per server within a locale.
|
| So the IP goes to the locale, the port range is the the static
| RSIP to the server within that locale.
| martinohansen wrote:
| Am I missing something here or did they just reinvent a NAT
| gateway with static rules?
|
| I understand that they started using anycast for the egress IPs
| as well, but thats unrelated to the NAT problem.
| [deleted]
| xg15 wrote:
| > _However, while anycast works well in the ingress direction, it
| can't operate on egress. Establishing an outgoing connection
| from an anycast IP won't work. Consider the response packet. It's
| likely to be routed back to a wrong place - a data center
| geographically closest to the sender, not necessarily the source
| data center!_
|
| Slightly OT question, but why wouldn't this be a problem with
| ingress, too?
|
| E.g. suppose I want to send a request to https://1.2.3.4. What I
| don't know is that 1.2.3.4 is an anycast address.
|
| So my client sends a SYN packet to 1.2.3.4:443 to open the
| connection. The packet is routed to data center #1. The data
| center duly replies with a SYN/ACK packet, which my client
| answers with an ACK packet.
|
| However, due to some bad luck, the ACK packet is routed to data
| center #2 which is also a destination for the anycast address.
|
| Of course, data center #2 doesn't know anything about my
| connection, so it just drops the ACK or replies with a RST. In
| the best case, I can eventually resend my ACK and reach the right
| data center (with multi-second delay), in the worst case, the
| connection setup will fail.
|
| Why does this not happen on ingress, but is a problem for egress?
|
| Even if the handshake uses SYN cookies and got through on data
| center #2, what would keep subsequent packets that I send on that
| connection from being routed to random data centers that don't
| know anything about the connection?
| matsur wrote:
| This is a problem in theory. In practice (and through
| experience) we see very little routing instability in the way
| you describe.
| xg15 wrote:
| You mean, it's just luck?
| Brian_K_White wrote:
| right? It also seems like load should, or at least could, be
| changing all the time. Are geo or hop proximity really the
| only things that decide a route? Not load as well?
|
| But although I would be surprised if load were not also
| part of the route picker, I would also be surprised if the
| routers didn't have some association or state tracking to
| actively ensure related packets get the same route.
|
| But I guess this is saying exactly that, that it's relying
| on luck and happenstance.
|
| It may be doing the job well enough that not enough people
| complain, but I wouldn't be proud of it myself.
| remram wrote:
| Anycast is implemented by BGP and doesn't take load into
| account in any way. You will reach the closest location
| announcing that address (well, prefix).
| ignoramous wrote:
| TFA claims that _Anycast_ is an advantage when dealing
| with DDoS because it helps spread the load? A regional
| DDoS (where it consistently hits a small set of DCs) is
| not a common scenario, I guess?
| csande17 wrote:
| Basically yes. Large-scale DDoS attacks rely on
| compromising random servers and devices, either directly
| with malware or indirectly with reflection attacks. Those
| hosts aren't all going to be located in the same place.
|
| An attacker could choose to only compromise devices
| located near a particular data center, but that would
| really reduce the amount of traffic they could generate,
| and also other data centers would stay online and serve
| requests from users in other places.
| toast0 wrote:
| Your intuition is more or less all wrong here, sorry.
|
| Most routers with multiple viable paths pass way too much
| traffic to do state tracking of individual flows. Most
| typically, the default metric is BGP path length: for a
| given destination, send packets through the route that has
| the most specific prefix; if there's a tie, use the route
| that transits the fewest networks to get there; if
| there's still a tie, use the route that has been up the
| longest (which maybe counts as state tracking). Routing
| like this doesn't take into account any sort of load
| metric, although people managing the routers might do
| traffic engineering to try to avoid overloaded routes
| (but it's difficult to see what's overloaded a few hops
| beyond your own router).
|
| For the most part, an anycast operation is going to work
| best if all sites can handle all the foreseeable load,
| because it's easy to move all the traffic, but it's not
| easy to only move some. Everything you can do to try to
| move some traffic is likely to either not be effective or
| move too much.
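|
| A toy comparator for the tie-breaking above (Python; real BGP
| best-path selection has more steps - weight, local-pref, MED,
| etc. - this models only the order just described):
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class Route:
|         prefix_len: int    # most specific prefix wins first
|         as_path_len: int   # then fewest transited networks
|         uptime: float      # then the longest-up route
|
|     def best_route(routes):
|         """Pick a route by the simplified ordering above."""
|         return max(routes, key=lambda r: (r.prefix_len,
|                                           -r.as_path_len,
|                                           r.uptime))
|
| Note that no load metric appears anywhere in the sort key,
| which is the point.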
| richieartoul wrote:
| Why shouldn't they be proud of a massive system like
| Cloudflare that works extremely well? As a commenter
| below described, it's not luck or happenstance, it's a
| natural consequence of how BGP works. Seems pretty
| elegant to me.
| rizky05 wrote:
| [deleted]
| tonyb wrote:
| It works because the route to 1.2.3.4 is relatively stable. The
| routes would only change and end up at data center #2 if data
| center #1 stopped announcing the routes. In that case the
| connection would just re-negotiate to data center #2.
| xg15 wrote:
| Ah, ok, that makes sense. So for a given point of origin,
| anycast generally routes to the same server?
| majke wrote:
| Correct. From a single place, you're likely to BGP-reach
| one Cloudflare location, and it doesn't change often.
| ratorx wrote:
| As others have mentioned, this is not often a problem because
| routing is normally fairly stable (at least compared to the
| lifetime of a typical connection). For longer lived connections
| (e.g. video uploads), it's more of a problem.
|
| Also, there are a fair number of ASes that attempt to load
| balance traffic between multiple peering points, without
| hashing (or only using the src/dst address and not the port).
| This will also cause the problem you described.
|
| In practice it's possible to handle this by keeping track of
| where the connections for an IP address typically ingress and
| sending packets there instead of handling them locally. Again,
| since it's only a few ASes that cause problems for typical
| connections, it's also possible to figure out which IP prefixes
| experience the most instability and only turn on this overlay
| for them.
| grogers wrote:
| Yep, it can happen that your packet gets routed to a different
| DC from a prior packet. But the routers in between the client
| and the anycast destination will do the same thing if the
| environment is the same. So to get routed to a new location,
| you would usually need either:
|
| * A new (usually closer) DC comes online. That will probably be
| your destination from now on.
|
| * The prior DC (or a critical link on the path to it) goes
| down.
|
| The crucial thing is that the client will typically be routed
| to the closest destination to it. In the egress case the
| current DC may not be the closest DC to the server it is trying
| to reach so the return traffic would go to the wrong place.
| This system of identifying a server with unique IP/port(s)
| means that CF's network can forward the return traffic to the
| correct place.
| ignoramous wrote:
| Yes, as others have mentioned, route flapping is a problem.
| But, in practice, not as big a problem as DNS-based routing.
|
| - See: https://news.ycombinator.com/item?id=10636547
|
| - And: https://news.ycombinator.com/item?id=17904663
|
| Besides, SCTP / QUIC aware load balancers (or proxies) are
| detached from IPs and should continue to hum along just fine
| regardless of which server IP the packet ends up at.
| Thorentis wrote:
| The fact that we haven't yet adopted IPv6 tells me that IPv6
| isn't actually that great of a solution. We need an Internet
| Protocol that solves modern problems and that has a good
| migration path.
| wpietri wrote:
| 40% of Google's traffic comes via IPv6. Up from 1% a decade
| ago. https://www.google.com/intl/en/ipv6/statistics.html
|
| If you think you can do better than that, I look forward to
| hearing your plan. Personally, I think that's huge progress.
| eastdakota wrote:
| Fun fact: the first product we announced to celebrate
| Cloudflare's launch day anniversary was an IPv4<->IPv6 gateway:
|
| https://blog.cloudflare.com/introducing-cloudflares-automati...
|
| The success of that convinced us we should do something to
| improve the Internet every year to celebrate our "birthday."
| Over time we ended up with more than one product that met those
| criteria and that timing, so it went from a day of celebration to a
| week. That became our Birthday Week. Then we saw how well
| bundling a set of announcements into a week worked, so we decided
| to do it other times of the year. And that's how Cloudflare
| Innovation Weeks got started, explicitly with us delivering
| IPv6 support back in 2011.
| growse wrote:
| You need an IPv4 src address to connect out to an IPv4 origin.
| zekica wrote:
| Where do they say that they haven't adopted IPv6? All their
| offerings support IPv6.
| inopinatus wrote:
| TLDR: Cloudflare is using five bits from the port number as a
| subnetting & routing scheme, with optional content policy
| semantics, for hosts behind anycast addressing and inside their
| network boundary.
| Ptchd wrote:
| If you don't need an IP to be connected to the internet, sign me
| up... I think they are full of it though... Even if you only have
| one IP.... you still have an IP
|
| > PING cloudflare.com (104.16.133.229) 56(84) bytes of data.
|
| > 64 bytes from 104.16.133.229 (104.16.133.229): icmp_seq=1
| ttl=52 time=10.6 ms
|
| With a ping like this, you know that I am not using Musk's
| Internet....
| cesarb wrote:
| All this wonderful complexity, just because a few servers insist
| on behaving as if the location of the IP address and the location
| of the user should always match.
| jesuspiece wrote:
| ronnier wrote:
| Spammers are exploiting cloudflare by creating thousands of new
| domains on free TLDs (like .ml) and hosting the sites behind
| cloudflare and spamming social media apps with links to scam
| dating sites. CPA scammers.
|
| If anyone from CF sees this, I can work with you and give you
| data on this. I'm dealing with this at one of the large social
| media companies.
|
| Here's an example, this is NSFW - https://atragcara.ga
| elorant wrote:
| So why aren't social media platforms blocking the domains?
| ronnier wrote:
| We do. But with free TLDs, spammers and scammers can create
| an unlimited number of new domains at zero cost. That's the
| problem. They can send a single spam URL to a single person
| and scale that out, each person gets a unique domain and URL.
| elorant wrote:
| So how about blocking the users then? Or limit their
| ability to post links.
| ronnier wrote:
| That's done too. But it's not just a few, it's literally
| tens of thousands of individuals from places like
| Bangladesh who do this as their source of income. They
| are smart, on real devices, will solve any puzzle you
| throw at them, and will adapt to any blocks or locking.
| It's not an easy problem to solve, which is why no
| platform has solved it (oddly, spam is pretty much
| nonexistent on HN).
| elorant wrote:
| I don't think there's any benefit in spamming HN. There
| aren't that many users in here, and it could lead to a
| backlash considering the technical expertise of most people.
| gnfargbl wrote:
| OK, but why don't you block Freenom domains entirely?
|
| Apart from perhaps a couple of sites like gob.gq, there's
| essentially nothing of any value on those TLDs. Allow-list
| the handful of good sites, if you must, and default block
| the rest.
| ronnier wrote:
| I could. But we are talking about one of the world's
| largest social media platforms used by hundreds of
| millions of people daily. There are legit websites hosted
| on these free domains, and I don't want to kill those
| along with the scam sites. I've mostly got the scam sites
| blocked at this point though. Just took me a week or so
| to adapt.
| gnfargbl wrote:
| > There's legit websites hosted on these free domains
|
| Are there though, really? Can you give some examples?
|
| To a first approximation, I contend that essentially
| everything on Freenom is bad. There are maybe a _handful_
| of good sites (the one I listed, https://koulouba.ml/,
| etc) but you can find those on Google in a few minutes
| with some _site:_ searches.
|
| I commend your efforts in blocking the scam sites, but
| also honestly believe that it would be better for you,
| your customers and the internet at large to default block
| Freenom. Freenom sites are junk, wherever they are
| hosted.
| ronnier wrote:
| Here are NSFW scam sites behind CF that use free TLDs. I
| could post tens of thousands of these.
|
| * https://atragcara.ga
|
| * https://donaga.tk
|
| * https://snatemhatzemerbedc.tk
| gnfargbl wrote:
| Yep, I know. I monitor these as they appear in
| Certificate Transparency logs and DNS PTR records.
|
| Freenom TLDs are just junk. Save yourself the hassle and
| default block :-).
| ronnier wrote:
| Seems these sites should be blocked on CF, at the root.
| Not at all the leaf-node apps. It's pretty easy for me to
| automate it at my company. Seems CF could?
| sschueller wrote:
| Same goes for DDoS attacks. I am not sure how they do it, but we
| get hit by CF IPs with SYN floods etc.
| gnfargbl wrote:
| Anyone can set the source IP on their packets to be anything.
| I can send you TCP SYNs which are apparently from Cloudflare.
|
| There was a proposal (BCP38) which said that networks should
| not allow outbound packets with source IPs which could not
| originate from that network, but it didn't really get a lot
| of traction -- mainly due to BGP multihoming, I think.
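|
| The filter BCP38 asks edge networks to apply is simple in
| principle; a rough sketch in Python (the prefix is example
| data):
|
|     import ipaddress
|
|     # Prefixes this customer may legitimately source traffic from.
|     ALLOWED = [ipaddress.ip_network("203.0.113.0/24")]
|
|     def bcp38_permits(src_ip):
|         """Return True if this source could legitimately be ours."""
|         ip = ipaddress.ip_address(src_ip)
|         return any(ip in net for net in ALLOWED)
|
| The hard part is getting every edge network to deploy it, not
| the check itself.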
| toast0 wrote:
| BCP38 has gotten some traction, but it's not super
| effective until all the major tier-1 ISPs enforce it
| against their customers. But it's hard to pressure tier-1
| ISPs; you can't drop connections with them, because they're
| too useful, and anyway if you did, the traffic would just flow
| through another tier-1 ISP, because it's not really
| realistic for tier-1s to prefix-filter peerings between
| themselves. Besides, the customer that's spoofing could be
| spoofing sources their ISP legitimately handles, and
| there are a lot of those.
|
| Some tier-1s do follow BCP38 though, so one day maybe?
| Still, there's plenty of abuse to be done without spoofing,
| so while it would be an improvement, it wouldn't usher in
| an era of no abuse.
| slothsarecool wrote:
| You do not get attacked from Cloudflare with TCP attacks.
| Somebody is spoofing the IP header to make it seem like
| Cloudflare is DDoSing you.
|
| The only way for somebody to DDoS from Cloudflare would be
| using Workers; however, this isn't practical, as Workers have
| a very limited IP range.
| fncivivue7 wrote:
| cmeacham98 wrote:
| The reason people do this, by the way, is because it's
| common if you're hosting via CF to whitelist their IPs and
| block the rest. This allows their SYN flood to bypass that.
| [deleted]
| uvdn7 wrote:
| This is a wonderful article. Thanks for sharing. As always,
| Cloudflare blog posts do not disappoint.
|
| It's very interesting that they are essentially treating IP
| addresses as "data". Once looking at the problem from a
| distributed system lens, the solution here can be mapped to
| distributed systems almost perfectly.
|
| - Replicating a piece of data on every host in the fleet is
| expensive, but fast and reliable. The compromise is usually to
| keep one replica in a region; same as how they share a single /32
| IP address in a region.
|
| - "sending datagram to IP X" is no different than "fetching data
| X from a distributed system". This is essentially the underlying
| philosophy of the soft-unicast. Just like data lives in a
| distributed system/cloud, you no longer know where is an IP
| address located.
|
| It's ingenious.
|
| They said they don't like stateful NAT, which is understandable.
| But the load balancer still has to be stateful to perform the
| routing correctly. It would be an interesting follow-up blog post
| talking about how they coordinate port/data movements (moving a
| port from server A to server B), as it's state management (not
| very different from moving data in a distributed system again).
| remram wrote:
| I have a lot of trouble mapping your comment to the content of
| the article. It is about the _egress addresses_, the ones
| CloudFlare uses as source when fetching from origin servers.
| Those addresses need to be separated by the region of the end-
| user ("eyeball"/browser) and the CloudFlare service they are
| using (CDN or WARP).
|
| The cost they are working around is the cost of IPv4 addresses,
| versus the combinatorial explosion in their allocation scheme
| (they need number of services * number of regions * whatever
| dimension they add next, because IP addresses are nothing like
| data).
|
| I am not sure where you see data replication in this scheme?
| uvdn7 wrote:
| It's not meant to be a perfect analogy. The replication
| analogy is mostly talking about the tradeoff between
| performance and cost. So it's less about "replicating" the IP
| addresses (which is not happening). On that front, maybe
| distribution would be a better term. Instead of storing a
| single piece of data on a single host (unicast), they are
| distributing it to a set of hosts.
|
| Overall, it seems like they are treating IP addresses as data
| essentially, which becomes most obvious when they talk about
| soft-unicast.
|
| Anyway, I just found it interesting to look at this through
| this lens.
| majke wrote:
| "Overall, it seems like they are treating ip addresses as
| data essentially"
|
| Spot on!
|
| In the past:
|
| * /24 per datacenter (BGP), /32 per server (local network)
| (all 64K ports)
|
| New:
|
| * /24 per continent (group of colos), /32 per colo, port-
| slice per server
|
| This is totally hierarchical. All we did was build a tech to
| change the "assignment granularity". Now with this tech we
| can do... anything we want. We're not tied to BGP, or IPs
| belonging to servers, or adjacent IPs needing to be
| nearby.
|
| The cost is the memory cost of global topology. We don't
| want a global shared-state NAT (each 2 or 4-tuple being
| replicated globally on all servers). We don't want zero-
| state (a machine knowing nothing about routing, just BGP
| does the job). We want to select a reasonable mix. Right
| now it's /32 per datacenter.... but we can change it if we
| want and be more, or less specific than that.
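|
| As a rough illustration (addresses and slice bounds made up,
| not our real topology), you can picture the shared state as a
| small nested table:
|
|     import ipaddress
|
|     # Hypothetical: /24 per continent, /32 per colo, port slice
|     # per server.
|     TOPOLOGY = {
|         ipaddress.ip_network("198.51.100.0/24"): {     # continent
|             ipaddress.ip_address("198.51.100.7"): {    # colo
|                 (32768, 34816): "server-a",            # port slice
|                 (34816, 36864): "server-b",
|             },
|         },
|     }
|
|     def owner(dst_ip, dst_port):
|         """Resolve a returning packet to the owning server."""
|         ip = ipaddress.ip_address(dst_ip)
|         for continent, colos in TOPOLOGY.items():
|             if ip in continent and ip in colos:
|                 for (lo, hi), server in colos[ip].items():
|                     if lo <= dst_port < hi:
|                         return server
|         raise LookupError("unknown IP/port")
|
| Changing the "assignment granularity" then just means
| repopulating this table; neither BGP nor the servers' view of
| their own IPs has to change.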
| superkuh wrote:
| Yikes. More cloudflare breakage of the internet model. Pretty
| soon we might as well all just live within cloudflare's WAN
| entirely.
| eastdakota wrote:
| ¯\_(ツ)_/¯
|
| Another perspective is that the connection of an IP to specific
| content or individuals was a bug of the Internet's original
| design and thankfully we're finally finding ways to
| disassociate them.
| AlphaSite wrote:
| The internet's a set of abstractions; as long as they still
| implement some common protocols and don't create a walled
| garden, is there any real social or technical issue with them
| doing unusual things in their network?
|
| I can totally see an argument against their CDN being too
| pervasive and problematic for TOR users, but this seems fine
| IMO.
| wrs wrote:
| What's breaking the internet model is the internet becoming too
| popular and running out of addresses. There's nothing specific
| to Cloudflare here. You're free to do the same thing to
| conserve your own address space. It's sort of a super-fancy
| NAT.
| majke wrote:
| Author here, I know this is a dismissive comment, but I'll bite
| anyway.
|
| As far as I understand the history of the IP protocol,
| initially an IP address pointed to a host. (/etc/hosts file
| seems that way)
|
| Then it was realized a single entity might have multiple
| network interfaces, and an IP started to point to a network
| card on a host (a host can have many IPs). Then all the VRF,
| dummy devices, tuntaps, VETH and containers. I guess an IP is
| now pointing to a container or VM. But there is more. For
| performance you can (almost should!) have an unique IP address
| per NUMA node. Or even logical CPU.
|
| On the modern Internet, a server IP points to a single CPU on a
| container in a VM on a host.
|
| Then consider Anycast, like 1.1.1.1 or 8.8.8.8. An IP means
| something else... it means a resource.
|
| On the "client" side we have customer NAT's. CG NAT's and
| VPN's. An IP means similarly little.
|
| The IPs are really expensive, so in some cases there is a
| strong advantage to save them. Take a look at
| https://blog.cloudflare.com/addressing-agility/
|
| "So, test we did. From a /20 address set, to a /24 and then,
| from June 2021, to an address set of one /32, and equivalently
| a /128 (Ao1). It doesn't just work. It really works"
|
| We're able to serve "all of Cloudflare" from a /32.
|
| There is this whole trend of getting denser and denser IP
| usage. It's not avoidable. It's not "breaking the Internet" in
| any way more than "NATs are breaking the Internet". The
| network evolves, because it has to. And I, for one, don't
| think this is inherently bad.
| superkuh wrote:
| >It's not avoidable. It's not "breaking the Internet" in any
| way more than "NATs are breaking the Internet".
|
| I agree. NATs, particularly the Carrier NAT that smartphone
| users are behind, have broken the internet. They've made it so
| most people do not have ports and cannot participate in the
| internet. So now software developers cannot write software
| that uses the internet (without depending on third parties).
| This is bad. So is what you've done.
|
| Someday ipv6 will save us.
| remram wrote:
| TLDR:
|
| > To avoid geofencing issues, we need to choose specific egress
| addresses tagged with an appropriate country, depending on WARP
| user location. (...) Instead of having one or two egress IP
| addresses for each server, now we require dozens, and IPv4
| addresses aren't cheap.
|
| > Instead of assigning one /32 IPv4 address for each server, we
| devised a method of assigning a /32 IP per data center, and then
| sharing it among physical servers (...) splitting an egress IP
| across servers by a port range.
| majke wrote:
| Ha, I guess this is one way of summarizing it :) Author here. I
| wanted to share more subtleties of the design, but maybe I
| failed.
|
| Indeed, the starting point is sharing IPs across servers with
| port-ranges.
|
| But there is more:
|
| * awesome performance allowed by anycast.
|
| * ability to route /32 instead of /24 per datacenter.
|
| Generally, with this tech we can have _much_ better IP usage
| density, without sacrificing reliability or performance. You
| can call it "global anycast-based stateless NAT" but that
| often implies some magic router configuration, which we don't
| have.
|
| Here's one example of the problems we run into: the lack of
| a connectx() syscall on Linux makes it hard to actually select
| the port range to originate connections from:
|
| https://blog.cloudflare.com/how-to-stop-running-out-of-ephem...
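|
| For flavor, the general bind-before-connect workaround looks
| roughly like this sketch (the IP and slice are placeholders;
| error handling trimmed):
|
|     import errno
|     import socket
|
|     EGRESS_IP = "192.0.2.1"            # placeholder shared egress IP
|     PORT_SLICE = range(40960, 43008)   # this server's assumed slice
|
|     def connect_from_slice(dst_host, dst_port):
|         """TCP connect sourced from our shared IP and port slice."""
|         for port in PORT_SLICE:
|             s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|             # Allow reusing a slice port for other destinations.
|             s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
|             try:
|                 s.bind((EGRESS_IP, port))
|                 s.connect((dst_host, dst_port))
|                 return s
|             except OSError as e:
|                 s.close()
|                 # Port taken or 4-tuple clash: try the next one.
|                 if e.errno not in (errno.EADDRINUSE,
|                                    errno.EADDRNOTAVAIL):
|                     raise
|         raise RuntimeError("no free port in slice")
|
| The linked post covers why doing this efficiently and race-free
| at scale is harder than it looks.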
| chatmasta wrote:
| I was surprised IPv6 was only briefly mentioned! Is that
| something you're looking at next, or are you already running
| an IPv6 egress network?
|
| Of course not every destination is an IPv6 host, so IPv4
| remains necessary, but at least IPv6 can avoid the need for
| port slicing, since you can encode the same bucketing
| information in the IP address itself.
|
| I've seen this idea used as a cool trick [0] to implement a
| SOCKS proxy that randomizes outbound IPv6 address to be
| within a publicly routed prefix for the host (commonly a
| /64).
|
| I guess as long as you need to support IPv4, then port
| slicing is a requirement and IPv6 won't confer much benefit.
| (Maybe it could help alleviate port exhaustion if IPv6
| addresses can use dynamic ports from any slice?)
|
| Either way, thanks for the blog post, I enjoyed it!
|
| [0] https://github.com/blacklanternsecurity/TREVORproxy
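|
| To sketch what I mean by encoding the bucketing in the address
| (documentation prefix, and the bit layout is invented):
|
|     import ipaddress
|
|     PREFIX = ipaddress.ip_network("2001:db8::/64")
|
|     def egress_v6(country_code, server_id):
|         """Pack routing metadata into the 64-bit interface ID."""
|         iid = (country_code << 32) | server_id   # assumed layout
|         return PREFIX[iid]
|
|     # egress_v6(0x5553, 7) -> 2001:db8::5553:0:7 ('US', server 7)
|
| With 64 bits of interface ID, every server (and tag) can have
| its own address, so no port slicing is needed on the v6 side.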
| miyuru wrote:
| I was also interested to know how this was handled for
| IPv6, but it was only briefly mentioned.
|
| Probably they didn't need to do much work with IPv6, since
| half of the post is solving IPv4 exhaustion problems.
| chriscappuccio wrote:
| Cloudflare wants to make money. The IPv6 features can
| come second as v6 usage increases.
| dknecht wrote:
| All Cloudflare services ship with IPv6 on day 1. IPv6 is not
| an issue, as we have enough IPv6 space for each machine to
| have its own IPs.
| pencilcode wrote:
| Is the geofencing country-level only? So if, using WARP, I
| use TripAdvisor and go to see nearby restaurants, it will
| have no idea what city I'm in? Guessing that's not so, but
| wondering how it works.
| aeyes wrote:
| This blog post has some info:
| https://blog.cloudflare.com/geoexit-improving-warp-user-
| expe...
|
| Warp uses its own set of egress IPs and their geolocation
| is close to your real location.
| remram wrote:
| From your article it seemed that your use of anycast was more
| accident than feature, due to the limit of BGP prefix sizes.
| If you could route those IPs to their correct destination,
| you would; you only go to the closest data center and route
| again because you have no choice.
|
| Maybe this ends up reducing cost on customers though, because
| the international transit happens in your backbone network
| rather than on the internet (customer-side).
| elp wrote:
| In English: they now do carrier-grade NAT.
| cm2187 wrote:
| well, vanilla NAT really.
| [deleted]
| immibis wrote:
| This is a horrible way to avoid upgrading the world to IPv6.
| xnyan wrote:
| The industry will not transition to v6 unless: 1) The cost of
| not doing so is higher than the cost of sticking with v4.
| Because of the numerous clever tricks and products designed
| to mitigate v4's limitations, the cost argument still favors v4
| for most people in most situations.
|
| or
|
| 2) We admit that v6 needs to be rethought and rethink it. I
| understand why v6 does not just increase IP address bits from
| 32 to 128, but at this point I think everyone has admitted that
| v6 is simply too difficult for most IT departments to
| implement. In particular, the complexity of the new assignment
| schemes like prefix delegation and SLAAC needs to be pared
| back. Offer a minimum set of features and spin off everything
| else.
| Animats wrote:
| I'm surprised that Cloudflare isn't all IPv6 when Cloudflare is
| the client. That would solve their address problems. Maybe
| charge more if your servers can't talk IPv6. Or require it for
| the free tier.
|
| It's useful that they use client-side certificates. (They call
| this "authenticated origin pull", but it seems to be client-
| side certs.)
| ec109685 wrote:
| They also have to egress to third-party servers, since they
| are a CDN and support things like serverless functions.
___________________________________________________________________
(page generated 2022-11-25 23:00 UTC)