[HN Gopher] How does Linux NAT a ping?
___________________________________________________________________
How does Linux NAT a ping?
Author : willdaly
Score : 209 points
Date : 2023-09-10 13:28 UTC (9 hours ago)
(HTM) web link (devnonsense.com)
(TXT) w3m dump (devnonsense.com)
| jjoonathan wrote:
| NAT is such a trashy abstraction. IPv4 needs to die.
| riffic wrote:
| Be mindful of the Lindy effect: the observation that the future
| longevity of non-perishable things, like a technology or an idea,
| is proportional to their current age. Given its age, IPv4 will
| likely be around for quite some time to come.
|
| https://en.wikipedia.org/wiki/Lindy_effect
| midasuni wrote:
| I have a few devices on my home internet, on a handful of
| 192.168 subnets
|
| The other week I moved my ISP. The AS my house belonged to
| obviously changed to the new ISP, and I got a new v4 IP
|
| All I had to do was update my WAN router to forward traffic
| from the new IP.
|
| With IPv6, by contrast, I would have to change every node on my
| network and update my internal DNS.
|
| Now in theory I could have my own /48 which I take with me.
| That relies on my new ISP being willing to advertise it (which
| my current one does) but it's not particularly common.
|
| However, a week ago my phone line was cut. I got a 5G MiFi out
| and moved my WAN connectivity through that until the cable was
| fixed. Again, a nice simple masquerade on that interface and all
| was good (well, not that good: very poor signal where I live).
|
| But the elephant in the room is that, all that IPv6 stuff
| aside, I still need to run dual stack (or use trashy NAT
| abstractions). It increases my work for no benefit.
|
| But talking about work, how about there?
|
| I have a fleet of vehicles on internal 172.16/12 subnets. They
| plug together and route to each other, and route from where
| they are via a variety of VPN connectivity (hoping that at
| least one method will work, as there's rarely a signal in the
| basements where these park).
|
| If I moved them to IPv6, then again I'm back to having to move
| my /48s. Except these vehicles get internet from various
| sporting venues, most of which struggle to turn off MITM/443
| or unblock UDP. That's just not going to work in a world where
| they turn up at 10am on Saturday morning and need to be working
| 2 hours later.
|
| What business benefit is there for me to double the workload
| and double the risk by moving to dual stack?
| noinsight wrote:
| You could be using IPv6 ULA addresses internally on your home
| network to have static addressing. The real solution is
| moving to DNS names though with your router maintaining them
| based on DHCP leases or just using multicast DNS (Zeroconf).
|
| In the future you can probably go "IPv6-mostly" with a CLAT
| engine to ditch dual-stack:
| https://blog.apnic.net/2022/11/21/deploying-ipv6-mostly-
| acce...
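To make the ULA suggestion concrete, here is a minimal sketch of carving out a Unique Local Address prefix with Python's `ipaddress` module, following the RFC 4193 layout (`fd` + 40-bit Global ID + 16-bit subnet ID + 64-bit interface ID). It assumes plain randomness from `secrets` for the Global ID instead of the RFC's suggested SHA-1 recipe, and the function names are hypothetical. Because the prefix never comes from the ISP, internal addresses built on it stay put when the upstream changes.

```python
import ipaddress
import secrets


def random_ula_prefix() -> ipaddress.IPv6Network:
    """Return a random /48 inside fd00::/8 (RFC 4193 layout with the L bit set)."""
    # Layout: 8-bit "fd" | 40-bit global ID | 16-bit subnet ID | 64-bit interface ID.
    # Assumption: plain randomness stands in for the RFC's SHA-1-based derivation.
    global_id = secrets.randbits(40)
    value = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((value, 48))


def ula_subnet(prefix: ipaddress.IPv6Network, subnet_id: int) -> ipaddress.IPv6Network:
    """Return the /64 selected by a 16-bit subnet ID inside a ULA /48."""
    if prefix.prefixlen != 48:
        raise ValueError("expected a /48 prefix")
    if not 0 <= subnet_id < 2**16:
        raise ValueError("subnet ID must fit in 16 bits")
    base = int(prefix.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))
```

For example, `ula_subnet(random_ula_prefix(), 1)` yields a /64 you could advertise on one internal VLAN, alongside whatever global prefix the current ISP hands out.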
| qhwudbebd wrote:
| > In the future you can probably go "IPv6-mostly" with a
| CLAT engine
|
| ...although there still isn't any kernel support for the
| necessary SIIT v4<->v6 translation, so to implement CLAT
| you end up using unmaintained (and unmergeably bad) out-of-
| tree kernel modules or unmaintained (and slow) userspace
| daemons hanging off a tuntap interface.
| dmatech wrote:
| You could, but now you have three addresses per node
| instead of one. Plus, the mechanisms for assigning those
| addresses are weird compared to DHCP and static assignment.
| I get that it facilitates packets being routed reliably,
| but some of us want maintainable firewall rules that don't
| have to deal with IP addresses changing out of the blue.
| jandrese wrote:
| With IPv6 you would use stateless autoconfiguration, so there
| would be no manual setting of your addresses. The router
| would advertise the new prefix and everything would just use
| it.
|
| There would be no DNS configuration at all: all local
| machines would use anycast DNS for the services and a well-known
| server for Internet addresses.
|
| One of the primary goals of IPv6 was to avoid needing manual
| configuration of anything on the network. It is supposed to
| be as automated as possible.
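The "router advertises the prefix, hosts fill in the rest" idea can be sketched as follows: the host joins an advertised /64 with a 64-bit interface identifier it picks itself. This sketch shows the classic modified EUI-64 derivation from a MAC address plus a simplified random identifier. Many modern hosts use stable-privacy or temporary identifiers (RFC 7217 / RFC 8981) instead, so treat `random_interface_id` as a stand-in. The MAC and the `2001:db8::` documentation prefixes are made up for illustration.

```python
import ipaddress
import secrets


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface identifier derived from a 48-bit MAC address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit and insert ff:fe between the two MAC halves.
    eui = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui, "big")


def random_interface_id() -> int:
    """Simplified stand-in for a privacy/temporary interface identifier."""
    return secrets.randbits(64)


def slaac_address(prefix: str, interface_id: int) -> ipaddress.IPv6Address:
    """Combine a router-advertised /64 prefix with a 64-bit interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC assumes a /64 prefix")
    return net.network_address + interface_id
```

With the MAC `52:54:00:12:34:56`, the prefix `2001:db8:1:2::/64` gives `2001:db8:1:2:5054:ff:fe12:3456`. After renumbering to `2001:db8:9:9::/64`, only the upper half changes (`2001:db8:9:9:5054:ff:fe12:3456`), which is exactly why anything that pinned the old full address breaks.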
| midasuni wrote:
| And now I can't find anything because mDNS doesn't work,
| half my kit won't take DNS entries, there's more fragility from
| systems which don't exist, and of course all my open
| sessions on local networks break as IP addresses change,
| not to mention all my WireGuard sessions.
| 3np wrote:
| > There would be no DNS configuration at all, all local
| machines would use anycast DNS for the services and a well
| known server for Internet addresses.
|
| Assumptions and dragons be here.
| midasuni wrote:
| mDNS: specifically designed for use on a single VLAN, so
| useless.
| pixl97 wrote:
| I do believe there is some kind of 1:1 NAT with IPv6 these
| days, which is way better than 1:Many of IPv4. There are so
| many potentially useful applications that are DOA because of
| v4 NAT being everywhere.
| midasuni wrote:
| Those applications are DOA because of firewall
| administrators that barely allow tcp/443 through.
| [deleted]
| littlecranky67 wrote:
| Not sure IPv6 will fix this. Technically, yes it does. But
| major providers only assigning a /64 to a home user (and
| charging hefty fees for a "business use" /48) already leads to
| IPv6 NAT or to segmenting the /64 further, which shouldn't be done.
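The /64-versus-/48 complaint comes down to arithmetic: SLAAC expects each link to get a /64, so a delegated /64 covers exactly one link, while a /48 leaves 16 bits of subnet ID. Here is a small sketch using the standard `ipaddress` module, with the `2001:db8::` documentation prefix as a placeholder. The helper names are hypothetical.

```python
import ipaddress
from itertools import islice


def count_subnets(prefix: str, new_prefix: int = 64) -> int:
    """Number of /new_prefix networks that fit inside the delegated prefix."""
    net = ipaddress.ip_network(prefix)
    if new_prefix < net.prefixlen:
        raise ValueError("new prefix must be at least as long as the delegated prefix")
    return 2 ** (new_prefix - net.prefixlen)


def first_subnets(prefix: str, new_prefix: int = 64, n: int = 3) -> list:
    """First n subnets; subnets() is a generator, so this stays cheap for a /48."""
    subnets = ipaddress.ip_network(prefix).subnets(new_prefix=new_prefix)
    return [str(s) for s in islice(subnets, n)]
```

`count_subnets("2001:db8::/48")` is 65,536 links, a /56 gives 256, and a /64 gives just 1. That last case is where people end up splitting the /64 or falling back to NAT66.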
| lazide wrote:
| Most seem to have stopped and are handing out /48's in my
| experience. Do you know any not doing that still?
| mindslight wrote:
| Is there a better way to not unnecessarily leak addressing
| metadata to adversarial remote nodes and middle boxes?
|
| IPv6 with assigning end users a whole /64 and end-devices
| continually churning through privacy addresses is a start. But
| even then some form of NAT is still required to nimbly use
| source prefixes from different horizon providers, e.g. to avoid
| spilling your geographic location or opening yourself up to
| low-effort legal shakedowns.
|
| An example: on my local network I've got an everyday web
| browsing VM and a torrent VM. They each have static 192.168.x.x
| addresses, both so I can ssh in for administration and also to
| control their view of network services. They each see a
| completely different Internet horizon through the router - the
| web browsing goes out from a rotating datacenter IP, and the
| torrent one goes out from a consumer VPN. Each of those
| outgoing horizons uses NAT: any of my hosts using that
| rotating data center IP appears the same, and any of my hosts
| using the consumer VPN appears the same as every other customer
| using that same VPN node.
|
| What is the no-NAT equivalent of this? Make that rotating data
| center IP and VPN external IP into subnet allocations, somehow
| feed that addressing information back to the hosts that are
| using it, and dual-home each VM with two routable addresses?
| For equivalent mixing on the consumer VPN there would also need
| to be some ARP-like protocol that let me continually rotate the
| address.
| jandrese wrote:
| Why not just use a VPN in both cases? That's more or less
| what your NAT solution is doing, except without the
| encryption to the data center.
| mindslight wrote:
| It _is_ a wireguard tunnel to the data center, but my
| comment was focused on the addressing.
| 3np wrote:
| > What is the no-NAT equivalent of this?
|
| At least for web-browsing and other HTTP/TCP use-cases: Cut
| off internet from your hosts and use centralized local
| proxies for all outgoing connections. Presumably you already
| have reverse proxies in place for the incoming. There is no
| need for NAT if all the traffic is taken care of in higher
| layers. This reduces your consideration to the internet-
| facing forward- and reverse-proxies only.
|
| Sounds like you already have bittorrent figured out via VPN
| (Wireguard I guess? Well there we have one more UDP exit-
| point to consider).
|
| BTW, I largely agree with your sentiment: Benefit of
| (especially migrating to) IPv6/DS for individual networks is
| often unclear or questionable and metadata privacy is a valid
| consideration where I believe correct solutions are not
| readily available and understood even by your well-
| intentioned and seasoned senior admins. Maybe globally the
| number of people who will get this right ranges in the 1000s?
| 10,000s if we're lucky? How many networks do we need to
| migrate again for "IPv4 to die"?
|
| I guess the only way forward is for more people to do that
| migration and share their findings and solutions, though ;)
| mindslight wrote:
| The general ignorance of the privacy benefits of NAT is
| what I'm reacting against too. It's certainly regrettable
| that end users are forced into NAT [0], but since then a
| shameless surveillance industry has cropped up, looking to
| exploit every bit of identifying information that it can.
| And it seems that calls for native IPv6 with everything
| having its own distinct address generally just ignore the
| practical privacy implications.
|
| It certainly seems _possible_ to get a NAT-equivalent
| privacy from properly set up SLAAC. Although a sibling
| comment says that the proposal for variable length prefixes
| was just submitted _this year_?!? Equivalent privacy would
| also require things like consumer VPN providers allowing
| you to request a few new addresses every few minutes,
| whereas NAT makes a shared uniform distribution the
| default.
|
| Using a proxy instead of NAT is a good point, although
| there are certainly reasons I moved towards managing egress
| flows at the packet level with VMs rather than configuring
| software to play nice with proxies. And spiritually I would
| say that a proxy is an even more heavyweight version of NAT
| one layer up.
|
| [0] Although I don't personally think the web would have
| developed any less centralized without NAT as many people
| like to imagine
| jaimex2 wrote:
| IPv6 needs to die also. It had more than enough time to become
| dominant and has just floundered.
| kazinator wrote:
| You need NAT (or something else that is worse in some respects,
| like port forwarding) in any situation in which your subnet is
| given only one address upstream, even if it is an IPv6 address.
| paulddraper wrote:
| Yeah, but there's no reason to do that with IPv6
| xxpor wrote:
| If your ISP doesn't do PD with v6, their implementation
| sucks. Even my crappy 6rd setup from CenturyLink gives me
| iirc an entire /48.
| midasuni wrote:
| Many ISPs suck. That's not controversial.
|
| We have to deal with the world we live in, not the world
| we'd like.
| pantalaimon wrote:
| That's why variable length SLAAC has been proposed
|
| https://datatracker.ietf.org/doc/draft-mishra-6man-
| variable-...
| growse wrote:
| > Many ISPs suck. That's not controversial.
|
| > We have to deal with the world we live in, not the
| world we'd like.
|
| No we don't. Some choose to just put up with shittiness,
| others enact change.
| toast0 wrote:
| Priorities. I don't have to put up with PPPoE in 2023,
| but it's a hell of a lot less expensive than pulling
| munifiber to my garage (and the monthly fees for
| munifiber are higher too, so there's no point in time
| where it makes economic sense), and consistency and
| stable addressing are currently winning over the promise
| of 5G/LEO satellite.
| growse wrote:
| Sure, but you're choosing that prioritisation. It's not
| being forced on you.
| kazinator wrote:
| OP here.
|
| Your "ISP" is a sysadmin at work who gives you one
| address to your cube.
|
| You otherwise like the work and the team, and the
| compensation is fine.
|
| Now what?
| growse wrote:
| > Now what?
|
| You advocate for change. You make the case.
|
| You might not win the battle, but you're by no means
| forced to accept the status quo. The more who fight the
| battle, the more win. The more win, the faster progress,
| which benefits us all.
| kazinator wrote:
| I set up NAT, I move on.
|
| If you can solve a problem technically, without involving
| people, that is best.
| trustingtrust wrote:
| You'll hate CG-NAT even more then.
| lorenzo95 wrote:
| Heh... it's all IPv6 ULA here with NAT66.
| i80and wrote:
| The first time I encountered CGNAT was such a rude shock. I
| don't think it should be legal to market it as "internet" to
| consumers
| nanmu42 wrote:
| Good post.
|
| Coincidentally, I was struggling with Netfilter this weekend to
| enable transparent proxy on my OpenWRT router.
|
| For the curious, the go-to resources for Netfilter are:
|
| 1. https://wiki.nftables.org/wiki-nftables/index.php/Main_Page
|
| 2. https://www.netfilter.org/projects/nftables/manpage.html
| viopq wrote:
| It's refreshing to see a "how does" which actually drills down
| through layers of abstraction all the way to the source code.
| Nicely explained and very informative!
| peter_l_downs wrote:
| I came here to write this. Routing and networking is still
| confusing for me and all the writing about it is usually very
| "abstract" to me. A hands-on example like this one is really
| appreciated. Nice work, OP. I'll try to do it myself and follow
| along.
|
| EDIT: one of the only other posts about this stuff that has
| made much sense to me is this one from Tailscale. It contains
| lots of "worked out examples" that really make it clear how
| everything fits together.
|
| https://tailscale.com/blog/how-nat-traversal-works/
| mindslight wrote:
| IME if you're digging into the finer points of netfilter, you
| eventually run up against the limits of published
| documentation and _have_ to dig into the source code to
| figure some things out.
| kazinator wrote:
| Since there is no port in ICMP, NAT doesn't have to deal with the
| problem of sending the ICMP echo reply back to the correct port.
|
| ICMP echo requests have an ID, and that's effectively the same as
| a source port number.
|
| Correct NAT handling of ICMP echo has to remap the ID in both
| directions, the same way that correct handling of UDP remaps the
| source port.
|
| Reason being, if the machine behind NAT is being pinged at the
| same time by two different hosts, and they happen to use the same
| identifiers, then the replies are ambiguous.
|
| Another possibility is not to rewrite the identifiers, but keep a
| list of remote machines associated with each ID. When there is a
| clashing ID, the list contains two or more entries (remote IP
| addresses). So then, when a reply is received from the machine
| behind the NAT gateway, the NAT chooses one of the entries in the
| list (say, the least recently added one) and sends the reply to
| that machine, and then removes the entry.
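The "ID is effectively a source port" point can be illustrated with a toy translation table. This is a sketch of the general idea, not netfilter's code. It handles the outbound case: two inside hosts ping the same remote host with the same identifier, and the NAT keeps the original ID when it is free for that remote host, otherwise it picks an unused one. Entry timeouts, checksum fixups and real packet parsing are all omitted, and the `Echo` and `IcmpEchoNat` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Echo:
    src: str
    dst: str
    ident: int  # ICMP echo identifier (16 bits)
    seq: int    # ICMP echo sequence number


class IcmpEchoNat:
    """Toy source NAT that treats the echo identifier like a UDP source port."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        # (inside host, remote host, inside ident) -> ident used on the outside
        self._out: Dict[Tuple[str, str, int], int] = {}
        # (remote host, outside ident) -> (inside host, inside ident)
        self._in: Dict[Tuple[str, int], Tuple[str, int]] = {}

    def _allocate(self, remote: str, ident: int) -> int:
        # Keep the original identifier if no one else uses it towards this remote
        # host; otherwise take the first free one (loosely like port preservation).
        if (remote, ident) not in self._in:
            return ident
        for candidate in range(1 << 16):
            if (remote, candidate) not in self._in:
                return candidate
        raise RuntimeError("all 65536 identifiers are in use for this remote host")

    def outbound(self, req: Echo) -> Echo:
        key = (req.src, req.dst, req.ident)
        if key not in self._out:
            outside_id = self._allocate(req.dst, req.ident)
            self._out[key] = outside_id
            self._in[(req.dst, outside_id)] = (req.src, req.ident)
        return Echo(self.public_ip, req.dst, self._out[key], req.seq)

    def inbound(self, reply: Echo) -> Optional[Echo]:
        mapping = self._in.get((reply.src, reply.ident))
        if mapping is None:
            return None  # no state for this reply: drop it
        inside_host, inside_id = mapping
        return Echo(reply.src, inside_host, inside_id, reply.seq)
```

If `192.168.1.10` and `192.168.1.11` both send ID 1234 to `8.8.8.8`, the first keeps 1234 on the outside and the second gets a different ID. Each reply is then mapped back to the right host with its original ID restored.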
| zackmorris wrote:
| I wonder if ping could be abused to send short messages for p2p
| networking over UDP without a central server to handle NAT
| busting. Looks like someone figured the message part out:
|
| https://stackoverflow.com/questions/31857419/how-to-send-a-m...
|
| Unfortunately ping is handled by the OS so apps on the peer IPs
| wouldn't be able to read the messages.
|
| I wonder if it's time to provide hooks to some of these services
| in user space to make true p2p under double-ended NAT possible.
| At least a readonly event stream or something. It just feels like
| the barriers preventing that are entirely artificial now.
| Bluecobra wrote:
| As IPv6 gains more and more adoption, this should become less of
| an issue if everyone has a publicly routable IP and can avoid
| NAT altogether.
| freedomben wrote:
| Minor technical correction, but ping is ICMP rather than UDP.
|
| But I have seen data exfiltration strategies and other
| communication that uses ping! Nowadays I think it would be
| nearly impossible for p2p because most firewall default configs
| will silently drop all ICMP, including pings.
| lazide wrote:
| Nod, I remember it not being as effective/easy to hide as
| exfiltration over UDP/DNS too, as there was always less
| background noise to hide in. That said, I found this with a
| quick search - https://github.com/utoni/ptunnel-ng for those
| who still want to do it. A number of hotels and captive
| portals still let pings through relatively unmolested even if
| they play tricks with UDP/TCP.
|
| Any significant data over ICMP will always stick out though
| if anyone is doing analysis. Which isn't often, frankly, in
| situations like I described, but...
| jandrese wrote:
| Note that blanket dropping of ICMP will break Path MTU
| Discovery (PMTUD) so you had better not be tunneling or
| encapsulating TCP traffic.
| zinekeller wrote:
| Actually, ICMP-based PMTUD is almost dead in IPv4 due to
| this exact problem (ICMP isn't treated as a "protected"
| protocol, i.e. one that is required for IPv4 connectivity), so
| most _actual_ services tend to do MTU discovery purely using
| UDP or even TCP
| (https://datatracker.ietf.org/doc/html/rfc4821).
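RFC 4821 (Packetization Layer PMTUD) replaces "wait for ICMP Fragmentation Needed" with "send probes of a chosen size and see what gets through". Below is a minimal sketch of the search half, assuming a hypothetical `probe(size)` callback that reports whether a probe of that size was acknowledged. A real implementation also has to tell probe loss apart from ordinary congestion loss and keep re-probing over time, which is omitted here. Binary search is just one reasonable search strategy.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] that probe() reports as delivered.

    Assumes probe(low) succeeds and delivery is monotonic in size
    (everything up to the path MTU gets through, nothing larger does).
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if not probe(low):
        raise ValueError("even the smallest probe was lost")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid was lost: the answer is smaller
    return low
```

Simulating a tunnel with a 1420-byte path MTU, `search_path_mtu(lambda size: size <= 1420, 1200, 1500)` returns 1420 without a single ICMP message being involved.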
| oasisaimlessly wrote:
| It exists: https://samy.pl/pwnat/
|
| (from top comment)
| mindslight wrote:
| Interesting idea. It would seem that 'id' is effectively the
| equivalent of (sport, dport), but 16 bits is a much smaller
| space than 32.
|
| But isn't the main problem with NAT punching that it requires
| activity on both ends to create a connection? Thus it always
| requires a coordination server to let node T (target) know that
| node S (source) is trying to talk to it.
|
| You've got me thinking though. I wonder if there is a way to do
| this with ICMP routing messages - unreachable, TTL expired,
| etc. You can traceroute to some IP address, and get back
| packets from _other_ arbitrary IP addresses, and this generally
| works through NAT. I'm envisioning that a host T wanting
| incoming connections would pick a random "dummy" IP address,
| publish (router IP, dummy IP) as its identity, and periodically
| send packets to the dummy IP address. Now a host S that wants
| to talk to T might be able to send an ICMP TTL-expired to T's
| router, pertaining to the dummy address. The router should see
| this and forward the packet to T.
|
| Of course this is contingent on whether IP addresses in ICMP
| fields are ingress policed the way the addresses in the IP
| header have become.
|
| (edit: hah. There is now a top-level comment pointing to an
| implementation of this idea)
| riffic wrote:
| ping is icmp not udp
| throwawaymaths wrote:
| It's super confusing because you can use udp to read icmp
| packets (but not send, iirc), and i might be wrong, but i
| remember seeing tuts that did this!!
| throwawaymaths wrote:
| Getting downvoted, so:
|
| https://stackoverflow.com/questions/13087097/how-to-get-
| icmp...
|
| Using a UDP socket is the "classic" way of implementing
| ping on low-privilege systems.
| yencabulator wrote:
| You can kindly ask the kernel networking stack to inform
| you of errors, but that is not the same as "using udp to
| read icmp packets".
| dgl wrote:
| "udp" in this context means unprivileged datagram, not
| UDP the protocol. For some reason Go uses the confusing
| "udp" name in parts of its API. The docs for this kind of
| socket seem to only exist in the kernel commit:
| https://lwn.net/Articles/420800/
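For reference, here is a sketch of the unprivileged "ping socket" from that kernel commit, used from Python: `SOCK_DGRAM` with `IPPROTO_ICMP` rather than a raw socket. It only works if the caller's group falls inside the `net.ipv4.ping_group_range` sysctl, which is disabled by default on some distributions. As far as I understand, the kernel substitutes its own identifier (tied to the socket, much like a local port) and fills in the checksum, and received data starts at the ICMP header without an IPv4 header. The packet builder below is still useful for raw sockets or for inspection.

```python
import socket
import struct

ICMP_ECHO_REQUEST = 8


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """ICMP echo request: type, code, checksum, identifier, sequence, then payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def ping_once(host: str, payload: bytes = b"hello", timeout: float = 2.0) -> bytes:
    """Send one echo request over a Linux ping socket and return the raw ICMP reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP) as sock:
        sock.settimeout(timeout)
        # The identifier passed here is (as I understand it) replaced by the kernel.
        sock.sendto(echo_request(0, 1, payload), (host, 0))
        reply, _addr = sock.recvfrom(2048)
        return reply
```

On a machine where ping sockets are enabled, `ping_once("127.0.0.1")` should return an echo reply (type 0) carrying the same payload. Otherwise the `socket()` call fails with a permission error.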
| voxic11 wrote:
| You might be interested in https://samy.pl/pwnat/
| > Specifically, when the server starts up, it begins sending
| > fixed ICMP echo request packets to the fixed address 3.3.3.3.
| > We expect that these packets won't be returned. Now, 3.3.3.3
| > is *not* a host we have any access to, nor will we end up
| > spoofing it. Instead, when a client wants to connect, the
| > client (which knows the server IP address) sends an ICMP Time
| > Exceeded packet to the server. The ICMP packet includes the
| > "original" fixed packet that the server was sending to
| > 3.3.3.3. The packet is INSIDE the computer. This hardcoded
| > packet is built into pwnat and acts as an identifier for
| > pwnat. Why? Well, the client is pretending to be a hop on the
| > Internet, politely telling the server that its original "ICMP
| > echo request" packet couldn't be delivered. Your NAT, being
| > the gapingly open device it is, is nice enough to notice that
| > the packet *inside* the ICMP time exceeded packet matches the
| > packet the server sent out. Your NAT then forwards the ICMP
| > time exceeded back to the server behind the NAT, *including*
| > the full IP header from the client, thus allowing the server
| > to know what the client IP address is!
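To see what the pwnat client actually sends, here is a sketch that builds an ICMP Time Exceeded message (type 11, code 0) quoting the "original" echo request: the original IPv4 header plus the first 8 bytes of its payload, which for an echo request is the whole ICMP header, identifier included. That quoted header is what a NAT can compare against its state entry for the outgoing ping. The public address `198.51.100.7`, the identifier, and the payload are hypothetical. These are not pwnat's real hardcoded bytes, and actually sending the result would need a raw socket (`CAP_NET_RAW`), which is not shown.

```python
import ipaddress
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64, ident: int = 0) -> bytes:
    """Minimal 20-byte IPv4 header (no options) carrying ICMP, protocol 1."""
    header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, ident, 0, ttl, 1, 0,
        ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed,
    )
    return header[:10] + struct.pack("!H", inet_checksum(header)) + header[12:]


def echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def time_exceeded(original_datagram: bytes) -> bytes:
    """Time Exceeded quoting the original IP header plus the first 8 payload bytes."""
    ihl = (original_datagram[0] & 0x0F) * 4
    quoted = original_datagram[: ihl + 8]
    header = struct.pack("!BBHI", 11, 0, 0, 0)  # type, code, checksum, unused
    csum = inet_checksum(header + quoted)
    return struct.pack("!BBHI", 11, 0, csum, 0) + quoted


# Hypothetical: the echo request as it left the server's NAT towards 3.3.3.3.
echo = echo_request(0x1234, 1, b"pwnat-demo")
original = ipv4_header("198.51.100.7", "3.3.3.3", len(echo)) + echo
message = time_exceeded(original)
```

The client would address `message` to the server's public IP. If the NAT recognises the quoted echo request, it forwards the error inward, and the server learns the client's address from the outer IP header.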
| throwawaymaths wrote:
| TL;DR (but do read it, it's very good): there's an id field in
| the ICMP packet, and netfilter is aware of ICMP packets? Frames?
| as a "special case".
| demandingturtle wrote:
| [dead]
| Beinglis23 wrote:
| When a ping is sent from a device on a local network to a device
| on the internet, the router performing NAT rewrites the source
| address of the ping to its public IP address and rewrites the ID
| field of the ICMP packet to a unique value. When the response is
| received, the router uses the unique ID value to forward the
| response to the correct device on the local network.
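One detail this summary skips: rewriting the ID field also invalidates the ICMP checksum, which covers the whole ICMP message. A NAT can patch it incrementally, RFC 1624 style, instead of re-summing the payload. Below is a minimal sketch of that update applied to the echo identifier. The function names are hypothetical, and it is illustrative rather than taken from any particular NAT implementation.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum, used here only to check the incremental result."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ones_add(a: int, b: int) -> int:
    """16-bit one's complement addition (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def update_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), in one's complement arithmetic."""
    s = ones_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word & 0xFFFF)
    return ~s & 0xFFFF


def rewrite_echo_id(icmp: bytes, new_id: int) -> bytes:
    """Replace the echo identifier (bytes 4-5) and patch the checksum (bytes 2-3)."""
    _type, _code, csum, old_id = struct.unpack("!BBHH", icmp[:6])
    new_csum = update_checksum(csum, old_id, new_id)
    return icmp[:2] + struct.pack("!HH", new_csum, new_id) + icmp[6:]
```

The cost is constant regardless of payload size. A 1400-byte ping gets its ID rewritten by touching 4 bytes, and the result matches a full recomputation.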
| baq wrote:
| Taking this thought just a tiny bit further, this is changing a
| stateless protocol to a stateful one.
| mannyv wrote:
| You're confusing tracking the packets with protocol. It's not
| changing ICMP, it's tracking ICMP packets. That's a totally
| different thing.
| littlecranky67 wrote:
| Is or was a thing with NAT. Linux also comes with stateful
| modules (ip_conntrack*) to track and rewrite higher level
| protocols, such as FTP control connections.
| slt2021 wrote:
| NAT stands for Network Address Translation, which means a NAT
| device maintains a translation table of internal IPs to
| external, so that it can return response packets coming from
| Internet to a proper destination on the internal network.
|
| By definition NAT will maintain state which is translation
| table. Now that table can be dynamic or static, but it
| doesn't change the fact that there will be some state to
| maintain.
| starfallg wrote:
| Any NAT that is not statically mapping IP addresses or ports
| 1-to-1 will require connections to be tracked and hence makes
| it stateful on the side after the translation (usually
| outside).
|
| Hence you do need state syncing between firewalls in order
| for NAT connections to fail over correctly, unless it's a
| static one-to-one mapping of one range onto another range,
| for example.
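The stateless exception can be shown in a few lines. With a static one-to-one mapping of one range onto an equally sized range, the translation is a pure function of the address, so any firewall in a failover pair computes the same answer without a shared table. Here is a sketch under that assumption, with made-up example ranges and a hypothetical function name.

```python
import ipaddress


def static_map(addr: str, from_range: str, to_range: str) -> str:
    """Map addr to the same offset in another, equally sized range (1:1 static NAT)."""
    src_net = ipaddress.ip_network(from_range)
    dst_net = ipaddress.ip_network(to_range)
    if src_net.num_addresses != dst_net.num_addresses:
        raise ValueError("both ranges must be the same size")
    ip = ipaddress.ip_address(addr)
    if ip not in src_net:
        raise ValueError(f"{addr} is not inside {from_range}")
    offset = int(ip) - int(src_net.network_address)
    return str(dst_net.network_address + offset)
```

`static_map("10.0.0.42", "10.0.0.0/24", "203.0.113.0/24")` gives `203.0.113.42`, and swapping the ranges reverses it. As soon as many inside addresses share one outside address, the function can no longer be inverted from the packet alone, and state (with syncing) is needed again.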
| devman0 wrote:
| This isn't really specific to NAT either: connection
| tracking is required for most firewalls as well, even if NAT
| isn't in play, just to implement the most basic "allow
| related,established" rule, even (and especially) for what
| would normally be connectionless protocols.
| starfallg wrote:
| Yes, tracking the state of connections (e.g. TCP) is
| needed to enforce rules on OSI layers 4 - 7. That's kinda
| the typical scenario when we think of connection
| tracking and stateful enforcement of rules.
|
| I was just pointing out when NAT also requires connection
| tracking (i.e. when the NAT table needs to be built
| dynamically, as opposed to statically mapped).
| p1esk wrote:
| Why not use the source private IP instead of the "unique
| value"?
| justsomehnguy wrote:
| Or think about it the proper way: how does an operating system
| distinguish between two different ICMP 'talks' to the same
| destination?
|
| Bam, you only need one computer and wireshark/tcpdump.
|
| Sure, the article is nice and probably enlightening for
| someone who has never even thought about this and doesn't have
| any networking understanding... honestly it's more about how to
| build a proper network lab and dig into the sources than about
| thinking it through.
| rfmoz wrote:
| But is the ID in the ICMP header, or does it belong to the IP part?
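To the question above: the echo identifier lives in the ICMP echo header (RFC 792), bytes 4-5 after the checksum. The IPv4 header has its own, unrelated 16-bit Identification field (RFC 791), used for fragment reassembly. As I understand it, the ICMP one is what Linux connection tracking treats like a port. Here is a small parsing sketch that pulls out both, so the difference is visible. The function name is hypothetical.

```python
import struct


def parse_ipv4_icmp_echo(packet: bytes) -> dict:
    """Extract the two different 'ID' fields from an IPv4 packet carrying an ICMP echo."""
    version, ihl = packet[0] >> 4, (packet[0] & 0x0F) * 4
    if version != 4 or packet[9] != 1:
        raise ValueError("expected IPv4 carrying ICMP (protocol 1)")
    # IPv4 header bytes 4-5: Identification, used for fragment reassembly.
    (ip_identification,) = struct.unpack("!H", packet[4:6])
    # ICMP echo header: type, code, checksum, identifier, sequence number.
    icmp_type, _code, _csum, echo_id, echo_seq = struct.unpack("!BBHHH", packet[ihl:ihl + 8])
    if icmp_type not in (0, 8):
        raise ValueError("not an echo request (8) or echo reply (0)")
    return {
        "ip_identification": ip_identification,
        "icmp_type": icmp_type,
        "icmp_identifier": echo_id,
        "icmp_sequence": echo_seq,
    }
```

Capturing a ping with tcpdump and feeding the raw IPv4 bytes through this typically shows the IP Identification changing from packet to packet, while the ICMP identifier stays fixed for the whole `ping` run and the sequence number counts up.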
___________________________________________________________________
(page generated 2023-09-10 23:00 UTC)