[HN Gopher] How does Linux NAT a ping?
       ___________________________________________________________________
        
       How does Linux NAT a ping?
        
       Author : willdaly
       Score  : 209 points
       Date   : 2023-09-10 13:28 UTC (9 hours ago)
        
 (HTM) web link (devnonsense.com)
 (TXT) w3m dump (devnonsense.com)
        
       | jjoonathan wrote:
       | NAT is such a trashy abstraction. IPv4 needs to die.
        
         | riffic wrote:
         | Be mindful of the Lindy effect: the observation that the future
         | longevity of non-perishable things like a technology or an idea
         | is proportional to their current age. IPv4, due to its age,
         | will likely be around for quite some time to come.
         | 
         | https://en.wikipedia.org/wiki/Lindy_effect
        
         | midasuni wrote:
         | I have a few devices on my home internet, on a handful of
         | 192.168 subnets
         | 
         | The other week I moved my ISP. The AS my house belonged to
         | obviously changed to the new ISP, and I got a new v4 IP
         | 
         | All I had to do was update my WAN router to forward traffic
         | from the new IP.
         | 
         | Instead with ipv6 I would have to change every node on my
         | network, update my internal DNS.
         | 
         | Now in theory I could have my own /48 which I take with me.
         | That relies on my new ISP being willing to advertise it (which
         | my current one does) but it's not particularly common.
         | 
         | However a week ago my phone line was cut. I got a 5g mifi out
         | and moved my wan connectivity through that until the cable was
         | fixed. Again a nice simple masquerade on that interface and all
         | was good (well not that good - very poor signal where I live)
         | 
         | But the elephant in the room is of course all that ipv6 stuff
         | aside, I still need to run a dual stack (or use trashy nat
         | abstractions). It increases my work for no benefit.
         | 
         | But talking about work, how about there?
         | 
         | I have a fleet of vehicles on internal 172.16/12 subnets, they
         | plug together and route to each other, and route from where
         | they are via a variety of vpn connectivity (hoping that at
         | least one method will work, as there's rarely a signal in the
         | basements these park)
         | 
         | If I moved them to ipv6 then again I'm back to having to move
         | my /48s. Except these vehicles get internet from various
         | sporting venues - most of which struggle to turn off MITM/443
         | or unblock UDP, that's just not going to work in a world where
         | they turn up at 10am Saturday morning and need to be working 2
         | hours later.
         | 
         | What business benefit is there for me to double the workload
         | and double the risk by moving to dual stack?
        
           | noinsight wrote:
           | You could be using IPv6 ULA addresses internally on your home
           | network to have static addressing. The real solution is
           | moving to DNS names though with your router maintaining them
           | based on DHCP leases or just using multicast DNS (Zeroconf).
           | 
           | In the future you can probably go "IPv6-mostly" with a CLAT
           | engine to ditch dual-stack:
           | https://blog.apnic.net/2022/11/21/deploying-ipv6-mostly-
           | acce...
        
             | qhwudbebd wrote:
             | > In the future you can probably go "IPv6-mostly" with a
             | CLAT engine
             | 
             | ...although there still isn't any kernel support for the
             | necessary SIIT v4<->v6 translation, so to implement CLAT
             | you end up using unmaintained (and unmergeably bad) out-of-
             | tree kernel modules or unmaintained (and slow) userspace
             | daemons hanging off a tuntap interface.
        
             | dmatech wrote:
             | You could, but now you have three addresses per node
             | instead of one. Plus, the mechanisms for assigning those
             | addresses are weird compared to DHCP and static assignment.
             | I get that it facilitates packets being routed reliably,
             | but some of us want maintainable firewall rules that don't
             | have to deal with IP addresses changing out of the blue.
        
           | jandrese wrote:
           | With IPv6 you would do stateless autoconfiguration, so there
           | would be no manual setting of your addresses. The router
           | would advertise the new prefix and everything would just use
           | it.
           | 
           | There would be no DNS configuration at all, all local
           | machines would use anycast DNS for the services and a well
           | known server for Internet addresses.
           | 
           | One of the primary goals of IPv6 was to avoid needing manual
           | configuration of anything on the network. It is supposed to
           | be as automated as possible.
        
             | midasuni wrote:
             | And now I can't find anything because mdns doesn't work,
             | half my kit won't take dns entries, more fragility from
             | systems which don't exist, and of course all my open
             | sessions on local networks break as ip addresses change,
             | not to mention all my WireGuard sessions.
        
             | 3np wrote:
             | > There would be no DNS configuration at all, all local
             | machines would use anycast DNS for the services and a well
             | known server for Internet addresses.
             | 
             | Assumptions and dragons be here.
        
               | midasuni wrote:
               | MDNS, specifically designed for use on a single vlan, so
               | useless
        
           | pixl97 wrote:
           | I do believe there is some kind of 1:1 NAT with IPv6 these
           | days, which is way better than 1:Many of IPv4. There are so
           | many potentially useful applications that are DOA because of
           | v4 NAT being everywhere.
        
             | midasuni wrote:
             | Those applications are DOA because of firewall
             | administrators that barely allow tcp/443 through.
        
         | littlecranky67 wrote:
         | Not sure IPv6 will fix this. Technically, yes it does. But
         | major providers only assigning a /64 to a home user (and
         | charging hefty fees for "business use" /48) already leads to
         | IPv6 NAT or segmenting the /64 further - which shouldn't be done.
        
           | lazide wrote:
           | Most seem to have stopped and are handing out /48's in my
           | experience. Do you know any not doing that still?
        
         | mindslight wrote:
         | Is there a better way to not unnecessarily leak addressing
         | metadata to adversarial remote nodes and middle boxes?
         | 
         | IPv6 with assigning end users a whole /64 and end-devices
         | continually churning through privacy addresses is a start. But
         | even then some form of NAT is still required to nimbly use
         | source prefixes from different horizon providers - eg to avoid
         | spilling your geographic location or opening yourself up to
         | low-effort legal shakedowns.
         | 
         | An example: on my local network I've got an everyday web
         | browsing VM and a torrent VM. They each have static 192.168.x.x
         | addresses, both so I can ssh in for administration and also to
         | control their view of network services. They each see a
         | completely different Internet horizon through the router - the
         | web browsing goes out from a rotating datacenter IP, and the
         | torrent one goes out from a consumer VPN. Each of those
         | outgoing horizons uses NAT - any of my hosts using that
         | rotating data center IP appears the same, and any of my hosts
         | using the consumer VPN appears the same as every other customer
         | using that same VPN node.
         | 
         | What is the no-NAT equivalent of this? Make that rotating data
         | center IP and VPN external IP into subnet allocations, somehow
         | feed that addressing information back to the hosts that are
         | using it, and dual-home each VM with two routable addresses?
         | For equivalent mixing on the consumer VPN there would also need
         | to be some ARP-like protocol that let me continually rotate the
         | address.
        
           | jandrese wrote:
           | Why not just use a VPN in both cases? That's more or less
           | what your NAT solution is doing, except without the
           | encryption to the data center.
        
             | mindslight wrote:
             | It _is_ a wireguard tunnel to the data center, but my
             | comment was focused on the addressing.
        
           | 3np wrote:
           | > What is the no-NAT equivalent of this?
           | 
           | At least for web-browsing and other HTTP/TCP use-cases: Cut
           | off internet from your hosts and use centralized local
           | proxies for all outgoing connections. Presumably you already
           | have reverse proxies in place for the incoming. There is no
           | need for NAT if all the traffic is taken care of in higher
           | layers. This reduces your consideration to the internet-
           | facing forward- and reverse-proxies only.
           | 
           | Sounds like you already have bittorrent figured out via VPN
           | (Wireguard I guess? Well there we have one more UDP exit-
           | point to consider).
           | 
           | BTW, I largely agree with your sentiment: Benefit of
           | (especially migrating to) IPv6/DS for individual networks is
           | often unclear or questionable and metadata privacy is a valid
           | consideration where I believe correct solutions are not
           | readily available and understood even by your well-
           | intentioned and seasoned senior admins. Maybe globally the
           | number of people who will get this right ranges in the 1000s?
           | 10,000s if we're lucky? How many networks do we need to
           | migrate again for "IPv4 to die"?
           | 
           | I guess the only way forward is for more people to do that
           | migration and share their findings and solutions, though ;)
        
             | mindslight wrote:
             | The general ignorance of the privacy benefits of NAT is
             | what I'm reacting against too. It's certainly regrettable
             | that end users are forced into NAT [0], but since then a
             | shameless surveillance industry has cropped up, looking to
             | exploit every bit of identifying information that it can.
             | And it seems that calls for native IPv6 with everything
             | having its own distinct address generally just ignore the
             | practical privacy implications.
             | 
             | It certainly seems _possible_ to get a NAT-equivalent
             | privacy from properly set up SLAAC. Although a sibling
             | comment says that the proposal for variable length prefixes
             | was just submitted _this year_?!? Equivalent privacy would
             | also require things like consumer VPN providers allowing
             | you to request a few new addresses every few minutes,
             | whereas NAT makes a shared uniform distribution the
             | default.
             | 
             | Using a proxy instead of NAT is a good point, although
             | there are certainly reasons I moved towards managing egress
             | flows at the packet level with VMs rather than configuring
             | software to play nice with proxies. And spiritually I would
             | say that a proxy is an even more heavyweight version of NAT
             | one layer up.
             | 
             | [0] Although I don't personally think the web would have
             | developed any less centralized without NAT as many people
             | like to imagine
        
         | jaimex2 wrote:
         | IPv6 needs to die also. It had more than enough time to become
         | dominant and has just floundered.
        
         | kazinator wrote:
         | You need NAT (or something else that is worse in some respects,
         | like port forwarding) in any situation in which your subnet is
         | given only one address upstream, even if it is an IPv6 address.
        
           | paulddraper wrote:
           | Yeah, but there's no reason to do that with IPv6
        
           | xxpor wrote:
           | If your ISP doesn't do PD with v6, their implementation
           | sucks. Even my crappy 6rd setup from CenturyLink gives me
           | iirc an entire /48.
        
             | midasuni wrote:
             | Many ISPs suck. That's not controversial.
             | 
             | We have to deal with the world we live in, not the world
             | we'd like.
        
               | pantalaimon wrote:
               | That's why variable length SLAAC has been proposed
               | 
               | https://datatracker.ietf.org/doc/draft-mishra-6man-
               | variable-...
        
               | growse wrote:
               | > Many ISPs suck. That's not controversial.
               | 
               | > We have to deal with the world we live in, not the
               | world we'd like.
               | 
               | No we don't. Some choose to just put up with shittiness,
               | others enact change.
        
               | toast0 wrote:
               | Priorities. I don't have to put up with PPPoE in 2023,
               | but it's a hell of a lot less expensive than pulling
               | munifiber to my garage (and the monthly fees for
               | munifiber are higher too, so there's no point in time
               | where it makes economic sense), and consistency and
               | stable addressing is currently winning over the promise
               | of 5g/leo satellite.
        
               | growse wrote:
               | Sure, but you're choosing that prioritisation. It's not
               | being forced on you.
        
               | kazinator wrote:
               | OP here.
               | 
               | Your "ISP" is a sysadmin at work who gives you one
               | address to your cube.
               | 
               | You otherwise like the work and the team, and the
               | compensation is fine.
               | 
               | Now what?
        
               | growse wrote:
               | > Now what?
               | 
               | You advocate for change. You make the case.
               | 
               | You might not win the battle, but you're by no means
               | forced to accept the status quo. The more who fight the
               | battle, the more win. The more win, the faster progress,
               | which benefits us all.
        
               | kazinator wrote:
               | I set up NAT, I move on.
               | 
               | If you can solve a problem technically, without involving
               | people, that is best.
        
         | trustingtrust wrote:
         | You'll hate CG-NAT even more then.
        
           | lorenzo95 wrote:
           | heh ... It's all IPv6 ULA here with Nat66
        
           | i80and wrote:
           | The first time I encountered CGNAT was such a rude shock. I
           | don't think it should be legal to market it as "internet" to
           | consumers
        
       | nanmu42 wrote:
       | Good post.
       | 
       | Coincidentally, I was struggling with Netfilter this weekend to
       | enable transparent proxy on my OpenWRT router.
       | 
       | For the curious, the go-to resources for Netfilter are:
       | 
       | 1. https://wiki.nftables.org/wiki-nftables/index.php/Main_Page
       | 
       | 2. https://www.netfilter.org/projects/nftables/manpage.html
        
       | viopq wrote:
       | It's refreshing to see a "how does" which actually drills down
       | through layers of abstraction all the way to the source code.
       | Nicely explained and very informative!
        
         | peter_l_downs wrote:
         | I came here to write this. Routing and networking is still
         | confusing for me and all the writing about it is usually very
         | "abstract" to me. A hands-on example like this one is really
         | appreciated. Nice work, OP. I'll try to do it myself and follow
         | along.
         | 
         | EDIT: one of the only other posts about this stuff that has
         | made much sense to me is this one from Tailscale. It contains
         | lots of "worked out examples" that really make it clear how
         | everything fits together.
         | 
         | https://tailscale.com/blog/how-nat-traversal-works/
        
           | mindslight wrote:
           | IME if you're digging into the finer points of netfilter, you
           | eventually run up against the limits of published
           | documentation and _have_ to dig into the source code to
           | figure some things out.
        
       | kazinator wrote:
       | Since there is no port in ICMP, NAT doesn't have to deal with the
       | problem of sending the ICMP echo reply back to the correct port.
       | 
       | ICMP echo requests have an ID, and that's effectively the same as
       | a source port number.
       | 
       | Correct NAT handling of ICMP echo has to remap the ID in both
       | directions, the same way that correct handling of UDP remaps the
       | source port.
       | 
       | Reason being, if the machine behind NAT is being pinged at the
       | same time by two different hosts, and they happen to use the same
       | request numbers, then it is ambiguous.
       | 
       | Another possibility is not to rewrite the identifiers, but keep a
       | list of remote machines associated with each ID. When there is a
       | clashing ID, the list contains two or more entries (remote IP
       | addresses). So then, when a reply is received from the machine
       | behind the NAT gateway, the NAT chooses one of the entries in the
       | list (say, the least recently added one) and sends the reply to
       | that machine. Then removes the entry.
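
The ID remapping described above can be sketched as a toy lookup table. This is a minimal illustration rather than netfilter's implementation: the class name `EchoNat`, its method names, and the documentation-range addresses are all invented for the example. It covers the outbound case, where two hosts behind the NAT ping the same remote address with the same echo ID.

```python
from dataclasses import dataclass, field


@dataclass
class EchoNat:
    """Toy NAT table for ICMP echo, keyed the way a UDP NAT keys on ports."""

    public_ip: str
    # (remote_ip, external_id) -> (internal_ip, original_id)
    inbound: dict = field(default_factory=dict)
    # (internal_ip, original_id, remote_ip) -> external_id
    outbound: dict = field(default_factory=dict)

    def translate_request(self, internal_ip, echo_id, remote_ip):
        """Rewrite an outgoing echo request; returns (source_ip, echo_id)."""
        key = (internal_ip, echo_id, remote_ip)
        if key not in self.outbound:
            # Keep the original ID when it is free, otherwise probe upward
            # (wrapping at 16 bits) until an unused one is found.
            for offset in range(0x10000):
                candidate = (echo_id + offset) & 0xFFFF
                if (remote_ip, candidate) not in self.inbound:
                    break
            else:
                raise RuntimeError("no free echo ID for this remote host")
            self.outbound[key] = candidate
            self.inbound[(remote_ip, candidate)] = (internal_ip, echo_id)
        return self.public_ip, self.outbound[key]

    def translate_reply(self, remote_ip, echo_id):
        """Map an incoming reply to (internal_ip, original_id), or None."""
        return self.inbound.get((remote_ip, echo_id))


nat = EchoNat("203.0.113.1")
first = nat.translate_request("192.168.0.2", 7, "198.51.100.9")
second = nat.translate_request("192.168.0.3", 7, "198.51.100.9")
print(first, second)
print(nat.translate_reply("198.51.100.9", second[1]))
```

The second host's ID is rewritten because the first host already owns that ID towards the same remote address. A reply that matches no entry returns `None`, which a real NAT would treat as unsolicited traffic.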
        
       | zackmorris wrote:
       | I wonder if ping could be abused to send short messages for p2p
       | networking over UDP without a central server to handle NAT
       | busting. Looks like someone figured the message part out:
       | 
       | https://stackoverflow.com/questions/31857419/how-to-send-a-m...
       | 
       | Unfortunately ping is handled by the OS so apps on the peer IPs
       | wouldn't be able to read the messages.
       | 
       | I wonder if it's time to provide hooks to some of these services
       | in user space to make true p2p under double-ended NAT possible.
       | At least a readonly event stream or something. It just feels like
       | the barriers preventing that are entirely artificial now.
        
         | Bluecobra wrote:
         | As IPv6 gains more and more adoption this should become less of
         | an issue if everyone has a publicly routable IP and can avoid
         | NAT altogether.
        
         | freedomben wrote:
         | Minor technical correction, but ping is ICMP rather than UDP.
         | 
         | But I have seen data exfiltration strategies and other
         | communication that uses ping! Nowadays I think it would be
         | nearly impossible for p2p because most firewall default configs
         | will silently drop all ICMP, including pings.
        
           | lazide wrote:
           | Nod, I remember it not being as effective/easy to hide as
           | exfiltration over UDP/DNS too, as there was always less
           | background noise to hide in. That said, I found this with a
           | quick search - https://github.com/utoni/ptunnel-ng for those
           | who still want to do it. A number of hotels and captive
           | portals still let pings through relatively unmolested even if
           | they play tricks with UDP/TCP.
           | 
           | Any significant data over ICMP will always stick out though
           | if anyone is doing analysis. Which isn't often, frankly, in
           | situations like I described, but...
        
           | jandrese wrote:
           | Note that blanket dropping of ICMP will break Path MTU
           | Discovery (PMTUD) so you had better not be tunneling or
           | encapsulating TCP traffic.
        
             | zinekeller wrote:
             | Actually, ICMP-based PMTUD is almost dead in IPv4 due to
             | this exact problem (since ICMP isn't a "protected" protocol
             | which is required for IPv4 connectivity), most _actual_
             | services tend to do the MTU discovery purely using UDP or
             | even using TCP
             | (https://datatracker.ietf.org/doc/html/rfc4821)
        
         | oasisaimlessly wrote:
         | It exists: https://samy.pl/pwnat/
         | 
         | (from top comment)
        
         | mindslight wrote:
         | Interesting idea. It would seem that 'id' is effectively
         | the equivalent of (sport, dport), but 16 bits is a much smaller
         | space than 32.
         | 
         | But isn't the main problem with NAT punching that it requires
         | activity on both ends to create a connection? Thus it always
         | requires a coordination server to let node T (target) know that
         | node S (source) is trying to talk to it.
         | 
         | You've got me thinking though. I wonder if there is a way to do
         | this with ICMP routing messages - unreachable, TTL expired,
         | etc. You can traceroute to some IP address, and get back
         | packets from _other_ arbitrary IP addresses, and this generally
         | works through NAT. I'm envisioning a host T that wanted
         | incoming connections to pick a random "dummy" IP address,
         | publish (router IP, dummy IP) as its identity, and periodically
         | send packets to the dummy IP address. Now a host S that wants
         | to talk to T might be able to send an ICMP TTL-expired to T's
         | router, pertaining to the dummy address. The router should see
         | this and forward the packet to T.
         | 
         | Of course this is contingent upon if IP addresses in ICMP
         | fields are ingress policed the way the addresses in the IP
         | header have become.
         | 
         | (edit: hah. There is now a top-level comment pointing to an
         | implementation of this idea)
        
         | riffic wrote:
         | ping is icmp not udp
        
           | throwawaymaths wrote:
           | It's super confusing because you can use udp to read icmp
           | packets (but not send, iirc), and i might be wrong, but i
           | remember seeing tuts that did this!!
        
             | throwawaymaths wrote:
             | Getting downvoted, so:
             | 
             | https://stackoverflow.com/questions/13087097/how-to-get-
             | icmp...
             | 
             | Using a udp socket is the "classic" way of implementing
             | ping on low privilege systems
        
               | yencabulator wrote:
               | You can kindly ask the kernel networking stack to inform
               | you of errors, but that is not the same as "using udp to
               | read icmp packets".
        
               | dgl wrote:
               | "udp" in this context means unprivileged datagram, not
               | UDP the protocol. For some reason Go uses the confusing
               | "udp" name in parts of its API. The docs for this kind of
               | socket seem to only exist on the kernel commit:
               | https://lwn.net/Articles/420800/
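
For reference, a minimal sketch of the unprivileged ICMP datagram socket discussed above, assuming Linux and a `net.ipv4.ping_group_range` sysctl that includes the calling user's group. The helper names `build_echo_request` and `ping_once` are invented for the example. The ID and checksum are left as zero on the assumption that the kernel fills them in for this socket type; on a system that does not allow these sockets, `ping_once` simply returns `None`.

```python
import socket
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request


def build_echo_request(sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request with ID and checksum left as zero.

    Layout: type (1 byte), code (1), checksum (2), identifier (2), sequence (2).
    """
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0, sequence) + payload


def ping_once(host: str, timeout: float = 1.0):
    """Send one echo request without root; return the reply bytes or None."""
    try:
        # SOCK_DGRAM + IPPROTO_ICMP is the "ping socket", not a UDP socket.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
    except OSError:
        return None  # not permitted or not supported on this system
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_echo_request(1, b"hello"), (host, 0))
            data, _addr = sock.recvfrom(1024)
            return data
        except OSError:  # covers timeouts and unreachable hosts
            return None
```

Calling `ping_once("127.0.0.1")` is the simplest way to check whether the socket type is available to the current user.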
        
       | voxic11 wrote:
       | You might be interested in https://samy.pl/pwnat/
       | 
       | Specifically, when the server starts up, it begins sending fixed
       | ICMP echo request packets to the fixed address 3.3.3.3. We expect
       | that these packets won't be returned.
       | 
       | Now, 3.3.3.3 is *not* a host we have any access to, nor will we
       | end up spoofing it. Instead, when a client wants to connect, the
       | client (which knows the server IP address) sends an ICMP Time
       | Exceeded packet to the server. The ICMP packet includes the
       | "original" fixed packet that the server was sending to 3.3.3.3.
       | The packet is INSIDE the computer. This hardcoded packet is built
       | into pwnat and acts as an identifier for pwnat.
       | 
       | Why? Well, the client is pretending to be a hop on the Internet,
       | politely telling the server that its original "ICMP echo request"
       | packet couldn't be delivered. Your NAT, being the gapingly open
       | device it is, is nice enough to notice that the packet *inside*
       | the ICMP time exceeded packet matches the packet the server sent
       | out. Your NAT then forwards the ICMP time exceeded back to the
       | server behind the NAT, *including* the full IP header from the
       | client, thus allowing the server to know what the client IP
       | address is!
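
To make the packet layout concrete, here is a rough sketch of how an ICMP Time Exceeded message quotes the original packet. It is not pwnat's code: the function names are invented, and it only builds the bytes, since actually sending them would need a raw socket and privileges. The layout assumed is the standard one: an 8-byte ICMP header (type 11, code 0, checksum, 4 unused bytes) followed by the original IP header and the first 8 bytes of the original datagram.

```python
import struct

ICMP_TIME_EXCEEDED = 11  # ICMP type; code 0 means "TTL exceeded in transit"


def internet_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_time_exceeded(original_ip_header: bytes, original_payload: bytes) -> bytes:
    """Build a Time Exceeded message quoting the start of the original packet."""
    quoted = original_ip_header + original_payload[:8]
    # First pass with a zero checksum, then recompute with the real value.
    header = struct.pack("!BBHI", ICMP_TIME_EXCEEDED, 0, 0, 0)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHI", ICMP_TIME_EXCEEDED, 0, checksum, 0) + quoted
```

The quoted bytes are what the NAT inspects: if they match an echo request it recently forwarded, the error is passed on to the host that sent that request.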
        
       | throwawaymaths wrote:
       | Tl;Dr (but do read it, it's very good): there's an id field in
       | the icmp packet and netfilter is aware of icmp packets? Frames?
       | as a "special case".
        
       | Beinglis23 wrote:
       | When a ping is sent from a device on a local network to a device
       | on the internet, the router performing NAT rewrites the source
       | address of the ping to its public IP address and rewrites the ID
       | field of the ICMP packet to a unique value. When the response is
       | received, the router uses the unique ID value to forward the
       | response to the correct device on the local network.
        
         | baq wrote:
         | Taking this thought just a tiny bit further, this is changing a
         | stateless protocol to a stateful one.
        
           | mannyv wrote:
           | You're confusing tracking the packets with protocol. It's not
           | changing ICMP, it's tracking ICMP packets. That's a totally
           | different thing.
        
           | littlecranky67 wrote:
           | Is or was a thing with NAT. Linux also comes with stateful
           | modules (ip_conntrack*) to track and rewrite higher level
           | protocols, such as FTP control connections.
        
           | slt2021 wrote:
           | NAT stands for Network Address Translation, which means a NAT
           | device maintains a translation table of internal IPs to
           | external, so that it can return response packets coming from
           | Internet to a proper destination on the internal network.
           | 
           | By definition NAT will maintain state which is translation
           | table. Now that table can be dynamic or static, but it
           | doesn't change the fact that there will be some state to
           | maintain.
        
           | starfallg wrote:
           | Any NAT that is not statically mapping IP addresses or ports
           | 1-to-1 will require connections to be tracked and hence makes
           | it stateful on the side after the translation (usually
           | outside).
           | 
           | Hence you do need state syncing between firewalls in order
           | for NAT connections to failover correctly, unless it's a
           | statically mapped, one-on-one, one range onto another range,
           | for example.
        
             | devman0 wrote:
             | This isn't really specific to NAT either. Connection
             | tracking is required for most firewalls as well, even if NAT
             | isn't in play, just to implement the most basic ALLOW
             | related,established rule, even (and especially) for what
             | would normally be connectionless protocols.
        
               | starfallg wrote:
               | Yes, tracking the state of connections (e.g. TCP) is
               | needed to enforce rules on OSI layers 4 - 7. That's kinda
               | the typical scenario when we think of connection
               | tracking and stateful enforcement of rules.
               | 
               | I was just pointing out when NAT also requires connection
               | tracking (i.e. when the NAT table needs to be built
               | dynamically, as opposed to statically mapped).
        
         | p1esk wrote:
         | Why not use the source private IP instead of the "unique
         | value"?
        
         | justsomehnguy wrote:
         | Or thinking about the proper way: how does an operating system
         | distinguish between two different ICMP 'talks' to the same
         | destination?
         | 
         | Bam, you only need one computer and wireshark/tcpdump.
         | 
         | Sure, the article is nice and probably is enlightening for
         | someone who never even thought about and doesn't have any
         | networking understanding... honestly it's more about how to
         | make a proper network lab and dig the sources but without
         | thinking.
        
         | rfmoz wrote:
         | But is the ID in the ICMP header, or does it belong to the IP
         | part?
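
On the question of where the ID lives: the echo identifier is part of the ICMP header, while the IPv4 header carries its own, unrelated 16-bit Identification field used for fragment reassembly. A small parsing sketch shows both side by side; the function name and the returned dictionary keys are invented for the example, and it assumes a raw IPv4 packet carrying an ICMP echo message.

```python
import struct


def parse_ipv4_icmp_echo(packet: bytes) -> dict:
    """Pull the IP Identification and the ICMP echo ID out of a raw packet."""
    ihl = (packet[0] & 0x0F) * 4  # IP header length in bytes
    (ip_identification,) = struct.unpack("!H", packet[4:6])
    protocol = packet[9]
    if protocol != 1:  # 1 is the IP protocol number for ICMP
        raise ValueError("not an ICMP packet")
    # ICMP echo header: type, code, checksum, identifier, sequence number.
    icmp_type, _code, _checksum, echo_id, sequence = struct.unpack(
        "!BBHHH", packet[ihl:ihl + 8]
    )
    return {
        "ip_identification": ip_identification,  # IP header, bytes 4-5
        "icmp_type": icmp_type,
        "icmp_echo_id": echo_id,  # ICMP header, bytes 4-5
        "icmp_sequence": sequence,
    }
```

The field a NAT rewrites for ping is `icmp_echo_id`; `ip_identification` is the fragmentation field and is independent of it.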
        
       ___________________________________________________________________
       (page generated 2023-09-10 23:00 UTC)