[HN Gopher] How 1500 bytes became the MTU of the Internet (2020)
       ___________________________________________________________________
        
       How 1500 bytes became the MTU of the Internet (2020)
        
       Author : Jerry2
       Score  : 91 points
       Date   : 2021-06-29 10:18 UTC (12 hours ago)
        
 (HTM) web link (blog.benjojo.co.uk)
 (TXT) w3m dump (blog.benjojo.co.uk)
        
       | EricE wrote:
       | Oh man - I got hives the moment that picture loaded. I think I
       | still have a passel of those cards in a box somewhere :p
        
         | krylon wrote:
         | Hehe, me too. I only have one old network card lying around,
         | but for some reason it's a Token Ring card. I have no idea
         | where it came from, because I never used it, and I think by the
         | time it found its way into my collection of obsolete hardware,
         | I did not even own a computer with an ISA slot...
        
         | meepmorp wrote:
         | RJ-45, AUI, and BNC.
         | 
         | I miss BNC because it was fun to build little sculptures out of
         | the terminators and connectors. Networking is faster these
         | days, but does it still make for good fidget toys? I think not.
        
       | anonymousisme wrote:
        | From what I recall, the 1500-byte number was chosen to
        | minimize the time wasted in the case of a collision.
        | Collisions could happen on the original links because,
        | instead of the point-to-point links we have today, all nodes
        | shared the same channel. (This was in the days of hubs,
        | before switches existed.) The CSMA (Carrier Sense Multiple
        | Access) algorithm would not transmit while another node was
        | using the channel, and would wait a random length of time
        | before sending. If two nodes began transmitting at the same
        | time, a collision would occur and the Ethernet frames would
        | be lost. Error detection and correction (if used) occurs at
        | higher layers. So if a collision occurred with a 1500-byte
        | MTU at 10 Mbps, 1.2 ms of time would be wasted. IIRC, the
        | 1500-byte MTU was selected empirically.
       | 
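        | (The arithmetic behind that figure: 1500 bytes x 8 = 12,000
        | bits; 12,000 bits / 10,000,000 bits per second = 1.2 ms.)
        | 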
        | Another reason for the short MTU was the accuracy of the
        | crystal oscillator on each Ethernet card. I believe 10 Mbit
        | Ethernet used bi-phase/Manchester encoding, but instead of
        | recovering the clock from the data and using it to shift the
        | bits into the receiver, the cheap hardware would just assume
        | that everything was in sync. So if any crystal oscillator
        | was off by more than 0.12%, the final bits of the frame
        | would be corrupted.
       | 
       | I've actually encountered Ethernet cards that had this problem.
       | They would work fine with short frames, but then hang forever
       | once a long one came across the wire. The first one I saw was a
       | pain to troubleshoot. I could telnet into a host without trouble,
       | but as soon as I typed 'ls', the session would hang. The ARP,
       | SYN/ACK and Telnet login exchanges were small frames, but as soon
       | as I requested the directory listing, the remote end sent frames
        | at max MTU and they were never received. They would be
        | perpetually retried and would perpetually fail because of
        | the frequency error of the oscillator in the (3C509) NIC.
        
         | gumby wrote:
         | > IIRC, the 1500 byte MTU was selected empirically.
         | 
         | That's what I was told when I worked at PARC in the 1980s.
         | Folks these days forget (or never learn) how Ethernet worked in
         | those days.
         | 
          | (There's no reason people _should_ know how it worked --
          | the modern work of IEEE 802 has as much relation to that
          | as we do to lemurs.)
        
         | dbcurtis wrote:
         | I would say more likely in the days of coax, before hubs
         | existed.
        
       | mmastrac wrote:
        | There would probably be some concrete value in having a flag
        | day on the internet where we just universally agree to bump
        | the MTU up to something a little larger and gain across the
        | board.
       | 
       | If I'm not mistaken, this happened with both TLS and DNS - there
       | were multiple days where all the major service providers flipped
       | the bits to allow everyone to test the systems at scale and fix
       | any issues that occurred.
        
         | kubanczyk wrote:
         | 14 days after IPv6 flag day, m'kay everyone?
        
         | kazen44 wrote:
          | This is far, far harder to do than the DNS flag days.
          | 
          | Changing the value to something else is futile. What needs
          | to happen is for people to allow ICMP into their networks
          | so that PMTUD can work correctly. Path MTU discovery has
          | solved this issue, but people keep dropping ALL ICMP
          | traffic.
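          | 
          | A minimal sketch of the narrower rule (Linux iptables
          | syntax; chain and policy are assumptions -- the point is
          | to permit the PMTUD-critical ICMP types rather than
          | dropping all ICMP):
          | 
          |   # IPv4: allow "fragmentation needed" (type 3, code 4)
          |   iptables -A INPUT -p icmp \
          |     --icmp-type fragmentation-needed -j ACCEPT
          |   # IPv6: allow "packet too big" (ICMPv6 type 2)
          |   ip6tables -A INPUT -p icmpv6 \
          |     --icmpv6-type packet-too-big -j ACCEPT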
        
       | citrin_ru wrote:
        | I have a related question. Jumbo frames (e.g. 9000 bytes)
        | look well suited for internal VLANs in DC networks: DC-grade
        | routers/switches almost always support jumbo frames, and
        | PMTUD is not an issue for internal networks (the servers
        | there don't talk to clients with filtered ICMP).
       | 
       | Why are they so rarely used despite an almost free performance
       | benefit?
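        | 
        | For reference, enabling them on a Linux host is a one-liner
        | per interface (eth0 is a placeholder; every switch and
        | router in the path needs a matching or larger MTU):
        | 
        |   ip link set dev eth0 mtu 9000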
        
         | throw0101a wrote:
         | > _Why are they so rarely used despite an almost free
         | performance benefit?_
         | 
         | Try measuring the performance benefit. You'll probably find
         | there isn't much/any.
         | 
          | It may have been a boost around 2000, but modern NICs have
          | all sorts of offloads now.
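          | 
          | A quick way to check (a sketch assuming iperf3 on both
          | ends; host names are placeholders):
          | 
          |   # on serverA
          |   iperf3 -s
          |   # on serverB, once at mtu 1500 and once at mtu 9000
          |   iperf3 -c serverA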
        
         | kazen44 wrote:
          | In DC networks, 9k frames and path MTU discovery are very
          | common. The issue, however, is that the MTU across the
          | internet is 1500, and you cannot selectively use MTUs
          | depending on routes, so you end up with a 9k MTU at edge
          | routers, which need to split that into 1500-byte packets.
          | 
          | This fragmentation is absolutely brutal for the
          | performance of edge routers.
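          | 
          | (Concretely, for IPv4: a 9000-byte packet carries 8980
          | bytes of payload after its 20-byte header, and each
          | 1500-byte fragment carries at most 1480 of those, so 8980
          | / 1480 = 6.07, i.e. 7 on-wire packets, all of which must
          | arrive for the original packet to be reassembled.)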
        
           | toast0 wrote:
            | > you cannot selectively use MTUs depending on routes
           | 
           | You absolutely can. Assume a single network port on the
           | server.
           | 
            | On BSD, it looks something like this:
            | 
            |   ifconfig foo 10.x.y.z/24 mtu 9000
            |   route add default 10.x.y.1 -mtu 1500
            |   route add 10.0.0.0/8 10.x.y.1
           | 
            | Linux has an MSS attribute on routes which should work
            | similarly.
           | 
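            | A rough Linux equivalent (a sketch in iproute2 syntax;
            | eth0 stands in for the interface, 10.x.y.1 is the
            | gateway from above, and the per-route MSS attribute is
            | spelled advmss):
            | 
            |   ip link set dev eth0 mtu 9000
            |   ip route add default via 10.x.y.1 mtu 1500
            |   ip route add 10.0.0.0/8 via 10.x.y.1
            | 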
            | Roll your own IPv6 version, but radvd can sort of maybe
            | offer network routes, I think, so it might be slightly
            | easier to configure it that way.
        
             | r1ch wrote:
              | Doing this requires fragmenting at the IP level, which
              | is notoriously unreliable over the internet. The other
              | option is to send ICMP "fragmentation needed" /
              | "packet too big" messages telling the sender to drop
              | their MTU for the path, adding significant overhead
              | for every new connection (assuming the sender even
              | listens to such messages).
        
               | toast0 wrote:
                | How so? In my example, the default route has an MTU
                | of 1500 and the localnet route has the MTU of the
                | interface (9000 in this example).
                | 
                | When you send a packet, the destination route is
                | looked up to determine the MTU. If you send to the
                | internet, you'll get small packets (not IP
                | fragments), and 10.0.0.0/8 will get large packets.
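                | 
                | On FreeBSD you can confirm which MTU a destination
                | resolves to with route get (addresses here are
                | placeholders):
                | 
                |   route -n get 10.2.3.4    # 10/8 route: mtu 9000
                |   route -n get 192.0.2.1   # default route: mtu 1500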
        
               | r1ch wrote:
                | That works fine for locally generated traffic, which
                | has knowledge of the routes. Hosts going through a
                | router have no idea what the outbound link MTU is,
                | requiring PMTU discovery or messing with the TCP
                | MSS.
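                | 
                | (The usual band-aid on a Linux router is MSS
                | clamping on forwarded SYNs, e.g.:
                | 
                |   iptables -t mangle -A FORWARD -p tcp \
                |     --tcp-flags SYN,RST SYN \
                |     -j TCPMSS --clamp-mss-to-pmtu
                | 
                | which rewrites the TCP MSS option to match the
                | outbound route's MTU, at least for TCP.)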
        
           | doublerabbit wrote:
            | Would it not be possible to have another network device
            | in front of the edge router that encapsulates the packet
            | down to 1500? And vice versa for inbound packets?
            | 
            | I have a rack in a DC, and I've always wanted to
            | configure jumbo frames between the host and VMs.
            | However, I haven't, because I am not sure of the
            | repercussions when serving traffic.
        
             | kazen44 wrote:
              | Of course this would be possible (with just another
              | router), but it would still result in inefficiencies.
              | 
              | IP fragmentation is something you do not want
              | (in-transit fragmentation by routers was removed from
              | IPv6 for a good reason). TCP does not like it when IP
              | packets get fragmented, as this usually results in
              | lost PDUs when there is even the slightest packet
              | loss, which causes TCP to resend the PDU.
        
         | Hikikomori wrote:
          | The extra complexity (best practice is to run a separate
          | VLAN/subnet where clients have an extra NIC with a high
          | MTU) isn't worth it, as the only real drawback is the
          | on-the-wire header overhead (NICs have LRO/LSO and other
          | offload mechanisms).
        
         | tyingq wrote:
         | It is pretty common for networks used for local NFS storage, or
         | other uses where the packets are somewhat guaranteed to not
         | leave the facility, like between Oracle RAC nodes, etc.
        
         | briffle wrote:
          | They are often used in storage networks. Back when I
          | worked at a VMware shop, we also implemented them on the
          | vMotion network, and it cut down considerably on the time
          | to move very large DB servers to other hosts.
        
       | toast0 wrote:
        | No comments on the history; AFAIK, "Packet Length Limit:
        | 1500" was found on a stone tablet.
        | 
        | But note that many high-profile sites purposefully do not
        | send 1500-byte packets. Path MTU discovery has many cases
        | where it doesn't work, so it's simpler to say f*** it, set
        | your effective MTU to 1450 or 1480, and call it a day.
       | 
        | Reflecting the MSS in the SYN minus 8 (assuming a poorly
        | configured PPPoE link), or 20 (assuming a poorly configured
        | IPIP tunnel), or even 28 (both) is probably more effective,
        | but stats are hard to gather, and AFAIK no OS makes it easy
        | to run such a test; you'd need to patch things, divert
        | packets, have a good representative sample of users, etc.
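        | 
        | For reference, the resulting MSS values (assuming a 20-byte
        | IPv4 header plus a 20-byte TCP header, no options):
        | 
        |   1500 MTU         -> MSS 1460  (plain Ethernet)
        |   1500 - 8  = 1492 -> MSS 1452  (PPPoE)
        |   1500 - 20 = 1480 -> MSS 1440  (IPIP tunnel)
        |   1500 - 28 = 1472 -> MSS 1432  (both)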
        
         | rjsw wrote:
          | Some of that is down to the PPPoE standards. I added
          | RFC4638 support to the OS that I use, but I don't know how
          | ISPs that don't support it will respond.
        
           | netr0ute wrote:
           | I'd expect those ISPs to use jumbo frames only where they use
           | PPPoE so that it looks "transparent" to non-PPPoE traffic.
        
             | rjsw wrote:
              | That is the whole point of RFC4638: it allows you to
              | request an MTU of 1508 for the PPPoE connection so
              | that the tunnelled frames can be 1500 bytes. The
              | problem is that it tries to be backwards compatible,
              | which makes it hard to deal with this request not
              | being granted.
        
             | [deleted]
        
           | toast0 wrote:
            | Well yeah, PPPoE is super derpy; but at least the modems
            | could be trained to adjust the MSS on the way in and out
            | to (TCP) 1472. They often don't, though, and sometimes
            | only adjust the MSS on the way out, not on the way in.
            | My current PPPoE ISP doesn't support RFC4638, and my
            | previous ISP was worse: I had a fiber connection with a
            | 1500 MTU, but their 6rd gateway was only 1492 MTU (so
            | the IPv6 MTU was 1472 after the tunnel), and ICMP "needs
            | frag" was rate limited, so IPv6 would only sometimes
            | work until I figured out I needed to adjust the v6 MTU.
            | 
            | Did you know Windows doesn't request (or process) MTU
            | information in v4 DHCP? That's a nice touch.
        
             | philjohn wrote:
             | And without dedicated packet processing chips it's a
             | nightmare on anything >= 1Gbps
        
         | zamadatix wrote:
          | IPv6 made it a bit easier: assume 1280 unless PMTUD is
          | able to show otherwise. IPv4 had something similar with
          | 576, but it was a bit too small to just default to, and it
          | started out before PMTUD was settled, so it couldn't be
          | mandated.
        
           | rubatuga wrote:
           | Yep 1280 is the minimum for IPv6. Cloudflare used to do 1280,
           | but then they said good riddance and increased it (for better
           | efficiency)
           | 
           | https://blog.cloudflare.com/increasing-ipv6-mtu/
        
       | benjojo12 wrote:
       | Previous Discussion :)
       | 
       | https://news.ycombinator.com/item?id=22364830
        
       ___________________________________________________________________
       (page generated 2021-06-29 23:02 UTC)