[HN Gopher] How 1500 bytes became the MTU of the Internet (2020)
___________________________________________________________________
How 1500 bytes became the MTU of the Internet (2020)
Author : Jerry2
Score : 91 points
Date : 2021-06-29 10:18 UTC (12 hours ago)
(HTM) web link (blog.benjojo.co.uk)
(TXT) w3m dump (blog.benjojo.co.uk)
| EricE wrote:
| Oh man - I got hives the moment that picture loaded. I think I
| still have a passel of those cards in a box somewhere :p
| krylon wrote:
| Hehe, me too. I only have one old network card lying around,
| but for some reason it's a Token Ring card. I have no idea
| where it came from, because I never used it, and I think by the
| time it found its way into my collection of obsolete hardware,
| I did not even own a computer with an ISA slot...
| meepmorp wrote:
| RJ-45, AUI, and BNC.
|
| I miss BNC because it was fun to build little sculptures out of
| the terminators and connectors. Networking is faster these
| days, but does it still make for good fidget toys? I think not.
| anonymousisme wrote:
| From what I recall, the 1500 number was chosen to limit the
| time wasted in the case of a collision. Collisions could happen
| on the original links because instead of point-to-point links
| like we have today, they all used the same shared channel. (This
| was in the days of hubs, before switches existed.) The CSMA
| (Carrier Sense Multiple Access) algorithm would not transmit
| while another node was using the channel, and after a collision
| it would back off for a random length of time before resending.
| If two nodes began transmitting at the same time, a collision
| would occur and the Ethernet frames would be lost. Detecting and
| recovering from that loss (if it happens at all) is left to
| higher layers. So if a collision occurred with a 1500-byte MTU
| at 10 Mbps, up to 1.2 ms of channel time would be wasted (1500
| bytes x 8 = 12,000 bits, which takes 1.2 ms at 10 Mbit/s). IIRC,
| the 1500 byte MTU was selected empirically.
|
| Another reason for the short MTU was the accuracy of the crystal
| oscillator on each Ethernet card. I believe 10BASE Ethernet used
| bi-phase/Manchester encoding, but instead of recovering the clock
| from the data and using it to shift the bits into the receiver,
| the cheap hardware would just assume that everything was in sync.
| So if any crystal oscillator was off by more than 0.12%, the
| final bits of the frame would be corrupted.
|
| I've actually encountered Ethernet cards that had this problem.
| They would work fine with short frames, but then hang forever
| once a long one came across the wire. The first one I saw was a
| pain to troubleshoot. I could telnet into a host without trouble,
| but as soon as I typed 'ls', the session would hang. The ARP,
| SYN/ACK and Telnet login exchanges were small frames, but as soon
| as I requested the directory listing, the remote end sent frames
| at max MTU and they were never received. They would be
| perpetually re-tried and perpetually fail because of the
| frequency error of the oscillator in the (3C509) NIC.
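|
| If you ever suspect a NIC like that, a crude way to confirm it
| (the host address below is just a placeholder) is a ping size
| sweep: small payloads get through, anything close to a full-size
| frame does not.
|
|   # 64-byte payloads: fine
|   ping -c 3 -s 64 192.0.2.10
|   # 1472 + 8 ICMP + 20 IP = 1500 bytes on the wire: hangs/drops
|   ping -c 3 -s 1472 192.0.2.10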
| gumby wrote:
| > IIRC, the 1500 byte MTU was selected empirically.
|
| That's what I was told when I worked at PARC in the 1980s.
| Folks these days forget (or never learn) how Ethernet worked in
| those days.
|
| (There's no reason people _should_ know how it worked -- the
| modern work of IEEE 802 has as much relation to that as we do
| to lemurs.)
| dbcurtis wrote:
| I would say more likely in the days of coax, before hubs
| existed.
| mmastrac wrote:
| There would probably be some concrete value to having a flag day
| on the internet where we just universally agree to bump the MTU
| up to something a little larger and gain across the board.
|
| If I'm not mistaken, this happened with both TLS and DNS - there
| were multiple days where all the major service providers flipped
| the bits to allow everyone to test the systems at scale and fix
| any issues that occurred.
| kubanczyk wrote:
| 14 days after IPv6 flag day, m'kay everyone?
| kazen44 wrote:
| This is far, far harder to do than DNS flag days.
|
| Changing the value to something else is futile. What needs to
| happen is for people to allow ICMP into their networks so that
| PMTUD can work correctly. Path MTU discovery has solved this
| issue, but people keep dropping ALL ICMP traffic.
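|
| For example (a rough sketch with iptables; where the rules go
| depends on the rest of the ruleset), you only need to let the
| PMTUD-relevant ICMP through, not all of it:
|
|   # IPv4: "fragmentation needed" (ICMP type 3, code 4) drives PMTUD
|   iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
|   # IPv6: "packet too big" (ICMPv6 type 2) must never be dropped
|   ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT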
| citrin_ru wrote:
| I have a related question. Jumbo frames (e.g. 9000) look well
| suited for internal VLANs in DC networks: DC-grade
| routers/switches almost always support jumbo frames, and PMTUD
| is not an issue for internal networks (servers there don't talk
| to clients behind filtered ICMP).
|
| Why are they so rarely used despite an almost free performance
| benefit?
| throw0101a wrote:
| > _Why are they so rarely used despite an almost free
| performance benefit?_
|
| Try measuring the performance benefit. You'll probably find
| there isn't much/any.
|
| It may have been a boost around 2000, but modern NICs have all
| sorts of offloads now.
| kazen44 wrote:
| In DC networks, 9k frames and path MTU are very common. The
| issue, however, is that the MTU across the internet is 1500, and
| you cannot selectively use MTUs depending on routes, so you end
| up with 9k packets at edge routers, which need to split them
| into 1500-byte packets.
|
| This fragmentation is absolutely brutal for the performance of
| edge routers.
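|
| A quick way to see where a large-MTU path ends (addresses below
| are placeholders) is a don't-fragment ping at jumbo size; it
| works inside the DC fabric and fails at the first 1500-byte hop:
|
|   # 8972 payload + 8 ICMP + 20 IP = 9000 bytes; -M do forbids fragmentation
|   ping -M do -s 8972 10.0.0.2
|   ping -M do -s 8972 203.0.113.1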
| toast0 wrote:
| > you cannot selectively use mtu's depending on routes
|
| You absolutely can. Assume a single network port on the
| server.
|
| On BSD, it looks something like this:
|
| ifconfig foo 10.x.y.z/24 mtu 9000
|
| route add default 10.x.y.1 -mtu 1500
|
| route add 10.0.0.0/8 10.x.y.1
|
| Linux has a mss attribute on routes which should work
| similarly.
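|
| A rough Linux equivalent (an untested sketch; ip route takes
| both mtu and advmss attributes) would be:
|
|   ip link set dev foo mtu 9000
|   ip addr add 10.x.y.z/24 dev foo
|   ip route add default via 10.x.y.1 mtu 1500
|   ip route add 10.0.0.0/8 via 10.x.y.1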
|
| Roll your own IPv6 version, but radvd can sort of maybe offer
| network routes I think, so it might be slightly easier to
| configure it that way.
| r1ch wrote:
| Doing this requires fragmenting at the IP level, which is
| notoriously unreliable over the internet. The other option
| is to send PMTU exceeded messages telling the sender to
| drop their MTU for the path, adding significant overhead
| for every new connection (assuming the sender even listens
| to such messages).
| toast0 wrote:
| How so? In my example, the default route has an MTU of
| 1500 and the localnet route has an MTU of whatever the
| interface is (9000 in this example).
|
| When you send a packet, the destination route is looked
| up to determine the MTU. If you send to the internet,
| you'll get small packets (not IP fragments), and 10. will
| get large packets.
| r1ch wrote:
| That works fine for locally generated traffic, which has
| knowledge of the routes. Hosts going through a router,
| however, have no idea what the outbound link MTU is,
| requiring PMTU discovery or messing with the TCP MSS.
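|
| The usual workaround on the router (a sketch, assuming Linux
| with iptables is doing the forwarding) is to clamp the MSS of
| forwarded SYNs:
|
|   # rewrite the MSS in forwarded SYNs down to fit the outbound path MTU
|   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
|       -j TCPMSS --clamp-mss-to-pmtu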
| doublerabbit wrote:
| Would it not be possible to have another network device
| before the edge router that encapsulates the packet to 1500?
| And vice versa for inbound packets?
|
| I have a rack in a DC, I've always wanted to configure jumbo
| frames between the host and VMs. However I haven't because I
| am not sure of the recourse when serving traffic.
| kazen44 wrote:
| Of course this would be possible (with just another router),
| but it would still result in inefficiencies.
|
| IP fragmentation is something you do not want (in-flight
| fragmentation by routers was removed from IPv6 for a good
| reason). TCP does not like it when IP packets get fragmented,
| as even the slightest packet loss then tends to take out whole
| PDUs, which TCP has to resend.
| Hikikomori wrote:
| The extra complexity (best practice is to run a separate
| VLAN/subnet where clients have an extra NIC with a high MTU)
| isn't worth it, as the only real drawback of staying at 1500 is
| the on-the-wire header overhead (NICs have LRO/LSO and other
| offload mechanisms).
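|
| If you want to check whether those offloads are actually on
| (the interface name here is just an example):
|
|   ethtool -k eth0 | grep -E 'segmentation|receive-offload'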
| tyingq wrote:
| It is pretty common for networks used for local NFS storage, or
| other uses where the packets are somewhat guaranteed to not
| leave the facility, like between Oracle RAC nodes, etc.
| briffle wrote:
| They are often used in storage networks. Back when I worked at
| a vmware shop, we also implemented them on the vmotion network,
| and it cut down the time to move very large DB servers to other
| hosts considerably.
| toast0 wrote:
| No comments on history, AFAIK, Packet Length Limit: 1500 was
| found on a stone tablet.
|
| But note that many high-profile sites purposefully do not send
| 1500 byte packets. Path MTU discovery has many cases where it
| doesn't work, so it's simpler to say f*** it and set your
| effective MTU to 1450 or 1480 and call it a day.
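|
| On Linux that can be as blunt as lowering the interface MTU
| (eth0 and 1480 are just example values); every route and the
| advertised TCP MSS then follow from it:
|
|   ip link set dev eth0 mtu 1480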
|
| Reflecting the MSS seen in the SYN minus 8 (assuming a poorly
| configured PPPoE link) or 20 (assuming a poorly configured IPIP
| tunnel) or even 28 (both) is probably more effective, but stats
| are hard to gather, and AFAIK no OS makes it easy to run a test:
| you'd need to patch things, divert packets, and have a good
| representative cross-section of users, etc.
| rjsw wrote:
| Some of that is down to the PPPoE standards. I added RFC4638
| support to the OS that I use but don't know how ISPs that don't
| support it will respond.
| netr0ute wrote:
| I'd expect those ISPs to use jumbo frames only where they use
| PPPoE so that it looks "transparent" to non-PPPoE traffic.
| rjsw wrote:
| That is the whole point of RFC4638: it allows you to
| request an MTU of 1508 for the PPPoE connection so that the
| tunnelled frames can be 1500 bytes. The problem is that it
| tries to be backwards compatible, which makes it hard to
| deal with this request not being granted.
| [deleted]
| toast0 wrote:
| Well yeah, PPPoE is super derpy; but at least the modems
| could be trained to adjust the MSS on the way in and out to
| (tcp) 1472, though they often don't, and sometimes only
| adjust the MSS on the way out, not on the way in. My current
| PPPoE ISP doesn't support RFC4638, and my previous ISP was
| worse: I had a fiber connection with 1500 MTU, but their 6rd
| gateway was only 1492 MTU (so the IPv6 MTU was 1472 after
| the tunnel) and ICMP needs-frag was rate limited, so IPv6
| would only sometimes work until I figured out I needed to
| adjust the v6 MTU.
|
| Did you know Windows doesn't request (or process) MTU
| information on v4 DHCP? That's a nice touch.
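|
| (The option Windows ignores there is DHCP option 26, the
| interface MTU. A sketch of handing it out from dnsmasq, with
| placeholder interface, range and MTU values:)
|
|   dnsmasq --interface=eth0 \
|       --dhcp-range=192.0.2.100,192.0.2.200,12h \
|       --dhcp-option=26,1492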
| philjohn wrote:
| And without dedicated packet processing chips it's a
| nightmare on anything >= 1Gbps
| zamadatix wrote:
| by made it a bit easier: assume 1280 unless pmtud is able to
| show otherwise. V4 had something similar with 576 but it was a
| bit too small to just default to and started out before pmtud
| was settled so couldn't mandate it
| zamadatix wrote:
| "by"=what my phone thinks IPv6 should be.
| rubatuga wrote:
| Yep 1280 is the minimum for IPv6. Cloudflare used to do 1280,
| but then they said good riddance and increased it (for better
| efficiency)
|
| https://blog.cloudflare.com/increasing-ipv6-mtu/
| benjojo12 wrote:
| Previous Discussion :)
|
| https://news.ycombinator.com/item?id=22364830
___________________________________________________________________
(page generated 2021-06-29 23:02 UTC)