[HN Gopher] Ethernet History Deepdive - Why Do We Have Different...
       ___________________________________________________________________
        
       Ethernet History Deepdive - Why Do We Have Different Frame Types?
        
       Author : un_ess
       Score  : 54 points
       Date   : 2024-08-22 08:29 UTC (8 hours ago)
        
 (HTM) web link (lostintransit.se)
 (TXT) w3m dump (lostintransit.se)
        
       | userbinator wrote:
       | _Ironically, this version of the header published in 1980 is what
       | we still use to this day._
       | 
        | IMHO Ethernet is one of the great examples of backwards
        | compatibility in the computing world. Even the wireless standards
        | present frames to the upper layers as if they were Ethernet. It's
        | also a counterexample to the bureaucracy of standards bodies: the
        | standard that actually became widely used was the one that got
        | released first. The other example that comes to mind is OSI vs
        | DoD (TCP/IP).
        
         | betaby wrote:
         | OSI is very widely used, most of the large ISPs use it. End
         | consumers are just unaware of that fact. See
         | https://en.wikipedia.org/wiki/IS-IS
        
           | tialaramex wrote:
           | I think this is a stretch, like saying the X.500 Directory
           | System is widely used based on the fact that PKIX is
           | technically adapting X.509 and thus your TLS certificates
            | depend on the X.500 directory system. End users aren't just
            | "unaware" that it's actually the X.500 system; it
            | functionally isn't the X.500 system. PKIX mandates an
            | "alternate" scheme for the Internet, and the directory called
            | for by X.500 has never actually existed.
           | 
            | Likewise then, IS-IS is the protocol that OSI standardized,
            | but we're not using it as part of an OSI system.
        
             | layer8 wrote:
             | X.500 is widely used in the form of LDAP and Active
             | Directory, however.
        
               | tristor wrote:
               | Active Directory is not based on X.500 and LDAP was
               | directly created as an alternative to the DAP standard
               | that is part of X.500.
               | 
                | While X.500 is a precursor to and an influence on both
                | of these things, and both interoperated with X.500, they
                | are not X.500. X.500 is for all intents and purposes
                | dead in 2024, although I did deploy an X.500-based
                | directory service in 2012 and it's probably still alive
                | and running.
        
         | rjsw wrote:
         | OSI was usable as a wide area network before TCP/IP was.
        
       | dale_glass wrote:
        | I wish we could have another frame type and bump the packet
        | size.
       | 
       | We're at the point where we can have millions of packets per
       | second going through a network interface, and it starts to get
       | very silly.
       | 
        | Even a 10G connection now requires some thought to actually
        | perform properly. I've managed to get bottlenecked on high-end
        | hardware and needed a whole detour into SR-IOV just to get back
        | to decent speeds.
        
         | supahfly_remix wrote:
          | > I wish we could have another frame type and bump the packet
          | size.
         | 
         | The clock precision (100s of ppm) of the NIC oscillators on
         | either side of a network connection gives a physical upper
         | limit on the Ethernet packet size. The space between the
         | packets lets the slower side "catch up". See
         | https://en.wikipedia.org/wiki/Interpacket_gap for more info.
         | 
          | We could use more precise oscillators to allow longer packets,
          | but at a higher cost.
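          | 
          | A rough back-of-the-envelope sketch of that limit (my numbers,
          | not from the 802.3 spec): with both sides within +/-100 ppm,
          | the worst-case mismatch is 200 ppm, and the 96-bit-time
          | interpacket gap has to absorb the slip accumulated over one
          | frame.
          | 
          |   # Rough estimate of the longest frame a fixed interpacket
          |   # gap could tolerate between free-running +/-100 ppm
          |   # oscillators. Assumed numbers, not from the 802.3 spec.
          |   PPM_PER_SIDE = 100e-6  # oscillator tolerance on each NIC
          |   IPG_BIT_TIMES = 96     # standard 12-byte interpacket gap
          | 
          |   mismatch = 2 * PPM_PER_SIDE  # one fast, one slow
          |   max_frame_bits = IPG_BIT_TIMES / mismatch
          |   print(f"max frame ~ {max_frame_bits / 8 / 1000:.0f} kB")
          | 
          |   # A 9000-byte jumbo frame slips only 9000*8*200e-6 = 14.4
          |   # bit times, well inside the 96-bit gap.
          |   print(f"jumbo slip: {9000 * 8 * mismatch:.1f} bit times")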
        
           | monocasa wrote:
            | You don't need that as much with modern encodings. The point
            | of 8b/10b or 64b/66b is that they guarantee enough edges for
            | receivers to be self-clocking: the incoming bits are more or
            | less fed directly into a PLL.
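            | 
            | A toy illustration of the 64b/66b part of that (not a real
            | PHY; the real standard also scrambles the payload): every
            | 64-bit block gets a 2-bit sync header that is always "01" or
            | "10", so there's a guaranteed transition at least once per
            | 66 bits for the receiver's clock recovery to lock onto.
            | 
            |   def frame_64b66b(p64: int, ctrl: bool = False) -> str:
            |       """Prefix a 64-bit block with its 2-bit sync header."""
            |       assert 0 <= p64 < 2**64
            |       header = "10" if ctrl else "01"  # never "00" or "11"
            |       return header + format(p64, "064b")
            | 
            |   # Even a pathological all-ones payload gets an edge from
            |   # the sync header:
            |   print(frame_64b66b(2**64 - 1)[:6], "...")  # -> 011111 ...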
        
             | namibj wrote:
             | That's a separate concern.
             | 
              | The previously mentioned issue is that a reclocking
              | repeater on a link can only avoid buffering packets if the
              | incoming packet rate is never higher than the rate at
              | which it can send them back out; otherwise it fills up,
              | i.e. buffers.
              | 
              | If your repeaters are actually switches, this manifests as
              | whether you occasionally drop packets on a full link with
              | an uncongested switching fabric. Think two switches, each
              | with a 10G port and 8 1G ports, used to compress 8 long
              | cables into one (say, via VLAN tagging based on which of
              | the 8 ports a frame arrived on).
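              | 
              | A minimal model of the clock-slip side of this (assumed
              | numbers, including the floor the gap may shrink to): per
              | frame, a 100 ppm-fast upstream deposits bits slightly
              | faster than the repeater can retransmit them, and the only
              | slack is shrinking the outgoing gap.
              | 
              |   MISMATCH = 100e-6  # upstream clock fast by 100 ppm
              |   IPG_BITS = 96      # nominal interpacket gap
              |   MIN_IPG = 64       # assumed floor for shrinking it
              | 
              |   def drops_eventually(frame_bits: int) -> bool:
              |       gain = frame_bits * MISMATCH  # gained per frame
              |       slack = IPG_BITS - MIN_IPG    # recoverable per gap
              |       return gain > slack
              | 
              |   for size in (1500, 9000, 64000, 200000):
              |       ok = not drops_eventually(size * 8)
              |       print(size, "bytes:", "ok" if ok else "drops")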
        
         | api wrote:
         | MTUs are one of the eternal gremlins of networking, and any
         | choice of MTU will almost certainly be either too large for the
         | present day or too small for the future. 1500 was chosen back
         | when computers ran at dozens of _megahertz_ and it was actually
         | kind of large at the time.
         | 
         | Changing the MTU is awful because parameters like MTU get baked
         | into hardware in the form of buffer sizes limited by actual RAM
          | limits. Like everything else on a network, once the network is
          | deployed, changing it is very hard because you have to change
         | everything along the path. Networks are limited by the lowest
         | common denominator.
         | 
         | This kind of thing is one of the downsides of packet switching
         | networks like IP. The OSI folks envisioned a network that
         | presented a higher level interface where you'd open channels or
         | send messages and the underlying network would handle _all_ the
         | details for you. This would be more like the classic analog
         | phone network where you make a phone call and a channel is
         | opened and all details are invisible.
         | 
         | It's very loosely analogous to CISC vs RISC where the OSI
         | approach is more akin to CISC. In networking RISC won out for
         | numerous reasons, but its simplicity causes a lot of deep
         | platform details to leak into upper application layers. High-
         | level applications should arguably not even have to think about
         | things like MTU, but they do.
         | 
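          | For instance (a Linux-only sketch; IP_MTU is a Linux socket
          | option, and the address is just a placeholder), an application
          | that cares can ask the kernel what MTU applies to a connected
          | socket's path:
          | 
          |   import socket
          | 
          |   IP_MTU = getattr(socket, "IP_MTU", 14)  # 14 on Linux
          | 
          |   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          |   s.connect(("192.0.2.1", 9))  # placeholder, discard port
          |   mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
          |   print("path MTU:", mtu)
          |   s.close()
          | 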
          | Where it gets _very_ ugly is when higher-level applications
          | have to think about things like NAT and stateful firewall
          | traversal, IPv4 vs IPv6, port remapping, etc.
         | 
         | The downside of the OSI approach is that innovation would
         | require the cooperation of telecoms. Every type of connection,
         | etc., would be a product offered by the underlying network. It
         | would also give telecoms a ton of power to nickel and dime,
         | censor, conduct surveillance, etc. and would make anonymity and
         | privacy very hard. It would be a much more managed Internet as
         | opposed to the packet switching Wild West that we got.
        
         | leoc wrote:
         | (Intentional) jumbo frames at layer 2 and expanded MTUs at
         | layer 3 are certainly available (as you may know). In fact it
          | seems (I am, it should be obvious, not an expert) that using
          | jumbo frames is more or less common practice by now. There
          | does in fact seem to have been some standards drama about this,
          | too: I can't find it now, but IIRC in the '00s someone's
          | proposal to extend the protocol so the header could indicate a
          | frame size of over 1500 bytes was rejected, and
         | nothing seems to have been done since. At the moment it seems
         | that the best way to indicate max. Ethernet frame sizes of over
         | 1500 is an optional field in LLDP(!)
         | https://www.juniper.net/documentation/us/en/software/junos/u...
         | and the fall-back is sending successively larger pings and
         | seeing when the network breaks(!)
         | https://docs.oracle.com/cd/E36784_01/html/E36815/gmzds.html .
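          | 
          | The successively-larger-pings approach can be scripted; a
          | rough sketch (Linux ping flags; the host is a placeholder):
          | send pings with the don't-fragment bit set and binary-search
          | the largest payload that still gets through. Path MTU is then
          | the payload plus 28 bytes of IP + ICMP headers.
          | 
          |   import subprocess
          | 
          |   def ping_ok(host: str, payload: int) -> bool:
          |       r = subprocess.run(
          |           ["ping", "-c", "1", "-W", "1", "-M", "do",
          |            "-s", str(payload), host],
          |           capture_output=True,
          |       )
          |       return r.returncode == 0
          | 
          |   host = "192.0.2.1"  # placeholder address
          |   lo, hi = 0, 9001    # lo always works, hi always fails
          |   while lo + 1 < hi:
          |       mid = (lo + hi) // 2
          |       lo, hi = (mid, hi) if ping_ok(host, mid) else (lo, mid)
          |   print("path MTU ~", lo + 28, "bytes")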
        
           | scifi wrote:
           | I'm not certain what my point is, but I wanted to mention
           | that jumbo frames don't work over the Internet. More of a LAN
           | thing.
        
             | toast0 wrote:
             | My local internet exchange has a 1500 vlan and a 9000 vlan.
             | My understanding is there are many fewer peers on the 9000
             | vlan, but it's not zero.
             | 
             | If you want to use jumbo packets on the internet at large,
             | you need to have working path MTU detection, which
             | realistically means at least probing, but you really should
             | have that at 1500 too, because there's still plenty of
             | broken networks out there. My guess is you won't have many
             | connections with an effective mtu above 1500, but you might
             | have some.
        
         | toast0 wrote:
         | I mean, larger packets (and working path MTU detection) could
          | be useful, but with full-size (1500-byte) packets and reasonable
         | hardware, I never had trouble pushing 10G from the network
         | side. Add TLS and some other processing, and older hardware
          | wouldn't keep up, but not because of packetization. Small
          | packets are also a different story.
         | 
          | All my hardware at the time was Xeon 2690, v1-4. NICs were
          | Intel x520/x540 or similar (whatever SuperMicro was using back
         | then). IIRC, v1 could do 10G easy without TLS, 8-9G with TLS,
         | v3 improved AES acceleration and we could push 2x10G. When I
         | turned off NIC packetization acceleration, I didn't notice much
         | change in CPU or throughput, but if packetization was a
         | bottleneck it should have been significant.
         | 
          | At home, with a similar-age desktop processor, a dual-core
          | Pentium G3470 (Haswell, same gen as a 2690v3), I can't quite
          | hit 10G in iperf, but it's close-ish; another two cores would
          | probably do it.
         | 
         | In some cases, you can get some big gains in efficiency by
         | lining up the user space cpu with the kernel cpu that handles
         | the rx/tx queues that the NIC hashes the connection to, though.
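          | 
          | A rough sketch of that alignment on Linux (the queue name
          | "eth0-rx-0" is an assumption; naming varies by driver): find
          | which CPU services the queue's IRQ, then pin the process there
          | so the user-space reader shares a cache with the kernel RX
          | path.
          | 
          |   import os
          | 
          |   def irq_for_queue(pattern: str = "eth0-rx-0") -> int:
          |       """Scan /proc/interrupts for the named queue's IRQ."""
          |       with open("/proc/interrupts") as f:
          |           for line in f:
          |               if pattern in line:
          |                   return int(line.split(":")[0])
          |       raise LookupError(pattern)
          | 
          |   irq = irq_for_queue()
          |   # smp_affinity_list holds the CPU(s) handling this IRQ.
          |   with open(f"/proc/irq/{irq}/smp_affinity_list") as f:
          |       cpu = int(f.read().split(",")[0].split("-")[0])
          | 
          |   os.sched_setaffinity(0, {cpu})  # run on the same CPU
          |   print("pinned to CPU", cpu)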
        
         | immibis wrote:
         | Maximum packet size is already configurable on most NICs. 9000
         | is a typical non-default limit. If you increase the limit, you
         | must do so on all devices on the network.
         | https://en.wikipedia.org/wiki/Jumbo_frame
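          | 
          | On Linux the per-interface limit is visible in sysfs (and
          | settable as root); a small sketch, assuming an interface named
          | eth0:
          | 
          |   IFACE = "eth0"  # assumed interface name
          | 
          |   with open(f"/sys/class/net/{IFACE}/mtu") as f:
          |       print(IFACE, "MTU:", f.read().strip())
          | 
          |   # Raising it to a jumbo size (root required, and every
          |   # device on the L2 segment must agree):
          |   # with open(f"/sys/class/net/{IFACE}/mtu", "w") as f:
          |   #     f.write("9000")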
        
         | throwup238 wrote:
          | _> I wish we could have another frame type and bump the packet
          | size._
         | 
         | That's why I'm in full support of a world ending apocalypse
         | that allows society to restart from scratch. We've made so many
         | bad decisions this time around, with packet sizes being some of
         | the worst.
        
           | amelius wrote:
           | Maybe we can then also redefine pi as 2*pi, while we're at
           | it.
        
       ___________________________________________________________________
       (page generated 2024-08-22 17:00 UTC)