[HN Gopher] Intel I219-LM Running at ~60% of Maximum Speed Due t...
       ___________________________________________________________________
        
       Intel I219-LM Running at ~60% of Maximum Speed Due to Linux Driver
       Bug
        
       Author : zdw
       Score  : 59 points
       Date   : 2023-04-23 15:50 UTC (7 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | LinuxBender wrote:
       | It's nice to see them giving this some attention. I hope they
       | also look into the I225-V and the myriad of different negotiation
       | bugs that change with each firmware revision on 2.5gb networks
       | _that also vary with different offloading features enabled
       | /disabled including EEE_. Each 2.5gb switch also exhibits
       | different behavior which is probably not Intel's fault.
        
         | rasz wrote:
         | > is probably not Intel's fault
         | 
         | consensus is Intel went down the shitter quality wise, just one
         | example https://www.youtube.com/watch?v=DXNyHFOWx_k
        
           | LinuxBender wrote:
           | Could be. Most of the folks I interacted with in the past
           | have retired. I think I am just going to revert back to full
           | tower PC's for anything I do, even firewalls and appliances
           | so that I am not locked into what may be abandonware. I was
           | hoping to use less power and take up less space with the
           | mini-PC's but in hindsight that may have been a mistake.
        
         | mirker wrote:
         | When people say you need the third hardware revision to get
         | functional performance, and it still doesn't work, you should
         | conclude it's poorly designed. The fact that these chips were
         | being sold with old hardware revisions also seems anti-
         | consumer.
        
           | rubatuga wrote:
           | Agreed, Intel gave up and released the I226 because their
           | I225 brand was so tarnished. I removed the I225 from my
           | server after I noticed the driver crashing during heavy loads
           | (Ubuntu 22.04). For USB I've been having success with the
           | rtl8156bg from juplink.
        
         | snvzz wrote:
         | I got tired of fighting that NIC, and got a Realtek 2.5g one.
         | 
         | All my problems gone, life is good.
        
           | LinuxBender wrote:
           | I would totally do that but I have a few mini-PC's that have
           | the I225-V and I can't swap them out. I've used
           | USB3.2->Ethernet dongles as workarounds until I could find a
           | switch that was happy with it. Same with the latest Protectli
           | firewalls, they also have the I225-V. For those I have to use
           | ethtool to force a specific speed which is not ideal. I'm
           | thinking mini-PC's are maybe a mistake in this generation of
           | NICs.
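Forcing a link speed with ethtool, as mentioned above, might look roughly like this. It is a minimal sketch that only builds the command line and does not run it. The interface name `enp1s0` and the helper name are hypothetical, and whether autonegotiation has to stay enabled for a given speed depends on the NIC and the switch.

```python
import shlex
from typing import List


def ethtool_force_speed_cmd(iface: str, speed_mbps: int, duplex: str = "full") -> List[str]:
    """Build (but do not execute) an `ethtool -s` command requesting a fixed speed."""
    if duplex not in ("full", "half"):
        raise ValueError("duplex must be 'full' or 'half'")
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    # `ethtool -s DEV speed N duplex full|half` is the standard form for
    # changing link settings; running it requires root privileges.
    return ["ethtool", "-s", iface, "speed", str(speed_mbps), "duplex", duplex]


if __name__ == "__main__":
    # "enp1s0" is a placeholder interface name, not taken from the thread.
    print(shlex.join(ethtool_force_speed_cmd("enp1s0", 1000)))
```

The printed line can be reviewed and then run by hand as root, which keeps the sketch safe to execute on any machine.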
        
           | userbinator wrote:
           | I've had good experience with Realtek NICs too, not 2.5g yet
           | but their 10/100/1000 models. I suspect that's because they
           | use a far simpler design with not much in the way of fancy
           | features[1]. They also tend to benchmark slower with higher
           | CPU usage, but I'll take slightly slower but stable over fast
           | but intermittent connectivity.
           | 
           | [1] Ironically, in this case the Intel has to have a fancy
           | feature, intended for increasing performance, disabled in
           | order to increase performance.
        
             | cduzz wrote:
             | This is the classic Sutherland "wheel of reincarnation"
             | cycle -- a function (commonly graphics acceleration or
             | network connectivity) will be provided by the CPU, which is
             | found to be a bottleneck so the function is offloaded to a
             | dedicated device, which then becomes a bottleneck as the
             | CPU receives more dedicated focus on performance
             | improvement than some bespoke, single-purpose offload
             | engine.
        
             | toast0 wrote:
             | I've heard good things about Realtek's 2.5G nics, and bad
             | things about Intel's; I've seen speculation that the
             | underlying cause is cross-compatibility between the two 2.5g
             | specs that were merged into the standard, and that most 2.5g
             | switches use chips from companies that backed the spec Intel
             | did not back.
             | 
             | For 1G realteks, I've had ok luck with them, but I do see
             | some issues. I recall having some problem with them while
             | running the Linux tree drivers, but I've since switched to
             | all FreeBSD. With my current realtek NICs and FreeBSD 13.1,
             | I have the choice of the kernel driver where sometimes the
             | NIC will stop processing packets, and the NIC acknowledges
             | reset, but doesn't actually immediately reset and sometimes
             | processes old packets after resetting, resulting in wild
             | writes and bad behavior. There's a vendor driver, which
             | doesn't seem to get into that bad state, but it does enable
             | ethernet PAUSE frames which are an abomination. The vendor
             | driver is full of undocumented magic values and there's no
             | public documentation on the NICs at all, so figuring out
             | what to frob to make the NIC not get stuck, but not send
             | out PAUSE frames would be an exercise in frustration, that
             | I'm not willing to do.
             | 
             | Either way, the interrupt design is deficient: there's a
             | shared interrupt for rx, tx and administration, and the
             | status register doesn't really work right either --- it's
             | possible for the host and device to disagree about what
             | irqs were acknowledged and then things will get stuck (that
             | doesn't seem to be the FreeBSD driver issue though; you can
             | work around the stuck status communications by just
             | assuming something probably happened after a few seconds,
             | or checking for descriptor progress on rx and tx on any
             | interrupt, etc). Having only a single interrupt means you
             | can't meaningfully process incoming packets and finished
             | outgoing descriptors in parallel which makes it hard to get
             | full throughput in both directions simultaneously.
             | 
             | Anyway --- if they work for you, great. I'm going to avoid
             | them where practical, and be careful with them elsewhere.
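The workaround of checking descriptor progress on both rings for every interrupt can be illustrated with a toy model. This is not code from the FreeBSD or Realtek drivers; the `Ring` class, its counters, and `on_interrupt` are hypothetical names used only to show the idea of not trusting the status register.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Ring:
    """Toy descriptor ring: the device advances hw_done, the host advances sw_seen."""
    hw_done: int = 0  # descriptors the device has completed so far
    sw_seen: int = 0  # descriptors the host has already processed

    def reap(self) -> int:
        """Process everything completed since the last call; return the count."""
        newly_done = self.hw_done - self.sw_seen
        self.sw_seen = self.hw_done
        return newly_done


def on_interrupt(status_bits: int, rx: Ring, tx: Ring) -> Tuple[int, int]:
    """Handle the single shared interrupt.

    status_bits is deliberately ignored: if host and device can disagree about
    which causes were acknowledged, looking at ring progress directly avoids
    getting stuck on a stale or missing status bit.
    """
    return rx.reap(), tx.reap()


if __name__ == "__main__":
    rx, tx = Ring(hw_done=3), Ring(hw_done=2)
    print(on_interrupt(0, rx, tx))  # status says "nothing", rings say otherwise
```

Because both rings are reaped from the same handler, rx and tx work is serialized, which is the throughput limitation the comment describes.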
        
           | jeffbee wrote:
           | The Realtek NIC embedded in my Asus motherboard hits maybe
           | 600mbps with a tailwind. It only has 1 receive queue and
           | doesn't support interrupt coalescing. Junk.
        
       | andix wrote:
       | One would expect Intel to do periodic regression tests on Linux
       | kernel releases on their hardware. A lot of Intel customers use
       | Linux, and they may switch to other vendors if Intel hardware has
       | bad Linux support.
        
         | arghwhat wrote:
         | They do, and they provide some of the most extensive regression
         | testing capabilities provided to kernel development.
         | 
         | But to be fair, this is an old NIC with a hardware bug that
         | requires an odd workaround disabling an otherwise very useful
         | performance feature. Missing that edge case is understandable.
        
       | jmclnx wrote:
       | I wonder if this is an issue with the BSDs too. Sometimes they
       | peek at Linux for these things.
        
         | wmf wrote:
         | This is a hardware bug so I would expect it to affect all OSes.
         | Turning off TSO is counterintuitive so drivers tend to turn it
         | on by default.
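For readers unfamiliar with TSO: with it enabled, the host hands the NIC one large TCP buffer and the NIC splits it into MSS-sized packets. With it disabled, that splitting happens in software. A minimal sketch of the splitting step follows, ignoring headers, checksums, and sequence numbers. The function name and the 1448-byte MSS (1500-byte MTU minus IP and TCP headers with timestamps) are assumptions for illustration.

```python
from typing import List


def segment(payload: bytes, mss: int) -> List[bytes]:
    """Split a large send buffer into chunks of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


if __name__ == "__main__":
    buf = bytes(64 * 1024)          # one 64 KiB buffer handed down by the stack
    chunks = segment(buf, 1448)     # assumed MSS, see lead-in
    print(len(chunks), len(chunks[-1]))
```

On Linux the setting can be toggled per interface with `ethtool -K <iface> tso off`.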
        
         | AdrianB1 wrote:
         | Probably not, BSD network drivers were a bit more advanced than
         | Linux ones as some companies (like Netflix) invested time and
           | effort in optimizations and monitoring. I am not saying BSD is
         | better, just a few things are quite different vs Linux.
        
           | betaby wrote:
           | > a bit more advanced than Linux ones
           | 
           | and then we look at the wifi stack in BSD(s)...
        
             | AdrianB1 wrote:
             | I don't think that companies like Netflix are interested in
             | WiFi to serve massive amounts of video :)
        
         | KyleSanderson wrote:
         | It isn't. It's a pure driver bug with all of these. There's a
         | pending patch to clear the MSI registers on disconnect that I'm
         | planning on driving to testing tomorrow. I hounded them for 4
         | months by sending full stacks with symbols and they don't have
         | any hardware in the lab (which is very common now,
         | unfortunately)...
         | 
         | If you want to recover, set an ifdown script to `rmmod igc &&
         | modprobe igc` and you will never panic or hang, just have a
         | longer interface bounce time. I'm running this on a 6 nic
         | system which is completely egregious, but it hasn't crashed /
         | hung / panic'd in the field since doing that.
        
       | fulafel wrote:
       | Interesting that in an era of 100+ Gbit Ethernet we seem to need
       | the offload features working just right to get even 1 Gbit.
        
         | arghwhat wrote:
         | Well in this case the bug is that the feature doesn't work and
         | needs to be disabled.
         | 
         | But generally speaking you want all the offload you can get.
         | For 1G, it allows pushing power efficiency much further. For
         | 100G, it allows you to spend time on something other than
         | parsing and writing packets.
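A rough back-of-the-envelope calculation shows why per-packet work matters at these speeds. The sketch below assumes full-size frames at a 1500-byte MTU with standard Ethernet overhead and no VLAN tag; the function name is hypothetical.

```python
# Per-frame overhead on the wire, in bytes:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD_BYTES = 8 + 14 + 4 + 12


def full_size_frames_per_second(link_bps: int, mtu: int = 1500) -> int:
    """Upper bound on full-size frames per second at a given line rate."""
    if link_bps <= 0 or mtu <= 0:
        raise ValueError("link_bps and mtu must be positive")
    wire_bits_per_frame = (mtu + ETH_OVERHEAD_BYTES) * 8
    return link_bps // wire_bits_per_frame


if __name__ == "__main__":
    for name, bps in (("1G", 10**9), ("100G", 10**11)):
        print(name, full_size_frames_per_second(bps))
```

Under these assumptions 1 Gbit/s is roughly 81 thousand full-size frames per second and 100 Gbit/s roughly 8.1 million, and every frame the NIC segments or checksums is one the CPU does not have to touch.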
        
       | legulere wrote:
       | That makes me wonder how much waste there is, just because there
       | is something broken without anyone noticing. Recently a customer
       | had two nearly identical systems behaving completely differently
       | with regard to performance. It turned out it was simply a BIOS
       | setting.
        
         | magicalhippo wrote:
         | > That makes me wonder how much waste there is, just because
         | there is something broken without anyone noticing.
         | 
         | I'd say a lot. Not seldom when I go over to a colleague I
         | notice their laptop is breathing hard. "Oh yeah, so annoying!
         | It's been doing that for a while now". Check Task Manager and
         | sure enough, some process is stuck sucking 100% of a core... 5
         | CPU days worth.
         | 
         | Even I find it difficult to spot this, with my 16 cores. I got
         | an efficient cooler, so no fan spins up noticeably. Randomly
         | check Task Manager, oh explorer.exe is pegging a core, and has
         | been using 15 days worth of CPU... Gee thanks!
        
       | malkia wrote:
       | Not a network guy, but couldn't this be detected (for any other
       | card too) by running some mandatory tests?
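A check of the kind asked about could, in its simplest form, compare a measured throughput against the negotiated link speed. This is only a sketch: the function name and the 90% threshold are assumptions, and the measurement itself (for example from a benchmark run between two hosts) is outside its scope.

```python
def below_expected(measured_mbps: float, link_mbps: float, min_fraction: float = 0.9) -> bool:
    """Return True if measured throughput falls short of the expected share of line rate."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    return measured_mbps < link_mbps * min_fraction


if __name__ == "__main__":
    # A 1 Gbit/s link delivering about 60% of line rate, as in the headline.
    print(below_expected(600, 1000))
    # A healthy 1 Gbit/s link.
    print(below_expected(940, 1000))
```

The hard part in practice is having the affected hardware in the test lab at all, which the sketch does not address.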
        
       | userbinator wrote:
       | _you will want to look forward to upgrading your Linux kernel
       | build soon... [...] due to a regression introduced back in 2020._
       | 
       | If anything, that message seems to be implying to _not_ upgrade
       | if everything is already working well.
       | 
       | A bit tangential, but ever since they came out with the '217 I've
       | thought this is one of the worst-named products and could never
       | remember what the actual letter is. Here it's an uppercase I, but
       | even Intel seems to think it's a lowercase L (for LAN?)
       | sometimes:
       | 
       | https://www.intel.com/content/www/us/en/support/articles/000...
        
       ___________________________________________________________________
       (page generated 2023-04-23 23:00 UTC)