[HN Gopher] Intel I219-LM Running at ~60% of Maximum Speed Due t...
___________________________________________________________________
Intel I219-LM Running at ~60% of Maximum Speed Due to Linux Driver
Bug
Author : zdw
Score : 59 points
Date : 2023-04-23 15:50 UTC (7 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| LinuxBender wrote:
| It's nice to see them giving this some attention. I hope they
| also look into the I225-V and the myriad negotiation bugs
| that change with each firmware revision on 2.5gb networks
| _and that also vary with different offloading features
| enabled/disabled, including EEE_. Each 2.5gb switch also exhibits
| different behavior which is probably not Intel's fault.
| rasz wrote:
| > is probably not Intel's fault
|
| consensus is Intel went down the shitter quality wise, just one
| example https://www.youtube.com/watch?v=DXNyHFOWx_k
| LinuxBender wrote:
| Could be. Most of the folks I interacted with in the past
| have retired. I think I am just going to revert back to full
| tower PC's for anything I do, even firewalls and appliances
| so that I am not locked into what may be abandonware. I was
| hoping to use less power and take up less space with the
| mini-PC's but in hindsight that may have been a mistake.
| mirker wrote:
| When people say you need the third hardware revision to get
| functional performance, and it still doesn't work, you should
| conclude it's poorly designed. The fact that these chips were
| being sold with old hardware revisions also seems anti-
| consumer.
| rubatuga wrote:
| Agreed, Intel gave up and released the I226 because their
| I225 brand was so tarnished. I removed the I225 from my
| server after I noticed the driver crashing during heavy loads
| (Ubuntu 22.04). For USB I've been having success with the
| rtl8156bg from juplink.
| snvzz wrote:
| I got tired of fighting that NIC, and got a Realtek 2.5g one.
|
| All my problems gone, life is good.
| LinuxBender wrote:
| I would totally do that but I have a few mini-PC's that have
| the I225-V and I can't swap them out. I've used
| USB3.2->Ethernet dongles as workarounds until I could find a
| switch that was happy with it. Same with the latest Protectli
| firewalls, they also have the I225-V. For those I have to use
| ethtool to force a specific speed which is not ideal. I'm
| thinking mini-PC's are maybe a mistake in this generation of
| NICs.
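
A minimal sketch of the kind of ethtool workaround described above. The interface name `enp1s0` and the choice of 1000 Mb/s are assumptions, not values from the thread, and the helper names `run` and `pin_speed` and the `DRY_RUN` switch are hypothetical. Autonegotiation stays on with a single requested speed, which ethtool generally turns into a narrowed advertisement; whether that settles a particular I225-V and switch pairing is not something this sketch can promise.

```shell
#!/bin/sh
# Sketch: pin a NIC to a single negotiated speed with ethtool.
# Assumption: DRY_RUN=1 prints the command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pin_speed IFACE MBPS
# Requests one speed at full duplex while leaving autonegotiation on.
pin_speed() {
    iface=$1
    mbps=$2
    case "$mbps" in
        10|100|1000|2500) ;;
        *) echo "unsupported speed: $mbps" >&2; return 1 ;;
    esac
    run ethtool -s "$iface" speed "$mbps" duplex full autoneg on
}
```

Run as root, `pin_speed enp1s0 1000` would apply the setting; with `DRY_RUN=1` it only prints the ethtool command. The setting does not survive a reboot unless the distribution's network configuration reapplies it.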
| userbinator wrote:
| I've had good experience with Realtek NICs too, not 2.5g yet
| but their 10/100/1000 models. I suspect that's because they
| use a far simpler design with not much in the way of fancy
| features[1]. They also tend to benchmark slower with higher
| CPU usage, but I'll take slightly slower but stable over fast
| but intermittent connectivity.
|
| [1] Ironically, in this case the Intel has to have a fancy
| feature, intended for increasing performance, disabled in
| order to increase performance.
| cduzz wrote:
| This is the classic Sutherland "wheel of reincarnation"
| cycle -- a function (commonly graphics acceleration or
| network connectivity) will be provided by the CPU, which is
| found to be a bottleneck so the function is offloaded to a
| dedicated device, which then becomes a bottleneck as the
| CPU receives more dedicated focus on performance
| improvement than some bespoke, single-purpose offload
| engine.
| toast0 wrote:
| I've heard good things about Realtek's 2.5G nics, and bad
| things about Intel's; I've seen speculation that the underlying
| cause is cross-compatibility between the two 2.5g specs
| that were merged into the standard, and that most 2.5g
| switches use chips from companies that backed the spec
| Intel did not back.
|
| For 1G realteks, I've had ok luck with them, but I do see
| some issues. I recall having some problem with them while
| running the Linux tree drivers, but I've since switched to
| all FreeBSD. With my current realtek NICs and FreeBSD 13.1,
| I have the choice of the kernel driver where sometimes the
| NIC will stop processing packets, and the NIC acknowledges
| reset, but doesn't actually immediately reset and sometimes
| processes old packets after resetting, resulting in wild
| writes and bad behavior. There's a vendor driver, which
| doesn't seem to get into that bad state, but it does enable
| ethernet PAUSE frames which are an abomination. The vendor
| driver is full of undocumented magic values and there's no
| public documentation on the NICs at all, so figuring out
| what to frob to make the NIC not get stuck, but not send
| out PAUSE frames would be an exercise in frustration, that
| I'm not willing to do.
|
| Either way, the interrupt design is deficient: there's a
| shared interrupt for rx, tx and administration, and the
| status register doesn't really work right either --- it's
| possible for the host and device to disagree about what
| irqs were acknowledged and then things will get stuck (that
| doesn't seem to be the FreeBSD driver issue though; you can
| work around the stuck status communications by just
| assuming something probably happened after a few seconds,
| or checking for descriptor progress on rx and tx on any
| interrupt, etc). Having only a single interrupt means you
| can't meaningfully process incoming packets and finished
| outgoing descriptors in parallel which makes it hard to get
| full throughput in both directions simultaneously.
|
| Anyway --- if they work for you, great. I'm going to avoid
| them where practical, and be careful with them elsewhere.
| jeffbee wrote:
| The Realtek NIC embedded in my Asus motherboard hits maybe
| 600mbps with a tailwind. It only has 1 receive queue and
| doesn't support interrupt coalescing. Junk.
| andix wrote:
| One would expect Intel to do periodic regression tests on Linux
| kernel releases on their hardware. A lot of Intel customers use
| Linux, and they may switch to other vendors if Intel hardware has
| bad Linux support.
| arghwhat wrote:
| They do, and they provide some of the most extensive regression
| testing capabilities provided to kernel development.
|
| But to be fair, this is an old NIC with a hardware bug that
| requires an odd workaround disabling an otherwise very useful
| performance feature. Missing that edge case is understandable.
| jmclnx wrote:
| I wonder if this is an issue with the BSDs too. Sometimes they
| peek at Linux for these things.
| wmf wrote:
| This is a hardware bug so I would expect it to affect all OSes.
| Turning off TSO is counterintuitive so drivers tend to turn it
| on by default.
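
For anyone wanting to try the workaround by hand before a fixed kernel arrives, a rough sketch follows. It assumes the offload in question is TCP segmentation offload as reported by `ethtool -k`; the function names `tso_state` and `disable_tso` are hypothetical, and the thread itself gives no command.

```shell
#!/bin/sh
# Sketch: report and, if needed, switch off TCP segmentation offload.

# tso_state: read `ethtool -k IFACE` output on stdin, print "on" or "off".
tso_state() {
    awk -F': ' '$1 == "tcp-segmentation-offload" {
        split($2, word, " ")   # drop a trailing "[fixed]" marker if present
        print word[1]
        exit
    }'
}

# disable_tso IFACE: turn TSO off only when it is currently on.
disable_tso() {
    iface=$1
    state=$(ethtool -k "$iface" | tso_state)
    if [ "$state" = "on" ]; then
        ethtool -K "$iface" tso off
    fi
}
```

Calling `disable_tso enp1s0` as root (the interface name is an assumption) changes the setting until the next reboot or driver reload. If the feature is marked `[fixed]`, the `ethtool -K` call will be refused by the driver.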
| AdrianB1 wrote:
| Probably not, BSD network drivers were a bit more advanced than
| Linux ones as some companies (like Netflix) invested time and
| effort in optimizations and monitoring. I am not saying BSD is
| better, just a few things are quite different vs Linux.
| betaby wrote:
| > a bit more advanced than Linux ones
|
| and then we look at the wifi stack in BSD(s)...
| AdrianB1 wrote:
| I don't think that companies like Netflix are interested in
| WiFi to serve massive amounts of video :)
| KyleSanderson wrote:
| It isn't. It's a pure driver bug with all of these. There's a
| pending patch to clear the MSI registers on disconnect that I'm
| planning on driving to testing tomorrow. I hounded them for 4
| months by sending full stacks with symbols and they don't have
| any hardware in the lab (which is very common now,
| unfortunately)...
|
| If you want to recover, set an ifdown script to `rmmod igc &&
| modprobe igc` and you will never panic or hang, just have a
| longer interface bounce time. I'm running this on a 6 nic
| system which is completely egregious, but it hasn't crashed /
| hung / panic'd in the field since doing that.
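
One way the ifdown hook above might look, as a sketch only. It assumes a Debian-style ifupdown setup where hook scripts receive the interface name in `IFACE`, and a hypothetical install path such as `/etc/network/if-post-down.d/reload-igc`. `TARGET_IFACE`, `MODULE`, `run`, `reload_module`, and `DRY_RUN` are illustrative names, not anything from the comment.

```shell
#!/bin/sh
# Sketch: reload the NIC driver after a given interface goes down.
# Assumption: ifupdown exports IFACE to hook scripts.

TARGET_IFACE="${TARGET_IFACE:-enp2s0}"   # hypothetical interface name
MODULE="${MODULE:-igc}"                  # driver named in the comment above

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reload_module NAME: unload the module, then load it again.
reload_module() {
    mod=$1
    run rmmod "$mod" && run modprobe "$mod"
}

# Act only when the hook fires for the interface we care about.
if [ "${IFACE:-}" = "$TARGET_IFACE" ]; then
    reload_module "$MODULE"
fi
```

Reloading the module bounces every interface bound to that driver, so on a multi-NIC system this trades a longer interruption for the recovery described above.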
| fulafel wrote:
| Interesting that in an era of 100+ Gbit Ethernet we seem to need
| the offload features working just right to get even 1 Gbit.
| arghwhat wrote:
| Well in this case the bug is that the feature doesn't work and
| needs to be disabled.
|
| But generally speaking you want all the offload you can get.
| For 1G, it allows pushing power efficiency much further. For
| 100G, it allows you to spend time on something other than
| parsing and writing packets.
| legulere wrote:
| That makes me wonder how much waste there is, just because there
| is something broken without anyone noticing. Recently a customer
| had two nearly identical systems behaving completely differently
| with regard to performance. It turned out it was simply a BIOS
| setting.
| magicalhippo wrote:
| > That makes me wonder how much waste there is, just because
| there is something broken without anyone noticing.
|
| I'd say a lot. Not seldom when I go over to a colleague I
| notice their laptop is breathing hard. "Oh yeah, so annoying!
| It's been doing that for a while now". Check Task Manager and
| sure enough, some process is stuck sucking 100% of a core... 5
| CPU days worth.
|
| Even I find it difficult to spot this, with my 16 cores. I got
| an efficient cooler, so no fan spins up noticeably. Randomly
| check Task Manager, oh explorer.exe is pegging a core, and has
| been using 15 days worth of CPU... Gee thanks!
| malkia wrote:
| Not a network guy, but couldn't this be detected (for any other
| card too) by running some mandatory tests?
| userbinator wrote:
| _you will want to look forward to upgrading your Linux kernel
| build soon... [...] due to a regression introduced back in 2020._
|
| If anything, that message seems to be implying to _not_ upgrade
| if everything is already working well.
|
| A bit tangential, but ever since they came out with the '217 I've
| thought this is one of the worst-named products and could never
| remember what the actual letter is. Here it's an uppercase I, but
| even Intel seems to think it's a lowercase L (for LAN?)
| sometimes:
|
| https://www.intel.com/content/www/us/en/support/articles/000...
___________________________________________________________________
(page generated 2023-04-23 23:00 UTC)