[HN Gopher] Hunting a bug in the i40e Intel driver
___________________________________________________________________
Hunting a bug in the i40e Intel driver
Author : todsacerdoti
Score : 40 points
Date : 2021-07-29 21:31 UTC (1 hour ago)
(HTM) web link (blog.cri.epita.fr)
(TXT) w3m dump (blog.cri.epita.fr)
| rzezeski wrote:
| > During those tests, we noticed the machines were randomly
| freezing after some time, so we decided to upgrade the firmware
| of the network cards,
|
| Reminds me of the various i40e Tx freezes I debugged while at
| Joyent. Granted, this is the illumos driver, not Intel's, but
| basically there were issues with the programming guide that I had
| to figure out the hard way. The 700-series controllers have not
| been the easiest to work with.
|
| https://smartos.org/bugview/OS-7492 [Tx freeze when b_cont chain
| exceeds 8 descriptors]
|
| https://smartos.org/bugview/OS-7457 [i40e Tx freezes on zero
| descriptors]
| nn3 wrote:
| Just to save you a somewhat pointless read, they didn't really
| debug anything but just found the right forum to ask.
| AceJohnny2 wrote:
| Not entirely pointless, they did provide some useful tips (I
| wasn't aware of BCC), but yeah the story ends with them not
| resolving the issue and just using a different version of the
| driver that doesn't have the bug.
| kbenson wrote:
| They debugged the system, not the driver. The way they did that
| was to identify and confirm it was the driver that caused the
| problem and in what circumstances, so they could report it to
| the people responsible for actually dealing with that.
|
| That's still a form of debugging. It's all a matter of
| perspective. If you had a hardware device that you were
| interacting directly with in an application, and you found that
| if you utilized it in a specific way it crashed, so you changed
| how the application used it so it wouldn't crash, that would be
| debugging the application, even if not really debugging the
| hardware.
| MauranKilom wrote:
| As a counterpoint, I found the journey interesting and learned
| a lot about various tools on the way. The only caveat is that they
| didn't end up pinpointing the error - understandable, given
| that they are not paid to fix bugs in Intel code and that Intel
| had already fixed the bug in a newer version anyway.
___________________________________________________________________
(page generated 2021-07-29 23:00 UTC)