[HN Gopher] Reverse engineering Dell iDRAC to get rid of GPU thr...
___________________________________________________________________
Reverse engineering Dell iDRAC to get rid of GPU throttling
Author : f_devd
Score : 160 points
Date : 2023-05-10 17:29 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| amir734jj wrote:
| I'm dealing with something similar. I wanted to use Redfish to
| clear out hard drives, but storage is not standardized across
| vendors. Dell has a secure erase, HPE Gen10 has Smart Storage,
| and anything older doesn't expose any useful functionality in
| its Redfish API. What a mess. So I need to use PXE booting and
| probably WinPE to do this.
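|
| For what it's worth, the Redfish schema does define a standard
| SecureErase action on Drive resources where vendors implement
| it; a rough sketch against an iDRAC (the controller and drive
| IDs below are placeholders, and support varies by drive):
|
|     curl -k -u root:calvin -X POST \
|       -H "Content-Type: application/json" -d '{}' \
|       "https://<idrac-ip>/redfish/v1/Systems/System.Embedded.1/Storage/<controller>/Drives/<drive-id>/Actions/Drive.SecureErase"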
| Terminal135 wrote:
| The repo claims that the servers themselves throttle the GPUs,
| but isn't it the GPUs themselves that can throttle, or maybe
| the OS? Neither of those is controlled by the server
| (hopefully), so is there a different system at play here?
| csdvrx wrote:
| No, that's controlled by the server: try lspci -vv on any
| Linux system and look at the link speed and width, e.g.
| LnkSta: Speed 8GT/s, Width x2 (x2 means 2 lanes).
|
| Try:
|
| `sudo lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\\.[0-9a-f]|downgrad" | grep -B1 downgrad`
|
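| On a machine where a link is downgraded, that prints something
| like this (illustrative device and values):
|
|     01:00.0 VGA compatible controller: NVIDIA Corporation ...
|             LnkSta: Speed 8GT/s (downgraded), Width x8 (downgraded)
|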
| Besides the speed, you can have another problem: lane
| limitations.
|
| For example, AMD CPUs have a lot of lanes, but unless you have
| an EPYC, most of them are not exposed, so the PCH tries to
| spread its meager set among the devices connected to your PCI
| bus; if you have a x16 GPU but also a WiFi adapter, a WWAN
| card and a few identical NVMe drives, you may find that only
| one of the NVMe drives benchmarks at the throughput you expect.
| ilyt wrote:
| > For example, AMD CPUs have a lot of lanes, but unless you
| have an EPYC, most of them are not exposed, so the PCH tries
| to spread its meager set among the devices connected to your
| PCI bus; if you have a x16 GPU but also a WiFi adapter, a
| WWAN card and a few identical NVMe drives, you may find that
| only one of the NVMe drives benchmarks at the throughput you
| expect.
|
| Example from my X670E board:
|
| * first NVMe slot = x4 Gen 5
|
| * second = x4 Gen 4
|
| * 2 USB ports connected to the CPU (10/5 Gbit)
|
| and EVERYTHING ELSE goes through an x4 Gen 4 PCIe link,
| including 3 additional NVMe slots, 7 SATA ports, a bunch of
| USB ports, a few x1 PCIe slots, networking, etc.
| toast0 wrote:
| > For example, AMD CPUs have a lot of lanes, but unless you
| have an EPYC, most of them are not exposed, so the PCH tries
| to spread its meager set among the devices connected to your
| PCI bus; if you have a x16 GPU but also a WiFi adapter, a
| WWAN card and a few identical NVMe drives, you may find that
| only one of the NVMe drives benchmarks at the throughput you
| expect.
|
| Most AM4 boards put an x16 slot direct to the CPU, and an x4
| direct linked NVMe slot. That's 20 of the 24 lanes; the other
| 4 lanes go to the chipset, which all the rest of the
| peripherals are behind. (There's some USB and other I/O from
| the cpu, too). AM5 CPUs added another 4 lanes, which is
| usually a second cpu x4 slot.
|
| Early AM4 boards might not have a cpu x4 NVMe slot, and those
| 4 cpu lanes might not be exposed, and the a300/x300
| chipsetless boards don't tend to expose everything, but where
| else are you seeing AMD boards where all the CPU lanes aren't
| exposed?
| ilyt wrote:
| > Most AM4 boards put an x16 slot direct to the CPU, and an
| x4 direct linked NVMe slot. That's 20 of the 24 lanes; the
| other 4 lanes go to the chipset, which all the rest of the
| peripherals are behind. (There's some USB and other I/O
| from the cpu, too). AM5 CPUs added another 4 lanes, which
| is usually a second cpu x4 slot.
|
| Mine just go to a second NVMe, weirdly enough.
| toast0 wrote:
| I meant to say the additional x4 is usually a second cpu
| x4 [NVMe] slot. Not a pci-e x4 slot.
| csdvrx wrote:
| > Early AM4 boards might not have a cpu x4 NVMe slot, and
| those 4 cpu lanes might not be exposed, and the a300/x300
| chipsetless boards don't tend to expose everything
|
| I'm sorry, I oversimplified and said "most of them" when I
| should have said "not all of them", as 20/24 is more accurate
| for B550 chipsets (the most common for AM4), instead of trying
| to generalize.
|
| Your explanation is more correct than mine.
|
| For anyone who might want extra details about the number of
| lanes per CPU, https://pcguide101.com/motherboard/how-many-
| pcie-lanes-does-... is a good read that shows the
| difference for APUs.
| toast0 wrote:
| I'm still not quite sure what you're trying to say?
|
| Lanes behind the chipset are multiplexed, and you can't
| get more than x4 throughput through the chipset (and the
| link speed between the cpu and the chipset varies
| depending on the chipset and cpu). But that's not a
| problem of the CPU lanes not being exposed, it's a
| problem of "not enough lanes" or more likely, lanes not
| arranged how you'd like. On AM4, if your GPU uses x16,
| and one NVMe uses x4, then everything else is going to be
| squeezed through the chipset. On AM5, you usually get two
| x4 NVMe slots, but again everything else is squeezed
| through the chipset; x670 is particularly constrained
| because it just puts a second chipset downstream of the
| first chipset, so you're just adding more stuff to
| squeeze through the same x4 link to the CPU.
|
| Personally, I found that link to be more confusing than
| just reading through the descriptions on wikipedia for a
| particular Zen version. For example
| https://en.wikipedia.org/wiki/Zen_3 ... just text search
| in the page for "lanes" and it explains for all the
| flavors of chips how many lanes, and how many go to the
| chipset. Similarly the page for AMD chipsets is pretty
| succinct https://en.wikipedia.org/wiki/List_of_AMD_chipse
| ts#AM5_chips...
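|
| A quick way to see this on your own machine: `lspci -tv`
| prints the PCIe topology as a tree, so everything that hangs
| off the chipset's upstream port (and therefore shares its
| single x4 link to the CPU) shows up under one branch.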
| formerly_proven wrote:
| There's a reason why so many motherboard makers avoid
| putting a block diagram in their manuals and go for
| paragraphs of legalese instead, and laziness is only half
| of it.
| formerly_proven wrote:
| PCIe devices can only draw a limited wattage until the host
| clears them for higher power. There is also a separate power
| brake mechanism (optional part of PCIe) mentioned in the
| article, which has been proposed by nVidia for PCIe so it seems
| likely their GPUs support it.
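|
| You can see the power limit the host advertises for a slot in
| the slot capabilities that lspci reports, roughly like this
| (illustrative output, trimmed):
|
|     sudo lspci -vv | grep -i powerlimit
|             SltCap: ... Slot #1, PowerLimit 75.000W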
| f_devd wrote:
| I can actually answer this (it's how I stumbled onto the
| repo): it's done through a signal from the motherboard called
| PWRBRK (Power Brake), pin 30 on the PCIe connector. It tells
| the PCIe device to maintain a low-power mode; in the case of
| Nvidia GPUs that's about 50W (300MHz out of 2100MHz in my
| case).
|
| You can check if it's active using `nvidia-smi -q | grep
| Slowdown`, as shown in the post.
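|
| On an affected machine, the throttle-reason section looks
| something like this (illustrative output):
|
|     $ nvidia-smi -q | grep Slowdown
|         HW Slowdown                       : Active
|             HW Thermal Slowdown           : Not Active
|             HW Power Brake Slowdown       : Active
|         SW Thermal Slowdown               : Not Active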
| somat wrote:
| BMCs in general leave me uneasy.
|
| I like the idea: a small computer that is used to monitor and
| control your big computer. But I hate the implementation. Why
| are they all super-secret special firmware blobs? Why can't I
| just install my Linux of choice and run the manufacturer's
| software? That would still suck, but not as badly as the
| full-stack nonsense they foist on you at this point.
| WaitWaitWha wrote:
| I was responsible for poking at BMC security in data centers.
|
| BMCs lack security fundamentals and often behave like cheap
| IoT knock-offs. They often use outdated kernels, libraries,
| and security mechanisms.
| jdwithit wrote:
| Yeah, they are utter garbage. For _years_ you had to use Java
| 6, with absolutely every modern security measure turned off in
| both the JVM runtime itself and your browser, to access Dell
| DRACs. Accept expired certs, run unsigned code, I'm sure this
| is all fine...
|
| I mostly work in the cloud now but when I last had to manage
| a bunch of physical machines we had a physically separate
| network accessed via its own VPN to get onto the BMCs.
| Because yeah, the security situation was a joke.
| hsbauauvhabzb wrote:
| I found that if you leave the BMC port unplugged on a
| Supermicro, it'll conveniently fail over to whatever other
| Ethernet port is plugged in, meaning an outage of your
| management network may roll the BMC over to another network
| unintentionally.
|
| I'd put money on there being pre-auth vulnerabilities in those
| things, judging by the engineering quality.
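|
| For what it's worth, that failover behavior is reportedly just
| the default "LAN interface" setting; on many Supermicro boards
| it can supposedly be pinned to the dedicated port with an OEM
| IPMI raw command (treat the exact bytes as a sketch; they vary
| by board generation):
|
|     ipmitool raw 0x30 0x70 0x0c 0      # query current mode
|     ipmitool raw 0x30 0x70 0x0c 1 0    # 0=dedicated, 1=shared, 2=failover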
| kabdib wrote:
| Ran a fleet of servers with terrible BMCs. We kept those on a
| well-sealed-off private network.
|
| Woe betide you if you run into one of the BMC implementations
| that shares a host network interface; no separate cable!
| These things are terrible from a security standpoint.
| septune wrote:
| I hope you disable OS passthrough, because it could get gory.
| [deleted]
| scifi wrote:
| To add to this, manufacturers (HPE) now require an expensive
| annual fee for customers to fully use the hardware.
| yjftsjthsd-h wrote:
| Actually, why aren't they literally normal little computers?
| Like, fully open, bring your own OS computers. All it needs is
| some peripherals - Ethernet, its own host USB, gadget mode USB
| to present mouse/kb/storage to the main computer, video capture
| card, some GPIO to control power - but there's nothing all that
| special there; then you just install Debian or w/e and control
| the perfectly standard interfaces at will.
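|
| To illustrate how ordinary that part is: on any board with a
| USB device controller, presenting a disk image to the host is
| a few lines against the standard Linux libcomposite configfs
| API. A rough sketch (gadget name and image path are made up):
|
|     modprobe libcomposite
|     cd /sys/kernel/config/usb_gadget && mkdir g1 && cd g1
|     echo 0x1d6b > idVendor; echo 0x0104 > idProduct
|     mkdir -p configs/c.1 functions/mass_storage.usb0
|     echo /srv/boot.img > functions/mass_storage.usb0/lun.0/file
|     ln -s functions/mass_storage.usb0 configs/c.1/
|     ls /sys/class/udc > UDC   # bind to the device controller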
| wmf wrote:
| The interface between the BMC and the motherboard is unique to
| each motherboard, especially for "added value" features that
| some servers have. DC-SCM is an effort to standardize this,
| but I don't know how interoperable it will be.
| [deleted]
| donalhunt wrote:
| I was responsible for trying to improve the management and
| operation of a large fleet of BMCs for a while. Plenty of
| bugs, and the pace of releases is slow. :(
|
| Definitely an area where a more open ecosystem would improve
| the pace of innovation.
| chasil wrote:
| Sometimes the bugs end up at a security conference.
|
| https://airbus-seclab.github.io/ilo/BHUSA2021-Slides-
| hpe_ilo...
| hinkley wrote:
| We need a Linux for BMCs. Oxide is working on one, but I'd
| like to see a contender fielded from the seL4 community,
| along with some other folks. For example, why doesn't Wind
| River have one already?
| NexRebular wrote:
| Why linux? Why not *BSD?
| gaius_baltar wrote:
| > Why linux? Why not *BSD?
|
| GPL can force manufacturers to cooperate with users. Of
| course, they can still use closed source binary modules
| and userland programs ...
| NexRebular wrote:
| GPL can also turn manufacturers away. I would rather have
| variation in the possible BMC operating systems instead
| of sticking linux everywhere and contributing to a
| monoculture.
| ilyt wrote:
| ...which you wouldn't get with BSD, as they'd just close it
| down and ship you a binary, since they'd have no obligation or
| reason to do otherwise.
|
| The BSD "freedom" is not for the user of the software, it's
| for corporations to take.
| NexRebular wrote:
| And how would this affect the original open BSD-based BMC
| firmware which would be available? If the corporations
| want to maintain their own fork, it's on them.
|
| The original BSD release will stay open and free for
| everyone to utilize.
| IntelMiner wrote:
| How can you trust that a closed source OpenBSD fork is
| secure when there's no way to audit the quality ( _or
| lack thereof_ ) in the firmware the vendor gives you?
|
| If it's GPL you can at least interrogate the code release
| and make an informed decision
| wmf wrote:
| Software diversity is a luxury customers won't pay for.
| taneq wrote:
| That sure sounds appealing to manufacturers.
| wmf wrote:
| OpenBMC is the leader at this point.
| theideaofcoffee wrote:
| They're special firmware blobs because generally the OEMs
| aren't building their BMCs from scratch as they might with
| their main boards and other components. They're generally
| getting the BMC SoC from the likes of ASPEED and others, who
| are the ones keeping them closed up. I've tried to get the
| magic binaries and source for various projects but have given
| up because there are so many layers of NDAs and sales
| gatekeepers. I'm not entirely sure who makes the Dell BMCs,
| but I know Supermicro bundles ASPEED (at least they did with
| older generations of their main boards).
|
| I agree with you that you should be able to run whatever,
| since in the end it's just another computer, but the
| manufacturers believe otherwise since there's "valuable IP" or
| whatever nonsense (insert rolleyes emoji here).
|
| There are open specs like Redfish, but they still don't get to
| the heart of the matter.
| Y_Y wrote:
| It may not be "enterprise" enough for a given employer, but
| it's not hard to replicate most of this functionality with
| cheap (and open) hardware. For example, I had a case that
| called for a BMC that I resolved with a spare RasPi 3B, a very
| cheap capture device, and a "smart plug". Total cost of
| materials was about 30 euro, and (for me) it wasn't any harder
| to operate than an iDRAC.
| ilyt wrote:
| It would also be sooo much easier to automate everything if it
| just ran a slightly customized Debian install instead of...
| whatever the fuck abomination the manufacturer made.
| [deleted]
| bubblethink wrote:
| Dell is peak asshole design. They also blast fans at full
| speed if you install GPUs that you didn't buy from them. Fuck
| them.
| ryanjshaw wrote:
| They also stop producing laptop batteries after a few years
| while refusing to let the laptop charge 3rd party battery
| replacements, significantly limiting the useful life of their
| laptops.
| ComputerGuru wrote:
| You can actually call a small-business rep and get them to
| order one for you. They still produce them, or at least have
| them in stock; they just don't sell them online.
| flykespice wrote:
| Isn't that illegal? Hijacking a customer's PC because they
| installed something that isn't from Dell?
| Maxburn wrote:
| Nominally it's done for your protection. Lowering power
| (clocks) and heat load (fast fans) for unapproved gear
| prevents things from going dead and getting people REALLY mad,
| and likely reduces warranty claims.
| causi wrote:
| Don't forget how Dell included/includes DRM in their laptop
| chargers to prevent customers from buying cheap aftermarket
| replacements. Of course the wire for the DRM functionality is
| as thin as possible and is always the first thing to break.
| aeadio wrote:
| They don't prevent you from using aftermarket chargers. They
| just display a warning, which can be disabled in the BIOS.
| martijnvds wrote:
| They also set CPU speed to minimum.
| josephh wrote:
| In case you're referring to their servers (from
| https://dl.dell.com/manuals/common/poweredge_pcie_cooling.pd...
| ):
|
| > The automatic system cooling response for third-party cards
| provisions airflow based on common industry PCIe requirements,
| regulating the inlet air to the card to a maximum of 55°C.
| The algorithm also approximates airflow in linear foot per
| minute (LFM) for the card based on card power delivery
| expectations for the slot (not actual card power consumption)
| and sets fan speeds to meet that LFM expectation. Since the
| airflow delivery is based on limited information from the
| third-party card, it is possible that this estimated airflow
| delivery may result in overcooling or undercooling of the card.
| Therefore, Dell EMC provides airflow customization for third-
| party PCIe adapters installed in PowerEdge platforms.
|
| You need to use their RACADM interface to update the minimum
| LFM for your card.
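|
| A sketch of what that looks like with racadm (the slot index
| and LFM value here are placeholders, and attribute names may
| vary by iDRAC generation):
|
|     racadm get system.pcieslotlfm.1
|     racadm set system.pcieslotlfm.1.lfmmode Custom
|     racadm set system.pcieslotlfm.1.customlfm 300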
| somehnguy wrote:
| Restrictive nonsense seems common in the server space,
| unfortunately. HPE does similar things. IIRC they disabled
| certain features if you used non-HPE "approved" hard drives.
| formerly_proven wrote:
| Are there even any good server vendors? Dell, HPE, and Lenovo
| do their lock-in shit. Supermicro's BMC is pretty bad. xFusion
| is totally-not-Huawei-I-pwomise. There are a few more that
| come to mind, but all of them serve niches like HPC and don't
| really do sales on a small scale.
| bg46z wrote:
| Dell also does this with their EMC storage arrays; it's meant
| to push you towards their pro services. You are supposed to
| have the array order drives for you from pro services, and
| someone from some nameless MSP contracted with Dell installs
| them for you at a 10x markup.
| csdvrx wrote:
| Other manufacturers do worse and prevent boot if your PCI IDs
| aren't on an allowlist.
|
| This is, for example, present on ThinkPads, and while you
| could patch the BIOS before, Intel Boot Guard now prevents you
| from doing that "for your own protection" :)
|
| I hope the MSI leak contains actual Boot Guard keys for Intel
| 11th gen+ and can be used to allow "unauthorized" PCI modules
| on modern ThinkPads!
| Avery3R wrote:
| Boot Guard keys are OEM-specific.
| csdvrx wrote:
| Comments on the MSI leak, like https://sizeof.cat/post/leak-
| intel-private-keys-msi-firmware..., mention that the Boot
| Guard keys may be common to other manufacturers: _"It is
| assumed that the Boot Guard keys are not limited to
| compromising MSI products and can also be used to attack
| equipment from other manufacturers using Intel's 11th, 12th
| and 13th generation processors (for example, Intel, Lenovo and
| Supermicro boards are mentioned)"_
| mjg59 wrote:
| Those are board that MSI OEMed for other vendors.
| csdvrx wrote:
| So servers and desktops only, which crushes my hope of
| getting rid of Boot Guard on my laptop.
| undersuit wrote:
| I'd love to buy an AMD Ryzen 3 Pro 5350G, but I don't want to
| deal with the stupid vendor-locked CPUs floating around from
| Lenovo pulls.
| wokkel wrote:
| What worked for some older Think machines (I haven't tried it
| on my ThinkPad though) is to update the BIOS (it can be to the
| same version) and change the serial number to all zeroes (the
| update script asks if you want that). That got rid of the WiFi
| whitelist I encountered.
| csdvrx wrote:
| Very interesting!
|
| Could you please explain which serial? Can you run dmidecode
| and tell me which handle/UUID is all zeroes?
|
| Even if it may not directly apply to current ThinkPads, it
| implies the UEFI module might have other conditionals before
| going on to check the allowlist - something that should be
| easy to check by reversing the LenovoWmaPolicyDxe.sct PE32.
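|
| (For reference, the serials in question are the ones dmidecode
| reads out of the SMBIOS tables; the output below is
| illustrative, with a made-up serial:)
|
|     sudo dmidecode -t system | grep -i serial
|             Serial Number: PF0ABCDE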
| burnte wrote:
| HP is way worse than Dell. Fans at full speed I can handle.
| Servers permanently throwing errors because a part isn't HP
| branded - that's peak asshole design. So is refusing to report
| the status of drives if I use a third-party drive sled (which
| is some folded metal and some LEDs on a flex PCB) AND throwing
| errors about them.
| BizarroLand wrote:
| Lenovo soft-bricks their workstations if they detect
| non-branded cards in them.
|
| No upgrading my WiFi, that's a no-no!
| throwway120385 wrote:
| The WiFi thing is slightly understandable, because the FCC
| requires you to limit your radiated emissions, and when you do
| the certification you have to control the entire configuration
| to pass the testing, which means that your radio is paired
| with your antenna cable and antenna in the body of the device.
| Allowing people to replace the radio without a paired antenna
| cable and antenna could cause radiated emissions to fall
| outside of the spectrum allowed by the FCC. It's dumb in
| practice but at least somewhat understandable in principle.
| pxmpxm wrote:
| Hmm, I've got an HP Z6 that doesn't seem to care?
|
| Are you sure this isn't a "1U rack will fry itself if you put
| a space heater type of thing inside of it" situation?
| burnte wrote:
| 5U system, an ML350 Gen9. I ignore the errors, but of course
| that means that if a real error pops up there, I won't know.
| It's a lower-urgency server of my own so it's OK, but it would
| be just as annoying in production.
|
| I see the Z6 is a workstation unit; they're going to be more
| flexible there.
| JLCarveth wrote:
| > E.g. it is well known that adding a third-party PCIe NIC
| makes fans run at the maximum speed.
|
| From the article
| otherjason wrote:
| I observed this behavior in a Dell system about a decade ago,
| but based on experience over the last 5 years or so with
| PowerEdge servers, installing a third-party GPU no longer
| triggers the (extremely loud) maximum fan-speed response.
| BizarroLand wrote:
| Even if that's the case, the damage is done.
|
| I still tell people about the capacitor plague and how we
| should have class-actioned Dell out of existence over it.
___________________________________________________________________
(page generated 2023-05-10 23:00 UTC)