[HN Gopher] Modern CPUs have a backstage cast
___________________________________________________________________
Modern CPUs have a backstage cast
Author : hlandau
Score : 122 points
Date : 2023-05-30 17:16 UTC (5 hours ago)
(HTM) web link (www.devever.net)
(TXT) w3m dump (www.devever.net)
| chasil wrote:
| "...this is interesting is because POWER9 is basically the first
| time the public got a real view of how sophisticated the
| backstage cast actually is of a modern server CPU."
|
| Not quite correct; the OpenSPARC T1 and T2 were publicly released
| and available by 2008.
|
| https://www.oracle.com/servers/technologies/opensparc.html
|
| "Large parts of this process are handled by vendor-supplied
| mystery firmware blobs, which may as well be boxes with '???'
| written in them."
|
| The maintainers of the me_cleaner script likely have the clearest
| view of what is known.
|
| https://github.com/corna/me_cleaner
| hlandau wrote:
| >Not quite correct; the OpenSPARC T1 and T2 were publicly
| released and available by 2008.
|
| Points for mentioning this! But things have come a long way
| since 2008. You can get Intel ME-less machines from the 2008
| era. Not sure if OpenSPARC T2 has any management cores.
|
| >The maintainers of the me_cleaner script likely have the
| clearest view of what is known.
|
| Yep, absolutely. Much of what we know is thanks to the efforts
| of researchers like these. See also the talks on finding the
| 'Red Unlock' mode of modern Intel CPUs.
| jakeogh wrote:
| Microcode access on Atom
| https://www.youtube.com/watch?v=5Pq1FmxS6H8
| kccqzy wrote:
| > It's responsible for initialising the chip and getting it out
| of bed enough to the point where at least one of the main cores
| can run using cache-as-RAM mode
|
| The somewhat surprising but true implication is that on boot, the
| CPU is initialized before the RAM is initialized. So there is a
| window of time during boot when the main core on the CPU is
| running instructions that cannot access the RAM. Even on
| register-starved x86 it is possible to write code without using
| RAM, but it certainly seems more convenient to me to treat the
| cache as RAM.
|
| Documentation for a special compiler that compiles to code that
| doesn't use RAM:
| https://github.com/wt/coreboot/blob/master/util/romcc/romcc....
| notacoward wrote:
| I got some exposure to this at SiCortex, where we had our own
| MIPS-based processors and so had to do many of these things in
| software. There was one ColdFire (embedded 68K) processor
| running uClinux per board, plus 27 of our own. This "Module
| Service Processor" would boot first, fetch a boot image from
| the one-per-system "System Service Processor" (pretty generic
| PC), load that _via JTAG_ into each node's cache, then finally
| set each one loose to do things like memory registration and
| interconnect setup. This all set the stage for the actual Linux
| boot, which itself involved two stages with a switch_root in
| between. My very first assignment was to work on some of that
| MSP-to-node stuff, then later I had to dive into memory
| registration at least twice, even though both were pretty far
| from my real specialty. Small company, y'know.
|
| This kind of low-level work is significantly more complicated
| than even most kernel developers realize - hence the need for
| articles like OP. Ditto for anything on large (more than
| single-board) systems. The intersection of the two was,
| frankly, a bit exhausting. Just keeping track of all the moving
| parts and their respective states induced a cognitive load that
| made debugging other already-hard problems that much more
| difficult. My hat's off to anyone who has kept on doing that
| stuff longer than I did, or who has to do it in an environment
| where vendors are keeping so many secrets.
| lgg wrote:
| This stuff is certainly pretty rarefied these days. I
| remember when the PPC970 came out people were shocked how
| difficult it was to bootstrap. IBM didn't really care as
| POWER4 (from which it was derived) was not a merchant chip
| and they had management processors (and very high margins) in
| all their machines to handle it. Apple was the launch partner
| and even back then had a lot of in house expertise doing this
| sort of work. Everyone else who tried to use it was in for
| some real pain and most of them gave up. The guys doing the
| eval boards with support from IBM literally posted this:
| https://web.archive.org/web/20060715134515/http://www.970eva...
|
| TL;DR, the last line is "Once all of the above is completed,
| the processor will be able to successfully fetch instructions
| from a boot source. You are now effectively at the same point
| you would have been 5 months ago, had this been a standard
| 750 bringup... Board bringup from this point should be very
| straightforward and follow established methods."
| notacoward wrote:
| That's an amazing document. Practically every sentence,
| though tersely stated, hints at hours (or worse) of
| experimentation and head-scratching. The "would have been 5
| months ago" bit at the end is remarkably restrained. I'm
| certain I would have quit (or worse) by that point. Respect
| and condolences to whoever did this.
| jacquesm wrote:
| I had to do some of this while bringing up a 486 to run my
| own kernel. Very frustrating, to the point that I had the
| reset switch of the machine wired to a sustain pedal just so
| that I didn't have to dive under the desk all the time.
| intelVISA wrote:
| CAR is a gem. It's great for lite OSes in hostile envs.
| derefr wrote:
| Funny enough, a modern CPU doing CAR still has more memory
| than a PC from the 1980s. Presuming you statically recompiled
| them, you could run entire SNES games from a modern CPU's
| cache!
|
| (And that being said, now I'm wondering whether you could
| force eviction from and retention in L3 cache on demand, to
| achieve something like memory bank switching...)
| JonathonW wrote:
| SNES games? You could comfortably run Windows 95 within the
| L3 cache on many recent Intel processors (the one in my
| 2019-era MBP has 16 MB of L3 onboard; current generation
| processors go even bigger and Windows 95 only needs 4 MB).
|
| It's not really clear to me from the limited bits of info
| that I've read whether or not L3 is guaranteed to be
| accessible when doing CAR, but, if it is, you've got enough
| memory available to do a lot of stuff. (And even the L2
| cache is starting to get pretty big on the higher-end
| current-gen chips.)
| derefr wrote:
| Well, keep in mind that in the sort of state the computer
| is in when doing CAR, you don't get to talk to storage
| devices; nor do you get the benefit of having some kind
| of ROM on the bus. I know Windows 95 is happy to run from
| 4MB RAM _with access to a hard drive_; but how much
| memory would W95 need for a "bootable live-CD
| environment" where the disk image must be resident (if
| compressed) in memory along with all work RAM?
|
| (This is why I compared to the SNES: if you have to map
| the SNES's RAM _and_ [every bank of] the game's ROM,
| then you're looking at 4-16MB depending on the game. The
| SNES is pretty much the newest console whose games would
| entirely fit, I think.)
| dfox wrote:
| I think that the only thing that prevents you from
| ignoring the memory controller and initializing the rest
| of x86 board while still remaining in the CAR mode is the
| sheer ridiculousness of doing that. As for whether you
| have a memory-mapped ROM available, I'm not exactly sure,
| but the high-level model of what x86 firmware does seems
| to imply that the hardware maps a part of the SPI flash at
| the address range where there was a ROM chip on the
| 8086/286/386 PCs (the actual address ranges are
| different).
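| To make the "top of the address space" idea concrete: on
| 386-and-later x86 the first instruction after reset is fetched
| from physical address 0xFFFFFFF0, so firmware flash is
| conventionally decoded so that it ends at the 4 GiB boundary.
| The sketch below is illustrative only. It assumes,
| hypothetically, that the whole image is mapped contiguously
| with its last byte at 0xFFFFFFFF; real chipsets may decode only
| part of the flash (for example just the BIOS region), and the
| function names are made up for this example.

```python
FOUR_GIB = 1 << 32
# 386-and-later x86 CPUs fetch their first instruction from this physical address.
RESET_VECTOR = 0xFFFFFFF0


def flash_window_base(image_size: int) -> int:
    """Physical address of the image's first byte, assuming (hypothetically) the
    whole image is mapped contiguously so that its last byte is at 0xFFFFFFFF."""
    if image_size < 16 or image_size > FOUR_GIB or image_size & (image_size - 1):
        raise ValueError("expected a power-of-two size between 16 bytes and 4 GiB")
    return FOUR_GIB - image_size


def reset_vector_offset(image_size: int) -> int:
    """Byte offset within the image that the CPU fetches first after reset."""
    return RESET_VECTOR - flash_window_base(image_size)


for size in (8 << 20, 16 << 20, 32 << 20):
    print(f"{size >> 20:2d} MiB image: mapped at {flash_window_base(size):#010x}, "
          f"reset vector at image offset {reset_vector_offset(size):#x}")
```

| Whatever the image size, under this assumption the reset vector
| lands 16 bytes before the end of the image, which is why x86
| firmware images typically keep a short jump to their real entry
| point in those last bytes.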
| justsomehnguy wrote:
| About 35 MB, IIRC. Win98 could be stripped to around 50 MB
| without a loss of functionality.
|
| NB: L3 is unified most of the time, but with L2 you
| still need to distinguish between data/code.
| Arrath wrote:
| Man now I want to see this.
| hlandau wrote:
| In fact, the largest POWER9 CPUs have up to 110MB of
| L3... and Zen 4's L3 apparently maxes out at 384MB(!!).
| dist-epoch wrote:
| A modern motherboard can update its BIOS from a USB stick
| WITHOUT a CPU or memory installed.
|
| Think about that. The motherboard "knows" how to read a FAT file
| system from a USB mass storage device, verify its digital
| signature and flash it with no main CPU or memory.
| wmf wrote:
| I assume this is done with a microcontroller on the board.
| sebazzz wrote:
| And also born out of necessity, given that many Intel and AMD
| boards can't boot with a CPU that is too new if the BIOS
| doesn't know about it - not even to flash a new BIOS - so you
| needed to borrow an old CPU just for the sake of upgrading
| the BIOS.
| ls612 wrote:
| It was originally meant to solve the issue where, if you lost
| power while flashing your BIOS, you'd brick the system
| irrevocably. Now even if the BIOS is corrupt and the system
| won't boot, you can reflash a known-good firmware with stock
| settings to get back up and running.
| chasil wrote:
| The ARC processor was formerly in the northbridge of the
| chipset.
|
| Intel has since replaced this with an 80486 in modern
| designs; perhaps it also is implemented in the northbridge.
|
| https://en.wikipedia.org/wiki/ARC_(processor)
| wmf wrote:
| I think you're talking about the ME but I don't think the
| ME is responsible for "BIOS" flashing. I think it must be a
| separate microcontroller. This is kind of the point of the
| original blog post: don't go looking for "the
| microcontroller" because there isn't just one; there are
| many.
| Simplicitas wrote:
| Doesn't mention the special core reserved for the NSA and other
| national security agencies :-)
| travisgriggs wrote:
| I miss these kinds of articles on the net. Is anyone else
| reminded of the CPU Praxis articles that were part of Ars
| Technica's early rise to popularity? I really miss those. This
| article is, of course, much shorter, but still, I miss that sort
| of content on the internet.
| JdeBP wrote:
| As the author of https://superuser.com/a/347115/38062 and
| https://superuser.com/a/345333/38062, you have my sympathy about
| the "pack of lies" involving real mode and several wrong
| combinations of selector and offset.
| JdeBP wrote:
| It's also worth adding that none of this is new. There's always
| been a reason that the "C" in "CPU" has stood for "central".
| The idea that there are other, non-central, processors around
| the place goes back a long time.
|
| Four particular ones come to mind:
|
| * The DPT range of SCSI host bus adapter cards, many years ago,
| had a full-blown MC680x0 processor on the card.
|
| * The mainframe that Connor Krukosky famously installed in his
| basement, whose console front-end processor was a PC running
| OS/2.
|
| * PC/AT keyboards had on-board microcontrollers running
| programs.
|
| * And of course who can forget the BBC Micro's Tube?
|
| It's the short period in history where people thought that
| computers came with only one processor that is the real oddity.
| (-:
| jacquesm wrote:
| The Tube used the processor in the Tube as the CPU when it
| was connected but otherwise the CPU was the CPU in the BBC
| Micro itself. With the Tube CPUs connected (68K, Z80, 65C02,
| 32016 and more) the BBC processor served as I/O processor.
|
| The elegant and well-adhered-to OS calls made this a
| straightforward process: if your program ran on the BBC
| standalone, it would work across the Tube for the 65(C)02, but
| for other coprocessors you had to at a minimum recompile and
| probably rewrite quite a bit of your code.
|
| https://sites.google.com/site/jamesskingdom/Home/computers-e...
|
| In a typical PC there are > 10 actual processors in the
| various peripheral and controller chips, and then there is
| the management engine (a full blown computer in its own
| right) or equivalent and usually almost every peripheral will
| have one or more processors as well.
| hlandau wrote:
| Had to reverse engineer a real mode PCI option ROM once... that
| was extremely unpleasant [1]. And then of course there's
| "Unreal Mode".
|
| Moreover Intel is just this week actually finally proposing
| removing real mode. [2] I'm a bit worried for what this means
| for emulation of old 16-bit Windows and DOS software under Wine
| (one of the great ironies that Wine can still run Win16
| programs on an x64 host OS when Windows can't) - though I
| suspect the performance requirements of such software are so low
| by modern standards that emulating such programs wouldn't pose
| any challenge.
|
| [1] https://www.devever.net/~hl/ortega
| [2] https://www.phoronix.com/news/Intel-X86-S-64-bit-Only
| JdeBP wrote:
| See https://news.ycombinator.com/item?id=36074093 for a more
| significant worry. Emulating a CPU is not affected as much as
| code that would otherwise have still run on the bare
| hardware.
| shrubble wrote:
| A while ago I bought some older AMD 8350 systems, which
| apparently are the last without a PSP, the platform security
| processor.
|
| I did this as a sort of 'just in case' setup, was planning to put
| OpenSolaris on it and run things under Zones or LX zones and to
| run it as a backup server. Fast enough to get some work done and
| possibly more secure if the PSP is ever used/broken
| maliciously...
| jacquesm wrote:
| That may well end up being a very prescient move. Be prepared
| to be labeled a tinfoil hat type until then, but I definitely
| think you are wise to take a precaution.
| buildbot wrote:
| "Turtles all the way down": modern CPUs are so complex that you
| need simpler ones to abstract them! Very cool breakdown of how
| POWER9 does this.
| giuliomagnifico wrote:
| I understood nothing (as a sysadmin), but this looks like a very
| interesting article for those who can understand it.
| bicolao wrote:
| I think you can see a modern CPU as a network. There are some
| beefy servers doing all the heavy lifting which is what the
| outsiders see. But there's also a few smaller servers here and
| there monitoring the system (or even responsible for powering
| on the entire network).
| hlandau wrote:
| Author here. This is very much the case for a computer system
| as a whole also. Basically a network of cooperating
| microprocessors, including in I/O peripherals etc.
|
| PCIe in particular is literally a packet-switched computer
| network - it has a physical layer, data link layer, and a
| transaction layer which is basically packet switched. There
| are even proprietary solutions for tunnelling PCIe over
| Ethernet.
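| As an illustration of the transaction layer being packet-based:
| each transaction layer packet (TLP) starts with a header whose
| first doubleword carries a format field (Fmt, bits 31:29), a
| type (bits 28:24), a traffic class (bits 22:20) and a payload
| length in doublewords (bits 9:0, where 0 means 1024). The sketch
| below packs only that first doubleword for 32-bit memory reads
| and writes. It is a simplification based on my reading of the
| PCIe base specification: the other DW0 flags are left at zero,
| the requester ID, tag, byte enables and address that follow are
| omitted, and the helper names are hypothetical.

```python
from enum import IntEnum


class Fmt(IntEnum):
    """TLP format field: header size (3 or 4 DW) and whether a data payload follows."""
    NO_DATA_3DW = 0b000
    NO_DATA_4DW = 0b001
    WITH_DATA_3DW = 0b010
    WITH_DATA_4DW = 0b011


MEMORY_REQUEST_TYPE = 0b00000  # Type value shared by memory reads and writes


def tlp_dw0(fmt: Fmt, tlp_type: int, length_dw: int, traffic_class: int = 0) -> int:
    """Pack the first header doubleword; the other DW0 flags are left as zero."""
    if not 1 <= length_dw <= 1024:
        raise ValueError("length is 1..1024 doublewords")
    if not (0 <= traffic_class <= 7) or not (0 <= tlp_type <= 0x1F):
        raise ValueError("field out of range")
    length_field = length_dw & 0x3FF  # a length of 1024 DW is encoded as 0
    return (int(fmt) << 29) | (tlp_type << 24) | (traffic_class << 20) | length_field


def mem_read32_dw0(length_dw: int) -> int:
    return tlp_dw0(Fmt.NO_DATA_3DW, MEMORY_REQUEST_TYPE, length_dw)


def mem_write32_dw0(length_dw: int) -> int:
    return tlp_dw0(Fmt.WITH_DATA_3DW, MEMORY_REQUEST_TYPE, length_dw)


# The spec's header diagrams number bytes big-endian: byte 0 holds Fmt and Type.
print(mem_write32_dw0(1).to_bytes(4, "big").hex())
print(mem_read32_dw0(16).to_bytes(4, "big").hex())
```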
| di4na wrote:
| And you have smaller ones that basically PXE-boot the bigger
| ones and manage the power, cooling, etc. It is datacenters
| all the way down.
|
| As someone who used to do embedded work, there is a reason I
| felt most at home in Erlang and Elixir.
|
| Their share-nothing processes that communicate by message
| passing were really close to what building and coding for an
| embedded platform looks like.
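| A rough Python sketch of the share-nothing, message-passing
| style being described: a "core" thread owns its state privately
| and only changes it in response to messages in its mailbox,
| replying on a queue supplied by the sender. This is an analogy,
| not how any real management controller is implemented; the
| message kinds ("power_on", "stop") and all names are
| hypothetical, and since Python threads can share memory, the
| isolation here holds by convention only.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Msg:
    kind: str                              # hypothetical kinds: "power_on", "status", "stop"
    payload: str = ""
    reply_to: Optional[queue.Queue] = None


def core_actor(mailbox: queue.Queue) -> None:
    """Owns `state` privately; the only way to affect it is to send a message."""
    state = "off"
    while True:
        msg = mailbox.get()
        if msg.kind == "stop":
            return
        if msg.kind == "power_on":
            state = "running"
        if msg.reply_to is not None:
            msg.reply_to.put(Msg("status", payload=state))


def boot_one_core() -> str:
    """Play the small controller: power the core on, wait for its reply, stop it."""
    mailbox: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()
    worker = threading.Thread(target=core_actor, args=(mailbox,))
    worker.start()
    mailbox.put(Msg("power_on", reply_to=replies))
    status = replies.get(timeout=1.0).payload
    mailbox.put(Msg("stop"))
    worker.join()
    return status


print(boot_one_core())
```

| Because the controller never touches the core's state directly,
| the same structure would survive swapping the queues for a bus
| or a mailbox register, which is roughly the parallel being drawn.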
| p_l wrote:
| To make it even funnier - Digital's last Alpha CPU, EV7,
| which was essentially the ancestor of AMD K8 (which finally
| brought "mesh" networking to mainstream PCs), actually had
| IP-based internal management network!
|
| Each EV7 computer had, instead of a normal BMC, a bigger
| management node connected to a 10 Mbit Ethernet hub
| (twisted-pair Ethernet, fortunately :P), and this network was
| then connected to things like I/O boards, power control, system
| boards... including each individual EV7 CPU. Each component so
| connected had a small CPU with Ethernet that was responsible
| for interfacing its specific component to the network, and
| part of the boot process involved prodding the CPUs over
| Ethernet to put them into the appropriate halt state from which
| they could start booting.
| wmf wrote:
| Much of the openness of Power7/8/9 was _encouraged_ by Google who
| wanted to have control over all the firmware, even the secret
| firmware. I think Google is also auditing PSP/ME source code, but
| the public only sees the audit results.
___________________________________________________________________
(page generated 2023-05-30 23:00 UTC)