[HN Gopher] Booting Modern Intel CPUs
       ___________________________________________________________________
        
       Booting Modern Intel CPUs
        
       Author : zdw
       Score  : 362 points
       Date   : 2023-04-17 04:07 UTC (18 hours ago)
        
 (HTM) web link (mjg59.dreamwidth.org)
 (TXT) w3m dump (mjg59.dreamwidth.org)
        
       | lynguist wrote:
       | There are so many steps and sidesteps there! Could someone who
       | has access to GPT-4 API feed this text in and ask it to produce
       | the topological or chronological step-by-step form of it? I'd be
       | curious to see it.
        
         | rep_lodsb wrote:
         | Maybe you should read this article on the same site:
         | 
         | https://mjg59.dreamwidth.org/64090.html
        
           | twic wrote:
           | Clearly ChatGPT has just reasoned out all the secret NSA-
           | mandated additions to TPM 2.0 that are not listed in the
           | specification.
        
       | yuuta wrote:
       | That's a pretty good article, but I expect more in-depth
       | information on booting modern Intel CPUs ... I am very interested
       | in modern UEFI / BIOS firmware development and how they bring
       | up x86 CPUs, but unfortunately there is very little source (I
       | guess, except for EDK2), and the majority (?) of x86 firmwares
       | are proprietary. Booting x86 is much more complicated than
       | writing a linker script with a vector table for your
       | microcontroller ... so, this seems very interesting.
        
       | smikhanov wrote:
       | There's a very detailed but less emotional writeup on the same
       | topic from 2018, highly recommended:
       | https://binarydebt.wordpress.com/2018/10/06/how-does-an-x86-...
        
         | ignoramous wrote:
         | And a _how computers boot_ discussion from last month:
         | https://news.ycombinator.com/item?id=35229045
        
       | evilos wrote:
       | So I guess this is what Bryan Cantrill meant when he said that a
       | million kittens were slaughtered every time you boot your CPU.
        
         | empiricus wrote:
         | Sometimes I think our universe works by killing kittens (1 mil
         | every hour). (The alternative explanation for all the dead
         | kittens is just Moloch).
        
       | porphyry3 wrote:
       | > ...reprograms the CPU into a sensible mode (ie, one without all
       | this segmentation bullshit)...
       | 
       | Protected mode still has segmentation except that segment
       | registers now index into either a global or local descriptor
       | table (DT). Linux tries to make this transparent by setting up
       | descriptors where logical addresses coincide with linear
       | addresses (at least for CS and DS).
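       |
       | A minimal sketch of that selector and descriptor arithmetic, in
       | Python for readability. The bit layouts are the standard x86
       | protected-mode ones; the helper names and the three-entry table
       | are assumptions made up for illustration, not Linux's actual GDT.

```python
# Illustrative only: helper names and the tiny GDT below are hypothetical.

def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    index = selector >> 3                        # bits 15..3: descriptor index
    table = "LDT" if selector & 0x4 else "GDT"   # bit 2: table indicator
    rpl = selector & 0x3                         # bits 1..0: requested privilege
    return index, table, rpl

def make_descriptor(base, limit, access, flags):
    """Pack an 8-byte segment descriptor into an integer."""
    desc = limit & 0xFFFF                        # limit bits 15..0
    desc |= (base & 0xFFFFFF) << 16              # base bits 23..0
    desc |= (access & 0xFF) << 40                # access byte (type, DPL, present)
    desc |= ((limit >> 16) & 0xF) << 48          # limit bits 19..16
    desc |= (flags & 0xF) << 52                  # flags nibble (G, D/B, L, AVL)
    desc |= ((base >> 24) & 0xFF) << 56          # base bits 31..24
    return desc

def descriptor_base(desc):
    """Recover the 32-bit base address from a packed descriptor."""
    return ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)

def linear_address(gdt, selector, offset):
    """Logical (selector:offset) to linear; limit checks are omitted here."""
    index, table, _ = decode_selector(selector)
    assert table == "GDT", "this sketch only models the GDT"
    return (descriptor_base(gdt[index]) + offset) & 0xFFFFFFFF

# Flat 32-bit segments: base 0, limit 0xFFFFF with 4 KiB granularity.
FLAT_CODE = make_descriptor(0, 0xFFFFF, 0x9A, 0xC)
FLAT_DATA = make_descriptor(0, 0xFFFFF, 0x92, 0xC)
gdt = [0, FLAT_CODE, FLAT_DATA]                  # entry 0 is the null descriptor
```

       | With both descriptors at base 0, linear_address(gdt, 0x10, x)
       | returns x for any 32-bit offset, which is the "logical equals
       | linear" setup described above.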
        
       | rkagerer wrote:
       | Thanks for educating me as to the dumpster fire of patched
       | together steps that take place in the first milliseconds of
       | booting my PC.
       | 
       | With regard to this:
       | 
       |  _Intel CPUs ship with built-in microcode, but it's frequently
       | old and buggy and it's up to the system firmware to include a
       | copy that's new enough that it's actually expected to work
       | reliably._
       | 
       | ...I wish companies put more effort into releasing finished,
       | polished products. Not just Intel. I played with an ESP32-S3
       | recently and learned not only was the entire digi controller for
       | ADC2 'dropped' from support due to bad silicon, but using ADC1
       | with DMA was fraught with gotchas and blatantly incorrect info in
       | the documentation (reported at least a dozen to Espressif and
       | they acknowledged).
        
         | sgtnoodle wrote:
         | The ESP32 ADC is a dumpster fire. I used one to sample an
         | analog signal at 8 kHz, looking for 400 us wide pulses (pacemaker
         | capture pulses.) I originally started with an interrupt that
         | triggered conversion from software. It would completely miss
         | pulses 10-20% of the time and instead drift around, as if the
         | ADC was floating rather than connected to anything. I finally
         | got it to work reliably by switching to DMA through the I2S
         | peripheral.
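         |
         | A quick arithmetic check of those numbers, as a sketch: it
         | assumes ideal, evenly spaced sample instants and a rectangular
         | pulse, and the function name is made up for illustration.

```python
SAMPLE_RATE_HZ = 8_000
PULSE_WIDTH_US = 400

PERIOD_US = 1_000_000 // SAMPLE_RATE_HZ   # 125 us between samples

def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)

def samples_inside_pulse(pulse_start_us):
    """Count sample instants k * PERIOD_US inside [start, start + width)."""
    first = ceil_div(pulse_start_us, PERIOD_US)
    last = ceil_div(pulse_start_us + PULSE_WIDTH_US, PERIOD_US) - 1
    return last - first + 1

# Try every pulse start offset (in whole microseconds) within one period.
counts = {samples_inside_pulse(start) for start in range(PERIOD_US)}
```

         | Every alignment lands 3 or 4 samples inside the pulse, so in
         | this idealized model the sample rate alone would not explain
         | missed pulses.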
         | 
         | Pretty much every hobby project I look at, people complain
         | about how bad the ADC is, and then just proceed to take N
         | samples and average them. It's not that the ADC is bad, it's
         | that the underlying firmware is buggy. I suspect there's a race
         | condition with the main CPU and a coprocessor that's using the
         | ADC for some internal wifi stuff.
        
           | rkagerer wrote:
           | Yeah I felt that pain :-).
           | 
           | On the ESP32-S3 ADC2 is shared with Wifi, but I don't think
           | that's the case for ADC1. I made headway by throwing away all
           | their boilerplate code and setting up the raw peripheral
           | registers myself. I actually prefer it that way without all
           | the abstraction underneath. Was getting nice results at 80kHz
           | sample rate.
           | 
           | Would love to turn it into a write-up as I'm not sure anyone
           | has done this before with that chip (at least I couldn't find
           | any sample code around). It's actually all in MicroPython at
           | the moment to boot... but wouldn't be hard to convert to C
           | (and might even clean things up due to the more natural fit
           | of that language working with bitfields). What I'd really
           | like is to make it into a DMA-based ADC module for
           | MicroPython but I'm not familiar enough with compiling that
           | platform from scratch, particularly on Windows.
        
             | matthewfcarlson wrote:
             | If you do end up writing it up, consider dropping it in the
             | hackaday tip line (https://hackaday.com/submit-a-tip/). It
             | sounds incredibly useful.
        
         | gkhartman wrote:
         | Thanks for the tip about the ESP32-S3 ADC2. Coincidentally, I
         | was about to embark on that trail only to learn the hard way.
         | You've likely saved me a good deal of frustration.
        
         | ChuckNorris89 wrote:
         | This is why companies still choose to buy expensive
         | microcontrollers from big name reputable manufacturers
         | (Microchip, TI, SiLabs, NXP, Nordic, STM, Infineon, Renesas,
         | etc.) instead of going for the cheaper Chinese ARM
         | microcontrollers at half the price.
         | 
         | They tend to usually test their designs a lot more thoroughly,
         | which adds to the final cost, and even when their products do
         | come with flaws, they're more likely to be transparent about it
         | and assist you with workarounds and sometimes send over field
         | engineers on-site to help you.
         | 
         | They also tend to be more honest about the specs, capabilities
         | and limitations of their products in the datasheets. Finding
         | out half-way through the design phase that some claims in the
         | datasheet are bogus is a no-go for most companies.
         | 
         | If you're just doing hobby work, or tinkering, or planning to
         | ship millions of bottom of the barrel products on AliExpress
         | with no intention of providing any warranty or customer support
         | for them, then it's fine to go with whatever's the absolute
         | cheapest, but serious companies like Apple, Sony, etc. who care
         | about the customer experience, won't risk delaying a product
         | launch because they wanted to save 50 cents on a new unproven
         | cheap microcontroller whose ADCs don't work right.
        
           | monocasa wrote:
           | I don't know about all of that. TI's Stellaris
           | microcontrollers were probably the worst chips I've had to
           | deal with including sketchy Chinese chips.
        
           | mmac_ wrote:
           | I would generally agree with this, however I do find the
           | esp32 on the whole has been as reliable as any of the major
           | brands (although haven't had to use an ADC). We don't drive
           | them too hard though, but their cheapness has pushed the
           | bigger guys to get more competitive on pricing which is a
           | good thing. TI in particular seems to have sharpened their
           | pencil a bit, maybe due to all their new fabs coming online?
           | 
           | The big guys still screw up, and over the journey I've
           | noticed quite a few MCU subfamilies go EOL far before they
           | should and it's usually due to silicon that has too many bugs
           | in it. Maybe the big guys told them 'no' so there weren't any
           | decent volumes on them anymore and they were forced to adapt.
           | 
           | Sometimes they're a bit more subtle. You've probably seen
           | quite a few 'A' revision part numbers recently where they
           | clearly keep the same MCU but fix the bugs. See this on other
           | ICs as well.
           | 
           | For us, logistics (supply chain) and a solid support team are
           | the highest importance. It's rare that we're locked into a
           | single vendor due to a must-have-feature. These requirements
           | narrow down our choices very quickly, and I'm sure it varies
           | region-by-region (and how much $$$ you spend).
        
           | RicoElectrico wrote:
           | Uh, have you ever seen the length of a typical STM32 errata
           | sheet? ;)
        
             | voxadam wrote:
             | I'd rather have a long errata sheet than no errata at all.
             | I'll take truth in advertising any day over burying my head
             | in the sand and pretending that everything's perfect.
        
             | ChuckNorris89 wrote:
             | Uh, have you read the part in my comments where I said that
             | they're more likely to be honest about it? Having a short
             | errata or no errata at all doesn't mean the product is
             | flawless. It could be that they don't know all the faults
             | yet, or aren't sharing them, or both.
             | 
             | For ST it could be that they seem to be the most popular
             | cheap ARM microcontrollers for hobbyists and consumer
             | products, so it's easier to find faults in them since so
             | many companies use them, similar to how the most used pieces
             | of software also have the most vulnerabilities reported on
             | them.
             | 
             | Also, maybe ST was not a great example on my end, as they
             | seem to have an obsession lately with outsourcing and
             | farming out everything to the cheapest offshore location
             | possible and penny pinching to the extreme for their
             | consumer oriented parts. I don't blame them though,
             | competition in the generic ARM microcontroller market is
             | cutthroat and margins are slim and salaries are low and
             | your major customers (Sony, Apple, Samsung, LG, etc) keep
             | putting the pressure on you to lower your prices or
             | threaten to look somewhere else.
        
               | xobs wrote:
               | I've most definitely run into issues with others in that
               | list, and my reports to the companies never generated an
               | errata.
               | 
               | One fun one was that when accessing the External Memory
               | Interface module (think: parallel ROM) and switching from
               | Bank 1 to Bank 0, the HDMI controller would get reset,
               | but only when configured for two banks of 32 MB.
               | 
               | If I did one bank of 64 MB and just used the extra pin as
               | CS, it worked just fine.
               | 
               | There are lots of similar quirks in every chip.
        
               | ChuckNorris89 wrote:
               | From my time in the semi industry, when reporting issues
               | for the errata, it depends on who _you_ are.
               | 
               | Are you some hobbyist or small company who sent your
               | issue report to some generic company email address? Then
               | your report most likely never reached anyone on the team
               | working on that chip, but probably reached some clueless
               | jobsworth who didn't know what to do with it because
               | large semi companies are highly siloed and there's no
               | centralized management for such things, so you need
               | direct contact with the team responsible for that chip.
               | 
               | If you're a large customer, you then have direct email
               | addresses of support engineers, application engineers and
               | the engineers who worked on the chip itself, and so your
               | reports will definitely be taken seriously.
               | 
               | There's also another issue. There's public datasheets and
               | errata which are often not updated or fully transparent,
               | and then there's confidential datasheets and errata,
               | which are updated and issued under NDA to industrial
               | customers on a need to know basis. As a hobbyist you
               | rarely get all the truthful info.
               | 
               | It's not ideal, but the semi industry mostly focuses on
               | large customers who buy in large volumes of product, not
               | on hobbyists and tinkerers.
        
               | 1827162 wrote:
               | And that cost cutting is likely due to the rise of
               | Chinese STM32 clones from companies most of us have never
               | heard of such as GigaDevice, CKS, Geehy, MindMotion, APEX
               | Semiconductors, etc...
               | 
               | Some die photos here:
               | https://mecrisp-stellaris-folkdoc.sourceforge.io/clones-stm3...
        
         | SassyGrapefruit wrote:
         | I think my favorite part of this discussion is that most
         | developers give this sort of process/code a mystical vibe. Like
         | it's some sacred code built by a priesthood of the world's
         | greatest and smartest programmers. Not accessible to mere
         | mortals.
         | 
         | I encourage developers that work for me to just read the code
         | and documentation. My advice to them is usually something along
         | the lines of...
         | 
         |  _Think of how you would have done it as a sophomore in college
         | completing an overdue assignment... chances are it works just
         | like that_
        
         | taspeotis wrote:
         | Sure but I have some sympathy here ... a contemporary CPU
         | probably has a million different capabilities and if some not-
         | even-single-digit-percentage of them don't work that's a lot of
         | errata. Plus it's hardware! So it's hard to fix it once it's
         | out.
        
           | saagarjha wrote:
           | Single digit percentage? CPUs are generally bug-free to
           | several orders of magnitude greater than that.
        
             | taspeotis wrote:
             | Almost as if they have a not-even-single-digit-percentage
             | of bugs...
        
         | rbanffy wrote:
         | > dumpster fire of patched together
         | 
         | That's more or less the history of the x86, from the 8086 and
         | on. I'd add a couple extra expletives as well. I can imagine a
         | number of better ways of seeding CPU state prior to starting it in
         | a way they could all be neatly started in parallel.
         | 
         | > I wish companies put more effort into releasing finished,
         | polished products.
         | 
         | Some do. It's just that their hardware is expensive and not
         | very available outside some niches.
        
       | vaxman wrote:
       | "It then reads the initial block of the firmware (the Initial
       | Boot Block, or IBB) into RAM (or, well, cache, as previously
       | described) and parses it. There's a block that contains a public
       | key - it hashes that key and verifies that it matches the SHA256
       | from the fuses. It then uses that key to validate a signature on
       | the IBB. If it all checks out, it executes the IBB and everything
       | starts looking like the nice simple model we had before."
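       |
       | A toy sketch of the two checks in that passage: the key hash
       | against the fuses, then the signature over the IBB. Everything
       | here is an assumption for illustration: the key is textbook RSA
       | built from two small Mersenne primes with no padding, and the
       | key serialization and function names are made up. It is not the
       | real Boot Guard manifest format or signature scheme.

```python
import hashlib

# Hypothetical toy key: far too small for real use, unpadded textbook RSA.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent (needs Python 3.8+)

def key_blob(n, e):
    """Serialize a public key; this byte layout is invented for the sketch."""
    return n.to_bytes(32, "big") + e.to_bytes(4, "big")

def digest_as_int(data, n):
    """SHA-256 of data, reduced modulo n so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data):
    """Vendor side: sign the block with the private exponent."""
    return pow(digest_as_int(data, N), D, N)

def verify_ibb(fused_key_hash, pub_n, pub_e, ibb, signature):
    """Return True only if both the key and the IBB signature check out."""
    # Step 1: the public key shipped in flash must hash to the fused value.
    if hashlib.sha256(key_blob(pub_n, pub_e)).digest() != fused_key_hash:
        return False
    # Step 2: the signature must match the IBB's hash under that key.
    return pow(signature, pub_e, pub_n) == digest_as_int(ibb, pub_n)

# "Fuses" hold only the hash of the key, not the key itself.
fuses = hashlib.sha256(key_blob(N, E)).digest()
ibb = b"initial boot block contents"
signature = sign(ibb)
```

       | In this model, swapping the key fails the first check and
       | modifying the IBB fails the second.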
       | 
       | Should all be performed by ME/PSP now, but the Intel ME/AMD PSP
       | can't be updated (even with new microcode) and older versions are
       | known to contain serious vulnerabilities. Thus we can reduce all
       | of our options down to either (a) not running with Secure Boot or
       | (b) running with Secure Boot, but only on the very latest
       | Intel/AMD processors. So if we have the very latest processors,
       | then we only need to worry about validating "critical" microcode
       | updates to them and that doesn't need to happen at power-on at
       | all, it can happen in the ME/PSP, but we do need to know about
       | them at the C-suite level, since it will impact purchasing of
       | replacement gear, financial statements and risk analysis
       | decisions.
       | 
       | That means we should power-on directly into the on-die ME/PSP and
       | it, in turn, can bring up the RAM, PCI, etc. and check (online or
       | on attached storage) for "critical" updates and, if found, emit a
       | "distress beacon" on the network while also checking for a signal
       | (from a resistor, the network or attached storage) that will
       | inform it to download and update (rather than simply halting) the
       | processor. This allows Management (the ones who have to sign
       | SarBox, not just the system admins) to be informed of emerging
       | vulnerabilities in their infrastructure (because of the "distress
       | beacons" triggering indicators on their enterprise dashboards) so
       | that they may choose to take explicit action to allow processing
       | to continue (by emitting the signal that all ME/PSP would look
       | for to enable critical microcode updates instead of halting).
       | 
       | Likewise, we need ME/PSP to also start verifying all connected
       | communication interfaces (before and after boot) by checking
       | digital signatures that chip manufacturers obtain from Intel/AMD
       | --much like what developers have to do to get their code to run
       | in kernel space on an Apple Mac. I mean, CVE-2022-21742, et al.
       | --not just Thunderbolt and USB attacks.
       | 
       | The only other rational solution really is to run with Secure
       | Boot disabled (because older Intel/AMD processors cannot be
       | trusted anyway --thanks to their non-updatable components like
       | ME/PSP-- even with post-production microcode updates, even with
       | Secure Boot).
       | 
       | Anyway, 11 years later, will leave this here ..
       | https://www.extremetech.com/defense/133773-rakshasa-the-hard...
        
         | mjg59 wrote:
         | > the Intel ME/AMD PSP can't be updated
         | 
         | Yes they can, their firmware is in the same flash as the system
         | firmware.
        
       | booi wrote:
       | This is insanity. I wonder if ARM CPUs were able to start from
       | scratch or a better place.
        
         | grishka wrote:
         | For one, there never was, and still isn't, any universal BIOS-
         | like spec for how an ARM CPU should boot and which peripherals
         | it should have.
        
           | penguin_booze wrote:
           | Arm v7 was a Wild West, but with v8, Arm tried to standardize
           | a lot. The Arm Trusted Firmware is the reference boot
           | firmware implementation for v8+ CPUs:
           | https://github.com/ARM-software/arm-trusted-firmware.
           | 
           | I'd think most of the reference documents can be discovered
           | from that code base.
           | 
           | Relatedly, from the perspective of hands-on programming, the
           | System Programmer's guide is _the_ manual to start with:
           | https://developer.arm.com/documentation/den0024/a/.
        
         | yencabulator wrote:
         | To drive home how far from sane common ARM bootup sequences
         | are, the Raspberry Pi is started by its GPU. You can think of
         | an RPi as a proprietary GPU with an auxiliary ARM CPU.
        
           | jsmith45 wrote:
           | Yeah, and the GPU code is designed to read the linux kernel
           | into RAM, and set it up so it begins executing the linux
           | kernel as the initial instructions of the ARM CPU. If you want
           | some more normal bootloader like u-boot you need to jump
           | through hoops to make sure the GPU based bootloader can treat
           | it like a weird Linux Kernel.
           | 
           | (In theory, with source for the GPU 2nd stage bootloader one
           | could change things, but RPI foundation does not provide
           | access to that source).
        
       | moose_man wrote:
       | When technical debt strangles your entire business. "We'll fix
       | this next release"
        
         | mnd999 wrote:
         | It's not technical debt, it's a feature. Being able to boot and
         | run software from decades ago has served them well in the past.
        
           | mjg59 wrote:
           | That's an argument for including support for real mode,
           | rather than for coming up in it. Modern systems booting
           | legacy software are already transitioning into protected mode
           | to run the UEFI stack, and then switching back to real mode
           | before passing control to the Compatibility Support Module.
        
             | LoganDark wrote:
             | > That's an argument for including support for real mode,
             | rather than for coming up in it.
             | 
             | Not really. You have to come up in it to boot software that
             | expects to already be in it.
        
               | mjg59 wrote:
               | No, firmware needs to hand off in the state the software
               | you're booting needs. That says nothing about the mode
               | the CPU needs to be in when it starts running firmware.
        
               | LoganDark wrote:
               | > No, firmware needs to hand off in the state the
               | software you're booting needs. That says nothing about
               | the mode the CPU needs to be in when it starts running
               | firmware.
               | 
               | This assumes that "firmware" doesn't count against
               | backwards compatibility, which isn't necessarily the
               | case. Maybe Intel (or AMD) doesn't have a 100% monopoly
               | on firmware to be confident enough that the CPU mode is
               | an implementation detail. Or maybe some customers do
               | indeed run their own "firmware" (maybe embedded?). No way
               | to be sure.
        
               | mjg59 wrote:
               | Firmware needs to know CPU-specific details (it needs to
               | be able to program the memory controller, for instance),
               | so skipping (or inverting) the real mode to protected
               | mode code in the firmware is just another step in porting
               | to a new platform.
        
               | LoganDark wrote:
               | Are we talking about the same "firmware" here? If you're
               | talking about firmware _loaded directly onto the CPU_
               | (like microcode updates are), that runs even before the
               | motherboard gets to do anything, then the mode the CPU
               | starts in can only be observed after that point anyway,
               | so for all we know it probably could already be
               | implementing your idea without anyone noticing.
               | 
               | I have objections to changing the way the CPU
               | _observably_ starts (i.e. mode in which the BIOS or
               | bootloader starts in).
        
               | mjg59 wrote:
               | I'm talking about the firmware on the motherboard. Your
               | BIOS is CPU-specific - if a future CPU changes the
               | default CPU mode, you simply update your BIOS code to
               | match while you're doing the rest of the work you need to
               | do for that BIOS to run on the new CPU. If the BIOS
               | expects to run in real mode (I'm not aware of any modern
               | firmware that does, but) then you just add some code to
               | switch back to real mode. Otherwise, you probably just
               | delete the code that currently transitions from real mode
               | to protected mode. That doesn't preclude you switching
               | back to real mode if the bootloader expects that.
        
               | LoganDark wrote:
               | > I'm talking about the firmware on the motherboard.
               | 
               | Then that's what I thought, yeah.
               | 
               | I don't see why you're explaining how your idea would be
               | implemented; I'm rather saying that implementing it in
               | that way might be prohibitive if Intel or AMD still have
               | customers that expect the CPU to act a certain way. And
               | these customers aren't necessarily standard
               | desktop/laptop motherboards.
               | 
               | In other words, changing "what mode the CPU starts in"
               | would be a big and observable breaking change and not
               | _necessarily_ just an implementation detail that can be
               | magically worked around by firmware updates like you
               | describe.
        
               | mjg59 wrote:
               | You usually can't take existing firmware and run it on a
               | new CPU, because the new CPU requires different bringup
               | code anyway. Take a look at
               | https://github.com/coreboot/coreboot/tree/master/src/soc/int...
               | to get some idea of
               | how many different implementations there are for modern
               | Intel alone (there's a bunch more for the pre-SoC style
               | Intels). If you already have to port your firmware to a
               | new CPU, you can deal with the CPU starting in a
               | different mode - it is _entirely_ an implementation
               | detail that can be handled in the firmware.
        
         | jeroenhd wrote:
         | I believe one of the PlayStations or Xboxes runs an AMD64 chip
         | that does away with most of the legacy stuff. I read about it
         | in an article about getting Linux to run on a homebrewed
         | console.
         | 
         | If I remember correctly, this required hacking around a lot of
         | assumptions in the Linux kernel. I imagine the Windows kernel
         | won't be that different.
         | 
         | If Intel or AMD bring out a CPU that doesn't support any
         | operating system in use today (or any UEFI firmware/BIOS
         | implementation for that matter) they wouldn't be selling many
         | chips. Many vendors outsource their driver update tools to
         | third parties, which in turn use tiny operating systems like
         | FreeDOS to flash firmware onto devices; it'd suck for them to
         | end up needing to rebuild their operating systems.
         | 
         | Likewise, a dedicated GPU also plays a role in the boot
         | process, and taking away the legacy assumptions of the GPU boot
         | ROM will probably also require flashing any consumer graphics
         | card with new firmware as well. Then there's PXE network boot,
         | which often still relies on separate firmware, which also
         | brings its own expectations about the state of the CPU.
         | 
         | Bringing up a modern CPU is going to be a terrible hack
         | whatever you do. I don't see why Intel would need to re-
         | engineer their entire boot process. The current system is hacky
         | as hell but it works and it doesn't require much more work than
         | putting the firmware and microcode images in the right place.
         | 
         | I seriously doubt that redoing their entire boot process and
         | guiding everyone from motherboard manufacturers to driver
         | programmers on how to use the new system (and to iron out the
         | bugs in the new process) will be more cost effective than
         | letting all the old code work like it does today. Very rarely
         | do complete rewrites make any business sense.
        
           | mjg59 wrote:
           | GPU option ROMs no longer make assumptions about legacy setup
           | - UEFI option ROMs are executed in either 32-bit or 64-bit
           | mode, and there's no need to implement any of the legacy VGA
           | compatibility. Same for PXE, which just hooks into the UEFI
           | network stack rather than having to deal with anything
           | legacy.
           | 
           | No OS assumes real-mode for the boot processor at this point.
           | If you boot Linux on a UEFI system you'll jump straight into
           | the kernel in 64-bit protected mode. The only time real mode
           | comes into play is in the bringup of other CPUs (which is
           | something that can be ignored now that ACPI specifies an
           | alternative) and ACPI resume (which isn't relevant on systems
           | that use S0ix rather than S3), so you could absolutely ship
           | an x86 CPU that didn't support real mode and all you'd have
           | to do is modify the firmware entry code. Modern operating
           | systems would Just Work, as would hardware option ROMs.
           | 
           | (And enough systems no longer ship with CSMs that people
           | aren't using FreeDOS for firmware updates any more - it's
           | either Linux or a UEFI executable)
        
             | RulerOf wrote:
             | >And enough systems no longer ship with CSMs that people
             | aren't using FreeDOS for firmware updates any more - it's
             | either Linux or a UEFI executable
             | 
             | Is this only for OEM systems? I'm used to seeing these
             | happen from inside of Windows, with the exception of
             | motherboard firmware all happening inside of the setup
             | program. It would make sense for much of that to be EFI
             | applications nowadays, although there's not much in the way
             | of context around these GUI wrappers to really indicate
             | what's going on under the hood.
        
       | userbinator wrote:
       | Just like the locked-down mobile devices that have unfortunately
       | perverted the nature of general-purpose computing, Boot Guard is
       | a tool of planned obsolescence and manufacturer control disguised
       | as "security". Want to fix something in the BIOS that they didn't
       | want you to have[1][2][3]? Too bad, it's locked and they won't
       | release a newer version to force you to buy another. Absolute
       | bastards.
       | 
       | [1] https://news.ycombinator.com/item?id=29837884
       | 
       | [2] https://news.ycombinator.com/item?id=28254571
       | 
       | [3] https://news.ycombinator.com/item?id=33650347
        
       | cperciva wrote:
       | Another fun thing with SMP: The x86 multiprocessor spec says that
       | to start an AP you need to send an IPI, wait 10 ms, then send
       | another IPI (IIRC it's an "init" followed by a "startup"). On large
       | systems, this adds up!
       | 
       | Except that you don't need to wait 10 ms for each AP -- you can
       | start up the APs in parallel. There's just one small problem: All
       | of the APs start up in the same state -- executing from the same
       | CS:IP, _and also the same stack pointer_. Good luck having
       | hundreds of CPUs stomping over each other's stacks.
       | 
       | Except that if you're careful, it doesn't matter -- you can even
       | make a function call if you want _because all of the CPUs will
       | push the same return address onto the stack_.
       | 
       | Implementing this in FreeBSD is on my "speeding up the boot" to-
       | do list. I know it's possible though, because someone told me
       | that they had already done exactly this in a different (non open
       | source) system.
       | 
       | Lexicon for the non-x86 people: SMP = Symmetric MultiProcessing,
       | aka more than one "virtual CPU". AP = Application Processor, any
       | CPU other than the one which the BIOS starts up for you. IPI =
       | InterProcessor Interrupt, how CPUs wake each other up. CS:IP =
       | Code Segment + Instruction Pointer, where the CPU is reading
       | instructions from.
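       | 
       | A minimal C sketch of that IPI pair (INIT, then Start-up IPI, aka
       | SIPI), assuming the xAPIC interrupt command register layout. The
       | helper names are made up for illustration, and writing the values
       | to the local APIC (plus the delays in between) is left out:

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in the low 32 bits of the xAPIC ICR. */
#define ICR_DELIVERY_INIT     (5u << 8)   /* delivery mode 101b: INIT */
#define ICR_DELIVERY_STARTUP  (6u << 8)   /* delivery mode 110b: Start-up */
#define ICR_LEVEL_ASSERT      (1u << 14)
#define ICR_ALL_BUT_SELF      (3u << 18)  /* destination shorthand 11b */

/* Hypothetical helper: ICR low dword for an INIT IPI.
 * broadcast != 0 targets every CPU except the sender. */
uint32_t icr_init(int broadcast)
{
    uint32_t v = ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
    return broadcast ? (v | ICR_ALL_BUT_SELF) : v;
}

/* Hypothetical helper: ICR low dword for a Start-up IPI.
 * The 8-bit vector selects the 4 KiB page where the AP begins. */
uint32_t icr_sipi(unsigned vector, int broadcast)
{
    assert(vector <= 0xFFu);
    uint32_t v = ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | vector;
    return broadcast ? (v | ICR_ALL_BUT_SELF) : v;
}
```

       | A broadcast bring-up would write icr_init(1), wait, then write
       | icr_sipi(vector, 1); the one-at-a-time variant also has to put
       | the target APIC ID in the destination field, not shown here.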
        
         | pantalaimon wrote:
         | I thought Linux does this already, at least there is a patch:
         | https://lore.kernel.org/lkml/20230414225551.858160935@linutr...
        
           | __turbobrew__ wrote:
           | I wonder if risc-v is much faster to boot since the
           | architecture doesn't come with all of this legacy cruft that
           | x86 needs to deal with?
        
             | riceart wrote:
             | What legacy cruft? (speaking specifically of SMP boot here)
        
             | mrguyorama wrote:
             | If successful it will inevitably accrue its own cruft.
        
               | snvzz wrote:
               | Boot process being codified in a specification minimizes
               | the risk.
        
               | layer8 wrote:
               | You mean, like booting in real mode was codified for x86?
        
         | toast0 wrote:
         | > All of the APs start up in the same state -- executing from
         | the same CS:IP, and also the same stack pointer. Good luck
         | having hundreds of CPUs stomping over each other's stacks.
         | 
         | I'm away from my hobby OS to double check, but isn't it the case
         | that the Start IPI includes a page number which drives CS? If
         | you send those out one by one, you can give each AP its own
         | code page and set the stack page based on that (either using
         | the CS value to index into something, or as an immediate value
         | in the code, that you modify as you copy to the page). Of
         | course, if you do a broadcast SIPI, then all of those are going
         | to have the same CS. Depending on how much early boot code you
         | fancy writing in assembler, you could maybe jump into
         | protected/long mode, find the current cpu id, and lookup the
         | proper stack pointer without using the stack at all, and only
         | then jump into C code? Of course, one probably has nice C
         | functions for some of those things, so it doesn't seem nice to
         | also have it in assembly.
        
           | toast0 wrote:
           | > I'm away from my hobby OS to double check, but isn't it the
           | case that the Start IPI includes a page number which drives
           | CS?
           | 
           | I double checked, and as I understand it, with traditional
           | APIC start IPI, you get to pick the CS address to be (0-255)
           | * 0x1000; although how much of the first 1MB of physical
           | address space is available within that depends on the system
           | memory map. I just start one AP at a time, and use the top of
           | the code page as stack space until it switches to the
           | intended kernel stack, and then that AP starts the next one.
           | That's not time efficient though; you could pretty easily
           | start as many APs as you've got low pages available; although
           | the option where everybody starts from the same place and
           | they figure it out among themselves is probably simpler;
           | because there's never a need to wait for an AP to finish
           | starting before starting more APs; just saying, you've got
           | options, they don't all have to start at the same CS:IP.
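           | 
           | The vector arithmetic, as a hedged C sketch: the function
           | names are invented for illustration, and it assumes the
           | trampoline page is 4 KiB aligned and below 1 MiB:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: SIPI vector for a trampoline at phys_addr.
 * The address must be 4 KiB aligned and below 1 MiB. */
uint8_t sipi_vector_for(uint32_t phys_addr)
{
    assert((phys_addr & 0xFFFu) == 0);
    assert(phys_addr < 0x100000u);
    return (uint8_t)(phys_addr >> 12);
}

/* Real-mode CS value the AP starts with (IP starts at 0). */
uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)(vector << 8);
}

/* Physical address of the first instruction the AP fetches. */
uint32_t sipi_start_phys(uint8_t vector)
{
    return (uint32_t)vector << 12;
}
```

           | Giving each AP a different vector is what gives each one
           | its own code page, and with it a place to hang a stack.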
        
           | mananaysiempre wrote:
           | If you know how many CPUs you are bringing up, then you can
           | allocate a bunch of stacks contiguously and have the CPUs
           | race to pick up the next one, say
           | 
           |     mov rsp, STACKSIZE
           |     lock xadd [currstack], rsp
           | 
           | Of course, the contention on that xadd is going to cost you
           | (if not 10ms per CPU... probably?), and this presumes you
           | aren't using the kernel stack pointer for anything (like a
           | stable CPU number). To fix that, you probably will need to
           | traverse a CPU -> startup data map in assembly. But it's a
           | start (no pun intended), and is not as horrendous a hack as
           | having multiple CPUs push the same return address onto the
           | same stack.
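           | 
           | The same race in C11 atomics, as a rough sketch: the
           | STACKSIZE value and both function names are assumptions,
           | and a real AP would have to do this before it has a stack,
           | i.e. in assembly:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define STACKSIZE 0x4000u  /* assumed per-CPU stack size */

/* Each caller atomically claims the next STACKSIZE-byte slot, as
 * `lock xadd` would. fetch_add returns the old value (the slot's
 * lowest address); stacks grow down, so the initial stack pointer
 * is the top of the slot. */
uintptr_t claim_stack_top(atomic_uintptr_t *currstack)
{
    uintptr_t base = atomic_fetch_add(currstack, STACKSIZE);
    return base + STACKSIZE;
}

/* The claimed slot also yields a sequential CPU number. */
unsigned slot_index(uintptr_t stack_top, uintptr_t region_base)
{
    assert(stack_top > region_base);
    return (unsigned)((stack_top - region_base) / STACKSIZE) - 1;
}
```

           | Note that xadd hands back the old counter value, so the
           | usable stack pointer is that value plus STACKSIZE.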
        
             | klempner wrote:
             | As an order of magnitude point, my experience has been that
             | a bunch of CPUs trying to xadd has a throughput bottleneck
             | on the scale of once per 50 to 100 nanoseconds.
             | 
             | But even if you allow an entire extra order of magnitude,
             | at one per microsecond, that's still 10000 over the course
             | of 10 milliseconds which is plenty for this usecase, at
             | least for now.
        
         | unnah wrote:
         | Do you need special case processing for SMT (Simultaneous
         | multithreading) in there, or is it actually completely
         | transparent?
        
           | cperciva wrote:
           | As far as the startup process is concerned, SMT is two CPUs.
           | I don't actually know how SMT works when one "CPU" has been
           | started and the other hasn't... I guess it just pretends that
           | it hit a hlt instruction on the unstarted "CPU"?
        
         | JoshTriplett wrote:
         | Also, on any modern system, you really don't need the second
         | SIPI. The CPU will come up with the first SIPI, and then ignore
         | the second SIPI. So you can just send a pile of INITs and then
         | a pile of SIPIs (or in theory one broadcast SIPI), and expect
         | the CPUs to come up.
         | 
         | For the startup code, you shouldn't need to make a function
         | call. A few lines of memory-less stack-less assembly could get
         | the CPU number and then change the stack, assuming you have a
         | global value that gives the base of a preallocated array of
         | stacks.
        
           | Dwedit wrote:
           | How "modern" are we talking here? Core 2 Duo? Arrandale?
           | Haswell? Skylake?
        
             | mananaysiempre wrote:
             | Per the OSDev Wiki, Pentium Pro and later[1]:
             | 
             | > For newer CPUs (P6, Pentium 4) one SIPI is enough, but
             | I'm not sure if older Intel CPUs (Pentium) or CPUs from
             | other manufacturers need a second SIPI or not.
             | 
             | When (and if) that became officially sanctioned behaviour
             | is another question.
             | 
             | [1] https://wiki.osdev.org/Symmetric_Multiprocessing#Initia
             | lisat...
        
           | vardump wrote:
           | > A few lines of memory-less stack-less assembly could get
           | the CPU number and then change the stack...
           | 
           | Except that TSC_AUX (MSR that stores CPU id number) is
           | going to be 0 for all of the cores. Unless you know some
           | other way to get CPU number?
        
             | dfox wrote:
             | Core ID in TSC_AUX is essentially a concession to
             | userspace. As an OS and firmware you are supposed to
             | identify CPU cores by means of their LAPIC ID (as read from
             | APIC configurations MSR or from CPUID). Small issue there
             | is that APIC IDs are structured according to HT/NUMA
             | topology and thus not necessarily consecutive.
             | 
             | On the other hand, as an OS on PC-like platform you know
             | how many cores there are supposed to be and what are their
             | APIC IDs before-hand because they were already enumerated
             | by firmware (which is the reason why you can do the one by
             | one AP startup sequence in the first place).
        
               | JoshTriplett wrote:
               | Exactly: either read the APIC ID and use that to look up
               | the CPU number in a table you already have, or arrange a
               | location in memory to use xadd to assign a sequential CPU
               | number, whichever your OS prefers.
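               | 
               | A sketch of the table lookup in C, assuming the
               | APIC IDs were already collected from firmware
               | tables; the function name is made up, and reading
               | the current CPU's APIC ID is left to the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a local APIC ID to a dense CPU number by
 * searching a table of IDs the firmware already enumerated (for
 * example from the ACPI MADT). Returns -1 if the ID is unknown. */
int cpu_number_for_apic_id(const uint32_t *apic_ids, size_t count,
                           uint32_t apic_id)
{
    assert(apic_ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (apic_ids[i] == apic_id)
            return (int)i;
    }
    return -1;
}
```

               | The search copes with APIC IDs that are sparse or
               | out of order, which a direct index would not.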
        
       | jeffbee wrote:
       | Huh, I did not realize until reading this that the IME is also
       | x86. I assumed it was just whatever was most convenient, which
       | seems like it rules out x86, but I guess not.
        
         | mjg59 wrote:
         | It was initially ARC, but transitioned to x86 with version 11.
        
           | p_l wrote:
           | IIRC some of the variants used embedded SPARC, though they
           | are a rare find.
        
         | usr1106 wrote:
         | AMD uses ARM for that purpose IIRC.
        
           | anonymfus wrote:
           | No, the role of the AMD's Platform Security Processor in the
           | boot process is completely different from Intel's IME, as PSP
           | is located on the CPU side and so entire boot process and
           | security checks are completely different and would require a
           | separate writeup.
        
       | usr1106 wrote:
       | Use the cache as RAM. So I guess you could run a small Linux
       | system just in cache without any RAM?
       | 
       | Why? As a fun project with some old motherboard for example :)
        
         | JoshTriplett wrote:
         | With a fair bit of effort; notably, DMA and IOMMUs probably
         | won't work, and most modern devices don't really support PIO.
         | You might be able to boot a really simple environment that runs
         | out of an initramfs though. It's also entirely not obvious to
         | what degree you can use the paging system.
         | 
         | It'd likely be a substantial effort to port Linux.
        
           | usr1106 wrote:
           | Sure, I would not expect to do any advanced IO. But right,
           | probably there is no serial console really close to the CPU
           | either, this is not a Raspberry Pi. No idea what signals
           | could be used for that.
           | 
           | Page tables I am not sure either. Could you still do it like
           | in Linux 1.0? No idea what things looked like then, but I
           | assume much less dedicated hardware support.
        
             | JoshTriplett wrote:
             | > probably there is no serial console really close to the
             | CPU either
             | 
             | outb to 0x3f8 _might_ work.
             | 
             | I know there were once versions of Linux that supported
             | running without an MMU. Those versions, with very limited
             | hardware support, _might_ work in this mode.
        
               | mschuster91 wrote:
               | It's still part of the kernel code:
               | https://www.kernel.org/doc/Documentation/nommu-mmap.txt
               | 
               | And apparently, good enough to run DOOM:
               | https://hackaday.com/2022/12/07/a-tiny-risc-v-emulator-
               | runs-...
        
           | wtallis wrote:
           | Hasn't Intel supported DMA into L3 cache for something like a
           | decade now?
        
             | adrian_b wrote:
             | It is supported on true Xeon systems.
             | 
             | I believe that it is not supported in Core CPUs, not even
             | in those of them which were branded as Xeon E or Xeon W.
        
             | usr1106 wrote:
             | Right, I did not specify what L ;) And the original article
             | did not mention either.
        
           | usr1106 wrote:
           | Substantial effort I don't doubt. First probably years of
           | learning how things work under the hood for most normal
           | mortals.
        
           | lamp987 wrote:
           | DMA on x86 does update contents of CPU caches.
        
             | JoshTriplett wrote:
             | Yes, it does, but that doesn't mean the hardware
             | necessarily supports doing it without a memory controller.
        
         | Klinky wrote:
         | With some of the new AMD Epyc CPUs having over 1GB of L3
         | cache, you could run a pretty full-featured Linux distro + app
         | entirely in cache.
        
           | undersuit wrote:
           | If anyone can verify that Cache as RAM actually works on AMD
           | though or explain how AMD boots without it that would be
           | great:
           | 
           | >Cache-as-RAM (CAR) is no longer a supportable feature in AMD
           | hardware.
           | 
           | https://git.furworks.de/coreboot-
           | mirror/coreboot/commit/a245...
        
             | mjg59 wrote:
             | The PSP sets up the memory controller before the x86 cores
             | are started. It's not implausible that the PSP has some
             | sort of cache as RAM stage, but that's before Coreboot
             | starts.
        
               | layer8 wrote:
               | For a moment I was confused that the PlayStation Portable
               | would have x86 cores.
        
       | TacticalCoder wrote:
       | That is very interesting.
       | 
       | > I'm also missing out the fact that this entire process only
       | kicks off after the Management Engine says it can, which means
       | we're waiting for an entirely independent x86 to boot an entire
       | OS before our CPU even starts pretending to execute the system
       | firmware.
       | 
       | I take it that that OS is Minix?
       | 
       | > But what verifies the first component in the boot chain? You
       | can't simply ask the BIOS to verify itself - if an attacker can
       | replace the BIOS, they can replace it with one that simply lies
       | about having done so. Intel's solution to this is called Boot
       | Guard.
       | 
       | Wait... How can an attacker replace the BIOS? Aren't motherboards
       | nowadays protected from unwarranted BIOS flashing?
       | 
       | Say I'm an attacker and I got root on some PC (Intel or AMD)
       | running Linux, how do I replace the BIOS with a backdoored BIOS
       | without the user noticing?
        
         | mjg59 wrote:
         | > I take it that that OS is Minix?
         | 
         | It's the Minix kernel, I don't think the userland contains much
         | Minix.
         | 
         | > Wait... How can an attacker replace the BIOS?
         | 
         | If you have physical access you can just attach to the flash
         | directly and reprogram it. This is very much in-scope for
         | various people.
        
           | LoganDark wrote:
           | > If you have physical access you can just attach to the
           | flash directly and reprogram it. This is very much in-scope
           | for various people.
           | 
           | Not to mention just replacing the motherboard since the CPU
           | is socketed and could go anywhere.
        
       | superkuh wrote:
       | https://archive.is/XoghM
       | 
       | These days dreamwidth.org is harder to view and interact with
       | than facebook.com if you don't have an account. We really need to
       | stop linking to it and link instead to an archive.is copy or the
       | like.
       | 
       | After the run-around to the archived copy I see mgj is still
       | complaining about people being able to boot in modes other than
       | UEFI. I'm glad these options still exist. Throwing away all the
       | legacy computing options would remove many abilities no longer
       | possible on modern hardware and software stacks.
        
         | mjg59 wrote:
         | > I see mgj is still complaining about people being able to
         | boot in modes other than UEFI
         | 
         | I'm not sure how you get that impression, since I'm mostly
         | talking about what happens before you get to that point. Having
         | the CPU hand off control to the firmware in protected mode
         | doesn't preclude the firmware switching back to real mode.
        
         | cronix wrote:
         | > These days dreamwidth.org is harder to view and interact with
         | than facebook.com if you don't have an account. We really need
         | to stop linking to it and link instead to an archive.is copy or
         | the like.
         | 
         | I've made the suggestion to Dang in the past to just
         | autogenerate an archive.is link for every story/link posted to
         | HN and have that be an "alternate link" after the main one. I
         | think it's kind of silly some people just post an archive link
         | for every post and get a buttload of points for it as everyone
         | upvotes it which games the system. I think it would also be
         | good in general to preserve the article as it appeared at the
         | time when initially discussed and hasn't been edited, or
         | removed, since.
        
         | masfuerte wrote:
         | It's fine with js disabled.
        
           | superkuh wrote:
           | How do you get through the captcha? It's what blocks me.
        
             | masfuerte wrote:
             | It didn't show me one. FWIW, I also have cookies and third-
             | party domains disabled.
        
       | rwmj wrote:
       | Intel actually released a variant of the 80386 which booted into
       | protected mode and lacked real mode entirely. It was as far as I
       | know not very successful:
       | https://en.wikipedia.org/wiki/Intel_80376
        
         | mjg59 wrote:
         | Simultaneously lacking real mode and paging support did kind of
         | restrict it to embedded use cases
        
         | senko wrote:
         | In 1989, people very much used real mode, so it's not
         | surprising this failed. (also, it was for embedded systems only
         | according to that Wikipedia article)
         | 
         | In 2023, not so much.
         | 
         | It's baffling to me this is still supported, when all the other
         | changes in the hardware basically make it impossible to run
         | anything that old on the modern hardware (you can do that in
         | QEMU, but you can then emulate x86 in software fast enough
         | anyway).
        
           | kevin_thibedeau wrote:
           | I boot FreeDOS on a Ryzen system so I can use a parallel port
           | device whose program won't work correctly under 64-bit
           | Dosemu2. It is an EPROM programmer whose timing requirements
           | won't tolerate non-virtualized emulation.
        
             | senko wrote:
             | Thanks for providing an actual and relevant use case.
             | 
             | TBH (and I'm wildly speculating here, I'm not involved in
             | embedded dev at all), if there were no other options, I
             | believe the SW emulation would rise to the challenge to
             | make it workable.
             | 
             | If people can faithfully reproduce behaviours of ages old
             | consoles to make sure the old games' bugs are still
             | preserved, and for free, I'm guessing someone would step up
             | in case of x86 if there was a business need.
             | 
             | But since you can still use actual HW to do that, there
             | isn't any.
        
           | voxadam wrote:
           | What's the actual reason for real mode _still_ being
           | supported on modern processors in this day and age? Why
           | didn't it die with the advent of AMD64 (aka x86-64)? Why didn't
           | AMD skip real mode and boot directly into something more
           | modern?
        
             | pwg wrote:
             | Most likely because for protected mode to function, there
             | is a certain minimal amount of housekeeping data tables
             | that need to be setup properly (i.e. LDT, GDT, IDT, etc),
             | otherwise you'll just immediately take a double fault and
             | the CPU will halt.
             | 
             | Real mode exists today as a gateway to setting up all those
             | housekeeping data tables so that once the "protected mode"
             | switch is flipped on, the CPU will actually find code to
             | execute.
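             | 
             | For a sense of scale, a hedged C sketch of packing
             | one GDT entry; the function name is made up, and
             | loading the table and switching modes is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 8-byte segment descriptor for the GDT.
 * limit is 20 bits, access is the access byte, flags the high nibble. */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                        uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu);
    assert(flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base 23:0   */
    d |= (uint64_t)access << 40;                   /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit 19:16 */
    d |= (uint64_t)flags << 52;                    /* G, D/B, L   */
    d |= (uint64_t)(base >> 24) << 56;             /* base 31:24  */
    return d;
}
```

             | A flat 32-bit setup needs only a null entry plus
             | gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC) for code and
             | gdt_descriptor(0, 0xFFFFF, 0x92, 0xC) for data.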
        
             | toast0 wrote:
             | When AMD64 came out, bios booting was dominant. You need
             | real mode for that.
             | 
             | In today's world, you could probably release an UEFI only
             | cpu and few would notice. But I doubt it would save enough
             | space to make a difference. And you'd open yourself to
             | criticism from those few that still use real mode: this
             | processor is fake, they'd say, because it has no real mode.
        
               | sgjohnson wrote:
               | > this processor is fake, they'd say, because it has no
               | real mode.
               | 
               | They could also claim that it's not PC compatible.
               | Because it literally wouldn't be.
               | 
               | Apple also achieved their dream of the Mac no longer
               | being a PC with the release of M1.
        
             | gpderetta wrote:
             | - There is likely very little to no cost in keeping it in
             | 
             | - There might be even a cost in removing it.
             | 
             | - Complexity is a barrier to entry to any competitor that
             | wants to produce compatible CPUs.
        
       | wg0 wrote:
       | This might seem far fetched but a RISC-V take over in a decade or
       | two is imminent even more so looking at the geopolitical vectors
       | and trajectories.
        
       | concerned_ wrote:
       | [flagged]
        
         | yjftsjthsd-h wrote:
         | ? The article very specifically talks about how the process has
         | changed over the years, even to the point of UEFI _not_
         | starting the OS/bootloader in real mode anymore.
        
           | concerned_ wrote:
           | [flagged]
        
             | yjftsjthsd-h wrote:
             | > It does not
             | 
             | It literally does:
             | 
             | >>> For modern UEFI systems, the firmware that's launched
             | from the reset vector then reprograms the CPU into a
             | sensible mode (ie, one without all this segmentation
             | bullshit),
             | 
             | > you could not use this article to boot any Intel CPU.
             | 
             | I mean, the article doesn't contain machine code or
             | assembly listings, but it does a decent job of describing
             | the process.
             | 
             | What is your actual criticism, either of the article or the
             | described CPUs?
        
               | concerned_ wrote:
               | [flagged]
        
       | MichaelZuo wrote:
       | What exactly does "...the CPU is hardcoded to start reading
       | instructions from when power is applied." mean?
       | 
       | How do the hardcoded parts read anything with just an electrical
       | current?
        
         | deepspace wrote:
         | I am not sure what you mean by "just an electrical current".
         | The CPU is typically kept in reset until the clock is stable,
         | so it has a valid clock signal on startup. It is therefore able
         | to start executing the microcode which performs the power-on-
         | reset sequence, i.e. start reading instructions from the reset
         | vector and so on.
        
           | MichaelZuo wrote:
           | > CPU is typically kept in reset until the clock is stable
           | 
           | > It is therefore able to start executing the microcode
           | 
           | At the very beginning, what is keeping it in 'reset' and
           | initiating the execution of the microcode? Through what
           | means?
           | 
           | From what I understand, in the first few hundred nanoseconds
           | it's just an electrical current that's flowing through the
           | CPU and nothing else.
        
             | convolvatron wrote:
             | yeah, ok, reset would normally be held low - whether that's
             | the presence or absence of a current isn't important. in
             | the old days there was usually a hardware power
             | controller that doesn't raise reset until the power is
             | stable. on those machines I think that just took long
             | enough that the clock chain had settled out.
             | 
             | these days it's more likely that there is a system
             | controller orchestrating the bringup. potentially waiting
             | for all the voltage converters and the clock generator to
             | settle before raising the reset line. depending on the
             | architecture it may not really be a simple pin, but
             | addressed by the system board controller through the scan
             | network.
        
               | MichaelZuo wrote:
               | How is the system controller initiated? I assume via a
               | simpler process?
        
               | convolvatron wrote:
               | using something like the 'brownout detector' power
               | controller above. I'm pretty sure I've seen designs (and
               | done this myself), that just put a little RC on the
               | reset pin. 100ms oughta be enough for anyone!
               | microcontrollers (like those used for board controllers)
               | have a much simpler power and clock structure than big
               | cpus...and many of them are built to just come up on
               | their own.
        
       ___________________________________________________________________
       (page generated 2023-04-17 23:02 UTC)