[HN Gopher] Intel iAPX 432
       ___________________________________________________________________
        
       Intel iAPX 432
        
       Author : sebastianconcpt
       Score  : 39 points
       Date   : 2022-08-10 15:58 UTC (1 days ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | nullc wrote:
       | iAPX 432's security features would be welcome in the computing
       | world we have today, I wonder to what extent its failures doomed
       | similar functionality in Intel?
       | 
       | At least there is CHERI now but we still hardly seem close to
       | having hardware enforced capabilities-grade security in high
       | performance server kit.
        
         | kps wrote:
         | The i960 MX (nee BiiN) had a similar tagged-memory capability
         | system along with a fairly pleasant RISC instruction set.
        
       | twoodfin wrote:
       | I keep hoping @bcantrill & the Oxide crew will do a Twitter space
       | on the i432, or perhaps failed architectures generally.
        
         | bcantrill wrote:
         | We would love to! Maybe we could convince Robert Colwell to
         | join us, as his paper on the 432 is one of my favorite systems
         | papers of all time![0]
         | 
         | [0] http://dtrace.org/blogs/bmc/2008/07/18/revisiting-the-
         | intel-...
        
           | jaykru wrote:
           | Rob was happy to chat with me about his 432 paper over
           | LinkedIn (cold DM'd him) for a semester project I did on it a
           | few months back. He might go for a podcast episode :) I'd
           | love to listen to it!
        
       | chasil wrote:
       | It is amazing how many failures Intel has survived, and that
       | their core competence really emerged from the Datapoint 2200.
        
       | iforgotpassword wrote:
       | If you're into some light edutainment-style videos, I enjoyed
       | watching RetroBytes recently: https://youtu.be/4o4MXV-d-jQ
        
       | mattst88 wrote:
       | I remember reading an article about the iAPX 432 that went into
       | extensive detail about the compounding effects of the design--I
       | recall it describing how an operation with a small constant
       | operand would be slow because the ISA didn't support immediates,
       | and as a result you'd have to load it from memory, and there was
       | not even a cache to help with that.
       | 
       | Does anyone know this article? I've searched and haven't been
       | able to find it, and it was definitely worth a read.
        
         | twoodfin wrote:
         | I think you want "Performance Effects of Architectural
         | Complexity in the Intel 432"
         | 
         | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14...
        
         | Lammy wrote:
         | Could it be
         | https://homes.cs.washington.edu/~levy/capabook/Chapter9.pdf ?
         | 
         | Sorry for huge quote, but it's from a huge article:
         | 
         | =======================================================
         | 
         | From section 9.2, _Segments and Objects_ :
         | 
         | > All objects are addressed through capabilities which, on the
         | Intel 432, are called access descriptors (ADs). (The vendor's
         | terminology is used in this chapter for compatibility with
         | Intel literature. The notation "AD" is used throughout for
         | "capability.")
         | 
         | > At the lowest level, objects are composed of memory segments,
         | and a memory segment is the most fundamental object (called a
         | generic object on the Intel 432). Each Intel 432 segment has
         | two parts: a data part for scalars and an access part for ADs,
         | as shown in Figure 9-2. Objects requiring both data and access
         | descriptors can be stored in a single segment. Segments are
         | addressed through ADs, as the figure illustrates. The data part
         | grows upward (in the positive direction) from the boundary
         | between the two parts, while the access part grows downward (in
         | the negative direction) from the dividing line. The hardware
         | ensures that only data operations are performed on the data
         | part and that AD operations are performed on the access part.
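
A toy model of the two-part segment described in the quote, as a rough Python sketch. The class names (`AD`, `Segment`) and the use of signed offsets (non-negative for the data part, negative for the access part) are illustrative assumptions, not the 432's actual encoding. The only behavior taken from the text is that each part grows away from the shared boundary and that data operations and AD operations are confined to their own part.

```python
class AD:
    """Stand-in for an access descriptor (capability) naming another object."""
    def __init__(self, target):
        self.target = target


class Segment:
    """Toy two-part segment: data at offsets >= 0, ADs at offsets < 0."""

    def __init__(self):
        self.data = []    # data part, grows upward: offsets 0, 1, 2, ...
        self.access = []  # access part, grows downward: offsets -1, -2, ...

    def _locate(self, offset, wants_ad):
        # Enforce the rule from the quote: data operations only touch the
        # data part, AD operations only touch the access part.
        if wants_ad != (offset < 0):
            raise TypeError("operation type does not match segment part")
        if offset < 0:
            return self.access, -offset - 1
        return self.data, offset

    def store(self, offset, value):
        part, index = self._locate(offset, isinstance(value, AD))
        if index == len(part):
            part.append(value)   # grow the part by one slot
        elif index < len(part):
            part[index] = value
        else:
            raise IndexError("offset is beyond the end of the part")

    def load_data(self, offset):
        part, index = self._locate(offset, wants_ad=False)
        return part[index]

    def load_ad(self, offset):
        part, index = self._locate(offset, wants_ad=True)
        return part[index]


seg = Segment()
seg.store(0, 42)                               # scalar into the data part
seg.store(-1, AD(target="some other object"))  # capability into the access part
```

In this sketch, storing a plain integer at a negative offset, or reading an AD with `load_data`, raises `TypeError`, which stands in for the hardware check.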
         | 
         | =======================================================
         | 
         | From section 9.4.3, _Instruction Operand Addressing_ :
         | 
         | > At any moment during a procedure's execution, ADs specified
         | by instructions must be located in one of four environment
         | objects. Environment object 0 is the context object itself.
         | Instructions can specify any of the ADs within the context
         | object's access part; for example, to refer to the domain or the
         | constants data segment. The three remaining environments,
         | environments 1 through 3, are defined dynamically by the
         | procedure.
         | 
         | > Instruction objects contain only a data part. Because Intel
         | 432 instructions are bit-addressable and can start on arbitrary
         | bit boundaries, instructions are addressed as bit offsets into
         | instruction objects. The first instruction in each instruction
         | object begins at bit displacement 64, following the header of
         | four 16-bit predefined fields. The maximum size of an
         | instruction segment is 64K bits, or 8K bytes, due to the bit
         | addressing. Although there is generally one instruction object
         | for each procedure in the domain, procedures larger than 8K
         | bytes require additional instruction objects. The BRANCH
         | INTERSEGMENT instruction can be used to transfer control to
         | another instruction object within the same domain.
         | 
         | > The four environment segments thus provide efficient
         | addressing of ADs. An instruction can specify an immediate 4-
         | or 8-bit access selector describing the location of an AD for
         | an operand. Or, it can specify the location of a 16-bit
         | access selector located in memory or on the stack. The short
         | direct format efficiently addresses any of the first four ADs
         | in any of the four environments. This includes the ADs for the
         | global constants, context message (calling parameters), and
         | current domain within the current context. All of the
         | processor-defined ADs within the context object's access part
         | can be addressed using an 8-bit access selector.
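
The size limits in the quoted section follow from simple arithmetic, worked through below in Python. The 16-bit width of a bit displacement is inferred from the "64K bits" figure and is not stated outright in the quote. The helper `instruction_objects_needed` is hypothetical and assumes every instruction object carries its own 64-bit header.

```python
OFFSET_BITS = 16     # assumed width of a bit displacement (inferred from 64K bits)
HEADER_FIELDS = 4    # from the quote: four predefined header fields
FIELD_BITS = 16      # from the quote: each header field is 16 bits

max_bits = 2 ** OFFSET_BITS                         # addressable bit positions
max_bytes = max_bits // 8                           # the same limit in bytes
first_instruction_bit = HEADER_FIELDS * FIELD_BITS  # where code starts

# The short direct format reaches the first four ADs in each of the four
# environments, which is as many combinations as a 4-bit selector can encode.
short_direct_targets = 4 * 4


def instruction_objects_needed(procedure_bits):
    """Hypothetical helper: objects needed for a procedure of this many bits."""
    usable = max_bits - first_instruction_bit  # assumes one header per object
    return -(-procedure_bits // usable)        # ceiling division
```

Under these assumptions, `max_bits` is 65536, `max_bytes` is 8192, and code starts at bit 64, matching the figures in the quote.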
         | 
         | =======================================================
         | 
         | Unrelated, but I love how they went for the "As Above, So
         | Below" approach for growing the data-vs-access-parts of
         | instruction object memory ^
        
         | linksnapzz wrote:
         | I think I've read the same article, and also wish I had the
         | reference. I do remember that there were no or few registers,
         | and reads were from memory almost all the time.
         | 
         | Also, does anyone know of an actual system that shipped with a
         | 432? Like, manufacturer and model #?
        
         | kps wrote:
         | > the ISA didn't support immediates
         | 
         | I don't know the article, but have a related story. In the '90s
         | I worked for a custom compiler shop, and a company you've heard
         | of (not Intel) came to us with a system they wanted tools for.
         | They had gone all-in on RISC -- operations were all register-
         | to-register, and the only memory addressing was register
         | indirect (i.e. through an address in a register). We had to
         | point out that it would be rather difficult to get an address
         | into a register in the first place.
        
           | andrewf wrote:
           | Could you do it with shifts and increments? Constant loads
           | would look just like multiplies, a glorious RISC apotheosis..
        
             | kps wrote:
             | Yes, you could get 0 by subtracting (or xoring) a register
             | with itself, then -1 by complementing, then 1 by negating,
             | then adding to itself to get any single bit. Then
             | synthesize any constant by adding those. The code would be
             | impractically slow and large, though.
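
A minimal sketch of the constant-synthesis trick described above, written in Python. It assumes a hypothetical 32-bit machine whose only operations are register-to-register xor, complement, negate, and add. The names `MASK` and `synthesize` are illustrative and not from any real toolchain.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit registers


def synthesize(target):
    """Build `target` with xor, complement, negate, and add only.

    Returns (value, number_of_operations_used).
    """
    target &= MASK
    ops = 0

    r = 0x12345678                # whatever garbage the register holds at first
    zero = r ^ r                  # xor a register with itself -> 0
    ops += 1
    minus_one = ~zero & MASK      # complement 0 -> all ones, i.e. -1
    ops += 1
    bit = -minus_one & MASK       # negate -1 -> 1
    ops += 1

    acc = zero
    for i in range(32):
        if (target >> i) & 1:
            acc = (acc + bit) & MASK   # add in the current single-bit value
            ops += 1
        if (target >> (i + 1)) == 0:
            break                      # no higher bits left to build
        bit = (bit + bit) & MASK       # add to itself -> next higher bit
        ops += 1
    return acc, ops
```

Building the constant 5 this way takes 7 operations and a full-width all-ones pattern takes 66, where a single load-immediate would do. This counting scheme is specific to the sketch, but it gives a rough sense of why such code would be slow and large.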
        
       | kabdib wrote:
       | I was taking a VLSI design course in 1981, and the professor
       | teaching it proudly showed off some 432 chips (embedded in
       | plastic) that he'd been given. He waxed lyrical about them, how
       | the big boys were doing silicon in Silly Valley. (We, with our
       | colored pencils, were learning how to do NAND gates and full
       | adders in NMOS, on graph paper).
       | 
       | Later, I read Organick's book on the 432. It was kind of a mess,
       | no idea how they expected the thing to perform.
       | 
       | This was also back when Ada was the up-and-coming language, which
       | the 432 was going to run really well (if you believed the
       | marketing). Ada was pretty intimidating, as it was complicated
       | for the time and generics seemed to scare everyone. (Little did
       | we know that C++ was going to be a thing in a decade or so, and
       | it made Ada seem simple in comparison).
        
         | chasil wrote:
         | Ada's syntax became the model for the procedural scripting
         | languages of many SQL databases.
         | 
         | "SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL.
         | Oracle developed PL/SQL and released it in 1991, basing the
         | language on the US Department of Defense's Ada programming
         | language... IBM's SQL PL (used in DB2) and Mimer SQL's PSM were
         | the first two products officially implementing SQL/PSM. It is
         | commonly thought that these two languages, and perhaps also
         | MySQL/MariaDB's procedural language, are closest to the SQL/PSM
         | standard. However, a PostgreSQL addon implements SQL/PSM
         | (alongside its other procedural languages like the PL/SQL-
         | derived plpgsql), although it is not part of the core product."
         | 
         | https://en.wikipedia.org/wiki/SQL/PSM
        
       | PAPPPmAc wrote:
       | The first of Intel's many expensive lessons about the problems
       | with extremely complicated ISAs dependent on even more
       | sophisticated compilers making good static decisions for
       | performance. Then they did it again with the i860. Then they did
       | it again with Itanium.
        
         | sytse wrote:
        
           | speps wrote:
           | Why would you do that without giving credit?
        
             | sytse wrote:
             | Good idea, I added
             | https://twitter.com/sytses/status/1557803849041072128
        
             | bobloblaw724449 wrote:
             | It's fine, it's Sid (he's a good guy).
        
               | generalizations wrote:
               | Except when he palms off other people's ideas as his own.
        
               | bobloblaw724449 wrote:
               | It's only an HN comment and I don't see why it honestly
               | matters. At the end of the day, more people will see his
               | tweet and learn about these failed architectures than
               | some random comment on some random HN post. Significantly
               | more people read Twitter than HN.
               | 
               | The way you're reacting to this is like it's 2007 and he
               | stole the blueprints to the iPhone.
        
         | bri3d wrote:
         | iAPX 432 was sort of a different failure from i860 and Itanium,
         | no? My understanding is that the issue with iAPX 432 was that
         | the architecture provided object-oriented instructions, but
         | they turned out to be slow in practice, and the compiler didn't
         | know how slow they were, so it abused them in situations where
         | they should have used scalar ops instead, and that in tandem,
         | the ABI relied too heavily on pass-by-value. Basically, that
         | the iAPX was explained to compiler authors as an object-
         | oriented CPU, when it should have been treated as a CPU with
         | object-oriented extensions.
         | 
         | Whereas i860 and Itanium were just trying to shoehorn VLIW into
         | general-purpose computing, which is generally incredibly
         | challenging. VLIW is great for places like DSP, where you have
         | a defined real-time stream of data and limited context
         | switching. In this case, you can use the spare die space you
         | didn't spend on dispatch, prediction, and retirement on more
         | MACs or ALUs or vectors, and the compiler can accurately
         | predict the latency of a given operation because the source is
         | defined. Fundamentally, compiler scheduling is intractable in a
         | multiuser or task switching environment, because you have _no
         | idea_ what will be in cache ahead of runtime and always end up
         | with the i860/Itanium problem, where you stall your entire
         | execution pipeline every time you miss cache unexpectedly.
        
         | bombcar wrote:
         | Have we (finally) realized the dream? By basically putting the
         | "smart" part of the compiler in the chip itself, or do we still
         | run relatively simple ISAs?
        
           | PAPPPmAc wrote:
           | I argue about this a lot. Some reasonably substantiated
           | opinions:
           | 
           | 1. Highly sophisticated large-scale static analysis keeps
           | getting beaten by relatively stupid tricks built into
           | overgrown instruction decoders, working on relatively narrow
           | windows of instructions.
           | 
           | 2. The primary reason for (1) is that performance is now
           | almost completely dominated by memory behavior, and making
           | good static predictions about the dynamic behavior of fancy
           | memory systems in the face of multitasking, DRAM refresh
           | cycles, multiple independent devices competing for the memory
           | bus, layers of caches, timing variations, etc. is essentially
           | impossible.
           | 
           | 3. You can give up on a bunch of your dynamic tricks and
           | build much simpler more predictable systems that can be
           | statically optimized effectively. You could probably find a
           | good local maximum in that style. The dynamic tricks are,
           | however, unreasonably effective for performance, and have the
           | advantage that they let you have good performance with the
           | same binaries on multiple different implementations of an
           | ISA. That's not insurmountable (e.g. the AOT compilation for
           | ART objects on Android), but the ecosystem isn't fully set up
           | to support that kind of thing.
        
           | AnimalMuppet wrote:
           | By putting it on the chip, it can be dynamic rather than
           | static. The microcode can know a lot more of what's going on
           | than the compiler can.
        
       ___________________________________________________________________
       (page generated 2022-08-11 23:01 UTC)