[HN Gopher] Automated CPU Design with AI
       ___________________________________________________________________
        
       Automated CPU Design with AI
        
       Author : skilled
       Score  : 61 points
       Date   : 2023-07-02 20:59 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | dooglius wrote:
       | Doesn't seem to be any discussion of what the inputs and outputs
       | actually are here, at least for the "coarse-grained" approach.
       | Suspect there is some "scaffolding" around e.g. register map and
       | memory access, and the rest is essentially learning a map from
       | (instruction, register input vals)->(register output val, control
        | registers for memory access).
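        | 
        | A guess at what that interface could look like, sketched in
        | Python (speculative, not from the paper): the register file and
        | memory unit are fixed scaffolding, and only the per-instruction
        | step function is what gets learned:
        | 
        |     from typing import NamedTuple
        | 
        |     class StepIn(NamedTuple):
        |         instruction: int  # fetched 32-bit instruction word
        |         reg_vals: tuple   # source register values, read by scaffolding
        | 
        |     class StepOut(NamedTuple):
        |         reg_val: int      # write-back value for the destination register
        |         mem_ctrl: tuple   # (enable, is_store, address) for the memory unit
        | 
        |     def learned_step(inp: StepIn) -> StepOut:
        |         # The AI would have to learn this body from I/O observations
        |         # alone; one hand-written case (RISC-V ADDI, ignoring sign
        |         # extension) just pins down the interface.
        |         imm = inp.instruction >> 20
        |         return StepOut(reg_val=(inp.reg_vals[0] + imm) & 0xFFFFFFFF,
        |                        mem_ctrl=(False, False, 0))
        | 
        |     # ADDI x1, x0, 10 -> writes 10 back, no memory access
        |     print(learned_step(StepIn(0x00A00093, (0,))))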
        
       | westurner wrote:
       | From the abstract; "Pushing the Limits of Machine Design:
       | Automated CPU Design with AI" (2023)
       | https://arxiv.org/abs/2306.12456 :
       | 
       | > _[...] This approach generates the circuit logic, which is
       | represented by a graph structure called Binary Speculation
       | Diagram (BSD), of the CPU design from only external input-output
       | observations instead of formal program code. During the
       | generation of BSD, Monte Carlo-based expansion and the distance
       | of Boolean functions are used to guarantee accuracy and
       | efficiency, respectively. By efficiently exploring a search space
       | of unprecedented size 10^{10^{540}}, which is the largest one of
       | all machine-designed objects to our best knowledge, and thus
       | pushing the limits of machine design, our approach generates an
       | industrial-scale RISC-V CPU within only 5 hours. The taped-out
       | CPU successfully runs the Linux operating system and performs
       | comparably against the human-designed Intel 80486SX CPU. In
        | addition to learning the world's first CPU only from input-
       | output observations, which may reform the semiconductor industry
       | by significantly reducing the design cycle, our approach even
       | autonomously discovers human knowledge of the von Neumann
       | architecture._
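        | 
        | A toy reading of "Monte Carlo-based expansion" guided by the
        | "distance of Boolean functions" (hypothetical sketch, not the
        | paper's algorithm): speculate a constant at each node, measure
        | how far the observed outputs are from that constant, and expand
        | on sampled input bits wherever the distance is nonzero:
        | 
        |     import random
        | 
        |     def distance(examples):
        |         # Distance from the best constant Boolean function: the
        |         # fraction of observations disagreeing with the majority.
        |         if not examples:
        |             return 0.0
        |         ones = sum(y for _, y in examples)
        |         return min(ones, len(examples) - ones) / len(examples)
        | 
        |     def expand(examples, free_bits):
        |         # Speculate a constant leaf; expand only on disagreement.
        |         if distance(examples) == 0.0 or not free_bits:
        |             ones = sum(y for _, y in examples)
        |             return ('leaf', int(2 * ones >= max(len(examples), 1)))
        |         # Monte Carlo flavor: score a random subset of candidate bits.
        |         cands = random.sample(free_bits, min(4, len(free_bits)))
        |         def cost(b):
        |             lo = [(x, y) for x, y in examples if not (x >> b) & 1]
        |             hi = [(x, y) for x, y in examples if (x >> b) & 1]
        |             return distance(lo) + distance(hi)
        |         best = min(cands, key=cost)
        |         rest = [b for b in free_bits if b != best]
        |         lo = [(x, y) for x, y in examples if not (x >> best) & 1]
        |         hi = [(x, y) for x, y in examples if (x >> best) & 1]
        |         return ('node', best, expand(lo, rest), expand(hi, rest))
        | 
        |     def run(tree, x):
        |         while tree[0] == 'node':
        |             _, b, lo, hi = tree
        |             tree = hi if (x >> b) & 1 else lo
        |         return tree[1]
        | 
        |     # Black box to learn: bit0 XOR bit3 of a 4-bit input.
        |     f = lambda x: (x ^ (x >> 3)) & 1
        |     obs = [(x, f(x)) for x in random.sample(range(16), 12)]
        |     tree = expand(obs, list(range(4)))
        |     print(all(run(tree, x) == y for x, y in obs))  # fits the samples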
       | 
        | The von Neumann (and Harvard Mark) architectures have an
        | instruction-pipeline bottleneck, maybe by design for serial
        | debuggability; as compared with, say, in-RAM computing with
        | existing RAM geometries? (See also: "Rowhammer for qubits")
       | 
        | (Edit: High Bandwidth Memory; HBM2E vs GDDR6X (2023)
        | https://en.wikipedia.org/wiki/High_Bandwidth_Memory )
       | 
       | Hopefully part of the fitness function is determined by the
       | presence and severity of hardware side channels and electron
       | tunneling; does it filter out candidate designs with side-channel
       | vulnerabilities (that are presumed undetectable with TLA+)?
        
         | westurner wrote:
          | And then maybe someday design a reconfigurable - probably
          | modular - semiconductor fabrication facility to produce the
          | design(s)?
        
       | xeonmc wrote:
       | Pentium FDIV bug, round two incoming.
        
       | brucethemoose2 wrote:
       | > The implemented program is executed on a Linux cluster
       | including 68 servers, each of which is equipped with 2 Intel Xeon
       | Gold 6230 CPUs.
       | 
       | > We verify our output netlist on the FPGAs and tape out the chip
       | with 65nm technology. The automatically designed CPU was sent to
       | the manufacturer in December 2021.
        
       | granthamb wrote:
        | It wasn't clear to me that they had implemented a page table (I
        | think that's the S extension?), which I would think would make
        | the I/O space much more complex and difficult to represent. Lack
        | of VA translation would make this CPU much less comparable to a
        | 486SX.
        
       | behnamoh wrote:
        | Wasn't Google Tensor already designed by one of Google's AIs? I
        | remember it was a big deal because people thought Google could
        | improve their chips much faster than the competition.
        
         | rowanG077 wrote:
          | That was just placement, not abstract circuit design.
        
       | optimalsolver wrote:
       | Could some really alien CPU architectures be discovered with this
       | method?
       | 
       | Just wondering how far from human design-space you could end up
       | with this.
        
         | ninkendo wrote:
          | Silicon validation is a huge part of the overall cost of
          | bringing up a chip, because it's so important that the physical
          | hardware do what it's supposed to do. So it's gonna be
          | constrained to behave exactly as the validation specifies,
          | which likely limits how "alien" it will actually be.
        
       | amelius wrote:
        | I didn't read the paper, but judging from the abstract it's
        | probably a technique for design-space exploration.
       | 
       | I.e., they manually designed the CPU but left a (large) number of
       | parameters open, then used AI to find an optimum for those
       | parameters.
       | 
       | So anything the AI did was completely correctness-preserving.
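        | 
        | If that reading is right, the loop would look something like
        | this (illustrative Python with made-up parameters; every
        | candidate is a valid design, and the search only tunes cost):
        | 
        |     import random
        | 
        |     # Hypothetical parameterized design space: each point is a
        |     # correct CPU configuration by construction.
        |     SPACE = {
        |         'cache_kb':    [8, 16, 32, 64],
        |         'assoc':       [1, 2, 4, 8],
        |         'pipe_stages': [3, 5, 7],
        |     }
        | 
        |     def cost(cfg):
        |         # Stand-in for an expensive simulation that scores a
        |         # candidate on area/latency.
        |         return (cfg['cache_kb'] * cfg['assoc'] * 0.02
        |                 + abs(cfg['pipe_stages'] - 5))
        | 
        |     def explore(trials=200):
        |         best, best_cost = None, float('inf')
        |         for _ in range(trials):
        |             cfg = {k: random.choice(v) for k, v in SPACE.items()}
        |             if cost(cfg) < best_cost:
        |                 best, best_cost = cfg, cost(cfg)
        |         return best
        | 
        |     print(explore())  # e.g. {'cache_kb': 8, 'assoc': 1, 'pipe_stages': 5}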
       | 
        | Note that this may sound like a small achievement, but keep in
        | mind that for modern CPUs searching the design space is hugely
        | important, and probably a reason for the success of e.g.
        | Apple's M1.
        
       | bsder wrote:
       | I find this paper _extremely_ suspicious.
       | 
       | If this _actually_ worked, it should be able to cough up a 6502,
       | 6809, 8051, etc. as well since they are so much simpler--
       | especially since they even mention a Commodore 64.
       | 
       | The fact that they don't do this stinks very strongly. There are
       | other concerning signs in the paper as well.
        
         | staunton wrote:
         | Why should it produce those designs? Are they in any known
         | sense optimal?
        
           | bsder wrote:
           | > Why should it produce those designs? Are they in any known
           | sense optimal?
           | 
            | Yes. The 6502 was quite cheap for its day, so it's much more
            | optimal for cost than most designs. The 6809 was designed to
            | fix the mistakes of the 6800, and its implementation is much
            | more orthogonal. The 6800 and 8051 are probably the best
            | documented. All of them have extremely long-lived tool chains
            | and support. Pick your optimality.
           | 
            | In addition, then ask: "Why should it produce a RISC-V
            | design?" RISC-V is definitely sub-optimal on quite a few
            | fronts.
           | 
            | If a system is doing actual _CPU design_, as claimed by the
            | paper, those designs (6502, 6809, 8051) are a simple sanity
            | check. The designs are extensively documented, to the point
            | that we have web pages that simulate them down to the
            | transistor. You should be able to provide a "relatively"
            | small input and get back a compatible design as an output. A
            | 6502 has only 3500 or so transistors. That's on the order of
            | the complexity they claim in the paper.
           | 
            | This would prevent someone like me from saying: "You
            | basically stuffed a RISC-V design into the training set,
            | managed to launder it through ML/AI to get the computer to
            | cough it back up, then deployed a legion of humans to patch
            | the result sufficiently that it could be called 'Linux
            | compatible', and finally barfed out a publication with 6
            | pages of link references in a 12 page paper."
           | 
           | Here's the touchstone for whether AI is doing chip design:
           | "When AI can distinguish between control plane and datapath
            | and _synthesize and place them differently_, AI is doing
           | actual design."
        
           | sitkack wrote:
            | I don't think any reviewer of the paper would ask why they
            | didn't use one of the processors mentioned.
            | 
            | I can think of lots of reasons to do it with RISC-V:
            | 
            | * lots of excellent simulators and emulators
            | * great tool chains
            | * both software implementations (Verilog, VHDL) and hardware
            | * regular, compact instruction set (no condition codes)
           | 
           | Using anything _besides_ RISC-V would have been an order of
           | magnitude harder.
        
       ___________________________________________________________________
       (page generated 2023-07-02 23:00 UTC)