[HN Gopher] A Simulated Annealing FPGA Placer
       ___________________________________________________________________
        
       A Simulated Annealing FPGA Placer
        
       Author : stefanpie
       Score  : 60 points
       Date   : 2024-01-02 16:46 UTC (6 hours ago)
        
 (HTM) web link (stefanabikaram.com)
 (TXT) w3m dump (stefanabikaram.com)
        
       | lindig wrote:
        | My understanding of simulated annealing is that solutions that
        | are not improvements are still accepted with some probability in
        | early steps, but that this probability decreases as the
        | "temperature" drops. Looking at your description (but not the
        | code), I did not see that aspect; it looked like you would only
        | accept improvements of the cost function. Is this correct, or
        | does your solution also accept slight regressions with some
        | probability somewhere?
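        | 
        | For reference, the acceptance rule I have in mind is the
        | Metropolis criterion; a minimal Python sketch (the names are
        | illustrative, not from the post's code):
        | 
        |     import math
        |     import random
        | 
        |     def accept(delta_cost, temperature):
        |         # Always accept improvements; accept regressions with
        |         # probability exp(-delta/T), which shrinks as T drops.
        |         if delta_cost <= 0:
        |             return True
        |         return random.random() < math.exp(-delta_cost / temperature)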
        
         | kwoff wrote:
         | In the "Simulated Annealing" section:
         | 
         | "At first, you might want to make moves or swaps over large
         | distances or you might want to accept some percent of moves
         | that don't improve the objective, but as time goes on ...
         | 
         | However, as it turns out, you technically don't need this
         | annealing part to make FPGA placement work. You can just
         | randomly try different moves and accept or reject them based on
         | whether they improve the objective function. This is what I did
         | in my toy implementation of an FPGA placer just to keep it
         | simple."
        
           | lindig wrote:
           | The argument for annealing in the original paper is that
           | accepting regressions is essential to escape from local
           | minima.
           | 
            | https://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/Tempe...
           | 
           | "Annealing, as implemented by the Metropolis procedure,
           | differs from iterative improvement in that the procedure need
           | not get stuck since transitions out of a local optimum are
           | always possible at nonzero temperature. A second and more
           | important feature is that a sort of adaptive divide-and-
           | conquer occurs. Gross features of the eventual state of the
            | system appear at higher temperatures; fine details develop
            | at lower temperatures. This will be discussed with specific
           | examples."
        
             | _0ffh wrote:
             | Yes, without a temperature and cooling schedule, how can it
             | be annealing? It's in the name. It may sound harsh, but I'd
             | call it an abuse of the term to do hillclimbing but call it
             | annealing. It also seems lazy, since doing it right is an
             | all but trivial addition to the code. Finding the best
             | cooling schedule might require some experimentation though.
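              | 
              | For what it's worth, the addition is small; a minimal
              | Python sketch assuming a geometric cooling schedule (the
              | constants are illustrative, not tuned):
              | 
              |     import math
              |     import random
              | 
              |     def anneal(cost, neighbor, x, t=10.0, alpha=0.95,
              |                t_min=1e-3):
              |         # Hill climbing becomes annealing once a
              |         # temperature and a cooling schedule are added.
              |         while t > t_min:
              |             x_new = neighbor(x)
              |             d = cost(x_new) - cost(x)
              |             if d <= 0 or random.random() < math.exp(-d / t):
              |                 x = x_new
              |             t *= alpha  # geometric cooling; tune alpha
              |         return x
              | 
              |     # e.g. anneal(lambda v: v * v,
              |     #             lambda v: v + random.uniform(-1, 1), 5.0)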
        
             | WanderPanda wrote:
              | It's strange that in a field as important as optimization
              | we still think in terms of "escaping from local minima".
              | Also (as a total outsider), the progress in general
              | optimization algorithms/implementations appears to be very
              | slow (I was shocked at how old ipopt is). I wonder whether
              | all the low-hanging inductive biases (for real-world
              | problems) have already been exploited, or whether we just
              | have no good way of expressing them? Maybe learning them
              | from data in a fuzzy way might work?
        
               | marcosdumay wrote:
                | Unless you come up with some new take on the P ?= NP
                | problem, there isn't much left to improve in generic
                | optimization.
                | 
                | There are all kinds of possibilities for specific
                | problems, but if you want something generic, you have
                | to traverse the possibility space and use its topology
                | to find an optimum. If the topology is chaotic, you are
                | out of luck; if it's completely random, there's no hope.
        
           | klyrs wrote:
           | If you want to keep the pedants at bay, call this "simulated
           | annealing at zero temperature." Or just call it a greedy
           | algorithm.
        
         | stefanpie wrote:
          | The other commenters are correct: when doing annealing, you
          | usually want to accept some moves that don't improve the
          | objective (regressions) in order to escape local minima early
          | on.
         | 
         | I abused the definition of annealing a lot in the post but I
         | briefly touched on the idea:
         | 
         | "At first, you might want to make moves or swaps over large
         | distances or _you might want to accept some percent of moves
         | that don 't improve the objective_, but as time goes on, you
         | want to make smaller moves and _be less likely to select moves
         | that don 't improve the objective_. This is the "annealing"
         | part of simulated annealing in the context of FPGA placement."
         | 
          | I think I might have made the writing confusing because I
          | mixed the original definition of the annealing approach
          | (accepting moves that don't improve the objective) with the
          | use of "annealing" for other things, like action parameters
          | (e.g., the swap distance between two nodes). Something I
          | should edit to clarify.
         | 
          | Note that, yes, the thing I implemented doesn't do any
          | annealing, but rather just picks actions that improve the
          | objective. I am working on some extensions to add real
          | annealing, but that turned out to involve a lot of in-depth
          | technical work that is not obvious.
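          | 
          | For the curious, a minimal Python sketch (not my actual code;
          | the names are illustrative) of one step where both the swap
          | radius and the acceptance probability are annealed with
          | temperature:
          | 
          |     import math
          |     import random
          | 
          |     def annealed_swap_step(placement, cost, temperature,
          |                            t_max, max_dist):
          |         # placement: dict mapping cell -> (x, y) location.
          |         # The swap radius is an "annealed" action parameter:
          |         # it shrinks as the temperature drops.
          |         radius = max(1, round(max_dist * temperature / t_max))
          |         a = random.choice(list(placement))
          |         ax, ay = placement[a]
          |         near = [c for c, (x, y) in placement.items()
          |                 if c != a and abs(x - ax) + abs(y - ay) <= radius]
          |         if not near:
          |             return
          |         b = random.choice(near)
          |         old = cost(placement)
          |         placement[a], placement[b] = placement[b], placement[a]
          |         delta = cost(placement) - old
          |         # Metropolis acceptance: keep regressions with
          |         # probability exp(-delta/T); otherwise undo the swap.
          |         if (delta > 0 and
          |                 random.random() >= math.exp(-delta / temperature)):
          |             placement[a], placement[b] = placement[b], placement[a]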
        
         | cerved wrote:
          | Yes, that's the annealing part. Otherwise it's just local
          | search.
        
       | cevans01 wrote:
        | To the author: the source link
        | https://github.com/stefanpie/sa-placer appears to be private.
        
         | stefanpie wrote:
         | Should be fixed now, thank you for the note!
        
       | iamflimflam1 wrote:
       | My final year undergrad project (1992?) used an FPGA. The router
       | for that used simulated annealing.
        
       | akivabamberger wrote:
        | Wondering how this compares to nextpnr
        | (https://github.com/YosysHQ/nextpnr)?
        
         | stefanpie wrote:
          | I believe nextpnr has integrated both a simulated annealing
          | placer and an analytical placer, based on this publication
          | from 2019: https://arxiv.org/abs/1903.10407 (see here for the
          | work the analytical placer builds on:
          | https://ieeexplore.ieee.org/abstract/document/6339278). I
          | think they are also working on electrostatic placement, but
          | I'm not sure what the current status of that is.
        
           | aseipp wrote:
            | There has been initial electrostatic placement support for
            | iCE40 and ECP5 in upstream nextpnr for a few months now,
            | but it hasn't made it into a stable release just yet.
        
       | amelius wrote:
       | Do any methods exist that utilize deep learning to drive the
       | optimization?
        
         | aseipp wrote:
          | Yes, see DREAMPlace -- "DREAMPlace: Deep Learning Toolkit-
          | Enabled GPU Acceleration for Modern VLSI Placement".[1] The
          | technique reformulates VLSI placement as a non-linear
          | optimization problem, which is (broadly) what ML frameworks
          | do: optimize approximations to high-dimensional non-linear
          | functions. So it's not like shoving the netlist into an LLM
          | or an existing network or anything.
         | 
          | Note that DREAMPlace is a _global_ placer; it also comes with
          | a detail placer, but global placement is what it targets. I
          | don't know of an appropriate research analogue for the routing
          | phase that follows placement, but maybe someone else does.
         | 
         | [1] https://github.com/limbo018/DREAMPlace
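          | 
          | Roughly, the trick is to swap the non-differentiable half-
          | perimeter wirelength for a smooth surrogate (e.g. log-sum-exp)
          | and let the framework's autograd and optimizers minimize it. A
          | minimal PyTorch sketch of that idea (illustrative, not
          | DREAMPlace's actual code):
          | 
          |     import torch
          | 
          |     def lse_wirelength(x, y, nets, gamma=1.0):
          |         # Smooth log-sum-exp surrogate for half-perimeter
          |         # wirelength: differentiable, so gradients can push
          |         # cells around.
          |         total = x.new_zeros(())
          |         for net in nets:
          |             for c in (x[net], y[net]):
          |                 total = total + gamma * (
          |                     torch.logsumexp(c / gamma, 0)
          |                     + torch.logsumexp(-c / gamma, 0))
          |         return total
          | 
          |     x = torch.randn(6, requires_grad=True)  # cell x coords
          |     y = torch.randn(6, requires_grad=True)  # cell y coords
          |     nets = [[0, 1, 2], [2, 3], [1, 4, 5]]   # toy netlist
          |     opt = torch.optim.Adam([x, y], lr=0.1)
          |     for _ in range(200):                    # gradient descent
          |         opt.zero_grad()
          |         lse_wirelength(x, y, nets).backward()
          |         opt.step()
          | 
          | A real global placer also needs a density/overlap term (the
          | one in DREAMPlace is electrostatics-based), which this sketch
          | omits.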
        
           | stefanpie wrote:
            | I also want to note that some people are looking at using
            | ML/DL to generate placement solutions directly. DREAMPlace
            | is still a key recent work in this area, but at its core it
            | uses analytical placement, albeit cleverly implemented with
            | deep learning libraries and features.
           | 
           | For example, the VTR people are looking at RL to pick what
           | actions to take during FPGA placement for their simulated
           | annealing placer (see:
           | https://ieeexplore.ieee.org/document/9415585).
           | 
            | I also know this paper/poster was indexed recently on
            | OpenReview, but I'm still waiting for some more
            | implementation details to look at:
            | https://openreview.net/forum?id=6GR8KqWCWf
           | 
            | As for FPGA routing, I don't think anyone has tackled it
            | with ML/DL directly, but I do know there is some talk about
            | using ML/DL models within current routing approaches to
            | replace search heuristics (think replacing or augmenting
            | something like A*) or to do routability prediction. There
            | are probably many ways to use RL in routing, since current
            | algorithms have many places where heuristic decisions could
            | be made intelligently.
           | 
            | Edit: I also want to note that there is a ton of work that
            | uses ML/DL to tune the "hyperparameters" of EDA tools such
            | as placers and routers. Think ML/DL for black-box
            | non-linear optimization.
        
           | amelius wrote:
           | > The technique in particular rather reformulates VLSI
           | placement in terms of a non-linear optimization problem.
           | Which is how ML frameworks (broadly) work, optimizing
           | approximations to high-dimensional non-linear functions. So
           | it's not like, shoving the netlist it into an LLM or an
           | existing network or anything.
           | 
           | Do you mean it uses gradient descent to find the optimum?
        
       | ttul wrote:
       | Simulated annealing has been useful for decades. My undergrad
       | VLSI project was to lay out a 32-bit CRC chip and test it with
       | SPICE (an analog simulation tool). To get the capacitance down,
        | you need short wires. But a CRC-32 has many, many
        | interdependencies. Laying one out by hand would take a very,
        | very long time.
       | 
        | So my partner wrote the Verilog code and I wrote a C++ annealer
        | that attached virtual springs to each gate, iteratively moving
        | gates closer together if they were far apart. At first, the
        | movements are dramatic, but over time the total wire length
        | converges to an asymptotic limit that is much better than the
        | starting point.
       | 
        | Once we had a gate layout implemented in an obscure EDM
        | language, we were able to bring it into SPICE, and damn right
        | it worked the first time! I think the professor was somewhat
        | mystified as to why we didn't just build a simple 4-bit adder
        | instead, but spending 50 hours on this project was a lot more
        | fun than doing a hand layout.
        
         | ttul wrote:
         | And here is the layout:
         | https://www.dropbox.com/scl/fi/ksxru0lmnp0oha356fdby/layout....
         | 
         | With apologies, it was a CRC-8. Still so many nodes and so many
         | wires...
        
         | Taniwha wrote:
         | Years ago I figured out how to get the best first approximation
         | ... borrow a gym for a day, make 1000s of metal plates with
         | hooks and a bar code, one for each gate, have a robot take
         | rubber bands and use the netlist to hook up the plates. Then
         | pick up the whole thing by the pads and give it a bit of a
         | shake, lay it all carefully back down and have the robot go
         | around and read the position of each bar code ....
        
       | dekhn wrote:
        | I used simulated annealing in my previous work doing molecular
        | dynamics -- it was great for not getting your protein trapped
        | in a local minimum, which gradient descent is prone to doing. I
        | got excellent results for very little work.
       | 
       | I asked my intern, who was knowledgeable in deep networks as well
       | as molecular stuff, "it looks like ML training mainly does
       | gradient descent, how can that work, don't you get stuck in local
       | minima?" and they said "loss functions in ML are generally
       | believed to be bowl-shaped" and I've been wondering how that
       | could be true.
       | 
        | It's interesting to read up on the real-world use of annealing
        | for steel -- it's remarkable how you can change steel's
        | properties through heat treatment. Want it really strong?
        | Quench it fast; that will lock it into an unstable structure
        | that's still strong. Quench it slowly and it will find a more
        | stable minimum and be more ductile.
        
         | pclmulqdq wrote:
          | Their explanation is relatively incomplete, I think. The best
          | explanation I can come up with for why gradient descent works
          | is that the optimization space is extremely high-dimensional,
          | which tends to force functions that would look weird in low
          | dimensions to be more well-behaved. Points taken randomly in
          | high-dimensional space also tend to be closer together.
         | 
          | These numerical benefits of dimensionality are, I believe,
          | why the "stack more layers" crowd has been so successful.
        
       ___________________________________________________________________
       (page generated 2024-01-02 23:00 UTC)