[HN Gopher] A Simulated Annealing FPGA Placer
___________________________________________________________________
A Simulated Annealing FPGA Placer
Author : stefanpie
Score : 60 points
Date : 2024-01-02 16:46 UTC (6 hours ago)
(HTM) web link (stefanabikaram.com)
(TXT) w3m dump (stefanabikaram.com)
| lindig wrote:
| My understanding of simulated annealing is that solutions that
| are not improvements are still accepted with some probability in
| early steps but that this probability decreases as "temperature"
| drops. Looking at your description (but not code) I did not see
| that aspect but it looked like you would only accept improvements
| of the cost function. Is this correct or where does your solution
| accept slight regressions with some probability, too?
| kwoff wrote:
| In the "Simulated Annealing" section:
|
| "At first, you might want to make moves or swaps over large
| distances or you might want to accept some percent of moves
| that don't improve the objective, but as time goes on ...
|
| However, as it turns out, you technically don't need this
| annealing part to make FPGA placement work. You can just
| randomly try different moves and accept or reject them based on
| whether they improve the objective function. This is what I did
| in my toy implementation of an FPGA placer just to keep it
| simple."
| lindig wrote:
| The argument for annealing in the original paper is that
| accepting regressions is essential to escape from local
| minima.
|
| https://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/Tempe...
|
| "Annealing, as implemented by the Metropolis procedure,
| differs from iterative improvement in that the procedure need
| not get stuck since transitions out of a local optimum are
| always possible at nonzero temperature. A second and more
| important feature is that a sort of adaptive divide-and-
| conquer occurs. Gross features of the eventual state of the
| system appear at higher temperatures; fine details develop at
| lower temperatures. This will be discussed with specific
| examples."
| _0ffh wrote:
| Yes, without a temperature and cooling schedule, how can it
| be annealing? It's in the name. It may sound harsh, but I'd
| call it an abuse of the term to do hillclimbing but call it
| annealing. It also seems lazy, since doing it right is an
| all but trivial addition to the code. Finding the best
| cooling schedule might require some experimentation though.
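|
| Something like the sketch below is all it takes -- the
| standard Metropolis acceptance rule plus a geometric cooling
| schedule. (Just an illustration with made-up constants, not
| code from the linked placer.)
|
|     import math
|     import random
|
|     def accept(delta_cost, temperature):
|         # Always accept improvements; accept regressions with
|         # probability exp(-delta/T), which shrinks as T drops.
|         if delta_cost <= 0:
|             return True
|         return random.random() < math.exp(-delta_cost / temperature)
|
|     temperature = 10.0  # initial temperature, problem-dependent
|     alpha = 0.95        # geometric cooling rate
|     for step in range(10000):
|         # ... propose a swap, compute delta_cost, keep the swap
|         # only if accept(delta_cost, temperature) says so ...
|         temperature *= alpha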
| WanderPanda wrote:
| It's strange that in a field as important as optimization we
| still think in terms of "escaping from local minima". Also (as
| a total outsider) the progress in general optimization
| algorithms/implementations appears to be very slow (I was
| shocked how old ipopt is). I was wondering if all the
| low-hanging inductive biases (for real-world problems) have
| already been exploited, or if we just have no good ways of
| expressing them? Maybe learning them from data in a fuzzy way
| might work?
| marcosdumay wrote:
| Unless you come up with some new take on the P ?= NP
| problem, there isn't much we can improve on generic
| optimization.
|
| There are all kinds of possibilities for specific
| problems, but if you want something generic, you have to
| traverse the possibility space and use its topology to
| get into an optimum. And if the topology is chaotic, you
| are out of luck, and if it's completely random, there's
| no hope.
| klyrs wrote:
| If you want to keep the pedants at bay, call this "simulated
| annealing at zero temperature." Or just call it a greedy
| algorithm.
| stefanpie wrote:
| Based on the other comments, they are correct that when doing
| annealing you usually want to accept some moves that don't
| improve the objective (i.e., regressions) in order to escape
| local minima early on.
|
| I abused the definition of annealing a lot in the post but I
| briefly touched on the idea:
|
| "At first, you might want to make moves or swaps over large
| distances or _you might want to accept some percent of moves
| that don't improve the objective_, but as time goes on, you
| want to make smaller moves and _be less likely to select moves
| that don't improve the objective_. This is the "annealing"
| part of simulated annealing in the context of FPGA placement."
|
| I think I might have made the writing confusing because I mixed
| the original definition of the annealing approach (of accepting
| moves that don't improve the objective) with the use of
| "annealing" for other things like action parameters (ex. swap
| distance between two nodes). Something I should edit to clarify
| better.
|
| Note that, yes, the thing I implemented doesn't do any
| annealing but rather just picks actions that improve the
| objective. I am working on some extensions to add real
| annealing, but that turned out to involve a lot more in-depth
| technical work than is obvious.
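|
| To make that concrete, below is roughly what the "real
| annealing" version of the main loop would look like, with both
| the acceptance test and the swap distance tied to the
| temperature. This is a self-contained toy sketch, not code
| from the repo; names and constants are made up, and a real
| placer would compute the cost delta incrementally instead of
| re-evaluating the whole wirelength every move.
|
|     import math
|     import random
|
|     def hpwl(placement, nets):
|         # Half-perimeter wirelength; placement maps cell -> (x, y).
|         total = 0
|         for net in nets:
|             xs = [placement[c][0] for c in net]
|             ys = [placement[c][1] for c in net]
|             total += (max(xs) - min(xs)) + (max(ys) - min(ys))
|         return total
|
|     def clamp(v, lo, hi):
|         return max(lo, min(v, hi))
|
|     def anneal(placement, nets, grid, t_start=5.0, t_end=0.05,
|                alpha=0.95, moves_per_t=200):
|         cells = list(placement)
|         cost = hpwl(placement, nets)
|         t = t_start
|         while t > t_end:
|             # Anneal the action parameter too: long-range moves
|             # while hot, local moves while cold.
|             max_dist = max(1, int(round(t)))
|             for _ in range(moves_per_t):
|                 a = random.choice(cells)
|                 ax, ay = placement[a]
|                 dx = random.randint(-max_dist, max_dist)
|                 dy = random.randint(-max_dist, max_dist)
|                 nx = clamp(ax + dx, 0, grid - 1)
|                 ny = clamp(ay + dy, 0, grid - 1)
|                 # Swap with whatever cell sits at (nx, ny), if any.
|                 b = next((c for c in cells
|                           if placement[c] == (nx, ny)), None)
|                 old_a, old_b = placement[a], placement.get(b)
|                 placement[a] = (nx, ny)
|                 if b is not None:
|                     placement[b] = old_a
|                 delta = hpwl(placement, nets) - cost
|                 ok = (delta <= 0 or
|                       random.random() < math.exp(-delta / t))
|                 if ok:
|                     cost += delta    # accept, even some regressions
|                 else:
|                     placement[a] = old_a   # reject: undo the move
|                     if b is not None:
|                         placement[b] = old_b
|             t *= alpha               # geometric cooling
|         return placement, cost
|
|     # Tiny example: 4 cells on a 4x4 grid, two nets.
|     demo = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
|     demo_nets = [["a", "b", "c"], ["b", "d"]]
|     print(anneal(demo, demo_nets, grid=4))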
| cerved wrote:
| Yes, that's the annealing part. Otherwise it's just local
| search.
| cevans01 wrote:
| To the author: the source link https://github.com/stefanpie/sa-
| placer appears to be private.
| stefanpie wrote:
| Should be fixed now, thank you for the note!
| iamflimflam1 wrote:
| My final year undergrad project (1992?) used an FPGA. The router
| for that used simulated annealing.
| akivabamberger wrote:
| Wondering how this compares to nextpnr
| (https://github.com/YosysHQ/nextpnr) ?
| stefanpie wrote:
| I believe that nextpnr has integrated both a simulated
| annealing placer and an analytical placer (based on this
| publication from 2019: https://arxiv.org/abs/1903.10407) (see
| here for the work the analytical placer is based on:
| https://ieeexplore.ieee.org/abstract/document/6339278). I think
| they are also working on electrostatic placement but I'm not
| sure what the current status of that is.
| aseipp wrote:
| There has been initial electrostatic placement support for
| iCE40 and ECP5 in upstream nextpnr for a few months now, but
| there hasn't been a stable release just yet.
| amelius wrote:
| Do any methods exist that utilize deep learning to drive the
| optimization?
| aseipp wrote:
| Yes, see DREAMPlace -- "DREAMPlace: Deep Learning Toolkit-
| Enabled GPU Acceleration for Modern VLSI Placement".[1] The
| technique reformulates VLSI placement as a non-linear
| optimization problem, which is how ML frameworks (broadly)
| work: optimizing approximations to high-dimensional non-linear
| functions. So it's not shoving the netlist into an LLM or an
| existing network or anything.
|
| Note that DREAMPlace is a _global_ placer; it also comes with a
| detail placer but global placement is what it is targeted at. I
| don't know of an appropriate research analogue for the routing
| phase of the problem that follows placing, but maybe someone
| else does.
|
| [1] https://github.com/limbo018/DREAMPlace
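|
| To give a feel for what "reformulates placement as non-linear
| optimization" means, here is an illustrative sketch (not
| DREAMPlace's actual code or API): replace the min/max in the
| wirelength objective with a smooth log-sum-exp approximation
| and run a gradient-based optimizer on the cell coordinates.
| Real analytical placers also add a density/overlap term; with
| wirelength alone the cells just collapse onto each other.
|
|     import numpy as np
|
|     def softmax(v):
|         e = np.exp(v - v.max())
|         return e / e.sum()
|
|     def lse(v, gamma):
|         # Smooth, differentiable stand-in for max(v).
|         return gamma * np.log(np.sum(np.exp(v / gamma)))
|
|     def smooth_wirelength(xy, nets, gamma=1.0):
|         # HPWL surrogate: (max - min) per net and dimension,
|         # approximated by lse(v) + lse(-v).
|         total = 0.0
|         for net in nets:
|             for d in (0, 1):
|                 v = xy[net, d]
|                 total += lse(v, gamma) + lse(-v, gamma)
|         return total
|
|     def grad(xy, nets, gamma=1.0):
|         # d lse(v)/dv = softmax(v/gamma);
|         # d lse(-v)/dv = -softmax(-v/gamma).
|         g = np.zeros_like(xy)
|         for net in nets:
|             for d in (0, 1):
|                 v = xy[net, d]
|                 g[net, d] += softmax(v / gamma) - softmax(-v / gamma)
|         return g
|
|     # Toy netlist: 4 cells, 2 nets, continuous coordinates.
|     rng = np.random.default_rng(0)
|     xy = rng.uniform(0.0, 10.0, size=(4, 2))
|     nets = [np.array([0, 1, 2]), np.array([1, 3])]
|     for _ in range(200):
|         xy -= 0.1 * grad(xy, nets)   # plain gradient descent
|     print(smooth_wirelength(xy, nets))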
| stefanpie wrote:
| I also want to note that some people are looking at using
| ML/DL to generate placement solutions directly. DREAMPlace is
| still a key recent work in this area, but at its core it uses
| analytical placement, albeit cleverly built on deep learning
| libraries and features.
|
| For example, the VTR people are looking at RL to pick what
| actions to take during FPGA placement for their simulated
| annealing placer (see:
| https://ieeexplore.ieee.org/document/9415585).
|
| I also know this paper/poster was indexed recently under
| OpenReview, but I'm still waiting for some more implementation
| details to look at:
| https://openreview.net/forum?id=6GR8KqWCWf
|
| For FPGA routing, I don't think anyone has touched ML/DL yet,
| but I do know there is some talk about using ML/DL models
| within current routing approaches to replace search heuristics
| (think replacing or augmenting something like A*) or to do
| routability prediction. There are probably many ways to use RL
| in routing, since current algorithms have many places where
| heuristic decisions could be made more intelligently.
|
| Edit: I also want to note that there are a ton of works that
| also use ML/DL to tune the "hyperparameters" of EDA tools
| such as placers and routers. Think ML/DL for black-box
| non-linear optimization.
| amelius wrote:
| > The technique reformulates VLSI placement as a non-linear
| optimization problem, which is how ML frameworks (broadly)
| work: optimizing approximations to high-dimensional non-linear
| functions. So it's not shoving the netlist into an LLM or an
| existing network or anything.
|
| Do you mean it uses gradient descent to find the optimum?
| ttul wrote:
| Simulated annealing has been useful for decades. My undergrad
| VLSI project was to lay out a 32-bit CRC chip and test it with
| SPICE (an analog simulation tool). To get the capacitance down,
| you need short wires. But a CRC-32 has many many
| interdependencies. Laying one out by hand would take a very very
| long time.
|
| So my partner wrote the Verilog code for the annealer and I wrote
| a C++ program that attached virtual springs to each gate,
| iteratively moving gates closer together if they were far apart.
| At first, the movements are dramatic, but over time, the total
| length of all gates converges onto an asymptotic limit that is
| much better than the starting point.
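|
| (For the curious: the spring idea is basically force-directed
| placement. A toy sketch of the update step, in Python rather
| than our original C++, with made-up data structures:)
|
|     def spring_step(pos, nets, k):
|         # Pull each gate a fraction k of the way toward the
|         # centroid of the gates it shares nets with.
|         new_pos = {}
|         for gate, (x, y) in pos.items():
|             nbrs = [g for net in nets if gate in net
|                     for g in net if g != gate]
|             if not nbrs:
|                 new_pos[gate] = (x, y)
|                 continue
|             cx = sum(pos[g][0] for g in nbrs) / len(nbrs)
|             cy = sum(pos[g][1] for g in nbrs) / len(nbrs)
|             new_pos[gate] = (x + k * (cx - x), y + k * (cy - y))
|         return new_pos
|
|     # Dramatic moves early, smaller ones later as k decays:
|     # for i in range(200): pos = spring_step(pos, nets, 0.5 * 0.98**i)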
|
| Once we had a gate layout implemented in an obscure EDM language,
| we were able to bring it into SPICE and damn right it worked the
| first time! I think the professor was somewhat mystified why we
| didn't just build a simple 4-bit adder instead, but spending 50
| hours on this project was a lot more fun than doing a hand
| layout.
| ttul wrote:
| And here is the layout:
| https://www.dropbox.com/scl/fi/ksxru0lmnp0oha356fdby/layout....
|
| With apologies, it was a CRC-8. Still so many nodes and so many
| wires...
| Taniwha wrote:
| Years ago I figured out how to get the best first approximation
| ... borrow a gym for a day, make 1000s of metal plates with
| hooks and a bar code, one for each gate, have a robot take
| rubber bands and use the netlist to hook up the plates. Then
| pick up the whole thing by the pads and give it a bit of a
| shake, lay it all carefully back down and have the robot go
| around and read the position of each bar code ....
| dekhn wrote:
| I used simulated annealing in my previous work doing molecular
| dynamics - it was great for not getting your protein trapped
| in a local minimum, which gradient descent is prone to doing.
| I got excellent results for very little work.
|
| I asked my intern, who was knowledgeable in deep networks as well
| as molecular stuff, "it looks like ML training mainly does
| gradient descent, how can that work, don't you get stuck in local
| minima?" and they said "loss functions in ML are generally
| believed to be bowl-shaped" and I've been wondering how that
| could be true.
|
| It's interesting to read up on the real-world use of annealing
| for steel - it's striking how much you can change steel
| properties through heat treatment. Want it really strong?
| Quench it fast; that will lock it into an unstable structure
| that's still strong. Cool it slowly instead and it will settle
| into a more stable minimum and be more ductile.
| pclmulqdq wrote:
| Their explanation is relatively incomplete, I think. The best
| explanation I can come up with for why gradient descent
| works is that the optimization space is extremely high-
| dimensional, which has the tendency to force functions that
| would look weird in low dimensions to be more well-behaved.
| Points taken randomly in high-dimensional space also tend to be
| closer together.
|
| The numerical benefits of dimensionality are my belief as to
| why the "stack more layers" crowd has been so successful.
___________________________________________________________________
(page generated 2024-01-02 23:00 UTC)