[HN Gopher] My first superoptimizer
___________________________________________________________________
My first superoptimizer
Author : azhenley
Score : 73 points
Date : 2023-05-29 20:47 UTC (2 hours ago)
(HTM) web link (austinhenley.com)
(TXT) w3m dump (austinhenley.com)
| petercooper wrote:
| The 1987 paper about superoptimizers is well worth the read, BTW.
| It's quite short. Even the first example on its own (writing the
| sign function without any conditionals) is a mind-bender but
| becomes straightforward once you see the trick used:
| https://courses.cs.washington.edu/courses/cse501/15sp/papers...
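The paper's sign function is hand-written machine code exploiting carry flags, but the spirit of the trick (sign without any conditional branch) can be sketched in Python:

```python
def sign(x: int) -> int:
    # Branchless sign function: Python booleans are the integers 0 and 1,
    # so two comparisons select -1, 0, or 1 with no conditional branch.
    return (x > 0) - (x < 0)
```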
| c7DJTLrn wrote:
| I've found GPT to be pretty good at cleaning up code. I've pasted
| in isolated blocks of code, asked it to tidy it up, and it often
| has nice suggestions.
|
| There's a feature coming to Copilot where you can select code in-
| editor and prompt on it directly.
| UncleMeat wrote:
| Superoptimization has absolutely nothing to do with what GPT is
| doing. It doesn't operate on source code, and it produces
| actually optimal code sequences (against some cost model) rather
| than doing some refactoring or tidying.
| klyrs wrote:
| I'd hate to suggest that the comment about cleaning up code
| with GPT was written by GPT and posted without that crucial
| editing step. But the number of people who are using that
| tool instead of their wetware is certainly growing.
| c7DJTLrn wrote:
| GPT's knowledge isn't up to date enough to know about the
| feature I mentioned. No, it wasn't written by GPT. It
| wasn't clear to me from the post that superoptimisation
| pertains to optimisation of machine code only.
| klyrs wrote:
| It doesn't pertain to the optimization of machine code
| only, but the distinguishing feature between optimization
| and superoptimization is that superoptimization produces
| an absolute shortest possible program given an
| instruction set. GPT doesn't do that. Copilot won't do
| that. Whatever cleanup feature you're describing is a
| non-sequitur here.
| ipnon wrote:
| JetBrains was hiring mathematicians working on homotopy type
| theory to do something similar, if I recall correctly. The idea
| is that HoTT can be used to generate programs equivalent to the
| one in your buffer and do something like superoptimization.
| Maybe I am hallucinating.
| lambdaxymox wrote:
| I don't know about superoptimization, but JetBrains does have a
| HoTT-based language and proof assistant called Arend for doing
| mathematics in.
| twoodfin wrote:
| I'm curious: Has there been any research on optimization in the
| opposite direction? Start with the most naively implemented
| correct function--a giant switch/nested if covering all possible
| inputs--and transform it piecewise in a correctness-preserving
| manner?
| boothby wrote:
| I've done this. In my experience it ends up stuck in local
| minima from which there are no obvious correctness-preserving
| improvements. Add Monte Carlo to escape valleys and this mode
| of exploration is still too timid to find anything close to
| optimal.
|
| When you find a truly optimal solution, you end up with the
| least satisfying proof of correctness: "Here's this inscrutable
| formula; I verified that it produces the right output for all
| possible inputs."
| UncleMeat wrote:
| Mostly yes. There is a lot of research on taking arbitrary
| programs and distilling constraints and invariants out of them
| or doing various forms of function simplification.
| retrac wrote:
| A similar problem comes up in digital design. Behaviour is
| often specified in terms of exhaustive truth tables, perhaps
| tables that were generated automatically, from a hardware
| description language. While such a table can be directly
| translated into a circuit, doing so in a naive fashion is
| potentially extremely non-optimal. Optimizing it is a hard
| problem. Everything has been tried, from exhaustive search
| (suitable only for small numbers of terms), to rule-based fuzzy
| backtracking searches, to, in recent years, deep learning. But
| at least when I was in school a couple decades ago, the best
| known algorithm was still a person playing with the terms on
| graph paper, rearranging them until patterns emerged:
| https://en.wikipedia.org/wiki/Karnaugh_map
|
| As I understand it, there are recent advances in SAT solvers
| giving practical algorithms for much larger numbers of terms,
| but I'm well behind the state of the art at this point.
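The exhaustive-search end of that spectrum fits in a few lines. This is an illustrative sketch (not anyone's production tool), workable only for a handful of inputs and gates: enumerate circuits by gate count and return the first one matching the truth table, which is therefore minimal.

```python
from itertools import product

def eval_circuit(gates, inputs):
    # Signals are the inputs followed by gate outputs; the last gate
    # is the circuit's output.
    sig = list(inputs)
    for op, a, b in gates:
        if op == "not":
            sig.append(1 - sig[a])
        elif op == "and":
            sig.append(sig[a] & sig[b])
        else:  # "or"
            sig.append(sig[a] | sig[b])
    return sig[-1]

def all_circuits(size, n_vars):
    # Enumerate every sequence of `size` NOT/AND/OR gates whose
    # operands are earlier signals.
    def rec(prefix, width):
        if len(prefix) == size:
            yield prefix
            return
        for op in ("not", "and", "or"):
            for a in range(width):
                for b in ((a,) if op == "not" else range(width)):
                    yield from rec(prefix + ((op, a, b),), width + 1)
    yield from rec((), n_vars)

def smallest_circuit(truth, n_vars, max_gates=3):
    # Try circuits in order of gate count, so the first hit is minimal.
    rows = list(product((0, 1), repeat=n_vars))
    for size in range(1, max_gates + 1):
        for gates in all_circuits(size, n_vars):
            if all(eval_circuit(gates, r) == t for r, t in zip(rows, truth)):
                return gates
    return None
```

For example, implication (truth table 1, 1, 0, 1 over inputs a, b) comes back as two gates: NOT a, then OR with b.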
| gxt wrote:
| There are surely ways to further constrain the set of programs
| to test, i.e., anything that outputs something outside the
| expected results should probably trigger an early return. I'd
| love for more work to be poured into this concept. There might
| be low-hanging fruit, and who knows, maybe an LLVM pass in the
| not-too-distant future.
| jakewins wrote:
| I think you would enjoy applying dynamic programming to this
| problem!
|
| I have a simple example, starting from a "generate all possible
| combinations" python function, here: https://tech.davis-
| hansson.com/p/price-wars.html
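The shape of that move (not the linked post's code, just the classic version of it) is turning a "generate all possible combinations" recursion into dynamic programming by memoizing on the subproblem key. Here, counting the ways to pay an amount with given coin denominations:

```python
from functools import lru_cache

def ways(amount, denominations=(1, 5, 10, 25)):
    # Count combinations of coins summing to `amount`. The recursion
    # branches on "use denominations[i] again" vs. "move past it";
    # memoizing on (remaining, i) collapses the exponential call tree
    # into a small table of subproblems.
    @lru_cache(maxsize=None)
    def go(remaining, i):
        if remaining == 0:
            return 1
        if remaining < 0 or i == len(denominations):
            return 0
        return go(remaining - denominations[i], i) + go(remaining, i + 1)
    return go(amount, 0)
```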
| jmholla wrote:
| Why did you use the term dynamic programming when referring to
| the memoization part? The term of art, dynamic programming, is
| at play here, but it refers to the initial design of breaking
| the problem up into recursive sub-problems.
| EliasWatson wrote:
| It would be interesting to modify this for optimizing BrainF
| programs. It has a very simple instruction set and there are lots
| of example programs available.
|
| To avoid generating useless things like "++--", you could have
| the optimizer generate instructions that are translated to BrainF
| operations. So instead of ">>>--", the optimizer would generate
| "Move +3; Add -2".
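That translation is essentially run-length encoding with cancellation. A sketch of the idea (illustrative names, not anyone's actual optimizer):

```python
from itertools import groupby

DELTA = {"+": +1, "-": -1, ">": +1, "<": -1}

def normalize(bf):
    # Collapse runs of +/- into ("Add", n) and runs of </> into
    # ("Move", n), dropping runs that cancel to zero, so candidates
    # like "++--" never survive into the search space.
    kind_of = lambda c: "Add" if c in "+-" else ("Move" if c in "<>" else c)
    ops = []
    for kind, run in groupby(bf, key=kind_of):
        if kind in ("Add", "Move"):
            n = sum(DELTA[c] for c in run)
            if n:
                ops.append((kind, n))
        else:
            ops.extend((c, None) for c in run)
    return ops
```

So `normalize(">>>--")` gives `[("Move", 3), ("Add", -2)]`, and `normalize("++--")` gives `[]`.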
| shoo wrote:
| there might not be that many opportunities to optimize
| brainfuck programs as the instruction set of the brainfuck
| virtual machine is so simple. arithmetic is unary, and staying
| within the brainfuck virtual machine doesn't provide a way to
| do any better than unary arithmetic.
|
| on the other hand, there are many opportunities for optimization
| when compiling brainfuck to an actual real world instruction
| set
|
| e.g. suppose in your program you want to load the integer 36
| into the current cell
|
| a naive way to do this is to increment 36 times, e.g.
| `++++++++++++++++++++++++++++++++++++`.
|
| if you can't guarantee that the initial value of the current
| cell is zero, then you could zero it first:
| `[-]++++++++++++++++++++++++++++++++++++`
|
| if we're optimizing for code size, not instruction count, we
| could express this as a more complex, compact program:
| `[-]>[-]++++++[-<++++++>]<` -- assuming it is OK to use the
| cell one to the right of the pointer as working storage. this
| will run slower than the less compact naive program as it
| executes more instructions.
|
| if compiling to some reasonable instruction set, there's likely
| a single non-BF instruction that lets you load the constant 36
| into a register.
|
| extra credit: implementing the optimising brainfuck-to-
| something compiler in brainfuck itself
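A throwaway interpreter (a sketch, not code from the thread) is enough to check the 6 x 6 trick above: the compact program really leaves 36 in the current cell.

```python
def run_bf(prog, tape_len=16):
    # Minimal brainfuck interpreter (no I/O): returns the tape and
    # the final pointer position.
    stack, match = [], {}
    for i, c in enumerate(prog):       # precompute matching brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i
    tape, ptr, pc = [0] * tape_len, 0, 0
    while pc < len(prog):
        c = prog[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "[" and tape[ptr] == 0:
            pc = match[pc]
        elif c == "]" and tape[ptr] != 0:
            pc = match[pc]
        pc += 1
    return tape, ptr

tape, ptr = run_bf("[-]>[-]++++++[-<++++++>]<")
```

After running, `tape[ptr]` is 36 and the working cell to the right is back to zero.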
| Xeoncross wrote:
| Have superoptimizers been run against math formulas? Seems like
| the quickest way to find holes in modern encryption.
| mungoman2 wrote:
| Happy to see this! Superoptimizers are one of my all time
| favorite topics.
|
| My favorite trick when cutting down the search space of possible
| programs is to generate DAGs directly instead of an instruction
| sequence. This way, it's easier to write rules that avoid
| nonsensical programs and to some extent avoid generating multiple
| equivalent programs.
|
| For example to avoid programs that overwrite calculations you
| make sure that each node in the generated DAG has something that
| depends on it.
|
| In a suitable environment the DAG can then be executed directly
| without lowering to a sequence of instructions.
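The "everything must have a dependent" rule amounts to a liveness check on candidates. A sketch with illustrative names (not mungoman2's code): nodes are (op, operand_indices) pairs whose operands index the inputs or earlier nodes, and the last node is the DAG's output.

```python
def has_dead_node(nodes, n_inputs):
    # nodes: sequence of (op, operand_indices); the index space is the
    # n_inputs program inputs followed by the nodes themselves, and the
    # last node is the DAG's output. A candidate is rejected when any
    # non-output node feeds nothing, since it computes a dead value.
    used = set()
    for _, operands in nodes:
        used.update(operands)
    return any(n_inputs + i not in used for i in range(len(nodes) - 1))
```

With inputs a=0, b=1, the candidate `[("add", (0, 1)), ("mul", (2, 0))]` (computing (a+b)*a) passes, while `[("add", (0, 1)), ("mul", (0, 1))]` leaves the add unused and is pruned.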
| shoo wrote:
| This sounds a little bit like representing boolean functions as
| binary decision diagrams -- although for a superoptimizer
| presumably each node in the DAG would correspond to an
| instruction from the target instruction set, which might do
| something rather complicated, not just a primitive logical
| operation between a few bits.
|
| https://en.wikipedia.org/wiki/Binary_decision_diagram
|
| (i first read about BDDs in Knuth's, "The Art of Computer
| Programming", Volume 4, Fascicle 1, then spent an enjoyable few
| weeks going down a rabbit hole of building a boolean function
| to decide if an input 2d grid of bits was a maze or not, then
| uniformly sampling mazes from it...)
| [deleted]
| manasij7479 wrote:
| The pruning part can do a lot of heavy lifting to make it a
| practical tool.
|
| Related: I work on Souper (https://github.com/google/souper).
|
| Feel free to reach out if anyone has questions!
| whiterock wrote:
| This reminds me of an undergrad paper I once wrote in which I
| showed
| that binary functions in n variables need asymptotically not more
| (and seldom less) than 2^n/n boolean gates.
| viraptor wrote:
| If anyone's interested in non-toy usage, there's the souper
| project as well https://github.com/google/souper
|
| It's a shame it's a bit hard to actually build these days. Its
| target is people working on compilers, not end users. But there
| are some odd cases where I'd really like to use it myself, like
| trying to get speedup in llama.cpp.
| fooker wrote:
| That's an interesting idea.
|
| Is the heavy lifting done by turning llama.cpp (and similar
| targets) into standalone C++ source files without dependencies?
| harerazer wrote:
| This is essentially calculating the Kolmogorov complexity of
| whatever the final state of memory is with respect to the
| constrained assembly (and giving the associated program). Since
| any program in the constrained assembly always halts, this is
| possible via brute force.
|
| It also doesn't seem particularly interesting because it
| doesn't allow the programs to take input. Obviously that makes
| things much more difficult with respect to proving program
| equivalence.
| manasij7479 wrote:
| Inputs and other side effects are not too tricky to handle.
|
| In most cases, you just slice the program to isolate pure
| computation, and just optimize that.
|
| Most traditional compiler optimizations stick to that as well,
| the exceptions to this rule are carefully engineered.
| worldsayshi wrote:
| Seems like the superoptimizer search graph would be a very good
| subject for machine learning algorithms? Are there any good AI
| superoptimizers out there?
| divan wrote:
| How about "cognitive superoptimizer" - find the optimal version
| of the code for human comprehension?
|
| First try: generate every permutation, feed each to some stable
| code-oriented LLM, ask it to explain the code, and measure the
| number of words.
___________________________________________________________________
(page generated 2023-05-29 23:00 UTC)