[HN Gopher] Mining JIT traces for missing optimizations with Z3
___________________________________________________________________
Mining JIT traces for missing optimizations with Z3
Author : matt_d
Score : 90 points
Date : 2024-07-20 17:56 UTC (1 day ago)
(HTM) web link (pypy.org)
(TXT) w3m dump (pypy.org)
| adsharma wrote:
| Why not do this at a higher level, on the Python source itself?
|
| I ask because Z3 has been used for type inference (Typpete)
| and for solving equations written in Python.
| ainoobler wrote:
| There is more runtime information in the traces and more
| opportunities for optimization.
| aantix wrote:
| Compile enough traces with the accompanying optimizations and you
| have solid training data for an LLM that can suggest
| optimizations based on JIT traces.
| ainoobler wrote:
| How would you verify semantic correctness of the optimizations?
| aantix wrote:
| I think I envisioned traces being extracted from a series of
| open source projects and their automated test suites.
|
| Run the test suite, identify optimizations. One by one, make
| the optimization change to the implementation as suggested by
| the LLM.
|
| Instrument the changed methods on the second test run and see
| if runtime performance has changed. Verify that the test
| still passes.
| ainoobler wrote:
| I meant how do you make sure the optimization suggested by
| the AI is actually valid. If you're using AI to modify
| bytecode for faster execution then you have to make sure
| the optimized and unoptimized code are semantically
| equivalent. Neural networks can't do logic so how would you
| know the suggestions were not bogus?
| weebull wrote:
| You've asked the right question. For those who think
| validation is as simple as "run it and see if it gets the
| right result": that's a good start, but instruction ordering
| can be critical around thread-aware data structures.
| Removing a fence or an atomic operation might give a big
| performance gain; the trouble is the structure may now go
| wrong 1% of the time.
| aantix wrote:
| A valid accompanying test would ensure this?
|
| You'd be extracting optimization candidates by running
| the test suite.
|
| You re-run the test suite after changes to ensure they
| still pass.
| ainoobler wrote:
| JIT optimizers operate at runtime; there are no test suites
| to verify before/after. It's happening live as the code is
| running, so if you use AI then you won't know whether the
| optimization is actually valid. This is why the article uses
| Z3 instead of neural networks: Z3 can validate semantic
| equivalence; neural networks can't.
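|
| Roughly the kind of check involved, sketched in z3py over
| 64-bit integers (a toy rewrite, not one from the post):
|
|     from z3 import BitVec, prove
|
|     x = BitVec('x', 64)
|     # valid rewrite: prints "proved"
|     prove(x * 2 == x << 1)
|     # bogus rewrite: prints a counterexample
|     prove(x * 2 == x | 1)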
| fwip wrote:
| Yes, but this Z3 analysis is not done at runtime. It's
| done offline, based on JIT traces. A neural network
| could, in principle, suggest optimizations in the same
| way, which an expert would then review for possible
| inclusion into the PyPy JIT.
| ainoobler wrote:
| You'd still have to write a proof of semantic equivalence
| before implementing the optimization, so I don't see what
| the neural network gains you here, unless it is actually
| supplying the proof of correctness along with the
| optimization.
| screcth wrote:
| The idea is that the LLM would provide "intuition" to
| guide the optimizer to find better optimizations, but a
| formal proof would be necessary to ensure that those
| optimizations are actually valid.
| fwip wrote:
| I might be incorrect, but I don't believe that most
| compiler optimizations have formal proofs written out
| before implementation. Does PyPy do this?
| derdi wrote:
| PyPy doesn't do this in general. The same Z3 model that
| is used to find these missing optimizations is also used
| to verify some integer optimizations.
|
| But the point is that as long as optimization rules are
| hand-written, a human has thought about them and
| convinced themselves (maybe incorrectly) that the rules
| are correct. If a machine generates them without a human
| in the loop, some other sort of correctness argument is
| needed. Hence the reasonable suggestion that they should
| be formally verified.
| fwip wrote:
| Ah, yes, I meant that the LLM could output suggestions,
| which a human would then think about, convince themselves
| of, and only then implement in PyPy.
| derdi wrote:
| Presumably the LLM would generate a lot of proposed rules
| for humans to wade through. Reviewing lots of proposed
| rewrites while catching all possible errors would be
| tedious and error-prone. We have computers to take care
| of this kind of work.
| cfbolztereick wrote:
| PyPy has formally verified the integer abstract domain
| using Z3, a quite important part of our JIT optimizer (I
| will write about that in the coming weeks).
|
| We also run a fuzzer regularly to find optimization bugs,
| using Z3 as a correctness check:
|
| https://pypy.org/posts/2022/12/jit-bug-finding-smt-fuzzing.h...
|
| The peephole optimizations aren't themselves formally
| verified completely yet. We've verified the very simplest
| rules, and some of the newer complicated ones, but not
| systematically all of them. I plan to work on fully and
| automatically verifying all integer optimizations in the
| next year or so. But we'll see, I'll need to find
| students and/or money.
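|
| As a simplified illustration (not PyPy's actual code, just a
| sketch) of what verifying one abstract-domain transfer
| function with Z3 looks like - soundness of a "known bits"
| style AND:
|
|     from z3 import BitVec, And, Implies, prove
|
|     W = 64
|     o1, u1 = BitVec('o1', W), BitVec('u1', W)
|     o2, u2 = BitVec('o2', W), BitVec('u2', W)
|     x, y = BitVec('x', W), BitVec('y', W)
|     # concrete v matches abstract (ones, unknowns)
|     # iff v & ~unknowns == ones
|     in1 = (x & ~u1) == o1
|     in2 = (y & ~u2) == o2
|     # example transfer function for an AND operation
|     res_ones = o1 & o2
|     res_known0 = ~(o1 | u1) | ~(o2 | u2)
|     res_unknowns = ~(res_ones | res_known0)
|     # soundness: concrete x & y matches the abstract result
|     prove(Implies(And(in1, in2),
|                   ((x & y) & ~res_unknowns) == res_ones))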
| lmeyerov wrote:
| Close!
|
| Generate the Z3 too - the need is to verify, not test. It
| can be a direct translation: for all inputs, is the
| optimized output equivalent? (Bootstrapping a compiler
| prototype via LLMs is nice, though.)
|
| One place LLMs get fun here is where the direct translation
| to Z3 times out, such as for bigger or more complicated
| programs; there the LLM can provide intuition for pushing
| the solver ahead.
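|
| A sketch of such a direct translation (simplified trace ops,
| z3py, 64-bit ints):
|
|     from z3 import BitVec, prove
|
|     i1, i2 = BitVec('i1', 64), BitVec('i2', 64)
|     # unoptimized trace:
|     #   i3 = int_add(i1, i2)
|     #   i4 = int_sub(i3, i2)
|     i3 = i1 + i2
|     i4 = i3 - i2
|     # candidate optimization: replace i4 with i1
|     prove(i4 == i1)   # "proved" for all inputs, incl. wraparound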
| SkiFire13 wrote:
| Tests can't ensure the correctness of an algorithm, only
| that it gives the correct output on a specific input.
| aantix wrote:
| Depends on the comprehensiveness of the test.
| SkiFire13 wrote:
| For anything practical, no test suite is going to be
| comprehensive enough, especially for something with
| effectively infinite possible inputs, like programs.
| aantix wrote:
| Is the scope a whole program or a specific algorithm?
| wolf550e wrote:
| Regehr et al. use Alive2, which uses Z3.
| Twirrim wrote:
| But.. But.. But.... This is HN. You must use AI / LLMs
| for everything! /s
| cfbolztereick wrote:
| All the minimized inefficiencies that are found by my script
| are already optimizations. They just happen to be rather
| specific patterns, so they need to be suitably generalized.
| There's another Regehr et al. paper about how to do that
| automatically: https://dl.acm.org/doi/10.1145/3649837
|
| (LLMs aren't involved; it's all based on Z3.)
|
| I don't plan on implementing something like this for now;
| I'd rather take the inefficiencies, manually extract
| optimizations from them, and implement those in PyPy's JIT.
| algo_trader wrote:
| You are not wrong.
|
| AlphaZero-type systems were unable to significantly improve
| over near-optimal solvers such as Z3.
| Zacharias030 wrote:
| Could you point me to some references to learn about this?
| puzzledobserver wrote:
| How does the PyPy JIT compare to the JIT in the upcoming CPython
| 3.13?
| i80and wrote:
| Very different technologies. The PyPy JIT is a tracing JIT:
| hot paths of execution are identified, and then that
| specific path is compiled and optimized. Same as LuaJIT.
|
| The CPython JIT uses a newer and less invasive technique
| called copy-and-patch[1]. It's a lot less powerful, but a
| lot easier to plug into an existing language implementation:
| known sequences of Python bytecode are mapped to templates
| of machine code.
|
| [1] https://en.wikipedia.org/wiki/Copy-and-patch
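| A very loose Python analogy of the template idea (the real
| thing stitches together pre-compiled machine-code fragments;
| the opcode names and helpers here are made up):
|
|     # template with two "holes": a variable name and a constant
|     def template_add_const(varname, const):
|         def stub(frame):
|             frame[varname] = frame[varname] + const
|         return stub
|
|     TEMPLATES = {
|         ('load_fast', 'load_const',
|          'binary_add', 'store_fast'): template_add_const,
|     }
|
|     def jit_compile(ops):
|         kinds = tuple(kind for kind, _ in ops)
|         # look up the prebuilt template, patch in the holes
|         return TEMPLATES[kinds](ops[0][1], ops[1][1])
|
|     ops = [('load_fast', 'x'), ('load_const', 5),
|            ('binary_add', None), ('store_fast', 'x')]
|     stub = jit_compile(ops)
|     frame = {'x': 37}
|     stub(frame)
|     print(frame['x'])   # 42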
___________________________________________________________________
(page generated 2024-07-21 23:01 UTC)