[HN Gopher] Diffusion on syntax trees for program synthesis
___________________________________________________________________
Diffusion on syntax trees for program synthesis
Author : pouwerkerk
Score : 377 points
Date : 2024-06-04 00:52 UTC (22 hours ago)
(HTM) web link (tree-diffusion.github.io)
(TXT) w3m dump (tree-diffusion.github.io)
| sakras wrote:
| This is very cool! My first thought is: can this be applied to
| converting raster graphics to vector graphics (eg PNG to SVG)?
| Seems like a very similar problem, though probably much more
| computationally expensive.
| szvsw wrote:
| > We apply our approach to inverse graphics tasks, where our
| model learns to convert images into programs that produce those
| images.
|
| I would argue that at least on a philosophical level, this is,
| definitionally, the process of converting raster graphics to
| vector graphics, as long as you buy the premise that the
| difference between the two is simply that vector gfx is a
| programmatic/imperative representation of image generation,
| while raster is a data structure/declarative representation of
| images.
|
| In other words, simply put, raster images are just arrays,
| vector images are sequences of instructions. Raster images are
| naturally much closer to the "raw" data needed by the output
| mechanism, while vector images require a much more complex
| interpreter.
| adrianmonk wrote:
| Or, raster and vector images are philosophically _the same
| thing_. The only difference is that vector has more
| operations than raster. Raster just has "draw unit square at
| integer coordinates".
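|
| As a toy sketch of that framing (the raster_to_svg helper is
| hypothetical, purely for illustration): every pixel becomes one
| "draw unit square at integer coordinates" instruction.
|
|     # Toy sketch: a raster as a one-operation vector program,
|     # where each pixel is a unit <rect> at integer coordinates.
|     def raster_to_svg(pixels):
|         """pixels: nested list of hex colors, e.g. [["#f00"]]."""
|         h, w = len(pixels), len(pixels[0])
|         rects = [
|             f'<rect x="{x}" y="{y}" width="1" height="1" '
|             f'fill="{pixels[y][x]}"/>'
|             for y in range(h) for x in range(w)
|         ]
|         return (f'<svg xmlns="http://www.w3.org/2000/svg" '
|                 f'viewBox="0 0 {w} {h}">' + "".join(rects) + "</svg>")
|
|     print(raster_to_svg([["#f00", "#0f0"], ["#00f", "#fff"]]))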
| szvsw wrote:
| Yep, I agree with this - tried to hint at that when I said
| "vector images require a much more complex interpreter".
| DougBTX wrote:
| On the other hand, A Pixel Is Not A Little Square[0] would
| disagree: a raster is a grid of samples of a continuous
| function.
|
| [0] http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf
| code_biologist wrote:
| Though I agree with the point that paper makes (it makes a
| good case that the little-square mental model of a pixel is
| mostly inappropriate), it does seem focused on imaging and
| does not mention the places where the little-square mental
| model is appropriate.
|
| Pictures of cats, subpixel rendered text, or company logo
| SVGs as displayed on a web page are point samples and not
| little squares.
|
| User interfaces are a good example of things that often really
| are little squares. Calling HN's beige background a point-
| sampled discrete representation of an underlying continuous
| function seems pretty tortured -- to me it's just a bunch of
| beige little squares.
| andybak wrote:
| > vector gfx is a programmatic/imperative representation of
| image generation, while raster is a data
| structure/declarative representation of images.
|
| This seems a bit off to me.
|
| Aside from oddities such as Postscript, most vector formats
| are canonical examples of declarative code. The distinction is
| more about what is being represented than how it is
| represented.
| pmayrgundter wrote:
| It's funny: this kind of subtree mutation was looked at pretty
| deeply by Koza and Adami in the 90s under the rubric of Genetic
| Algorithms, though with a slightly different optimization
| function.
|
| There is one reference in the paper, from 2000, to GAs for fast
| generation of program trees, but that's missing the main show.
|
| Hope they're reading this and dig into those guys' work.
| 29athrowaway wrote:
| You can also say backpropagation is the chain rule from
| centuries ago.
| elijahbenizzy wrote:
| Backpropagation is just an application of the chain rule --
| cool that we all learned it in high school!
| telotortium wrote:
| It is a computationally clever application of the chain rule
| to minimize the amount of computation needed to compute
| gradients for all parameters in the network.
| om8 wrote:
| > to minimize the amount of computation
|
| IMO backprop is the most trivial implementation of
| differentiation in neural networks. Do you know an easier way
| to compute gradients, even one with larger overhead? If so,
| please share it.
| QuadmasterXLII wrote:
| My first forays into making neural networks used replacement
| rules to modify an expression tree until all the "D" operators
| went away, but that takes time exponential in network depth if
| you aren't careful. Finite differences is linear in the number
| of parameters, as is differentiation by dual numbers.
| eli_gottlieb wrote:
| Backprop is the application of dynamic programming to the
| chain rule for total derivatives, which sounds trivial
| only in retrospect.
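|
| A compact illustration of that dynamic-programming view (hand-
| rolled numpy on an assumed two-layer tanh network): the
| gradient flowing into each layer is computed once and reused
| for every parameter below it, instead of re-deriving the full
| chain-rule product per parameter.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     x = rng.normal(size=(4, 3))            # batch of 4 inputs
|     W1 = rng.normal(size=(3, 5))
|     W2 = rng.normal(size=(5, 2))
|     y = rng.normal(size=(4, 2))
|
|     h = np.tanh(x @ W1)                    # forward pass, cache h
|     pred = h @ W2
|     loss = ((pred - y) ** 2).mean()
|
|     d_pred = 2 * (pred - y) / pred.size    # dL/dpred, computed once
|     dW2 = h.T @ d_pred                     # reused here...
|     d_h = d_pred @ W2.T                    # ...and propagated back once
|     dW1 = x.T @ (d_h * (1 - h ** 2))       # tanh'(z) = 1 - tanh(z)^2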
| Jensson wrote:
| You can do forward propagation. Humans typically find forward
| easier than backward.
| tripzilch wrote:
| since you asked ... how about Monte Carlo with Gibbs
| sampling?
| pmayrgundter wrote:
| I totally didn't realize this until these comments. Neat!
|
| I went digging in Wikipedia... the Backpropagation article was
| created in 2005, and yet its association with/derivation from
| the chain rule wasn't mentioned until 2014, through a borrow
| from the German article:
|
| https://en.wikipedia.org/w/index.php?title=Backpropagation&o.
| ..
| tithe wrote:
| Are these the references?
|
| - https://web.archive.org/web/20021224053225/http://smi-
| web.st...
|
| - https://www.genetic-programming.com/jkpdf/tr1314.pdf
| verdverm wrote:
| Some more recent alternatives to Koza's GP use some very
| different search mechanisms. FFX & PGE are both very fast.
|
| https://seminars.math.binghamton.edu/ComboSem/worm-chiu.pge_...
|
| https://arxiv.org/pdf/2209.09675
|
| I authored PGE and have thought that RL, and more recently
| diffusion techniques, might help these algos. All of the algos
| need better ways to guide the search, or help it get unstuck
| from local optima, which happens surprisingly fast. Most work
| in GP / EVO is about avoiding premature convergence.
| teruakohatu wrote:
| These kinds of Genetic Algorithms are still being researched in
| academia. I attended a seminar a couple of years ago on the
| subject. It's still a total dead end imho.
| Chabsff wrote:
| I used to (early 00s) be super big into GA and GP until a
| professor of mine at the time described the whole class of
| algorithms as "Marginally better than brute forcing". That
| really resonated with my experience, and was just too spot-on
| to ignore.
| bionhoward wrote:
| GP finds good working trees for complex real-world problems in
| a single night on a consumer GPU -- trees you would definitely
| need to brute force for (millions of) years to find. Quite a
| margin!
| sandos wrote:
| Wow, what a flash from the past! Was playing around a LOT with
| GP around that time, and the name Koza is certainly familiar. I
| even think I did some semi-similar things, as opposed to my
| normal approach, which was simplistic but inefficient in that
| it generated lots of invalid code.
| vidarh wrote:
| Genetic _Programming_ [1], specifically. I have both his two
| bricks from '92 and '94 (Genetic Programming: On the
| Programming of Computers by Means of Natural Selection, and
| Genetic Programming II : Automatic Discovery of Reusable
| Programs). I've not read his two later ones.
|
| The big problems they seemed to get stuck on were partially
| doing it _fast enough_, and partially ending up with a result
| that was comprehensible. The latter in particular seems to be
| far better with LLMs. You tended to end up spending a lot of
| time trying to reorganise and prune trees to get something you
| could decipher, and so the primary, and too limited, value
| became algorithms where you could justify investing a lot of
| resources into trying to find more optimal versions of very
| small/compact ones. But the challenge there is that there are
| often _so many_ far lower-hanging fruits in most code bases
| that few people get to the point where it's worth trying.
|
| I still love the idea at a conceptual level...
|
| [1] https://www.genetic-programming.com/johnkoza.html
| pmayrgundter wrote:
| In my last comment I suggested the authors may not be familiar
| with Koza+Adami, but I didn't realize the corresponding author
| is Stuart Russell, co-author (with Peter Norvig) of "Artificial
| Intelligence: A Modern Approach" -- "The authoritative, most-
| used AI textbook, adopted by over 1500 schools," according to
| their site. https://aima.cs.berkeley.edu/
|
| Whoops!
| pamelafox wrote:
| I would love to see them try this with the Processing/P5Js
| libraries. That's what the ASTs reminded me of. It could
| potentially be used to help students trying to figure out how to
| fix their programs. I used AST-based hints for my ProcessingJS
| courses on Khan Academy, but I handwrote the AST patterns for
| those hints.
| dwlg00 wrote:
| I'm failing to see how this is novel. It looks like they're doing
| diffusion on a representation system for 2D graphics, which is
| very different from an actual program (to be fair, they do
| address this limitation).
| revalo wrote:
| Yeah, this is true! These are more like expressions rather than
| programs. We were mostly following the language used by
| previous work, https://arxiv.org/abs/1906.04604
| hobofan wrote:
| Couldn't this be used to do HTML generation from designs?
| Especially when combined with multiple viewport sizes at the
| same time, generating a fluid HTML layout would be pretty
| awesome.
| dinobones wrote:
| I don't understand the "magic" here.
|
| In a traditional approach, you would generate random images,
| calculate some distance metric, then use some optimization method
| like simulated annealing to minimize the distance.
|
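| Roughly, that traditional loop might look like this sketch,
| where render() stands in for a hypothetical rasterizer (not
| anything from the paper) and the distance is plain pixel-wise
| error:
|
|     import math, random
|
|     def pixel_distance(a, b):
|         return sum((pa - pb) ** 2 for pa, pb in zip(a, b))
|
|     def anneal(target, render, params, steps=10_000, temp=1.0):
|         dist = pixel_distance(render(params), target)
|         for i in range(steps):
|             t = temp * (1 - i / steps)          # linear cooling
|             cand = [p + random.gauss(0, 0.1) for p in params]
|             d = pixel_distance(render(cand), target)
|             # accept improvements; accept worse moves with
|             # Boltzmann probability exp(-delta / t)
|             if d < dist or random.random() < math.exp(
|                     (dist - d) / max(t, 1e-9)):
|                 params, dist = cand, d
|         return params
|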
| I get that the difference between the image representations is
| being optimized here, but how is it possible that changing
| tokens in a program is differentiable?
| revalo wrote:
| Changing tokens in a program is not differentiable. For me, the
| key idea is that you can train a neural model to suggest edits
| to programs by randomly mutating nodes. And when you run this
| neural model, you get to make edits that are syntactically
| correct (i.e., a number will only replace a number etc.)
| according to a context-free grammar.
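|
| A toy sketch of what grammar-constrained mutation can look like
| (illustrative only, not the paper's code): a node is only ever
| replaced by a fresh expansion of its own nonterminal, so every
| edit stays syntactically valid.
|
|     import random
|
|     GRAMMAR = {
|         "expr": [["(", "expr", "+", "expr", ")"], ["num"]],
|         "num":  [["1"], ["2"], ["3"]],
|     }
|
|     def expand(symbol, depth=0):
|         if symbol not in GRAMMAR:                # terminal
|             return symbol
|         rules = GRAMMAR[symbol]
|         rule = rules[-1] if depth > 3 else random.choice(rules)
|         return [symbol] + [expand(s, depth + 1) for s in rule]
|
|     def walk(node):
|         yield node
|         if isinstance(node, list):
|             for child in node[1:]:
|                 yield from walk(child)
|
|     def mutate(tree):
|         # pick a random subtree, regrow it from its own nonterminal
|         victim = random.choice([n for n in walk(tree)
|                                 if isinstance(n, list)])
|         victim[:] = expand(victim[0])
|         return tree
|
|     program = expand("expr")
|     print(mutate(program))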
| montyanderson wrote:
| This is fascinating. I've been trying to envisage how the new
| language models will have a deeper or lower-level role in
| software production than simple code generation.
| passion__desire wrote:
| I think browsers could be the next iteration. Website backends
| will have premade flows, e.g. transfer money from my account
| to another account, etc. Through fluid UIs, the website will
| collect the info needed and the necessary approvals before the
| flow is submitted -- AI-based browser DOM manipulation.
| artninja1988 wrote:
| Surprised to see Stuart Russell's name on this, as I thought he
| was fully consumed by the doomsday cult. Although he's last
| author, so he's probably only on it because he's the head of
| the lab.
| samatman wrote:
| A lot of doomers work on AI. While frowning, and shaking their
| heads very gravely, so you know they don't approve.
| optimalsolver wrote:
| If we don't get to AGI first, the bad guys will.
| sgt101 wrote:
| hang on, what if...
|
| we're the bad guys?
| ngruhn wrote:
| I haven't heard anyone make a sane case against the doomsday
| argument. Only attacks.
| ImHereToVote wrote:
| You can't be scared while you laugh at someone. So laughing
| is a good thing to do to stave off fear.
| robxorb wrote:
| What's with the last author/first author thing in science
| papers?
|
| I've read several times that the author listed last is usually
| the most significant contributor, and the first author the
| least significant, due to some kind of tradition around modesty
| plus favourably introducing new names. (Which then of course
| doesn't work, if everyone knows it's happening...)
|
| Here, you've interpreted it as the reverse, and by that I mean
| in the sensible way - did you not know about the tradition, or
| are you wrong? And how can we know either way for sure?
| optimalsolver wrote:
| >the author listed last is usually the most significant
| contributor
|
| Where did you read that?
|
| That's definitely not the case in machine learning.
| robxorb wrote:
| I've read it in several places over the years, and just had
| a search now to find a reference to cite. Here's a PubMed
| paper on the subject:
|
| https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3010799/
|
| (And note how points 1 through 4 all quite conflict with
| each other!)
| pama wrote:
| If it helps, think of the first author as the lead
| engineer or the CEO, and the last author as the board or
| the VC. In some areas of science (or some teams) the last
| author is closer to a CEO, in others closer to a VC (they
| almost always have the powers of the board). This picture
| does not contradict the guideline in the reference you
| shared. Typically, most of the work and writing of the
| paper is done by the first author, though sometimes, for
| example when a student gives up, only most of the writing
| of the paper. The main ideas may be from the first author
| or from the last author, and rarely in between, but the
| sorting typically goes by amount of labor/contributions
| on the task. In some narrow subfields, including most
| math, sorting is alphabetical or random.
| robxorb wrote:
| That does help, thanks! Very intuitive analogy. Maybe
| this kind of organisational structure is a kind of
| natural archetype people gravitate towards when coming
| together to break new ground.
| fabian2k wrote:
| Conventions vary by field, but within a specific field
| they're usually pretty consistent. In natural sciences
| (except large physics papers) the convention is that the
| first author is the one doing most of the practical work. The
| last author is the PI (principal investigator) of the group
| who had a hand in designing the experiments and oversaw the
| research. Now, the latter can mean anything from barely doing
| any work on the paper to being deeply involved in the
| research.
|
| If you're reading papers most of the time the last author is
| more meaningful to you because they're the senior researcher,
| you know their research interests and what kind of papers
| they produce. The first authors are PhD students and
| PostDocs, they change much more often.
| robxorb wrote:
| Thank you, that makes my apparently half-formed prior
| understanding make a lot more sense. Seems the path forward
| is researching the greater context of the last author's
| work, and of course the common convention for each field.
|
| E.g., as a sibling commented, this convention may not be
| common in ML research. I wonder, then, if it may be something
| less common in emergent fields, where researchers are
| generally younger and less likely to follow older
| traditions.
| behnamoh wrote:
| How is it different from genetic algorithms that mutate the
| syntax tree until the target output is achieved?
| Karellen wrote:
| It's different in the same way that using an LLM instead of a
| traditional Markov chain is a different way of generating text.
| You're still predicting the next word one at a time, hoping to
| end up with plausible sentences/paragraphs, but the difference
| is in how you model the training dataset, and how you use that
| model to make each next choice in your live application.
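|
| The "traditional Markov chain" half of that analogy fits in a
| few lines (a toy bigram model, just for contrast):
|
|     import random
|     from collections import defaultdict
|
|     def train_bigrams(corpus):
|         counts = defaultdict(list)
|         words = corpus.split()
|         for a, b in zip(words, words[1:]):
|             counts[a].append(b)
|         return counts
|
|     def generate(counts, word, length=10):
|         out = [word]
|         for _ in range(length):
|             if word not in counts:
|                 break
|             word = random.choice(counts[word])   # next-word sample
|             out.append(word)
|         return " ".join(out)
|
|     model = train_bigrams(
|         "the cat sat on the mat and the dog sat on the rug")
|     print(generate(model, "the"))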
| gastonmorixe wrote:
| Could diffusion work at the binary level? I mean, could we train
| a diffusion model to generate the final binary of a program given
| a prompt? An AST may be better, but the binary, I feel, is at
| least extremely fast to test for whether it works or not. Though
| there may be a lot of drawbacks, if this is possible I can't wait
| until we ask "give me an app that does that" and the diffusion
| model starts generating all the bytes the app needs to do the
| job. Just wondering.
| dcreater wrote:
| That would be mind-blowing. Why go through all the lossy
| intermediary steps, especially through Python and JS, when you
| can generate machine code directly?
| treyd wrote:
| Interpretability, reasoning about the generated code,
| portability, etc. Probably also demands a larger model with a
| higher cost of training (and likely more training data) to
| target a more unstructured "language" like machine code.
| eternauta3k wrote:
| If your model is error-prone, having control structures,
| types and other compile-time checks is very valuable. It's
| harder to constrain arbitrary machine code to make something
| sensible.
| mejutoco wrote:
| Intuitively it makes sense, but I am not fully convinced
| about this. You could give it only a few registers and
| discard invalid operations for certain registers, or plain
| known-invalid operations.
| treyd wrote:
| But that doesn't stop it from generating code that
| segfaults.
| mejutoco wrote:
| That is the same problem as generating python that blows
| up, no? (assuming it is tested in a sandbox)
| fulafel wrote:
| Editing with feedback from program output, like in this work,
| could be more closely applicable if you first disassemble the
| binary and have it edit things in the assembly language AST,
| then reassemble. This would result in a higher likelihood of
| creating a valid program.
| zelphirkalt wrote:
| This sounds more similar to what people have done with Racket and
| hint generation for MOOCs. Not sure which university it is again,
| but I saw a presentation about how they generate hints for
| students by mutating the syntax tree and analyzing how they had
| to modify it, to get to a target solution. It was presented at
| some RacketCon, maybe a decade ago already. Perhaps that
| knowledge of how to do it can be combined with newer machine
| learning approaches?
|
| EDIT: I found the talk:
| https://invidious.baczek.me/watch?v=ijyFC36kVis
| aquarius0 wrote:
| The application to inverse graphics tasks reminds me of this
| paper which was released one week earlier:
| https://arxiv.org/abs/2405.15306
| flakiness wrote:
| The PDF is super slow to render, I guess because it contains
| commands from programmatically generated figures. It gives it a
| kind of academic-paper feel I miss these days.
|
| https://arxiv.org/pdf/2405.20519
| lwansbrough wrote:
| I'd like to see it with SDFs!
| grondilu wrote:
| Please elaborate. Are you thinking of approximating the
| distance function with an algebraic expression, with algebra
| itself being the "programming language"?
| Philpax wrote:
| You can represent arbitrary shapes through the composition of
| SDFs: https://iquilezles.org/articles/distfunctions/
|
| These can be treated as parameterised nodes in a tree,
| similar to what's happening here. It follows that there may
| be a possible adaptation of this to SDF composition, such
| that you can give it a shape and have it produce the SDF
| nodes + composition required to produce that shape.
|
| Most existing approaches to SDFs with NNs have the NN itself
| take on the role of the SDF (i.e. given a point, it predicts
| the distance), so there's a compelling opportunity here to
| build a system that can produce spatial representations from
| existing imagery without NN inference at render-time.
|
| I imagine adding the third dimension to the problem makes it
| much harder, though! I'll have to give the paper a read to
| determine how coupled to 2D their current approach is.
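|
| A small 2D sketch of SDF composition as a parameterised tree
| (toy code in the style of the iquilezles reference, not from
| the paper): leaves are primitive distance functions, interior
| nodes combine them, and a synthesis model would search over
| trees like the shape below.
|
|     import math
|
|     def circle(cx, cy, r):
|         return lambda x, y: math.hypot(x - cx, y - cy) - r
|
|     def box(cx, cy, hw, hh):
|         def sdf(x, y):
|             dx, dy = abs(x - cx) - hw, abs(y - cy) - hh
|             return (math.hypot(max(dx, 0.0), max(dy, 0.0))
|                     + min(max(dx, dy), 0.0))
|         return sdf
|
|     def union(a, b):    return lambda x, y: min(a(x, y), b(x, y))
|     def subtract(a, b): return lambda x, y: max(a(x, y), -b(x, y))
|
|     # a plus-sign with a small hole punched in the middle
|     shape = subtract(union(box(0, 0, 2, 0.5), box(0, 0, 0.5, 2)),
|                      circle(0, 0, 0.3))
|     print(shape(0.0, 1.0))   # negative: the point is inside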
| grondilu wrote:
| The application to graphics is interesting. It seems to me that
| current image generation models struggle with stylized pictures
| ("ligne claire" in comics, geometric shapes and so on). After
| all, these kinds of pictures should be easy to encode in vector
| formats (like SVG), which are basically programming languages.
| machiaweliczny wrote:
| I had an idea about doing something similar based on the
| DiffusER paper. One would need to model code edits as an
| algebra, similar to add char, replace char, or delete char, but
| with something like define func, remove func, define var, etc.
| I am too undereducated to do it myself but have a feeling it
| could work.
| machiaweliczny wrote:
| I will have to dig into this paper as it looks like exactly
| this. I wonder if they use closures to limit the space of valid
| operations.
|
| The only thing I didn't understand about making it happen was
| how to connect it well to a description of the desired program
| or edit.
|
| BTW, my idea was to train it by destructing programs available
| on GitHub (adding noise via some random valid ops and then
| removing the noise to retrieve the original program). Probably
| best done where commit N-1 is treated as noise and moving back
| to commit 0 is the denoising target.
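|
| In code, that "edits as an algebra" idea might look roughly like
| this (the op names DefineFunc/RemoveFunc are made up for
| illustration): every forward noising op records its inverse, so
| (noised program, inverse edits) pairs become training data.
|
|     import random
|     from dataclasses import dataclass
|
|     @dataclass
|     class RemoveFunc:
|         name: str
|         def apply(self, prog):
|             body = prog.pop(self.name)
|             return DefineFunc(self.name, body)   # inverse edit
|
|     @dataclass
|     class DefineFunc:
|         name: str
|         body: str
|         def apply(self, prog):
|             prog[self.name] = self.body
|             return RemoveFunc(self.name)         # inverse edit
|
|     def noise(program, steps=3):
|         prog, inverses = dict(program), []
|         for _ in range(steps):
|             op = (RemoveFunc(random.choice(list(prog))) if prog
|                   else DefineFunc(f"stub{len(prog)}", "pass"))
|             inverses.append(op.apply(prog))
|         # applying the inverses in reverse order denoises
|         return prog, list(reversed(inverses))
|
|     noisy, targets = noise({"add": "return a + b",
|                             "mul": "return a * b"})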
| can16358p wrote:
| I wonder how this would apply to compiler/interpreter
| optimizations.
|
| Is it possible that it could "dissect" some parts of the
| execution, perhaps at the assembly level, and come up with
| optimizations specific to the compiled code, without changing
| the output (I mean the expected program output, not the emitted
| binary), that modern compilers have not deterministically come
| up with?
| bastawhiz wrote:
| I expect the answer is "no". I wouldn't expect a tool like this
| to "discover" assembly without being trained on the compiled
| output. The model has no notion of how or where the code runs.
|
| After decades of compiler research and super compilers chugging
| away, we're sort of at a point where discovering novel
| optimizations with results that are more than a smidge of
| improvement is almost impossibly unlikely. Compilers today are
| _really good_.
|
| That said, I think the value that something like this might
| have is being able to optimize the _intent_ of the code. If it
| can determine that I'm sorting some numbers, it can rewrite my
| code to use a faster sorting algorithm that has the same
| functional properties. If I'm storing data that never gets
| used, it can stop storing it. It has a view of the code at a
| level above what the compiler sees, with an understanding not
| just of _what_ is being done, but _why_.
| xavxav wrote:
| > After decades of compiler research and super compilers
| chugging away, we're sort of at a point where discovering
| novel optimizations with results that are more than a smidge
| of improvement is almost impossibly unlikely. Compilers today
| are really good.
|
| I agree when it comes to peephole optimizations, but there's
| still a lot of juice left in exploiting language guarantees
| (immutability, non-aliasing, data-parallelism), however most
| compiler developer energy is spent propping up C/C++ and
| consequently optimizations are developed with those languages
| in mind.
| gergo_barany wrote:
| This is called superoptimization:
| https://en.wikipedia.org/wiki/Superoptimization
|
| There are people applying synthesis techniques to
| superoptimization. So something like this would possibly apply.
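|
| The classic formulation is a brute-force search, roughly like
| this toy sketch (a tiny made-up instruction set, not a real
| ISA): enumerate ever-longer straight-line programs until one
| matches a reference function on all test inputs.
|
|     from itertools import product
|
|     OPS = {
|         "not": lambda x: ~x,
|         "inc": lambda x: x + 1,
|         "dec": lambda x: x - 1,
|         "dbl": lambda x: x * 2,
|     }
|
|     def superoptimize(reference, tests, max_len=3):
|         for length in range(1, max_len + 1):
|             for seq in product(OPS, repeat=length):
|                 def run(x, seq=seq):
|                     for op in seq:
|                         x = OPS[op](x)
|                     return x
|                 if all(run(t) == reference(t) for t in tests):
|                     return seq
|         return None
|
|     # rediscovers that -x == ~x + 1 on two's-complement ints
|     print(superoptimize(lambda x: -x, tests=range(-8, 8)))
|     # ('not', 'inc')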
| hoosieree wrote:
| My dissertation worked on a similar problem. I used obfuscation
| to build a large dataset from a small set of ground-truth
| functions. Then built a model to classify unseen obfuscated
| binary code to the nearest of the known functions.
|
| The application I had in mind during the research was anti-
| malware static analysis, but optimization is really just the
| flipside of obfuscation. Something I'd like to try in the
| future is a diffusion model that treats obfuscation as "noise"
| to be removed.
|
| One thing I learned is that optimizing compilers produce very
| regular output. After normalizing addresses, the "vocabulary"
| size of basic blocks ends up pretty small, like ~2000 tokens.
| Certain "phrases" correlate with the original source code's
| semantics regardless of how much obfuscation you add on top.
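|
| The normalization step is roughly this kind of thing (a
| hypothetical sketch, not the dissertation's code): concrete
| addresses and immediates get replaced with placeholder tokens,
| so equivalent basic blocks collapse onto one vocabulary entry.
|
|     import re
|     from collections import Counter
|
|     ADDR = re.compile(r"0x[0-9a-fA-F]+")
|     IMM = re.compile(r"\b\d+\b")
|
|     def normalize(block):
|         return tuple(IMM.sub("IMM", ADDR.sub("ADDR", line)).strip()
|                      for line in block)
|
|     blocks = [
|         ["mov rax, [0x7fff1234]", "add rax, 8", "jmp 0x401000"],
|         ["mov rax, [0x7fff9abc]", "add rax, 16", "jmp 0x402000"],
|     ]
|     vocab = Counter(normalize(b) for b in blocks)
|     print(len(vocab))   # 1: both blocks map to one vocab entry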
| whereismyacc wrote:
| There's been talk in the past about GitHub adding integrations
| with common build tools (automatically?). What if you could
| compile every LLVM-compiled project on GitHub and run a
| diffusion model over the intermediate representations?
| daralthus wrote:
| what's the output?
| whereismyacc wrote:
| I guess the output would be an LLVM intermediate
| representation that you can compile down and run, right? I'm
| stretching pretty far past my knowledge here.
| Philpax wrote:
| I think the question - at least, for me - is "what do you
| expect to do with this system?" What will you output with
| your diffusion model?
| JHonaker wrote:
| Markov Chain Monte Carlo for program synthesis isn't exactly
| novel. The most immediate reference I thought of is Josh
| Tenenbaum's [1].
|
| There are also a lot of demos in WebPPL (web probabilistic
| programming language)[2], like [3] for the synthesis of 3D
| spaceships. I highly recommend their associated books, The
| Design and Implementation of Probabilistic Programming
| Languages [4] and Probabilistic Models of Cognition [5].
|
| I also highly recommend taking a look at the publications of the
| MIT Probabilistic Computing Project [6].
|
| [1] Human-level concept learning through probabilistic program
| induction.
| https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science....
|
| [2] http://webppl.org/
|
| [3] https://dritchie.github.io/web-procmod/
|
| [4] https://dippl.org/
|
| [5] http://probmods.org/
|
| [6] http://probcomp.csail.mit.edu/
| ofou wrote:
| Here's a GPT-4o summary of the paper:
|
| https://www.emergentmind.com/papers/2405.20519
| goexploration wrote:
| The beam search idea is interesting. Curious to know if beam
| search for reverse diffusion has been done before.
|
| Could anyone clarify how they integrate beam search with the
| reverse diffusion -- do they sample m > k nodes from a reverse
| diffusion step and expand only the top k nodes?
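|
| One plausible shape for it (a guess at the interface, not
| necessarily what the paper does -- model.propose, model.score,
| and edit.apply are hypothetical): each surviving candidate
| proposes m edits per step, and only the k best-scoring results
| are kept.
|
|     def beam_search(model, program, target, k=4, m=16, steps=32):
|         beam = [(model.score(program, target), program)]
|         for _ in range(steps):
|             candidates = []
|             for _, prog in beam:
|                 for edit in model.propose(prog, target, n=m):
|                     new_prog = edit.apply(prog)
|                     candidates.append(
|                         (model.score(new_prog, target), new_prog))
|             candidates.sort(key=lambda c: c[0], reverse=True)
|             beam = candidates[:k]            # keep the top k
|         return beam[0][1]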
___________________________________________________________________
(page generated 2024-06-04 23:01 UTC)