[HN Gopher] The unreasonable effectiveness of fuzzing for portin...
       ___________________________________________________________________
        
       The unreasonable effectiveness of fuzzing for porting programs
        
       Author : Bogdanp
       Score  : 148 points
       Date   : 2025-06-18 16:26 UTC (6 hours ago)
        
 (HTM) web link (rjp.io)
 (TXT) w3m dump (rjp.io)
        
       | oasisaimlessly wrote:
       | Author FYI: The "You can see the session log here." link to [1]
       | is broken.
       | 
       | [1]: https://rjp.io/blog/claude-rust-port-conversation
        
         | rjpower9000 wrote:
         | Fixed, thanks!
        
       | nyanpasu64 wrote:
       | > Most code doesn't express subtle logic paths. If I test if a
       | million inputs are correctly sorted, I've probably implemented
       | the sorter correctly.
       | 
       | I don't know if this was referring to Zopfli's sorter or sorting
       | in general, but I _have_ heard of a subtle sorting bug in
       | Timsort:
       | https://web.archive.org/web/20150316113638/http://envisage-p...
        
         | rjpower9000 wrote:
         | Thanks for sharing, I did not know about that!
         | 
         | Indeed, this is exactly the type of subtle case you'd worry
         | about when porting. Fuzzing would be unlikely to discover a bug
         | that only occurs on giant inputs or needs a special
         | configuration of lists.
         | 
          | In practice I think it works out okay because most of the
          | time the LLM has written correct code, and when it hasn't,
          | it's introduced a dumb bug that's quickly fixed.
         | 
         | Of course, if the LLM introduces subtle bugs, that's even
         | harder to deal with...
        
           | awesome_dude wrote:
           | > Fuzzing would be unlikely to discover a bug that only
           | occurs on giant inputs or needs a special configuration of
           | lists.
           | 
            | I have a concern about people's overconfidence in fuzz
            | testing.
            | 
            | It's a great tool, sure, but all it does is select (and
            | try) inputs at random from the set of all possible
            | inputs that can be generated for the API.
           | 
           | For a strongly typed system that means randomly selecting
           | ints from all the possible ints for an API that only accepts
           | ints.
           | 
           | If the API accepts any group of bytes possible, fuzz testing
           | is going to randomly generate groups of bytes to try.
           | 
            | The only advantage this has over other forms of testing
            | is that it's not constrained by people thinking "Oh,
            | these are the likely inputs to deal with".
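            | 
            | For reference, a harness is literally just a function
            | over raw bytes (though engines like libFuzzer also use
            | coverage feedback to steer the randomness). A minimal
            | sketch using Rust's libfuzzer-sys, where compress and
            | decompress are hypothetical functions under test:
            | 
            |   // fuzz_targets/round_trip.rs
            |   #![no_main]
            |   use libfuzzer_sys::fuzz_target;
            | 
            |   fuzz_target!(|data: &[u8]| {
            |       // The engine hands us arbitrary bytes; coverage
            |       // feedback biases future inputs toward new paths.
            |       let compressed = mylib::compress(data);
            |       let restored = mylib::decompress(&compressed);
            |       assert_eq!(restored, data);
            |   });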
        
       | amw-zero wrote:
       | There are 2 main problems in generative testing:
       | 
        | - Input data generation (how do you explore enough of the
        | program's behavior to have confidence that your test is a
        | good proxy for total correctness)
       | 
       | - Correctness statements (how do you express whether or not the
       | program is correct for an arbitrary input)
       | 
        | When you are porting a program, you have a built-in
       | statement: The port should behave exactly as the source program
       | does. This greatly simplifies the testing process.
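        | 
        | Concretely, the port becomes a differential fuzz target:
        | feed the same bytes to both implementations and assert that
        | the outputs agree. A sketch, assuming the original C
        | function is exposed over FFI (all names are hypothetical):
        | 
        |   #![no_main]
        |   use libfuzzer_sys::fuzz_target;
        | 
        |   extern "C" {
        |       // Binding to the original C implementation.
        |       fn c_checksum(data: *const u8, len: usize) -> u64;
        |   }
        | 
        |   fuzz_target!(|data: &[u8]| {
        |       let expected =
        |           unsafe { c_checksum(data.as_ptr(), data.len()) };
        |       // The old program is the oracle: the port is correct
        |       // iff it agrees on every input the fuzzer finds.
        |       assert_eq!(rust_port::checksum(data), expected);
        |   });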
        
         | bluGill wrote:
         | Several times I've been involved in porting code. Eventually we
         | reach a time where we are getting a lot of bug reports "didn't
         | work, didn't work with the old system as well" which is to say
         | we ported correctly, but the old system wasn't right either and
         | we just hadn't tested it in that situation until the new system
         | had the budget for exhaustive testing. (normally it worked at
         | one point on the old system and got broke in some other update)
        
       | lhmiles wrote:
       | Are you the author? You can speed things up and get better
       | results sometimes by retrying the initial generation step many
       | times in parallel, instead of the interactive rewrite thing.
        
         | rjpower9000 wrote:
         | I'm the author. That's a great idea. I didn't explore that for
         | this session but it's worth trying.
         | 
          | I didn't measure consistently, but I would guess 60-70% of
          | the symbols ported easily, with either a one-shot success
          | or trivial edits; for another 20%, Gemini managed to get
          | there but ended up using most of its attempts; and with
          | the last 10% it just struggled.
         | 
         | The 20% would be good candidates for multiple generations &
         | certainly consumed more than 20% of the porting time.
        
       | rcthompson wrote:
       | The author notes that the resulting Rust port is not very
       | "rusty", but I wonder if this could also be solved through
       | further application of the same principle. Something like telling
       | the AI to minimize the use of unsafe etc., while enforcing that
       | the result should compile and produce identical outputs to the
       | original.
        
         | rjpower9000 wrote:
          | It seems feasible, but I haven't thought about it enough.
          | One
         | challenge is that as you Rustify the code, it's harder to keep
         | the 1-1 mapping with C interfaces. Sometimes to make it more
         | Rust-y, you might want an internal function or structure to
         | change. You then lose your low-level fuzz tests.
         | 
         | That said, you could have the LLM write equivalence tests, and
         | you'd still have the top-level fuzz tests for validation.
         | 
         | So I wouldn't say it's impossible, just a bit harder to
         | mechanize directly.
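          | 
          | For example, as long as the C-like port sticks around, it
          | can serve as the oracle for the idiomatic rewrite. A
          | proptest sketch (module and function names are made up):
          | 
          |   use proptest::prelude::*;
          | 
          |   proptest! {
          |       #[test]
          |       fn rustified_matches_c_like(
          |           data in prop::collection::vec(any::<u8>(), 0..4096)
          |       ) {
          |           // Keep the mechanical port as the reference
          |           // while the idiomatic version evolves.
          |           prop_assert_eq!(idiomatic::hash(&data),
          |                           c_like::hash(&data));
          |       }
          |   }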
        
       | DrNosferatu wrote:
        | It's inevitable that this will generalize.
        
       | DrNosferatu wrote:
       | Why not use the same approach to port the _full set_ of Matlab
       | libraries to Octave?
       | 
        | (or an open-source language of your choice)
       | 
       | Matlab manuals are public: it would be clean room reverse
       | engineering.
       | 
       | (and many times, the appropriate bibliography of the underlying
       | definitions of what is being implemented is listed on the manual
       | page)
        
       | e28eta wrote:
       | > LLMs open up the door to performing radical updates that we'd
       | never really consider in the past. We can port our libraries from
       | one language to another. We can change our APIs to fix issues,
       | and give downstream users an LLM prompt to migrate over to the
       | new version automatically, instead of rewriting their code
        | themselves. We can make massive internal refactorings. These
        | are the types of tasks that in the past, rightly, a senior
        | engineer would reject in a project until it's the last
        | possible option. Breaking customers almost never pays off,
        | and it's hard to justify refactoring on a "maintenance mode"
        | project.
       | 
       | > But if it's more about finding the right prompt and letting an
       | LLM do the work, maybe that changes our decision process.
       | 
       | I don't see much difference between documenting any breaking
       | changes in sufficient detail for your library consumers to
       | understand them vs "writing an LLM prompt for migrating
       | automatically", but if that's what it takes for maintainers to
       | communicate the changes, okay!
       | 
       | Just as long as it doesn't become "use this LLM which we've
       | already trained on the changes to the library, and you just need
       | to feed us your codebase and we'll fix it. PS: sorry, no
       | documentation."
        
         | marxism wrote:
         | There's a huge difference between documentation and prompts.
         | Let me give you a concrete example.
         | 
         | I get requests to "make your research code available on Hugging
         | Face for inference" with a link to their integration guide.
         | That guide is 80% marketing copy about Git-based repositories,
         | collaboration features, and TensorBoard integration. The actual
          | implementation details are mixed in throughout.
         | 
         | A prompt would be much more compact.
         | 
         | The difference: I can read a prompt in 30 seconds and decide
         | "yes, this is reasonable" or "no, I don't want this change."
          | With documentation, I have to reverse-engineer the narrow
          | bucket that applies to my specific scenario from a one-
          | size-drowns-all ocean.
         | 
         | The person making the request has the clearest picture of what
         | they want to happen. They're closest to the problem and most
         | likely to understand the nuances. They should pack that
         | knowledge densely instead of making me extract it from
         | documentation links and back and forth.
         | 
         | Documentation says "here's everything now possible, you can do
         | it all!" A prompt says "here's the specific facts you need."
         | 
         | Prompts are a shared social convention now. We all have a rough
         | feel for what information you need to provide - you have to be
         | matter-of-fact, specific, can't be vague. When I ask someone to
         | "write me a prompt," that puts them in a completely different
         | mindset than just asking me to "support X".
         | 
         | Everyone has experience writing prompts now. I want to leverage
         | that experience to get cooperative dividends. It's division of
         | labor - you write the initial draft, I edit it with special
         | knowledge about my codebase, then apply it. Now we're sharing
         | the work instead of dumping it entirely on the maintainer.
         | 
         | [1] https://peoplesgrocers.com/en/writing/write-prompts-not-
         | guid...
        
           | rjpower9000 wrote:
           | I was pretty hand-wavy when I made the original comment. I
            | was thinking implicitly of things like the Python sub-
           | interpreter proposal, which had strong pushback from the
           | Numpy engineers at the time (I don't know the current status,
           | whether it's a good idea, etc, just something that came to
           | mind).
           | 
           | https://lwn.net/Articles/820424/
           | 
           | The objections are of course reasonable, but I kept thinking
           | this shouldn't be as big a problem in the future. A lot of
           | times we want to make some changes that aren't _quite_
           | mechanical, and if they hit a large part of the code base,
           | it's hard to justify. But if we're able to defer these types
           | of cleanups to LLMs, it seems like this could change.
           | 
           | I don't want a world with no API stability of course, and you
           | still have to design for compatibility windows, but it seems
           | like we should be able to do better in the future. (More so
           | in mono-repos, where you can hit everything at once).
           | 
           | Exactly as you write, the idea with prompts is that they're
           | directly actionable. If I want to make a change to API X, I
           | can test the prompt against some projects to validate agents
           | handle it well, even doing direct prompt optimization, and
            | then share it with end users.
        
           | e28eta wrote:
           | Yes, there's a difference between "all documentation for a
           | project" and "prompt for specific task".
           | 
           | I don't think there should be a big difference between
           | "documentation of specific breaking changes in a library and
           | how consumers should handle them" and "LLM prompt to change a
           | code base for those changes".
           | 
           | You might call it a migration guide. Or it might be in the
           | release notes, in a special section for Breaking Changes. It
           | might show up in log messages ("you're using this API wrong,
           | or it's deprecated").
           | 
           | Why would describing the changes to an LLM be easier than
           | explaining them to the engineer on the other end of your API
           | change?
        
       | gaogao wrote:
       | Domains where fuzzing is useful are generally good candidates for
       | formal verification, which I'm pretty bullish about in concert
        | with LLMs. This is in part because you can just formally
        | verify by exhaustiveness for many problems, but the
        | enhancement is being
       | able to prove that you don't need to test certain combinations
       | through inductive reasoning and such.
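        | 
        | For small input domains, verification by exhaustiveness is
        | just enumerating every input, which is a proof rather than
        | a sample. A sketch, where popcount16 is a made-up ported
        | function:
        | 
        |   #[test]
        |   fn port_matches_reference_everywhere() {
        |       // 65,536 cases: every possible u16, checked once.
        |       for x in 0..=u16::MAX {
        |           assert_eq!(port::popcount16(x),
        |                      reference::popcount16(x));
        |       }
        |   }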
        
         | rjpower9000 wrote:
          | That's an interesting idea. I hadn't thought about it, but
          | it could be worth trying something similar for the porting
          | task. I don't know enough about the space: could you have
          | an LLM write a formal spec for a C function and then
          | validate that the translated function has the same
          | properties?
         | 
         | I guess I worry it would be hard to separate out the "noise",
         | e.g. the C code touches some memory on each call so now the
         | Rust version has to as well.
        
       | zie1ony wrote:
        | I find it amazing that the same ideas pop up in the same
        | period of time. For example, I work on test generation and
        | I went down the same path. I tried to find bugs by prompting
        | "Find bugs in this code and implement tests to show it", but
        | this didn't get me far. Then I switched to property
        | (invariant) testing, like you, but in my case I ask the AI:
        | "Based on the whole codebase, make the property tests." Then
        | I fuzz some random actions on the stateful objects and run
        | the prop tests over and over again.
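        | 
        | Roughly, that loop looks like this with proptest (the
        | Action type and the balance invariant are invented for
        | illustration):
        | 
        |   use proptest::prelude::*;
        | 
        |   #[derive(Clone, Debug)]
        |   enum Action { Deposit(u32), Withdraw(u32) }
        | 
        |   proptest! {
        |       #[test]
        |       fn balance_never_negative(
        |           actions in prop::collection::vec(prop_oneof![
        |               (0..1000u32).prop_map(Action::Deposit),
        |               (0..1000u32).prop_map(Action::Withdraw),
        |           ], 0..100)
        |       ) {
        |           let mut balance: i64 = 0;
        |           for a in &actions {
        |               match a {
        |                   Action::Deposit(n) => balance += *n as i64,
        |                   // Overdrafts are rejected, not applied.
        |                   Action::Withdraw(n) if (*n as i64) <= balance =>
        |                       balance -= *n as i64,
        |                   Action::Withdraw(_) => {}
        |               }
        |               // The invariant must hold after every action.
        |               prop_assert!(balance >= 0);
        |           }
        |       }
        |   }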
       | 
        | At first I also wanted to automate everything, but over time
        | I realized that the best split is about 10% human to 90% AI.
       | 
       | Another idea I'm exploring is AI + Mutation Tests
        | (https://en.wikipedia.org/wiki/Mutation_testing). It should
        | help the AI generate tests with full coverage.
        
         | wahnfrieden wrote:
          | An under-explored approach is to collect data on human
          | usage of the app (from production and from internal
          | testers) and feed that into your input generators.
        
         | LAC-Tech wrote:
         | I'd have much more confidence in an AI codebase where the human
         | has chosen the property tests, than a human codebase where the
         | AI has chosen the property tests.
         | 
         | Tests are executable specs. That is the last thing you should
         | offload to an LLM.
        
           | koakuma-chan wrote:
           | How about an LRM?
        
             | LAC-Tech wrote:
             | I do not know this term; could you give a concise
             | explanation?
        
               | koakuma-chan wrote:
                | LRM is a new term for reasoning LLMs. In my experience,
               | either I am bad at prompting, or LRMs are vastly better
               | than LLMs at instruction following.
        
           | bccdee wrote:
           | Also, a poorly designed test suite makes your code base
           | extremely painful to change. A well-designed test suite with
            | good abstractions makes it easy to change code, on top of
           | which, it makes tests extremely fast to write.
           | 
           | I think the whole idea of getting LLMs to write the tests
           | comes from a pandemic of under-abstracted, labour-intensive
           | test suites. And that just makes the problem worse.
        
             | LAC-Tech wrote:
              | Perhaps it comes from the viewpoint that tests are a
              | chore or grunt work; something you have to do but
              | don't really view as interesting or important.
             | 
             | (like how I describe what git should do and I get the LLM
             | to give me the magic commands with all the confusing nouns
             | and verbs and dashes in the right place).
        
       | punnerud wrote:
        | It felt good to read that TensorFlow is not used much
        | anymore (besides at Google). I had to check Google Trends:
        | https://trends.google.com/trends/explore?date=all&q=%2Fg%2F1...
        | 
        | I started using TensorFlow years ago and switched to
        | PyTorch. I hope ML will make switches like TensorFlow-to-
        | PyTorch faster and easier, and not just have the biggest
        | companies eating the open source community, like it has
        | been for years.
        
         | screye wrote:
         | Google has moved to JAX. I know many people who prefer it over
          | PyTorch.
        
           | leoh wrote:
            | It's okay. The main complaints are documentation and
            | limited community support (all kinds of architectures
            | are much more DIY for it vs PyTorch).
           | 
           | Unrelated gripe: they architected it really poorly from a
           | pure sw pov imo. Specifically it's all about Python bindings
           | for C++ so the py/c++ layer is tightly coupled both in code
           | and in the build system.
           | 
           | They have a huge opportunity to fix this so, for example,
           | rust bindings could be (reasonably trivially) generated, let
           | alone for other languages.
        
       | comex wrote:
       | Interesting! But there's a gap between aspirations and what was
       | accomplished here.
       | 
       | Early on in the blog post, the author mentions that "c2rust can
       | produce a mechanical translation of C code to Rust, though the
       | result is intentionally 'C in Rust syntax'". The flow of the post
       | seems to suggest that LLMs can do better. But later on, they say
       | that their final LLM approach produces Rust code which "is very
       | 'C-like'" because "we use the same unsafe C interface for each
       | symbol we port". Which sounds like they achieved roughly the same
       | result as c2rust, but with a slower and less reliable process.
       | 
       | It's true that, as the author says, "because our end result has
        | end-to-end fuzz tests and tests for every symbol, it's now much
       | easier to 'rustify' the code with confidence". But it would have
       | been possible to use c2rust for the actual port, and separately
       | use an LLM to write fuzz tests.
       | 
       | I'm not criticizing the approach. There's clearly a lot of
       | promise in LLM-based code porting. I took a look at the earlier,
       | non-fuzz-based Claude port mentioned in the post, and it reads
       | like idiomatic Rust code. It would be a perfect proof of concept,
       | if only it weren't (according to the author) subtly buggy.
       | Perhaps there's a way to use fuzzing to remove the bugs while
       | keeping the benefits compared to mechanical translation.
       | Unfortunately, the author's specific approach to fuzzing seems to
       | have removed both the bugs and the benefits. Still, it's a good
       | base for future work to build on.
        
       | maxjustus wrote:
        | I used this general approach to port the ClickHouse-specific
        | version of cityHash64 to JS from an existing Golang
        | implementation:
        | https://github.com/maxjustus/node-ch-city/blob/main/ch64.js
        | I think it works particularly well when porting pure
        | functions.
        
       ___________________________________________________________________
       (page generated 2025-06-18 23:00 UTC)