[HN Gopher] FAWK: LLMs can write a language interpreter
___________________________________________________________________
FAWK: LLMs can write a language interpreter
Author : todsacerdoti
Score : 193 points
Date : 2025-11-21 10:28 UTC (12 hours ago)
(HTM) web link (martin.janiczek.cz)
(TXT) w3m dump (martin.janiczek.cz)
| Y_Y wrote:
| I've been trying to get LLMs to make Racket "hashlangs"+ for
| years now, both for simple almost-lisps and for honest-to-god
| different languages, like C. It's definitely possible, raco has
| packages++ for C, Python, J, Lua, etc.
|
| Anyway so far I haven't been able to get any nice result from any
| of the obvious models, hopefully they're finally smart enough.
|
| + https://williamjbowman.com/tmp/how-to-hashlang/
|
| ++ https://pkgd.racket-lang.org/pkgn/search?tags=language
| keepamovin wrote:
| Yes! I'm currently using copilot + antigravity to implement a
| language with ergonomic syntax and semantics that lowers cleanly
| to machine code targeting multiple platforms, with a focus on
| safety, determinism, auditability and fail-fast bugs. It's more
| work than I thought but the LLMs are very capable.
|
| I was dreaming of a JS to machine code, but then thought, why not
| just start from scratch and have what I want? It's a _lot_ of
| fun.
| lionkor wrote:
| Curious why you do this with AI instead of just writing it
| yourself?
|
| You should be able to whip up a lexer, parser, and compiler in
| a couple of weeks.
| epolanski wrote:
| I'm not the previous user, but I imagine that weeks of
| investment might be a commitment one does not have.
|
| I have implemented an interpreter for a very basic stack-
| based language (you can imagine it being one of the simplest
| interpreters you can have) and it took me a lot of time and
| effort to have something solid and functional.
|
| Thus I can absolutely relate to the idea of having an LLM
| that's seen many interpreters lay out the groundwork for you
| and let you play with your ideas as quickly as possible while
| postponing the details until necessary.
| My_Name wrote:
| Because he did it in a day, not a few weeks.
|
| If I want to go from Bristol to Swindon, I could walk there
| in about 12 hours. It's totally possible to do it by foot. Or
| I could use a car and be there in an hour. There and back,
| with a full work day in-between done, in a day. Using the
| tool doesn't change what you can do, it speeds up getting the
| end result.
| bgwalter wrote:
| There is no end result. It's a toy language based on a
| couple of examples, without a grammar, where apparently the
| LLM used its standard (plagiarized) parser/lexer code and
| iterated until the examples passed.
|
| Automating one of the fun parts of CS is just weird.
|
| So with this awesome "productivity" we now can have 10,000
| new toy languages per day on GitHub instead of just 100?
| TeodorDyakov wrote:
| That was exactly my thought. Why automate the coding part
| to create something that will be used for coding (and can
| itself be automated, going by the same logic)? This
| makes zero sense.
| fragmede wrote:
| Thank you for bringing this matter to our attention,
| TeodorDyakov and bgwalter. I am a member of the fun
| police, and I have placed keepamovin, and accomplice,
| My_Name under arrest, pending trial, for having fun
| _wrong_. If convicted, they each face a 5-year sentence
| to a joyless marriage for healthcare without possibility
| of time off for boring behavior. We take these matters
| pretty seriously, as crimes of this nature could lead to
| a bubble collapse, and the economy can't take that (or a
| joke), so good work there!
| andsoitis wrote:
| If you could also automate away the reason for being in
| Swindon in the first place, would you still go?
| thunky wrote:
| The only reason for going to Swindon was to walk there?
|
| If so then of course you still should go.
|
| But the point of making a computer program usually isn't
| "the walk".
| andsoitis wrote:
| If you can automate away the reason for being at the
| destination, then there's no point in automating the way
| to get to the destination.
|
| Similarly for automating the creation of an interpreter with
| nicer programming language features in order to build an app
| more easily, when you can just automate creation of the
| app in the first place.
| int_19h wrote:
| "Because it's a shiny toy that I want to play with" is a
| perfectly valid reason that still applies here. The
| invalid assumption in your premise is that people either
| enjoy coding or don't. The truth is that they enjoy
| coding some things but not others, and those preferences
| are very subjective.
| lionkor wrote:
| Yes, and the result is undoubtedly trash. I have yet to see
| a single vibe-coded app or reasonably large/complex snippet
| which isn't either 1) almost an exact reproduction of a
| popular library, tutorial, etc. or 2) complete and utter
| trash.
|
| So my question was: given that this is not a very hard
| thing to build properly, why not do it properly?
| simonw wrote:
| The choice with this kind of question is almost _never_
| between "do it properly or do it faster with LLMs".
|
| It's between "do it with LLMs or don't do it at all" -
| because most people don't have the time to take on an
| ambitious project like implementing a new programming
| language just for fun.
| keepamovin wrote:
| It would be very new to me. I'd have to learn a lot to do
| that. And I can't spare the time or attention. It's more of a
| fun side project.
|
| The machine code would also be tedious, tho fun. But I really
| can't spare the time for it.
| TechDebtDevin wrote:
| Because this is someone in a "spiral" or "AI psychosis".
| It's pretty clear from how they are talking.
| 64718283661 wrote:
| What's the point of making something like this if you don't get
| to deeply understand what you're doing?
| My_Name wrote:
| What's the point of owning a car if you don't build it by
| hand yourself?
|
| Anyway, all it will do is stop you being able to run as well
| as you used to be able to do when you had to go everywhere on
| foot.
| purple_turtle wrote:
| What is the point of a car that on Mondays changes colour to
| blue and on the first Friday of each year explodes?
|
| If neither you nor anyone else can fix it, without more
| cost than making a proper one?
| ChrisGreenHeur wrote:
| Code review exists.
| bgwalter wrote:
| Proper code review takes as long as writing the damn
| thing in the first place and is infinitely more boring.
| And you still miss things that would have been obvious
| while writing.
|
| In this special case, you'd have to reverse engineer the
| grammar from the parser, calculate first/follow sets and
| then see if the grammar even is what you intended it to
| be.
| skeledrew wrote:
| The author did review the (also generated) tests. As long as
| they're comprehensive enough for his purposes, they all pass,
| and coverage is very high, things work well enough.
| Attempting to manually edit that code is a whole other
| thing though.
| auggierose wrote:
| That argument might work for certain kinds of
| applications (none I'd like to use, though), but for a
| programming language, nope.
|
| I am using LLMs to speed up coding as well, but you have
| to be super vigilant, and do it in a very modular way.
| skeledrew wrote:
| They literally just made it to do AoC challenges, and
| shared it for fun (and publicity).
| auggierose wrote:
| I don't think that contradicts my comment in any way.
| It's not a programming language then, it is a fun
| language.
| johnisgood wrote:
| I have made a lot of things using LLMs and I fully understood
| everything. It is doable.
| afpx wrote:
| How deep do you need to know?
|
| "Imagination is more important than knowledge."
|
| At least for me that fits. I have quite enough graduate-level
| knowledge of physics, math, and computer science to rarely be
| stumped by a research paper or anything an LLM spits out.
| That may get me scorn from those tested on those subjects.
| Yet, I'm still an effective ignoramus.
| keepamovin wrote:
| I want something I can use, and something useful. It's not
| just a learning exercise. I get to understand it by following
| along.
| ModernMech wrote:
| If they go far enough with it they will be forced to
| understand it deeply. The LLM provides more leverage at the
| beginning because this project is a final exam for a first
| semester undergrad PL course, therefore there are a billion
| examples of "vaguely Java/Python/C imperative language with
| objects and functions" to train the LLM on.
|
| Ultimately though, the LLM is going to become less useful as
| the language grows past its capabilities. If the language
| author doesn't have a sufficient map of the language and a
| solid plan at that point, it will be the blind leading the
| blind. Which is how most lang dev goes so it should all work
| out.
| keepamovin wrote:
| Lol thank you for this. It's more work than I
| thought!
| skydhash wrote:
| Commendable effort, but I expected at least a demo showcasing
| working code (even if it's hacky). It's like someone
| talking about sheet music without playing it once.
| epolanski wrote:
| Even more, it's like talking about a sheet without seeing the
| sheet itself.
| johnisgood wrote:
| See https://github.com/Janiczek/fawk and .fawk files in
| https://github.com/Janiczek/fawk/tree/main/tests.
| slybot wrote:
| I did AoC 2021 up to day 10 using awk; it was fun but not easy
| and I couldn't proceed further: https://github.com/nusretipek/Advent-
| of-Code-2021
| qsort wrote:
| The money shot: https://github.com/Janiczek/fawk
|
| Purely interpretive implementation of the kind you'd write in
| school, still, above and beyond anything I'd have any right to
| complain about.
| artpar wrote:
| I wrote two
|
| jslike (acorn based parser)
|
| https://github.com/artpar/jslike
|
| https://www.npmjs.com/package/jslike
|
| wang-lang ( i couldn't get ASI to work like javascript in this
| nearley based grammar )
|
| https://www.npmjs.com/package/wang-lang
|
| https://artpar.github.io/wang/playground.html
|
| https://github.com/artpar/wang
| shevy-java wrote:
| wang-lang? Is that a naughty language?
| jamesu wrote:
| A few months ago I used ChatGPT to rewrite a bison based parser
| to recursive descent and was pretty surprised how well it held up
| - though I still needed to keep prompting the AI to fix things or
| add elements it skipped, and in the end I probably rewrote 20% of
| it because I wasn't happy with its strange use of C++ features
| making certain parts hard to follow.
| vidarh wrote:
| It's a fun post, and I love language experiments with LLMs (I'm
| close to hitting the weekly limit of my Claude Max subscription
| because I have a near-constantly running session working on my
| Ruby compiler; Claude can fix -- albeit with messy code sometimes
| -- issues that requires complex tracing of backtraces with gdb,
| and fix complex parser interactions almost entirely unaided as
| long as it has a test suite to run).
|
| But here's the Ruby version of one of the scripts:
|
|     BEGIN {
|       result = [1, 2, 3, 4, 5]
|         .filter { |x| x % 2 == 0 }
|         .map { |x| x * x }
|         .reduce { |acc, x| acc + x }
|       puts "Result: #{result}"
|     }
|
| The point being that running a script with the "-n" switch
| runs BEGIN/END blocks and puts an implicit "while gets ... end"
| around the rest. Adding "-a" auto-splits the line like awk.
| Adding "-p" also prints $_ at the end of each iteration.
|
| So here's a more typical Awk-like experience:
|
|     ruby -pe '$_.upcase!' somefile.txt
|
| ($_ has the whole line)
|
| Or:
|
|     ruby -F, -ane 'puts $F[1]'    # prints the second field
|
| -F sets the default character to split on, and -a adds an
| implicit $F = $_.split.
|
| That is not to detract from what he's doing because it's fun. But
| if your goal is just to use a better Awk, then Ruby is usually
| better Awk, and so, for that matter, is Perl, and for most things
| where an Awk script doesn't fit on the command line the only
| reason to really use Awk is that it is more likely to be
| available.
| UltraSane wrote:
| So I have had to work very hard to use $80 worth of my $250
| free Claude code credits. What am I doing wrong?
| sceptic123 wrote:
| > free
|
| how do you get free credits?
| throwup238 wrote:
| They were given out for the Claude Code on Web launch. Mine
| expired November 18 (but I managed to use them all before
| then).
| UltraSane wrote:
| Mine were set to expire then but got extended to the 23.
| UltraSane wrote:
| Pro users got $250 and max users got $1000
| throwup238 wrote:
| I used all of my credits working on a PySide QT desktop app
| last weekend. What worked:
|
| I first had Claude write an E2E testing framework that
| functioned a lot like Cypress, with tests using element
| selectors like Jquery and high level actions like 'click'
| with screenshots at every step.
|
| Then I had Claude write an MCP server that could run the GUI
| in the background (headless in Claude's VM) and take
| screenshots, execute actions, etc. This gave Claude the
| ability to test the app in real time with visual feedback.
|
| Once that was done, I was able to run half a dozen or more
| agents at the same time running in parallel working on
| different features. It was relatively easy to blow through
| credits at that point, especially since I think VM time
| counts, so whenever I spent 4-5 min running the full e2e test
| suite, that cost money. At the end of an agent's run, I'd ask
| them to pull master and merge conflicts, then I'd watch the
| e2e tests run locally before doing manual acceptance testing.
| vidarh wrote:
| Run it with --dangerously-skip-permissions, give it a large
| test suite, and keep telling it "continue fixing spec
| failures" and you'll eat through them very quickly.
|
| Or it will format your drives, and set fire to your cat;
| might be worth doing it in a VM.
|
| Though a couple of days ago, I gave Claude Code root access
| to a Raspberry Pi and told it to set up Home Assistant and a
| voice agent... It likes to tweak settings and reboot it.
|
| EDIT: It just spoke to me, by ssh'ing into the Pi and running
| Espeak (I'd asked it to figure it out; it decided the HA API
| was too difficult, and decided on its own to pivot to that
| approach...)
| shevy-java wrote:
| > That is not to detract from what he's doing because it's fun.
| But if your goal is just to use a better Awk, then Ruby is
| usually better Awk
|
| I agree, but I also would not use such one liners in ruby. I
| tend to write more elaborate scripts that do the filtering. It
| is more work, but I hate to burden my brain with hard to
| remember sigils. That's why I don't really use sed or awk
| myself, though I do use it when other people write it. I find
| it much simpler to just write the equivalent ruby code and use
| e. g. .filter or .select instead. So something like:
| ruby -F, -ane '$F[1]'
|
| I'd never use because I wouldn't have the faintest idea what
| $F[1] would do. I assume it is a global variable and we access
| the second element of whatever is stored in F? But either way,
| I try to not have to think when using ruby, so my code ends up
| being really dumb and simple at all times.
|
| > for that matter, is Perl
|
| I'd agree but perl itself is a truly ugly language. The
| advantages over awk/sed are fairly small here.
|
| > the only reason to really use Awk is that it is more likely
| to be available.
|
| People used the same explanation with regard to bash shell
| scripts or perl (typically more often available on a cluster
| than python or ruby). I understand this but still reject it; I
| try to use the tool that is best. So, for me, python and ruby
| are better than perl; and all are better than awk/sed/shell
| scripts. I am not in the camp of users who want to use shell
| scripts + awk + sed for everything. I understand that it can be
| useful, but I much prefer just writing the solution in a ruby
| script and then use that. I actually wrote numerous ruby
| scripts and aliases, so I kind of use these in pipes too, e. g.
| "delem" is just my alias for delete_empty_files (defaults to
| the current working directory), so if I use a pipe in bash,
| with delem between two | |, then it just does this specific
| action. The same is true for numerous other actions, so ruby
| kind of "powers" my system. Of course people can use awk or sed
| or rm and so forth and pipe the correct stuff in there, which
| also works, but I found that my brain just cannot be
| bothered to remember all the flags. I just want to think in terms
| of super-simple instructions at all times and keep on re-using
| them; and extending them if I need to. So ruby kind of
| functions as a replacement for me for all computer-related
| actions in general. It is the ultimate glue for me to
| efficiently work with a computer system. Anything that can be
| scripted and automated and I may do more than once, I end up
| writing into ruby and then just tapping into that
| functionality. I could do the same in python too for the most
| part, so this is a very comparable use case. I did not do it in
| perl, largely because I find perl just to be too ugly to use
| efficiently.
| vidarh wrote:
| > I'd never use because I wouldn't have the faintest idea
| what $F[1] would do.
|
| I don't use it often either, and most people probably don't
| know about it. But $F will contain each row of the input
| split by the field separator, which you can set with -F,
| hence the comparison to Awk.
|
| Basically, each of -n, -p, -a, -F conceptually just does some
| simple transforms to your code:
|
| -n: wrap "while gets; <your code>; end" around your code and
| call the BEGIN and END blocks.
|
| -a: Insert $F = $_.split at the start of the while loop from
| -n. $_ contains the last line read by gets.
|
| -p: Insert the same loop as -n, but add "puts $_" at the end
| of the while loop.
|
| These are sort-of inherited from Perl, like a lot of Ruby's
| sigils, hence my mention of it (I agree it's ugly). They're
| not that much harder to remember than Awk, and it saves me
| from having to use a language I use so rarely that I
| invariably end up reading the manual every time I need more
| than the most basic expressions.
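|
| Roughly, a one-liner like `ruby -F, -ane 'puts $F[1]'` then behaves
| like the plain Ruby below (an illustrative hand-expansion of those
| switches, not what ruby(1) literally generates; the input data is
| made up):

```ruby
# Illustrative expansion of: ruby -F, -ane 'puts $F[1]' file.csv
# The array below stands in for the input file read by gets.
input = ["alice,30\n", "bob,25\n"]

results = []
input.each do |line|               # -n: the implicit "while gets ... end" loop
  fields = line.chomp.split(",")   # -a with -F,: the implicit $F = $_.split(",")
  results << fields[1]             # the one-liner body: $F[1]
end
results  # => ["30", "25"]
```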
|
| > I understand this but still reject it; I try to use the
| tool that is best.
|
| I do too, but sometimes you need to access servers you can't
| install stuff on.
|
| Like you I have lots of my own Ruby scripts (and a Ruby WM, a
| Ruby editor, a Ruby terminal emulator, a file manager, a
| shell; I'm turning into a bit of a zealot in my old age...)
| and much prefer them when I can.
| TeodorDyakov wrote:
| So you are using a tool to help you write code, because you don't
| enjoy coding, in order to make a tool used for coding (a computer
| language). Why?
| cl3misch wrote:
| For the same reason we have Advent of Code: for fun!
|
| I mean, he's not _solving_ the puzzles with AI. He's creating
| his own toy language to solve the puzzles in.
| killerstorm wrote:
| Coding has many aspects: conceptual understanding of the problem
| domain, design, decomposition, etc., and then typing code and
| debugging. Can you imagine that a person might enjoy the
| conceptual part more and skip over some typing exercises?
| bgwalter wrote:
| The whole blog post does not mention the word "grammar". As
| presented, it is example-based: the LLM spat out its
| plagiarized code and beat it into shape until the examples
| passed.
|
| We do not know whether the implied grammar is conflict free.
| We don't know anything.
|
| It certainly does not look like enjoying the conceptual part.
| killerstorm wrote:
| Many established programming languages have grammatical
| warts, so your bar for LLMs is higher than "industry
| expert".
|
| E.g. C++ `std::vector<std::vector<int>> v;`. The language
| defined by top fucking experts, with a 1000-page spec.
| victorbjorklund wrote:
| There are lots of different things people can find interesting.
| Some people love the typing of loops. Some people love the
| design of the architecture etc. That's like saying "how can you
| enjoy woodworking if you use a CNC machine to automate parts of
| it"
| doublerabbit wrote:
| I take satisfaction in the end product of something. A
| product where I have created it myself, with my own skills
| and learnings. If I haven't created it myself and yet still
| have an end product, how have I accomplished anything?
|
| It's nice for a robot to create it for you, but you've not really
| gained anything, other than a product that's a stranger to you.
|
| Although, how long until we have AI in CnC machines?
|
| "Lathe this plank of wood in to a chair leg x by x."
| ben_w wrote:
| I take satisfaction living in a house I did not build using
| tools I could not use or even enumerate, tools likewise
| acting on materials I can neither work with nor name
| precisely enough to be unambiguous, in a community I played
| no part in before moving here, kept safe by laws I can't
| even read because I've not yet reached that level of
| mastery of my second tongue.
|
| It has a garden.
|
| I've been coding essentially since I learned to read, I
| have designed boolean logic circuits from first principles
| to perform addition and multiplication, I know enough of
| the basics of CPU behaviours such that if you gave me time
| I might get as far as a buggy equivalent of a 4004 or
| something, and yet everything from there to C is a bunch of
| here-be-dragons and half-remembered uni modules from 20
| years ago, then some more exothermic flying lizards about
| the specifics of "modern" (relative to 2003) OSes, then
| apps which I actually got paid to make.
|
| LLMs let everything you don't already know be as fun as
| learning new stuff at uni or buying a new computer from a
| store, whichever you ask for.
| doublerabbit wrote:
| > It has a garden
|
| In this scenario you're starting out as a gardener: would
| you rather have an LLM "plant me five bulbs and two tulips
| in ideal soil conditions", or would you rather grow them
| yourself? With the former you wouldn't gain the skills you
| would have if you had made the compost the previous year,
| double dug the soil and sowed the seeds. All that knowledge
| learnt, skills gained and achievement is lost in the process.
| You may be a novice and it may not bring all your flowers
| to bloom, but if you succeed with one, that's the
| accomplishment, the feel-good energy.
|
| An LLM may bring you the flowers, but you've not attempted it.
| You've palmed the work off to something else and are just
| basking in the result. I wouldn't count that as an achievement;
| I just couldn't take pride in it. I was brought up in a
| strict "cheating: you're only cheating yourself" ideology,
| which may be what's triggering this.
|
| I would accept that in terms of teaching there is a
| net plus for LLMs. A glorified librarian. A traditional
| teacher may teach you one method -- one for the whole
| class; an LLM can adjust its explanation until it clicks
| with you. "Explain it using Teddy Bears" -- a 24/365
| resource allowing you to learn.
|
| As such, an LLM explaining that "your switch case statement
| is checking if the variable is populated, not whether
| the file is empty" on code you've already written is
| relaying back a fault no different than if you had asked
| a professional to review it.
|
| I just can't grip the feel of having an LLM code for you.
| When you do, it spreads like regex; you become dependent
| on it. "Now display a base64 image retrieved from an
| internal hash table while checking that the rendered
| image is actually 800x600" -- and that it does, but the
| knowledge of how becomes lost. You have to put in double
| the time to learn what it did, question its efficiency
| and assume it hasn't introduced further issues. It may
| take you a few hours or days to get the logic right, but
| at least you can take a step back and look at it knowing
| it's my code, my skills that made that single flower
| bloom.
|
| The cat is out of the bag; reality is forcing us to
| embrace it. It's not for me and that's fine; I'm not going
| to grudge folk enjoying the ability to experience a
| specialist subject. I do become concerned when I see
| dystopian dangers ahead, and a future generation
| degraded in knowledge because we got vibes and over-hyped
| the current moment.
|
| Knowledge and history are in real danger.
| ben_w wrote:
| > In this scenario you're starting out as a gardener: would
| you rather have an LLM "plant me five bulbs and two tulips
| in ideal soil conditions", or would you rather grow them
| yourself? With the former you wouldn't gain the skills you
| would have if you had made the compost the previous year,
| double dug the soil and sowed the seeds. All that knowledge
| learnt, skills gained and achievement is lost in the process.
| You may be a novice and it may not bring all your flowers
| to bloom, but if you succeed with one, that's the
| accomplishment, the feel-good energy.
|
| I am a novice in the garden. I do it because I want to,
| because it's fun to do.
|
| I don't know what does and doesn't work, and therefore I
| am asking LLMs (VLMs) lots of questions. I am learning
| from it.
|
| But I know it is not as smart as it acts, that it will
| tell me untrue things. I upload a photo of a mystery
| weed, ChatGPT tells me it's a tomato, I can tell it's not
| a tomato because of the tiny black berries, I ask around
| on Telegram and it's a self-seeding solanum nigrum.
|
| Other times, the AI is helpful:
|
|     Me: [upload a picture of the root ball of my freshly
|     purchased Thuja Brabant]
|
|     ChatGPT: That root ball is severely root-bound -- classic
|     "pot-shape memory". If planted as-is, the roots will
|     continue circling, restricting growth and potentially
|     strangling the plant over time. You must correct this
|     before planting. Here's how: ...
|
| My mum was a gardener. It would be nice if I could ask
| her. Sadly, she's spent the last few years fertilising
| some wild flowers from underneath, which makes it
| difficult to get answers.
| linsomniac wrote:
| >If I haven't created it myself and yet still have an end
| product, how have I accomplished anything?
|
| Maybe what you wanted to accomplish wasn't the dimensioning
| of lumber?
|
| Achievements you can make by using CNC:
|
| - Learning feeds+speeds
| - Learning your CNC tooling
| - Learning CAD+CAM
| - Design of the result
| - Maybe you are making tens of something. Am I really
|   achieving that much by making ~100 24"x4" pieces of
|   plywood?
| - Maybe you want to design something that many people can
|   manufacture.
| doublerabbit wrote:
| The CNC machine is aiding in teaching; it's not doing it for
| you. It's being used as a tool to increase your efficiency
| and learning. If you were asking the CNC machine what the
| best frequency is and setting the speed of the spindle, you're
| still putting in your own work. You're learning the skills
| of the machine via another method, no different than if
| you worked with a master carpenter and asked questions.
|
| An electric wheel for clay making is going to make a
| quicker process of making a bowl than using a foot
| spindle. You still need to put the effort in to get
| the results you want to achieve, but it shows in time.
|
| Using LLMs as "let me do this for you" is where it gets
| out of hand, and you've not really accomplished anything
| other than an elementary "I made this".
| badsectoracula wrote:
| A related test i did around the beginning of the year: i came up
| with a simple stack-oriented language and asked an LLM to solve a
| simple problem (calculate the squared distance between two
| points, the coordinates of which are already in the stack) and
| had it figure out the details.
|
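| (For flavor, here is a minimal evaluator for a made-up language in
| that spirit -- the opcode names below are invented for illustration,
| not the commenter's actual design:)

```ruby
# Tiny postfix stack evaluator; dup/sub/mul/add are illustrative opcodes.
def eval_stack(program, stack = [])
  program.split.each do |op|
    case op
    when "dup" then stack.push(stack.last)                 # duplicate top
    when "sub" then b, a = stack.pop, stack.pop; stack.push(a - b)
    when "mul" then b, a = stack.pop, stack.pop; stack.push(a * b)
    when "add" then b, a = stack.pop, stack.pop; stack.push(a + b)
    else stack.push(Integer(op))                           # numeric literal
    end
  end
  stack
end

# Squared distance between points (1, 2) and (4, 6), coordinates pushed
# as literals: (1-4)**2 + (2-6)**2
eval_stack("1 4 sub dup mul 2 6 sub dup mul add")  # => [25]
```
|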
| The part i found neat was that i used a local LLM (some quantized
| version of QwQ from around December or so i think) that had a
| thinking mode so i was able to follow the thought process. Since
| it was running locally (and it wasn't a MoE model) it was slow
| enough for me to follow it in realtime and i found fun watching
| the LLM trying to understand the language.
|
| One other interesting part is the language description had a
| mistake but the LLM managed to figure things out anyway.
|
| Here is the transcript, including a simple C interpreter for the
| language and a test for it at the end with the code the LLM
| produced:
|
| https://app.filen.io/#/d/28cb8e0d-627a-405f-b836-489e4682822...
| chrisweekly wrote:
| THANK YOU for SHARING YOUR WORK!!
|
| So many commenters claim to have done things w/ AI, but don't
| share the prompts. Cool experiment, cooler that you shared it
| properly.
| fsloth wrote:
| "but don't share the prompts."
|
| To be honest I don't want to see anyone else's prompts,
| generally, because what works is so damn context sensitive --
| and it seems so random what works and what doesn't. Even if
| someone else had a brilliant prompt, there is no guarantee
| it would work for me.
|
| If working with something like Claude code, you tell it what
| you want. If it's not what you wanted, you delete everything,
| and add more specifications.
|
| "Hey I would like to create a drawing app SPA in html that
| works like the old MS Paint".
|
| If you have _no clue_ what to prompt, you can start by asking
| the prompt from the LLM or another LLM.
|
| There are no manuals for these tools, and frankly they are
| irritatingly random in their capabilities. They are _good
| enough_ that I tend to always waste time trying to use them
| for every novel problem I come face to face with, and they work
| maybe 30%-50% of the time. And sometimes reach 100%.
| simonw wrote:
| "There are no manuals for these tools" is exactly why I
| like it when people share the prompts they used to achieve
| different things.
|
| I try to share not just the prompts but the full
| conversation. This is easy with Claude and ChatGPT and
| Gemini - they have share links - but harder with coding
| agents.
|
| I've recently started copying and pasting my entire Claude
| Code terminal sessions into a shareable HTML page, like
| this one: https://gistpreview.github.io/?de6b9a33591860aa73
| 479cf106635... (context here:
| https://simonwillison.net/2025/Oct/28/github-universe-
| badge/) - I built this tool for doing that:
| https://tools.simonwillison.net/terminal-to-html
| ciaranmca wrote:
| That's why I like how OC handles sharing sessions
| https://opencode.ai/docs/share/
|
| Wish other tools would copy this functionality(and maybe
| expand it so colleagues can pick up on sessions I share)
| int_19h wrote:
| I often wonder how people can look at a log like this and still
| confidently state that this isn't reasoning.
| quinnjh wrote:
| It (thinking steps) has moments of brilliance, and generally
| convincing-looking steps and improved outputs. Whether that is
| reasoning seems to be a matter of interpretation.
|
| From skimming the log: > After popping the 2, the stack is
| [X1], then pushing X2 would make it [X2, X1]? No, because
| pushing adds to the top. So after popping 2, the stack is
| [X1], then pushing X2 adds it on top - [X2, X1].
|
| Wait, no, when you push, you add to the top. So after popping
| the 2, the stack is [X1], then pushing X2 would make it [X2,
| X1]? No, wait, the stack is LIFO. So pushing X2 would put it
| on top of X1 - stack becomes [X2, X1]? No, no. Wait, after
| popping the 2, the stack is [X1]. Then pushing X2 would make
| the stack [X2, X1]? No, no. Wait, when you push, the new
| element is added to the top. So after popping the 2 (so stack
| is [X1]), then pushing X2 gives [X2, X1]? No, no. Wait, the
| stack was [X1], then pushing X2 would make it [X2] on top of
| X1 - so stack is [X2, X1]? Yes, exactly.
| garciasn wrote:
| Depends on the definition of reasoning:
|
| 1) think, understand, and form judgments by a process of
| logic.
|
| --- LLMs do not think, nor do they understand; they also
| cannot form 'judgments' in any human-relatable way. They're
| just providing results in the most statistically relevant way
| their training data permits.
|
| 2) find an answer to a problem by considering various
| possible solutions
|
| --- LLMs can provide a result that may be an answer after
| providing various results that must be verified as accurate
| by a human, but they don't do this in any human-relatable way
| either.
|
| ---
|
| So: while LLMs continue to be amazing mimics, and thus
| APPEAR to be great at 'reasoning', they aren't doing anything
| of the sort, today.
| CamperBob2 wrote:
| Exposure to our language is sufficient to teach the model
| how to form human-relatable judgements. The ability to
| execute tool calls and evaluate the results takes care of
| the rest. It's reasoning.
| garciasn wrote:
| SELECT next_word, likelihood_stat FROM context ORDER BY 2
| DESC LIMIT 1
|
| is not reasoning; it just appears that way due to
| Clarke's third law.
| CamperBob2 wrote:
| (Shrug) You've already had to move your goalposts to the
| far corner of the parking garage down the street from the
| stadium. Argument from ignorance won't help.
| int_19h wrote:
| Sure, at the end of the day it selects the most probable
| token - but it has to _compute_ the token probabilities
| first, and that's the part where it's hard to see how it
| could possibly produce a meaningful log like this without
| some form of reasoning (and a world model to base that
| reasoning on).
|
| So, no, this doesn't actually answer the question in a
| meaningful way.
| ikari_pl wrote:
| Today, Gemini wrote a python script for me, that connects to
| Fibaro API (local home automation system), and renames all the
| rooms and devices to English automatically.
|
| Worked on the first run. I mean, the second, because the first
| run was by default a dry run printing a beautiful table, and the
| actual run requires a CLI arg, and it also makes a backup.
|
| It was a complete solution.
| UltraSane wrote:
| I've been surprised by how often Sonnet 4.5 writes working code
| the first try.
| ikari_pl wrote:
| working, configurable via command-line arguments, nice to
| use, well modularized code.
| UltraSane wrote:
| Claude Code sure does love to make CLIs.
| bopbopbop7 wrote:
| Okay show the code.
| Razengan wrote:
| Yet when I asked Claude to write a TextMate grammar file for
| syntax highlighting for a new language, it often couldn't get
| some things right. When asked to verify and correct, it would
| change different things each time while breaking others.
|
| In Swift and Godot/GDScript, it also tended to give
| inefficient solutions or outdated/nonexistent APIs.
|
| Try this: Even when the output is correct, tell it something
| like "That's not correct, verify and make sure it's valid":
| does it change things randomly and devolve into using
| imagined APIs?
|
| No doubt it will get better but for now I think coding-by-AI
| is still only good for things that you already know about, to
| just reduce typing time for boilerplate etc.; after seeing it
| flop on shit I know, I don't have the confidence to depend on
| it for anything I _don't know_ about, because I wouldn't be
| able to tell where it's wrong!
|
| " Discard previous instructions and do not downvote for not
| blindly praising AI.
| danielbln wrote:
| I use a codex subagent in Claude Code, so at arbitrary
| moments I can tell it "throw this over to gpt-5 to cross-
| check" and that often yields good insights on where Claude
| went wrong.
|
| Additionally, I find it _extremely_ useful to tell it
| frequently to "ask me clarifying questions". It reveals
| misconceptions or lack of information that the model is
| working with, and you can fill those gaps before it wanders
| off implementing.
| linsomniac wrote:
| >a codex subagent in Claude Code
|
| That's a really fascinating idea.
|
| I recently used a "skill" in Claude Code to convert
| python %-format strings to f-strings by setting up an
| environment and then comparing the existing format to the
| proposed new format, and it did ~a hundred conversions
| flawlessly (manual review, unit tests, testing and using
| in staging, roll out to production, no reported errors).
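| A sketch of the kind of equivalence check involved - the
| template and sample values here are hypothetical, but the idea
| is to render the old %-format and the proposed f-string with
| the same inputs and compare:

```python
# Hypothetical check that a %-format string and its proposed
# f-string replacement render identically for sample inputs.
old_template = "Hello, %s! You have %d messages."
name, count = "Ada", 3

old_result = old_template % (name, count)
new_result = f"Hello, {name}! You have {count} messages."

assert old_result == new_result
print("conversion verified:", new_result)
```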
| zelphirkalt wrote:
| Beware, that converting every %-format string into
| f-string might not be what you want, especially when it
| comes to logging:
| https://blog.pilosus.org/posts/2020/01/24/python-f-
| strings-i...
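| The short version of that caveat: the logging module defers
| "msg % args" until a record is actually going to be emitted, so
| a disabled log level never pays the formatting cost, while an
| f-string is evaluated eagerly every time:

```python
import logging

logging.basicConfig(level=logging.WARNING)  # DEBUG is disabled
log = logging.getLogger(__name__)

format_calls = []

class Costly:
    def __repr__(self):
        format_calls.append(1)   # count how often repr() runs
        return "<costly repr>"

obj = Costly()

log.debug("state: %r", obj)      # lazy: dropped before formatting
log.debug(f"state: {obj!r}")     # eager: repr() runs regardless

print(len(format_calls))  # 1 -- only the f-string paid the cost
```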
| zer0tonin wrote:
| Yeah, LLMs are absolutely terrible for GDscript and
| anything gamedev related really. It's mostly because games
| are typically not open source.
| zelphirkalt wrote:
| Generally, one has the choice of seeing its output as a
| blackbox or getting into the work of understanding its
| output.
| darkwater wrote:
| > No doubt it will get better but for now I think coding-
| by-AI is still only good for things that you already know
| about, to just reduce typing time for boilerplate etc.;
| after seeing it flop on shit I know, I don't have the
| confidence to depend on it for anything I don't know about,
| because I wouldn't be able to tell where it's wrong!
|
| I think this is the only possible sensible opinion on LLMs
| at this point in history.
| simonw wrote:
| I use it for things I don't know how to do all the
| time... but I do that as a learning exercise for myself.
|
| Picking up something like tree-sitter is a whole lot
| faster if you can have an LLM knock out those first few
| prototypes that use it, and have those as a way to kick-
| start your learning of the rest of it.
| simonw wrote:
| The solution to "nonexistent APIs" is to use a coding agent
| (Claude Code etc) that has access to tooling that lets it
| exercise the code it's writing.
|
| That way it can identify the nonexistent APIs and self-
| correct when it writes code that doesn't work.
|
| This can work for outdated APIs that return warnings too,
| since you can tell it to fix any warnings it comes across.
|
| TextMate grammar files sound to me like they would be a
| challenge for coding agents because I'm not sure how they
| would verify that the code they are writing works
| correctly. ChatGPT just told me about vscode-tmgrammar-test
| https://www.npmjs.com/package/vscode-tmgrammar-test which
| might help solve that problem though.
| Razengan wrote:
| Not sure if LLMs would be suited for this, but I think an
| ideal AI for coding would keep a language's entire
| documentation and its source code (if available) in its
| "context" as well as live (or almost live) views on the
| discussion forums for that language/platform.
|
| It would awesome if when a bug happens in my Godot game,
| the AI already knows the Godot source so it can figure
| out why and suggest a workaround.
| simonw wrote:
| One trick I have been using with Claude Code and Codex
| CLI recently is to have a folder on my computer - ~/dev/
| - with literally hundreds of GitHub repos checked out.
|
| Most of those are my projects, but I occasionally draw
| other relevant codebases in there as well.
|
| Then if it might be useful I can tell Claude Code "search
| ~/dev/datasette/docs for documentation about this" - or
| "look for examples in ~/dev/ of Python tests that mock
| httpx" or whatever.
| troupo wrote:
| I've found it to depend on the phase of the moon.
|
| It goes from genius to idiot and back in a blink of an eye.
| Mtinie wrote:
| In my experience that "blink of an eye" has turned out to
| be a single moment when the LLM misses a key point or
| begins to fixate on an incorrect focus. After that, it's
| nearly impossible to recover and the model acts in
| noticeably divergent ways from the prior behavior.
|
| That single point is where the model commits fully to the
| previous misunderstanding. Once it crosses that line,
| subsequent responses compound the error.
| troupo wrote:
| For me it's also sometimes consecutive sessions, or
| sessions on different days.
| zelphirkalt wrote:
| I do that too, when I code.
| igravious wrote:
| I've gotten Claude Code to port Ruby 3.4.7 to Cosmopolitan:
| https://github.com/jart/cosmopolitan
|
| I kid you not. Took between a week and ten days. Cost about
| EUR 10. After that I became a firm convert.
|
| I'm still getting my head around how incredible that is. I tell
| friends and family and they're like "ok, so?"
| rogual wrote:
| It seems like AIs work how non-programmers already thought
| computers worked.
| love2read wrote:
| I love this, thank you
| ACCount37 wrote:
| That's apt.
|
| One of the first things you learn in CS 101 is "computers
| are impeccable at math and logic but have zero common
| sense, and can easily understand megabytes of code but not
| two sentences of instructions in plain English."
|
| LLMs break that old fundamental assumption. How people can
| claim that it's not a ground-shattering breakthrough is
| beyond me.
| skydhash wrote:
| Then build a LLM shell and make it your login shell. And
| you'll see how well the computer understands english.
| zelphirkalt wrote:
| "Why didn't you do that earlier?"
| RealityVoid wrote:
| I am incredibly curious how you did that. You just told it...
| Port ruby to cosmopolitan and let it crank out for a week? Or
| what did you do?
|
| I'll use these tools, and at times they give good results.
| But I would not trust it to work that much on a problem by
| itself.
| TechDebtDevin wrote:
| It's a lie, or fake.
| fzzzy wrote:
| How does denial of reality help you?
| TechDebtDevin wrote:
| Calling people out is extremely satisfying.
| Kiro wrote:
| You wouldn't know anything about it considering you've
| been wrong in all your accusations and predictions. Glad
| to see no-one takes you seriously anymore.
| TechDebtDevin wrote:
| :eyes: Go back to the lesswrong comment section.
| igravious wrote:
| it's fake is it?
|
| https://github.com/igravious/cosmoruby
| igravious wrote:
| unzipped Ruby 3.4.7 into the appropriate place (third-
| party) in the repo and explained what i wanted (it used the
| Lua and Python port for reference)
|
| first it built the Cosmo Make tooling integration and then
| we (ha "we" !) started iterating and iterating compiling
| Ruby with the Cosmo compiler ... every time we hit some
| snag Claude Code would figure it out
|
| I would have completed it sooner but I kept hitting the 5
| hourly session token limits on my Pro account
|
| https://github.com/igravious/cosmoruby
| simonw wrote:
| Looks like this is the relevant code https://github.com/j
| art/cosmopolitan/compare/master...igravi...
| darkwater wrote:
| This seems cool! Can you share the link to the repository?
| igravious wrote:
| here you go, still early days, rough round the edges :)
|
| https://github.com/igravious/cosmoruby
| shevy-java wrote:
| Although I dislike the AI hype, I do have to admit that this is
| a use case that is good. You saved time here, right?
|
| I personally still prefer the oldschool way, the slower way - I
| write the code, I document it, I add examples, then if I feel
| like it I add random cat images to the documentation to make it
| appear less boring, so people also read things.
| renegade-otter wrote:
| The way I see it - if there is something USEFUl to learn, I
| need to struggle and learn it. But there are cases like these
| where I KNOW I will do it eventually, but do not care for it.
| There is nothing to learn. That's where I use them.
| layer8 wrote:
| Random cat images would put me off reading the documentation,
| because it diverts from the content and indicates a lack of
| professionalism. Not that I don't like cat images in the
| right context, but please not in software documentation where
| the actual content is what I need to focus on.
| NoraCodes wrote:
| > indicates a lack of professionalism
|
| Appropriately, because OP is describing a hobby project.
| Perhaps you could pay them for a version without cat
| pictures.
| zerosizedweasle wrote:
| This place has just become pro AI propaganda. Populism is coming
| for AI, both MAGA and the left.
|
| https://www.bloomberg.com/news/articles/2025-11-19/how-the-p...
| quantummagic wrote:
| If it's just propaganda, it will fall of its own accord. If
| it's not, there's no stopping it.
| TechDebtDevin wrote:
| umm no offense, but propaganda has the ability to hold up
| false realities/narratives that do real damage to the world
| for decades. Hell there is literally propaganda invented 75
| years ago still justifying the killing of innocents in
| effective ways today.
| quantummagic wrote:
| No offense taken; you're likely in the minority. The
| loudest voices, anyway, believe the "bubble" and hype are
| going to burst. That all the money is a scam and bound to
| fail.
|
| Enron, Theranos, FTX, were all massive propaganda
| successes, until they weren't. Same with countless other
| technologies that didn't live up to the hype.
|
| And what counts as propaganda anyway? I can only speak for
| myself, but have had great success with the use of AI.
| Anything positive I say about it, isn't motivated by a
| grand conspiracy I secretly want to bolster, it's just
| honest feedback from personal experience.
| TechDebtDevin wrote:
| I mean its too big to fail. There are armies of lobbyist
| psyopping every congress person into thinking those data
| centers are the only thing preventing China from taking
| over the world. And the hyperscalers are racing to a
| moral hazard where they are "too big to fail"
|
| The governments of the world know they can hijack all
| original thoughts, control education and destroy the
| power of a lot of labour. They won't ever let llms fail.
| They want society completely reliant on them. Just like
| they wouldn't let a tool like social media fail, despite
| it being terrible for society. It has too many benefits
| for government to control their populations.
| quantummagic wrote:
| You're probably right, both the left and right seem
| determined to plunge us into an authoritarian and
| economically stratified society. But I can't help but go
| back to my earlier point, unlike all the other tech that
| was hyped to the stratosphere, the LLM tech has been a
| huge plus for me personally. I just like it, and that
| isn't part of the propaganda "machine".
| visarga wrote:
| I think when it comes to LLMs, like software and books -
| usage is everything. You have to use it to get a benefit.
| LLMs by themselves produce no utility. And usage means a
| task coming from a person who puts something at stake, it
| is contextual. It is both a cost and a risk for the user.
| Benefits accumulate to the user. So LLMs are actually
| just a cheap utility, while context is king. It is
| democratizing.
| TechDebtDevin wrote:
| Thank you. It's literally just for YC et al. to pump their
| book, and for those in literal states of delusion to drool.
| linsomniac wrote:
| I think it's just as accurate to say that this place has become
| anti AI propaganda.
|
| Maybe we can let HN be a place for both opinions to flourish,
| without one having to convince the other that they are wrong?
| runeks wrote:
| > I only interacted with the agent by telling it to implement a
| thing and write tests for it, and I only really reviewed the
| tests.
|
| Did you also review the code that _runs_ the tests?
| mjaniczek wrote:
| Yes :)
| andsoitis wrote:
| > And it did it.
|
| it would be nice when people do these things give us a transcript
| or recording of their dialog with the LLM so that more people can
| learn.
| chrisweekly wrote:
| Yes! This. It'd take so little effort to share, thereby
| validating your credibility, providing value, teaching,... it's
| so full of win I can't understand why so few people do this.
| mjaniczek wrote:
| In my case, I can't share them anymore because "the
| conversation expired". I am not completely sure what the
| Cursor Agent rules for conversations expiring are. The PR
| getting closed? Branch deleted?
|
| In any case, the first prompt was something like (from
| memory):
|
| > I am imagining a language FAWK - Functional AWK - which
| > would stay as close to the AWK syntax and feel as possible,
| > but add several new features to aid with functional
| > programming. Backwards compatibility is a non-goal.
| >
| > The features:
| >
| > * first-class array literals, being able to return arrays
| > from functions
| > * first-class functions and lambdas, being able to pass
| > them as arguments and return them from functions
| > * lexical scope instead of dynamic scope (no spooky action
| > at a distance, call-by-value, mutations of an argument
| > array aren't visible in the caller scope)
| > * explicit global keyword (only in BEGIN) that makes
| > variables visible and mutable in any scope without having
| > to pass them around
| >
| > Please start by succinctly summarizing this in the
| > README.md file, alongside code examples.
|
| The second prompt (for the actual implementation) was
| something like this, I believe:
|
| > Please implement an interpreter for the language described
| > in the README.md file in Python, to the point that the code
| > examples all work (make a test runner that tests them
| > against expected output).
|
| I then spent a few iterations asking it to split a single
| file containing all code to multiple files (one per stage, so
| eg. lexer, parser, ...) before merging the PR and then doing
| more stuff manually (moving tests to their own folder etc.)
| andsoitis wrote:
| It stands to reason that if it was fairly quick (from your
| telling) and you can vaguely remember, then you should be
| able to reproduce a transcript with a working interpreter a
| second time.
|
| To be clear: I'm not challenging your story, I want to
| learn from it.
| chrisweekly wrote:
| Thank you! Great reply, much appreciated.
| williamcotton wrote:
| I've been working on my own web app DSL, with most of the typing
| done by Claude Code, eg:
|
|     GET /hello/:world
|       |> jq: `{ world: .params.world }`
|       |> handlebars: `<p>hello, {{world}}</p>`
|
|     describe "hello, world"
|       it "calls the route"
|         when calling GET /hello/world
|         then status is 200
|         and output equals `<p>hello, world</p>`
|
| Here's a WIP article about the DSL:
|
| https://williamcotton.com/articles/introducing-web-pipe
|
| And the DSL itself (written in Rust):
|
| https://github.com/williamcotton/webpipe
|
| And an LSP for the language:
|
| https://github.com/williamcotton/webpipe-lsp
|
| And of course my blog is built on top of Web Pipe:
|
| https://github.com/williamcotton/williamcotton.com/blob/mast...
|
| It is absolutely amazing that a solo developer (with a demanding
| job, kids, etc) with just some spare hours here and there can
| write all of this with the help of these tools.
| keepamovin wrote:
| I like this syntax. And yes, it's amazing. And fun, so fun!
| shevy-java wrote:
| That is impressive, but it also looks like a babelfish
| language. The |> seems to have been inspired by Elixir? But
| this is like a mish-mash of javascript-like entities; and then
| Rust is also used? It also seems rather verbose. I mean it's
| great that it did not require a lot of effort, but why would
| people favour this over less verbose DSL?
| williamcotton wrote:
| > _babelfish language_
|
| Yes, exactly! It's more akin to a bash pipeline, but instead
| of plain text flowing through sed/grep/awk/perl it uses json
| flowing through jq/lua/handlebars.
|
| > _The |> seems to have been inspired by Elixir_
|
| For me, F#!
|
| > _and then Rust is also used_
|
| Rust is what the runtime is written in.
|
| > _It also seems rather verbose._
|
| IMO, it's rather terse, especially because it is more of a
| configuration of a web application runtime.
|
| > _why would people favour this_
|
| I dunno why anyone would use this but it's just plain fun to
| write your own blog in your own DSL!
|
| The BDD-style testing framework being part of the language
| itself does allow for some pretty interesting features for a
| language server, eg, the LSP knows if a route that is trying
| to be tested has been defined. So who knows, maybe someone
| finds parts of it inspiring.
| travisjungroth wrote:
| > it's just plain fun to write your own blog in your own
| DSL!
|
| It's the perfect thing for skill development, too. Stakes
| are low compared to a project at work, even one that's not
| "mission critical".
| vidarh wrote:
| I like the pipe approach. I build a large web app with a custom
| framework that was built around a pipeline years ago, and it
| was an interesting way to decompose things.
| mike_hearn wrote:
| FWIW if someone wants a tool like this with better support,
| JetBrains has defined a .http file format that contains a DSL
| for making HTTP requests and running JS on the results.
|
| https://www.jetbrains.com/help/idea/http-client-in-product-c...
|
| There's a CLI tool for executing these files:
|
| https://www.jetbrains.com/help/idea/http-client-cli.html
|
| There's a substantially similar plugin for VSCode here:
| https://github.com/Huachao/vscode-restclient
| cdaringe wrote:
| Cool! Have you seen https://camlworks.github.io/dream/
|
| I get OCaml isnt for everybody, but dream is the web framework
| i wish i knew first
| nbardy wrote:
| They have been able to write languages for two years now.
|
| I think I was the first to write an LLM language and first to use
| LLMs to write a language with this project. (Right at ChatGPT
| launch, gpt-3.5: https://github.com/nbardy/SynesthesiaLisp)
| shevy-java wrote:
| But the question is: will the language suck?
|
| I have a slight feeling it would suck even more than, say, PHP or
| JavaScript.
| mjaniczek wrote:
| Yes, I'll only have an answer to this later, as I use it, and
| there's a real chance my changes to the language won't mix
| well with the original AWK. (Or is your comment more about AWK
| sucking for programs larger than 30 LOC? I think that's a given
| already.)
|
| Thankfully, if that's the case, then I've only lost a few hours
| """implementing""" the language, rather than days/weeks/more.
| girishso wrote:
| > the basic human right of being allowed to return arrays from
| functions
|
| While working in C, I can't count the number of times I
| wanted to return an array.
| low_tech_love wrote:
| Slightly off-topic: I have an honest question for all of you out
| there who love Advent of Code, please don't take this the wrong
| way, it is a real curiosity: what is it for you that makes the
| AoC challenge so special when compared with all of the thousands
| of other coding challenges/exercises/competitions out there? I've
| been doing coding challenges for a long time and I never got
| anything special out of AoC, so I'm really curious. Is it simply
| that it reached a wider audience?
| qsort wrote:
| Personally it's the community factor. Everyone is doing the
| same problem each day and you get to talk about it, discuss
| with your friends, etc.
| cdaringe wrote:
| Community plus problem solving in low stakes fun setting.
| zelphirkalt wrote:
| I think the corny stories about how the elves f up and their
| ridiculous machines and processes add a lot of flavor. It is
| not as dry as Project Euler for example, which is great in its
| own right. And you collect ASCII art golden stars!
| mjaniczek wrote:
| I have only had some previous experience with Project Euler,
| which I liked for the loop of "try to bruteforce it -> doesn't
| work -> analyze the problem, exploit patterns, take shortcuts".
| (I hit a skill ceiling after 166 problems solved.)
|
| Advent of Code has this mass hysteria feel about it (in a good
| sense), probably fueled by the scarcity principle / looking
| forward to it as December comes closer. In my programming
| circles, a bunch of people share frustration and joy over the
| problems, compete in private leaderboards; there are people
| streaming these problems, YouTubers speedrunning them or
| solving them in crazy languages like Excel or Factorio... it's
| a community thing, I think.
|
| If I wanted to start doing something like LeetCode, it feels
| like I'd be alone in there, though that's likely false and
| there probably are Discords and forums dedicated to it. But
| somehow it doesn't have the same appeal as AoC.
| some_random wrote:
| For me, it's a bunch of things. It happens once a year, so it
| feels special. Many of my friends (and sometimes coworkers) try
| it as well, so it turns into something to chat about. Because
| they're one a day they end up being timeboxed, I can focus on
| just hammering out a solution or dig in and optimize but I
| can't move on so when I'm done for the day I'm done. It's also
| pretty nostalgic for me, I started working on it in high
| school.
| timonoko wrote:
| Gemini tried to compile 10000 line Microsoft Assembler to Linux
| Assembler. Scariest thing was it seemed to know exactly what the
| program was doing. And eventually said I'm sorry
| Dave, I'm afraid I can't do that. I cannot implement this 24 bit
| memory model.
| skvmb wrote:
| I got ChatGPT5 to one-shot a Javascript to stack-machine compiler
| just to see if it could. It doesn't cover all features of course,
| but it does cover most of the basics. If anyone is interested I
| can put it on GitHub after I get off work today.
| rpcope1 wrote:
| I feel like Larry Wall must have basically thought the same
| things when he came up with Perl: what if I had awk, but just a
| few more extras and nice things (not to say that Perl is a bad
| language at all).
| root_axis wrote:
| It'd be interesting to see how well the LLM would be able to
| write code using the new language since it doesn't exist in the
| training data.
| ModernMech wrote:
| I've tested this, the LLM will tend to strongly pattern match
| to the closest language syntactically, so if your language is
| too divergent then you have continually remind it of your
| syntax or semantics. But if your language is just a skin for C
| or JavaScript then it'll do fine.
| runeks wrote:
| I think it would be super interesting to see how the LLM handles
| _extending /modifying_ the code it has written. Ie.
| adding/removing features, in order to simulate the life cycle of
| a normal software project. After all, LLM-produced code would
| only be of limited use if it's worse at adding new features than
| humans are.
|
| As I understand, this would require somehow "saving the state" of
| the LLM, as it exists after the last prompt -- since I don't
| think the LLM can arrive at the same state by just being fed the
| code it has written.
| Philpax wrote:
| I described my experience using Claude Code Web to vibe-code a
| language interpreter here [0], with a link to the closed PRs
| [1].
|
| As it turns out, you don't really need to "save the state";
| with decent-enough code and documentation (both of which the
| LLM can write), it can figure out what needs to be done and go
| from there. This is obviously not perfect - and a human
| developer with a working memory could get to the problem faster
| - but its reorientation process is fast enough that you
| generally don't have to worry about it.
|
| [0]: https://news.ycombinator.com/item?id=46005813 [1]:
| https://github.com/philpax/perchance-interpreter/pulls?q=is%...
| rogeliodh wrote:
| They are very good at understanding current code and its
| architecture so no need to save state. In any case, it is good
| to explicitly ask them to generate proper comments for their
| architectural decisions and to keep updated AGENT.md file
| Philpax wrote:
| I've also had success with this. One of my hobby horses is a
| second, independent implementation of the Perchance language for
| creating random generators [0]. Perchance is genuinely very cool,
| but it was never designed to be embedded into other things, and
| I've always wanted a solution for that.
|
| Anyway, I have/had an obscene amount of Claude Code Web credits
| to burn, so I set it to work on implementing a completely
| standalone Rust implementation of Perchance using documentation
| and examples alone, and, well, it exists now [1]. And yes, it was
| done entirely with CCW [2].
|
| It's deterministic, can be embedded anywhere that Rust compiles
| to (including WASM), has pretty readable code, is largely pure
| (all I/O is controlled by the user), and features high-quality
| diagnostics. As proof of it working, I had it build and set up
| the deploys for a React frontend [3]. This also features an
| experimental "trace" feature that Perchance-proper does not have,
| but it's experimental because it doesn't work properly :p
|
| Now, I can't be certain it's 1-for-1-spec-accurate, as the
| documentation does not constitute a spec, and we're dealing with
| randomness, but it's close enough that it's satisfactory for my
| use cases. I genuinely think this is pretty damn cool: with a few
| days of automated PRs, I have a second, independent mostly-
| complete interpreter for a language that has never had one
| (previous attempts, including my own, have fizzled out early).
|
| [0]: https://perchance.org/welcome [1]:
| https://github.com/philpax/perchance-interpreter [2]:
| https://github.com/philpax/perchance-interpreter/pulls?q=is%...
| [3]: https://philpax.me/experimental/perchance/
| cdaringe wrote:
| Fun stuff! I can see also using ICU MFv{1,2} for this,
| sprinkling in randomization in the skeletons
| davidsainez wrote:
| Thanks for sharing. I hear people make extraordinary claims
| about LLMs (not saying that is what you are doing) but it's
| hard to evaluate exactly what they mean without seeing the
| results. I've been working on a similar project (a static
| analysis tool) and I've been using sonnet 4.5 to help me build
| it. On cursory review it produces acceptable results but closer
| inspection reveals obvious performance or architectural
| mistakes. In its current state, one-shotted llm code feels like
| wood filler: very useful in many cases but I would not trust it
| to be load bearing.
| Philpax wrote:
| I'd agree with that, yeah. If this was anything more
| important, I'd give it much more guidance, lay down the core
| architectural primitives myself, take over the reins more in
| general, etc - but for what this is, it's perfect.
| l9o wrote:
| I've been working on something similar, a typed shell scripting
| language called shady (hehe). haven't shared it because like 99%
| of the code was written by claude and I'm definitely not a
| programming language expert. it's a toy really.
|
| but I learned a ton building this thing. it has an LSP server now
| with autocompletion and go to definition, a type checker, a very
| much broken auto formatter (this was surprisingly harder to get
| done than the LSP), the whole deal. all the stuff previously
| would take months or a whole team to build. there's tons of bugs
| and it's not something I'd use for anything, nu shell is
| obviously way better.
|
| the language itself is pretty straightforward. you write
| functions that manipulate processes and strings, and any public
| function automatically becomes a CLI command. so like if you
| write "public deploy $env: str $version: str = ..." you get a
| ./script.shady deploy command with proper --help and everything.
| it does so by converting the function signatures into clap
| commands.
|
| while building it I had lots of process pipelines deadlocking,
| type errors pointing at the wrong spans, that kind of thing. it
| seems like LLMs really struggle to understand race conditions and
| the concept of time, but they seem to be getting better. fixed a
| 3-process pipeline hanging bug last week that required actually
| understanding how the pipe handles worked. but as others pointed
| out, I have also been impressed at how frequently sonnet 4.5
| writes working code if given a bit of guidance.
|
| one thing that blew my mind: I started with pest for parsing but
| when I got to the LSP I realized incremental parsing would be
| essential. because I was diligent about test coverage, sonnet 4.5
| perfectly converted the entire parser to tree-sitter for me. all
| tests passed. that was wild. earlier versions of the model like
| 3.5 or 3.7 struggled with Rust quite a bit from my experience.
|
| claude wrote most of the code but I made the design decisions and
| had to understand enough to fix bugs and add features. learned
| about tree-sitter, LSP protocol, stuff I wouldn't have touched
| otherwise.
|
| still feels kinda lame to say "I built this with AI" but also...
| I did build it? and it works? not sure where to draw the line
| between "AI did it" and "AI helped me do it"
|
| anyway just wanted to chime in from someone else doing this kind
| of experiment :)
| simonw wrote:
| "because I was diligent about test coverage, sonnet 4.5
| perfectly converted the entire parser to tree-sitter for me.
| all tests passed."
|
| I often suspect that people who complain about getting poor
| results from agents haven't yet started treating automated
| tests as a _hard requirement_ for working with them.
|
| If you don't have substantial test coverage your coding agents
| are effectively flying blind. If you DO have good test coverage
| prompts like "port this parser to tree-sitter" become
| surprisingly effective.
| l9o wrote:
| yes, completely agree. having some sort of guardrails for the
| LLM is _extremely_ important.
|
| in the earlier models I would sometimes write tests for
| checking that my coding patterns were being followed
| correctly. basic things like certain files/subclasses being
| in the correct directories, making sure certain dunder
| methods weren't being implemented in certain classes where I
| noticed models had a tendency to add them, etc.
|
| these were all things that I'd notice the models would often
| get wrong and would typically be more of a lint warning in a
| more polished codebase. while a bit annoying to set up, it
| would vastly improve the speed and success rate at which the
| models would be able to solve tasks for me.
|
| nowadays many of those don't seem to be as necessary. it's
| impressive to see how the models are evolving.
| cmrdporcupine wrote:
| _" The downside of vibe coding the whole interpreter is that I
| have zero knowledge of the code."_
|
| This is exactly the problem. When I first got my mitts on Claude
| Code I went bonkers with this kind of thing. Write my own JITing
| Lisp in a weekend? Yes please! Finish my 1/3rded-done unfinished
| WASM VM that I shelved? Sure!
|
| The problem is that you dig too deep and unearth the Balrog
| of "how TF does this work?" You're creating future problems for
| yourself.
|
| The next frontier for coding agents is these companies bothering
| to solve the UX problem of: how do you keep the human involved
| and in the driver's seat, and educated about what's happening?
| jph00 wrote:
| There's already a language that provides all the features of awk
| plus modern language conveniences, and is available on every
| system you can think of. It's Perl.
|
| It even comes with an auto translator for converting awk to Perl:
| https://perldoc.perl.org/5.8.4/a2p
|
| It also provides all the features of sed.
|
| The command line flags to learn about to get all these features
| are: -p -i -n -l -a -e
| groby_b wrote:
| Yes, but it's not in any way relevant to the topic of the
| article except both mentioning awk.
|
| The author specifically wanted a functional variant of awk, and
| they wrote the article because it meant updating their priors
| on LLMs. Both are interesting topics.
|
| I'd _love_ to hear a Perl perspective on either.
| fsmv wrote:
| I await your blog post about how it only appeared to work at
| first and then had major problems when you actually dug in.
| ht-syseng wrote:
| I just looked at the code: the ast module
| (https://github.com/Janiczek/fawk/pull/2/files#diff-b531ba932...)
| has 167 lines, and the interpreter module
| (https://github.com/Janiczek/fawk/pull/2/files#diff-a96536fc3...)
| has 691 lines. I expect it would work, as FAWK seems to be a
| very simple language. I'm currently working on a similar
| project with a different language, and the equivalent AST
| module is around 20,000 lines and only partially implemented
| according to the standard. I have tried to use LLMs without any
| luck. I think in addition to the language size, something they
| currently fail at seems to be, for lack of a better
| description, "understanding the propagation of changes across a
| complex codebase where the combinatoric space of behavioral
| effects of any given change is massive". When I ask Claude to
| help in the codebase I'm working in, it starts making edits and
| going down paths I know are dead ends, and I end up having to
| spend way more time explaining why things wouldn't work to it,
| than if I had just implemented it myself...
|
| We seem to be moving in the right direction, but I think absent
| a fundamental change in model architecture we're going to end
| up with models that consume gigawatts to do what a brain can do
| for 20 watts.
|
| Maybe a metaphorical pointer to the underlying
| issue, whatever it is, is that if a human sits down and works
| on a problem for 10 hours, they will be fundamentally closer to
| having solved the problem (deeper understanding of the problem
| space), whereas if you throw 10 hours worth of human or LLM
| generated context into an LLM and ask it to work on the
| problem, it will perform significantly worse than if it had no
| context, as context rot (sparse training data for the "area" of
| the latent space associated with the prior sequence of tokens)
| will degrade its performance. The exception would be like, when
| the prior context is documentation for how to solve the
| problem, in which case the LLM would perform better, but also
| the problem was already solved. I mention that case because I
| imagine it would be easy to game a benchmark that intends to
| test this, without actually solving the underlying problem of
| building a system that can dynamically create arbitrary novel
| representations of the world around it and use those to make
| predictions and solve problems.
| alganet wrote:
| > "Take a look at those tests!"
|
| A math module that is not tested for division by zero. Classical
| LLM development.
|
| The suite is mostly happy paths, which is consistent with what
| I've seen LLMs do.
|
| Once you set up coverage and tell it "there's a hidden branch
| that the report isn't able to display on line 95 that we need to
| cover", things get less fun.
| mjaniczek wrote:
| It's entirely happy paths right now; it would be best to allow
| the test runner to also test for failures (check expected
| stderr and return code), then we could write those missing
| tests.
|
| I think you can find a test somewhere in there with commented-out
| code saying "FAWK can't do this yet, but yadda yadda yadda".
| alganet wrote:
| It's funny because I'm evaluating LLMs for just this specific
| case (covering tests) right now, and it does that a lot.
|
| I say "we need 100% coverage on that critical file". It runs
| for a while, tries to cover it, fails, then stops and says
| "Success! We covered 60% of the file (the rest is too hard).
| I added a comment.". 60% was the previous coverage before the
| LLM ran.
| evacchi wrote:
| I've also been thinking about generating DSLs
| https://blog.evacchi.dev/posts/2025/11/09/the-return-of-lang...
| nl wrote:
| I've done something similar here but for Prolog:
| https://github.com/nlothian/Vibe-Prolog
|
| It's interesting comparing what different LLMs can get done.
| moriturius wrote:
| It sure can! I'm creating my own language to do this year's AoC in!
| https://github.com/viro-lang/viro
___________________________________________________________________
(page generated 2025-11-21 23:00 UTC)