_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   The highest quality codebase
       
       
        keeda wrote 1 hour 43 min ago:
        Hilarious! Kinda reinforces the idea that LLMs are like junior
        engineers with infinite energy.
        
        But just telling an AI it's a principal engineer does not make it a
        principal engineer. Firstly, that is such a broad, vaguely defined
        term, and secondly, typically that level of engineering involves
        dealing with organizational and industry issues rather than just
        technical ones.
        
        And so absent a clear definition, it will settle on the lowest common
        denominator of code quality, which would be test coverage -- likely
        because that is the most common topic in its training data -- and
        extrapolate from that.
        
        The other thing is, of course, the RL'd sycophancy which compels it to
        do something, anything, to obey the prompt. I wonder what would happen
         if I tweaked the prompt just a little bit to say something like "Use your
        best judgement and feel free to change nothing."
       
        mgrat wrote 2 hours 7 min ago:
         I come here asking with the greatest humility and from the perspective
         of an SRE in a highly regulated industry.  Where does AI have anything
         more to offer than pumping out TS crud apps?  Are any of the people
         doing this accountable for the tech debt this creates?
       
          credit_guy wrote 1 hour 53 min ago:
          I see this sentiment quite often. The Economist chose the "word of
          the year"; it is "slop". Everybody hates AI slop.
          
          And lots of people who use AI coding assistants go through a phase of
          pushing AI slop in prod. I know I did that. Some of it still bites me
          to this day.
          
          But here's the thing: AI coding assistants did not exist two years
          ago. We are critical of them based on unfounded expectations. They
          are tools, and they have limitations. They are far, very, very far,
          from being perfect. They will not replace us for 20 years, at least.
          
         But are they useful? Yes. Can you learn usage patterns so you
         eliminate AI slop as much as possible? I personally hope I did that;
          I think quite a lot of people who use AI coding assistants have found
          ways to tame the beast.
       
        culi wrote 2 hours 45 min ago:
         I checked the diffs of the `highest-quality` branch vs `main` and
         immediately noticed an `as any`. [1] Not what I would expect from a
         prompt like "you're a principal engineer".
        
 (HTM)  [1]: https://github.com/Gricha/macro-photo/compare/main...highest-q...
       
        chr15m wrote 2 hours 50 min ago:
        It behaved exactly like 99% of developers, introducing unnecessary
        complexity.
       
        whalesalad wrote 5 hours 0 min ago:
        I would love to see an experiment done like this with an arena of
        principal engineer agents. Give each of them a unique personality: this
        one likes shiny new objects and is willing to deal with early adopter
        pain, this one is a neckbeard who uses emacs as pid 1 and sends email
        via usb thumbdrive, and the third is a pragmatic middle of the road
        person who can help be the glue between them. All decisions need to
        reach a quorum before continuing. Better yet: each agent is running on
        a completely different model from a different provider. 3 can be a knob
         you dial up to 5, 10, etc. Each of these agents can spawn sub-agents
         to reach out to professionals like a CSS expert, or a DBA.
        
         I think prompt engineering could help here a bit: adding some context
         on what a quality codebase is, removing everything that is not
         necessary, considering future maintainability (20k -> 84k lines is a
         smell). All of these are smells that a simple supervisor agent could
         have caught.
       
        failuremode wrote 5 hours 28 min ago:
        > We went from around 700 to a whooping 5369 tests
        
        > Tons of tests got added, but some tests that mattered the most
        (maestro e2e tests that validated the app still works) were forgotten.
        
         I've often seen LLM proponents cite the number of tests as a
         positive signal.
        
        This smells, to me, like people who tout lines of code.
        
         When you are counting tests in the thousands I think it's a negative
         signal.
         
         You should be writing property-based tests rather than 'assert x=1',
         'assert x=2', 'assert x=-1' and on and on.
        
        If LLMs are incapable of acknowledging that then add it to the long
        list of 'failure modes'.
       
        layer8 wrote 6 hours 18 min ago:
        This makes me wonder what the result would be of having an AI turn a
        code base into literate-programming style, and have it iterate on that
        to improve the “literacy”.
       
        thomassmith65 wrote 7 hours 3 min ago:
        With a good programmer, if they do multiple passes of a refactor, each
        pass makes the code more elegant, and the next pass easier to
        understand and further improve.
        
        Claude has a bias to add lines of code to a project, rather than make
        it more concise. Consequently, each refactoring pass becomes more
        difficult to untangle, and harder to improve.
        
        Ideally, in this experiment, only the first few passes would result in
        changes - mostly shrinking the project size, and from then on, Claude
         would change nothing - just like a very good programmer.
        
        This is the biggest problem with developing with Claude, by far.
        Anthropic should laser focus on fixing it.
       
        blobbers wrote 7 hours 44 min ago:
        I'm curious if anyone has written a "Principal Engineer" agents.md or
        CLAUDE.md style file that yields better results than the 'junior dev'
        results people are seeing here.
        
         I've worked on writing some as a data scientist, and I have gotten the
         basic Claude output to be much better; it makes some saner decisions,
         it validates and circles back to fix things, etc.
       
        Bombthecat wrote 7 hours 55 min ago:
        Story of AI:
        
        For instance - it created a hasMinimalEntropy function meant to "detect
        obviously fake keys with low character variety". I don't know why.
       
        barbazoo wrote 8 hours 29 min ago:
        > I can sort of respect that the dependency list is pretty small, but
        at the cost of very unmaintainable 20k+ lines of utilities. I guess it
        really wanted to avoid supply-chain attacks.
        
         > Some of them are really unnecessary and could be replaced with an
         off-the-shelf solution
        
        Lots of people would regard this as a good thing. Surely the LLM can't
        guess which kind you are.
       
        jcalvinowens wrote 8 hours 53 min ago:
         This really mirrors my experience trying to get LLMs to clean up kernel
         driver code; they seem utterly incapable of simplifying things.
       
        lubesGordi wrote 8 hours 56 min ago:
        So now you know.  You can get claude to write you a ton of unit tests
        and also improve your static typing situation.    Now you can restrict
        your prompt!
       
        smallpipe wrote 9 hours 0 min ago:
        The viewport of this website is quite infuriating. I have to scroll
        horizontally to see the `cloc` output, but there's 3x the empty space
        on either side.
       
        ttul wrote 9 hours 3 min ago:
        Have you tried writing into the AGENTS.md something like, "Always be on
        the lookout for dead code, copy-pasta, and other opportunities to
        optimize and trim the codebase in a sensible way."
        
        In my experience, adding this kind of instruction to the context window
        causes SOTA coding models to actually undertake that kind of
        optimization while development carries on. You can also periodically
        chuck your entire codebase into Gemini-3 (with its massive context
        window) and ask it to write a refactoring plan; then, pass that
        refactoring plan back into your day-to-day coding environment such as
        Cursor or Codex and get it to take a few turns working away at the
        plan.
        
        As with human coders, if you let them run wild "improving" things
        without specifically instructing them to also pay attention to bloat,
        bloat is precisely what you will get.
       
        nadis wrote 9 hours 6 min ago:
        20K --> 84K lines of ts for a simple app is bananas. Much madness
        indeed! But also super interesting, thanks for sharing the experiment.
       
        jedberg wrote 9 hours 17 min ago:
         You know how when you hear how many engineers are working on a
         product, you think to yourself, "but I could do that with like
         three people!"?  Now you know why they have so many people.  Because
         they did this with their codebase, but with humans.
        
        Or I should say, they kept hiring the humans who needed something to
        do, and basically did what this AI did.
       
        jesse__ wrote 9 hours 22 min ago:
        > This app is around 4-5 screens. The version "pre improving quality"
        was already pretty large. We are talking around 20k lines of TS
        
        Fucking yikes dude.  When's the last time it took you 4500 lines per
        screen, 9000 including the JSON data in the repo?????  This is already
        absolute insanity.
        
        I bet I could do this entire app in easily less than half, probably
        less than a tenth, of that.
       
        maerF0x0 wrote 9 hours 29 min ago:
        I would love to see someone do a longitudinal study of the
        incident/error rate of a canary container in prod that is managed by
         claude. Basically running a control/experimental group to see whether
         the humans or the AI do better.
       
        minimaxir wrote 9 hours 36 min ago:
         About a year ago I wrote a blog post (HN discussion: [1] )
         experimenting with whether asking Claude to "write code better"
         repeatedly would indeed cause it to write better code, as determined by
         speed, since better code implies more efficient algorithms. I found
         that it did indeed work (at n=5 iterations), and that additionally
         providing an explicit system prompt improved it further.
        
         Given what I've seen from Claude 4.5 Opus, I suspect the following
        test would be interesting: attempt to have Claude Code +
        Haiku/Sonnet/Opus implement and benchmark an algorithm with:
        
        - no CLAUDE.md file
        
        - a basic CLAUDE.md file
        
        - an overly nuanced CLAUDE.md file
        
        And then both test the algorithm speed and number of turns it takes to
        hit that algorithm speed.
        
 (HTM)  [1]: https://news.ycombinator.com/item?id=42584400
       
        thald wrote 9 hours 37 min ago:
         Interesting experiment. Looking at this I immediately thought of a
         similar experiment run by Google: AlphaEvolve. Throwing LLM compute at
         problems might work if the problem is well defined and the result can
         be objectively measured.
        
        As for this experiment:
         What does quality even mean? Most human devs will have different
         opinions on it. If you asked 200 different devs (Claude starts from
         0 after each iteration) to do the same, I doubt the code would
         look much better.
        
         I am also wondering what would happen if Claude had the option to
         just walk away from the code if it's "good enough". For each problem,
         most human devs run a cost->benefit equation in their head; only worthy
         ideas are realized. Claude does not do this: the cost of writing code
         is very low on its side, and the prompt does not allow any graceful
         exit :)
       
        samuelknight wrote 9 hours 40 min ago:
        This is an interesting experiment that we can summarize as "I gave a
        smart model a bad objective", with the key result at the end
        
        "...oh and the app still works, there's no new features, and just a few
        new bugs."
        
         Nobody thinks that doing 200 improvement passes on a functioning code
         base is a good idea. The prompt tells the model that it is a principal
         engineer, then contradicts that role with the imperative "We need to
         improve the quality of this codebase". Determining when code needs to
         be improved is a responsibility of the principal engineer, but the
         prompt doesn't tell the model that it can decide the code is good
         enough. I think we would see different behavior if the prompt was
         changed to "Inspect the codebase, determine if we can do anything to
         improve code quality, then immediately implement it." If the model is
         smart enough, this will increasingly result in passes where the agent
         decides there is nothing left to do.
        
         In my experience with CC I get great results when I ask an open-ended
         question about a large module and instruct it to come back to me with
         suggestions. Claude generates 5-10 suggestions and ranks them by
        impact. It's very low-effort from the developer's perspective and it
        can generate some good ideas.
       
        fauigerzigerk wrote 9 hours 41 min ago:
        What would happen if you gave the same task to 200 human contractors?
        
        I suspect SLOC growth wouldn't be quite as dramatic but things like
        converting everything to Rust's error handling approach could easily
        happen.
       
        tracker1 wrote 9 hours 42 min ago:
         On the Result responses... I've seen this a few times.  I think it
         works well in Rust or other languages that don't have the ability to
         "throw" baked in.  However, when you bolt it onto a language that can
         implicitly throw, you're now doing twice the work: you have to
         handle the explicit error result and the errors that still get thrown.
        
        I worked in a C# codebase with Result responses all over the place, and
        it just really complicated every use case all around.  Combined with
        Promises (TS) it's worse still.
       
          mrsmrtss wrote 8 hours 21 min ago:
          The Result pattern also works exceptionally well with C#, provided
          you ensure that code returning a Result object never throws an
          exception. Of course, there are still some exceptional things that
          can throw, but this is essentially the same situation as dealing with
          Rust panics.
       
            tracker1 wrote 4 hours 29 min ago:
             IMO, Rust panics should kill the application... C# errors
             shouldn't.  Also, in practice, in the C# codebase where I was
             dealing with Result, there was just as much chance of seeing an
             actual thrown error, so you always had to deal with both an
             explicit error result AND thrown errors in practice... it was worse
             than just error handling with type-specific catch blocks.
       
        g947o wrote 9 hours 44 min ago:
        When I ask coding agents to add tests, they often come up with
        something like this:
        
            const x = new NewClass();
            assert.ok(x instanceof NewClass);
        
        So I am not at all surprised about Claude adding 5x tests, most of
        which are useless.
        
        It's going to be fun to look back at this and see how much slop these
        coding agents created.
       
        orliesaurus wrote 9 hours 49 min ago:
         OK, serious question:
        What's the best "Code Review" Skill/Agent/Prompt that I can use these
        days? Curious to see even paid options if anyone knows?
       
        GuB-42 wrote 9 hours 52 min ago:
         It is something I noticed when talking to LLMs: if they don't get it
         right the first time, they probably never will, and if you really
         insist, the quality starts to degrade.
        
         It is not unlike people, the difference being that if you ask someone
         the same thing 200 times, he will probably tell you to go fuck
         yourself, or, if unable to, turn to malicious compliance. These AIs
         will always be diligent. Or, a human may use the opportunity to educate
         himself, but again, LLMs don't learn by doing; they have a distinct
         training phase that involves ingesting pretty much everything humanity
         has produced, and your little conversation will not have a significant
         effect, if any.
       
          grvdrm wrote 9 hours 32 min ago:
           I use a new chat every time that happens and try to improve my
           prompt to get a better result. It sometimes works, and the
           multiple-chat approach annoys me less than one laborious long chat.
       
        VikingCoder wrote 10 hours 0 min ago:
        You need to scroll the windows to see all the numbers.    (Why??)
       
        Havoc wrote 10 hours 2 min ago:
        My current fav improvement strategy is
        
        1) Run multiple code analysis tools over it and have the LLM aggregate
        it with suggestions
        
         2) ask the LLM an open-ended question to list potential improvements,
         and pick by hand which I want
        
        And usually repeat the process with a completely different model (ie
        diff company trained it)
        
        Any more and yeah they end up going in circles
       
        keepamovin wrote 10 hours 4 min ago:
         This is actually a great idea. It's like those "AI resampled this
         image 10,000 times" or "JPEG iteratively compressed this picture 1
         million times" videos.
       
        mvanbaak wrote 10 hours 4 min ago:
        `--dangerously-skip-permissions` why?
       
          minimaxir wrote 10 hours 1 min ago:
          It's necessary to allow Claude Code to be fully autonomous, otherwise
          it will stop and ask you to run commands.
       
            mvanbaak wrote 9 hours 54 min ago:
            and just letting it to do whatever it thinks it should do, without
            a human intervening, is a good plan?
       
              ssl-3 wrote 9 hours 33 min ago:
              Depending on the breadth (and value) of the sandbox:  Sure?  Why
              not?
              
              To extend what may seem like a [prima facie] insane, stupid, or
              foolhardy idea:  Why not send the output of /dev/urandom into
              /bin/bash?  Or even /proc/mem?    It probably won't do anything
              particularly interesting.  It will probably just break things and
              burn power.
              
              And so?  It's just a computer; its scope is limited.
       
              news_hacker wrote 9 hours 37 min ago:
              the "best practice" suggestion would be to do this in a sandboxed
              container
       
              minimaxir wrote 9 hours 44 min ago:
              Discovering that is the entire intent of this experiment, yes.
       
                mvanbaak wrote 9 hours 41 min ago:
                fair point. will re-read the whole thing. I'm sorry for my
                ignorance.
       
        guluarte wrote 10 hours 9 min ago:
         that's my experience with AI: most times it creates an overengineered
         solution unless told to keep it simple
       
        mbesto wrote 10 hours 11 min ago:
        While there are justifiable comments here about how LLMs behave, I want
        to point out something else:
        
        There is no consensus on what constitutes a high quality codebase.
        
        Said differently - even if you asked 200 humans to do this same
        exercise, you would get 200 different outputs.
       
        phildougherty wrote 10 hours 11 min ago:
         Pasting this whole article into Claude Code: "improve my codebase
         taking this article into account"
       
          minimaxir wrote 10 hours 0 min ago:
          You can just give Claude Code/any modern Agent a URL and it'll
          retrieve it.
       
        torginus wrote 10 hours 12 min ago:
        I've heard a very apt criticism of the current batch of LLMs:
        
        LLMs are incapable of reducing entropy in a code base
        
         I've always had this nagging feeling, but I think this really captures
         the essence of it succinctly.
       
        surprisetalk wrote 10 hours 13 min ago:
        This reflects my experience with human programmers. So many devs are
        taught to add layers of complexity in pursuit of "best practices". I
        think the LLM was trained to behave this way.
        
        In my experience, Claude can actually clean up a repo rather nicely if
        you ask it to (1) shrink source code size (LOC or total bytes), (2)
        reduce dependencies, and (3) maintain integration tests.
       
        6LLvveMx2koXfwn wrote 10 hours 13 min ago:
        for all the bad code havoc was most certainly not 'wrecked', it may
        have been 'wreaked' though . . .
       
        elzbardico wrote 10 hours 13 min ago:
        LLMs have this strong bias towards generating code, because writing
        code is the default behavior from pre-training.
        
         Removing code, renaming files, condensing, and other edits are mostly
         post-training stuff, supervised-learning behavior. You have armies of
         developers across the world making 17 to 35 dollars an hour solving
         tasks step by step, which are then basically used to generate
         prompt/response pairs of desired behavior for a lot of common
         development situations, adding desired output for things like tool
         calling, which is needed for things like deleting code.
        
         A typical human post-training dataset generation task would
         involve a scenario like: given this Dockerfile for a python
        application, when we try to run pytest it fails with exception foo not
        found. The human will notice that package foo is not installed, change
        the requirements.txt file and write this down, then he will try pip
        install, and notice that the foo package requires a certain native
        library to be installed. The final output of this will be a response
        with the appropriate tool calls in a structured format.
        
         Given that the amount of unsupervised learning is way bigger than the
         amount spent on fine-tuning for most models, it is no surprise that,
         given any ambiguous situation, the model will default to what it knows
         best.
        
        More post-training will usually improve this, but the quality of the
        human generated dataset probably will be the upper bound of the output
        quality, not to mention the risk of overfitting if the foundation model
        labs embrace SFT too enthusiastically.
       
          hackernewds wrote 8 hours 32 min ago:
          > Writing code is the default behavior from pre-training
          
          what does this even mean? could you expand on it
       
            joaogui1 wrote 2 hours 7 min ago:
             During pre-training the model is learning next-token prediction,
             which is naturally additive. Even if you added DEL as a token, it
             would still be quite hard to change the data so that it can be used
             in a next-token prediction task.
             Hope that helps
       
            bongodongobob wrote 7 hours 53 min ago:
            He means that it is heavily biased to write code, not remove,
            condense, refactor, etc. It wants to generate more stuff, not less.
       
              elzbardico wrote 46 min ago:
               Because there are not a lot of high quality examples of code
               editing in the training corpora, other than maybe version
               control diffs.
              
              Because editing/removing code requires that the model output
              tokens for tools calls to be intercepted by the coding agent.
              
              Responses like the example below are not emergent behavior, they
              REQUIRE fine-tuning. Period.
              
                 I need to fix this null pointer issue in the auth module.
                 <|tool_call|>
                 {"id": "call_abc123", "type": "function", "function":
                 {"name": "edit_file", "arguments": "{\"path\":
                 \"src/auth.py\", \"start_line\": 12, \"end_line\": 14,
                 \"replacement\": \"def authenticate(user):\\n    if user is
                 None:\\n        return False\\n    return
                 verify(user.token)\"}"}}
                 <|end_tool_call|>
       
              snet0 wrote 6 hours 29 min ago:
              I don't see why this would be the case.
       
                bunderbunder wrote 4 hours 25 min ago:
                It’s because that’s what most resembles the bulk of the
                tasks it was being optimized for during pre-training.
       
        etamponi wrote 10 hours 15 min ago:
         Am I the only one who is surprised that the app still works?!
       
        stavros wrote 10 hours 18 min ago:
        Well, given it can't say "no, I think it's good enough now", you'll
        just get madness, no?
       
          minimaxir wrote 9 hours 45 min ago:
          That's the point. Sometimes madness is interesting.
       
        WhitneyLand wrote 10 hours 20 min ago:
         It can be difficult to explain to management why in certain scenarios
         AI can seem to work coding miracles, but this still doesn’t mean
         it’s always going to speed up development 10x, especially for an
         established code base.
        
        Tangible examples like this seem like a useful way to show some of the
        limitations.
       
        bulletsvshumans wrote 10 hours 22 min ago:
        I think the prompt is a major source of the issue. "We need to improve
        the quality of this codebase" implicitly indicates that there is
        something wrong with the codebase. I would be curious to see if it
        would reach a point of convergence with a prompt that allowed for it.
        Something like "Improve the quality of this codebase, or tell me that
        it is already in an optimal state."
       
        iambateman wrote 10 hours 27 min ago:
         The point he’s making - that LLMs aren’t ready for broadly
         unsupervised software development - is well made.
        
        It still requires an exhausting amount of thought and energy to make
        the LLM go in the direction I want, which is to say in a direction
        which considers the code which is outside the current context window.
        
        I suspect that we will not solve the context window problem for a long
        time. But we will see a tremendous growth in “on demand tooling”
        for things which do fit into a context window and for which we can let
        the AI “do whatever it wants.”
        
        For me, my work product needs to conform to existing design standards
        and I can’t figure out how to get Claude to not just wire up its own
        button styles.
        
        But it’s remarkable how—despite all of the nonsense—these tools
        remain an irreplaceable part of my work life.
       
          spaceywilly wrote 10 hours 1 min ago:
          I feel like I’ve figured out a good workflow with AI coding tools
          now. I use it in “Planning mode” to describe the feature or
          whatever I am working on and break it down into phases. I iterate on
          the planning doc until it matches what I want to build.
          
          Then, I ask it to execute each phase from the doc one at a time. I
          review all the code it writes or sometimes just write it myself. When
          it is done it updates the plan with what was accomplished and what
          needs to be done next.
          
          This has worked for me because:
          
           - it forces the planning part to happen before coding. A lot of
           Claude’s “wtf” moments can be caught in this phase before it
           writes a ton of gobbledygook code that I then have to clean up
          
          - the code is written in small chunks, usually one or two functions
          at a time. It’s small enough that I can review all the code and
          understand before I click accept. There’s no blindly accepting junk
          code.
          
          - the only context is the planning doc. Claude captures everything it
          needs there, and it’s able to pick right up from a new chat and
          keep working.
          
          - it helps my distraction-prone brain make plans and keep track of
           what I was doing. Even without Claude writing any code, this alone is
           a huge productivity boost for me. It’s like having a magic notebook
          that keeps track of where I was in my projects so I can pick them up
          again easily.
       
          torginus wrote 10 hours 3 min ago:
           Which is why I think agentic software development is not really worth
           it today. It can solve well-defined problems and work through issues
           by rote, but if you give it some task and let it work on it for a
           couple of hours, you then have to come in and fix it up.
          
          I think LLMs are still at the 'advanced autocomplete' stage, where
          the most productive way to use them is to have a human in the loop.
          
           In this, accuracy in following instructions and a short feedback
           loop are much more important than semi-decent behavior over
           long-horizon tasks.
       
        krupan wrote 10 hours 32 min ago:
        Just the headline sounds like a YouTube brain rot video title:
        
        "I spent 200 days in the woods"
        
        "I Google translated this 200 times"
        
        "I hit myself with this golf club 200 times"
        
        Is this really what hacker news is for now?
       
          havkom wrote 10 hours 28 min ago:
          There are fundamental differences. Many people expect a positive
          gradient of quality from AI overhaul of projects. For translating
          back and forth, it is obvious from the outset that there is a
          negative gradient of quality (the Chinese whispers game).
       
          jmkni wrote 10 hours 30 min ago:
          If you reverse the order this could be a very interesting Youtube
          series
       
        hazmazlaz wrote 10 hours 34 min ago:
        Well of course it produced bad results... it was given a bad prompt.
        Imagine how things would have turned out if you had given the same
        instructions to a skilled but naive contractor who contractually
        couldn't say no and couldn't question you. Probably pretty similar.
       
          mainmailman wrote 10 hours 31 min ago:
          Yeah I don't see the utility in doing this hundreds of times back to
          back. A few iterations can tell us some things about how Claude
          optimizes code, but an open ended prompt to endlessly "improve" the
          code sounds like a bad boss making huge demands. I don't blame the AI
          for adding BS down the line.
       
        simonw wrote 10 hours 35 min ago:
        The prompt was:
        
          Ultrathink. You're a principal engineer. Do not ask me any
          questions. We need to improve the quality of this codebase.
          Implement improvements to codebase quality.
        
        I'm a little disappointed that Claude didn't eventually decide to start
        removing all of the cruft it had added to improve the quality that way
        instead.
       
          Gricha wrote 9 hours 33 min ago:
          Yeah, the best it did on some iterations was to claim that the
          codebase was already in a good state and produce no changes - but
          that was 1 in many.
       
        gm678 wrote 10 hours 37 min ago:
        "Core Functional Utilities: Identity function - returns its input
        unchanged." is one of my favorites from `lib/functional.ts`.
       
        bikeshaving wrote 10 hours 39 min ago:
         [1] The logger library which Claude created is actually pretty simple,
        highly approachable code, with utilities for logging the timings of
        async code and the ability to emit automatic performance warnings.
        
        I have been using LogTape ( [2] ) for JavaScript logging, and the
        inherited, category-focused logging with different sinks has been
        pretty great.
        
 (HTM)  [1]: https://github.com/Gricha/macro-photo/blob/highest-quality/lib...
 (HTM)  [2]: https://logtape.org
       
        elzbardico wrote 10 hours 39 min ago:
        Funniest part:
        
        > ..oh and the app still works, there's no new features, and just a few
        new bugs.
       
        Hammershaft wrote 10 hours 44 min ago:
        Impressive that the app still works! Did not expect that.
       
        dcchuck wrote 10 hours 51 min ago:
        I spent some time last night "over iterating" on a plan to do some
        refactoring in a large codebase.
        
        I created the original plan with a very specific ask - create an
        abstraction to remove some tight coupling. Small problem that had a big
        surface area. The planning/brainstorming was great and I like the plan
        we came up with.
        
        I then tried to use a prompt like OP's to improve it (as I said, large
        surface area so I wanted to review it) - "Please review PLAN_DOC.md -
        is it a comprehensive plan for this project?". I'd run it -> get
        feedback -> give it back to Claude to improve the plan.
        
        I (naively perhaps) expected this process to converge to a "perfect
        plan". At this point I think of it more like a probability tree where
        there's a chance of improving the plan, but a non-zero chance of
        getting off the rails. And once you go off the rails, you only veer
        further and further from the truth.
        
        There are certainly problems where "throwing compute" at it and
        continuing to iterate with an LLM will work great. I would expect those
        to have firm success criteria. Providing definitions of quality would
        significantly improve the output here as well (or decrease the
        probability of going off the rails I suppose). Otherwise Claude will
        confuse quality like we see here.
        
        Shout out OP for sharing their work and moving us forward.
       
          Gricha wrote 9 hours 42 min ago:
          I think I end up doing that with plans inadvertently too. Oftentimes
          I'll iterate on a plan too many times, and only recognize that it's
          too far gone and needs a restart with more direction after sinking
          15 minutes into it.
       
          elzbardico wrote 10 hours 33 min ago:
          Small errors compound over time.
       
        pawelduda wrote 10 hours 51 min ago:
        Did it create 200 CODE_QUALITY_IMPROVEMENTS.md files by chance?
       
        postalcoder wrote 10 hours 51 min ago:
        One of my favorite personal evals for llms is testing its stability as
        a reviewer.
        
        The basic gist of it is to give the llm some code to review and have it
        assign a grade multiple times. How much variance is there in the grade?
        
        Then, prompt the same llm to be a "critical" reviewer with the same
        code multiple times. How much does that average critical grade change?
        
        A low variance of grades across many generations and a low delta
        between "review this code" and "review this code with a critical eye"
        is a major positive signal for quality.
        
        I've found that gpt-5.1 produces remarkably stable evaluations whereas
        Claude is all over the place. Furthermore, Claude will completely [and
        comically] change the tenor of its evaluation when asked to be critical
        whereas gpt-5.1 is directionally the same while tightening the screws.
        
        You could also interpret these results as a proxy for
        obsequiousness.
        
        Edit: One major part of the eval i left out is "can an llm converge on
        an 'A'?" Let's say the llm gives the code a 6/10 (or B-). When you
        implement its suggestions and then provide the improved code in a new
        context, does the grade go up? Furthermore, can it eventually give
        itself an A, and consistently?
        
        It's honestly impressive how good, stable, and convergent gpt-5.1 is.
        Claude is not great. I have yet to test it on Gemini 3.
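        The stability check described above can be sketched as a small
        harness. The `grade()` stub below stands in for a real LLM review
        call (a real harness would send the two prompt variants and parse a
        numeric grade out of the reply); the fixed return values are
        placeholders, not measured model behavior:

```python
import statistics

def grade(code: str, critical: bool = False) -> float:
    """Stub for an LLM review call. A real version would prompt the model
    ("review this code" vs. "review this code with a critical eye") and
    parse a numeric grade out of the response. Scores here are placeholders."""
    return 5.5 if critical else 6.0

def stability_eval(code: str, runs: int = 10) -> dict:
    """Collect many grades and report the two signals from the comment:
    variance across plain runs, and the plain-vs-critical delta."""
    plain = [grade(code) for _ in range(runs)]
    harsh = [grade(code, critical=True) for _ in range(runs)]
    return {
        "variance": statistics.pvariance(plain),
        "critical_delta": abs(statistics.mean(plain) - statistics.mean(harsh)),
    }

result = stability_eval("def add(a, b): return a + b")
```

        A stable reviewer keeps `variance` near zero and `critical_delta`
        small; a model that swings on the word "critical" shows up
        immediately in the second number.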
       
          lemming wrote 8 hours 38 min ago:
          I agree, I mostly use Claude for writing code, but I always get GPT5
          to review it. Like you, I find it astonishingly consistent and
          useful, especially compared to Claude. I like to reset my context
          frequently, so I’ll often paste the problems from GPT into Claude,
          then get it to review those fixes (going around that loop a few
          times), then reset the context and get it to do a new full review.
          It’s very reassuring how consistent the results are.
       
          OsrsNeedsf2P wrote 9 hours 23 min ago:
          How is this different than testing the temperature?
       
            itishappy wrote 4 hours 7 min ago:
            How does temperature explain the variance in response to the
            inclusion of the word "critical"?
       
            smt88 wrote 8 hours 32 min ago:
            It isn't, and it reflects how deeply LLMs are misunderstood, even
            by technical people
       
          adastra22 wrote 9 hours 55 min ago:
          You mean literally assign a grade, like B+? This is unlikely to work
          based on how token prediction & temperature works. You're going to
          get a probability distribution in the end that is reflective of the
          model runtime parameters, not the intelligence of the model.
       
          guluarte wrote 10 hours 7 min ago:
          my experience reviewing PRs is that sometimes it says a PR is
          perfect with some nitpicks, and other times it says the same PR is
          trash and needs a lot of work
       
        xnorswap wrote 10 hours 57 min ago:
        Claude is really good at specific analysis, but really terrible at
        open-ended problems.
        
        "Hey claude, I get this error message: ", and it'll often find the root
        cause quicker than I could.
        
        "Hey claude, anything I could do to improve Y?", and it'll struggle
        beyond the basics that a linter might suggest.
        
        It enthusiastically suggested a library for  and was all
        "Recommended" about it, but when I pointed out that the library had
        been considered and rejected because , it understood and wrote up why
        that library suffered from that issue and why it was therefore
        unsuitable.
        
        There's a significant blind-spot in current LLMs related to blue-sky
        thinking and creative problem solving. It can do structured problems
        very well, and it can transform unstructured data very well, but it
        can't deal with unstructured problems very well.
        
        That may well change, so I don't want to embed that thought too deeply
        into my own priors, because the LLM space seems to evolve rapidly. I
        wouldn't want to find myself blind to the progress because I write it
        off from a class of problems.
        
        But right now, the best way to help an LLM is have a deep understanding
        of the problem domain yourself, and just leverage it to do the
        grunt-work that you'd find boring.
       
          mkw5053 wrote 35 min ago:
          I’ve had reasonable success having it ultrathink every possible
          X (exhaustively) and their trade-offs, then give me a ranked list
          and rationale for its top recommendations. I almost always choose the
          top but just reading the list and then giving it next steps has
          worked really well for me.
       
          awesome_dude wrote 3 hours 32 min ago:
          My experience with Claude is that having it "review" my code has
          produced some helpful feedback and refactoring suggestions, but it
          also falls short in other areas
       
          ljm wrote 4 hours 2 min ago:
          I am basically rawdogging Claude these days, I don’t use MCPs or
          anything else, I just lay down all of the requirements and the
          suggestions and the hints, and let it go to work.
          
          When I see my colleagues use an LLM they are treating it like a mind
          reader and their prompts are, frankly, dogshit.
          
          It shows that articulating a problem is an important skill.
       
          theshrike79 wrote 6 hours 20 min ago:
          Codex is better for the latter style. It takes its time, mulls about
          and investigates and sometimes finds a nugget of gold.
          
          Claude is for getting shit done, it's not at its best at long
          research tasks.
       
          dolftax wrote 7 hours 4 min ago:
          The structured vs open-ended distinction here applies to code review
          too. When you ask an LLM to "find issues in this code", it'll happily
          find something to say, even if the code is fine. And when there are
          actual security vulnerabilities, it often gets distracted by style
          nitpicks and misses the real issues.
          
          Static analysis has the opposite problem - very structured,
          deterministic, but limited to predefined patterns and overwhelms you
          in false positives.
          
          The sweet spot seems to be to give structure to what the LLM should
          look for, rather than letting it roam free on an open-ended "review
          this" prompt.
          
          We built Autofix Bot[1] around this idea. [1]  (disclosure: founder)
          
 (HTM)    [1]: https://autofix.bot
       
          ericmcer wrote 7 hours 10 min ago:
          Exactly, if you visualize software as a bunch of separate "states"
          (UI state, app state, DB state) then our job is to mutate states and
          synchronize those mutations across the system. LLMs are good at
          mutating a specific state in a specific way. They are trash at
          designing what data shape a state should be, and they are bad at
          figuring out how/why to propagate mutations across a system.
       
          order-matters wrote 8 hours 9 min ago:
          TBH I think its ability to structure unstructured data is what makes
          it a powerhouse tool and there is so much juice to squeeze there that
          we can make process improvements for years even if it doesn't get any
          better at general intelligence.
          
          If I had a PDF printout of a table, the workflow I used to have to
          use to get that back into a table data structure for automation was
          hard (annoying): dedicated OCR tools with limitations on inputs, and
          multiple models in that tool for the different ways the paper the
          table was on might be formatted. It took hours for a new input
          format.
          
          Now I can take a photo of something with my phone and get a data
          table in like 30 seconds.
          
          People seem so desperate to outsource their thinking to these models
          and operate at the limits of their capability, but I have been
          having a blast using them to cut through so much tedium - problems
          that weren't unsolved, but required enough specialized tooling and
          custom config to be left alone unless you really had to.
          
          This fits into what you're saying about using it to do the grunt
          work I find boring, I suppose, but it feels like a little more than
          that - it has opened a lot of doors to spaces where the grunt work
          wasn't previously worth the end result, but now it is.
       
          d-lisp wrote 8 hours 20 min ago:
          I remember about a problem I had while quick testing notcurses. I
          tried chatGPT which produced a lot of weird but kinda believable
          statements about the fact that I had to include wchar and define a
          specific preprocessor macro, AND I had to place the includes for
          notcurses, other includes and macros in a specific order.
          
          My sentiment was "that's obviously a weird, unintended hack", but I
          wanted to test quickly, and well ... it worked. Later, reading the
          man pages, I acknowledged that I needed to pass specific flags to
          gcc in place of the GPT-advised solution.
          
          I think these kinds of value-based judgements are hard to emulate
          for LLMs; it's hard for them to identify a single source as the
          most authoritative in a sea of less authoritative (but numerous)
          sources.
       
          andai wrote 8 hours 43 min ago:
          The current paradigm is we sorta-kinda got AGI by putting dodgy AI in
          a loop:
          
          until works { try again }
          
          The stuff is getting so cheap and so fast... a sufficient increment
          in quantity can produce a phase change in quality.
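          The loop above, made concrete: a minimal sketch of bounded retry
          around a verifiable check, with both the "agent" step and the
          oracle injected (all names here are illustrative, not any
          particular tool's API):

```python
def until_works(attempt, works, max_tries: int = 5) -> bool:
    """Retry `attempt` until the oracle `works()` passes or tries run out.
    In practice `attempt` would be one iteration of a coding agent and
    `works` would run the test suite; both are injected here so the loop
    itself is testable."""
    for _ in range(max_tries):
        if works():
            return True
        attempt()
    return works()

# Toy demo: an "agent" that needs three attempts before the check passes.
state = {"tries": 0}
def attempt(): state["tries"] += 1
def works(): return state["tries"] >= 3

succeeded = until_works(attempt, works)
```

          The phase change the comment describes is just this loop: as each
          attempt gets cheaper and faster, `max_tries` can grow until the
          loop almost always exits through the success branch.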
       
          ludicrousdispla wrote 9 hours 2 min ago:
          >> "Hey claude, I get this error message: ", and it'll often find the
          root cause quicker than I could.
          
          Back in the day, we would just do this with a search engine.
       
          cultofmetatron wrote 9 hours 31 min ago:
          > There's a significant blind-spot in current LLMs related to
          blue-sky thinking and creative problem solving.
          
          thats called job security!
       
          mbesto wrote 10 hours 15 min ago:
          > There's a significant blind-spot in current LLMs related to
          blue-sky thinking and creative problem solving. It can do structured
          problems very well, and it can transform unstructured data very well,
          but it can't deal with unstructured problems very well.
          
          While this is true in my experience, the opposite is not: LLMs
          are very good at helping me go through a structured process of
          thinking about architectural and structural design, and then helping
          me build a corresponding specification.
          
          More specifically, the "idea honing" part of this proposed process
          works REALLY well: [1] This prompt:
          
            Each question should build on my previous answers, and our end
            goal is to have a detailed specification I can hand off to a
            developer. Let’s do this iteratively and dig into every relevant
            detail. Remember, only one question at a time.
          
 (HTM)    [1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
       
            skydhash wrote 9 hours 34 min ago:
            I've checked the linked page and there's nothing about even
            learning the domain or learning the tech platform you're going to
            use. It's all blind faith, just a small step above copying stuff
            from GitHub or StackOverflow and pushing it to prod.
       
              mbesto wrote 4 hours 25 min ago:
              You completely missed the point of my comment...
       
          giancarlostoro wrote 10 hours 26 min ago:
          > "Hey claude, I get this error message: ", and it'll often find the
          root cause quicker than I could.
          
          This is true. As for "open ended": I use Beads with Claude Code. I
          ask it to identify things based on criteria (even if it's open
          ended), then I ask it to make tasks, and when it's done I ask it to
          research and ask clarifying questions for those tasks. This works
          really well.
       
          asmor wrote 10 hours 27 min ago:
          This is it. It doesn't replace the higher level knowledge part very
          well.
          
          I asked Claude to fix a pet peeve of mine, spawning a second process
          inside an existing Wine session (pretty hard if you use umu, since it
          runs in a user namespace). I asked Claude to write me a python server
          to spawn another process to pass through a file handler "in Proton",
          and it proceeded into a long loop of trying to find a way to launch into
          an existing wine session from Linux with tons of environment
          variables that didn't exist.
          
          Then I specified "server to run in Wine using Windows Python" and it
          got more things right. Except it tried to use named pipes for IPC.
          Which, surprise surprise, doesn't work to talk to the Linux piece.
          Only after I specified "local TCP socket" did it start to go right.
          Had I written all those technical constraints and made the design
          decisions in the first message, it'd have been a one-hit success.
       
          cyral wrote 10 hours 34 min ago:
          Using the plan mode in cursor (or asking claude to first come up with
          a plan) makes it pretty good at generic "how can I improve" prompts.
          It can spend more effort exploring the codebase and thinking before
          implementing.
       
          pdntspa wrote 10 hours 36 min ago:
          That's why you treat it like a junior dev. You do the fun stuff of
          supervising the product, overseeing design and implementation,
          breaking up the work, and reviewing the outputs. It does the boring
          stuff of actually writing the code.
          
          I am phenomenally productive this way, I am happier at my job, and
          its quality of work is extremely high as long as I occasionally have
          it stop and self-review its progress against the style principles
          articulated in its AGENTS.md file. (As it tends to forget a lot of
          rules like DRY)
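          The self-review step works best when anchored to something
          concrete the agent can re-read. A minimal AGENTS.md along these
          lines (contents illustrative, not this commenter's actual file)
          might look like:

```markdown
# AGENTS.md

## Style principles
- DRY: before adding a helper, search for an existing one and reuse it.
- No speculative abstractions: wait for three concrete call sites before extracting.
- Prefer early returns over deep nesting; keep functions short.

## Self-review checklist (before declaring a task done)
1. Re-read the diff against the principles above.
2. Delete any code the change no longer needs.
3. Confirm tests pass and no new file duplicates an existing utility.
```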
       
            xnx wrote 3 hours 12 min ago:
            > rules like DRY
            
            Principles like DRY
       
            order-matters wrote 8 hours 4 min ago:
            I wonder if DRY is still a principle worth holding onto in the AI
            coding era.  I mean it probably is, but this feels like enough of a
            shift in coding design that re-evaluating principles designed for
            human-only coding might be worth the effort
       
            tiku wrote 8 hours 45 min ago:
            I enjoy finding the problem and then telling Claude to fix it:
            specifying the function and the problem, then going to get a
            coffee from the breakroom and seeing it finished when I return.
            The junior dev had questions when I did that. Claude just fixes
            it.
       
            AStrangeMorrow wrote 9 hours 52 min ago:
            Yeah at this point I basically have to dictate all implementation
            details: do this, but do it this specific way, handle xyz edge
            cases by doing that, plug the thing in here using that API.
            Basically that expands 10 lines into 100-200 lines of code.
            
            However if I just say “I have this goal, implement a solution”,
            chances are that unless it is a very common task, it will come up
            with a subpar/incomplete implementation.
            
            What’s funny to me is that complexity has inverted for some
            tasks: it can ace a 1000-line ML model for a general task I give
            it, yet will completely fail to come up with a proper solution for
            a 2D geometric problem involving mostly high-school-level math
            that can be solved in 100 lines
       
            rootnod3 wrote 10 hours 8 min ago:
            Cool cool cool. So if you use LLMs as junior devs, let me ask you
            how future awesome senior devs like you will come around? From WHAT
            job experience? From what coding struggle?
       
              platevoltage wrote 7 hours 24 min ago:
              There's that long term thinking that the tech industry, and
              really every other publicly traded company is known for.
       
              pdntspa wrote 9 hours 37 min ago:
              My last job there was effectively a gun held to the back of my
              head, ordering me to use this stuff. And this started about a
              year ago, when the tooling for agentic dev was absolutely
              atrocious, because we had a CTO who had the biggest most raging
              boner for anything that offered even a whiff of "AI".
              
              Unfortunately the bar is being raised on us. If you can't hang
              with the new order you are out of a job. I promise I was one of
              the holdouts who resisted this the most. It's probably why I got
              laid off last spring.
              
              Thankfully, as of this last summer, agentic dev started to really
              get good, and my opinion made a complete 180. I used the off time
              to knock out a personal project in a month or two's worth of
              time, that would have taken me a year+ the old way. I leveraged
              that experience to get me where I am now.
       
                rootnod3 wrote 9 hours 15 min ago:
                Ok, now assume you start relying on it, and let's assume
                Cloudflare has another outage. Do you just clock out for the
                day, saying "can't work, agent is down"?
                
                I don't think we'll be out of jobs. Maybe temporarily. But
                those jobs come back. The energy and money drain that LLMs are,
                are just not sustainable.
                
                I mean, it's cool that you got the project knocked out in a
                month or two, but if you'd sit down now without an LLM and try
                to measure the quality of that codebase, would you be 100%
                content? Speed is not always a good metric. Sure, 1-2 months
                for a project is nice, but isn't a personal project especially
                more about the fun of doing the project and learning something
                from it and sharpening your skills?
       
                  pdntspa wrote 8 hours 24 min ago:
                  When the POS system goes down at a restaurant they'll revert
                  to pen and paper. Can't imagine it's much different in that
                  case.
       
              fluidcruft wrote 9 hours 51 min ago:
              How do you get junior devs if your concept of the LLM is that
              it's "a principal engineer" that "do[es] not ask [you] any
              questions"?
              
              Also, I'm pretty sure junior devs can use directing a LLM to
              learn from mistakes faster. Let them play. Soon enough they're
              going to be better than all of us anyway. The same way widespread
              access to strong chess computers raised the bar at chess clubs.
       
                rootnod3 wrote 9 hours 19 min ago:
                I don't think the chess analogy holds here. In chess, you
                play _against_ the chess computer. Take the same approach and
                let the chess computer play FOR the player and see how far
                they get.
       
                  fluidcruft wrote 9 hours 4 min ago:
                  Maybe. I don't think adversarial vs not is as important as
                  gaining experience. Ultimately both are problem solving tasks
                  and learning instincts about which approaches work best in
                  certain situations.
                  
                  I'm probably a pretty shitty developer by HN standards but I
                  generally have to build a prototype to fully understand and
                  explore problem and iterate designs and LLMs have been pretty
                  good for me as trainers for learning things I'm not familiar
                  with. I do have a certain skill set, but the non-domain stuff
                  can be really slow and tedious work. I can recognize "good
                  enough" and "clean", and I think the next generation can
                  use that model very well to become fluent in how to succeed
                  with these tools.
                  
                  Let me put it this way: people don't have to be hired by the
                  best companies to gain experience using best practices
                  anymore.
       
              bpt3 wrote 10 hours 2 min ago:
              Why is that a developer's problem? If anything, they are
              incentivized to avoid creating future competition in the job
              market.
       
                rootnod3 wrote 9 hours 14 min ago:
                It's not a problem for the senior dev directly, but maybe down
                the road. And it definitely is a problem for the company once
                said senior dev leaves or retires.
                
                Seriously, long term thinking went out the window long time
                ago, didn't it?
       
                  bpt3 wrote 5 hours 15 min ago:
                  No, long term thinking didn't go out the window.
                  
                  It is definitely a problem for the company. How is it a
                  problem for the senior dev at any point?
                  
                  What incentive do they have to aid the company at the expense
                  of their own *long term* career prospects?
       
              eightysixfour wrote 10 hours 4 min ago:
              What would you like individual contributors to do about it,
              exactly? Refuse to use it, even though this person said they're
              happier and more fulfilled at work?
              
              I'm asking because I legitimately have not figured out an answer
              to this problem.
       
            mjr00 wrote 10 hours 18 min ago:
            > That's why you treat it like a junior dev. You do the fun stuff
            of supervising the product, overseeing design and implementation,
            breaking up the work, and reviewing the outputs. It does the boring
            stuff of actually writing the code.
            
            I am so tired of this analogy. Have the people who say this never
            worked with a junior dev before? If you treat your junior devs as
            brainless code monkeys who only exist to type out your brilliant
            senior developer designs and architectures instead of, you know,
            human beings capable of solving problems, 1) you're wasting your
            time, because a less experienced dev is still capable of solving
            problems independently, 2) the juniors working under you will hate
            it because they get no autonomy, and 3) the juniors working under
            you will stay junior because they have no opportunity to
            learn--which means you've failed at one of your most important
            tasks as a senior developer, which is mentorship.
       
              pdntspa wrote 9 hours 26 min ago:
              I have mentored and worked with a junior dev. And the only way to
              get her to do anything useful and productive was to spell things
              out. Otherwise she got wrapped around the axle trying to figure
              out the complex things and was constantly asking for my help with
              basic design-level tasks. Doing the grunt work is how you learn
              the higher-level stuff.
              
              When I was a junior, that's how it was for me. The senior gave me
              something that was structured and architected and asked me to
              handle smaller tasks that were beneath them.
              
              Giving juniors full autonomy is a great way to end up with an
              unmaintainable mess that is a nightmare to work with without
               substantial refactoring. I know this because I have made a career
              out of fixing exactly this mistake.
       
                mjr00 wrote 9 hours 23 min ago:
                I have never worked with junior devs as incompetent as you
                describe, having worked at AWS, Splunk/Cisco, among others. At
                AWS even interns essentially got assigned a full project for
                their term and were just told to go build it. Does your company
                just have an absurdly low hiring bar for juniors?
                
                > Giving juniors full autonomy is a great way to end up with an
                unmaintainable mess that is a nightmare to work with without
                 substantial refactoring.
                
                Nobody is suggesting they get full autonomy to cowboy code and
                push unreviewed changes to prod. Everything they build should
                be getting reviewed by their peers and seniors. But they need
                opportunities to explore and make mistakes and get feedback.
       
                  pdntspa wrote 9 hours 18 min ago:
                  > AWS, Splunk/Cisco
                  
                  It's an entirely different world in small businesses that
                  aren't primarily tech.
       
            alfalfasprout wrote 10 hours 18 min ago:
            I really hope you don't actually treat junior devs this way...
       
            FeteCommuniste wrote 10 hours 31 min ago:
            Maybe I'm weird but I enjoy "actually writing the code."
       
              theshrike79 wrote 6 hours 16 min ago:
              You really get enjoyment writing a full CRUD HTTP API five times,
              one for each endpoint?
              
               I don't :) Before, I had IDE templates and IntelliSense. Now I
               can just get any agentic AI to do it for me in 60 seconds and
               get on with the actual work.
       
                skydhash wrote 2 hours 22 min ago:
                 What do you need a full CRUD HTTP API for? Just to load
                 the data straight from the database? Usually I've already
                 implemented that before, so I just copy-paste the
                 implementation and do some Vim magic. And in frameworks
                 like Rails or Laravel, it may be less than 10 lines of
                 code. More involved business logic? Then I'm spending more
                 time getting a good spec for it than implementing the
                 spec.
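                 To put a number on that "less than 10 lines" claim, here
                 is a bare-bones sketch in plain Python stdlib (sqlite3;
                 the table and function names are illustrative, not from
                 Rails or Laravel):

```python
import sqlite3

# Minimal CRUD over a single table -- the "load the data straight
# from the database" case. Each operation is a couple of lines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

def create(name):
    cur = conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid

def read(item_id):
    return conn.execute(
        "SELECT id, name FROM items WHERE id = ?", (item_id,)
    ).fetchone()

def update(item_id, name):
    conn.execute("UPDATE items SET name = ? WHERE id = ?", (name, item_id))
    conn.commit()

def delete(item_id):
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    conn.commit()
```

                 Wiring these up as HTTP endpoints is the part a
                 framework's resource helpers (or an agent) boilerplate
                 away.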
       
              vitro wrote 9 hours 40 min ago:
              I sometimes think of it as a sculptor analogy.
              
              Some famous sculptors had an atelier full of students that helped
              them with mundane tasks, like carving out a basic shape from a
              block of stone.
              
              When the basic shape was done, the master came and did the rest.
              You may want to have the physical exercise of doing the work
              yourself, but maybe someone sometimes likes to do the fine work
              and leave the crude one to the AI.
       
              breuleux wrote 10 hours 1 min ago:
              In my case, it really depends what. I enjoy designing systems and
              domain-specific languages or writing libraries that work the way
              I think they should work.
              
              On the other hand, if e.g. I need a web interface to do
              something, the only way I can enjoy myself is by designing my own
              web framework, which is pretty time-consuming, and then I still
              need to figure out how to make collapsible sections in CSS and
              blerghhh. Claude can do that in a few seconds. It's a delightful
              moment of "oh, thank god, I don't have to do this crap anymore."
              
              There are many coding tasks that are just tedium, including 99%
              of frontend development and over half of backend development. I
              think it's fine to throw that stuff to AI. It still leaves a lot
              of fun on the table.
       
              loloquwowndueo wrote 10 hours 21 min ago:
              “I want my AI to do laundry and dishes so I can code, not for
              my AI to code so I can do laundry and dishes”
       
                moffkalast wrote 8 hours 9 min ago:
                Well it would be funnier if dishwashers, washing machines and
                dryers didn't automate that ages ago. It's literally one of the
                first things robots started doing for us.
       
                thewebguyd wrote 9 hours 28 min ago:
                This sums up my feelings almost exactly.
                
                I don't want LLMs, AI, and eventually Robots to take over the
                fun stuff. I want them to do the mundane, physical tasks like
                laundry and dishes, leave me to the fun creative stuff.
                
                But as we progress right now, the hype machine is pushing AI to
                take over art, photography, video, coding, etc. All the stuff I
                would rather be doing. Where's my house cleaning robot?
       
                  zelphirkalt wrote 7 hours 47 min ago:
                  I would like to go even further and say: Those things, art,
                  photography, video, coding ... They are forms of craft, human
                  expression, creativity. They are part of what makes life
                  interesting. So we are in the process of eliminating the
                  interesting and creative parts, in the name of profit and
                  productivity maxing (if any!). Maybe we can create the 100th
                  online platform for the same thing soon 10x faster! Wow!
                  
                  Of course this is a bit too black&white. There can still be a
                   creative human being introducing nuance and differences,
                   trying to get the automated tools to do things
                   differently in the details or in some aspects. The
                   question is: losing all those
                  creative jobs (in absolute numbers of people doing them),
                  what will we as society, or we as humanity become? What's the
                  ETA on UBI, so that we can reap the benefits of what we
                  automated away, instead of filling the pockets of a few?
       
                minimaxir wrote 10 hours 8 min ago:
                Claude is very good at unfun-but-necessary coding tasks such as
                writing docstrings and type hints, which is a prominent
                instance of "laundry and dishes" for a dev.
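                   For instance, the mechanical before/after of such a
                   pass, on a hypothetical helper:

```python
# Before: the kind of untyped, undocumented helper that accumulates
# in a codebase.
#
# def merge_counts(a, b):
#     out = dict(a)
#     for k, v in b.items():
#         out[k] = out.get(k, 0) + v
#     return out

# After the "laundry and dishes" pass: identical behavior, plus
# type hints and a docstring. (The function itself is hypothetical.)
def merge_counts(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Return a new dict with the per-key sums of counts in `a` and `b`."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out
```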
       
                  mrguyorama wrote 9 hours 11 min ago:
                  >writing docstrings and type hints
                  
                  Disagree. Claude makes the same garbage worthless comments as
                  a Freshman CS student. Things like:
                  
                  // Frobbing the bazz
                  
                  res = util.frob(bazz);
                  
                  Or
                  
                  // If bif is True here then blorg
                  
                  if (bif){
                      blorg;
                  }
                  
                  Like wow, so insightful
                  
                  And it will ceaselessly try to auto complete your comments
                  with utter nonsense that is mostly grammatically correct.
                  
                  The most success I have had is using claude to help with
                  Spring Boot annotations and config processing (Because
                  documentation is just not direct enough IMO) and to rubber
                  duck debug with, where claude just barely edges out the
                  rubber duck.
       
                    minimaxir wrote 9 hours 8 min ago:
                    I intentionally said docstrings instead of comments.
                     Agents' comments can be verbose by default, but a
                     line in the AGENTS.md does indeed wrangle modern
                     agents into only commenting on high-signal code
                     blocks that are not tautological.
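                     A hypothetical example of the sort of AGENTS.md line
                     meant here:

```
# Commenting policy
- Write a docstring for every public function.
- Comment only blocks whose intent is non-obvious; never
  restate what a line literally does.
```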
       
                  loloquwowndueo wrote 9 hours 37 min ago:
                  “Sorry, the autogenerated api documentation was wrong
                  because the ai hallucinated the docstring”
       
                    theshrike79 wrote 6 hours 9 min ago:
                    You can't read?
                    
                     Please don't tell me you commit AI-generated stuff
                     without checking it first.
       
                re-thc wrote 10 hours 19 min ago:
                Soon you'll realize you're the "AI". We've lost control.
       
              nyadesu wrote 10 hours 24 min ago:
              In my case, I enjoy writing code too, but it's helpful to have an
              assistant I can ask to handle small tasks so I can focus on a
              specific part that requires attention to detail
       
                FeteCommuniste wrote 10 hours 10 min ago:
                Yeah, I sometimes use AI for questions like "is it possible to
                do [x] using library [y] and if so, how?" and have received
                mostly solid answers.
       
                  georgemcbay wrote 9 hours 45 min ago:
                  > Yeah, I sometimes use AI for questions like "is it possible
                  to do [x] using library [y] and if so, how?" and have
                  received mostly solid answers.
                  
                  In my experience most LLMs are going to answer this with some
                  form of "Absolutely!" and then propose a
                  square-peg-into-a-round-hole way to do it that is likely
                  suboptimal vs using a different library that is far more
                  suited to your problem if you didn't guess the right fit
                  library to begin with.
                  
                  The sycophancy problem is still very real even when the topic
                  is entirely technical.
                  
                  Gemini is (in my experience) the least likely to lead you
                  astray in these situations but its still a significant
                  problem even there.
       
                    jessoteric wrote 6 hours 43 min ago:
                    IME this has been significantly reduced in newer models
                    like 4.5 Opus and to a lesser extent Sonnet, but agree it's
                     still sort of bad -- mainly because the question
                     you're posing is bad.
                    
                    if you ask a human this the answer can also often be "yes
                    [if we torture the library]", because software development
                    is magic and magic is the realm of imagination.
                    
                    much better prompt: "is this library designed to solve this
                    problem" or "how can we solve this problem? i am
                    considering using this library to do so, is that
                    realistic?"
       
                  stouset wrote 10 hours 0 min ago:
                  Or “can you prototype doing A via approaches X, Y, and Z,
                  and show me what each looks like?”
                  
                  I love to prototype various approaches. Sometimes I just want
                  to see which one feels like the most natural fit. The LLM can
                  do this in a tenth of the time I can, and I just need to get
                  a general idea of how each approach would feel in practice.
       
                    skydhash wrote 9 hours 43 min ago:
                    > Sometimes I just want to see which one feels like the
                    most natural fit.
                    
                     This sentence alone is a huge red flag in my book.
                     Either you know the problem domain and can argue
                     about which solution is better and why, or you don't,
                     and what you're doing is experimenting to learn the
                     domain.
                    
                     There's a reason the field is called Software
                     Engineering and not Software Art. Words like "feels"
                     do not belong. It would be like saying which bridge
                     design feels like the most natural fit for the load,
                     or which material feels like the most natural fit for
                     a brake system.
       
                      doug_durham wrote 6 hours 8 min ago:
                       Do you develop software?  Software is unlike any
                       physical engineering field.  The complexity of any
                       project beyond the most trivial is beyond human
                       ability to work with.  You have to switch from
                       analytic tools to more probabilistic ones.  That's
                       where "feels", "smells", or "looks" come in.
                       Software testing is not a solved problem, unlike
                       bridge testing.
       
                        skydhash wrote 5 hours 53 min ago:
                         So much FOSS software is made and maintained by a
                         single person. Much more is developed by very
                         small teams. Probabilistic tools aren't needed
                         anywhere.
       
                      fluidcruft wrote 9 hours 10 min ago:
                      For example sometimes you're faced with choosing between
                      high-quality libraries to adopt and it's not particularly
                      clear whether you picked the wrong one until after you've
                      tried integrating them. I've found it can be pretty
                      helpful to let the LLM try them all and see where the
                      issues ultimately are.
       
                        skydhash wrote 7 hours 32 min ago:
                        > sometimes you're faced with choosing between
                        high-quality libraries to adopt and it's not
                        particularly clear whether you picked the wrong one
                        until after you've tried integrating them.
                        
                         Maybe I'm lucky, but I've never encountered this
                         situation. It has been mostly about which
                         tradeoffs I'm willing to make. Libraries are more
                         lines of code added to the project, thus they are
                         liabilities. Including one is always a bad
                         decision, so I only do so because the alternative
                         is worse. Having to choose between two is more
                         like choosing between Scylla and Charybdis (known
                         tradeoffs) than deciding to go left or right in a
                         maze (mystery outcome).
       
                          fluidcruft wrote 7 hours 5 min ago:
                          It probably depends on what you're working on. For
                          the most part relying on a high-quality
                          library/module that already implements a solution is
                          less code to maintain. Any problems with the shared
                          code can be fixed upstream with more eyeballs and
                          more coverage than anything I build locally. I prefer
                          to keep my eyeballs on things most related to my
                          domain and not maintain stuff that's both ultimately
                          not terribly important and replaceable (if push comes
                          to shove).
                          
                          Generally, you are correct that having multiple
                          libraries to choose among is concerning, but it
                          really depends. Mostly it's stylistic choices and it
                          can be hard to tell how it integrates before trying.
       
                      mjr00 wrote 9 hours 32 min ago:
                       > There's a reason the field is called Software
                       Engineering and not Software Art. Words like
                       "feels" do not belong.
                      
                      Software development is nowhere near advanced enough for
                      this to be true. Even basic questions like "should this
                      project be built in Go, Python, or Rust?" or "should this
                      project be modeled using OOP and domain-driven design,
                      event-sourcing, or purely functional programming?" are
                      decided largely by the personal preferences of whoever
                      the first developer is.
       
                        skydhash wrote 7 hours 42 min ago:
                         Such questions may be decided by personal
                         preferences, but their impact can easily be
                         demonstrated. Such impacts are what F. Brooks
                         calls accidental complexity and what we generally
                         call technical debt. It's just that, unlike other
                         engineering fields, there are not a lot of
                         physical constraints, and the decision space has
                         many more dimensions.
       
                          mjr00 wrote 7 hours 24 min ago:
                          > Such questions may be decided by personal
                          preferences, but their impact can easily be
                          demonstrated.
                          
                          I really don't think this is true. What was the
                          demonstrated impact of writing Terraform in Go rather
                          than Rust? Would writing Terraform in Rust have
                          resulted in a better product? Would rewriting it now
                          result in a better product? Even among engineers with
                          15 years experience you're going to get differing
                          answers on this.
       
                            skydhash wrote 7 hours 7 min ago:
                            The impact is that now, if you want to modify the
                            project in some way, you will need to learn Go.
                            It's like all the codebases in COBOL. Maybe COBOL
                            at that time was the best language for the product,
                            but now, it's not that easy to find someone with
                            the knowledge to maintain the system. As soon as
                            you make a choice, you accept that further down the
                            line, there will be some X cost to keep going in
                            that direction and some Y cost to revert. As a
                            technical lead, more often you need to ensure that
                            X or/and Y don't grow to be enormous.
       
                              mjr00 wrote 6 hours 47 min ago:
                              > The impact is that now, if you want to modify
                              the project in some way, you will need to learn
                              Go.
                              
                              That's tautologically true, yes, but your claim
                              was
                              
                               > Either you know the problem domain and
                               can argue about which solution is better
                               and why, or you don't, and what you're
                               doing is experimenting to learn the domain.
                              
                               So, assuming the domain of
                               infrastructure-as-code is mostly known now,
                               which is a fair statement --
                              which is a better choice, Go or Rust, and why?
                              Remember, this is objective fact, not art, so no
                              personal preferences are allowed.
       
                                KronisLV wrote 3 hours 1 min ago:
                                > So, assuming the domain of
                                infrastructure-as-code is mostly known now
                                which is a fair statement -- which is a better
                                choice, Go or Rust, and why? Remember, this is
                                objective fact, not art, so no personal
                                preferences are allowed.
                                
                                I think it’s possible to engage with
                                questions like these head on and try to find an
                                answer.
                                
                                The problem is that if you want the answer to
                                be close to accurate, you might need both a lot
                                of input data about the situation (including
                                who’d be working with and maintaining the
                                software, what are their skills and weaknesses;
                                alongside the business concerns that impact the
                                timeline, the scale at which you’re working
                                with and a 1000 other things), as well as the
                                output of concrete suggestions might be a
                                flowchart so big it’d make people question
                                their sanity.
                                
                                It’s not impossible, just impractical with a
                                high likelihood of being wrong due to bad or
                                insufficient data or interpretation.
                                
                                But to humor the question: as an example, if
                                you have a small to mid size team with run of
                                the mill devs that have some traditional OOP
                                experience and have a small to mid
                                infrastructure size and complexity, but also
                                have relatively strict deadlines, limited
                                budget and only average requirements in regards
                                to long term maintainability and correctness
                                (nobody will die if the software doesn’t work
                                correctly every single time), then Go will be
                                closer to an optimal choice.
                                
                                I know that because I built an environment
                                management solution in Go, trying to do that in
                                Rust in the same set of circumstances
                                wouldn’t have been successful, objectively
                                speaking. I just straight up wouldn’t have
                                iterated fast enough to ship. Of course, I can
                                only give such a concrete answer for that very
                                specific set of example circumstances after the
                                fact. But even initially those factors pushed
                                me towards Go.
                                
                                If you pull any number of levers in a different
                                direction (higher correctness requirements,
                                higher performance requirements, different team
                                composition), then all of those can influence
                                the outcome towards Rust. Obviously every
                                detail about what a specific system must do
                                also influences that.
       
                                skydhash wrote 5 hours 44 min ago:
                                Neither. Because the solution for IaC is not Go
                                or Rust, just like the solution for composing
                                music is not a piano or a violin.
                                
                                A solution may be Terraform, another is
                                Ansible,… To implement that solution, you
                                need a programming language, but by then
                                you’re solving accidental complexity, not the
                                essential one attached to the domain. You may
                                 be solving implementation speed, hiring
                                 costs, code safety,… but you're not
                                 solving IaC.
       
                  nottorp wrote 10 hours 4 min ago:
                  Just be careful if functionality varies between library y
                  version 2 and library y version 3, or if there is a similarly
                  named library y2 that isn't the same.
                  
                  You may get possibilities, but not for what you asked for.
       
                    pdntspa wrote 9 hours 43 min ago:
                     If you run each idea to the point where you can
                     execute it and examine its outputs, problems like
                     that surface pretty quickly
       
                      nottorp wrote 9 hours 32 min ago:
                      Of course, by that time i could have read the docs for
                      library y the version I'm using...
       
                        pdntspa wrote 9 hours 20 min ago:
                        There are many roads to Rome...
       
              pdntspa wrote 10 hours 29 min ago:
              Me writing code is me spending 3/4 of my time wading through
              documentation and google searches. It's absolutely hell on my
              ADD. My ability to memorize is absolutely garbage. Throughout my
              career I've worked in like 10 different languages, and in any
              given project I'm usually working in at least 3 or 4. There's a
              lot of "now what is a map operation in this stupid fucking
              language called again?!"
              
              Claude writing code gets the same output if not better in about
              1/10 of the time.
              
              That's where you realize that the writing code bits are just one
              small part of the overall picture. One that I realize I could do
              without.
       
                skydhash wrote 9 hours 39 min ago:
                 I would say notetaking would be a much bigger help than
                 Claude at this point. There are a lot of methods for
                 organizing information that I believe would help you more
                 than a hallucination machine.
       
                  neoromantique wrote 9 hours 34 min ago:
                  Notetaking with ADHD is another sort of hell to be honest.
                  
                  I absolutely can attest to what parent is saying, I have been
                  developing software in Python for nearly a decade now and I
                  still routinely look up the /basics/.
                  
                   LLMs have been a complete gamechanger for me, reducing
                   the friction from "ok, let me google what I need in a
                   very roundabout way until my memory spits it out" to a
                   fast and often inline LLM lookup.
       
                    theshrike79 wrote 6 hours 12 min ago:
                    This is the thing. I _know_ what the correct solution looks
                    like.
                    
                    But figuring out what is the correct way in this particular
                    language is the issue.
                    
                    Now I can get the assistant to do it, look at it and go
                    "yep, that's how you iterate over an array of strings".
       
                    skydhash wrote 7 hours 49 min ago:
                    Looking up documentation is normal. If not, we wouldn't
                    have the manual pages in Unix and such an emphasis on
                    documentation in ecosystems like Lisp, Go, Python, Perl,...
                    We even have cheatsheets and syntax references books
                    because it's just so easy to forget the /basics/.
                    
                    I said notetaking, but it's more about building your own
                    index. In $WORK projects, I mostly use the browser
                    bookmarks, the ticket system, the PR description and
                    commits to contextually note things. In personal projects,
                    I have an org-mode file (or a basic text file) and a lot of
                    TODO comments.
       
                      neoromantique wrote 5 hours 16 min ago:
                      It is very hard to explain the extent of it to a person
                      who did not experience it, really.
                      
                      I have over a decade of experience, I do this stuff
                      daily, I don't think I can write a 10 line bash/python/js
                      script without looking up the docs at least a couple
                      times.
                      
                      I understand exactly what I need to write, but exact form
                      eludes my brain, so this Levenshtein-distance-on-drugs
                      machine that can parse my rambling + surrounding context
                      into valid syntax for what I need right at that time is
                      invaluable and I would even go as far as saying life
                      changing.
                      
                      I understand and hold high level concepts alright, I know
                      where stuff is in my codebase, I understand how it all
                      works down to very low levels, but the minutea of
                      development is very hard due to how my memory works (and
                      has always worked).
       
                        skydhash wrote 2 hours 46 min ago:
                         What I'm saying is that that's normal. Unless
                         you've worked every day with the same language
                         and a very small set of functions, you're bound
                         to forget signatures and syntax. What I'm
                         advocating is faster retrieval of the correct
                         information.
       
                          neoromantique wrote 50 min ago:
                           >Unless you've worked every day with the same
                           language
                          
                          ...I did.
       
                      pdntspa wrote 5 hours 49 min ago:
                       And all that takes rote mechanical work, which can
                       quickly lead to fractured focus, and suddenly I'm
                       pulled out of my flow.
                       
                       Or I can farm that stuff out to an LLM, stay in my
                       flow, and iterate at a speed that feels good.
       
                tayo42 wrote 10 hours 18 min ago:
                How do you end up with 3 to 4 languages in one project?
       
                  theshrike79 wrote 6 hours 10 min ago:
                   Go for the backend, something javascripty for the front
                   end. You're already at two. Depending on whether you
                   count HTML, CSS, or SQL as "languages", you're up to a
                   half dozen pretty quickly.
       
                  jessoteric wrote 6 hours 41 min ago:
                  i find it's pretty rare to have a project that only consists
                  of one or two languages, over a certain complexity/feature
                  threshold
       
                  zelphirkalt wrote 7 hours 57 min ago:
                  3 or 4 can very easily accumulate. For example: HTML, CSS as
                  must know, plus some JS/TS (actually that's 2 langs!) for
                  sprinkles of interactivity, backend in any proper backend
                  language. Oh wait, there is a fifth language, SQL, because we
                  need to access the database. Ah and those few shell scripts
                  we need? Someone's gotta write those too. They may not always
                  be full programming languages, but languages they are, and
                  one needs to know them.
       
                  merely-unlikely wrote 9 hours 38 min ago:
                  Recently I've been experimenting with using multiple
                  languages in some projects where certain components have a
                  far better ecosystem in one language but the majority of the
                  project is easier to write in a different one.
                  
                  For example, I often find Python has very mature and
                  comprehensive packages for a specific need I have, but it is
                  a poor language for the larger project (I also just hate
                  writing Python). So I'll often put the component behind a
                  http server and communicate that way. Or in other cases I've
                  used Rust for working with WASAPI and win32 which has some
                  good crates for it, but the ecosystem is a lot less mature
                  elsewhere.
                  
                  I used to prefer reinventing the wheel in the primary project
                  language, but I wasted so much time doing that. The tradeoff
                  is the project structure gets a lot more complicated, but
                  it's also a lot faster to iterate.
                  
                  Plus your usual html/css/js on the frontend and something
                  else on the backend, plus SQL.
       
                  pdntspa wrote 9 hours 48 min ago:
                  Oh my sweet summer child...
       
                  tomgp wrote 10 hours 6 min ago:
                  HTML, CSS, Javascript?
       
                  saulpw wrote 10 hours 8 min ago:
                  Typescript on the frontend, Python on the backend, SQL for
                  the database, bash for CI.  This isn't even counting HTML/CSS
                  or the YAML config.
       
                    tayo42 wrote 9 hours 34 min ago:
                    I wouldn't call html, yaml or css languages.
                    
                    Same for sql, do you really context switch between sql and
                    other code that frequently?
                    
                    Everyone should stop using bash, especially if you have a
                    scripting language you can use already.
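                     
                     As a hypothetical illustration of that last point, here
                     is a CI check that often lives in a bash one-liner
                     ("fail the build if any file exceeds 1 MiB"), rewritten
                     in Python, the kind of scripting language many projects
                     already have on hand:

```python
# A CI step commonly written in bash, redone in the project's scripting
# language: fail if any file in the tree is larger than 1 MiB.
import pathlib
import sys

LIMIT = 1024 * 1024  # 1 MiB

def oversized(root: str = ".") -> list[str]:
    # Collect files over the limit, skipping VCS metadata.
    return sorted(
        str(p) for p in pathlib.Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and p.stat().st_size > LIMIT
    )

if __name__ == "__main__":
    bad = oversized()
    for path in bad:
        print(f"too large: {path}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```

                     Unlike the equivalent find/awk pipeline, this version is
                     debuggable, testable, and readable by whoever maintains
                     the rest of the codebase.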
       
                      wosat wrote 7 hours 16 min ago:
                      Sorry for being pedantic, but what does the "L" stand for
                      in HTML, YAML, SQL?
                      They may not be "programming languages" or, in the case
                      of SQL, a "general purpose programming language", but
                      they are indeed languages.
       
                      pdntspa wrote 9 hours 23 min ago:
                      Dude have you even written any hardcore SQL? plpgSQL is
                       very much a Turing-complete language
       
                n4r9 wrote 10 hours 26 min ago:
                May be a domain issue? If you're largely coding within a JS
                framework (which most software devs are tbf) then that makes
                total sense. If you're working in something like fintech or
                games, perhaps less so.
       
                  pdntspa wrote 10 hours 21 min ago:
                  My last job was a mix of Ruby, Python, Bash, SQL, and
                  Javascript (and CSS and HTML). One or two jobs before that it
                  was all those plus a smattering of C. A few jobs before that
                  it was C# and Perl.
       
            n4r9 wrote 10 hours 32 min ago:
            I think we have different opinions on what's fun and what's boring!
       
              Nemi wrote 9 hours 19 min ago:
              You've really hit the crux of the problem and why so many people
              have differing opinions about AI coding. I also find coding more
              fun with AI. The reason is that my main goal is to solve a
              problem, or someone else's problem, in a way that is satisfying.
              I don't much care about the code itself anymore. I care about the
              thing that it does when it's done.
              
               Having said that, I used to be deep into coding, and I'm
               quite sure that back then I would have hated AI coding for
               me. I think for me it
              comes down to – when I was learning about coding and stretching
              my personal knowledge in the area, the coding part was the fun
              part because I was learning. Now that I am past that part I
              really just want to solve problems, and coding is the means to
              that end. AI is now freeing because where I would have been
              reluctant to start a project, I am more likely to give it a go.
              
              I think it is similar to when I used to play games a lot. When I
              would play a game where you would discover new items regularly, I
              would go at it hard and heavy up until the point where I
               determined there were either no new items to be found or it was
              just "more of the same". When I got to that point it was like a
              switch would flip and I would lose interest in the game almost
              immediately.
       
                altmanaltman wrote 2 hours 11 min ago:
                A few counterpoints:
                
                1. If you don't care about code and only care about the "thing
                that it does when it's done", how do you solve problems in a
                way that is satisfying? Because you are not really solving any
                problem but just using the AI to do it. Is prompting more
                satisfying than actually solving?
                
                2. You claim you're done "learning about coding and stretching
                my personal knowledge in the area" but don't you think that's
                super dangerous? Like how can you just be done with learning
                when tech is constantly changing and new things come up
                everyday. In that sense, don't you think AI use is actually
                making you learn less and you're just justifying it with the
                whole "I love solving problems, not code" thing?
                
                3. If you don't care about the code, do the people who hire you
                for it do? And if they do, then how can you claim you don't
                care about the code when you'll have to go through a review
                process and at least check the code meaning you have to care
                about the code itself, right?
       
                  danielmarkbruce wrote 59 min ago:
                  are you really solving the problem, or is the compiler doing
                  it?
       
                    altmanaltman wrote 25 min ago:
                    is the compiler really solving the problem or the
                    electricity flowing through the machine?
       
                      ukuina wrote 3 min ago:
                      Is it the electricity, or is it quantum entanglement with
                      Roko's Basilisk?
       
                  keeda wrote 1 hour 4 min ago:
                  Note I'm not saying one is better than the other, but my
                  takes:
                  
                  1. The problem solving is in figuring out what to prompt,
                  which includes correctly defining the problem, identifying a
                  potential solution, designing an architecture, decomposing it
                  into smaller tasks, and so on.
                  
                  Giving it a generic prompt like "build a fitness tracker"
                  will result in a fully working product but it will be bland
                  as it would be the average of everything in its training
                  data, and won't provide any new value. Instead, you probably
                  want to build something that nobody else has, because that's
                  where the value is. This will require you to get pretty deep
                  into the problem domain, even if the code itself is
                  abstracted away from you.
                  
                  Personally, once the shape of the solution and the code is
                  crystallized in my head typing it out is a chore. I'd rather
                  get it out ASAP, get the dopamine hit from seeing it work,
                  and move on to the next task. These days I spend most of my
                  time exploring the problem domain rather than writing code.
                  
                  2. Learning still exists but at a different level; in fact it
                  will be the only thing we will eventually be doing. E.g. I'm
                  doing stuff today that I had negligible prior background in
                  when I began. Without AI, I would probably require an
                   advanced course just to get up to speed. But now I'm learning
                  by doing while solving new problems, which is a brand new way
                  of learning! Only I'm learning the problem domain rather than
                  the intricacies of code.
                  
                  3. Statistically speaking, the people who hire us don't
                  really care about the code, they just want business results.
                  (See: the difficulty of funding tech debt cleanup projects!)
                  
                  Personally, I still care about the code and review
                  everything, whether written by me or the AI. But I can see
                  how even that is rapidly becoming optional.
                  
                  I will say this: AI is rapidly revolutionizing our field and
                  we need to adapt just as quickly.
       
                    skydhash wrote 1 min ago:
                    > The problem solving is in figuring out what to prompt,
                    which includes correctly defining the problem, identifying
                    a potential solution, designing an architecture,
                    decomposing it into smaller tasks, and so on
                    
                    Coding is just a formal specification, one that is suited
                    to be automatically executed by a dumb machine. The nice
                    trick is that the basic semantics units from a programming
                    language are versatile enough to give you very powerful
                     abstractions that can fit nicely with the solution you are
                    designing.
                    
                    > Personally, once the shape of the solution and the code
                    is crystallized in my head typing it out is a chore
                    
                     I truly believe that everyone who says typing is a
                     chore once they've got the shape of a solution gets
                     frustrated by the number of bad assumptions they've
                     made.
                    That ranges from not having a good design in place to not
                    learning the tools they're using and fighting it during the
                    implementation (Like using React in an imperative manner).
                     You may have something as extensive as a network protocol
                     RFC, and still get hit by conflicts between the spec and
                     what actually works.
       
                    altmanaltman wrote 20 min ago:
                    Honestly, I fundamentally disagree with this. Figuring out
                    "what to prompt" is not problem-solving in a true sense
                     imo. And if you're really going that deep into the problem
                     domain, what is the point of having the code abstracted?
                    
                    My comment was based on you saying you don't care about the
                    code and only what it does. But now you're saying you care
                    about the code and review everything so I'm not sure what
                    to make out of it. And again, I fundamentally disagree that
                    reviewing code will become optional or rather should become
                    optional. But that's my personal take.
       
                  pdntspa wrote 1 hour 23 min ago:
                  Why can't both things be true? You can care about the code
                  even if you don't write it. You can continue learning things
                  by reading said code. And you can very rigidly enforce code
                  quality guidelines and require the AI adhere to them.
       
                    altmanaltman wrote 26 min ago:
                    I mean if you're reading it and "rigidly" enforcing code
                    quality guidelines, then you do care about the code, right?
                    But the parent comment said they don't care about the code
                    but what it does. Both of them cannot be true at the same
                    time, since in your example, you do care about the code
                    enough to read it and refactor it based on guidelines and
                    not just "what the code" does.
       
                libraryofbabel wrote 5 hours 59 min ago:
                I like this framing; I think it captures some of the key
                differences between engineers who are instinctively
                enthusiastic about AI and those who are not.
                
                Many engineers walk a path where they start out very focussed
                on programming details, language choice, and elegant or clever
                solutions. But if you're in the game long enough, and
                especially if you're working in medium-to-large engineering
                orgs on big customer-facing projects, you usually kind of move
                on from it. Early in my career I learned half a dozen
                programming languages and prided myself on various arcane arts
                like metaprogramming tricks. But after a while you learn that
                one person's clever solution is another person's
                maintainability nightmare, and maybe being as boring and
                predictable and direct as possible in the code (if slightly
                more verbose) would have been better. I've maintained some
                systems written by very brilliant programmers who were just
                being too clever by half.
                
                You also come to realize that coding skills and language choice
                don't matter as much as you thought, and the big issues in
                engineering are 1) are you solving the right problem to begin
                with 2) people/communication/team dynamics 3) systems
                architecture, in that order of importance.
                
                And also, programming just gets a little repetitive after a
                while. Like you say, after a decade or so, it feels a bit like
                "more of the same." That goes especially for most of the
                programming most of us are doing most of the time in our day
                jobs. We don't write a lot of fancy algorithms, maybe once in a
                blue moon and even then you're usually better off with a
                library. We do CRUD apps and cookie-cutter React pages and so
                on and so on.
                
                If AI coding agents fall into your lap once you've reached that
                particular variation of a mature stage in your engineering
                career, you probably welcome them as a huge time saver and a
                means to solve problems you care about faster. After a decade,
                 I still love engineering, but there aren't many coding tasks I
                particularly relish diving into. I can usually vaguely picture
                the shape of the solution in my head out the gate, and actually
                sitting down and doing it feels rather a bore and just a lot of
                typing and details. Which is why it's so nice when I can kick
                off a Claude session to do it instead, and review the results
                to see if they match what I had in mind.
                
                Don't get me wrong. I still love programming if there's just
                the right kind of compelling puzzle to solve (rarer and rarer
                these days), and I still pride myself on being able to do it
                well. Come the holidays I will be working through Advent of
                Code with no AI assistance whatsoever, just me and vim. But
                when January rolls around and the day job returns I'll be
                having Claude do all the heavy lifting once again.
       
                  skydhash wrote 2 hours 30 min ago:
                   I'm guessing, but I'm pretty sure you're dealing with big
                   balls of mud, which have dampened your love of coding,
                   where implementing something is more about solving
                   accidental complexity and dealing with technical debt than
                   actually doing the job.
       
                    libraryofbabel wrote 1 hour 46 min ago:
                    I've seen some balls of mud, sure, but I don't think that's
                    the essence of it. It's more like:
                    
                    1) When I already have a rough picture of the solution to
                    some programming task in my head up front, I do not
                    particularly look forward to actually going and doing it.
                    I've done enough programming that many things feel like a
                    variation on something I've done before. Sometimes the task
                    is its own reward because there is a sufficiently hard and
                    novel puzzle to solve. Mostly it is not and it's just a
                    matter of putting in the time. Having Claude do most of the
                    work is perfect in those cases. I don't think this is
                    particularly anything to do with working on a ball of mud:
                    it applies to most kinds of work on clean well-architected
                    projects as well.
                    
                    2) I have a restless mind and I just don't find doing
                    something that interesting anymore once I have more or less
                    mastered it. I'd prefer to be learning some new field
                    (currently, LLMs) rather than spending a lot of time doing
                    something I already know how to do. This is a matter of
                    temperament: there is nothing wrong with being content in
                    doing a job you've mastered. It's just not me.
       
                      skydhash wrote 19 min ago:
                      > 1) When I already have a rough picture of the solution
                      to some programming task in my head up front, I do not
                      particularly look forward to actually going and doing it.
                      
                      Every time I think I have a rough picture of some
                      solution, there's always something in the implementation
                      that proves me wrong. Then it's reading docs and figuring
                      whatever gotchas I've stepped into. Or where I erred in
                      understanding the specifications. If something is that
                      repetitive, I refactor and try to make it simple.
                      
                      > I have a restless mind and I just don't find doing
                      something that interesting anymore once I have more or
                      less mastered it.
                      
                      If I've mastered something (And I don't believe I've done
                      so for pretty much anything), the next step is always
                      about eliminating the tedium of interacting with that
                      thing. Like a code generator for some framework or adding
                      special commands to your editor for faster interaction
                      with a project.
       
                ben_w wrote 6 hours 12 min ago:
                > > I think we have different opinions on what's fun and what's
                boring!
                
                > You've really hit the crux of the problem and why so many
                people have differing opinions about AI coding.
                
                Part of it perhaps, but there's also a huge variation in model
                output. I've been getting some surprisingly bad generations
                from ChatGPT recently, though I'm not sure if that's ChatGPT
                getting worse or me getting used to a much higher quality of
                code from Claude Code which seems to test itself before saying
                "done". I have no idea if my opinion will flip again now 5.2 is
                out.
                
                 And some people are bad communicators; clear communication is
                 an important skill for working with LLMs, though few will
                 recognise it because everyone knows what they themselves
                 meant by whatever words they use.
                
                 And some people are bad planners; planning is likewise an
                 important skill for breaking big tasks that LLMs can't do
                 into small ones they can.
       
                  danielmarkbruce wrote 37 min ago:
                  This isn't just in coding. My goodness the stuff I see people
                  write into an LLM and then say "see! It's stupid!". Some
                  people are naturally good at prompting and some people just
                  are not. The differences in output are dramatic.
       
                breuleux wrote 7 hours 18 min ago:
                I think it ultimately comes down to whether you care more about
                the what, or more about the how. A lot of coders love the
                craft: making code that is elegant, terse, extensible,
                maintainable, efficient and/or provably correct, and so on.
                These are the kind of people who write programming languages,
                database engines, web frameworks, operating systems, or small
                but nifty utilities. They don't want to simply solve a problem,
                they want to solve a problem in the "best" possible way
                (sometimes at the expense of the problem itself).
                
                It's typically been productive to care about the how, because
                it leads to better maintainability and a better ability to
                adapt or pivot to new problems. I suppose that's getting less
                true by the minute, though.
       
                  doug_durham wrote 6 hours 15 min ago:
                  Crafting code can be self-indulgent since most common
                  patterns have been implemented multiple times in multiple
                   languages.  A lot of the time, the craft-oriented developer will
                  reject an existing implementation because it doesn't match
                  their sensibilities.  There is absolutely a role for craft,
                  however the amount of craft truly needed in modern
                  development is not as large as people would like. There are
                  lots of well crafted libraries and frameworks that can be
                  adopted if you are willing to accommodate their world view.
       
                    breuleux wrote 5 hours 33 min ago:
                    As someone who does that a lot... I agree. Self-indulgent
                    is the word. It just feels great when the implementation is
                    a perfect fit for your brain, but sometimes that's just not
                    a good use of your time.
                    
                    Sometimes, you strike gold, so there's that.
       
                      sfn42 wrote 4 hours 11 min ago:
                      I kind of struggle with this. I basically hate everyone
                      elses code, and by that I mean I hate most people's code.
                      A lot of people write awesome code but most people write
                      what I'd call trash code.
                      
                      And I do think there's more to it than preference. Like
                      there's actual bugs in the code, it's confusing and
                      because it's confusing there's more bugs. It's solving a
                      simple problem but doing so in an unnecessarily
                      convoluted way. I can solve the same problem in a much
                      simpler way. But because everything is like this I can't
                      just fix it, there's layers and layers of this
                      convolution that can't just be fixed and of course
                      there's no proper decoupling etc so a refactor is kind of
                      all or nothing. If you start it's like pulling on a
                      thread and everything just unravels.
                      
                      This is going to sound pompous and terrible but honestly
                      some times I feel like I'm too much better than other
                      developers. I have a hard time collaborating because the
                      only thing I really want to do with other people's code
                      is delete it and rewrite it. I can't fix it because it
                      isn't fixable, it's just trash. I wish they would have
                      talked to me before writing it, I could have helped then.
                      
                      Obviously in order to function in a professional
                      environment i have to suppress this stuff and just let
                      the code be ass but it really irks me. Especially if I
                       need to build on something someone else made - it's almost
                      always ass, I don't want to build on a crooked
                      foundation. I want to fix the foundation so the rest of
                      the building can be good too. But there's no time and
                      it's exhausting fixing everyone else's messes all the
                      time.
       
                        pdntspa wrote 1 hour 15 min ago:
                         I feel this too. And the very worst code always
                         seems to come from the people who otherwise seem
                         the smartest. I've worked for a couple of people
                        that are either ACM alum and/or have their own
                        wikipedia page, multiple patents to their name and
                        leaders in business, and beyond anyone else that I have
                        ever worked with, their code has been the worst.
                        
                        Which is part of what I find so motivating with AI. It
                        is much better at making sense of that muck, and with
                        some guidance it can churn out code very quickly with a
                        high degree of readability.
       
                          danielmarkbruce wrote 48 min ago:
                          did you ever consider their code was good and it's
                          you that is the problem?
       
                        gmueckl wrote 2 hours 53 min ago:
                        I can guarantee you that if you were to write a
                        completely new program and continued to work on it for
                        more than 5 years, you'd feel the same things about
                        your own code eventually. It's just unavoidable at some
                         point. The only thing left then is degrees of badness.
                        And nothing is more humbling than realizing that the
                        only person that got you there is yourself.
       
                        KronisLV wrote 3 hours 19 min ago:
                        I’ve linked this before, but I feel like this might
                        resonate with you:
                        
 (HTM)                  [1]: https://www.stilldrinking.org/programming-suck...
       
                          sfn42 wrote 2 hours 56 min ago:
                          I enjoyed that but honestly it kind of doesn't really
                          resonate. Because it's like "This stuff is really
                          complicated and nobody knows how anything works etc
                          and that's why everything is shit".
                          
                          I'm talking about simple stuff that people just can't
                          do right. Not complex stuff. Like imagine some
                          perfect little example code on the react docs or
                          whatever, good code. Exemplary code. Trivial code
                          that does a simple little thing. Now imagine some
                          idiot wrote code to do exactly the same thing but
                          made it 8 times longer and incredibly convoluted for
                          absolutely no reason and that's basically what most
                          "developers" do. Everyone's a bunch of stupid
                          amateurs who can't do simple stuff right, that's my
                          problem. It's not understandable, it's not
                          justifiable, it's not trading off quality for speed.
                           It's stupidity, ignorance and laziness.
                          
                          That's why we have coding interviews that are
                          basically "write fizzbuzz while we watch" and when I
                          solve their trivial task easily everyone acts like
                          I'm Jesus because most of my peers can't fucking
                          code. Like literally I have colleagues with years of
                          experience who are barely at a first year CS level.
                          They don't know the basics of the language they've
                          been working with for years. They're amateurs.
       
                            KronisLV wrote 2 hours 45 min ago:
                            Then it’s quite possible that you’re working in
                            an environment that naturally leads to people like
                            that getting hired. If that’s something you see
                            repeatedly, then the environment isn’t a good fit
                            for you and you aren’t a good fit for it. So
                            you’d be better served by finding a place where
                            the standards are as high as you want, from the
                            very first moment in the hiring process.
                            
                            For example, Oxide Computers has a really
                             interesting approach [1]. Obviously that’s easier
                            said than done but there are quite a few orgs out
                            there like that. If everyone around you doesn’t
                            care about something or can’t do it, it’s
                            probably a systemic problem with the environment.
                            
 (HTM)                      [1]: https://oxide.computer/careers
       
                agumonkey wrote 7 hours 32 min ago:
                it's true that 'code' doesn't mean much, but the ability to
                manage different layers, states to produce logic modules was
                the challenge
                
                getting things solved entirely feels very very numbing to me
                
                even when gemini or chatgpt solves it well, and even beyond
                what i'd imagine.. i feel a sense of loss
       
                pdntspa wrote 9 hours 8 min ago:
                You are hitting the nail on the head. We are not being hired to
                write code. We are being hired to solve problems. Code is
                simply the medium.
       
                  agumonkey wrote 7 hours 30 min ago:
                  but do you solve the problem if you just slap a prompt and
                  iterate while the LLM gathers diffs ?
       
                    pdntspa wrote 5 hours 52 min ago:
                    If the client is happy, the code is well-formed, and it
                       solves their problem in a cost-effective manner, what is
                    not to like?
       
                      agumonkey wrote 5 hours 17 min ago:
                      cause the 'dev' didn't solve anything
                      
                      ultimately i wonder how long people will need devs at all
                      if you can all prompt your wishes
                      
                      some will be kept to fix the occasional hallucination and
                      that's it
       
                    ben_w wrote 6 hours 15 min ago:
                    Depends what the problem is.
                    
                    Sometimes you can, sometimes you have to break the problem
                    apart and get the LLM to do each bit separately, sometimes
                    the LLM goes funny and you need to solve it yourself.
                    
                    Customers don't want you wasting money doing by hand what
                    can be automated, nor do they want you ripping them off by
                    blindly handing over unchecked LLM output when it can't be
                    automated.
       
                      agumonkey wrote 5 hours 15 min ago:
                      there are other ways: being scammed by lazy devs using AI
                      to produce what devs normally do and not saving any money
                      for the customer. i mentioned it in another thread, i
                      heard first hand people say "i will never report how much
                      time savings i get from gemini, at best i'll say 1 day a
                      month"
       
                    eclipxe wrote 6 hours 37 min ago:
                    Yes?
       
                  wahnfrieden wrote 8 hours 37 min ago:
                  I believe wage work has a significant factor in all this.
                  
                  Most are not paid for results, they're paid for time at desk
                  and regular responsibilities such as making commits,
                  delivering status updates, code reviews, etc. - the daily
                  activities of work are monitored more closely than the
                   output. Most ESOPs grant so little equity that working
                   harder could never observably drive an increase in its value.
                  Getting a project done faster just means another project to
                  begin sooner.
                  
                  Naturally workers will begin to prefer the motions of the
                  work they find satisfying more than the result it has for the
                  business's bottom line, from which they're alienated.
       
                    order-matters wrote 5 hours 42 min ago:
                    I think it's related.  The nature of the wage work likely
                    also self-selects for people who simply enjoy coding and
                    being removed from the bigger picture problems they are
                    solving.
                    
                     I'm on the side of only enjoying coding as a way to solve
                     problems, and I skipped software engineering and coding
                     for work explicitly because I did not want to participate
                     in that dynamic of being removed from the problems.
                     Instead I went into business analytics, and now that AI is
                     gaining traction I am able to do more of what I love,
                     improving processes and automation, without ever really
                     needing to "pay dues" doing grunt work I never cared to be
                     skilled at in the first place unless it was necessary.
       
                    Sammi wrote 5 hours 56 min ago:
                    > Naturally workers will begin to prefer the motions of the
                    work they find satisfying more than the result it has for
                    the business's bottom line, from which they're alienated.
                    
                    Wow. I've read a lot of hacker news this past decade, but
                    I've never seen this articulated so well before. You really
                    lifted the veil for me here. I see this everywhere, people
                    thinking the work is the point, but I haven't been able to
                    crystallize my thoughts about it like you did just now.
       
                      thenewwazoo wrote 3 hours 2 min ago:
                      Marx had a lot of good ideas, though you wouldn't know it
                      by listening to capitalist-controlled institutions.
                      
 (HTM)                [1]: https://en.wikipedia.org/wiki/Marx%27s_theory_of...
       
              embedding-shape wrote 9 hours 29 min ago:
              Some people are into designing software, others like to put the
              design into implementation, others like cleaning up
              implementations yet others like making functional software
              faster.
              
              There is enough work for all of us to be handsomely paid while
              having fun doing it :) Just find what you like, and work with
              others who like other stuff, and you'll get through even the
              worst of problems.
              
               For me the fun comes not from the act of typing stuff with my
               sausage fingers and seeing characters end up on the screen, but
               basically everything before and after that. So if I can make
               "translate what's in my head into source on disk something can
               run" faster, that's a win in my book, but not if the quality
               degrades too much; I still want tight control over it while not
               having to use my fingers to actually type.
       
                mkehrt wrote 8 hours 30 min ago:
                 I've found that good AI-based tab completion is the sweet
                spot for me.  I am still writing code, but I don't have to type
                all of it if it's obvious.
       
                  OkayPhysicist wrote 5 hours 17 min ago:
                  This has been my approach, as well. I've got a neovim setup
                  where I can 1) open up a new buffer, ask a question, and then
                  copy/paste from it and 2) prompt the remainder of the line,
                  function, or class. (the latter two are commands I run,
                  rather than keybinds).
       
              AStrangeMorrow wrote 9 hours 33 min ago:
              I really enjoy writing some of the code. But some is a pain.
              Never have fun when the HQ team asks for API changes for the 5th
              time this month.  Or for that matter writing the 2000 lines of
              input and output data validation in the first place. Or
              refactoring that ugly dictionary passed all over the place to be
              a proper class/dataclass. Handling config changes. Lots of that
              piping job.
              
              Some tasks I do enjoy coding. Once in the flow it can be quite
              relaxing.
              
              But mostly I enjoy the problem solving part: coming up with the
               right algorithm, a nice architecture, the proper set of metrics
              to analyze etc
       
              moffkalast wrote 9 hours 56 min ago:
              He's a real straight shooter with upper management written all
              over him.
       
                SoftTalker wrote 8 hours 40 min ago:
                Ummm, yeah... I’m gonna have to go ahead and sort of disagree
                with you there.
       
                wpasc wrote 9 hours 43 min ago:
                but what would you say... you do here?
       
          fudged71 wrote 10 hours 36 min ago:
          This tells me that we need to build 1000 more linters of all kinds
       
            xnorswap wrote 10 hours 24 min ago:
            Unironically I agree.
            
            One under-discussed lever that senior / principal engineers can
            pull is the ability to write linters & analyzers that will stop
            junior engineers ( or LLMs ) from doing something stupid that's
            specific to your domain.
            
            Let's say you don't want people to make async calls while owning a
            particular global resource, it only takes a few minutes to write an
            analyzer that will prevent anyone from doing so.
            
            Avoid hours of back-and-forth over code review by encoding your
            preferences and taste into your build pipeline and stop it at
            source.
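             As a concrete sketch of such a domain-specific analyzer (Python
             here; the `global_resource_lock` name is an illustrative
             stand-in for whatever resource your domain cares about, not
             something from the comment above), a small `ast`-based check
             can flag `await` expressions issued while the lock is held:

```python
import ast

# Hypothetical domain rule: never await while holding this lock.
LOCK_NAME = "global_resource_lock"


def find_awaits_under_lock(source: str) -> list[int]:
    """Return line numbers of `await` expressions that occur inside a
    `with` / `async with` block acquiring LOCK_NAME."""
    hits: list[int] = []

    class Checker(ast.NodeVisitor):
        def __init__(self) -> None:
            # How many enclosing with-blocks currently hold the lock.
            self.lock_depth = 0

        def _acquires_lock(self, node) -> bool:
            return any(
                isinstance(item.context_expr, ast.Name)
                and item.context_expr.id == LOCK_NAME
                for item in node.items
            )

        def visit_With(self, node: ast.With) -> None:
            held = self._acquires_lock(node)
            self.lock_depth += held
            self.generic_visit(node)
            self.lock_depth -= held

        def visit_AsyncWith(self, node: ast.AsyncWith) -> None:
            held = self._acquires_lock(node)
            self.lock_depth += held
            self.generic_visit(node)
            self.lock_depth -= held

        def visit_Await(self, node: ast.Await) -> None:
            if self.lock_depth:
                hits.append(node.lineno)
            self.generic_visit(node)

    Checker().visit(ast.parse(source))
    return hits
```

             Wired into the build pipeline or a pre-commit hook, a rule like
             this turns a recurring code-review argument into an immediate
             build failure, for humans and LLMs alike.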
       
              jmalicki wrote 10 hours 14 min ago:
              And for more complex linters I find that it can be easy to get
              the LLM to write most of it itself!!!
       
          james_marks wrote 10 hours 47 min ago:
          This is a key part of the AI love/hate flame war.
          
          Very easy to write it off when it spins out on the open-ended
          problems, without seeing just how effective it can be once you zoom
          in.
          
          Of course, zooming in that far gives back some of the promised gains.
          
          Edit: typo
       
            hombre_fatal wrote 10 hours 35 min ago:
            Go one level up:
            
                 claude2() {
                   claude "$(claude "Generate a prompt and TODO list that works towards this goal: $*" -p)"
                 }

                 $ claude2 pls give ranked ideas for make code better
       
            thewebguyd wrote 10 hours 37 min ago:
            > without seeing just how effective it can be once you zoom in.
            
            The love/hate flame war continues because the LLM companies aren't
            selling you on this. The hype is all about "this tech will enable
            non-experts to do things they couldn't do before" not "this tech
            will help already existing experts with their specific niche,"
            hence the disconnect between the sales hype and reality.
            
            If OpenAI, Anthropic, Google, etc. were all honest and tempered
             their own hype and misleading marketing, I doubt there would even
            be a flame war. The marketing hype is "this will replace employees"
            without the required fine print of "this tool still needs to be
            operated by an expert in the field and not your average non
            technical manager."
       
              hombre_fatal wrote 10 hours 26 min ago:
               The number of GUIs I've vibe-coded works against your claim.
              
              As we speak, my macOS menubar has an iStat Menus replacement, a
              Wispr Flow replacement (global hotkey for speech-to-text), and a
              logs visualizer for the `blocky` dns filtering program -- all of
              which I built without reading code aside from where I was
              curious.
              
              It was so vibe-coded that there was no reason to use SwiftUI nor
              set them up in Xcode -- just AppKit Swift files compiled into
              macOS apps when I nix rebuild.
              
              The only effort it required was the energy to QA the LLM's
              progress and tell it where to improve, maybe click and drag a
              screenshot into claude code chat if I'm feeling excessive.
              
               Where do my 20 years of software dev experience fit into this
               beyond imparting my aesthetic preferences?
              
              In fact, insisting that you write code yourself is becoming a
              liability in an interesting way: you're going to make trade-offs
              for DX that the LLM doesn't have to make, like when you use
              Python or Electron when the LLM can bypass those abstractions
              that only exist for human brains.
       
                bopbopbop7 wrote 10 hours 9 min ago:
                You making a couple of small GUIs that could have been made
                with a drag and drop editor 10 years ago doesn't work against
                 his claim as much as you think. You're just telling on
                 yourself and your "20 years" of supposed dev experience.
       
                  hombre_fatal wrote 9 hours 58 min ago:
                  Dragging UI components into a WYSIWYG editor is <1% of
                  building an app.
                  
                  Else Visual Basic and Dreamweaver would have killed software
                  engineering in the 90s.
                  
                  Also, I didn't make them. A clanker did. I can see this topic
                  brings out the claws. Honestly I used to have the same
                  reaction, and in a large way I still hate it.
       
                    bopbopbop7 wrote 9 hours 47 min ago:
                    It's not bringing out claws, it's just causing certain
                    developers to out themselves.
       
                      hombre_fatal wrote 8 hours 57 min ago:
                      Outs me as what, exactly?
                      
                       I'm not sure you're interacting with a single claim
                       I've made so far.
       
                onethought wrote 10 hours 16 min ago:
                Love that you are disagreeing with parent by saying you built
                software all on your own, and you only had 20 years software
                experience.
                
                Isn't that the point they are making?
       
                  hombre_fatal wrote 10 hours 15 min ago:
                  Maybe I didn't make it clear, but I didn't build the software
                  in my comment. A clanker did.
                  
                  Vibe-coding is a claude code <-> QA loop on the end result
                  that anyone can do (the non-experts in his claim).
                  
                   An example of a cycle looks like "now add an Options tab
                   that lets me customize the global hotkey" where I'm only an
                  end-user.
                  
                  Once again, where do my 20 years of software experience come
                  up in a process where I don't even read code?
       
                    thewebguyd wrote 9 hours 32 min ago:
                    > An example of a cycle looks like "now add an Options tab
                     that lets me customize the global hotkey" where I'm only
                    an end-user
                    
                    Which is a prompt that someone with experience would write.
                    Your average, non-technical person isn't going to prompt
                    something like that, they are going to say "make it so I
                    can change the settings" or something else super vague and
                    struggle. We all know how difficult it is to define
                    software requirements.
                    
                    Just because an LLM wrote the actual code doesn't mean your
                    prompts weren't more effective because of your experience
                    and expertise in building software.
                    
                    Sit someone down in front of an LLM with zero development
                    or UI experience at all and they will get very different
                    results. Chances are they won't even specify "macOS menu
                    bar app" in the prompt and the LLM will end up trying to
                    make them a webapp.
                    
                    Your vibe coding experience just proves my initial point,
                    that these tools are useful for those who already have
                    experience and can lean on that to craft effective prompts.
                    Someone non-technical isn't going to make effective use of
                    an LLM to make software.
       
                      hombre_fatal wrote 8 hours 38 min ago:
                       Counterpoint: [1]
                       
                       Your original claim:
                      
                      > The hype is all about "this tech will enable
                      non-experts to do things they couldn't do before"
                      
                       Are you saying that prompts like "make a macOS weather
                       app for me" and "make an options menu that lets me set
                       my location" are only something an expert can do?
                      
                      I need to know what you think their expertise is in.
                      
 (HTM)                [1]: https://news.ycombinator.com/item?id=46234943
       
                      ModernMech wrote 9 hours 16 min ago:
                      Here's how I look at it as a roboticist:
                      
                      The LLM prompt space is an ND space where you can start
                      at any point, and then the LLM carves a path through the
                      space for so many tokens using the instructions you
                      provided, until it stops and asks for another direction.
                      This frames LLM prompt coding as a sort of navigation
                      task.
                      
                      The problem is difficult because at every decision point,
                      there's an infinite number of things you could say that
                      could lead to better or worse results in the future.
                      
                      Think of a robot going down the sidewalk. It controls
                      itself autonomously, but it stops at every intersection
                      and asks "where to next boss?" You can tell it either to
                      cross the street, or drive directly into traffic, or do
                      any number of other things that could cause it to get
                      closer to its destination, further away, or even to
                      obliterate itself.
                      
                      In the concrete world, it's easy to direct this robot,
                      and to direct it such that it avoids bad outcomes, and to
                      see that it's achieving good outcomes -- it's physically
                      getting closer to the destination.
                      
                      But when prompting in an abstract sense, its hard to see
                      where the robot is going unless you're an expert in that
                      abstract field. As an expert, you know the right way to
                      go is across the street. As a novice, you might tell the
                      LLM to just drive into traffic, and it will happily
                      oblige.
                      
                      The other problem is feedback. When you direct the
                      physical robot to drive into traffic, you witness its
                      demise, its fate is catastrophic, and if you didn't
                      realize it before, you'd see the danger then. The robot
                      also becomes incapacitated, and it can't report falsely
                      about its continued progress.
                      
                      But in the abstract case, the LLM isn't obliterated, it
                      continues to report on progress that isn't real, and as a
                       non-expert, you can't tell it's been flattened into a
                      pancake. The whole output chain is now completely and
                      thoroughly off the rails, but you can't see the
                      smoldering ruins of your navigation instructions because
                      it's told you "Exactly, you're absolutely right!"
       
                    onethought wrote 10 hours 11 min ago:
                     But "anyone" didn't do it... you, an expert in software
                     development, did it.
                    
                     I would hazard a guess that your knowledge led to better
                     prompts and a better approach... heck, even understanding
                     how to build a status bar menu on macOS is slightly
                     expert knowledge.
                    
                    You are illustrating the GP's point, not negating it.
       
                      hombre_fatal wrote 10 hours 4 min ago:
                       > I would hazard a guess that your knowledge led to
                       better prompts and a better approach... heck, even
                       understanding how to build a status bar menu on macOS
                       is slightly expert knowledge.
                      
                      You're imagining that I'm giving Claude technical advice,
                      but that is the point I'm trying to make: I am not.
                      
                      This is what "vibe-coding" tries to specify.
                      
                      I am only giving Claude UX feedback from using the app it
                      makes. "Add a dropdown that lets me change the girth".
                      
                      Now, I do have a natural taste for UX as a software user,
                      and through that I can drive Claude to make a pretty good
                      app. But my software engineering skills are not
                      utilized... except for that one time I told Claude to use
                      an AGDT because I fancy them.
       
                        ModernMech wrote 9 hours 32 min ago:
                        My mother wouldn't be able to do what you did. She
                        wouldn't even know where to start despite using LLMs
                        all the time. Half of my CS students wouldn't know
                        where to start either. None of my freshman would. My
                        grad students can do this but not all of them.
                        
                         Your 20 years is assisting you in ways you don't know;
                         you're so experienced you don't know what it means to
                         be inexperienced anymore. Now, it's true you probably
                         don't need 20 years to do what you did, but you need
                         some experience. It's not that the task you posed to
                         the LLM is trivial for everyone due to the LLM; it's
                         that it's trivial for you because you have 20 years of
                         experience. For people with experience, the LLM makes
                         moderate tasks trivial, hard tasks moderate, and
                         impossible tasks technically doable.
                        
                        For example, my MS students can vibe code a UI, but
                        they can't vibe code a complete bytecode compiler. They
                        can use AI to assist them, but it's not a trivial task
                        at all, they will have to spend a lot of time on it,
                        and if they don't have the background knowledge they
                        will end up mired.
       
                          hombre_fatal wrote 8 hours 40 min ago:
                          The person at the top of the thread only made a claim
                          about "non-experts".
                          
                          Your mom wouldn't vibe-code software that she wants
                          not because she's not a software engineer, but
                          because she doesn't engage with software as a user at
                          the level where she cares to do that.
                          
                          Consider these two vibe-coded examples of waybar apps
                          in r/omarchy where the OP admits he has zero software
                          experience:
                          
                           - Weather app: [1]
                           
                           - Activity monitor app: [2]
                           
                           That is a direct refutation of OP's claim. The LLM
                           enabled a non-expert to build something they
                           couldn't before.
                          
                          Unless you too think there exists a necessary
                          expertise in coming up with these prompts:
                          
                          - "I want a menubar app that shows me the current
                          weather"
                          
                          - "Now make it show weather in my current location"
                          
                          - "Color the temperatures based on hot vs cold"
                          
                          - "It's broken please find out why"
                          
                          Is "menubar" too much expertise for you? I just asked
                          claude "what is that bar at the top of my screen with
                          all the icons" and it told me that it's macOS'
                          menubar.
                          
 (HTM)                    [1]: https://www.reddit.com/r/waybar/comments/1p6...
 (HTM)                    [2]: https://www.reddit.com/r/omarchy/comments/1p...
       
                            ModernMech wrote 6 hours 57 min ago:
                            I didn't make clear I was responding to your
                            question:
                            
                            "Where do my 20 years of software dev experience
                             fit into this beyond imparting my aesthetic
                            preferences?"
                            
                            Anyway, I think you kind of unintentionally proved
                            my point. These two examples are pretty trivial as
                            far as software goes, and it enabled someone with a
                            little technical experience to implement them where
                            before they couldn't have.
                            
                            They work well because:
                            
                             a) the full implementation for these apps doesn't
                             even fill up the AI context window. It's easy to
                             keep the LLM on task.
                            
                             b) it's a tutorial-style app that people often
                             write as "babby's first UI widget", so there are
                            thousands of examples of exactly this kind of thing
                            online; therefore the LLM has little trouble
                            summoning the correct code in its entirety.
                            
                            But still, someone with zero technical experience
                            is going to be immediately thwarted by the prompts
                            you provided.
                            
                            Take the first one "I want a menubar app that shows
                            me the current weather". [1] ChatGPT response:
                            "Nice — here's a ready-to-run macOS menubar app
                            you can drop into Xcode..."
                            
                            She's already out of her depth by word 11. You
                            expect your mom to use Xcode? Mine certainly can't.
                            Even I have trouble with Xcode and I use it for
                            work. Almost every single word in that response
                            would need to be explained to her, it might as well
                            be a foreign language.
                            
                            Now, the LLM could help explain it to her, and
                            that's what's great about them. But by the time she
                            knows enough to actually find the original response
                            actionable, she would have gained... knowledge and
                            experience enough to operate it just to the level
                            of writing that particular weather app. Though
                            having done that, it's still unreasonable to now
                            believe she could then use the LLM to write a
                            bytecode compiler, because other people who have a
                            Ph.D. in CS can. The LLM doesn't level the playing
                            field, it's still lopsided toward the Ph.D.s /
                            senior devs with 20 years exp.
                            
 (HTM)                      [1]: https://chatgpt.com/share/693b20ac-dcec-80...
       
                            bopbopbop7 wrote 8 hours 5 min ago:
                            Your best examples of non-experts are two Linux
                            power users?
       
          kccqzy wrote 10 hours 50 min ago:
          Not at all my experience. I’ve often tried things like telling
          Claude this SIMD code I wrote performed poorly and I needed some
          ideas to make it go faster. Claude usually does a good job rewriting
          the SIMD to use different and faster operations.
       
            mainmailman wrote 10 hours 33 min ago:
            I'm not a C++ programmer, but wouldn't your example be a fairly
            structured problem? You wanted to improve performance of a specific
            part of your code base.
       
            zahlman wrote 10 hours 35 min ago:
            That sounds like a pretty "structured" problem to me.
       
              kccqzy wrote 9 hours 53 min ago:
              Performance optimization isn’t structured at all. I find it
              amazing that without access to profilers or anything Claude is
              able to respond to “anything I can do to improve the speed”
              with acceptable results.
       
              chrneu wrote 10 hours 33 min ago:
               that's one of the problems with AI: as it can accomplish more
               tasks, people will overestimate its ability.
              
              what the person you replied to had claude do is relatively simple
              and structured, but to that person what claude did is
              "automagic".
              
              People already vastly overestimate AI's capabilities. This
              contributes to that.
       
          plufz wrote 10 hours 50 min ago:
           I think slash commands are great for helping Claude with this. I
           have many, like /code:dry and /code:clean-code, that have a
           semi-long prompt and references to longer docs, to review code
           from a specific perspective. I think it at least improves Claude
           a bit in this area. Like processes or templates for thinking in
           broader ways. But yes, I agree it struggles a lot in this area.
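           (For context, Claude Code custom slash commands are plain markdown
           prompt files under `.claude/commands/`. The file below is an
           illustrative sketch of the setup described above, not the
           commenter's actual files; the path, wording, and referenced doc
           are all assumptions, and namespacing conventions can vary between
           Claude Code versions.)

```markdown
<!-- .claude/commands/code/dry.md (illustrative path) -->
Review the code touched in this session for duplication.

- Identify logic implemented in more than one place.
- Propose a shared helper only where the duplication is real,
  not coincidental similarity.
- Consult docs/style/dry.md for the project's guidelines.

Focus area: $ARGUMENTS
```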
       
            airstrike wrote 10 hours 41 min ago:
            Somewhat tangential but interestingly I'd hate for Claude to make
            any changes with the intent of sticking to "DRY" or "Clean Code".
            
            Neither of those are things I follow, and either way design is
            better informed by the specific problems that need to be solved
            rather than by such general, prescriptive principles.
       
              plufz wrote 7 hours 29 min ago:
              I agree, so obviously I direct it with more info and point it to
              code that I believe needs more of specific principles. But
              generally I would like Claude to produce more DRY code, it is
              great at reimplementing the same thing in five places instead of
              making a shared utility module.
       
                airstrike wrote 6 hours 52 min ago:
                I see, and I definitely agree with that last statement. It
                tends to rewrite stuff. I feel like it should pay me back
                10,000 tokens each time it increases the API surface
       
              SketchySeaBeast wrote 9 hours 50 min ago:
               I'm not sure how to interpret someone saying they don't follow
               DRY.  Do you mean taking it to the zealous extreme, or do you
               abhor helper functions?  Is this a "No True Scotsman" thing?
       
                airstrike wrote 7 hours 59 min ago:
                 I just think DRY is overblown. I let code grow, and when
                 parts of it become obvious to abstract, I refactor them into
                 something self-contained. I learned this from an ice wizard.
                
                When I was younger, writing Python rather than Rust, I used to
                go out of my way to make everything DRY, DRY, DRY everywhere
                from the outset. Class-based views in Django come to mind.
                
                Today, I just write code, and after it's working I go back and
                clean things up where applicable. Not because I'm "following a
                principle", but because it's what makes sense in that specific
                instance.
       
                Pannoniae wrote 9 hours 5 min ago:
                Not GP but I can strongly relate to it. Most of the programming
                I do is related to me making a game.
                
                 I follow WET principles (write everything twice, at least)
                 because the abstraction penalty is huge, both in performance
                 and in design: a bad abstraction makes all subsequent
                 content much slower to build, which I can't afford as a
                 small developer.
                
                Same with most other "clean code" principles. My codebase is
                ~70K LoC right now, and I can keep most of it in my head. I
                used to try to make more functional, more isolated and
                encapsulated code, but it was hard to work with and most
                importantly, hard to modify. I replaced most of it with global
                variables, shit works so much better.
                
                I do use partial classes pretty heavily though - helps LLMs not
                go batshit insane from context overload whenever they try to
                read "the entire file".
                
                Models sometimes try to institute these clean code practices
                but it almost always just makes things worse.
       
                  SketchySeaBeast wrote 8 hours 16 min ago:
                   OK, I can follow WET-before-you-DRY; to me that's just a
                   non-zealous version of Don't Repeat Yourself.
                  
                  I think, if you're writing code where you know the entire
                  code base, a lot of the clean principles seem less important,
                  but once you get someone who doesn't, and that can be you
                  coming back to the project in three months, suddenly they
                  have value.
       
        maddmann wrote 11 hours 2 min ago:
         lol 5000 tests. Agentic code tools have a significant bias to add
         versus remove/condense. This leads to a lot of bloat and orphaned
         code. Definitely something that still needs to be solved by agentic
         tools.
       
          nosianu wrote 10 hours 17 min ago:
          > Agentic code tools have a significant bias to add versus
          remove/condense.
          
          Your point stands uncontested by me, but I just wanted to mention
          that humans have that bias too.
          
          Random link (has the Nature study link): [1]
          
 (HTM)    [1]: https://blog.benchsci.com/this-newly-proven-human-bias-cause...
 (HTM)    [2]: https://en.wikipedia.org/wiki/Additive_bias
       
            maddmann wrote 7 hours 51 min ago:
            Great point, interesting how agents somehow pick up the same bias.
       
          oofbey wrote 10 hours 32 min ago:
          Oh I’ve had agents remove tests plenty of times. Or cripple the
          tests so they pass but are useless - more common and harder to prompt
          against.
       
            maddmann wrote 7 hours 48 min ago:
            Ah true, that also can happen — in aggregate I think models will
             tend to expand codebases rather than contract them. Though this
             is anecdotal, it is probably something AI labs and coding-agent
             companies are looking at now.
       
              oofbey wrote 5 hours 16 min ago:
              It’s the same bias for action which makes them code up a change
              when you genuinely are just asking a question about something.
              They really want to write code.
       
        f311a wrote 11 hours 6 min ago:
         I like to ask LLMs to find problems or improvements in 1-2 files.
         They are pretty good at finding bugs, but for general code
         improvements, 50-60% of the edits are trash. They add completely
         unnecessary stuff. If you ask them to improve pretty well-written
         code, they rarely say it's good enough already.
        
        For example, in a functional-style codebase, they will try to rewrite
        everything to a class. I have to adjust the prompt to list things that
         I'm not interested in. And some inexperienced people are trying to
         write better code by learning from such LLM changes...
       
          ryandrake wrote 9 hours 57 min ago:
          I asked Claude the other day to look at one of my hobby projects that
          has a client/server architecture and a bespoke network protocol, and
          brainstorm ideas for converting it over to HTTP, JSON-RPC, or
          something else standards-based. I specifically told it to "go wild"
          and really explore the space. It thought for a while and provided a
          decent number of suggestions (several I was unaware of) with
          "verdicts". Ultimately, though, it concluded that none of them were
          ideal, and that the custom wire protocol was fine and appropriate for
          the project. I was kind of shocked at this conclusion: I expected it
          to behave like that eager intern persona we all have come to
          expect--ready to rip up the code and "do things."
       
          pawelduda wrote 10 hours 45 min ago:
          If you just ask it to find problems, it will do its best to find them
          - like running a while loop with no return condition. That's why I
           put a breaker in the prompt, which in this case would be "don't
           make any improvements if the positive impact is marginal". I've
          mostly seen it do nothing and just summarize why, followed by some
          suggestions in case I still want to force the issue
       
            f311a wrote 10 hours 37 min ago:
            I guess "marginal impact" for them is a pretty random metric, which
            will be different on each run. Will try it next time.
            
            Another problem is that they try to add handling of different cases
            that are never present in my data. I have to mention that there is
            no need to update handling to be more generalized. For example, my
            code handles PNG files, and they add JPG handling that never
            happens.
       
        websiteapi wrote 11 hours 9 min ago:
        you gotta be strategic about it. so for example for tests, tell it to
        use equivalence testing and to prove it, e.g. create a graph of
        permutations of arguments and their equivalences from the underlying
        code, and then use such thing to generate the tests.
        
        telling it to do better without any feedback obviously is going to go
        nowhere fast.
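         A sketch of the equivalence-testing idea in TypeScript (the
         functions are hypothetical stand-ins): keep an obviously-correct
         reference implementation and have the generated tests prove the
         candidate agrees with it across an enumerated argument space,
         instead of asserting hand-picked values.

```typescript
// Hypothetical example of equivalence testing: a slow, obviously-correct
// reference implementation and a faster candidate under test.

// Reference: trial division over every d in 1..n.
function sumOfDivisorsRef(n: number): number {
  let s = 0;
  for (let d = 1; d <= n; d++) if (n % d === 0) s += d;
  return s;
}

// Candidate: pairs divisors around sqrt(n), so it only loops to sqrt(n).
function sumOfDivisorsFast(n: number): number {
  let s = 0;
  for (let d = 1; d * d <= n; d++) {
    if (n % d === 0) {
      s += d;
      if (d !== n / d) s += n / d; // add the paired divisor once
    }
  }
  return s;
}

// Instead of hand-picked expected values, enumerate the argument space
// and prove pointwise equivalence against the reference.
for (let n = 1; n <= 1000; n++) {
  if (sumOfDivisorsRef(n) !== sumOfDivisorsFast(n)) {
    throw new Error(`divergence at n=${n}`);
  }
}
console.log("equivalent on 1..1000");
```

         The same shape scales to multi-argument functions by iterating a
         grid of argument permutations, which an agent can generate
         mechanically once it is told to prove equivalence rather than just
         to "write tests".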
       
        m101 wrote 11 hours 15 min ago:
        This is a great example of there being no intelligence under the hood.
       
          Terretta wrote 10 hours 21 min ago:
          Just as enterprise software is proof positive of no intelligence
          under the hood.
          
          I don't mean the code producers, I mean the enterprise itself is not
          intelligent yet it (the enterprise) is described as developing the
          software.  And it behaves exactly like this, right down to deeply
          enjoying inflicting bad development/software metrics (aka BD/SM) on
          itself, inevitably resulting in:
          
 (HTM)    [1]: https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
       
          xixixao wrote 11 hours 5 min ago:
          Would a human perform very differently? A human who must obey orders
          (like maybe they are paid to follow the prompt). With some "magnitude
          of work" enforced at each step.
          
          I'm not sure there's much to learn here, besides it's kinda fun,
          since no real human was forced to suffer through this exercise on the
          implementor side.
       
            Yeask wrote 3 hours 8 min ago:
             A human trained with 0.00000001% of the money OpenAI uses to
             train models will perform better.

             A human with no training will perform worse.
       
            nosianu wrote 10 hours 25 min ago:
            > Would a human perform very differently?
            
            How useful is the comparison with the worst human results? Which
            are often due to process rather than the people involved.
            
            You can improve processes and teach the humans. The junior will
            become a senior, in time. If the processes and the company are bad,
            what's the point of using such a context to compare human and AI
            outputs? The context is too random and unpredictable. Even if you
            find out AI or some humans are better in such a bad context, what
            of it? The priority would be to improve the process first for best
            gains.
       
            thatwasunusual wrote 10 hours 31 min ago:
            No (human) developer would _add_ tests. ^/s
       
            Capricorn2481 wrote 10 hours 52 min ago:
            > Would a human perform very differently?
            
            Yes.
       
            wongarsu wrote 10 hours 55 min ago:
            > A human who must obey orders (like maybe they are paid to follow
            the prompt). With some "magnitude of work" enforced at each step
            
            Which describes a lot of outsourced development. And we all know
            how well that works
       
              theshrike79 wrote 6 hours 0 min ago:
              Using outsourced coders is a skill like any other. There are
              cultural things you need to consider etc.
              
              It's not hard, just different.
       
        kderbyma wrote 2 days ago:
         Yeah. I noticed Claude suffers when it reaches context overload -
         it's too opinionated, so it shortens its own context with decisions
         I would never make, yet I see it telling itself that the shortcuts
         are a good idea because the project is complex. Then it gets into a
         loop where it second-guesses its own decisions, forgets the
         context, and continues to spiral uncontrollably into deeper and
         deeper failures - often missing the obvious glitch and instead
         looking into imaginary land for answers, constantly diverting the
         solution from patching to completely rewriting...
        
        I think it suffers from performance anxiety...
        
        ----
        
        The only solution I have found is to - rewrite the prompt from scratch,
        change the context myself, and then clear any "history or memories" and
        then try again.
        
        I have even gone so far as to open nested folders in separate windows
        to "lock in" scope better.
        
         As soon as I see the agent say "Wait, that doesn't make sense, let
         me review the code again", it's cooked.
       
          rtp4me wrote 10 hours 51 min ago:
          For me, too many compactions throughout the day eventually lead to a
          decline in Claude's thinking ability.  And, during that time, I have
          given it so much context to help drive the coding interaction.    Thus,
          restarting Claude requires me to remember the small bits of "nuggets"
          we discovered during the last session so I find myself repeating the
          same things every day (my server IP is: xxx, my client IP is: yyy,
          the code should live in directory: a/b/c).  Using the resume feature
          with Claude simply brings back the same decline in thinking that led
          me to stop it in the first place.  I am sure there is a better way to
          remember these nuggets between sessions but I have not found it yet.
       
          snarf21 wrote 10 hours 58 min ago:
          That has been my greatest stumbling block with these AI agents:
          context. I was trying to have one help vibe code a puzzle game and
          most of the time I added a new rule it broke 5 existing rules. It
          also never approached the rules engine with a context of building a
          reusable abstraction, just Hammer meet Nail.
       
          flowerthoughts wrote 11 hours 3 min ago:
          There's no -c on the command line, so I'm guessing this is starting
          fresh every iteration, unless claude(1) has changed the default
          lately.
       
          embedding-shape wrote 11 hours 4 min ago:
          > Yeah. I noticed Claud suffers when it reaches context overload
          
          All LLMs degrade in quality as soon as you go beyond one user message
          and one assistant response. If you're looking for accuracy and
          highest possible quality, you need to constantly redo the
          conversations from scratch, never go beyond one user message.
          
          If the LLM gets it wrong in their first response, instead of saying
          "No, what I meant was...", you need to edit your first response, and
          re-generate, otherwise the conversation becomes "poisoned" almost
          immediately, and every token generated after that will suffer.
       
            torginus wrote 10 hours 2 min ago:
             Yeah, I used to write some fiction for myself with LLMs as a
             recreational pastime. It's funny to see how, as the story gets
             longer, LLMs progressively get dumber, start repeating
             themselves, or become unhinged.
       
          someguyiguess wrote 11 hours 9 min ago:
           There’s definitely a certain point I reach when using Claude Code
           where I have to make the specifications so specific that it
           becomes more work than just writing the code myself.
       
          SV_BubbleTime wrote 11 hours 15 min ago:
          I’m keeping Claude’s tasks small and focused, then if I can I
          clear between.
          
          It’s REAL FUCKING TEMPTING to say ”hey Claude, go do this thing
          that would take me hours and you seconds” because he will happily,
          and it’ll kinda work. But one way or another you are going to put
          those hours in.
          
          It’s like programming… is proof of work.
       
            thevillagechief wrote 11 hours 9 min ago:
            Yes, this is exactly true. You will put in those hours.
       
              whatshisface wrote 10 hours 40 min ago:
              In this vein, one of the biggest time-savers has turned out to be
              its ability to make me realize I don't want to do something.
       
                SV_BubbleTime wrote 7 hours 55 min ago:
                I get that. But I think the AI-deriders are a bit nuts
                sometimes because while I’m not running around crying about
                AGI… it’s really damn nice to change the arguments of a
                function and have it just go everywhere and adjust every
                invocation of that function to work properly. Something that
                might take me 10-30 minutes is now seconds and it’s not
                outside of its reliability spectrum.
                
                Vibe coding though, super deceptive!
       
        written-beyond wrote 3 days ago:
        > I like Rust's result-handling system, I don't think it works very
        well if you try to bring it to the entire ecosystem that already is
        standardized on error throwing.
        
         I disagree; it's very useful even in languages that have
         exception-throwing conventions. It's good enough to be the return
         type of the Promise.allSettled API.
        
         The problem is that when I don't have a result type, I end up
         approximating it anyway in other ways. For a quick project I'd
         stick with exceptions, but depending on the codebase I usually use
         the Go-style (ok, err) tuple (it's usually clunkier in TS though)
         or a Rust-style Result type with Ok/Err variants.
       
          turboponyy wrote 10 hours 48 min ago:
          I have the same disagreement. TypeScript with its structural and
          pseudo-dependent typing, somewhat-functionally disposed language
          primitives (e.g. first-class functions as values, currying) and
          standard library interfaces (filter, reduce, flatMap et al), and
          ecosystem make propagating information using values extremely
          ergonomic.
          
          Embracing a functional style in TypeScript is probably the most
          productive I've felt in any mainstream programming language. It's a
          shame that the language was defiled with try/catch, classes and other
          unnecessary cruft so third party libraries are still an annoying
          boundary you have to worry about, but oh well.
          
          The language is so well-suited for this that you can even model side
          effects as values, do away with try/catch, if/else and mutation a la
          Haskell, if you want[1]
          
 (HTM)    [1]: https://effect.website/
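           To make "side effects as values" concrete without depending on
           the Effect library's actual API, here is a hand-rolled sketch of
           the underlying idea (all names are illustrative): an effect is
           just a description, a thunk, that does nothing until explicitly
           run.

```typescript
// Hand-rolled sketch of "side effects as values". This is NOT the Effect
// library's API; it only illustrates the underlying idea: an effect is a
// plain description (here, a thunk) that does nothing until it is run.
type Effect<T> = () => T;

// Stand-in for real I/O, e.g. reading a config file.
const readConfig: Effect<string> = () => "port=8080";

// Combinators compose bigger descriptions without executing anything.
const map = <A, B>(fx: Effect<A>, f: (a: A) => B): Effect<B> =>
  () => f(fx());

const portOf = map(readConfig, (s) => Number(s.split("=")[1]));

// Nothing has happened yet; invoking the value performs the effect.
console.log(portOf()); // 8080
```

           Libraries like Effect layer typed errors, dependency injection,
           and interruption on top of this same pattern.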
       
       
 (DIR) <- back to front page